Wednesday, February 08, 2012

Get-ChildItem and the -Include and -Filter parameters

I saw a good question the other day in the PowerShell.Com Learn PowerShell Forum about using -Include when calling Get-ChildItem (or DIR, or LS!). The OP had a bunch of files in a folder (C:\Data) and wanted to get just the *.txt files, as follows:

Get-ChildItem -Path C:\Data -Include *.Txt

But it did not work: it returned no files at all, even though there were some in the folder. The reason is clear if you read the great help text closely: the -Include parameter only takes effect if you also use the -Recurse parameter, or if the path itself ends in a wildcard (for example, C:\Data\*) so that it refers to the contents of the folder. Another small point to note is that -Include takes a globbed string (i.e. a file name specified with wildcards), not a regular expression.

The simplest way to get just the text files from a single folder would be:

Get-ChildItem -Path C:\Data\*.Txt

Another way to get just the text files in a given folder is to use the -Filter parameter. The -Filter value is passed to the provider (here, the FileSystem provider), which uses it to qualify the -Path value. You can call it like this:

Get-ChildItem -Path C:\Data\ -Filter *.Txt
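To see the single-folder options side by side, here is a minimal sketch that builds a throwaway folder under the temp directory (a stand-in for C:\Data; the folder and file names are made up for illustration) and tries three forms: a wildcard in the path, -Include with a path ending in \*, and -Filter. It assumes PowerShell 3.0 or later and is not meant as a definitive test of every provider.

```powershell
# Hypothetical stand-in for C:\Data: a temp folder with three .txt files and one .md file.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $root | Out-Null
'one.txt', 'two.txt', 'three.txt', 'notes.md' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root $_) | Out-Null
}

# 1. Put the wildcard in the path itself.
$viaPath = @(Get-ChildItem -Path (Join-Path $root '*.txt'))

# 2. -Include takes effect here because the path ends in \* (the folder's contents).
$viaInclude = @(Get-ChildItem -Path (Join-Path $root '*') -Include '*.txt')

# 3. -Filter is handed to the FileSystem provider, which does the matching.
$viaFilter = @(Get-ChildItem -Path $root -Filter '*.txt')

'Path wildcard: {0}  -Include: {1}  -Filter: {2}' -f $viaPath.Count, $viaInclude.Count, $viaFilter.Count

# Clean up the temp folder (the FileInfo objects captured above remain usable).
Remove-Item -Path $root -Recurse -Force
```

Wrapping each call in @() guarantees an array, so .Count is reliable even when only one file matches.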

So you have two ways to get a subset of files in a folder using a form of early filtering. And if you use -Recurse (and are therefore getting all the files in a folder and its child folders), you can use either the -Include or the -Filter parameter. They give the same result. Well, almost. If you use -Include *.txt, you only get the files that have an extension of .txt. Using -Filter, you get a slightly different result, as shown here:

Psh[Cookham8:C:\foo]>Get-ChildItem -path c:\DATA -filter *.txt -recurse -ea Silentlycontinue

    Directory: C:\DATA

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---          2/5/2012   3:48 PM          0 copy.txtfoo
-a---          2/5/2012   3:48 PM          0 one.txt
-a---          2/5/2012   3:48 PM          0 three.txt
-a---          2/5/2012   3:48 PM          0 two.txt

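As you can see, -Filter *.txt also matches copy.txtfoo. I'm sure this is 'by design', but it doesn't match my expectations. The most likely cause is that Windows checks the pattern against each file's legacy 8.3 short name as well as its long name, and the short name of copy.txtfoo ends in .TXT. Still, in most cases, extensions do not overlap like this (of course PowerShell is an exception, with its .PS1 and .PS1XML extensions!).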
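If you want the speed of -Filter without the .txtfoo surprise, one option is to follow the provider filter with an exact extension check. The sketch below builds a hypothetical temp folder containing a sub folder and a copy.txtfoo file, compares recursive -Include and -Filter, and then applies that check. It assumes PowerShell 3.0 or later (for -File). Whether the raw -Filter result includes copy.txtfoo depends on the platform and file system, so treat that count as informational only.

```powershell
# Hypothetical tree: root\one.txt, root\copy.txtfoo, root\sub\four.txt
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $root | Out-Null
New-Item -ItemType Directory -Path $sub | Out-Null
$paths = @(
    (Join-Path $root 'one.txt'),
    (Join-Path $root 'copy.txtfoo'),
    (Join-Path $sub  'four.txt')
)
foreach ($p in $paths) { New-Item -ItemType File -Path $p | Out-Null }

# -Include is matched by PowerShell's own wildcard engine: the name must end in .txt.
$included = @(Get-ChildItem -Path $root -Include '*.txt' -Recurse -File)

# -Filter is matched by the provider; on Windows this may also return copy.txtfoo.
$filtered = @(Get-ChildItem -Path $root -Filter '*.txt' -Recurse -File)

# Keep the fast provider filter, then drop anything whose extension is not exactly .txt.
$exact = @($filtered | Where-Object { $_.Extension -eq '.txt' })

'-Include: {0}  -Filter: {1}  -Filter + exact check: {2}' -f $included.Count, $filtered.Count, $exact.Count
Remove-Item -Path $root -Recurse -Force
```

The extra Where-Object only inspects what the provider already returned, so it adds very little cost on top of -Filter.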

As an alternative to using either -Filter or -Include, you could always get ALL the child items (e.g. all the file and folder objects in C:\Data) and pipe the output to Where-Object for further (later) filtering. This works, but as most IT admins know, early filtering tends to be more efficient. But the filtering done by -Filter may be surprising: if I use -Filter *.ps1, then PowerShell returns me all the PowerShell scripts in
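Late filtering with Where-Object can be sketched as follows. The folder and its contents (two scripts, a .ps1xml file and a text file) are hypothetical, created under the temp directory just for this comparison, and the sketch assumes PowerShell 3.0 or later. Because the Where-Object test compares the Extension property exactly, the .ps1xml file is never included.

```powershell
# Hypothetical folder: two .ps1 scripts, one .ps1xml file and one .txt file.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $root | Out-Null
'a.ps1', 'b.ps1', 'types.ps1xml', 'readme.txt' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $root $_) | Out-Null
}

# Late filtering: the provider returns every item, then Where-Object keeps exact .ps1 matches.
$late = @(Get-ChildItem -Path $root -Recurse | Where-Object { $_.Extension -eq '.ps1' })

# Early filtering with -Include, for comparison.
$early = @(Get-ChildItem -Path $root -Recurse -Include '*.ps1')

'Late (Where-Object): {0}  Early (-Include): {1}' -f $late.Count, $early.Count
Remove-Item -Path $root -Recurse -Force
```

Both approaches return the same two scripts here; the difference is in how much work is done before the objects reach the pipeline.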

The -Filter parameter also turns out to have an additional benefit. The -Filter value you specify is used by the provider itself to qualify the path, whereas the -Include parameter specifies a filter that PowerShell applies to the items the provider returns. Thus, -Filter gives you early, early filtering, whereas -Include is later early filtering! The performance difference between the two approaches turns out to be significant! I wrote a little script to calculate the costs of using the three methods (-Include, -Filter and a Where-Object clause), as follows:

# Test-Filtering.PS1
# Time three ways of finding the *.ps1 files under C:\Windows.
# @() forces an array, so .Count works even when only one file is found.

$start = Get-Date
$f = @(Get-ChildItem -Path C:\Windows -Include *.ps1 -Recurse -ErrorAction SilentlyContinue)
$end = Get-Date
$time1 = $end - $start    # elapsed time as a TimeSpan
"Using -Include    : {0,4} files in {1,-6:n2} seconds" -f $f.Count, $time1.TotalSeconds
$f | Format-Table Name, Length

$start = Get-Date
$f = @(Get-ChildItem -Path C:\Windows -Filter *.ps1 -Recurse -ErrorAction SilentlyContinue)
$end = Get-Date
$time2 = $end - $start
"Using -Filter     : {0,4} files in {1,-6:n2} seconds" -f $f.Count, $time2.TotalSeconds
$f | Format-Table Name, Length

$start = Get-Date
$f = @(Get-ChildItem -Path C:\Windows -Recurse -ErrorAction SilentlyContinue |
       Where-Object { $_.Extension -eq '.ps1' })
$end = Get-Date
$time3 = $end - $start
"Using Where clause: {0,4} files in {1,-6:n2} seconds" -f $f.Count, $time3.TotalSeconds

The results are here:


I have to say, the numbers were not what I was expecting! I was surprised how much faster -Filter turned out to be: around three times quicker than either -Include or late filtering. And compared to using -Include, late filtering is really not all that much slower.
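If you want to rerun the comparison somewhere reproducible rather than against C:\Windows, the sketch below builds a small synthetic tree (10 folders of 20 files, every fifth file a .ps1; all names are hypothetical) and times each method with Measure-Command, the approach jv suggests in the comment. On a tree this small the differences will be tiny and need not match the ratios I saw; the point is the pattern, and that all three methods find the same 40 files. It assumes PowerShell 3.0 or later.

```powershell
# Hypothetical benchmark tree under the temp directory.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $root | Out-Null
foreach ($d in 1..10) {
    $dir = Join-Path $root "dir$d"
    New-Item -ItemType Directory -Path $dir | Out-Null
    foreach ($n in 1..20) {
        # Every fifth file is a script, so each folder holds 4 .ps1 files.
        $ext = if ($n % 5 -eq 0) { 'ps1' } else { 'txt' }
        New-Item -ItemType File -Path (Join-Path $dir "file$n.$ext") | Out-Null
    }
}

# The script blocks store their results in this hashtable, so the results survive
# whatever scope Measure-Command runs them in.
$found = @{}
$timings = [ordered]@{
    'Include' = Measure-Command { $found['Include'] = @(Get-ChildItem -Path $root -Include '*.ps1' -Recurse -File) }
    'Filter'  = Measure-Command { $found['Filter']  = @(Get-ChildItem -Path $root -Filter '*.ps1' -Recurse -File) }
    'Where'   = Measure-Command { $found['Where']   = @(Get-ChildItem -Path $root -Recurse -File | Where-Object { $_.Extension -eq '.ps1' }) }
}

foreach ($name in $timings.Keys) {
    '{0,-8}: {1,4} files in {2,6:n3} seconds' -f $name, $found[$name].Count, $timings[$name].TotalSeconds
}

Remove-Item -Path $root -Recurse -Force
```

Measure-Command discards the pipeline output of the script block, which is why the results are captured in $found instead.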


1 comment:

jv said...

You are correct, but this gives a more accurate measurement:

measure-command {Get-ChildItem -path c:\windows -filter *.ps1 -recurse -ea 0 }
measure-command {Get-ChildItem -path c:\windows\* -include *.ps1 -recurse -ea 0}