Wednesday, November 09, 2011

Performance with PowerShell

Over the weekend, at the most recent PowerShell PowerCamp, I got to discussing the performance of PowerShell. The point I was making was that PowerShell makes doing some things very easy, even when they are not particularly performant. Two examples are the early filtering of WMI data using –Filter (vs using Where-Object after you retrieve all the data from a remote machine) and the two variants of foreach (the ForEach-Object cmdlet and the foreach statement).

In the case of WMI, where you early filter properties/occurrences on the target machine, PowerShell has less data to serialize and transmit across the network. Late filtering, by contrast, requires more local memory and additional processing. Thus I’d expect early filtering to be faster. We are therefore comparing two statements that look something like this:

Get-WmiObject Win32_Share -ComputerName Cookham1 -Filter "Description='remote admin'"

versus

Get-WmiObject Win32_Share -ComputerName Cookham1 | Where-Object { $_.Description -eq 'remote admin' }

In the first example, only the matching share is returned from Cookham1, whereas in the second example all the shares are returned and then filtered locally (aka late filtering). If I wrap both of these commands in Measure-Command and repeat the operation a number of times, the code and results look like this:


Psh[Cookham8:fmt:\]> "Early Filter:"
" {0} ms" -f ((Measure-Command { 1..100 | foreach {
    Get-WmiObject Win32_Share -ComputerName Cookham1 -Filter "Description='remote admin'" } }).TotalMilliseconds).ToString("f")

"Late filter:"
" {0} ms" -f ((Measure-Command { 1..100 | foreach {
    Get-WmiObject Win32_Share -ComputerName Cookham1 | Where-Object { $_.Description -eq 'remote admin' } } }).TotalMilliseconds).ToString("f")
Early Filter:
1948.91 ms
Late filter:
2715.44 ms

So in this run early filtering shaved around 28% off the late-filtering time. The numbers vary a bit if I repeat the test, but early filtering is almost always in the region of 20% faster.
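Incidentally, early filtering applies to properties as well as occurrences: Get-WmiObject also has a –Property parameter that limits which fields get serialized back across the wire. A minimal sketch, reusing the same target machine and filter from above (the Name/Path selection is just for illustration):

# Return only the matching share, and only its Name and Path properties,
# so even less data has to cross the network from Cookham1.
Get-WmiObject Win32_Share -ComputerName Cookham1 `
    -Filter "Description='remote admin'" -Property Name, Path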

But a much bigger difference was observed by Anita Boorboom, a Dutch SharePoint guru, in the second case, i.e. using the ForEach-Object cmdlet in a pipeline vs using the foreach statement.

When you use foreach in a pipeline (an alias for the ForEach-Object cmdlet), PowerShell streams the objects: each object created at one stage of the pipeline is consumed by the next stage as soon as it is produced, so the whole collection never has to be held in memory at once. With the foreach statement, you first need to persist all the objects you wish to iterate across, then perform the iteration, which is likely to require more memory (and that can be a bad thing if the collection of objects is large!). The flip side is that the pipeline invokes a script block for every single object, and that per-object overhead adds up. I knew this, but Anita’s results were rather more than I was expecting, so I duplicated her scripts (well, nearly) and found her results were indeed correct, like this:

$items = 1..10000
Write-Host "ForEach-Object: "
" {0} ms" -f ((Measure-Command { $items | ForEach-Object { "Item: $_" } }).TotalMilliseconds).ToString("f")
Write-Host "Foreach: "
" {0} ms" -f ((Measure-Command { foreach ($item in $items) { "Item: $item" } }).TotalMilliseconds).ToString("f")
ForEach-Object:
  629.73 ms
Foreach:
  31.84 ms

Thus the foreach statement is nearly 20 times faster than the pipelined ForEach-Object for this experiment. I ran this code several times, and the multiplier was consistently in the 20-30 times range. That floored me. The foreach statement does require PowerShell to instantiate the whole collection in memory before iterating over it, whereas ForEach-Object iterates as each object is instantiated, but the per-object cost of running a script block through the pipeline clearly swamps any saving. I did not expect a 20-30 fold difference in performance!
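To see whether that gap holds as the collection grows, here is a small sketch along the same lines; the sizes are just ones picked for illustration:

# Time both constructs over increasingly large collections.
foreach ($n in 1000, 10000, 100000) {
    $items = 1..$n
    $cmdlet    = (Measure-Command { $items | ForEach-Object { "Item: $_" } }).TotalMilliseconds
    $statement = (Measure-Command { foreach ($item in $items) { "Item: $item" } }).TotalMilliseconds
    "n={0}: ForEach-Object {1:f2} ms; foreach statement {2:f2} ms" -f $n, $cmdlet, $statement
}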

So it’s obvious that some language constructs will be more efficient than others, but you also need to consider the time it takes to write the code and how often it will be run. In the first case above, I managed to save just over 750ms by using early WMI filtering, but it probably took me more than that just to write the early-filtering code. And for a lot of admins who don’t know WMI very well, filtering using Where-Object is familiar and uses PowerShell syntax (the –Filter clause on Get-WMIObject uses WQL, which is different). In the second case, the difference was staggering. Of course, when the processing you want to apply to the collection members is non-trivial (i.e. more than a couple of lines of code), the improvement in readability of the foreach statement is worth considering: by using task-oriented variable names, the resulting code is easier to read than when you use $_. For some production-oriented scripts, that improvement in readability alone may be worthwhile.
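For instance, here is a hypothetical fragment in both styles (a share report invented for illustration, not one of Anita’s scripts):

# Pipeline style: compact, but $_ gives the reader no hint of what it holds.
Get-WmiObject Win32_Share -ComputerName Cookham1 |
    ForEach-Object { "{0} -> {1}" -f $_.Name, $_.Path }

# Statement style: $share says exactly what is being processed.
$shares = Get-WmiObject Win32_Share -ComputerName Cookham1
foreach ($share in $shares) {
    "{0} -> {1}" -f $share.Name, $share.Path
}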

In summary, there are always lots of different ways to achieve the same result in PowerShell. I advocate using whatever is easiest for you to remember. At the same time, PowerShell can show some big performance differences between those approaches – and it pays to know more!


1 comment:

Ryan said...

Isn't the pipelined foreach CmdLet 20 times slower than the foreach statement?

One way to speed up a 'foreach' in a pipeline is to use a scriptblock with a process block (&{process{'code here'}})

In this example, the scriptblock takes about 1/10th of the time of the ForEach-Object CmdLet.


'{0} ms' -f (Measure-Command { 1..10000| foreach { $_*$_ } }).TotalMilliseconds
'{0} ms' -f (Measure-Command { 1..10000| &{process{ $_*$_ }} }).TotalMilliseconds


614.3745 ms
62.1588 ms


One caveat to the scriptblock method is that it executes in a child scope. This can be worked around by dot-sourcing the scriptblock, but this is a little slower.


'{0} ms' -f (Measure-Command { 1..10000| . {process{ $_*$_ }} }).TotalMilliseconds

96.4542 ms
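To illustrate the scope caveat with a small sketch (the variable and counts are just for illustration): an assignment inside the called (&) scriptblock lives in a child scope and is discarded, while the dot-sourced version changes the caller's variable.

$count = 0
1..5 | & {process{ $count++ }}   # child scope: the caller's $count is untouched
"After & : $count"               # prints 0
1..5 | . {process{ $count++ }}   # dot-sourced: runs in the caller's scope
"After . : $count"               # prints 5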