Friday, February 24, 2023

Having Fun with Get-Content

Most IT Pros know and use the Get-Content cmdlet to get the contents of a file into a variable or array. You can then process the array to do useful things. I use this cmdlet a lot: in my last book (which has 111 scripts in total), I used the cmdlet in over 15% of the scripts.

Most use cases involve relatively small files, but sometimes the files can get quite big. And when you import large files, you find that Get-Content is slow. There are a couple of reasons for this, as this blog post shows. One important reason is that Get-Content uses a PowerShell provider, but there is more!

As it turns out, you can't use Get-Content with the Registry, Certificate, or WSMan providers. And for all but the FileSystem provider, the cmdlet does not return much of value (or anything you could not easily get another way). I would argue almost all usage of the cmdlet is based on the FileSystem provider.
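To illustrate the point about other providers, here is a minimal sketch of Get-Content against the Environment, Function, and Variable providers. The names GC_DEMO, Get-GCDemo, and GCDemoVar are made up for this example. In each case the result is something you could get more directly ($env:GC_DEMO, ${function:Get-GCDemo}, or $GCDemoVar).

```powershell
# Environment provider: returns the value of an environment variable
$env:GC_DEMO = 'Hello from Env:'           # hypothetical variable name
$EnvValue = Get-Content -Path Env:\GC_DEMO

# Function provider: returns the body of a function
function Get-GCDemo { 'demo' }              # hypothetical function
$FunctionBody = Get-Content -Path Function:\Get-GCDemo

# Variable provider: returns the value of a variable
$GCDemoVar = 42                             # hypothetical variable
$VarValue = Get-Content -Path Variable:\GCDemoVar

"Env      : $EnvValue"
"Function : $FunctionBody"
"Variable : $VarValue"
```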

As an alternative to using Get-Content, you could use the .NET IO.File class and invoke its ReadAllLines() method.
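Here is a minimal sketch of that call on a small sample file (the file name GCSample.txt is made up for this example). One thing to watch: .NET resolves relative paths against the process working directory, which is not necessarily PowerShell's current location, so it is safer to pass a full path. Convert-Path is one way to get one.

```powershell
# Create a small sample file (hypothetical name) in the temp folder
$SampleFile = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath 'GCSample.txt'
'Line one', 'Line two', 'Line three' | Set-Content -Path $SampleFile

# Resolve to a full file system path before handing it to .NET
$FullPath = Convert-Path -Path $SampleFile

# Read every line of the file into a string array
$Lines = [System.IO.File]::ReadAllLines($FullPath)
"Read $($Lines.Count) lines into a $($Lines.GetType().FullName)"
```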

To test this out, I downloaded a large text file (War and Peace), then tested the two methods of retrieving the text. Here is what I see:

PS C:\Foo>  # 1. Get War and Peace
PS C:\Foo> $URI = ''
PS C:\Foo> $WAP = Invoke-WebRequest -URI $URI
PS C:\Foo> $Outfile = '.\WarAndPeace.txt'
PS C:\Foo> $WAP.Content |  Out-File -Path $OutFile
PS C:\Foo> Get-ChildItem  -Path $Outfile

    Directory: C:\Foo

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          24/02/2023    15:30        4434672 WarAndPeace.txt

PS C:\Foo> # 2. Get the contents into a variable using Get-Content
PS C:\Foo> $M1 = Measure-Command -Expression {
             $File1 = Get-Content $Outfile
           }
PS C:\Foo> "Using Get-Content took {0:n2} milliseconds" -f $M1.TotalMilliSeconds
Using Get-Content took 663.80 milliseconds
PS C:\Foo>
PS C:\Foo> # 3. Now with .NET
PS C:\Foo> $M2 = Measure-Command -Expression {
             $File2 =  [IO.File]::ReadAllLines($Outfile)
           }
PS C:\Foo> "Using Native .net {0:n2} milliseconds" -f $M2.TotalMilliSeconds
Using Native .net 91.14 milliseconds

As you can see, using the native method is a lot faster (over 7 times faster in this run). But why is this?

Well, the first reason is that using a provider is just slower. But another reason Get-Content is so much slower is that it adds several properties to every line returned. You can see this as follows:

PS C:\Foo> # 4. look at output types
PS C:\Foo> "Get-Content produces a $($File1.GetType().FullName) object"
Get-Content produces a System.Object[] object
PS C:\Foo> ".NET produces a $($File2.GetType().Fullname) object"
.NET produces a System.String[] object
PS C:\Foo> 
PS C:\Foo> # 5. And look at what Get-Content does for us:
PS C:\Foo> $File1 | Get-Member -MemberType Properties

   TypeName: System.String

Name         MemberType   Definition
----         ----------   ----------
PSChildName  NoteProperty string PSChildName=WarAndPeace.txt
PSDrive      NoteProperty PSDriveInfo PSDrive=C
PSParentPath NoteProperty string PSParentPath=C:\Foo
PSPath       NoteProperty string PSPath=C:\Foo\WarAndPeace.txt
PSProvider   NoteProperty ProviderInfo PSProvider=Microsoft.PowerShell.Core\FileSystem
ReadCount    NoteProperty long ReadCount=1
Length       Property     int Length {get;}

PS C:\Foo> $File2 | Get-Member -MemberType Properties

   TypeName: System.String

Name   MemberType Definition
----   ---------- ----------
Length Property   int Length {get;}

As you can see, Get-Content returns an object array of strings, where each member (i.e. each line of the text file) has 6 additional properties over and above what is in a string array. So if you import a 56,859-line text file, Get-Content adds 341,154 properties to the array that pretty much NO one needs or uses. And that takes time.
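If you want to check this on your own system, here is a minimal sketch that counts the NoteProperty members on the first line returned by each method. The file name GCProps.txt is made up for this example, and the exact count could differ between PowerShell versions, so treat the number it prints as what your version does rather than a guarantee.

```powershell
# Create a two-line sample file (hypothetical name) in the temp folder
$SampleFile = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath 'GCProps.txt'
'alpha', 'beta' | Set-Content -Path $SampleFile

# Read the same file both ways
$FromCmdlet = Get-Content -Path $SampleFile
$FromDotNet = [System.IO.File]::ReadAllLines($SampleFile)

# Collect the NoteProperty members attached to the first line from each method
$CmdletNotes = @($FromCmdlet[0].PSObject.Properties |
    Where-Object -FilterScript { $_.MemberType -eq 'NoteProperty' })
$DotNetNotes = @($FromDotNet[0].PSObject.Properties |
    Where-Object -FilterScript { $_.MemberType -eq 'NoteProperty' })

"Get-Content line has $($CmdletNotes.Count) note properties: $($CmdletNotes.Name -join ', ')"
".NET line has $($DotNetNotes.Count) note properties"

# Total extra properties across the whole file
"Extra properties in total: $($FromCmdlet.Count * $CmdletNotes.Count)"
```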

So, if you are using Get-Content to retrieve text from a file, and performance is important, consider using .NET.
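A single Measure-Command run can be noisy, so before you switch it is worth timing both methods a few times on your own files. Here is a rough sketch that averages several runs against a generated test file. The file name, the 20,000-line size, and the 5 runs are arbitrary choices for this example, and the numbers will vary from machine to machine.

```powershell
# Build a sample file (hypothetical name and size) in the temp folder
$TestFile = Join-Path -Path ([IO.Path]::GetTempPath()) -ChildPath 'GCTiming.txt'
1..20000 | ForEach-Object { "This is line number $_" } | Set-Content -Path $TestFile

$Runs = 5

# Time Get-Content several times
$GCTimes = foreach ($i in 1..$Runs) {
    (Measure-Command -Expression { $null = Get-Content -Path $TestFile }).TotalMilliseconds
}

# Time the .NET method the same number of times
$NetTimes = foreach ($i in 1..$Runs) {
    (Measure-Command -Expression { $null = [System.IO.File]::ReadAllLines($TestFile) }).TotalMilliseconds
}

# Average each set of timings and report
$GCAvg  = ($GCTimes  | Measure-Object -Average).Average
$NetAvg = ($NetTimes | Measure-Object -Average).Average
"Get-Content took {0:n2} milliseconds (average of $Runs runs)" -f $GCAvg
"Native .NET took {0:n2} milliseconds (average of $Runs runs)" -f $NetAvg
```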
