Most uses of Get-Content involve relatively small files, but sometimes the files can get quite big. And when you import large files, you find that Get-Content is slow. There are a couple of reasons for this, as this blog post shows. One important reason is that Get-Content uses a PowerShell provider, but there is more!
As it turns out, you can't use Get-Content with the Registry, Certificate, or WSMan providers. And for every provider other than the FileSystem provider, the cmdlet returns little of value (or information you could easily get another way). I would argue that almost all use of the cmdlet is with the FileSystem provider.
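You can check the provider limitation yourself. Get-Content, like the other content cmdlets, only works with providers that implement the IContentCmdletProvider interface. The minimal sketch below lists each provider loaded in the current session and reports whether it implements that interface. The function name Get-ContentCapableProvider is hypothetical. The Certificate and WSMan providers appear only when their modules are loaded, which happens on Windows.

```powershell
# Hypothetical helper: report which loaded providers can serve Get-Content.
# The content cmdlets require the provider to implement IContentCmdletProvider.
function Get-ContentCapableProvider {
    $ContentInterface = [System.Management.Automation.Provider.IContentCmdletProvider]
    foreach ($Provider in (Get-PSProvider)) {
        [pscustomobject]@{
            Name            = $Provider.Name
            # True when the provider's .NET implementing type supports content access
            SupportsContent = $ContentInterface.IsAssignableFrom($Provider.ImplementingType)
        }
    }
}

Get-ContentCapableProvider | Sort-Object -Property Name
```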
As an alternative to Get-Content, you can use the System.IO.File .NET class and call its static ReadAllLines() method.
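Calling ReadAllLines() takes one line, but it has one catch. .NET resolves a relative path such as .\WarAndPeace.txt against the process working directory ([Environment]::CurrentDirectory). That directory need not match your current PowerShell location, because Set-Location does not necessarily update it. The minimal sketch below resolves the path with Convert-Path first. The function name Read-AllLinesFast is hypothetical.

```powershell
# Hypothetical helper: read every line of a file with .NET instead of Get-Content.
function Read-AllLinesFast {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # .NET resolves relative paths against [Environment]::CurrentDirectory,
    # which may differ from the PowerShell location. Convert-Path resolves the
    # path against the current PowerShell location and returns a full path.
    $FullPath = Convert-Path -LiteralPath $Path
    $Lines = [System.IO.File]::ReadAllLines($FullPath)
    # The leading comma returns the array as a single object, so the caller
    # gets System.String[] instead of PowerShell re-collecting it into Object[].
    return , $Lines
}
```

For example, `$File2 = Read-AllLinesFast -Path .\WarAndPeace.txt` should return a plain System.String[], provided the file exists in the current location.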
To test this out, I downloaded a large text file (War and Peace), then tested the two methods of retrieving the text. Here is what I see:
PS C:\Foo> # 1. Get War and Peace
PS C:\Foo> $URI = 'http://textfiles.com/etext/FICTION/warpeace.txt'
PS C:\Foo> $WAP = Invoke-WebRequest -URI $URI
PS C:\Foo> $Outfile = '.\WarAndPeace.txt'
PS C:\Foo> $WAP.Content | Out-File -Path $OutFile
PS C:\Foo> Get-ChildItem -Path $Outfile
    Directory: C:\Foo

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          24/02/2023    15:30        4434672 WarAndPeace.txt
PS C:\Foo> # 2. Get the contents into a variable using Get-Content
PS C:\Foo> $M1 = Measure-Command -Expression {
$File1 = Get-Content $Outfile
}
PS C:\Foo> "Using Get-Content took {0:n2} milliseconds" -f $M1.TotalMilliSeconds
Using Get-Content took 663.80 milliseconds
PS C:\Foo>
PS C:\Foo> # 3. Now with .NET
PS C:\Foo> $M2 = Measure-Command -Expression {
$File2 = [IO.File]::ReadAllLines($Outfile)
}
PS C:\Foo> "Using Native .net {0:n2} milliseconds" -f $M2.TotalMilliSeconds
Using Native .net 91.14 milliseconds
As you can see, the native method is a lot faster: 91.14 milliseconds versus 663.80 milliseconds, which is more than 7 times faster. But why is this?
Well, the first reason is that going through a provider is simply slower. But another reason Get-Content is so much slower is that it adds several note properties to every line it returns. You can see this as follows:
PS C:\Foo> # 4. look at output types
PS C:\Foo> "Get-Content produces a $($File1.GetType().FullName) object"
Get-Content produces a System.Object[] object
PS C:\Foo> ".NET produces a $($File2.GetType().Fullname) object"
.NET produces a System.String[] object
PS C:\Foo>
PS C:\Foo> # 5. And look at what Get-Content does for us:
PS C:\Foo> $File1 | Get-Member -MemberType Properties
   TypeName: System.String

Name         MemberType   Definition
----         ----------   ----------
PSChildName  NoteProperty string PSChildName=WarAndPeace.txt
PSDrive      NoteProperty PSDriveInfo PSDrive=C
PSParentPath NoteProperty string PSParentPath=C:\Foo
PSPath       NoteProperty string PSPath=C:\Foo\WarAndPeace.txt
PSProvider   NoteProperty ProviderInfo PSProvider=Microsoft.PowerShell.Core\FileSystem
ReadCount    NoteProperty long ReadCount=1
Length       Property     int Length {get;}
PS C:\Foo> $File2 | Get-Member -MemberType Properties

   TypeName: System.String

Name         MemberType   Definition
----         ----------   ----------
Length       Property     int Length {get;}
As you can see, Get-Content returns an object array of strings, where each member (i.e. each line of the text file) carries 6 additional note properties over and above what a plain string has (Length is the string's own property). So if you import a 56,859-line text file, Get-Content adds 341,154 (56,859 × 6) properties to the array that pretty much NO one needs or uses. And that takes time.
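To check this arithmetic on your own file, the sketch below counts the note properties that Get-Content attaches to every line. The helper name Get-ExtraPropertyCount is hypothetical. The Get-Member output above lists six NoteProperty members per line, but the exact set may vary between PowerShell versions, so the sketch counts rather than assumes.

```powershell
# Hypothetical helper: count the NoteProperty members Get-Content adds to each line.
function Get-ExtraPropertyCount {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # @() guarantees an array even for a one-line file.
    $Lines = @(Get-Content -LiteralPath $Path)
    $Counts = @(foreach ($Line in $Lines) {
        # Each line is a PSObject-wrapped string; count only note properties,
        # not the string's own Length property.
        $Count = 0
        foreach ($Prop in $Line.PSObject.Properties) {
            if ($Prop.MemberType -eq 'NoteProperty') { $Count++ }
        }
        $Count
    })
    $First = 0
    if ($Counts.Count -gt 0) { $First = $Counts[0] }
    [pscustomobject]@{
        Lines          = $Lines.Count
        PerFirstLine   = $First
        TotalNoteProps = ($Counts | Measure-Object -Sum).Sum
    }
}
```

Running `Get-ExtraPropertyCount -Path .\WarAndPeace.txt` walks every line, so expect it to take a while on a large file.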
So, if you are using Get-Content to retrieve text from a file and performance matters, consider using the .NET ReadAllLines() method instead.
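The timings above come from a single run of each command, and one Measure-Command result can be skewed by disk caching or first-run overhead. To repeat the comparison on your own system, the minimal sketch below times a script block several times and returns the median. The helper name Get-MedianMs and the default of five runs are arbitrary choices, not part of the original test.

```powershell
# Hypothetical helper: run a script block several times and return the median
# elapsed time in milliseconds, which is less sensitive to one-off outliers
# than a single Measure-Command run.
function Get-MedianMs {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $ScriptBlock,
        [ValidateRange(1, 1000)]
        [int] $Runs = 5
    )
    $Times = @(foreach ($i in 1..$Runs) {
        (Measure-Command -Expression $ScriptBlock).TotalMilliseconds
    })
    $Sorted = @($Times | Sort-Object)
    $Mid = [int][math]::Floor($Sorted.Count / 2)
    if ($Sorted.Count % 2 -eq 1) {
        $Sorted[$Mid]                              # odd count: middle value
    } else {
        ($Sorted[$Mid - 1] + $Sorted[$Mid]) / 2    # even count: mean of the two middle values
    }
}

# Example, assuming $Outfile points at the War and Peace file:
#   $GcMs  = Get-MedianMs -ScriptBlock { $null = Get-Content -Path $Outfile }
#   $NetMs = Get-MedianMs -ScriptBlock { $null = [IO.File]::ReadAllLines((Convert-Path -LiteralPath $Outfile)) }
#   "Get-Content took {0:n1} times as long" -f ($GcMs / $NetMs)
```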