Posted in Windows PowerShell | 28/02/2015 09:31
If you have read the first part, we will now continue with text manipulation in PowerShell.
Test File: 424390 lines, 200 MB Microsoft IIS Log
In the first part, the winner was “System.IO.StreamReader”, so I’ll continue with that.
1. Let’s try a Replace in our script:
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
}
$ReadLogFile.Close()
With the Replace added, script execution time: 3.2394121 seconds.
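Timings like this one can be reproduced with something like the sketch below. It is an assumption about how to measure, not necessarily how the figures in this post were taken. The `Measure-Seconds` helper name is hypothetical, and the loop works on a made-up in-memory string rather than the 200 MB log.

```powershell
# Hypothetical helper: runs a script block once and returns the elapsed seconds.
function Measure-Seconds {
    param([Parameter(Mandatory = $true)][scriptblock]$Action)

    $Watch = [System.Diagnostics.Stopwatch]::StartNew()
    & $Action | Out-Null      # discard any output so only the timing is returned
    $Watch.Stop()
    return $Watch.Elapsed.TotalSeconds
}

# Example: time a small Replace loop on a sample string.
$Elapsed = Measure-Seconds {
    foreach ($i in 1..1000) {
        $null = 'C:\inetpub\logs'.Replace('\', '\\')
    }
}
"Elapsed: $Elapsed seconds"
```

The built-in `Measure-Command` cmdlet does the same job and returns a TimeSpan, so the helper is only a convenience.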
So what happens if I use a regex (the -replace operator) instead of Replace?
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
# Read Lines
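while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    # The pattern "\\" is a regex matching one backslash. In the replacement string
    # the backslash is literal, so "\\" inserts exactly two, matching Replace('\','\\').
    $LogContent = $LogContent -replace "\\", "\\"
}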
$ReadLogFile.Close()
Now the script execution time is 25.1311866 seconds, so the .NET Replace method is your best friend :)
Winner: Replace
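One detail is worth checking when comparing the two. `.Replace()` treats both arguments literally, while `-replace` takes a regular expression as its pattern, and in its replacement string only `$` is special, so backslashes there are inserted as written. A minimal sketch on made-up sample strings (`$Sample` and the `'a\b'` value are assumptions for illustration):

```powershell
$Sample = 'C:\inetpub\logs'   # hypothetical sample value

# Literal replace: every backslash becomes two.
$ViaMethod = $Sample.Replace('\', '\\')

# Regex replace: the pattern '\\' matches one backslash; the replacement
# '\\' is inserted literally as two backslashes.
$ViaOperator = $Sample -replace '\\', '\\'

# [regex]::Escape builds a safe pattern from any literal text.
$ViaEscape = $Sample -replace [regex]::Escape('\'), '\\'

# A replacement written as '\\\\' inserts four backslashes, not two.
$Four = 'a\b' -replace '\\', '\\\\'

$ViaMethod                    # C:\\inetpub\\logs
$ViaMethod -eq $ViaOperator   # True
$Four                         # a\\\\b
```

So the two approaches only produce the same output when the regex replacement string contains exactly the characters you want inserted.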
2. What happens if I use -notlike in my text operation?
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
[int]$TestCount = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
    if ($LogContent -notlike "#*")
    {
        $TestCount++
    }
}
$ReadLogFile.Close()
The script takes 50.1493736 seconds.
But is there another way to write this check? Yes: take the first character of the line and compare it with -ne:
$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
[int]$LineNumber = 0;
[int]$TestCount = 0;
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
    $LogContent = $ReadLogFile.ReadLine()
    $LineNumber++
    $LogContent = $LogContent.Replace('\','\\')
    if ($LogContent.Substring(0,1) -ne "#")
    {
        $TestCount++
    }
}
$ReadLogFile.Close()
The script takes 25.3682308 seconds. OMG! :)
So in this test the -ne check on the first character ran in about half the time of the -notlike check (25.4 vs. 50.1 seconds). Prefer -eq/-ne over -like/-notlike where possible.
Winner: -EQ
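One caveat with the `Substring(0,1)` check: it throws on an empty line, because there is no first character to take. A hedged alternative, not benchmarked here, is `String.StartsWith` with an ordinal comparison, which simply returns false for an empty string. The sample lines below are made up for illustration.

```powershell
# Hypothetical sample: two IIS-style comment lines, one empty line, two data lines.
$Lines = @(
    '#Software: Microsoft Internet Information Services',
    '#Fields: date time cs-method',
    '',
    '2015-02-28 09:31:00 GET',
    '2015-02-28 09:31:01 POST'
)

[int]$TestCount = 0
foreach ($Line in $Lines) {
    # Ordinal comparison avoids culture-sensitive matching and is safe on empty strings.
    if (-not $Line.StartsWith('#', [System.StringComparison]::Ordinal)) {
        $TestCount++
    }
}
$TestCount   # 3 (the empty line and the two data lines)
```

Like `-notlike "#*"`, this counts an empty line as a non-comment line, so the two checks agree on the result.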
To be continued.. :)