PowerShell Performance Tips for Large Text Operations – Part 2: Text Manipulation
Posted in Windows Powershell | 4 Comments | 2,872 views | 28/02/2015 09:31

If you have read the first part, we will now continue with text manipulation in PowerShell.

Test File: 424390 lines, 200 MB Microsoft IIS Log

In the first part, the winner was “System.IO.StreamReader”, so I’ll continue with that.

1. Let’s add a Replace to our script:

$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
 
[int]$LineNumber = 0;
 
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
	$LogContent = $ReadLogFile.ReadLine()
	$LineNumber++
 
	$LogContent = $LogContent.Replace('\','\\')
}
 
$ReadLogFile.Close()

With Replace, the script execution time is 3.2394121 seconds.
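
The post does not show how the timings were taken. One way to take a comparable measurement is Measure-Command; the sketch below is an assumption about the setup, not the harness used for the numbers in this post. It writes a small, made-up sample file to the temp folder so that it runs anywhere; point $LogFilePath at a real log to get meaningful numbers.

```powershell
# Hypothetical sample file so the sketch is self-contained; use a real log path instead.
$LogFilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-iis.log'
$SampleLines = @('#Fields: date time cs-method cs-uri-stem sc-status') +
    (1..1000 | ForEach-Object { "2015-02-28 09:31:00 GET \share\page$_.html 200" })
[System.IO.File]::WriteAllLines($LogFilePath, [string[]]$SampleLines)

# Measure-Command returns a TimeSpan for the script block it runs.
$Elapsed = Measure-Command {
    $ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList $LogFilePath
    try
    {
        while (!$ReadLogFile.EndOfStream)
        {
            $LogContent = $ReadLogFile.ReadLine().Replace('\', '\\')
        }
    }
    finally
    {
        # Release the file handle even if the loop throws
        $ReadLogFile.Close()
    }
}

"Elapsed: {0} seconds" -f $Elapsed.TotalSeconds
```

Timings vary with disk cache and machine load, so it is worth running each variant a few times before comparing.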

So what happens if I use the regex-based -replace operator instead of the Replace method?

$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
 
[int]$LineNumber = 0;
 
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
	$LogContent = $ReadLogFile.ReadLine()
	$LineNumber++
 
	# Only the pattern is a regex; the replacement is literal text, so '\\' inserts two backslashes
	$LogContent = $LogContent -replace '\\', '\\'
}
 
$ReadLogFile.Close()

Now the script execution time is 25.1311866 seconds, almost eight times slower. So the .NET Replace method is your best friend :)

Winner: Replace
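
Before comparing timings, it helps to confirm that both approaches do the same work. The sketch below uses a made-up sample string, not a line from the test log. String.Replace treats both arguments as literal text. With -replace only the pattern is a regular expression, so a literal backslash is written as \\ there, while backslashes in the replacement string are taken literally.

```powershell
# Hypothetical sample line, for illustration only.
$Line = 'GET \images\logo.png'

# .NET String.Replace: both arguments are literal text.
$ViaMethod = $Line.Replace('\', '\\')

# -replace: the pattern '\\' is a regex that matches one backslash;
# the replacement '\\' is literal text, i.e. two backslashes.
$ViaRegex = $Line -replace '\\', '\\'

# [regex]::Escape builds the pattern when the search text is not known in advance.
$ViaEscaped = $Line -replace [regex]::Escape('\'), '\\'

$ViaMethod                   # GET \\images\\logo.png
$ViaMethod -ceq $ViaRegex    # True
$ViaMethod -ceq $ViaEscaped  # True
```

If the search text is a plain string, as it is here, the literal Replace method avoids the regex engine altogether, which is presumably where the time difference comes from.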

2. What happens if I use -notlike in my text operation?

$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
 
[int]$LineNumber = 0;
[int]$TestCount = 0;
 
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
	$LogContent = $ReadLogFile.ReadLine()
	$LineNumber++
 
	$LogContent = $LogContent.Replace('\','\\')
 
	if ($LogContent -notlike "#*")
	{
		$TestCount++
	}
}
 
$ReadLogFile.Close()

The script takes 50.1493736 seconds.

But is there another way to write this check? Yes: I can compare the first character of the line with -ne instead:

$LogFilePath = "C:\large.log"
$FileStream = New-Object -TypeName IO.FileStream -ArgumentList ($LogFilePath), ([System.IO.FileMode]::Open), ([System.IO.FileAccess]::Read), ([System.IO.FileShare]::ReadWrite);
$ReadLogFile = New-Object -TypeName System.IO.StreamReader -ArgumentList ($FileStream, [System.Text.Encoding]::ASCII, $true);
 
[int]$LineNumber = 0;
[int]$TestCount = 0;
 
# Read Lines
while (!$ReadLogFile.EndOfStream)
{
	$LogContent = $ReadLogFile.ReadLine()
	$LineNumber++
 
	$LogContent = $LogContent.Replace('\','\\')
 
	if ($LogContent.Substring(0,1) -ne "#")
	{
		$TestCount++
	}
}
 
$ReadLogFile.Close()

The script takes 25.3682308 seconds. OMG! :)

So in this test, the -ne comparison takes about half the time of -notlike (25.4 versus 50.1 seconds). Use -eq/-ne instead of -like/-notlike wherever possible.

Winner: -eq/-ne
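
One caveat with the Substring approach: Substring(0,1) throws on an empty line, which a log file can contain. A possible alternative is String.StartsWith with an ordinal comparison, which returns $false for an empty string and involves no wildcard parsing. This is an untimed sketch with made-up sample lines, not one of the measured results above.

```powershell
# Hypothetical sample lines, including an empty one.
$Lines = @(
    '#Software: Microsoft Internet Information Services',
    '',
    '2015-02-28 09:31:00 GET /default.htm 200'
)

[int]$TestCount = 0

foreach ($LogContent in $Lines)
{
    # Ordinal comparison: no culture rules and no wildcard parsing.
    # An empty string returns $false here instead of throwing.
    if (-not $LogContent.StartsWith('#', [System.StringComparison]::Ordinal))
    {
        $TestCount++
    }
}

$TestCount   # 2: the empty line and the request line are not comment lines
```

This counts an empty line as a non-comment line, which matches what -notlike "#*" does.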

To be continued.. :)


Comments (4)

Wojciech Sciesinski

March 5th, 2015
23:53:44

Very interesting tips. I wait for next “episodes”.

Thank you!


