Monday, November 26, 2012

String Split Performance


Last week I was working on a file parsing and loading program that seemed to run a bit slower than I expected, so I ran a profiler to see where the time was going. Not surprisingly, SqlBulkCopy was the primary bottleneck. What was surprising was that splitting the incoming strings into arrays was taking up 20% of the time.
I know there are several ways to split strings, so I dug around, found one I hadn't heard of, and built a quick test to compare them.
The data file consisted of pipe-delimited data, 25 columns and 13,009,846 rows. The test was performed on a Core i5 machine running 32-bit Windows 7. I ran the test 10 times for each method, and the following represents the average time in milliseconds (divide by 1,000 to get seconds).

Test 1: Split(sr.ReadLine(), "|")
Avg ms: 42,920

Test 2: sr.ReadLine().Split("|")
Avg ms: 19,613

Test 3: Strings.Split(sr.ReadLine(), "|")
Avg ms: 45,185

Test 2 was the clear winner, with an average time less than half that of the other two methods.
Test 3 was the method I hadn't heard of. Most posts I read about it suggested that it should have been the winner.

On your machine and in your environment, you may see drastically different results. It only takes a few minutes to set up a test and try the methods out for yourself.
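
If you want a starting point, here is a minimal sketch of how such a test could be set up. It is not the code I used for the numbers above. The TimeSplit helper, the module name, and the "data.txt" path are all placeholders made up for illustration, and it only covers the String.Split and Strings.Split variants.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO
Imports Microsoft.VisualBasic

Module SplitBenchmark
    ' Hypothetical helper (not from the original test): reads every line of the
    ' file, splits it with the supplied function, and returns elapsed milliseconds.
    ' fieldCount receives the total number of fields seen, as a sanity check.
    Function TimeSplit(filePath As String,
                       splitter As Func(Of String, String()),
                       ByRef fieldCount As Long) As Long
        fieldCount = 0
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Using sr As New StreamReader(filePath)
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                fieldCount += splitter(line).Length
                line = sr.ReadLine()
            End While
        End Using
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        Dim filePath As String = "data.txt" ' placeholder: point this at your own file
        Dim fields As Long

        ' Instance method on String, as in Test 2
        Dim msString As Long = TimeSplit(filePath, Function(s) s.Split("|"c), fields)
        Console.WriteLine("String.Split:  {0} ms, {1} fields", msString, fields)

        ' Microsoft.VisualBasic.Strings.Split, as in Test 3
        Dim msStrings As Long = TimeSplit(filePath, Function(s) Strings.Split(s, "|"), fields)
        Console.WriteLine("Strings.Split: {0} ms, {1} fields", msStrings, fields)
    End Sub
End Module
```

Run each variant several times and average the results, since the first pass over the file is likely to include disk caching effects that later passes do not.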

Two final comments. I did run Regex.Split for anyone wondering, but the times were not even in the same ballpark, so I excluded them above. From past experience, regex can be a fast and powerful alternative when the splitting is more complicated than a single delimiter.
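
As an illustration of the "more complicated than a single delimiter" case, here is a small sketch. The pattern and the sample line are made up for this example and have nothing to do with my data file. It assumes fields are separated by either a pipe or a semicolon, with optional whitespace around the delimiter.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSplitSketch
    ' Hypothetical pattern: a pipe or a semicolon, with optional whitespace on either side.
    ' The Regex is created once and reused, rather than rebuilt for every line.
    Private ReadOnly Delimiter As New Regex("\s*[|;]\s*", RegexOptions.Compiled)

    Function SplitFields(line As String) As String()
        Return Delimiter.Split(line)
    End Function

    Sub Main()
        Dim fields As String() = SplitFields("alpha | beta;gamma")
        Console.WriteLine(String.Join(",", fields)) ' prints alpha,beta,gamma
    End Sub
End Module
```

Doing the same thing with String.Split would mean passing both delimiters and then trimming each field separately.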

I assumed that creating the string in the loop would slow things down drastically, what with creating a string 13 million times, but the results were essentially the same.