RegEx vs. XML
I recently ran into a situation where I needed a simple way to parse values from a cookie and place them into a class in my ASP.NET 2.0 application.
Initially, I created the class and used the built-in .NET serialization, but the resulting XML was so verbose, especially for an array of values (strings, for instance), that the cookie quickly grew past the roughly 4 KB limit that browsers place on cookie size.
So I began to research an alternative. Generating the cookie's contents was simple and efficient, but parsing the returned values raised a question: which would be faster, using RegEx for the parsing or using an XML reader?
The XML document I ended up with was very simple:
<userdata>
<theme>themename</theme>
<implementation>implementationname</implementation>
<roles>1|2|3|4|5</roles>
</userdata>
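To make the size concern concrete, here is a minimal sketch of building that document by hand and checking it against the cookie limit. It is written in Python rather than .NET, and the names build_user_data and MAX_COOKIE_BYTES are my own placeholders, not anything from the original application; 4096 bytes is an assumed round figure for the limit.

```python
from xml.sax.saxutils import escape

# Assumed approximate per-cookie limit; real browsers differ slightly.
MAX_COOKIE_BYTES = 4096


def build_user_data(theme, implementation, roles):
    """Build the compact <userdata> document as a single-line string."""
    role_list = "|".join(str(role) for role in roles)
    return (
        "<userdata>"
        f"<theme>{escape(theme)}</theme>"
        f"<implementation>{escape(implementation)}</implementation>"
        f"<roles>{escape(role_list)}</roles>"
        "</userdata>"
    )


payload = build_user_data("themename", "implementationname", [1, 2, 3, 4, 5])
size = len(payload.encode("utf-8"))
print(payload)
print(size, "bytes; fits in a cookie:", size < MAX_COOKIE_BYTES)
```

Joining the roles with a pipe keeps an array of values in one element, which is where the savings over the generic serializer output come from.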
As you can see, it would be very easy to get the values out of this XML document using either RegEx groups or the XML reader/document classes - but which is more efficient?
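For illustration, the two approaches might look something like this. This is a Python sketch, not the .NET code I actually used: re stands in for the RegEx classes and xml.etree.ElementTree for the XML classes, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = (
    "<userdata>"
    "<theme>themename</theme>"
    "<implementation>implementationname</implementation>"
    "<roles>1|2|3|4|5</roles>"
    "</userdata>"
)

# One named group per value; assumes the elements always appear in this order.
USERDATA_RE = re.compile(
    r"<theme>(?P<theme>[^<]*)</theme>\s*"
    r"<implementation>(?P<implementation>[^<]*)</implementation>\s*"
    r"<roles>(?P<roles>[^<]*)</roles>"
)


def parse_with_regex(text):
    """Pull the three values out with regex groups."""
    match = USERDATA_RE.search(text)
    if match is None:
        raise ValueError("unrecognized userdata payload")
    return {
        "theme": match.group("theme"),
        "implementation": match.group("implementation"),
        "roles": match.group("roles").split("|"),
    }


def parse_with_xml(text):
    """Pull the same values out with an XML parser."""
    root = ET.fromstring(text)
    return {
        "theme": root.findtext("theme"),
        "implementation": root.findtext("implementation"),
        "roles": root.findtext("roles").split("|"),
    }


print(parse_with_regex(SAMPLE))
print(parse_with_xml(SAMPLE))
```

Note that the regex version only works because the format is fixed and under my control; it does not decode entities or tolerate reordered elements the way a real XML parser does.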
To answer that question, I wrote a quick console application that measures how long it takes to retrieve the values and populate the object, once using the RegEx classes and once using the XML classes. The results are as follows:
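A timing harness of that kind can be sketched roughly as follows. Again, this is Python with hypothetical names, not the original console application, so its timings will not match the .NET figures; the iteration count is kept small so that it runs quickly.

```python
import re
import timeit
import xml.etree.ElementTree as ET

SAMPLE = (
    "<userdata><theme>themename</theme>"
    "<implementation>implementationname</implementation>"
    "<roles>1|2|3|4|5</roles></userdata>"
)
TAGS = ("theme", "implementation", "roles")

# Compiled once, outside the timed loop, so only the matching is measured.
PATTERN = re.compile(
    r"<theme>([^<]*)</theme>"
    r"<implementation>([^<]*)</implementation>"
    r"<roles>([^<]*)</roles>"
)


def with_regex():
    """Extract the three values using regex groups."""
    return PATTERN.search(SAMPLE).groups()


def with_xml():
    """Extract the three values using an XML parser."""
    root = ET.fromstring(SAMPLE)
    return tuple(root.findtext(tag) for tag in TAGS)


def benchmark(iterations):
    """Return the total seconds each approach takes for `iterations` parses."""
    return {
        "xml": timeit.timeit(with_xml, number=iterations),
        "regex": timeit.timeit(with_regex, number=iterations),
    }


results = benchmark(10_000)
for name, seconds in results.items():
    print(f"{name}: {seconds:.3f} sec")
print(f"difference: {results['xml'] - results['regex']:.3f} sec")
```

Raising the argument to benchmark toward the 1 million to 100 million range mirrors the runs reported here, at the cost of a much longer wait.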
1 Million
XML: 10 sec
RegEx: 9 sec
Difference: 1 sec
10 Million
XML: 1 min 41 sec
RegEx: 1 min 24 sec
Difference: 17 sec
100 Million
XML: 16 min 54 sec
RegEx: 13 min 59 sec
Difference: 2 min 55 sec
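As you can see, at smaller iteration counts - up to about 1 million - there was very little difference: 1 second, or about 10% of the XML time.
Once you move to 10 million or more iterations, though, the efficiency of the RegEx processing begins to stand out: the gap grows to about 17% of the XML time, which at 100 million iterations is nearly 3 minutes.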
I certainly wouldn't recommend using RegEx to parse XML files in general - but for this very simple example, I can comfortably say that it is the more efficient of the two approaches.
Has anyone had similar experiences? Is there anything that I'm not taking into account?