Home | About Me | Developer PFE Blog | Become a Developer PFE

Contact

Categories

On this page

Yield Return…a little known but incredible language feature
Linking your pivot collections the fun and easy way
Pivoting ASP.NET event log error messages
Visualizing Azure performance counters…
Don’t guess when it comes to performance…a RegEx story.
Optimizing your life: E-Mail Edition
Exposing any class as a WCF endpoint…
Parsing ASP.NET event log error messages for fun and profit
PowerShell: Restoring a whole heap of SQL Databases
Micro optimization or just good coding practice?

Archive

Blogroll

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Sign In

# Sunday, 17 October 2010
Sunday, 17 October 2010 17:54:49 (Central Daylight Time, UTC-05:00) ( .NET | Performance )

“We shall neither fail nor falter; we shall not weaken or tire…give us the tools and we will finish the job.” – Winston Churchill

I don’t often blog about specific language features but over the past few weeks I’ve spoken to a few folks that did not know of the “yield” keyword and the “yield return” and “yield break” statements, so I thought it might be a good opportunity to shed some light on this little known but extremely useful C# feature.  Chances are, you’ve probably indirectly used this feature before and just never known it.

We’ll start with the problem I’ve seen that plagues many applications.  Often times, you’ll call a method that returns a List<T> or some other concrete collection.  The method probably looks something like this:

public static List<string> GenerateMyList()
{
List<string> myList = new List<string>();

for (int i = 0; i < 100; i++)
{

myList.Add(i.ToString());

}

return myList;

}

I’m sure your logic is going to be significantly more complex than what I have above but you get the idea.  There are a few problems and inefficiencies with this method.  Can you spot them?

  • The entire List<T> must be stored in memory.
  • Its caller must wait for the List<T> to be returned before it can process anything.
  • The method itself returns a List<T> back to its caller.

As an aside – with public methods, you should strive to not return a List<T> in your methods.  Full details can be found here.  The main idea here is that if you choose to change the method signature and return a different collection type in the future, this would be considered a breaking change to your callers.

In any case, I’ll focus on the first two items in the list above.  If the List<T> that is returned from the GenerateMyList() method is large then that will be a lot of data that must be kept around in memory.  In addition, if it takes a long time to generate the list, your caller is stuck until you’ve completely finished your processing.

Instead, you can use that nifty “yield” keyword.  This allows the GenerateMyList() method to return items to its caller as they are being processed.  This means that you no longer need to keep the entire list in memory and can just return one item at a time until you get to the end of your returned items.  To illustrate my point, I’ll refactor the above method into the following:

private static IEnumerable<string> GenerateMyList()
{
for (int i = 0; i < 100; i++)
{
string value = i.ToString();
Console.WriteLine("Returning {0} to caller.", value);
yield return value;
}

Console.WriteLine("Method done!");

yield break;
}

A few things to note in this method.  The return type has been changed to an IEnumerable<string>.  This is one of those nifty interfaces that exposes an enumerator.  This allows its caller to cycle through the results in a foreach or while loop.  In addition, the “yield return i.ToString()” will return that item to its caller at that point and not when the entire method has completed its processing.  This allows for a very exciting caller-callee type relationship.  For example, if I call this method like so:

IEnumerable<string> myList = GenerateMyList();

foreach (string listItem in myList)
{
Console.WriteLine("Item: " + listItem);
}

The output would be:

image

Thus showing that each item gets returned at the time we processed it.  So, how does this work?  Well, at compile time, we will generate a class to implement the behavior in the iterator.  Essentially, this means that the GenerateMyList() method body gets placed into the MoveNext() method.  In-fact, if you open up the compiled assembly in Reflector, you see that plumbing in place (comments are mine and some code was omitted for clarity’s sake):

private bool MoveNext()
{

this.<>1__state = -1;
this.<i>5__1 = 0;
// My for loop has changed to a while loop.
while (this.<i>5__1 < 100)
{
// Sets a local value
this.<value>5__2 = this.<i>5__1.ToString();
// Here is my Console.WriteLine(...)
Console.WriteLine("Returning {0} to caller.", this.<value>5__2);
// Here is where the current member variable
// gets stored.
this.<>2__current = this.<value>5__2;
this.<>1__state = 1;
// We return "true" to the caller so it knows
// there is another record to be processed.
return true;
...
}
// Here is my Console.WriteLine() at the bottom
// when we've finished processing the loop.
Console.WriteLine("Method done!");
break;


}

Pretty straightforward.  Of course, the real power is that it the compiler converts the “yield return <blah>” into a nice clean enumerator with a MoveNext().  In-fact, if you’ve used LINQ, you’ve probably used this feature without even knowing it.  Consider the following code:

private static void OutputLinqToXmlQuery()
{
XDocument doc = XDocument.Parse
(@"<root><data>hello world</data><data>goodbye world</data></root>");

var results = from data in doc.Descendants("data")
select data;

foreach (var result in results)
{
Console.WriteLine(result);
}
}

The “results” object, by default will be of type “WhereSelectEnumerableIterator” which exposes a MoveNext() method.  In-fact, that is also why the results object doesn’t allow you to do something like this:

var results = from data in doc.Descendants("data")
select data;

var bad = results[1];

The IEnumerator does not expose an indexer allowing you to go straight to a particular element in the collection because the full collection hasn’t been generated yet.  Instead, you would do something like this:

var results = from data in doc.Descendants("data")
select data;

var good = results.ElementAt(1);

And then under the covers, the ElementAt(int) method will just keep calling MoveNext() until it reaches the index you specified.  Something like this:

Note:  this is my own code and is NOT from the .NET Framework – it is merely meant to illustrate a point.

public static XElement MyElementAt(this IEnumerable<XElement> elements, 
int index)
{
int counter = 0;
using (IEnumerator<XElement> enumerator =
elements.GetEnumerator())
{
while(enumerator.MoveNext()){
if (counter == index)
return enumerator.Current;
counter++;
}
}

return null;
}

Hope this helps to demystify some things and put another tool in your toolbox.

Until next time.

# Monday, 11 October 2010
Monday, 11 October 2010 00:09:42 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

In my previous post, I added to my series of entries on making sense of your ASP.NET event log error messages.  Note that this is entry #4 in this series.  The previous three entries can be found here:

In that last post, I walked through the PAuthorLib.dll and showed you how to crawl through your event log error messages and create a pivot collection.  The result of that initial effort was a nice view into our events:

image

While this certainly got the job done and is a very powerful and compelling view into our data, we need to realize that as our data grow, the amount of entries in our linked collection is limited.  From the Developer Overview, we see that the maximum number of items we should have in a single collection is 3,000:

image

So, while a simple collection will get the job done for your smaller amounts of data, you will really run into some challenges with your larger datasets like our ASP.NET event log error messages.  To combat this limitation you can create what’s called a Linked Collection.  The idea is that it’s just a way for you to link together related collections in order to provide a seamless experience for your users.  In our case, a natural break for our collections will be based upon the exception type with a summary collection and then a separate collection for each exception type.  If I were to draw this out: 

PivotCollections

Event Log Header Collection Source

The idea behind this structure is that the Exception summary would simply link to each of these exception collections.  First, we’ll create a colleciton source for our exception summary.  As in our previous collection source (in my last blog post), we inherit from the AbstractCollectionSource class and use the LoadHeaderData() method to add our facet categories to the collection.  In this case, we’ll create two categories – the number of Occurrences and the Urls where the exception occurred.  Another difference is that we are going to pass the already parsed collection of messages into the constructor.  The reason for that is so we don’t have to repeat the parsing of the event log messages multiple times.

class EventLogHeaderCollectionSource : AbstractCollectionSource
{

private IEnumerable<EventLogMessage> m_messages = null;

public EventLogHeaderCollectionSource(IEnumerable<EventLogMessage> messages,
string inputFile)
: base(inputFile)
{

m_messages = messages;


}

#region Facets

private const string OCCURRENCES = "Occurrences";
private const string URLS = "Urls";

#endregion

protected override void LoadHeaderData()
{
this.CachedCollectionData.FacetCategories.
Add(new PivotFacetCategory(OCCURRENCES, PivotFacetType.Number));
this.CachedCollectionData.FacetCategories.
Add(new PivotFacetCategory(URLS, PivotFacetType.String));

this.CachedCollectionData.Name =
"ASP.NET Error Messages - Summary";
this.CachedCollectionData.Copyright =
new PivotLink("Source", "http://www.samuraiprogrammer.com");

}
}

Then, in the LoadItems() method, we provide the logic to generate the PivotItem collection.  The one key item to make note of is the use of the Href property of the PivotItem object.  This is where we specify the collection we wish to link to this item.  Since each of the PivotItems will be a summary of the number of each exception type – we’ll name the sub-collections by its exception type.  For example, NullReferenceException.cxml, SqlException.cxml, etc.

protected override IEnumerable<PivotItem> LoadItems()
{
var results = from log in m_messages
group log by log.Exceptiontype into l
orderby l.Count() descending, l.Key
select new
{
ExceptionType = l.Key,
ExceptionCount = l.Count()
};


int index = 0;
foreach (var result in results)
{
PivotItem item = new PivotItem(index.ToString(), this);
item.Name = result.ExceptionType;
item.Description = "# of Exceptions: " + result.ExceptionCount.ToString();
item.AddFacetValues(OCCURRENCES, result.ExceptionCount);
item.Href = result.ExceptionType + ".cxml";

...

index++;
yield return item;
}

yield break;
}

Event Log Collection Source Redux

Previously, when we generated the pivot collections, we were outputting all of the records into a single collection.  Now that we are generating a collection for each exception type, we will need to put a filter in our exception collection and then incorporate that filter into our item generation.  Other than that, the code we wrote previously remains largely unchanged, so I left the majority of it out and only included the snippets that we care about below.

class EventLogCollectionSource : AbstractCollectionSource
{
private IEnumerable<EventLogMessage> m_messages = null;
private string m_exceptionType = string.Empty;

public EventLogCollectionSource(
IEnumerable<EventLogMessage> messages,
string exceptionType,
string path)
: base(path)
{
m_messages = messages;
m_exceptionType = exceptionType;
}

protected override void LoadHeaderData()
{
...
this.CachedCollectionData.Name =
string.Format("{0} Error Messages", m_exceptionType);
...
}

protected override IEnumerable<PivotItem> LoadItems()
{
var results = (from message in m_messages
where message.Exceptiontype == m_exceptionType
select message);

int index = 0;
foreach (EventLogMessage message in results)
{
PivotItem item =
new PivotItem(index.ToString(), this);
item.Name = message.Exceptiontype;
item.Description = message.Exceptionmessage;

...

index++;
yield return item;
}
yield break;
}
}

Generate and test the collection

Then, the only thing we have left to do is generate and test our linked collections.  I won’t go into a lengthy explanation of how we generate the collections because I did that in the last blog entry.  I will show the broad strokes required to tie this all together, though:

// Load the raw messages into a collection
IEnumerable<EventLogMessage> messages =
LoadEventLogMessages(inputFile).ToList();

// Generate summary pivot collection
EventLogHeaderCollectionSource sourceSummary =
new EventLogHeaderCollectionSource(messages, inputFile);
...
summaryTargetFilter1.Write(sourceSummaryFilter1);

// Get the aggregate results so we know the filters
// for our exception pivot collections
var summaryResults = from log in messages
group log by log.Exceptiontype into l
orderby l.Count() descending, l.Key
select new
{
ExceptionType = l.Key,
ExceptionCount = l.Count()
};

foreach (var resultItem in summaryResults)
{
// Generate pivots for each exception type
EventLogCollectionSource source =
new EventLogCollectionSource(messages,
resultItem.ExceptionType,
inputFile);
...
targetFilter1.Write(sourceFilter1);
}

Once we we have this code and everything has been generated, if we open the output folder, we’ll see the following structure:

image

We see our ExceptionSummary pivot collection and all of the deep zoom folders.  So, when we open the Pivot tool, we’ll see a nice parent collection:

image

This gives us a nice breakdown of the number of occurrences for each exception in our source data.  Immediately we see an outlier (more on that later) between the 6,000 and 7,000 item mark and when we select that tile, we see the following:

image

We also see a green “Open” box (surrounded in a red rectangle, my emphasis) which links to our NullReferenceException.cxml.  When we click that box, the tool will immediately open that collection in the UI for our perusal – providing a very similar look to what we saw in the last entry:

image

Closing Thoughts

Now, you may have noticed a contradiction above.  I said that a collection should have no more than 3,000 items and yet, with the NullReferenceException collection, we saw in the summary that it had over 6,000 items.  That is a very good point and will be a subject of a future blog post.  I wanted to illustrate the simple collections and the linked collections before we got into that third type of collection from above – the Dynamic Collection.  Stay tuned!

# Monday, 04 October 2010
Monday, 04 October 2010 10:43:49 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

logo-pivotUnless you’ve been hiding under the proverbial rock, you’ve probably seen the recent Pivot hoopla.  If you’re not familiar with it, it’s a way to visualize a large amount of data in a nice filterable format.  The nice thing about it is that it’s really easy to put together a pivot collection and there are a ton of tools available for just this purpose.  Just do a search on CodePlex for Pivot and you’ll get about 40’ish good results for tools you can use to create a Pivot Collection. 

So, I was putting together a proof-of-concept for an internal project and thought I would continue on with my series of blog posts on ASP.NET Error Message event logs with a post on how to visualize this data using a pivot.  You may wish to read parts 1 and 2 here:

So, when I put together my pivot, I worked out a 3 step process:

  1. Figure out what you want to Pivot
  2. Find an API and convert the data
  3. Generate and Test the collection

Let’s begin, shall we.

Figure out what you want to Pivot

The structure for the Pivot Collection is deceptively simple -

<?xml version="1.0"?> 
<Collection Name="Hello World Collection" …>
<FacetCategories>
<FacetCategory Name="Hello World Facet Category One" Type="String"/>
</FacetCategories>
<Items ImgBase="helloworld.dzc">
<Item Img="#0" Id="0" Href="http://www.getpivot.com" Name="Hello World!">
<Description> This is the only item in the collection.</Description>
<Facets>
<Facet Name="Hello World Facet Category One">
<String Value="Hello World Facet Value"/>
</Facet>
</Facets>
</Item>
</Items>
</Collection>

The way that I think about the Items in the Collection are in the same way that you might think about an object.  For example, a Car object might have the following properties:

  • Advertising blurb
  • Car and Driver Reviews
  • Color
  • Make
  • Model
  • Engine
  • 0-60mph time
  • Max Speed

The common values like the Color, Make, Model, 0-60mph time and max speed become the facets or attributes that describe your object in relation to other instances of objects.  Things like the advertising blurbs and car and driver reviews or descriptions belong instead as properties of your Item directly in the Description element.

For our data, namely ASP.NET exceptions, we’re going to define an exception as the following:

  • Item
    • Name = Exception Type
    • Description = Exception Message
    • Facets
      • Request Path
      • Stack Trace
      • Event Time
      • Top Method of Stack Trace
      • Top My Code Method of Stack Trace

This should allow us to group and drill through the common properties that might link exceptions together and still provide detailed error information when necessary.

Find an API and code it

The second step here is to find some code/API/tool that we can enhance for our purposes.  There are some great tools published by the Live Labs team – for example:

While both tools could be used in this instance, in part 2 we found that some of our Event Logs we were parsing contained more than 10,000 items and I wanted a bit more control over how I converted the data.  “No touching” is the phrase of the day.  Fortunately, the command line tool was published on CodePlex with an API we can use.  Once you download the product you see that it contains 3 assemblies:

image

The last item there is the PauthorLib.dll which encapsulates many of the extension points within this great tool.  In-fact, it exposes about 7 different namespaces for our purposes:

image

For our purposes, we are going to focus on the Streaming set of namespaces.  Why?  Well, this is namely because we are going to be dealing with a lot of data and I didn’t want to load everything into memory before writing it to disk.  If you look at the contents of the Streaming namespace, you’ll see a great class called “AbstractCollectionSource”.  This looks fairly promising because it exposes two main methods:

class EventLogExceptionCollectionSource : AbstractCollectionSource
{
protected override void LoadHeaderData()
{
throw new NotImplementedException();
}

protected override IEnumerable<PivotItem> LoadItems()
{
throw new NotImplementedException();
}
}

Before we do anything, though, we need a constructor.  The constructor will be responsible for taking a string representing the path to our data and passing it to our base class’s constructor.

public EventLogExceptionCollectionSource(string filePath)
: base(filePath)
{

// Do nothing else.

}
Then, the first method, LoadHeaderData, is where we define our facets – Request Path, Stack Trace, etc.  - as well as the data types that each facet will be.  So, our code will be fairly simple and straight-forward:
protected override void LoadHeaderData()
{

this.CachedCollectionData.FacetCategories.Add(
new PivotFacetCategory(STACKTRACE,
PivotFacetType.LongString));
this.CachedCollectionData.FacetCategories.Add(
new PivotFacetCategory(REQUESTPATH,
PivotFacetType.String));
this.CachedCollectionData.FacetCategories.Add(
new PivotFacetCategory(EVENTTIME,
PivotFacetType.DateTime));
this.CachedCollectionData.FacetCategories.Add(
new PivotFacetCategory(TOPMETHOD,
PivotFacetType.String));
this.CachedCollectionData.FacetCategories.Add(
new PivotFacetCategory(TOPAPPMETHOD,
PivotFacetType.String));

this.CachedCollectionData.Name = "Event Log Error Messages";


}

The second method, LoadItems(), is responsible for doing exactly what it suggests – this is where we load the data from whichever source we care about and then convert it into our PivotItem collection.  For our purposes, we’re going to load the XML file we defined in Part 1 of this series into a list of EventLogMessage objects and then convert those EventLogMessage objects into PivotItem objects:

protected override IEnumerable<PivotItem> LoadItems()
{
// Load XML file
XDocument document = XDocument.Load(this.BasePath);

// Populate collection of EventLogMessage objects
var messages = from message in document.Descendants("Message")
select EventLogMessage.Load(message.Value);

int index = 0;
foreach (EventLogMessage message in messages)
{

PivotItem item = new PivotItem(index.ToString(), this);
item.Name = message.Exceptiontype;
item.Description = message.Exceptionmessage;
item.AddFacetValues(REQUESTPATH, message.Requestpath);
item.AddFacetValues(STACKTRACE, message.Stacktrace);
item.AddFacetValues(EVENTTIME, message.Eventtime);
item.AddFacetValues(TOPMETHOD, message.StackTraceFrames[0].Method);
item.AddFacetValues(TOPAPPMETHOD, GetFirstNonMicrosoftMethod(message.StackTraceFrames));

index++;
yield return item;

}
}

The key method calls from above are the AddFacetValues(…) method calls.  This method essentially sets the attributes we wish to have our data pivot upon.  This, by itself, isn’t enough to generate our great pivot calls – we need to call our code from somewhere.  Since this is a simple app, we’re going to make it a console app.  For our Collection to get generated we need to use a few other objects included in this API:

  • EventLogExceptionCollectionSource – The class we created above.
  • HtmlImageCreationSourceFilter – This class will generate the tile in the Pivot based upon some HTML template we specify.
  • LocalCxmlCollectionTarget – Generates the Collection XML file at the path we specify.
  • DeepZoomTargetFilter – Generates the deep zoom files to support our collection XML file and also enables all of our fancy transitions.

In practice, the code is pretty simple and straight forward and I applaud the people who wrote this library:

private static void GenerateExceptionPivot(string inputFile, string outputFolder)
{
string collectionName = Path.Combine(outputFolder, "MyExceptions.cxml");

EventLogExceptionCollectionSource source =
new EventLogExceptionCollectionSource(inputFile);
HtmlImageCreationSourceFilter sourceFilter1 =
new HtmlImageCreationSourceFilter(source);
sourceFilter1.HtmlTemplate =
"<html><body><h1>{name}</h1>{description}</body></html>";
sourceFilter1.Width = 600;
sourceFilter1.Height = 600;

LocalCxmlCollectionTarget target =
new LocalCxmlCollectionTarget(collectionName);
DeepZoomTargetFilter targetFilter1 =
new DeepZoomTargetFilter(target);
targetFilter1.Write(sourceFilter1);

}

That last statement, targetFilter1.Write(…) is what will actually execute everything and write our resultant files to disk.

Generate and Test the collection

So, now if we run our console application and call that GenerateExceptionPivot(…) method, we’ll get some great output.

image

What’s nice about the Library is that it provides progress as it iterates through your data (in the red rectangle) and also in the blue rectangle, we need that it’s multi-threaded by default.  This is primarily for the most intensive part of the operation – the creation of the deep zoom artifacts.  If you have one of those new fangled machines with 2+ cores, you can tweak the number of threads that it will spawn for this operation by setting the ThreadCount property of the DeepZoomTargetFilter object.  This may or may not improve your performance but it’s nice that the option is available.

...
DeepZoomTargetFilter targetFilter1 =
new DeepZoomTargetFilter(target);

targetFilter1.ThreadCount = 100;
targetFilter1.Write(sourceFilter1);
...

Once our collection has been generated, we can browse it in an explorer.exe window just to get an idea of what our code has wrought:

image

And then to test it, you can just point the Live Labs Pivot application at our “MyExceptions.cxml” file and view the wonderful data.  For example, you can look at the Event Time in the histogram view to see how your exceptions broke down over time.  You can also filter your data by things like the RequestPath (the page that threw the exception) or the Method that was at the top of the callstack.

image

Then, you can zoom in on a specific time slice you care about:

image

Then, if you want to view the details for a specific instance, just click the corresponding tile.  Then, a new side bar will appear on the right hand side with all of the details we stored in this record:

image

We generated a single collection in this blog post.  One thing to keep in mind is that each collection should have no more than 3,000 items.  For collections in which you want to have more than 3,000 items, you should look at potentially creating a Linked Collection.  That will be the subject of an upcoming blog post. 

Until next time!

# Monday, 27 September 2010
Monday, 27 September 2010 00:19:44 (Central Daylight Time, UTC-05:00) ( Azure | Perfmon )

I’ve blogged before about how some things are different in the cloud, namely Azure.  That post dealt with finding out why your Azure Role crashed by using some of the logging facilities and storage available in Azure.  That’s kind of a worst case scenario, though.  Often, you’ll just want to gauge the health of your application.  In the non-cloud world, you’d just use the Windows Performance Monitor to capture performance counters that tell you things like your CPU utilization, memory usage, request execution time, etc.  All of these counters are great to determine your overall application and server health as well as helping you troubleshoot problems that may have arisen with your application. 

I’m often surprised at how many developers have never used Perfmon before or think that it is purely a task that administrators need to care about.  Performance is everyone’s responsibility and a tool like Perfmon is invaluable if you need to gain an accurate understanding of your application’s performance. 

When you’ve needed to log counters in the past, you would have probably become familiar with this dialog:

image

In the Azure world, you can add counters to be collected in a variety of ways.  A popular way is just to add the counters you want in your OnStart() event. 

public override bool OnStart()
{
DiagnosticMonitorConfiguration dmc = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Add counter(s) to collect.
dmc.PerformanceCounters.DataSources.Add(
new PerformanceCounterConfiguration()
{
CounterSpecifier = @"\Processor(*)\*",
SampleRate = TimeSpan.FromSeconds(5)
});

// Set transfer period and filter.
dmc.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

// Start logging, using the connection string in your config file.
DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc);

...

return base.OnStart();
}

This gets the job done quite nicely and even though we’re sampling our performance counter every 5 seconds, the data won’t get written to our persistent storage until we hit the 5 minute mark.

So, let’s fast forward a bit.  You’ve added counters to monitor and now you want to view some of this data.   Fortunately, there are a few tools at your disposal:

For example, using the Azure Management Tool, you can navigate to the Diagnostics/Analysis section to view the performance monitor counters you stored. 

image

Herein lies one of the challenges you may encounter.  Azure is kind of a different beast and it doesn’t store the data in storage using the typical BLG format.  Instead, if you look at the raw data, it will be in an XML or tabular format and may look like:

image

While this is nice and clean, the problem you’ll hit is the visualization of the data.  Most organizations use Perfmon to visualize the counter data you record.  Alas, in this case you can’t use it that way right out of the box.  Fortunately, my friend Tom Fuller and I realized this gap and he wrote a plug-in to the Azure Management Tool that will transform this data from the format above into a native BLG format.  He wrote up a great blog entry with step-by-step instructions and the source code here. You can also download a compiled version of the extension here (along with the Windows Azure MMC) here.  To install the plugin, just copy the DLL into the \WindowsAzureMMC\release folder:

image

Then, when you open up the tool, you’ll see another option in that dropdown in the Diagnostics > Analysis window:

image

Then, when you click show, you’ll get a nice popup telling you to click here to view your data in Perfmon:

image

Then, when you click on it, Perfmon will start with all of the counters added to the display:

image

A few things to note:

  • It will store the resulting BLG file at the following path:

c:\Users\<user name>\AppData\Local\Temp\<File Name>.blg

The BLG can be used in a variety of applications.  For example, you can use it with the great PAL tool.  

  • All of the counters will appear in the graph by default.  Depending upon the amount of data you’re capturing, this could give you data overload.  Just something to be aware of.

Until next time.

# Sunday, 19 September 2010
Sunday, 19 September 2010 13:30:02 (Central Daylight Time, UTC-05:00) ( Performance | RegEx )

Ricky_Bobby_I_wanna_go_fastMany times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help.  Often, this falls into the application optimization area.  For example, a few years ago, I had a customer that was developing a rather sophisticated SharePoint workflow that had some custom code that would process and merge two Excel spreadsheets together.  They were using Excel 2007 so their merging was being done using the excellent Open XML SDK.  To their credit, the application did what it needed to do – but it took about an hour to process these spreadsheets.  The developers on the project knew about the performance problems but as so often happens, they thought they knew where the bottlenecks were and how they should approach optimizing it.  So, they started injecting some tracing into their code and worked hard to optimize this lengthy process.  After a while, though, they had only shaved a few seconds off of that 60 minute time and while they did show some improvement – they knew they needed to get the processing done even faster for this to become a viable solution for their organization.  So, they sent me a simple repro of the code and together in just a few short days, we were able to get the processing from 60 minutes to under a minute.  That’s a BIG win.  Big like the Titanic big. 

I love these types of engagements because, well, I like to make things faster.  The biggest problem that I see, though, is that some people shy away from using the tools in optimization scenarios because they’ve been so invested in their code that they think they know why it’s not performing.  This is the “psychic” effect and the mindset is usually something like this: 

“I wrote this darn code and while I was writing it – I knew this method could be improved so now I’m going to finally optimize the darn thing.” 

It sounds good in-theory, right?  You wrote the code, so you should know how to improve it and really, what’s a tool going to tell you that you don’t already know?  In truth, it can tell you quite a bit.  In other situations, the developers will add some instrumentation (via tracing/debugging statements) to what they perceive as the critical code paths and the resultant timing points to one area when the problem really resides in a completely different section of code.  That’s right, folks, your tracing statements may be lying to you.  So what’s a developer to do?  Well, use the tools, of course – the right tool for the right job, as they say.  Let me expand upon that in the context of some code that yours truly wrote a little while ago.

The Problem

A few short weeks ago, I posted a entry about parsing your ASP.NET event log error messages.  For those that didn’t read it, we just convert the EventLog messages to XML, parse the messages using RegEx.Match and then generate some statistics on them.  The code I provided on that blog entry appeared to be very fast when it was processing a few records – but as the XML files grew, I started noticing that it seemed to take longer and longer to process the results.  So, I did what any developer does initially.  We’ll add some instrumentation into our code to see if we can figure out the problem.  This usually takes the form of something like the following:

Console.WriteLine("Start:  Load XDocument." + DateTime.Now.ToString());
XDocument document = XDocument.Load(@"C:\Work\TestData\AllEVTX\MyData.xml");
Console.WriteLine("End: Load XDocument." + DateTime.Now.ToString());

Console.WriteLine("Start: Load Messages into object." + DateTime.Now.ToString());
var messages = from message in document.Descendants("Message")
select EventLogMessage.Load(message.Value);
Console.WriteLine("End: Load Messages into object." + DateTime.Now.ToString());

Console.WriteLine("Start: Query Objects." + DateTime.Now.ToString());
var results = from log in messages
group log by log.Exceptiontype into l
orderby l.Count() descending, l.Key
select new
{
ExceptionType = l.Key,
ExceptionCount = l.Count()
};
Console.WriteLine("End: Query Objects." + DateTime.Now.ToString());

Console.WriteLine("Start: Output Exception Type Details." + DateTime.Now.ToString());
foreach (var result in results)
{
Console.WriteLine("{0} : {1} time(s)",
result.ExceptionType,
result.ExceptionCount);
}
Console.WriteLine("End: Output Exception Type Details." + DateTime.Now.ToString());

As you can see, all I’ve done is taken the code and slapped some Console.WriteLine statements with a DateTime.Now call to find out when the operation starts and when it completes.  If I run this code I get the following timings:

Start:  Load XDocument.9/19/2010 11:28:08 AM
End:  Load XDocument.9/19/2010 11:28:09 AM
Start:  Load Messages into object.9/19/2010 11:28:09 AM
End: Load Messages into object.9/19/2010 11:28:09 AM
Start:  Query Objects.9/19/2010 11:28:09 AM
End:  Query Objects.9/19/2010 11:28:09 AM
Start:  Output Exception Type Details.9/19/2010 11:28:09 AM
End:  Output Exception Type Details.9/19/2010 11:28:49 AM

This is clearly a problem  Now, the question is why?

The Research

This might lead you to believe that the simple loop I have to get the ExceptionType and the ExceptionCount was the root of all evil, so to speak.  The problem, though, is that there’s not really a lot you can do to improve this:

foreach (var result in results)
{
Console.WriteLine("{0} : {1} time(s)",
result.ExceptionType,
result.ExceptionCount);
}

Oh sure, you could use the excellent Parallel functionality in .NET 4 to send off the work to alternate threads.  So, you might do something like this:

Parallel.ForEach(results, result=> {

Console.WriteLine("{0} : {1} times(s)",
result.ExceptionType,
result.ExceptionCount);

});

But if you re-run the test, you get the following results:

Start:  Load XDocument.9/19/2010 11:46:20 AM
End:  Load XDocument.9/19/2010 11:46:20 AM
Start:  Load Messages into object.9/19/2010 11:46:20 AM
End: Load Messages into object.9/19/2010 11:46:20 AM
Start:  Query Objects.9/19/2010 11:46:20 AM
End:  Query Objects.9/19/2010 11:46:20 AM
Start:  Output Exception Type Details.9/19/2010 11:46:20 AM
End:  Output Exception Type Details.9/19/2010 11:46:54 AM

Wait a second…is it taking longer now???  That’s just not right.  At this point, you should take a step back and tell yourself “STOP GUESSING AND JUST USE THE PROFILER.”  So, always being one to listen to myself, I kick off the Visual Studio profiler and immediately it shows us the “hot path”.  This was actually a nice improvement in the VS2010 profiler.  Right on the summary page, it will show us our most expensive call paths:

image

But wait, it’s pointing to the RegEx.Match(…).  Why is it pointing to that?  From my own metrics above, I see that the loading of the Message strings into an object takes less than a second to execute.  Well, the real reason is that LINQ uses kind of a lazy loading algorithm.  Basically, it won’t necessarily process your queries until you try to do something with the data.  Yes, that’s an over-simplification but in our case, it means that my EventLogMessage.Load(…) method won’t be called until I actually try to do something with the data.  So, now, armed with this information, I can take a look at my Load() method and see what it’s actually doing and how it’s using the RegEx.Match(…) functionality:

Match myMatch = s_regex.Match(rawMessageText);

EventLogMessage message = new EventLogMessage();

message.Eventcode = myMatch.Groups["Eventcode"].Value;
message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
message.Eventtime = myMatch.Groups["Eventtime"].Value;
message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
...

return message;

So, basically we’re using the Match(…) to take that Message property and parse it out into the properties of the EventLogMessage object.  The fact that this is the slowest part of the code shouldn’t necessarily shock you.  Jeff Atwood wrote up a good blog entry a few years ago on something like this.  If we look just at the RegEx.Match(…) in the profiler, we see that the problem isn’t necessarily with each Match call but the overall cost with 10,000+ calls:

Function Name Number of Calls Min Elapsed Inclusive Time Avg Elapsed Inclusive Time Max Elapsed Inclusive Time
System.Text.RegularExpressions.Regex.Match(string)
10,881 1.01 3.27 684.29

The Fix

So, now that we know the problem and the reason for it – what’s a Dev to do?  Well, at this point, we should be thinking of an alternate way of performing this parsing without using RegEx.  The simplest method is just to use some general string parsing, so we’ll start off with replacing our large RegEx pattern with a simple string array:

public static readonly string[] ParserString = new string[] {
@"Event code:" ,
@"Event message:" ,
@"Event time:" ,
@"Event time (UTC):" ,
@"Event ID:" ,
...
@"Stack trace:" ,
@"Custom event details:"};

This array will be used in our fancy new GetValue method:

private static string GetValue(string rawMessageText, int Key)
{
int startLoc = rawMessageText.IndexOf(ParserString[Key]);
int endLoc;
if (Key + 1 == ParserString.Length)
endLoc = rawMessageText.Length;
else
endLoc = rawMessageText.IndexOf(ParserString[Key + 1], startLoc);

return rawMessageText.Substring(startLoc + ParserString[Key].Length, endLoc - startLoc - ParserString[Key].Length);
}

This method accepts our raw message string and a key index.  This method just finds the existence of a string like “Event message:” in our raw message and then finds the index of the next string, like “Event time:” and subtracts the two to get at the value of the field.  For example, given the following string:

“… Event message: An unhandled exception has occurred. Event time: …”

The red text are the keys and the highlighted blue text is the string between them.  The idea for the above GetValue(…) method was actually provided by a fellow PFE engineer, Richard Lang during a late night chat session.

The last step to this process is just to call the GetValue(…) method from our new Load method:

public static EventLogMessage Load(string rawMessageText)
{
EventLogMessage message = new EventLogMessage();

int Key = 0;
message.Eventcode = GetValue(rawMessageText, Key++);
message.Eventmessage = GetValue(rawMessageText, Key++); ;
message.Eventtime = GetValue(rawMessageText, Key++); ;
...
return message;

}

So, now we’ve essentially removed the need for RegEx by implementing our own string parsing algorithm.  Once we compile the code and run our application again, we see some major improvements:

Start:  Load XDocument.9/19/2010 12:55:31 PM
End:  Load XDocument.9/19/2010 12:55:31 PM
Start:  Load Messages into object.9/19/2010 12:55:31 PM
End: Load Messages into object.9/19/2010 12:55:31 PM
Start:  Query Objects.9/19/2010 12:55:31 PM
End:  Query Objects.9/19/2010 12:55:31 PM
Start:  Output Exception Type Details.9/19/2010 12:55:31 PM
End:  Output Exception Type Details.9/19/2010 12:55:33 PM

We essentially improved the processing of our records from 40 seconds to under 2 seconds.  I’d say that’s a pretty big improvement.  Even better, we can use our Parallel.ForEach(…) code from above to make this even faster since we’re no longer bound by the RegEx parser:

Start:  Load XDocument.9/19/2010 1:00:13 PM
End:  Load XDocument.9/19/2010 1:00:13 PM
Start:  Load Messages into object.9/19/2010 1:00:13 PM
End: Load Messages into object.9/19/2010 1:00:13 PM
Start:  Query Objects.9/19/2010 1:00:13 PM
End:  Query Objects.9/19/2010 1:00:13 PM
Start:  Output Exception Type Details.9/19/2010 1:00:13 PM
End:  Output Exception Type Details.9/19/2010 1:00:14 PM

So now it takes just about a second to process these records.  Considering it’s processing over 10,000 event log messages, I’d say this is acceptable performance, for now.

Closing Comments

I just want to say a few things real quick.  RegEx is not inherently evil.  It is still one of the fastest and easiest methods to consume and parse data.  You should not feel the need to go back to your own applications that are working just fine and refactor all of your code to strip out the RegEx expressions.  It just so happens that sometimes too much of a good thing can be bad for your health.  In our case, with over 10,000 RegEx.Match(…) calls in rapid succession, the RegEx appeared to be our bottleneck.  This may or may not be the root cause of your own performance problems.  The key takeaway from this blog entry should be that you should NOT guess when it comes to optimizing code paths.  Instead, you should use the tools available at your disposal to find the bottleneck. 

Until next time.

# Saturday, 11 September 2010
Saturday, 11 September 2010 20:43:24 (Central Daylight Time, UTC-05:00) ( Life | Productivity )

icon_Outlook_webAs with most people, I hate to do the same thing more than once.  In-fact, I had a job in college where I audited expense reports and then had to enter them into an application so they could be paid.  The auditing of the expense report was actually fun because I got to see some interesting things people would attempt to pass off as “business expenses” – like the guy that tried to expense an entire addition to his house claiming that it was for his new office.  That said, I absolutely HATED the data entry part.  It was repetitive and boring and I just couldn’t understand why the person submitting expense reports couldn’t do the data entry into some application (they were already using Excel) and then electronically submit the data into our system for review and payment.  That led me to design and build a simple application (in my off-time) to do exactly that.  It was fairly sophisticated (for a college student doing something in his down time).  The project was a tremendous success and it increased efficiency on my team to the Nth degree because it cut out steps, reduced the volume of duplicate data entry and everything was online.  It was just a better workflow.  I had optimized a fairly simple business process and I was seeing the tremendous dividends that such a project could have in the business world.  In-fact, it was so successful and my team became so efficient that I eventually got “let go” because there just wasn’t enough work to go around anymore and the other folks on the team were permanent employees and I was a lowly hourly worker. 

So, while that situation didn’t turn out so well for me, I became addicted to optimization – be it in software or my own personal life.  And I’ve carried that lesson forward into the years since.  For example, a few years back, I wrote a Quick Link Plugin for Windows Live Writer.  It was a very simple plugin that would open up a form in Windows Live Writer to do a web search and then quickly insert the link into your blog post.  This was another way that I was optimizing my (digital) life in that I hated having to leave the WLW environment to look something up and then copying/pasting the link back into the WLW window.  It’s painful, repetitive and just plain wasteful. 

Scott Hanselman blogged a bit ago about whether recipients of your e-mails deserve the gift of your keystrokes.  I found it an interesting read and I highly recommend it.  As an alternative to reaching only one person, he recommended taking it an extra step and blogging about it or posting it to a wiki and then sending the link to the individual.  It’s a great idea but in my role, I’m often asked similar questions over and over and the customer doesn’t want to wade through a list of links to get the answer they need.  Or, taking it one step further – once you blog something once, how do you retain that link somewhere so you don’t have to go hunting around for the URL when you need it later on down the line.   

My Solution (for now)

The solution that I’ve used for years is to put these resources and snippets in my Outlook Signature list.  If you’re not aware of Outlook signatures, well, I can’t really help you.  Kidding!  Signatures have been in Outlook for years and have been used primarily to append contact information and what-not into your e-mail.  It always struck me as kind of odd that Outlook allows you to add so many signatures when really, how many ways to say “I’m so and so and here is my information” could you possibly have?  So, I’ve decided to repurpose that list for my own uses.  I will essentially add a new signature for each important blurb/link/etc. to my list.  Then, when I am deeply into writing an e-mail and I get a question I’ve answered before, like:

“How can I turn on ValidateIntegratedModeConfiguration for all of the sites on my server and not allow my applications to override it?”

I can simply go to my signature list and select that item and it will magically appear in my e-mail:

image

Less clicks and key presses for me, the person gets their response quicker – it’s a win-win all around.  I’ve even prefixed my most frequently used signatures at the top with an “_” in the name since I don’t want to go hunting for the stuff I use frequently.  You may have also noticed that I have one called “_Expense Report”.  This is just a simple e-mail that I send to my boss when I have some receipts I need to send her so I can get my expense reports approved.  Yes, it’s likely a micro-optimization but if it shaves about 2 minutes off each of those e-mails I send:

2 minute savings * 4 expense reports/month * 12 months/year = 96 minutes savings/year

This makes it totally worth it for me.  I can’t wait until we put something like the Visual Studio “Snippet” functionality in Outlook or someone writes an Evernote add-in for Outlook to allow me to search and insert text from my notes.  That would just make my life complete.

Until next time…

# Friday, 03 September 2010
Friday, 03 September 2010 01:41:11 (Central Daylight Time, UTC-05:00) ( .NET 4 | WCF )

Often, I’ll have a customer that wants to utilize a new feature in the .NET framework but they don’t necessarily know where to start or they have a large investment in another area and they’re afraid of the amount of rework required to get something working.  This week, for example, I was teaching a customer about WPF and they asked:

“We have several class libraries and we’d like to expose them as WCF services.  How can we do it?”

Yes, it was a bit off-topic from WPF but not really.  I’ve found that when a customer chooses to start investing in a newer technology, like WPF, I’ve found that they also want to look at ways to revamp their existing applications to use other new features.  But I digress…

In pre-.NET 4.0, if you wanted to expose a class as a WCF endpoint and make it available in IIS as an SVC file, you would have to do the following:

  1. Create a new WCF Service (in the same category as WebSite).
  2. Add  a bunch of new “WCF Service” classes (files with an extension of “.SVC”).
  3. Write some code to call out to your existing class library from the WCF Service.

It’s not exactly a painful experience but would require a bit of work to get everything up and running.  In many cases, if you have a huge investment in something like a class library, the amount of work may be incredibly daunting.  Fortunately, using some of the features in WCF 4 (part of .NET 4), this type of story is ridiculously easy.  The steps that we’ll follow are the following:

  1. Decorate your existing class.
  2. Add configuration.
  3. Deploy to IIS

Then, I’ll discuss additional options for debugging your new service.

Step 1:  Decorate your existing class

So, I’m sure we all have a class in some Class Library somewhere that looks like this:

public class MyBusinessLogic
{

public string GetName(int id)
{

return "My ID is: " + id;

}

public MyDataClass GetEmpInformation(int id)
{

return new MyDataClass()
{
FirstName = "Greg",
LastName = "Varveris",
Occupation = "Developer"
};


}
}

This is just a simple little business logic class that exposes two public methods – GetName and GetEmpInformation.  So, let’s say we wanted to make this class accessible as a WCF service.  The first thing we need to do is add a reference to the .NET 4 version of System.ServiceModel in our Class Library project. 

image

I’ve included a screenshot to our left just to further show that there is no special sauce here.  All of the assemblies being referenced in this project are straight up class library defaults.  Now that we have the assembly referenced, let’s decorate our class with two attributes:

  • ServiceContract – This decorates our class and tells .NET that we have a WCF Service.
  • OperationContract – This decorates our methods and tells .NET which methods/functions we want to expose in our service.

The decorated class will now look like:

[ServiceContract]
public class MyBusinessLogic
{

[OperationContract]
public string GetName(int id)
{

return "My ID is: " + id;

}

[OperationContract]
public MyDataClass GetEmpInformation(int id)
{

return new MyDataClass()
{
FirstName = "Greg",
LastName = "Varveris",
Occupation = "Developer"
};


}
}

And really that’s all you have to do from the code side.

Step 2:  Add Configuration

WCF configuration has always been synonymous with “WOAH” before.  Even using the great configuration editor tool, you may end up with a large amount of XML in your .config file.  This is mostly because WCF is just so darn extensible.  Well, for our scenario, we’ll use some great new features in WCF 4 to make the configuration a breeze:

  • File-less Activation
  • Default Configuration model

You can read-up about each of these features in the nice MSDN documentation. Basically, WCF 4 exposes the ability to map a file-name to a class and then it can expose a default endpoint (using BasicHttpBinding).  For our scenario my entire config file is:

<configuration>
<system.serviceModel>
<serviceHostingEnvironment>
<serviceActivations>
<add relativeAddress="MyBusinessLogic.svc" service="WCFSample.Library.MyBusinessLogic"/>
</serviceActivations>
</serviceHostingEnvironment>
<behaviors>
<serviceBehaviors>
<behavior>
<serviceMetadata httpGetEnabled="true"/>
</behavior>
</serviceBehaviors>
</behaviors>
</system.serviceModel>
</configuration>

That’s it.  In the first snippet, we are adding a relative address of “MyBusinessLogic.svc” which will point to our MyBusinessLogic class we defined above.  The second section along the bottom is just to enable the service metadata. 

The only minor manual snafu you’ll need to handle is that Visual Studio won’t let you add a web.config to a Class Library project.  This can be easily remedied in a variety of ways.  I just added an app.config to my project and then changed the name from “app.config” to “web.config”.  Then, you just need to tell it to Copy to the output directory.  This makes our project look like the following:

image

Step 3:  Deploy to IIS

Now that we have our minuscule configuration and trivial decorations to our class, we need to host our service somewhere.  For this, we’ll just use IIS with an ASP.NET 4.0 application pool.  The structure of this folder will need to have the assemblies in a “bin” folder and the web.config in the root.

image

The bin folder is required so that the IIS engine can find your assembly.  Once you configure this in IIS, you can simply navigate to your service (make sure you include the SVC in your url) in a browser and you’ll see the glorious WCF Service start page. For example, on my machine, when I navigate to: http://localhost/SampleWCFService/MyBusinessLogic.svc, I’ll see:

image

Debugging your service

So you have created your service, added configuration to it and deployed it to your hosting environment and everything is running smoothly.  What if you want to debug the service in Visual Studio?  Well, you really have 3 different options:

  1. Attach to the instance of w3wp hosting your WCF service.
  2. Cassini…errr, the ASP.NET Development Server.
  3. IIS Express

I personally prefer either options 2 or 3 as it provides that seamless debugging experience we all crave where we just hit the “Green Arrow” to start the debugging session.  With either option you select, we’ll need to make one or two minor configuration changes.

First, we’ll need to tell the solution to start our process.  For this, you can right-click on your solution file in the Solution explorer and select “Properties”.  In this window, you’ll need to select the Multiple Startup Projects radio button and set both of your projects (the Client and the Service) to the “Start” option.

image

Second, you’ll need to add a pre/post-build step for your service to structure the output folder correctly.  Since we will deleting the contents of the “Bin” folder wherever our assemblies are being output, the command we will use is:

if exist "$(TargetDir)bin" del /q "$(TargetDir)bin\*.*"

Then, for our post-build step, we will need to create the bin folder if it doesn’t exist and then copy over our DLL’s into that bin folder.  Those commands will be:

if not exist "$(TargetDir)bin" mkdir "$(TargetDir)bin"
copy "$(TargetDir)\*.dll" "$(TargetDir)\bin"
copy "$(TargetDir)\*.pdb" "$(TargetDir)\bin"
del "$(TargetDir)\*.dll"
del "$(TargetDir)\*.pdb"

That will take our output directory from zero to hero quite nicely:

image

Now that the prerequisites are met, we just need to point our local hosting tool (either the Development Server or IIS Express) at that folder.  We’ll do this in the Debug tab of the project properties.   Set the “Start External Program” radio button and set the path to:

C:\Program Files (x86)\Common Files\microsoft shared\DevServer\10.0\WebDev.WebServer40.EXE

Then, in the Command Line Arguments, specify the following:

/port:12345
/path:"<path to folder containing bin & web.config>"
/vpath:/myservice

Then, you can set a Service Reference on whatever vpath you chose above.  For example, my service reference points to:  http://localhost:12345/myservice/MyBusinessLogic.svc

Once the service reference is set, you can just hit the fancy green arrow to start debugging:

image

Enjoy!
# Sunday, 29 August 2010
Sunday, 29 August 2010 17:10:07 (Central Daylight Time, UTC-05:00) ( ASP.NET | Best Practice | Development )

image Sometimes a customer will ask me to look at their site and make some recommendations on what can be improved.  One of the many things I’ll look at is their event logs.  One of the nice things about ASP.NET is that when you encounter an unhandled exception, an event will be placed into your Application event log.  The message of the event log entry will usually include lots of good stuff like the application, path, machine name, exception type, stack trace, etc.  Loads of great stuff and all for free.  For customers that don’t have a centralized exception logging strategy, this can be a gold mine.

The way it usually works is that they will provide me an EVTX from their servers.  If you’re not aware, an EVTX is just an archive of the events from the event log you specify.  By itself, looking at the raw event logs from a server can be quite daunting.  There are usually thousands of entries in the event log and filtering down to what you actually care about can be exhausting.  Even if you do find a series of ASP.NET event log messages, the problem has always been – how do you take all of this great information that’s just dumped into the Message property of the event log entry and put it into a format you can easily report on, generate statistics, etc.  Fortunately, I have a non-painful solution. 

I’ve broken this down into a relatively simple 4-step process:

  • Get the EVTX
  • Generate a useful XML file
  • Parse into an object model
  • Analyze and report on the data

Let’s get to it.

Step 1:  Get the EVTX

This step is pretty short and sweet.  In the Event Log manager, select the “Application” group and then select the “Save All Events As…” option. 

image

That will produce an EVTX file with whatever name you specify.  Once you have the file, transfer it to your machine as you generally do not want to install too many tools in your production environment.

Step 2:  Generate a useful XML file

Now that we have the raw EVTX file, we can get just the data we care about using a great tool called LogParserJeff Atwood did a nice little write-up on the tool but simply put it’s the Swiss Army knife of parsing tools.  It can do just about anything data related you would wish using a nice pseudo-SQL language.  We’ll use the tool to pull out just the data from the event log we want and dump it into an XML file.  The query that we can use for this task is daunting in its simplicity:

SELECT Message INTO MyData.xml
FROM ‘*.evtx’
WHERE EventID=1309

The only other thing we need to tell LogParser is the format in which it the data is coming in and the format to put it into.  This makes our single command the following:

C:\>logparser -i:EVT -o:XML
"SELECT Message INTO MyData.xml FROM ‘*.evtx’ WHERE EventID=1309"

This will produce a nice XML file that looks something like the following:

<?xml version="1.0" encoding="ISO-10646-UCS-2" standalone="yes" ?>
<ROOT DATE_CREATED="2010-08-29 06:04:20" CREATED_BY="Microsoft Log Parser V2.2">
<ROW>
<Message>Event code: 3005 Event message: An unhandled exception has occurred...
</Message>
</ROW>
...
</ROOT>

One thing that you may notice is that all of the nicely formatted data from our original event log message is munged together into one unending string.  This will actually work in our favor but more on that in the next step.

Step 3:  Parse into an object model

So, now that we have an XML file with all of our event details, let’s do some parsing.  Since all of our data is in one string, the simplest method is to apply a RegEx expression with grouping to grab the data we care about. 

In a future post, I’ll talk about a much faster way of getting this type of data without a RegEx expression.  After all, refactoring is a way of life for developers.

private const string LargeRegexString = @"Event code:(?<Eventcode>.+)" +
@"Event message:(?<Eventmessage>.+)" +
@"Event time:(?<Eventtime>.+)" +
@"Event time \(UTC\):(?<EventtimeUTC>.+)" +
@"Event ID:(?<EventID>.+)" +
@"Event sequence:(?<Eventsequence>.+)" +
@"Event occurrence:(?<Eventoccurrence>.+)" +
@"Event detail code:(?<Eventdetailcode>.+)" +
@"Application information:(?<Applicationinformation>.+)" +
@"Application domain:(?<Applicationdomain>.+)" +
@"Trust level:(?<Trustlevel>.+)" +
@"Full Application Virtual Path:(?<FullApplicationVirtualPath>.+)" +
@"Application Path:(?<ApplicationPath>.+)" +
@"Machine name:(?<Machinename>.+)" +
@"Process information:(?<Processinformation>.+)" +
@"Process ID:(?<ProcessID>.+)" +
@"Process name:(?<Processname>.+)" +
@"Account name:(?<Accountname>.+)" +
@"Exception information:(?<Exceptioninformation>.+)" +
@"Exception type:(?<Exceptiontype>.+)" +
@"Exception message:(?<Exceptionmessage>.+)" +
@"Request information:(?<Requestinformation>.+)" +
@"Request URL:(?<RequestURL>.+)" +
@"Request path:(?<Requestpath>.+)" +
@"User host address:(?<Userhostaddress>.+)" +
@"User:(?<User>.+)" +
@"Is authenticated:(?<Isauthenticated>.+)" +
@"Authentication Type:(?<AuthenticationType>.+)" +
@"Thread account name:(?<Threadaccountname>.+)" +
@"Thread information:(?<Threadinformation>.+)" +
@"Thread ID:(?<ThreadID>.+)" +
@"Thread account name:(?<Threadaccountname>.+)" +
@"Is impersonating:(?<Isimpersonating>.+)" +
@"Stack trace:(?<Stacktrace>.+)" +
@"Custom event details:(?<Customeventdetails>.+)";

Now that we have our RegEx, we’ll just write the code to match it against a string and populate our class.   While I’ve included the entire regex above, I’ve only included a partial implementation of the class population below.

public class EventLogMessage
{

private static Regex s_regex = new Regex(LargeRegexString, RegexOptions.Compiled);

public static EventLogMessage Load(string rawMessageText)
{

Match myMatch = s_regex.Match(rawMessageText);
EventLogMessage message = new EventLogMessage();
message.Eventcode = myMatch.Groups["Eventcode"].Value;
message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
message.Eventtime = myMatch.Groups["Eventtime"].Value;
message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
message.EventID = myMatch.Groups["EventID"].Value;
message.Eventsequence = myMatch.Groups["Eventsequence"].Value;
message.Eventoccurrence = myMatch.Groups["Eventoccurrence"].Value;
...
return message;
}

public string Eventcode { get; set; }
public string Eventmessage { get; set; }
public string Eventtime { get; set; }
public string EventtimeUTC { get; set; }
public string EventID { get; set; }
public string Eventsequence { get; set; }
public string Eventoccurrence { get; set; }
...
}

The last step is just to read in the XML file and instantiate these objects.

XDocument document = XDocument.Load(@"<path to data>\MyData.xml");

var messages = from message in document.Descendants("Message")
select EventLogMessage.Load(message.Value);

Now that we have our objects and everything is parsed just right, we can finally get some statistics and make sense of the data.

Step 4:  Analyze and report on the data

This last step is really the whole point of this exercise.  Fortunately, now that all of the data is an easily query’able format using our old friend LINQ, the actual aggregates and statistics are trivial.  Really, though, everyone’s needs are going to be different but I’ll provide a few queries that might be useful.

Query 1:  Exception Type Summary

For example, let’s say you wanted to output a breakdown of the various Exception Types in your log file.  The query you would use for that would be something like:

var results = from log in messages
group log by log.Exceptiontype into l
orderby l.Count() descending, l.Key
select new
{
ExceptionType = l.Key,
ExceptionCount = l.Count()
};

foreach (var result in results)
{

Console.WriteLine("{0} : {1} time(s)",
result.ExceptionType,
result.ExceptionCount);

}

This would then output something like:

WebException : 15 time(s)
InvalidOperationException : 7 time(s)
NotImplementedException : 2 time(s)
InvalidCastException : 1 time(s)
MissingMethodException : 1 time(s)

Query 2:  Exception Type and Request URL Summary

Let’s say that you wanted to go deeper and get the breakdown of which URL’s generated the most exceptions.  You can just expand that second foreach loop in the above snippet to do the following:

foreach (var result in results)
{

Console.WriteLine("{0} : {1} time(s)",
result.ExceptionType,
result.ExceptionCount);

var requestUrls = from urls in messages
where urls.Exceptiontype == result.ExceptionType
group urls by urls.RequestURL.ToLower() into url
orderby url.Count() descending, url.Key
select new
{
RequestUrl = url.Key,
Count = url.Count()
};

foreach (var url in requestUrls){

Console.WriteLine("\t{0} : {1} times ",
url.RequestUrl,
url.Count);
}
}

This then would produce output like this:

WebException  : 15 time(s)
http://localhost/menusample/default.aspx : 11 times
http://localhost:63188/menusample/default.aspx : 4 times
InvalidOperationException : 7 time(s)
http://localhost:63188/menusample/default.aspx : 6 times
http://localhost/menusample/default.aspx : 1 times
NotImplementedException : 2 time(s)
http://localhost/samplewebsiteerror/default.aspx : 2 times
InvalidCastException : 1 time(s)
http://localhost:63188/menusample/default.aspx : 1 times
MissingMethodException : 1 time(s)
http://localhost:63188/menusample/default.aspx : 1 times

Query 3:  Exception Type, Request URL and Method Name Summary

You can even go deeper, if you so desire, to find out which of your methods threw the most exceptions.  For this to work, we need to make a slight change to our EventLogMessage class to parse the Stack Trace data into a class.  First, we’ll start with our simple little StackTraceFrame class:

public class StackTraceFrame
{
public string Method { get; set; }

}

Second, add a new property to our EventLogMessage class to hold a List<StackTraceFrame>:

public List<StackTraceFrame> StackTraceFrames { get; set; }

Lastly, add a method (and its caller) to parse out the stack frames and assign the resulting List to the StackTraceFrames property mentioned above:

public EventLogMessage(string rawMessageText)
{
Match myMatch = s_regex.Match(rawMessageText);
...
Stacktrace = myMatch.Groups["Stacktrace"].Value;
...
StackTraceFrames = ParseStackTrace(Stacktrace);
}

private List<StackTraceFrame> ParseStackTrace(string stackTrace)
{
List<StackTraceFrame> frames = new List<StackTraceFrame>();
string[] stackTraceSplit = stackTrace.Split(new string[] { " at " },
StringSplitOptions.RemoveEmptyEntries);
foreach (string st in stackTraceSplit)
{
if (!string.IsNullOrEmpty(st))
{
frames.Add(new StackTraceFrame() { Method = st });
}
}
return frames;
}

Please Note:  You could enhance the ParseStackTrace(…) method to parse out the source files, line numbers, etc. I’ll leave this as an exercise for you, dear reader.

Now that we have the infrastructure in place, the query is just as simple.  We’ll just nest this additional query inside of our URL query like so:

foreach (var url in requestUrls){

Console.WriteLine("\t{0} : {1} times ",
url.RequestUrl,
url.Count);

var methods = from method in messages
where string.Equals(method.RequestURL,
url.RequestUrl,
StringComparison.InvariantCultureIgnoreCase)
&&
string.Equals(method.Exceptiontype,
result.ExceptionType,
StringComparison.InvariantCultureIgnoreCase)
group method by method.StackTraceFrames[0].Method into mt
orderby mt.Count() descending, mt.Key
select new
{
MethodName = mt.Key,
Count = mt.Count()
};

foreach (var method in methods)
{
Console.WriteLine("\t\t{0} : {1} times ",
method.MethodName,
method.Count
);
}
}

This would then produce output like the following:

WebException  : 15 time(s)
http://localhost/menusample/default.aspx : 11 times
System.Net.HttpWebRequest.GetResponse() : 11 times
http://localhost:63188/menusample/default.aspx : 4 times
System.Net.HttpWebRequest.GetResponse() : 4 times
InvalidOperationException : 7 time(s)
http://localhost:63188/menusample/default.aspx : 6 times
System.Web.UI.WebControls.Menu... : 6 times
http://localhost/menusample/default.aspx : 1 times
System.Web.UI.WebControls.Menu... : 1 times

One last thing you may notice is that the in the example above, the first frame for each of those exceptions are somewhere in the bowels of the .NET BCL.  You may want to filter this out even further to only return YOUR method.  This can be accomplished very easily with the method below.  It will simply loop through the StackTraceFrame List and return the first method it encounters that does not start with “System.” or “Microsoft.”.

private static string GetMyMethod(List<StackTraceFrame> frames)
{

foreach (StackTraceFrame frame in frames)
{

if (!frame.Method.StartsWith("System.") &&
!frame.Method.StartsWith("Microsoft."))
return frame.Method;


}

return "No User Code detected.";
}

Then, you can just call that method from the new query we wrote above:

var methods = from method in messages
where ...
group method by
GetMyMethod(method.StackTraceFrames) into mt
...

Finally, with this new snippet in place, we’ll get output like this:

WebException  : 15 time(s)
http://localhost/menusample/default.aspx : 11 times
_Default.Page_Load(Object sender, EventArgs e)...: 8 times
No User Code detected. : 3 times

http://localhost:63188/menusample/default.aspx : 4 times
_Default.Page_Load(Object sender, EventArgs e)... : 1 times
No User Code detected. : 1 times
WebControls.CustomXmlHierarchicalDataSourceView.Select()... : 2 times

As you can see, the sky’s the limit.

Enjoy!

# Sunday, 22 August 2010
Sunday, 22 August 2010 11:14:37 (Central Daylight Time, UTC-05:00) ( PowerShell | Scripting )

powershell_icon[1]_2 PowerShell is one of those things that falls into my “other duties as assigned” repertoire.  It’s something that I’ve used for years to get things done but it’s not often I encounter a Dev at a customer that has worked with it much.  In my honest opinion, I think adoption would increase if the PowerShell studio had intellisense or if it was just another project type in Visual Studio.  That’s purely my opinion and is not a reflection on my employer in any way. 

In any case, when I get pinged to help someone on a PowerShell task, I’ll usually jump on it.  A customer at a large company came to me with a problem.  He wanted to restore about 500 databases from their production environment to a development/staging environment.  He found this script that should theoretically get the job done but he was running into a few problems.  The problems he was encountering were mostly related to different drive mappings and different logical database names.  Of course, we had to use a trick to get to that point.

The Problem

He originally was receiving the following error:

“Exception calling "SqlRestore" with "1" argument(s): "Restore failed for Server '<Server2>'. "

At :line:97 char:20

+       $restore.SqlRestore <<<< ($instance)

The line the error above references is the line where the Restore action is called to be executed and the character is the 'e' on $restore.SQLRestore'”

While it’s not a very informative message, we can get more information by running the command:

$error[0]|format-list –force

This provides a lot more information on what the cause of the error is.   For example, when I force an error on the SqlRestore method, I might get the above error message by default but once I execute the above command, I’ll see:

“Exception             : System.Management.Automation.MethodInvocationException: Exception calling "SqlRestore" with "1" argument(s): "Restore failed for Server 'GREGVAR1\SQLEXPRESS2'. " ---> Microsoft.SqlServer.Management.Smo.FailedOperationException: Restore failed for Server 'GREGVAR1\SQLEXPRESS2'.  ---> Microsoft. SqlServer.Management.Common.ConnectionFailureException: Failed to connect to server GREGVAR1\SQLEXPRESS2. ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)…”

So, after he ran the magic command, the errors he was actually experiencing were the following:

  • The script expects the database’s logical name to be the file name of the database.
  • Some of the databases had a full text catalog associated with it.  The script was not handling the relocation of the full text catalog either.

Many thanks to Michiel Wories for initially introducing me to that really cool command a year or so ago.

The Research

So, once we knew the problem, we needed a method to get additional information from the backup file.  Enter the Restore.ReadFileList(…) method.  You can use this method like the following:

 $server = New-Object("Microsoft.SqlServer.Management.Smo.Server") 
$instance
$backupDevice = New-Object("Microsoft.SqlServer.Management.Smo.BackupDeviceItem")
($restorefile, "File") $smoRestore = new-object("Microsoft.SqlServer.Management.Smo.Restore") #restore settings $smoRestore.NoRecovery = $false; $smoRestore.ReplaceDatabase = $true; $smoRestore.Action = "Database" $smoRestore.PercentCompleteNotification = 10; $smoRestore.Devices.Add($backupDevice) # Get Database Logical File Names $sourceLogicalNameDT = $smoRestore.ReadFileList($server)

The ReadFileList gives us a DataTable which contains tons of additional information about the contents of the file we are attempting to restore.  Once we  have that object, we clearly saw that it contains all of the information we could possibly need:

LogicalName          : foodb
PhysicalName         : T:\…\DB\foo.mdf
Type                 : D
FileGroupName        : PRIMARY
Size                 : 104857600
MaxSize              : 35184372080640
FileId               : 1
CreateLSN            : 0
DropLSN              : 0
UniqueId             : 4992c6f2-2282-4391-851e-d1177ab03920
ReadOnlyLSN          : 0
ReadWriteLSN         : 0
BackupSizeInBytes    : 30081024
SourceBlockSize      : 512
FileGroupId          : 1
LogGroupGUID         :
DifferentialBaseLSN  : 1297000000068200037
DifferentialBaseGUID : 69e8b951-7db1-4a1b-b0fc-e2cb012b3bcf
IsReadOnly           : False
IsPresent            : True

LogicalName          : foodb_log
PhysicalName         : R:\…\Logs\foo_log.LDF
Type                 : L

LogicalName          : sysft_OtherTables
PhysicalName         : T:\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\OtherTables004e
Type                 : F

Now that we have all of this great information, we need to iterate through the rows and put the values we care about into some local variables.  For that, we can use the foreach construct:

 $FileType = ""  
foreach($Row in $sourceLogicalNameDT) { # Put the file type into a local variable. # This will be the variable that we use to find out which file # we are working with. $FileType = $Row["Type"].ToUpper() # If Type = "D", then we are handling the Database File name. If ($FileType.Equals("D")) { $sourceDBLogicalName = $Row["LogicalName"] } # If Type = "L", then we are handling the Log File name. elseif ($FileType.Equals("L")) { $sourceLogLogicalName = $Row["LogicalName"] } # If Type = "F", then we are handling hte Full Text Catalog File Name. elseif ($FileType.Equals("F")) { $sourceFTSLogicalName = $Row["LogicalName"] # We may also want to grab the full path of the Full Text catalog. $sourceFTSPhysicalName = $Row["PhysicalName"] } }

The Solution

Now, that we have all of the data we need, we can populate the Relocate File objects so that the SqlRestore object will know how to handle these additional files:

 #specify new data and log files (mdf and ldf)            
 $smoRestoreDBFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
 $smoRestoreLogFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
              
 #the logical file names should be the logical filename stored in the backup media                     
 $smoRestoreDBFile.LogicalFileName = $sourceDBLogicalName            
 $smoRestoreDBFile.PhysicalFileName = $mdfFilePath + "\" + $sourceDBLogicalName + ".mdf"            
 $smoRestoreLogFile.LogicalFileName = $sourceLogLogicalName            
 $smoRestoreLogFile.PhysicalFileName = $ldfFilePath + "\" + $sourceLogLogicalName + ".ldf"            
             
 $smoRestore.RelocateFiles.Add($smoRestoreDBFile)            
 $smoRestore.RelocateFiles.Add($smoRestoreLogFile)            
            
 # Check to see if the $SourceFTSLogicalName is empty or not. If its not empty            
 # then we do have a full text catalog present and thus we add the appropriate            
 # entries to restore those files. If the variable is empty we just continue with            
 # the restore.            
 if (![String]::IsNullOrEmpty($sourceFTSLogicalName)) {            
                
     "We DO have a Full Text Catalog in our Backup"            
                    
     # Adding full text catalog restore parameters.            
     $smoRestoreFTSFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
     $smoRestoreFTSFile.LogicalFileName = $sourceFTSLogicalName            
                    
     # Here we specify the new location by truncating the first 45 characters in the path            
     # that is specified on the backup file. This could be done a better way.            
     $smoRestoreFTSFile.PhysicalFileName = $ftsFilePath + "\" + 
$sourceFTSPhysicalName.Substring(45) $smoRestore.RelocateFiles.Add($smoRestoreFTSFile) }

Lastly, we just need to execute the Restore command and clear out the variables for the next iteration of the loop.

 # Restore Database            
 $smoRestore.SqlRestore($server)            
                            
 # We now clear the variables before the next loop starts                        
 Remove-Variable sourceDBLogicalName            
 Remove-Variable sourceLogLogicalName            
 Remove-Variable smoRestoreDBFile            
 Remove-Variable smoRestoreLogFile            
            
 # If a full text catalog was present, we clear those variables too                           
 if (![String]::IsNullOrEmpty($sourceFTSLogicalName)) {                       
     Remove-Variable sourceFTSLogicalName            
     Remove-Variable smoRestoreFTSFile            
 }

And that’s all there is to it.  Keep in mind that these are the guts of the script and not the full contents.  If there’s interest, I can post the whole script.  Just drop a comment.

Enjoy!

# Friday, 20 August 2010
Friday, 20 August 2010 23:36:27 (Central Daylight Time, UTC-05:00) ( .NET | Best Practice | C# | Code Reviews | Development | Performance )

cheetah This is a common topic and I thought I’d write up some thoughts I have on it.  In-fact, I was just working with a customer on improving their code reviews and what they should be checking for and the question arose - “Should performance be targeted during a code review?”  It’s an interesting question.  I’m a big fan of performance testing early and often and not waiting until the end of a dev cycle but code reviews, IMO, should focus on logic, maintainability and best practices.  I may be in the minority and if you look around the web, you’ll see varying opinions on the topic.  For example, one of the PAG articles states:

“Code reviews should be a regular part of your development process. Performance and scalability code reviews focus on identifying coding techniques and design choices that could lead to performance and scalability issues. The review goal is to identify potential performance and scalability issues before the code is deployed. The cost and effort of fixing performance and scalability flaws at development time is far less than fixing them later in the product deployment cycle.

Avoid performance code reviews too early in the coding phase because this can restrict your design options. Also, bear in mind that that performance decisions often involve tradeoffs. For example, it is easy to reduce maintainability and flexibility while striving to optimize code.”

As I mentioned above, I am a huge proponent of performance analysis and optimization many times throughout a typical product development cycle.  I can say with a fair amount of certainty that if you don’t build performance reviews into your project plan at regular intervals, you will hit some problem (or multiple problems) in production and have to refactor some code. 

Circling back to the original question, though, are code reviews the place for performance analysis?  Typically, I’d recommend using them to squash little bits of bad code but maintainability and code-cleanliness should be first and foremost in your minds.  That said, if you see a pattern that you know can be improved, by all means bring it up.  What’s an example of that type of situation? 

Let’s take a look at predicates, specifically their usage in the Find method of a List<T>.  If you’re not aware, the Find() method performs a linear search through all of the items until it finds the first match – then it returns.  This makes it a O(n) operation where “n” is the number of items in the list.  Basically, this means that the more items you have in the list, the longer a Find() operation can potentially take.  So, if we slam about 10,000 elements into a list:

private static List<Data> LoadList()
{
List<Data> myList = new List<Data>();
for (int i = 0; i < 10000; i++)
{
myList.Add(new Data() { Id = "Id" + i.ToString(),
Value = "Value" + i.ToString() });
}

return myList;
}

Then, if someone wants to return the instance of the Data class that contains an Id of say “Id10000”, they might write the following code:

static Data Find1(List<Data> myList, string idToFind)
{
Data data = myList.Find(s =>
s.Id.ToLower() ==
idToFind.ToLower());

return data;
}

Now, keep in mind that the predicate is executed for each element in the List<T> until it finds the instance you care about.  With that in mind, we would probably want to refactor out the “idToFind.ToLower()” above the predicate since that value isn’t changing.  So, you might end-up with something like this:

static Data Find2(List<Data> myList, string idToFind)
{

idToFind = idToFind.ToLower();

Data data = myList.Find(s =>
s.Id.ToLower() ==
idToFind);

return data;
}

Another route you may want to go is just to use the string.Equals(…) method to perform the comparison.  That would look like:

static Data Find3(List<Data> myList, string idToFind)
{

Data data = myList.Find(s =>
string.Equals(
s.Id,
idToFind,
StringComparison.
InvariantCultureIgnoreCase)
);

return data;

}

Fact is, the last method IS the fastest way to perform the operation.  I can say that without even needing to run it through a profiler.  But if you don’t believe me…  

Function Name
Elapsed
Inclusive Time
...Find1(System.Collections.Generic.List`1<....Data>,string)
6.34
...Find2(System.Collections.Generic.List`1<....Data>,string)
4.47
...Find3(System.Collections.Generic.List`1<....Data>,string)
3.65

That’s something I might put into the category of a micro-optimization AND just good coding practice.  But is this something that should be caught during a code review?  I’d say “yes” because logically it all makes sense and none of the solutions would really hurt maintainability or readability of the code.
 
So, I’d tag this as a good coding practice.  Other thoughts on the topic?
 
Enjoy!