BKdotNET - Bill Knaus's Dev Blog

Better solutions through smarter code.

Performance Monitoring ASP.NET apps - the art of the insane

So Microsoft professes that all these fabulous counters in .NET that are available to the Performance Monitor tool provide a good level of insight... well bah!  bah! I say.

Consider a multi-tier environment with multiple approaches to remoting (.NET Remoting and Web Services).  Now - consider multiple front-ends, multiple - nigh - dozens of services - and one or two databases.

Try to get monitors on that beastie.  Takes a while just to set them all up... Then I try using the performance monitor ActiveX control to try and go through the data... but therein lies the problem.  Different counters, when recorded and analyzed later, will have zeros for some values, which causes any line graph to look crappy and be somewhat useless.  The thorns in my side are the .NET Framework monitors mostly.

Then I go through the effort to capture all this - and then the .NET SqlClient counters don't even pick up any activity.  What the deuce!?

So - trying to see if I can find some way to make this all more meaningful (honestly, I just want a nice pretty graph) I dump the binary data to a comma-delimited text file - pull it into Excel - and this is where I discovered the zeros - and multiple records for the same second.
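For what it's worth, here is a rough sketch of how that kind of export could be tidied before graphing it. It's Python just to keep it short, and it is not anything Performance Monitor gives you. The assumptions are mine: the first column is the timestamp, every other column is one counter, and a zero means "no sample" rather than a real reading. A counter that can legitimately sit at zero would need different handling.

```python
import csv
import io
from collections import OrderedDict

def clean_counter_csv(text):
    """Collapse rows that share a timestamp (to the second) and carry the
    last non-zero value forward so a line graph does not dive to zero."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    by_second = OrderedDict()
    for row in reader:
        second = row[0].split(".")[0]          # drop fractional seconds
        values = [float(v) if v.strip() else 0.0 for v in row[1:]]
        merged = by_second.setdefault(second, [0.0] * len(values))
        for i, v in enumerate(values):
            if v != 0.0:
                merged[i] = v                  # last non-zero sample in that second wins
    cleaned = []
    last = [None] * (len(header) - 1)
    for second, values in by_second.items():
        filled = []
        for i, v in enumerate(values):
            if v != 0.0:
                last[i] = v
            filled.append(last[i])             # stays None until the counter first reports
        cleaned.append([second] + filled)
    return header, cleaned
```

The function takes the whole CSV as a string and hands back the header plus one row per second, which is at least something Excel can draw a line through.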

Feeling cheated - I try writing the data directly to SQL Server - wow - this was a mistake.  I was logging a LOT of counters across 4 identical machines, plus two other machines with 1/3 as many counters (9000 individual counters in all, since I'm monitoring several web services in separate app pools across multiple servers, and in some cases multiple threads).  Within 15 minutes I had created a database that was over 300MB in size - although I did get to see how effectively SQL Server 2005 garbage collects (or ineffectively in some cases).

The sheer amount of data is not queryable.  Really - it isn't.  So that was a major waste of my time today.  I went back to the binary files and pulling them into the ActiveX control within MMC.

Finding a way to trend the data is really where the "art" comes in.  Being able to select which data you want to look at - then see a trend, drop another counter across all the servers - now it's unreadable again.

PLEASE MICROSOFT!  THERE MUST BE A BETTER WAY!

Then there's the art of figuring out what's going on - if these servers are getting requests on the front end - and these web servers on the web service layer are getting this many requests - which servers are really hitting which?  It's pure speculation.

So now I'm looking at how to make this all easier to interpret.  How to easily pull all this data together.  Right now I want to ditch the MS counters and build in our own so we are getting the data we care about attached to instances which make sense.

The .NET Framework makes all of this possible - it's a matter of the strategy by which you go about it all.  At a minimum it means touching every web service method in the service layer... and on the front-end... I'm thinking key UI methods and page event handlers. 
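To give a feel for what "touching every method" could look like, here is a minimal sketch. It's in Python rather than .NET purely for brevity, and every name in it (METHOD_STATS, instrumented, get_customer) is made up for illustration. The idea is a wrapper that counts calls and accumulates time per method, so the numbers are attached to names we actually care about.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process counter store: method name -> [call count, total seconds].
# Not thread-safe as written; a real service would need a lock or per-thread storage.
METHOD_STATS = defaultdict(lambda: [0, 0.0])

def instrumented(func):
    """Count calls to func and accumulate the time spent inside it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = METHOD_STATS[func.__name__]
            stats[0] += 1                              # one more call, even if it raised
            stats[1] += time.perf_counter() - start    # elapsed seconds for this call
    return wrapper

@instrumented
def get_customer(customer_id):
    """Stand-in for a web service method."""
    return {"id": customer_id}
```

The wrapper updates the stats in a finally block, so calls that throw still get counted and timed.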

The biggest key is going to be merging the logging strategy with the counter strategy in such a way that you can see the counters within the logs.  Example: we log whenever we add an object to the Cache API - well - at the same time we should also be querying the Cache API Objects counter (or our own counter so we can actually count how many of what type of object) and write out how many are there.
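A rough sketch of the cache example, again in Python for brevity and again with made-up names (CountedCache is not the ASP.NET Cache API, just a dict standing in for it). It keeps its own per-type count and writes that count into the same log line as the add, so the counter and the log can never drift apart.

```python
import logging
from collections import Counter

log = logging.getLogger("cache")

class CountedCache:
    """Dict-backed cache (hypothetical) that keeps a per-type object count
    and writes that count into every log line it emits on add."""

    def __init__(self):
        self._items = {}
        self.counts = Counter()

    def add(self, key, obj):
        kind = type(obj).__name__
        if key in self._items:
            # Overwriting an entry: the old object's type loses one.
            self.counts[type(self._items[key]).__name__] -= 1
        self._items[key] = obj
        self.counts[kind] += 1
        log.info("cache add key=%s type=%s count[%s]=%d total=%d",
                 key, kind, kind, self.counts[kind], len(self._items))
```

Reading the log afterwards, each "cache add" line already carries how many objects of that type were cached at that moment, which is the number I'd otherwise be hunting for in a separate counter file.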

But it has to be a real strategy - something from the ground up - not something cooked up as we go along... otherwise it won't be as effective.

In the end - it's a "what do we want to see".  Me - I'd like to see a visual representation of the environment and watch where the traffic is going.  Isolate a particular request and watch it travel from the front to the back... which web services are being called at which points - and how long each one takes - otherwise it's still speculation.
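One way that could work, as a sketch only: tag each request with an ID at the front end, pass it along on every call, and have each tier record a timed entry against it. The Python below is just to show the shape. The names (TRACE, span, path_of) are invented, and a single in-memory list stands in for what would really be separate logs on separate servers.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical shared sink; in practice each tier would write to its own log.
TRACE = []

def new_request_id():
    """Create an ID at the front end that travels with the request."""
    return uuid.uuid4().hex

@contextmanager
def span(request_id, tier, operation):
    """Record how long `operation` took on `tier` for one request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TRACE.append({
            "request_id": request_id,
            "tier": tier,
            "operation": operation,
            "seconds": time.perf_counter() - start,
        })

def path_of(request_id):
    """Return the (tier, operation) hops for one request, in completion order."""
    return [(t["tier"], t["operation"]) for t in TRACE
            if t["request_id"] == request_id]
```

Wrapping a page handler in span(rid, "front-end", ...) and the service call inside it in span(rid, "service", ...) would leave two entries sharing the same ID, inner call first since it finishes first. Filter on that ID and you have one request's trip through the tiers with a duration per hop, instead of guessing from aggregate request counts.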