By Peco
There are three web system operations phenomena that highlight why APM is critical to the business and not just a weapon for fighting off zombies (runaway processes, stalled threads, or infinitely recursive callstacks).
1) Silence is eerie. Web visitors generally don't complain when there is a problem. In my experience in web operations, fewer than 1% of people open trouble tickets when there is an issue. You can't wait for issues to be reported before taking action; you will have lost a lot of business by then. APM based on real user monitoring (such as OPNET AppResponse Xpert) or application instrumentation (such as OPNET AppInternals Xpert) is extremely valuable for uncovering application health issues even when all seems quiet. With AppInternals Xpert you can identify the entry classes of your application and start collecting usage, performance and health metrics. It starts computing a baseline for them right away, and as soon as there is abnormal behavior you can get notified or have the system take a healing action. If you have configured transaction tracing for a highly personalized application with key known customers, you could even trace failed transactions to a user and proactively resolve issues.
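To make the "baseline plus alert" idea concrete, here is a minimal Python sketch. It is not how AppInternals Xpert computes its baselines (the post doesn't say); it simply assumes a baseline of mean plus k standard deviations over recent samples. The function name `is_abnormal` and the sample numbers are made up for illustration.

```python
from statistics import mean, stdev

def is_abnormal(history, sample, k=3.0):
    """Return True when `sample` sits more than k standard deviations
    above the mean of `history` (an assumed, simplistic baseline)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    baseline = mean(history)
    spread = stdev(history)
    return sample > baseline + k * spread

# Hypothetical response times (seconds) observed while all seemed quiet.
response_times = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0]

if is_abnormal(response_times, 6.5):
    print("abnormal response time: notify or trigger a healing action")
```

The point is only that the alert fires from the data itself, without waiting for anyone to open a ticket.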
2) Features creep up on web apps, and so do response times. Unfortunately, users often abandon their application sessions when performance becomes annoying. I did a study on one ecommerce application and found a strong correlation between page abandonment and performance. For every 2s over the normal 2s response time we were losing about 1% of views. So at a 10s response time (8s over normal) the loss was ~4%, and so on. If this is a revenue-generating application, that has a direct top- and bottom-line impact. APM to the rescue again! With AppInternals Xpert and AppResponse Xpert you can set thresholds on the 2s mark and passively observe all the user sessions. With AppResponse Xpert you can take it a step further and monitor and alert on the page weight (in terms of bytes and object count) to get an early read on feature creep right after an application release. If your designers are known to be naughty, also keep an eye on graphic image sizes to ensure they don't bloat over time (images usually account for most of a page's weight).
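The arithmetic above can be written down in a few lines of Python. This is a rough sketch that assumes the loss keeps growing linearly (about 1% of views per 2s over a 2s normal), which is an extrapolation of one study on one application. The second function is a hypothetical release check on page weight; its 10% tolerance and all names are my own assumptions, not AppResponse Xpert settings.

```python
def estimated_view_loss(response_time_s, normal_s=2.0, step_s=2.0, loss_per_step=0.01):
    """Fraction of views lost, assuming ~1% per `step_s` seconds over normal."""
    excess = max(0.0, response_time_s - normal_s)
    return (excess / step_s) * loss_per_step

def page_weight_grew(before, after, tolerance=0.10):
    """Compare (bytes, object_count) before and after a release.
    Returns True if either metric grew by more than `tolerance`."""
    return any(new > old * (1 + tolerance) for old, new in zip(before, after))

# A 10s response time is 8s over normal, i.e. four 2s steps.
print(f"{estimated_view_loss(10.0):.0%} of views lost")

# Hypothetical page weights: (total bytes, object count).
print(page_weight_grew((800_000, 40), (950_000, 41)))
```

Multiply the loss fraction by your own traffic and conversion numbers to get a feel for the revenue at stake.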
3) There are spiders everywhere. A myriad of bots crawl the web for different purposes. Some, like Google Search, are legitimate and may be useful to you, while others not so much. Once they stumble across your app, they explore every corner of its existence. What that means is a) increased load on your application that was probably not planned for and b) occasional exceptions in the app that no one thought possible. Both a) and b) are hard to test for, so it is likely that most apps are at risk. AppResponse Xpert to the rescue! You can monitor for known bots based on user agent, and you can also check for bot-like behavior in other user sessions. Sometimes bots masquerade as well-known browser user agents, and APM analytics can help identify them. If you find aggressive bots that are not adding business value, you can adjust your robots.txt policy or just contact their operators directly and ask them to stop visiting your site. If that fails there is always the trusted guardian, the corporate firewall, to block them.
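As a rough illustration of the two checks described above (declared user agent first, then bot-like behavior), here is a small Python sketch. The marker list, the 120 requests-per-minute threshold and the function name are assumptions for illustration only; they are not AppResponse Xpert features, and a real deployment would tune them against its own traffic.

```python
# Illustrative substrings that self-declared crawlers commonly include
# in their user agent; this list is an assumption, not exhaustive.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "crawler", "spider")

def classify_session(user_agent, request_times, max_requests_per_minute=120):
    """Classify a session as 'declared-bot', 'suspected-bot' or 'human'.

    `request_times` holds request timestamps in seconds for one session.
    """
    ua = user_agent.lower()
    if any(marker in ua for marker in KNOWN_BOT_MARKERS):
        return "declared-bot"
    if len(request_times) >= 2:
        duration = max(request_times) - min(request_times)
        # Treat very short bursts as lasting at least one second.
        rate = len(request_times) / max(duration, 1.0) * 60
        if rate > max_requests_per_minute:
            return "suspected-bot"  # browser-like agent, bot-like pace
    return "human"

browser_ua = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1"
burst = [i * 0.1 for i in range(300)]  # 300 requests in about 30 seconds
print(classify_session(browser_ua, burst))
```

Request rate is only one possible signal; the idea is that a session claiming to be a browser but behaving like a crawler deserves a closer look.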
I suppose there is also a fourth phenomenon… Sometimes it is we web operations folks who cause the bumps in the night, be it overly aggressive active monitoring or last-minute system changes applied during the Sunday morning maintenance cycle. Using lightweight instrumentation technologies like AppInternals Xpert or completely passive monitoring technologies like AppResponse Xpert helps us reduce this problem. We know exactly when the network latency quadrupled on the core router or when the load balancer started dropping sessions. APM watches the watchmen! :)
Thursday, October 27, 2011
2 comments:
Great post! Happy Halloween....
Excellent article!