by Mike
Modern loan management systems are more costly and complicated than ever before. I suppose this makes sense given the consolidation in the financial services industry, which has left more (and more complicated) assets managed by fewer people. It also makes sense that a high-definition APM approach is especially valuable to banks that depend on these systems to perform.
An OPNET expert was recently called in to assist with a critical performance problem in a web-based loan origination application. It had just been purchased, customized, and released at our client, a large North American bank. Within a few weeks, users noticed increasingly frequent periods of poor performance that progressed quickly to daily periods of complete unresponsiveness, sometimes necessitating multiple restarts throughout the day. This was a major business issue.
This application was multi-tiered and used several external integration points. The client had too few people with sufficient cross-functional technical knowledge and application visibility to troubleshoot effectively. The client initially engaged the application vendor to assist, but application logs and basic resource monitoring tools led only to loosely constructed hypotheses. No clear root cause, no fix.
Within hours of the start of our engagement, using AppResponse Xpert, we were able to validate the observed periods of delay and isolate that delay to the application’s middleware tier (Apache Tomcat, Java-based). We viewed the class-level method calls within the JVM using AppInternals Xpert and noticed very high heap consumption and garbage collection activity within the JVM during periods of slow end user experience. Sustained CPU spikes coincided with high JVM memory usage.
We set up a policy to capture transaction traces within the JVM whenever users were waiting more than 15 seconds for a response from the server. This allowed us to see which methods were called as a result of each individual page request while the problem was occurring. One particular method (“nextRecord”) was called up to 50,000 times for one particular JSP, immediately after a SQL query was executed. The application vendor confirmed our suspicion that this method was processing data returned from the DB.
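To make the idea of that policy concrete, here is a minimal sketch of threshold-based trace filtering. It is not how AppInternals Xpert works internally; the `RequestTrace` class, the function names, and the page names used in the example are all hypothetical. Only the 15-second threshold and the “nextRecord” method name come from the engagement described above.

```python
from collections import Counter
from dataclasses import dataclass, field

# Threshold from the policy described above: keep traces slower than 15 s.
SLOW_THRESHOLD_SECONDS = 15.0


@dataclass
class RequestTrace:
    """Hypothetical record of one page request and the methods it invoked."""
    page: str
    elapsed_seconds: float
    method_calls: Counter = field(default_factory=Counter)


def slow_traces(traces, threshold=SLOW_THRESHOLD_SECONDS):
    """Return only the traces whose response time exceeded the threshold."""
    return [t for t in traces if t.elapsed_seconds > threshold]


def hottest_method(trace):
    """Return (method_name, call_count) for the most-called method, or None."""
    top = trace.method_calls.most_common(1)
    return top[0] if top else None


# Example with made-up page names: one fast request, one slow request.
traces = [
    RequestTrace("summary.jsp", 0.8, Counter({"render": 1})),
    RequestTrace("search.jsp", 22.0, Counter({"nextRecord": 50000, "render": 1})),
]
for trace in slow_traces(traces):
    print(trace.page, hottest_method(trace))
```

Filtering on response time first keeps the volume of captured traces small, and ranking methods by call count within each slow trace is what makes an outlier like a method called tens of thousands of times stand out.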
We analyzed this same transaction in AppTransaction Xpert, which can group packets not only by tier-pair and TCP connection but also by individual request-response pairs for synchronous TCP protocols such as SQLNet. The visualization showed us immediately that multiple queries were returning result sets of 5MB each, consistent with the processing seen in the Java traces from AppInternals Xpert. Tomcat was very busy processing data returned from the DB during the period of slowness. We captured the full query text and forwarded it to the developers, who agreed that the query was improperly constructed.
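The request-response grouping can be illustrated with a short sketch. This is an assumption-laden simplification, not AppTransaction Xpert's algorithm: it assumes one synchronous connection where each request is fully answered before the next one is sent, and it represents packets as hypothetical `(direction, payload_bytes)` tuples.

```python
def pair_requests(packets):
    """Group packets from one synchronous connection into request/response pairs.

    packets: iterable of (direction, payload_bytes), direction is "req" or "resp".
    Returns a list of dicts with total request and response bytes per pair.
    """
    pairs = []
    for direction, nbytes in packets:
        if direction == "req":
            # A request seen after response data marks the start of a new pair;
            # consecutive request packets belong to the same (multi-packet) request.
            if not pairs or pairs[-1]["resp_bytes"] > 0:
                pairs.append({"req_bytes": 0, "resp_bytes": 0})
            pairs[-1]["req_bytes"] += nbytes
        elif pairs:
            # Response data is attributed to the most recent request.
            pairs[-1]["resp_bytes"] += nbytes
    return pairs


def large_responses(pairs, threshold_bytes):
    """Return the pairs whose response payload meets or exceeds the threshold."""
    return [p for p in pairs if p["resp_bytes"] >= threshold_bytes]


# Example: a two-packet request with a large reply, then a small exchange.
packets = [("req", 300), ("req", 200), ("resp", 1460), ("resp", 1000),
           ("req", 100), ("resp", 50)]
pairs = pair_requests(packets)
print(large_responses(pairs, threshold_bytes=2000))
```

Summing payload per pair rather than per connection is what lets an oversized result set be tied back to the specific query that produced it.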
The vendor was able to deliver a code change with the associated fixes the following day. Though load testing was conducted as part of the release process for this application, the test environment did not include a realistically populated database. As a result, the testing missed the impact that a fully populated DB had on certain common user workflows. We encouraged the client to consider a high-definition APM approach to testing before the next release.
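One simple guard against this kind of gap is to compare table row counts between production and the test environment before a load test runs. The sketch below is only an illustration of that check: the function name, the table names, and the 50% minimum ratio are assumptions, and the row counts would have to come from whatever inventory the team already has.

```python
def underpopulated_tables(prod_counts, test_counts, min_ratio=0.5):
    """Flag tables whose test row count is below min_ratio of production.

    prod_counts, test_counts: dicts mapping table name to row count.
    Returns a dict of table name to the observed test/production ratio.
    """
    flagged = {}
    for table, prod_rows in prod_counts.items():
        if prod_rows == 0:
            continue  # nothing to compare against for empty production tables
        ratio = test_counts.get(table, 0) / prod_rows
        if ratio < min_ratio:
            flagged[table] = ratio
    return flagged


# Example with made-up table names and counts.
prod = {"loans": 1000, "borrowers": 400, "audit": 0}
test = {"loans": 50, "borrowers": 400}
print(underpopulated_tables(prod, test))
```

A check like this would not have found the badly constructed query by itself, but it would have signaled that the load test results could not be trusted for data-heavy workflows.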