In my prior
blog post, I wrote about different network problems that negatively impact
application performance. In this post, I’ll follow up with non-network problems
that impact application performance, but for which the network provides a
unique vantage point from where such problems can be identified and solved. In
the next post, I’ll tie everything together by describing how to determine if
the network is at fault and how to get the other organizations to understand
more about application performance.
Slow Client
Many modern web-based applications push much of the
user interaction work to the client workstation. Sometimes it is done in a way
that pushes a lot of data to the workstation where some JavaScript code processes
the data. I’ve seen applications that had long, multi-second pauses because the
JavaScript process had to handle hundreds or thousands of rows of data before
the client display could be updated.
A good Application Performance Management (APM) system
identifies clients that have these types of delays. It requires looking at the
client-to-server transactions and identifying when the client is paused due to
internal processing. The analysis needs to differentiate between the client
workstation application pauses and the “think time” of the human who is
interacting with the application.
Slow Server
The server teams don’t like to hear it, but the most common
causes of slow application performance are the applications or the servers
themselves. I’ve found that it frequently is not the network that is the cause,
even though the network often gets the blame.
Modern applications are typically deployed on a multi-tiered
infrastructure. There often is a front-end web server that talks with an
application server. The application server in turn talks with a middleware
server that queries one or more database servers for the data it needs. These
servers may all talk with DNS servers to look up IP addresses or to map IP
addresses back to server names. All it takes is for any one of these servers to
have performance problems and the whole application runs slowly. Of course, the
problem is then one of identifying the slow server out of the set of servers
that implement an application.
Understanding the interactions between multiple components
in an application is an essential part of understanding the root cause of
performance problems. This process, called Application Dependency Mapping, is typically
part of an integrated APM approach, and ideally leverages information from already
in-place monitoring solutions to draw a dependency map between system
components. The network provides a unique vantage point to derive these
relationships, and as such the network team can provide strong value to the
application and server teams.
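As a rough illustration of the idea, a dependency map can be derived from nothing more than observed conversations. The sketch below assumes flow records have already been reduced to (source, destination, port) tuples; the host names, ports, and function names are hypothetical and do not reflect how any particular APM product works.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, destination host, destination port).
flows = [
    ("client-1", "web-1", 443),
    ("web-1", "app-1", 8080),
    ("app-1", "mw-1", 9000),
    ("mw-1", "db-1", 1521),
    ("mw-1", "db-2", 1521),
    ("app-1", "dns-1", 53),
    ("web-1", "app-1", 8080),  # repeated conversations collapse into one edge
]

def build_dependency_map(flows):
    """Map each host to the set of (server, port) pairs it connects to."""
    deps = defaultdict(set)
    for src, dst, port in flows:
        deps[src].add((dst, port))
    return deps

def downstream(deps, host, seen=None):
    """Return every server reachable from host, directly or indirectly."""
    seen = set() if seen is None else seen
    for dst, _port in deps.get(host, ()):
        if dst not in seen:
            seen.add(dst)
            downstream(deps, dst, seen)
    return seen

deps = build_dependency_map(flows)
print(sorted(downstream(deps, "web-1")))
```

Asking for everything downstream of the front-end web server gives the list of servers to check when that application is reported as slow.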
Although we can collect a lot of very rich information from
the network, using packet capture tools to answer the question of “Is it the
network or the application?” could take many, many hours of work. All the
while, the application is running slow, affecting the productivity of anyone
using that application.
I’ve used AppResponse Xpert to significantly reduce
the time needed to identify why an application was slow. Once you have set up the
proper monitoring points and some basic configurations, it is very easy to use and provides immediate value for “the network
is slow” fire drills. The information gathered by AppResponse Xpert also
provides input to AppMapper Xpert, to automatically draw dependency maps of critical
applications.
Identifying Database Scaling Problems
A common cause of application slowness is that the
application was developed with a small data set on a fast LAN development
environment. Then the application is rolled out to production. It may initially
run with acceptable performance. But over time, as the database grows, it becomes
slower and slower. A quick analysis with AppResponse Xpert shows that one of
the key middleware servers is making a lot of requests to a database server.
One client request can result in many database requests or perhaps result in
the transfer of a significant volume of data. Changing the database query to be
more efficient typically solves the problem.
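To make the fan-out concrete, here is a minimal sketch that counts database requests and bytes per client transaction. It assumes the monitoring data has already been correlated so that each record carries a transaction ID, which is a simplification; the record layout, tier labels, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical records: (client transaction id, tier pair, bytes transferred).
records = [
    ("txn-1", "client->web", 1200),
    ("txn-1", "mw->db", 40000),
    ("txn-1", "mw->db", 52000),
    ("txn-1", "mw->db", 38000),
    ("txn-2", "client->web", 900),
    ("txn-2", "mw->db", 1500),
]

def db_fanout(records, db_tier="mw->db"):
    """Return {transaction: (database request count, database bytes)}."""
    counts = Counter()
    volume = Counter()
    for txn, tier, nbytes in records:
        if tier == db_tier:
            counts[txn] += 1
            volume[txn] += nbytes
    return {txn: (counts[txn], volume[txn]) for txn in counts}

for txn, (requests, nbytes) in db_fanout(records).items():
    print(f"{txn}: {requests} database requests, {nbytes} bytes")
```

A transaction whose request count or byte volume keeps climbing as the database grows is a good candidate for a query rewrite.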
I’ve also seen cases where a database server takes many
seconds to return data to the middleware or application server. The application
team can use AppResponse Xpert’s Database Monitoring module to identify the
offending query. Sometimes a good development team can look at the user
transaction and quickly determine which queries are likely to be the culprit.
Other times, the application makes so many database queries that a
SQL query analysis tool is what is really needed. In the cases I’ve seen, the queries
were poorly structured, sometimes joining large tables that resulted in
extremely long query times on production data sets. Simply rewriting the
queries dropped the query times by several orders of magnitude. This is where
these tools pay off. The advantage of using deep packet inspection on the network to
identify problems with SQL queries is that there is no overhead added to the
database. This is another example of how
the network team can provide value to other IT teams.
Chatty Conversation
Another typical example of problems within the application
is the chatty conversation. One application server, or perhaps the client
itself, will make many, many small requests to execute one transaction on
behalf of the person running the application. It runs fine as long as the
network latency between the client and server is low. However, with the advent
of virtualization, the server team may have configured automatic migration of
the server image to a lightly loaded host. This might move a server image to a
location that puts it several milliseconds further away from other servers or
from its disk storage system. A few milliseconds may not be much unless the
application does hundreds or perhaps thousands of small requests to complete
one transaction. Suddenly, the
application goes from an acceptable level of performance to unacceptable
performance. Of course, database size also affects the performance because the
number of small requests goes up with the database size.
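The arithmetic behind this is simple enough to sketch. The model below assumes the requests are issued strictly one after another, so each one pays a full round trip; the function name and the numbers are illustrative assumptions, not measurements.

```python
def transaction_time_ms(round_trips, rtt_ms, server_time_ms=0.0):
    """Estimate the time for a transaction made of sequential small requests."""
    return round_trips * rtt_ms + server_time_ms

# Assumed example: 2,000 sequential requests per transaction.
before = transaction_time_ms(2000, 0.5)  # servers 0.5 ms apart
after = transaction_time_ms(2000, 3.5)   # migration adds 3 ms of round-trip time

print(f"before migration: {before / 1000:.1f} s")
print(f"after migration:  {after / 1000:.1f} s")
```

With these assumed numbers, 3 ms of added latency turns a one-second transaction into a seven-second one, even though nothing in the application changed.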
You need visibility into the number of requests between
systems, where the systems are connected to the network, and the delays between
requests. Getting a baseline of system performance against which you can
measure future performance is extremely useful for identifying whether a given
application is performing as expected and possibly identifying which server
needs to be examined.
This kind of examination can be automated by AppTransaction
Xpert, which can capture baseline transactions from the packet store of
AppResponse Xpert and predict the change in their response times given different
network parameters such as latency, bandwidth, and loss rate.
Slow Network Services
Finally, the problem may be due to slow network services.
This isn’t the network itself, but services that most network-based
applications depend upon for proper operation. Consider an application that
makes queries to a DNS server, but the primary DNS server doesn’t exist, so the app
must time out the first request before attempting to query the second DNS
server. I’ve seen applications that would have a 30- to 60-second delay upon the
first execution, but would then run fine for a while. Periodically, the
application would be very slow, but run fine the rest of the time. Intermittent
problems are very challenging to diagnose, so this is where having something
like AppResponse Xpert watching and recording all the transactions is extremely
helpful. Just identify the time of the slow performance and look for something
in the data. In this case, it would be an unanswered DNS request, which was
successful when tried against the secondary DNS server.
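Once the transactions have been recorded, finding this kind of problem amounts to matching queries to responses. The sketch below assumes the capture has been reduced to simple event tuples; the record layout, addresses, and function name are hypothetical.

```python
# Hypothetical capture records: (timestamp in seconds, kind, query id, server, name).
events = [
    (0.0, "query", 101, "10.0.0.1", "db.example.com"),      # primary, never answered
    (30.0, "query", 102, "10.0.0.2", "db.example.com"),     # retry to the secondary
    (30.02, "response", 102, "10.0.0.2", "db.example.com"),
]

def unanswered_queries(events):
    """Return (timestamp, server, name) for each query with no matching response."""
    answered = {
        (qid, server)
        for _t, kind, qid, server, _name in events
        if kind == "response"
    }
    return [
        (t, server, name)
        for t, kind, qid, server, name in events
        if kind == "query" and (qid, server) not in answered
    ]

for t, server, name in unanswered_queries(events):
    print(f"t={t}s: no answer from {server} for {name}")
```

In this made-up trace, the 30-second gap between the unanswered query and the retry is exactly the delay the user experiences.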
Summary
Accurately diagnosing application performance can be impossible
or very time consuming with the wrong tools. With the right tools and a good
installation, where the tools capture the necessary data, the analysis and
diagnosis can proceed very quickly. In addition, these tools not only help to defend
or troubleshoot the network, but also provide value to other IT teams in the
organization. I know of one site that went from being unable to help diagnose
slow applications to providing deep visibility into what an
application is doing from the network perspective, giving real value to
the application teams working to solve the problem.