Tuesday, November 16, 2010

Troubleshooting Citrix Performance

by Matt Hill

“Citrix is slow!” This complaint sends shivers down the spines of even the most seasoned IT support engineers. Intuitively, we know that this means it is going to be a long, painful night. The evening will be spent bickering between teams and the end result will be multiple competing theories to explain the root cause of the problem. This is usually the point where managers are left wondering, “Why is troubleshooting Citrix performance so difficult?”

One reason Citrix performance troubleshooting is challenging is that the feedback comes from users who often do not understand the technology. Users are accustomed to highly responsive desktop applications. Applications deployed over Citrix may sometimes seem to enter a strange black hole where mouse clicks and keystrokes take a few moments to register. Why does one application act normally while another acts strangely? It must be the network. When a user reports slowness, it is important to probe and discover the true cause of the frustration, not what the user thinks the problem is. Is the mouse stuttering when they try to click a button? Has a dialog box hung or become unresponsive? Does a search take a long time to complete? Each of these symptoms points to a different culprit.

Pinpointing the source of the slowness is usually where most of the time is spent, because we often go looking in the wrong places! Citrix can get the blame when the true source of the problem has nothing to do with Citrix. As troubleshooters, we must take a step back and look at the big picture first. To do that, it’s important to have the right tools available. Many times we will go straight for the microscope (i.e., a packet analyzer such as Wireshark) when what we really need to start with is binoculars. A number of products on the market, such as OPNET’s AppResponse Xpert, provide high-level analysis of performance across multiple tiers, so you can quickly determine whether you’re dealing with a network issue across the WAN or a back-end issue in the application. One of the nice things about this class of products is that the analysis is already done for you in real time. You do not have to perform hundreds of mundane spreadsheet calculations just to verify that the round-trip time (RTT) is normal. Additionally, high-granularity packet data is stored, so there is something to look at whenever you do decide to pull out the microscope.
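To illustrate the kind of spreadsheet calculation these tools spare you, here is a minimal Python sketch. It assumes you have already extracted TCP handshake timestamps from a capture taken near the client, and it approximates each connection's round-trip time as the gap between the client's SYN and the server's SYN-ACK. The `Handshake` record, the 40 ms baseline, and the 1.5x tolerance are hypothetical values chosen for illustration, not vendor guidance.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Handshake:
    """Timestamps (seconds) for one TCP handshake, captured near the client (assumed)."""
    conn_id: str
    syn_ts: float      # client sends SYN
    synack_ts: float   # client sees the server's SYN-ACK


def handshake_rtts(handshakes):
    """Approximate each connection's RTT (seconds) as SYN-ACK time minus SYN time."""
    return {h.conn_id: h.synack_ts - h.syn_ts for h in handshakes}


def rtt_is_normal(handshakes, baseline_ms, tolerance=1.5):
    """Return True if the median RTT stays within `tolerance` times the baseline."""
    rtts_ms = [rtt * 1000 for rtt in handshake_rtts(handshakes).values()]
    return median(rtts_ms) <= baseline_ms * tolerance


# Hypothetical sample: three connections from a branch office.
sample = [
    Handshake("a", 1.000, 1.040),
    Handshake("b", 2.000, 2.045),
    Handshake("c", 3.000, 3.038),
]
print(rtt_is_normal(sample, baseline_ms=40))
```

Using the median rather than the mean keeps one unusually slow handshake from skewing the verdict. Note that a capture taken at the server side would measure something different: there, the SYN to SYN-ACK gap reflects only server processing time.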

Once the problem is localized, the final step is getting to an actionable root cause determination. Typically this involves packet analysis, but it could also mean looking at Windows performance counters, database monitors, or other siloed stats. At this point, you’ve already saved yourself a ton of time because you’re focusing your attention where the problem is likely to be found. If you saw a lot of retransmissions or latency in the previous step, take a look at the front-end packets. Wireshark and spreadsheets are common here. I prefer using AppTransaction Xpert because of its unique Citrix ICA channel decoding capabilities; I have not found any other product that is able to do that. If you found high response time spikes on the back-end transactions, examine that traffic in more detail. Most of the time, “Citrix” performance problems are actually back-end issues, such as inefficient SQL queries issued by a poorly written thick client. Lastly, if there aren’t any high-level problems jumping out at you on the front end or the back end, investigate the application running on the Citrix server itself. The process may have crashed, hung, or otherwise become unresponsive. This type of problem can be tricky, since there aren’t many generically applicable tools that instrument thick clients well. Usually, you will need to rely on debug or log statistics published by the problem application. Process metrics collected via SNMP or other sources can be useful here as well.
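The order of checks described above can be summarized as a rough triage function. This is a sketch under stated assumptions, not a feature of any product: the `Observations` fields and every threshold are hypothetical placeholders that you would replace with baselines from your own environment.

```python
from dataclasses import dataclass, field


@dataclass
class Observations:
    """Hypothetical summary of what the high-level monitoring step revealed."""
    retransmit_rate: float   # fraction of front-end (client-to-Citrix) segments retransmitted
    frontend_rtt_ms: float   # median client-to-Citrix round-trip time
    backend_times_ms: list = field(default_factory=list)  # Citrix-server-to-back-end transaction times


def where_to_look(obs, rtt_baseline_ms=50.0, max_retransmit=0.02, backend_spike_ms=2000.0):
    """Suggest which tier to examine first; the thresholds are illustrative only."""
    # 1. Retransmissions or elevated latency: start with the front-end packets.
    if obs.retransmit_rate > max_retransmit or obs.frontend_rtt_ms > 2 * rtt_baseline_ms:
        return "front-end: analyze client-to-Citrix packets"
    # 2. Response time spikes on back-end transactions: drill into that traffic.
    if any(t > backend_spike_ms for t in obs.backend_times_ms):
        return "back-end: examine slow transactions (e.g. SQL queries)"
    # 3. Nothing obvious on either side: look at the application on the Citrix server.
    return "Citrix server: check the application process (hangs, crashes, logs)"


print(where_to_look(Observations(0.0, 40.0, [120.0, 3500.0, 90.0])))
```

The function only decides where to point the microscope; the root cause determination itself still comes from the packet, counter, or log analysis described above.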

The next time someone starts screaming about Citrix performance, take a deep breath. Relax. If you’re instrumented in the right places you’ll probably be home by dinner time and absent any cuts or bruises from an office brawl. With the advent of new technologies such as XenDesktop (Virtual Desktop Infrastructure), Citrix performance is something that we will be managing for years to come. Thankfully, with strong monitoring capabilities and a good analysis methodology, it doesn’t need to be painful.


2 comments:

Unknown said...

Thank you for a great explanation on Citrix performance in easy to understand terms. The step by step guidance of how/where to look for finding Citrix performance issues was most informative.

Question: In your experience, what percentage of problems are really Citrix and what percentage are back-end issues?

Thank you.

Matt Hill said...

Thanks for the kind words, Margarita. That's a good question. The short answer is that most real performance problems, from what I've seen, are not caused by Citrix.

Citrix, in fact, is a blessing for network performance engineers because it can be used as a band-aid for poorly designed applications. If a thick client doesn't work well over a high-latency link, Citrix can make that problem quickly go away. The end user may report occasional lag when typing into a text field over Citrix, but may not realize how bad things would be if that app were run locally. The true culprit, in this case, is a thick client application that was not engineered well.
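A back-of-the-envelope calculation shows why Citrix hides this kind of design flaw. The sketch below assumes a hypothetical thick client that issues its database queries one at a time, waiting for each reply before sending the next; the query count and round-trip times are illustrative, not measurements. Run locally at a branch office, every query crosses the WAN. Run on a Citrix server in the data center, the queries cross only the LAN, and the WAN carries just the screen update (simplistically modeled as a single round trip).

```python
def sequential_fetch_time_ms(round_trips, rtt_ms, server_time_ms=0.0):
    """Total time for a client that waits for each request to finish before sending the next."""
    return round_trips * (rtt_ms + server_time_ms)


# Hypothetical thick client issuing 500 sequential queries to open one screen.
QUERIES = 500
WAN_RTT_MS = 80.0   # branch office to data center (assumed)
LAN_RTT_MS = 0.5    # Citrix server to database in the same data center (assumed)

# App installed at the branch: every query pays the WAN round trip.
local = sequential_fetch_time_ms(QUERIES, WAN_RTT_MS)
# App on a Citrix server: queries stay on the LAN, plus one WAN round trip for the screen update.
via_citrix = sequential_fetch_time_ms(QUERIES, LAN_RTT_MS) + WAN_RTT_MS

print(f"local: {local / 1000:.1f} s, via Citrix: {via_citrix / 1000:.2f} s")
# prints "local: 40.0 s, via Citrix: 0.33 s"
```

The difference comes almost entirely from multiplying the number of sequential round trips by the WAN latency, which is why a chatty application that is painful when run locally can feel acceptable over Citrix.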

Most problems do seem to be back-end issues. Those same thick client apps that weren't designed to work across the WAN often contain other peculiar design features that can impact performance.