The 6 steps of performance issue RCA and resolution
Use the following steps iteratively to drive the root cause analysis and resolution. Root cause analysis can be difficult, but once the root cause is identified, the solution is usually trivial.
1. Obtain a clear problem description to ensure that the right problem is being solved.
2. Ensure the problem can be reproduced consistently. This step is important because if the problem cannot be reproduced consistently, you cannot tell whether it has been resolved. Consistency also means keeping run-to-run variation to a minimum: when run-to-run variation exceeds 10-15%, it is hard to determine whether an observed improvement or degradation is due to tuning or merely to run-to-run variation.
3. Collect data for RCA. Data collection must be meticulous: avoid running unrelated jobs or activities while data collection is in progress.
4. Perform root cause analysis, using the following flow chart as a guide. In particular, focus on these two tasks:
   a. Analyze system resource usage
   b. Create a workload profile
5. Apply tuning based on the root cause identified in step 4.
6. If the problem persists, return to step 2. Repeat until the problem is resolved.
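The consistency requirement in step 2, and the later question of whether a tuning change actually helped, can be checked numerically over repeated runs. The sketch below is one possible approach, not a prescribed method: it assumes run-to-run variation is measured as the coefficient of variation (sample standard deviation divided by mean), although other definitions such as (max - min) / mean are also used. The 10% default threshold is an assumption taken from the low end of the 10-15% range above, and all function names are hypothetical.

```python
import statistics

def run_to_run_variation(samples):
    """Assumed definition: coefficient of variation (sample stdev / mean)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean of samples must be non-zero")
    return statistics.stdev(samples) / mean

def is_consistent(samples, max_variation=0.10):
    """True if variation is within the threshold (10% default is an assumption)."""
    return run_to_run_variation(samples) <= max_variation

def relative_change(baseline, tuned):
    """Relative change of the tuned mean compared with the baseline mean."""
    base = statistics.mean(baseline)
    return (statistics.mean(tuned) - base) / base

def change_exceeds_noise(baseline, tuned):
    """Treat a change as real only if it is larger than the worse of the two variations."""
    noise = max(run_to_run_variation(baseline), run_to_run_variation(tuned))
    return abs(relative_change(baseline, tuned)) > noise

if __name__ == "__main__":
    baseline = [100, 102, 98, 101, 99]  # e.g. run times in seconds before tuning
    tuned = [90, 91, 89, 90, 90]        # run times after tuning
    print(is_consistent(baseline), change_exceeds_noise(baseline, tuned))
```

With these example run times, the baseline variation is about 1.6%, so the setup passes the consistency check, and the 10% reduction in mean run time is well above the noise. Comparing against the larger of the two variations is a deliberately simple rule; a statistical test over more runs would give a stronger answer.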
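As one small input to "Analyze system resource usage" in step 4, overall CPU utilization can be derived from two snapshots of cumulative CPU time counters. The sketch below assumes a Linux host, where the first line of `/proc/stat` lists cumulative time for user, nice, system, idle, iowait, irq, softirq, and steal (followed by guest fields that are already included in user and nice). Counting iowait as idle time is an assumption; some tools report it separately. The helper names are hypothetical.

```python
import time

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of Linux /proc/stat into (busy, total) ticks."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Keep user..steal only; guest and guest_nice are already counted in
    # user and nice, so including them would double count.
    values = [int(v) for v in fields[1:9]]
    # Assumption: treat idle + iowait as non-busy time.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total

def cpu_utilization(before, after):
    """Fraction of CPU time spent busy between two /proc/stat snapshots."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("second snapshot must be taken after the first")
    return (busy1 - busy0) / elapsed

def sample_cpu_utilization(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart (Linux only)."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return cpu_utilization(before, after)

if __name__ == "__main__":
    print(f"CPU utilization: {sample_cpu_utilization():.1%}")
```

Sampling repeatedly while the workload runs, rather than once, gives a utilization profile over time that can feed the workload profile in step 4. Per the guidance in step 3, such sampling should happen with no unrelated jobs running.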