How do you know when Jira goes down? It's pretty easy to tell, both as an end user and from a monitoring perspective. Any number of monitoring solutions can run a web check against https://jira.example.com/status, and when that check doesn't return a 200 response code, Jira is down.
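For illustration, here is a minimal sketch of that kind of web check using only the Python standard library. The function name `is_up` and the 10-second timeout are assumptions made for this example, not part of any particular monitoring product.

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True only if the URL answers with HTTP 200 within the timeout.

    Hypothetical helper for illustration; a real monitoring tool would also
    record latency and retry before declaring an outage.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx replies;
        # DNS failures, refused connections, and timeouts also land here.
        return False


# Example call (not executed here because it needs network access):
#   is_up("https://jira.example.com/status")
```

Anything other than a clean 200 counts as "down" in this sketch, which is exactly the binary view the rest of this post tries to improve on.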
Awesome! We're all set, right?
Not quite. What about the situations when it's slower than usual? Or what about when a particular function is slower than usual? The built-in health checks won't help you detect that, even though it could have a ripple effect on the productivity of your user base. It's time to get some canaries!
Using Amazon CloudWatch Synthetics, part of Amazon Web Services (AWS), we can write a canary that emulates user browser behavior to validate uptime and gather metrics about each canary run. A scenario could measure the time it takes to log in, create an issue, move the card on a Jira Software board, and then close the issue. We can run that scenario every five minutes and chart the results over time. Then, we can set an alarm for when the canary's load time rises 50% and stays there over a period of 30 minutes. We can also combine that metric and alarm with our JVM and system metric monitoring to build high-accuracy alerting for the on-call person. No longer will a short blip in the application's response or performance trigger alarms that don't require actual system administrator intervention.
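To make the alarm rule concrete, here is a small sketch of the logic in plain Python rather than an actual CloudWatch alarm definition. It assumes one interpretation of the rule: the alarm fires only when every run in the last 30 minutes (six runs at five-minute intervals) is more than 50% slower than a known baseline. The function name `load_time_alarm`, the `baseline_ms` parameter, and the sample numbers are all hypothetical.

```python
from typing import Sequence


def load_time_alarm(
    durations_ms: Sequence[float],
    baseline_ms: float,
    window: int = 6,        # assumed: 6 runs x 5 minutes = 30 minutes
    increase: float = 0.5,  # assumed: 50% slower than the baseline
) -> bool:
    """Return True when every run in the latest window exceeds the threshold."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    if len(durations_ms) < window:
        # Not enough history yet to cover the full evaluation period.
        return False
    threshold = baseline_ms * (1 + increase)
    recent = durations_ms[-window:]
    # Requiring all runs to breach means a single slow run cannot page anyone.
    return all(duration > threshold for duration in recent)


# One slow run in an otherwise healthy half hour stays quiet...
blip = [2000, 2100, 1900, 2000, 9000, 2050]
# ...while a sustained slowdown trips the alarm.
sustained = [3200, 3500, 3400, 3300, 3600, 3100]
print(load_time_alarm(blip, baseline_ms=2000))       # False
print(load_time_alarm(sustained, baseline_ms=2000))  # True
```

Requiring the whole window to breach is what filters out short blips. A real CloudWatch alarm expresses the same idea through its evaluation period settings instead of hand-written code.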
Did I mention that canaries take screenshots too? Now when that composite alarm pages your on-call person and the responding engineer begins to troubleshoot, they are greeted by system metrics in CloudWatch, screenshots of failed canaries, and maybe even snippets of logs from CloudWatch Logs. That engineer is well-armed with information to validate that there is, in fact, an issue and can jump in before an end user even gets around to reporting it! You've now taken the first step toward proactive monitoring!