Application is down
Incident Report for Totango
Postmortem

Event Description 

Users attempting to log in to Totango were unable to do so. Login requests would hang and eventually time out.

Findings & Timeline

Background

Starting at 19:45 UTC, Totango users were unable to log in.

Reported: February 4th, 2022 at 19:45 UTC

At 19:45 UTC our monitoring services detected an application outage that prevented users from logging in or refreshing pages.

Identified: February 4th, 2022 at 20:30 UTC

At 20:30 UTC the issue was identified as having the same root cause as an incident that occurred on January 20th. A fix had been planned for deployment on February 5th, 2022.

Fix Deployed: February 4th, 2022 at 20:45 UTC

At 20:45 UTC a fix was deployed into production.

Resolved: February 4th, 2022 at 21:15 UTC

After the fix was deployed, it took about 30 minutes to become effective and allow the system to return to normal operation.

Root Cause

We identified the root cause as a background process that was calling an old DATA API with HTTPS requests instead of HTTP. These requests overwhelmed our DATA API threads, leaving them unable to respond to application requests.
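The failure mode described above, where slow calls occupy every worker thread and interactive requests then time out, can be illustrated with a small simulation. This is a generic sketch and not Totango's actual architecture: the pool size, the timeout, and the function names (simulate_saturation, slow_background_call, login_request) are all assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def simulate_saturation(pool_size=2, login_timeout=0.2):
    """Return (login_timed_out_while_saturated, login_result_after_release).

    Hypothetical model: a fixed-size worker pool stands in for the API
    threads, and blocked tasks stand in for the background process's calls.
    """
    release = threading.Event()          # set once the "fix" takes effect
    started = threading.Semaphore(0)     # counts workers that are now busy

    def slow_background_call():
        started.release()                # signal that a worker is occupied
        release.wait()                   # hold the worker until released
        return "background done"

    def login_request():
        return "login ok"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Fill every worker with a slow background call.
        for _ in range(pool_size):
            pool.submit(slow_background_call)
        for _ in range(pool_size):
            started.acquire()            # wait until all workers are busy

        # The login request is queued behind the stuck workers.
        login = pool.submit(login_request)
        try:
            login.result(timeout=login_timeout)
            timed_out = False
        except FutureTimeout:
            timed_out = True

        # Free the workers; the queued login request can now run.
        release.set()
        result = login.result(timeout=5)

    return timed_out, result


if __name__ == "__main__":
    print(simulate_saturation())
```

In this model the login request is trivial, yet it fails from the user's point of view simply because no worker is free to pick it up, which matches the symptom of logins hanging and then timing out.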

Corrective Action

We audited our code base to identify every place that uses this URL and changed each one to use the new DATA API ingress, as intended.
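An audit of this kind can be partly automated. The following is a minimal sketch, assuming the old endpoint can be matched with a regular expression; the host name old-data-api.example.internal, the function name find_url_references, and the list of file extensions are hypothetical placeholders, not details from the incident.

```python
import os
import re


def find_url_references(root, pattern,
                        extensions=(".py", ".java", ".js", ".yaml", ".yml", ".properties")):
    """Walk `root` and return (path, line_number, line) for every line
    in a matching file type that contains `pattern` (a regular expression)."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue  # skip file types we are not auditing
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on oddly encoded files
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical pattern for the deprecated endpoint.
    for path, lineno, line in find_url_references(".", r"old-data-api\.example\.internal"):
        print(f"{path}:{lineno}: {line}")
```

A scan like this only finds literal references; URLs assembled at runtime or stored in external configuration would still need a manual review.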

Posted Feb 16, 2022 - 19:08 UTC

Resolved
The incident has been resolved.
Posted Feb 04, 2022 - 21:49 UTC
Monitoring
A fix was deployed to the system at 15:45 ET (20:45 UTC). We are continuing to monitor, but the system has returned to normal operation.
Posted Feb 04, 2022 - 21:24 UTC
Update
We continue to investigate the issue. We apologize for this ongoing outage.
Posted Feb 04, 2022 - 20:48 UTC
Investigating
We are currently investigating problems logging into the Totango application and are working to restore service.
Thank you for your patience while we look into it.
Posted Feb 04, 2022 - 19:58 UTC
This incident affected: Totango Web Application.