Degraded Performance
Incident Report for Totango
Postmortem

Event Description

Users were unable to log in, and users who were already logged in experienced slow system response times.

Findings & Timeline

First Noticed: July 6th, 2022 at 13:50 UTC

Our monitoring systems alerted us to degraded application performance and login failures. Our engineering teams began investigating.

Problem Area Identified: July 6th, 2022 at 14:00 UTC

At this point, our team identified the problem area and began working to mitigate the issue and trace the root cause. A handful of users appeared to be overloading our systems with numerous API calls.

July 6th, 2022 at 15:30 UTC

Efforts to mitigate the issue continued, though the root cause had not yet been identified.

July 6th, 2022 at 17:00 UTC

Root cause analysis identified the problematic code. A decision was made to revert a recent code deployment.

July 6th, 2022 at 17:30 UTC

Reverting the problematic code showed good results and resolved the issue for all but two users.

July 6th, 2022 at 18:30 UTC

We contacted the two users and guided them through logging out of the system to stop the erroneous API calls.

Root Cause

  • While fixing an issue in which contacts were being added to touchpoints without being selected, we introduced new code that in some cases caused an endless component re-render, which repeatedly fetched contacts from the DB. This code ran only for users who had a specific setting for showing all hierarchy contacts turned off.
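
To illustrate the failure mode and the kind of guard that prevents it, here is a minimal sketch in Python. It is not the actual application code: ContactLoader, fake_fetch and the account ID are hypothetical names chosen for the example. The idea is that a component may re-render any number of times, but the contact list is fetched from the DB only on the first request and served from a cache afterwards.

```python
from typing import Callable, Dict, List


class ContactLoader:
    """Caches the contact list per account so repeated renders reuse one fetch."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch  # hypothetical DB call, injected by the caller
        self._cache: Dict[str, List[str]] = {}
        self.db_calls = 0  # counter kept only to make the effect visible

    def get(self, account_id: str) -> List[str]:
        # Hit the DB only when this account has not been loaded yet.
        if account_id not in self._cache:
            self.db_calls += 1
            self._cache[account_id] = self._fetch(account_id)
        return self._cache[account_id]


def fake_fetch(account_id: str) -> List[str]:
    # Stand-in for the real database query (assumption for this sketch).
    return [f"{account_id}-contact-1", f"{account_id}-contact-2"]


loader = ContactLoader(fake_fetch)

# Simulate a component that re-renders many times and asks for contacts each time.
for _ in range(1000):
    contacts = loader.get("acct-42")

print(loader.db_calls)  # prints 1
```

Without the cache check, the loop above would issue 1000 DB calls for a single user, which is the pattern that overloaded the system. A real fix would also need to invalidate the cache when contacts change, which this sketch leaves out.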

Preventive Action

Fix the original bug that caused this issue, and make sure the call to fetch contacts from the DB is made only once - Done

Enforce rate limits across all components - Not Started

Isolate user tokens to better control rate limits - Not Started
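
As an illustration of the two rate limiting items above, here is a minimal sketch of a token bucket limiter keyed by user token, written in Python. It is an assumption about one possible approach, not a description of the planned implementation: the class names, the rate and capacity values, and the token strings are all hypothetical. Each user token gets its own bucket, so one client issuing excessive API calls exhausts only its own allowance.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Classic token bucket: refills at `rate` per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerTokenRateLimiter:
    """Keeps one independent bucket per user token (hypothetical design)."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_token: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(user_token)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, now)
            self._buckets[user_token] = bucket
        return bucket.allow(now)


# Example values only: 5 requests per second with a burst of 10.
limiter = PerTokenRateLimiter(rate=5.0, capacity=10.0)
noisy = [limiter.allow("token-a") for _ in range(100)]  # one client in a tight loop
quiet = limiter.allow("token-b")  # a different user is unaffected
print(sum(noisy), quiet)
```

In a deployment with several application servers, the bucket state would have to live in shared storage rather than in process memory. That choice is outside the scope of this sketch.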

Posted Jul 07, 2022 - 22:45 UTC

Resolved
This incident has been resolved.
Posted Jul 07, 2022 - 22:44 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 06, 2022 - 19:08 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 06, 2022 - 18:05 UTC
Identified
The issue has been identified and the team is working on fixing the issue. Thank you for your patience.
Posted Jul 06, 2022 - 15:48 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 06, 2022 - 15:01 UTC
Investigating
We are currently investigating this issue.
Posted Jul 06, 2022 - 14:22 UTC
This incident affected: Totango Web Application.