Application is not available
Incident Report for Totango
Postmortem

What Happened?
Starting June 8th at 17:46 PST, our application APIs timed out for a portion of our customer base. The issue lasted for several minutes and corrected itself. A few more episodes of 15-20 minutes each occurred over the next few days. During these episodes, users were unable to log in or experienced unstable application performance. Note, however, that data processing and collection were not impacted and no data was lost.

Our team investigated and identified the problem as related to the network subnet configuration. The team reconfigured the network, which resolved the issue on June 14 at 1:46 PST.

Lessons Learned
* We are making changes to our infrastructure to improve application resiliency and redundancy, so that Totango can react to such problems in one of its components and still provide the best possible service.
* We are also improving our network failover mechanism, so that in future cases of network failure, redundant components will be able to pick up the work without end-user degradation.
Posted Jun 19, 2018 - 15:41 UTC

Resolved
This incident has been resolved.
Posted Jun 15, 2018 - 08:36 UTC
Update
The application has been stable for a few hours now. We are continuing to monitor our systems.
Posted Jun 13, 2018 - 19:20 UTC
Monitoring
We are currently experiencing a temporary system disruption. Our engineering teams are working to resolve it with the highest priority. We will post any updates here, so please check back periodically. We will share the root cause and the actions taken to prevent this from occurring in the future.
Posted Jun 13, 2018 - 16:09 UTC
Investigating
We are currently investigating problems logging into the Totango application and are working to restore service.
Thank you for your patience while we look into it.
Posted Jun 13, 2018 - 15:52 UTC
This incident affected: Totango Web Application.