MessageFocus Issue
Incident Report for Adestra
Resolved
Our engineers have identified and repaired issues with several subsystems, including database server performance and problems with our Redis caching and queuing servers.

After extensive investigation, the team have traced this back to one particular database server holding a very large number of database locks. These locks slowed throughput, which in turn caused the queues on the Redis server to grow and reduced performance across the platform as a whole. The large number of locks also produced undocumented and unexpected performance characteristics in our Postgres database server that we hadn't previously experienced; this took up the bulk of the diagnostic investigation time.

In turn, this situation caused sporadic interruptions in service, and slow response from the transactional email API.

All systems are now running at full capacity again, and queue processing is running at full speed. There are still a number of backlogged queue tasks that the team are manually pushing through; we anticipate this will be complete in around 2 hours.

The team have identified a number of system and monitoring changes that will help improve efficiency in general and aid in proactive notification of similar situations in the future, in particular early warnings if the number of locks on a Postgres server is increasing at a high rate.
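
As an illustration only, an early warning on lock growth could be sketched as below. This is not a description of the monitoring we run: the names `LockSample`, `lock_growth_rate` and `should_warn`, and the threshold of 5 locks per second, are hypothetical. The sketch assumes lock counts are sampled periodically, for example with `SELECT count(*) FROM pg_locks`, and passed in as plain numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LockSample:
    timestamp: float  # seconds since some fixed reference point
    lock_count: int   # e.g. the result of SELECT count(*) FROM pg_locks


def lock_growth_rate(samples: Sequence[LockSample]) -> float:
    """Return locks gained per second between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        return 0.0
    return (last.lock_count - first.lock_count) / elapsed


def should_warn(samples: Sequence[LockSample],
                max_locks_per_second: float = 5.0) -> bool:
    """Warn when locks are growing faster than the (hypothetical) threshold."""
    return lock_growth_rate(samples) > max_locks_per_second
```

Alerting on the rate of change rather than on an absolute count is one possible design choice: it would flag a rapid build-up before the total becomes large enough to affect throughput.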

Please accept our apologies for the interruption in service.
Posted Feb 08, 2020 - 19:24 GMT
Update
A database performance issue is continuing and may cause intermittent issues accessing the platform for some users.
Posted Feb 08, 2020 - 17:16 GMT
Update
We are continuing to work on a fix for this issue.
Posted Feb 08, 2020 - 15:57 GMT
Identified
Our engineers have identified a performance issue with one of our database servers which may have caused delays to some launches and events.
Posted Feb 08, 2020 - 15:43 GMT
Investigating
We're having a few problems with MessageFocus right now that are affecting some of our users. We're currently investigating the cause and will provide an update shortly.
Posted Feb 08, 2020 - 15:29 GMT
This incident affected: Platform Interface; Clicks, reads, unsubs, etc; API; and Launches.