Engine Down
Incident Report for Mergify Status Page
Postmortem

19th November @ 1:00 UTC

  • We started receiving more than 5000 events/minute, while our peak rate is usually around 1000 events/minute.

19th November @ 3:00 UTC

  • The high load of incoming events continued, and our Redis database filled up, as it had been sized for only 3000 events/minute.
  • Event processing got stuck, and some processes started to crash.

19th November @ 6:00 UTC

  • The engineering team was notified and began investigating the issue and possible remediations.
  • The Redis database was replicated for further investigation.
  • We increased the Redis database size to absorb up to 6000 events/minute.
  • The engine started reprocessing events.

19th November @ 6:10 UTC

  • The abusive user was identified and flagged, and their Mergify installation was suspended. Their account was generating 100 commits/s on a repository, triggering the associated CIs. The abusive repository has also been suspended/deleted on the GitHub side.
  • The engine automatically dropped all events from that installation and no longer receives events from it.
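
The mitigation above (spotting a source whose event rate is far beyond normal, then dropping its events) could in principle be automated with a per-source rate check. The following is a minimal Python sketch under stated assumptions, not Mergify's actual implementation: the class `SourceRateMonitor`, its parameters, and the idea of keying events by a `source_id` string are all hypothetical, and the state is kept in memory rather than in Redis.

```python
from collections import defaultdict, deque


class SourceRateMonitor:
    """Hypothetical helper: track per-source event rates over a sliding window."""

    def __init__(self, max_events_per_window: int, window_seconds: float = 60.0):
        self.max_events = max_events_per_window
        self.window = window_seconds
        self._timestamps = defaultdict(deque)  # source_id -> recent event times
        self.suspended = set()                 # sources whose events are dropped

    def accept(self, source_id: str, now: float) -> bool:
        """Return True if the event should be queued, False if it is dropped."""
        if source_id in self.suspended:
            return False

        stamps = self._timestamps[source_id]
        stamps.append(now)

        # Discard timestamps that have fallen out of the sliding window.
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()

        if len(stamps) > self.max_events:
            # The source exceeded its budget: suspend it and forget its history.
            self.suspended.add(source_id)
            del self._timestamps[source_id]
            return False
        return True
```

A caller would invoke `accept(source_id, time.monotonic())` before enqueuing each event. The threshold is an assumption to be tuned; a real deployment would likely also want alerting and a manual path to lift a suspension.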
Posted Nov 19, 2021 - 12:17 UTC

Resolved
Everything is back to normal.
Posted Nov 19, 2021 - 08:24 UTC
Update
We are continuing to work on a fix for this issue.
Posted Nov 19, 2021 - 07:11 UTC
Identified
We're implementing long-term fixes.
Posted Nov 19, 2021 - 07:00 UTC
Update
We have fixed the underlying issue and restored the service. We are now monitoring the platform and planning long-term actions to prevent this incident from happening again.
Posted Nov 19, 2021 - 06:42 UTC
Monitoring
The Mergify engine is unable to process most events received.
Posted Nov 19, 2021 - 02:29 UTC
This incident affected: Engine.