Merge queue unexpectedly dequeues pull requests or is temporarily stuck
Incident Report for Mergify Status Page
Postmortem

All timestamps are in UTC

2023-07-31 11:53, First support case about the merge queue unexpectedly dequeuing a pull request with the message "Base does not exist". We started the investigation.

2023-07-31 14:53, We opened an internal incident as our monitoring alerted us about an increasing number of unexpected GitHub API status codes while Mergify was creating or deleting draft pull requests.

2023-07-31 15:12, We understood that the Git branches we create, and the changes we make on them, with the GitHub Git Database API are not instantly visible to the GitHub Repositories and Pulls APIs. The API calls that manipulate Git succeed, but when we then fetch the Git resources we just created, GitHub reports that they do not exist.

The issue was causing unexpected failures in many different code paths. For customers, this could result in two visible issues:

  • pull requests wrongly dequeued with one of these error messages:

    • No commits between XXXX and YYYY
    • Base does not exist
  • merge queue stuck at the step "This queue is waiting for a batch to fill up."

We decided to implement a retry mechanism in the different code paths where this issue occurred.
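The retry mechanism described above can be sketched as follows. This is a minimal illustration, not Mergify's actual implementation: `NotFoundYet`, `retry_until_visible`, `fake_get_branch`, and the backoff parameters are hypothetical, and a real client would translate a GitHub 404 response for a just-created branch or commit into the exception. The sketch re-reads the resource with exponential backoff until it becomes visible, and gives up after a fixed number of attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundYet(Exception):
    """Hypothetical error: the GitHub API answered 404 for a Git resource
    (branch, commit) that was just created and should exist."""


def retry_until_visible(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it stops raising NotFoundYet.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    (exponential backoff) and re-raises the last NotFoundYet once
    `attempts` calls have all failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except NotFoundYet:
            if attempt == attempts - 1:
                raise  # still invisible after the last attempt: give up
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: simulate replication lag where the first two reads return 404.
calls = {"n": 0}


def fake_get_branch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise NotFoundYet("Branch not found")
    return "example-branch"


delays: list = []
print(retry_until_visible(fake_get_branch, sleep=delays.append))  # example-branch
print(delays)  # [0.5, 1.0]
```

Bounding the number of attempts matters here: if the resource genuinely does not exist, the caller still gets an error instead of waiting forever, which is the trade-off a retry strategy like the one readjusted later in this timeline has to tune.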

2023-07-31 14:50, Our first change to mitigate the issue lands in production, and we continue to monitor closely.

2023-07-31 14:53, We enabled full HTTP request/response logging to gather material for GitHub support.

2023-07-31 15:12, We decide to make the incident public.

2023-07-31 15:34, We deploy a second code change to improve the mitigation.

2023-07-31 15:49, We escalated the issue to GitHub support, as we had enough material to demonstrate the API breakage.

2023-07-31 16:25, We extracted statistics about the number of customers and pull requests impacted. We found that, for some accounts, the GitHub API had started reporting existing Git resources as non-existent on 2023-07-27 at 14:14:10 UTC. We later discovered that this matched the date of the previous GitHub Pull Requests API incident: https://www.githubstatus.com/incidents/l59z35rhzdky.

2023-07-31 17:34, A third change is deployed to readjust the retry strategy. With it, Mergify was always able to detect the issue and retry successfully when it occurred.

2023-08-01 07:15, A new change is deployed to cover a new code path where the issue occurs.

2023-08-01 09:51, GitHub support answered our ticket, acknowledged that the GitHub API behavior had changed, and escalated the issue to their engineering team.

2023-08-01 16:53, GitHub fixed the issue; we asked for more details and why the GitHub status page wasn't updated.

2023-08-02 09:36, GitHub communicates more details about the API behavior change:

A feature flag related to spoke caching was turned on earlier, causing replication lag. Following reports of 404 errors occurring for newly created refs, the change was reversed.

2023-08-02 10:53, GitHub confirms this incident will be part of their next availability report.

Thanks for the feedback --I'd pass those on to the relevant team. Hopefully it gets published in the monthly published availability report.

Posted Aug 02, 2023 - 17:43 UTC

Resolved
This incident has been resolved, and the workaround works as expected. We are still waiting for GitHub support to provide more information about the API behavior changes we observed.
Posted Jul 31, 2023 - 18:32 UTC
Update
We added a way to mitigate these GitHub API changes and are monitoring to ensure everything works as intended.
Posted Jul 31, 2023 - 17:50 UTC
Update
We added a way to mitigate these GitHub API changes and are monitoring to ensure everything works as intended.
Posted Jul 31, 2023 - 16:34 UTC
Monitoring
We added a way to mitigate these GitHub API changes and are monitoring to ensure everything works as intended.
Posted Jul 31, 2023 - 16:31 UTC
Identified
The merge queue unexpectedly dequeues some pull requests or gets stuck due to recent changes in GitHub API behavior. We are working on a mitigation.
Posted Jul 31, 2023 - 15:12 UTC
This incident affected: Third parties (GitHub Pull Requests) and Engine.