Copilot Partial Outage

Incident Report for MadKudu

Resolved

This incident has been resolved.
Posted May 30, 2025 - 00:46 PDT

Monitoring

The LaunchDarkly outage caused one of our systems to overload our production database, which in turn prevented the Copilot service from handling some requests. After we restarted the database, and as LaunchDarkly gradually recovered, the load decreased and Copilot is now functioning normally. We are closely monitoring the situation and are developing a plan to improve our resilience to similar incidents, which we will implement in the coming weeks.
Posted May 27, 2025 - 08:52 PDT

Investigating

One of our providers, LaunchDarkly, is experiencing a major outage: https://status.launchdarkly.com/
We use LaunchDarkly for our feature flag infrastructure (i.e., to control how features are released).

We are looking for a way to mitigate the situation without having to wait for LaunchDarkly to restore their service.
Posted May 27, 2025 - 07:21 PDT
This incident affected: msi.madkudu.com and app.madkudu.com.