Due to a misconfiguration, a significant number of messages in our system were incorrectly marked as errors. On their own, these wrongly flagged messages did not affect functionality for our customers.
However, on Monday evening the system became overwhelmed by the high volume of errors, which could not be processed. Because of this overload, other messages could not be handled either, and the system came under heavy strain. Recovery took some time, and during that period customers experienced slower system performance.
To prevent similar issues in the future, we have tightened our monitoring procedures and identified a bug that we will address in the upcoming release.