Tracking API is down

Incident Report for Seven Senders

Postmortem

Incident Statement: Tracking API Service Disruption

Date: July 25th through 26th
Duration: 32 hours + 6 hours cleanup
Impact: Tracking API and Tracking Pages request timeouts due to database performance issues

What Happened

On July 25th, our tracking subsystem experienced a service disruption after a critical unique index on our main database was unintentionally deleted. This index is essential for processing tracking-related queries.

Impact on Services

The loss of this index caused database queries to take much longer than normal, resulting in requests failing with timeouts.
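The effect of a missing index on a lookup can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since this report does not name the database engine involved; the table `shipments`, the column `tracking_code`, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, tracking_code TEXT, status TEXT)")

lookup = "SELECT status FROM shipments WHERE tracking_code = ?"

# Without an index, the planner has to scan every row of the table.
before = query_plan(conn, lookup, ("A1",))

# With the index in place, the planner can jump straight to matching rows.
conn.execute("CREATE UNIQUE INDEX idx_shipments_tracking_code ON shipments (tracking_code)")
after = query_plan(conn, lookup, ("A1",))

print(before)
print(after)
```

On a large table, the full scan in the first plan is what turns a fast lookup into a request that times out.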

Affected services

  • Tracking API
  • Tracking Pages
  • Parcel Finder

Resolution Steps Taken

Our engineering team responded immediately to restore service:

  1. Initial Response: Started rebuilding the deleted unique index
  2. Load Management: Stopped unindexed queries
  3. Infrastructure Scaling: Scaled up the database cluster to speed up indexing and reduce database load
  4. Interim Solution: When the rebuild failed because duplicate entries had been created in the meantime, created a non-unique index so that queries could run efficiently again
  5. Data Cleanup: Cleaned up duplicate entries
  6. Full Recovery: Successfully recreated the unique index, restoring normal service
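
The recovery sequence in steps 4 to 6 can be sketched as follows. This is an illustration under stated assumptions, not the procedure our team actually ran: it uses Python's built-in sqlite3 module as a stand-in database, the table, column, and index names are hypothetical, and keeping the row with the lowest `id` is only one possible rule for deciding which duplicate survives.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table containing one duplicate tracking code."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, tracking_code TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO shipments (tracking_code, status) VALUES (?, ?)",
        [("A1", "created"), ("B2", "created"), ("A1", "in_transit")],
    )
    conn.commit()
    return conn


def rebuild_unique_index(conn):
    """Interim index, duplicate cleanup, then the unique index. Returns rows removed."""
    cur = conn.cursor()
    # Interim solution: a non-unique index tolerates the duplicates,
    # so lookups are fast again while cleanup is pending.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_tracking_tmp ON shipments (tracking_code)")
    # Data cleanup: keep the lowest id per tracking code (an assumed rule).
    cur.execute(
        "DELETE FROM shipments "
        "WHERE id NOT IN (SELECT MIN(id) FROM shipments GROUP BY tracking_code)"
    )
    removed = cur.rowcount
    # Full recovery: the unique index can now be created, and the interim one dropped.
    cur.execute("CREATE UNIQUE INDEX idx_tracking_unique ON shipments (tracking_code)")
    cur.execute("DROP INDEX idx_tracking_tmp")
    conn.commit()
    return removed


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        # Mirrors the failed first rebuild: duplicates block a unique index.
        conn.execute("CREATE UNIQUE INDEX idx_tracking_unique ON shipments (tracking_code)")
    except sqlite3.IntegrityError as exc:
        print("unique index rebuild failed:", exc)
    print("duplicates removed:", rebuild_unique_index(conn))
```

In a production system, the duplicate rows would normally be reviewed or archived before deletion, and the rule for choosing the surviving row depends on the data.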

Preventive Measures

To prevent similar incidents in the future, we are implementing the following measures:

  • Enhanced database change management procedures with mandatory peer review for all schema modifications
  • Implementation of automated database integrity checks to detect missing or corrupted indexes before production deployment
  • Enhanced code review processes specifically focused on database schema changes and queries that do not use an index
  • Improved standard operating procedures for common database incident scenarios
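
An automated integrity check of the kind listed above could, in its simplest form, compare the indexes present in the database against a reviewed manifest. The sketch below is an assumption about how such a check might look, not a description of our tooling: it uses Python's built-in sqlite3 module as a stand-in, and the manifest `EXPECTED_INDEXES`, the function `find_index_problems`, and the index name are hypothetical.

```python
import sqlite3

# Hypothetical manifest: index name -> whether it must enforce uniqueness.
EXPECTED_INDEXES = {"idx_shipments_tracking_code": True}


def find_index_problems(conn, table, expected):
    """Return a list of human-readable problems; an empty list means the check passed."""
    # PRAGMA index_list rows start with (seq, name, unique, ...).
    # The table name is interpolated directly, so it must come from trusted config.
    rows = conn.execute(f"PRAGMA index_list({table})").fetchall()
    actual = {row[1]: bool(row[2]) for row in rows}

    problems = []
    for name, must_be_unique in expected.items():
        if name not in actual:
            problems.append(f"missing index: {name}")
        elif must_be_unique and not actual[name]:
            problems.append(f"index not unique: {name}")
    return problems


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, tracking_code TEXT)")
    # No index has been created yet, so the check reports it as missing.
    for problem in find_index_problems(conn, "shipments", EXPECTED_INDEXES):
        print(problem)
```

Run as a deployment gate or a scheduled job, a non-empty result would block the release or raise an alert before queries start to slow down.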

Our Commitment

We sincerely apologize for the disruption this incident caused. We take full responsibility for this issue and have conducted a thorough post-incident review. We are committed to implementing the preventive measures outlined above to ensure the reliability and stability of our services.

If you have any questions or concerns regarding this incident, please don't hesitate to contact our support team.

Thank you for your patience.

Posted Jul 29, 2025 - 14:13 CEST

Resolved

This incident has been resolved.
Posted Jul 27, 2025 - 01:23 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 26, 2025 - 18:49 CEST

Update

A fix has been implemented, and we are currently monitoring the applications.
Posted Jul 26, 2025 - 18:49 CEST

Update

We have a major outage in one of our services, which affects shipment tracking, including Tracking Page, Tracking API, Analytics, and Parcel Finder. We have identified the cause and are working to restore the application.
Posted Jul 25, 2025 - 16:21 CEST

Investigating

We are currently investigating this issue.
Posted Jul 25, 2025 - 13:02 CEST
This incident affected: Tracking (Tracking, Shop SQS).