What Happened?
On March 13, 2025, we detected a service disruption affecting all users. During the incident, some catalog pages and the Uscreen admin area returned errors, which prevented store functionality from working as expected. The disruption lasted approximately 19 minutes, including a complete outage of about 5-6 minutes before services began recovering.
What Was the Impact?
- Total Duration: ~19 minutes (15:01 - 15:20 UTC)
- Full Service Disruption: ~5-6 minutes
- Degraded Performance: 15:08 - 15:20 UTC
- Scope: All stores
- Recovery: Initial recovery started at 15:08 UTC, with full restoration by 15:20 UTC
Is Everything Working Now?
Yes, the service has been fully restored and is operating normally.
What Caused the Issue?
After investigating, we identified that:
- A feature in the community section was not optimized for stores with a high number of users.
- For those stores, the feature triggered resource-intensive operations, causing requests to take longer than expected.
- Under heavy load, the database became overwhelmed, leading to errors.
- Our monitoring system did not provide useful diagnostics during the incident, delaying troubleshooting.
What Are We Doing to Prevent This in the Future?
We’ve taken immediate steps to mitigate the issue and are working on long-term improvements:
Immediate Fixes:
- Applied a temporary adjustment to reduce the server load caused by the feature.
- Upgraded our monitoring system for better incident visibility and faster response times.
Planned Improvements:
- Optimizing the affected feature to handle high-traffic scenarios efficiently.
- Implementing measures in our mobile apps to limit excessive requests.
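For readers curious what limiting excessive requests on the client side can look like, here is a minimal sketch of a token-bucket limiter. It is an illustration only and not the actual implementation in our apps; the class name `TokenBucket` and the `rate`, `capacity`, and `clock` parameters are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Hypothetical client-side limiter: allows about `rate` requests per
    second on average, with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum number of stored tokens
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may be sent now, consuming one token."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client would call `allow()` before each request and skip or delay the request when it returns `False`, so that a burst of activity in the app cannot translate into an unbounded burst of traffic to the server.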
We sincerely apologize for any disruption this may have caused and appreciate your patience as we work to enhance system stability.