Downtime

Elevated Inference Error Rates Due to a DNS Outage

Feb 10 at 04:50pm UTC
Affected services
GPU Cluster (General)
GPU Cluster (XL)

Resolved
Feb 10 at 05:55pm UTC

The issue has been resolved and the system is now stable.

We are continuing to actively monitor the affected services.

Created
Feb 10 at 04:50pm UTC

A DNS outage is causing networking issues across all GPU clusters. As a result, we are seeing increased error rates for all inference requests.

These networking issues are preventing job requests and job responses from flowing between the API server and the inference servers.
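
For callers affected by this kind of failure, one common mitigation is to retry name resolution with exponential backoff instead of failing on the first DNS error. The following is a minimal sketch only, not a description of how the API server or inference servers behave: the function name `resolve_with_backoff`, the retry count, the delays, and the hostname in the usage note are all hypothetical.

```python
import socket
import time


def resolve_with_backoff(host, port, attempts=4, base_delay=0.5,
                         resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host:port, retrying on DNS failure with exponential backoff.

    Hypothetical helper for illustration. `resolver` and `sleep` are
    injectable so the retry logic can be exercised without real DNS.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # socket.gaierror is raised by getaddrinfo on resolution failure.
            if attempt == attempts - 1:
                raise  # out of retries: surface the DNS error to the caller
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

A caller might use it as `resolve_with_backoff("inference.example.internal", 443)` (a made-up hostname) before opening a connection. Retries of this kind only help with brief resolution failures; during a sustained outage the call still raises `socket.gaierror` once the attempts are exhausted.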