June 21, 2011

by mario

Network issue post-mortem


Unfortunately, the time has come for all of us to weather a storm: after one year of flawless network operation, something cracked and we experienced downtime. This post serves as an explanation and an apology on our part. Rest assured we are doing everything in our power to prevent this from happening again.

At around 14:00 CET we received several notifications of servers and VMs going offline, and we immediately assigned technicians, who started troubleshooting the issue. Network access was fully restored 90 minutes later, once the issue had been identified and fixed.

The downtime was caused by a problem with a cable crossing patch panel in the Equinix DC, between our cage and the meet-me-room. The problem was an unusual one, so it could not be identified immediately. Once it was discovered, network access was restored right away. The cable crossing will be reorganized to eliminate the single point of failure.

We believe it is important to mention that all VMs remained online during the network outage.

We apologize for any issues this might have caused you, and we will be issuing SLA credits for this downtime during the week. Thank you for staying with us. We love having you here.