Temporary systems outage February 12
On February 12, we detected suspicious DDoS (Distributed Denial-of-Service) activity on our systems and took action to block it. We use a variety of methods to protect our customers from DDoS attacks and hacking attempts, including a technology called BGP Flowspec. We use this technology to mitigate suspicious activity as it happens by applying rules to network traffic in real time.
We were attempting to isolate a ‘bad actor’ using this real-time method when a human error caused connections on our network equipment to be temporarily discarded. As a result, Namecheap services (including the main website) became unavailable for a short period of time.
What we did in response
The issue was spotted within two minutes, and the changes were reversed immediately. However, because connections had already been discarded, our network equipment had to re-establish them. This ‘warm up’ procedure took around 45 minutes and placed a heavy CPU load on our network equipment. It prolonged an outage that would otherwise have lasted only a few minutes.
Resolving the issue and protecting against future problems
We aim to prevent events like this at all costs, and we endeavor to learn from the rare instances where our systems do go down.
In this case, once our systems had restored their peering connections (effectively reconnecting us with the Internet), all of our services came back online automatically. As a result of this event, we are introducing new safeguard logic that checks each rule before it is applied. This will ensure that, even in the case of human error, a broken rule is not rolled out to the production system.
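To give a rough idea of what a pre-deployment rule check can look like, here is a minimal Python sketch. It is an illustration only and not our production code: the FlowRule structure, the validate_rule function, and the /24 limit are all hypothetical, and the sketch only considers IPv4 prefixes.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List, Optional

# Hypothetical policy limit: refuse discard rules wider than a /24 (IPv4).
MIN_DEST_PREFIX_LEN = 24


@dataclass
class FlowRule:
    """Simplified, hypothetical stand-in for a traffic-filtering rule."""
    destination: str                # prefix the rule matches, e.g. "203.0.113.7/32"
    source: Optional[str] = None    # optional source prefix
    protocol: Optional[str] = None  # optional protocol, e.g. "udp"
    action: str = "discard"


def validate_rule(rule: FlowRule) -> List[str]:
    """Return a list of problems; an empty list means the rule passes."""
    try:
        dest = ip_network(rule.destination, strict=True)
    except ValueError as exc:
        # Nothing else can be checked without a valid destination.
        return [f"invalid destination prefix: {exc}"]

    problems = []
    if rule.source is not None:
        try:
            ip_network(rule.source, strict=True)
        except ValueError as exc:
            problems.append(f"invalid source prefix: {exc}")

    if rule.action == "discard":
        # A very wide destination would drop far more than the attack traffic.
        if dest.prefixlen < MIN_DEST_PREFIX_LEN:
            problems.append(
                f"destination /{dest.prefixlen} is wider than /{MIN_DEST_PREFIX_LEN}"
            )
        # With no source or protocol match, the rule hits all traffic to the prefix.
        if rule.source is None and rule.protocol is None:
            problems.append("discard rule has no source or protocol match")

    return problems


if __name__ == "__main__":
    narrow = FlowRule("203.0.113.7/32", source="198.51.100.0/24", protocol="udp")
    broad = FlowRule("0.0.0.0/0")
    for rule in (narrow, broad):
        issues = validate_rule(rule)
        print(rule.destination, "OK" if not issues else issues)
```

In a setup like this, a rule would only be sent to the network equipment when validate_rule returns an empty list. Anything else would be held back for review.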
We’d like to apologize for any inconvenience this caused you, and hope you’ll be reassured by the measures that are now in place to prevent similar events in the future.