Summary
Overnight on November 8th, 2021, Grip experienced a platform-wide disruption due to a DNS incident. Our internal team detected the disruption at 03:47 UTC, and the engineering team began working on a resolution at 07:10 UTC. Within 31 minutes of investigation, the root cause was identified, and restoration followed immediately. By 09:20 UTC, all Grip core systems were back online on our new grip.events domain, a move that had originally been planned for later this year.
We sincerely apologize for any inconvenience that this caused our clients and everyone who relies on our services.
What Happened?
Early on November 8th, a team member tried to access the Grip platform and received a Failure Response error from the intros.at domain. The cause was a DNS incident that resulted in a platform-wide disruption. You may have encountered this error yourself, as the majority of Grip's network was interrupted.
Below is the sequence of events (all times are in UTC):
03:47 Internal team discovered the disruption
07:10 Engineering team responded to the incident
07:41 Root cause identified and migration from intros.at to grip.events domain began
08:15 Migration of configuration to grip.events domain for major services
09:10 Mobile Apps started to be rebuilt and published to App Stores
09:20 Core Grip platform (Grip Dashboard and Web Networking Platform) restored
10:10 Ancillary services (iFrame, Insights, Chinese Platform) restored
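The intervals between the timeline entries above can be checked with a few lines of Python. This is only a sketch for readers who want to verify the arithmetic. The helper name minutes_between is illustrative and not part of any Grip tooling, and the times are copied from the timeline as listed.

```python
from datetime import datetime

def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps (UTC)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# Times taken from the timeline above; labels are shorthand for this sketch.
discovered = "03:47"
responded = "07:10"
root_cause = "07:41"
core_restored = "09:20"
ancillary_restored = "10:10"

print("Response to root cause:", minutes_between(responded, root_cause), "min")
print("Root cause to core restore:", minutes_between(root_cause, core_restored), "min")
print("Discovery to core restore:", minutes_between(discovered, core_restored), "min")
print("Discovery to full restore:", minutes_between(discovered, ancillary_restored), "min")
```

The first figure corresponds to the 31 minutes of investigation mentioned in the summary.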
All services have been migrated to the grip.events domain. We were able to quickly restore the Core Grip platform once the root cause was identified, limiting the platform-wide disruption.
What are the Next Steps?
- We have migrated all services from intros.at to grip.events
- We will create a public-facing page that shows the status of Grip services, to streamline status communication
- We will review all escalation processes for critical events from Tier 1 through to Product/Engineering