Google Cloud Publishes Incident Report on Complete Deletion of Customer Data Caused by Internal Tool Bug
In a significant cloud computing mishap, Google Cloud inadvertently deleted the account of its Australian customer, UniSuper, leading to the complete erasure of all of UniSuper's data, including geo-redundant backups stored on Google Cloud.
UniSuper, an Australian superannuation fund for university employees, manages assets exceeding $125 billion USD. Following the incident, UniSuper's clients were unable to access any investment data in their accounts.
Fortunately, UniSuper had not placed all its trust in Google Cloud, maintaining redundant backups on another cloud platform. This foresight allowed them to restore services within a few days, averting a major disruption to their operations.
Incident Investigation:
The incident was a significant blow to Google, which carried out a comprehensive investigation and published the findings to stay transparent about the reliability of its public cloud platform.
The investigation traced the fault to a bug in an internal configuration tool that Google had developed for customer-specific deployments of Google Cloud VMware Engine (GCVE).
The tool has a parameter that sets the lifespan of a GCVE deployment, and in this case it should have been left blank. Normally, a blank value means the deployment has no expiry date. Because of the bug, the tool instead assigned a one-year validity period by default, a detail Google engineers were initially unaware of.
As a result, when the time expired, GCVE automatically deleted the customer's account without any prior email notification, as the deletion was not initiated by the customer.
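The failure mode described above, a blank value silently turning into a fixed term, can be illustrated with a minimal sketch. This is not Google's code: the function names, the `term_days` parameter, and the 365-day constant are hypothetical and exist only to contrast the buggy behavior with the intended one.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical stand-in for the one-year default described in the report.
ONE_YEAR = timedelta(days=365)


def expiry_buggy(created: date, term_days: Optional[int]) -> Optional[date]:
    """Illustrates the flaw: a blank term silently becomes one year."""
    term = timedelta(days=term_days) if term_days is not None else ONE_YEAR
    return created + term


def expiry_intended(created: date, term_days: Optional[int]) -> Optional[date]:
    """Intended behavior: a blank term means the deployment never expires."""
    if term_days is None:
        return None  # no expiry date at all
    return created + timedelta(days=term_days)


if __name__ == "__main__":
    created = date(2023, 1, 1)  # arbitrary example date, not from the incident
    print("buggy:   ", expiry_buggy(created, None))
    print("intended:", expiry_intended(created, None))
```

The two functions agree whenever a term is given explicitly. They differ only on blank input, which is why a defect of this kind can go unnoticed until the hidden deadline arrives.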
Google emphasized that GCVE configuration has since been fully automated, eliminating the need for manual intervention. The internal configuration tool implicated in this incident was deprecated by the fourth quarter of 2023, so the same bug cannot recur.
Google's Remedial Actions:
To address and prevent such issues, Google has deprecated the internal tool that could trigger similar problems and has fully automated the process. Users can now operate through the console interface without needing manual intervention from Google engineers.
Additionally, Google has reviewed its database and all GCVE private cloud deployments to confirm that no other GCVE private cloud is at risk.
Finally, Google has corrected the workflow that scheduled customer private clouds for automatic deletion, so that even if an expiry issue arises, no automatic deletion will occur.
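The principle behind that last fix, that an expired term should never lead directly to deletion, can be sketched as follows. This is an illustration under assumptions, not Google's implementation: `PrivateCloud`, `Action`, and `action_on_expiry_check` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"                        # nothing to do
    NOTIFY_AND_HOLD = "notify_and_hold"  # alert the customer, keep the data


@dataclass
class PrivateCloud:
    # Hypothetical record of a deployment; expires_on=None means no expiry.
    name: str
    expires_on: Optional[date]


def action_on_expiry_check(cloud: PrivateCloud, today: date) -> Action:
    """Decide what to do at expiry time. Deletion is deliberately not an option."""
    if cloud.expires_on is None or today < cloud.expires_on:
        return Action.NONE
    # The term has lapsed: notify and hold instead of deleting anything.
    return Action.NOTIFY_AND_HOLD


if __name__ == "__main__":
    cloud = PrivateCloud("example-cloud", date(2024, 1, 1))
    print(action_on_expiry_check(cloud, date(2024, 1, 2)))
```

Because the set of possible actions contains no delete operation, a wrong expiry date can at worst produce an unnecessary notification.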
Google also praised customers like UniSuper for adopting robust and resilient architectures that mitigate the risk of failures. Thanks to UniSuper's multiple backup strategies, they were able to quickly recover from the deletion of their data on Google Cloud.