Significant Security Oversight in Microsoft Cloud Computing Service: Bug Leads to Incomplete Azure Customer Security Logs
For businesses, the security auditing features of cloud computing platforms are crucial: they make it possible to spot potential security risks, such as failed or suspicious login attempts on particular accounts.
However, a significant security oversight occurred recently in the Microsoft Azure service. A bug in the newly deployed version of its internal monitoring agent caused data within the log collection service to be incomplete.
Affected platforms or services include:
- Microsoft Azure Logic Apps: platform logs
- Microsoft Azure Healthcare API: platform logs
- Microsoft Sentinel: security alerts
- Microsoft Azure Monitor: logs routed to monitoring through diagnostic settings
- Azure Trusted Signing: incomplete SignTransaction and SignHistory logs
- Azure Virtual Desktop: logs sent to Application Insights
- Power Platform: discrepancies in reporting data
- Microsoft Entra (whose identity service was formerly known as Azure Active Directory): sign-in and activity logs
Preliminary cause of the issue:
Starting from September 2, 2024, 23:00 (UTC+0), an error in Microsoft's internal monitoring agent caused some agents to fail while uploading logs to the internal log platform. This error resulted in incomplete log data across affected platforms.
Microsoft detected anomalies and began investigating on September 5. By September 30, the Microsoft Cloud Engineering Team had proposed a temporary, only partially effective workaround: periodically restarting the agent servers so that log collection resumes.
However, logs lost between September 2 and September 30 cannot be recovered. Businesses may therefore be unable to view complete security logs for almost all of September, and cannot investigate or respond to risks that would have been recorded in the missing entries.
Because Microsoft Sentinel security alert logs were among those affected, the data flowing from that service to Microsoft Purview and Microsoft Defender for Cloud was also impacted. This in turn weakened businesses' ability to analyze data, detect threats, and generate security alerts.
For instance, suppose an attacker used phishing to log into a corporate employee's account during the affected period, connecting from an IP address in an unfamiliar location. A security warning for an unusual login location should have been issued. However, because the relevant logs may have been lost, the business might never receive that alert.
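Since the lost logs cannot be recovered, one practical step is to find out where the gaps fall in the logs a business did receive. The following is a minimal Python sketch of that idea, not a Microsoft tool or API. The function name find_log_gaps, the one-hour threshold, and the sample timestamps are all hypothetical. The window start follows the time given above, and the end of the window (the end of September 30, UTC) is an assumption.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Incident window. The start is the time reported in this article;
# the exact end time is an assumption (end of September 30, UTC).
WINDOW_START = datetime(2024, 9, 2, 23, 0, tzinfo=UTC)
WINDOW_END = datetime(2024, 10, 1, 0, 0, tzinfo=UTC)


def find_log_gaps(timestamps, window_start, window_end, max_gap):
    """Return (gap_start, gap_end) pairs inside the window where no
    log record was seen for longer than max_gap."""
    # Keep only records inside the window, in chronological order.
    in_window = sorted(t for t in timestamps if window_start <= t <= window_end)
    gaps = []
    previous = window_start
    # Appending window_end also catches silence at the end of the window.
    for current in in_window + [window_end]:
        if current - previous > max_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__":
    # Hypothetical sign-in log timestamps exported from a log store.
    sample = [
        WINDOW_START + timedelta(minutes=20),
        WINDOW_START + timedelta(hours=1),
        WINDOW_START + timedelta(days=3),
    ]
    for start, end in find_log_gaps(sample, WINDOW_START, WINDOW_END, timedelta(hours=1)):
        print(f"no records between {start.isoformat()} and {end.isoformat()}")
```

A quiet period is not proof of lost data, since a low-traffic service may simply have logged nothing. The gaps reported this way are only candidates for closer review against other evidence, such as application or network logs.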
Microsoft has yet to disclose the root cause of the failure, and has said it will publish a report once its investigation is complete. It remains unclear whether a permanent, effective fix will be available, although Microsoft is unlikely to rely on temporary measures such as server restarts in the long term.