AWS and Microsoft Azure Outages: A Costly Lesson in Operational Resilience?

Research Article

13 November 2025

Taken from the published LinkedIn pulse article.

Introduction

In October, major outages at two of the world’s largest cloud infrastructure providers, Amazon Web Services (AWS) and Microsoft Azure, highlighted the systemic dependency on a small number of hyperscale cloud operators and underscored the critical need for operational resilience within digital asset firms.

On 20 October, AWS reportedly experienced a significant outage in its US-East-1 region, caused by a Domain Name System (DNS) resolution issue affecting its DynamoDB API endpoints. As many AWS services depend on these endpoints, the disruption cascaded across third-party applications and multiple AWS services, impacting thousands of businesses. Then, on 29 October, Microsoft Azure suffered a separate outage allegedly triggered by an inadvertent configuration change in its Azure Front Door service, which caused widespread service degradation across Azure, Microsoft 365, and other downstream services globally.

The Impact of Disruptions

These outages had a considerable negative impact across many industries. For digital asset-related firms in particular, where uptime and real-time settlement are core to business continuity and regulatory compliance, the implications were arguably far more serious.

Cloud infrastructure is deeply embedded within the digital asset ecosystem, particularly among firms that provide front-end services. Exchanges, custodians, payment gateways, staking providers, and DeFi protocols increasingly rely on AWS, Azure, or Google Cloud to host validator nodes, manage APIs, process transactions, and store user data.

This concentration among a few high-profile providers introduces a single point of failure, in notable contrast to the digital asset sector’s core value of decentralisation. When outages inevitably occur, order books freeze, liquidity aggregation breaks down, and API-dependent systems such as price feeds and custody integrations can go offline, exposing firms to potentially significant market, reputational, and regulatory risks.

Within the risk and compliance functions of digital asset companies, these recent outages should serve as a wake-up call to revisit cloud dependency models. Regulators, including the UK’s Financial Conduct Authority (FCA), have increasingly emphasised the need for operational resilience, requiring firms to identify critical business services, set impact tolerances, and test continuity under severe but plausible scenarios. In this context, overreliance on a single cloud vendor represents a clear operational concentration risk that must be mitigated through robust architecture and contingency planning. For the many digital asset-related firms forced offline by the recent outages, such mitigation was arguably not in place.
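As a rough illustration of how impact tolerances can be tested against a scenario, the sketch below compares a simulated outage duration with the tolerance set for each critical business service. The service names and tolerance values are hypothetical assumptions chosen for illustration; they are not taken from any regulation or from the outages discussed above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BusinessService:
    name: str
    # Maximum tolerable disruption for this service (illustrative value).
    impact_tolerance: timedelta


def breaches(service: BusinessService, outage: timedelta) -> bool:
    """Return True if the outage lasts longer than the service's tolerance."""
    return outage > service.impact_tolerance


# Hypothetical services and tolerances, for illustration only.
SERVICES = [
    BusinessService("order matching", timedelta(minutes=15)),
    BusinessService("client withdrawals", timedelta(hours=2)),
    BusinessService("regulatory reporting", timedelta(hours=24)),
]


def breached_services(outage: timedelta) -> list[str]:
    """List the services whose impact tolerance the outage would exceed."""
    return [s.name for s in SERVICES if breaches(s, outage)]


if __name__ == "__main__":
    # A hypothetical three-hour provider outage as the test scenario.
    print(breached_services(timedelta(hours=3)))
```

Running a range of scenario durations through a check like this gives the risk function a simple list of which services would breach tolerance and therefore need an alternative hosting or recovery path.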

Enhancing Resilience

Practical measures to enhance resilience include adopting multi-cloud and hybrid-cloud deployments to ensure critical workloads can fail over effectively between providers or on-premises environments. Equally important is implementing robust monitoring and alerting systems, coupled with automated backup, snapshot, and recovery protocols that minimise data loss. Firms should also perform regular tabletop exercises and failover simulations to ensure that both technical teams and senior management understand how to respond to service degradation events. Of course, enhancing operational resilience comes at a cost, but it is a worthwhile investment in continuity and reputation: the upfront costs are far outweighed by the financial and reputational losses that result from downtime or service disruptions.
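The failover idea can be sketched in a few lines. The example below is a minimal illustration, assuming each deployment exposes some health probe; the provider names and the stubbed probe functions are hypothetical, and a production setup would more likely rely on DNS-level or load-balancer failover with real network checks.

```python
from typing import Callable, Optional, Sequence

# A health check is any zero-argument callable returning True when healthy.
HealthCheck = Callable[[], bool]


def pick_provider(providers: Sequence[tuple[str, HealthCheck]]) -> Optional[str]:
    """Return the first healthy provider in priority order, or None.

    A probe that raises is treated the same as an unhealthy result, so a
    failing primary cannot block failover to the next candidate.
    """
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            continue
    return None


# Stubbed probes for illustration; real probes would query each deployment.
def primary_down() -> bool:
    raise ConnectionError("primary endpoint unreachable")


def secondary_up() -> bool:
    return True


if __name__ == "__main__":
    order = [
        ("primary-cloud", primary_down),
        ("secondary-cloud", secondary_up),
        ("on-premises", lambda: True),
    ]
    print(pick_provider(order))
```

Keeping the priority order explicit makes the failover path easy to rehearse in a simulation: swapping a stub from healthy to failing shows which environment would take over, and a result of None signals that an alert should be raised.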

Effective operational resilience goes beyond technology alone. Boards and senior leadership must treat operational resilience as a matter of strategic governance, integrating it into their enterprise risk management frameworks and vendor oversight processes. Incident response planning should embed communications, counterparty management, and market impact assessment, rather than merely focusing on IT recovery.

As the digital asset industry evolves and institutional participation increases, the tolerance for downtime or data unavailability will decrease significantly as market participants and regulators demand demonstrable resilience and transparency. Firms that invest early in redundant architectures, diversified providers, and robust operational governance will not only safeguard their users but also strengthen confidence in the broader digital asset ecosystem.

About Appold

At Appold, we are committed to helping firms embed robust operational resilience practices within their systems and processes. Our institutional-grade, regulatory-aligned operational resilience framework, tailored to the needs of digital asset firms, equips them to plan for, respond to, recover from, and learn from operational disruptions.

By leveraging Appold’s independent digital asset expertise and successful track record as an auditor’s expert to major digital asset firms, along with our hands-on experience in identifying and remediating systems and controls risks, your company can demonstrate a strong commitment to risk management and operational resilience that will reassure clients, stakeholders, and regulators alike that you are built to last.

Reach out to us for further discussion.

www.appold.com

For further information, please contact:

info@appold.com

Robert Gaskell
