Avoiding Single Points of Failure
Single points of failure can occur at integration points in your system, such as your public API and authorization layer, workflows that depend on a third-party API, or centralized resources like event buses and databases.
While certain single points of failure are very difficult to avoid, your aim should always be to eliminate them over time. You can also mitigate the potential failure of these components through strategies such as replication and backups.
Any single point of failure should be developed, deployed, and operated in isolation from the other parts of your architecture. For instance, you should define your Cognito user pools or application state data stores in separate stacks from the microservices that interact with them. This separation allows you to deploy these resources only when they change, which is typically not often, and not when dependent components change. Chapter 6 provides more details on separating and sharing resources across decoupled services.
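The stack separation described above can be sketched as two CloudFormation templates, built here as plain Python dictionaries. This is a minimal illustration rather than a definitive implementation: the export name, the logical IDs, and the "orders" service are hypothetical names chosen for the example. The shared stack owns the user pool and exports its ARN; the service stack only imports that ARN, so it can be redeployed without touching the pool.

```python
import json

# Hypothetical export name shared between the two stacks.
USER_POOL_ARN_EXPORT = "shared-auth-UserPoolArn"


def shared_auth_template() -> dict:
    """Stack that owns the Cognito user pool; deployed only when auth changes."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "UserPool": {
                "Type": "AWS::Cognito::UserPool",
                # Keep the pool even if this stack is deleted by mistake.
                "DeletionPolicy": "Retain",
                "Properties": {"UserPoolName": "example-user-pool"},
            }
        },
        "Outputs": {
            "UserPoolArn": {
                "Value": {"Fn::GetAtt": ["UserPool", "Arn"]},
                "Export": {"Name": USER_POOL_ARN_EXPORT},
            }
        },
    }


def orders_service_template() -> dict:
    """Microservice stack; it imports the pool ARN and never defines the pool."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "OrdersApi": {
                "Type": "AWS::ApiGateway::RestApi",
                "Properties": {"Name": "orders-api"},
            },
            "OrdersAuthorizer": {
                "Type": "AWS::ApiGateway::Authorizer",
                "Properties": {
                    "Name": "orders-cognito-authorizer",
                    "Type": "COGNITO_USER_POOLS",
                    "RestApiId": {"Ref": "OrdersApi"},
                    "IdentitySource": "method.request.header.Authorization",
                    "ProviderARNs": [{"Fn::ImportValue": USER_POOL_ARN_EXPORT}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(shared_auth_template(), indent=2))
    print(json.dumps(orders_service_template(), indent=2))
```

Because the service template contains no Cognito resource of its own, a change to the orders service cannot replace or delete the user pool as a side effect of deployment.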
Cognito user pools cannot natively be replicated or backed up. This makes them very susceptible to catastrophic faults. Use synthetic monitoring and metric alarms to ensure you are alerted to any issues with authorizing API requests against your user pool. Define your Cognito user pool, app clients, scopes, and any supporting infrastructure, like Route 53 custom domains, using an infrastructure-as-code tool. Deploy the infrastructure via an isolated, direct pipeline. This will allow you to rapidly iterate or re-create your Cognito resources in the event of failure.
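One way to wire a metric alarm to synthetic monitoring is sketched below, again as a CloudFormation template built in Python. It assumes a CloudWatch Synthetics canary already exists that exercises an authorized API request; the canary name and the SNS topic ARN are hypothetical placeholders, and the period and threshold are illustrative values to tune for your workload.

```python
import json


def auth_canary_alarm_template(canary_name: str, alarm_topic_arn: str) -> dict:
    """Alarm that fires when the canary's success rate drops below 100%."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AuthCanaryAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmDescription": "Authorized API requests are failing",
                    # Metric published by CloudWatch Synthetics for each canary.
                    "Namespace": "CloudWatchSynthetics",
                    "MetricName": "SuccessPercent",
                    "Dimensions": [
                        {"Name": "CanaryName", "Value": canary_name}
                    ],
                    "Statistic": "Average",
                    "Period": 300,  # seconds; illustrative value
                    "EvaluationPeriods": 1,
                    "Threshold": 100,
                    "ComparisonOperator": "LessThanThreshold",
                    # A canary that stops reporting is treated as a failure.
                    "TreatMissingData": "breaching",
                    "AlarmActions": [alarm_topic_arn],
                },
            }
        },
    }


if __name__ == "__main__":
    template = auth_canary_alarm_template(
        canary_name="auth-check",  # hypothetical canary name
        alarm_topic_arn="arn:aws:sns:eu-west-2:123456789012:auth-alerts",  # placeholder
    )
    print(json.dumps(template, indent=2))
```

Treating missing data as breaching is a deliberate choice in this sketch: if the canary itself stops running, you would likely want to know about that as well.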
Understanding AWS Availability
At a high level, the global cloud infrastructure of AWS is organized around two primary concepts: Regions and Availability Zones. A Region is a geographical area, such as Northern Virginia (us-east-1), London (eu-west-2), or Tokyo (ap-northeast-1). Each Region consists of at least three Availability Zones (see Figure 8-11).
Figure 8-11. AWS Regions and Availability Zones
An Availability Zone (AZ) is a physically isolated section of an AWS Region with one or more data centers. Each AZ is designed to operate and fail independently. The AZs in a Region are physically separated by up to 100 km (60 miles) and connected by high-bandwidth, low-latency networking that allows for synchronous replication between them. Practically, this isolation protects your applications against issues such as power outages and natural disasters.
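To see why independent failure matters, consider a rough back-of-the-envelope calculation. The sketch below assumes that AZ failures are fully independent and that every AZ has the same availability. Both are simplifying assumptions, and the 99% figure is purely illustrative, not a number published by AWS.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one AZ is up, assuming independent failures."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("per_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # All AZs must be down at the same time for the workload to be unavailable.
    all_down = (1.0 - per_az_availability) ** az_count
    return 1.0 - all_down


if __name__ == "__main__":
    for az_count in (1, 2, 3):
        availability = combined_availability(0.99, az_count)
        print(f"{az_count} AZ(s): {availability:.6%}")
```

Under these assumptions, each additional AZ multiplies the chance of a total outage by the per-AZ failure probability, which is why spreading a workload across AZs improves availability so sharply. Real failures are never perfectly independent, so treat the result as an upper bound.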
It is this global infrastructure of Regions and AZs that enables you, as an AWS customer, to build highly available, fault-tolerant, and scalable applications.
At the time of writing, there are 32 Regions and 102 Availability Zones, with 15 more AZs and 5 more Regions planned in Canada, Germany, Malaysia, New Zealand, and Thailand.
Choosing a Region to deploy your application to will usually depend on where the majority of your users are located, as well as where your organization is legally permitted to operate and to process and store user data. However, many AWS services also provide features that allow you to operate your workload across multiple AWS Regions and AWS accounts.