Identifying the Critical Paths
A critical path is typically a user experience that is critical to the operation of your business. Examples of user requests that follow critical paths include ordering a taco, making a payment for your child’s Christmas present, donating to a charitable cause, or tracking a parcel. If these requests go wrong or […]
AWS X-Ray – Operating Serverless
X-Ray is the native AWS solution for distributed tracing. It is a key component in the AWS observability stack and is fully integrated with the Amazon CloudWatch console. X-Ray provides tools to configure your application to collect trace data across owned and managed services. You can use the X-Ray console to analyze your […]
Instrumentation – Operating Serverless
Instrumentation is the process of configuring the microservices and managed services in your system to emit trace data when making API calls and performing tasks. For managed services, API calls are instrumented via the configuration of the resources you create. For example, to enable tracing for a Step Functions workflow you would include this […]
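To make "emitting trace data" more concrete, here is a minimal sketch of how trace context travels from one service to the next. It assumes the X-Ray tracing header format, X-Amzn-Trace-Id: Root=...;Parent=...;Sampled=..., where the root trace ID is "1-", then the Unix epoch time in eight hex digits, then a 96-bit random identifier, and segment IDs are 64-bit values in 16 hex digits. The helper names (newTraceId, childHeader, and so on) are hypothetical. In a real workload the X-Ray SDK or AWS Distro for OpenTelemetry generates and propagates these values for you, so treat this only as an illustration of what instrumentation does under the hood.

```typescript
import { randomBytes } from "node:crypto";

// Parsed fields of an X-Ray trace header such as
// "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1".
interface TraceContext {
  root: string; // trace ID shared by every segment in one request
  parent?: string; // ID of the segment that made the call
  sampled?: string; // "1" sampled, "0" not sampled
}

// Trace ID: version "1", epoch seconds as 8 hex digits, 96 random bits as 24 hex digits.
function newTraceId(nowMs: number = Date.now()): string {
  const epochHex = Math.floor(nowMs / 1000).toString(16).padStart(8, "0");
  return `1-${epochHex}-${randomBytes(12).toString("hex")}`;
}

// Segment IDs are 64-bit values written as 16 hex digits.
function newSegmentId(): string {
  return randomBytes(8).toString("hex");
}

function parseTraceHeader(header: string): TraceContext {
  const fields = new Map<string, string>();
  for (const part of header.split(";")) {
    const [key, value] = part.trim().split("=");
    if (key && value !== undefined) fields.set(key, value);
  }
  const root = fields.get("Root");
  if (!root) throw new Error("trace header has no Root field");
  return { root, parent: fields.get("Parent"), sampled: fields.get("Sampled") };
}

function formatTraceHeader(ctx: TraceContext): string {
  const parts = [`Root=${ctx.root}`];
  if (ctx.parent) parts.push(`Parent=${ctx.parent}`);
  if (ctx.sampled) parts.push(`Sampled=${ctx.sampled}`);
  return parts.join(";");
}

// Header to attach to a downstream call: same trace, current segment as the parent.
// With no incoming header, a new trace is started (no sampling decision is made here).
function childHeader(incoming: string | undefined, currentSegmentId: string): string {
  const ctx: TraceContext = incoming ? parseTraceHeader(incoming) : { root: newTraceId() };
  return formatTraceHeader({ ...ctx, parent: currentSegmentId });
}
```

Calling childHeader(incoming, newSegmentId()) before each downstream request keeps the Root unchanged, which is what allows a tracing backend to stitch segments from different services into a single trace.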
When Things Go Wrong – Operating Serverless
The sheer volume of variables and the emergent complexity in operating a production serverless workload mean things will go wrong. Of course, this does not mean you should be complacent and stop trying to minimize the number of things that could go wrong. You should optimize your development practices, […]
Everything Fails All the Time: Fault Tolerance and Recovery – Operating Serverless
As you design your serverless architecture and build your functions, workflows, and microservices, you should always code for failure. Coding for failure requires that you keep in mind the mechanisms and strategies available to you for recovering from failures. This ranges from a try/catch in a Lambda […]
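As an illustration of the code-level end of that range, the sketch below wraps a try/catch in a retry loop with capped exponential backoff and full jitter. The helper withRetry and its options are hypothetical names chosen for this example, not an AWS API; managed services such as Step Functions expose their own retry configuration, which is usually preferable where it is available. The isRetryable predicate matters: retrying an error that is not transient, such as a validation failure, only adds load and latency.

```typescript
// Hypothetical options for a retry helper (not an AWS SDK API).
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // backoff ceiling before the first retry
  maxDelayMs: number; // upper bound on any single delay
  isRetryable?: (err: unknown) => boolean; // which errors are worth retrying
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not wait
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries an async operation with capped exponential backoff and full jitter.
async function withRetry<T>(operation: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  const isRetryable = opts.isRetryable ?? (() => true);
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on the last attempt, or on errors that retrying will not fix.
      if (attempt >= opts.maxAttempts || !isRetryable(err)) throw err;
      // Full jitter: wait a random time in [0, min(maxDelay, base * 2^(attempt - 1))).
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Randomizing the whole delay (full jitter) spreads retries from many concurrent callers over time, so a brief outage in a downstream dependency is less likely to be followed by a synchronized burst of retries.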
Debugging with the Core Analysis Loop – Operating Serverless
When you receive a notification, either from an alert or directly from a user, that something in your application is not working correctly, you know that something is wrong but you do not know why. Following the core analysis loop, as introduced in the book Observability Engineering, allows you […]
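The step of the loop where you search for what distinguishes the failing requests from healthy ones can be partly mechanized. The sketch below assumes hypothetical wide events (one flat record of string dimensions per request, for example built from trace annotations) that you have already split into an anomalous set and a baseline set; rankDimensions and filterBy are illustrative names, not part of any tool. Each pass surfaces the dimension values that are most over-represented among the failures.

```typescript
// Hypothetical wide event: one flat record of string dimensions per request.
type TraceEvent = Record<string, string>;

interface DimensionDiff {
  dimension: string;
  value: string;
  anomalousShare: number; // fraction of anomalous events with this value
  baselineShare: number; // fraction of baseline events with this value
}

function filterBy(events: TraceEvent[], dimension: string, value: string): TraceEvent[] {
  return events.filter((e) => e[dimension] === value);
}

function share(events: TraceEvent[], dimension: string, value: string): number {
  return events.length === 0 ? 0 : filterBy(events, dimension, value).length / events.length;
}

// Rank dimension/value pairs by how much more common they are among anomalous events.
function rankDimensions(anomalous: TraceEvent[], baseline: TraceEvent[]): DimensionDiff[] {
  // Collect every dimension/value pair present in the anomalous set.
  const seen = new Map<string, Set<string>>();
  anomalous.forEach((event) => {
    Object.keys(event).forEach((dimension) => {
      const values = seen.get(dimension) ?? new Set<string>();
      values.add(event[dimension]);
      seen.set(dimension, values);
    });
  });
  const diffs: DimensionDiff[] = [];
  seen.forEach((values, dimension) => {
    values.forEach((value) => {
      diffs.push({
        dimension,
        value,
        anomalousShare: share(anomalous, dimension, value),
        baselineShare: share(baseline, dimension, value),
      });
    });
  });
  // Largest over-representation in the anomalous set comes first.
  const lift = (d: DimensionDiff) => d.anomalousShare - d.baselineShare;
  return diffs.sort((a, b) => lift(b) - lift(a));
}
```

In an investigation you would take the top-ranked pair, narrow both sets with filterBy, and run rankDimensions again until the issue is isolated. A large difference is a lead to verify against the data, not proof of the cause.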
Avoiding Single Points of Failure – Operating Serverless
Single points of failure can occur at integration points in your system, such as your public API and authorization layer, workflows that depend on a third-party API, or centralized resources like event buses and databases. While certain single points of failure are very difficult to avoid, your aim should always […]
Multi-Account, Multi-Region: Is It Worth It? – Operating Serverless
The effort involved in designing, developing, and operating a cloud native application across multiple AWS accounts or Regions is substantial. You will need an in-depth understanding of all the implementation details. This means the answer to the question of whether it’s worth adopting a multi-account, multi-Region strategy to […]
Noncritical paths – Testing Serverless Applications
The noncritical paths in your application will usually be background processes. These will not be time-sensitive and will be fully recoverable in the event of performance degradation or outages resulting from transient bugs or persistent errors, or following a code fix or rollback. The operational quality of noncritical paths should be primarily supported […]
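One common way to make a background path recoverable in this sense is to park failed messages rather than drop them, then replay them once a fix or rollback is deployed, which is what an Amazon SQS dead-letter queue with redrive provides. The sketch below models that idea with in-memory arrays; Message, Handler, processBatch, and redrive are hypothetical names, and a real implementation would rely on the queue service rather than process memory.

```typescript
// Hypothetical message shape and handler type for a background process.
interface Message {
  id: string;
  body: string;
}
type Handler = (msg: Message) => Promise<void>;

// Process each message; failures are parked in the dead-letter list instead of being lost.
async function processBatch(messages: Message[], handler: Handler, deadLetters: Message[]): Promise<number> {
  let succeeded = 0;
  for (const msg of messages) {
    try {
      await handler(msg);
      succeeded++;
    } catch {
      deadLetters.push(msg);
    }
  }
  return succeeded;
}

// After a fix or rollback, replay everything parked so far through the updated handler.
// Messages that still fail go back into the dead-letter list.
async function redrive(deadLetters: Message[], handler: Handler): Promise<number> {
  const pending = deadLetters.splice(0, deadLetters.length);
  return processBatch(pending, handler, deadLetters);
}
```

Because nothing is discarded, a bug in a noncritical path delays work rather than losing it, which is what makes these paths tolerant of outages in a way critical paths are not.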
Just enough testing – Testing Serverless Applications
Adopting a test strategy of “test everything all the time” simply does not scale. As your application’s codebase grows and becomes increasingly fragmented across microservices and infrastructure stacks, this all-encompassing test strategy will require you to continually add more tests, spreading those tests across more service boundaries. No matter how much you […]