The Saga Pattern - A Distributed Design Pattern

The Saga Pattern is a design pattern used in distributed systems and microservices architectures to manage long-lived transactions in a scalable and reliable way. It is especially useful when traditional ACID transactions spanning several services are not feasible or practical. The Saga Pattern breaks a large transaction into a sequence of smaller, loosely coupled local transactions; that sequence as a whole is called a “saga.”

Here’s a more detailed explanation of the Saga Pattern:

Definition of a Saga:

A saga is a sequence of local transactions, each of which updates the data of a single service; together they achieve an overall business goal.

Saga Components:

Steps (local transactions): The overall business transaction is divided into smaller steps, each a local transaction responsible for a specific aspect of the overall transaction.
Participants: The individual microservices that take part in a saga, each accountable for executing one or more of its steps.

Types of Sagas:

Choreography-based Saga: There is no central coordinator. Each participant publishes an event or message when it completes its local transaction, and other participants react to that event by executing the next step of the saga. This approach requires less central coordination, but because the flow is spread across services, it can be harder to understand and maintain.
Orchestration-based Saga: A central component (orchestrator) coordinates and controls the execution of saga steps. The orchestrator sends commands to participants, telling them which part of the saga to execute. This approach provides better visibility and control but introduces a single point of coordination.

Compensating Transactions:

In the Saga Pattern, compensating transactions undo the changes made by the completed steps of a saga when a later step fails. Each step in a saga has an associated compensating transaction that undoes the effects of that step's local transaction.

What Is a Compensating Transaction?

Compensating transactions are a crucial concept in distributed transactions and the Saga Pattern. They provide a way to undo, or compensate for, the effects of a previously executed transaction when a failure or an error occurs. When part of a distributed transaction fails, compensating transactions are executed to bring the system back to a consistent state.

Here are more details about compensating transactions:

Purpose of Compensating Transactions:

Compensating transactions are designed to handle failures and ensure the system remains consistent, even if individual steps of a distributed transaction encounter errors.

Triggering Compensating Transactions:

Compensating transactions are typically triggered when a step of the saga fails. The failure, signalled by an exception or an error condition, indicates that the effects of the steps that have already completed must be undone.

Atomicity and Consistency:

Compensating transactions give a saga a form of atomicity. Even though the overall transaction spans multiple steps across different services, either all steps complete or the completed steps are undone, so the system returns to a consistent state if any part of the transaction fails. Unlike an ACID transaction, however, a saga provides no isolation: other transactions can observe its intermediate states while it is running.

Inverse Operations:

Compensating transactions are the semantic inverse of the original transactions. For example, if the original transaction debits an account, the compensating transaction credits the same account with the same amount.

Idempotency:

Compensating transactions should be idempotent, meaning that executing them multiple times has the same effect as executing them once. This property is crucial because a compensating transaction may need to be retried, for example when it fails partway through or its acknowledgement is lost.
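To make the last two properties concrete, here is a minimal Java sketch built around a hypothetical `Account` class. The class, its methods, and the in-memory bookkeeping are assumptions for illustration and do not come from any framework. The saga step `debit` has a compensating transaction `compensateDebit` that credits back exactly the amount debited. The compensation records the transaction IDs it has already handled, so a retried compensation has no further effect. It also turns a debit that arrives after its own compensation into a no-op, because that debit would otherwise never be undone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical account used to illustrate an inverse, idempotent compensation.
// Amounts are in minor units (e.g. cents) to avoid floating-point rounding.
class Account {
    private long balance;
    // Debits applied so far, by saga transaction ID.
    private final Map<String, Long> debits = new HashMap<>();
    // Transaction IDs whose compensation has already run.
    // A real service would persist both collections together with the balance.
    private final Set<String> compensated = new HashSet<>();

    Account(long openingBalance) {
        this.balance = openingBalance;
    }

    // Original saga step: debit the account at most once per transaction ID.
    void debit(String txId, long amount) {
        if (debits.containsKey(txId) || compensated.contains(txId)) {
            return; // duplicate request, or already compensated (arrived too late)
        }
        if (amount > balance) {
            throw new IllegalStateException("Insufficient funds for " + txId);
        }
        balance -= amount;
        debits.put(txId, amount);
    }

    // Compensating transaction: credit back exactly what was debited, at most once.
    void compensateDebit(String txId) {
        if (!compensated.add(txId)) {
            return; // retry of a compensation that already ran: no effect
        }
        Long amount = debits.get(txId);
        if (amount != null) {
            balance += amount; // the inverse of the original debit
        }
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        Account account = new Account(10_000);
        account.debit("tx-1", 2_500);
        account.compensateDebit("tx-1");
        account.compensateDebit("tx-1"); // retried, e.g. after a lost acknowledgement
        System.out.println(account.balance()); // prints 10000
    }
}
```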

Sequential Execution:

In the context of sagas, compensating transactions are typically executed sequentially, in the reverse order of the original transactions. Because the most recent step is undone first, each step is compensated only after every later step that built on it has been undone, which moves the system from its current state back to a consistent one.
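The reverse-order rule can be sketched in a few lines of Java. `SagaStep` and `SagaRunner` are hypothetical names. Each step pairs a local transaction with its compensating transaction, and an `ArrayDeque` used as a stack returns the completed steps in reverse order. The step that fails is not compensated, on the assumption that its own local transaction has already rolled back. Retries, persistence, and messaging are left out, and the `record` syntax requires Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical saga step: a local transaction plus its compensating transaction.
record SagaStep(String name, Runnable action, Runnable compensation) {}

class SagaRunner {
    // Runs the steps in order and returns true if all of them succeed.
    // If a step throws, the steps that already completed are compensated
    // in reverse order and the method returns false.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>(); // used as a stack
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step); // the most recent step is on top
            } catch (RuntimeException e) {
                System.out.println(step.name() + " failed: " + e.getMessage());
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run(); // last completed, first undone
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean succeeded = run(List.of(
                new SagaStep("createOrder", () -> log.add("order created"), () -> log.add("order cancelled")),
                new SagaStep("processPayment", () -> log.add("payment taken"), () -> log.add("payment refunded")),
                new SagaStep("shipOrder", () -> { throw new IllegalStateException("no carrier"); },
                        () -> log.add("shipment cancelled"))));
        System.out.println(succeeded + " " + log);
        // prints: false [order created, payment taken, payment refunded, order cancelled]
    }
}
```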

Handling Asynchronous Operations:

Compensating transactions become more challenging when the saga's steps run asynchronously. Ensuring that compensations are delivered and executed in the proper sequence in an asynchronous environment may require additional mechanisms, such as message queues or durable storage of the saga's state.

Integration with Saga Pattern:

In the Saga Pattern, each step of the saga has an associated compensating transaction. In an orchestration-based saga, the orchestrator invokes the compensating transactions when a step fails. In a choreography-based saga, each participant runs its own compensating transaction in response to a failure event.

Handling Failures:

Sagas are designed to handle failures gracefully. If a participant fails to execute its step, the saga initiates compensating transactions to undo the changes made by the participants whose steps had already completed.

Durability and Idempotency:

Sagas should be designed to be durable and idempotent. Durability ensures that the state of the saga (for example, which steps have completed) is preserved even if the coordinator or a participant crashes. Idempotency ensures that re-executing a saga step or compensating transaction has the same effect as the first execution.
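One simple way to make saga state durable is to append a record to persistent storage each time a step completes. The hypothetical `SagaLog` below writes one `sagaId,stepName` line per completed step to a local file. After a crash, a new coordinator instance rereads the file to find the steps it would have to compensate, in reverse order, if the saga must be rolled back. The file format and class are assumptions for illustration; production systems usually keep this state in a database. The sketch also shows why idempotency matters: if the coordinator crashes after a step commits but before its log line is written, that step may be executed again on recovery.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical append-only saga log: one "sagaId,stepName" line per completed step.
class SagaLog {
    private final Path file;

    SagaLog(Path file) {
        this.file = file;
    }

    // Append a completed step before starting the next one. SYNC requires the
    // update to be written to the underlying storage device before returning.
    void recordCompleted(String sagaId, String step) throws IOException {
        Files.writeString(file, sagaId + "," + step + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
    }

    // After a restart: the completed steps of a saga, most recent first,
    // i.e. the order in which their compensations must run.
    List<String> stepsToCompensate(String sagaId) throws IOException {
        List<String> steps = new ArrayList<>();
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file)) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2 && parts[0].equals(sagaId)) {
                    steps.add(parts[1]);
                }
            }
        }
        Collections.reverse(steps);
        return steps;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("saga", ".log");
        SagaLog log = new SagaLog(file);
        log.recordCompleted("saga-42", "createOrder");
        log.recordCompleted("saga-42", "processPayment");
        // Simulate a coordinator crash: a new instance reads the same file.
        SagaLog recovered = new SagaLog(file);
        System.out.println(recovered.stepsToCompensate("saga-42")); // prints [processPayment, createOrder]
    }
}
```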

Eventual Consistency:

The Saga Pattern embraces eventual consistency. While a saga is in progress, the data across services may be temporarily inconsistent, but once all steps (or their compensations) have completed, the system reaches a consistent state.

Eventual consistency in a bit more detail:

Eventual consistency is a concept that arises in distributed systems where copies of data may exist in multiple locations, and changes to the data may be propagated asynchronously. The key idea is that, while the system might not be immediately consistent after an update, given enough time and the absence of further updates, all replicas will eventually converge to a consistent state. This concept is particularly relevant in scenarios where maintaining strict consistency (e.g., immediate consistency as in traditional ACID transactions) would be impractical because of network latency, network partitions, or high-availability requirements.

Here are more details about eventual consistency:

Consistency Models:

Strong Consistency: In a strongly consistent system, every read returns the most recent write, so all nodes present the same data at the same time. Traditional databases that adhere to ACID properties, such as relational databases, typically provide strong consistency.

Eventual Consistency: In an eventually consistent system, different replicas might temporarily have different versions of the data, but given enough time and no further updates, all replicas will converge to the same consistent state.

CAP Theorem:

Eventual consistency is related to the CAP (Consistency, Availability, Partition tolerance) theorem, which states that a distributed system cannot simultaneously guarantee all three properties. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability. Eventual consistency typically favours availability and partition tolerance over immediate consistency.

Asynchronous Replication:

Eventual consistency is often achieved through asynchronous replication of data across distributed nodes. Updates are propagated to replicas, but there is no guarantee about the order or timing of these updates. This asynchrony allows for greater availability and fault tolerance.
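The following Java sketch illustrates asynchronous replication under simple, hypothetical assumptions. Replicas are in-process maps (`Replica`), and `ReplicatedStore.put` applies a write to one replica, returns immediately, and hands the copy to the other replicas to a background `ExecutorService`. Until that background task runs, a read from another replica can return stale data. Ordering across concurrent writers and conflicts are ignored here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process replica: just a thread-safe map.
class Replica {
    final Map<String, String> data = new ConcurrentHashMap<>();
}

class ReplicatedStore {
    private final List<Replica> replicas;
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    ReplicatedStore(List<Replica> replicas) {
        this.replicas = replicas;
    }

    // Apply the write to one replica and return at once; the other replicas
    // receive it later on a background thread (asynchronous replication).
    void put(int origin, String key, String value) {
        replicas.get(origin).data.put(key, value);
        replicator.submit(() -> {
            for (int i = 0; i < replicas.size(); i++) {
                if (i != origin) {
                    replicas.get(i).data.put(key, value);
                }
            }
        });
    }

    String get(int replica, String key) {
        return replicas.get(replica).data.get(key);
    }

    // Let pending replication tasks finish, then stop the background thread.
    void shutdown() throws InterruptedException {
        replicator.shutdown();
        replicator.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReplicatedStore store = new ReplicatedStore(List.of(new Replica(), new Replica(), new Replica()));
        store.put(0, "greeting", "hello");
        // Timing-dependent: replica 2 may not have received the write yet.
        System.out.println("replica 2 right after the write: " + store.get(2, "greeting"));
        store.shutdown();
        System.out.println("replica 2 after replication: " + store.get(2, "greeting"));
    }
}
```

Running `main` may print `null` or `hello` for the first read, depending on thread timing; that window is the temporary inconsistency described above.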

Conflict Resolution:

Conflicts may arise when updates occur concurrently on different replicas in an eventually consistent system. Conflict resolution mechanisms are needed to reconcile these differences. Strategies may include last-write-wins, merging conflicting versions, or manual resolution.
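Here is a Java sketch of the simplest of these strategies, a last-write-wins register. It assumes each write carries a timestamp from the writer's clock and the writer's replica ID as a deterministic tie-breaker. Both fields and the `LwwRegister` name are hypothetical. Because `merge` picks the same winner regardless of the order in which versions arrive, replicas that exchange versions converge. The price is that the losing concurrent write is silently discarded, and clock skew between replicas can let an older write win. Records require Java 16 or later.

```java
// Hypothetical last-write-wins register: keeps the value with the highest
// (timestamp, replicaId) pair; the replica ID breaks timestamp ties deterministically.
record LwwRegister(String value, long timestamp, String replicaId) {

    LwwRegister merge(LwwRegister other) {
        if (other.timestamp != timestamp) {
            return other.timestamp > timestamp ? other : this;
        }
        return other.replicaId.compareTo(replicaId) > 0 ? other : this;
    }

    public static void main(String[] args) {
        LwwRegister a = new LwwRegister("blue", 1_000, "R1");
        LwwRegister b = new LwwRegister("green", 1_005, "R2");
        // Merge order does not matter: both replicas end up with "green".
        System.out.println(a.merge(b).value() + " " + b.merge(a).value()); // prints green green
    }
}
```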

Use Cases:

Eventual consistency is suitable for applications where immediate consistency is not a strict requirement, such as social media feeds, search indexes, or distributed caches. In these cases, users can tolerate temporary inconsistencies if the system eventually converges to a consistent state.

Trade-Offs:

The choice between strong consistency and eventual consistency involves trade-offs. Strong consistency ensures that clients see the most recent update but may lead to increased latency and reduced availability, especially during network partitions. Eventual consistency provides greater availability and fault tolerance but introduces the possibility of temporary inconsistencies.

Concurrency and Parallelism:

Eventual consistency allows for greater concurrency and parallelism in distributed systems. Multiple updates can be applied independently on different replicas without immediate synchronisation, which improves scalability.

Challenges and Considerations:

Achieving eventual consistency requires careful design and consideration of the application’s specific requirements. Conflict resolution strategies, system architecture, and data model design play crucial roles in ensuring the effectiveness of eventual consistency.

Implementation Patterns:

Different mechanisms support eventual consistency, including anti-entropy protocols (replicas periodically compare their data and repair differences), vector clocks (which track causal relationships between versions), and causal consistency models. Each addresses a different aspect of ensuring eventual consistency in distributed systems.
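To illustrate one of these mechanisms, below is a minimal vector clock in Java. It is a hypothetical sketch, not a library API. Each replica increments its own entry when it applies a local update and merges clocks with an entry-wise maximum when it receives a version. Comparing two clocks tells a replica whether one version happened before the other or whether they are concurrent, in which case a conflict-resolution strategy is needed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal vector clock: replica ID -> number of updates seen from that replica.
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    // Record a local update on the given replica.
    void tick(String replicaId) {
        counters.merge(replicaId, 1, Integer::sum);
    }

    // Entry-wise maximum: afterwards this clock has seen everything both clocks had seen.
    void mergeFrom(VectorClock other) {
        other.counters.forEach((id, n) -> counters.merge(id, n, Math::max));
    }

    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    // How this clock relates to another one (missing entries count as 0).
    Order compare(VectorClock other) {
        boolean less = false, greater = false;
        Set<String> ids = new HashSet<>(counters.keySet());
        ids.addAll(other.counters.keySet());
        for (String id : ids) {
            int mine = counters.getOrDefault(id, 0);
            int theirs = other.counters.getOrDefault(id, 0);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock();
        base.tick("R1");                        // {R1=1}
        VectorClock onR1 = base.copy();
        onR1.tick("R1");                        // {R1=2}
        VectorClock onR2 = base.copy();
        onR2.tick("R2");                        // {R1=1, R2=1}
        System.out.println(base.compare(onR1)); // BEFORE
        System.out.println(onR1.compare(onR2)); // CONCURRENT: a conflict to resolve
    }
}
```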

Example:

In a distributed key-value store, an update to a key might be propagated asynchronously to multiple replicas. While a client may read from any replica, it might temporarily observe different data versions. Over time, the replicas converge to a consistent state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Example of an eventually consistent key-value store (one replica shown)
public class EventuallyConsistentKeyValueStore {
    private final Map<String, String> dataStore = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        dataStore.put(key, value);
        // Asynchronous replication logic to propagate the update to other replicas
    }

    public String get(String key) {
        // May return a stale value if a newer write has not reached this replica yet
        return dataStore.get(key);
    }
}
```

In summary, eventual consistency is a pragmatic approach to handling distributed data when immediate consistency is not a strict requirement. It enables systems to maintain availability and fault tolerance while tolerating temporary inconsistencies that are eventually resolved. Careful consideration of conflict resolution and system design is essential when opting for eventual consistency in distributed systems.

A Step-by-Step Example Of Eventual Consistency

Let’s consider a simple example of a distributed system with three replicas that store a counter value. Updates to the counter can happen independently at each replica. Let’s walk through how eventual consistency might work:

System Architecture:

Three replicas: R1, R2, and R3. Each replica has a counter value.

Initial State:

All replicas start with a counter value of 0.

Update at Replica R1:

A client sends a request to increment the counter by 1 to replica R1. R1 increments its counter locally and responds to the client.

Update at Replica R2:

Simultaneously, another client sends a request to increment the counter by 1 to replica R2. R2 increments its counter locally and responds to the client.

Current State:

After these updates, R1’s counter is 1 and R2’s counter is 1, but neither has seen the other’s increment. R3 has not received any updates, so its counter is still 0. Two increments have been accepted, so the correct total is 2.

Eventual Consistency Propagation:

In an eventually consistent system, updates are propagated asynchronously to other replicas. R1 and R2 exchange their updates. For the counter to end up correct, each replica must send the increment it applied, not just its current total: if R1 and R2 simply told each other “my counter is 1”, both would keep 1 and one increment would be lost.

R1 Updates R2:

R1 sends a message to R2 saying, “I incremented the counter by 1.” R2 applies the increment, and its counter becomes 2.

R2 Updates R1:

Similarly, R2 sends a message to R1 saying, “I incremented the counter by 1.” R1 applies the increment, and its counter also becomes 2.

Current State After Propagation:

R1 and R2 now both have a counter value of 2, while R3 still has 0.

Update at Replica R3:

A third client sends a request to increment the counter by 1 to replica R3. R3 increments its counter locally and responds to the client.

Current State:

R3’s counter is 1, while R1’s and R2’s counters are 2. The replicas temporarily disagree.

Eventual Consistency Propagation Continues:

R3 sends its increment to R1 and R2, and R1 and R2 each send R3 their own earlier increment.

Final Consistent State:

Once these messages are delivered, every replica’s counter is 3, reflecting all three increments, one applied at each replica.

This example illustrates the eventual consistency model, where updates can be applied independently at each replica, and consistency is achieved over time through asynchronous propagation and communication between replicas.

It’s important to note that the actual mechanisms for communication, conflict resolution, and propagation may vary based on the system’s design and the application’s specific requirements. Eventual consistency is a trade-off between availability and consistency, and its effectiveness depends on careful system design and implementation.
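For a counter like the one in this example to converge to the correct total, replicas cannot simply overwrite each other's values: if R1 and R2 each report “my counter is 1”, one of the two increments is lost. One common remedy is a grow-only counter (G-Counter), a simple conflict-free replicated data type (CRDT). In the hypothetical Java sketch below, each replica keeps one slot per replica, increments only its own slot, and merges another replica's state by taking the entry-wise maximum. The counter's value is the sum of the slots. Because merging is idempotent, commutative, and associative, replicas can exchange their full state in any order and any number of times and still agree. The sketch assumes a fixed, known set of replicas.

```java
import java.util.Arrays;

// Hypothetical grow-only counter (G-Counter) for a fixed set of replicas 0..n-1.
class GCounter {
    private final int self;      // index of the replica that owns this copy
    private final long[] counts; // counts[i] = increments applied at replica i

    GCounter(int self, int replicaCount) {
        this.self = self;
        this.counts = new long[replicaCount];
    }

    // Local update: a replica only ever increments its own slot.
    void increment() {
        counts[self]++;
    }

    // Receive another replica's state: keep the larger count for every slot.
    void merge(GCounter other) {
        for (int i = 0; i < counts.length; i++) {
            counts[i] = Math.max(counts[i], other.counts[i]);
        }
    }

    long value() {
        return Arrays.stream(counts).sum();
    }

    public static void main(String[] args) {
        GCounter r1 = new GCounter(0, 3), r2 = new GCounter(1, 3), r3 = new GCounter(2, 3);
        r1.increment(); r2.increment(); // concurrent increments at R1 and R2
        r1.merge(r2); r2.merge(r1);     // R1 and R2 exchange state: both read 2
        r3.increment();                 // R3 increments independently: reads 1
        r3.merge(r1); r1.merge(r3); r2.merge(r3);
        r1.merge(r3);                   // a duplicate message has no effect
        System.out.println(r1.value() + " " + r2.value() + " " + r3.value()); // prints 3 3 3
    }
}
```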

Monitoring and Logging:

Proper monitoring and logging mechanisms are crucial in sagas to trace the execution of each step, identify failures, and facilitate troubleshooting.

The Saga Pattern is a powerful approach for managing distributed transactions in microservices architectures. It provides a way to achieve consistency across services while allowing for flexibility, fault tolerance, and scalability. The choice between choreography-based and orchestration-based sagas depends on the specific requirements and characteristics of the system; the examples later in this text illustrate the differences between the two.

Some open-source projects that implement or support the Saga Pattern:

Axon Framework:

Description: Axon Framework is a Java framework specifically designed for building scalable and extensible applications using the CQRS (Command Query Responsibility Segregation) and Event Sourcing patterns. It provides support for Sagas as part of its features.

GitHub Repository: https://github.com/AxonFramework/AxonFramework

Eventuate Tram:

Description: Eventuate Tram is a framework for developing transactional Java-based microservices. It lets a service publish messages and events as part of a local database transaction, using the Transactional Outbox pattern. Support for Sagas to manage long-running business transactions is provided by its companion project, Eventuate Tram Sagas.

GitHub Repository: https://github.com/eventuate-tram/eventuate-tram-core

Spring Cloud Sleuth:

Description: While not a Saga Pattern implementation, Spring Cloud Sleuth is part of the Spring Cloud ecosystem and provides distributed tracing capabilities. Distributed tracing is valuable for monitoring and troubleshooting microservices and sagas. Note that Sleuth does not work with Spring Boot 3.x; its core functionality has moved to the Micrometer Tracing project.

GitHub Repository: https://github.com/spring-cloud/spring-cloud-sleuth

Before selecting a framework or library for implementing the Saga Pattern in your Java-based microservices architecture, it’s essential to evaluate the specific requirements of your project, community support, and the features provided by each solution. Always refer to the latest documentation for accurate and up-to-date information.

An Example Of A Choreography-based Saga

In a choreography-based Saga, there is no central coordinator: microservices coordinate the execution of saga steps by exchanging events. Each microservice emits events or messages to inform other services that it has completed its part of the transaction. Let’s consider a simple example of an e-commerce application with three microservices: Order, Payment, and Shipping. We’ll implement a choreography-based saga to handle the purchase process.

Order Service:

```java
public class OrderService {
    public void createOrder(String orderId, String product, int quantity) {
        // Business logic for creating an order
        System.out.println("Order created: " + orderId + ", Product: " + product + ", Quantity: " + quantity);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new OrderCreatedEvent(orderId, product, quantity));
    }

    public void cancelOrder(String orderId) {
        // Business logic for cancelling an order (compensating transaction)
        System.out.println("Order canceled: " + orderId);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new OrderCanceledEvent(orderId));
    }
}
```

Payment Service:

```java
public class PaymentService {
    public void processPayment(String orderId, double amount) {
        // Business logic for processing payment
        System.out.println("Payment processed for order " + orderId + ", Amount: " + amount);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new PaymentProcessedEvent(orderId, amount));
    }

    public void cancelPayment(String orderId, double amount) {
        // Business logic for cancelling payment (compensating transaction)
        System.out.println("Payment canceled for order " + orderId + ", Amount: " + amount);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new PaymentCanceledEvent(orderId));
    }
}
```

Shipping Service:

```java
public class ShippingService {
    public void shipOrder(String orderId) {
        // Business logic for shipping an order
        System.out.println("Order shipped: " + orderId);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new OrderShippedEvent(orderId));
    }

    public void cancelShipping(String orderId) {
        // Business logic for cancelling shipping (compensating transaction)
        System.out.println("Shipping canceled for order " + orderId);
        // Publish an event to notify other services
        EventPublisher.publishEvent(new ShippingCanceledEvent(orderId));
    }
}
```

Event Publisher:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EventPublisher {
    // In-memory list of published events (stands in for a message broker)
    private static final List<Object> events = new ArrayList<>();

    public static void publishEvent(Object event) {
        events.add(event);
        // In a real scenario, events would be sent to a message broker for distribution
    }

    public static List<Object> getEvents() {
        return Collections.unmodifiableList(events);
    }
}
```

Events:

```java
// Each event is an immutable record (Java 16+): the compiler generates the
// constructor used by the services, plus accessors, equals/hashCode, and toString.
// Each public record belongs in its own .java file.
public record OrderCreatedEvent(String orderId, String product, int quantity) {}
public record OrderCanceledEvent(String orderId) {}
public record PaymentProcessedEvent(String orderId, double amount) {}
public record PaymentCanceledEvent(String orderId) {}
public record OrderShippedEvent(String orderId) {}
public record ShippingCanceledEvent(String orderId) {}
```

Main Application:

```java
import java.util.UUID;

public class MainApplication {
    public static void main(String[] args) {
        OrderService orderService = new OrderService();
        PaymentService paymentService = new PaymentService();
        ShippingService shippingService = new ShippingService();
        // Simulate a successful purchase.
        // For brevity, main() calls each step directly; in a full choreography each
        // service would instead react to the event published by the previous one.
        String orderId = UUID.randomUUID().toString();
        orderService.createOrder(orderId, "Laptop", 2);
        paymentService.processPayment(orderId, 2000.0);
        shippingService.shipOrder(orderId);
        // Simulate a failure during payment processing
        String failedOrderId = UUID.randomUUID().toString();
        orderService.createOrder(failedOrderId, "Camera", 1);
        paymentService.cancelPayment(failedOrderId, 500.0);
        // Compensate the step that already succeeded: cancel the order
        orderService.cancelOrder(failedOrderId);
        // Shipping service won't be triggered because payment failed
    }
}
```

Summary - Choreography-based Saga

In this example, each service emits events (e.g., “OrderCreatedEvent”, “PaymentProcessedEvent”, etc.) to notify other services about the progress of the saga. The “EventPublisher” is a simplified class representing a message broker or event bus; in a real-world scenario, you would use a dedicated messaging system (e.g., Apache Kafka, RabbitMQ) for event distribution. In a full choreography, each service also subscribes to the events it cares about and reacts to them, which lets the saga progress without a central coordinator.
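The services above publish events, but nothing subscribes to them yet. The following self-contained Java sketch shows that missing half. Services subscribe to each other's events through a hypothetical in-memory `EventBus`, which stands in for a broker such as Kafka or RabbitMQ, and a `PaymentFailed` event causes the order service to run its compensating transaction. The event names, the rule that amounts above 1000 are declined, and all class names are assumptions for illustration. Delivery here is synchronous and in-process, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory event bus standing in for a message broker.
// Delivery is synchronous and in-process, unlike a real broker.
class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    <T> void subscribe(Class<T> type, Consumer<T> handler) {
        handlers.computeIfAbsent(type, t -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        handlers.getOrDefault(event.getClass(), List.of()).forEach(h -> h.accept(event));
    }
}

// Hypothetical event types for this sketch.
record OrderCreated(String orderId, double amount) {}
record PaymentSucceeded(String orderId) {}
record PaymentFailed(String orderId) {}
record OrderShipped(String orderId) {}

class ChoreographyDemo {
    // Wires the three services to the bus, runs two orders, and returns what happened.
    static List<String> run() {
        EventBus bus = new EventBus();
        List<String> log = new ArrayList<>();

        // Payment service reacts to OrderCreated; amounts above a hypothetical limit are declined.
        bus.subscribe(OrderCreated.class, e -> {
            if (e.amount() <= 1000.0) {
                log.add("payment ok " + e.orderId());
                bus.publish(new PaymentSucceeded(e.orderId()));
            } else {
                log.add("payment declined " + e.orderId());
                bus.publish(new PaymentFailed(e.orderId()));
            }
        });
        // Shipping service reacts to PaymentSucceeded.
        bus.subscribe(PaymentSucceeded.class, e -> {
            log.add("shipped " + e.orderId());
            bus.publish(new OrderShipped(e.orderId()));
        });
        // Order service runs its compensating transaction when payment fails.
        bus.subscribe(PaymentFailed.class, e -> log.add("order cancelled " + e.orderId()));

        // The order service starts each saga by publishing OrderCreated.
        bus.publish(new OrderCreated("order-1", 500.0));
        bus.publish(new OrderCreated("order-2", 2000.0));
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running `main` prints the payment and shipping steps for `order-1`, followed by the declined payment and the order cancellation for `order-2`.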

An Example of an Orchestration-based Saga

In an orchestration-based Saga, there is a central coordinator (orchestrator) that explicitly defines the sequence of steps and communicates with the individual services to execute those steps. The orchestrator takes on the responsibility of managing the overall flow of the saga. Let’s consider a similar e-commerce example with three microservices (Order, Payment, and Shipping) and implement an orchestration-based saga for handling the purchase process:

Order Service:

```java
public class OrderService {
    public void createOrder(String orderId, String product, int quantity) {
        // Business logic for creating an order
        System.out.println("Order created: " + orderId + ", Product: " + product + ", Quantity: " + quantity);
    }

    public void cancelOrder(String orderId) {
        // Business logic for canceling an order (compensating transaction)
        System.out.println("Order canceled: " + orderId);
    }
}
```

Payment Service:

```java
public class PaymentService {
    public void processPayment(String orderId, double amount) {
        // Business logic for processing payment
        System.out.println("Payment processed for order " + orderId + ", Amount: " + amount);
    }

    public void cancelPayment(String orderId, double amount) {
        // Business logic for canceling payment (compensating transaction)
        System.out.println("Payment canceled for order " + orderId + ", Amount: " + amount);
    }
}
```

Shipping Service:

```java
public class ShippingService {
    public void shipOrder(String orderId) {
        // Business logic for shipping an order
        System.out.println("Order shipped: " + orderId);
    }

    public void cancelShipping(String orderId) {
        // Business logic for canceling shipping (compensating transaction)
        System.out.println("Shipping canceled for order " + orderId);
    }
}
```

OrderOrchestrator:

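```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class OrderOrchestrator {
    private final OrderService orderService;
    private final PaymentService paymentService;
    private final ShippingService shippingService;
    // Saga state per order: number of completed steps and the amount charged.
    // In a real system this would be kept in a durable store, not in memory.
    private final Map<String, Integer> completedSteps = new ConcurrentHashMap<>();
    private final Map<String, Double> amounts = new ConcurrentHashMap<>();

    public OrderOrchestrator(OrderService orderService, PaymentService paymentService, ShippingService shippingService) {
        this.orderService = orderService;
        this.paymentService = paymentService;
        this.shippingService = shippingService;
    }

    public String processPurchase(String product, int quantity, double amount) {
        // Generate unique order ID
        String orderId = UUID.randomUUID().toString();
        amounts.put(orderId, amount);
        completedSteps.put(orderId, 0);
        try {
            // Step 1: Create Order
            orderService.createOrder(orderId, product, quantity);
            completedSteps.put(orderId, 1);
            // Step 2: Process Payment
            paymentService.processPayment(orderId, amount);
            completedSteps.put(orderId, 2);
            // Step 3: Ship Order
            shippingService.shipOrder(orderId);
            completedSteps.put(orderId, 3);
        } catch (RuntimeException e) {
            // A step failed: undo the steps that had already completed
            System.out.println("Purchase failed for order " + orderId + ": " + e.getMessage());
            cancelPurchase(orderId);
        }
        return orderId;
    }

    public void cancelPurchase(String orderId) {
        int done = completedSteps.getOrDefault(orderId, 0);
        // Compensating transactions in reverse order, only for completed steps
        if (done >= 3) shippingService.cancelShipping(orderId);
        if (done >= 2) paymentService.cancelPayment(orderId, getAmountForOrder(orderId));
        if (done >= 1) orderService.cancelOrder(orderId);
        completedSteps.put(orderId, 0); // nothing left to compensate
    }

    // In a real-world scenario, you would retrieve the order amount from a data store
    private double getAmountForOrder(String orderId) {
        return amounts.getOrDefault(orderId, 0.0);
    }
}
```

Main Application:

```java
public class MainApplication {
    public static void main(String[] args) {
        OrderService orderService = new OrderService();
        PaymentService paymentService = new PaymentService();
        ShippingService shippingService = new ShippingService();
        OrderOrchestrator orderOrchestrator = new OrderOrchestrator(orderService, paymentService, shippingService);
        // Simulate a successful purchase
        orderOrchestrator.processPurchase("Laptop", 2, 2000.0);
        // Simulate a failure during shipping: this ShippingService always throws,
        // so the orchestrator cancels the payment and then the order
        ShippingService failingShipping = new ShippingService() {
            @Override
            public void shipOrder(String orderId) {
                throw new IllegalStateException("Carrier unavailable");
            }
        };
        OrderOrchestrator failingOrchestrator = new OrderOrchestrator(orderService, paymentService, failingShipping);
        failingOrchestrator.processPurchase("Camera", 1, 500.0);
        // Simulate a customer canceling a completed purchase
        String orderId = orderOrchestrator.processPurchase("Phone", 1, 800.0);
        orderOrchestrator.cancelPurchase(orderId);
    }
}
```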

Summary - Orchestration-based Saga

In this example, the “OrderOrchestrator” class acts as the central coordinator. It defines the steps to process a purchase and explicitly calls the corresponding methods in the individual services (“orderService”, “paymentService”, “shippingService”). If a step throws an exception, the orchestrator catches it and calls the “cancelPurchase” method, which runs the compensating transactions in reverse order to undo the changes made by the steps that had already completed.

Conclusion

We explored the Saga Pattern as a way to manage distributed transactions across microservices without relying on ACID transactions, covering its components, the two coordination styles (choreography and orchestration), and how sagas handle failures.

Compensating transactions, crucial in maintaining system consistency when a step fails, were detailed, covering inverse operations, idempotency, sequential execution in reverse order, and the role of monitoring and logging.

The concept of eventual consistency took centre stage, shedding light on consistency models, the CAP theorem and its trade-offs between consistency, availability, and partition tolerance, asynchronous replication, and conflict resolution strategies. An illustrative step-by-step example showcased eventual consistency in action, emphasising the asynchronous nature of updates and the eventual convergence of replicas to a consistent state.

Finally, we provided practical Java implementations of both choreography-based and orchestration-based sagas, using an e-commerce purchase flow with Order, Payment, and Shipping services, and listed open-source projects that support the pattern.
