JustSaying and Reliability

We’ve recently adopted a message-based architecture for a big chunk of our platform here at JUST EAT. In particular, we use one-way messaging to communicate between our Autonomous Components (ACs). One of the main promises of this style of architecture is reliability.

Imagine a scenario where a customer is ordering their takeaway on the JUST EAT website and, just after we charge their credit card, we hit a deadlock in the database that results in a fatal error. Whoops. So what happens to the customer’s dinner then? In this blog post, I’ll cover the measures we have in place to deal with scenarios like this.

We use a custom-built, lightweight, and now open-source message bus called JustSaying, which uses Amazon Simple Queue Service (SQS) as its transport and Amazon Simple Notification Service (SNS) as its publishing service. Like all credible message buses, JustSaying promises reliable messaging. Here’s what I mean by reliable messaging:

‘Given an accurate registration of publishers and consumers, every published message is guaranteed to be delivered to all intended consumers.’

This is possible because SQS, our transport, is reliable. But while SQS gives you a reliable transport, it doesn’t protect you against unreliability in your consumers’ logic and your application code. JustSaying guards against both transient errors (e.g. database deadlocks) and permanent errors (e.g. a NullReferenceException caused by missing data in a consumer’s logic), and it handles the two in different ways. If the error is transient, the correct course of action is to retry the operation in the hope that the issue has resolved itself.

Retry Policy

JustSaying takes care of retrying your messages for you, out of the box and by default. If your consumer throws an exception for any reason, the message will be redelivered to your consumer up to five times. Of course, the number of times your messages are retried is configurable when you register your consumers.

CreateMeABus
    .InRegion(RegionEndpoint.EUWest1.SystemName)
    .WithSqsTopicSubscriber()
    .IntoQueue("CustomerOrders")
    // Retry a failing message 3 times before it goes to the error queue.
    .ConfigureSubscriptionWith(
        cnf => cnf.RetryCountBeforeSendingToErrorQueue = 3)
    .WithMessageHandler(new OrderNotifier())
    .StartListening();
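To make the retry behaviour concrete, here is a minimal, self-contained sketch of the idea. It does not use JustSaying or SQS at all: the RetrySketch class, the in-memory queues, and the Drain method are hypothetical stand-ins invented for illustration. A failing message is put back for redelivery until it has been received maxReceiveCount times, after which it is set aside in an error queue (covered in the next section).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory illustration; not part of JustSaying.
public static class RetrySketch
{
    // Delivers each message to the handler until it succeeds or has been
    // received maxReceiveCount times; exhausted messages go to errorQueue.
    public static void Drain(Queue<string> incoming, Queue<string> errorQueue,
                             Action<string> handler, int maxReceiveCount)
    {
        var receiveCounts = new Dictionary<string, int>();
        while (incoming.Count > 0)
        {
            var message = incoming.Dequeue();
            receiveCounts.TryGetValue(message, out var seen);
            receiveCounts[message] = ++seen;
            try
            {
                handler(message);
            }
            catch (Exception)
            {
                if (seen >= maxReceiveCount) errorQueue.Enqueue(message);
                else incoming.Enqueue(message); // redeliver later
            }
        }
    }

    public static void Main()
    {
        var incoming = new Queue<string>(new[] { "order-1", "order-2" });
        var errors = new Queue<string>();
        var deadlocksLeft = 2;

        Drain(incoming, errors, message =>
        {
            // order-1 fails twice with a transient error, then succeeds.
            if (message == "order-1" && deadlocksLeft-- > 0)
                throw new TimeoutException("transient: database deadlock");
            // order-2 fails on every attempt.
            if (message == "order-2")
                throw new NullReferenceException("permanent: missing data");
            Console.WriteLine("handled " + message);
        }, maxReceiveCount: 3);

        Console.WriteLine("error queue: " + string.Join(", ", errors));
    }
}
```

In this sketch the transient failure clears on the third delivery, so order-1 is handled, while order-2 exhausts its attempts and ends up in the error queue.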

Error queue

For errors that are not transient (e.g. application bugs resulting in a NullReferenceException), no amount of retrying is going to solve the problem. Instead, JustSaying moves the unhandled messages to an error queue (a dead-letter queue). Once the problem is resolved, you can move the messages from the error queue back into the incoming queue, and they will be processed by your consumers.

JustSaying uses the underlying Redrive Policy provided by SQS to implement error queues. Error queues are created when the consumer is declared, and by convention they are named <queue_name>_error.

You can move messages from the error queue into your incoming queue from the command line using the JustSaying power tool, which is available on NuGet at http://www.nuget.org/packages/JustSaying.Tools/

cd RetryPolicy\bin\Debug
JustSaying.Tools.exe move -from "customerorders_error" -to "customerorders" -count "2"
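For reference, an SQS redrive policy is a queue attribute (RedrivePolicy) set on the incoming queue, holding a small JSON document like the one below. The account ID in the ARN is a made-up placeholder, and the assumption here is that maxReceiveCount reflects the retry count configured at registration; the exact values JustSaying writes are not shown in this post.

```json
{
  "deadLetterTargetArn": "arn:aws:sqs:eu-west-1:123456789012:customerorders_error",
  "maxReceiveCount": 3
}
```

Once a message has been received maxReceiveCount times without being deleted, SQS moves it to the queue named in deadLetterTargetArn.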

If you decide not to have an error queue, you can opt out explicitly when registering your consumers.

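CreateMeABus
    .InRegion(RegionEndpoint.EUWest1.SystemName)
    .WithSqsTopicSubscriber()
    .IntoQueue("CustomerOrders")
    .ConfigureSubscriptionWith(cnf =>
    {
        cnf.VisibilityTimeoutSeconds = 0;
        // No <queue_name>_error queue is created for this subscription.
        cnf.ErrorQueueOptOut = true;
    })
    // Handler and StartListening() as in the retry example above.
    .WithMessageHandler(new OrderNotifier())
    .StartListening();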

Downloads

Source code demonstrating how to configure retry policy in JustSaying is available on GitHub here: https://github.com/payman81/JustSaying.Samples