Internet Engineering
Spring 2024

Introduction to


Parham Alvani



  • NATS is a high-performance messaging system created by Derek Collison in 2010 written in Ruby.
  • It was originally built to serve as the message bus for Cloud Foundry
  • Handling internal communication among components of the system

It Addresses

  • Service Discovery
  • Low latency communication
  • Load balancing
  • Notification and events handling


  1. Client publishes message on foo subject
  2. Only clients subscribed to foo receive the message

How Simple!

INFO {"server_id":"NCNS32OEKOMMHTBVUYC7QWASFAP3BPEKGZG3QDIYRCBUL23WE3OVVBBO","server_name":"NCNS32OEKOMMHTBVUYC7QWASFAP3BPEKGZG3QDIYRCBUL23WE3OVVBBO","version":"2.1.7","proto":1,"git_commit":"bf0930e","go":"go1.13.10","host":"","port":4222,"max_payload":1048576,"client_id":1,"client_ip":""}

SUB greetings 1

PUB greetings 12
Hello World!

MSG greetings 1 12
Hello World!


  1. Client connected to NATS publishes a request on a help subject
  2. Worker client subscribed to help, processes requests, then sends response
  3. Audit client is subscribed to all subjects via a wildcard, so also receives message but does not reply
  4. Publisher receieves the response from the worker client

Request/Response (cont'd)

  • A publisher sends a message on a particular subject by attaching a reply subject name along with it.
  • Then, the subscribers who receive this message send the response to the reply subject that was specified in the request message.
  • Reply subjects have a unique name called inbox and the NATS server will send the message comming into these subjects to the relevant publisher.


  1. Publisher sends multiple help requests on a help subject
  2. Multiple worker clients subscribed to the help subject from a distributed queue
  3. Each published request is randomly balanced to only one of the worker clients in the distributed queue

NATS As an Always Available Dial Tone

  • The main design constraints that define the style of the NATS project are:
    • Simplicity
    • Performance
    • Reliability
  • It does not offer any Persistence or Buffering
  • It is true Fire and Forget


  • NATS will try to protect itself at all costs to be available for all users
  • NATS client libraries, internally try to have an always established connection to one of the available NATS servers
  • In case a server fails, NATS will reconnect to another available server in the pool
  • NATS supports high-availablity via a clustering mode that is set up as a full-mesh of the servers

Slow Consumer

  • By default, if a client fails to drain the pending data that the server is holding for the client for over two seconds, the server will disconnect the client
  • Read more about it here


  • There is a PING/PONG interval happing that the client has to follow otherwise the server will reset the connection in case there are many PONG replies missing.
  • Server sends PING messages to client in each interval and client has to respond
  • Client also can send PING messages to the server

Subject Names

  • The dot character (.) has special meaning as part of the subject.
  • When using it as part of the subject, we can create namespaces (hierarchy) that can be later matched via wildcards.
foo.* ✅, ✅
foo.*foo.hello ✅, ❌
foo.> ✅, ✅, foo.hello ✅
  • *: matches a single token
  • >: matches one or more tokens in the subject, and it can only appear at the end

Queue Subscribtions: Hands-on

    Worker 1 👷

    INFO {"server_id":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","server_name":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","version":"2.1.8","proto":1,"git_commit":"c0b574f","go":"go1.14.8","host":"","port":4222,"max_payload":1048576,"client_id":4,"client_ip":""}
    SUB requests workers 5
    MSG requests 5 6

    Worker 2 👷

    INFO {"server_id":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","server_name":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","version":"2.1.8","proto":1,"git_commit":"c0b574f","go":"go1.14.8","host":"","port":4222,"max_payload":1048576,"client_id":5,"client_ip":""}
    SUB requests workers 55
    MSG requests 55 5

    Publisher ✍️

    INFO {"server_id":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","server_name":"NARXYLU3XEZSFYCFRV5WJ63GQS5WDV3WC4WAKCYDYY2IBGIETWJADZVS","version":"2.1.8","proto":1,"git_commit":"c0b574f","go":"go1.14.8","host":"","port":4222,"max_payload":1048576,"client_id":6,"client_ip":""}
    PUB requests 5
    PUB requests 6

    Even More Features!

    • Jetstream 🚀
    • Clustering ✨
    • Super-Cluster with Gateways 🗺️


    • JetStream is a NATS built-in distributed persistence system.
    • JetStream is built-in to nats-server and you only need 1 (or 3 or 5 if you want fault-tolerance against 1 or 2 simultaneous NATS server failures) of your NATS server(s) to be JetStream enabled for it to be available to all the client applications.

    Functionalities enabled by JetStream

    • Streaming: temporal decoupling between the publishers and subscribers
    • Replay policies
    • Retention policies and limits
    • Persistent distributed storage
    • Stream replication factor
    • Mirroring between streams
    • De-coupled flow control
    • Exactly once semantics


    temporal decoupling between the publishers and subscribers

    Subscribers only receive the messages that are published when they are actively connected to the messaging system (i.e. they do not receive messages that are published while they are not subscribing or not running or disconnected).

    However, nowadays a new way to provide this temporal de-coupling has been devised and has become 'mainstream': streaming.

    Streams capture and store messages published on one (or more) subject and allow client applications to create subscribers (i.e. JetStream consumers) at any time to replay (or consume) all or some of the messages stored in the stream

    Replay policies

    JetStream consumers support multiple replay policies.

    • all of the messages currently stored in the stream, meaning a complete replay and you can select the replay policy(i.e. the speed of the replay) to be either:
    • instant
      the messages are delivered to the consumer as fast as it can take them.
      the messages are delivered to the consumer at the rate they were published into the stream, which can be very useful for example for staging production traffic.
    • the last message stored in the stream, or the last message for each subject (as streams can capture more than on subject).
    • starting from a specific sequence number.
    • starting from a specific start time.

    Retention policies and limits

    Practically speaking, streams cannot always just keep growing forever and therefore JetStream support multiple retention policies as well as the ability to impose size limits on streams.


    You can impose the following limits on a stream

    • Maximum message age
    • Maximum total stream size (in bytes)
    • Maximum number of messages in the stream
    • Maximum individual message size
    • You can also set limits on the number of consumers that can be defined for the stream at any given point in time.

    You must also select a discard policy which specifies what should happen once the stream has reached one of its limits and a new message is published

    • discard old means that the stream will automatically delete the oldest message in the stream to make room for the new messages.
    • discard new means that new message is discarded (and the JetStream publish call retuns an error indicating that a limit was reached)

    Retention policy

    You can choose what kind of retention you want for each stream

    • limits (the default)
    • interest (messages are kept in the stream for as long as there are consumers that haven't delivered the message yet).
    • work queue (the stream is used as a shared queue and messages are removed from it as they consumed)

    Regardless of the retention policy selected, the limits (and the discard policy) always apply.

    Persistent distributed storage

    You can choose the durability as well as the resilience of the message store according to your needs.

    • Memory storage
    • File storage
    • Replication (1 (none), 2, 3) between nats servers for Fault Tolerance

    JetStream uses a NATS optimized RAFT distributed quorom algorithm to distribute the persistence service between nats servers in a cluster while maintaining immediate consistency even in the face of Byzantine failures.

    In JetStream the configuration for storing messages is defined separately from how they are consumed.

    Storage is defined in a Stream and consuming messages is defined by multiple Consumers.

    Stream replication factor

    A stream's replication factor (R, often referred to as the number Replicas) determines how many places it is stored allowing you to tune to balance risk with resource usage and performance.

    A stream that is easily rebuilt or temporary might be memory based with a R=1 and a stream that can tolerate some downtime might be file based with R=1.

    1Cannot operate during an outage of the server servicing the stream. Highly performant.
    2No significant benefit at this time. We recommend using Replicas=3 instead.
    3Can tolerate loss of one server servicing the stream. An ideal balance between risk and performance.
    4No significant benefit over Replicas=3 except marginally in a 5 node cluster.
    5Can tolerate simultaneous loss of two servers servicing the stream. Mitigates risk at the expense of performance.

    Mirroring between streams

    JetStream also allows server administrators to easily mirror streams, for example between different JetStream domains in order to offer disaster recovery.

    You can also define a stream as one of the sources for another stream.

    De-coupled flow control

    JetStream provides de-coupled flow control over streams, the flow control is not 'end to end' where the publisher(s) are limited to publish no faster than the slowest of all the consumers (i.e. the lowest common denominator) can receive, but is instead happening individually between each client application (publishers or consumers) and the nats server.

    Exactly once semantics

    Because publications to streams using the JetStream publish calls are acknowledged by the server, the base quality of service offered by streams is at least once.

    Meaning that while reliable and normally duplicate free there are some specific failure scenarios:

    • That could result in a publishing application believing (wrongly) that a message was not published successfully and therefore publishing it again.
    • That could result in a client application's consumption acknowledgement getting lost and therefore the message being re-sent to the consumer by the server.

    Therefore, JetStream also offers an exactly once.

    • For the publishing side it relies on the publishing application attaching a unique message or publication id in a message header and on the server keeping track of those ids for a configurable rolling period of time in order to detect the publisher publishing the smae message twice.
    • For the subscribers a double acknowledgement mechanisim is used to avoid a message being erroneously re-sent to a subscriber by the server after some kinds of failures


    JetStream consumers are views on a stream, they are subscribed to (or pulled) by client applications to receive copies of (or to consume if the stream is set as a working queue) message stored in the strem.

    Fast push consumers

    Client applications can choose to use fast un-acknowledged push (ordered) consumers to receive messages as fast as possible (for the selected replay policy) on a specific delivery subject or to an inbox.

    Those consumers are meant to be used to replay rather than consume the messages in a stream.

    Horizentally scalable pull consumers with batching

    Client applications can also use and share pull consumers that are demand-driven, support batching and must explicitly acknowledge message reception and processing which means that they can be used to consume (ie.e use the stream as a distributed queue) as well as process the messages in a stream.

    Pull consumers can and are meant to be shared between applications (just like queue groups) in order to provide easy and transparent horizental scalability of the processing or consumption of messages in a stream without having (for example) to worry about having to define partitions or worry about fault-tolerance.

    References 📚

    Fork me on GitHub