Serializers Revisited

2021-06-08 EDIT: Added serializer implementation using only the actor message-event queue.

One of the earliest concepts explored on this blog was the Serializer, which is a mechanism for providing exclusive access to a group of actors [1]. More recently, Carl Hewitt has referred to an extended version of this mechanism (with holes) as “Swiss Cheese”. Here we present a simple version that serializes requests on a first-come first-served basis. Serializers are the key to composability for actors, because they give us a way to reproduce the “region of mutual exclusion” semantics of a single actor’s behavior when coordinating the activity of several actors.

Shared Mutable State

In Humus all values are immutable. The only mutation that exists is actor behavior replacement (with BECOME). Variables may be bound to values, but once bound they will always designate the same value (like static single assignment). If multiple actors need to “share” access to mutable state, what they actually share is access to a common actor that manages that state. A simple, but dangerous, example is a storage cell.

LET cell_beh(value) = \(cust, req).[
    CASE req OF
    (#read) : [ SEND value TO cust ]
    (#write, value') : [
        BECOME cell_beh(value')
        SEND SELF TO cust
    ]
    (#CAS, old, new) : [
        IF $old = $value [ BECOME cell_beh(new) ]
        SEND value TO cust
    ]
    END
]
CREATE cell WITH cell_beh(0)

When a cell receives a #read message, it sends its stored value to the customer cust.

When a cell receives a #write message, it updates its stored value to the provided value’, and sends itself to the customer cust as a sync signal (indicating completion of the operation).

The #read and #write operations are clearly simple, and sufficient to manage the stored value. However, they re-introduce all the potential hazards of shared mutable state in concurrent programs. As an illustration of a safer access protocol, we also implement an atomic compare-and-swap (CAS) operation.

When a cell receives a #CAS message, it compares the stored value with the old value in the message. If they are the same, the stored value is updated to the new value. The original stored value is sent to the customer cust in any case. The CAS operation is central to many lock-free and wait-free algorithms [2].

Counter Service

Now let’s introduce a simple “counter” service that makes use of our cell actor to maintain the count (we will only make use of the #read and #write operations). Each request to the counter service includes a customer cust and amount. The counter sends the updated count to the customer after adding amount.

LET counter_svc_beh(cell) = \(cust, amount).[
    SEND (k_count, #read) TO cell
    CREATE k_count WITH \count.[
        LET count' = $add(count, amount)
        SEND (k_sync, #write, count') TO cell
        CREATE k_sync WITH \$cell.[
            SEND count' TO cust
        ]
    ]
]

Notice how the counter performs a typical (non-atomic) read-modify-write cycle on the cell. The counter sends a #read request to the cell, with the temporary actor k_count as the customer. When k_count gets the count from the cell, it calculates the new value, and sends it to the cell in a #write request. When k_sync gets the sync signal indicating that the #write completed, it sends the updated count’ to the original customer cust.

Unfortunately, this implementation contains a data race that can lead to incorrect results, as illustrated by the following example:

CREATE cell WITH cell_beh(0)
CREATE counter_svc WITH counter_svc_beh(cell)  # WARNING: UNSAFE!
SEND (println, 1) TO counter_svc
SEND (println, 10) TO counter_svc
SEND (println, 100) TO counter_svc

The read-modify-write cycles of these three requests to the counter service are very likely to overlap, since the counter can begin processing another request as soon as it sends the #read request to the cell (before the update and #write have completed). One way to fix this problem is to serialize requests to the counter service.

The astute reader may have noticed that the counter service could be safely implemented by holding the count internally and operating on it atomically. If fact, this is the design we recommend. However, read-modify-write sequences among services that “share” mutable state (like our cell) are common, motivating the use of a serializer.

Pure-Functional Queue

The pairing operator in Humus (“,“) makes it easy to model “lists” that add/remove elements at the beginning of the list. However, that naturally leads to last-in first-out (LIFO) stack structures. What we need for serializers are first-in first-out (FIFO) queue structures. The following code defines a pure-functional queue (no mutation) with amortized O(1) add/remove operations and minimal data copying.

LET q-empty = (NIL, NIL)
LET q-norm(p, q) = (
    LET push-pop(s', s) = (
        CASE s OF
        NIL : s'
        (x, xs) : push-pop((x, s'), xs)
        END
    ) IN
    CASE p OF
    NIL : (push-pop(NIL, q), NIL)
    _ : (p, q)
    END
)
LET q-put((p, q), x) = q-norm(p, (x, q))
LET q-take(p, q) = (
    CASE p OF
    NIL : (?)
    (h, t) : (h, q-norm(t, q))
    END
)
LET q-empty?(p, q) = (
    CASE p OF
    NIL : TRUE
    (_, _) : FALSE
    END
)

The strategy behind this implementation was explained in a previous post, so here we will just describe the API provided by this code.

q-empty
The value representing an empty queue (use as an initializer).
q-put(queue, item)
Return the new-queue created by adding item to the end of queue.
q-take(queue)
Return (item, new-queue) where new-queue is queue after item is removed from the beginning; or ? if the queue is empty.
q-empty?(queue)
Return TRUE if queue is empty; otherwise FALSE.

There is no mutation involved in any of these operations. New queue values are always computed without changing the old values.

Service Serializer

Now that we have a few building-blocks defined, we can describe the service serializer itself. The serializer is a proxy for the service it protects. Conceptually, it is a state-machine that alternates between “free” and “busy” based on the availability of the underlying service. Initially, of course, the serializer is in the “free” state.

LET tag_beh(cust) = \msg.[
    SEND (SELF, msg) TO cust
]
LET serializer_beh(svc) = \(cust, req).[
    CREATE tag WITH tag_beh(SELF)
    BECOME serializer_busy_beh(svc, cust, tag, q-empty)
    SEND (tag, req) TO svc
]

In the “free” state, the underlying service is available, so the serializer can pass the request directly to the service and move to the “busy” state. Since the serializer needs to know when the request completes (and the service is available again), it substitutes a tagged-proxy for the customer in the request to the service. This proxy redirects the service response back to the serializer in a way that it can uniquely identify (because it created the tag).

LET serializer_busy_beh(svc, cust, tag, pending) = \msg.[
    CASE msg OF
    ($tag, res) : [
        SEND res TO cust
        CASE q-take(pending) OF
        ((cust', req'), rest) : [
            CREATE tag' WITH tag_beh(SELF)
            BECOME serializer_busy_beh(svc, cust', tag', rest)
            SEND (tag', req') TO svc
        ]
        _ : [
            BECOME serializer_beh(svc)
        ]
        END
    ]
    (cust', req') : [
        LET pending' = $q-put(pending, (cust', req'))
        BECOME serializer_busy_beh(svc, cust, tag, pending')
    ]
    END
]

In the “busy” state, the underlying service is not available because it has not yet produced a response to a prior request. If a new request arrives, the customer/request pair is added to the (initially empty) pending queue.

When the serializer receives a tagged response from the service, it immediately forwards the response to the original customer for that request. If the pending queue is empty, there is no more work pending, so the serializer returns to the “free” state. Otherwise, the serializer sends the next request to the service, with a new tagged customer as before, and stays in the “busy” state with an updated (possibly empty) queue.

Lazy Initialization

Access to the counter service (and the cell it depends on) should be protected to insure that all calls go through the serializer. The lazy-initialization pattern provides a simple way to accomplish that.

LET counter_svc_init(value) = \msg.[
    CREATE cell WITH cell_beh(value)
    CREATE svc WITH counter_svc_beh(cell)
    BECOME serializer_beh(svc)
    SEND msg TO SELF
]
CREATE counter_svc WITH counter_svc_init(0)

The publically-visible entry-point for the counter service is actually a lazy-initialization bootstrap actor. When it receives the first service request message, it performs all the initialization steps required to prepare the service for handling requests. In this case, creating both the actual service actor, and the underlying “shared” cell. It then transforms itself into the service serializer, proxying for the service, and re-sends the original request to itself. This message will be received (again) by the transformed entry-point, which is now the service serializer. In this way, the cell and svc actors are safely kept private by the serializer itself.

Counter Service Alternatives

The counter service implementation shown above follows an idiomatic pattern for sequencing actor interactions. In each step of the interaction-sequence, a temporary customer actor is created, representing a continuation of the process. If we can guarantee that only one request-sequence will be in process at a time (as enforced by the serializer), we can use BECOME to transform the service itself into the continuation representing each step of the process.

#
# Alternative `counter_svc_beh` using BECOME
#
LET counter_svc_beh(cell) = \(cust, amount).[
    SEND (SELF, #read) TO cell
    BECOME \count.[
        LET count' = $add(count, amount)
        SEND (SELF, #write, count') TO cell
        BECOME \$cell.[
            SEND count' TO cust
            BECOME counter_svc_beh(cell)
        ]
    ]
]

Instead of creating a new customer for each step, the service provides itself as the customer, and BECOMEs the handler for the next response. Information is accumulated in the nested lexical scope, which gives each behavior access to variables bound by previous steps.

Both the temporary-customer and become-continuation techniques involve nested lexical scope, which leads to correspondingly nested indentation of the continuation code. Fortunately, in practice, the indentation level remains reasonable for most well-factored actor code. I’ve rarely seen more than 5 levels of depth. However, it is also possible to separate the continuations into explicit non-nested behaviors.

#
# Alternative `counter_svc_beh` using multiple-behavior state-machine
#
LET counter_svc_beh(cell) = \(cust, amount).[
    SEND (SELF, #read) TO cell
    BECOME counter_k_count(cust, cell, amount)
]
LET counter_k_count(cust, cell, amount) = \count.[
    LET count' = $add(count, amount)
    SEND (SELF, #write, count') TO cell
    BECOME counter_k_sync(cust, cell, count')
]
LET counter_k_sync(cust, cell, count') = \$cell.[
    SEND count' TO cust
    BECOME counter_svc_beh(cell)
]

The obvious advantage to this approach is there can be an arbitrarily complex sequence of steps, without excessive nesting. In fact, the process doesn’t even have to be strictly sequential. It can be an arbitrary state-machine with decision-branches and loops, if needed. The disadvantage is that the state information required in each step, including the information that must be retained for future steps, must be explicitly named and passed to the relevant behavior constructors. The good news is that this kind of optimization/refactoring can be automated, and potentially performed by a compiler.

Using the Message-Event Queue

The service serializer implementation shown above maintains its own private queue of pending requests. While this makes the mechanism explicit, we observe that there is already a pending-request-queue available, the actor message-event queue itself. We can take advantage of this to create an even simpler implementation of the service serializer.

#
# Alternative `serializer_beh` using the actor message-event queue
#
LET serializer_beh(svc) = \(cust, req).[
    CREATE tag WITH tag_beh(SELF)
    BECOME serializer_busy_beh(svc, cust, tag)
    SEND (tag, req) TO svc
]
LET serializer_busy_beh(svc, cust, tag) = \msg.[
    CASE msg OF
    ($tag, res) : [
        SEND res TO cust
        BECOME serializer_beh(svc)
    ]
    _ : [ SEND msg TO SELF ]
    END
]

In the “busy” state, the serializer simply re-enqueues any message that did not come from the tagged-proxy customer. When the serializer receives a tagged response from the service, it forwards the response to the original customer for that request and BECOMEs available to serve another request.

Summary

Serializers are essential for making composable actor systems [3]. They allow us to encapsulate a group of collaborating actors, and treat them as a single higher-level actor. This encapsulation can be completely hidden from clients, because the serializer is indistinguishable from the underlying service it protects. In this article we have explored the simplest form of serializer, which handles requests strictly one-at-a-time. For greater availability, more complex serializer strategies are possible. This idea is explored more deeply in “High Availability for Mutable Shared State“, which also contains the original description of the Banker’s Queue used here.

References

[1]
C. Hewitt and R. Atkinson. Specification and Proof Techniques for SerializersIEEE Transactions on Software Engineering, Vol. SE-5, No. 1, January 1979.
[2]
M. Herlihy. Wait-free synchronizationACM Transactions on Programming Languages and Systems, Vol. 11, No. 1, January 1991.
[3]
M. Abadi and L. Lamport. Composing specificationsACM Transactions on Programming Languages and Systems, 15 (1), 1993.


Tags: , , , , , , , , , , ,
This entry was posted on Monday, August 10th, 2020 at 5:28 pm and is filed under Uncategorized. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

One Response to “Serializers Revisited”

  1. David McClain

    Please keep this series going! I am thoroughly enjoying all of it.

    I come from a heavily pragmatic background in Astrophysics and running large instruments (e.g., big Telescopes) with computers. But over the past 50 years I have migrated first from Forth, via C/C++, and finally into Lisp for the past 30 years.

    Over the past 3 years I have been heavily involved in producing an Actors based system in Lisp for my own needs. And it is remarkably useful. But I have not taken it to the level of granularity that you have with your Configurations and Resource managers.

    I am fascinated by the bleeding edge boundaries of programming. I’m itching to break the bounds imposed by Lisp. Kernel looks very interesting, but likely slow as molasses at this time.

    My Lisp Actors currently each have their own message queue, and they are essentially packaged (Behavior) closures that run on a pool of threads as messages arrive (Grand Central Dispatch on Mac, thread pool on Windows). They run until there are no more messages awaiting in their queue, and then the Executive thread is free to start another Actor if any are ready to run.

    By far, I find that the most usable Behavior is a mere FUNCALL, which allows for PERFORM-IN-ACTOR types of function definitions with a designated Actor and a compiled Thunk. This gives impromptu and open-ended behaviors to each of my Actors. Single-thread semantics are enforced in an SMP multicore environment. Actors own their data, and any Callbacks originated by an Actor will once again happen only in that Actor.

    But from reading your posts, I see that even my lightweight Actors are relatively heavy objects compared to your own Actors in Humus. I am eager to try out a bunch of your finer-grained Actors.

Leave a Reply

Your comment