2021-02-21

Can we contain mutable data in imperative languages?

Using immutable data confers all sorts of advantages, in practice, the main ones are:

It removes nasty "action at a distance" bugs.
It opens the door for easy comparison of values.
It can make concurrent programming easier.

This post briefly surveys the territory, then proposes a small syntactic addition to Javascript that would help constrain mutability, while having a clear transition path from existing code (no funky FP stuff). The examples are in Javascript, but the same concept could be ported to Python and other high level languages with mutable data.

If you're already familiar with immutable libraries in Javascript, feel free to jump to the proposal .

What do we have now?

The most well known immutable library gives us a smorgasbord of new data structures, using them in practice can be a bit clunky:

const map1 = Map({a: 1, b: 2, c: 3})
const map2 = map1.set('b', 50)

Once you start updating large data nested structures, more gnarliness ensues. In introducing a non-native data-type we lose a number of syntactic/typing niceties - it seems that for most people, using new exotic data-types is not worth this hassle.

A more recent library that aims to solve these problems is immer , immer is really neat as it sidesteps the need for new data-types. I even wrote a half baked Python port of it .

The example above becomes:

const map1 = { a: 1, b: 2, c: 3 }
const map2 = produce(map1, draft => {
    draft.b = 50
})

Immer is a really cool idea and is designed well to work with contemporary frontend patterns, but suffers from two flaws:

draft is a mega magicy proxy object, its website lists a number of pitfalls resulting from this.
Immer can only go so far with structural sharing of data.

Some detail on that second point.

With a library that deals in built-in data types, it's only possible to structurally share data that's passed around by reference (basically objects and arrays, not strings or numbers). If we do:

const foo = { some massive nested object }
const array1 = [foo]
const array2 = produce(array1, draft => {
    array1.push(42)
})

Then array2[0] is foo , we didn't have to duplicate anything in memory. Under the hood, array2 is: some reference to foo , and the atomic value 42 .

However, if we did:

const array1 = [1, 2, 3, etc, 9999999]
const array2 = produce(array1, draft => {
    array1.push(42)
})

Then under the hood array2 is not a reference to array2 with 42 tacked on the end, it is a whole new array [1, 2, 3, etc, 9999999, 42] .

With the kind of data used in most frontend applications, this is not so much of a problem. If however we were writing a library that did involve mucking around with rather long arrays, this is a big problem vis-a-vis memory usage.

As a thought experiment, let's imagine what would happen if we baked the immer concept into the language...

Syntax proposal

The proposed syntax consists of one new keyword, alter :

const map1 = { a: 1, b: 2, c: 3 }
const map2 = alter (map1) {
    map1.b = 50
}

The rules

Everything we do to mutate map1 within the alter block applies only within the lexical scope of that block.
The block evaluates to the final value of map1 .

The first rule is important, let's imagine we've added a --disallow-mutating flag to node and attempted to run the following:

function nastyMutator(m){
    m.b += 1
}
const map1 = { a: 1, b: 2, c: 3 }
const map2 = alter (map1) {
    nastyMutator(map1)
}

The interpreter would raise an error for the second line at parse time.

Note that we're still allowing reassignment, so the following is permissable:

let map1 = { a: 1, b: 2, c: 3 }
map1 = alter (map1) {
    map1.b = 50
}

Other syntax

The block can just be the next single statement, as with if and for :

map1 = alter (map1) map1.b = 50

Standard unpacking syntax would enable doing many things at once:

[c, d] = alter ([a, b]) {
    c.foo = 1
    b.bar = 2
}

What do we win?

We solve the problems we had with using specialist data types and with immer:

We can stick to boring objects and arrays and keep all our typing.
Within the alter block, the variable we specified at the beginning is just a plain 'ol value - no pitfalls resulting from proxy magic.
A clever interpreter would be able to do lots of structural sharing, so the following would be memory efficient:
```
const array1 = [1, 2, 3, etc, 9999999]
const array2 = alter (array1) {
    array1.push(42)
}
```

Now let's think long term, let's imagine uptake is high - there's no need to install any libraries and converting existing code is easy, so why not - what happens then?

We can reach the stage where all the mutating in our codebase is contained within alter blocks - we can check this by running with our --disallow-mutating flag from earlier. (Maybe we add a mutate keyword for specific circumstances so we can do eg: mutate this.linepos += 1 ).

Now all our data outside of alter blocks is immutable, we can:

Reason more easily about the code - no mutation at a distance.
Do efficient comparisons on stuff, this opens the door for objects in sets and as keys in maps.
The interpreter can now do various performance tricks.

Application to global-state-styley frontends

As the syntax is lifted straight from immer, we can trivially update their reducer pattern example :

const byId = (state, action) => alter (state) {
    switch (action.type) {
        case RECEIVE_PRODUCTS:
            action.products.forEach(product => {
                state[product.id] = product
            })
    }
}

Python equivalent

The Python version would look the same, just with Python's block syntax:

const array1 = [1, 2, 3, etc, 9999999]
const array2 = alter array1:
    array1.push(42)

Aside on semantics of updating deeply nested values in FP languages.

Clojure has the concept of transients that feel somewhat similar to alter , I should play around with these more.

Haskellers (often it seems) use the lens library for making deep changes to objects. I've seen the Python equivalent used and the resulting code has always been reverted back to a "native" style at some point as it can be tricksy to read and doesn't play well with modern typed Python. An open question for me is:

Is there a pure-FP-ish way to doing deep updates that's always as "natural" as the classic mutational way?

Here's a contrived example of something that to me "feels natural" in an imperative style, but it could just be my unfamiliarity with FP stuff.

const stateAfter = alter (state) {
    const unflagged = []
    for (const message of state.messages){
        if (message.flagged){
            state.flaggedMessages.push(message)
            state.totalCount -= 1
            delete state.visibleUserIds[message.userId]
        }
        else unflagged.push(message)
    }
    state.messages = unflagged
}