Making of Muster

I spent the last two years (or so close to two years that it makes no difference) working with one of my colleagues on a streaming data processing library for modern frontends called Muster. In this post I’d like to write about the things I learned while building a large-scale library in TypeScript. Before I get started on that, I should probably write a few words about what Muster is.

What is Muster

Muster is a JavaScript library (written in TypeScript) for front-end and back-end applications. It stores your data and logic as nodes in a graph-like structure, and connects these nodes using reactive streams (you can learn more about reactive programming at http://reactivex.io/).

This was the core idea behind the project since its beginning, but the actual implementation of that idea changed drastically over the course of the project.

Early days

The initial version of the library started as a spin-off of Falcor, and used their pathing model. Additionally, we decided to base the core of the library on RxJS, and that turned out to be a golden solution (up to a certain point, but more on that later).

The first versions of the library allowed users to store data in paths inside RxJS BehaviorSubjects, expose external data as streams, and process the data using simple RxJS operators. The API of the library was very primitive and exposed too much of the inner workings to the users, but at least it made it easy to work with both synchronous and asynchronous data.

The problems with that model began when an early version of Muster was put to use in a large-scale React Native application. We noticed large performance hits in the apps when loading and unloading screens with lots of data on them, and the “easy to use” API turned out to be unreadable once paths reached a certain length - reading such long paths became an exercise in “spot the difference between these two lines of code”. Moreover, using Muster required knowledge of the inner workings of the library and of how RxJS streams work. This produced a fair number of bugs and a lot of developer frustration.

The advent of The Node

The problems we encountered when using Muster convinced us that building on Falcor was not good enough, and after a bit of brainstorming we came up with an idea for a new version of the library, where the logic and data are represented as nodes in the graph. There were a few benefits to using nodes for this purpose:

Node is a self-contained piece of logic
Easily testable
Can use other nodes, without caring about their implementation, only about their external API
Easy to add new nodes
Hides the inner workings of the library behind an easy-to-use API

This idea turned out to be so good that a year and a half later we’re still using it. The path evolved from a simple string into an array of keys, which can be of any type (see ref node).

With the refactor into Muster 5 (the first one that introduced the concept of nodes) we also added a ton of unit tests for each node.

Historically I used to write my unit tests in a way that maximized the code coverage, which involved tailoring the tests to the specifics of the implementation. My colleague, on the other hand, was a believer in API-driven tests, which he convinced me to adopt (and I’m so glad he did!). We ended up sprinkling every node with a huge number of unit tests, which came in incredibly useful when doing the refactor from Muster 5 to Muster 6.

The Muster 5 version of a node allowed:

getting the value
setting the value
calling the node
getting node children
getting items (for collections)

This list might remind you of a concept used in Muster 6 - node operation - but at the time it wasn’t distilled yet, and all of the mentioned node operations were ad-hoc features added to nodes as we were building different types of nodes.

When making Muster 5 we also ran into a very particular problem:

Each “node operation” was written as a pure function, expected to always return a “value” depending on the received inputs. The “value” returned from that function could be any Muster node - for example:

computed([ref('dep1')], (dep1) => {
  return tree({ dep1 });
})

The example above is a very contrived one, but it demonstrates a problem with returning generic nodes from operations - how could Muster tell if a returned value has changed?

A simple comparison of objects, tree({ a: 'b' }) === tree({ a: 'b' }), would return false because the two objects are different references, and a deep comparison gets costly when done many times in a single update loop.

The way Muster 5 solved this was with the use of memoize from Lodash - each node factory memoized node instances for a particular set of arguments. This ensured that tree({ a: 'b' }) === tree({ a: 'b' }) would return true, but at the cost of remembering all instances of nodes created during the Muster runtime.
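To make the trade-off concrete, here is a minimal sketch of a memoized node factory. It is not Muster's actual code: the MemoizedTree type, the memoizedTree function and the hand-written Map cache (used here instead of Lodash's memoize) are all assumptions made for illustration.

```typescript
// Hypothetical stand-in for a Muster tree node; not the library's real type.
interface MemoizedTree {
  readonly type: "tree";
  readonly branches: Readonly<Record<string, string>>;
}

// Cache of every node ever created. Nothing removes entries, so it only grows.
const nodeCache = new Map<string, MemoizedTree>();

// Memoized factory: equal arguments always yield the same instance.
function memoizedTree(branches: Record<string, string>): MemoizedTree {
  // Sort the keys so that key order does not affect the cache key.
  const keys = Object.keys(branches).sort();
  const cacheKey = JSON.stringify(keys.map((k) => [k, branches[k]]));
  let node = nodeCache.get(cacheKey);
  if (node === undefined) {
    node = { type: "tree", branches: { ...branches } };
    nodeCache.set(cacheKey, node);
  }
  return node;
}

// Reference equality now works for nodes built from equal arguments...
const sameInstance = memoizedTree({ a: "b" }) === memoizedTree({ a: "b" });

// ...but every distinct node stays in memory for the lifetime of the app.
for (let i = 0; i < 1000; i++) {
  memoizedTree({ a: String(i) });
}
const cachedCount = nodeCache.size;
```

In this sketch sameInstance is true, and cachedCount keeps growing with every new combination of arguments, which is the kind of unbounded growth described above.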

This intentional memory leak turned out to be a major problem once one of the teams using a previous version of Muster decided to migrate to Muster 5 - using the application for more than 15 minutes resulted in a crash because the app ran out of RAM on an iPhone 🤦‍♂️ We had to do something about it.

Muster 6

The solution we came up with was to replace memoize with a predictable hashing of nodes based on the shape and type of their data. One simple proof of concept later, we were confident that this idea would work very nicely: it retained relatively fast node comparison (the node hash is just a simple string) and got rid of that pesky memory leak. This means you could now easily create two instances of a node and be able to tell if they’re the same:

tree({ a: 'b' }) === tree({ a: 'b' }) // false
tree({ a: 'b' }).id === tree({ a: 'b' }).id // true
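The id-based comparison can be sketched roughly as follows. This is an illustration under stated assumptions, not Muster's real implementation: HashedTree and hashedTree are hypothetical names, and the id here is a plain serialized string derived from the node type and its data rather than whatever hashing scheme the library actually uses.

```typescript
// Hypothetical node shape; the real Muster node type is more involved.
interface HashedTree {
  readonly type: "tree";
  readonly branches: Readonly<Record<string, string>>;
  readonly id: string; // derived only from the node type and its data
}

// Plain factory: no cache, every call creates a fresh object.
function hashedTree(branches: Record<string, string>): HashedTree {
  // Sort the keys so the id does not depend on property order.
  const keys = Object.keys(branches).sort();
  const id = "tree:" + JSON.stringify(keys.map((k) => [k, branches[k]]));
  return { type: "tree", branches: { ...branches }, id };
}

const firstNode = hashedTree({ a: "b" });
const secondNode = hashedTree({ a: "b" });

// Different objects, so reference equality fails...
const sameReference = firstNode === secondNode;
// ...but comparing the ids is a cheap string comparison.
const sameId = firstNode.id === secondNode.id;
// Nodes with different data get different ids.
const sameIdForDifferentData = firstNode.id === hashedTree({ a: "c" }).id;
```

Because nothing holds on to the created nodes, they can be garbage-collected as soon as the graph no longer references them.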

Additionally, we decided to introduce a concept of graph operations (which you can find in the current version of Muster). This required a massive refactor, and required us to re-write EVERY SINGLE MUSTER NODE. Normally this would mean throwing away all unit tests, but we had a crazy idea:

Delete every node implementation file, but leave the unit tests.

The decision to write API-driven tests was such a good one that we ended up using them as a reference for how a node should work, and a few months later we had a working version of Muster 6. Only a few tests had to be changed, as we discovered some bugs in the original implementation.
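Here is a small sketch of why API-driven tests survive a rewrite. The example is deliberately unrelated to Muster's nodes: the Stack interface, both implementations and the stackContract check are hypothetical, and only show that a test written against observable behaviour does not care which implementation sits behind the API.

```typescript
// Hypothetical public API that the test is written against.
interface Stack<T> {
  push(item: T): void;
  pop(): T | undefined;
  size(): number;
}

// "Original" implementation, backed by an array.
function arrayStack<T>(): Stack<T> {
  const items: T[] = [];
  return {
    push: (item) => { items.push(item); },
    pop: () => items.pop(),
    size: () => items.length,
  };
}

// "Rewritten" implementation, backed by a linked list.
function linkedStack<T>(): Stack<T> {
  interface Cell { value: T; next: Cell | undefined }
  let head: Cell | undefined;
  let count = 0;
  return {
    push(item) {
      head = { value: item, next: head };
      count++;
    },
    pop() {
      if (head === undefined) return undefined;
      const value = head.value;
      head = head.next;
      count--;
      return value;
    },
    size: () => count,
  };
}

// API-driven test: it only asserts on what a caller can observe.
function stackContract(create: () => Stack<number>): boolean {
  const stack = create();
  stack.push(1);
  stack.push(2);
  return (
    stack.size() === 2 &&
    stack.pop() === 2 &&
    stack.pop() === 1 &&
    stack.pop() === undefined &&
    stack.size() === 0
  );
}

const originalPasses = stackContract(arrayStack);
const rewritePasses = stackContract(linkedStack);
```

A test that instead inspected the internal array would have had to be thrown away together with the first implementation.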

The final step in Making Muster was to write exhaustive documentation for the library. Coming from a .NET background, I learned to appreciate Microsoft’s way of documenting their code, which involves a verbose description of classes and methods, and a large number of code examples. Due to the breadth of the library it took quite a while to finish the documentation, but I’m happy with the result we achieved.

Future of Muster

I’m happy with how the current version of the library turned out. It started as a simple library for data processing, but ended up as something akin to a programming language.

However, I would not say the library is finished. There are a few pain points I’d like to address in future versions of it, some of which include:

Make Muster into a language of its own, with its own syntax, as larger graphs written in Muster using JavaScript get hard to read
Improve the performance
Add integration with Vue and Angular
Make it easier to do tree shaking on Muster codebase
Implement a native Muster interpreter, which could be used in native Android, iOS, macOS and Windows apps

Final thoughts

Muster was probably the most technically challenging project I have had the pleasure of working on. I gained a massive appreciation for API-driven development and API-driven unit tests. I learned how to do large-scale refactors by retaining the unit tests and only replacing the implementation. I also realised that there’s no benefit in prematurely optimising the code, for example by writing a for loop instead of a .map() or .forEach(). It only decreases the readability of the code for a minuscule performance gain. Finally, coming from a strictly OOP background, I learned to love functional programming and the beauty of functional code.