Thinking more about JSON than you would like to
JSON is used as a representation for a lot of data flowing between programs and systems. And as those systems evolve, we’re constantly forced to think about the backwards compatibility of their communication protocols.
The most common case I’ve faced that starts such discussions is communication between a backend web server and its clients (frontend & mobile apps). We need to make changes to the backend that stay compatible with the oldest client still in use. And different teams usually arrive at the same convention: some sort of agreed-upon specification whose rules go roughly like this (a concrete example follows the list):
On any value:
- it has a specified type (boolean, string, number, array, object)
  - changing the type is not backwards-compatible
- whether it can be `null` or not is specified
  - changing from nullable to non-null is backwards-compatible
  - changing from non-null to nullable is not backwards-compatible

On a JSON object:
- adding a field is backwards-compatible
- removing a non-null field is not backwards-compatible
- removing a nullable field is not backwards-compatible

On a JSON array:
- every element of an array is of the same type
  - changing the type is not backwards-compatible
- whether the array can contain `null` values is specified
  - changing from nullable to non-null is backwards-compatible
  - changing from non-null to nullable is not backwards-compatible
- adding or removing elements is backwards-compatible
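To make the direction of these rules concrete: they describe whether an old reader can still consume data written by a newer writer. A small illustration (the `//` lines aren’t valid JSON, they’re just labels):

```
// v1: old clients parse this as {name: string}
{ "name": "Ada" }

// v2 adds a field: backwards-compatible, v1 clients simply ignore "age"
{ "name": "Ada", "age": 36 }

// v3 changes the type of "age": NOT backwards-compatible,
// clients expecting a number now break
{ "name": "Ada", "age": "36" }
```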
Even if you haven’t explicitly thought about these guidelines, you’ve probably seen them in practice before, even if they’re not exactly what you’re using right now. And even though these rules are widespread, automation around them isn’t nearly as popular. We can find tools and standards around the idea of JSON schemas and API contracts, such as OpenAPI and JSON Schema, which have existed for a while. But the idea that those schemas need to evolve is also far from novel: Confluent’s Schema Registry exists to tackle exactly this problem.
The ideas of JSON backwards compatibility don’t require running systems to reason about. There’s no sequence of instructions: there are schemas, which dictate what values are allowed, and there are backwards-compatibility rules. There’s nothing stopping us from building developer tooling around these ideas.
It should be possible to build a JSON library that treats the schema as a first-class citizen. As you would expect of a JSON library, it should allow reading/writing JSON (conforming to the schema). But it should also be able to:
- read/write the schema itself from/to a text representation
- compare different schemas for compatibility
Having the schema also means we’re able to generate documentation or client code automatically.
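To make that concrete, here is a minimal sketch (in Scala 3, the language the library started in) of schemas as plain values, together with a directional `canRead` check: can a reader built from the new schema read anything written under the old one? All names and the exact shape are illustrative, not the library’s actual API:

```scala
// A simplified sketch: the schema as a plain value.
sealed trait JsonSchema
case object JsonBoolean extends JsonSchema
case object JsonString  extends JsonSchema
case object JsonNumber  extends JsonSchema
case object JsonNull    extends JsonSchema
final case class JsonArray(element: JsonSchema)               extends JsonSchema
final case class JsonOneOf(alternatives: List[JsonSchema])    extends JsonSchema
final case class JsonObject(fields: List[JsonObjectField])    extends JsonSchema
final case class JsonObjectField(name: String, schema: JsonSchema)

final case class CompatibilityIssue(path: List[String], message: String)

// Reading/writing the schema itself to text is plain recursion; signatures only:
def renderSchema(schema: JsonSchema): String              = ???
def parseSchema(text: String): Either[String, JsonSchema] = ???

// Can a reader expecting `next` read anything written under `old`?
// An empty result means the evolution is backwards-compatible.
def canRead(next: JsonSchema, old: JsonSchema): List[CompatibilityIssue] =
  (next, old) match {
    case (a, b) if a == b => Nil
    // a nullable/union reader accepts whatever any of its alternatives accepts
    case (JsonOneOf(alternatives), o) =>
      if (alternatives.exists(a => canRead(a, o).isEmpty)) Nil
      else List(CompatibilityIssue(Nil, s"no alternative can read $o"))
    // the old writer may produce any of its alternatives; all must be readable
    case (n, JsonOneOf(alternatives)) =>
      alternatives.flatMap(a => canRead(n, a))
    case (JsonArray(n), JsonArray(o)) => canRead(n, o)
    case (JsonObject(newFields), JsonObject(oldFields)) =>
      newFields.flatMap { field =>
        oldFields.find(_.name == field.name) match {
          case Some(oldField) =>
            canRead(field.schema, oldField.schema)
              .map(issue => issue.copy(path = field.name :: issue.path))
          // an absent field is fine if the new reader accepts null
          case None if canRead(field.schema, JsonNull).isEmpty => Nil
          case None =>
            List(CompatibilityIssue(List(field.name), "non-optional field missing from old schema"))
        }
      }
    case (n, o) => List(CompatibilityIssue(Nil, s"cannot read $o as $n"))
  }
```

Note the direction: `canRead` doesn’t ask whether old clients survive new data; it asks whether the new schema can read everything the old one could produce, which is exactly the question the unit tests further down ask.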
Imagine the team workflow we could then enable:
- Every time you make a new backend release, you dump your JSON schema into a file and check it into git.
- On CI, every backend PR checks that it hasn’t broken compatibility with the checked-in versions (see the sketch after this list).
- Whenever a client version is no longer in use, the backend can delete the checked-in schema files that no longer apply.
- Bonus points if the client code is auto-generated from the schema: updating then means bumping a dependency and doing most of the work by fixing the lines where the TypeScript compiler complains.
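The CI step could then be a small program along these lines; `parseSchema` and `canRead` come from the earlier sketch, and `currentBackendSchema` is a stand-in for however the backend derives its schema:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters.*

// Stand-in: in a real setup this would be derived from the backend's endpoints.
def currentBackendSchema(): JsonSchema = ???

// Fail the build if the current schema cannot read data written under any
// schema version still checked in under schemas/.
@main def checkCompatibility(): Unit =
  val current = currentBackendSchema()
  val failures = Files.list(Path.of("schemas")).iterator().asScala.toList.flatMap { file =>
    parseSchema(Files.readString(file)) match {
      case Left(error)      => List(s"$file: could not parse schema: $error")
      case Right(oldSchema) => canRead(current, oldSchema).map(issue => s"$file: $issue")
    }
  }
  if failures.nonEmpty then
    failures.foreach(System.err.println)
    sys.exit(1)
```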
Fuelled by all of this dreaming, a while ago I started working on my own JSON library for Scala. Scala is a good candidate because JSON libraries there usually already have a type that’s separate from the data type: something like `JsonWriter[T]` and `JsonReader[T]` for any `T` you want to convert to/from JSON. So I just had to conceptually upgrade this type to a `JsonDescriptor[T]`.
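Conceptually the upgrade looks something like this (a sketch of the idea, not the library’s actual interface):

```scala
// A descriptor both converts values and carries their schema as a value.
trait JsonDescriptor[T] {
  def schema: JsonSchema                    // the new, first-class part
  def write(value: T): String               // what JsonWriter[T] did
  def read(json: String): Either[String, T] // what JsonReader[T] did
}
```

Once `schema` is available as a value, compatibility checks, documentation, and code generation all become ordinary functions over `JsonSchema`.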
I did mention that none of the ideas here are specific to any particular programming language. You can build all of this in your language of choice. And so I recently ported the Scala library to tenecs (my programming language). You can find the code here. As a showcase of the functionality, here are two unit tests. The first asserts that a schema with an added non-optional field cannot read data written under the old schema:
_ := UnitTest(
  "canRead fails for object with non-optional new field",
  (testkit) => {
    hasExpectedIssue := (issues: List<CompatibilityIssue>): Void => {
      testkit.assert.equal(issues->length(), 1)
    }
    newSchema := JsonObject(<JsonObjectField>[
      JsonObjectField("field1", JsonBoolean()),
      JsonObjectField("field2", JsonBoolean())
    ])
    oldSchema := JsonObject(<JsonObjectField>[
      JsonObjectField("field1", JsonBoolean())
    ])
    hasExpectedIssue(newSchema->canRead(oldSchema))
  }
)
And the second asserts that making the new field optional (a `JsonOneOf` of boolean and null) keeps the change compatible:

_ := UnitTest(
  "canRead object with missing field as optional",
  (testkit) => {
    noExpectedIssues := (issues: List<CompatibilityIssue>): Void => {
      testkit.assert.equal(issues, <CompatibilityIssue>[])
    }
    newSchema := JsonObject(<JsonObjectField>[
      JsonObjectField("field1", JsonBoolean()),
      JsonObjectField("field2", JsonOneOf(<JsonSchema>[JsonBoolean(), JsonNull()]))
    ])
    oldSchema := JsonObject(<JsonObjectField>[
      JsonObjectField("field1", JsonBoolean())
    ])
    noExpectedIssues(newSchema->canRead(oldSchema))
  }
)
The library also supports enums (specific string values), which have their own compatibility rules relative to plain strings.
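In the terms of the earlier Scala sketch, the natural rules (an assumption on my part, roughly how I’d expect them to work) are that every enum value is a valid string, but not every string is a valid enum value:

```scala
// Assumed shape: an enum is a closed set of string values.
// (In the earlier sketch this would live alongside the other JsonSchema cases.)
final case class JsonEnum(values: Set[String]) extends JsonSchema

// Assumed enum rules, layered on top of the earlier canRead:
def canReadEnum(next: JsonSchema, old: JsonSchema): List[CompatibilityIssue] =
  (next, old) match {
    // a string reader accepts any enum, since every enum value is a string
    case (JsonString, JsonEnum(_)) => Nil
    // an enum reader must know every value the old schema could produce
    case (JsonEnum(newValues), JsonEnum(oldValues)) =>
      if (oldValues.subsetOf(newValues)) Nil
      else List(CompatibilityIssue(Nil, "old schema allows enum values unknown to the new reader"))
    // an enum reader cannot accept arbitrary strings
    case (JsonEnum(_), JsonString) =>
      List(CompatibilityIssue(Nil, "cannot read arbitrary strings as an enum"))
    case _ => canRead(next, old)
  }
```

What I have on my radar to dive into at some point: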
- Generating documentation
- Supporting regex
- Generating client code
But that’s enough thinking about JSON for today.