The Amazing Peter Parser
This post is meant as an introduction to parser combinators. I’ve recently built a parser combinator library for tenecs (my programming language), called peter_parser. By the end of this article hopefully you’ll also be understand the concepts enough to build such a library yourself.
A parser, conceptually, allows us to convert something into something else, if possible.
Too abstract? Let’s get into examples:
Parsing a Date from a String
Parsing a data structure from a JSON
Parsing a number from a String
If we restrict our input to Strings, we can define our parser type as:
struct Parser<ThingWeWant>(
parse: (input: String) ~> ThingWeWant | ParseError
)
struct ParseError(message: String)
It’s a structure that contains a `parse` function. The `parse` function allows us to take a String and either get the ThingWeWant or a ParseError, when that’s not possible.
But we don’t have to restrict our input to a String. Some JSON libraries parse the JSON String into a data structure representing the JSON values and then allow you to parse from that data structure into your intended target data structure. For that we need to generalise the parser’s input.
struct Parser<In, T>(
parse: (input: In) ~> T | ParseError
)
struct ParseError(message: String)
Now we’d be able to write that code.
struct Color(red: Int, green: Int, blue: Int)
jsonParser: Parser<String, Json> = functionYouWouldImportFromALibrary()
colorFromJsonParser: Parser<Json, Color> = functionYouWouldImplement()
colorFromStringParser: Parser<String, Color> = Parser(
parse = (input: String): Color | ParseError => {
json: Json ?= jsonParser.parse(input)
colorFromJsonParser.parse(json)
}
)
So parsers can be sequentially combined. They are just a function, after all. But that’s not what parser combinators are about. Let’s get into that.
When you’re trying to parse something, there’s often smaller parts you isolate and tackle in order to build it up. For instance, when parsing a date “2025-12-25”, you need to parse a number three times. So perhaps it’s possible to build a parser out of smaller parsers.
For that, we need a slightly more sophisticated notion of a parser. We need to know the remaining of the input, so we can feed it into the next parser.
struct Parser<In, T>(
parse: (input: In) ~> ParseSuccess<In, T> | ParseError
)
struct ParseSuccess<In, T>(parsed: T, remaining: In)
struct ParseError(message: String)
And now, if we implement a parser for a number and a parser for the dash, all we’re missing is the parser combinators to turn it into a date parser. Assuming they exist, you can do something like:
struct Date(year: Int, month: Int, day: Int)
intParser: Parser<String, Int> = assumeThisIsImplemented()
dashParser: Parser<String, Void> = thisOneIsTrivialToImplement()
emptyStringParser: Parser<String, Void> = alsoTrivialToImplement()
dateParser: Parser<String, Date> =
intParser
->andThenIgnore(dashParser)
->andThen(intParser, (year, month) => {
Date(year, month, 0)
})
->andThenIgnore(dashParser)
->andThen(intParser, (date, day) => {
Date(date.year, date.month, day)
})
->andThenIgnore(emptyStringParser) /// make sure no remaining chars
The validation is missing for what constitutes a valid month number, for example, but that’s not a lot of work to add.
And this is what parser combinators are all about. Building smaller parsers that are easy to reason about and then combining them into something more complex. Some examples of combinators my library supports:
`andThen`: Combine two parsers sequentially
`ignoreAndThen`: Run two parsers but ignore the first result
`andThenIgnore`: Run two parsers but ignore the second result
`or`: Try one parser, fall back to another
`zeroOrMoreTimes`: Parse zero or more occurrences
`oneOrMoreTimes`: Parse one or more occurrences
`optional`: Make a parser optional
`separatedBy`: Parse items separated by a delimiter
You can use these to build more complex parsers, such as a CSV parser or a JSON parser. Break down your problem into really small ones and find the appropriate combinators. You might have fun treating parsers like legos.