Wherein our lowly protagonist gets fed up with the state of software development interviews, and creates a new programming language which is particularly well-suited to implementing FizzBuzz (with the help of Python and RPLY).
I’m sure almost everyone’s heard of FizzBuzz, but if you haven’t, it’s is a mythical interview question to see how well you can write code in an environment in which you will never actually write code.
The goal is to sufficiently intimidate junior developers so they will know to steer clear of your organization. It also has the nice effect of offending senior developers as well, so it’s great for quickly narrowing down your hiring pipeline!
The general idea is as follows:
Write a program that prints the numbers
n, however for every number divisible by
'fizz'and for every number divisible by
'buzz'. If a number is divisible by both
We don’t use FizzBuzz or anything like it when we screen potential new hires at PromptWorks, and I’ve never actually been asked to implement it. But, if you gave me this question in an interview, after my initial shock wore off, I might have asked, “Can I use Python?” And if you said yes, I might have written something like this:
1 2 3 4 5 6 7 8 9 10 def fizzbuzz(n): for i in range(1, n+1): if i % 15 == 0: print 'fizzbuzz' elif i % 3 == 0: print 'fizz' elif i % 5 == 0: print 'buzz' else: print i
Which would technically produce the correct output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 >>> fizzbuzz(15) 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz
But that’s no fun!
Plus, it took way too long to write. Which is why I created my own programming language, called DIVSPL (Dustin Ingram’s Very Special Programming Language), which is particularly well-suited for implementing FizzBuzz, quickly! And in this article, I’m going to walk you through each step in building DIVSPL from scratch.
Let’s design a language
Here’s what I propose as an example syntax of DIVSPL, which just happens to correspond to an implementation of the FizzBuzz problem mentioned above:
1 2 3 1...15 fizz=3 buzz=5
This allows us to express two things in the language: a range of numbers, and some form of variable assignment.
Let’s make a lexer
First things first: we need to build a lexer. The lexer takes a stream of characters as input, and outputs a list (or an iterator) of tokens from the language. This is referred to as “tokenization.” When using RPLY, tokens are represented as a named regular expression. In DIVSPL, we’ll have a total of four tokens:
ELLIPSIS, which is literally three periods in a row:
NUMBER, which is one or more numeric characters in a row, such as
EQUALS, which is just the equals character
WORD, which is one or more lowercase alphabetic characters, such as
Also, because we’re cool, we’ll ignore all whitespace characters, and because
we’re good developers, we’ll allow for comments, which involves ignoring
everything between a
# symbol and a newline character
Here’s how we create our
LexerGenerator as described:
1 2 3 4 5 6 7 8 from rply import LexerGenerator lg = LexerGenerator() lg.add("ELLIPSIS", r"\.\.\.") lg.add("NUMBER", r"\d+") lg.add("EQUALS", r"=") lg.add("WORD", r"[a-z]+") lg.ignore(r"\s+") # Ignore whitespace lg.ignore(r"#.*\n") # Ignore comments
And here’s how we build the lexer from the lexer generator:
1 lexer = lg.build()
We can use the lexer as an iterator: when given a stream of characters, it gives us a new token in our “language” every time we call it:
1 2 3 4 5 6 7 8 9 >>> iterator = lexer.lex('...foo42hut=') >>> iterator.next() Token('ELLIPSIS', '...') >>> iterator.next() Token('WORD', 'foo') >>> iterator.next() Token('NUMBER', '42') >>> iterator.next() Token('WORD', 'hut')
If our stream has invalid tokens, it works as expected, until we get to a token the lexer doesn’t understand:
1 2 3 4 5 6 7 8 9 >>> iterator = lexer.lex('foobar!!!') >>> iterator.next() Token('WORD', 'foobar') >>> iterator.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "lexer.py", line 53, in next raise LexingError(...) rply.errors.LexingError
Let’s make a parser
Specifically, we’re going to make a Look-Ahead Left-Right parser. The parser will take the token stream from the lexer and separate and analyze the tokens according to a set of production rules specified by a formal grammar.
Let’s define our grammar
Using only the tokens in our language (
WORD), we’ll now define a formal grammar as a Context Free Grammar (CFG),
which is a series of rules called “productions” which describe how to combine
the tokens of the language into a syntax, thus describing all possible strings
in the language.
For example, if we simplified the English language to only the tokens
VERB (which would match all articles, nouns, and verbs
respectively), we might create a production rule called
sentence as such:
This would match the phrase, “The boy runs,” but not the phrase, “The boy runs quickly.”
Productions can also refer to other previously defined productions, or
themselves. To support adverbs, we could add the token
ADVERB and the
And now the phrase, “The boy runs quickly,” would be recognized by our grammar as a syntactically correct sentence.
Now, we can define the grammar for DIVSPL as follows:
Here, the \(\epsilon\) character represents an epsilon production, which
matches on an empty string, and, in combination with the previous production,
assignments to match anywhere from zero to an infinite number of
The \(main\) rule represents an entire valid DIVSPL program, and evaluating it will produce the output of our entire program.
Let’s play with boxes
To build this grammar with RPLY, we’ll create what is called a “box” in RPLY to represent the return value for each of these expressions.
“Boxes” are just Python classes which extend from
In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project.
Note: While we don’t necessarily need to use “boxes” here, if we want DIVSPL to be compatible with RPython as well, we’ll need to use them. There’s no particular reason why DIVSPL needs to be compatible with RPython, but since RPLY already is, we’ll follow suit and do the same. It does have the one advantage of being significantly faster that using Plain ol’ Python.
First up is the implementation of our \(number\) rule, the
1 2 3 4 5 6 class IntBox(BaseBox): def __init__(self, value): self.value = int(value.getstr()) def int(self): return self.value
IntBox takes a single value, and stores it as an integer. It has one
int, which returns that value.
The box for our next rule, \(word\), is similar:
1 2 3 4 5 6 class WordBox(BaseBox): def __init__(self, value): self.value = value.getstr() def str(self): return self.value
WordBox also takes one value, which it stores as a string, and
which is the return value for its
Next is the implementation for our \(range\) rule, the
1 2 3 4 5 6 7 8 9 10 from rply.token import BaseBox class RangeBox(BaseBox): def __init__(self, low, high): self.low = low self.high = high def range(self): return range(self.low.int(), self.high.int() + 1)
RangeBox takes two
high, and when evaluated,
returns an inclusive range:
1 2 3 4 5 >>> low = IntBox(Token('NUMBER', '1')) >>> high = IntBox(Token('NUMBER', '3')) >>> box = RangeBox(low, high) >>> box.range() [1, 2, 3]
Next is the “box” for our \(assignment\) (singular) rule,
which looks like this:
1 2 3 4 5 6 7 8 9 10 class AssignmentBox(BaseBox): def __init__(self, word, number): self.word = word self.number = number def eval_with(self, i): if not i % int(self.number.int()): return self.word.str() return ''
AssignmentBox has a quirky
eval_with function which takes an argument
i as a parameter. If
i is divisible by
self.number, it returns
self.word. Otherwise, it returns an empty string:
1 2 3 4 5 6 7 >>> word = WordBox(Token('WORD', 'foo')) >>> number = IntBox(Token('NUMBER', 7)) >>> box = AssignmentBox(word, number) >>> box.eval_with(42) 'foo' >>> box.eval_with(40) ''
Our \(assignments\) rule also needs a “box”:
1 2 3 4 5 6 7 8 9 class AssignmentsBox(BaseBox): def __init__(self, assignments=None, assignment=None): self.assignments = assignments self.assignment = assignment def list(self): if self.assignments: return self.assignments.list() + [self.assignment] return 
This “box” has a function
list which will return a list of assignments if
there are any, otherwise it returns an empty list.
Finally, we implement the “box” for our \(main\) rule,
MainBox, as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 class MainBox(BaseBox): def __init__(self, range_box, assignments): self.range_box = range_box self.assignments = assignments def eval(self): lines =  for i in self.range_box.range(): line = "" for assignment in self.assignments.list(): line += assignment.eval_with(i) lines.append(line or str(i)) return "\n".join(lines) + "\n"
(Note that we can’t use fancy list comprehensions here, because RPython prohibits it… 😞)
__init__ method of the
MainBox is straightforward enough, but that
eval function looks crazy. Let’s break it down:
- First, it creates an empty list,
- For every integer
iin the resulting range of
RangeBox, it creates an empty string,
- For every
AssignmentBoxin the list of assignments, it evaluates each one with
iand concatenates the results to
- If the
lineis not an empty string, append it to
lines, otherwise cast the integer
ito a string instead and append that.
- Finally, join the results with newlines and return.
Not sure what that does? You will soon see…
Let’s finally make a parser
Now, we’re ready to build our parser. First, we’ll tell the
what tokens are valid:
1 2 3 4 5 6 7 8 from rply import ParserGenerator pg = ParserGenerator([ "ELLIPSIS", "EQUALS", "NUMBER", "WORD" ])
Next, we’ll implement each of our rules from our grammar as “productions” using
@pg.production decorator. This decorator takes a string which corresponds
quite nicely to the rule in our grammar. For example, here’s the rule for
And here’s the production:
1 2 3 @pg.production("number: NUMBER") def number(p): return IntBox(p)
A production gets a single argument
p, which is a list of tokens which match
its rule. Here’s a nearly identical production for \(word\):
1 2 3 @pg.production("word: WORD") def word(p): return WordBox(p)
Next is our \(range\) rule, which is as follows:
And here’s the production:
1 2 3 @pg.production("range : number ELLIPSIS number") def range_op(p): return RangeBox(p, p)
For \(range\), we don’t care about
ELLIPSIS token (aside from the
fact that it’s there) so we only pass the two
number tokens to
Next, the \(assignment\) rule:
And its production:
1 2 3 @pg.production("assignment : word EQUALS number") def assignment_op(p): return AssignmentBox(p, p)
Again, we don’t care about the
EQUALS token, and only pass
number to the
The \(assignments\) rules are similar:
Here are their productions:
1 2 3 4 5 6 7 @pg.production("assignments : assignments assignment") def expr_assignments(p): return AssignmentsBox(p, p) @pg.production("assignments : ") def expr_empty_assignments(p): return AssignmentsBox()
The latter production takes advantage of the default arguments in
AssignmentsBox to eventually return an empty list.
Last is our \(main\) rule:
And its production:
1 2 3 @pg.production("main : range assignments") def main(p): return MainBox(p, p)
And we build the parser!
1 parser = pg.build()
Let’s make an interpreter
What does an interpreter do? In our case, not much:
1 2 3 4 5 6 7 def begin(argv): if len(argv) > 1: with open(argv, 'r') as f: result = parser.parse(lexer.lex(f.read())) os.write(1, result.eval()) else: os.write(1, "Please provide a filename.")
It is a command-line utility which reads in a file via a filename which is the
first argument. Then it hands the character stream to the lexer, which hands
the token stream to the parser. When the parser returns a result (which will be
MainBox), we evaluate it and print the result.
So now we have an interpreter, let’s write some code! As you might have
guessed, our DIVSPL language is really only good for one thing:
implementing FizzBuzz as quickly as possible. Let’s do it! Open up a file
fizzbuzz.divspl and write:
1 2 3 1...15 fizz=3 buzz=5
divspl interpreter, (which you can get via
pip install divspl in
case you weren’t following along), you can execute that file:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 $ divspl fizzbuzz.divspl 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz
Look at that! The fastest FizzBuzz in the West.
Let’s figure out what happened
When we defined our
fizzbuzz.divspl program, it looked like this:
1 2 3 1...15 fizz=3 buzz=5
When we passed the character stream to the lexer, it returned a list containing
Tokens which looked like this:
1 2 3 4 5 [ Token('NUMBER', '1'), Token('ELLIPSIS', '...'), Token('NUMBER', '15'), Token('WORD', 'fizz'), Token('EQUALS', '='), Token('NUMBER', '3'), Token('WORD', 'buzz'), Token('EQUALS', '='), Token('NUMBER', '5'), ]
Since the entire stream was exhausted and successfully turned into
the lexer decided everything is OK, and passed the list of tokens to the
parser, which turned it into a combination of “boxes,” like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 MainBox( RangeBox( IntBox(Token('NUMBER', '1')), IntBox(Token('NUMBER', '15')) ), AssignmentsBox( AssignmentsBox( AssignmentsBox(), AssignmentBox( WordBox(Token('WORD', 'fizz')), IntBox(Token('NUMBER', '3')) ) ), AssignmentBox( WordBox(Token('WORD', 'buzz')), IntBox(Token('NUMBER', '5')) ) ) )
The two assignment rules together made up an \(assignments\) rule, and that, preceded by a \(range\) rule, satisfied the \(main\) rule, making our program syntactically valid.
Since the parser was able to make the token stream satisfy the rules of our
grammar, it decides everything is OK and passes the
MainBox to our
Finally, the interpreter calls
.eval() on the result, and it returns the
Which, when printed to the console, gives us the FizzBuzz we know and love:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz
Let’s take it a step further
Not only is DIVSPL particularly well-suited to implementing FizzBuzz, it’s actually far better at implementing FizzBuzz than most modern languages.
Consider if we wanted to modify the standard FizzBuzz problem as follows:
- instead of printing integers up to
15, print up to
- whenever a number in the range is divisible by
7, we print
'fuzz'(and the resulting combinations of
In our original example in Python, we would not only need to add case for
'fuzz'), but also:
This would quickly become a huge pain, especially if we had to add any other cases. Luckily, doing the same in DIVSPL requires only one new line:
1 2 3 4 1...105 fizz=3 buzz=5 fuzz=7
And it works perfectly:
1 2 3 4 5 6 7 8 9 10 11 $ divspl fizzbuzzfuzz.divspl ...(intermediate output omitted) 97 fuzz fizz buzz 101 fizz 103 104 fizzbuzzfuzz
Let’s see the source
DIVSPL is open source and available at https://github.com/di/divspl. Pull requests are welcome!
Thanks to Patrick Smith, Ray Zane, and Brian Duggan for reading drafts of this post. Also, thanks to @alex for creating RPLY, and to Ryan Gonzales for creating The Magic of RPython, which was invaluable in helping distinguish RPython from magic.🎩🐇✨