The Fastest Fizzbuzz in the West
Wherein our lowly protagonist gets fed up with the state of software development interviews, and creates a new programming language which is particularly well-suited to implementing FizzBuzz.
by Dustin Ingram
Wherein our lowly protagonist gets fed up with the state of software development interviews, and creates a new programming language which is particularly well-suited to implementing FizzBuzz (with the help of Python and RPLY).
I'm sure almost everyone's heard of FizzBuzz, but if you haven't, it's a mythical interview question for software engineers to see how well you can write code in an environment in which you will never actually write code.
The goal is to sufficiently intimidate junior developers so they will know to steer clear of your organization. It also has the nice effect of offending senior developers as well, so it's great for quickly narrowing down your hiring pipeline!
The general idea is as follows:
Write a program that prints the numbers
1
throughn
, however for every number divisible by3
, print'fizz'
and for every number divisible by5
, print'buzz'
. If a number is divisible by both3
and5
, print'fizzbuzz'
.
We don't use FizzBuzz or anything like it when we screen potential new hires at Promptworks, and I've never actually been asked to implement it. But, if you gave me this question in an interview, after my initial shock wore off, I might have asked, "Can I use Python?" And if you said yes, I might have written something like this:
def fizzbuzz(n):
for i in range(1, n+1):
if i % 15 == 0:
print 'fizzbuzz'
elif i % 3 == 0:
print 'fizz'
elif i % 5 == 0:
print 'buzz'
else:
print i
Which would technically produce the correct output:
>>> fizzbuzz(15)
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
But that's no fun!
Plus, it took way too long to write. Which is why I created my own programming language, called DIVSPL (Dustin Ingram's Very Special Programming Language), which is particularly well-suited for implementing FizzBuzz, quickly! And in this article, I'm going to walk you through each step in building DIVSPL from scratch.
We'll build DIVSPL with RPLY, an implementation of PLY, but with a "cooler" API, and compatible with RPython, a restricted subset of the Python programming language.
PLY is an implementation of two very old and battle tested Unix tools, Lex, which generates lexical analyzers, and Yacc, which is a parser generator.
Let's design a language
Here's what I propose as an example syntax of DIVSPL, which just happens to correspond to an implementation of the FizzBuzz problem mentioned above:
1...15
fizz=3
buzz=5
This allows us to express two things in the language: a range of numbers, and some form of variable assignment.
Let's make a lexer
First things first: we need to build a lexer.
The lexer takes a stream of characters as input, and outputs a list (or an iterator) of tokens from the language. This is referred to as "tokenization." When using RPLY, tokens are represented as a named regular expression. In DIVSPL, we'll have a total of four tokens:
ELLIPSIS
, which is literally three periods in a row:...
;NUMBER
, which is one or more numeric characters in a row, such as42
;EQUALS
, which is just the equals character=
; andWORD
, which is one or more lowercase alphabetic characters, such asfoo
.
Also, because we're cool, we'll ignore all whitespace characters, and because
we're good developers, we'll allow for comments, which involves ignoring
everything between a #
symbol and a newline character \n
.
Here's how we create our LexerGenerator
as described:
from rply import LexerGenerator
lg = LexerGenerator()
lg.add("ELLIPSIS", r"\.\.\.")
lg.add("NUMBER", r"\d+")
lg.add("EQUALS", r"=")
lg.add("WORD", r"[a-z]+")
lg.ignore(r"\s+") # Ignore whitespace
lg.ignore(r"#.*\n") # Ignore comments
And here's how we build the lexer from the lexer generator:
lexer = lg.build()
We can use the lexer as an iterator: when given a stream of characters, it gives us a new token in our "language" every time we call it:
>>> iterator = lexer.lex('...foo42hut=')
>>> iterator.next()
Token('ELLIPSIS', '...')
>>> iterator.next()
Token('WORD', 'foo')
>>> iterator.next()
Token('NUMBER', '42')
>>> iterator.next()
Token('WORD', 'hut')
If our stream has invalid tokens, it works as expected, until we get to a token the lexer doesn't understand:
>>> iterator = lexer.lex('foobar!!!')
>>> iterator.next()
Token('WORD', 'foobar')
>>> iterator.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lexer.py", line 53, in next
raise LexingError(...)
rply.errors.LexingError
Let's make a parser
Specifically, we're going to make a Look-Ahead Left-Right parser. The parser will take the token stream from the lexer and separate and analyze the tokens according to a set of production rules specified by a formal grammar.
Let's define our grammar
Using only the tokens in our language (ELLIPSIS
, NUMBER
, EQUALS
, and
WORD
), we'll now define a formal grammar as a Context Free Grammar (CFG),
which is a series of rules called "productions" which describe how to combine
the tokens of the language into a syntax, thus describing all possible strings
in the language.
For example, if we simplified the English language to only the tokens
ARTICLE
, NOUN
, and VERB
(which would match all articles, nouns, and verbs
respectively), we might create a production rule called sentence
as such:
This would match the phrase, "The boy runs," but not the phrase, "The boy runs quickly."
Productions can also refer to other previously defined productions, or
themselves. To support adverbs, we could add the token ADVERB
and the
following rule:
And now the phrase, "The boy runs quickly," would be recognized by our grammar as a syntactically correct sentence.
Now, we can define the grammar for DIVSPL as follows:
Here, the \(\epsilon\) character represents an epsilon production, which
matches on an empty string, and, in combination with the previous production,
allows assignments
to match anywhere from zero to an infinite number of
assignment
rules.
The \(main\) rule represents an entire valid DIVSPL program, and evaluating it will produce the output of our entire program.
Let's play with boxes
To build this grammar with RPLY, we'll create what is called a "box" in RPLY to represent the return value for each of these expressions.
"Boxes" are just Python classes which extend from rply.token.BaseBox
. From
https://github.com/alex/rply#why-do-we-have-the-boxes:
In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project.
Note: While we don't necessarily need to use "boxes" here, if we want DIVSPL to be compatible with RPython as well, we'll need to use them. There's no particular reason why DIVSPL needs to be compatible with RPython, but since RPLY already is, we'll follow suit and do the same. It does have the one advantage of being significantly faster that using Plain ol' Python.
First up is the implementation of our \(number\) rule, the IntBox
:
class IntBox(BaseBox):
def __init__(self, value):
self.value = int(value.getstr())
def int(self):
return self.value
The IntBox
takes a single value, and stores it as an integer. It has one
function, int
, which returns that value.
The box for our next rule, \(word\), is similar:
class WordBox(BaseBox):
def __init__(self, value):
self.value = value.getstr()
def str(self):
return self.value
Here, the WordBox
also takes one value, which it stores as a string, and
which is the return value for its str
function.
Next is the implementation for our \(range\) rule, the RangeBox
:
from rply.token import BaseBox
class RangeBox(BaseBox):
def __init__(self, low, high):
self.low = low
self.high = high
def range(self):
return range(self.low.int(), self.high.int() + 1)
A RangeBox
takes two IntBox
es, low
and high
, and when evaluated,
returns an inclusive range:
>>> low = IntBox(Token('NUMBER', '1'))
>>> high = IntBox(Token('NUMBER', '3'))
>>> box = RangeBox(low, high)
>>> box.range()
[1, 2, 3]
Next is the "box" for our \(assignment\) (singular) rule, AssignmentBox
,
which looks like this:
class AssignmentBox(BaseBox):
def __init__(self, word, number):
self.word = word
self.number = number
def eval_with(self, i):
if not i % int(self.number.int()):
return self.word.str()
return ''
The AssignmentBox
has a quirky eval_with
function which takes an argument
i
as a parameter. If i
is divisible by self.number
, it returns
self.word
. Otherwise, it returns an empty string:
>>> word = WordBox(Token('WORD', 'foo'))
>>> number = IntBox(Token('NUMBER', 7))
>>> box = AssignmentBox(word, number)
>>> box.eval_with(42)
'foo'
>>> box.eval_with(40)
''
Our \(assignments\) rule also needs a "box":
class AssignmentsBox(BaseBox):
def __init__(self, assignments=None, assignment=None):
self.assignments = assignments
self.assignment = assignment
def list(self):
if self.assignments:
return self.assignments.list() + [self.assignment]
return []
This "box" has a function list
which will return a list of assignments if
there are any, otherwise it returns an empty list.
Finally, we implement the "box" for our \(main\) rule, MainBox
, as follows:
class MainBox(BaseBox):
def __init__(self, range_box, assignments):
self.range_box = range_box
self.assignments = assignments
def eval(self):
lines = []
for i in self.range_box.range():
line = ""
for assignment in self.assignments.list():
line += assignment.eval_with(i)
lines.append(line or str(i))
return "\n".join(lines) + "\n"
(Note that we can't use fancy list comprehensions here, because RPython prohibits it... 😞)
The __init__
method of the MainBox
is straightforward enough, but that
eval
function looks crazy. Let's break it down:
- First, it creates an empty list,
lines
. - For every integer
i
in the resulting range ofRangeBox
, it creates an empty string,line
. - For every
AssignmentBox
in the list of assignments, it evaluates each one withi
and concatenates the results toline
. - If the
line
is not an empty string, append it tolines
, otherwise cast the integeri
to a string instead and append that. - Finally, join the results with newlines and return.
Let's finally make a parser
Now, we're ready to build our parser. First, we'll tell the ParserGenerator
what tokens are valid:
from rply import ParserGenerator
pg = ParserGenerator([
"ELLIPSIS",
"EQUALS",
"NUMBER",
"WORD"
])
Next, we'll implement each of our rules from our grammar as "productions" using
the @pg.production
decorator. This decorator takes a string which corresponds
quite nicely to the rule in our grammar. For example, here's the rule for
\(number\):
And here's the production:
@pg.production("number: NUMBER")
def number(p):
return IntBox(p[0])
A production gets a single argument p
, which is a list of tokens which match
its rule. Here's a nearly identical production for \(word\):
@pg.production("word: WORD")
def word(p):
return WordBox(p[0])
Next is our \(range\) rule, which is as follows:
And here's the production:
@pg.production("range : number ELLIPSIS number")
def range_op(p):
return RangeBox(p[0], p[2])
For \(range\), we don't care about ELLIPSIS
token (aside from the
fact that it's there) so we only pass the two number
tokens to RangeBox
(as
integers).
Next, the \(assignment\) rule:
And its production:
@pg.production("assignment : word EQUALS number")
def assignment_op(p):
return AssignmentBox(p[0], p[2])
Again, we don't care about the EQUALS
token, and only pass word
and
number
to the AssignmentBox
.
The \(assignments\) rules are similar:
Here are their productions:
@pg.production("assignments : assignments assignment")
def expr_assignments(p):
return AssignmentsBox(p[0], p[1])
@pg.production("assignments : ")
def expr_empty_assignments(p):
return AssignmentsBox()
The latter production takes advantage of the default arguments in
AssignmentsBox
to eventually return an empty list.
Last is our \(main\) rule:
And its production:
@pg.production("main : range assignments")
def main(p):
return MainBox(p[0], p[1])
And we build the parser!
parser = pg.build()
Let's make an interpreter
What does an interpreter do? In our case, not much:
def begin(argv):
if len(argv) > 1:
with open(argv[1], 'r') as f:
result = parser.parse(lexer.lex(f.read()))
os.write(1, result.eval())
else:
os.write(1, "Please provide a filename.")
It is a command-line utility which reads in a file via a filename which is the
first argument. Then it hands the character stream to the lexer, which hands
the token stream to the parser. When the parser returns a result (which will be
a MainBox
), we evaluate it and print the result.
Let's code
So now we have an interpreter, let's write some code!
As you might have guessed, our DIVSPL language is really only good for one thing:
implementing FizzBuzz as quickly as possible. Open up a file
fizzbuzz.divspl
and write:
1...15
fizz=3
buzz=5
With the divspl
interpreter, (which you can get via pip install divspl
in
case you weren't following along), you can execute that file:
$ divspl fizzbuzz.divspl
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
Look at that! The fastest FizzBuzz in the West.
Let's figure out what happened
When we defined our fizzbuzz.divspl
program, it looked like this:
1...15
fizz=3
buzz=5
When we passed the character stream to the lexer, it returned a list containing
Token
s which looked like this:
[
Token('NUMBER', '1'), Token('ELLIPSIS', '...'), Token('NUMBER', '15'),
Token('WORD', 'fizz'), Token('EQUALS', '='), Token('NUMBER', '3'),
Token('WORD', 'buzz'), Token('EQUALS', '='), Token('NUMBER', '5'),
]
Since the entire stream was exhausted and successfully turned into Token
s,
the lexer decided everything is OK, and passed the list of tokens to the
parser, which turned it into a combination of "boxes," like so:
MainBox(
RangeBox(
IntBox(Token('NUMBER', '1')),
IntBox(Token('NUMBER', '15'))
),
AssignmentsBox(
AssignmentsBox(
AssignmentsBox(),
AssignmentBox(
WordBox(Token('WORD', 'fizz')),
IntBox(Token('NUMBER', '3'))
)
),
AssignmentBox(
WordBox(Token('WORD', 'buzz')),
IntBox(Token('NUMBER', '5'))
)
)
)
The two assignment rules together made up an \(assignments\) rule, and that, preceded by a \(range\) rule, satisfied the \(main\) rule, making our program syntactically valid.
Since the parser was able to make the token stream satisfy the rules of our
grammar, it decides everything is OK and passes the MainBox
to our
interpreter.
Finally, the interpreter calls .eval()
on the result, and it returns the
following string:
"1\n2\nfizz\n4\nbuzz\nfizz\n7\n8\nfizz\nbuzz\n11\nfizz\n13\n14\nfizzbuzz\n"
Which, when printed to the console, gives us the FizzBuzz we know and love:
1
2
fizz
4
buzz
fizz
7
8
fizz
buzz
11
fizz
13
14
fizzbuzz
Let's take it a step further
Not only is DIVSPL particularly well-suited to implementing FizzBuzz, it's actually far better at implementing FizzBuzz than most modern languages.
Consider if we wanted to modify the standard FizzBuzz problem as follows:
- instead of printing integers up to
15
, print up to105
; - whenever a number in the range is divisible by
7
, we print'fuzz'
(and the resulting combinations of'fuzz'
with'fizz'
and'buzz'
).
In our original example in Python, we would not only need to add case for
i%7
(for 'fuzz'
), but also:
i%21
(for'fizzfuzz'
);i%35
(for'buzzfuzz'
); andi%105
(for'fizzbuzzfuzz'
).
This would quickly become a huge pain, especially if we had to add any other cases. Luckily, doing the same in DIVSPL requires only one new line:
1...105
fizz=3
buzz=5
fuzz=7
And it works perfectly:
$ divspl fizzbuzzfuzz.divspl
...(intermediate output omitted)
97
fuzz
fizz
buzz
101
fizz
103
104
fizzbuzzfuzz
Let's see the source
DIVSPL is open source technology and available at https://github.com/di/divspl. Pull requests are welcome!
Thanks to Patrick Smith, Ray Zane, and Brian Duggan for reading drafts of this post. Also, thanks to @alex for creating RPLY, and to Ryan Gonzales for creating The Magic of RPython, which was invaluable in helping distinguish RPython from magic.🎩🐇✨