Sunday, August 28, 2016

Solving Phone Number with Elixir and a lexer

With Elixir Conf less than a week away, it's time to prepare! And James Gray's email about his Elixir training couldn't be any more timely, especially the part about being comfortable solving problems on First, let me say that I had not heard of this site prior and wow is it fantastic and a great way to jump into a new language!

After solving the Hello World exercism I moved on to Phone Number. Phone Number is fairly simple, or so I thought. The basic idea is to take a string and parse out the phone number. If the input is invalid, return zeroes. Since Exercism defines all the tests, not only was I able to TDD the solution, but it was very clear what was expected.

My first cut explored using binary pattern matching to extract the number. That got me so far, but I had trouble handling the error conditions and non digit characters. So I looked into using regular expressions. In particular, could I use a regex to pattern match method arguments. Turns out, no you can't. However on my quest I ran across a great post on using regex's in Elixir and using them to route behavior. This post became the basis for my second approach. The idea was to transform each character of the input into a tuple that identified what it was based on a regular expression.

However, when I reached the test for treating input that included letters as invalid, I was stuck.

My solution was not setup to handle it and I wanted a way to exit early if the data was identified as bad. So I looked around for more ideas and found myself rereading the first post I found about regular expressions, except this time I read the whole thread, including the part about lexers.

Yes, compiling the regexps with a tool like leex[*] will generally produce a much faster program for doing this type of thing because:
It will only make one pass over the string you are testing irrespective of how many regexps you are testing it against. The other alternatives here will test your string against each possible regexp one at a time, this irrespective of whether they are hard-wired in a cond or defined in some nicer way.
That the leex version can do this partially depends on the regexps it allows are more restricted, amongst other things they never need backtracking which Perl and PCRE regexps may need.
For this type of usage we don't have to actually generate a "token" as such just some tag indicating what we found.
It is honestly quite easy to write a tool which generates an input file for leex from a set of regexps and return values. It is compile time but it is not that difficult to fix it so you could handle changing the regexp set dynamically and recompile your "scanner", though you wouldn't want to do it too often.[**]
[*] Leex is based on the same principles as other scanner generating tools like lex and flex. Where did we get the name from? It leaks tokens. :slight_smile:
[**] There are programs which handle configuration data in this way, they dynamically compile a new config module containing the config data instead of keeping it in a database. Quite cool actually.
So I did my homework on lexers and leex  (helpful, and also helpful) and wrote my own lexer grammar to tokenize the raw input.

My goal was to identify characters the problem said were valid. This is important, because I initially identified characters that were invalid also. Not only was writing the regex to define illegal characters ugly, but more importantly if a lexer fails it returns a beautiful { :error } message so now I could route to an overloaded method that knew how to handle { :error } and another one that knew how to handle { :ok } messages. This was pivotal in cleaning up my solution. Not only did I now have all of my digits being tagged as :int and return, but I also immediately knew if invalid input was being passed in.

All that remained was inspecting the length and doing a little formatting:

Looking back I believe I could also handle the length in the lexer definition as well, but I'll save that for another day. For now I'm going to find another problem on to solve!!