Let's write our own language.

No, on second thought, let's not.

One of the classic decisions any business has to grapple with is make-or-buy. From the smallest private enterprise in the consumer beverage space, to large government agencies in outer space, figuring out whether it makes more sense to roll your own or let somebody else do it for you is a decision that stares you in the face every day.

The amazing thing is that when it comes to scripting languages, the decision is no longer make-or-buy, but make-or-download. You used to have to compare the cost of building something yourself to the cost of purchasing somebody else's product and using it in your own. Now you just have to compare it to the cost of using something that somebody else has already built and is willing to give you for free. And people still get it wrong.

Small languages

Back when I was just getting into programming (the mid-1970s), one of the current rages was small languages, sometimes known as domain-specific languages. The idea was that you could make a certain set of tasks easier by designing a language specifically suited to those kinds of tasks. By that time, tools like yacc and lex had come on the scene, making it (somewhat) easier to implement custom languages. People were coming out of computer science programs where they had learned about grammars and compilers, and were eager to put that knowledge to good use. So, they started inventing languages and inflicting them on the world.

Now, I'm not saying that inventing new languages is necessarily a bad thing. It is only by trying new things that we make progress. Had people not invented new languages, we would all still be writing Fortran, Cobol, and Lisp (not to mention assembler). Surely the progress we've made since then is a good thing.

A classic (and exceptionally successful) example is regular expressions. When I write:

\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})]

I'm really writing a program which gets compiled and run on a regex virtual machine. In theory, I could have written this in a general-purpose language. Maybe something like:

pattern = literal('[')
pattern += named_group('accept_date', digits(2)
                                    + literal('/')
                                    + word(3)
                                    + literal('/')
                                    + digits(4)
                                    + optional_group(......

but that's so horrible, I don't even want to finish typing it out. Regex is a big win here. There is a (somewhat steep) learning curve to using the language, but once you've mastered it, you've got a powerful tool in your kit. A tool that can make quick work of otherwise messy problems.
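For comparison, here is roughly what using the regex version looks like. This is only a minimal sketch in Python, whose re module accepts the (?P<name>...) named-group syntax used above. The sample log line is made up for illustration; the only thing taken from the text above is the pattern itself.

```python
import re

# The pattern from above, unchanged. The trailing "]" needs no escape
# because it is not closing a character class.
ACCEPT_DATE = re.compile(
    r"\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})]"
)

# Hypothetical log line, invented for this example.
line = "10.0.0.1:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1"

match = ACCEPT_DATE.search(line)
if match:
    # The named group gives back just the timestamp, without the brackets.
    print(match.group("accept_date"))
```

One line of pattern and one call to search() stand in for the whole chain of literal() and digits() calls sketched above.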

But, by the same token, we (by which I mean the software industry as a whole) seem to have this insatiable desire to keep inventing new languages for no good reason. Generally, they all start out the same way. There's some specific problem domain you're working in and need a scripting language. You look around at what's available off the shelf, and one by one cross each one off the list. This one is not expressive enough. That one is not embeddable enough (or extendable enough). Another is saddled with onerous licensing restrictions. Too slow. Needs too much memory. Not portable enough. The reasons are endless. Once you're down to no viable candidates, the only logical next step is to start scribbling a grammar on the whiteboard.

You're not insane (at least not obviously so). You know that writing a full-fledged general purpose programming language is an absurdly difficult and out-of-scope project. But that's not your goal anyway. Your goal is to write a small language. Just enough to get done what you need to do, and no more. You knock off the basic grammar in an afternoon. By the end of the week, you've got something up and running. It's small. It's deliberately limited in scope. It's exactly what you need to solve today's problem.

But then somebody says, "Wouldn't it be great if we had floating point?" And then somebody else needs "if" statements. And subroutines. Exceptions. Classes. Unicode support. Networking. The list is endless. At first, you resist, insisting this wasn't supposed to be a general-purpose programming language. But the requests continue. Eventually, some request turns into a demand from somebody you can't refuse, and you're sliding headlong down the slippery slope.

If you're going to invent, at least invent something new.

If you've worked with me in the past, you've probably heard this rant before. That's because I've lived through this hell before. Multiple times. To avoid embarrassing anybody (and/or legal entanglements), I'm not going to mention any names, but here are a few real-life examples.

I worked on one project that had an embedded scripting language. It was almost Perl. It looked so much like Perl, I didn't even realize at first that it wasn't Perl. I suppose the people who invented this figured being almost like Perl was a good thing, since most people knew Perl and that meant that most people would already almost know this new language. Wrong. What it really meant was everybody just assumed it was Perl, wrote in it as if it were Perl, and then had to figure out from the cryptic error messages what they had done wrong. Had they simply embedded a real Perl interpreter, they would have saved themselves a lot of implementation effort and ended up with a much more powerful system.
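To give a feel for how little implementation effort embedding can take, here is a minimal sketch. It uses Python rather than the language of that project, and everything in it (the Thermostat class, run_user_script, the script itself) is invented for illustration, not taken from any real system. It also assumes the scripts are trusted, since exec() does no sandboxing.

```python
# Hypothetical host application; every name here is made up for the example.
class Thermostat:
    """Stand-in for the application object that scripts are allowed to drive."""

    def __init__(self):
        self.target = 20.0

    def set_target(self, degrees):
        self.target = float(degrees)


def run_user_script(source, thermostat):
    """Run a user-supplied script with the host object exposed as a global.

    Assumes trusted scripts: exec() provides no sandboxing whatsoever.
    """
    namespace = {"thermostat": thermostat}
    exec(compile(source, "<user script>", "exec"), namespace)
    return namespace


# A user script. Floating point, "if" statements, and subroutines
# all come with the embedded language; nobody had to implement them.
user_script = """
def clamp(value, low, high):
    return max(low, min(high, value))

if thermostat.target < 21.5:
    thermostat.set_target(clamp(thermostat.target + 2.5, 15, 25))
"""

device = Thermostat()
run_user_script(user_script, device)
print(device.target)  # 22.5
```

The host side is a handful of lines. The grammar, the parser, and the error messages are somebody else's problem.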

Currently, I'm building an application for a well-known consumer product. Development is done in a vendor-proprietary scripting language based on (of all things) Visual Basic. Why? They could have embedded Python. Or Ruby. Or Java. Or JavaScript. Or probably a few other reasonable candidates. I happen not to be a fan of some of those, but any would have made more sense than some home-grown Visual Basic clone. With any of those, you would have gotten (for free):

  • Tons of libraries to do all sorts of interesting things.
  • An entire developer ecosystem of debuggers, profilers, test frameworks, editor plugins, and so on.
  • A support network of people on mailing lists, Stack Overflow, and other forums, eager to help you with problems.

None of that exists with a home-grown language. Instead, what you get is something which is incomplete, an evolutionary dead-end, probably buggy, and you own it all (so if you don't continue development, nobody else will). Do yourself (and your users) a favor. When the urge to write your own language strikes, just say no.