Basis - Reinventing the world

July 06, 2018

Assembling a new home from many pieces.

When I decided to throw everything away and start from scratch, I was looking through available options for building a (mostly) static site. My requirements for the new site:

As a first step, I checked out the solutions my peers use. Andre is using Jekyll. Michael used to use Jekyll, but recently switched to Hugo. I'll refrain from citing specific reasons. Suffice it to say that a lot of swearing and phrases like "bit rot" were involved.

Jekyll is built on Ruby, uses Liquid as its (capable) templating engine, and dictates a very rigid directory structure. I'm not a big fan of Ruby (fight me), and while Jekyll lets you do a lot, I felt I'd probably get annoyed by its rigidity. Jekyll's documentation is fantastic though!

Hugo comes off as a more enterprise-y version of Jekyll. That also expresses itself in its documentation. There's a lot of it, including videos. Why people think videos are a great source of documentation I'll never understand. Hugo is built on Go and uses Go's not very expressive built-in templating engine. Just based on the overwhelming documentation, Hugo seemed too complex to me, so I abandoned it as well.

At this point I decided not to look any further, because ...

The human body contains 8 liters of blood. All of which can exit through a little tile wound.

How hard can it be?

The static site generator is going to be a command line application, so I need to parse command line arguments. A templating engine is needed to do more complex things like listing published blog posts. Finally, I need a driver that assembles all these pieces and packages them with a file system watcher. Bonus points for making that driver easy to embed in a web service that serves the static files and provides end points for dynamic content like comments.

I'm familiar with the JVM. It's proven technology for web backends. Its tooling is pretty good to superb. And while Kotlin or Scala would be fun to (re-)try in production, Java 8+ has enough creature comforts to ease most of my pain.

With the JVM as the target platform, I could surely find existing OSS components, duct-tape them together and call it a day. But where is the fun in that? All of the pieces will be hand-crafted, artisan products of love, with zero external dependencies except for what the Java standard library provides.

I like re-usable components, so each of the three pieces will go into a separate project. Since I prefer declarative builds, I'll be going with Maven as the build and dependency management system. I've used Gradle in anger, but its mix of Ant-like free-form style and all that Groovy makes me dislike it more than Maven. Also, it's still a pain to deploy to Maven Central.

Speaking of Maven Central: all components will be BSD licensed OSS. This forces me to keep everything somewhat clean and tested, with OK documentation. I wouldn't want to scare away poor LinkedIn recruiters with low quality public GitHub repositories!

All of these and future components need an umbrella. And thus Basis is born.

Basis-arguments

Robust command line parsing is more involved than one might think. A single argument may have multiple forms, e.g. -i and --input, may be optional, and may expect a value of a specific type that needs validation.

But that's not all. You also want a way to generate nicely formatted help and error messages, so the user can explore the options your program offers, or figure out which parameters were incorrect or missing.

With basis-arguments, I tried to code up all these requirements, with a minimal API and zero magic. It also tries to be as type-safe as Java allows. Here's what that looks like:

public static void main (String[] argv) {
    Arguments args = new Arguments();

    // Add a simple, optional argument that doesn't expect a value.
    Argument verbose = args.addArgument(
        new Argument("-v", "Display verbose log messages.", true)
    );

    // Add an argument that expects a string value.
    StringArgument serve = args.addArgument(
        new StringArgument(
            new String[] {"-s", "--serve-static-files"},
            "Serve static files from the given directory, non-optional.",
            "",
            false
        )
    );

    // Add an argument that expects an integer value.
    IntegerArgument port = args.addArgument(
        new IntegerArgument(
            new String[] {"-p", "--port"},
            "The port to serve the files from, non-optional.",
            "<port>", false
        )
    );

    // And a final argument so the user can request to display
    // the nicely formatted help text.
    Argument help = args.addArgument(
        new Argument(new String[] {"-h", "--help"}, "Display this help text and exit.", true)
    );

    try {
        // Parse the arguments
        ParsedArguments parsed = args.parse(argv);

        // If the user requested to be shown the help text, use the
        // Arguments#printHelp function to output it nicely formatted.
        if (parsed.has(help)) {
            args.printHelp(System.out);
            System.exit(0);
        }

        // Otherwise check if non-value arguments are given, and get the
        // non-optional port value.
        boolean isLogVerbosely = parsed.has(verbose);
        boolean isServeStaticFiles = parsed.has(serve);
        int portNumber = parsed.getValue(port);
    } catch (ArgumentException e) {
        // We got an unexpected argument, or a non-optional argument wasn't given,
        // or an argument value couldn't be parsed, so tell the user what they
        // did wrong, using the error message from the exception.
        System.err.println(e.getMessage());
        System.exit(-1);
    }
}

Excuse the wonky formatting. I use a 120 character wide line length. My monitor is made for humans (> 640x480), not for ants. Read the documentation for the full monty.

Basis-template

Template engines are a dime a dozen on pretty much all platforms. On the JVM, they range from not so expressive, to full blown scripting languages. I'm in the "expressive" camp. If you have to use a template engine, it might as well allow you to shoot yourself in the foot with a bazooka.

As a compiler geek, I'm a bit saddened by the fact that I haven't touched any compiler code in 2 years. Naturally, I ignored everything that's available out there, and wrote my own little template engine called basis-template. It's a bazooka-grade foot gun!

There's a lot of functionality crammed into the little thing, so I recommend reading its extensive documentation. Here, we just want to get a little taste and highlight some interesting features and implementation details.

What's in a template?

Basis-template is inspired by Jtwig, which itself is a sort of JVM port of PHP's Twig. Great ancestry! A template consists of text and code spans, the latter delimited by {{ and }}. Anything found in a code span is interpreted by the template engine according to the syntax and semantics of the template language.

Hello {{name}}.

This template has 2 text spans ("Hello " and ".") that will be emitted verbatim. The code span ({{name}}) will be evaluated to some value by the template engine. The value then replaces the code span in the final output.
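
To make the split into text and code spans concrete, here is a minimal sketch of how a template source could be cut into spans. The SpanSplitter class and its Span type are hypothetical names made up for this illustration, not basis-template's actual tokenizer. It also naively assumes that }} never appears inside a code span, e.g. within a string literal.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of basis-template: splits a template into text and code spans. */
final class SpanSplitter {
    /** A span is either verbatim text or the code found between {{ and }}. */
    static final class Span {
        final boolean isCode;
        final String content;

        Span(boolean isCode, String content) {
            this.isCode = isCode;
            this.content = content;
        }
    }

    static List<Span> split(String source) {
        List<Span> spans = new ArrayList<>();
        int position = 0;
        while (position < source.length()) {
            int open = source.indexOf("{{", position);
            if (open < 0) {
                // No more code spans, the remainder is plain text.
                spans.add(new Span(false, source.substring(position)));
                break;
            }
            // Text between the previous span and the opening delimiter.
            if (open > position) spans.add(new Span(false, source.substring(position, open)));
            int close = source.indexOf("}}", open + 2);
            if (close < 0) throw new IllegalArgumentException("Unclosed code span at index " + open);
            spans.add(new Span(true, source.substring(open + 2, close)));
            position = close + 2;
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span span : split("Hello {{name}}.")) {
            System.out.println((span.isCode ? "code: " : "text: ") + "'" + span.content + "'");
        }
    }
}
```

A real engine would hand the content of each code span to its parser instead of just storing the raw string.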

The driver on the Java side looks like this:

TemplateLoader loader = new FileTemplateLoader();
Template template = loader.load("helloworld.bt");
TemplateContext context = new TemplateContext();
context.set("name", "Hotzenplotz");
System.out.println(template.render(context));

It yields this output:

Hello Hotzenplotz.

Basis-template's language features almost everything the programmer's heart desires:

You cannot define your own data types in basis-template. You can come close by using map and array literals though.

The power of basis-template comes from injecting JVM objects into the template via the TemplateContext. Need Math.cos() in your template?

{{cos(3.14)}}

context.set("cos", (DoubleFunction)Math::cos);
System.out.println(template.render(context));

-0.9999987317275395

The interpreter evaluating templates is also smart enough to ensure the proper types are used when interacting with JVM objects. It will even resolve overloaded methods and functions. Speaking of interpretation.

Parsing, error reporting and interpretation

A lot of template engines and other compiler-like projects rely on parser generators like ANTLR. You provide a grammar in some (extended) Backus-Naur form flavor, fight against the peculiarities of an LR(k) or PEG parser, until the whole thing becomes an unmaintainable mess, and error reporting gives you an ulcer.

Since I'm in full control of all aspects of the template language, I can also define its syntax in such a way that hand-writing a recursive descent LL(1) parser is trivial. In fact, the entire parser is less than 450 LOC. While small, it supports syntax comparable in complexity to that of Lua and JavaScript.
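
For readers who have never written one, here is a minimal sketch of the recursive descent technique. It is not basis-template's parser: TinyParser is a hypothetical class for plain arithmetic that evaluates while parsing instead of building an AST. Each grammar rule becomes one method, and a single character of lookahead (the match method) decides which rule to take next.

```java
/** Hypothetical example, not taken from basis-template: an LL(1) recursive descent parser for arithmetic. */
final class TinyParser {
    private final String source;
    private int position = 0;

    private TinyParser(String source) {
        this.source = source;
    }

    static double evaluate(String source) {
        TinyParser parser = new TinyParser(source);
        double value = parser.expression();
        parser.skipWhitespace();
        if (parser.position < source.length()) throw parser.error("Unexpected input");
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (true) {
            if (match('+')) value += term();
            else if (match('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            if (match('*')) value *= factor();
            else if (match('/')) value /= factor();
            else return value;
        }
    }

    // factor := '-' factor | '(' expression ')' | number
    private double factor() {
        if (match('-')) return -factor();
        if (match('(')) {
            double value = expression();
            if (!match(')')) throw error("Expected ')'");
            return value;
        }
        return number();
    }

    private double number() {
        skipWhitespace();
        int start = position;
        while (position < source.length()
            && (Character.isDigit(source.charAt(position)) || source.charAt(position) == '.')) {
            position++;
        }
        if (start == position) throw error("Expected a number");
        return Double.parseDouble(source.substring(start, position));
    }

    /** Consumes the next non-whitespace character only if it is the expected one. */
    private boolean match(char expected) {
        skipWhitespace();
        if (position < source.length() && source.charAt(position) == expected) {
            position++;
            return true;
        }
        return false;
    }

    private void skipWhitespace() {
        while (position < source.length() && Character.isWhitespace(source.charAt(position))) position++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + position);
    }
}
```

Operator precedence falls out of the call structure: expression calls term, term calls factor, so multiplication binds tighter than addition without any precedence table.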

The error reporting of many template engines is also a tad lackluster, with some not even reporting the line on which an error occurred. While no error reporting is perfect, I think basis-template's error reporting is pretty OK given its scope:

Error (site/posts/hello-world/index.bt.html:12): Error in included file.
Error (site/posts/hello-world/../../_templates/post_header.bt.html:1): Error in included file.
Error (site/posts/hello-world/../../_templates/header.bt.html:31):
Couldn't find method 'omg' for object of type 'SiteFile'.

        Hey, look at this. {{file.omg()}}
                                  ^^^^^
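
Producing the caret underline shown above mostly requires that every syntax tree node remembers where it came from in the source. A minimal sketch, assuming the offending line and a start and end column are known; ErrorFormatter and its parameters are hypothetical and not basis-template's actual error API:

```java
/** Hypothetical helper, not basis-template's API: renders an error with a caret underline. */
final class ErrorFormatter {
    /** Formats the message, the offending line, and carets under the columns [start, end). */
    static String format(String file, int lineNumber, String line, int start, int end, String message) {
        StringBuilder builder = new StringBuilder();
        builder.append("Error (").append(file).append(":").append(lineNumber).append("): ");
        builder.append(message).append("\n\n");
        builder.append(line).append("\n");
        for (int i = 0; i < start; i++) {
            // Copy tabs as-is so the carets line up with the source line above.
            builder.append(line.charAt(i) == '\t' ? '\t' : ' ');
        }
        for (int i = start; i < end; i++) {
            builder.append('^');
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String line = "Hey, look at this. {{file.omg()}}";
        int start = line.indexOf("omg()");
        System.out.println(format("header.bt.html", 31, line, start, start + 5, "Couldn't find method 'omg'."));
    }
}
```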

Basis-template is essentially an abstract syntax tree interpreter. AST interpreters are commonly known to be slow, but easy to implement. Even some mature languages like Ruby used AST interpretation for a long time, before switching to a byte code virtual machine or other evaluation strategies.

How slow is slow? To answer this question, I forked template-benchmark, a JMH-based micro-benchmark suite that pits popular JVM template engines against each other. Results:

Rocker.benchmark                thrpt   10  70602.199 ±  768.252  ops/s
BasisTemplate.benchmark         thrpt   10  39906.867 ±  513.546  ops/s
BasisTemplateGetters.benchmark  thrpt   10  36745.616 ± 1389.334  ops/s
Pebble.benchmark                thrpt   10  28430.695 ±  807.715  ops/s
Trimou.benchmark                thrpt   10  25983.174 ±  558.236  ops/s
Velocity.benchmark              thrpt   10  23083.624 ±  139.350  ops/s
Handlebars.benchmark            thrpt   10  21507.412 ±  242.507  ops/s
Freemarker.benchmark            thrpt   10  20429.272 ±  394.242  ops/s
JavaMustache.benchmark          thrpt   10  19954.687 ± 4233.619  ops/s
JMustache.benchmark             thrpt   10  14235.609 ±  105.795  ops/s
JTwig.benchmark                 thrpt   10   4327.615 ±  322.175  ops/s
Thymeleaf.benchmark             thrpt   10   1495.825 ±   34.195  ops/s

Basis-template comes in second place behind Rocker. Rocker compiles its templates to Java code, which is then JIT compiled by the JVM. I think it's fair to say that at roughly 57% of the speed of a JVM JIT compiled solution, basis-template is pretty fast.

I spent quite some time optimizing basis-template. You can view my optimization steps in these commits. The optimizations were driven by a healthy dose of JProfiler and observing changes in the JMH benchmark timing results.

The biggest speed-up was achieved by moving interpretation from a set of big, static methods that used instanceof to plain old virtual dispatch. Turns out it's really hard to beat the JVM at its own game.

The curious among you can find the entire interpreter code in the Ast class. It pains me to have that code intermingled with the type definitions, but that is the price to pay for acceptable performance.
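
To illustrate the two interpretation styles, here is a toy AST with both variants side by side. This is a sketch under my own assumptions, not the code from the Ast class: all class names are hypothetical and the "language" only knows numbers, variables, and binary operators.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical miniature AST for illustration, not basis-template's actual Ast class. */
abstract class Node {
    /** Every node type evaluates itself, so the JVM's virtual dispatch selects the code to run. */
    abstract double evaluate(Map<String, Double> context);
}

final class NumberLiteral extends Node {
    final double value;

    NumberLiteral(double value) {
        this.value = value;
    }

    @Override
    double evaluate(Map<String, Double> context) {
        return value;
    }
}

final class Variable extends Node {
    final String name;

    Variable(String name) {
        this.name = name;
    }

    @Override
    double evaluate(Map<String, Double> context) {
        Double value = context.get(name);
        if (value == null) throw new IllegalStateException("Unknown variable '" + name + "'");
        return value;
    }
}

final class BinaryOperation extends Node {
    final char operator;
    final Node left;
    final Node right;

    BinaryOperation(char operator, Node left, Node right) {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }

    static double apply(char operator, double left, double right) {
        switch (operator) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            default: throw new IllegalStateException("Unknown operator '" + operator + "'");
        }
    }

    @Override
    double evaluate(Map<String, Double> context) {
        return apply(operator, left.evaluate(context), right.evaluate(context));
    }
}

/** The other style: one static method that inspects the node type via instanceof. */
final class InstanceofInterpreter {
    static double evaluate(Node node, Map<String, Double> context) {
        if (node instanceof NumberLiteral) return ((NumberLiteral) node).value;
        if (node instanceof Variable) return node.evaluate(context); // reuse the lookup and null check
        if (node instanceof BinaryOperation) {
            BinaryOperation operation = (BinaryOperation) node;
            return BinaryOperation.apply(operation.operator,
                evaluate(operation.left, context), evaluate(operation.right, context));
        }
        throw new IllegalStateException("Unknown node type " + node.getClass().getName());
    }

    public static void main(String[] args) {
        Map<String, Double> context = new HashMap<>();
        context.put("x", 20.0);
        Node tree = new BinaryOperation('+',
            new BinaryOperation('*', new Variable("x"), new NumberLiteral(2)), new NumberLiteral(1));
        System.out.println(tree.evaluate(context) + " " + evaluate(tree, context));
    }
}
```

Both variants compute the same result. The virtual dispatch variant puts the evaluation code into the node classes themselves, which is exactly the intermingling of interpreter and type definitions lamented above.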

Basis-site

With the exciting part out of the way, it is time to speak about the boring part: basis-site, the actual site generator.

You give basis-site an input directory, it processes each encountered file, and writes the result to an output directory. That's it.

The magic happens in the processing step. Each input file is passed through a list of (configurable) processors. Each processor can modify the file content and final output file name. The output of one processor is the input of the next processor.
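
The chaining described above boils down to a simple fold over a list. A minimal sketch, with the caveat that ProcessedFile, FileProcessor, and ProcessorChain are hypothetical names for illustration and not basis-site's actual API:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical type for illustration: a file's output name plus its content. */
final class ProcessedFile {
    final String outputName;
    final String content;

    ProcessedFile(String outputName, String content) {
        this.outputName = outputName;
        this.content = content;
    }
}

/** Hypothetical processor interface: may change both the content and the output file name. */
interface FileProcessor {
    ProcessedFile process(ProcessedFile file);
}

final class ProcessorChain {
    /** Feeds the output of each processor into the next one, in list order. */
    static ProcessedFile run(ProcessedFile input, List<FileProcessor> processors) {
        ProcessedFile current = input;
        for (FileProcessor processor : processors) {
            current = processor.process(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<FileProcessor> processors = Arrays.asList(
            // First processor only touches the content.
            file -> new ProcessedFile(file.outputName, file.content.trim()),
            // Second processor changes both the content and the output file name.
            file -> new ProcessedFile(file.outputName.replace(".txt", ".html"), "<p>" + file.content + "</p>")
        );
        ProcessedFile result = run(new ProcessedFile("note.txt", "  hello  "), processors);
        System.out.println(result.outputName + ": " + result.content);
    }
}
```

Keeping the file value immutable means a processor cannot accidentally affect what an earlier processor saw, at the cost of a few extra allocations.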

Basis-site comes with a single processor out-of-the-box. It evaluates files containing the infix .bt. in their file names via basis-template. It also strips the infix from the output file names. The processor will not touch files without the infix. These files are copied verbatim or processed by a user defined processor if that processor so desires.
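
The infix handling is plain string manipulation. A minimal sketch, assuming that only the last occurrence of .bt. in a file name counts (the text above does not say how multiple occurrences are treated); TemplateInfix is a hypothetical helper, not basis-site's code:

```java
/** Hypothetical helper for illustration, not part of basis-site. */
final class TemplateInfix {
    static final String INFIX = ".bt.";

    /** Returns true if the file name carries the template infix. */
    static boolean isTemplate(String fileName) {
        return fileName.contains(INFIX);
    }

    /** Turns e.g. index.bt.html into index.html, and leaves names without the infix untouched. */
    static String stripInfix(String fileName) {
        // Assumption: only the last occurrence of the infix is relevant.
        int index = fileName.lastIndexOf(INFIX);
        if (index < 0) return fileName;
        return fileName.substring(0, index) + "." + fileName.substring(index + INFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(stripInfix("index.bt.html"));
        System.out.println(stripInfix("style.css"));
    }
}
```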

The template file processor will also inject a handful of functions into each template for formatting dates and listing other input files and their metadata. The metadata is similar to Jekyll's and Hugo's front matter, except it's also defined using the template language. Why have 2 different sets of languages when you have a bazooka?

Files and folders starting with _ in their name will not be passed to processors, and will not be copied to the output directory. They are however accessible to templated files, e.g. via an include statement.
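
The underscore rule can be checked per path segment. A minimal sketch, assuming paths are given relative to the input directory; ExcludedFiles is a hypothetical helper and not how basis-site necessarily implements it:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration, not part of basis-site. */
final class ExcludedFiles {
    /** Returns true if the file itself or any of its parent folders starts with an underscore. */
    static boolean isExcluded(Path relativePath) {
        // Iterating a Path yields its name elements, e.g. "_templates" and "header.html".
        for (Path segment : relativePath) {
            if (segment.toString().startsWith("_")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isExcluded(Paths.get("_templates", "header.html")));
        System.out.println(isExcluded(Paths.get("css", "style.css")));
    }
}
```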

The most basic site with a landing page, an about page, and shared header and footer could look like this:

input/
    _templates/
        header.html
        footer.html
    css/
        style.css
    js/
        code.js
    index.bt.html
    about.bt.html

The directory structure is completely arbitrary. The files in the _templates/ directory will be ignored (but are accessible to templated files, e.g. for inclusion). The folders css/ and js/ will be copied verbatim. The index.bt.html and about.bt.html files are evaluated via basis-template and stripped of their infix. They include the header and footer via the basis-template include statement.

<!-- index.bt.html -->
{{include "_templates/header.html"}}

<h1>Welcome to my website</h1>

<p>You can learn more about me on the <a href="about.html">About page</a></p>

{{include "_templates/footer.html"}}

<!-- about.bt.html -->
{{include "_templates/header.html"}}

<h1>About me</h1>

<p>I'm a little pea, I love the birds and the trees. Go back to the
<a href="index.html">landing page</a></p>

{{include "_templates/footer.html"}}

The output would look like this:

output/
    css/
        style.css
    js/
        code.js
    index.html
    about.html

Basis-site can be used either as a command line application, or embedded in a JVM application. This very site uses the latter approach. It allows me to add additional processors for tasks like image resizing and cropping, and lets me provide more functionality to templates by injecting functions. I can also serve dynamic content like comments via a tiny web service based on the excellent Javalin (which I have no intention of re-inventing).

This only scratches the surface of this particular foot gun. Read the documentation and explore this site's source code to see basis-site in production.

Moving on

There are a handful of NIH projects that still require doing. What's a blog without comments? What are comments without captchas? Also, I'd like to track some basic site visit stats without having to turn over all of our data to Google. All perfectly scoped little things for me to reinvent badly!

How many cool points do I get when I tell you that this site is managed by Docker? None? OK. Sad trombone.

p.s.: Since I still haven't gotten around to implementing a comment system, you can either reply to this tweet, or scream into a can of tuna.