Qak - Minimally viable product

June 26, 2020

Language goes Brrrr.

Qak 0.1

Alrighty. With all the basics out of the way, as described in the last Qak post, it's time to define the minimal language I want to implement. This v0.1 will only contain the absolute minimum of language features needed to get the infrastructure (tokenizer, parser, AST, type checker, bytecode generator, and interpreter) up and running. Ideally, I can also build the debugger interface for this.

Once all these components are in place for v0.1, I hope to be able to iterate quickly, adding new language, interpreter, and standard library features. I've used this iterative approach for previous (toy) language projects, and it has worked pretty well so far. It's also a surefire way to jump head first into super deep dead ends, which means I get to document my backtracking and failures for your enjoyment.

Minimal language features

Before we dive into the semantics and syntax bikeshedding, let me quickly lay out the planned high-level language features for v0.1.

Basic types and values.
Functions, operators, overloading, and foreign function interface.
Basic statements.
Modules.
Minimal standard library.

Let's look at these in more detail.

Types and values

For v0.1, I'll only support a handful of what's usually known as primitive types.

boolean is needed for conditionals. Values are expressed as true and false in code.
int32 and float32 should be self-explanatory and foreshadow what else to expect in the future. Values are expressed as the usual literals, like 123, 0xfe, 123.456, 123f, etc.
nothing is needed to express the absence of a return value for a function. The only value is nothing. Yes, be worried. This might eventually turn into something like null!
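To make the literal rules concrete, here is a minimal Python sketch of how a tokenizer might map literal text to one of the v0.1 types. The rules are assumptions read off the examples above (decimal and 0x-prefixed hex integers are int32, a decimal point or an f suffix makes a float32), and classify_literal is a hypothetical helper, not part of Qak's implementation.

```python
import re
import struct

# Assumed literal rules, inferred from the examples 123, 0xfe, 123.456, 123f.
HEX_INT = re.compile(r"0[xX][0-9a-fA-F]+")
DEC_INT = re.compile(r"[0-9]+")
FLOAT = re.compile(r"[0-9]+(?:\.[0-9]+)?f?")  # tried after DEC_INT, so it needs '.' or 'f'

INT32_MAX = 2**31 - 1


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def classify_literal(text: str) -> tuple[str, object]:
    """Return (type name, value) for a literal, or raise ValueError."""
    if text in ("true", "false"):
        return "boolean", text == "true"
    if text == "nothing":
        return "nothing", None
    if HEX_INT.fullmatch(text):
        value = int(text, 16)  # int() accepts the 0x prefix when base is 16
    elif DEC_INT.fullmatch(text):
        value = int(text)
    elif FLOAT.fullmatch(text):
        return "float32", to_float32(float(text.removesuffix("f")))
    else:
        raise ValueError(f"not a literal: {text!r}")
    # Literals carry no sign here; a leading '-' is the unary minus operator.
    if value > INT32_MAX:
        raise ValueError(f"int32 literal out of range: {text}")
    return "int32", value
```

Because - is a unary operator, literals never carry a sign in this sketch, which is why only the upper int32 bound is checked; a real tokenizer would still have to decide what to do about -2147483648.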

Why not just int32? Because a second numeric type complicates the type checker just enough to hit that annoyance sweet spot, which may lay bare issues earlier in the development process. Having both an integer and a float type also forces me to figure out whether Qak should have type conversions at all, and if so, which ones.

v0.1 of Qak will only allow the definition of foreign types. These are types whose full definition must be provided to the compiler outside of a .qak source file. However, they must still be declared in the module they belong to. Since these are built-in types, they go into std.qak, the file of the standard library module:

module std

foreign type boolean
foreign type int32
foreign type float32

Why have them explicitly in a .qak file like that at all? You'll see in the next section on functions and operators.
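In the meantime, here is one way to picture the division of labor, as a Python sketch: std.qak only names the foreign types, and the host side supplies the actual definition. HOST_TYPES, the size field, and bind_foreign_types are hypothetical; the default values stand in for what a variable declared without an initializer would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignTypeDef:
    name: str
    size_in_bytes: int     # hypothetical: storage size of one value in the VM
    default_value: object  # used for variables declared without an initializer


# Hypothetical host-side definitions; in a real interpreter these live in native code.
HOST_TYPES = {
    "boolean": ForeignTypeDef("boolean", 1, False),
    "int32": ForeignTypeDef("int32", 4, 0),
    "float32": ForeignTypeDef("float32", 4, 0.0),
}


def bind_foreign_types(module_source: str) -> dict[str, ForeignTypeDef]:
    """Pair every `foreign type X` declaration with its host definition."""
    bound = {}
    for line in module_source.splitlines():
        words = line.split()
        if len(words) == 3 and words[:2] == ["foreign", "type"]:
            name = words[2]
            if name not in HOST_TYPES:
                raise NameError(f"foreign type '{name}' has no host definition")
            bound[name] = HOST_TYPES[name]
    return bound


STD_SOURCE = """module std

foreign type boolean
foreign type int32
foreign type float32
"""
```

The declaration gives the type a name inside module std, so the rest of the compiler can treat it like any other module member, while a declaration the host cannot back fails early.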

v0.1 will also not include any string or collection types, which keeps the scope small. The v0.1 types don't require me to implement a GC, which is nice. It's also an open invitation to smack into a design wall at full speed when designing the interpreter. We'll see how that works out.

Functions, operators, overloading, and foreign function interface

A function is a piece of code that has a name, (optional) arguments, and a return type. I don't particularly care for the syntax, but here we go:

function foo(a: int32, b: int32): int32
	...
end

Omitting the return type means the function returns nothing (which can of course also be specified explicitly).

A pretty standard affair, with the notable exception that Qak won't be a curly-brace language, at least not for delimiting statements and blocks. Instead, we write it all out. And we'll like it. I believe this to be friendlier to beginners, but we'll see.

Surprise: operators and functions are the same thing. Operators just come with syntactic sugar, e.g. they can be called with infix notation à la 2 + 3 instead of +(2, 3) (looks familiar, eh?). There are also unary operators like ! or -.

The full set of operators for our built-in types is expressed as functions. Operator precedence is fixed and part of the language definition, as making it configurable seems like a lot of pain. The supported operators are the standard logical and arithmetic operators you know from other languages, including unary, binary, and ternary operators.
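Since an infix operator is only sugar for a call, a parser can emit ordinary call nodes for it. Below is a small precedence-climbing sketch in Python. The precedence numbers in BINARY_PRECEDENCE are assumptions for illustration, not Qak's actual table, the node shapes are hypothetical, and ternary operators are left out to keep it short.

```python
# Assumed precedence levels (higher binds tighter); Qak's real table may differ.
BINARY_PRECEDENCE = {
    "||": 1, "&&": 2, "==": 3, "<": 4, ">": 4,
    "+": 5, "-": 5, "*": 6, "/": 6,
}
UNARY_OPERATORS = {"!", "-"}


def parse_expression(tokens: list[str], min_prec: int = 1):
    """Precedence climbing; every operator becomes a ("call", name, args) node."""
    left = parse_unary(tokens)
    while tokens and BINARY_PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # The right operand may only contain tighter-binding operators,
        # which makes all binary operators left-associative.
        right = parse_expression(tokens, BINARY_PRECEDENCE[op] + 1)
        left = ("call", op, [left, right])
    return left


def parse_unary(tokens: list[str]):
    if not tokens:
        raise SyntaxError("unexpected end of expression")
    token = tokens.pop(0)
    if token in UNARY_OPERATORS:
        # Prefix operators bind tighter than any binary operator.
        return ("call", token, [parse_unary(tokens)])
    if token == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    return ("atom", token)
```

For example, the tokens of 2 + 3 * 4 come out as +(2, *(3, 4)), and 2 - 3 - 4 groups as -(-(2, 3), 4), so the later compiler stages only ever see function calls.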

Going back to our std.qak file, which defines the built-in types, it comes as no surprise that the operators are defined there as well, namely as foreign functions whose implementations must be provided to the interpreter later on.

module std

foreign type boolean
foreign type int32
foreign type float32

foreign function !(a: boolean): boolean
foreign function &&(a: boolean, b: boolean): boolean
...
foreign function ==(a: float32, b: float32): boolean
...
foreign function +(a: int32, b: int32): int32
foreign function *(a: int32, b: int32): int32
...
foreign function +(a: float32, b: float32): float32
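On the host side, each foreign function declaration has to be matched with a native implementation. This Python sketch keys natives on the function name plus parameter types, which is what lets the two + declarations coexist. The regex only covers the single-line declaration form shown above, and HOST_FUNCTIONS and bind_foreign_functions are hypothetical names.

```python
import operator
import re

# name, parameter list, optional return type (defaults to nothing)
FOREIGN_FN = re.compile(r"foreign function\s+(\S+?)\((.*?)\)\s*(?::\s*(\w+))?\s*$")


def wrap_int32(fn):
    """Emulate 32-bit two's-complement wrap-around on top of Python's big ints."""
    return lambda a, b: (fn(a, b) + 2**31) % 2**32 - 2**31


# Hypothetical native implementations, keyed by (name, parameter types).
# float32 rounding is omitted for brevity.
HOST_FUNCTIONS = {
    ("!", ("boolean",)): operator.not_,
    ("&&", ("boolean", "boolean")): lambda a, b: a and b,
    ("==", ("float32", "float32")): operator.eq,
    ("+", ("int32", "int32")): wrap_int32(operator.add),
    ("*", ("int32", "int32")): wrap_int32(operator.mul),
    ("+", ("float32", "float32")): operator.add,
}


def bind_foreign_functions(module_source: str) -> dict:
    """Map (name, parameter types) to (return type, native callable)."""
    bound = {}
    for line in module_source.splitlines():
        match = FOREIGN_FN.match(line.strip())
        if not match:
            continue
        name, params, return_type = match.group(1), match.group(2), match.group(3) or "nothing"
        param_types = tuple(p.split(":")[1].strip() for p in params.split(",") if p.strip())
        key = (name, param_types)
        if key not in HOST_FUNCTIONS:
            raise NameError(f"no native implementation for {name}{param_types}")
        bound[key] = (return_type, HOST_FUNCTIONS[key])
    return bound
```

One consequence worth noting: because && is an ordinary function here, both operands are evaluated before the call, so there is no short-circuiting unless the compiler special-cases it.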

Should the compiler see an expression like 3 + 2, it will try to find a function with the name +, with two arguments of type int32, and insert a call to that function in the generated code. "But Mario, that will be slow!" I hear you scream at the top of your lungs. Yes. Which is why there'll be an optimization pass that translates certain known operator function calls to virtual machine instructions that are quicker. Think HotSpot intrinsics.
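The optimization pass can be pictured as a peephole rewrite over the generated code: a call whose resolved signature is a known std operator becomes a dedicated opcode that works on the operand stack directly, skipping the call machinery. The instruction names and the INTRINSICS table in this Python sketch are made up for illustration, since the real instruction set doesn't exist yet.

```python
# Hypothetical bytecode: each instruction is a tuple (opcode, *operands).
# A CALL carries the resolved function name and its parameter types.
INTRINSICS = {
    ("+", ("int32", "int32")): "ADD_I32",
    ("*", ("int32", "int32")): "MUL_I32",
    ("+", ("float32", "float32")): "ADD_F32",
}


def replace_intrinsics(code: list[tuple]) -> list[tuple]:
    """Swap calls to known operator functions for dedicated VM instructions."""
    optimized = []
    for instr in code:
        if instr[0] == "CALL" and (instr[1], instr[2]) in INTRINSICS:
            optimized.append((INTRINSICS[(instr[1], instr[2])],))
        else:
            optimized.append(instr)  # user functions and unknown calls stay calls
    return optimized


# 3 + 2, whose result is passed to a user-defined function foo(a: int32)
program = [
    ("PUSH", 3),
    ("PUSH", 2),
    ("CALL", "+", ("int32", "int32")),
    ("CALL", "foo", ("int32",)),
]
optimized = replace_intrinsics(program)
```

Since the pass only changes how a known call is executed, not what it computes, the interpreter could also run the unoptimized code, which would keep the pass optional.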

What about 3.0 + 2, the addition of a float32 and an int32? I can't possibly create operator functions for all permutations of input types, so I'll rely on explicit casting for now. The expression 3.0 + 2 will result in a compiler error, as the function +(a: float32, b: int32): float32 is undefined. Instead, a user is expected to explicitly cast one of the operands to the type of the other, e.g. 3.0 + toFloat32(2). I might add some compiler sugar to insert automatic casts later. For now, everything is explicit.

Functions can also be overloaded based on their argument types. I could include the return type in this mechanism, but my gut tells me that path leads to lots and lots of darkness. We'll see.
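Put together, resolving a call is a lookup on the function name and the exact argument types. The Python sketch below (FunctionTable and CompileError are hypothetical) performs no implicit conversions, so 3.0 + 2 fails while 3.0 + toFloat32(2) resolves. Leaving the return type out of the key also means that two declarations differing only in their return type are rejected as duplicates.

```python
class CompileError(Exception):
    pass


class FunctionTable:
    """Overloads keyed on (name, parameter types); the return type is not part of the key."""

    def __init__(self):
        self._overloads: dict[tuple[str, tuple[str, ...]], str] = {}

    def declare(self, name: str, param_types: list[str], return_type: str) -> None:
        key = (name, tuple(param_types))
        if key in self._overloads:
            # Differing only in return type is not enough to overload.
            raise CompileError(f"duplicate function {name}({', '.join(param_types)})")
        self._overloads[key] = return_type

    def resolve(self, name: str, arg_types: list[str]) -> str:
        """Exact match on argument types; no implicit int32 -> float32 conversion."""
        key = (name, tuple(arg_types))
        if key not in self._overloads:
            raise CompileError(f"undefined function {name}({', '.join(arg_types)})")
        return self._overloads[key]


table = FunctionTable()
table.declare("+", ["int32", "int32"], "int32")
table.declare("+", ["float32", "float32"], "float32")
table.declare("toFloat32", ["int32"], "float32")

# 3.0 + toFloat32(2): resolve the inner call first, then + with its result type.
result_type = table.resolve("+", ["float32", table.resolve("toFloat32", ["int32"])])
```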

Basic statements

Many languages du jour treat almost anything as an expression that evaluates to a value, including control flow statements like if. Qak won't do that. Instead, it provides a handful of statements that do not produce a value, and it also allows you to litter your functions with naked expressions. The values generated by such expressions are discarded.


// variable declaration with initializers and simple type inference
// Variables without initializer will be initialized to the type's
// default value.
var foo = 123
var bar: boolean = true
var zeroInitializer: int32

// While statement, who needs for(-each)?!
while(bar)
	// Variables are block scoped
	var uff = 3

	// If statement
	if (foo > 200) then
		// Assignments
		bar = false
	else
		// Arbitrary expressions (which includes things
		// like function calls).
		print(foo)

		// The value generated by this expression is simply discarded.
		foo + 34 * zeroInitializer

		if (shouldWeStop()) then
			// break and continue (the latter not pictured here)
			break
		end
	end

	foo = foo + 1
end

// return statement
return foo * 10

There's plenty in the above code that will force me to build reasonable infrastructure for the interpreter. More complex constructs are mostly just syntactic sugar that can be compiled to the basic statements shown above.
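To see why these few statements already exercise the core of the interpreter, here is a Python sketch that compiles a tiny statement AST to stack-machine instructions with jumps and then runs it. Everything in it is assumed for illustration: the AST tuples, the opcodes, and the natives dictionary, which is keyed by name only because overload resolution is taken to have happened earlier. Variable declarations and block scoping are left out. Note the POP after a naked expression, which is how its value gets discarded, and that break is a forward jump patched once the end of the loop is known.

```python
import operator


class Compiler:
    """Compiles statement tuples into a flat list of [opcode, operand...] instructions."""
    def __init__(self):
        self.code = []
        self.loops = []  # one (start index, break jumps to patch) pair per enclosing loop

    def emit(self, *instr):
        self.code.append(list(instr))
        return len(self.code) - 1

    def block(self, statements):
        for statement in statements:
            self.stmt(statement)

    def expr(self, e):
        if e[0] == "int":
            self.emit("PUSH", e[1])
        elif e[0] == "var":
            self.emit("LOAD", e[1])
        elif e[0] == "call":  # operators arrive here as ordinary calls
            for arg in e[2]:
                self.expr(arg)
            self.emit("CALL", e[1], len(e[2]))

    def stmt(self, s):
        kind = s[0]
        if kind == "expr_stmt":
            self.expr(s[1])
            self.emit("POP")  # the value of a naked expression is discarded
        elif kind == "assign":
            self.expr(s[2])
            self.emit("STORE", s[1])
        elif kind == "if":
            self.expr(s[1])
            to_else = self.emit("JUMP_IF_FALSE", None)
            self.block(s[2])
            to_end = self.emit("JUMP", None)
            self.code[to_else][1] = len(self.code)  # a false condition jumps to the else branch
            self.block(s[3])
            self.code[to_end][1] = len(self.code)   # the then branch skips the else branch
        elif kind == "while":
            start = len(self.code)
            self.expr(s[1])
            to_exit = self.emit("JUMP_IF_FALSE", None)
            self.loops.append((start, []))
            self.block(s[2])
            self.emit("JUMP", start)
            _, breaks = self.loops.pop()
            for index in (to_exit, *breaks):
                self.code[index][1] = len(self.code)  # first instruction after the loop
        elif kind == "break":
            self.loops[-1][1].append(self.emit("JUMP", None))
        elif kind == "continue":
            self.emit("JUMP", self.loops[-1][0])  # jump back to the loop condition


def run(code, env, natives):
    """Stack machine; natives maps an already-resolved function name to a callable."""
    stack, pc = [], 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(env[args[0]])
        elif op == "STORE":
            env[args[0]] = stack.pop()
        elif op == "POP":
            stack.pop()
        elif op == "CALL":
            name, argc = args
            operands = stack[len(stack) - argc:]
            del stack[len(stack) - argc:]
            stack.append(natives[name](*operands))
        elif op == "JUMP":
            pc = args[0]
        elif op == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = args[0]
    return env, stack


# while (running) if (foo > 3) then break else foo = foo + 1 end; foo * 10 end
program = [("while", ("var", "running"), [
    ("if", ("call", ">", [("var", "foo"), ("int", 3)]),
     [("break",)],
     [("assign", "foo", ("call", "+", [("var", "foo"), ("int", 1)]))]),
    ("expr_stmt", ("call", "*", [("var", "foo"), ("int", 10)])),
])]
compiler = Compiler()
compiler.block(program)
env, stack = run(compiler.code, {"foo": 0, "running": True},
                 {">": operator.gt, "+": operator.add, "*": operator.mul})
```

After the loop breaks, foo is 4 and the operand stack is empty, because every discarded foo * 10 was popped right away.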

Modules

The compilation unit of Qak is the module. A module consists of:

The module name, expressed as module myModuleName at the top of the file.
Imports of other modules (see below).
Type definitions. For now only foreign types are possible.
Function definitions.
Module variable definitions.

A module can import other modules, as long as the resulting graph of imports is acyclic. Rust does this, some people hate it, so I'll do it as well, as I thrive on hatred. All modules import the std module by default.
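Enforcing the acyclic rule is a standard depth-first search over the import graph: reaching a module that is still on the current DFS path means there is a cycle. This Python sketch assumes imports arrive as a dict from module name to the names it imports, and it reports the offending cycle so an error message can name it; add_implicit_std models the default std import. Both helpers are hypothetical.

```python
from typing import Optional


def add_implicit_std(imports: dict[str, list[str]]) -> dict[str, list[str]]:
    """Every module except std itself imports std by default."""
    return {
        module: deps if module == "std" or "std" in deps else ["std", *deps]
        for module, deps in imports.items()
    }


def find_import_cycle(imports: dict[str, list[str]]) -> Optional[list[str]]:
    """Return a cycle like ['a', 'b', 'a'], or None if the import graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {module: WHITE for module in imports}
    path: list[str] = []

    def visit(module: str) -> Optional[list[str]]:
        color[module] = GRAY
        path.append(module)
        for dep in imports.get(module, []):
            if color.get(dep, WHITE) == GRAY:
                return path[path.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in imports:
        if color[module] == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```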

Importing a module is done like this: import someModuleName. I'll cross the "how the hell do you resolve modules?" bridge later. To avoid name clashes between things from an imported module and the current module, one can do import someModuleName as foo, and access things from that module via foo.thing.

Any type, function, or variable with a name starting with _ is considered private to the module. Anything else can be accessed by other modules. Again, this is likely a terrible idea. An explicit private specifier might be better.
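A sketch of how name lookups could work under these rules, in Python. It assumes, as the remark about name clashes suggests, that a plain import x makes x's public names visible unqualified, while import x as foo only exposes them as foo.thing. A module's own names win over imported ones, and an unqualified name found in two plain imports is reported as ambiguous. Module and its methods are hypothetical.

```python
class Module:
    def __init__(self, name: str, symbols: list[str]):
        self.name = name
        self.symbols = set(symbols)
        self.unaliased: list["Module"] = []      # from `import x`
        self.aliased: dict[str, "Module"] = {}   # from `import x as alias`

    def add_import(self, module: "Module", alias: str = None) -> None:
        if alias is None:
            self.unaliased.append(module)
        else:
            self.aliased[alias] = module

    def lookup(self, name: str) -> tuple["Module", str]:
        """Resolve `thing` or `alias.thing` as seen from inside this module."""
        if "." in name:
            alias, member = name.split(".", 1)
            if alias not in self.aliased:
                raise NameError(f"no module imported as '{alias}'")
            return self._public(self.aliased[alias], member)
        if name in self.symbols:  # own names, private or not, win
            return self, name
        hits = [m for m in self.unaliased if name in m.symbols and not name.startswith("_")]
        if len(hits) > 1:
            raise NameError(f"'{name}' is ambiguous: " + ", ".join(m.name for m in hits))
        if hits:
            return hits[0], name
        raise NameError(f"unknown name '{name}'")

    @staticmethod
    def _public(module: "Module", member: str) -> tuple["Module", str]:
        if member.startswith("_"):
            raise NameError(f"'{member}' is private to module {module.name}")
        if member not in module.symbols:
            raise NameError(f"module {module.name} has no '{member}'")
        return module, member
```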

Since a module can have variables, and those variables can have initializer expressions, I have to figure out when the interpreter runs those initializers. I'm sure every little thing is gonna be alright.
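One plausible answer, sketched in Python under the assumption that imports were already checked for cycles: run each module's initializers exactly once, after those of every module it imports (a post-order DFS from the program's main module), and within a module in declaration order. initialization_order, run_initializers, and the thunk-based initializer lists are hypothetical.

```python
from typing import Callable

# Hypothetical: an initializer is a thunk that reads already-initialized variables from env.
Initializer = Callable[[dict], object]


def initialization_order(imports: dict[str, list[str]], root: str) -> list[str]:
    """Post-order DFS: each module comes after every module it imports."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(module: str) -> None:
        if module in seen:
            return
        seen.add(module)
        for dep in imports.get(module, []):
            visit(dep)
        order.append(module)

    visit(root)
    return order


def run_initializers(order: list[str],
                     initializers: dict[str, list[tuple[str, Initializer]]],
                     env: dict) -> dict:
    """Run each module's (variable, thunk) pairs in declaration order."""
    for module in order:
        for variable, init in initializers.get(module, []):
            env[(module, variable)] = init(env)
    return env
```

With strict declaration order, an initializer that reads a variable declared further down in the same module fails with a KeyError in this sketch; a compiler would have to reject or reorder such cases.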

Standard library

The standard library will be exceptionally minimal. It will define the (foreign) built-in types and their operators, common type conversion functions, and a handful of print() overloads, one for each type. All its code will go into the std.qak file.

Up next

This should be enough to create a contrived factorial nano benchmark to compare Qak against the likes of Lua and Python. Next time we'll look at the implementation of the parser and generation of the abstract syntax tree for the poor excuse of a language specification outlined above.
