Alrighty. With all the basics out of the way as described in the last Qak post, it's time to define the minimal language I want to implement. This v0.1 will only contain the absolute minimum language features needed to get the infrastructure composed of tokenizer, parser, AST, type checker, byte code generator, and interpreter going. Ideally, I can also build the debugger interface for this.
Once all these components are in place for v0.1, I hope to be able to iterate quickly, adding new language, interpreter, and standard library features. I've used this iterative approach for previous (toy) language projects and it's worked pretty well so far. It's also a surefire way to jump head first into super deep dead ends. Which means I get to document my backtracking and failures for your enjoyment.
Minimal language features
Before we dive into the semantics and syntax bike shed, let me quickly lay out the planned high level language features for v0.1.
- Basic types and values.
- Functions, operators, overloading, and foreign function interface.
- Basic statements.
- Minimal standard library.
Let's look at these in more detail.
Types and values
For v0.1, I'll only support a handful of what's usually known as primitive types.
- boolean is needed for conditionals. Values are expressed as true and false.
- float32 should be self-explanatory and foreshadow what else to expect in the future. Values are expressed as the usual literals, like 3.0.
- nothing is needed to express the absence of a return value for a function. The only value is nothing. Yes, be worried. This might eventually turn into something like
Why not just int32? Because it complicates the type checker just enough to hit that annoyance sweet spot, which may lay bare issues earlier in the development process. Having both an integer and float type also forces me to figure out if and what type conversions I want to have in Qak.
v0.1 of Qak will only allow the definition of foreign types. These are types for which the full definition must be provided to the compiler outside of a .qak source file. However, they must still be declared in the module they belong to. Since these are built-in types, they go into the file std.qak, the standard library module:
    module std

    foreign type boolean
    foreign type int32
    foreign type float32
Why have them explicitly in a .qak file like that at all? You'll see in the next section on functions and operators.
v0.1 will also not include any string or collection types, which keeps the scope small. The v0.1 types don't require me to implement a GC, which is nice. It's also an open invitation to smack into a design wall at full speed when designing the interpreter. We'll see how that works out.
Functions, operators, overloading, and foreign function interface
A function is a piece of code that has a name, (optional) arguments, and a return type. I don't particularly care for the syntax, but here we go:
    function foo(a: int32, b: int32): int32
        ...
    end
Omitting the return type means the function returns nothing (which can of course also be specified explicitly).
A pretty standard affair, with the notable exception that Qak won't be a curly based language, at least not for statement/block delimitation. Instead we write it all out. And we'll like it. I believe this to be friendlier to beginners, but we'll see.
Surprise: operators and functions are the same thing. It's just that operators have syntactic sugar, e.g. they can be called by infix notation à la 2 + 3, instead of +(2, 3) (looks familiar, eh?). There are also unary operators like !a.
The whole range of operators for our built-in types is expressed as functions. The precedence of operators is fixed and part of the language definition. Making that configurable seems like a lot of pain. The supported operators consist of the standard logical and arithmetic operators you are used to from other languages, including unary, binary, and ternary operators.
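To make the "operators are just functions" idea concrete, here is a minimal sketch of a precedence-climbing parser that turns infix and unary operator expressions into plain call nodes. It is written in Python purely for illustration. The precedence levels, the token pattern, and the tuple-based node shapes are all assumptions, as the post only states that precedence is fixed, not what the levels are. Ternary operators are left out.

```python
import re

# Hypothetical precedence table: higher binds tighter. The actual levels
# are not specified in the post.
PRECEDENCE = {"||": 1, "&&": 2, "==": 3, "+": 4, "*": 5}
UNARY = {"!"}


def tokenize(src):
    # Float literals, int literals, identifiers, two-char operators, single chars.
    return re.findall(r"\d+\.\d+|\d+|\w+|&&|\|\||==|[+*!()]", src)


def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        tok = advance()
        if tok in UNARY:
            # A unary operator becomes a one-argument call.
            return ("call", tok, [primary()])
        if tok == "(":
            node = expression(1)
            advance()  # consume the closing ")"
            return node
        return ("lit", tok)

    def expression(min_prec):
        left = primary()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # Left-associative: the right operand must bind strictly tighter.
            right = expression(PRECEDENCE[op] + 1)
            # An infix operator becomes a two-argument call, e.g. +(2, 3).
            left = ("call", op, [left, right])
        return left

    return expression(1)
```

With this, parse(tokenize("2 + 3")) produces a call node for + with the two literals as arguments, which is the same shape a hand-written +(2, 3) would have.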
Going back to our std.qak file, which defines the built-in types, it comes as no surprise that the operators are defined there as well, namely as foreign functions, the implementation of which must be provided to the interpreter later on.
    module std

    foreign type boolean
    foreign type int32
    foreign type float32

    foreign function !(a: boolean): boolean
    foreign function &&(a: boolean, b: boolean): boolean
    ...
    foreign function ==(a: float32, b: float32): boolean
    ...
    foreign function +(a: int32, b: int32): int32
    foreign function *(a: int32, b: int32): int32
    ...
    foreign function +(a: float32, b: float32): float32
Should the compiler see an expression like 3 + 2, it will try to find a function with the name +, with two arguments of type int32, and insert a call to that function in the generated code. "But Mario, that will be slow!" I hear you scream at the top of your lungs. Yes. Which is why there'll be an optimization pass that translates certain known operator function calls to virtual machine instructions that are quicker. Think HotSpot intrinsics.
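A rough sketch of what such an intrinsics pass could look like, in Python for illustration only. The opcode names (ADD_I32, CALL_FOREIGN, and so on) and the tuple-based instruction format are made up for this example, since the virtual machine's instruction set hasn't been defined yet.

```python
# Hypothetical mapping from a resolved operator function to a VM opcode.
# Keys are (function name, argument types), as declared in std.qak.
INTRINSICS = {
    ("+", ("int32", "int32")): "ADD_I32",
    ("*", ("int32", "int32")): "MUL_I32",
    ("+", ("float32", "float32")): "ADD_F32",
}


def lower_call(name, arg_types):
    """Emit one VM instruction for a known operator, else a generic call."""
    opcode = INTRINSICS.get((name, tuple(arg_types)))
    if opcode is not None:
        return [(opcode,)]
    # Fallback: a regular (slower) call into the foreign function.
    return [("CALL_FOREIGN", name, tuple(arg_types))]
```

The point of the table is that anything not listed still works through the generic call path, so the pass is purely an optimization and can grow one operator at a time.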
But what about an expression like 3.0 + 2, the addition of a float32 and an int32? I can't possibly create operator functions for all permutations of input types. So I'll rely on explicit casting for now. The expression 3.0 + 2 will throw a compiler error, as the function +(a: float32, b: int32): float32 is undefined. Instead, a user is expected to explicitly cast one of the operands to the type of the other, e.g. 3.0 + toFloat32(2). I might add some compiler sugar to insert automatic casts. For now everything is explicit.
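Here is a minimal sketch of that lookup, again in Python and purely illustrative. It assumes functions are stored in a table keyed by name and exact argument types, and that toFloat32 takes an int32 and returns a float32 (the post only shows the call toFloat32(2)). The CompilerError class and the error message are hypothetical.

```python
class CompilerError(Exception):
    pass


# A subset of the std.qak declarations, keyed by (name, argument types).
# The value is the return type. The toFloat32 signature is an assumption.
FUNCTIONS = {
    ("+", ("int32", "int32")): "int32",
    ("+", ("float32", "float32")): "float32",
    ("toFloat32", ("int32",)): "float32",
}


def resolve(name, arg_types):
    """Return the return type of the function matching name and argument types exactly."""
    key = (name, tuple(arg_types))
    if key not in FUNCTIONS:
        args = ", ".join(arg_types)
        raise CompilerError(f"undefined function {name}({args})")
    return FUNCTIONS[key]
```

Resolving + with (float32, int32) raises the error, while first resolving toFloat32 with (int32) and feeding its float32 result into + succeeds. There are no implicit conversions anywhere in the lookup, which keeps it a plain dictionary access.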
Functions can also be overloaded based on their argument types. I could include the return type in this mechanism, but my gut tells me that path leads to lots and lots of darkness. We'll see.
Statements
Many languages du jour treat almost anything as an expression that evaluates to a value, including control flow statements like if. Qak won't do that. Instead, it provides a handful of statements that do not produce a value, and also allows you to litter your functions with naked expressions. The values generated by such expressions will be discarded.
    // variable declaration with initializers and simple type inference
    // Variables without initializer will be initialized to the type's
    // default value.
    var foo = 123
    var bar: boolean = true
    var zeroInitializer: int32

    // While statement, who needs for(-each)?!
    while(bar)
        // Variables are block scoped
        var uff = 3

        // If statement
        if (foo > 200) then
            // Assignments
            bar = false
        else
            // arbitrary expressions (which includes things
            // like function calls).
            print(foo)

            // The value generated by this expression is simply discarded.
            foo + 34 * zeroInitializer

            if (shouldWeStop()) then
                // break and continue (not pictured here)
                break
            end
        end

        foo = foo + 1
    end

    // return statement
    return foo * 10
There's plenty in the above code that will force me to build reasonable infrastructure for the interpreter. More complex constructs are mostly just syntactic sugar that can be compiled to the basic statements shown above.
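One piece of that infrastructure is block scoping. A minimal sketch of how a type checker could track block-scoped variables, in Python for illustration: each block gets its own table that links to the enclosing block. Whether redeclaring a name in the same block is an error, and whether inner blocks may shadow outer names, are assumptions made here, not decisions stated in the post.

```python
class Scope:
    """Variables declared in one block, linked to the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.variables = {}

    def declare(self, name, type_name):
        # Assumption: redeclaring a name within the same block is an error.
        if name in self.variables:
            raise NameError(f"variable {name} already declared in this block")
        self.variables[name] = type_name

    def lookup(self, name):
        # Walk outwards from the current block to the function body.
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        raise NameError(f"undefined variable {name}")
```

For the statements shown earlier, the function body would hold foo, bar, and zeroInitializer, while the while block would get a child scope holding uff. Looking up foo from inside the loop succeeds, and looking up uff after the loop fails.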
Modules
The compilation unit of Qak is the module. A module consists of:
- The module name, expressed as module myModuleName at the top of the file.
- Imports of other modules (see below).
- Type definitions. For now only foreign types are possible.
- Function definitions.
- Module variable definitions.
A module can import other modules, as long as the resulting graph of imports is acyclic. Rust does this, some people hate it, so I'll do it as well as I thrive on hatred. All modules import the std module by default.
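Checking that the import graph is acyclic boils down to a depth-first search that reports an error when it runs into a module that is still on the current path. A minimal sketch in Python, for illustration only. The function name and the dictionary representation of the import graph are assumptions, and how module names map to files is deliberately left out.

```python
def find_import_cycle(imports):
    """Return a list of module names forming a cycle, or None if acyclic.

    imports maps each module name to the list of modules it imports.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, done
    state = {name: WHITE for name in imports}
    path = []

    def visit(name):
        state[name] = GRAY
        path.append(name)
        for dep in imports.get(name, []):
            if state.get(dep, WHITE) == GRAY:
                # dep is already on the current path: that's a cycle.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in list(imports):
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Returning the offending path rather than just a boolean makes for a more useful compiler error, since the user can see which chain of imports closes the loop.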
Importing a module is done like this: import someModuleName. I'll cross the "how the hell do you resolve modules?" bridge later. To avoid name clashes between things from an imported module and the current module, one can do import someModuleName as foo, and access things from that module via the foo alias.
Any type, function, or variable with a name starting with _ is considered private to the module. Anything else can be accessed by other modules. Again, this is likely a terrible idea. An explicit private specifier might be better.
Since a module can have variables, and those variables can have initializer expressions, I have to figure out when those initializers are run by the interpreter. I'm sure every little thing is gonna be alright.
Standard library
The standard library will be exceptionally minimal. It will define the (foreign) built-in types and their operators, common type conversion functions, and a handful of print() overloads, one for each type. All its code will go into the std module.
This should be enough to create a contrived factorial nano benchmark to compare Qak against the likes of Lua and Python. Next time we'll look at the implementation of the parser and generation of the abstract syntax tree for the poor excuse of a language specification outlined above.