Qak 0.1
Alrighty. With all the basics out of the way as described in the last Qak post, it's time to define the minimal language I want to implement. This v0.1 will only contain the absolute minimum of language features needed to get the infrastructure going: tokenizer, parser, AST, type checker, byte code generator, and interpreter. Ideally, I can also build the debugger interface for this.
Once all these components are in place for v0.1, I hope to be able to iterate quickly, adding new language, interpreter, and standard library features. I've used this iterative approach for previous (toy) language projects and so far it's worked pretty well. It's also a surefire way to jump head first into super deep dead ends. Which means I get to document my backtracking and failures for your enjoyment.
Minimal language features
Before we dive into the semantics and syntax bike shed, let me quickly lay out the planned high level language features for v0.1.
- Basic types and values.
- Functions, operators, overloading, and foreign function interface.
- Basic statements.
- Modules.
- Minimal standard library.
Let's look at these in more detail.
Types and values
For v0.1, I'll only support a handful of what's usually known as primitive types.
- boolean is needed for conditionals. Values are expressed as true and false in code.
- int32 and float32 should be self-explanatory and foreshadow what else to expect in the future. Values are expressed as the usual literals, like 123, 0xfe, 123.456, 123f, etc.
- nothing is needed to express the absence of a return value for a function. The only value is nothing. Yes, be worried. This might eventually turn into something like null!
Why not just int32? Because a second numeric type complicates the type checker just enough to hit that annoyance sweet spot, which may lay bare issues earlier in the development process. Having both an integer and a float type also forces me to figure out whether, and which, type conversions I want to have in Qak.
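To make the literal forms above concrete, here is a rough Python sketch of how a tokenizer might map literal text to a Qak type name. The regular expressions and the function name literal_type are my assumptions, not part of the Qak design; in particular, whether 123.456f is legal is a guess.

```python
import re

# Hypothetical literal patterns, one entry per accepted spelling.
# Every pattern is matched against the whole token text.
LITERAL_PATTERNS = [
    ("int32", re.compile(r"0[xX][0-9a-fA-F]+")),           # 0xfe
    ("int32", re.compile(r"[0-9]+")),                      # 123
    ("float32", re.compile(r"[0-9]+\.[0-9]+f?|[0-9]+f")),  # 123.456, 123f
    ("boolean", re.compile(r"true|false")),
    ("nothing", re.compile(r"nothing")),
]


def literal_type(text):
    """Return the Qak type name for a literal, or None if text is not a literal."""
    for type_name, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return type_name
    return None
```

Because the type of a literal is known from its spelling alone, the type checker can treat literals as the leaves from which every other expression type is derived.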
v0.1 of Qak will only allow the definition of foreign types. These are types for which the full definition must be provided to the compiler outside of a .qak source file. However, they must still be declared in the module they belong to. Since these are built-in types, they go into std.qak, the standard library module:
module std
foreign type boolean
foreign type int32
foreign type float32
Why have them explicitly in a .qak file like that at all? You'll see in the next section on functions and operators.
v0.1 will also not include any string or collection types, which keeps the scope small. The v0.1 types don't require me to implement a GC, which is nice. It's also an open invitation to smack into a design wall at full speed when designing the interpreter. We'll see how that works out.
Functions, operators, overloading, and foreign function interface
A function is a piece of code that has a name, (optional) arguments, and a return type. I don't particularly care for the syntax, but here we go:
function foo(a: int32, b: int32): int32
...
end
Omitting the return type means the function returns nothing (which can of course also be specified explicitly).
A pretty standard affair, with the notable exception that Qak won't be a curly-brace language, at least not for statement/block delimitation. Instead we write it all out. And we'll like it. I believe this to be friendlier to beginners, but we'll see.
Surprise: operators and functions are the same thing. It's just that operators have syntactic sugar, e.g. they can be called with infix notation à la 2 + 3, instead of +(2, 3) (looks familiar, eh?). There are also unary operators like ! or -.
The whole range of operators for our built-in types is expressed as functions. The precedence of operators is fixed and part of the language definition. Making that configurable seems like a lot of pain. The supported operators consist of the standard logical and arithmetic operators you are used to from other languages, including unary, binary, and ternary operators.
Going back to our std.qak file, which defines the built-in types, it comes as no surprise that the operators are defined there as well, namely as foreign functions, the implementation of which must be provided to the interpreter later on.
module std
foreign type boolean
foreign type int32
foreign type float32
foreign function !(a:boolean): boolean
foreign function &&(a: boolean, b: boolean): boolean
...
foreign function ==(a:float32, b: float32): boolean
...
foreign function +(a: int32, b: int32): int32
foreign function *(a: int32, b: int32): int32
...
foreign function +(a: float32, b: float32): float32
Should the compiler see an expression like 3 + 2, it will try to find a function with the name + and two arguments of type int32, and insert a call to that function in the generated code. "But Mario, that will be slow!" I hear you scream at the top of your lungs. Yes. Which is why there'll be an optimization pass that translates certain known operator function calls to virtual machine instructions that are quicker. Think HotSpot intrinsics.
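A minimal Python sketch of what such an optimization pass could look like, assuming calls have already been resolved to a name plus argument types. The opcode names (ADD_I32 and friends), the generic CALL instruction, and the tuple encoding of instructions are all made up for illustration; Qak's actual byte code isn't defined yet.

```python
# Hypothetical intrinsics table: (operator name, argument types) -> VM opcode.
INTRINSICS = {
    ("+", ("int32", "int32")): "ADD_I32",
    ("*", ("int32", "int32")): "MUL_I32",
    ("+", ("float32", "float32")): "ADD_F32",
}


def lower_call(name, arg_types):
    """Emit a single instruction (as a tuple) for an already resolved call.

    Known operator functions become a dedicated opcode, everything else
    falls back to a generic CALL carrying the function signature.
    """
    opcode = INTRINSICS.get((name, tuple(arg_types)))
    if opcode is not None:
        return (opcode,)
    return ("CALL", name, tuple(arg_types))
```

With this split, the type checker only ever deals with ordinary function calls, and the fast path is a pure lookup applied afterwards.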
What about 3.0 + 2, the addition of a float32 and an int32? I can't possibly create operator functions for all permutations of input types. So I'll rely on explicit casting for now. The expression 3.0 + 2 will throw a compiler error, as the function +(a: float32, b: int32): float32 is undefined. Instead, a user is expected to explicitly cast one of the operands to the type of the other, e.g. 3.0 + toFloat32(2). I might add some compiler sugar to insert automatic casts. For now everything is explicit.
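Here is a rough Python sketch of that lookup, under the assumption that functions are keyed by name and argument types only. The table holds a few of the std.qak declarations; the toFloat32 signature is inferred from the example, and the tuple-based expression encoding, TypeCheckError, resolve, and type_of are hypothetical names for illustration.

```python
# (function name, argument types) -> return type.
FUNCTIONS = {
    ("+", ("int32", "int32")): "int32",
    ("+", ("float32", "float32")): "float32",
    ("toFloat32", ("int32",)): "float32",  # assumed signature
}


class TypeCheckError(Exception):
    pass


def resolve(name, arg_types):
    """Find the overload matching name and argument types exactly."""
    key = (name, tuple(arg_types))
    if key not in FUNCTIONS:
        raise TypeCheckError(f"undefined function {name}({', '.join(arg_types)})")
    return FUNCTIONS[key]


def type_of(expr):
    """Type of an expression: ("lit", type) or ("call", name, [args])."""
    if expr[0] == "lit":
        return expr[1]
    _, name, args = expr
    return resolve(name, [type_of(arg) for arg in args])


# 3.0 + 2 is rejected, 3.0 + toFloat32(2) is accepted.
mixed = ("call", "+", [("lit", "float32"), ("lit", "int32")])
fixed = ("call", "+", [("lit", "float32"),
                       ("call", "toFloat32", [("lit", "int32")])])
```

Since the lookup is an exact match on argument types, there is no ranking of candidate overloads and no implicit conversion to reason about.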
Functions can also be overloaded based on their argument types. I could include the return type in this mechanism, but my gut tells me that path leads to lots and lots of darkness. We'll see.
Basic statements
Many languages du jour treat almost anything as an expression that evaluates to a value, including control flow statements like if. Qak won't do that. Instead, it provides a handful of statements that do not produce a value, and also allows you to litter your functions with naked expressions. The values generated by such expressions will be discarded.
// Variable declaration with initializers and simple type inference.
// Variables without initializer will be initialized to the type's
// default value.
var foo = 123
var bar: boolean = true
var zeroInitializer: int32

// While statement, who needs for(-each)?!
while(bar)
    // Variables are block scoped
    var uff = 3

    // If statement
    if (foo > 200) then
        // Assignments
        bar = false
    else
        // Arbitrary expressions (which includes things
        // like function calls).
        print(foo)

        // The value generated by this expression is simply discarded.
        foo + 34 * zeroInitializer

        if (shouldWeStop()) then
            // break and continue (not pictured here)
            break
        end
    end
    foo = foo + 1
end

// return statement
return foo * 10
There's plenty in the above code that will force me to build reasonable infrastructure for the interpreter. More complex constructs are mostly just syntactic sugar that can be compiled to the basic statements shown above.
Modules
The compilation unit of Qak is the module. A module consists of:
- The module name, expressed as module myModuleName at the top of the file.
- Imports of other modules (see below).
- Type definitions. For now only foreign types are possible.
- Function definitions.
- Module variable definitions.
A module can import other modules, as long as the resulting graph of imports is acyclic. Rust does this, some people hate it, so I'll do it as well, as I thrive on hatred. All modules import the std module by default.
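Checking that the import graph is acyclic is a standard depth-first search. A minimal Python sketch, assuming the compiler has already collected a map from each module name to the names it imports (find_import_cycle and that map are hypothetical, and module resolution is not covered):

```python
def find_import_cycle(imports):
    """Return one import cycle as a list of module names, or None.

    imports maps a module name to the list of module names it imports.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # modules on the current DFS path, in import order

    def visit(module):
        state[module] = IN_PROGRESS
        path.append(module)
        for dep in imports.get(module, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: that's a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[module] = DONE
        return None

    for module in imports:
        if state.get(module, UNVISITED) == UNVISITED:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

Returning the offending path rather than a boolean lets the compiler print an error like a -> b -> c -> a, which is usually what a user needs to untangle the imports.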
Importing a module is done like this: import someModuleName. I'll cross the "how the hell do you resolve modules?" bridge later. To avoid name clashes between things from an imported module and the current module, one can do import someModuleName as foo, and access things from that module via foo.thing.
Any type, function, or variable with a name starting with _ is considered private to the module. Anything else can be accessed by other modules. Again, this is likely a terrible idea. An explicit private specifier might be better.
Since a module can have variables, and those variables can have initializer expressions, I have to figure out when those initializers are run by the interpreter. I'm sure every little thing is gonna be alright.
Standard library
The standard library will be exceptionally minimal. It will define the (foreign) built-in types and their operators, common type conversion functions, and a handful of print() overloads, one for each type. All its code will go into the std.qak file.
Up next
This should be enough to create a contrived factorial nano benchmark to compare Qak against the likes of Lua and Python. Next time we'll look at the implementation of the parser and generation of the abstract syntax tree for the poor excuse of a language specification outlined above.