Chapter 4 - Types and Expressions

In this chapter we’ll cover most of the types and the expressions that operate on them. Let’s get started with the simple types: types that can exist on their own.

Simple Types

Integers

You’ve been introduced to one of the integers already: i32. The i stands for ‘integer’. The number after the ‘i’ is how many bits (zeros and ones) represent the integer, which in turn determines the range of numbers that can be stored. For instance, an i8 is 8 bits (1 byte) in size, and stores the integer range -128 to 127. If you add one to an i8 holding a value of 127, it is said to ‘overflow’, and wraps around to -128. Here are all the signed integers:

i8  min:-128 max:127
i16 min:-32,768 max:32,767
i32 min:-2,147,483,648 max:2,147,483,647
i64 min:-9,223,372,036,854,775,808 max:9,223,372,036,854,775,807

The types above are the ‘signed’ integers, the sign being the minus sign. There’s another kind of integer: the ‘unsigned’ integer. As the name suggests, unsigned integers don’t have a sign, so they can’t hold negative values.

u8  min:0 max:255
u16 min:0 max:65,535
u32 min:0 max:4,294,967,295
u64 min:0 max:18,446,744,073,709,551,615
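
The wrap-around described for i8 can be sketched as follows. This is only an illustration: the helper name addOne is hypothetical, and it assumes that a literal which fits may be used to initialise an i8 directly.

```volt
import watt.io;

// Hypothetical helper for illustration: adds one to an i8 using ++.
fn addOne(v: i8) i8
{
	result := v;
	result++;
	return result;
}

fn main() i32
{
	top: i8 = 127;  // the largest value an i8 can hold
	writeln(addOne(top));  // per the text above, this wraps around to -128
	return 0;
}
```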

Integer Literals

A series of digits on their own is an integer literal:

123
46

If it’s preceded by - it’s a negative value:

-4
-42

Integer literals are signed by default, but if you put U on the end they will be given an unsigned type:

32U
56U

Underscores are ignored, which can make larger numbers easier to read:

1_000_000_000  // same as '1000000000'
3_2  // same as '32'

A literal starting with 0x is a hexadecimal literal:

0xFF  (same as 255)

A literal starting with 0b is a binary literal:

0b10  (same as 2)
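
As a quick check of the literal forms above, the following sketch compares different spellings of the same values. It uses only the literal rules described in this section, and assert as it is used later in this chapter.

```volt
fn main() i32
{
	assert(1_000_000 == 1000000);  // underscores are ignored
	assert(3_2 == 32);             // even in odd places
	assert(0xFF == 255);           // hexadecimal literal
	assert(0b10 == 2);             // binary literal
	assert(0x10 == 16);            // hex 10 is sixteen, not ten
	return 0;
}
```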

Floating Point

Floating point types represent numbers with a fractional part. You can’t represent pi as an integer, for instance, so that’s where the floating point types come in. Volt has two, and like the integers they are distinguished by their size in bits.

f32
f64

Floating points use the same arithmetic expressions as integers so 5.0 / 2.0 gives the result 2.5. Floating point is a big topic, and the finer details deserve their own document. If you’re interested, What Every Computer Scientist Should Know About Floating-Point Arithmetic is well worth reading.

Floating Point Literals

The simplest form of floating point literal is just a number with a decimal point in it:

0.0

Floating point literals are double precision (f64) unless they end with f:

0.0f

Characters

Character types hold little pieces of text. There are three:

char   1 byte wide, for UTF-8
wchar  2 bytes wide, for UTF-16
dchar  4 bytes wide, for UTF-32

Character Literals

A character in single quotes (') is a character literal:

'a'
'2'

The ‘escape’ character is a backslash (\). It lets you insert special characters that are awkward or impossible to type directly:

\' The ' character
\\ The \ character
\0 An ASCII nul (a zero)
\n A newline character
\r Carriage return
\t A tab

Null

null is a special kind of value. For several reference types (like pointers, classes, and arrays), null represents an uninitialised value.

ptr: i32* = null;
// *ptr = 2;  // Crash!
obj: MyObject = null;
// obj.someMethod();  // Crash!

Note that in the examples above, assigning null is redundant: both pointers and class instances default to null unless they are assigned something else.

One interaction of null that is worth noting is with array types. Just as before, null represents an uninitialised value:

arr: i32[] = null;
// arr[0] = 3;  // Crash!

The main difference is that an uninitialised array can be used with several array operations without error. You can check the length field of an array initialised with null (it is 0), and you can check the ptr field too (naturally, it is null). Operations that are valid on an empty array are valid here as well; concatenation (the ~ operator combines two arrays into one array), for example:

arr: i32[] = null;  // Again, technically redundant here.
arr ~= 1;  // arr is now [1], with a valid ptr field.

Note that in expressions involving arrays of arrays, null has the type of the whole array, not the base type. This might seem obvious given i32[], but given:

arr: i32[][];
arr ~= null;

The length of arr is 0, not 1. To see why, simply expand out the expression by hand:

arr = arr ~ cast(i32[][])null;

Two empty i32[][] concatenated together equals one empty array. If what you want to do is concatenate an empty i32[] to a list of i32[]s, then you can explicitly cast the null yourself:

arr: i32[][];
arr ~= cast(i32[])null;  // arr.length == 1

Remaining Primitives

The last two primitives are bool and void.

bool is a boolean data type. It is either true or false.

void is not a type at all, but instead marks the absence of a type.

Composite Types

These types are only complete when combined with at least one concrete type (a primitive, or a user-defined type), and optionally with other composite types.

Arrays

We used these last chapter. To declare an array of type T, we would write T[].

i32[]     an array of i32s.
bool[]    an array of bools.
i16[][][] an array of an array of an array of i16s.

Strings

You’ve seen these. They’re arrays of characters.

month := "January";  // month is of type string

The type definition of string is simple:

alias string = immutable(char)[];

alias creates a shorthand way of referring to a type. Using string or immutable(char)[] is the same, and the compiler will generate the same code for either. immutable means you can’t change the portion in parens:

a: string = "hello";
a[0] = 'H';  // Error!

You can, however, change the array:

a: string = "hello";
a = "H" ~ "ello";  // Hello

Volt strings are arrays of UTF-8 code units (bytes), not characters. Using Unicode correctly deserves a document all of its own, but don’t assume everything is ASCII. It may seem, for example, that the length field counts letters, but that’s not true. The length of "world" is 5, but the length of "世界" is 6, not 2: it is made of 2 characters, but those take 6 bytes of UTF-8. Use the count function in watt.text.utf if you want to count ‘characters’, and don’t assume that the Nth index into a string will get you the Nth character.
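
The difference between bytes and characters can be sketched like this. The exact signature of count is an assumption here: the sketch assumes it takes a string and returns the number of characters (code points) in it.

```volt
import watt.text.utf;

fn main() i32
{
	ascii := "world";
	cjk := "世界";

	assert(ascii.length == 5);  // 5 characters, 1 byte each
	assert(cjk.length == 6);    // 2 characters, 3 bytes each in UTF-8
	assert(count(cjk) == 2);    // assumed: count returns the number of characters
	return 0;
}
```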

A backslash (\) starts an escape sequence, which lets you put special characters into a string:

writeln("hello\\\nworld");

would display

hello\
world

You can create raw strings if you need to use a lot of backslashes, for regexes and Windows paths:

r"C:\Users\Steve"
`C:\Users\Steve`

Three backticks get you a multiline string:

```
hello
world
this
is a multiline
string
```

Array Literals

The [ character denotes the start of the array literal. Values are separated by the , character, and the literal ends with the ] character:

[1, 2, 3]
['a', 'b']

Pointers

A pointer holds the address of another value. To declare a pointer to type T, we would write T*.

i32*     a pointer to an i32.
bool[]*  a pointer to an array of bools.
char*[]  an array of pointers to chars.

Associative Arrays

An associative array associates ‘keys’ with ‘values’. To declare an associative array of T values, using J as a key, we would write T[J].

bool[i32]      an associative array of bools keyed by i32
void*[string]  an associative array of pointers to void keyed by string

Associative Array Literals

Like the array literals, associative array literals open with [, have values separated by ,, and end with ]. The values of an associative array literal have the key and value separated by a : character. So [1:"hello", 2:"goodbye"] would be a string[i32], that has a value of “hello” associated with a key 1, and so on.
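
A minimal sketch of the literal above in use. Reading and writing values with the aa[key] syntax is an assumption here, as this chapter does not otherwise show it, and the variable name greetings is hypothetical.

```volt
import watt.io;

fn main() i32
{
	greetings: string[i32] = [1:"hello", 2:"goodbye"];
	greetings[3] = "see you";  // assumed: indexing with a new key adds a pair

	writeln(greetings[1]);  // the value associated with key 1
	writeln(greetings[3]);  // the value that was just added
	return 0;
}
```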

Storage Type

The three storage types are const, immutable, and scope. These can be applied to a type using parens: const(i32[])* is a pointer to a const array of i32s. What follows is a quick overview of what each of them means.

Const

A const type may not be modified.

a: const(i32) = 12;  // A const type may be initialised.
a = 6;  // Error, cannot modify const.

Immutable

An immutable type may not be modified either. What separates it from const is that an immutable value cannot be created from data that could still be modified elsewhere. If we think of a variable as a window into memory, a const window guarantees only that this particular window will not be used to modify the memory; another window might still modify it. An immutable window makes the same guarantee, and adds a further one: the memory it’s looking at will not be changed by anyone. With const, a change made through another window is visible:

i: i32 = 12;
ip: const(i32)* = &i;
assert(*ip == 12);
i = 6;
assert(*ip == 6);

Whereas with immutable:

i: i32 = 12;
ip: immutable(i32)* = &i;  // error, cannot convert i32* to immutable(i32)*.

Short of explicitly going over the type system’s head by casting away immutable, immutable values won’t change.

Scope

A scope value can be modified, but it cannot become non-scope. A scoped type has a list of restrictions on what can be done with it. These restrictions are to prevent a reference to a piece of memory escaping the lifetime of the scope they were declared in, so that the pointer doesn’t become invalid.

That last sentence may have looked like Martian to you. At first, scope seems like an excuse for the compiler to yell at you for no good reason, but there is a genuine use to it, so here’s a concrete example.

Say we’re writing an API that takes a pointer to an i32, and doubles it (I didn’t say it was a useful API).

import watt.io;

fn processInteger(ip: i32*)
{
	*ip = *ip * 2;
}

fn apiUser()
{
	i := 32;
	processInteger(&i);
	writeln(i);  // output '64'
}

fn main() i32
{
	apiUser();
	return 0;
}

As long as the pointer the user passes in is valid when processInteger is called, nothing will go wrong. But say you produce version 2.0 of the API, and this version stores pointers, and then processes them in a batch, later on.

module test;

import watt.io;

global integers: i32*[];

fn storeInteger(ip: i32*)
{
	integers ~= ip;
}

fn processIntegers()
{
	foreach (ip; integers) {
		*ip = *ip * 2;
		writeln(*ip);  // output ????
	}
}

fn apiUser()
{
	i := 32;
	storeInteger(&i);
}

fn main() i32
{
	apiUser();
	processIntegers();
	return 0;
}

There’s no real way to predict how this program will behave on your machine. On mine, it produces seemingly random numbers, but it could just as easily crash. What happens here is known as ‘stack corruption’, and is the source of many very difficult to debug errors. Let’s break it down.

The i variable in apiUser is what is known as a ‘stack’ variable. The stack is an area of memory that functions use for temporary variables: the local variables that don’t use the GC or any other form of memory allocation. Once you return from a function, that memory is free to be reused somewhere else.

But if some code, like storeInteger, squirrels away that pointer and something later writes through it, all sorts of evil can happen. What processIntegers thinks it’s writing to is the i variable; that’s what the pointer was pointing to when storeInteger was called, after all. But now, even though the pointer value hasn’t changed, what it’s pointing to has. In fact, it’s likely (but not guaranteed) to be pointing into memory now used by processIntegers, as that function is called right after apiUser. It’s easy to see how this can lead to bugs, especially in larger programs where it’s not obvious that this is happening. It leads to strange behaviours like functions jumping into the middle of other functions. Programs can limp on for a long time, doing all sorts of damage, before they crash.

Enter scope. Suppose the pointer types above were instead marked as scope:

global integers: scope i32*[];

fn storeInteger(ip: scope i32*)
{
	integers ~= ip;  // Error! Can't escape scope!
}

If the pointer would leave the scope of the current function, e.g. assigning to a global variable as in the above example, the compiler will complain. This is why inline functions are typed as scoped dg – they refer to the stack frame of the current function, so storing and calling them outside of that frame can lead to stack corruption.

So in addition to a scope value being disallowed from being implicitly converted to a non scope value, a scope value may not be assigned somewhere outside of the current function, and it may not be returned, even if being returned as a scope type.

fn storeInteger(ip: scope i32*) scope i32*
{
	return ip;  // Error! Can't escape scope!
}
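
By contrast, the original version of the API never lets the pointer outlive the call, so it should be able to take a scope pointer without complaint. The following is a sketch under that assumption; it also assumes that an ordinary i32* (here &i) may be passed where a scope i32* is expected, since only the conversion from scope to non-scope is described as forbidden.

```volt
import watt.io;

// The pointer is only used while the function runs; it is not stored or returned.
fn processInteger(ip: scope i32*)
{
	*ip = *ip * 2;
}

fn main() i32
{
	i := 32;
	processInteger(&i);
	writeln(i);  // 64, as in the first version of the API
	return 0;
}
```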

Mutable Indirection

The above rules, where one type ‘cannot’ be converted to another, are relaxed if a type has no ‘mutable indirection’. A type with mutable indirection, such as a pointer or an array, can be used to change memory that lives elsewhere. A purely value type, like an i32, or a struct that only has i32 members, has no mutable indirection, and the type system is more lenient when dealing with such types.

Compare assigning an immutable array…

ia: immutable(i32[]);
ib: i32[] = ia;  // error!

…to assigning an immutable integer:

i: immutable(i32);
j: i32 = i;

The above is allowed, because j is a copy and cannot affect i’s value in any way.

Expressions

Expressions perform an operation on one or more values.

Arithmetic

The simplest expressions are the basic math operations: +, -, *, and /, or addition, subtraction, multiplication, and division, respectively.

a := 5 + 3;  // 8
b := 5 - 3;  // 2
c := 5 * 3;  // 15
d := 5 / 3;  // 1

If you’ve not dealt with integer math before, you were probably nodding your head right up to the last example there. Integers are whole numbers like 1, 23, 0, or -42, with no pesky fractional portions. If you divide two integers with /, you will get ‘integer division’: the fractional portion is ‘chopped off’ (for positive results, effectively rounding down to the nearest integer, never up).

a := 10 / 6;  // 1
b := 2 / 5;   // 0

Dividing by zero will cause your program to crash. If you want to represent a number with its fractional portion intact, you’ll need a floating point number (also known as a real):

a := 10.0 / 6.0;  // 1.666666666666667 (ish)

If either side of the division operation is a real, then floating point division will be used, and the type of the operation will either be f32 or f64.
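
The rule above can be sketched with a few comparisons. The values are chosen so that the floating point results are exact.

```volt
fn main() i32
{
	assert(10 / 4 == 2);      // both sides are integers: integer division
	assert(10 / 4.0 == 2.5);  // right side is a real: floating point division
	assert(10.0 / 4 == 2.5);  // left side is a real: floating point division
	return 0;
}
```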

One other operator that’s not quite as well known, but very useful, is the modulo operator, %, which returns the remainder of a division operation.

a := 10 % 4;  // 2
b := 10 % 5;  // 0

You’ll often see this used to determine if a value is even or odd:

fn isEven(n: i32) bool
{
	return n % 2 == 0;
}

Concatenation

In many languages, if you want to concatenate (stick) two strings together, you would use the + operator. Volt uses a separate operator altogether: ~, the concatenation operator:

a := "hello " ~ "world";  // "hello world"
a ~= " nice.";  // "hello world nice."

Note that ~ requires the language runtime to allocate memory for a new string, and concatenating in a loop can be surprisingly slow because of this. If you find yourself doing a lot of string concatenation, the StringSink struct in watt.text.sink is worth using.
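
A rough sketch of building a string with StringSink instead of repeated ~ follows. The method names sink and toString are assumptions about the StringSink interface, so check the watt.text.sink documentation before relying on them; joinWords is a hypothetical helper.

```volt
import watt.io;
import watt.text.sink;

// Hypothetical helper: joins words, adding a space after each one.
fn joinWords(words: string[]) string
{
	ss: StringSink;
	foreach (word; words) {
		ss.sink(word);  // assumed: appends to the sink's internal buffer
		ss.sink(" ");
	}
	return ss.toString();  // assumed: returns everything appended so far
}

fn main() i32
{
	writeln(joinWords(["hello", "world", "nice."]));
	return 0;
}
```

The point of the design is that the pieces are gathered in one buffer, and a string is only produced once at the end, rather than a new string being allocated for every ~.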

Relational Operators

These return a bool value.

== is the equality operator. If the two sides of this expression are the same, it returns true. Otherwise, it returns false.

import watt.io;

fn main() i32
{
	str := readln();
	if (str == "banana") {
		writeln("you wrote 'banana'");
	} else {
		writeln("you didn't write 'banana'");
	}
	return 0;
}

!= is the inequality operator. It’s like ==, but instead returns true if both sides are not the same, otherwise it returns false.

import watt.conv;
import watt.io;

fn main() i32
{
	writeln("I'm thinking of a number between one and one hundred. What is it?");
	n := toInt(readln());
	if (n != 32) {
		writeln("you didn't get it!");
	} else {
		writeln("correct!");
	}
	return 0;
}

The < and > operators return true if the left side is less than or greater than the right side, respectively.

import watt.io;

fn main() i32
{
	if (5 > 2) {
		writeln("five is bigger than two");
	}
	if (2 < 5) {
		writeln("two is less than five");
	}
	return 0;
}

Output:

five is bigger than two
two is less than five

<= returns true if the left side is less than or equal to the right, and >= returns true if the left side is greater than or equal to the right.

5 >= 4  true
5 >= 5  true
5 >= 6  false
4 <= 5  true
4 <= 4  true
4 <= 3  false

The relational comparison operators function on arrays, too. Elements are compared in order, and the first pair that differs determines the ordering. If one array is shorter than the other, but equal in other respects, it evaluates as ‘less’ than the longer array. That is to say,

"aaaa" < "aaaaaaaa"  true
"b"    < "aaaaaaaa"  false

Logical Operators

&& returns true if both sides are true.

true && true    true
true && false   false
false && true   false
false && false  false

|| returns true if at least one side is true.

true || true    true
true || false   true
false || true   true
false || false  false

! returns true if it’s applied to a false value, and false if it’s applied to a true one.

!true   false
!false  true

Casts

Usually, Volt won’t let us assign a variable to another if the types aren’t the same, unless it knows it’ll fit. For instance,

a: i32;
b: i16;
a = b;  // This is okay, as any i16 can fit into the larger i32
b = a;  // Error: certain values of i32 may not fit into i16.

cast lets us say “we know what we’re doing, assign anyway.”

import watt.io;

fn main() i32
{
	b: u8 = cast(u8)257;
	writeln(b);
	return 0;
}

Output:

1

As the maximum value that a u8 can hold is 255, the cast forces the value to be truncated, and it wraps around.
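
To make the wrap-around concrete, here is a small sketch of what the truncation does to values around the top of the u8 range. It relies only on the cast behaviour shown above: only the low 8 bits of the value are kept, which for these values is the same as taking the remainder after dividing by 256.

```volt
fn main() i32
{
	assert(cast(u8)255 == 255);  // fits, so it is unchanged
	assert(cast(u8)256 == 0);    // one past the maximum wraps to zero
	assert(cast(u8)257 == 1);    // matches the output above
	assert(cast(u8)257 == 257 % 256);
	return 0;
}
```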

Increment and Decrement

The increment and decrement operators increase and decrease a variable’s value by 1.

a := 0;
a++;  // a is 1
a--;  // a is 0

They can be used both before and after the variable. These are known as the prefix and postfix increment/decrement operators.

a := 0;
++a;  // a is 1
a++;  // a is 2
a--;  // a is 1
--a;  // a is 0

There is a difference. The prefix operators return the value modified by the operation:

a := 0;
b := ++a;  // b is 1, a is 1

While the postfix operators return the value before modifying it:

a := 0;
b := a++;  // b is 0, a is 1

Bitwise Operators

The bitwise operators perform logical operations at the bit level. See the Wikipedia page on bitwise operations for more detail. These operators work on operands of the same size.

| sets a bit if one or both of the bits are set.

0b0000 | 0b0001  // 0b0001
0b0001 | 0b0001  // 0b0001
0b0000 | 0b0000  // 0b0000

& sets a bit if both of the bits are set.

0b0001 & 0b0001  // 0b0001
0b0000 & 0b0001  // 0b0000
0b0001 & 0b0000  // 0b0000

^ sets a bit if one (and only one) of the bits is set.

0b0000 ^ 0b0001  // 0b0001
0b0001 ^ 0b0001  // 0b0000
0b0000 ^ 0b0000  // 0b0000

<< shifts the bit pattern left by the number on the right.

0b0001 << 1  // 0b0010
0b0001 << 2  // 0b0100

>> shifts the bit pattern right by the number on the right.

0b1000 >> 1  // 0b0100
0b1000 >> 2  // 0b0010
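
A common use of these operators is packing several on/off flags into one integer. The sketch below is only an illustration, and the flag names are hypothetical.

```volt
fn main() i32
{
	// Hypothetical flags, one bit each.
	canRead := 0b001;
	canWrite := 0b010;
	canExec := 0b100;

	perms := canRead | canWrite;      // combine flags: 0b011
	assert((perms & canWrite) != 0);  // the write bit is set
	assert((perms & canExec) == 0);   // the exec bit is not set

	perms = perms ^ canWrite;         // flip the write bit off: 0b001
	assert(perms == canRead);

	assert((1 << 2) == canExec);      // shifting builds a single-bit value
	return 0;
}
```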

Assignment

In addition to the regular assignment operator we’ve been using, there are several that are combined with the operators we’ve been touching on in this chapter.

a += 1 is the same as a = a + 1.

a *= 1 is the same as a = a * 1.

a /= 1 is the same as a = a / 1.

a -= 1 is the same as a = a - 1.
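
As a short worked example, here is how a value changes when these operators are applied one after another:

```volt
fn main() i32
{
	a := 10;
	a += 5;  // a is 15
	a *= 2;  // a is 30
	a /= 4;  // a is 7 (integer division chops off the .5)
	a -= 1;  // a is 6
	assert(a == 6);
	return 0;
}
```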

Ternary

The ternary expression is like a compact if statement. It returns a value dependent on a boolean expression.

a := 5 > 2 ? "bigger" : "smaller";
writeln(a);

Output:

bigger

Precedence

Operator precedence is the order in which operators (expressions) are evaluated. For instance, * has a higher precedence than +, so 5 * 5 + 2 is 27, and not 35. The associativity of an operator determines how operators of the same precedence are grouped in an expression. = is right associative, so a = b = c is interpreted as a = (b = c), not (a = b) = c. In order from highest precedence to lowest:

* / %
+ - ~
<< >>
< > <= >= in !in
== != is !is
&
^
|
&&
||
= += *= etc

If you want 5 * 5 + 2 to mean 35, you can wrap part of the expression in parens to give it a higher priority, as parens are evaluated first: 5 * (5 + 2).
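
The rules in this section can be checked with a few assertions. This sketch relies only on the precedence list and the associativity of = described above.

```volt
fn main() i32
{
	assert(5 * 5 + 2 == 27);    // * is evaluated before +
	assert(5 * (5 + 2) == 35);  // parens are evaluated first
	assert(1 + 2 << 1 == 6);    // + is evaluated before <<, and << before ==

	a: i32;
	b: i32;
	c := 7;
	a = b = c;  // grouped as a = (b = c)
	assert(a == 7 && b == 7);
	return 0;
}
```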

