Saturday, March 22, 2008

Episode 8: Hashtables and Arrays

Welcome to Episode 8! This is the second-last episode in this tutorial. After this episode, we'll have a complete implementation of our Squaak language.
This episode focuses on aggregate data structures: arrays and hashtables. We'll discuss the syntax to assign to them and to construct them. We'll see that implementing the action methods is really easy, almost trivial. After that, we'll make some notes on aggregates as arguments, and how they differ from the basic data types when passing them around as subroutine arguments.

Arrays and Hashtables
Besides basic data types such as integer, floating-point and string, Squaak has two aggregate data types: array and hashtable. An array is an object that can store a sequence of values. The values in this sequence can be of different types, unlike some languages that require all elements of an array to be the same type. An example of using arrays is shown below:
grades[0] = "A"
grades[1] = "A+"
grades[2] = "B+"
grades[3] = "C+"
A hashtable stores key-value pairs; the key is used as index to store a value. Keys must be string constants, but the value can be of any type. An example is shown below:
lastnames{"larry"}   = "wall"
lastnames{"allison"} = "randal"
Array constructors
Just as there are integer literals (42) and string literals ("hello world") that can be assigned to variables, you can have array literals. Below is the grammar rule for this:
rule array_constructor {
'[' [ <expression> [',' <expression>]*]? ']'
{*}
}
Some examples are shown below:
foo = []
bar = [1, "hi", 3.14]
baz = [1, [2, 3, 4] ]
The first example creates an empty array and assigns this to foo. The second example shows the construction of three elements, assigning the array to bar. Note that the elements of one array can be of different types. The third example shows the construction of nested arrays. This means that element baz[1][0] evaluates to the value 2 (indexing starts at 0).

Hashtable constructors
Besides array literals, Squaak supports hashtable literals, that can be constructed through a hashtable constructor. The syntax for this is expressed below:
rule hash_constructor {
'{' [<named_field> [',' <named_field>]* ]? '}'
{*}
}

rule named_field {
<string_constant> '=>' <expression>
{*}
}
Some examples are shown below:
foo = {}
bar = { "larry" => "wall", "allison" => "randal" }
baz = { "a" => { "b" => 42} }
The first line creates an empty hashtable and assigns this to foo. The second creates a hashtable with two fields: "larry" and "allison". Their respective values are: "wall" and "randal". The third line shows that hashtables can be nested, too. There, a hashtable is constructed that has one field, called "a", and its value is another hashtable, containing a field "b" that has the value 42.

Implementation
You might think implementing support for arrays and hashtables looks rather difficult. Well, it's not. Actually, the implementation is rather straightforward. First, we're going to update the grammar rule for primary:
rule primary {
<identifier> <postfix_expression>*
{*}
}

rule postfix_expression {
| <index> {*} #= index
| <key> {*} #= key
}

rule index {
'[' <expression> ']'
{*}
}

rule key {
'{' <expression> '}'
{*}
}
A primary object is now an identifier followed by any number of postfix-expressions. A postfix expression is either a hashtable key or an array index. Allowing any number of postfix expressions allows to nest arrays and hashtables in each other, allowing us to write, for instance:
foo{"key"}[42][0]{"hi"}
Of course, you as a Squaak programmer must make sure that foo is actually a hashtable, and that foo{"key"} yields an array, and so forth. Implementing this is actually quite simple. First, let us see how to implement the action method index.
method index($/) {
my $index := $( $<expression> );

my $past := PAST::Var.new( $index,
:scope('keyed'),
:viviself('Undef'),
:vivibase('ResizablePMCArray'),
:node($/) );
make $past;
}
First, we retrieve the PAST node for expression. Then, we create a keyed variable access operation, by creating a PAST::Var node and setting its scope to 'keyed'. If a PAST::Var node has keyed scope, then the first child is evaluated as the aggregate object, and the second child is evaluated as the index on that aggregate.

But wait! The PAST::Var node we just created has only one child!

Here's where the updated action method for primary comes in. This is shown below.
method primary($/) {
my $past := $( $<identifier> );

for $<postfix_expression> {
my $expr := $( $_ );
$expr.unshift( $past );
$past := $expr;
}
make $past;
}
First, the PAST node for identifier is retrieved. Then, for each postfix-expression, we get the PAST node, and unshift the (current) $past onto it. Effectively, the (current) $past is set as the first child of $expr. And you know what $expr contains: that's the keyed variable access node, that was created in the action method index.
After that, $past is set to $expr; either there's another postfix-expression, in which case this $past will be set as the first child of that next postfix-expression, or, the current $past is set as the result object.

Implementing Constructors
To implement the array and hashtable constructors, we're going to take advantage of Parrot's Calling Conventions (PCC). The PCC supports, amongst others, optional parameters, named parameters and slurpy parameters. If you're Dutch, you might think that slurpy parameters make a lot of noise ("slurpen" is a Dutch verb meaning drinking carefully, which you usually do if your beverage is hot, making noise in the process), but you would be wrong. Slurpy parameters will store all remaining arguments that have not yet been stored in other parameters (implying that there can only be one slurpy (positional) parameter, and it should come after all normal (positional) parameters). Parrot will automatically create an aggregate to store these remaining arguments. Besides positional slurpy parameters, you can also define a named slurpy parameter, which will store all remaining named parameters, after all normal (named) arguments have been stored.

You might be confused by now.

Let's look at an example, as this issue is worth a few brain cells to store.
.sub foo
.param pmc a
.param pmc b
.param pmc c :slurpy
.param pmc k :named('x')
.param pmc l :named('y')
.param pmc m :named :slurpy

.end

foo(1, 2, 3, 4, 6 :named('y'), 5 :named('x'), 7 :named('p'), 8 :named('q') )
This will result in the following mapping:
a: 1
b: 2
c: {3, 4}
k: 5
l: 6
m: {"p"=>7, "q"=>8}
So, after the positional parameters (a, b), c is declared as a slurpy parameters, storing all remaining positional parameters. Parameters k and l are declared as named parameters, which have the respective names "x" and "y". Using these names, values can be passed. After the named parameters, there's the parameter m, which is both flagged as named and slurpy. This parameter will store all remaining named arguments that have not yet been stored by the normal named parameters.

The interesting parameters for us are "c" and "m". For the positional slurpy parameter, Parrot creates an array, while for the named slurpy parameter a hashtable is created. This happens to be exactly what we need! Implementing the array and hash constructors becomes trivial:
.sub '!array'
.param pmc fields :slurpy
.return (fields)
.end

.sub '!hash'
.param pmc fields :named :slurpy
.return (fields)
.end
Array and hashtable constructors can then be compiled into subroutine calls to the respective Parrot subroutines, passing all fields as arguments. (Note that these names start with a "!", which is not a valid Squaak identifier. This prevents us from calling these subs in normal Squaak code).

Basic data types and Aggregates as arguments
All data types, both basic and aggregate data types are represented by Parrot Magic Cookies (PMCs). The PMC is one of the four built-in data types that Parrot can handle; the others are integer, floating-point and string. Currently, the PCT can only generate code to handle PMCs, not the other basic data types.
Parrot has registers for each its four built-in data types. The integer, floating-point and string registers store the actual data value, while PMC registers store a reference to the PMC object. This has consequences for how PMCs are handled when passing them as arguments. When passing a PMC as an argument, the invoked subroutine gets access to the PMC reference; in other words, PMCs are passed by reference. This means that the subroutine can change the original argument that was passed by the caller. Of course, it depends what instructions are being generated, what the invoked subroutine does to the references.
In Squaak, when passing basic data values, these cannot be changed by the invoked subroutine. When assigning a new value to a parameter, a whole new object is created and bound to the parameter identifier. No changes are made to the original argument.
Aggregate data types are handled differently, however. When an invoked subroutine assigns to an index or hashtable field of a parameter, then the original argument is affected.
In other words, basic data types have by value semantics, while aggregate data types have by reference semantics. A short example to demonstrate this:
sub foo(a,b,c)
a = 42
b[0] = 1
c{"hi"} = 2
end

var a = 0
var b = []
var c = {}
foo(a,b,c)

print(a, b[0], c{"hi"} ) # prints 0, 1, 2
What's Next?
This was the last episode to discuss implementation details to make Parrot (run) Squaak. After doing this episode's exercises, your implementation should be fairly complete. Next episode will be the last of this series, in which we'll recap what we did, and demonstrate our language with a nice demo program.

Exercises
  1. We've shown how to implement keyed variable access for arrays, by implementing the action method for index. The same principle can be applied to keyed access for hashtables. Implement the action method for key.
  2. Implement the action methods for array_constructor and hash_constructor. Use a PAST::Op node and set the pasttype to 'call'. Use the "name" attribute to specify the names of the subs to be invoked (e.g., :name("!array") ). Note that all hash fields must be passed as named arguments. Check out PDD26 for doing this, and look for a "named " method.
  3. We'd like to add a little bit of syntactic sugar for accessing hashtable keys. Instead of writing foo{"key"}, I'd like to write foo.key. Of course, this only works for keys that do not contain spaces and such. Add the appropriate grammar rule (call it "member") that enables this syntax, and write the associated action method. Make sure this member name is converted to a string.
    Hint: use a PAST::Val node for the string conversion.
License
The source code in this tutorial has been released by the author into the public domain. Where this is not possible by law, the author grants license to use this file for any reason without any rights reserved, and with no warranty express or implied or fitness for a particular purpose.

No comments: