Monday, March 17, 2008

Episode 4: PAST Nodes and More Statements

The previous episode introduced the full grammar specification of Squaak, and we finally started working on the implementation. If you're doing the exercises, you currently have basic assignments working; strings and integers can be assigned to (global) variables. This episode will focus on implementation of some statement types and explain a few bits about the different PAST node types.

Parrot Abstract Syntax Tree
A Parrot Abstract Syntax Tree (PAST) represents a program written in Squaak (or any other Parrot-ported language), and consists of nodes. In the previous episode, we already saw nodes to represent string and integer literals, identifiers and "operator" nodes (PAST::Op), in our case assignment.
Other operators represent other high-level language constructs such as conditional statements, loops, and subroutine invocation. Depending on the node type, a PAST node can take child nodes. For instance, a PAST node to represent an if-statement can have up to three child nodes. The first child node represents the condition; if true, the second child node is evaluated. If the condition evaluates to false, and there's a third child node, this third child node is evaluated (the else part).
If the PAST represents a subroutine invocation, the child nodes are evaluated in a different way. In that case, the first child node represents the subroutine that is to be invoked (unless the :name attribute was set on this node, but more on that in a later episode), and all other child nodes are passed to that subroutine as arguments.
It generally doesn't matter of which PAST node type the children are. For instance, consider a language in which a simple expression is a statement:
42
You might wonder what kind of code is generated for this. Well, it's really very simple: a new PAST::Val node is created (of a certain type, for this example that would be 'Integer'), and the value is assigned to this node. It might seem a bit confusing to write something like this, as it doesn't really do anything (note that this is not valid Squaak input):
if 42 then "hi" else "bye" end
But again, this works out correctly; the "then" and "else" blocks are compiled to instructions that load that particular literal into a PAST::Val node and leave it there. That's fine, if your language allows such statements.
The point I'm trying to make is, that all PAST nodes are equal. You don't need to think about the node types if you set a node as a child of some other parent node. Each PAST node is compiled into a number of PIR instructions.

Go with the control-flow
Now you know a bit more on PAST nodes, let's get our hands dirty and implement some more statement types. In the rest of this episode, we'll handle if-statements and throw-statements.

If-then-else
The first statement we're going to implement now is the if-statement. An if-statement has typically three parts (but this of course depends on the programming language): a conditional expression, a "then" part and an "else" part. Implementing this in Perl 6 rules and PAST is almost trivial:
rule if_statement {
'if' <expression> 'then' <block>
['else' $<else>=<block> ]?
'end'
{*}
}

rule block {
<statement>*
{*}
}

rule statement {
| <assignment> {*} #= assignment
| <if_statement> {*} #= if_statement
}
Note that the optional else block is stored in the match object's "else" field. If we hadn't written this $<else>= part, then <block> would have been an array, with block[0] the "then" part, and block[1] the optional else part. Assigning the optional else block to a different field, makes the action method slightly easier to read.
Also note that the statement rule has been updated; a statement is now either an assignment or an if-statement. As a result, the action method statement now takes a key argument. The relevant action methods are shown below:
method statement($/, $key) {
# get the field stored in $key from the $/ object,
# and retrieve the result object from that field.
make $( $/{$key} );
}

method block($/) {
# create a new block, set its type to 'immediate',
# meaning it is potentially executed immediately
# (as opposed to a declaration, such as a
# subroutine definition).
my $past := PAST::Block.new( :blocktype('immediate'),
:node($/) );

# for each statement, add the result
# object to the block
for $<statement> {
$past.push( $( $_ ) );
}
make $past;
}

method if_statement($/) {
my $cond := $( $<expression> );
my $then := $( $<block> );
my $past := PAST::Op.new( $cond, $then,
:pasttype('if'),
:node($/) );
if $<else> {
$past.push( $( $<else>[0] ) );
}
make $past;
}
That's, easy, huh? First, we get the result objects for the conditional expression and the then part. Then, a new PAST::Op node is created, and the :pasttype is set to 'if', meaning this node represents an if-statement. Then, if there is an "else" block, this block's result object is retrieved and added as the third child of the PAST node. Finally, the result object is set with the make function.

Result objects
At this point it's wise to spend a few words on the make function, the parse actions and how the whole PAST is created by the individual parse actions. Have another look at the action method if_statement. In the first two lines, we request the result objects for the conditional expression and the "then" block. When were these result objects created? How can we be sure they're there?
The answer lies in the order in which the parse actions are executed. The special "{*}" symbol that triggers a parse action invocation, is usually placed at the end of the rule. For this input string: "if 42 then x = 1 end" this implies the following order:
  1. parse TOP
  2. parse statement
  3. parse if_statement
  4. parse expression
  5. parse integer
  6. create PAST::Val( :value(42) )
  7. parse block
  8. parse statement
  9. parse assignment
  10. parse identifier
  11. create PAST::Var( :name('x'))
  12. parse integer
  13. create PAST::Val( :value(1) )
  14. create PAST::Op( :pasttype('bind') )
  15. create PAST::Block (in action method block)
  16. create PAST::Op( :pasttype('if') )
  17. create PAST::Block (in action method TOP)

As you can see, PAST nodes are created in the leafs of the parse tree first, so that later, action methods higher in the parse tree can retrieve them.

Throwing Exceptions
The grammar rule for the "throw" statement is really quite easy, but it's useful to discuss the parse action, as it shows the use of generating custom PIR instructions. First the grammar rule:
rule throw_statement {
'throw' <expression>
{*}
}
I assume you know how to update the "statement" rule by now. The throw statement will compile down to Parrot's "throw" instruction, which takes one argument. In order to generate a custom Parrot instruction, the instruction can be specified in the :pirop attribute when creating a PAST::Op node. Any child nodes are passed as arguments to this instruction, so we need to pass the result object of the expression being thrown as a child of the PAST::Op node representing the "throw" instruction.
method throw_statement($/) {
make PAST::Op.new( $( $<expression> ),
:pirop('throw'),
:node($/) );
}

What's Next?
In this episode we implemented two more statement types of Squaak. You should get a general idea of how and when PAST nodes are created, and how they can be retrieved sub (parse) trees. In the next episode we'll take a closer look at variable scope and subroutines.

In the mean time, I can imagine some things are not too clear. In case you're lost, don't hesitate to leave comment, and I'll try to answer (as far as my knowledge goes).

Exercises
  1. We showed how the if-statement was implemented. The while-statement and try-statement are very similar. Implement these. Check out pdd26 to see what PAST::Op nodes you should create.
  2. Start Squaak in interactive mode, and specify the target option to show the generated PIR instructions. Check out what instructions and labels are generated, and see if you can recognize which instructions make up the conditional expression, which represent the "then" block, and which represent the "else" block (if any).

References
  • PDD26: AST
  • docs/art/*.pod for good introductions to PIR
License
The source code in this tutorial has been released by the author into the public domain. Where this is not possible by law, the author grants license to use this file for any reason without any rights reserved, and with no warranty express or implied or fitness for a particular purpose.

3 comments:

Unknown said...

I'm working on the exercises, and when I do the --target=pir I get the following error:

ResizablePMCArray: Can't shift from an empty array!
current instr.: 'parrot;POST;Compiler;pir' pc 8244 (src/POST/Compiler.pir:139)
called from Sub 'parrot;POST;Compiler;pir_children' pc 8150 (src/POST/Compiler.pir:79)
... call repeated 3 times
called from Sub 'parrot;POST;Compiler;pir' pc 8629 (src/POST/Compiler.pir:279)
called from Sub 'parrot;POST;Compiler;to_pir' pc 8111 (src/POST/Compiler.pir:56)
called from Sub 'parrot;PCT;HLLCompiler;compile' pc 433 (src/PCT/HLLCompiler.pir:302)
called from Sub 'parrot;PCT;HLLCompiler;eval' pc 834 (src/PCT/HLLCompiler.pir:490)
called from Sub 'parrot;PCT;HLLCompiler;evalfiles' pc 1138 (src/PCT/HLLCompiler.pir:627)
called from Sub 'parrot;PCT;HLLCompiler;command_line' pc 1317 (src/PCT/HLLCompiler.pir:716)
called from Sub 'parrot;Aescla;Compiler;main' pc 76 (aescla.pir:51)

What would cause this, and where can I start looking for possible causes?

Anonymous said...

Same here!!!!!!!!!!

Pavan sharma said...

nice post dear, which provides lots of valuable information.

law firm