Thursday, March 20, 2008

Episode 6: Scope and Subroutines

In Episode 5, we looked at variable declarations and implementing scope. We covered a lot of information then, but did not tell the full story, in order to keep that post short. In this episode we'll address the missing parts, which will also result in implementing subroutines.

Variables
In the previous episode, we entered local variables into the current block's symbol table. As we've seen earlier, using the do-block statement, scopes may nest. Consider this example:
do
var x = 42
do
print(x)
end
end
In this example, the print statement should print 42, even though x was not declared in the scope where it is referenced. How does the compiler know it's still a local variable? That's simple: it should look in all scopes, starting at the innermost scope. Only when the variable is found in any scope, should its scope be set to "lexical", so that the right instructions are being generated.

The solution I came up with is shown below. Please note that I'm not 100% sure if this is the "best" solution, but my personal understanding of the PAST compiler is limited. So, while this solution works, I may teach you the wrong "habit". Please be aware of this.
method identifier($/) {
our @?BLOCK;

my $name := ~$<ident>;
my $scope := 'package'; # default value

# go through all scopes and check if the symbol
# is registered as a local. If so, set scope to
# local.
for @?BLOCK {
if $_.symbol($name) {
$scope := 'lexical';
}
}

make PAST::Var.new( :name($name),
:scope($scope),
:viviself('Undef'),
:node($/) );
}
Viviself
You might have noticed the viviself attribute before. This attribute will result in extra instructions that will initialize the variable if it doesn't exist. As you know, global variables spring into life automatically when they're used. Earlier we mentioned that uninitialized variables have a default value of "Undef": the viviself attribute does this.
For local variables, we use this mechanism to set the (optional) initialization value. When the identifier is a parameter, the parameter will be initialized automatically if it doesn't receive a value when the subroutine it belongs to is invoked. Effectively this means that all parameters in Squaak are optional!

Subroutines
We already mentioned subroutines before, and introduced the PAST::Block node type. We also briefly mentioned the blocktype attribute that can be set on a PAST::Block node, which indicates whether the block is to be executed immediately (for instance, a do-block or if statement) or it represents a declaration (for instance, subroutines). Let us now look at the grammar rule for subroutine definitions:
rule sub_definition {
'sub' <identifier> <parameters>
<statement>*
'end'
{*}
}

rule parameters {
'(' [<identifier> [',' <identifier>]* ]? ')'
{*}
}
This is rather straightforward, and the action methods for these rules are quite simple, as you will see. First, however, let's have a look at the rule for sub definitions. Why is the sub body defined as <statement>* and not as a <block>? Surely, a subroutine defines a new scope, which was already covered by <block> Well, you're right in that. However, as we will see, by the time that a new PAST::Block node would be created, we are too late! The parameters would already have been parsed, and not entered into the block's symbol table. That's a problem, because parameters are most likely to be used in the subroutine's body, and as they are not registered as local variables (which they are), any usage of parameters would not be compiled down to the right instructions to fetch any parameters.

So, how do we solve this in an efficient way?

The solution is simple. The only place where parameters live, is in the subroutine's body, represented by a PAST::Block node. Why don't we create the PAST::Block node in the action method for the parameters rule. By doing so, the block is already in place and the parameters are registered as local symbols right in time. Let's look at the action methods.
method parameters($/) {
our $?BLOCK;
our @?BLOCK;

my $past := PAST::Block.new( :blocktype('declaration'),
:node($/) );

# now add all parameters to this block
for $<identifier> {
my $param := $( $_ );
$param.scope('parameter');
$past.push($param);

# register the parameter as a local symbol
$past.symbol($param.name(), :scope('lexical'));
}

# now put the block into place on the scope stack
$?BLOCK := $past;
@?BLOCK.unshift($past);

make $past;
}

method sub_definition($/) {
our $?BLOCK;
our @?BLOCK;

my $past := $( $<parameters> );
my $name := $( $<identifier> );

# set the sub's name
$past.name( $name.name() );

# add all statements to the sub's body
for $<statement> {
$past.push( $( $_ ) );
}

# and remove the block from the scope
# stack and restore the current block
@?BLOCK.shift();
$?BLOCK := @?BLOCK[0];

make $past;
}
First, let's check out the parse action for parameters. First, a new PAST::Block node is created. Then, we iterate over the list of identifiers (which may be empty), each representing a parameter. After retrieving the result object for a parameter (which is just an identifier), we set its scope to "parameter", and we add it to the block object. After that, we register the parameter as a symbol in the block object, specifying the scope as "lexical". Parameters are just a special kind of local variables, and there's no difference in a parameter and a declared local variable in a subroutine, except that a parameter will usually be initialized with a value that is passed when the subroutine is invoked.
After handling the parameters, we set the current block (referred to by our package variable $?BLOCK) to PAST::Block node we just created, and push it on the scope stack (referred to by our package variable @?BLOCK).

After the whole subroutine definition is parsed, the action method sub_definition is invoked. This will retrieve the result object for parameters, which is the PAST::Block node that will represent the sub. After retrieving the result object for the sub's name, we set the name on the block node, and add all statements to the block. After this, we pop off this block node of the scope stack (@?BLOCK), and restore the current block ($?BLOCK).

Pretty easy, huh?

Subroutine invocation
Once you defined a subroutine, you'll want to invoke it. In the exercises of Episode 5, we already gave some tips on how to create the PAST nodes for a subroutine invocation. In this section, we'll give a complete description. First we'll introduce the grammar rules.
rule sub_call {
<primary> <arguments>
{*}
}
Not only allows this to invoke subroutines by their name, you can also store the subroutines in an array or hash field, and invoke them from there. Let's take a look at the action method, which is really quite straightforward.
method sub_call($/) {
my $invocant := $( $<primary> );
my $past := $( $<arguments> );
$past.unshift($invocant);
make $past;
}

method arguments($/) {
my $past := PAST::Op.new( :pasttype('call'), :node($/) );
for $<expression> {
$past.push( $( $_ ) );
}
make $past;
}
The result object of the sub_call method should be a PAST::Op node (of type 'call'), which contains a number of child nodes: the first one is the invocant object, and all remaining children are the arguments to that sub call.
In order to "move" the result objects of the arguments to the sub_call method, we create the PAST::Op node in the method arguments, which is then retrieved by sub_call. In sub_call, the invocant object is set as the first child (using unshift). This is all too easy, isn't it? :-)

What's Next?
In this episode we finished the implementation of scope in Squaak, and implemented subroutines. Our language is coming along nicely! In the next episode, we'll explore how to implement operators and an operator precedence table for efficient expression parsing.

In the mean time, should you have any problems or questions, don't hesitate to leave a comment!

Exercises
  1. By now you should have a good idea on the implementation of scope in Squaak. We haven't implemented the for-statement yet, as it needs proper scope handling to implement. Implement this. Check out episode 3 for the BNF rules that define the syntax of the for-statement. When implementing it, you will run into the same issue as we did when implementing subroutines and parameters. Use the same trick for the implementation of the for-statement.
License
The source code in this tutorial has been released by the author into the public domain. Where this is not possible by law, the author grants license to use this file for any reason without any rights reserved, and with no warranty express or implied or fitness for a particular purpose.

3 comments:

perlybird said...

For the sub-definition rule, you are using in the grammar a stat-or-def rule.

To achive the implementation of the sub_definition, I simply set the sub_definition as a statement and at a first glance it seems to work.

I suppose that it is needed to be sure that all subroutine definitions are beeing parsed first.

Anonymous said...

If you add sub_definition as a statement, then you can define a subroutine within a sub body. In other words, you'd have nested subroutines. This introduces additional complexity (closures), which I thought would make the tutorial language too difficult as a first start. That's why I added a top-level rule stat-or-def.

Anonymous said...
This comment has been removed by the author.