Wednesday, August 20, 2008

Using 'register' scope in Parrot Abstract Syntax Trees

Introduction
The Parrot Compiler Toolkit (PCT) is evolving, slowly becoming mature. More and more features and improvements are added over time. In this short article, we'll be looking at "register" scope in the Parrot PAST::Var node, which was added recently. For this article, I assume you're familiar with the PCT, or at least heard of it, otherwise I'd suggest reading up a bit, for instance by reading the PCT Tutorial. In order to make things as easy as possible, I'll start out with a short introduction on the Parrot Abstract Syntax Tree data structure.

PAST Variables and Scope

PCT-based compilers generate a data structure called a Parrot Abstract Syntax Tree (PAST). The PAST itself consists of a number of different PAST nodes. For instance, to represent a subroutine (or function, or procedure), you'd use a PAST::Block node, while you'd use PAST::Val nodes to represent literal constants (such as 42, "hello world", etc.).
For representing variables, you'd use a PAST::Var node. All variables have a name and a scope, implemented as attributes of the PAST::Var class. Available values for the scope attribute are, among others: "package" and "lexical", representing "global" and "lexical" (or "local", except that "lexical" implies availability in nested subs (or PAST::Block nodes) as well) scope.

Introducing Register Scope
Lexically scoped variables are stored in such a way that nested subroutines can access them too. In languages such as Perl 6 and Lua, this makes sense. However, in a language such as C (yes, there is a C implementation for Parrot as well, albeit an incomplete one) this is not needed, as C does not have nested functions. The overhead of lexicals would be a waste of your CPU cycles.
The new register scope allows for such "light-weight" local variables. Furthermore, although not implemented at the moment of writing this article, these register variables allow for reusable results. Consider the following pseudo-code example of a "with" statement. A with statement takes some expression, and the operations in the block are all executed on that expression. (Note that this is pseudo code).
with (foo) {
.bar();
.baz();
.foobar();
}
This is equivalent to:
foo.bar();
foo.baz();
foo.foobar();
except that in the case of the with-statement, the expression is evaluated only once. I'm sure there are better examples to be thought of for a with statement, but this is the best I can come up with for now. Using the PCT, you could implement this as follows:
 rule with_statement {
'with' '(' <expression> ')'
'{' <statement>* '}'
{*}
}

The corresponding action method would be:
method with_statement($/) {
my $past := PAST::Stmts.new( :node($/) );
my $expr := $( $<expression> );
for $<statement> {
my $operation := $( $_ );
$operation.unshift($expr);
$past.push( $operation );
}
make $past;
}

Basically, the PAST node representing the expression is unshifted onto each operation. However, this would result in PIR code that would evaluate the expression for each statement in the with-block. This, of course, reduce the efficiency of the whole with-statement, whose semantics define that the expression is evaluated only once. In order to solve this, you can generate PIR code that evaluates the expression once, stores the result in a register variable, and use that register variable in each operation.

Why "Register" instead of "Local"?
You might wonder why this scope is called "register". The scope name "register" makes perfect sense, as this is exactly how it is implemented. A variable with "register" scope is implemented as a PIR variable declared with the .local directive. Such a variable is just a symbolic register, which makes writing PIR code by hand much easier. The PIR compiler will map these .locals to Parrot registers (of type PMC, Parrot does not support lexicals of other types). The name "register", therefore, makes perfect sense.

Using Register-scoped Variables in Your Compiler
There is not much to it to use register-scoped variables. You use them the same way as you would use global or lexical variables. The name attribute gives the register a name, and when setting the isdecl flag on it, the PAST node will generate a PIR .local declaration. In comparison, when this flag is set on a PAST::Var node with "lexical" scope, a .lex directive is generated.

Conclusion
This article discussed a recently added feature to the PCT, albeit a small one. The addition of a new scope type to PAST::Var nodes. The new "register" scope allows for light-weight local variables, and in the not too distant future will allow for reusable (intermediate) results, preventing re-evaluation of expressions in your PAST.
As you can see, the PCT is evolving over time, which allows users to create compilers for all sorts of languages targeting the Parrot virtual machine.

No comments: