Sunday, March 9, 2008

Episode 1: Introduction

Introduction
This is the first episode in a tutorial series on building a compiler with the Parrot Compiler Tools. If you're interested in virtual machines, you've probably heard of the Parrot virtual machine. Parrot is a generic virtual machine designed for dynamic languages. This is in contrast with the Java virtual machine (JVM) and Microsoft's Common Language Runtime (CLR), both of which were designed to run static languages. Both the JVM and Microsoft (through the Dynamic Language Runtime -- DLR) are adding support for dynamic languages, but their primary focus is still static languages.

High Level Languages
The main purpose of a virtual machine is to run programs. These programs are typically written in some High Level Language (HLL). Some well-known dynamic languages (sometimes referred to as scripting languages) are Lua, Perl, PHP, Python, Ruby, and Tcl. Parrot is designed to be able to run all these languages. Each language that Parrot hosts, needs a compiler to parse the syntax of the language and generate Parrot instructions.

If you've never implemented a programming language (and maybe even if you have implemented a language), you might consider writing a compiler a bit of a black art. I know I did when I became interested. And you know what, it is. Compilers are complex programs, and implementing a language can be very difficult.

The Facts: 1) Parrot is suitable for running virtually any dynamic language known, but before doing so, compilers must be written, and 2) writing compilers is rather difficult.

The Parrot Compiler Toolkit
Enter the Parrot Compiler Toolkit (PCT). In order to make Parrot an interesting target for language developers, the process of constructing a compiler should be supported by the right tools. Just as any construction task becomes much easier if you have the right tools (you wouldn't build a house using only your bare hands, would you?), the same is true for constructing a compiler. The PCT was designed to do just that: provide powerful tools to make writing a compiler for Parrot childishly easy.

This tutorial will introduce the PCT by demonstrating the ease with which a (simple) language can be implemented for Parrot. The case study language is not as complex as a real-world language, but this tutorial is written to whet your appetite and show the power of the PCT. This tutorial will also present some exercises which you can explore in order to learn more details of the PCT not covered in this tutorial.

Squaak: A Simple Language
The case study language, named Squaak, that we will be implementing on Parrot will be a full-fledged compiler that can compile a program from source into Parrot Intermediate Representation (PIR) (or run the PIR immediately). It can also be used as a command-line interpreter. Squaak demonstrates some common language constructs, but at the same time is lacking some other, seemingly simple features. For instance, our language will not have return, break or continue statements (or equivalents in your favorite syntax).

Squaak will have the following features:
  • global and local variables
  • basic types: integer, floating-point and strings
  • aggregate types: arrays and hash tables
  • operators: +, -, /, *, %, <, <=, >, >=, ==, !=, .., and, or, not
  • subroutines and parameters
  • assignments and various control statements, such as "if" and "while"
As you can see, a number of common (more advanced) features are missing.
Most notable are:
  • classes and objects
  • exceptional control statements such as break and return
  • advanced control statements such as switch
  • closures (nested subroutines and accessing local variables in an outer scope)

The Compiler Tools
The Parrot Compiler Tools we'll use to implement Squaak consist of the following parts:
  • Parrot Grammar Engine (PGE). The PGE is an advanced engine for regular expressions. Besides regexes as found in Perl 5, it can also be used to define language grammars, using Perl 6 syntax. (Check the references for the specification.)
  • Parrot Abstract Syntax Tree (PAST). The PAST nodes are a set of classes defining generic abstract syntax tree nodes that represent common language constructs.
  • HLLCompiler class. This class is the compiler driver for any PCT-based compiler.
  • Not Quite Perl (6) (NQP). NQP is a lightweight language inspired by Perl 6 and can be used to write the methods that must be executed during the parsing phase, just as you can write actions in a Yacc/Bison input file.

Getting Started
For this tutorial, it is assumed you have successfully compiled parrot (and maybe even run the test suite). If you browse through the languages directory in the Parrot source tree, you'll find a number of language implementations. Most of them are not complete yet; some are actively maintained actively and others aren't. If, after reading this tutorial, you feel like contributing to one of these languages, you can check out the mailing list or join IRC (see the references section for details).

The languages subdirectory is the right spot to put our language implementation. Parrot comes with a special shell script to generate the necessary files for a language implementation. In order to generate these files for our language, type (assuming you're in parrot's root directory):
$ perl tools/dev/mk_language_shell.pl Squaak languages/squaak
(Note: if you're on Windows, you should use backslashes.) This will generate the files in a directory languages/squaak, and use the name Squaak as the language's name. After this, go to this directory and type:
$ make test
This will compile the generated files and run the test suite. If you want more information on what files are being generated, please check out the references at the end of this episode.

Note that we didn't write a single line of code, and already we have the basic infrastructure in place to get us started. Of course, the generated compiler doesn't even look like the language we will be implementing, but that's ok for now. Later we'll adapt the grammar to accept our language.

Now you might want to actually run a simple script with this compiler. Launch your favorite editor, and put in this statement:
say "Squaak!";
Save the file (for instance as test.sq) and type:
$ ../../parrot squaak.pbc test.sq
This will run Parrot, specifying squaak.pbc as the file to be run by Parrot, which takes a single argument: the file test.sq. If all went well, you should see the following output:
$ ../../parrot squaak.pbc test.sq
Squaak!
Instead of running a script file, you can also run the Squaak compiler as an interactive interpreter. Run the Squaak compiler without specifying a script file, and type the same statement as you wrote in the file:
$ ../../parrot squaak.pbc
say "Squaak!";

which will print:
Squaak!
What's next?
This first episode of this tutorial is mainly an overview of what will be coming. Hopefully you now have a global idea of what the Parrot Compiler Tools are, and how they can be used to build a compiler targeting Parrot. If you want to check out some serious usage of the PCT, check out Rakudo (Perl 6 on Parrot) in languages/perl6 or Pynie (Python on Parrot) in languages/pynie.

The next episodes will focus on the step-by-step implementation of our language, including the following topics:
  • structure of PCT-based compilers
  • using PGE rules to define the language grammar
  • implementing operator precedence using an operator precedence table
  • using NQP to write embedded parse actions
  • implementing language library routines
In the mean time, experiment for yourself. You are welcome to join us on IRC (see the References section for details). Any feedback on this tutorial is appreciated.

Exercises
The exercises are provided at the end of each episode of this tutorial. In order to keep the length of this tutorial somewhat acceptable, not everything can be discussed in full detail. The answers and/or solutions to these exercises will be posted several days after the episode.

Advanced interactive mode.
Launch your favorite editor and look at the file squaak.pir in the directory languages/squaak. This file contains the main function (entry point) of the compiler. The class HLLCcompiler defines methods to set a command-line banner and prompt for your compiler when it is running in interactive mode. For instance, when you run Python in interactive mode, you'll see:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.
or something similar (depending on your Python installation and version). This text is called the command line banner. And while running in interactive mode, each line will start with:
>>>
which is called a prompt. For Squaak, we'd like to see the following when running in interactive mode (of course you can change this according to your personal taste):
$ ../../parrot squaak.pbc
Squaak for Parrot VM.

>
Add code to the file squaak.pir to achieve this.
Hint 1: look in the subroutine onload.
Hint 2: Note that only double-quoted strings in PIR can interpret escape-characters such as '\n'.

References
  • Parrot mailing list: parrot-porters@perl.org
  • IRC: join #parrot on irc.perl.org
  • Getting started with PCT: docs/pct/gettingstarted.pod
  • Parrot Abstract Syntax Tree (PAST): docs/pct/past_building_blocks.pod
  • Operator Precedence Parsing with PCT: docs/pct/pct_optable_guide.pod
  • Perl 6/PGE rules syntax: Synopsis 5
License
The source code in this tutorial has been released by the author into the public domain. Where this is not possible by law, the author grants license to use this file for any reason without any rights reserved, and with no warranty express or implied or fitness for a particular purpose.

8 comments:

sjansen said...

Wow, thanks! I look forward to more posts like this.

Any chance you could tweak your layout to improve printing? Unfortunately, your current layout makes most of the content unprintable.

kjs said...

I don't know how to tweak the layout, but I think the best thing to do is a copy/paste of the text into a word processor, and print it from there.

AkiRoss' said...

Is this tutorial going to be inserted in the parrot/docs/ ?

Anyway thanks, very interesting tutorial :)

Aran said...

on my computer there was no makefile in the squaak folder after calling perl tools/dev/mk_language_shell.pl Squaak
languages/squaak. the solution was to call perl Configure.pl again (from the parrot folder).

Taweth said...

It seems that recent changes have altered the behaviour of the mk_language_shell script. If anyone else finds themselves without a Makefile once they run this, the fix is to run the command without the path on the end (eg: "perl tools/dev/mk_language_shell.pl Squaak")

Chris said...

Thanks for the tutorial. I also didn't have a Makefile. Running Parrot's Configure.pl script worked but only after adding "monkey" (I am using the name instead of squaak) to a data structure in config/gen/languages.pm. Someone in #parrot also mentioned that for "full integration" (whatever that means) of a language edits need to be made to several files, "languages/LANGUAGES_STATUS.pod, config/gen/makefiles/languages.in, config/gen/languages.pm, out of my head".

Another thing is that it wasn't easy for me to find documentation on the HLLCompiler object. I finally found it (partially documented) in docs/pdds/draft/pdd29_compiler_tools.pod

PantherSE said...

I tried the episode 1 tutorial and in the "make test" I got the following output. I initially did perl tools/dev/mk_languages_shell.pl Aescla languages/aescla (I want my own directory). Then after I read the comments to get a Makefile, I followed aran's comment, which gave me the Makefile; but running make test get's the following error:

../../parrot /Users/mendozak/code_test/parrot_vm/parrot/runtime/parrot/library/PGE/Perl6Grammar.pbc \
--output=src/gen_grammar.pir \
src/parser/grammar.pg \
src/parser/grammar-oper.pg \

../../parrot /Users/mendozak/code_test/parrot_vm/parrot/compilers/nqp/nqp.pbc --output=src/gen_actions.pir \
--target=pir src/parser/actions.pm
perl -MExtUtils::Command -e cat src/builtins/say.pir >src/gen_builtins.pir
../../parrot -o aescla.pbc aescla.pir
perl t/harness
t/00-sanity....error:imcc:syntax error, unexpected PREG, expecting '(' ('$P12')
in file 'EVAL_1' line 11
error:imcc:syntax error, unexpected ')' (')')
in file 'EVAL_1' line 12
Could not find non-existent sub n_add
current instr.: '_block11' pc 45 (EVAL_1:11)
called from Sub 'parrot;PCT;HLLCompiler;eval' pc 864 (src/PCT/HLLCompiler.pir:498)
called from Sub 'parrot;PCT;HLLCompiler;evalfiles' pc 1138 (src/PCT/HLLCompiler.pir:627)
called from Sub 'parrot;PCT;HLLCompiler;command_line' pc 1317 (src/PCT/HLLCompiler.pir:716)
called from Sub 'parrot;Aescla;Compiler;main' pc 58 (aescla.pir:49)
t/00-sanity....dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 3
Failed 1/3 tests, 66.67% okay
Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------------
t/00-sanity.t 1 256 3 2 66.67% 3
Failed 1/1 test scripts, 0.00% okay. 1/3 subtests failed, 66.67% okay.
BEGIN failed--compilation aborted at t/harness line 12.
gmake: *** [test] Error 1

wayland said...

A user on #parrot had a problem that he fixed by changing the part of the parrot installation process where you run "make install" to "make install-dev".