Friday, October 3, 2008

Tracking down an IMCC bug

So I was adding support for handling next/redo/last exceptions to PCT's loops the other day, and I hit a weird bug. Here's the patch.

The problem is approximately this:

$I0 = defined, stuff_to_loop_over
unless $I0, for_loop_end
push_eh for_loop_next
...
for_loop_end:
pop_eh

I'm pushing the error handler after a conditional jump, but popping it off after the target of that jump, so when the branch is taken, pop_eh runs without a matching push_eh. I didn't notice this at the time. That's going to cause bugs at runtime at the very least, but it also caused the PIR compiler itself to hang. I beat my head against it for a while, posted a bug about it, and went to sleep.

The next day, I used valgrind's callgrind tool to find where it was spending its time. The output is:

--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
13,455,028,108 /home/sweeks/src/parrot/compilers/imcc/sets.c:set_add [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
11,209,537,812 /home/sweeks/src/parrot/compilers/imcc/cfg.c:compute_dominance_frontiers [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
135,234,114 /home/sweeks/src/parrot/compilers/imcc/imclexer.c:yylex [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
128,962,606 /home/sweeks/src/parrot/compilers/imcc/sets.c:set_contains [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
88,825,729 /home/sweeks/src/parrot/compilers/imcc/imcparser.c:yyparse [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
70,349,560 /home/sweeks/src/parrot/compilers/imcc/instructions.c:instruction_reads [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
65,814,257 /home/sweeks/src/parrot/compilers/imcc/cfg.c:compute_dominators [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
56,483,518 /home/sweeks/src/parrot/compilers/imcc/instructions.c:instruction_writes [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
48,013,979 ???:_int_malloc [/lib64/libc-2.8.so]
46,419,350 /home/sweeks/src/parrot/compilers/imcc/pbc.c:constant_folding [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
33,201,086 /home/sweeks/src/parrot/compilers/imcc/sets.c:set_intersec_inplace [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
29,624,696 /home/sweeks/src/parrot/compilers/imcc/symreg.c:hash_str [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
20,719,320 ???:calloc [/lib64/ld-2.8.so]
18,291,953 /home/sweeks/src/parrot/compilers/imcc/cfg.c:bb_check_set_addr [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
17,404,418 /home/sweeks/src/parrot/compilers/imcc/reg_alloc.c:compute_one_du_chain [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
13,809,278 /home/sweeks/src/parrot/compilers/imcc/imcc.l:yylex
12,337,246 /home/sweeks/src/parrot/src/ops/core_ops.c:hash_str [/home/sweeks/src/parrot/blib/lib/libparrot.so.0.7.1]
12,162,840 ???:memcpy [/lib64/ld-2.8.so]
11,450,969 ???:strcmp'2 [/lib64/ld-2.8.so]

So it's in IMCC, the PIR compiler, specifically in compute_dominance_frontiers... which does what? Let's find out.

The POD for that function says:

=item C<void compute_dominance_frontiers>

Algorithm to find dominance frontiers described in paper "A Simple, Fast
Dominance Algorithm", Cooper et al. (2001)


A quick check with Google turns up the PDF I linked.

Skimming over that and the surrounding code that calls compute_dominance_frontiers reveals that it's all in service of register allocation. I read it another couple of times, think I understand the general ideas, and go to sleep.

The next day, I chat with pmichaud about it; he manages to trim it down to a simplified test case, and also notices the actual bug in my patch, which lets me commit that and pass a half-dozen more tests. He passes the minimal test case off to chromatic, who debugs it for a bit and finds that a variable in an inner loop of compute_dominance_frontiers is alternating between 8 and 9.

chromatic suggests that I find a way to dump out the register and basic-block information, and look for the edge identification between blocks. At this point I only have a vague idea of what that means, so I ask for more information. Here's chromatic's explanation:

<@chromatic> I can give you a quick overview.
<@chromatic> The alligator divides every compilation unit into basic blocks.
<@chromatic> A block is a single block of code between branch points.
<@chromatic> Every entrance and exit point demarcates a block.
<@chromatic> If you arrange the blocks in terms of a control flow graph, you can evaluate all of the possible ways you can reach a point within the block.
<@chromatic> You also keep track of which registers you use within a block.
<@chromatic> If all of the paths to a block go through another block, the latter dominates the former.
<@chromatic> All of this is to say, if you have a named register used in the first block of a unit and the final block of the unit can branch back to the first unit, you have to keep the physical register untouched.
<@chromatic> If there's a block beyond which you can never branch back, you can reuse physical registers unique to that branch and never used later.
<@chromatic> I've been unclear about name/physical register mapping, but I think you get the picture.
<@chromatic> It's the mapping of name to register that really matters.

So I start browsing that code again, but fall asleep before doing anything. Do you see the pattern here yet? ;)

The next day, I go look around in there and I find a couple of debug functions I can use to dump relevant information. Here's what I find:

Dumping the CFG:
-------------------------------
0 (3) -> 8 1 2 <-
1 (3) -> 2 <- 0
2 (3) -> 3 9 <- 1 0
3 (2) -> 4 <- 2
4 (2) -> 5 <- 3
5 (3) -> 6 9 <- 8 7 4
6 (3) -> 7 <- 5
7 (3) -> 5 <- 6
8 (2) -> 5 <- 9 0
9 (3) -> 8 <- 5 2

This is:

block number (loop depth) -> blocks that this block can branch to <- blocks that can branch to this block

We also have:

Dumping the Dominators Tree:
-------------------------------
0 <- ( 0) 0
1 <- ( 0) 0 1 8 9
2 <- ( 0) 0 2 8 9
3 <- ( 2) 0 2 3 8 9
4 <- ( 3) 0 2 3 4 8 9
5 <- ( 0) 0 5 8 9
6 <- ( 5) 0 5 6 8 9
7 <- ( 6) 0 5 6 7 8 9
8 <- ( 9) 0 8 9
9 <- ( 8) 0 8 9

This is:

block number <- (immediate dominator) full list of dominators

The loop that was spinning was:

while (runner >= 0 && runner != unit->idoms[b]) {
    /* add b to runner's dominance frontier set */
    set_add(unit->dominance_frontiers[runner], b);

    /* runner = idoms[runner] */
    if (runner == 0)
        runner = -1;
    else
        runner = unit->idoms[runner];
}

Here 'runner' is the node we're currently evaluating, 'unit' is the overall compilation unit, and idoms[] is the array of immediate dominators.

Adding some debugging printfs, I discover that the problem shows up when the algorithm processes block 5: it spins with the runner alternating between 8 and 9. If you look back up at the dominators tree, you'll see that the immediate dominator of block 8 is block 9, and the immediate dominator of block 9 is block 8. Infinite loop.

After re-reading the algorithm and the paper a few times to verify that I understand what's going on, I add a conditional to break out of the loop if the current block is already in the runner's dominance frontier set, since that means we've walked this chain before.

while (runner >= 0 && runner != unit->idoms[b]) {
    if (set_contains(unit->dominance_frontiers[runner], b))
        /* we've already gone down this path once before */
        runner = 0;
    else
        /* add b to runner's dominance frontier set */
        set_add(unit->dominance_frontiers[runner], b);

    /* runner = idoms[runner] */
    if (runner == 0)
        runner = -1;
    else
        runner = unit->idoms[runner];
}

Everything seemed to work fine, so after getting a review from particle, I committed.
