Skip to content

Follow-ups for register allocation #7

@jespiron

Description

@jespiron

https://www.cs.cmu.edu/~15411/lectures/02-regalloc.pdf

Optimizations

Two optimizations that have somewhat conflicting goals:

  • Register coalescing: works well in programs that require only few registers, becomes more difficult as the graph becomes denser
  • Splitting live ranges: allocating more temps -> use fewer registers

Not immediately obvious which optimization reduces the spills

Questions

Answers to questions at end of lecture notes:

1. Why does register allocation take such a long time? It is polynomial isn’t it?

Register allocation is not polynomial; it reduces to graph coloring, which is NP-Complete. However, there are approximations in polynomial time.

2. Given an optimal graph coloring, how would you construct an ordering such that greedy graph coloring would re-create that coloring?

Sort the vertices in ascending order by optimal graph coloring. Since the greedy graph coloring selects the lowest unused color, this will recreate the optimal coloring.

3. What is the minimum number of 3-address code instructions, ending with ret %eax, needed to make a non-chordal graph?

4. What is the minimum number of 2-address code instructions?

5. Does it make a difference where start the construction of a simplicial elimination order?

Yes

6. Is register allocation for programs with mixed data types more difficult than for programs with uniform types? Why or why not?

Yes

7. Why is chordality of a graph interesting for register allocation?

For chordal graphs, it is possible to efficiently find an optimal ordering, and the algorithms needed for coloring chordal graphs are easier to implement than non-chordal.

8. Why should one worry about allocating half registers of lower data width? Isn’t accessing words out of double words etc. inefficient? Is accessing bytes out of words inefficient?

I assume that by "allocating half registers," we're using a register to hold data that is smaller than the register's native width. For example, for a 64-bit register, we'd use part of the register to hold a 32-bit value or even smaller values like 16-bit or 8-bit values.

Yes and yes, accessing smaller units within larger data types can be inefficient, as some CPUs might need to perform masking or shifting operations to extract just the part we're interested in. This can add overhead compared to accessing a full word or double word directly.

An exception is data parallelism, where we load multiple small values into a large register and perform the same operation on all of them simultaneously. For example, with SIMD, we load the values into the registers R1 = {a1, ..., a8}, R2 = {b1, ..., b8}, and we add these registers to get {a1 + b1, ..., a8 + b8}. SIMD is not wasteful because we're using the full register. If we're just accessing a1 and throwing the rest of the register contents away, then it would be wasteful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions