Friday, October 28, 2011
Programming theory: Static vs Dynamic compilation
There was a bit of debate in my structured programming class about whether it is possible to declare an array whose number of elements depends on a variable. I argued that it is, because there is nothing in the C language standard that forbids it. The assistant professor, however, argued that it is not, because of static compilation. Seeing how it turned into a rather interesting debate, I thought I'd post about it, and about how both of us came to our conclusions. To do that, however, we will first need some
Compiler theory: the absolute basics
First of all, let's settle on compilers instead of interpreters. Of course, this debate could be extended to interpreters, but a) they ultimately hand the code to some form of compiler, and b) they always allocate memory dynamically. So for interpreters it would be a moot debate in the first place.
Now that we have settled that difference, let us examine the bare-bones basics of compilers. A compiler is a software package that takes instructions written in a language as input and translates them into machine-compliant instructions as output. That dictates that a compiler consists of three parts: a front end (lexical analyzer and parser) that scans the entire code and understands it, an optimizer that speeds up our code before it is actually translated, and a translator that performs the final stage of compilation (reserving memory included).
Now that we have that bit of information, let us focus on the second and third parts of the compiler, as the parser has no actual effect on memory allocation. Assume the following visualization of the Java interpreting system:
As you can see, there are two branches here, leading to different code speeds. This is because of two very different approaches to compilation: static compilation and dynamic compilation.
A static compiler does everything in a minimal number of passes. It takes the code, calculates line by line how much memory you need, and asks the operating system to allocate it. This does mean that memory gets a bit inflexible, as the compiler needs to know how much memory you need beforehand: you cannot allocate it at runtime. Not only does this mean you cannot declare an array whose size is given by a variable, it also means you cannot use dynamic structures, such as dynamic arrays. It likewise rules out some of the more interesting forms of linked lists, which expand on demand and use pointers to link their nodes instead of adjacent memory addresses, as arrays do. Most traditional compilers, like gcc (one of the very, very best), work on this principle, and it's all good. Until you need something more, that is.
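For reference, the construct we were debating looks like this in C. A minimal sketch; whether the second declaration is accepted depends entirely on which standard your compiler enforces (C89 rejects it, C99 added variable-length arrays, and C11 made them an optional feature):

#include <stdio.h>

int main(void)
{
    int fixed[10];   /* size known at compile time: fine under any standard */

    int n = 10;
    int vla[n];      /* size taken from a variable: a C99 VLA.         */
                     /* gcc -std=c89 -pedantic complains about this,   */
                     /* gcc -std=c99 compiles it without protest.      */

    printf("%zu %zu\n", sizeof fixed, sizeof vla);  /* sizeof vla is computed at runtime */
    return 0;
}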
Now let us look at the second branch of the diagram. Instead of going directly to the interpreter, the code first passes through the Java Accelerator. The Accelerator is a dynamic-allocation-based semi-compiler, also known as a JIT or "lazy" compiler. The lazy approach is a bit different: instead of parsing everything in a single pass, or close to it, it goes on and on, simplifying statements until it reaches their simplest form. For a practical demonstration, assume the following algorithm sample:
exec start:
var x = 3, y = 4, sum = 0
const z = 12
sum = x*y - z
exec stop
A fairly simple code snippet, as anyone would agree. It takes two variables (3 and 4), multiplies them, and assigns the difference between that product and a constant (12) to sum.
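For concreteness, here is the same snippet as a direct C transliteration (and, assuming 4-byte ints, the source of the 4x4 = 16 bytes figure below):

#include <stdio.h>

/* the pseudocode above, transliterated to C */
int main(void)
{
    int x = 3, y = 4, sum = 0;
    const int z = 12;
    sum = x * y - z;    /* 3*4 - 12 */
    printf("%d\n", sum);
    return 0;
}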
Let us see how a traditional compiler would parse this:
-Notice 4 variables, all of integer type.
-Assign memory for the 4 variables (in C, this would be 4x4 = 16 bytes minimum).
-Assign values to x, y, z, sum.
-Calculate x*y.
-Calculate (x*y)-z.
-Assign ((x*y)-z) to sum.
Around a dozen instructions, perhaps more depending on optimization. Rather slow at the hardware level.
Now let us look at the JIT approach:
-Notice 4 variables, all of integer type.
-Assign memory for the variables.
-Assign values.
-Calculate x*y.
-Calculate (x*y)-z.
And there isn't an assignment statement after this. Wait, what? No assignment to sum?
Nope. sum already contains that value, so why would you reassign it? See, a lazy (JIT) compiler doesn't do anything until it absolutely needs to, and that includes assignment and memory allocation. Not only do you lose the unnecessary assignment, but the memory assignment is delayed as well. This means that memory is assigned at runtime, and that is what makes the variable-based array declaration perfectly legal.
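To make "memory assigned at runtime" concrete, here is a minimal C sketch of the same idea. In C the runtime-sized request goes through the heap allocator rather than a plain declaration, but the principle is identical: the size is simply not known until the program runs:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)   /* size is unknown until runtime */
        return 1;

    int *arr = malloc(n * sizeof *arr);   /* memory is requested at runtime */
    if (arr == NULL)
        return 1;

    for (int i = 0; i < n; i++)
        arr[i] = i * i;
    printf("last element: %d\n", arr[n - 1]);

    free(arr);
    return 0;
}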
Now, some argue about the speed of JIT compilers, but most modern compilers are JIT. The CLI compiler (the .NET architecture) is JIT, for example. And JIT offers more than just optimization: it would be impossible to write self-modifying code without JIT and dynamic compilation, to name one thing.
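As a taste of that last point, here is a minimal sketch of runtime code generation in C, the core trick behind every JIT. It assumes an x86-64 Linux machine with POSIX mmap available; the six bytes are the machine code for a function that just returns 42:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code: mov eax, 42; ret */
    unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    /* ask the OS for a page we are allowed to execute */
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;

    memcpy(mem, code, sizeof code);        /* "compile" at runtime */

    int (*fn)(void) = (int (*)(void))mem;  /* treat the bytes as a function */
    printf("%d\n", fn());                  /* prints 42 */

    munmap(mem, sizeof code);
    return 0;
}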