CLASS 5

Assembly Language.

Assembly language is the human-readable form of a machine's own instructions, and it is the one language into which all other languages are ultimately translated or interpreted. Compilers translate high-level languages into assembly language. We'll discuss compilers a little more later in the course, but they are a complex topic in their own right.

Because assembly language must be a good target for compiling or interpreting many different languages, it is intentionally very simple. It provides all of the capabilities required by high-level languages, but does so using simple instructions that can be combined in many different ways to support different languages. This is called FLEXIBILITY.


Example of a specific Assembly Language: SPARC Assembly

SPARC Architecture: A Brief Overview

Registers

The SPARC architecture specifies that a SPARC CPU presents 32 registers to a program, in four sets of eight: global (%g0-%g7), out (%o0-%o7), local (%l0-%l7), and in (%i0-%i7). Registers are denoted by a percent sign at the start of their name, e.g. %l0. When you put data in a register, it overwrites whatever was there (register writes are destructive; the old data is destroyed). Once you put data into a register, it stays there until you put something else there instead.

See table on page 44 of textbook for details.
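
The register rules above can be modelled in a few lines of C. This is only a simplified sketch, not real SPARC code: the reg_file type and the reg_number, reg_write, and reg_read helpers are made-up names for illustration, and the numbering assumes the usual SPARC ordering of the four sets (globals 0-7, outs 8-15, locals 16-23, ins 24-31).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32 registers in four sets of eight. */
enum { NUM_REGS = 32 };
enum reg_set { SET_GLOBAL = 0, SET_OUT = 1, SET_LOCAL = 2, SET_IN = 3 };

typedef struct {
    int32_t r[NUM_REGS];
} reg_file;

/* Map a set and an index 0..7 to a register number 0..31,
   e.g. (SET_LOCAL, 0) is %l0, which is register 16. */
static int reg_number(enum reg_set set, int index)
{
    assert(index >= 0 && index < 8);
    return (int)set * 8 + index;
}

/* Writing a register is destructive: the old value is gone afterwards. */
static void reg_write(reg_file *rf, int reg, int32_t value)
{
    assert(reg >= 0 && reg < NUM_REGS);
    rf->r[reg] = value;
}

/* Reading does not change the register; the value stays until overwritten. */
static int32_t reg_read(const reg_file *rf, int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    return rf->r[reg];
}
```

Writing 9 and then 42 into the same register leaves only 42 behind, which is what "destructive" means here.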


The SPARC Assembler (as)

It is a two-pass assembler and is line based (like most assemblers).
Block comments are written as in C (/* and */), not as in C++ (//).
Line comments start with ! and run to the end of the line.
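
Why two passes? A label may be used before the line that defines it, so the assembler first needs to learn where every label is. The C sketch below illustrates the idea only; it is not how as is actually implemented. The src_line and symtab types and the pass1/pass2 functions are hypothetical, and it assumes every line is one 4-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_LABELS = 16, INSN_BYTES = 4 };

/* Hypothetical, simplified source line: the label defined on this line
   (or NULL) and the label its instruction refers to (or NULL). */
typedef struct {
    const char *label;
    const char *target;
} src_line;

typedef struct {
    const char *name[MAX_LABELS];
    long addr[MAX_LABELS];
    size_t count;
} symtab;

/* Pass 1: record the address of every label. */
static void pass1(const src_line *prog, size_t n, symtab *st)
{
    st->count = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            assert(st->count < MAX_LABELS);
            st->name[st->count] = prog[i].label;
            st->addr[st->count] = (long)(i * INSN_BYTES);
            st->count++;
        }
    }
}

/* Pass 2: resolve each reference using the table built in pass 1.
   An entry of -1 means "no reference" or "label not defined". */
static void pass2(const src_line *prog, size_t n, const symtab *st,
                  long *resolved)
{
    for (size_t i = 0; i < n; i++) {
        resolved[i] = -1;
        if (prog[i].target == NULL)
            continue;
        for (size_t j = 0; j < st->count; j++) {
            if (strcmp(st->name[j], prog[i].target) == 0)
                resolved[i] = st->addr[j];
        }
    }
}
```

A single pass would fail on the forward reference: when line 0 mentions a label defined on line 2, the address of that label is not yet known.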

Pseudo-operations (pseudo-ops) - These do not generate any machine instructions; they only provide information to the assembler. They usually start with a period (.), e.g.
.global (makes a label accessible outside the current file),
.word (initializes a memory location), etc.

How does assembly work??
All "C" programs have a .c extension (source code)
The compilation results in the .o file (object file)
The .o files are linked and we get the executable file, a.out

Compilation is a two-step process:

Step1:   C program converted into .s file (assembly language)
Step2:   .s file converted to .o file (using the assembler)

To convert a C program into the corresponding assembly language, we use the command
gcc -S program.c

To assemble and link the .s file into an executable,
gcc -g file.s -o <filename>

So, the entire process of creating an executable can be summed up as :

Source Code (filename.c)    
    ||
    ||  (compiler, such as gcc -S filename.c)
    \/
Assembly Language  (filename.s)
    ||
    ||  (assembler, such as gcc -c filename.s)
    \/
Object Code (filename.o)
    ||
    ||  (linker/loader, such as ld filename.o)
    \/
Binary file (executable, such as a.out)

Format of Instructions

Any instruction is made up of two parts:

Opcode (the name of the instruction)
Operands (the values or data manipulated by the instruction)

The opcode is required in every instruction, but operands may or may not be present.

Most of the instructions in SPARC have three operands: three registers, or two registers and a literal constant.

opcode    operand 1 (optional),  operand 2 (optional),  operand 3 (optional)
 add       %r1,                   %r2,                   %r3
 sub       %r1,                   %r2,                   %r3
 mov       %r1,                   %r2

The contents of the first register are combined with those of the second register or literal constant, and the result is stored in the third register.

Some instructions are:
clr r1 (clear register r1, r1 has to be a valid register name)

mov r1(or c), r2 (move the contents of r1 or the literal constant c to r2)

add r1, r2(or c), r3 (add r1 and r2(or c), store result in r3)

sub r1, r2(or c), r3 (subtract r2(or c) from r1, store result in r3)

The format of an instruction is:

        op      reg, reg_or_imm, reg
        clr     reg                         !   sets a register to all zeros
        mov     reg_or_imm, reg             !   copies a register or immediate into reg
        add     reg1, reg2_or_imm, reg3     !   reg3 = reg1 + reg2_or_imm
        sub     reg1, reg2_or_imm, reg3     !   reg3 = reg1 - reg2_or_imm
Note: Anything after the "!" on a line is ignored by the assembler, and is treated as a comment.
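
To make the operand order concrete, here is a small C model of these four instructions. It is a sketch under assumptions, not an emulator: the operand type and the op_clr, op_mov, op_add, and op_sub functions are hypothetical names, registers are just slots in an array, and overflow is not handled.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 32 };

/* Hypothetical operand: either a register number or an immediate value. */
typedef struct {
    int is_imm;      /* nonzero: use imm; zero: use register number reg */
    int reg;
    int32_t imm;
} operand;

static operand from_reg(int reg)     { operand o = {0, reg, 0}; return o; }
static operand from_imm(int32_t imm) { operand o = {1, 0, imm}; return o; }

static int32_t value_of(const int32_t regs[NUM_REGS], operand o)
{
    if (o.is_imm)
        return o.imm;
    assert(o.reg >= 0 && o.reg < NUM_REGS);
    return regs[o.reg];
}

/* clr rd: rd = 0 */
static void op_clr(int32_t regs[NUM_REGS], int rd)
{
    regs[rd] = 0;
}

/* mov src, rd: rd = src */
static void op_mov(int32_t regs[NUM_REGS], operand src, int rd)
{
    regs[rd] = value_of(regs, src);
}

/* add rs1, src2, rd: rd = rs1 + src2 */
static void op_add(int32_t regs[NUM_REGS], int rs1, operand src2, int rd)
{
    regs[rd] = regs[rs1] + value_of(regs, src2);
}

/* sub rs1, src2, rd: rd = rs1 - src2 */
static void op_sub(int32_t regs[NUM_REGS], int rs1, operand src2, int rd)
{
    regs[rd] = regs[rs1] - value_of(regs, src2);
}
```

Note that in every case the destination is the last operand, and only the second source may be an immediate.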

An immediate value is data encoded directly in the instruction, like the 13 here:

     add %o1, 13, %o2                
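
Because the immediate lives inside the instruction word, it cannot be arbitrarily large. The sketch below assumes the 13-bit signed immediate field used by SPARC arithmetic instructions (values from -4096 to 4095); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the immediate field is 13 bits wide and signed. */
enum { SIMM13_MIN = -4096, SIMM13_MAX = 4095 };

/* Does the constant fit in the immediate field? */
static int fits_simm13(int32_t value)
{
    return value >= SIMM13_MIN && value <= SIMM13_MAX;
}

/* Pack a constant into the low 13 bits of an instruction word. */
static uint32_t encode_simm13(int32_t value)
{
    assert(fits_simm13(value));
    return (uint32_t)value & 0x1FFFu;
}

/* Recover the signed constant from a 13-bit field (sign extension). */
static int32_t decode_simm13(uint32_t field)
{
    field &= 0x1FFFu;
    if (field & 0x1000u)                 /* top bit of the field set: negative */
        return (int32_t)field - 0x2000;
    return (int32_t)field;
}
```

Both 13 and -64 fit comfortably; a constant such as 5000 does not, and would have to be built in a register first.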

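When a function starts running, some stack memory has to be allocated for it. This is done using the instruction
save %sp, -64, %sp
which adds -64 to the stack pointer %sp, reserving 64 bytes on the stack (the stack grows toward lower addresses).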

MACROS IN ASSEMBLY LANGUAGE

Macros are used to give symbolic names to numeric constants and registers. Symbols are defined using the `define' macro, e.g.
define(x, l0)
define(y, l1)

Labels

A label names a location in the program so that other code can refer to it. It is defined by writing the name followed by a colon.

e.g.   _main:     save %sp, -64, %sp
Here, _main is the label for the main function. All programs need to have the main function, as control starts from this function.

Our First Program

Take a look at the following piece of code. All this code does is to calculate the value of a polynomial. The code is in C (though it is the same for C++ in this case.) Let us suppose that the code is stored in a file first.c. The code is given below:

/* first.c */
#include <stdlib.h>

void main()
{
    int x, y;

    y = (x - 1) * (x - 7) / (x - 13);
    exit(0);
}

We can compile this code with gcc (the GNU C compiler). To get the assembly code corresponding to this code, we use the -S option, i.e.

gcc -S first.c

We now get a file called first.s in the current directory, which is the assembly file. This is what it looks like:


/* first.s */
gcc2_compiled.:
___gnu_compiled_c:
.text
        .align 4
        .global _main
        .proc 020
_main:
        !#PROLOGUE# 0
        save %sp, -112, %sp
        !#PROLOGUE# 1
        call ___main, 0
        nop
        ld [%fp-12], %o1
        add %o1, -1, %o0
        ld [%fp-12], %o2
        add %o2, -7, %o1
        call .umul, 0
        nop
        ld [%fp-12], %o2
        add %o2, -13, %o1
        call .div, 0
        nop
        st %o0, [%fp-16]
        mov 0, %o0
        call _exit, 0
        nop
L1:
        ret
        restore

This code looks rather complicated, so we will take up a simplified version of it and go through it step by step. The simplified version does the same thing, i.e. it calculates the value of the expression, but it is well commented and easier to understand.

For convenience, we first create a file called first.m, which is not assembly code but is very close to it. It is written using some macros, and needs to be processed by a macro processor before it becomes assembly language code.

P.S. If you are wondering what macros are, remember #define in C/C++?

Look at the following file:


/* first.m */
/* This program computes the expression:
        y = (x - 1) * (x - 7) / (x - 13)    for x = 9
   The polynomial coefficients are: */
define(a2, 1)
define(a1, 7)
define(a0, 11)
/* Variables x and y are stored in %l0 and %l1 */
define(x_r, l0)
define(y_r, l1)

        .global _main
_main:  save %sp, -64, %sp
        mov 9, %x_r             !initialize x
        sub %x_r, a2, %o0       !(x - a2) into %o0
        sub %x_r, a1, %o1       !(x - a1) into %o1
        call .mul
        nop                     !result in %o0
        sub %x_r, a0, %o1       !(x - a0) into %o1, the divisor
        call .div
        nop                     !result in %o0
        mov %o0, %y_r           !store it in y
        mov 1, %g1              !trap dispatch
        ta 0                    !trap to system

This is not actually assembly language, as there is no instruction called define in assembly language. The define statements are used by the macro processor just to give symbolic names to the constants, i.e. a2 stands for 1, a1 stands for 7, etc. in the code given.
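
In C terms, the define lines play the role that #define plays. The sketch below mirrors the constants exactly as they are defined in first.m (a2 = 1, a1 = 7, a0 = 11); eval_expr is a hypothetical helper name, and the code is only an analogy, not something the assembler or m4 produces.

```c
#include <assert.h>

/* Counterparts of define(a2, 1), define(a1, 7), define(a0, 11) in first.m. */
#define A2 1
#define A1 7
#define A0 11

/* Hypothetical helper: evaluates (x - A2) * (x - A1) / (x - A0). */
static int eval_expr(int x)
{
    assert(x != A0);   /* the divisor (x - A0) must not be zero */
    return (x - A2) * (x - A1) / (x - A0);
}
```

With x = 9, as in first.m, eval_expr(9) works out to 8 * 2 / -2 = -8. Just as with m4, the names A2, A1, and A0 are replaced by their values before the compiler proper ever sees them.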

To convert this code into assembly language, we need to pass it through the m4 macro processor. To do that, we use the following command:

m4 < first.m > first.s

This creates the new assembly language file from the .m file. The assembly language file is given below:


/* first.s */
/* This program computes the expression:
        y = (x - 1) * (x - 7) / (x - 13)    for x = 9
   The polynomial coefficients are: */
/* Variables x and y are stored in %l0 and %l1 */

        .global _main
_main:  save %sp, -64, %sp
        mov 9, %l0              !initialize x
        sub %l0, 1, %o0         !(x - 1) into %o0
        sub %l0, 7, %o1         !(x - 7) into %o1
        call .mul
        nop                     !result in %o0
        sub %l0, 11, %o1        !(x - 11) into %o1, the divisor
        call .div
        nop                     !result in %o0
        mov %o0, %l1            !store it in y
        mov 1, %g1              !trap dispatch
        ta 0                    !trap to system

Notice that all the define statements are gone, and the names a0, a1, a2 have all been replaced by their corresponding values. The only disadvantage is that this file is a lot more difficult to read and debug than the previous one, especially when several names have the same value: if a0, a1, and a2 all had the value 1, you would not know which one a given 1 refers to.
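
The substitution itself is purely textual. As a rough illustration (a toy, far simpler than the real m4), the C sketch below replaces whole words that match a macro name with the macro's value. The macro type and the expand_line function are hypothetical names, and a "word" is assumed to be a run of letters, digits, and underscores.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro table entry: a name and its replacement text. */
typedef struct {
    const char *name;
    const char *value;
} macro;

static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return the replacement for a word of the given length, or NULL. */
static const char *lookup(const macro *table, size_t n,
                          const char *word, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(table[i].name) == len &&
            strncmp(table[i].name, word, len) == 0)
            return table[i].value;
    }
    return NULL;
}

/* Copy line into out, replacing each whole word that names a macro. */
static void expand_line(const macro *table, size_t n, const char *line,
                        char *out, size_t out_size)
{
    size_t used = 0;

    assert(out_size > 0);
    while (*line != '\0') {
        const char *text = line;   /* text to copy for this token */
        size_t len = 1;            /* default: one non-word character */

        if (is_word_char(*line)) {
            const char *start = line;
            const char *value;

            while (is_word_char(*line))
                line++;
            len = (size_t)(line - start);
            value = lookup(table, n, start, len);
            if (value != NULL) {   /* macro name: copy its value instead */
                text = value;
                len = strlen(value);
            }
        } else {
            line++;
        }
        assert(used + len < out_size);
        memcpy(out + used, text, len);
        used += len;
    }
    out[used] = '\0';
}
```

Given a table with a2 mapped to 1 and x_r mapped to l0, the line "sub %x_r, a2, %o0" expands to "sub %l0, 1, %o0". The % sign is not part of a word, which is why %x_r can turn into %l0.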

Also, x_r and y_r are replaced by l0 and l1 throughout the file. We will look at the program in more detail in the next class. For now, let's just compile it to get the executable. To do that, type the following command:

gcc first.s -o first

This produces an executable called first. To run it, just type its name (first) at the command prompt.


colossus> first
colossus>

What happened? Well, your program ran successfully, but there was no output, because there are no input/output statements in your code. It just executed the code, calculated the value of y, and then ended. So how do we know whether we did the right thing? To find out, we use something called a debugger. We'll talk about it in more detail in one of the later classes. For now, this much is enough!! Bye :-)


For more information, contact me at tvohra@mtu.edu