To represent characters, we can use their octal values as specified in the ASCII table.
Any number preceded by 0 is in octal.
Any number preceded by 0x is in hexadecimal.
We can also use single or double quotes to represent characters, e.g.

    mov 0141, %r1    // all these instructions have the same effect
    mov `a', %r1     // not used frequently; used by the assembler
    mov "a", %r1     // frequently used
Many different character codes have been used over the years. People have had a hard time settling on how many bits to use and, given a set number of bits, which characters to encode and what the mapping should be from bit patterns to characters. Character codes based on 5 bits, 6 bits, 7 bits, and 8 bits have been used at various times. There are about 47 keys on a typical typewriter (which the computer keyboard is closely modelled after). Why are 5 bit (Baudot) and 6 bit (Honeywell) codes not very useful then? With shift, those 47 keys produce roughly 94 distinct characters, and 2^5 = 32 and 2^6 = 64 are both too few to cover them all.
So that leaves 7 bit and 8 bit codes. Until recently, both were still in use. One was used by IBM, and one was used by everybody else. The 8 bit code was used by IBM and was called EBCDIC (Extended Binary Coded Decimal Interchange Code). IBM gave it up in the workstation era. The one that everyone uses now is a 7 bit code called ASCII (American Standard Code for Information Interchange).
Note the simple relationship between uppercase and lowercase characters -- convert a lowercase letter to uppercase by subtracting 0x20.
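For instance, a minimal sketch in SPARC assembly (the register %l0 is an arbitrary choice, and we assume it already holds a lowercase letter):

    sub %l0, 0x20, %l0    // 'a' (0x61) - 0x20 = 'A' (0x41)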
Seven bits means 128 characters. We only need about 96 characters to encode everything on the keyboard. What to do with the other 32 codes? "Control" characters. These have been used for various formatting and in-band signalling functions in the past. Currently, about their only use is for formatting and data communications.
Control characters are those below 0x20. We often refer to them by the character obtained by adding 0x40 -- e.g., 0x03 is control-C (^C).

Important control characters:
    char  code  common use
    nul   0x00  terminates strings, especially in C
    etx   ^C    interrupt character on Unix
    eot   ^D    indicates end of file on Unix
    bel   ^G    rings a bell on many terminals
    bs    ^H    signals a backspace
    ht    ^I    signals a tab
    lf    ^J    signals a linefeed -- line separator in Unix
    ff    ^L    formfeed
    cr    ^M    carriage return
    dc1   ^Q    used for line control
    dc3   ^S    used for line control
    esc   ^[    escape -- used to introduce a different character mapping
    del         not a control character -- used to mean back up & erase

The numerical encoding of strings is mainly important for sorting operations. It's very convenient for our notion of lexicographic order to coincide with the numerical ordering of character codes. In addition, this explains why Big Endian memory organization is often favored: instead of comparing bytes (expensive) we can compare words (cheap) to determine the lexicographic order of string data.
ASCII meets this definition; since 'A' (0x41) is numerically less than 'a' (0x61), this explains (perhaps) why TA comes before Tanuj in the phone book.
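To see why word comparison works, a small sketch (the strings here are hypothetical, NUL-padded to word size): stored Big Endian, "TA" and "Tanu" become the words

    0x54410000    ("TA" plus two nul bytes)
    0x54616e75    ("Tanu")

and a single unsigned word comparison (0x5441... < 0x5461...) gives the same answer as comparing the bytes 'A' (0x41) and 'a' (0x61) one at a time.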
The basic logical instructions are:
    and reg, reg_or_imm, reg
    or  reg, reg_or_imm, reg
    xor reg, reg_or_imm, reg

Rather than having a simple "not" instruction, negation is combined with each of the above:
    andn reg, reg_or_imm, reg
    orn  reg, reg_or_imm, reg
    xnor reg, reg_or_imm, reg

Each of these first negates the second operand, then performs the operation.
There are also versions of these instructions that set the condition codes in addition to performing the desired operation. These instructions are:
    andcc  reg, reg_or_imm, reg
    orcc   reg, reg_or_imm, reg
    xorcc  reg, reg_or_imm, reg
    andncc reg, reg_or_imm, reg
    orncc  reg, reg_or_imm, reg
    xnorcc reg, reg_or_imm, reg
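For example (a sketch; the masks and the register %l0 are arbitrary choices), andn is handy for clearing bits and andcc for testing them:

    andn  %l0, 0xff, %l0     // clear the low-order byte: %l0 AND (NOT 0xff)
    andcc %l0, 0x100, %g0    // test bit 8, setting the condition codes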
Examples of Synthetic Instructions:
    mov %x_r, %y_r  ==>  or %g0, %x_r, %y_r
    not %x_r, %y_r  ==>  xnor %x_r, %g0, %y_r
    not %x_r        ==>  xnor %x_r, %g0, %x_r
    clr %x_r        ==>  or %g0, %g0, %x_r
    cmp %x_r, 0     ==>  subcc %x_r, %g0, %g0
    tst %x_r        ==>  orcc %x_r, %g0, %g0

We just use these synthetic instructions for convenience while writing assembly code; the assembler expands each one into the real instruction on the right.
We can also test or modify individual flags (which are nothing but bits in a word) using synthetic instructions. There are 4 bytes per word, i.e. 32 bits, and we can check or modify any of them. These are the synthetic instructions used:
    bset = or
    bclr = andn
    btog = xor
    btst = andcc
    btst reg_or_imm, reg  ->  andcc reg, reg_or_imm, %g0
    bset reg_or_imm, reg  ->  or    reg, reg_or_imm, reg
    bclr reg_or_imm, reg  ->  andn  reg, reg_or_imm, reg
    btog reg_or_imm, reg  ->  xor   reg, reg_or_imm, reg
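A sketch of bset, bclr, and btog in use (the mask 0x4 and the register %l0 are arbitrary choices):

    bset 0x4, %l0    // set bit 2     (expands to: or   %l0, 0x4, %l0)
    bclr 0x4, %l0    // clear bit 2   (expands to: andn %l0, 0x4, %l0)
    btog 0x4, %l0    // toggle bit 2  (expands to: xor  %l0, 0x4, %l0)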
Example: to test if flags (bits) 2 or 3 are set in %l0 (bits 2 and 3 together form the mask 0b1100 = 12):
    mov  12, %l1
    btst %l1, %l0
    be   clear
    nop
Another example: to check if flags (bits) 0x10 and 0x6 are set in %x_r (combined mask 0x10 | 0x6 = 0x16):
    btst 0x16, %x_r
    be   clear
    nop
    set:
    clear:

The complete list of synthetic instructions and pseudo-operations is given in Appendix D of the text.
Whenever we want to perform a computation on an object, we need to decide on how to represent that object inside the computer. As designers of both hardware and software, how do we decide on the best representation of an object? The answer is that we choose the representation that is most efficient. A representation is efficient if it allows a lot of capability for little cost. Cost here means both dollars and time, so a representation is efficient if it requires very little space (dollars), supports fast operations (time), and supports a wide range of operations (capability).
Some objects, like integers and real numbers, are so important that their representations are fixed, built into the computer's hardware. Other objects, like a photograph of a hurricane, are represented in different ways depending on the problem, because different operations may be required for different problems. This is another example of a hardware/software tradeoff. Operations on simple things like integers are so common that we want them to be as fast as possible and we are willing to sacrifice ease of modification. Operations on things like airplanes are much more rare, and ease of modification of the representation is much more important, so we leave such problems to software.
We will now look closely at the design decisions involved in implementing the basic data types: integers, text, instructions, and real numbers. We will use as our examples the implementation of these basic data types in the SPARC architecture. Before we do that however, we need to look at the raw material we have to work with.
To represent an object that can have more than two states, we concatenate bits. Two bits can represent an object with 4 states: 00, 01, 10, 11. In general, n bits can represent an object with 2^n states.
Concatenated bits of various sizes have names:
     8 bits is a BYTE
    16 bits is a HALFWORD
    32 bits is a WORD
    64 bits is a DOUBLEWORD

When we are talking about bit patterns, it's convenient not to have to write out all those 1s and 0s. Usually we write bit patterns in either octal or hexadecimal. In octal, we group bits into groups of three and write the digit (0 to 7) for each group. For 16 bits:
    1 010 001 111 110 101 = 0121765
    1  2   1   7   6   5

Hexadecimal is base-16 notation. We group bits in sets of 4 and use 0-9, a-f for the numbers 0-15.
    1010 0011 1111 0101 = 0xa3f5
     a    3    f    5

It's important to note at this point that these are just bit patterns. We don't know what they mean, if they mean anything, until we know what representation we are using. The bit pattern
    binary 10010001101000101011001111000 = octal 02215053170 = hex 12345678

could represent the integer 305419896, the real number 5.69045661e-28, or the string ' 4Vx'. It all depends on what is being represented.
A computer has storage for bit patterns, and instructions that operate on those bit patterns AS IF they were particular representations. For example, there is a different instruction for adding floating point (real) numbers than the one we've already seen for adding integers (the "add" instruction). There is no check to make sure that you are using a bit pattern according to its intended representation. That job is performed by the compiler, or, if the programmer is writing assembly language, it must be performed by the programmer. For example, if register %l0 is intended to be storing a floating point number, there is nothing to stop you from using the register as an argument to an (integer) "add" instruction. The bit pattern in %l0 will be interpreted as an integer, and so the results will be garbage, but you can do it if you want to.
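To make the point concrete, a sketch (the registers and the address in %o0 are arbitrary choices) feeding the same 32-bit pattern to the integer and floating point add instructions:

    ld    [%o0], %l0       // load a 32-bit pattern into an integer register
    add   %l0, %l0, %l1    // interpreted as an integer and doubled
    ld    [%o0], %f0       // load the same pattern into a float register
    fadds %f0, %f0, %f1    // interpreted as an IEEE single and doubled

Nothing in the hardware objects; only the instruction decides how the bits are read.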
It's an interesting example of design evolution that such representation checks have been built into computers in the past. However, these checks slowed down the processor and they weren't used very often, since compilers could do most of the job. So processor designers optimized the common case and just let the compiler do all the work associated with keeping representations straight. This makes the processor much faster in the long run.
First of all, to make integer representations efficient, we allocate a fixed amount of space to them. This allows the operand fetch unit to know exactly how many bytes to fetch when fetching an operand, so it can be quite fast.
In the SPARC architecture integers are stored in sizes of one byte, one halfword, one word, and one doubleword. We will concentrate on word-sized integers -- these are 32 bits in size.
Consider the representation in which the number 7 is encoded as seven '1' bits. What is wrong with this representation? It's not space-efficient. To represent the number 32 we would need 32 bits. We have already noted that n bits can represent an object with 2^n states, so using 32 bits to encode only 32 numbers is not as good as we know we can do. We know that we can encode 2^32, about 4 billion, numbers using 32 bits.
This has been a problem ever since neanderthals put scratches on the cave wall to count the number of mastodons killed. It was solved quite elegantly by the invention of positional notation, which is familiar to us because it's used in the decimal system. The idea, as used in the binary system, works like this. We can encode the numbers 0 and 1 using a single bit. So to encode the next number, 2, we move one space to the left. Now we use our '2' symbol along with the original bit to encode 2 and 3. We need a new number for 4, so we move one place to the left again. With the '4' symbol and the others we can encode numbers up to 7. And so on. This is called BASE TWO, or BINARY NOTATION.
To read a base two number, just add up the powers of two:
1101 = 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 8 + 4 + 1 = 13.
If you memorize the powers of two up to 2^10 = 1K, you can quickly convert any power of two to a decimal equivalent: split the exponent into a multiple of 10 (K for 2^10, M for 2^20, G for 2^30) plus a remainder. For example, 2^22 = 2^2 * 2^20 = 4M.

    2^2  = 4        2^12 = 4K       2^22 = 4M       2^32 = 4G
    2^19 = 512K     2^26 = 64M      2^31 = 2G
The addition operation will be quick if we can perform it using a small number of simple operations. Let's use the same approach to addition that we use every day.
Say we are adding two 2-digit binary numbers:

      w x
    + y z
    -----
      a b
Just as in everyday addition, start on the right and work towards the left. We need to add two bits. What are the possibilities?
    x z | b
    -------
    0 0 | 0
    0 1 | 1
    1 0 | 1
    1 1 | 0

We can see that we can find the value of b using the "xor" logical operation. This operation can be implemented in a very simple digital circuit.
However, when x=1 and z=1, we need to carry a 1 to the next column. Once again, this can be implemented using a simple digital circuit, called "AND":
    x z | C (the carry bit)
    -------
    0 0 | 0
    0 1 | 0
    1 0 | 0
    1 1 | 1

So: b = x XOR z, C = x AND z, and a = (w XOR y) XOR C. Thus there are two circuits in the addition unit for each bit: an XOR circuit and an AND circuit. The AND circuit feeds into the XOR circuit of the next higher-order bit. (Strictly, every bit after the first needs a full adder, which also computes its own carry-out: C_out = (w AND y) OR ((w XOR y) AND C).)
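The XOR and AND circuits correspond directly to the logical instructions we saw earlier. As a sketch (the registers are arbitrary choices), a one-bit add with x in %l0 and z in %l1:

    xor %l0, %l1, %l2    // b = x XOR z (the sum bit)
    and %l0, %l1, %l3    // C = x AND z (the carry bit)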
This can go on indefinitely. For a 32-bit integer, there are 32 of these bitwise adders. However, there is one problem: what happens to the output from the carry circuit on the highest-order bit?
When we carry from the highest-order bit, it is because we have added two numbers whose sum is greater than 2^32-1. Thus we have calculated a number that is too big to be stored in 32 bits. This condition is called OVERFLOW. It's a bad thing to ignore an overflow, right? Think about what happens here.
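On the SPARC, a program can detect this case itself: addcc sets the integer condition codes, the carry out of bit 31 sets the C flag, and bcs (branch on carry set) tests it. A sketch (the registers and the label too_big are hypothetical):

    addcc %l0, %l1, %l2    // add, setting the integer condition codes
    bcs   too_big          // C set: the unsigned sum did not fit in 32 bits
    nop                    // branch delay slot

(For signed arithmetic, overflow sets the V flag instead, which bvs tests.)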
Binary addition: there are two ways to carry out binary addition, the first using half adders, the second using full adders. Details about how the half adder and the full adder work are given in the text. At this point, it is also important to understand the meaning of the term modular arithmetic. If someone says arithmetic modulo 2, it means that there are only two numbers, 0 and 1, and any number will be represented as one of them.
Examples of addition modulo N (where N = 2):
    7 % 2 = 1
    (3 + 4) % 2 = 1

An addition unit in the ALU always does arithmetic modulo some value; in the SPARC it's modulo 2^32. That is, when we add two numbers whose sum is greater than or equal to 2^32, the result is less than 2^32 because a bit gets thrown away (the carry-out bit). The fact that we can add two numbers and get a smaller one sounds like subtraction -- this suggested to computer designers a way to implement negative numbers.
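To see modulo-2^32 arithmetic in action, a minimal sketch (the registers are arbitrary choices):

    set 0xffffffff, %l0    // the largest 32-bit pattern: 2^32 - 1
    add %l0, 2, %l1        // the true sum 2^32 + 1 needs 33 bits; the carry
                           // out is thrown away, so %l1 = (2^32+1) mod 2^32 = 1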