Published by: Sujan
Published date: 18 Jun 2021
Binary data representation is fundamental to how computers work. The atomic unit of data in computer systems is the bit, a contraction of ‘binary digit’. It can hold only 2 values or states (0 or 1, true or false), and therefore carries the smallest amount of meaningful information possible. This two-state property enables us to build simple, cheap, and reliable machines that process information using electro-mechanical components, as in the early days of computing, or electronic components, as today. Although a bit is small and simple, it can be used to represent any kind of information by simply using many of them, much as the few symbols of the alphabet can describe virtually anything. The number of possible values that can be expressed with n bits grows very fast as n increases; each time we use one more bit, the number of representable values doubles. So we do not need too many bits to represent a useful set of values: n bits represent 2^n different values.
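As a small illustration of this doubling, the following Python sketch lists every pattern of n bits and counts them. The function names bit_patterns and count_values are made up for this example.

```python
from itertools import product


def bit_patterns(n_bits: int) -> list[str]:
    """Return every distinct pattern of n_bits bits as a string of 0s and 1s."""
    return ["".join(p) for p in product("01", repeat=n_bits)]


def count_values(n_bits: int) -> int:
    """Number of distinct values that n_bits bits can represent: 2^n."""
    return 2 ** n_bits


if __name__ == "__main__":
    print(bit_patterns(2))  # ['00', '01', '10', '11']
    for n in (1, 2, 8, 16, 32):
        print(n, "bits ->", count_values(n), "values")
```

Adding one bit to bit_patterns doubles the length of the returned list, which matches the 2^n count.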
Representing natural numbers (non-negative integers, commonly called unsigned integers in CS) is very easy. It works exactly as in the decimal system that humans use (because we have 10 fingers): we simply place the digits horizontally one after the other, and the position of each digit determines its significance. The last digit (on the right-hand side, called the least-significant bit) counts the units, the one to its left counts the 2’s, the next one the 4’s, and so on. Figure II.1 shows this schematically; it also shows how to convert a binary number to a base-10 number.
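The positional rule can be written out directly. This is a minimal Python sketch, with binary_to_decimal as a hypothetical helper name, that weights each digit by the power of 2 given by its position.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of binary digits to its unsigned integer value."""
    total = 0
    # Walk from the least-significant (rightmost) digit: weights 1, 2, 4, 8, ...
    for position, digit in enumerate(reversed(bits)):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        total += int(digit) * (2 ** position)
    return total


if __name__ == "__main__":
    # 1011 = 1*8 + 0*4 + 1*2 + 1*1
    print(binary_to_decimal("1011"))  # 11
```

Python's built-in int("1011", 2) gives the same result; the loop is spelled out here only to show the weights.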
Although we can use as many digits as we want or need when we do calculations by hand, computers cannot handle arbitrarily long sequences of bits directly in hardware; longer numbers can be handled in software, with hardware support in the form of special instructions. Therefore, computers operate on data types which are fixed-length bit vectors. Note that this fixed-length constraint applies to all types of data, not just numbers.
The ‘natural’ unit of access in a computer is called a word. Nearly all instructions operate on words. Many computers use 32-bit words, but this is changing to 64 bits. Another commonly used data type is the byte, which is 8 bits. Other common data types include short (16 bits) and long (64 bits, as in Java for example).
The fixed length of data types can lead to problems when operating on numbers, as the result of an operation may be too large to be represented by the number of bits available to a specific data type. This condition is called overflow, and it is usually the responsibility of the program (or the operating system) to detect and deal with it. When a number overflows, it ‘wraps around’ and appears much smaller than it is supposed to be. For this reason computer arithmetic using n bits is sometimes called modulo 2^n arithmetic, i.e. it appears as if each number has been divided by 2^n and what the computer stores is the remainder.
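Python integers do not overflow on their own, so the sketch below imitates fixed-width unsigned addition by masking the result to n bits. The function name add_unsigned and the default width of 8 bits are choices made for this example.

```python
def add_unsigned(a: int, b: int, n_bits: int = 8) -> tuple[int, bool]:
    """Add two unsigned values as an n_bits-wide register would.

    Returns the stored result and a flag saying whether overflow occurred.
    """
    mask = (1 << n_bits) - 1      # n_bits ones, e.g. 0xFF for 8 bits
    full = a + b                  # exact mathematical sum
    stored = full & mask          # same as full % 2**n_bits
    return stored, full > mask


if __name__ == "__main__":
    print(add_unsigned(100, 27))   # (127, False): the sum fits in 8 bits
    print(add_unsigned(200, 100))  # (44, True): 300 wraps around to 300 % 256
```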
Because binary numbers can be very long and therefore rather tedious for humans to use, the hexadecimal notation is very commonly used in computer programming. Hex numbers (short for hexadecimal) have 16 as their base and use the usual 0-9 digits and also A-F to represent 10-15. For example, 0xaf (0x is a common prefix used in computing to signify that a hex number follows) is 10×16+15 = 175 (remember 0xa = 10, 0xf = 15). Since the base of hex numbers is a power of 2 (16 = 2^4), every combination of 4 binary digits corresponds to exactly one hex digit. So a binary number can be converted to hex by directly substituting every 4 bits with 1 hex digit, and vice versa.
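The 4-bits-per-digit substitution can be sketched in Python as follows. The helper names binary_to_hex and hex_to_binary are invented for this example, and padding the binary string with leading zeros to a multiple of 4 is an assumption about how to handle uneven lengths.

```python
HEX_DIGITS = "0123456789abcdef"


def binary_to_hex(bits: str) -> str:
    """Replace each group of 4 bits with the matching hex digit."""
    padded_len = -(-len(bits) // 4) * 4          # round up to a multiple of 4
    bits = bits.zfill(padded_len)                # leading zeros do not change the value
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits: str) -> str:
    """Replace each hex digit with its 4-bit pattern."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


if __name__ == "__main__":
    print(binary_to_hex("10101111"))  # af
    print(hex_to_binary("af"))        # 10101111
    print(int("af", 16))              # 175, i.e. 10*16 + 15
```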
Since bits can be used to represent any type of information, a hex number could also represent data other than numbers. Remember that hexadecimal notation is just a convenience: data are represented in binary in computers!
In addition to integers, computers are also used to perform calculations with real numbers. The representation of these numbers in computers is typically done using the floating-point data type. Real numbers can be represented in binary using the form (-1)^s × f × 2^e. Therefore a floating-point type needs to hold s (the sign), f, and e. Note that floating-point numbers use sign-magnitude for the fraction part, f.
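To make s, f, and e concrete, the sketch below pulls the three fields out of a 32-bit float. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 fraction bits with an implied leading 1), which the text above does not name. The function names are hypothetical, and rebuild_normal covers only normal numbers, not zero, subnormals, infinities, or NaN.

```python
import struct


def decode_float32(x: float) -> tuple[int, int, int]:
    """Split x, stored as a 32-bit float, into (sign, exponent field, fraction field)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    sign = bits >> 31
    exponent_field = (bits >> 23) & 0xFF
    fraction_field = bits & 0x7FFFFF
    return sign, exponent_field, fraction_field


def rebuild_normal(sign: int, exponent_field: int, fraction_field: int) -> float:
    """Recompute (-1)^s * f * 2^e for a normal number (assumed IEEE 754 layout)."""
    f = 1 + fraction_field / 2 ** 23   # implied leading 1 plus the stored fraction
    e = exponent_field - 127           # remove the exponent bias
    return (-1) ** sign * f * 2.0 ** e


if __name__ == "__main__":
    fields = decode_float32(-6.5)      # -6.5 = -1.625 * 2^2
    print(fields)                      # (1, 129, 5242880)
    print(rebuild_normal(*fields))     # -6.5
```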
In addition to number crunching, computers are used to process text. Therefore characters (and punctuation marks) have to be encoded in a binary format. A very common representation is ASCII, which stores one character in a byte. Java, being a more modern language, uses Unicode with 16-bit characters, which means that the alphabets of most languages of the world can be encoded.
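The following Python sketch shows the byte values behind a short piece of text. The helper name ascii_codes is made up for this example, and UTF-16 is used here as a stand-in for a 16-bit Unicode encoding, which is an assumption about what the text above means by 16-bit Unicode.

```python
def ascii_codes(text: str) -> list[int]:
    """Return the ASCII code of each character, one byte per character."""
    return list(text.encode("ascii"))


if __name__ == "__main__":
    print(ascii_codes("Hi!"))            # [72, 105, 33]

    # In a 16-bit encoding each of these characters occupies two bytes.
    print("A".encode("utf-16-be"))       # b'\x00A'
    print("é".encode("utf-16-be"))       # b'\x00\xe9'

    # ASCII has no code for 'é', so encoding it fails.
    try:
        "é".encode("ascii")
    except UnicodeEncodeError as error:
        print("not representable in ASCII:", error.reason)
```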
Words and phrases are created by combining characters into strings. The tricky issue here is how to tell where the string ends. One convention (adopted by the C language) is to use a special character, the null character 0x00, to indicate the end of the string. Another, adopted by Java, is to use an accompanying variable, which contains the length of the string, ‘packaged’ in an object together with the string itself.
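Both conventions can be imitated on raw bytes. In this Python sketch the function names are hypothetical, and storing the length in a single leading byte is only an illustrative layout: it is not how Java actually lays out its String objects.

```python
def to_c_style(text: str) -> bytes:
    """Terminator convention: the characters followed by a 0x00 byte."""
    return text.encode("ascii") + b"\x00"


def read_c_style(buffer: bytes) -> str:
    """Read characters up to, but not including, the first 0x00 byte."""
    end = buffer.index(0)
    return buffer[:end].decode("ascii")


def to_length_prefixed(text: str) -> bytes:
    """Length convention (illustrative layout): one length byte, then the characters."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("length does not fit in one byte")
    return bytes([len(data)]) + data


def read_length_prefixed(buffer: bytes) -> str:
    """Read exactly as many characters as the length byte says."""
    length = buffer[0]
    return buffer[1:1 + length].decode("ascii")


if __name__ == "__main__":
    print(to_c_style("cat"))                  # b'cat\x00'
    print(read_c_style(b"cat\x00junk"))       # cat
    print(to_length_prefixed("cat"))          # b'\x03cat'
    print(read_length_prefixed(b"\x03catjunk"))  # cat
```

In both cases whatever bytes follow the string are ignored, but for different reasons: the first reader stops at the terminator, the second stops after counting.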
Warning: Don’t confuse (or mix without thinking) characters with numbers! 0 (the number) and ‘0’ (the character) are quite different when stored, say, in a byte: the former is stored as 0x00 while the latter is stored as 0x30. While it makes sense to do arithmetic with numbers, doing standard binary arithmetic on ASCII-encoded digits will not produce the correct results. Try it out if you don’t believe it! Note that all data types can be represented in hex, and hex numbers use some letters as digits, so be careful not to confuse characters with hex digits.
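Trying it out in Python might look like the sketch below. The names naive_add and digit_char_to_number are invented for this example.

```python
def naive_add(a: str, b: str) -> int:
    """Add the stored ASCII codes of two digit characters (the wrong way)."""
    return ord(a) + ord(b)


def digit_char_to_number(ch: str) -> int:
    """Convert an ASCII digit character to the number it stands for."""
    return ord(ch) - ord("0")


if __name__ == "__main__":
    print(hex(ord("0")))                    # 0x30, not 0x00
    total = naive_add("2", "3")
    print(hex(total), repr(chr(total)))     # 0x65 'e', certainly not 5
    print(digit_char_to_number("2") + digit_char_to_number("3"))  # 5
```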
As explained earlier, not only data but also instructions are stored in the computer memory. Therefore, instructions are also represented in binary form and need to be encoded somehow. We will see the encodings for some MIPS instructions shortly.
The important point to remember is that the contents of a memory location could be anything: an instruction, a 2’s complement integer, four characters, and so on. There is nothing in the memory location itself that describes what is stored there. It is the responsibility of the program to use these contents as they should be used. There is some support from the processor itself though, usually in coordination with the operating system, which ensures that a program cannot modify a part of the memory that it is not supposed to.
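The sketch below reads the same four bytes in several ways using Python's struct module. The function name interpret is hypothetical, and the big-endian byte order and IEEE 754 float layout are assumptions made for the example.

```python
import struct


def interpret(raw: bytes) -> dict:
    """Read the same 4 bytes as an unsigned integer, a signed integer, and a float."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "unsigned": struct.unpack(">I", raw)[0],  # plain binary number
        "signed": struct.unpack(">i", raw)[0],    # 2's complement integer
        "float": struct.unpack(">f", raw)[0],     # 32-bit floating point
    }


if __name__ == "__main__":
    raw = bytes([0x41, 0x42, 0x43, 0x44])
    print(raw.decode("ascii"))        # ABCD: four characters
    print(interpret(raw))             # the same bits read as numbers

    # With the top bit set, the signed and unsigned readings differ.
    print(interpret(b"\xff\xff\xff\xff")["unsigned"])  # 4294967295
    print(interpret(b"\xff\xff\xff\xff")["signed"])    # -1
```

Nothing in raw itself says which reading is the intended one; the program chooses by the way it unpacks the bytes.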
A program written in a high-level language ‘keeps’ data in variables. In the actual computer system most of the data are stored in the memory. Therefore a mechanism for mapping variables to memory locations is needed. The details of this mechanism are revealed in more advanced courses (compilers). What one needs to know, for now, is that we need a kind of ‘meta-data’, which says where the real data are in the memory. Because these meta-data point to the data, they are called pointers and since memory addresses are unsigned numbers, pointers are numbers themselves and can be ‘processed’ as such.
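Python does not expose raw memory addresses in the way C does, so the following is only a toy model: memory is a bytearray, and a ‘pointer’ is simply an index into it. The names store_byte and load_byte and the 16-byte memory size are made up for this example.

```python
def store_byte(memory: bytearray, address: int, value: int) -> None:
    """Write one byte at the location the address points to."""
    memory[address] = value


def load_byte(memory: bytearray, address: int) -> int:
    """Read the byte at the location the address points to."""
    return memory[address]


if __name__ == "__main__":
    memory = bytearray(16)          # toy memory: 16 byte-sized locations, all zero
    pointer = 4                     # a pointer is just an unsigned number

    store_byte(memory, pointer, 0x2A)
    print(load_byte(memory, pointer))   # 42

    pointer = pointer + 1           # pointers can be processed like any number
    store_byte(memory, pointer, 0x2B)
    print(memory.hex())             # 000000002a2b then zeros
```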