Lecture 4 Notes: Bits and bytes

Computer Literacy 1 Tuesday 28/9/2004

Lecture Overview

Lecture topics:

  • How computers encode information
  • How to quantify information and memory
  • How to represent and communicate binary data

The aim is to be able to:

  • Recognise the significance of numbers such as those in “256 MB RAM”, “32-bit word length”, “a set of 256 characters”, etc.
  • Reason quantitatively about computer systems, e.g. assess the capability of a system to handle a file of a given size.

Computers are digital

Computers are built of electric switches: each switch is “on” or “off” at any time. This means that computers process discrete electrical events: they are digital. In contrast, analogue devices process continuous signals.

Digital and analogue examples:

  • On an analogue watch the clock hands move continuously, and the time can be read with an arbitrary level of precision (depending on your eyesight and reactions!), including fractions of seconds. Time increases continuously, as read by the analogue watch.
  • On a digital watch that displays the hours, minutes and seconds, the time increases in steps of one second. Between seconds, there is no way of measuring what fraction of a second has passed. Time increases in discrete steps, as read by the digital watch.
  • Modems convert the digital signals in the computer into an analogue signal. This can be sent down telephone wires as an electromagnetic wave. At a receiving computer, the modem converts the analogue signal back into a digital form.

Advantages of digital over analogue processing:

  • It is fast. It is much quicker to decide if a switch is “on” or “off” than to decide how much it is “on” or “off”.
  • It is robust to errors: small errors at each switch in the computer are not propagated. If a digital switch is a little bit “on” when it should be “off”, the signal coming from it will still be treated as an “off” signal. If an analogue switch is a little bit “on” when it should be “off”, the signal will be accepted at the next stage as “a little bit on”. Errors can accumulate this way in analogue devices.
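To see why re-deciding “on” or “off” at every stage stops errors from building up, here is a toy model. It is an illustrative assumption, not a model of real circuits: a signal passes through ten stages, each stage adds the same small error of 0.1, and the digital stages use a hypothetical threshold of 0.5 to snap the value back to 0 or 1.

```python
def analogue_chain(signal, error, stages):
    """Pass a continuous value through `stages` stages, each adding `error`."""
    for _ in range(stages):
        # An analogue stage accepts "a little bit on" as a little bit on.
        signal = signal + error
    return signal


def digital_chain(bit, error, stages, threshold=0.5):
    """Pass a 0/1 value through `stages` stages that each add `error`
    and then re-decide "on" (1) or "off" (0) using `threshold`."""
    for _ in range(stages):
        noisy = bit + error
        # A digital stage only asks: is this above the threshold or not?
        bit = 1 if noisy >= threshold else 0
    return bit


# Start "off" and add an error of 0.1 at each of 10 stages.
print(round(analogue_chain(0.0, 0.1, 10), 6))  # 1.0: "off" has drifted to fully "on"
print(digital_chain(0, 0.1, 10))               # 0: still "off"
```

The same small error at every stage ends up turning the analogue signal fully “on”, while the digital chain never leaves “off”.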


Bits of information

A bit is one unit of “information”. Because computers are built out of transistors, which act as binary switches gated by a third input, the smallest level of information that it makes sense to think about is the binary choice. The term bit first occurs in print in 1948, and is attributed to a scientist called John Tukey. It is short for binary digit. Tukey considered “bigit” and “binit” as possibilities. What a missed opportunity.

  • Each bit has one of two values e.g. 1 or 0.
  • Could be represented by Yes or No, True or False, American Idol™ finalists Fantasia or Diana, indeed any binary scheme.

Bytes

The word byte was coined in 1956 as a respelling of “bite”, chosen so that it would not be confused with “bit”; it is also sometimes explained as short for “binary term”. So the word is a bit weird.

  • 1 Byte = 8 bits.

Originally, the term byte referred to the smallest unit of memory that can be accessed by the CPU. There were computers where this was 6, 7 or 9 bits. In modern machines, 8 bits has become the standard and the other byte sizes have become obsolete. So 1 byte = 8 bits!

  • One byte can express 256 ($2^8$) possibilities.
  • Each bit can have one of two values, so for 2 bits there are $2 \times 2$ possibilities (00, 01, 10, 11). With three bits there are $2 \times 2 \times 2$ possibilities (000, 001, 010, 011, 100, 101, 110, 111). With 8 bits there are $2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2$ possibilities, more conveniently written as $2^8$ possibilities.
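The counting argument above can be checked by listing every pattern directly. This is a small sketch using Python's standard `itertools.product`; the helper name `bit_patterns` is hypothetical.

```python
from itertools import product


def bit_patterns(n):
    """List every pattern of n bits, e.g. n=2 -> ['00', '01', '10', '11']."""
    # product("01", repeat=n) picks 0 or 1 for each of the n positions,
    # giving 2 x 2 x ... x 2 = 2**n combinations.
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_patterns(2))       # ['00', '01', '10', '11']
print(len(bit_patterns(3)))  # 8
print(len(bit_patterns(8)))  # 256
```

Each extra bit doubles the number of patterns, which is why 8 bits give $2^8 = 256$.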

The importance of Bytes

1 byte is the minimum unit of memory that can be accessed.

Word length of a processor: the number of bits a CPU can process at one time.

  • Pentium: 32 bits, 4 bytes
  • Itanium: 64 bits, 8 bytes

An example of bytes in use is an internet “IP” address, e.g. 129.215.155.141, which is four bytes, one for each number (IP addresses will be explained in later lectures).
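To see that an IP address really is four bytes, here is a short sketch using Python's standard `ipaddress` module, whose `packed` attribute gives the address as raw bytes.

```python
import ipaddress

# The example address from the notes.
addr = ipaddress.IPv4Address("129.215.155.141")

packed = addr.packed  # the address as raw bytes
print(len(packed))    # 4: one byte per number
print(list(packed))   # [129, 215, 155, 141]

# Show each byte as 8 bits.
print(" ".join(format(b, "08b") for b in packed))
# 10000001 11010111 10011011 10001101
```

Because each number is stored in one byte, none of the four numbers in an address like this can be larger than 255.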

Big Bytes

$2^{10}$ bytes = 1024 bytes ≈ 1000 bytes, and $2^{10}$ bytes = 1 kilobyte = 1 KB. So 1 KB is not 1000 bytes. (N.b. $2^2 = 2 \times 2$, $2^3 = 2 \times 2 \times 2$, $2^5 = 2 \times 2 \times 2 \times 2 \times 2$, etc.)

  • 1 Megabyte (MB) = 1,048,576 bytes = $2^{20}$ bytes = 1024 KB ≈ $10^6$ bytes
  • 1 Gigabyte (GB) = 1,073,741,824 bytes = $2^{30}$ bytes = $1024^2$ KB ≈ $10^9$ bytes
  • 1 Terabyte (TB) = $2^{40}$ bytes = $1024^3$ KB ≈ $10^{12}$ bytes
  • 1 Petabyte (PB) = $2^{50}$ bytes = $1024^4$ KB ≈ $10^{15}$ bytes

Then exabyte, zettabyte and yottabyte! Maybe in your lifetime!
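The table of units above follows one rule: each unit is 1024 times the previous one. A minimal sketch, assuming the binary convention used in these notes (1 KB = 1024 bytes); the function name is hypothetical.

```python
UNITS = ["KB", "MB", "GB", "TB", "PB"]


def binary_unit_sizes():
    """Map each unit to its size in bytes, using 1 KB = 1024 bytes."""
    # The i-th unit (KB is i=1) is 1024**i = 2**(10*i) bytes.
    return {unit: 1024 ** i for i, unit in enumerate(UNITS, start=1)}


for i, (unit, size) in enumerate(binary_unit_sizes().items(), start=1):
    # Compare with the decimal approximation 10**(3*i).
    print(f"1 {unit} = 2^{10 * i} = {size:,} bytes, roughly 10^{3 * i}")
```

The gap between the exact and approximate values grows with each step: 1 KB is 2.4% more than 1000 bytes, but 1 PB is over 12% more than $10^{15}$ bytes.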


How big is big?

Develop a sense of how big files are and you will do yourself a big favour: you will avoid opening huge files with inappropriate applications, and you will know when you have generated files of junk. You will also be better able to assess the capability of a system. Some rough sizes:

  • 300-page novel: 0.5 MB
  • Floppy disk: 1.44 MB
  • 3-minute MP3 track: 2 MB
  • Edinburgh telephone directory: 20 MB
  • CD-ROM: 600-700 MB
  • DVD: 4 GB
  • Corporate customer database: 1 TB
  • Video of your life: 1 PB
  • Human brain: > 10 PB
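Here is one way to use these rough sizes to assess the capability of a system: divide the capacity by the item size and round down. This is a sketch; the figures come from the list above (treating the DVD's 4 GB as 4 × 1024 MB, an assumption), and `how_many_fit` is a hypothetical helper.

```python
import math

# Rough sizes in MB, taken from the list above (DVD: 4 GB = 4 * 1024 MB).
NOVEL_MB = 0.5
MP3_TRACK_MB = 2
PHONE_DIRECTORY_MB = 20
FLOPPY_MB = 1.44
CD_ROM_MB = 700
DVD_MB = 4 * 1024


def how_many_fit(item_mb, container_mb):
    """How many whole items of size item_mb fit in container_mb?"""
    # Round down: half a file on a disk is no use.
    return math.floor(container_mb / item_mb)


print(how_many_fit(NOVEL_MB, FLOPPY_MB))            # 2
print(how_many_fit(MP3_TRACK_MB, CD_ROM_MB))        # 350
print(how_many_fit(MP3_TRACK_MB, DVD_MB))           # 2048
print(how_many_fit(PHONE_DIRECTORY_MB, FLOPPY_MB))  # 0: it does not fit
```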

Hexadecimal (Hex)

Binary arithmetic is counting in 2s. Normally we count in 10s. Hexadecimal arithmetic is counting in 16s.

  • Bits can be expressed in hexadecimal without losing any information.
  • It is easier for humans to read!
  • Because a byte can be expressed using two hexadecimal digits (one hex digit for the first 4 bits, and one hex digit for the next 4 bits), hex is a very widely used system for expressing binary code. You won't have to do hex arithmetic in the exam. You should be able to recognise it, though.
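Splitting a byte into two 4-bit halves is all it takes to write it in hex. A minimal sketch (the function name `byte_to_hex` is hypothetical):

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(byte):
    """Write one byte (0..255) as two hex digits."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte holds values 0..255")
    high = byte >> 4      # first 4 bits
    low = byte & 0b1111   # last 4 bits
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(byte_to_hex(0b11001100))  # CC
print(byte_to_hex(0b00111111))  # 3F
```

Python's built-in `format(byte, "02X")` produces the same two digits; the hand-written version just makes the 4-bits-per-digit split visible.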

Counting in different bases

In normal (base 10) counting:

$1101 = 1 \times 1000 + 1 \times 100 + 0 \times 10 + 1 \times 1 = 1 \times 10^3 + 1 \times 10^2 + 0 \times 10^1 + 1 \times 10^0$

So each position quantifies a power of 10. When you count in a different “base”, each digit quantifies a power of the base number you are counting in. Because computers process binary bits, computer talk requires talking about numbers in bases that are powers of 2. Below are some examples of counting in relevant bases:

  • In base 2, $1101 = 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 13$ (base 10)
  • In base 8 (octal), $1101 = 1 \times 8^3 + 1 \times 8^2 + 0 \times 8^1 + 1 \times 8^0 = 577$ (base 10)
  • In base 16 (hex), $1101 = 1 \times 16^3 + 1 \times 16^2 + 0 \times 16^1 + 1 \times 16^0 = 4353$ (base 10)
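The rule “each digit quantifies a power of the base” can be written as a short loop. This sketch (the function name is hypothetical) evaluates the same digit string 1101 in each base and can be cross-checked against Python's built-in `int(text, base)`.

```python
def positional_value(digits, base):
    """Evaluate a digit string in the given base by summing digit * base**position."""
    total = 0
    # The rightmost digit is position 0, the next is position 1, and so on.
    for position, digit in enumerate(reversed(digits)):
        value = int(digit, 16)  # accepts 0-9 and A-F
        if value >= base:
            raise ValueError(f"digit {digit!r} is not allowed in base {base}")
        total += value * base ** position
    return total


for base in (10, 2, 8, 16):
    print(base, positional_value("1101", base))
# 10 1101
# 2 13
# 8 577
# 16 4353
```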

Counting in Hex

The sixteen digits of Hex are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

  • When you get to the number for ten, if you used “10” it would be interpreted in hex as one unit of sixteen plus zero ones (as the counting in different bases examples above show). Creating new digits to express the numbers from ten to fifteen is therefore a lot less confusing. The original choice of using letters to do this job was arbitrary, but this is the convention. Now keep counting! 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, 1E, 1F. So $1A_{16} = 1 \times 16 + 10 \times 1 = 26$ (base 10) and $1B_{16} = 1 \times 16 + 11 \times 1 = 27$ (base 10).

  • Hex is normally written using “#” or “0x” to indicate that the number is in hex form. E.g. #AB3 or 0xAB3 means AB3 (hex), AB3 (base 16), $AB3_{16}$.
  • You will come across hex on many occasions if you work with computers. To know about hex is to be enabled in the computer world.

  • One example of hex in practice is the “non-dithering RGB colour codes”. “Dithering colours” are colours which look different on different web browsers, and a common fault of many websites. Colours defined by their red, green, blue (RGB) code, which specifies how much red, green and blue is in the colour, look the same using every web browser if the code comes from the standard set of 216 “web-safe” colours, in which each of the three levels is one of #00, #33, #66, #99, #CC or #FF. The “non-dithering RGB colour codes” are binary codes, expressed in hex.
  • The above are examples of non-dithering RGB colours and their hex codes (printed in greyscale). Each code is three bytes long, and each byte is expressed using two hex digits. So the code #CC3399 expresses one byte of information about the level of red (#CC), one byte of information about the level of green (#33) and one byte about the level of blue (#99). Easy, see!
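The hex ideas in this section can be tried out in a few lines of Python. This sketch counts on from F, reads a “0x” number with the built-in `int(text, 16)` (which accepts the 0x prefix), and splits a colour code into its three bytes; `parse_rgb` is a hypothetical helper that simply strips a leading “#”.

```python
def parse_rgb(code):
    """Split a colour code such as '#CC3399' into its (red, green, blue) bytes."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected three bytes written as six hex digits")
    # Two hex digits per byte: red, then green, then blue.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


# Counting on from fifteen in hex.
print(" ".join(format(n, "X") for n in range(15, 32)))
# F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

print(int("1A", 16), int("1B", 16))  # 26 27
print(int("0xAB3", 16))              # 2739
print(parse_rgb("#CC3399"))          # (204, 51, 153)
```

So #CC3399 means red at level 204, green at 51 and blue at 153, each out of a maximum of 255 (#FF).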

Expressing characters

Computers process bits of information, but we process language through characters. There are standard conventions that define how characters are expressed in bits. The standard conventions include:

  • ASCII, Unicode, ANSI, ISO Latin

ASCII is a very common convention used in many situations.

  • Its weird name stands for American Standard Code for Information Interchange (don't learn this! It's just to make the acronym less mysterious). The convention works as a dictionary, allowing keystroke characters to be translated into binary form.


Here are some examples of characters, their ASCII binary codes, and these codes expressed in hex. In the binary column, the leftmost bit is the parity bit described below; the hex column gives the 7-bit code.

Character    Binary      Hex
A            01000001    41
?            00111111    3F
a            11100001    61
9            00111001    39
(escape)     00011011    1B

The rightmost 7 bits (with the potential to express 128, or $2^7$, possibilities) represent 96 characters and 32 control codes (the control codes include carriage return, end of text, etc.). The leftmost, 8th bit allows some error detection:

  • It codes the “parity” of the byte: whether the total number of 1s in the byte is odd or even.
  • If one bit is corrupted, the parity will change.
  • If the value of the 8th bit does not match the parity of the other 7 bits, then a transmission error has been detected.
  • The mechanism is unable to detect 2 corrupted bits.

Originally, ASCII was designed to encode basic American English text, so 7 bits were enough to express the character set. By then, the “least addressable unit of memory” (see above) was already a byte of 8 bits, so the spare 8th bit was put to use for parity.
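The binary codes in the table are consistent with even parity: the 8th bit is chosen so that the whole byte contains an even number of 1s. Assuming that convention, this sketch rebuilds the table's binary codes and shows that one corrupted bit is caught while two are not. The function names are hypothetical; `ord` is Python's built-in character-to-code lookup.

```python
def with_even_parity(ch):
    """Return the 8-bit pattern for an ASCII character: parity bit + 7-bit code."""
    code = ord(ch)  # the ASCII "dictionary" lookup, e.g. 'A' -> 65
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    ones = bin(code).count("1")
    parity_bit = ones % 2  # 1 if the 7 bits contain an odd number of 1s
    # Put the parity bit in front, making the total number of 1s even.
    return format((parity_bit << 7) | code, "08b")


def parity_ok(bits):
    """Under even parity, a received byte should contain an even number of 1s."""
    return bits.count("1") % 2 == 0


def flip(bits, position):
    """Corrupt one bit (position 0 is the leftmost)."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]


sent = with_even_parity("a")
print(sent)                                # 11100001
print(parity_ok(flip(sent, 3)))            # False: one corrupted bit is detected
print(parity_ok(flip(flip(sent, 3), 5)))   # True: two corrupted bits go unnoticed
```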

Other conventions

  • ASCII (7 bits): 128 characters
  • ISO Latin 1 (8 bits): 256 characters
  • Unicode (16 bits): 65536 characters (this is Unicode's original 16-bit form; later versions go beyond 16 bits)

In Japanese, 1945 ideogram characters are in standard use. In Chinese, there are at least 50000 ideogram characters that can be used, although an educated person may only know 5000. 2 bytes is of the right order of magnitude to express the characters of the languages of the world, and is a convenient memory size for computers. It is the convention used by Java.

Unicode issues:

  • Doubles memory demands compared to ASCII
  • Requires compatible software
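The doubling of memory can be seen by encoding the same text both ways. This sketch uses Python's built-in `str.encode` with the "utf-16-le" codec (UTF-16, little-endian, no byte-order mark) as a stand-in for 16-bit Unicode; that choice is an assumption, and in UTF-16 characters beyond the first 65536 take 4 bytes rather than 2.

```python
text = "fun"

ascii_bytes = text.encode("ascii")      # 1 byte per character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character here

print(len(ascii_bytes), len(utf16_bytes))  # 3 6

# A Chinese character fits in 2 bytes of UTF-16, but has no ASCII code at all.
print(len("中".encode("utf-16-le")))  # 2
try:
    "中".encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```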

Encoding binary

You want to transfer files. You need to ensure that binary strings are not misinterpreted as control strings by the computer, telling it to give you, for instance, the “blue screen of death”.

  • Macs use a scheme called BinHex.
  • Unix can use a scheme called uuencode.
  • Mail programs use MIME...

MIME stands for Multipurpose Internet Mail Extensions (you don't need to learn the acronym). It is an Internet standard that describes

  • How emails must be formatted, so that the receiver can interpret the email
  • How non-text content (e.g. pictures, audio files, etc.) is converted into ASCII

Email programs which support MIME include:

  • Outlook, Eudora, Pine, Pegasus...

MIME uses the encoding scheme “Base64” to encode non-text files.

Below is an email with a photo attached (showing a friend meeting a C-list celebrity) in the MIME format, as received by a mail program. If someone sends you an email using an email program that is not compatible with your email program, you may well see a message looking like this:

    From: Kit Longden <kit@anc.ed.ac.uk>
    To: <kit@anc.ed.ac.uk>
    Date: Mon, 27 Sep 2004 18:13:22 +0000
    MIME-Version: 1.0
    Content-Type: image/png; name="Pete_n_Phillipa.png"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename="Pete_n_Phillipa.png"

    UesDBBQAAAAIAMyiPDBbWM11ah8BAJRVAQAGABUASUYucGRmVVQJA
    AMAGhhAABoYQFV4BAD0AfQB7P0FVJxL0yiM4u4JECRYSNAwwAwySAg
    OwV0DAYK7Q0ggQCA4QYIEd3d3CRY0uDtBgmsg2M/MsN/N3vv9znm/8527
    1r3r/msyRT8t1d1VXV3V1fVMHsuLiD1lY2HH4KJipbLWN8Pg48MAKLvaGAJ
    ErQysX5taGWMAREyNjAztDK0MDO21gECAoYuBhZ4lwNbR2sHwtb6Fnamx
    iQPAytFS39DO3tTYCvDa2sJCzw5gY2hnYGjlANCzvEnZ61m9hrWAVbfRu8F
    nYWh0m4Jl6tlDiu3NATYWjvYAA2tLSz2AiauNiaEVBJmp9WuAvYWevQngja
    GdNcDayhDg4GwNcDCxMzQEGFk72gGMTJ0MAfamLgB7Q6ebNoawgZneV
    DSw...

  • Note the line telling you the message is in MIME format, Version 1.0 in this case.
  • Note the “Content-Type: image/png” line telling you it is an image in PNG (Portable Network Graphics) format.
  • Note the “Content-Transfer-Encoding: base64” line telling you the encoding scheme used.
  • Note the encoding of the photo: it is expressed in terms of letters, numbers and slashes.
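A mail program reads exactly the header lines pointed out above. As a sketch, Python's standard `email` package can parse them; here the headers are copied from the example, but the body is a hypothetical placeholder (the word “hello” in Base64) standing in for the photo data, which is truncated in these notes.

```python
import email

# Headers from the example above; the body is a placeholder, not the real photo.
RAW = """\
From: Kit Longden <kit@anc.ed.ac.uk>
To: <kit@anc.ed.ac.uk>
Date: Mon, 27 Sep 2004 18:13:22 +0000
MIME-Version: 1.0
Content-Type: image/png; name="Pete_n_Phillipa.png"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="Pete_n_Phillipa.png"

aGVsbG8=
"""

msg = email.message_from_string(RAW)
print(msg["MIME-Version"])               # 1.0
print(msg.get_content_type())            # image/png
print(msg["Content-Transfer-Encoding"])  # base64
print(msg.get_filename())                # Pete_n_Phillipa.png
# decode=True undoes the Base64 transfer encoding.
print(msg.get_payload(decode=True))      # b'hello'
```

A MIME-aware mail program does the same decoding behind the scenes and shows you the picture instead of the letters and numbers.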


For the curious, this is the encoded photo:

MIME and Base64

When email protocols (IMAP, POP3, SMTP, and others) were created, they used ASCII characters to express commands, e.g. “end of message”. MIME uses Base64 to convert binary data into safe ASCII characters. There are 64 safe ASCII characters: “A-Z”, “a-z”, “0-9”, “+” and “/”. Here is what Base64 does (this is an illustrative example; you won't have to do Base64 encoding in the exam):

  • Take 3 bytes at a time (24 bits)
  • Split the 24 bits into four blocks of 6 bits, and express each block as one ASCII character
  • The result is four bytes of “safe” ASCII code

The file size is increased by 33%, as each block of 6 bits is expressed by 8 bits, but the encoding benefits from ASCII parity checks.

Example: “fun” in Base64

  • The ASCII code for “fun” is: “f” 01100110, “u” 01110101, “n” 01101110
  • Written as 4 blocks of 6 bits this is: 011001 100111 010101 101110
  • These numbers are, in decimal: 25 39 21 46
  • Base64 then uses its own dictionary to convert the numbers (which must be less than 64, or $2^6$) into one of the safe ASCII characters.
  • They are: Z n V u, so ZnVu is how Base64 expresses fun!
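The steps above can be sketched in Python and checked against the standard library's `base64.b64encode`. The helper names are hypothetical; the “=” padding for a short final group is part of standard Base64 but is not covered in these notes.

```python
import base64

# The 64 "safe" characters, in Base64's dictionary order (value 0 is 'A').
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def six_bit_numbers(chunk):
    """Turn up to 3 bytes into four 6-bit numbers (zero-filling a short chunk)."""
    block = int.from_bytes(chunk.ljust(3, b"\x00"), "big")  # 24 bits
    return [(block >> shift) & 0b111111 for shift in (18, 12, 6, 0)]


def base64_encode(data):
    """Encode bytes as Base64 text, 3 bytes in -> 4 characters out."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        chars = [ALPHABET[n] for n in six_bit_numbers(chunk)]
        missing = 3 - len(chunk)
        if missing:
            # A short final chunk is padded with '=' so the output length
            # stays a multiple of 4.
            chars[-missing:] = ["="] * missing
        out.extend(chars)
    return "".join(out)


print(six_bit_numbers(b"fun"))                   # [25, 39, 21, 46]
print(base64_encode(b"fun"))                     # ZnVu
print(base64.b64encode(b"fun").decode("ascii"))  # ZnVu (standard library)
```

Every 3 input bytes become 4 output characters, which is where the 33% size increase comes from: 300 bytes, for example, encode to 400 characters.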

Key Points

  • Computers process bits of information very quickly.
  • Hex and octal allow users to express binary compactly
  • Characters are encoded by the computer as numbers, using e.g. ASCII
  • Non-text requires encoding
  • Can you recognise encoded information?