Monday, January 22, 2018

IAM Search

Code sets

After engineering methods to encode bits of data as electrical (or optical or radio) signals, standardizing the speed of those bits’ broadcast, framing bits as groups complete with start and stop signals, and providing for multiple devices to share a common communications channel, there still remains the issue of how to make “1” and “0” symbols represent something other than Boolean values (on/off, true/false, mark/space, etc.). This is where codes become useful.

Morse and Baudot codes

In the early days of communication, Morse code was used to represent letters of the alphabet, numerals (0 through 9), and some other characters in the form of “dot” and “dash” signals. In the International Morse Code, no character requires more than six bits of data, and some (such as the common letters E and T) require only one bit.

The variable bit-length of Morse code, though very efficient1 in terms of the total number of “dots” and “dashes” required to communicate textual messages, was difficult to automate in the form of teletype machines. In answer to this technological problem, Emile Baudot invented a different code where each and every character was five bits in length. Although this gave only 32 characters, which is not enough to represent the 26-letter English alphabet, plus all ten numerals and punctuation symbols, Baudot successfully addressed this problem by designating two of the characters as “shift” characters: one called “letters” and the other called “figures.” The other 30 characters had dual (overloaded) meanings, depending on the last “shift” character issued in the serial data stream2.

 

EBCDIC and ASCII

A much more modern attempt at encoding characters useful for text representation was EBCDIC, the “Extended Binary Coded Decimal Interchange Code” invented by IBM in 1962 for use with their line of large (“mainframe”) computers. In EBCDIC, each character was represented by a one-byte (eight bit) code, giving this code set 256 (28) unique characters. Not only did this provide enough unique characters to represent all the letters of the English alphabet (lower-case and capital letters separately!) and numerals 0 through 9, but it also provided a rich set of control characters such as “null,” “delete,” “carriage return,” “linefeed,” and others useful for controlling the action of electronic printers and other machines.

A number of EBCDIC codes were unused (or seldom used), though, which made it somewhat inefficient for large data transfers. An attempt to improve this state of affairs was ASCII, the “American Standard Code for Information Interchange” first developed in 1963 and then later revised in 1967, both by the American National Standards Institute (ANSI). ASCII is a seven-bit code, one bit shorter per character than EBCDIC, having only 128 unique combinations as opposed to EBCDIC’s 256 unique combinations. The compromise made with ASCII versus EBCDIC was a smaller set of control characters.

IBM later created their own “extended” version of ASCII, which was eight bits per character. In this extended code set were included some non-English characters plus special graphic characters, many of which may be placed adjacently on a paper printout or on a computer console display to form larger graphic objects such as lines and boxes.

ASCII is wildly popular, even today. Nearly every digital transmission of English text in existence employs ASCII as the character encoding3. Nearly every text-based computer program’s source code is also stored on media using ASCII encoding, where 7-bit codes represent alphanumeric characters comprising the program instructions.

 

The basic seven-bit ASCII code is shown in this table, with the three most significant bits in different columns and the four least significant bits in different rows. For example, the ASCII representation of the upper-case letter “F” is 1000110, the ASCII representation of the equal sign (=) is 0111101, and the ASCII representation of the lower-case letter “q” is 1110001.

ASCII code set

 

LSB / MSB
000 001 010 011
100
101
110
111
0000 zNUL zDLE zSP z0 @ zP z zp
0001 SOH DC1 ! 1 A Q a q
0010 STX DC2 2 B R b r
0011 ETX DC3 # 3 C S c s
0100 EOT DC4 $ 4 D T d t
0101 ENQ NAK % 5 E U e u
0110 ACK SYN & 6 F V f v
0111 BEL ETB 7 G W g w
1000 BS CAN ( 8 H X h x
1001 HT EM ) 9 I Y i y
1010 LF SUB * : J Z j z
1011 VT ESC + ; K z k z
1100 FF FS , < L z l z
1101 CR GS - = M z m z
1110 SO RS . > N z n z
1111 SI US / ? O z o z

Unicode

There exist many written languages whose characters cannot and are not represented by either EBCDIC or ASCII. In in attempt to remedy this state of affairs, a new standardized code set is being developed called Unicode, with sixteen bits per character. This large bit field gives 65,536 possible combinations, which should be enough to represent every unique character in every written language in the entire world. In deference to existing standards, Unicode encapsulates both ASCII and EBCDIC as sub-sets within its defined character set4.

And no, I am not going to include a table showing all the Unicode characters!

 

1Morse code is an example of a self-compressing code, already optimized in terms of minimum bit count. Fixed- field codes such as Baudot and the more modern ASCII tend to waste bandwidth, and may be “compressed” by removing redundant bits.

2For example, the Baudot code 11101 meant either “Q” or “1” depending on whether the last shift character was “letters” or “figures,” respectively. The code 01010 meant either “R” or “4”. The code 00001 meant either “T” or a “5”.

3Including the source code for this textbook!

4To illustrate, the first 128 Unicode characters (0000 through 007F hexadecimal) are identical to ASCII’s 128 characters (00 through 7F hexadecimal)

 

Click here to go back to Digital Data Communication Theory

Go Back to Lessons in Instrumentation Table of Contents


Comments (0)Add Comment

Write comment

security code
Write the displayed characters


busy

Promotions

  • ...more

Disclaimer

Important: All images are copyrighted to their respective owners. All content cited is derived from their respective sources.

Contact us for information and your inquiries. IAMechatronics is open to link exchanges.

IAMechatronics Login