2. HTML Specification
Level 0
The characters between HTML tags represent text encoded according to the ISO 8859/1 8-bit single-byte coded graphic character set, known as Latin Alphabet No. 1 or simply Latin-1. There are 256 character positions in the Latin-1 encoding. Latin-1 includes characters from most Western European languages. It consists of the space character, 186 characters that form a subset of the graphic characters in ISO 6937/2 (1983), and four additional characters that are intended for inclusion in ISO 6937/2. Also see Section 2.4.
The lower 128 character positions include a space, 33 control characters, the 26 uppercase and 26 lowercase letters of the English alphabet, 10 numerals, and 32 other printing characters. This subset, functionally identical to ASCII, is defined by the ISO 646 7-bit coded character set for information interchange, also known as the International Reference Version. ISO 646 is identical in most respects to the ANSI standard for ASCII (American Standard Code for Information Interchange). The only significant difference between ISO 646 and ASCII lies in the specific names assigned to the control characters in positions 00-31 and 127.
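The breakdown of the lower 128 positions can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; it assumes the control characters are positions 0-31 and 127, as stated above, and the variable names are illustrative only.

```python
import string

# Classify the lower 128 positions (0-127) as the paragraph above describes.
controls = [c for c in range(128) if c < 32 or c == 127]  # positions 00-31 and 127
space = [0x20]                                            # the space character
letters = [ord(ch) for ch in string.ascii_letters]        # A-Z and a-z
digits = [ord(ch) for ch in string.digits]                # 0-9

# Everything not yet classified is an "other printing character".
classified = set(controls) | set(space) | set(letters) | set(digits)
others = [c for c in range(128) if c not in classified]

counts = {
    "control": len(controls),
    "space": len(space),
    "letters": len(letters),
    "digits": len(digits),
    "other printing": len(others),
}
print(counts)
print(sum(counts.values()))
```

The five categories together account for all 128 positions, with the letters counted as 52 (26 in each case).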
The upper 128 positions include a non-breaking space, a soft hyphen indicator, 93 graphical characters, 8 unassigned characters, and 25 control characters. Because non-breaking space and soft hyphen indicator are not recognized and interpreted by all HTML user agents, their use is discouraged.
There are 58 character positions occupied by control characters. See Section 2.16.2 for details on the interpretation of control characters.
Because certain special characters are subject to interpretation and special processing, information providers and HTML user agent implementors should follow the guidelines in Section 2.16.1.
Certain characters may not be accessible from your keyboard, or some part of your system (e.g. translation software) may not be equipped to deal with 8-bit character codes. HTML and many HTML user agents provide character entity references (see Section 2.17.2) and numeric character references (see Section 2.17.3) to facilitate the entry and interpretation of characters by name and by numerical position.
Because certain characters will be interpreted as markup, they must be represented by markup as described in Section 2.16.3 and Section 2.16.4.
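As an illustration of representing markup-significant characters by markup, the sketch below uses `html.escape` from the Python standard library. This is an assumption made for illustration: the library targets later versions of HTML, and the helper name `escape_markup` is hypothetical. It is not the procedure defined by this specification.

```python
import html


def escape_markup(text: str) -> str:
    """Replace characters that would be read as markup with entity references."""
    # html.escape rewrites &, < and >; with quote=True it also rewrites
    # the double quote (and the single quote).
    return html.escape(text, quote=True)


print(escape_markup('if a < b & b > c, say "ok"'))
```

The ampersand is replaced first, so the ampersands introduced by the other replacements are not escaped a second time.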
In SGML applications, the use of control characters is limited in order to maximize the chance of successful interchange over heterogeneous networks and operating systems. In HTML, only three control characters are used; they are listed in Section 2.16.2.
2.16.1 Special Characters
Certain characters have special meaning in HTML documents. There are two printing characters which may be interpreted by an HTML application to have an effect on the format of the text:
Space
Hyphen
2.16.2 Control Characters
Control characters are non-printable characters that are typically used for communication and device control, as format effectors, and as information separators.
Horizontal Tab (HT - 9 dec)
Line Feed (LF - 10 dec)
Carriage Return (CR - 13 dec)
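A document can be checked against this list of three control characters. The following Python sketch is illustrative only; the function name `disallowed_controls` is hypothetical, and the check covers only the control positions 0-31 and 127 of the lower half, since the upper-half control positions are not enumerated here.

```python
ALLOWED_CONTROLS = {9, 10, 13}  # HT, LF, CR (decimal positions)


def disallowed_controls(text: str) -> list[tuple[int, int]]:
    """Return (offset, code) pairs for control characters other than HT, LF, CR."""
    found = []
    for offset, ch in enumerate(text):
        code = ord(ch)
        # Lower-half control characters occupy positions 0-31 and 127.
        is_control = code < 32 or code == 127
        if is_control and code not in ALLOWED_CONTROLS:
            found.append((offset, code))
    return found


print(disallowed_controls("line one\r\n\tline two"))  # only HT, LF, CR present
print(disallowed_controls("bell\x07 and delete\x7f"))
```

An empty result means the text uses no control characters beyond the three listed.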
2.16.3 Numeric Character References
Any printing character within the 8-bit character encoding of ISO 8859/1 (256 character positions) or the 7-bit character encoding of ISO 646 (128 character positions) may be represented within the text of an HTML document by a numeric character reference. See Section 2.17.1 for a list of the characters, their names and input syntax.
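A numeric character reference gives the decimal position of the character between `&#` and `;`. The sketch below is one possible way to produce such references in Python; the function names are hypothetical, and restricting input to the 256 Latin-1 positions is an assumption taken from the paragraph above.

```python
def to_numeric_reference(ch: str) -> str:
    """Return the numeric character reference for one Latin-1 character."""
    code = ord(ch)
    if code > 255:
        # Outside the 256 positions of ISO 8859/1.
        raise ValueError(f"character at position {code} is outside Latin-1")
    return f"&#{code};"


def encode_upper_half(text: str) -> str:
    """Replace characters above position 127 with numeric references."""
    return "".join(
        ch if ord(ch) < 128 else to_numeric_reference(ch) for ch in text
    )


# \u00f6 is the o with diaeresis, at position 246 in Latin-1.
print(encode_upper_half("Kurt G\u00f6del"))
```

The result contains only 7-bit characters, which is useful when some part of a system cannot handle 8-bit character codes.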
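Two reasons for using a numeric character reference:
the character is not accessible from the keyboard
the character would otherwise be interpreted as markup, as with the ampersand (&), double quotes ("), less-than (<) and greater-than (>) characters
Two reasons for using a character entity:
the character is not accessible from the keyboard
the character would otherwise be interpreted as markup, as with the ampersand (&), double quotes ("), less-than (<) and greater-than (>) characters
Example: Kurt Gödel was a famous logician and mathematician.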
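The sentence about Kurt Gödel shows how a user agent renders a character entity. The Python sketch below decodes such a reference with the standard `html` module; the source form `G&ouml;del` is an assumption, since only the rendered text appears above, and the library reflects later versions of HTML.

```python
import html
from html.entities import name2codepoint

# Assumed source form of the example sentence, written with a character entity.
source = "Kurt G&ouml;del was a famous logician and mathematician."

# html.unescape resolves both named entities and numeric references.
rendered = html.unescape(source)
print(rendered)

# The entity name maps to the same position a numeric reference would use.
position = name2codepoint["ouml"]
print(position)
print(html.unescape(f"G&#{position};del") == html.unescape("G&ouml;del"))
```

Both forms resolve to the same character; the entity name is easier to read, while the numeric form works for any position.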