06 Character Encoding

The Character Set

32 (space)
33 !
34 "
35 #
36 $
37 %
38 &
39 '
40 (
41 )
42 *
43 +
44 ,
45 -
46 .
47 /
48 0
49 1
50 2
51 3
52 4
53 5
54 6
55 7
56 8
57 9
58 :
59 ;
60 <
61 =
62 >
63 ?
64 @
65 A
66 B
67 C
68 D
69 E
70 F
71 G
72 H
73 I
74 J
75 K
76 L
77 M
78 N
79 O
80 P
81 Q
82 R
83 S
84 T
85 U
86 V
87 W
88 X
89 Y
90 Z
91 [
92 \
93 ]
94 ^
95 _
96 `
97 a
98 b
99 c
100 d
101 e
102 f
103 g
104 h
105 i
106 j
107 k
108 l
109 m
110 n
111 o
112 p
113 q
114 r
115 s
116 t
117 u
118 v
119 w
120 x
121 y
122 z
123 {
124 |
125 }
126 ~

If you see unexpected error marker in your writing it is because you are encoding it with ASCII, which was originally developed for a seven bit system back in the early 1960’s, and has 128 information points to represent characters, of which the first 33 are mostly obsolete control codes. The printable characters are listed here. If you want to use additional characters, called entities, you can look up the special codes below.

Because the world's writing systems need far more than 128 characters, Unicode was introduced; it now defines well over 100,000 characters. Unicode text can be stored in several encodings, and the most popular is UTF-8, a variable-length encoding built on 8-bit bytes. It represents each character as a sequence of one to four bytes (each made up of 8 bits). The single-byte sequences are exactly the ASCII codes, so any ASCII text is also valid UTF-8; this backward compatibility is a large part of why UTF-8 became the dominant encoding.
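A quick way to check the backward-compatibility claim is to compare encodings directly. This Python sketch is only illustrative and the sample strings are arbitrary; it also shows why a byte below 0x80 in UTF-8 can always be read as plain ASCII.

```python
ascii_text = "Hello, ASCII!"
mixed_text = "na\u00efve caf\u00e9"  # "naïve café"

# Pure ASCII text produces byte-for-byte identical output in both encodings.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so filtering
# out those bytes leaves exactly the ASCII characters.
mixed_bytes = mixed_text.encode("utf-8")
ascii_only = bytes(b for b in mixed_bytes if b < 0x80)

print(same_bytes)                          # True
print(len(mixed_text), len(mixed_bytes))   # 10 12
print(ascii_only)                          # b'nave caf'
```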

Number of bytes | Bits for code point | First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4
1 | 7 | U+0000 | U+007F | 0xxxxxxx
2 | 11 | U+0080 | U+07FF | 110xxxxx | 10xxxxxx
3 | 16 | U+0800 | U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx
4 | 21 | U+10000 | U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx
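The bit patterns in the table can be applied by hand. The function below is a hypothetical helper (`utf8_encode` is not a standard-library name) that fills the `x` positions with the code point's bits, taking 6 bits at a time for each continuation byte. It is a teaching sketch; in real code Python's built-in `str.encode("utf-8")` does this job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by filling the x bits of the UTF-8 patterns."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    # U+D800 to U+DFFF are reserved for UTF-16 surrogates and are invalid in UTF-8.
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point <= 0x7F:  # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b111111),
        0b10000000 | ((code_point >> 6) & 0b111111),
        0b10000000 | (code_point & 0b111111),
    ])


print(utf8_encode(0x20AC).hex(" "))  # e2 82 ac (the euro sign)
```

Comparing the helper's output with `chr(cp).encode("utf-8")` at the first and last code point of each row is a simple way to confirm the table boundaries.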

The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N’Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use, including most Chinese, Japanese and Korean characters. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).
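To see these ranges in practice, this Python sketch encodes one sample character from several of the groups just described and reports its byte count together with its official Unicode name from the standard `unicodedata` module. The sample characters are arbitrary choices.

```python
import unicodedata

# One sample character per group described above.
samples = [
    "A",           # US-ASCII
    "\u00f1",      # n with tilde (Latin-1 Supplement)
    "\u03a9",      # Greek capital omega
    "\u0416",      # Cyrillic capital zhe
    "\u4e2d",      # a common CJK ideograph (Basic Multilingual Plane)
    "\U0001F600",  # an emoji (outside the Basic Multilingual Plane)
]

byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, count in byte_counts.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {count} byte(s)")
```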

Entities for characters with special meaning in HTML

Entity Char Number Char Description
&amp; & &#38; & ampersand
&gt; > &#62; > greater-than sign
&lt; < &#60; < less-than sign
&quot; " &#34; " quotation mark = APL quote
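In Python, the standard-library function `html.escape` performs these substitutions (with `quote=True`, its default, it also escapes the single quote as `&#x27;`), and `html.unescape` reverses them. A minimal sketch with an arbitrary sample string:

```python
import html

snippet = 'Use <b> for "bold" & <i> for italics'

escaped = html.escape(snippet)  # quote=True by default
restored = html.unescape(escaped)

print(escaped)
# Use &lt;b&gt; for &quot;bold&quot; &amp; &lt;i&gt; for italics
```

Note that `&` is escaped first internally, so existing text is never double-processed within a single call; escaping already-escaped text, however, will turn `&amp;` into `&amp;amp;`.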

Entities for punctuation characters

Entity Char Number Char Description
&cent; ¢ &#162; ¢ cent sign
&curren; ¤ &#164; ¤ currency sign
&euro; € &#8364; € euro sign
&pound; £ &#163; £ pound sign
&yen; ¥ &#165; ¥ yen sign = yuan sign
&brvbar; ¦ &#166; ¦ broken bar = broken vertical bar
&bull; • &#8226; • bullet = black small circle
&copy; © &#169; © copyright sign
&dagger; † &#8224; † dagger
&Dagger; ‡ &#8225; ‡ double dagger
&frasl; ⁄ &#8260; ⁄ fraction slash
&hellip; … &#8230; … horizontal ellipsis = three dot leader
&iexcl; ¡ &#161; ¡ inverted exclamation mark
&image; ℑ &#8465; ℑ blackletter capital I = imaginary part
&iquest; ¿ &#191; ¿ inverted question mark = turned question mark
&lrm; &#8206; left-to-right mark (for formatting only)
&mdash; &#8212; em dash
&ndash; – &#8211; – en dash
&not; ¬ &#172; ¬ not sign
&oline; ‾ &#8254; ‾ overline = spacing overscore
&ordf; ª &#170; ª feminine ordinal indicator
&ordm; º &#186; º masculine ordinal indicator
&para; ¶ &#182; ¶ pilcrow sign = paragraph sign
&permil; ‰ &#8240; ‰ per mille sign
&prime; ′ &#8242; ′ prime = minutes = feet
&Prime; ″ &#8243; ″ double prime = seconds = inches
&real; ℜ &#8476; ℜ blackletter capital R = real part symbol
&reg; ® &#174; ® registered sign = registered trade mark sign
&rlm; &#8207; right-to-left mark (for formatting only)
&sect; § &#167; § section sign
&shy; ­ &#173; ­ soft hyphen = discretionary hyphen (displays incorrectly on Mac)
&sup1; ¹ &#185; ¹ superscript one = superscript digit one
&trade; ™ &#8482; ™ trade mark sign
&weierp; ℘ &#8472; ℘ script capital P = power set = Weierstrass p
&bdquo; „ &#8222; „ double low-9 quotation mark
&laquo; « &#171; « left-pointing double angle quotation mark = left pointing guillemet
&ldquo; “ &#8220; “ left double quotation mark
&lsaquo; ‹ &#8249; ‹ single left-pointing angle quotation mark
&lsquo; ‘ &#8216; ‘ left single quotation mark
&raquo; » &#187; » right-pointing double angle quotation mark = right pointing guillemet
&rdquo; ” &#8221; ” right double quotation mark
&rsaquo; › &#8250; › single right-pointing angle quotation mark
&rsquo; ’ &#8217; ’ right single quotation mark
&sbquo; ‚ &#8218; ‚ single low-9 quotation mark
&emsp; &#8195; em space
&ensp; &#8194; en space
&nbsp;   &#160;   no-break space = non-breaking space
&thinsp; &#8201; thin space
&zwj; &#8205; zero width joiner
&zwnj; &#8204; zero width non-joiner
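Every named entity in these tables has a numeric twin, and both decode to the same character. Python's `html.entities.name2codepoint` dictionary, which covers the HTML 4 entity names used here, makes the relationship easy to check. The hexadecimal form (`&#x20AC;`) is an equivalent notation not used in the tables above.

```python
import html
from html.entities import name2codepoint

# Look up the code point behind a named entity (name without "&" and ";").
euro = name2codepoint["euro"]

named = "&euro;"
decimal = f"&#{euro};"
hexadecimal = f"&#x{euro:X};"

print(euro, decimal, hexadecimal)  # 8364 &#8364; &#x20AC;

# All three references decode to the same character.
print(html.unescape(named) == html.unescape(decimal) == html.unescape(hexadecimal))  # True
```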

Entities for shapes and arrows

Entity Char Number Char Description
&crarr; ↵ &#8629; ↵ downwards arrow with corner leftwards = carriage return
&darr; ↓ &#8595; ↓ downwards arrow
&dArr; ⇓ &#8659; ⇓ downwards double arrow
&harr; ↔ &#8596; ↔ left right arrow
&hArr; ⇔ &#8660; ⇔ left right double arrow
&larr; ← &#8592; ← leftwards arrow
&lArr; ⇐ &#8656; ⇐ leftwards double arrow
&rarr; → &#8594; → rightwards arrow
&rArr; ⇒ &#8658; ⇒ rightwards double arrow
&uarr; ↑ &#8593; ↑ upwards arrow
&uArr; ⇑ &#8657; ⇑ upwards double arrow
&clubs; ♣ &#9827; ♣ black club suit = shamrock
&diams; ♦ &#9830; ♦ black diamond suit
&hearts; ♥ &#9829; ♥ black heart suit = valentine
&spades; ♠ &#9824; ♠ black spade suit
&loz; ◊ &#9674; ◊ lozenge
(none) &#8984; ⌘ command key
(none) &#8997; ⌥ option key
(none) &#9774; ☮ peace symbol
(none) &#9775; ☯ yin yang
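The last four symbols have no named entity, so only the numeric form works for them. A small hypothetical helper, `to_reference`, could prefer the HTML 4 name from Python's `html.entities.codepoint2name` when one exists and otherwise fall back to the decimal code point:

```python
from html.entities import codepoint2name


def to_reference(char: str) -> str:
    """Return a named entity if HTML 4 defines one, else a decimal reference."""
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


# Leftwards arrow, black heart suit, command key, yin yang.
symbols = "\u2190\u2665\u2318\u262f"
references = [to_reference(ch) for ch in symbols]

print(" ".join(references))  # &larr; &hearts; &#8984; &#9775;
```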

Entities for accented characters, accents, and other diacritics from Western European languages

Entity Char Number Char Description
&acute; ´ &#180; ´ acute accent = spacing acute
&cedil; ¸ &#184; ¸ cedilla = spacing cedilla
&circ; ˆ &#710; ˆ modifier letter circumflex accent
&macr; ¯ &#175; ¯ macron = spacing macron = overline = APL overbar
&middot; · &#183; · middle dot = Georgian comma = Greek middle dot
&tilde; ˜ &#732; ˜ small tilde
&uml; ¨ &#168; ¨ diaeresis = spacing diaeresis
&Aacute; Á &#193; Á latin capital letter A with acute
&aacute; á &#225; á latin small letter a with acute
&Acirc; Â &#194; Â latin capital letter A with circumflex
&acirc; â &#226; â latin small letter a with circumflex
&AElig; Æ &#198; Æ latin capital letter AE = latin capital ligature AE
&aelig; æ &#230; æ latin small letter ae = latin small ligature ae
&Agrave; À &#192; À latin capital letter A with grave = latin capital letter A grave
&agrave; à &#224; à latin small letter a with grave = latin small letter a grave
&Aring; Å &#197; Å latin capital letter A with ring above = latin capital letter A ring
&aring; å &#229; å latin small letter a with ring above = latin small letter a ring
&Atilde; Ã &#195; Ã latin capital letter A with tilde
&atilde; ã &#227; ã latin small letter a with tilde
&Auml; Ä &#196; Ä latin capital letter A with diaeresis
&auml; ä &#228; ä latin small letter a with diaeresis
&Ccedil; Ç &#199; Ç latin capital letter C with cedilla
&ccedil; ç &#231; ç latin small letter c with cedilla
&Eacute; É &#201; É latin capital letter E with acute
&eacute; é &#233; é latin small letter e with acute
&Ecirc; Ê &#202; Ê latin capital letter E with circumflex
&ecirc; ê &#234; ê latin small letter e with circumflex
&Egrave; È &#200; È latin capital letter E with grave
&egrave; è &#232; è latin small letter e with grave
&ETH; Ð &#208; Ð latin capital letter ETH
&eth; ð &#240; ð latin small letter eth
&Euml; Ë &#203; Ë latin capital letter E with diaeresis
&euml; ë &#235; ë latin small letter e with diaeresis
&Iacute; Í &#205; Í latin capital letter I with acute
&iacute; í &#237; í latin small letter i with acute
&Icirc; Î &#206; Î latin capital letter I with circumflex
&icirc; î &#238; î latin small letter i with circumflex
&Igrave; Ì &#204; Ì latin capital letter I with grave
&igrave; ì &#236; ì latin small letter i with grave
&Iuml; Ï &#207; Ï latin capital letter I with diaeresis
&iuml; ï &#239; ï latin small letter i with diaeresis
&Ntilde; Ñ &#209; Ñ latin capital letter N with tilde
&ntilde; ñ &#241; ñ latin small letter n with tilde
&Oacute; Ó &#211; Ó latin capital letter O with acute
&oacute; ó &#243; ó latin small letter o with acute
&Ocirc; Ô &#212; Ô latin capital letter O with circumflex
&ocirc; ô &#244; ô latin small letter o with circumflex
&OElig; Œ &#338; Œ latin capital ligature OE
&oelig; œ &#339; œ latin small ligature oe
&Ograve; Ò &#210; Ò latin capital letter O with grave
&ograve; ò &#242; ò latin small letter o with grave
&Oslash; Ø &#216; Ø latin capital letter O with stroke = latin capital letter O slash
&oslash; ø &#248; ø latin small letter o with stroke = latin small letter o slash
&Otilde; Õ &#213; Õ latin capital letter O with tilde
&otilde; õ &#245; õ latin small letter o with tilde
&Ouml; Ö &#214; Ö latin capital letter O with diaeresis
&ouml; ö &#246; ö latin small letter o with diaeresis
&Scaron; Š &#352; Š latin capital letter S with caron
&scaron; š &#353; š latin small letter s with caron
&szlig; ß &#223; ß latin small letter sharp s = ess-zed
&THORN; Þ &#222; Þ latin capital letter THORN
&thorn; þ &#254; þ latin small letter thorn
&Uacute; Ú &#218; Ú latin capital letter U with acute
&uacute; ú &#250; ú latin small letter u with acute
&Ucirc; Û &#219; Û latin capital letter U with circumflex
&ucirc; û &#251; û latin small letter u with circumflex
&Ugrave; Ù &#217; Ù latin capital letter U with grave
&ugrave; ù &#249; ù latin small letter u with grave
&Uuml; Ü &#220; Ü latin capital letter U with diaeresis
&uuml; ü &#252; ü latin small letter u with diaeresis
&Yacute; Ý &#221; Ý latin capital letter Y with acute
&yacute; ý &#253; ý latin small letter y with acute
&yuml; ÿ &#255; ÿ latin small letter y with diaeresis
&Yuml; Ÿ &#376; Ÿ latin capital letter Y with diaeresis
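If a page has to be saved as plain ASCII, accented letters like these do not need to be typed as entities by hand. In Python, the `xmlcharrefreplace` error handler of `str.encode` writes every non-ASCII character as a decimal numeric reference; this is one possible approach, sketched here with an arbitrary sample phrase:

```python
import html

text = "Cr\u00e8me br\u00fbl\u00e9e"  # "Crème brûlée"

# Non-ASCII characters become decimal references; ASCII passes through unchanged.
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(ascii_safe)                         # Cr&#232;me br&#251;l&#233;e
print(html.unescape(ascii_safe) == text)  # True
```

Saving the file as UTF-8 and declaring `<meta charset="utf-8">` is usually the simpler alternative, since the characters can then be written directly.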

Entities for mathematical and technical characters (including Greek)

Entity Char Number Char Description
&deg; ° &#176; ° degree sign
&divide; ÷ &#247; ÷ division sign
&frac12; ½ &#189; ½ vulgar fraction one half = fraction one half
&frac14; ¼ &#188; ¼ vulgar fraction one quarter = fraction one quarter
&frac34; ¾ &#190; ¾ vulgar fraction three quarters = fraction three quarters
&ge; ≥ &#8805; ≥ greater-than or equal to
&le; ≤ &#8804; ≤ less-than or equal to
&minus; − &#8722; − minus sign
&sup2; ² &#178; ² superscript two = superscript digit two = squared
&sup3; ³ &#179; ³ superscript three = superscript digit three = cubed
&times; × &#215; × multiplication sign
&alefsym; ℵ &#8501; ℵ alef symbol = first transfinite cardinal
&and; ∧ &#8743; ∧ logical and = wedge
&ang; ∠ &#8736; ∠ angle
&asymp; ≈ &#8776; ≈ almost equal to = asymptotic to
&cap; ∩ &#8745; ∩ intersection = cap
&cong; ≅ &#8773; ≅ approximately equal to
&cup; ∪ &#8746; ∪ union = cup
&empty; ∅ &#8709; ∅ empty set = null set = diameter
&equiv; ≡ &#8801; ≡ identical to
&exist; ∃ &#8707; ∃ there exists
&fnof; ƒ &#402; ƒ latin small f with hook = function = florin
&forall; ∀ &#8704; ∀ for all
&infin; ∞ &#8734; ∞ infinity
&int; ∫ &#8747; ∫ integral
&isin; ∈ &#8712; ∈ element of
&lang; &#9001; left-pointing angle bracket = bra
&lceil; ⌈ &#8968; ⌈ left ceiling = apl upstile
&lfloor; ⌊ &#8970; ⌊ left floor = apl downstile
&lowast; ∗ &#8727; ∗ asterisk operator
&micro; µ &#181; µ micro sign
&nabla; ∇ &#8711; ∇ nabla = backward difference
&ne; ≠ &#8800; ≠ not equal to
&ni; ∋ &#8715; ∋ contains as member
&notin; ∉ &#8713; ∉ not an element of
&nsub; ⊄ &#8836; ⊄ not a subset of
&oplus; ⊕ &#8853; ⊕ circled plus = direct sum
&or; ∨ &#8744; ∨ logical or = vee
&otimes; ⊗ &#8855; ⊗ circled times = vector product
&part; ∂ &#8706; ∂ partial differential
&perp; ⊥ &#8869; ⊥ up tack = orthogonal to = perpendicular
&plusmn; ± &#177; ± plus-minus sign = plus-or-minus sign
&prod; ∏ &#8719; ∏ n-ary product = product sign
&prop; ∝ &#8733; ∝ proportional to
&radic; √ &#8730; √ square root = radical sign
&rang; &#9002; right-pointing angle bracket = ket
&rceil; ⌉ &#8969; ⌉ right ceiling
&rfloor; ⌋ &#8971; ⌋ right floor
&sdot; ⋅ &#8901; ⋅ dot operator
&sim; ∼ &#8764; ∼ tilde operator = varies with = similar to
&sub; ⊂ &#8834; ⊂ subset of
&sube; ⊆ &#8838; ⊆ subset of or equal to
&sum; ∑ &#8721; ∑ n-ary summation
&sup; ⊃ &#8835; ⊃ superset of
&supe; ⊇ &#8839; ⊇ superset of or equal to
&there4; ∴ &#8756; ∴ therefore
&Alpha; Α &#913; Α greek capital letter alpha
&alpha; α &#945; α greek small letter alpha
&Beta; Β &#914; Β greek capital letter beta
&beta; β &#946; β greek small letter beta
&Chi; Χ &#935; Χ greek capital letter chi
&chi; χ &#967; χ greek small letter chi
&Delta; Δ &#916; Δ greek capital letter delta
&delta; δ &#948; δ greek small letter delta
&Epsilon; Ε &#917; Ε greek capital letter epsilon
&epsilon; ε &#949; ε greek small letter epsilon
&Eta; Η &#919; Η greek capital letter eta
&eta; η &#951; η greek small letter eta
&Gamma; Γ &#915; Γ greek capital letter gamma
&gamma; γ &#947; γ greek small letter gamma
&Iota; Ι &#921; Ι greek capital letter iota
&iota; ι &#953; ι greek small letter iota
&Kappa; Κ &#922; Κ greek capital letter kappa
&kappa; κ &#954; κ greek small letter kappa
&Lambda; Λ &#923; Λ greek capital letter lambda
&lambda; λ &#955; λ greek small letter lambda
&Mu; Μ &#924; Μ greek capital letter mu
&mu; μ &#956; μ greek small letter mu
&Nu; Ν &#925; Ν greek capital letter nu
&nu; ν &#957; ν greek small letter nu
&Omega; Ω &#937; Ω greek capital letter omega
&omega; ω &#969; ω greek small letter omega
&Omicron; Ο &#927; Ο greek capital letter omicron
&omicron; ο &#959; ο greek small letter omicron
&Phi; Φ &#934; Φ greek capital letter phi
&phi; φ &#966; φ greek small letter phi
&Pi; Π &#928; Π greek capital letter pi
&pi; π &#960; π greek small letter pi
&piv; ϖ &#982; ϖ greek pi symbol
&Psi; Ψ &#936; Ψ greek capital letter psi
&psi; ψ &#968; ψ greek small letter psi
&Rho; Ρ &#929; Ρ greek capital letter rho
&rho; ρ &#961; ρ greek small letter rho
&Sigma; Σ &#931; Σ greek capital letter sigma
&sigma; σ &#963; σ greek small letter sigma
&sigmaf; ς &#962; ς greek small letter final sigma
&Tau; Τ &#932; Τ greek capital letter tau
&tau; τ &#964; τ greek small letter tau
&Theta; Θ &#920; Θ greek capital letter theta
&theta; θ &#952; θ greek small letter theta
&thetasym; ϑ &#977; ϑ greek small letter theta symbol
&upsih; ϒ &#978; ϒ greek upsilon with hook symbol
&Upsilon; Υ &#933; Υ greek capital letter upsilon
&upsilon; υ &#965; υ greek small letter upsilon
&Xi; Ξ &#926; Ξ greek capital letter xi
&xi; ξ &#958; ξ greek small letter xi
&Zeta; Ζ &#918; Ζ greek capital letter zeta
&zeta; ζ &#950; ζ greek small letter zeta
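The descriptions in these tables are close to the official Unicode character names, which Python's standard `unicodedata` module can look up. The hypothetical helper `describe` below returns the code point, the Unicode name, and the decimal reference for a character, which can be handy for checking or extending an entry:

```python
import unicodedata


def describe(char: str) -> str:
    """Return 'U+XXXX NAME (&#N;)' for one character."""
    code_point = ord(char)
    return f"U+{code_point:04X} {unicodedata.name(char)} (&#{code_point};)"


# Greek small alpha, n-ary summation, infinity.
for ch in "\u03b1\u2211\u221e":
    print(describe(ch))
# U+03B1 GREEK SMALL LETTER ALPHA (&#945;)
# U+2211 N-ARY SUMMATION (&#8721;)
# U+221E INFINITY (&#8734;)
```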
