Commit 85eeac93 authored by Chong Yidong

Consistently use hex notation to represent character codes.

* nonascii.texi (Text Representations, Character Codes)
(Converting Representations, Explicit Encoding)
(Translation of Characters): Use hex notation consistently.
(Character Sets): Fix map-charset-chars doc (Bug#5197).
parent b894c439
+2010-01-02 Chong Yidong <cyd@stupidchicken.com>
+* nonascii.texi (Text Representations, Character Codes)
+(Converting Representations, Explicit Encoding)
+(Translation of Characters): Use hex notation consistently.
+(Character Sets): Fix map-charset-chars doc (Bug#5197).
 2010-01-01 Chong Yidong <cyd@stupidchicken.com>
 * loading.texi (Where Defined): Make it clearer that these are
......
@@ -46,12 +46,12 @@ in most any known written language.
 follows the @dfn{Unicode Standard}. The Unicode Standard assigns a
 unique number, called a @dfn{codepoint}, to each and every character.
 The range of codepoints defined by Unicode, or the Unicode
-@dfn{codespace}, is @code{0..10FFFF} (in hex), inclusive. Emacs
-extends this range with codepoints in the range @code{110000..3FFFFF},
-which it uses for representing characters that are not unified with
-Unicode and raw 8-bit bytes that cannot be interpreted as characters
-(the latter occupy the range @code{3FFF80..3FFFFF}). Thus, a
-character codepoint in Emacs is a 22-bit integer number.
+@dfn{codespace}, is @code{0..#x10FFFF} (in hexadecimal notation),
+inclusive. Emacs extends this range with codepoints in the range
+@code{#x110000..#x3FFFFF}, which it uses for representing characters
+that are not unified with Unicode and @dfn{raw 8-bit bytes} that
+cannot be interpreted as characters. Thus, a character codepoint in
+Emacs is a 22-bit integer number.
 @cindex internal representation of characters
 @cindex characters, representation in buffers and strings
@@ -189,8 +189,8 @@ of characters as @var{string}. If @var{string} is a multibyte string,
 it is returned unchanged. The function assumes that @var{string}
 includes only @acronym{ASCII} characters and raw 8-bit bytes; the
 latter are converted to their multibyte representation corresponding
-to the codepoints in the @code{3FFF80..3FFFFF} area (@pxref{Text
-Representations, codepoints}).
+to the codepoints @code{#x3FFF80} through @code{#x3FFFFF}, inclusive
+(@pxref{Text Representations, codepoints}).
 @end defun
 @defun string-to-unibyte string
@@ -271,15 +271,19 @@ contains no text properties.
 The unibyte and multibyte text representations use different
 character codes. The valid character codes for unibyte representation
-range from 0 to 255---the values that can fit in one byte. The valid
-character codes for multibyte representation range from 0 to 4194303
-(#x3FFFFF). In this code space, values 0 through 127 are for
-@acronym{ASCII} characters, and values 128 through 4194175 (#x3FFF7F)
-are for non-@acronym{ASCII} characters. Values 0 through 1114111
-(#10FFFF) correspond to Unicode characters of the same codepoint;
-values 1114112 (#110000) through 4194175 (#x3FFF7F) represent
-characters that are not unified with Unicode; and values 4194176
-(#x3FFF80) through 4194303 (#x3FFFFF) represent eight-bit raw bytes.
+range from 0 to @code{#xFF} (255)---the values that can fit in one
+byte. The valid character codes for multibyte representation range
+from 0 to @code{#x3FFFFF}. In this code space, values 0 through
+@code{#x7F} (127) are for @acronym{ASCII} characters, and values
+@code{#x80} (128) through @code{#x3FFF7F} (4194175) are for
+non-@acronym{ASCII} characters.
+
+Emacs character codes are a superset of the Unicode standard.
+Values 0 through @code{#x10FFFF} (1114111) correspond to Unicode
+characters of the same codepoint; values @code{#x110000} (1114112)
+through @code{#x3FFF7F} (4194175) represent characters that are not
+unified with Unicode; and values @code{#x3FFF80} (4194176) through
+@code{#x3FFFFF} (4194303) represent eight-bit raw bytes.
 @defun characterp charcode
 This returns @code{t} if @var{charcode} is a valid character, and
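Reviewer note: the ranges spelled out in the hunk above can be checked with a small sketch. It is written in Python rather than Emacs Lisp purely for illustration, and the function name `classify_char_code` and the range labels it returns are hypothetical, not part of Emacs. Only the numeric boundaries come from the documentation text.

```python
MAX_ASCII = 0x7F            # last ASCII character code
MAX_UNICODE = 0x10FFFF      # last codepoint of the Unicode codespace
RAW_BYTE_START = 0x3FFF80   # first code representing a raw 8-bit byte
MAX_CHAR = 0x3FFFFF         # last multibyte character code (22 bits)


def classify_char_code(code: int) -> str:
    """Name the part of the multibyte code space that `code` falls in.

    The labels are illustrative only; "unicode" here means a
    non-ASCII code that is still inside the Unicode codespace.
    """
    if not 0 <= code <= MAX_CHAR:
        return "invalid"
    if code <= MAX_ASCII:
        return "ascii"
    if code <= MAX_UNICODE:
        return "unicode"
    if code < RAW_BYTE_START:
        return "not-unified"
    return "raw-byte"
```

The upper bound `0x3FFFFF` equals `2**22 - 1`, which is why the text describes a character codepoint as a 22-bit integer.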
@@ -540,7 +544,7 @@ and strings.
 @cindex @code{eight-bit}, a charset
 Emacs defines several special character sets. The character set
 @code{unicode} includes all the characters whose Emacs code points are
-in the range @code{0..10FFFF}. The character set @code{emacs}
+in the range @code{0..#x10FFFF}. The character set @code{emacs}
 includes all @acronym{ASCII} and non-@acronym{ASCII} characters.
 Finally, the @code{eight-bit} charset includes the 8-bit raw bytes;
 Emacs uses it to represent raw bytes encountered in text.
@@ -628,12 +632,12 @@ that fits the second argument of @code{decode-char} above. If
 The following function comes in handy for applying a certain
 function to all or part of the characters in a charset:
-@defun map-charset-chars function charset &optional arg from to
+@defun map-charset-chars function charset &optional arg from-code to-code
 Call @var{function} for characters in @var{charset}. @var{function}
 is called with two arguments. The first one is a cons cell
 @code{(@var{from} . @var{to})}, where @var{from} and @var{to}
 indicate a range of characters contained in charset. The second
-argument is the optional argument @var{arg}.
+argument passed to @var{function} is @var{arg}.
 By default, the range of codepoints passed to @var{function} includes
 all the characters in @var{charset}, but optional arguments
@@ -751,7 +755,7 @@ This variable automatically becomes buffer-local when set.
 @defun make-translation-table-from-vector vec
 This function returns a translation table made from @var{vec} that is
-an array of 256 elements to map byte values 0 through 255 to
+an array of 256 elements to map bytes (values 0 through #xFF) to
 characters. Elements may be @code{nil} for untranslated bytes. The
 returned table has a translation table for reverse mapping in the
 first extra slot, and the value @code{1} in the second extra slot.
@@ -1562,10 +1566,10 @@ in this section.
 text. They logically consist of a series of byte values; that is, a
 series of @acronym{ASCII} and eight-bit characters. In unibyte
 buffers and strings, these characters have codes in the range 0
-through 255. In a multibyte buffer or string, eight-bit characters
-have character codes higher than 255 (@pxref{Text Representations}),
-but Emacs transparently converts them to their single-byte values when
-you encode or decode such text.
+through #xFF (255). In a multibyte buffer or string, eight-bit
+characters have character codes higher than #xFF (@pxref{Text
+Representations}), but Emacs transparently converts them to their
+single-byte values when you encode or decode such text.
 The usual way to read a file into a buffer as a sequence of bytes, so
 you can decode the contents explicitly, is with
......
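Reviewer note: the last hunk says that eight-bit characters have codes above #xFF in multibyte text and are converted back to single-byte values on encoding. The sketch below shows one way that correspondence could work. It is an illustration in Python, not Emacs code: the helper names are hypothetical, and the fixed offset is an assumption inferred from the documented range (128 raw bytes, #x80 through #xFF, occupying #x3FFF80 through #x3FFFFF in order).

```python
# Assumption: raw byte #x80 corresponds to code #x3FFF80, #x81 to #x3FFF81,
# and so on, so the two ranges differ by a constant offset.
RAW_BYTE_OFFSET = 0x3FFF00
RAW_BYTE_START = 0x3FFF80
MAX_CHAR = 0x3FFFFF


def raw_byte_to_char_code(byte: int) -> int:
    """Map a unibyte value (0..#xFF) to its multibyte character code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a byte value: {byte!r}")
    # ASCII bytes keep their code; bytes #x80..#xFF move to the raw-byte range.
    return byte if byte <= 0x7F else byte + RAW_BYTE_OFFSET


def char_code_to_raw_byte(code: int) -> int:
    """Inverse mapping; only ASCII and raw-byte codes have a byte value."""
    if 0 <= code <= 0x7F:
        return code
    if RAW_BYTE_START <= code <= MAX_CHAR:
        return code - RAW_BYTE_OFFSET
    raise ValueError(f"code {code:#x} does not represent a single byte")
```

Under that assumption the two helpers are inverses over all 256 byte values, and any other character code (for example #xE9) is rejected because it stands for a character rather than a byte.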