Commit d4241ae4 authored by Luc Teirlinck

(Non-ASCII in Strings): Clarify description of when a string is
unibyte or multibyte.
(Bool-Vector Type): Update examples.
(Equality Predicates): Correctly describe when two strings are `equal'.
parent d18473b9
@@ -226,11 +226,12 @@ example, the character @kbd{A} is represented as the @w{integer 65}.
common to work with @emph{strings}, which are sequences composed of
characters. @xref{String Type}.
Characters in strings, buffers, and files are currently limited to the
range of 0 to 524287---nineteen bits. But not all values in that range
are valid character codes. Codes 0 through 127 are @acronym{ASCII} codes; the
rest are non-@acronym{ASCII} (@pxref{Non-ASCII Characters}). Characters that represent
keyboard input have a much wider range, to encode modifier keys such as
Characters in strings, buffers, and files are currently limited to
the range of 0 to 524287---nineteen bits. But not all values in that
range are valid character codes. Codes 0 through 127 are
@acronym{ASCII} codes; the rest are non-@acronym{ASCII}
(@pxref{Non-ASCII Characters}). Characters that represent keyboard
input have a much wider range, to encode modifier keys such as
Control, Meta and Shift.
@cindex read syntax for characters
@@ -375,11 +376,11 @@ possible a wide range of basic character codes.
@ifnottex
2**7
@end ifnottex
bit attached to an @acronym{ASCII} character indicates a meta character; thus, the
meta characters that can fit in a string have codes in the range from
128 to 255, and are the meta versions of the ordinary @acronym{ASCII}
characters. (In Emacs versions 18 and older, this convention was used
for characters outside of strings as well.)
bit attached to an @acronym{ASCII} character indicates a meta
character; thus, the meta characters that can fit in a string have
codes in the range from 128 to 255, and are the meta versions of the
ordinary @acronym{ASCII} characters. (In Emacs versions 18 and older,
this convention was used for characters outside of strings as well.)
The read syntax for meta characters uses @samp{\M-}. For example,
@samp{?\M-A} stands for @kbd{M-A}. You can use @samp{\M-} together with
@@ -416,8 +417,8 @@ significant in these prefixes.) Thus, @samp{?\H-\M-\A-x} represents
@kbd{Alt-Hyper-Meta-x}. (Note that @samp{\s} with no following @samp{-}
represents the space character.)
@tex
Numerically, the
bit values are @math{2^{22}} for alt, @math{2^{23}} for super and @math{2^{24}} for hyper.
Numerically, the bit values are @math{2^{22}} for alt, @math{2^{23}}
for super and @math{2^{24}} for hyper.
@end tex
@ifnottex
Numerically, the
@@ -938,10 +939,13 @@ one character, @samp{a} with grave accent. @w{@samp{\ }} in a string
constant is just like backslash-newline; it does not contribute any
character to the string, but it does terminate the preceding hex escape.
Using a multibyte hex escape forces the string to multibyte. You can
represent a unibyte non-@acronym{ASCII} character with its character code,
which must be in the range from 128 (0200 octal) to 255 (0377 octal).
This forces a unibyte string.
You can represent a unibyte non-@acronym{ASCII} character with its
character code, which must be in the range from 128 (0200 octal) to
255 (0377 octal). If you write all such character codes in octal and
the string contains no other characters forcing it to be multibyte,
this produces a unibyte string. However, using any hex escape in a
string (even for an @acronym{ASCII} character) forces the string to be
multibyte.
@xref{Text Representations}, for more information about the two
text representations.
@@ -963,9 +967,9 @@ distinguish case in @acronym{ASCII} control characters.
Properly speaking, strings cannot hold meta characters; but when a
string is to be used as a key sequence, there is a special convention
that provides a way to represent meta versions of @acronym{ASCII} characters in a
string. If you use the @samp{\M-} syntax to indicate a meta character
in a string constant, this sets the
that provides a way to represent meta versions of @acronym{ASCII}
characters in a string. If you use the @samp{\M-} syntax to indicate
a meta character in a string constant, this sets the
@tex
@math{2^{7}}
@end tex
@@ -1082,16 +1086,25 @@ constant that follows actually specifies the contents of the bool-vector
as a bitmap---each ``character'' in the string contains 8 bits, which
specify the next 8 elements of the bool-vector (1 stands for @code{t},
and 0 for @code{nil}). The least significant bits of the character
correspond to the lowest indices in the bool-vector. If the length is not a
multiple of 8, the printed representation shows extra elements, but
these extras really make no difference.
correspond to the lowest indices in the bool-vector.
@example
(make-bool-vector 3 t)
@result{} #&3"\007"
@result{} #&3"^G"
(make-bool-vector 3 nil)
@result{} #&3"\0"
;; @r{These are equal since only the first 3 bits are used.}
@result{} #&3"^@@"
@end example
@noindent
These results make sense, because the binary code for @samp{C-g} is
111 and @samp{C-@@} is the character with code 0.
If the length is not a multiple of 8, the printed representation
shows extra elements, but these extras really make no difference. For
instance, in the next example, the two bool-vectors are equal, because
only the first 3 bits are used:
@example
(equal #&3"\377" #&3"\007")
@result{} t
@end example
@@ -1875,9 +1888,12 @@ always true.
@end example
Comparison of strings is case-sensitive, but does not take account of
text properties---it compares only the characters in the strings.
A unibyte string never equals a multibyte string unless the
contents are entirely @acronym{ASCII} (@pxref{Text Representations}).
text properties---it compares only the characters in the strings. For
technical reasons, a unibyte string and a multibyte string are
@code{equal} if and only if they contain the same sequence of
character codes and all these codes are either in the range 0 through
127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
(@pxref{Text Representations}).
@example
@group
......
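
As a rough illustration of the bitmap layout described in the Bool-Vector Type hunk above, the following Python sketch models how a length and a data string map to bool-vector elements. It is a model only, not Emacs code: the helper name `bool_vector_bits` and the use of a Python `bytes` object to stand in for the string constant are assumptions made for this example.

```python
def bool_vector_bits(length, data):
    """Decode a bool-vector bitmap (hypothetical helper, not an Emacs API).

    Each byte of `data` holds the next 8 elements; the least significant
    bit of a byte corresponds to the lowest index in that group.
    Bits beyond `length` are never read.
    """
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(length)]


# "\007" is binary 111, so all three elements are true.
all_true = bool_vector_bits(3, b"\x07")

# "\377" sets every bit of the byte, but only the first 3 bits are used,
# so it decodes to the same elements as "\007".
same_as_all_true = bool_vector_bits(3, b"\xff")
```

Under this model the two decoded lists are identical, which mirrors the reason the diff gives for `(equal #&3"\377" #&3"\007")` returning `t`.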