Commit 36bbdfc0 authored by Eli Zaretskii

Update Unicode data files to version 11.0.0 of Unicode

* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/copyright.html:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: Import from Unicode 11.0.
* admin/notes/unicode: Update the URL for OTF script tags.

* lisp/international/mule-cmds.el (ucs-names): Update unused ranges.
* lisp/international/fontset.el (script-representative-chars): Add
hanifi-rohingya, old-sogdian, sogdian, dogra, gunjala-gondi,
makasar, and medefaidrin.
(otf-script-alist): Add old-hungarian.
* lisp/international/characters.el (tbl): Add syntax entries for
Supplemental Mathematical Operators, Miscellaneous Symbols and
Arrows, and Supplemental Punctuation.
Update the list of wide characters.

* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part2): Update to match
admin/unidata/NormalizationTest.txt.

* doc/lispref/nonascii.texi (Character Properties): Update the
reference to the Unicode Standard.
* doc/misc/efaq.texi (New in Emacs 26):
* etc/NEWS: Mention compatibility with Unicode 11.0.
parent b7b7a5f4
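
The effect of the changes listed in the commit message can be spot-checked from Lisp. The following is only a sketch: `my-unicode-11-spot-check` is a hypothetical helper, not part of Emacs or of this commit, and the values it returns depend on which Emacs build evaluates it, so no expected output is given.

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical helper, not part of Emacs or of this commit.
(defun my-unicode-11-spot-check ()
  "Return a plist describing a few characters touched by this commit."
  (list
   ;; U+1F6F9 falls in a range this commit extends in the wide-character list.
   :width-1F6F9 (char-width #x1F6F9)
   ;; U+2A00 is the first character of Supplemental Mathematical Operators.
   :syntax-2A00 (with-syntax-table (standard-syntax-table)
                  (string (char-syntax #x2A00)))
   ;; U+10D00 is the representative character added for hanifi-rohingya.
   :script-10D00 (aref char-script-table #x10D00)
   ;; U+1FA60 lies in the new Chess Symbols block (1FA00..1FA6F); its
   ;; `name' property may be nil on a build without the updated data.
   :name-1FA60 (get-char-code-property #x1FA60 'name)))
```

Evaluating `(my-unicode-11-spot-check)` before and after applying the commit should make the differences visible without reading the data files directly.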
......@@ -46,7 +46,7 @@ Any new scripts added by UnicodeData.txt will also need updates to
script-representative-chars defined in fontset.el, and also the list
of OTF script tags in otf-script-alist, whose source is on this page:
https://www.microsoft.com/typography/otspec/scripttags.htm
https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags
Other databases in fontset.el might also need to be updated as needed.
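
One way to find scripts that still need an entry in script-representative-chars is to compare that alist against the scripts present in char-script-table. This is a minimal sketch under assumptions: `my-scripts-missing-representative-chars` is a hypothetical name, and the code treats script-representative-chars as empty when fontset.el has not been loaded in the running Emacs.

```elisp
;; -*- lexical-binding: t -*-
(require 'seq)

;; Hypothetical helper, not part of Emacs.
(defun my-scripts-missing-representative-chars ()
  "Return scripts in `char-script-table' absent from `script-representative-chars'."
  (let (scripts)
    ;; Collect every distinct script symbol assigned to some character.
    (map-char-table (lambda (_key script)
                      (when (and script (not (memq script scripts)))
                        (push script scripts)))
                    char-script-table)
    ;; Keep only the scripts that have no entry in the alist.
    (seq-remove (lambda (script)
                  (assq script
                        (bound-and-true-p script-representative-chars)))
                scripts)))
```

A non-empty result is not necessarily a bug, since some scripts may be left out on purpose; it is only a list of candidates to review after importing a new UnicodeData.txt.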
......
# BidiBrackets-10.0.0.txt
# Date: 2017-04-12, 17:30:00 GMT [AG, LI, KW]
# © 2017 Unicode®, Inc.
# BidiBrackets-11.0.0.txt
# Date: 2018-02-18, 05:50:00 GMT [AG, LI, KW]
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
......
# Blocks-10.0.0.txt
# Date: 2017-04-12, 17:30:00 GMT [KW]
# Blocks-11.0.0.txt
# Date: 2017-10-16, 24:39:00 GMT [KW]
# © 2017 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
......@@ -95,6 +95,7 @@
1C00..1C4F; Lepcha
1C50..1C7F; Ol Chiki
1C80..1C8F; Cyrillic Extended-C
1C90..1CBF; Georgian Extended
1CC0..1CCF; Sundanese Supplement
1CD0..1CFF; Vedic Extensions
1D00..1D7F; Phonetic Extensions
......@@ -234,7 +235,10 @@ FFF0..FFFF; Specials
10B80..10BAF; Psalter Pahlavi
10C00..10C4F; Old Turkic
10C80..10CFF; Old Hungarian
10D00..10D3F; Hanifi Rohingya
10E60..10E7F; Rumi Numeral Symbols
10F00..10F2F; Old Sogdian
10F30..10F6F; Sogdian
11000..1107F; Brahmi
11080..110CF; Kaithi
110D0..110FF; Sora Sompeng
......@@ -253,6 +257,7 @@ FFF0..FFFF; Specials
11660..1167F; Mongolian Supplement
11680..116CF; Takri
11700..1173F; Ahom
11800..1184F; Dogra
118A0..118FF; Warang Citi
11A00..11A4F; Zanabazar Square
11A50..11AAF; Soyombo
......@@ -260,6 +265,8 @@ FFF0..FFFF; Specials
11C00..11C6F; Bhaiksuki
11C70..11CBF; Marchen
11D00..11D5F; Masaram Gondi
11D60..11DAF; Gunjala Gondi
11EE0..11EFF; Makasar
12000..123FF; Cuneiform
12400..1247F; Cuneiform Numbers and Punctuation
12480..1254F; Early Dynastic Cuneiform
......@@ -269,6 +276,7 @@ FFF0..FFFF; Specials
16A40..16A6F; Mro
16AD0..16AFF; Bassa Vah
16B00..16B8F; Pahawh Hmong
16E40..16E9F; Medefaidrin
16F00..16F9F; Miao
16FE0..16FFF; Ideographic Symbols and Punctuation
17000..187FF; Tangut
......@@ -281,6 +289,7 @@ FFF0..FFFF; Specials
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D200..1D24F; Ancient Greek Musical Notation
1D2E0..1D2FF; Mayan Numerals
1D300..1D35F; Tai Xuan Jing Symbols
1D360..1D37F; Counting Rod Numerals
1D400..1D7FF; Mathematical Alphanumeric Symbols
......@@ -288,6 +297,7 @@ FFF0..FFFF; Specials
1E000..1E02F; Glagolitic Supplement
1E800..1E8DF; Mende Kikakui
1E900..1E95F; Adlam
1EC70..1ECBF; Indic Siyaq Numbers
1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
1F000..1F02F; Mahjong Tiles
1F030..1F09F; Domino Tiles
......@@ -302,6 +312,7 @@ FFF0..FFFF; Specials
1F780..1F7FF; Geometric Shapes Extended
1F800..1F8FF; Supplemental Arrows-C
1F900..1F9FF; Supplemental Symbols and Pictographs
1FA00..1FA6F; Chess Symbols
20000..2A6DF; CJK Unified Ideographs Extension B
2A700..2B73F; CJK Unified Ideographs Extension C
2B740..2B81F; CJK Unified Ideographs Extension D
......
# SpecialCasing-10.0.0.txt
# Date: 2017-04-14, 05:40:43 GMT
# © 2017 Unicode®, Inc.
# SpecialCasing-11.0.0.txt
# Date: 2018-02-22, 06:16:47 GMT
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
......@@ -121,7 +121,7 @@ FB17; FB17; 0544 056D; 0544 053D; # ARMENIAN SMALL LIGATURE MEN XEH
# The following cases are already in the UnicodeData.txt file, so are only commented here.
# 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI
# 0345; 0345; 0399; 0399; # COMBINING GREEK YPOGEGRAMMENI
# All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript)
# have special uppercases.
......
......@@ -460,7 +460,7 @@ of character properties. In particular, Emacs supports the
@uref{http://www.unicode.org/reports/tr23/, Unicode Character Property
Model}, and the Emacs character property database is derived from the
Unicode Character Database (@acronym{UCD}). See the
@uref{http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf, Character
@uref{http://www.unicode.org/versions/latest/ch04.pdf, Character
Properties chapter of the Unicode Standard}, for a detailed
description of Unicode character properties and their meaning. This
section assumes you are already familiar with that chapter of the
......
......@@ -1068,6 +1068,11 @@ which opens a vulnerability for Emacs users receiving Enriched Text
from external sources. Execution of arbitrary Lisp forms in
@code{display} properties decoded by Enriched Text mode is now
disabled by default.
@cindex Unicode 11.0.0
@item
Emacs 26.2 comes with data files imported from the latest Unicode
Standard version 11.0.0.
@end itemize
Consult the Emacs @file{NEWS} file (@kbd{C-h n}) for the full list of
......
......@@ -31,6 +31,9 @@ in its NEWS.)
* Changes in Emacs 26.2
---
** Emacs is now compliant with the latest version 11.0 of the Unicode Standard.
---
** New variable 'xft-ignore-color-fonts'.
Default t means don't try to load color fonts when using Xft, as they
......
......@@ -643,12 +643,24 @@ with L, LRE, or LRO Unicode bidi character type.")
(setq c (1+ c)))
;; Circled Latin
(setq c #x24b6)
(while (<= c #x24cf)
(setq c #x24B6)
(while (<= c #x24CF)
(modify-category-entry c ?l)
(modify-category-entry (+ c 26) ?l)
(setq c (1+ c)))
;; Supplemental Mathematical Operators
(setq c #x2A00)
(while (<= c #x2AFF)
(set-case-syntax c "." tbl)
(setq c (1+ c)))
;; Miscellaneous Symbols and Arrows
(setq c #x2B00)
(while (<= c #x2BFF)
(set-case-syntax c "." tbl)
(setq c (1+ c)))
;; Coptic
;; There's no Coptic category. However, Coptic letters that are
;; part of the Greek block above get the Greek category, and those
......@@ -656,6 +668,12 @@ with L, LRE, or LRO Unicode bidi character type.")
;; consistent about their category.
(modify-category-entry '(#x2C80 . #x2CFF) ?g)
;; Supplemental Punctuation
(setq c #x2E00)
(while (<= c #x2E7F)
(set-case-syntax c "." tbl)
(setq c (1+ c)))
;; Fullwidth Latin
(setq c #xff21)
(while (<= c #xff3a)
......@@ -1200,7 +1218,7 @@ with L, LRE, or LRO Unicode bidi character type.")
(#xFF01 . #xFF60)
(#xFFE0 . #xFFE6)
(#x16FE0 . #x16FE1)
(#x17000 . #x187EC)
(#x17000 . #x187F1)
(#x18800 . #x18AF2)
(#x1B000 . #x1B11E)
(#x1B170 . #x1B2FB)
......@@ -1233,13 +1251,16 @@ with L, LRE, or LRO Unicode bidi character type.")
(#x1F6CC . #x1F6CC)
(#x1F6D0 . #x1F6D2)
(#x1F6EB . #x1F6EC)
(#x1F6F4 . #x1F6F8)
(#x1F6F4 . #x1F6F9)
(#x1F910 . #x1F93E)
(#x1F940 . #x1F94C)
(#x1F950 . #x1F96B)
(#x1F980 . #x1F997)
(#x1F9C0 . #x1F9C0)
(#x1F9D0 . #x1F9E6)
(#x1F940 . #x1F970)
(#x1F973 . #x1F976)
(#x1F97A . #x1F97A)
(#x1F97C . #x1F9A2)
(#x1F9B0 . #x1F9B9)
(#x1F9C0 . #x1F9C2)
(#x1F9D0 . #x1F9FF)
(#x1FA60 . #x1FA6D)
(#x20000 . #x2FFFF)
(#x30000 . #x3FFFF))))
(dolist (elt l)
......
......@@ -219,6 +219,9 @@
(lydian #x10920)
(kharoshthi #x10A00)
(manichaean #x10AC0)
(hanifi-rohingya #x10D00)
(old-sogdian #x10F00)
(sogdian #x10F30)
(mahajani #x11150)
(sinhala-archaic-number #x111E1)
(khojki #x11200)
......@@ -229,6 +232,7 @@
(siddham #x11580)
(modi #x11600)
(takri #x11680)
(dogra #x11800)
(warang-citi #x118A1)
(zanabazar-square #x11A00)
(soyombo #x11A50)
......@@ -236,11 +240,14 @@
(bhaiksuki #x11C00)
(marchen #x11C72)
(masaram-gondi #x11D00)
(gunjala-gondi #x11D60)
(makasar #x11EE0)
(cuneiform #x12000)
(cuneiform-numbers-and-punctuation #x12400)
(mro #x16A40)
(bassa-vah #x16AD0)
(pahawh-hmong #x16B11)
(medefaidrin #x16E40)
(tangut #x17000)
(tangut-components #x18800)
(nushu #x1B170)
......@@ -257,7 +264,7 @@
(defvar otf-script-alist)
;; The below was synchronized with the latest Feb 25, 2016 version of
;; The below was synchronized with the latest Jul 23, 2017 version of
;; https://www.microsoft.com/typography/otspec/scripttags.htm.
(setq otf-script-alist
'((adlm . adlam)
......@@ -312,6 +319,7 @@
(hano . hanunoo)
(hatr . hatran)
(hebr . hebrew)
(hung . old-hungarian)
(phli . inscriptional-pahlavi)
(prti . inscriptional-parthian)
(java . javanese)
......
......@@ -2934,7 +2934,7 @@ on encoding."
(#x4DC0 . #x4DFF)
;; (#x4E00 . #x9FFF) CJK Unified Ideographs
(#xA000 . #xD7FF)
;; (#xD800 . #xFAFF) Surrogate/Private
;; (#xD800 . #xF8FF) Surrogate/Private
(#xFB00 . #x134FF)
;; (#x13500 . #x143FF) unused
(#x14400 . #x14646)
......
......@@ -258,21 +258,23 @@ implementations:
ucs-normalize-tests--failing-lines-part1)))
(defconst ucs-normalize-tests--failing-lines-part2
(list 17656 17658 18006 18007 18008 18009 18010 18011
18012 18340 18342 18344 18346 18348 18350 18352
18354 18356 18358 18360 18362 18364 18366 18368
18370 18372 18374 18376 18378 18380 18382 18384
18386 18388 18390 18392 18394 18396 18398 18400
18402 18404 18406 18408 18410 18412 18414 18416
18418 18420 18422 18424 18426 18428 18430 18432
18434 18436 18438 18440 18442 18444 18446 18448
18450 18518 18520 18522 18524 18526 18528 18530
18532 18534 18536 18538 18540 18542 18544 18546
18548 18550 18552 18554 18556 18558 18560 18562
18564 18566 18568 18570 18572 18574 18576 18578
18580 18582 18584 18586 18588 18590 18592 18594
18596 18598 18600 18602 18604 18606 18608 18610
18612 18614 18616 18618 18620))
(list 17482 17532 17636 18338 18340 18342 18344 18346
18348 18350 18352 18354 18356 18358 18360 18362
18364 18366 18376 18378 18380 18382 18384 18386
18388 18390 18392 18394 18396 18398 18400 18402
18404 18406 18408 18410 18412 18414 18416 18418
18420 18422 18424 18426 18428 18430 18432 18434
18436 18438 18440 18442 18444 18446 18448 18450
18452 18454 18456 18458 18460 18462 18464 18466
18468 18470 18472 18474 18476 18478 18480 18482
18484 18486 18488 18490 18492 18494 18496 18564
18566 18568 18570 18572 18574 18576 18578 18580
18582 18584 18586 18588 18590 18592 18594 18596
18598 18600 18602 18604 18606 18608 18610 18612
18614 18616 18618 18620 18622 18624 18626 18628
18630 18632 18634 18636 18638 18640 18642 18644
18646 18648 18650 18652 18654 18656 18658 18660
18662 18664 18666))
(ert-deftest ucs-normalize-part2 ()
:tags '(:expensive-test)
......
# BidiCharacterTest-10.0.0.txt
# Date: 2017-03-09, 00:30:00 GMT [LI]
# © 2017 Unicode®, Inc.
# BidiCharacterTest-11.0.0.txt
# Date: 2018-02-18, 05:50:00 GMT [LI]
# © 2018 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database