Commit fddb915d authored by Eli Zaretskii

Import Unicode 12.0 data files

* admin/unidata/copyright.html:
* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/Blocks.txt:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: New versions from Unicode 12.0.
* admin/unidata/unidata-gen.el (unidata-gen-file):
* admin/unidata/blocks.awk (name2alias): Adapt to changes in
new data files.
* admin/notes/unicode: Update and improve instructions for
importing a new Unicode Standard.

* lisp/international/characters.el (char-width-table): Update
lists of characters according to Unicode 12.0.
* lisp/international/fontset.el (script-representative-chars):
Add characters from new scripts to 'script-representative-chars'.
(otf-script-alist): Update according to data on the MS site.
* lisp/international/mule-cmds.el (ucs-names): Update unused
ranges of codepoints according to Unicode 12.0.

* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part1)
(ucs-normalize-tests--failing-lines-part2): Update for the new
NormalizationTest.txt file.
* test/manual/BidiCharacterTest.txt: Update with the new
version from Unicode 12.0.
parent 4e082ce3
Pipeline #955 passed with stage
in 57 minutes and 2 seconds
......@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database
. UnicodeData.txt
. Blocks.txt
. BidiMirroring.txt
. BidiBrackets.txt
. BidiCharacterTest.txt
. BidiMirroring.txt
. IVD_Sequences.txt
. NormalizationTest.txt
. SpecialCasing.txt
. BidiCharacterTest.txt
First, the first 7 files need to be copied into admin/unidata/, and
then Emacs should be rebuilt for them to take effect. Rebuilding
the file https://www.unicode.org/copyright.html should be copied over
copyright.html in admin/unidata (that file might need trailing
whitespace removed before it can be committed to the Emacs
repository).
Then Emacs should be rebuilt for them to take effect. Rebuilding
Emacs updates several derived files elsewhere in the Emacs source
tree, mainly in lisp/international/.
......@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
new bidirectional attributes of characters, because unidata-gen.el,
bidi.c and dispextern.h need to be updated in that case; failure to do
so will cause aborts in redisplay.
so will cause aborts in redisplay. unidata-gen.el will also complain
if the format of the Unicode Copyright notice in copyright.html
changed in significant ways; in that case, update the regular
expression in unidata-gen-file used to extract the copyright string.
Next, review the changes in UnicodeData.txt vs the previous version
used by Emacs. Any changes, be it introduction of new scripts or
......@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.
The setting of char-width-table around line 1200 of characters.el
should be checked against the latest version of the Unicode file
EastAsianWidth.txt, and any discrepancies fixed.
EastAsianWidth.txt, and any discrepancies fixed: double-width
characters are those marked with W or F in that file. Zero-width
characters are not taken from EastAsianWidth.txt, they are those whose
Unicode General Category property is one of Mn, Me, or Cf, and also
Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
and "Jamo final consonants").
Any new scripts added by UnicodeData.txt will also need updates to
script-representative-chars defined in fontset.el, and also the list
......
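The width rule stated in the instructions above (double-width characters are those marked W or F in EastAsianWidth.txt) can be checked mechanically. The following is a minimal Python sketch, not part of this commit. It assumes the `code;property # comment` line format of EastAsianWidth.txt and input sorted by code point. The helper names `wide_ranges` and `as_elisp` are hypothetical, and the sample lines only illustrate the format.

```python
import re

# One data line: "XXXX;P" or "XXXX..YYYY;P", optionally followed by "# comment".
_LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def wide_ranges(lines):
    """Return merged (start, end) ranges whose property is W or F.

    Assumes the lines are sorted by code point, as in the UCD file.
    """
    ranges = []
    for line in lines:
        m = _LINE.match(line.strip())
        if not m or m.group(3) not in ("W", "F"):
            continue  # comments, blank lines, and non-wide properties
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)
        if ranges and start <= ranges[-1][1] + 1:
            # Adjacent or overlapping: extend the previous range.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


def as_elisp(ranges):
    """Format ranges the way they appear in the characters.el list."""
    return ["(#x%04X . #x%04X)" % r for r in ranges]


# Illustrative lines in the file's format (not a verbatim excerpt).
SAMPLE = [
    "# sample",
    "0041..005A;Na",
    "1100..115F;W",
    "3000;F",
    "3001..3003;W",
    "FF01..FF03;F",
]
```

Calling `as_elisp(wide_ranges(SAMPLE))` yields `(#x1100 . #x115F)`, `(#x3000 . #x3003)` and `(#xFF01 . #xFF03)`, which can then be compared by eye against the double-width list in characters.el. Zero-width characters are outside the scope of this sketch, since the instructions derive them from the General Category rather than from EastAsianWidth.txt.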
# BidiBrackets-11.0.0.txt
# Date: 2018-02-18, 05:50:00 GMT [AG, LI, KW]
# BidiBrackets-12.0.0.txt
# Date: 2018-11-02, 16:32:00 GMT [AG, LI, KW]
# © 2018 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
......
# BidiMirroring-11.0.0.txt
# Date: 2018-05-07, 18:02:00 GMT [KW, LI, RP]
# BidiMirroring-12.0.0.txt
# Date: 2018-11-02, 16:33:00 GMT [KW, LI, RP]
# © 2018 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
......@@ -15,7 +15,7 @@
# value, for which there is another Unicode character that typically has a glyph
# that is the mirror image of the original character's glyph.
#
# The repertoire covered by the file is Unicode 11.0.0.
# The repertoire covered by the file is Unicode 12.0.0.
#
# The file contains a list of lines with mappings from one code point
# to another one for character-based mirroring.
......
# Blocks-11.0.0.txt
# Date: 2017-10-16, 24:39:00 GMT [KW]
# © 2017 Unicode®, Inc.
# Blocks-12.0.0.txt
# Date: 2018-07-30, 19:40:00 GMT [KW]
# © 2018 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
......@@ -239,6 +239,7 @@ FFF0..FFFF; Specials
10E60..10E7F; Rumi Numeral Symbols
10F00..10F2F; Old Sogdian
10F30..10F6F; Sogdian
10FE0..10FFF; Elymaic
11000..1107F; Brahmi
11080..110CF; Kaithi
110D0..110FF; Sora Sompeng
......@@ -259,6 +260,7 @@ FFF0..FFFF; Specials
11700..1173F; Ahom
11800..1184F; Dogra
118A0..118FF; Warang Citi
119A0..119FF; Nandinagari
11A00..11A4F; Zanabazar Square
11A50..11AAF; Soyombo
11AC0..11AFF; Pau Cin Hau
......@@ -267,10 +269,12 @@ FFF0..FFFF; Specials
11D00..11D5F; Masaram Gondi
11D60..11DAF; Gunjala Gondi
11EE0..11EFF; Makasar
11FC0..11FFF; Tamil Supplement
12000..123FF; Cuneiform
12400..1247F; Cuneiform Numbers and Punctuation
12480..1254F; Early Dynastic Cuneiform
13000..1342F; Egyptian Hieroglyphs
13430..1343F; Egyptian Hieroglyph Format Controls
14400..1467F; Anatolian Hieroglyphs
16800..16A3F; Bamum Supplement
16A40..16A6F; Mro
......@@ -283,6 +287,7 @@ FFF0..FFFF; Specials
18800..18AFF; Tangut Components
1B000..1B0FF; Kana Supplement
1B100..1B12F; Kana Extended-A
1B130..1B16F; Small Kana Extension
1B170..1B2FF; Nushu
1BC00..1BC9F; Duployan
1BCA0..1BCAF; Shorthand Format Controls
......@@ -295,9 +300,12 @@ FFF0..FFFF; Specials
1D400..1D7FF; Mathematical Alphanumeric Symbols
1D800..1DAAF; Sutton SignWriting
1E000..1E02F; Glagolitic Supplement
1E100..1E14F; Nyiakeng Puachue Hmong
1E2C0..1E2FF; Wancho
1E800..1E8DF; Mende Kikakui
1E900..1E95F; Adlam
1EC70..1ECBF; Indic Siyaq Numbers
1ED00..1ED4F; Ottoman Siyaq Numbers
1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
1F000..1F02F; Mahjong Tiles
1F030..1F09F; Domino Tiles
......@@ -313,6 +321,7 @@ FFF0..FFFF; Specials
1F800..1F8FF; Supplemental Arrows-C
1F900..1F9FF; Supplemental Symbols and Pictographs
1FA00..1FA6F; Chess Symbols
1FA70..1FAFF; Symbols and Pictographs Extended-A
20000..2A6DF; CJK Unified Ideographs Extension B
2A700..2B73F; CJK Unified Ideographs Extension C
2B740..2B81F; CJK Unified Ideographs Extension D
......
# SpecialCasing-11.0.0.txt
# Date: 2018-02-22, 06:16:47 GMT
# © 2018 Unicode®, Inc.
# SpecialCasing-12.0.0.txt
# Date: 2019-01-22, 08:18:50 GMT
# © 2019 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
......
......@@ -115,14 +115,15 @@ function name2alias(name , w, w2) {
else if (name ~ /duployan|shorthand/) return "duployan-shorthand"
else if (name ~ /sutton signwriting/) return "sutton-sign-writing"
sub(/ (extended|extensions|supplement).*/, "", name)
sub(/^small /, "", name)
sub(/ (extended|extensions*|supplement).*/, "", name)
sub(/numbers/, "number", name)
sub(/numerals/, "numeral", name)
sub(/symbols/, "symbol", name)
sub(/forms$/, "form", name)
sub(/tiles$/, "tile", name)
sub(/^new /, "", name)
sub(/ (characters|hieroglyphs|cursive)$/, "", name)
sub(/ (characters|hieroglyphs|cursive|hieroglyph format controls)$/, "", name)
gsub(/ /, "-", name)
return name
......
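To see what the new substitutions in name2alias do to the block names introduced by Unicode 12.0, the visible tail of the function can be transliterated into Python. This is a sketch under stated assumptions, not the awk code itself: it covers only the `sub`/`gsub` calls shown in the hunk above (new versions of the changed lines), skips the earlier special cases, and assumes the name has already been lowercased before this point. The name `name2alias_tail` is hypothetical.

```python
import re

# (pattern, replacement) pairs, in the order of the new blocks.awk lines.
_SUBSTITUTIONS = [
    (r"^small ", ""),
    (r" (extended|extensions*|supplement).*", ""),
    (r"numbers", "number"),
    (r"numerals", "numeral"),
    (r"symbols", "symbol"),
    (r"forms$", "form"),
    (r"tiles$", "tile"),
    (r"^new ", ""),
    (r" (characters|hieroglyphs|cursive|hieroglyph format controls)$", ""),
]


def name2alias_tail(block_name):
    """Apply the generic substitutions to a block name from Blocks.txt."""
    name = block_name.lower()  # assumption: lowercasing happens earlier in awk
    for pattern, replacement in _SUBSTITUTIONS:
        # awk's sub() replaces only the first match, hence count=1.
        name = re.sub(pattern, replacement, name, count=1)
    # awk's gsub(/ /, "-", name) replaces every space.
    return name.replace(" ", "-")
```

With these rules, "Small Kana Extension" becomes `kana`, "Egyptian Hieroglyph Format Controls" becomes `egyptian`, and "Ottoman Siyaq Numbers" becomes `ottoman-siyaq-number`. The first two therefore fold into existing script names instead of creating new ones.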
......@@ -1413,7 +1413,7 @@ Property value is a symbol `o' (Open), `c' (Close), or `n' (None)."
(copyright (with-temp-buffer
(insert-file-contents
(expand-file-name "copyright.html" unidata-dir))
(re-search-forward "^Copyright .*Unicode, Inc.")
(re-search-forward "Copyright .*Unicode, Inc.")
(match-string 0))))
(or unidata-list (unidata-setup-list unidata-text-file))
(let* ((basename (file-name-nondirectory file))
......
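The hunk above drops the `^` anchor from the regular expression that extracts the copyright string. The commit message does not say why. One plausible reading is that in the new copyright.html the notice no longer starts at the beginning of a line. The Python sketch below illustrates the difference under that assumption. The sample HTML and its years are hypothetical, and `re.MULTILINE` is used so that `^` matches at any line start, as it does in an Emacs buffer search.

```python
import re

# Old and new patterns from the hunk, copied as-is (the final "." is
# unescaped in the original and so matches any character).
OLD = re.compile(r"^Copyright .*Unicode, Inc.", re.MULTILINE)
NEW = re.compile(r"Copyright .*Unicode, Inc.")


def extract(pattern, html):
    """Return the matched copyright string, or None if there is no match."""
    m = pattern.search(html)
    return m.group(0) if m else None


# Hypothetical input: the notice is wrapped in markup on its line.
WRAPPED = "<html>\n<p>Copyright © 1991-2019 Unicode, Inc. All rights reserved.</p>\n</html>\n"
# Hypothetical input: the notice starts its own line.
BARE = "<html>\nCopyright © 1991-2019 Unicode, Inc. All rights reserved.\n</html>\n"
```

On `WRAPPED`, `extract(OLD, WRAPPED)` finds nothing while `extract(NEW, WRAPPED)` returns the notice up to "Unicode, Inc."; on `BARE` both patterns return the same string. The unanchored form is thus the more tolerant of the two, at the cost of also matching the word "Copyright" anywhere earlier on a line.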
......@@ -987,11 +987,12 @@ with L, LRE, or LRO Unicode bidi character type.")
(#x103D . #x103E)
(#x1058 . #x1059)
(#x105E . #x1160)
(#x1171 . #x1074)
(#x1071 . #x1074)
(#x1082 . #x1082)
(#x1085 . #x1086)
(#x108D . #x108D)
(#x109D . #x109D)
(#x1160 . #x11FF)
(#x135D . #x135F)
(#x1712 . #x1714)
(#x1732 . #x1734)
......@@ -1081,6 +1082,7 @@ with L, LRE, or LRO Unicode bidi character type.")
(#xABE5 . #xABE5)
(#xABE8 . #xABE8)
(#xABED . #xABED)
(#xD7B0 . #xD7FB)
(#xFB1E . #xFB1E)
(#xFE00 . #xFE0F)
(#xFE20 . #xFE2F)
......@@ -1217,10 +1219,11 @@ with L, LRE, or LRO Unicode bidi character type.")
(#xFE30 . #xFE6F)
(#xFF01 . #xFF60)
(#xFFE0 . #xFFE6)
(#x16FE0 . #x16FE1)
(#x17000 . #x187F1)
(#x16FE0 . #x16FE3)
(#x17000 . #x187F7)
(#x18800 . #x18AF2)
(#x1B000 . #x1B11E)
(#x1B000 . #x1B152)
(#x1B164 . #x1B167)
(#x1B170 . #x1B2FB)
(#x1F004 . #x1F004)
(#x1F0CF . #x1F0CF)
......@@ -1250,17 +1253,22 @@ with L, LRE, or LRO Unicode bidi character type.")
(#x1F680 . #x1F6C5)
(#x1F6CC . #x1F6CC)
(#x1F6D0 . #x1F6D2)
(#x1F6D5 . #x1F6D5)
(#x1F6EB . #x1F6EC)
(#x1F6F4 . #x1F6F9)
(#x1F910 . #x1F93E)
(#x1F940 . #x1F970)
(#x1F6F4 . #x1F6FA)
(#x1F7E0 . #x1F7EB)
(#x1F90D . #x1F971)
(#x1F973 . #x1F976)
(#x1F97A . #x1F97A)
(#x1F97C . #x1F9A2)
(#x1F9B0 . #x1F9B9)
(#x1F9C0 . #x1F9C2)
(#x1F9D0 . #x1F9FF)
(#x1F97A . #x1F9A2)
(#x1F9A5 . #x1F9AA)
(#x1F9AE . #x1F9CA)
(#x1F9CD . #x1F9FF)
(#x1FA00 . #x1FA53)
(#x1FA60 . #x1FA6D)
(#x1FA70 . #x1FA73)
(#x1FA78 . #x1FA7A)
(#x1FA80 . #x1FA82)
(#x1FA90 . #x1FA95)
(#x20000 . #x2FFFF)
(#x30000 . #x3FFFF))))
(dolist (elt l)
......
......@@ -222,6 +222,7 @@
(hanifi-rohingya #x10D00)
(old-sogdian #x10F00)
(sogdian #x10F30)
(elymaic #x10fe0)
(mahajani #x11150)
(sinhala-archaic-number #x111E1)
(khojki #x11200)
......@@ -234,6 +235,7 @@
(takri #x11680)
(dogra #x11800)
(warang-citi #x118A1)
(nandinagari #x119a0)
(zanabazar-square #x11A00)
(soyombo #x11A50)
(pau-cin-hau #x11AC0)
......@@ -257,15 +259,19 @@
(ancient-greek-musical-notation #x1D200)
(tai-xuan-jing-symbol #x1D300)
(counting-rod-numeral #x1D360)
(nyiakeng-puachue-hmong #x1e100)
(wancho #x1e2c0)
(mende-kikakui #x1E810)
(adlam #x1E900)
(indic-siyaq-number #x1ec71)
(ottoman-siyaq-number #x1ed01)
(mahjong-tile #x1F000)
(domino-tile #x1F030)))
(defvar otf-script-alist)
;; The below was synchronized with the latest Jul 23, 2017 version of
;; https://www.microsoft.com/typography/otspec/scripttags.htm.
;; The below was synchronized with the latest Aug 16, 2018 version of
;; https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags
(setq otf-script-alist
'((adlm . adlam)
(ahom . ahom)
......@@ -300,6 +306,7 @@
(dsrt . deseret)
(deva . devanagari)
(dev2 . devanagari)
(dogr . dogra)
(dupl . duployan-shorthand)
(egyp . egyptian)
(elba . elbasan)
......@@ -311,11 +318,13 @@
(grek . greek)
(gujr . gujarati)
(gjr2 . gujarati)
(gong . gunjala-gondi)
(guru . gurmukhi)
(gur2 . gurmukhi)
(hani . han)
(hang . hangul)
(jamo . hangul)
(rohg . hanifi-rohingya)
(hano . hanunoo)
(hatr . hatran)
(hebr . hebrew)
......@@ -324,9 +333,9 @@
(prti . inscriptional-parthian)
(java . javanese)
(kthi . kaithi)
(kana . kana) ; Hiragana
(knda . kannada)
(knd2 . kannada)
(kana . kana) ; Hiragana
(kali . kayah-li)
(khar . kharoshthi)
(khmr . khmer)
......@@ -342,12 +351,15 @@
(lyci . lycian)
(lydi . lydian)
(mahj . mahajani)
(maka . makasar)
(marc . marchen)
(mlym . malayalam)
(mlm2 . malayalam)
(mand . mandaic)
(mani . manichaean)
(gonm . masaram-gondi)
(math . mathematical)
(medf . medefaidrin)
(mtei . meetei-mayek)
(mend . mende-kikakui)
(merc . meroitic)
......@@ -363,12 +375,14 @@
(nbat . nabataean)
(newa . newa)
(nko\ . nko)
(nshu . nushu)
(ogam . ogham)
(olck . ol-chiki)
(ital . old_italic)
(xpeo . old_persian)
(narb . old-north-arabian)
(perm . old-permic)
(sogo . old-sogdian)
(sarb . old-south-arabian)
(orkh . old-turkic)
(orya . oriya)
......@@ -392,7 +406,9 @@
(sidd . siddham)
(sgnw . sutton-sign-writing)
(sinh . sinhala)
(sogd . sogdian)
(sora . sora-sompeng)
(soyo . soyombo)
(sund . sundanese)
(sylo . syloti_nagri)
(syrc . syriac)
......@@ -416,7 +432,8 @@
(ugar . ugaritic)
(vai\ . vai)
(wara . warang-citi)
(yi\ \ . yi)))
(yi\ \ . yi)
(zanb . zanabazar-square)))
;; Set standard fontname specification of characters in the default
;; fontset to find an appropriate font for each script/charset. The
......
......@@ -2929,12 +2929,13 @@ on encoding."
(#x14400 . #x14646)
;; (#x14647 . #x167FF) unused
(#x16800 . #x16F9F)
(#x16FE0 . #x16FE0)
(#x16FE0 . #x16FE3)
;; (#x17000 . #x187FF) Tangut Ideographs
;; (#x18800 . #x18AFF) Tangut Components
;; (#x18B00 . #x1AFFF) unused
(#x1B000 . #x1B12F)
;; (#x1B130 . #x1B16F) unused
(#x1B000 . #x1B11F)
;; (#x1B120 . #x1B14F) unused
(#x1B150 . #x1B16F)
(#x1B170 . #x1B2FF)
;; (#x1B300 . #x1BBFF) unused
(#x1BC00 . #x1BCAF)
......
......@@ -182,25 +182,24 @@ implementations:
(defconst ucs-normalize-tests--failing-lines-part1
(list 15131 15132 15133 15134 15135 15136 15137 15138
15139
16149 16150 16151 16152 16153 16154 16155 16156
16157 16158 16159 16160 16161 16162 16163 16164
16165 16166 16167 16168 16169 16170 16171 16172
16173 16174 16175 16176 16177 16178 16179 16180
16181 16182 16183 16184 16185 16186 16187 16188
16189 16190 16191 16192 16193 16194 16195 16196
16197 16198 16199 16200 16201 16202 16203 16204
16205 16206 16207 16208 16209 16210 16211 16212
16213 16214 16215 16216 16217 16218 16219 16220
16221 16222 16223 16224 16225 16226 16227 16228
16229 16230 16231 16232 16233 16234 16235 16236
16237 16238 16239 16240 16241 16242 16243 16244
16245 16246 16247 16248 16249 16250 16251 16252
16253 16254 16255 16256 16257 16258 16259 16260
16261 16262 16263 16264 16265 16266 16267 16268
16269 16270 16271 16272 16273 16274 16275 16276
16277 16278 16279 16280 16281 16282 16283 16284
16285 16286 16287 16288 16289))
15139 16149 16150 16151 16152 16153 16154 16155
16156 16157 16158 16159 16160 16161 16162 16163
16164 16165 16166 16167 16168 16169 16170 16171
16172 16173 16174 16175 16176 16177 16178 16179
16180 16181 16182 16183 16184 16185 16186 16187
16188 16189 16190 16191 16192 16193 16194 16195
16196 16197 16198 16199 16200 16201 16202 16203
16204 16205 16206 16207 16208 16209 16210 16211
16212 16213 16214 16215 16216 16217 16218 16219
16220 16221 16222 16223 16224 16225 16226 16227
16228 16229 16230 16231 16232 16233 16234 16235
16236 16237 16238 16239 16240 16241 16242 16243
16244 16245 16246 16247 16248 16249 16250 16251
16252 16253 16254 16255 16256 16257 16258 16259
16260 16261 16262 16263 16264 16265 16266 16267
16268 16269 16270 16271 16272 16273 16274 16275
16276 16277 16278 16279 16280 16281 16282 16283
16284 16285 16286 16287 16288 16289 16366))
;; Keep a record of failures, for consulting afterwards (the ert
;; backtrace only shows a truncated version of these lists).
......@@ -258,23 +257,22 @@ implementations:
ucs-normalize-tests--failing-lines-part1)))
(defconst ucs-normalize-tests--failing-lines-part2
(list 17482 17532 17636 18338 18340 18342 18344 18346
18348 18350 18352 18354 18356 18358 18360 18362
18364 18366 18376 18378 18380 18382 18384 18386
18388 18390 18392 18394 18396 18398 18400 18402
18404 18406 18408 18410 18412 18414 18416 18418
18420 18422 18424 18426 18428 18430 18432 18434
18436 18438 18440 18442 18444 18446 18448 18450
18452 18454 18456 18458 18460 18462 18464 18466
18468 18470 18472 18474 18476 18478 18480 18482
18484 18486 18488 18490 18492 18494 18496 18564
18566 18568 18570 18572 18574 18576 18578 18580
18582 18584 18586 18588 18590 18592 18594 18596
18598 18600 18602 18604 18606 18608 18610 18612
18614 18616 18618 18620 18622 18624 18626 18628
18630 18632 18634 18636 18638 18640 18642 18644
18646 18648 18650 18652 18654 18656 18658 18660
18662 18664 18666))
(list 17689 18379 18381 18383 18385 18387 18389 18391
18393 18395 18397 18399 18401 18403 18405 18407
18409 18411 18413 18415 18417 18419 18421 18423
18425 18427 18429 18431 18433 18435 18437 18439
18441 18443 18445 18447 18449 18451 18453 18455
18457 18459 18461 18463 18465 18467 18469 18471
18473 18475 18477 18479 18481 18483 18485 18487
18489 18491 18493 18495 18497 18499 18501 18569
18571 18573 18575 18577 18579 18581 18583 18585
18587 18589 18591 18593 18595 18597 18599 18601
18603 18605 18607 18609 18611 18613 18615 18617
18619 18621 18623 18625 18627 18629 18631 18633
18635 18637 18639 18641 18643 18645 18647 18649
18651 18653 18655 18657 18659 18661 18663 18665
18667 18669 18671 18673 18675 18677 18679 18681
18683 18685 18687 18689 18691 18693))
(ert-deftest ucs-normalize-part2 ()
:tags '(:expensive-test)
......
# BidiCharacterTest-11.0.0.txt
# Date: 2018-02-18, 05:50:00 GMT [LI]
# BidiCharacterTest-12.0.0.txt
# Date: 2018-11-02, 16:34:00 GMT [LI]
# © 2018 Unicode®, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#