Commit 3ed1621d authored by Mattias Engdegård

Disallow reversed char ranges in `rx'

(any "a-Z0-9") generated "[0-9]", and (any (?9 . ?0)) generated "[9-0]".
Reversed ranges are either mistakes or abuse.  Neither should be allowed.

* etc/NEWS: Explain the change.
* lisp/emacs-lisp/rx.el (rx): Document.
(rx-check-any-string, rx-check-any): Add error checks for reversed ranges.
* test/lisp/emacs-lisp/rx-tests.el (rx-char-any-range-bad): New test.
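
The behavior change described above can be checked interactively. The following is a minimal sketch, assuming an Emacs build that includes this commit; the helper `my-rx-signals-error-p' is hypothetical and not part of rx.el. It uses the function `rx-to-string' rather than the `rx' macro so that any error is signaled at run time instead of at macro-expansion time.

```elisp
(require 'rx)

;; Hypothetical helper (not part of rx.el): return t if translating
;; FORM with `rx-to-string' signals an error, nil otherwise.
(defun my-rx-signals-error-p (form)
  (condition-case nil
      (progn (rx-to-string form) nil)
    (error t)))

;; Well-ordered ranges are still accepted, so these should return nil.
(my-rx-signals-error-p '(any "0-9a-z"))
(my-rx-signals-error-p '(any (?0 . ?9)))

;; Reversed ranges should be rejected once this commit is applied,
;; so these should return t.  ?a (97) is greater than ?Z (90).
(my-rx-signals-error-p '(any "a-Z0-9"))
(my-rx-signals-error-p '(any (?9 . ?0)))
```

On a build without this commit, the last two calls would be expected to return nil instead, because the reversed ranges were accepted without complaint.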
@@ -1336,6 +1336,13 @@ they are now allocated like any other pseudovector. As a result, the
'misc' component, and the 'misc-objects-consed' variable has been
+** Reversed character ranges are no longer permitted in rx.
+Previously, ranges where the starting character is greater than the
+ending character were silently omitted.
+For example, '(rx (any "@z-a" (?9 . ?0)))' would match '@' only.
+Now, such rx expressions generate an error.
* Lisp Changes in Emacs 27.1
@@ -482,7 +482,10 @@ The original order is not preserved. Ranges, \"A-Z\", become pairs, (?A . ?Z)."
(let ((start (funcall decode-char (aref str i)))
(end (funcall decode-char (aref str (+ i 2)))))
(cond ((< start end) (push (cons start end) ret))
-((= start end) (push start ret)))
+((= start end) (push start ret))
+(t
+(error "Rx character range `%c-%c' is reversed"
+start end)))
(setq i (+ i 3))))
;; Single character.
@@ -503,7 +506,10 @@ The original order is not preserved. Ranges, \"A-Z\", become pairs, (?A . ?Z)."
(null (string-match "\\`\\[\\[:[-a-z]+:\\]\\]\\'" translation)))
(error "Invalid char class `%s' in Rx `any'" arg))
(list (substring translation 1 -1)))) ; strip outer brackets
-((and (integerp (car-safe arg)) (integerp (cdr-safe arg)))
+((and (characterp (car-safe arg)) (characterp (cdr-safe arg)))
+(unless (<= (car arg) (cdr arg))
+(error "Rx character range `%c-%c' is reversed"
+(car arg) (cdr arg)))
(list arg))
((stringp arg) (rx-check-any-string arg))
@@ -916,6 +922,7 @@ CHAR
matches any character in SET .... SET may be a character or string.
Ranges of characters can be specified as `A-Z' in strings.
Ranges may also be specified as conses like `(?A . ?Z)'.
+Reversed ranges like `Z-A' and `(?Z . ?A)' are not permitted.
SET may also be the name of a character class: `digit',
`control', `hex-digit', `blank', `graph', `print', `alnum',
@@ -40,6 +40,10 @@
(should (equal (rx (any "\a-\n"))
+(ert-deftest rx-char-any-range-bad ()
+(should-error (rx (any "0-9a-Z")))
+(should-error (rx (any (?0 . ?9) (?a . ?Z)))))
(ert-deftest rx-char-any-raw-byte ()
"Test raw bytes in character alternatives."
;; Separate raw characters.
