Commit f8581bcf authored by Philipp Stephani

Reject invalid characters in XML strings (Bug#41094).

* lisp/xml.el (xml-escape-string): Search for invalid characters.
(xml-invalid-character): New error symbol.

* test/lisp/xml-tests.el (xml-print-invalid-cdata): New unit test.

* etc/NEWS: Document new behavior.
parent 232bb691
@@ -393,6 +393,13 @@ component are now rejected by 'json-read' and friends. This makes
them more compliant with the JSON specification and consistent with
the native JSON parsing functions.

** xml.el

*** XML serialization functions now reject invalid characters.
Previously 'xml-print' would produce invalid XML when given a string
with characters that are not valid in XML (see
https://www.w3.org/TR/xml/#charsets). Now it rejects such strings.

* New Modes and Packages in Emacs 28.1
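The NEWS entry points to the Char production of the XML 1.0 specification. As a minimal sketch of what that production admits, the helpers below test characters against its ranges numerically. The names 'my-xml-char-p' and 'my-xml-first-invalid-char' are hypothetical and not part of xml.el.

```elisp
;; Hypothetical helpers, not part of xml.el.  They spell out the Char
;; production from https://www.w3.org/TR/xml/#charsets:
;;   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
;;            | [#x10000-#x10FFFF]
(defun my-xml-char-p (char)
  "Return non-nil if CHAR is allowed in an XML 1.0 document."
  (or (memq char '(#x9 #xA #xD))
      (<= #x20 char #xD7FF)
      (<= #xE000 char #xFFFD)
      (<= #x10000 char #x10FFFF)))

(defun my-xml-first-invalid-char (string)
  "Return (CHAR . INDEX) for the first invalid XML character in STRING.
INDEX is 0-based.  Return nil if every character of STRING is valid."
  (catch 'found
    (dotimes (i (length string))
      (let ((char (aref string i)))
        (unless (my-xml-char-p char)
          ;; Stop at the first offending character.
          (throw 'found (cons char i)))))
    nil))

(my-xml-first-invalid-char "ok\0")
;; => (0 . 2)
```

Because the ranges are written out, it is easy to see that the surrogate block #xD800..#xDFFF and the code points #xFFFE and #xFFFF are rejected, even though Emacs can represent them as characters.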
@@ -1023,9 +1023,17 @@ entity references (e.g., replace each & with &amp;).
XML character data must not contain & or < characters, nor the >
character under some circumstances. The XML spec does not impose
restriction on \" or \\=', but we just substitute for these too
\(as is permitted by the spec).
If STRING contains characters that are invalid in XML (as defined
by https://www.w3.org/TR/xml/#charsets), signal an error of type
`xml-invalid-character'."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (when (re-search-forward
           "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
           nil t)
      (signal 'xml-invalid-character (list (char-before) (match-beginning 0))))
    (dolist (substitution '(("&" . "&amp;")
                            ("<" . "&lt;")
                            (">" . "&gt;")
@@ -1036,6 +1044,9 @@ restriction on \" or \\=', but we just substitute for these too
        (replace-match (cdr substitution) t t nil)))
    (buffer-string)))

(define-error 'xml-invalid-character "Invalid XML character"
  'wrong-type-argument)

(defun xml-debug-print-internal (xml indent-string)
  "Outputs the XML tree in the current buffer.
The first line is indented with INDENT-STRING."
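Taken together, the two xml.el hunks give 'xml-escape-string' a check-then-escape shape and add an error symbol whose parent is 'wrong-type-argument'. The following is a standalone sketch of that shape, not the xml.el code itself. The names 'my-xml-invalid-character' and 'my-xml-escape-string' are hypothetical, and the elided part of the substitution list is assumed to cover ' and " as the docstring describes. Note that the search passes NOERROR = t; without it, 're-search-forward' signals 'search-failed' whenever the string is entirely valid.

```elisp
;; Hypothetical names: these mirror, but are not, the xml.el definitions.
(define-error 'my-xml-invalid-character "Invalid XML character"
  'wrong-type-argument)

(defun my-xml-escape-string (string)
  "Return STRING with XML special characters replaced by entity references.
Signal `my-xml-invalid-character' with data (CHAR POSITION) if STRING
contains a character outside the XML Char production; POSITION is the
1-based position of CHAR in a buffer holding STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; NOERROR = t: return nil instead of signalling `search-failed'
    ;; when STRING contains no invalid character.
    (when (re-search-forward
           "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
           nil t)
      (signal 'my-xml-invalid-character
              (list (char-before) (match-beginning 0))))
    ;; Replace & first so the entities added by later passes are not
    ;; escaped a second time.
    (dolist (substitution '(("&" . "&amp;")
                            ("<" . "&lt;")
                            (">" . "&gt;")
                            ("'" . "&apos;")
                            ("\"" . "&quot;")))
      (goto-char (point-min))
      (while (search-forward (car substitution) nil t)
        ;; FIXEDCASE and LITERAL: insert the entity text verbatim.
        (replace-match (cdr substitution) t t)))
    (buffer-string)))

;; Because the parent is `wrong-type-argument', a handler for the
;; parent also catches the new error.
(condition-case err
    (my-xml-escape-string "x\0")
  (wrong-type-argument err))
;; => (my-xml-invalid-character 0 2)
```

Choosing an existing error as the parent means callers that already guard against 'wrong-type-argument' keep working, while new callers can handle the more specific symbol.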
@@ -164,6 +164,16 @@ Parser is called with and without 'symbol-qnames argument.")
    (should (equal (cdr xml-parse-test--namespace-attribute-qnames)
                   (xml-parse-region nil nil nil nil 'symbol-qnames)))))

(ert-deftest xml-print-invalid-cdata ()
  "Check that Bug#41094 is fixed."
  (with-temp-buffer
    (should (equal (should-error (xml-print '((foo () "\0")))
                                 :type 'xml-invalid-character)
                   '(xml-invalid-character 0 1)))
    (should (equal (should-error (xml-print '((foo () "\u00FF \xFF")))
                                 :type 'xml-invalid-character)
                   '(xml-invalid-character #x3FFFFF 3)))))

;; Local Variables:
;; no-byte-compile: t
;; End:
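The second assertion in 'xml-print-invalid-cdata' expects the offending character to be #x3FFFFF, a code that is not a Unicode code point at all; the test's expected data indicates that the \xFF in its multibyte literal ends up as such a character. A short sketch of where codes like this come from, relying only on 'string-to-multibyte' (the constant name 'my-raw-byte-ff' is hypothetical):

```elisp
;; In a multibyte string or buffer, Emacs stores a raw byte in the
;; range #x80..#xFF as an "eight-bit" character with code
;; #x3FFF80..#x3FFFFF.  Those codes lie above #x10FFFF, the largest
;; Unicode code point, so no range of the XML Char production matches.
(defconst my-raw-byte-ff
  ;; "\xFF" on its own reads as a unibyte string holding the byte 0xFF;
  ;; `string-to-multibyte' turns that byte into its eight-bit character.
  (aref (string-to-multibyte "\xFF") 0)
  "Character code Emacs uses for the raw byte 0xFF (hypothetical name).")

(list (format "#x%X" my-raw-byte-ff)   ; code of the raw-byte character
      (> my-raw-byte-ff #x10FFFF))     ; beyond Unicode, hence invalid XML
;; => ("#x3FFFFF" t)
```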