json: Allow encoding multibyte UTF-8 sequences

The current implementation rejects all input with bytes > 0x7E, which includes
all multibyte UTF-8 sequences.  According to ECMA-404, Section 9, only double
quotation marks, backslashes, and control characters 0x00 - 0x1F must be escaped
in JSON strings, so non-ASCII bytes can be passed through without escaping.
This also mirrors what the decoder does above.
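
As an illustration of the rule described above, here is a minimal standalone
sketch, not the actual PulseAudio code. The name sketch_json_escape and the
use of malloc/free are assumptions made for this example, and only a few
escapes are handled. It casts to unsigned char so that the range comparisons
behave the same whether plain char is signed or unsigned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not pa_json_escape): returns a newly allocated,
 * escaped copy of `in`, or NULL on a control character, DEL, or
 * allocation failure. The caller frees the result. */
static char *sketch_json_escape(const char *in) {
    char *out, *o;

    assert(in != NULL);

    /* Worst case: every input byte turns into a two-byte escape. */
    out = malloc(2 * strlen(in) + 1);
    if (!out)
        return NULL;
    o = out;

    for (const unsigned char *s = (const unsigned char *) in; *s; s++) {
        switch (*s) {
            case '"':  *o++ = '\\'; *o++ = '"';  break;
            case '\\': *o++ = '\\'; *o++ = '\\'; break;
            case '\n': *o++ = '\\'; *o++ = 'n';  break;
            case '\t': *o++ = '\\'; *o++ = 't';  break;
            default:
                /* Reject remaining control characters and DEL; bytes
                 * >= 0x80 (UTF-8 lead and continuation bytes) pass
                 * through unchanged and unvalidated. */
                if (*s < 0x20 || *s == 0x7F) {
                    free(out);
                    return NULL;
                }
                *o++ = (char) *s;
                break;
        }
    }

    *o = '\0';
    return out;
}
```

With this sketch, a string such as "gr\xc3\xbc\xc3\x9f" comes back byte for
byte, while a string containing 0x01 or 0x7F yields NULL.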

Of course this allows invalid UTF-8 sequences to be encoded.  Checks for this
could be added as well, but the decoder does not seem to perform them either.
And from what I can tell from a quick glance, the text output path does not
check for that either.
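
For reference, a well-formedness check of the kind mentioned above could look
roughly like the following sketch. It is not part of this commit; the name
sketch_utf8_is_valid is hypothetical. It follows the byte ranges of RFC 3629,
so it rejects overlong encodings, UTF-16 surrogates, and code points above
U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the NUL-terminated string is
 * well-formed UTF-8 according to the byte ranges in RFC 3629. */
static bool sketch_utf8_is_valid(const char *str) {
    const unsigned char *s = (const unsigned char *) str;

    assert(str != NULL);

    while (*s) {
        /* Allowed range for the first continuation byte; narrowed
         * below for the lead bytes that need it. */
        unsigned char lo = 0x80, hi = 0xBF;
        int extra;

        if (*s < 0x80) {                       /* plain ASCII */
            s++;
            continue;
        } else if (*s >= 0xC2 && *s <= 0xDF) { /* 2-byte sequence */
            extra = 1;
        } else if (*s >= 0xE0 && *s <= 0xEF) { /* 3-byte sequence */
            extra = 2;
            if (*s == 0xE0) lo = 0xA0;         /* no overlong forms */
            if (*s == 0xED) hi = 0x9F;         /* no surrogates */
        } else if (*s >= 0xF0 && *s <= 0xF4) { /* 4-byte sequence */
            extra = 3;
            if (*s == 0xF0) lo = 0x90;         /* no overlong forms */
            if (*s == 0xF4) hi = 0x8F;         /* nothing above U+10FFFF */
        } else {
            return false;                      /* stray or invalid lead byte */
        }

        s++;
        for (int i = 0; i < extra; i++, s++) {
            /* A terminating NUL falls outside [lo, hi], so truncated
             * sequences are rejected here as well. */
            if (*s < lo || *s > hi)
                return false;
            lo = 0x80;
            hi = 0xBF;
        }
    }

    return true;
}
```

Such a check could run before escaping, at the cost of one extra pass over
the string.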

Fixes: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/issues/1310
Sophie Hirn 2025-10-19 11:54:12 +02:00
parent eee0e8f22f
commit 8da50542d1


@@ -763,8 +763,8 @@ static char *pa_json_escape(const char *p) {
                 *output++ = 't';
                 break;
             default:
-                if (*s < 0x20 || *s > 0x7E) {
-                    pa_log("Invalid non-ASCII character: 0x%x", (unsigned int) *s);
+                if (*s < 0x20 || *s == 0x7F) {
+                    pa_log("Invalid ASCII character: 0x%x", (unsigned int) *s);
                     pa_xfree(out_string);
                     return NULL;
                 }