url-mode: fix double-width characters not being handled correctly

When a regex matches a string containing double-width characters, the
CELL_SPACER values were included in the URL string. This meant the
final URL (either launched or copied) wasn't handled correctly, as
invalid UTF-8 sequences were inserted in the middle of the string.
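
To illustrate the failure mode described above, here is a minimal,
self-contained sketch (not foot's actual code). It assumes a row of cells
stored as 32-bit values, where a double-width character is followed by a
sentinel "spacer" cell. The helper names `encode_utf8` and `cells_to_utf8`,
and the numeric value given to `CELL_SPACER`, are hypothetical and chosen
only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sentinel value, for illustration only: any cell >= CELL_SPACER
 * is padding that follows a double-width character. */
#define CELL_SPACER UINT32_C(0x1fffff00)

/* Encode one code point as UTF-8. Returns the number of bytes written,
 * or 0 if the value is not a valid Unicode scalar value. */
static size_t
encode_utf8(uint32_t cp, char buf[4])
{
    if (cp < 0x80) {
        buf[0] = (char)cp;
        return 1;
    } else if (cp < 0x800) {
        buf[0] = (char)(0xc0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3f));
        return 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xd800 && cp <= 0xdfff)
            return 0;  /* UTF-16 surrogates are not encodable */
        buf[0] = (char)(0xe0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3f));
        buf[2] = (char)(0x80 | (cp & 0x3f));
        return 3;
    } else if (cp < 0x110000) {
        buf[0] = (char)(0xf0 | (cp >> 18));
        buf[1] = (char)(0x80 | ((cp >> 12) & 0x3f));
        buf[2] = (char)(0x80 | ((cp >> 6) & 0x3f));
        buf[3] = (char)(0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;
}

/* Build a NUL-terminated UTF-8 string from a row of cells, skipping
 * spacer cells. Returns the string length in bytes, excluding the NUL. */
static size_t
cells_to_utf8(const uint32_t *cells, size_t count, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    size_t len = 0;
    for (size_t i = 0; i < count; i++) {
        if (cells[i] >= CELL_SPACER)
            continue;  /* padding after a double-width character */

        char buf[4];
        const size_t n = encode_utf8(cells[i], buf);
        if (n == 0)
            continue;  /* not encodable; drop it */
        if (len + n >= out_size)
            break;     /* no room left (keep space for the NUL) */

        memcpy(&out[len], buf, n);
        len += n;
    }

    out[len] = '\0';
    return len;
}
```

Calling `cells_to_utf8()` on the four cells `'a'`, U+4E2D, `CELL_SPACER`,
`'b'` produces a five-byte string: one byte for each ASCII letter and three
for the double-width character. Without the spacer check, the sentinel would
reach the encoder and could not be represented as valid UTF-8.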

Closes #2027
Daniel Eklöf 2025-04-13 08:26:20 +02:00
parent bc2e0a29bb
commit b93d2f042c
2 changed files with 9 additions and 2 deletions


@@ -106,9 +106,12 @@
* Build failure (`srgb.h` not found) when doing a parallel build.
* Regression: reflowing (changing the window size) removing empty
lines ([#2011][2011]).
+* `url/regex-copy` not handling double-width characters correctly
+  ([#2027][2027]).
[2000]: https://codeberg.org/dnkl/foot/issues/2000
[2011]: https://codeberg.org/dnkl/foot/issues/2011
+[2027]: https://codeberg.org/dnkl/foot/issues/2027
### Security


@@ -347,6 +347,9 @@ regex_detected(const struct terminal *term, enum url_action action,
wc_count = composed->count;
}
+else if (wc[0] >= CELL_SPACER)
+    continue;
/* Convert wide character to utf8 */
for (size_t i = 0; i < wc_count; i++) {
char buf[16];
@@ -355,6 +358,7 @@ regex_detected(const struct terminal *term, enum url_action action,
if (char_len == (size_t)-1)
continue;
for (size_t j = 0; j < char_len; j++) {
const size_t requires_size = vline->len + char_len;
@@ -411,9 +415,9 @@ regex_detected(const struct terminal *term, enum url_action action,
const size_t end = start + mlen;
LOG_DBG(
-"regex match at row %d: %.*srow/col = %dx%d",
+"regex match at row %d: %.*s (%zu bytes), row/col = %dx%d",
matches[1].rm_so, (int)mlen, &search_string[matches[1].rm_so],
-v->map[start].row, v->map[start].col);
+mlen, v->map[start].row, v->map[start].col);
tll_push_back(
*urls,