When compiled with grapheme clustering support, zero-width characters
that also are grapheme breaks, were ignored (not stored in the
grid).
When utf8proc says the character is a grapheme break, we try to print
the character to the current cell. But this is only done when width >
0. As a result, zero width grapheme breaks were simply discarded.
This only happens when grapheme clustering is enabled; when disabled,
all zero width characters are appended.
Fix this by also requiring the width to be non-zero when if we should
append the character or not.
Closes#1960
When the cursor is at the end of the line, with a pending wrap (LCF
set), the lcf flag should be cleared *and* the cursor moved one cell
to the right.
Before this patch, we cleared LCF, but didn't move the cursor.
Closes#1954
Wrap *all* cell copying logic in a for-loop for the characters
width. This _should_, in theory, mean reflow of e.g. cursor
coordinates in the middle of a multi-column character works correctly.
Also fix reflow of cmd start/end integration.
We've been trying to performance optimize reflow by "chunking" cells;
try to gather as many as possible, and memcpy a chunk at once.
The problem is that a) this quickly becomes very complex, and b) is
very hard to get right for multi-column characters, especially when we
need to truncate long ones due to the window being too small.
Refactor, and once again walk and copy all cells one by one. This is
slower, but at least it's correct.
Commit 859b4c89 (url-mode: wip: more work on regex matching,
2025-01-30) regressed URL detection in weechat. Some of the URLs
still work but others don't. This is because regexec() stops at the
first NUL, thus skipping the rest of the line. weechat seems create
NUL cells between their UI widgets.
Work around this by replacing NUL with space. This is probably correct
because selecting and copying those cells also translates to space
(not sure where in the code).
Zero-initialize the 'launch' spawn template before calling
value_to_spawn_template(). This is needed since
value_to_spawn_template() tries to free the old value before assigning
the new one.
Closes#1951
When checking if we're breaking in the middle of a multi-column
character, we counted spacers starting from the break point. But,
the character may be wider than that. Use the fact that the spacers
cells encode how many *more* there are after them; when we get to the
first one, we know exactly how wide the character is.
When appending to an existing composed character, "inherit" its forced
width, if set.
Also make sure to actually _use_ the forced width, if set, rather than
the calculated width.
This fixes an issue when appending zero-width codepoints to a
forced-width combining character.
If there's a single codepoint in the text portion of the OSC sequence,
and its calculated width matches the forced width, print it directly
to the grid instead of emitting a combining character.
When w=0, we split up the text string "as we normally would". Since we
don't support any other text-sizing parameters, this means simply
printing each codepoint to the grid.
This function "prints" any non-ascii character (i.e. any character
that ends up in the action_utf8_print() function in vt.c) to the
grid. This includes grapheme cluster processing etc.
action_utf8_print() now simply calls this function.
This allows us to re-use the same functionality from other
places (like the text-sizing protocol).
This brings initial support for the new kitty text-sizing
protocol. Note hat only the width-parameter ('w') is supported. That
is, no font scaling, and no multi-line cells.
For now, only explicit widths are supported. That is, w=0 does not yet
work.
There are a couple of changes to the renderer, to handle e.g.
OSC 66 ; w=6 ; foobar ST
There are two ways this can get rendered, depending on whether
grapheme shaping has been enabled. We either shape it, and get an
array of glyphs back that we render. Or, we rasterize each codepoint
ourselves, and render each resulting glyph.
The two cases ends up in two different renderer loops, that worked
somewhat different. In particular, the first case has probably never
been tested/used at all...
With this patch, both are changed, and now uses some heuristic to
differentiate between multi-cell text strings (like in the example
above), or single-cell combining characters. The difference is mainly
in which offset to use for the secondary glyphs.
In a multi-cell string, each glyph is mapped to its own cell, while in
the combining case, we try to map all glyphs to the same cell.
The logic that tries to ensure we don't break a line in the middle of
a multi-cell character was flawed when the number of cells were larger
than 2.
In particular, if the number of cells to copy were limited by the
number of cells left on the current (new) line, and were less than the
length of the multi-cell character, then we failed to insert the
correct number of spacers, and also ended up misplacing the multi-cell
character; instead of pushing it to the next line, it was inserted on
the current line, even though it doesn't fit.
Also change how trailing SPACER cells are rendered (cells that are
"fillers" at then end of a line, when a multi-column character was
pushed over to the next line): don't copy the previous cell's
attributes (which may be wrong anyway), use default attributes
instead.
When the client application emits combining characters, for example
multi-codepoint emojis, in insert-mode, we ended up pushing partial
graphemes to the right, for each codepoint, resulting in too many
cells (and with the wrong content) being inserted.
The fix is fairly simple; don't "insert" when appending characters to
an existing grapheme cluster.
This isn't something we can detect easily in print_insert() (it would
require us to do grapheme clustering again). Fortunately, we do have
the required information in action_utf8_print(). So, pass this
information as a boolean to term_print().
Closes#1947
When auto-matching URLs (or custom regular expressions), use the
first *subexpression* as URL, rather than the while regex match.
This allows us to write custom regular expressions with prefix/suffix
strings that should not be included in the presented match.
Users can now define their own regex patterns, and use them via key
bindings:
[regex:foo]
regex=foo(bar)?
launch=path-to-script-or-application {match}
[key-bindings]
regex-launch=[foo] Control+Shift+q
regex-copy=[foo] Control+Mod1+Shift+q
That is, add a section called 'regex:', followed by an
identifier. Define a regex and a launcher command line.
Add a key-binding, regex-launch and/or regex-copy (similar to
show-urls-launch and show-urls-copy), and connect them to the regex
with the "[regex-name]" syntax (similar to how the pipe-* bindings
work).