../vt.c:648:13: runtime error: signed integer overflow: 3924432811 * 2654435761 cannot be represented in type 'long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../vt.c:648:13 in
Closes #1456
This patch detects invalid codepoints in the UTF-8 EDxxxx range, and
the F4xxxxxx range.
Note that we still allow the E0xxxx and F0xxxxxx ranges. These
contain overlong encodings. We allow them, because they still decode
into correct UTF-32.
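As a rough illustration of the two rejected ranges (a sketch only, not
the actual vt.c code; all helper names are made up for this example):
ED A0..BF decodes to the surrogates U+D800..U+DFFF, and F4 90..BF
decodes to values above U+10FFFF. Continuation bytes are assumed to be
well-formed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode a three-byte sequence (lead byte Ex). */
static uint32_t
utf8_decode3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    assert((b0 & 0xf0) == 0xe0);
    return ((uint32_t)(b0 & 0x0f) << 12) |
           ((uint32_t)(b1 & 0x3f) << 6) |
           (uint32_t)(b2 & 0x3f);
}

/* Hypothetical helper: decode a four-byte sequence (lead byte Fx). */
static uint32_t
utf8_decode4(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    assert((b0 & 0xf8) == 0xf0);
    return ((uint32_t)(b0 & 0x07) << 18) |
           ((uint32_t)(b1 & 0x3f) << 12) |
           ((uint32_t)(b2 & 0x3f) << 6) |
           (uint32_t)(b3 & 0x3f);
}

/* Reject surrogates (from the EDxxxx range) and anything above
 * U+10FFFF (from the F4xxxxxx range). Overlong encodings are not
 * rejected here; they decode to ordinary codepoints. */
static bool
codepoint_is_valid(uint32_t cp)
{
    if (cp >= 0xd800 && cp <= 0xdfff)
        return false;
    return cp <= 0x10ffff;
}
```

With this sketch, ED 9F BF (U+D7FF) and F4 8F BF BF (U+10FFFF) pass,
while ED A0 80 and F4 90 80 80 are rejected.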
Closes #1423
Do not insert existing positions into the tab stop list.
This prevents a performance issue when iterating through
an extremely long tab stop list.
Also corrects the behaviour of CBT.
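A minimal sketch of the idea, assuming the tab stops are kept in a
sorted array (the real list type may differ; the struct, the capacity
and the function name are invented for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAB_STOPS 64  /* arbitrary capacity for this sketch */

struct tab_stops {
    int pos[MAX_TAB_STOPS];  /* sorted ascending, no duplicates */
    size_t count;
};

/* Returns true if 'col' was inserted, false if it already was a tab
 * stop (or if the list is full). */
static bool
tab_stop_insert(struct tab_stops *ts, int col)
{
    assert(col >= 0);

    size_t i = 0;
    while (i < ts->count && ts->pos[i] < col)
        i++;

    if (i < ts->count && ts->pos[i] == col)
        return false;  /* existing position: do not insert again */
    if (ts->count == MAX_TAB_STOPS)
        return false;

    memmove(&ts->pos[i + 1], &ts->pos[i],
            (ts->count - i) * sizeof(ts->pos[0]));
    ts->pos[i] = col;
    ts->count++;
    return true;
}
```

Setting the same tab stop repeatedly then leaves the list length
unchanged, so iterating it stays cheap.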
We’re already switching on the next VT input byte in the state
machine; no need to if...else if in action_param() too.
That is, split up action_param() into three:
* action_param_new()
* action_param_new_subparam()
* action_param()
This makes the code cleaner, and hopefully slightly faster.
Next, to improve performance further, only check for (sub)parameter
overflow in action_param_new() and action_param_new_subparam().
Add pointers to the VT struct that point to the currently active
parameter and sub-parameter.
When the number of parameters (or sub-parameters) overflows, warn, and
then point the parameter pointer to a "dummy" value in the VT struct.
This way, we don’t have to check anything in action_param().
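The pointer trick can be sketched roughly like this. This is a
simplified illustration, not the real VT struct: sub-parameters and the
warning are left out, and the struct, the limit and the function names
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARAMS 4  /* deliberately small for the example */

struct params {
    unsigned value[MAX_PARAMS];
    size_t count;
    unsigned dummy;  /* sink for digits of parameters past the limit */
    unsigned *cur;   /* currently active parameter */
};

/* Start a new parameter; the only place that checks for overflow. */
static void
param_new(struct params *p)
{
    if (p->count < MAX_PARAMS)
        p->cur = &p->value[p->count++];
    else
        p->cur = &p->dummy;  /* a real implementation would warn here */
    *p->cur = 0;
}

/* Hot path: accumulate one digit, no bounds checks needed. */
static void
param_digit(struct params *p, char c)
{
    assert(p->cur != NULL && c >= '0' && c <= '9');
    *p->cur = *p->cur * 10 + (unsigned)(c - '0');
}

/* Tiny driver: parse a "1;22;333" style parameter string. */
static void
params_parse(struct params *p, const char *s)
{
    p->count = 0;
    param_new(p);
    for (; *s != '\0'; s++) {
        if (*s == ';')
            param_new(p);
        else
            param_digit(p, *s);
    }
}
```

Digits belonging to excess parameters land in the dummy value and never
touch the real parameter array.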
* Don’t assume 32 bits when rotating the old key. Use the number of
actual bits available, as determined by CELL_COMB_CHARS_{HI,LO}
* Multiply with magic hash constant
This greatly reduces the number of collisions seen. For example, the
Emoji test file (from the Unicode specification), now has zero
collisions.
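Roughly, the key update could look like the sketch below. KEY_BITS
stands in for the number of bits spanned by CELL_COMB_CHARS_{HI,LO};
the value 30 is an assumption for this example, as is the 8-bit
rotation. All arithmetic is done on uint32_t, where wrap-around on
multiplication is well defined.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_BITS 30u  /* assumed width of the key space */
#define KEY_MASK ((UINT32_C(1) << KEY_BITS) - 1)

static uint32_t
chain_key(uint32_t old_key, uint32_t wc)
{
    assert(old_key <= KEY_MASK);

    /* Rotate the old key within KEY_BITS bits, not within 32 */
    uint32_t key = ((old_key << 8) | (old_key >> (KEY_BITS - 8))) & KEY_MASK;

    /* Mix in the new codepoint */
    key ^= wc;

    /* Multiply with the magic hash constant (unsigned, wraps mod 2^32) */
    key *= UINT32_C(2654435761);

    /* Mask, to keep the result inside the key space */
    return key & KEY_MASK;
}
```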
Composed characters are stored in a tree structure, using a key as
identifier. The key is calculated from the individual characters that
make up the composed character sequence.
Since the address space for keys is limited, collisions may occur. In
this case, we simply increment the key and try again.
It is theoretically possible to saturate the key space, in which case
we’ll get stuck in an endless loop.
Even if the key space isn’t fully saturated, we fairly easily reach a
point where there are so many collisions for each insertion that
performance drops significantly.
Since key space is limited (it’s not like a hash table that we can
grow), our only option is to limit the number of collisions. If we
can’t find a slot within a hard-coded number of collisions, the
character is simply dropped.
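A sketch of the bounded probing, using a flat array of "used" flags
instead of the real tree. KEY_SPACE, MAX_COLLISIONS and the function
name are invented for the example; the actual limit is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KEY_SPACE 16u      /* tiny key space, for illustration only */
#define MAX_COLLISIONS 4u  /* hypothetical hard-coded limit */

/* Try 'key', then key+1, ... for at most MAX_COLLISIONS slots.
 * Returns false if no free slot was found; the caller then drops
 * the character instead of looping forever. */
static bool
find_free_key(const bool used[KEY_SPACE], uint32_t key, uint32_t *out)
{
    assert(key < KEY_SPACE);

    for (unsigned tries = 0; tries < MAX_COLLISIONS; tries++) {
        if (!used[key]) {
            *out = key;
            return true;
        }
        key = (key + 1) % KEY_SPACE;  /* collision: increment and retry */
    }
    return false;
}
```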
Fcft no longer uses wchar_t, but plain uint32_t to represent
codepoints.
Since we do a fair amount of string operations in foot, it still makes
sense to use something that actually _is_ a string (or character),
rather than an array of uint32_t.
For this reason, we switch out all wchar_t usage in foot to
char32_t. We also verify, at compile-time, that char32_t uses
UTF-32 (which is what fcft expects).
Unfortunately, there are no string functions for char32_t. To avoid
having to re-implement all wcs*() functions, we add a small wrapper
layer of c32*() functions.
These wrapper functions take char32_t arguments, but then simply call
the corresponding wcs*() function.
For this to work, wcs*() must _also_ be UTF-32 compatible. We can
check for the presence of the __STDC_ISO_10646__ macro. If set,
wchar_t is at least 4 bytes and its internal representation is UTF-32.
FreeBSD does *not* define this macro, because its internal wchar_t
representation depends on the current locale. It _does_ use UTF-32
_if_ the current locale is UTF-8.
Since foot enforces UTF-8, we simply need to check if __FreeBSD__ is
defined.
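The compile-time checks and the wrapper layer could look something
like the following sketch. Only two wrappers are shown; the size check
and the use of __STDC_UTF_32__ for the char32_t side are assumptions
about how the verification is done.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

#if !defined(__STDC_UTF_32__)
 #error "char32_t does not use UTF-32"
#endif

/* FreeBSD does not define __STDC_ISO_10646__, but uses UTF-32 for
 * wchar_t in UTF-8 locales, which is what we enforce. */
#if !defined(__STDC_ISO_10646__) && !defined(__FreeBSD__)
 #error "wchar_t does not use UTF-32"
#endif

_Static_assert(sizeof(wchar_t) == sizeof(char32_t),
               "wchar_t and char32_t must have the same size");

/* Wrappers: take char32_t, forward to the wcs*() equivalent. */
static inline size_t
c32len(const char32_t *s)
{
    assert(s != NULL);
    return wcslen((const wchar_t *)s);
}

static inline int
c32cmp(const char32_t *s1, const char32_t *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return wcscmp((const wchar_t *)s1, (const wchar_t *)s2);
}
```

For example, c32len(U"foot") returns 4.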
Other fcft API changes:
* fcft_glyph_rasterize() -> fcft_codepoint_rasterize()
* font.space_advance has been removed
* ‘tags’ have been removed from fcft_grapheme_rasterize()
* ‘fcft_log_init()’ removed
* ‘fcft_init()’ and ‘fcft_fini()’ must be explicitly called
All emoji graphemes are double-width. Foot doesn’t support non-latin
scripts. Ergo, this should result in the Right Thing, even though
we’re not doing it the Right Way.
Note that we’re now breaking cursor synchronization with nearly all
applications. But the way I see it, the applications need to be
updated.
The previous implementation stored compose chains in a dynamically
allocated array. Adding a chain was easy: resize the array and append
the new chain at the end. Looking up a compose chain given a compose
chain key/index was also easy: just index into the array.
However, searching for a pre-existing chain given a codepoint sequence
was very slow. Since the array wasn’t sorted, we typically had to scan
through the entire array, just to realize that there is no
pre-existing chain, and that we need to add a new one.
Since this happens for *each* codepoint in a grapheme cluster, things
quickly became really slow.
Things were ok:ish as long as the compose chain struct was small, as
that made it possible to hold all the chains in the cache. Once the
number of chains reached a certain point, or when we were forced to
bump the maximum number of allowed codepoints in a chain, we started
thrashing the cache and things got much, much worse.
So what can we do?
We can’t sort the array, because
a) that would invalidate all existing chain keys in the grid (and
iterating the entire scrollback and updating compose keys is *not* an
option).
b) inserting a chain becomes slow as we need to first find _where_ to
insert it, and then memmove() the rest of the array.
This patch uses a binary search tree to store the chains instead of a
simple array.
The tree is sorted on a “key”, which is the XOR of all codepoints,
truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range.
The grid now stores CELL_COMB_CHARS_LO+key, instead of
CELL_COMB_CHARS_LO+index.
Since the key is truncated, collisions may occur. This is handled by
incrementing the key by 1.
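As a sketch, the key computation amounts to the following. KEY_MASK
stands in for CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO, and its value here
is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_MASK UINT32_C(0x3fffffff)  /* assumed size of the key range */

/* XOR all codepoints together, then truncate to the key range. */
static uint32_t
xor_key(const uint32_t *chars, size_t count)
{
    assert(chars != NULL || count == 0);

    uint32_t key = 0;
    for (size_t i = 0; i < count; i++)
        key ^= chars[i];
    return key & KEY_MASK;
}
```

Note that XOR is order-independent, so two sequences containing the
same codepoints in a different order yield the same key. That is one
source of collisions, besides the truncation.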
Lookup is of course slower than before, O(log n) instead of
O(1).
Insertion is slightly slower as well: technically it’s O(log n)
instead of O(1). However, we also need to take into account that
re-allocating the array would occasionally force a full copy of the
array when it could not simply be grown in place.
But finding a pre-existing chain is now *much* faster: O(log n)
instead of O(n). In most cases, the first lookup will either
succeed (return a true match), or fail (return NULL). However, since
key collisions are possible, it may also return false matches. This
means we need to verify the contents of the chain before deciding to
use it instead of inserting a new chain. But remember that this
comparison was being done for each and every chain in the previous
implementation.
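The lookup-then-verify logic can be sketched as below. This uses a
plain, unbalanced binary search tree and a fixed-size chars array to
keep the example short; the struct layout and the function names are
made up, and the loop is unbounded, just like the scheme described
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_MASK UINT32_C(0x3fffffff)  /* assumed size of the key range */
#define MAX_CHARS 4

struct chain {
    uint32_t key;
    size_t count;
    uint32_t chars[MAX_CHARS];
    struct chain *left, *right;
};

/* Returns the link where a node with 'key' is, or would be, attached. */
static struct chain **
chain_slot(struct chain **root, uint32_t key)
{
    struct chain **link = root;
    while (*link != NULL && (*link)->key != key)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    return link;
}

static struct chain *
chain_get_or_insert(struct chain **root, uint32_t key,
                    const uint32_t *chars, size_t count)
{
    assert(count <= MAX_CHARS);

    for (;;) {
        struct chain **link = chain_slot(root, key);

        if (*link == NULL) {
            /* Lookup failed: the key is free, insert a new chain */
            struct chain *c = calloc(1, sizeof(*c));
            if (c == NULL)
                return NULL;
            c->key = key;
            c->count = count;
            memcpy(c->chars, chars, count * sizeof(chars[0]));
            *link = c;
            return c;
        }

        /* The key exists: verify the contents before using it */
        if ((*link)->count == count &&
            memcmp((*link)->chars, chars, count * sizeof(chars[0])) == 0)
            return *link;  /* true match */

        key = (key + 1) & KEY_MASK;  /* false match: try the next key */
    }
}
```

The contents comparison now only runs for chains whose key matched,
instead of for every chain.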
With lookups being much faster, and in particular, no longer requiring
us to check the chain contents for every single chain, we can now use
a dynamically allocated ‘chars’ array in the chain. This was
previously a hardcoded array of 10 chars.
Using a dynamically allocated array means looking in the array is
slower, since we now need two loads: one to load the pointer, and a
second to load _from_ the pointer.
As a result, the base size of a compose chain (i.e. an “empty” chain)
has now been reduced from 48 bytes to 32. A chain with two codepoints
is 40 bytes. This means we can have up to 4 codepoints while still
using less, or the same amount of, memory as before.
Furthermore, the Unicode random test (i.e. write random “unicode”
chars) is now **faster** than current master (i.e. before text-shaping
support was added), **with** text-shaping enabled. With text-shaping
disabled, we’re _even_ faster.
When checking if we already have a compose chain for the current
sequence of characters, don’t search the list from the beginning,
unless we have to.
Taking the following things into consideration:
* New compose chains are always appended at the end of the list
* If the current sequence is 3 or more characters, it *must* consist
of an existing compose chain, plus the new character.
Thus, when searching, start at index 0 if we only have two characters,
since then the base cell originally contained a regular base
character, and not a compose chain. I.e. the new chain may be
_anywhere_ in the chain list.
If however we have a sequence of three or more characters, start at
the index the *base* chain was at. If the chain we’re searching for
exists, it *must* have been added *after* the base chain, and thus
it *must* be located *after* the base chain in the chain list.
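In sketch form (the struct and the function names are hypothetical, and
the chain list is modelled as a plain array):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHARS 4

struct chain {
    size_t count;
    uint32_t chars[MAX_CHARS];
};

/* Two characters: the base was a plain character, search everything.
 * Three or more: the base was itself a chain at 'base_idx', and the
 * chain we want can only have been appended after it. */
static size_t
search_start(size_t seq_len, size_t base_idx)
{
    return seq_len >= 3 ? base_idx : 0;
}

/* Linear search from 'start'. Returns the index of the matching
 * chain, or 'n' if there is none. */
static size_t
chain_search(const struct chain *list, size_t n, size_t start,
             const uint32_t *chars, size_t count)
{
    assert(count <= MAX_CHARS);

    for (size_t i = start; i < n; i++) {
        if (list[i].count == count &&
            memcmp(list[i].chars, chars, count * sizeof(chars[0])) == 0)
            return i;
    }
    return n;
}
```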
We already have all the widths needed to calculate the new one; it’s
the base character’s width (base_width), or the previous combining
chain’s width (composed->width), plus the new character’s
width (width).
This commit also renames the term_set_single_shift_ascii_printer()
function to term_single_shift(), since the former is overly verbose
and not really even accurate.
These sequences are supposed to affect the next printable ASCII
character and then reset to the previous character set, but before
this commit they were behaving like locking shifts.
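The difference from a locking shift can be sketched like this. The
struct and the function names are invented for the example; only the
save/restore behaviour is the point.

```c
#include <assert.h>
#include <stdbool.h>

struct charsets {
    int selected;       /* currently active G0..G3 index */
    int saved;          /* selection to restore after a single shift */
    bool single_shift;  /* true while a single shift is pending */
};

/* Single shift: select 'idx' for the next printable character only. */
static void
single_shift(struct charsets *cs, int idx)
{
    assert(idx >= 0 && idx <= 3);
    cs->saved = cs->selected;
    cs->selected = idx;
    cs->single_shift = true;
}

/* Returns the charset index to use for the character being printed,
 * and undoes a pending single shift. */
static int
charset_for_next_char(struct charsets *cs)
{
    int idx = cs->selected;
    if (cs->single_shift) {
        cs->selected = cs->saved;
        cs->single_shift = false;
    }
    return idx;
}
```

A locking shift would skip the restore, leaving the new set selected
for all following characters.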
If the cursor is already at the right edge, our logic that checked for
non-empty cells failed; it didn’t check the current cell.
Fix by initializing ‘emit_tab_char’ to true/false, depending on the
contents of the current cell.
TAB (\t) moves the cursor to the next tab stop. That’s it, according to
the specification.
However, many terminal emulators try to keep tabs in the grid, to be
able to e.g. copy them. That is, copying a text chunk containing tabs
should result in tabs being pasted, not spaces.
In order to do that, we need to print a tab character to the grid. To
improve text reflow of tabs, we also print spaces to the subsequent
cells, up until (but not including) the next tab stop.
However, we can only do this if all the cells between the cursor and
the next tab stop are empty, since (obviously), we cannot overwrite
pre-existing characters.
Finally, while some fonts render tabs as spaces (i.e. an empty glyph),
some use a glyph representing “unprintable” characters, or
similar. Thus, we need to exclude cells with tab characters when
rendering.
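A simplified sketch of the grid side of this, assuming a row is an
array of codepoints where 0 means "empty" (the function name and the
row representation are assumptions; rendering is not shown):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Move the cursor to 'next_stop'. If the current cell and all cells
 * up to (but not including) the next tab stop are empty, also write
 * a tab followed by spaces. Returns true if the tab was written. */
static bool
write_tab(uint32_t *row, int cols, int *cursor, int next_stop)
{
    assert(*cursor >= 0 && *cursor < cols);
    assert(next_stop >= *cursor && next_stop < cols);

    /* Start from the contents of the current cell; this also covers
     * the case where the cursor is already at the right edge. */
    bool emit_tab_char = row[*cursor] == 0;

    for (int i = *cursor + 1; i < next_stop && emit_tab_char; i++)
        emit_tab_char = row[i] == 0;

    if (emit_tab_char) {
        row[*cursor] = '\t';
        for (int i = *cursor + 1; i < next_stop; i++)
            row[i] = ' ';
    }

    *cursor = next_stop;
    return emit_tab_char;
}
```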
Only the first character in the chain was being compared with `priv`
and the rest were just being evaluated as simple expressions. This
was causing the G2 and G3 operations to erroneously use the G1 index.
Since the characters are a contiguous range, we can just subtract the
start of the range to get the appropriate index. The outer switch
statement already ensures the values are in range.
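To illustrate, assuming the characters in question are the designator
intermediates '(' ')' '*' '+' (0x28..0x2B) selecting G0..G3, and
assuming the old code was a ternary chain of roughly the shape shown in
the comment:

```c
#include <assert.h>

/* A broken chain of the form
 *
 *     priv == '(' ? 0 : ')' ? 1 : '*' ? 2 : 3
 *
 * only compares the first character. ')' on its own is a non-zero
 * constant, i.e. always true, so everything except '(' yields 1.
 *
 * Since the characters form a contiguous range, subtracting the
 * start of the range gives the index directly. */
static int
charset_index(char priv)
{
    assert(priv >= '(' && priv <= '+');  /* guaranteed by the caller */
    return priv - '(';
}
```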
These are part of the "anywhere" state in Paul Flo Williams' VT parser
state diagram[1]. That means that they should be accepted *anywhere* in
a byte sequence, including in the middle of other sequences or even in
the middle of a multi-byte UTF-8 sequence. Adhering to this requirement
makes them incompatible with the use of UTF-8 as a universal encoding.
Not adhering to the aforementioned requirement by making a special case
for UTF-8 sequences may seem tempting, but it's much more at odds with
the relevant standards[2] than it appears on the surface. UTF-8 is not
an "8-bit code", at least not according to the parlance of ECMA-43, nor
does it map the C1 control range in a compatible way.
[1]: https://vt100.net/emu/dec_ansi_parser
[2]: ECMA-35, ECMA-43, ECMA-48
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.
So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.
A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.
In other words, the last spacer is always CELL_SPACER+1.
CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
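The encoding can be sketched as follows. The numeric value of
CELL_SPACER is a placeholder, and write_wide() is a hypothetical
helper, not an existing function.

```c
#include <assert.h>
#include <stdint.h>

#define CELL_SPACER UINT32_C(0x40000000)  /* placeholder value */

/* Write a 'width'-cell character at 'col': the character itself,
 * followed by spacers counting down to CELL_SPACER+1. */
static void
write_wide(uint32_t *cells, int col, uint32_t wc, int width)
{
    assert(width >= 1);

    cells[col] = wc;
    for (int i = 1; i < width; i++)
        cells[col + i] = CELL_SPACER + (uint32_t)(width - i);
}
```

Given any spacer cell, the value minus CELL_SPACER tells how many
spacer cells remain, including the cell itself.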
Foot currently does reverse-wrapping (‘auto_left_margin’, or ’bw’) on
everything that calls ‘term_cursor_left()’. This is wrong; it should
only be done for cub1. From man terminfo:
auto_left_margin | bw | bw | cub1 wraps from column 0 to last
column
This patch moves the reverse-wrapping logic from term_cursor_left() to
the handling of BS (backspace).
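In sketch form, with a made-up cursor struct and function names, the
split looks like this:

```c
#include <assert.h>
#include <stdbool.h>

struct cursor {
    int row, col;
};

/* Plain cursor-left: clamps at column 0, never wraps. */
static void
cursor_left(struct cursor *c, int count)
{
    assert(count >= 0);
    c->col = c->col > count ? c->col - count : 0;
}

/* BS: the only place where reverse-wrapping ('bw') applies. */
static void
backspace(struct cursor *c, int cols, bool reverse_wrap)
{
    if (c->col == 0 && reverse_wrap && c->row > 0) {
        c->row--;
        c->col = cols - 1;  /* wrap from column 0 to the last column */
    } else
        cursor_left(c, 1);
}
```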
Closes #441
term_print() is called whenever the client application “prints”
something to the grid. It is called for both ASCII and UTF-8
characters, and needs to handle sixels, insert mode and ASCII
vs. graphical charsets.
Since it’s on the hot path, this becomes unnecessarily slow.
This patch adds a “fast” version of term_print(), tailored for the
common case: ASCII characters in non-insert mode, without any sixels
and non-graphical charsets.
A new function, term_update_ascii_printer(), has been added, and must
be called whenever:
* The currently selected charset *index* changes
* The currently selected charset changes (from ASCII to graphical, or
vice versa)
* Sixels are added to the grid
* Sixels are removed from the grid
* Insert mode is enabled/disabled
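One way to structure this, shown as a sketch, is a function pointer
that is re-selected whenever one of the conditions above changes. The
struct fields, the two printer functions and the counters are
placeholders for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct term;
typedef void (*printer_fn)(struct term *term, uint32_t wc);

struct term {
    bool insert_mode;
    bool graphical_charset;  /* selected charset is graphical */
    size_t sixel_count;      /* number of sixels in the grid */
    printer_fn ascii_printer;
    unsigned fast_prints, slow_prints;  /* counters for the demo only */
};

/* Generic path: handles sixels, insert mode and graphical charsets */
static void
print_generic(struct term *term, uint32_t wc)
{
    (void)wc;
    term->slow_prints++;
}

/* Fast path: plain ASCII, none of the special cases */
static void
print_fast(struct term *term, uint32_t wc)
{
    (void)wc;
    term->fast_prints++;
}

/* Must be called whenever any of the inputs below change. */
static void
update_ascii_printer(struct term *term)
{
    assert(term != NULL);

    bool fast = !term->insert_mode &&
                !term->graphical_charset &&
                term->sixel_count == 0;
    term->ascii_printer = fast ? &print_fast : &print_generic;
}
```

The hot path then just calls term->ascii_printer(term, wc) without
re-checking any of the conditions.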
action_print() is in the hot path, and having an if-statement here
*does* have an impact on performance.
Much more so when that if-statement involves a function call to
wcwidth().
Closes #330