This commit also renames the term_set_single_shift_ascii_printer()
function to term_single_shift(), since the former is overly verbose
and not really even accurate.
These sequences are supposed to affect the next printable ASCII
character and then reset to the previous character set, but before
this commit they were behaving like locking shifts.
If the cursor is already at the right edge, our logic that checked for
non-empty cells failed; it didn’t check the current cell.
Fix by initializing ‘emit_tab_char’ to true/false, depending on the
contents of the current cell.
TAB (\t) move the cursor to the next tab stop. That’s it, according to
the specification.
However, many terminal emulators try to keep tabs in the grid, to be
able to e.g. copy them. That is, copying a text chunk containing tabs
should result in tabs being pasted, not spaces.
In order to do that, we need to print a tab character to the grid. To
improve text reflow of tabs, we also print spaces to the subsequent
cells, up until (but not including) the next tab stop.
However, we can only do this if all the cells between the cursor and
the next tab stop are empty, since (obviously), we cannot overwrite
pre-existing characters.
Finally, while some fonts render tabs as spaces (i.e. an empty glyph),
some use a glyph representing “unprintable” characters, or
similar. Thus, we need to exclude cells with tab characters when
rendering.
Only the first character in the chain was being compared with `priv`
and the rest were just being evaluated as simple expressions. This
was causing the G2 and G3 operations to erroneously use the G1 index.
Since the characters are a contiguous range, we can just subtract the
start of the range to get the appropriate index. The outer switch
statement already ensures the values are in range.
These are part of the "anywhere" state in Paul Flo Williams' VT parser
state diagram[1]. That means that they should be accepted *anywhere* in
a byte sequence, including in the middle of other sequences or even in
the middle of a multi-byte UTF-8 sequence. Adhering to this requirement
makes them incompatible with the use of UTF-8 as a universal encoding.
Not adhering to the aforementioned requirement by making a special case
for UTF-8 sequences may seem tempting, but it's much more at odds with
the relevant standards[2] than it appears on the surface. UTF-8 is not
an "8-bit code", at least not according to the parlance of ECMA-43, nor
does it map the C1 control range in a compatible way.
[1]: https://vt100.net/emu/dec_ansi_parser
[2]: ECMA-35, ECMA-43, ECMA-48
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.
So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.
A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.
In other words, the last spacer is always CELL_SPACER+1.
CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
Foot currently does reverse-wrapping (‘auto_left_margin’, or ’bw’) on
everything that calls ‘term_cursor_left()’. This is wrong; it should
only be done for cub1. From man terminfo:
auto_left_margin | bw | bw | cub1 wraps from column 0 to last
column
This patch moves the reverse-wrapping logic from term_cursor_left() to
the handling of BS (backspace).
Closes#441
term_print() is called whenever the client application “prints”
something to the grid. It is called for both ASCII and UTF-8
characters, and needs to handle sixels, insert mode and ASCII
vs. graphical charsets.
Since it’s on the hot path, this becomes unnecessarily slow.
This patch adds a “fast” version of term_print(), tailored for the
common case: ASCII characters in non-insert mode, without any sixels
and non-graphical charsets.
A new function, term_update_ascii_printer(), has been added, and must
be called whenever:
* The currently selected charset *index* changes
* The currently selected charset changes (from ASCII to graphical, or
vice verse)
* Sixels are added to the grid
* Sixels are removed from the grid
* Insert mode is enabled/disabled
action_print() is in the hot path, and having if-statement here *does*
have an impact on performance.
Much more so when that if-statement involves a functional call to
wcwidth().
Closes#330
Take ‘\E(#0’ for example - this is *not* the same as ‘\E(0’.
Up until now, foot has however treated them as the same escape,
because the handler for ‘\E(0’ didn’t verify there weren’t any _other_
private characters present.
Fix this by turning the ‘private’ array into a single 4-byte
integer. This allows us to match *all* privates with a single
comparison.
Private characters are added to the LSB first, and MSB last. This
means we can check for single privates in pretty much the same way as
before:
switch (term->vt.private) {
case ‘?’:
...
break;
}
Checking for two (or more) is much uglier, but foot only supports
a *single* escape with two privates, and no escapes with three or
more:
switch (term->vt.private) {
case 0x243f: /* ‘?$’ */
...
break;
}
The ‘clear’ action remains simple (and fast), with a single write
operation.
Collecting privates is potentially _slightly_ more complex than
before; we now need mask and compare, instead of simply comparing,
when checking how many privates we already have.
We _could_ add a counter, which would make collecting privates easier,
but this would add an additional write to the ‘clean’ action which is
really bad since it’s in the hot path.
Add anew config option, ‘bell=none|set-urgency’. When set to
‘set-urgency’, the margins will be painted in red (if the window did
not have keyboard focus).
This is intended as a cheap replacement for the ‘urgency’ hint, that
doesn’t (yet) exist on Wayland.
Closes#157
The switch statements use the GCC extension "case X ... Y", and here
it doesn't really make any sense to convert it to "case X: case Y:",
so hide the warnings instead.
Previously, C0::VT was implemented as a simple 'cursor down'. I.e. it
would behave as LF **until** it reached the bottom of the screen,
where instead of scrolling, it became a no-op.
See https://vt100.net/docs/vt102-ug/chapter5.html
This is slightly faster, since we don't need to initialize an
mbstate_t struct (using mbrtowc() with a NULL-pointer for 'ps' also
works).
Also, avoid a branch by setting wc=0 and then ignoring the
result/error code from mbtowc().
Since the pre-composing functionality is now part of fcft, it makes
little sense to have a compile time option - there's no size benefit
to be had.
Furthermore, virtually all terminal emulators do
pre-composing (alacritty being an exception), this really isn't that
controversial.
This allows us more options when determining whether to use a
pre-composed character or not:
We now only use the pre-composed character if it's from the primary
font, or if at least one of the base or combining characters are from
a fallback font.
I.e. use glyphs from the primary font if possible. But, if one or more
of the decomposed glyphs are from a fallback font, use the
pre-composed character anyway.
We currently store up to 5 combining characters in any given
base+combining chain.
This adds a check for when that limit is about to be exceeded. When
this happens, we log the chain + the new combining character.
Since things will break anyway, we simply overwrite the last combining
character.
Instead of storing combining data per cell, realize that most
combinations are re-occurring and that there's lots of available space
left in the unicode range, and store seen base+combining combinations
chains in a per-terminal array.
When we encounter a combining character, we first try to pre-compose,
like before. If that fails, we then search for the current
base+combining combo in the list of previously seen combinations. If
not found there either, we allocate a new combo and add it to the
list. Regardless, the result is an index into this array. We store
this index, offsetted by COMB_CHARS_LO=0x40000000ul in the cell.
When rendering, we need to check if the cell character is a plain
character, or if it's a composed character (identified by checking if
the cell character is >= COMB_CHARS_LO).
Then we render the grapheme pretty much like before.