Specifically, make sure we do *not* call wl_data_offer_receive() with
a NULL mime-type, as this causes libwayland to error out, which in
turn causes foot to exit.
Closes#1092
This syncs foot with more recent versions of XTerm, where it’s
“disallowedPasteControls” resource has changed its default value to
BS,DEL,ENQ,EOT,ESC,NULL
Note that we’re already stripping out ENQ,EOT,ESC in all modes.
What does it mean for foot:
* HT, VT and FF are now allowed, regardless of paste mode
* NUL is now stripped in non-bracketed paste mode
Closes#1084
When true, selection_find_word_boundary_right() behaves as before - it
stops as soon as it encounters a character that isn’t of the
same *type* as the “initial” character (the last character in the
selection).
Take this, for example:
The Quick Brown Fox
The selection will first stop at the end of “the”, then just *before*
“quick”, then at the end of “quick”. Then just *before* “brown”, and
then at the end of “brown”, and so on.
This suits mouse selections pretty good. But when
selection_find_word_boundary_right() is used to extend a search match,
it’s better to ignore space-to-word character transitions. That is, we
want
The Quick Brown Fox
to first extend to the end of “the”, then immediately to the end of
“quick”, then to the end of “brown”, and so on.
Setting the ‘stop_to_space_to_word_boundary’ argument to false results
in latter behavior.
This is now done by search, when executing the
“extend-to-word-boundary” and “extend-to-next-whitespace” key
bindings.
Internally, selection coordinates are *unbounded* (that is, the row
numbers may be larger than grid->num_rows) while a selection is
ongoing. Only after it has been finalized are the coordinates bounded.
This means it isn’t safe to use term->selection.coords.* directly.
In non-bracketed paste mode, we translate \n to \r, and \r\n to
\r. The latter matches XTerm, urxvt, alacritty and kitty. The former
matches alacritty and kitty (xterm and urxvt just blindly replaces all
\n occurrences with \r, meaning \r\n is translated to \r\r.
For some reason, we then unconditionally translated all \r back to \n,
regardless of whether bracketed paste was enabled or not. Unsure
why/where this comes from, but it doesn't match any of the other
terminal emulators I tested.
One example where this caused issues is in older versions of nano (at
least up to 2.9).
Closes#980
When iterating the characters in a selection, we want to go from the
“start” to the “end”, where start is the upper left-most character,
and “end” is the lower right-most character.
There are two things to consider:
* The ‘start’ coordinate may actually sort after the ‘end’
coordinate (user selected from bottom of the window and upward)
* The scrollback wraparound.
What we do is calculate both the star and end coordinates’
scrollback-start relative row numbers. That is, the number of rows
from the beginning of the scrollback. So if the very first
row (i.e. the oldest) in the scrollback is selected, that has the
scrollback-start relative number “0”.
Then we loop from whichever (start or end coordinate) is highest up in
the scrollback, to the “other” coordinate.
Fcft no longer uses wchar_t, but plain uint32_t to represent
codepoints.
Since we do a fair amount of string operations in foot, it still makes
sense to use something that actually _is_ a string (or character),
rather than an array of uint32_t.
For this reason, we switch out all wchar_t usage in foot to
char32_t. We also verify, at compile-time, that char32_t used
UTF-32 (which is what fcft expects).
Unfortunately, there are no string functions for char32_t. To avoid
having to re-implement all wcs*() functions, we add a small wrapper
layer of c32*() functions.
These wrapper functions take char32_t arguments, but then simply call
the corresponding wcs*() function.
For this to work, wcs*() must _also_ be UTF-32 compatible. We can
check for the presence of the __STDC_ISO_10646__ macro. If set,
wchar_t is at least 4 bytes and its internal representation is UTF-32.
FreeBSD does *not* define this macro, because its internal wchar_t
representation depends on the current locale. It _does_ use UTF-32
_if_ the current locale is UTF-8.
Since foot enforces UTF-8, we simply need to check if __FreeBSD__ is
defined.
Other fcft API changes:
* fcft_glyph_rasterize() -> fcft_codepoint_rasterize()
* font.space_advance has been removed
* ‘tags’ have been removed from fcft_grapheme_rasterize()
* ‘fcft_log_init()’ removed
* ‘fcft_init()’ and ‘fcft_fini()’ must be explicitly called
This fixes an issue where pasting (using e.g. OSC-52) in client
applications that doesn’t do this conversion themselves, like tmux,
doesn’t work.
Closes#752
Previously, soft-wrapped lines were not selected correctly, as the
selection logic was hardcoded to simply select everything between the
first and last column on the current terminal row.
Now, we scan backward and forward, looking for hard-wrapped
lines. This is similar to how word-based selection works.
Closes#726
When updating the selection (i.e when changing it - adding or removing
cells to the selection), we need to do two things:
* Unset the ‘selected’ bit on all cells that are no longer selected.
* Set the ‘selected’ bit on all cells that *are* selected.
Since it’s quite tricky to calculate the difference between the “old”
and “new” selection, this is done by first un-selecting the old
selection, and then selecting the new, updated selection. I.e. first
we clear the ‘selected’ bit from *all* cells, and then we re-set it on
those cells that are still selected.
This process also dirties the cells, to make sure they are
re-rendered (needed to reflect their new selected/un-selected status).
To avoid dirtying *all* previously selected, and newly selected cells,
we have used an algorithm that first runs a “pre-pass”, marking all
cells that *will* be selected as such. The un-select pass would then
skip (no dirty) cells that have been marked by the pre-pass. Finally,
the select pass would only dirty cells that have *not* been marked by
the pre-pass.
In short, we only dirty cells whose selection state have *changed*.
To do this, we used a second ‘selected’ bit in the cell attribute
struct.
Those bits are *scarce*.
This patch implements an alternative algorithm, that frees up one of
the two ‘selected’ bits.
This is done by lazy allocating a bitmask for the entire grid. The
pre-pass sets bits in the bitmask. Thus, after the pre-pass, the
bitmask has set bits for all cells that *will* be selected.
The un-select pass simply skips cells with a one-bit in the
bitmask. Cells without a one-bit in the bitmask are dirtied, and their
‘selected’ bit is cleared.
The select-pass doesn’t even have to look at the bitmask - if the cell
already has its ‘selected’ bit set, it does nothing. Otherwise it sets
it and dirties the cell.
The bitmask is implemented as an array of arrays of 64-bit
integers. Each outer element represents one row. These pointers are
calloc():ed before starting the pre-pass.
The pre-pass allocates the inner arrays on demand.
The unselect pass is designed to handle both the complete absence of a
bitmask, as well as row entries being NULL (both means the cell
is *not* pre-marked, and will thus be dirtied).
While we’re in scrollback search mode, the selection may be
cancelled (for example, if the application is scrolling out the
selected text). Trying to e.g. extend the search selection after this
has happened triggered a crash.
This fixes it by simply resetting the search match state when the
selection is cancelled.
Closes#644
This breaks out the scrollback erasing logic for \E[3J from csi.c, and
moves it to the new function term_erase_scrollback(), and changes the
logic to calculate the start and end row (absolute) numbers of the
scrollback, and only iterate those, instead of iterating *all* rows,
filtering out those that are on-screen.
It also adds an intersection range check of the selection range, and
cancels the selection if it touches any of the deleted scrollback
rows.
This fixes a crash when trying to render the next frame, since the
selection now references rows that have been freed.
Closes#633
The previous implementation stored compose chains in a dynamically
allocated array. Adding a chain was easy: resize the array and append
the new chain at the end. Looking up a compose chain given a compose
chain key/index was also easy: just index into the array.
However, searching for a pre-existing chain given a codepoint sequence
was very slow. Since the array wasn’t sorted, we typically had to scan
through the entire array, just to realize that there is no
pre-existing chain, and that we need to add a new one.
Since this happens for *each* codepoint in a grapheme cluster, things
quickly became really slow.
Things were ok:ish as long as the compose chain struct was small, as
that made it possible to hold all the chains in the cache. Once the
number of chains reached a certain point, or when we were forced to
bump maximum number of allowed codepoints in a chain, we started
thrashing the cache and things got much much worse.
So what can we do?
We can’t sort the array, because
a) that would invalidate all existing chain keys in the grid (and
iterating the entire scrollback and updating compose keys is *not* an
option).
b) inserting a chain becomes slow as we need to first find _where_ to
insert it, and then memmove() the rest of the array.
This patch uses a binary search tree to store the chains instead of a
simple array.
The tree is sorted on a “key”, which is the XOR of all codepoints,
truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range.
The grid now stores CELL_COMB_CHARS_LO+key, instead of
CELL_COMB_CHARS_LO+index.
Since the key is truncated, collisions may occur. This is handled by
incrementing the key by 1.
Lookup is of course slower than before, O(log n) instead of
O(1).
Insertion is slightly slower as well: technically it’s O(log n)
instead of O(1). However, we also need to take into account the
re-allocating the array will occasionally force a full copy of the
array when it cannot simply be growed.
But finding a pre-existing chain is now *much* faster: O(log n)
instead of O(n). In most cases, the first lookup will either
succeed (return a true match), or fail (return NULL). However, since
key collisions are possible, it may also return false matches. This
means we need to verify the contents of the chain before deciding to
use it instead of inserting a new chain. But remember that this
comparison was being done for each and every chain in the previous
implementation.
With lookups being much faster, and in particular, no longer requiring
us to check the chain contents for every singlec chain, we can now use
a dynamically allocated ‘chars’ array in the chain. This was
previously a hardcoded array of 10 chars.
Using a dynamic allocated array means looking in the array is slower,
since we now need two loads: one to load the pointer, and a second to
load _from_ the pointer.
As a result, the base size of a compose chain (i.e. an “empty” chain)
has now been reduced from 48 bytes to 32. A chain with two codepoints
is 40 bytes. This means we have up to 4 codepoints while still using
less, or the same amount, of memory as before.
Furthermore, the Unicode random test (i.e. write random “unicode”
chars) is now **faster** than current master (i.e. before text-shaping
support was added), **with** test-shaping enabled. With text-shaping
disabled, we’re _even_ faster.
When marking and unmarking cells, we don’t highlight trailing empty
cells. We do however highlight empty cells if they are followed by
non-empty cells.
I think this was an intentional choice. If one row ended with trailing
empty cells, but *no* hard linebreak, then we’d continue on the next
row, and emit all the empty cells once we hit a non-emtpy cell on the
second row.
But this is something that shouldn’t happen in any real-world use
cases.
Double-clicking on a word in the left or right margin, would line-wrap
the selection if there was a non-empty cell in the corresponding
right/left margin on the prev/next line. Regardless of whether there
was a hard linebreak or not.
Script to reprouce:
!/bin/bash
cols=$(tput cols)
printf "%*coo\nbar\n" $((${cols} - 2)) f
Run, then double click either “foo” or “bar”. Neither should select
the other part.
Closes#565
extract_finish() returns the extracted text in UTF-8, while
extract_finish_wide() returns the extracted text in Unicode.
This patch also adds a new argument to extract_finish{,_wide},
that when set to true, skips stripping trailing empty cells.
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.
So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.
A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.
In other words, the last spacer is always CELL_SPACER+1.
CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
It’s ok to let the receiving end handle formatting C0 control
characters in bracketed paste mode.
In fact, we *must* let them through. Otherwise it is impossible to
paste e.g. tabs into editors and similar applications.
Before passing the pasted text to the decoder, we now replace \r\n,
and \n, with \r.
The URI decoder was looking for a \n, which meant we failed to split
up the list and instead pasted a single “multi-line” URI.
When ‘selection-target’ is set to ‘none’, selecting text does not copy
the text to _any_ clipboard.
This patch also refactors the value parsing to be data driven.