Previously, soft-wrapped lines were not selected correctly, as the
selection logic was hardcoded to simply select everything between the
first and last column on the current terminal row.
Now, we scan backward and forward, looking for hard-wrapped
lines. This is similar to how word-based selection works.
Closes#726
When updating the selection (i.e when changing it - adding or removing
cells to the selection), we need to do two things:
* Unset the ‘selected’ bit on all cells that are no longer selected.
* Set the ‘selected’ bit on all cells that *are* selected.
Since it’s quite tricky to calculate the difference between the “old”
and “new” selection, this is done by first un-selecting the old
selection, and then selecting the new, updated selection. I.e. first
we clear the ‘selected’ bit from *all* cells, and then we re-set it on
those cells that are still selected.
This process also dirties the cells, to make sure they are
re-rendered (needed to reflect their new selected/un-selected status).
To avoid dirtying *all* previously selected, and newly selected cells,
we have used an algorithm that first runs a “pre-pass”, marking all
cells that *will* be selected as such. The un-select pass would then
skip (no dirty) cells that have been marked by the pre-pass. Finally,
the select pass would only dirty cells that have *not* been marked by
the pre-pass.
In short, we only dirty cells whose selection state have *changed*.
To do this, we used a second ‘selected’ bit in the cell attribute
struct.
Those bits are *scarce*.
This patch implements an alternative algorithm, that frees up one of
the two ‘selected’ bits.
This is done by lazy allocating a bitmask for the entire grid. The
pre-pass sets bits in the bitmask. Thus, after the pre-pass, the
bitmask has set bits for all cells that *will* be selected.
The un-select pass simply skips cells with a one-bit in the
bitmask. Cells without a one-bit in the bitmask are dirtied, and their
‘selected’ bit is cleared.
The select-pass doesn’t even have to look at the bitmask - if the cell
already has its ‘selected’ bit set, it does nothing. Otherwise it sets
it and dirties the cell.
The bitmask is implemented as an array of arrays of 64-bit
integers. Each outer element represents one row. These pointers are
calloc():ed before starting the pre-pass.
The pre-pass allocates the inner arrays on demand.
The unselect pass is designed to handle both the complete absence of a
bitmask, as well as row entries being NULL (both means the cell
is *not* pre-marked, and will thus be dirtied).
While we’re in scrollback search mode, the selection may be
cancelled (for example, if the application is scrolling out the
selected text). Trying to e.g. extend the search selection after this
has happened triggered a crash.
This fixes it by simply resetting the search match state when the
selection is cancelled.
Closes#644
This breaks out the scrollback erasing logic for \E[3J from csi.c, and
moves it to the new function term_erase_scrollback(), and changes the
logic to calculate the start and end row (absolute) numbers of the
scrollback, and only iterate those, instead of iterating *all* rows,
filtering out those that are on-screen.
It also adds an intersection range check of the selection range, and
cancels the selection if it touches any of the deleted scrollback
rows.
This fixes a crash when trying to render the next frame, since the
selection now references rows that have been freed.
Closes#633
The previous implementation stored compose chains in a dynamically
allocated array. Adding a chain was easy: resize the array and append
the new chain at the end. Looking up a compose chain given a compose
chain key/index was also easy: just index into the array.
However, searching for a pre-existing chain given a codepoint sequence
was very slow. Since the array wasn’t sorted, we typically had to scan
through the entire array, just to realize that there is no
pre-existing chain, and that we need to add a new one.
Since this happens for *each* codepoint in a grapheme cluster, things
quickly became really slow.
Things were ok:ish as long as the compose chain struct was small, as
that made it possible to hold all the chains in the cache. Once the
number of chains reached a certain point, or when we were forced to
bump maximum number of allowed codepoints in a chain, we started
thrashing the cache and things got much much worse.
So what can we do?
We can’t sort the array, because
a) that would invalidate all existing chain keys in the grid (and
iterating the entire scrollback and updating compose keys is *not* an
option).
b) inserting a chain becomes slow as we need to first find _where_ to
insert it, and then memmove() the rest of the array.
This patch uses a binary search tree to store the chains instead of a
simple array.
The tree is sorted on a “key”, which is the XOR of all codepoints,
truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range.
The grid now stores CELL_COMB_CHARS_LO+key, instead of
CELL_COMB_CHARS_LO+index.
Since the key is truncated, collisions may occur. This is handled by
incrementing the key by 1.
Lookup is of course slower than before, O(log n) instead of
O(1).
Insertion is slightly slower as well: technically it’s O(log n)
instead of O(1). However, we also need to take into account the
re-allocating the array will occasionally force a full copy of the
array when it cannot simply be growed.
But finding a pre-existing chain is now *much* faster: O(log n)
instead of O(n). In most cases, the first lookup will either
succeed (return a true match), or fail (return NULL). However, since
key collisions are possible, it may also return false matches. This
means we need to verify the contents of the chain before deciding to
use it instead of inserting a new chain. But remember that this
comparison was being done for each and every chain in the previous
implementation.
With lookups being much faster, and in particular, no longer requiring
us to check the chain contents for every singlec chain, we can now use
a dynamically allocated ‘chars’ array in the chain. This was
previously a hardcoded array of 10 chars.
Using a dynamic allocated array means looking in the array is slower,
since we now need two loads: one to load the pointer, and a second to
load _from_ the pointer.
As a result, the base size of a compose chain (i.e. an “empty” chain)
has now been reduced from 48 bytes to 32. A chain with two codepoints
is 40 bytes. This means we have up to 4 codepoints while still using
less, or the same amount, of memory as before.
Furthermore, the Unicode random test (i.e. write random “unicode”
chars) is now **faster** than current master (i.e. before text-shaping
support was added), **with** test-shaping enabled. With text-shaping
disabled, we’re _even_ faster.
When marking and unmarking cells, we don’t highlight trailing empty
cells. We do however highlight empty cells if they are followed by
non-empty cells.
I think this was an intentional choice. If one row ended with trailing
empty cells, but *no* hard linebreak, then we’d continue on the next
row, and emit all the empty cells once we hit a non-emtpy cell on the
second row.
But this is something that shouldn’t happen in any real-world use
cases.
Double-clicking on a word in the left or right margin, would line-wrap
the selection if there was a non-empty cell in the corresponding
right/left margin on the prev/next line. Regardless of whether there
was a hard linebreak or not.
Script to reprouce:
!/bin/bash
cols=$(tput cols)
printf "%*coo\nbar\n" $((${cols} - 2)) f
Run, then double click either “foo” or “bar”. Neither should select
the other part.
Closes#565
extract_finish() returns the extracted text in UTF-8, while
extract_finish_wide() returns the extracted text in Unicode.
This patch also adds a new argument to extract_finish{,_wide},
that when set to true, skips stripping trailing empty cells.
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.
So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.
A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.
In other words, the last spacer is always CELL_SPACER+1.
CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
It’s ok to let the receiving end handle formatting C0 control
characters in bracketed paste mode.
In fact, we *must* let them through. Otherwise it is impossible to
paste e.g. tabs into editors and similar applications.
Before passing the pasted text to the decoder, we now replace \r\n,
and \n, with \r.
The URI decoder was looking for a \n, which meant we failed to split
up the list and instead pasted a single “multi-line” URI.
When ‘selection-target’ is set to ‘none’, selecting text does not copy
the text to _any_ clipboard.
This patch also refactors the value parsing to be data driven.
This fixes an assertion triggered when selecting the upper left cell
and dragging down.
We would end up trying to decrement the pivot end point, hitting an
assertion that only is valid while skipping spacers.
Remove the assertion, and allow pivot points to be moved across line
wraps, but take care not to move outside the visible screen area.
If the initial character is a space, find the next non-space
character.
If the initial character is a delimiter, find the next non-delimiter
character (space, or word character).
If the initial character is neither (i.e, it is a word character),
find the next non-word character.
Extend selection pivoting to allow selections to pivot around a
range.
Use this in word- and row-based selections to pivot around the initial
word/row that was selected.
This mimics the behavior of at least urxvt and xterm.