Commit graph

21 commits

Author SHA1 Message Date
Daniel Eklöf
96605bf52f
extract: number of spaces after the tab shouldn't include the tab cell itself
This fixes an off by one, where we sometimes "ate" an extra space when
extracting contents with tabs. This happened if the tab (and its
subsequent spaces) were followed by an additional space.

Closes #2194
2025-10-11 10:13:10 +02:00
Daniel Eklöf
8356dfac2f
Disable debug logging 2022-04-27 18:44:17 +02:00
Daniel Eklöf
e0227266ca
fcft: adapt to API changes in fcft-3.x
Fcft no longer uses wchar_t, but plain uint32_t to represent
codepoints.

Since we do a fair amount of string operations in foot, it still makes
sense to use something that actually _is_ a string (or character),
rather than an array of uint32_t.

For this reason, we switch out all wchar_t usage in foot to
char32_t. We also verify, at compile-time, that char32_t used
UTF-32 (which is what fcft expects).

Unfortunately, there are no string functions for char32_t. To avoid
having to re-implement all wcs*() functions, we add a small wrapper
layer of c32*() functions.

These wrapper functions take char32_t arguments, but then simply call
the corresponding wcs*() function.

For this to work, wcs*() must _also_ be UTF-32 compatible. We can
check for the presence of the  __STDC_ISO_10646__ macro. If set,
wchar_t is at least 4 bytes and its internal representation is UTF-32.

FreeBSD does *not* define this macro, because its internal wchar_t
representation depends on the current locale. It _does_ use UTF-32
_if_ the current locale is UTF-8.

Since foot enforces UTF-8, we simply need to check if __FreeBSD__ is
defined.

Other fcft API changes:

* fcft_glyph_rasterize() -> fcft_codepoint_rasterize()
* font.space_advance has been removed
* ‘tags’ have been removed from fcft_grapheme_rasterize()
* ‘fcft_log_init()’ removed
* ‘fcft_init()’ and ‘fcft_fini()’ must be explicitly called
2022-02-05 17:00:54 +01:00
Daniel Eklöf
69e2bff8c8
extract: ensure line-based selections are terminated with a newline
Closes #869
2022-02-03 17:58:25 +01:00
Daniel Eklöf
fe8ca23cfe
composed: store compose chains in a binary search tree
The previous implementation stored compose chains in a dynamically
allocated array. Adding a chain was easy: resize the array and append
the new chain at the end. Looking up a compose chain given a compose
chain key/index was also easy: just index into the array.

However, searching for a pre-existing chain given a codepoint sequence
was very slow. Since the array wasn’t sorted, we typically had to scan
through the entire array, just to realize that there is no
pre-existing chain, and that we need to add a new one.

Since this happens for *each* codepoint in a grapheme cluster, things
quickly became really slow.

Things were ok:ish as long as the compose chain struct was small, as
that made it possible to hold all the chains in the cache. Once the
number of chains reached a certain point, or when we were forced to
bump maximum number of allowed codepoints in a chain, we started
thrashing the cache and things got much much worse.

So what can we do?

We can’t sort the array, because

a) that would invalidate all existing chain keys in the grid (and
iterating the entire scrollback and updating compose keys is *not* an
option).

b) inserting a chain becomes slow as we need to first find _where_ to
insert it, and then memmove() the rest of the array.

This patch uses a binary search tree to store the chains instead of a
simple array.

The tree is sorted on a “key”, which is the XOR of all codepoints,
truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range.

The grid now stores CELL_COMB_CHARS_LO+key, instead of
CELL_COMB_CHARS_LO+index.

Since the key is truncated, collisions may occur. This is handled by
incrementing the key by 1.

Lookup is of course slower than before, O(log n) instead of
O(1).

Insertion is slightly slower as well: technically it’s O(log n)
instead of O(1). However, we also need to take into account the
re-allocating the array will occasionally force a full copy of the
array when it cannot simply be growed.

But finding a pre-existing chain is now *much* faster: O(log n)
instead of O(n). In most cases, the first lookup will either
succeed (return a true match), or fail (return NULL). However, since
key collisions are possible, it may also return false matches. This
means we need to verify the contents of the chain before deciding to
use it instead of inserting a new chain. But remember that this
comparison was being done for each and every chain in the previous
implementation.

With lookups being much faster, and in particular, no longer requiring
us to check the chain contents for every singlec chain, we can now use
a dynamically allocated ‘chars’ array in the chain. This was
previously a hardcoded array of 10 chars.

Using a dynamic allocated array means looking in the array is slower,
since we now need two loads: one to load the pointer, and a second to
load _from_ the pointer.

As a result, the base size of a compose chain (i.e. an “empty” chain)
has now been reduced from 48 bytes to 32. A chain with two codepoints
is 40 bytes. This means we have up to 4 codepoints while still using
less, or the same amount, of memory as before.

Furthermore, the Unicode random test (i.e. write random “unicode”
chars) is now **faster** than current master (i.e. before text-shaping
support was added), **with** test-shaping enabled. With text-shaping
disabled, we’re _even_ faster.
2021-06-24 17:30:49 +02:00
Daniel Eklöf
b9ef703eb1
wip: grapheme shaping 2021-06-24 17:30:45 +02:00
Daniel Eklöf
4d56dd430b
extract: consume spaces following a tab
That is, we choose to copy the tab character, and not the spaces it
represents. Most importantly, we don’t copy *both* the tab and the
spaces.
2021-06-08 19:53:26 +02:00
Daniel Eklöf
a6d9f01c0d
extract: move ‘strip_trailing_empty’ parameter from extra_finish() to extract_begin() 2021-05-17 18:14:10 +02:00
Daniel Eklöf
1bc9fd5fe1
extract: add extract_finish_wide(), and optionally skip stripping trailing empty cells
extract_finish() returns the extracted text in UTF-8, while
extract_finish_wide() returns the extracted text in Unicode.

This patch also adds a new argument to extract_finish{,_wide},
that when set to true, skips stripping trailing empty cells.
2021-05-17 18:14:09 +02:00
Daniel Eklöf
d9e1aefb91
term: rename CELL_MULT_COL_SPACER -> CELL_SPACER, and change its definition
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.

So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.

A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.

In other words, the last spacer is always CELL_SPACER+1.

CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
2021-05-14 14:41:02 +02:00
Craig Barnes
e56136ce11 debug: rename assert() to xassert(), to avoid clashing with <assert.h> 2021-01-16 20:16:00 +00:00
Daniel Eklöf
3a9172342f
selection: combine enum selection_kind with selection_semantic 2021-01-06 10:53:27 +01:00
Daniel Eklöf
fd5d68c819
extract: finish: increase ‘idx’ when pushing new data, for consistency
We don’t write anything more to the buffer after this, but this makes
this code consistent with all other code that pushes new data to the
buffer.

This makes it easier to search, and validate, the
ensure_size()+push-data pattern.
2021-01-04 19:48:44 +01:00
Daniel Eklöf
07078da0f0
extract: finish: fix bad assertion - ‘idx’ may be equal to ‘size’
‘idx’ is where _new_ data should be pushed into the buffer. Thus it is
perfectly valid for it to be equal to ‘size’ - it just means we need
to allocate more space before pushing data to it.
2021-01-04 19:48:44 +01:00
Daniel Eklöf
f85cf47b65
extract: only emit newlines if followed by non-empty cells
Closes #97
2020-08-20 19:22:13 +02:00
Craig Barnes
7a77958ba2 Convert most dynamic allocations to use functions from xmalloc.h 2020-08-08 20:37:57 +01:00
Daniel Eklöf
d4ee9be4d7
config: add 'hide-when-typing'
When enabled, the mouse cursor is hidden when the user types in the
terminal. It is un-hidden when the user moves the mouse, or when the
window loses keyboard focus.
2020-07-31 17:09:06 +02:00
Daniel Eklöf
e5401c845c
extract: finish: allocate buffer before writing the terminator
When all cells were empty, we'll have written 0 bytes to the buffer
and hence it will still be un-allocated (i.e. NULL).
2020-07-16 17:46:02 +02:00
Daniel Eklöf
e6acafa118
extract: extract_one() sets a fail flag that extract_finish() reads
This allows us to safely call extract_finish() when extract_one()
failed, and we'll get the expected result; false, indicating
extract_finish() failed.
2020-07-15 11:32:40 +02:00
Daniel Eklöf
2539e3cbb2
extract: extract_one: make arguments const 2020-07-15 11:31:38 +02:00
Daniel Eklöf
aafa120f92
selection: refactor: break out text extraction to a separate file 2020-07-15 11:19:18 +02:00