mirrors/foot

mirror of https://codeberg.org/dnkl/foot.git synced 2026-02-05 04:06:08 -05:00

Author	SHA1	Message	Date
Daniel Eklöf	415ecfc6fa	vt: codespell: bumb -> bump	2021-06-24 17:30:50 +02:00
Daniel Eklöf	fe8ca23cfe	composed: store compose chains in a binary search tree The previous implementation stored compose chains in a dynamically allocated array. Adding a chain was easy: resize the array and append the new chain at the end. Looking up a compose chain given a compose chain key/index was also easy: just index into the array. However, searching for a pre-existing chain given a codepoint sequence was very slow. Since the array wasn’t sorted, we typically had to scan through the entire array, just to realize that there is no pre-existing chain, and that we need to add a new one. Since this happens for each codepoint in a grapheme cluster, things quickly became really slow. Things were ok:ish as long as the compose chain struct was small, as that made it possible to hold all the chains in the cache. Once the number of chains reached a certain point, or when we were forced to bump the maximum number of allowed codepoints in a chain, we started thrashing the cache and things got much much worse. So what can we do? We can’t sort the array, because a) that would invalidate all existing chain keys in the grid (and iterating the entire scrollback and updating compose keys is not an option). b) inserting a chain becomes slow as we need to first find _where_ to insert it, and then memmove() the rest of the array. This patch uses a binary search tree to store the chains instead of a simple array. The tree is sorted on a “key”, which is the XOR of all codepoints, truncated to the CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range. The grid now stores CELL_COMB_CHARS_LO+key, instead of CELL_COMB_CHARS_LO+index. Since the key is truncated, collisions may occur. This is handled by incrementing the key by 1. Lookup is of course slower than before, O(log n) instead of O(1). Insertion is slightly slower as well: technically it’s O(log n) instead of O(1). However, we also need to take into account that re-allocating the array will occasionally force a full copy of the array when it cannot simply be grown. But finding a pre-existing chain is now much faster: O(log n) instead of O(n). In most cases, the first lookup will either succeed (return a true match), or fail (return NULL). However, since key collisions are possible, it may also return false matches. This means we need to verify the contents of the chain before deciding to use it instead of inserting a new chain. But remember that this comparison was being done for each and every chain in the previous implementation. With lookups being much faster, and in particular, no longer requiring us to check the chain contents for every single chain, we can now use a dynamically allocated ‘chars’ array in the chain. This was previously a hardcoded array of 10 chars. Using a dynamically allocated array means looking in the array is slower, since we now need two loads: one to load the pointer, and a second to load _from_ the pointer. As a result, the base size of a compose chain (i.e. an “empty” chain) has now been reduced from 48 bytes to 32. A chain with two codepoints is 40 bytes. This means we have up to 4 codepoints while still using less, or the same amount, of memory as before. Furthermore, the Unicode random test (i.e. write random “unicode” chars) is now faster than current master (i.e. before text-shaping support was added), with text-shaping enabled. With text-shaping disabled, we’re _even_ faster.	2021-06-24 17:30:49 +02:00
Daniel Eklöf	81131e3a87	vt: utf8: don’t scan all previous chains When checking if we already have a compose chain for the current sequence of characters, don’t search the list from the beginning, unless we have to. Taking the following things into consideration: * New compose chains are always appended at the end of the list * If the current sequence is 3 or more characters, it must consist of an existing compose chain, plus the new character. Thus, when searching, start at index 0 if we only have two characters, since then the base cell originally contained a regular base character, and not a compose chain. I.e. the new chain may be _anywhere_ in the chain list. If however we have a sequence of three or more characters, start at the index the base chain was at. If the chain we’re searching for exists, it must have been added after the base chain, and thus it must be located after the base chain in the chain list.	2021-06-24 17:30:48 +02:00
Daniel Eklöf	e81d1845bf	vt: utf8: de-duplicate; jump to end of function to print to grid	2021-06-24 17:30:48 +02:00
Daniel Eklöf	dc5019a535	vt: utf8-print: don’t build a compose chain on a zero-width base character	2021-06-24 17:30:47 +02:00
Daniel Eklöf	f865612667	vt: utf8-print: check base character before count when looking for existing compose chain Count is more likely to be the same for many chains. Thus we’re likely to fail sooner by checking the base character first.	2021-06-24 17:30:47 +02:00
Daniel Eklöf	57e636dd8e	vt: don’t call wcwidth() on all combining characters every time we add We already have all the widths needed to calculate the new one; it’s the base character’s width (base_width), or the previous combining chain’s width (composed->width) plus the new character’s width (width).	2021-06-24 17:30:46 +02:00
Daniel Eklöf	09431dd15c	vt: presentation selectors may be anywhere in the cluster	2021-06-24 17:30:46 +02:00
Daniel Eklöf	6c70cd9366	vt: don’t force cols=2 when we see an emoji variant selector Fish appears to be the only shell expecting this. The rest probably just does wcswidth(), like usual.	2021-06-24 17:30:45 +02:00
Daniel Eklöf	0a9531ac6c	vt: cache grapheme cluster width in composed struct * Use regular wcswidth() to calculate the width * Explicitly set to ‘2’ if we see an emoji variant selector * Cache the result in the composed struct	2021-06-24 17:30:45 +02:00
Daniel Eklöf	b9ef703eb1	wip: grapheme shaping	2021-06-24 17:30:45 +02:00
Craig Barnes	2a75da4143	Merge branch 'charset-shift-fixes'	2021-06-09 10:18:52 +01:00
Craig Barnes	e030a2ca08	terminal: add 'charset_designator' enum to make code more self-documenting This commit also renames the term_set_single_shift_ascii_printer() function to term_single_shift(), since the former is overly verbose and not really even accurate.	2021-06-09 10:00:25 +01:00
Craig Barnes	a2c9c56f19	vt: fix SS2/SS3 escape sequences to act correctly as single shifts These sequences are supposed to affect the next printable ASCII character and then reset to the previous character set, but before this commit they were behaving like locking shifts.	2021-06-08 21:09:40 +01:00
Craig Barnes	e72e8b1b8e	vt: add support for LS2 and LS3 locking shifts	2021-06-08 21:06:18 +01:00
Daniel Eklöf	9d3351472d	vt: TAB: don’t print a ‘\t’ to the grid if the current cell isn’t empty If the cursor is already at the right edge, our logic that checked for non-empty cells failed; it didn’t check the current cell. Fix by initializing ‘emit_tab_char’ to true/false, depending on the contents of the current cell.	2021-06-08 19:53:26 +02:00
Daniel Eklöf	94b549f93e	vt: emit a tab character if all cells between cursor and tab stop are empty TAB (\t) moves the cursor to the next tab stop. That’s it, according to the specification. However, many terminal emulators try to keep tabs in the grid, to be able to e.g. copy them. That is, copying a text chunk containing tabs should result in tabs being pasted, not spaces. In order to do that, we need to print a tab character to the grid. To improve text reflow of tabs, we also print spaces to the subsequent cells, up until (but not including) the next tab stop. However, we can only do this if all the cells between the cursor and the next tab stop are empty, since (obviously), we cannot overwrite pre-existing characters. Finally, while some fonts render tabs as spaces (i.e. an empty glyph), some use a glyph representing “unprintable” characters, or similar. Thus, we need to exclude cells with tab characters when rendering.	2021-06-08 19:53:26 +02:00
Craig Barnes	620fe8e764	vt: fix buggy chains of ternary expressions in action_esc_dispatch() Only the first character in the chain was being compared with `priv` and the rest were just being evaluated as simple expressions. This was causing the G2 and G3 operations to erroneously use the G1 index. Since the characters are a contiguous range, we can just subtract the start of the range to get the appropriate index. The outer switch statement already ensures the values are in range.	2021-06-08 16:52:00 +01:00
Daniel Eklöf	95c4a8ccfb	vt: \E#8: print ‘E’ using the default attributes	2021-06-07 21:35:17 +02:00
Craig Barnes	f14b294dcc	vt: remove action_utf8_print(term, 0) calls from UTF-8 state handlers These calls appear to be left over from a previous refactoring of the code. Calling this function with `wc == 0` is a no-op.	2021-05-25 21:45:55 +01:00
Craig Barnes	14a55de4e7	vt: remove partial support for 8-bit C1 control chars These are part of the "anywhere" state in Paul Flo Williams' VT parser state diagram[1]. That means that they should be accepted anywhere in a byte sequence, including in the middle of other sequences or even in the middle of a multi-byte UTF-8 sequence. Adhering to this requirement makes them incompatible with the use of UTF-8 as a universal encoding. Not adhering to the aforementioned requirement by making a special case for UTF-8 sequences may seem tempting, but it's much more at odds with the relevant standards[2] than it appears on the surface. UTF-8 is not an "8-bit code", at least not according to the parlance of ECMA-43, nor does it map the C1 control range in a compatible way. [1]: https://vt100.net/emu/dec_ansi_parser [2]: ECMA-35, ECMA-43, ECMA-48	2021-05-25 21:37:38 +01:00
Daniel Eklöf	3405a9c81c	Merge branch 'reflow-performance' Part of #504	2021-05-16 18:48:19 +02:00
Craig Barnes	d37b2a7f7b	Update `term->vt.state` for each iteration of vt_from_slave() loop Otherwise it may be stale when read by the anywhere() function.	2021-05-15 19:20:36 +01:00
Daniel Eklöf	d9e1aefb91	term: rename CELL_MULT_COL_SPACER -> CELL_SPACER, and change its definition Instead of using CELL_SPACER for all cells that previously used CELL_MULT_COL_SPACER, include the remaining number of spacers following, and including, itself. This is encoded by adding to the CELL_SPACER value. So, a double width character will now store the character itself in the first cell (just like before), and CELL_SPACER+1 in the second cell. A three-cell character would store the character itself, then CELL_SPACER+2, and finally CELL_SPACER+1. In other words, the last spacer is always CELL_SPACER+1. CELL_SPACER+0 is used when padding at the right margin. I.e. when writing e.g. a double width character in the last column, we insert a CELL_SPACER+0 pad character, and then write the double width character in the first column on the next row.	2021-05-14 14:41:02 +02:00
Craig Barnes	e4ff8d83d1	vt: make anywhere() function return `term->vt.state` by default Instead of passing a `default_return` parameter, which is always just the current state anyway.	2021-05-13 07:47:32 +01:00
Craig Barnes	8bb69f22b7	vt: clean up handling of "anywhere" actions	2021-05-13 07:47:26 +01:00
Daniel Eklöf	5be2c53d8c	term/vt: only do reverse-wrapping (‘bw’) on cub1 Foot currently does reverse-wrapping (‘auto_left_margin’, or ‘bw’) on everything that calls ‘term_cursor_left()’. This is wrong; it should only be done for cub1. From man terminfo: auto_left_margin | bw | bw | cub1 wraps from column 0 to last column This patch moves the reverse-wrapping logic from term_cursor_left() to the handling of BS (backspace). Closes #441	2021-04-08 13:11:58 +02:00
Daniel Eklöf	60b3ccc641	term: runtime switch between a ‘fast’ and a ‘generic’ ASCII print function term_print() is called whenever the client application “prints” something to the grid. It is called for both ASCII and UTF-8 characters, and needs to handle sixels, insert mode and ASCII vs. graphical charsets. Since it’s on the hot path, this becomes unnecessarily slow. This patch adds a “fast” version of term_print(), tailored for the common case: ASCII characters in non-insert mode, without any sixels and non-graphical charsets. A new function, term_update_ascii_printer(), has been added, and must be called whenever: * The currently selected charset index changes * The currently selected charset changes (from ASCII to graphical, or vice versa) * Sixels are added to the grid * Sixels are removed from the grid * Insert mode is enabled/disabled	2021-03-16 08:45:18 +01:00
Daniel Eklöf	cb60ddd090	vt: remove xassert(), that cannot be optimized out, from action_print() action_print() is in the hot path, and having an if-statement here does have an impact on performance. Much more so when that if-statement involves a function call to wcwidth(). Closes #330	2021-02-07 11:14:07 +01:00
Craig Barnes	e56136ce11	debug: rename assert() to xassert(), to avoid clashing with <assert.h>	2021-01-16 20:16:00 +00:00
Craig Barnes	22f25a9e4f	Print stack trace on assert() failure or when calling fatal_error() Note: this uses the __sanitizer_print_stack_trace() function from the AddressSanitizer runtime, so it only works when AddressSanitizer is in use.	2021-01-16 19:56:33 +00:00
Daniel Eklöf	bcf46d9eab	Merge branch 'decset-1047-and-1048'	2021-01-16 15:27:20 +01:00
Daniel Eklöf	bc053e4879	vt: document correct BS behavior, and why we do differently	2021-01-15 18:40:07 +01:00
Daniel Eklöf	bae3c871bb	term/vt/csi: break out cursor save/restore to dedicated functions	2021-01-15 17:08:30 +01:00
Craig Barnes	39b2e46e72	Use wrappers from macros.h instead of bare GCC attributes/pragmas	2021-01-03 08:56:47 +00:00
Daniel Eklöf	69cd5fd3ab	vt: codespell: ony -> only	2020-12-16 15:06:34 +01:00
Daniel Eklöf	2e137c0a7e	vt: don’t ignore extra private/intermediate characters Take ‘\E(#0’ for example - this is not the same as ‘\E(0’. Up until now, foot has however treated them as the same escape, because the handler for ‘\E(0’ didn’t verify there weren’t any _other_ private characters present. Fix this by turning the ‘private’ array into a single 4-byte integer. This allows us to match all privates with a single comparison. Private characters are added to the LSB first, and MSB last. This means we can check for single privates in pretty much the same way as before: switch (term->vt.private) { case ‘?’: ... break; } Checking for two (or more) is much uglier, but foot only supports a single escape with two privates, and no escapes with three or more: switch (term->vt.private) { case 0x243f: /* ‘?$’ */ ... break; } The ‘clear’ action remains simple (and fast), with a single write operation. Collecting privates is potentially _slightly_ more complex than before; we now need to mask and compare, instead of simply comparing, when checking how many privates we already have. We _could_ add a counter, which would make collecting privates easier, but this would add an additional write to the ‘clear’ action which is really bad since it’s in the hot path.	2020-12-16 14:30:49 +01:00
Daniel Eklöf	7c6686221f	bell: optionally render margins in red when receiving BEL Add a new config option, ‘bell=none|set-urgency’. When set to ‘set-urgency’, the margins will be painted in red (if the window did not have keyboard focus). This is intended as a cheap replacement for the ‘urgency’ hint, that doesn’t (yet) exist on Wayland. Closes #157	2020-10-08 19:55:32 +02:00
Daniel Eklöf	377f1b7ad3	vt: BS: only reset lcf if cursor is beyond right margin: don’t move cursor This is needed to make reverse auto-wrap work correctly. Without it, we’ll end up moving the cursor left one cell extra.	2020-10-02 21:40:30 +02:00
Daniel Eklöf	9db78c3122	vt: hide pedantic warnings around the VT state machine's switch cases The switch statements use the GCC extension "case X ... Y", and here it doesn't really make any sense to convert it to "case X: case Y:", so hide the warnings instead.	2020-08-23 10:07:08 +02:00
Craig Barnes	44499bbfe1	Fix spelling mistake in vt.c	2020-08-16 14:37:27 +01:00
Craig Barnes	7a77958ba2	Convert most dynamic allocations to use functions from xmalloc.h	2020-08-08 20:37:57 +01:00
Daniel Eklöf	6f2cffd8c0	vt: never call term_print() with a width <= 0	2020-07-16 08:04:12 +02:00
Daniel Eklöf	9508804b18	vt: ignore 0x7f (DEL) in ground state This ensures all bytes mapped to action_print() have wcwidth == 1. DEL has wcwidth == -1, and would thus have been ignored by term_print() anyway.	2020-07-16 08:01:37 +02:00
Daniel Eklöf	6183f7f64a	vt: utf8: handle multi-column spacer values correctly when combining	2020-07-16 07:41:51 +02:00
Daniel Eklöf	5c99e8013b	term: rename COMB_CHARS_LO,HI -> CELL_COMB_CHARS_LO,HI	2020-07-14 16:41:57 +02:00
Daniel Eklöf	cabcc615c1	vt: change HT (horizontal tab) to not clear LCF According to the specification, HT should clear LCF. However, nearly all emulators do not. In particular, XTerm doesn't. So we follow suit.	2020-07-14 10:47:17 +02:00
Daniel Eklöf	b9719673a1	term: rename term_formfeed() -> term_carriage_return()	2020-07-14 09:29:10 +02:00
Daniel Eklöf	7f7ab00e11	vt: implement C0::FF - processed in the same way as C0::LF	2020-07-14 09:18:52 +02:00
Daniel Eklöf	4849a16f37	vt: process C0::VT the same way we process C0::LF Previously, C0::VT was implemented as a simple 'cursor down'. I.e. it would behave as LF until it reached the bottom of the screen, where instead of scrolling, it became a no-op. See https://vt100.net/docs/vt102-ug/chapter5.html	2020-07-14 09:15:15 +02:00
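
Commit fe8ca23cfe describes storing compose chains in a binary search tree keyed on the XOR of the codepoints, with collisions resolved by incrementing the key by 1. The following is a minimal sketch of that idea, not foot's actual code: the struct layout, the function names and the SKETCH_KEY_MASK range are assumptions made for illustration, and the tree is left unbalanced for brevity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key range; foot truncates to the
 * CELL_COMB_CHARS_HI-CELL_COMB_CHARS_LO range instead. */
#define SKETCH_KEY_MASK UINT32_C(0x00ffffff)

struct sketch_chain {
    uint32_t key;                 /* tree key, also what the grid would store */
    size_t count;                 /* number of codepoints in the chain */
    uint32_t *chars;              /* dynamically allocated codepoints */
    struct sketch_chain *left, *right;
};

/* Key = XOR of all codepoints, truncated to the key range. */
static uint32_t
sketch_key(const uint32_t *chars, size_t count)
{
    uint32_t key = 0;
    for (size_t i = 0; i < count; i++)
        key ^= chars[i];
    return key & SKETCH_KEY_MASK;
}

/* Plain BST lookup on the key; O(log n) if the tree is reasonably balanced. */
static struct sketch_chain *
sketch_find(struct sketch_chain *node, uint32_t key)
{
    while (node != NULL && node->key != key)
        node = key < node->key ? node->left : node->right;
    return node;
}

/* Return the existing chain for this sequence, or insert a new one.
 * Assumes the key space is never exhausted. */
static struct sketch_chain *
sketch_find_or_insert(struct sketch_chain **root,
                      const uint32_t *chars, size_t count)
{
    uint32_t key = sketch_key(chars, count);

    for (;;) {
        struct sketch_chain *hit = sketch_find(*root, key);
        if (hit == NULL)
            break;                /* key is free: insert below */
        if (hit->count == count &&
            memcmp(hit->chars, chars, count * sizeof(chars[0])) == 0)
            return hit;           /* true match */
        key = (key + 1) & SKETCH_KEY_MASK;  /* false match: try next key */
    }

    struct sketch_chain *node = malloc(sizeof(*node));
    uint32_t *copy = malloc(count * sizeof(chars[0]));
    if (node == NULL || copy == NULL)
        abort();
    memcpy(copy, chars, count * sizeof(chars[0]));
    *node = (struct sketch_chain){.key = key, .count = count, .chars = copy};

    /* Walk down to the empty link where the new key belongs. */
    struct sketch_chain **link = root;
    while (*link != NULL)
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    *link = node;
    return node;
}
```

Because XOR is order-independent, two sequences containing the same codepoints in a different order produce the same key, which is one way the false matches mentioned in the commit message can arise; the content comparison is what tells them apart.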
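
Commit d9e1aefb91 changes spacer cells so that each one encodes how many spacers remain, counting itself. A rough sketch of that encoding is shown here; SKETCH_CELL_SPACER is a placeholder value and sketch_write_wide() is a hypothetical helper, neither taken from foot's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder; the real CELL_SPACER constant in foot may differ. */
#define SKETCH_CELL_SPACER UINT32_C(0x40000000)

/* Store a character that is 'width' columns wide, starting at cells[col].
 * The first cell holds the character; each following cell holds
 * SKETCH_CELL_SPACER plus the number of spacers remaining (itself included),
 * so the last spacer is always SKETCH_CELL_SPACER + 1. */
static void
sketch_write_wide(uint32_t *cells, size_t col, uint32_t wc, int width)
{
    assert(width >= 1);
    cells[col] = wc;
    for (int i = 1; i < width; i++)
        cells[col + i] = SKETCH_CELL_SPACER + (uint32_t)(width - i);
}
```

With this layout a reader that lands on any spacer cell can tell how far it is from the end of the character without scanning forward.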
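
Commit 2e137c0a7e packs the private/intermediate characters into a single 4-byte integer, LSB first, so that all of them can be matched with one comparison and cleared with one write. The sketch below illustrates the packing under those stated rules; struct vt_sketch and both function names are hypothetical and do not come from foot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parser state; only the field discussed. */
struct vt_sketch {
    uint32_t private;   /* up to four private bytes, first one in the LSB */
};

/* The 'clear' action: a single write drops all collected privates. */
static void
sketch_clear(struct vt_sketch *vt)
{
    vt->private = 0;
}

/* Collect one private byte into the first free 8-bit slot, LSB first.
 * Finding the free slot requires a mask and compare per slot.
 * Returns false when all four slots are already occupied. */
static bool
sketch_collect(struct vt_sketch *vt, uint8_t c)
{
    for (unsigned shift = 0; shift < 32; shift += 8) {
        if ((vt->private & (UINT32_C(0xff) << shift)) == 0) {
            vt->private |= (uint32_t)c << shift;
            return true;
        }
    }
    return false;
}
```

Collecting '?' (0x3f) and then '$' (0x24) yields 0x243f, the value used in the two-private switch case quoted in the commit message.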

224 commits