Commit graph

208 commits

Author SHA1 Message Date
Craig Barnes
e72e8b1b8e vt: add support for LS2 and LS3 locking shifts 2021-06-08 21:06:18 +01:00
Craig Barnes
620fe8e764 vt: fix buggy chains of ternary expressions in action_esc_dispatch()
Only the first character in the chain was being compared with `priv`
and the rest were just being evaluated as simple expressions. This
was causing the G2 and G3 operations to erroneously use the G1 index.

Since the characters are a contiguous range, we can just subtract the
start of the range to get the appropriate index. The outer switch
statement already ensures the values are in range.
2021-06-08 16:52:00 +01:00
Daniel Eklöf
95c4a8ccfb
vt: \E#8: print ‘E’ using the default attributes 2021-06-07 21:35:17 +02:00
Craig Barnes
f14b294dcc vt: remove action_utf8_print(term, 0) calls from UTF-8 state handlers
These calls appear to be left over from a previous refactoring of the
code. Calling this function with `wc == 0` is a no-op.
2021-05-25 21:45:55 +01:00
Craig Barnes
14a55de4e7 vt: remove partial support for 8-bit C1 control chars
These are part of the "anywhere" state in Paul Flo Williams' VT parser
state diagram[1]. That means that they should be accepted *anywhere* in
a byte sequence, including in the middle of other sequences or even in
the middle of a multi-byte UTF-8 sequence. Adhering to this requirement
makes them incompatible with the use of UTF-8 as a universal encoding.

Not adhering to the aforementioned requirement by making a special case
for UTF-8 sequences may seem tempting, but it's much more at odds with
the relevant standards[2] than it appears on the surface. UTF-8 is not
an "8-bit code", at least not according to the parlance of ECMA-43, nor
does it map the C1 control range in a compatible way.

[1]: https://vt100.net/emu/dec_ansi_parser
[2]: ECMA-35, ECMA-43, ECMA-48
2021-05-25 21:37:38 +01:00
Daniel Eklöf
3405a9c81c
Merge branch 'reflow-performance'
Part of #504
2021-05-16 18:48:19 +02:00
Craig Barnes
d37b2a7f7b Update term->vt.state for each iteration of vt_from_slave() loop
Otherwise it may be stale when read by the anywhere() function.
2021-05-15 19:20:36 +01:00
Daniel Eklöf
d9e1aefb91
term: rename CELL_MULT_COL_SPACER -> CELL_SPACER, and change its definition
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.

So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.

A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.

In other words, the last spacer is always CELL_SPACER+1.

CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
2021-05-14 14:41:02 +02:00
Craig Barnes
e4ff8d83d1 vt: make anywhere() function return term->vt.state by default
Instead of passing a `default_return` parameter, which is always
just the current state anyway.
2021-05-13 07:47:32 +01:00
Craig Barnes
8bb69f22b7 vt: clean up handling of "anywhere" actions 2021-05-13 07:47:26 +01:00
Daniel Eklöf
5be2c53d8c
term/vt: only do reverse-wrapping (‘bw’) on cub1
Foot currently does reverse-wrapping (‘auto_left_margin’, or ’bw’) on
everything that calls ‘term_cursor_left()’. This is wrong; it should
only be done for cub1. From man terminfo:

    auto_left_margin | bw | bw | cub1 wraps from column 0 to last
    column

This patch moves the reverse-wrapping logic from term_cursor_left() to
the handling of BS (backspace).

Closes #441
2021-04-08 13:11:58 +02:00
Daniel Eklöf
60b3ccc641
term: runtime switch between a ‘fast’ and a ‘generic’ ASCII print function
term_print() is called whenever the client application “prints”
something to the grid. It is called for both ASCII and UTF-8
characters, and needs to handle sixels, insert mode and ASCII
vs. graphical charsets.

Since it’s on the hot path, this becomes unnecessarily slow.

This patch adds a “fast” version of term_print(), tailored for the
common case: ASCII characters in non-insert mode, without any sixels
and non-graphical charsets.

A new function, term_update_ascii_printer(), has been added, and must
be called whenever:

* The currently selected charset *index* changes
* The currently selected charset changes (from ASCII to graphical, or
  vice verse)
* Sixels are added to the grid
* Sixels are removed from the grid
* Insert mode is enabled/disabled
2021-03-16 08:45:18 +01:00
Daniel Eklöf
cb60ddd090
vt: remove xassert(), that cannot be optimized out, from action_print()
action_print() is in the hot path, and having if-statement here *does*
have an impact on performance.

Much more so when that if-statement involves a functional call to
wcwidth().

Closes #330
2021-02-07 11:14:07 +01:00
Craig Barnes
e56136ce11 debug: rename assert() to xassert(), to avoid clashing with <assert.h> 2021-01-16 20:16:00 +00:00
Craig Barnes
22f25a9e4f Print stack trace on assert() failure or when calling fatal_error()
Note: this uses the __sanitizer_print_stack_trace() function from the
AddressSanitizer runtime, so it only works when AddressSanitizer is
in use.
2021-01-16 19:56:33 +00:00
Daniel Eklöf
bcf46d9eab
Merge branch 'decset-1047-and-1048' 2021-01-16 15:27:20 +01:00
Daniel Eklöf
bc053e4879
vt: document correct BS behavior, and why we do differently 2021-01-15 18:40:07 +01:00
Daniel Eklöf
bae3c871bb
term/vt/csi: break out cursor save/restore to dedicated functions 2021-01-15 17:08:30 +01:00
Craig Barnes
39b2e46e72 Use wrappers from macros.h instead of bare GCC attributes/pragmas 2021-01-03 08:56:47 +00:00
Daniel Eklöf
69cd5fd3ab
vt: codespell: ony -> only 2020-12-16 15:06:34 +01:00
Daniel Eklöf
2e137c0a7e
vt: don’t ignore extra private/intermediate characters
Take ‘\E(#0’ for example - this is *not* the same as ‘\E(0’.

Up until now, foot has however treated them as the same escape,
because the handler for ‘\E(0’ didn’t verify there weren’t any _other_
private characters present.

Fix this by turning the ‘private’ array into a single 4-byte
integer. This allows us to match *all* privates with a single
comparison.

Private characters are added to the LSB first, and MSB last. This
means we can check for single privates in pretty much the same way as
before:

  switch (term->vt.private) {
  case ‘?’:
      ...
      break;
  }

Checking for two (or more) is much uglier, but foot only supports
a *single* escape with two privates, and no escapes with three or
more:

  switch (term->vt.private) {
  case 0x243f:  /* ‘?$’ */
      ...
      break;
  }

The ‘clear’ action remains simple (and fast), with a single write
operation.

Collecting privates is potentially _slightly_ more complex than
before; we now need mask and compare, instead of simply comparing,
when checking how many privates we already have.

We _could_ add a counter, which would make collecting privates easier,
but this would add an additional write to the ‘clean’ action which is
really bad since it’s in the hot path.
2020-12-16 14:30:49 +01:00
Daniel Eklöf
7c6686221f
bell: optionally render margins in red when receiving BEL
Add anew config option, ‘bell=none|set-urgency’. When set to
‘set-urgency’, the margins will be painted in red (if the window did
not have keyboard focus).

This is intended as a cheap replacement for the ‘urgency’ hint, that
doesn’t (yet) exist on Wayland.

Closes #157
2020-10-08 19:55:32 +02:00
Daniel Eklöf
377f1b7ad3
vt: BS: *only* reset lcf if cursor is beyond right margin: don’t move cursor
This is needed to make reverse auto-wrap work correctly. Without it,
we’ll end up moving the cursor left one cell extra.
2020-10-02 21:40:30 +02:00
Daniel Eklöf
9db78c3122
vt: hide pedantic warnings around the VT state machine's switch cases
The switch statements use the GCC extension "case X ... Y", and here
it doesn't really make any sense to convert it to "case X: case Y:",
so hide the warnings instead.
2020-08-23 10:07:08 +02:00
Craig Barnes
44499bbfe1 Fix spelling mistake in vt.c 2020-08-16 14:37:27 +01:00
Craig Barnes
7a77958ba2 Convert most dynamic allocations to use functions from xmalloc.h 2020-08-08 20:37:57 +01:00
Daniel Eklöf
6f2cffd8c0
vt: never call term_print() with a width <= 0 2020-07-16 08:04:12 +02:00
Daniel Eklöf
9508804b18
vt: ignore 0x7f (DEL) in ground state
This ensures *all* bytes mapped to action_print() have wcwidth == 1.

DEL has wcwidth == -1, and would thus have been ignored by
term_print() anwyway.
2020-07-16 08:01:37 +02:00
Daniel Eklöf
6183f7f64a
vt: utf8: handle multi-column spacer values correctly when combining 2020-07-16 07:41:51 +02:00
Daniel Eklöf
5c99e8013b
term: rename COMB_CHARS_LO,HI -> CELL_COMB_CHARS_LO,HI 2020-07-14 16:41:57 +02:00
Daniel Eklöf
cabcc615c1
vt: change HT (horizontal tab) to *not* clear LCF
According to the specification, HT **should** clear LCF. However,
nearly all emulators do not. In particular, XTerm doesn't. So we
follow suite.
2020-07-14 10:47:17 +02:00
Daniel Eklöf
b9719673a1
term: rename term_formfeed() -> term_carriage_return() 2020-07-14 09:29:10 +02:00
Daniel Eklöf
7f7ab00e11
vt: implement C0::FF - processed in the same way as C0::LF 2020-07-14 09:18:52 +02:00
Daniel Eklöf
4849a16f37
vt: process C0::VT the same way we process C0::LF
Previously, C0::VT was implemented as a simple 'cursor down'. I.e. it
would behave as LF **until** it reached the bottom of the screen,
where instead of scrolling, it became a no-op.

See https://vt100.net/docs/vt102-ug/chapter5.html
2020-07-14 09:15:15 +02:00
Daniel Eklöf
7357bb54eb
vt: sort C0's in the switch statement, and use escaped character possible 2020-07-14 09:11:17 +02:00
Daniel Eklöf
fb001ee7a7
unicode combining: don't log overflow errors unless LOG_ENABLE_DBG == 1 2020-06-09 17:31:58 +02:00
Daniel Eklöf
97221dd09b
vt: utf8-print: check width == 0 first, when deciding whether to do combining 2020-06-09 17:30:49 +02:00
Daniel Eklöf
9452aff020
vt: initial version of UTF-8 decoding built-in into the VT parser 2020-06-07 16:16:50 +02:00
Daniel Eklöf
d9028b2394
vt: utf8: use mbtowc() instead of mbrtowc()
This is slightly faster, since we don't need to initialize an
mbstate_t struct (using mbrtowc() with a NULL-pointer for 'ps' also
works).

Also, avoid a branch by setting wc=0 and then ignoring the
result/error code from mbtowc().
2020-05-31 12:41:35 +02:00
Daniel Eklöf
c38b9be6a4
vt: utf8: don't need one entry action for each UTF8 variant 2020-05-31 12:41:07 +02:00
Daniel Eklöf
00df12f1a3
unicode-combine: simplify - remove -Dunicode-precompose option
Since the pre-composing functionality is now part of fcft, it makes
little sense to have a compile time option - there's no size benefit
to be had.

Furthermore, virtually all terminal emulators do
pre-composing (alacritty being an exception), this really isn't that
controversial.
2020-05-10 17:10:33 +02:00
Daniel Eklöf
b1b32152c1
unicode-precompose: use fcft's precompose functionality
This allows us more options when determining whether to use a
pre-composed character or not:

We now only use the pre-composed character if it's from the primary
font, or if at least one of the base or combining characters are from
a fallback font.

I.e. use glyphs from the primary font if possible. But, if one or more
of the decomposed glyphs are from a fallback font, use the
pre-composed character anyway.
2020-05-09 12:06:11 +02:00
Daniel Eklöf
4d4df92f66
unicode-combining: limit maximum number of allowed composed chains 2020-05-03 11:31:59 +02:00
Daniel Eklöf
1ebdc01162
unicode-combining: detect when we've reached the chain limit
We currently store up to 5 combining characters in any given
base+combining chain.

This adds a check for when that limit is about to be exceeded. When
this happens, we log the chain + the new combining character.

Since things will break anyway, we simply overwrite the last combining
character.
2020-05-03 11:27:06 +02:00
Daniel Eklöf
62e0774319
unicode-combining: store seen combining chains "globally" in the term struct
Instead of storing combining data per cell, realize that most
combinations are re-occurring and that there's lots of available space
left in the unicode range, and store seen base+combining combinations
chains in a per-terminal array.

When we encounter a combining character, we first try to pre-compose,
like before. If that fails, we then search for the current
base+combining combo in the list of previously seen combinations. If
not found there either, we allocate a new combo and add it to the
list. Regardless, the result is an index into this array. We store
this index, offsetted by COMB_CHARS_LO=0x40000000ul in the cell.

When rendering, we need to check if the cell character is a plain
character, or if it's a composed character (identified by checking if
the cell character is >= COMB_CHARS_LO).

Then we render the grapheme pretty much like before.
2020-05-03 11:03:22 +02:00
Daniel Eklöf
b10436e49b
vt: use signed integers to correctly detect when we're done 2020-05-02 20:01:43 +02:00
Daniel Eklöf
804642580e
meson: don't generate pre-compose table when -Dunicode-precompose=false 2020-05-02 18:43:13 +02:00
Daniel Eklöf
d945b68b73
unicode-combine: remove utf8proc dependency
We only used utf8proc to try to pre-compose a glyph from a base and
combining character.

We can do this ourselves by using a pre-compiled table of valid
pre-compositions. This table isn't _that_ big, and binary searching it
is fast.

That is, for a very small amount of code, and not too much extra RO
data, we can get rid of the utf8proc dependency.
2020-05-02 17:29:00 +02:00
Daniel Eklöf
8389c76549
unicode-combining: don't limit ourselves to the (western) diacritics blocks 2020-05-02 16:11:51 +02:00
Daniel Eklöf
50543983ad
unicode-combine: only compose if we don't have any other combining characters
If the client sent the sequence SAB, where SA does NOT have a composed
representation, but SB does, the old code would compose SB and throw
away A.

This patch fixes this by only allowing a compose if there aren't
any pre-existing combining characters.
2020-05-01 20:17:37 +02:00