Commit graph

206 commits

Author SHA1 Message Date
Daniel Eklöf
95c4a8ccfb
vt: \E#8: print ‘E’ using the default attributes 2021-06-07 21:35:17 +02:00
Craig Barnes
f14b294dcc vt: remove action_utf8_print(term, 0) calls from UTF-8 state handlers
These calls appear to be left over from a previous refactoring of the
code. Calling this function with `wc == 0` is a no-op.
2021-05-25 21:45:55 +01:00
Craig Barnes
14a55de4e7 vt: remove partial support for 8-bit C1 control chars
These are part of the "anywhere" state in Paul Flo Williams' VT parser
state diagram[1]. That means that they should be accepted *anywhere* in
a byte sequence, including in the middle of other sequences or even in
the middle of a multi-byte UTF-8 sequence. Adhering to this requirement
makes them incompatible with the use of UTF-8 as a universal encoding.

Not adhering to the aforementioned requirement by making a special case
for UTF-8 sequences may seem tempting, but it's much more at odds with
the relevant standards[2] than it appears on the surface. UTF-8 is not
an "8-bit code", at least not according to the parlance of ECMA-43, nor
does it map the C1 control range in a compatible way.

[1]: https://vt100.net/emu/dec_ansi_parser
[2]: ECMA-35, ECMA-43, ECMA-48
2021-05-25 21:37:38 +01:00
Daniel Eklöf
3405a9c81c
Merge branch 'reflow-performance'
Part of #504
2021-05-16 18:48:19 +02:00
Craig Barnes
d37b2a7f7b Update term->vt.state for each iteration of vt_from_slave() loop
Otherwise it may be stale when read by the anywhere() function.
2021-05-15 19:20:36 +01:00
Daniel Eklöf
d9e1aefb91
term: rename CELL_MULT_COL_SPACER -> CELL_SPACER, and change its definition
Instead of using CELL_SPACER for *all* cells that previously used
CELL_MULT_COL_SPACER, include the remaining number of spacers
following, and including, itself. This is encoded by adding to the
CELL_SPACER value.

So, a double width character will now store the character itself in
the first cell (just like before), and CELL_SPACER+1 in the second
cell.

A three-cell character would store the character itself, then
CELL_SPACER+2, and finally CELL_SPACER+1.

In other words, the last spacer is always CELL_SPACER+1.

CELL_SPACER+0 is used when padding at the right margin. I.e. when
writing e.g. a double width character in the last column, we insert a
CELL_SPACER+0 pad character, and then write the double width character
in the first column on the next row.
2021-05-14 14:41:02 +02:00
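The spacer encoding described in this commit can be sketched in C as follows. This is a minimal illustration, not foot's actual code: the numeric value chosen for CELL_SPACER and the helpers put_wide() and spacers_remaining() are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value, for this sketch only. It just needs to lie outside
 * the range used by real characters. */
#define CELL_SPACER UINT32_C(0x50000000)

/* Store a character that occupies `width` cells, starting at cells[0].
 * The spacer cells count down, so the last one is always CELL_SPACER+1. */
static void
put_wide(uint32_t *cells, uint32_t wc, int width)
{
    assert(width >= 1);

    cells[0] = wc;
    for (int i = 1; i < width; i++)
        cells[i] = CELL_SPACER + (uint32_t)(width - i);
}

/* Number of spacers remaining, counting this cell itself.
 * Returns 0 for right-margin padding and -1 if the cell is not a spacer. */
static int
spacers_remaining(uint32_t cell)
{
    return cell >= CELL_SPACER ? (int)(cell - CELL_SPACER) : -1;
}
```

With this layout, code that lands on any spacer cell can tell how many cells remain until the end of the wide character without scanning forward.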
Craig Barnes
e4ff8d83d1 vt: make anywhere() function return term->vt.state by default
Instead of passing a `default_return` parameter, which is always
just the current state anyway.
2021-05-13 07:47:32 +01:00
Craig Barnes
8bb69f22b7 vt: clean up handling of "anywhere" actions 2021-05-13 07:47:26 +01:00
Daniel Eklöf
5be2c53d8c
term/vt: only do reverse-wrapping (‘bw’) on cub1
Foot currently does reverse-wrapping (‘auto_left_margin’, or ‘bw’) on
everything that calls ‘term_cursor_left()’. This is wrong; it should
only be done for cub1. From man terminfo:

    auto_left_margin | bw | bw | cub1 wraps from column 0 to last
    column

This patch moves the reverse-wrapping logic from term_cursor_left() to
the handling of BS (backspace).

Closes #441
2021-04-08 13:11:58 +02:00
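A rough C sketch of the split this commit describes: the generic cursor-left helper clamps at column 0, and only the backspace handler wraps to the previous row. The struct layout, the reverse_wrap flag and the function names cursor_left() and handle_bs() are simplified stand-ins assumed for illustration; they are not foot's real definitions.

```c
#include <stdbool.h>

struct term_sketch {
    int cols;          /* number of columns in the grid */
    int row, col;      /* cursor position, zero-based */
    bool reverse_wrap; /* hypothetical flag: 'bw' behavior enabled */
};

/* Generic cursor-left: never wraps, just clamps at column 0. */
static void
cursor_left(struct term_sketch *term, int count)
{
    int move = count < term->col ? count : term->col;
    term->col -= move;
}

/* BS (cub1): the only place where reverse-wrapping happens. */
static void
handle_bs(struct term_sketch *term)
{
    if (term->col == 0 && term->reverse_wrap && term->row > 0) {
        /* Wrap from column 0 to the last column of the previous row */
        term->row--;
        term->col = term->cols - 1;
    } else {
        cursor_left(term, 1);
    }
}
```

Other callers of the cursor-left helper, such as CSI cursor movement, would then stop at the left margin instead of wrapping.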
Daniel Eklöf
60b3ccc641
term: runtime switch between a ‘fast’ and a ‘generic’ ASCII print function
term_print() is called whenever the client application “prints”
something to the grid. It is called for both ASCII and UTF-8
characters, and needs to handle sixels, insert mode and ASCII
vs. graphical charsets.

Since it’s on the hot path, this becomes unnecessarily slow.

This patch adds a “fast” version of term_print(), tailored for the
common case: ASCII characters in non-insert mode, without any sixels
and non-graphical charsets.

A new function, term_update_ascii_printer(), has been added, and must
be called whenever:

* The currently selected charset *index* changes
* The currently selected charset changes (from ASCII to graphical, or
  vice versa)
* Sixels are added to the grid
* Sixels are removed from the grid
* Insert mode is enabled/disabled
2021-03-16 08:45:18 +01:00
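The runtime switch described above can be approximated with a function pointer that is re-selected whenever one of the listed conditions changes. The following is a hedged sketch only: the struct fields, print_fast() and print_generic() are hypothetical, and the call counters exist purely to make the dispatch observable.

```c
#include <stdbool.h>

struct term_sketch {
    bool insert_mode;
    bool graphic_charset; /* currently selected charset is graphical */
    int sixel_count;      /* number of sixel images on the grid */
    int fast_calls;       /* counters, only to make the sketch observable */
    int generic_calls;
    void (*ascii_printer)(struct term_sketch *term, char c);
};

/* Stand-in for the tailored fast path (plain ASCII, no special cases) */
static void
print_fast(struct term_sketch *term, char c)
{
    (void)c;
    term->fast_calls++;
}

/* Stand-in for the generic path (sixels, insert mode, charsets) */
static void
print_generic(struct term_sketch *term, char c)
{
    (void)c;
    term->generic_calls++;
}

/* Must be called whenever any of the inputs below changes. */
static void
term_update_ascii_printer(struct term_sketch *term)
{
    bool common_case = !term->insert_mode &&
                       !term->graphic_charset &&
                       term->sixel_count == 0;

    term->ascii_printer = common_case ? print_fast : print_generic;
}
```

The point of the design is that the per-character hot path performs a single indirect call, while the condition checks run only on the comparatively rare state changes.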
Daniel Eklöf
cb60ddd090
vt: remove xassert(), that cannot be optimized out, from action_print()
action_print() is in the hot path, and having an if-statement here *does*
have an impact on performance.

Much more so when that if-statement involves a function call to
wcwidth().

Closes #330
2021-02-07 11:14:07 +01:00
Craig Barnes
e56136ce11 debug: rename assert() to xassert(), to avoid clashing with <assert.h> 2021-01-16 20:16:00 +00:00
Craig Barnes
22f25a9e4f Print stack trace on assert() failure or when calling fatal_error()
Note: this uses the __sanitizer_print_stack_trace() function from the
AddressSanitizer runtime, so it only works when AddressSanitizer is
in use.
2021-01-16 19:56:33 +00:00
Daniel Eklöf
bcf46d9eab
Merge branch 'decset-1047-and-1048' 2021-01-16 15:27:20 +01:00
Daniel Eklöf
bc053e4879
vt: document correct BS behavior, and why we do differently 2021-01-15 18:40:07 +01:00
Daniel Eklöf
bae3c871bb
term/vt/csi: break out cursor save/restore to dedicated functions 2021-01-15 17:08:30 +01:00
Craig Barnes
39b2e46e72 Use wrappers from macros.h instead of bare GCC attributes/pragmas 2021-01-03 08:56:47 +00:00
Daniel Eklöf
69cd5fd3ab
vt: codespell: ony -> only 2020-12-16 15:06:34 +01:00
Daniel Eklöf
2e137c0a7e
vt: don’t ignore extra private/intermediate characters
Take ‘\E(#0’ for example - this is *not* the same as ‘\E(0’.

Up until now, foot has however treated them as the same escape,
because the handler for ‘\E(0’ didn’t verify there weren’t any _other_
private characters present.

Fix this by turning the ‘private’ array into a single 4-byte
integer. This allows us to match *all* privates with a single
comparison.

Private characters are added to the LSB first, and MSB last. This
means we can check for single privates in pretty much the same way as
before:

  switch (term->vt.private) {
  case ‘?’:
      ...
      break;
  }

Checking for two (or more) is much uglier, but foot only supports
a *single* escape with two privates, and no escapes with three or
more:

  switch (term->vt.private) {
  case 0x243f:  /* ‘?$’ */
      ...
      break;
  }

The ‘clear’ action remains simple (and fast), with a single write
operation.

Collecting privates is potentially _slightly_ more complex than
before; we now need to mask and compare, instead of simply comparing,
when checking how many privates we already have.

We _could_ add a counter, which would make collecting privates easier,
but this would add an additional write to the ‘clear’ action which is
really bad since it’s in the hot path.
2020-12-16 14:30:49 +01:00
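A minimal C sketch of the packing scheme this commit describes, with private characters filled in from the least significant byte upward. The struct name, the field name priv and the helpers collect_private() and clear_private() are assumptions for illustration, not foot's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

struct vt_sketch {
    uint32_t priv; /* up to four private/intermediate bytes, LSB first */
};

/* Append one private character. Returns false when all four slots are
 * already in use. Finding the free slot is the "mask and compare" step. */
static bool
collect_private(struct vt_sketch *vt, uint8_t c)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if ((vt->priv & (UINT32_C(0xff) << shift)) == 0) {
            vt->priv |= (uint32_t)c << shift;
            return true;
        }
    }
    return false;
}

/* The 'clear' action stays a single write. */
static void
clear_private(struct vt_sketch *vt)
{
    vt->priv = 0;
}
```

Collecting ‘?’ (0x3f) and then ‘$’ (0x24) yields 0x243f, which is the value matched in the second switch example in the commit message.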
Daniel Eklöf
7c6686221f
bell: optionally render margins in red when receiving BEL
Add a new config option, ‘bell=none|set-urgency’. When set to
‘set-urgency’, the margins will be painted in red (if the window did
not have keyboard focus).

This is intended as a cheap replacement for the ‘urgency’ hint, that
doesn’t (yet) exist on Wayland.

Closes #157
2020-10-08 19:55:32 +02:00
Daniel Eklöf
377f1b7ad3
vt: BS: *only* reset lcf if cursor is beyond right margin: don’t move cursor
This is needed to make reverse auto-wrap work correctly. Without it,
we’ll end up moving the cursor left one cell extra.
2020-10-02 21:40:30 +02:00
Daniel Eklöf
9db78c3122
vt: hide pedantic warnings around the VT state machine's switch cases
The switch statements use the GCC extension "case X ... Y", and here
it doesn't really make any sense to convert it to "case X: case Y:",
so hide the warnings instead.
2020-08-23 10:07:08 +02:00
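For illustration, this is roughly what suppressing the pedantic warning around a case-range switch can look like. The byte classes and the classify() function are hypothetical and far simpler than the real state machine; the pragmas are the standard GCC diagnostic pragmas, which Clang also understands.

```c
#include <stdint.h>

enum byte_class {
    BYTE_CONTROL,
    BYTE_PRINTABLE,
    BYTE_DEL,
    BYTE_HIGH,
};

/* "case X ... Y" is a GCC extension, so -Wpedantic warns about it.
 * Silence the warning for just this function. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"

static enum byte_class
classify(uint8_t c)
{
    switch (c) {
    case 0x00 ... 0x1f: return BYTE_CONTROL;
    case 0x20 ... 0x7e: return BYTE_PRINTABLE;
    case 0x7f:          return BYTE_DEL;
    default:            return BYTE_HIGH;
    }
}

#pragma GCC diagnostic pop
```

Writing the same ranges as individual labels would need dozens of case lines per range, which is why hiding the warning is the more readable choice here.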
Craig Barnes
44499bbfe1 Fix spelling mistake in vt.c 2020-08-16 14:37:27 +01:00
Craig Barnes
7a77958ba2 Convert most dynamic allocations to use functions from xmalloc.h 2020-08-08 20:37:57 +01:00
Daniel Eklöf
6f2cffd8c0
vt: never call term_print() with a width <= 0 2020-07-16 08:04:12 +02:00
Daniel Eklöf
9508804b18
vt: ignore 0x7f (DEL) in ground state
This ensures *all* bytes mapped to action_print() have wcwidth == 1.

DEL has wcwidth == -1, and would thus have been ignored by
term_print() anyway.
2020-07-16 08:01:37 +02:00
Daniel Eklöf
6183f7f64a
vt: utf8: handle multi-column spacer values correctly when combining 2020-07-16 07:41:51 +02:00
Daniel Eklöf
5c99e8013b
term: rename COMB_CHARS_LO,HI -> CELL_COMB_CHARS_LO,HI 2020-07-14 16:41:57 +02:00
Daniel Eklöf
cabcc615c1
vt: change HT (horizontal tab) to *not* clear LCF
According to the specification, HT **should** clear LCF. However,
nearly all emulators do not. In particular, XTerm doesn't. So we
follow suit.
2020-07-14 10:47:17 +02:00
Daniel Eklöf
b9719673a1
term: rename term_formfeed() -> term_carriage_return() 2020-07-14 09:29:10 +02:00
Daniel Eklöf
7f7ab00e11
vt: implement C0::FF - processed in the same way as C0::LF 2020-07-14 09:18:52 +02:00
Daniel Eklöf
4849a16f37
vt: process C0::VT the same way we process C0::LF
Previously, C0::VT was implemented as a simple 'cursor down'. I.e. it
would behave as LF **until** it reached the bottom of the screen,
where instead of scrolling, it became a no-op.

See https://vt100.net/docs/vt102-ug/chapter5.html
2020-07-14 09:15:15 +02:00
Daniel Eklöf
7357bb54eb
vt: sort C0's in the switch statement, and use escaped characters when possible 2020-07-14 09:11:17 +02:00
Daniel Eklöf
fb001ee7a7
unicode combining: don't log overflow errors unless LOG_ENABLE_DBG == 1 2020-06-09 17:31:58 +02:00
Daniel Eklöf
97221dd09b
vt: utf8-print: check width == 0 first, when deciding whether to do combining 2020-06-09 17:30:49 +02:00
Daniel Eklöf
9452aff020
vt: initial version of UTF-8 decoding built into the VT parser 2020-06-07 16:16:50 +02:00
Daniel Eklöf
d9028b2394
vt: utf8: use mbtowc() instead of mbrtowc()
This is slightly faster, since we don't need to initialize an
mbstate_t struct (using mbrtowc() with a NULL-pointer for 'ps' also
works).

Also, avoid a branch by setting wc=0 and then ignoring the
result/error code from mbtowc().
2020-05-31 12:41:35 +02:00
Daniel Eklöf
c38b9be6a4
vt: utf8: don't need one entry action for each UTF8 variant 2020-05-31 12:41:07 +02:00
Daniel Eklöf
00df12f1a3
unicode-combine: simplify - remove -Dunicode-precompose option
Since the pre-composing functionality is now part of fcft, it makes
little sense to have a compile time option - there's no size benefit
to be had.

Furthermore, since virtually all terminal emulators do
pre-composing (alacritty being an exception), this really isn't that
controversial.
2020-05-10 17:10:33 +02:00
Daniel Eklöf
b1b32152c1
unicode-precompose: use fcft's precompose functionality
This allows us more options when determining whether to use a
pre-composed character or not:

We now only use the pre-composed character if it's from the primary
font, or if at least one of the base or combining characters is from
a fallback font.

I.e. use glyphs from the primary font if possible. But, if one or more
of the decomposed glyphs are from a fallback font, use the
pre-composed character anyway.
2020-05-09 12:06:11 +02:00
Daniel Eklöf
4d4df92f66
unicode-combining: limit maximum number of allowed composed chains 2020-05-03 11:31:59 +02:00
Daniel Eklöf
1ebdc01162
unicode-combining: detect when we've reached the chain limit
We currently store up to 5 combining characters in any given
base+combining chain.

This adds a check for when that limit is about to be exceeded. When
this happens, we log the chain + the new combining character.

Since things will break anyway, we simply overwrite the last combining
character.
2020-05-03 11:27:06 +02:00
Daniel Eklöf
62e0774319
unicode-combining: store seen combining chains "globally" in the term struct
Instead of storing combining data per cell, realize that most
combinations are re-occurring and that there's lots of available space
left in the unicode range, and store seen base+combining chains in a
per-terminal array.

When we encounter a combining character, we first try to pre-compose,
like before. If that fails, we then search for the current
base+combining combo in the list of previously seen combinations. If
not found there either, we allocate a new combo and add it to the
list. Regardless, the result is an index into this array. We store
this index, offset by COMB_CHARS_LO=0x40000000ul, in the cell.

When rendering, we need to check if the cell character is a plain
character, or if it's a composed character (identified by checking if
the cell character is >= COMB_CHARS_LO).

Then we render the grapheme pretty much like before.
2020-05-03 11:03:22 +02:00
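The lookup-or-add step described in this commit might look something like the following C sketch. COMB_CHARS_LO is taken from the commit message and the limit of 5 combining characters from the chain-limit commit (1ebdc01162); everything else (the fixed-size table, MAX_CHAINS, the struct and function names, and returning 0 on failure) is an assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COMB_CHARS_LO UINT32_C(0x40000000) /* from the commit message */
#define MAX_COMBINING 5                    /* per-chain limit */
#define MAX_CHAINS 256                     /* hypothetical table size */

struct chain_sketch {
    uint32_t base;
    uint32_t combining[MAX_COMBINING];
    size_t count;
};

struct term_sketch {
    struct chain_sketch chains[MAX_CHAINS];
    size_t chain_count;
};

/* Return the cell value for base+combining: an index into the per-terminal
 * table, offset by COMB_CHARS_LO. Returns 0 if the chain cannot be stored. */
static uint32_t
intern_chain(struct term_sketch *term, uint32_t base,
             const uint32_t *combining, size_t count)
{
    if (count == 0 || count > MAX_COMBINING)
        return 0;

    /* Previously seen chain? Reuse its index. */
    for (size_t i = 0; i < term->chain_count; i++) {
        const struct chain_sketch *c = &term->chains[i];
        if (c->base == base && c->count == count &&
            memcmp(c->combining, combining,
                   count * sizeof(combining[0])) == 0)
            return COMB_CHARS_LO + (uint32_t)i;
    }

    if (term->chain_count == MAX_CHAINS)
        return 0;

    /* Not seen before: append a new entry. */
    struct chain_sketch *c = &term->chains[term->chain_count];
    c->base = base;
    c->count = count;
    memcpy(c->combining, combining, count * sizeof(combining[0]));
    return COMB_CHARS_LO + (uint32_t)term->chain_count++;
}

/* Renderer-side check: plain character, or index into the chain table? */
static bool
is_composed(uint32_t cell)
{
    return cell >= COMB_CHARS_LO;
}
```

A linear search is used here only to keep the sketch short; the commit message does not say how the list is searched.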
Daniel Eklöf
b10436e49b
vt: use signed integers to correctly detect when we're done 2020-05-02 20:01:43 +02:00
Daniel Eklöf
804642580e
meson: don't generate pre-compose table when -Dunicode-precompose=false 2020-05-02 18:43:13 +02:00
Daniel Eklöf
d945b68b73
unicode-combine: remove utf8proc dependency
We only used utf8proc to try to pre-compose a glyph from a base and
combining character.

We can do this ourselves by using a pre-compiled table of valid
pre-compositions. This table isn't _that_ big, and binary searching it
is fast.

That is, for a very small amount of code, and not too much extra RO
data, we can get rid of the utf8proc dependency.
2020-05-02 17:29:00 +02:00
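As an illustration of the table-based approach, the sketch below binary-searches a sorted table of (base, combining) pairs using the standard bsearch() function. The table holds only a handful of sample entries, and the struct and function names are hypothetical; the real generated table and lookup code may be organized differently.

```c
#include <stdint.h>
#include <stdlib.h>

struct precompose_entry {
    uint32_t base;
    uint32_t comb;
    uint32_t composed;
};

/* Tiny sample table, sorted by (base, comb). */
static const struct precompose_entry precompose_table[] = {
    {0x0061, 0x0300, 0x00e0}, /* a + grave     -> à */
    {0x0061, 0x0301, 0x00e1}, /* a + acute     -> á */
    {0x0061, 0x0308, 0x00e4}, /* a + diaeresis -> ä */
    {0x0065, 0x0301, 0x00e9}, /* e + acute     -> é */
    {0x006f, 0x0308, 0x00f6}, /* o + diaeresis -> ö */
};

/* Order entries by base first, then by combining character. */
static int
precompose_cmp(const void *a, const void *b)
{
    const struct precompose_entry *x = a;
    const struct precompose_entry *y = b;

    if (x->base != y->base)
        return x->base < y->base ? -1 : 1;
    if (x->comb != y->comb)
        return x->comb < y->comb ? -1 : 1;
    return 0;
}

/* Returns the pre-composed code point, or 0 if the pair has none. */
static uint32_t
precompose(uint32_t base, uint32_t comb)
{
    const struct precompose_entry key = {base, comb, 0};
    const struct precompose_entry *hit = bsearch(
        &key, precompose_table,
        sizeof(precompose_table) / sizeof(precompose_table[0]),
        sizeof(precompose_table[0]), precompose_cmp);

    return hit != NULL ? hit->composed : 0;
}
```

Because the table is constant and sorted at build time, the lookup needs no initialization and the data can live in read-only memory.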
Daniel Eklöf
8389c76549
unicode-combining: don't limit ourselves to the (western) diacritics blocks 2020-05-02 16:11:51 +02:00
Daniel Eklöf
50543983ad
unicode-combine: only compose if we don't have any other combining characters
If the client sent the sequence SAB, where SA does NOT have a composed
representation, but SB does, the old code would compose SB and throw
away A.

This patch fixes this by only allowing a compose if there aren't
any pre-existing combining characters.
2020-05-01 20:17:37 +02:00
Daniel Eklöf
cb5f80ec6a
vt: utf8: track combining characters that we failed to compose
When we detect a combining character, we first try to compose it with
the base character (like before).

When this fails, we instead add the combining character to the base
cell's combining characters array.

The reason for using a composed character when possible is twofold:
one, the rendered glyph will look better since it will be a single
glyph instead of two separate glyphs (possibly from different
fonts(!)). And two, for performance. A composed glyph is a single
glyph to render, while a decomposed glyph sequence means the renderer
has to render multiple glyphs for a single cell.
2020-05-01 11:52:40 +02:00
Daniel Eklöf
69c3e74498
util.h: new header file defining commonly used macros 2020-05-01 11:46:24 +02:00