Commit graph

188 commits

Author SHA1 Message Date
Daniel Eklöf
2e137c0a7e
vt: don’t ignore extra private/intermediate characters
Take ‘\E(#0’ for example - this is *not* the same as ‘\E(0’.

Up until now, foot has however treated them as the same escape,
because the handler for ‘\E(0’ didn’t verify there weren’t any _other_
private characters present.

Fix this by turning the ‘private’ array into a single 4-byte
integer. This allows us to match *all* privates with a single
comparison.

Private characters are added to the LSB first, and MSB last. This
means we can check for single privates in pretty much the same way as
before:

  switch (term->vt.private) {
  case '?':
      ...
      break;
  }

Checking for two (or more) is much uglier, but foot only supports
a *single* escape with two privates, and no escapes with three or
more:

  switch (term->vt.private) {
  case 0x243f:  /* ‘?$’ */
      ...
      break;
  }

The ‘clear’ action remains simple (and fast), with a single write
operation.

Collecting privates is potentially _slightly_ more complex than
before; we now need to mask and compare, instead of simply comparing,
when checking how many privates we already have.

We _could_ add a counter, which would make collecting privates easier,
but this would add an additional write to the ‘clear’ action, which is
really bad since it’s in the hot path.
2020-12-16 14:30:49 +01:00
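The packing described in the "vt: don't ignore extra private/intermediate characters" commit above can be sketched roughly as follows. This is a minimal illustration, not foot's actual code: the struct `vt_sketch` and the helpers `vt_clear()` and `vt_collect()` are hypothetical names, and only the 4-byte layout (first private in the LSB) is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the VT state. */
struct vt_sketch {
    uint32_t private; /* up to four private bytes, first one in the LSB */
};

/* The 'clear' action stays a single write. */
static void vt_clear(struct vt_sketch *vt)
{
    vt->private = 0;
}

/* Collect one private byte; returns false if all four slots are taken. */
static bool vt_collect(struct vt_sketch *vt, uint8_t c)
{
    for (unsigned shift = 0; shift < 32; shift += 8) {
        /* Mask and compare: a zero byte means this slot is still free. */
        if ((vt->private & (UINT32_C(0xff) << shift)) == 0) {
            vt->private |= (uint32_t)c << shift;
            return true;
        }
    }
    return false;
}
```

Collecting '?' and then '$' leaves 0x243f in `private`, which is the value matched by the two-private case in the commit message.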
Daniel Eklöf
7c6686221f
bell: optionally render margins in red when receiving BEL
Add a new config option, ‘bell=none|set-urgency’. When set to
‘set-urgency’, the margins will be painted in red (if the window did
not have keyboard focus).

This is intended as a cheap replacement for the ‘urgency’ hint, that
doesn’t (yet) exist on Wayland.

Closes #157
2020-10-08 19:55:32 +02:00
Daniel Eklöf
377f1b7ad3
vt: BS: *only* reset lcf if cursor is beyond right margin: don’t move cursor
This is needed to make reverse auto-wrap work correctly. Without it,
we’ll end up moving the cursor left one cell extra.
2020-10-02 21:40:30 +02:00
Daniel Eklöf
9db78c3122
vt: hide pedantic warnings around the VT state machine's switch cases
The switch statements use the GCC extension "case X ... Y", and here
it doesn't really make any sense to convert it to "case X: case Y:",
so hide the warnings instead.
2020-08-23 10:07:08 +02:00
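One way to hide pedantic warnings around a "case X ... Y" range, as the "vt: hide pedantic warnings" commit above describes, is a diagnostic push/pop pair. The sketch below is an assumption about the general technique; the commit message does not say which pragmas foot uses, and `is_c0_control()` is a hypothetical function.

```c
/* Silence -Wpedantic only around the switch that uses case ranges. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"

/* Returns 1 for C0 control bytes (0x00-0x1f), 0 otherwise. */
static int is_c0_control(unsigned char c)
{
    switch (c) {
    case 0x00 ... 0x1f: /* GCC/Clang case-range extension */
        return 1;
    default:
        return 0;
    }
}

#pragma GCC diagnostic pop
```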
Craig Barnes
44499bbfe1 Fix spelling mistake in vt.c 2020-08-16 14:37:27 +01:00
Craig Barnes
7a77958ba2 Convert most dynamic allocations to use functions from xmalloc.h 2020-08-08 20:37:57 +01:00
Daniel Eklöf
6f2cffd8c0
vt: never call term_print() with a width <= 0 2020-07-16 08:04:12 +02:00
Daniel Eklöf
9508804b18
vt: ignore 0x7f (DEL) in ground state
This ensures *all* bytes mapped to action_print() have wcwidth == 1.

DEL has wcwidth == -1, and would thus have been ignored by
term_print() anyway.
2020-07-16 08:01:37 +02:00
Daniel Eklöf
6183f7f64a
vt: utf8: handle multi-column spacer values correctly when combining 2020-07-16 07:41:51 +02:00
Daniel Eklöf
5c99e8013b
term: rename COMB_CHARS_LO,HI -> CELL_COMB_CHARS_LO,HI 2020-07-14 16:41:57 +02:00
Daniel Eklöf
cabcc615c1
vt: change HT (horizontal tab) to *not* clear LCF
According to the specification, HT **should** clear LCF. However,
nearly all emulators do not. In particular, XTerm doesn't. So we
follow suit.
2020-07-14 10:47:17 +02:00
Daniel Eklöf
b9719673a1
term: rename term_formfeed() -> term_carriage_return() 2020-07-14 09:29:10 +02:00
Daniel Eklöf
7f7ab00e11
vt: implement C0::FF - processed in the same way as C0::LF 2020-07-14 09:18:52 +02:00
Daniel Eklöf
4849a16f37
vt: process C0::VT the same way we process C0::LF
Previously, C0::VT was implemented as a simple 'cursor down'. I.e. it
would behave as LF **until** it reached the bottom of the screen,
where instead of scrolling, it became a no-op.

See https://vt100.net/docs/vt102-ug/chapter5.html
2020-07-14 09:15:15 +02:00
Daniel Eklöf
7357bb54eb
vt: sort C0's in the switch statement, and use escaped characters where possible 2020-07-14 09:11:17 +02:00
Daniel Eklöf
fb001ee7a7
unicode combining: don't log overflow errors unless LOG_ENABLE_DBG == 1 2020-06-09 17:31:58 +02:00
Daniel Eklöf
97221dd09b
vt: utf8-print: check width == 0 first, when deciding whether to do combining 2020-06-09 17:30:49 +02:00
Daniel Eklöf
9452aff020
vt: initial version of UTF-8 decoding built into the VT parser 2020-06-07 16:16:50 +02:00
Daniel Eklöf
d9028b2394
vt: utf8: use mbtowc() instead of mbrtowc()
This is slightly faster, since we don't need to initialize an
mbstate_t struct (using mbrtowc() with a NULL-pointer for 'ps' also
works).

Also, avoid a branch by setting wc=0 and then ignoring the
result/error code from mbtowc().
2020-05-31 12:41:35 +02:00
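The mbtowc() approach from the "vt: utf8: use mbtowc() instead of mbrtowc()" commit above might look like the sketch below. `decode_one()` is a hypothetical helper. The branch-free form assumes, as the commit message does, that `wc` keeps its preset value of 0 when decoding fails, and it relies on the current locale's encoding.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Decode one multibyte character; yields 0 if decoding fails. */
static wchar_t decode_one(const char *buf, size_t len)
{
    wchar_t wc = 0;

    /* No mbstate_t to initialize, and the return value is deliberately
     * ignored: on failure we simply keep wc == 0. */
    (void)mbtowc(&wc, buf, len);
    return wc;
}
```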
Daniel Eklöf
c38b9be6a4
vt: utf8: don't need one entry action for each UTF8 variant 2020-05-31 12:41:07 +02:00
Daniel Eklöf
00df12f1a3
unicode-combine: simplify - remove -Dunicode-precompose option
Since the pre-composing functionality is now part of fcft, it makes
little sense to have a compile time option - there's no size benefit
to be had.

Furthermore, virtually all terminal emulators do
pre-composing (alacritty being an exception), so this really isn't that
controversial.
2020-05-10 17:10:33 +02:00
Daniel Eklöf
b1b32152c1
unicode-precompose: use fcft's precompose functionality
This allows us more options when determining whether to use a
pre-composed character or not:

We now only use the pre-composed character if it's from the primary
font, or if at least one of the base or combining characters are from
a fallback font.

I.e. use glyphs from the primary font if possible. But, if one or more
of the decomposed glyphs are from a fallback font, use the
pre-composed character anyway.
2020-05-09 12:06:11 +02:00
Daniel Eklöf
4d4df92f66
unicode-combining: limit maximum number of allowed composed chains 2020-05-03 11:31:59 +02:00
Daniel Eklöf
1ebdc01162
unicode-combining: detect when we've reached the chain limit
We currently store up to 5 combining characters in any given
base+combining chain.

This adds a check for when that limit is about to be exceeded. When
this happens, we log the chain + the new combining character.

Since things will break anyway, we simply overwrite the last combining
character.
2020-05-03 11:27:06 +02:00
Daniel Eklöf
62e0774319
unicode-combining: store seen combining chains "globally" in the term struct
Instead of storing combining data per cell, realize that most
combinations are re-occurring and that there's lots of available space
left in the unicode range, and store seen base+combining chains
in a per-terminal array.

When we encounter a combining character, we first try to pre-compose,
like before. If that fails, we then search for the current
base+combining combo in the list of previously seen combinations. If
not found there either, we allocate a new combo and add it to the
list. Regardless, the result is an index into this array. We store
this index, offset by COMB_CHARS_LO=0x40000000ul, in the cell.

When rendering, we need to check if the cell character is a plain
character, or if it's a composed character (identified by checking if
the cell character is >= COMB_CHARS_LO).

Then we render the grapheme pretty much like before.
2020-05-03 11:03:22 +02:00
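The per-terminal table from the "store seen combining chains globally" commit above can be sketched as follows. Only `COMB_CHARS_LO` and the index-plus-offset encoding come from the commit message. The types, the table capacity, and the restriction to a single combining character per chain are simplifying assumptions, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define COMB_CHARS_LO 0x40000000ul /* offset named in the commit message */
#define MAX_CHAINS 256             /* hypothetical table capacity */

struct comb_chain {
    uint32_t base;
    uint32_t combining; /* simplified: one combining character per chain */
};

struct comb_table {
    struct comb_chain chains[MAX_CHAINS];
    size_t count;
};

/* Return the cell value for base+combining, adding a new entry if this
 * combination has not been seen before. */
static uint32_t comb_lookup_or_add(struct comb_table *t,
                                   uint32_t base, uint32_t combining)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->chains[i].base == base && t->chains[i].combining == combining)
            return (uint32_t)(COMB_CHARS_LO + i);
    }

    if (t->count == MAX_CHAINS)
        return base; /* table full: fall back to the plain base character */

    t->chains[t->count].base = base;
    t->chains[t->count].combining = combining;
    return (uint32_t)(COMB_CHARS_LO + t->count++);
}

/* The renderer-side check: is this cell an index into the table? */
static int cell_is_composed(uint32_t cell)
{
    return cell >= COMB_CHARS_LO;
}
```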
Daniel Eklöf
b10436e49b
vt: use signed integers to correctly detect when we're done 2020-05-02 20:01:43 +02:00
Daniel Eklöf
804642580e
meson: don't generate pre-compose table when -Dunicode-precompose=false 2020-05-02 18:43:13 +02:00
Daniel Eklöf
d945b68b73
unicode-combine: remove utf8proc dependency
We only used utf8proc to try to pre-compose a glyph from a base and
combining character.

We can do this ourselves by using a pre-compiled table of valid
pre-compositions. This table isn't _that_ big, and binary searching it
is fast.

That is, for a very small amount of code, and not too much extra RO
data, we can get rid of the utf8proc dependency.
2020-05-02 17:29:00 +02:00
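A binary search over a sorted pre-composition table, as described in the "unicode-combine: remove utf8proc dependency" commit above, could look like this. The four-entry table is purely illustrative (the real table would be generated and much larger), and `struct precompose_entry` and `try_precompose()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct precompose_entry {
    uint32_t base;
    uint32_t comb;
    uint32_t composed;
};

/* Illustrative excerpt, sorted by (base, comb). */
static const struct precompose_entry precompose_table[] = {
    {0x0041, 0x0301, 0x00c1}, /* A + combining acute -> U+00C1 */
    {0x0061, 0x0300, 0x00e0}, /* a + combining grave -> U+00E0 */
    {0x0061, 0x0301, 0x00e1}, /* a + combining acute -> U+00E1 */
    {0x0065, 0x0301, 0x00e9}, /* e + combining acute -> U+00E9 */
};

/* Returns the pre-composed code point, or 0 if there is none. */
static uint32_t try_precompose(uint32_t base, uint32_t comb)
{
    size_t lo = 0;
    size_t hi = sizeof(precompose_table) / sizeof(precompose_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct precompose_entry *e = &precompose_table[mid];

        if (e->base == base && e->comb == comb)
            return e->composed;

        if (e->base < base || (e->base == base && e->comb < comb))
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```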
Daniel Eklöf
8389c76549
unicode-combining: don't limit ourselves to the (western) diacritics blocks 2020-05-02 16:11:51 +02:00
Daniel Eklöf
50543983ad
unicode-combine: only compose if we don't have any other combining characters
If the client sent the sequence SAB, where SA does NOT have a composed
representation, but SB does, the old code would compose SB and throw
away A.

This patch fixes this by only allowing a compose if there aren't
any pre-existing combining characters.
2020-05-01 20:17:37 +02:00
Daniel Eklöf
cb5f80ec6a
vt: utf8: track combining characters that we failed to compose
When we detect a combining character, we first try to compose it with
the base character (like before).

When this fails, we instead add the combining character to the base
cell's combining characters array.

The reason for using a composed character when possible is twofold:
one, the rendered glyph will look better since it will be a single
glyph instead of two separate glyphs (possibly from different
fonts(!)). And two, for performance. A composed glyph is a single
glyph to render, while a decomposed glyph sequence means the renderer
has to render multiple glyphs for a single cell.
2020-05-01 11:52:40 +02:00
Daniel Eklöf
69c3e74498
util.h: new header file defining commonly used macros 2020-05-01 11:46:24 +02:00
Daniel Eklöf
3f3fff768a
vt: lazily reset utf8 in action_utf8_*_entry
action_clear() is in the super hot code path. Avoid resetting utf8
state there, as utf8 input is relatively uncommon.

Instead, reset it when we explicitly enter any of the utf8 collecting
states, as this is exactly the point where we need it.
2020-04-27 15:50:44 +02:00
Daniel Eklöf
d1fc419e34
vt: action_utf8_print: idx is cleared in action_clear() 2020-04-27 15:49:07 +02:00
Daniel Eklöf
4278af99d2
vt: utf8-*-entry: idx is cleared in action_clear() 2020-04-27 15:47:44 +02:00
Daniel Eklöf
e478874dd9
term: remove unneeded utf8.left member 2020-04-27 15:06:23 +02:00
Daniel Eklöf
4283a8c51b
utf8: add support for unicode combining characters
This feature lets foot combine e.g. "a\u0301" to "á".

We first check if the current character (that we're about to print) is
a combining character, by checking if it's in one of the following
ranges:

* Combining Diacritical Marks (0300–036F), since version 1.0, with
  modifications in subsequent versions down to 4.1
* Combining Diacritical Marks Extended (1AB0–1AFF), version 7.0
* Combining Diacritical Marks Supplement (1DC0–1DFF), versions 4.1 to 5.2
* Combining Diacritical Marks for Symbols (20D0–20FF), since version
  1.0, with modifications in subsequent versions down to 5.1
* Combining Half Marks (FE20–FE2F), versions 1.0, with modifications
  in subsequent versions down to 8.0

If it is, we check if the last cell appears to contain a valid symbol,
and if so, we attempt to compose (combine) the last cell with the
current character, using utf8proc.

If the result is a combined character, we replace the content in the
previous cell with the new, combined character.

Thus, if you select and copy the printed character, you would get
e.g. "\u00e1" instead of "a\u0301".

This feature can be disabled. By default, it is enabled if the
utf8proc library is found, but can be explicitly disabled, or enabled,
with 'meson -Dunicode-combining=disabled|enabled'.
2020-04-27 12:13:30 +02:00
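The range check listed in the "utf8: add support for unicode combining characters" commit above amounts to something like the following. The function name `is_combining()` is hypothetical; the ranges are the five blocks named in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if wc falls in one of the combining blocks from the commit message. */
static bool is_combining(uint32_t wc)
{
    return (wc >= 0x0300 && wc <= 0x036f) || /* Combining Diacritical Marks */
           (wc >= 0x1ab0 && wc <= 0x1aff) || /* ... Extended */
           (wc >= 0x1dc0 && wc <= 0x1dff) || /* ... Supplement */
           (wc >= 0x20d0 && wc <= 0x20ff) || /* ... for Symbols */
           (wc >= 0xfe20 && wc <= 0xfe2f);   /* Combining Half Marks */
}
```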
Daniel Eklöf
89559d5466
grid: move 'cursor' state from terminal to grid
This way, the 'normal' and 'alt' grids have their own cursor state,
and we don't need to switch between them.
2020-04-16 18:51:14 +02:00
Daniel Eklöf
d67f437458
mbstate: fix compile warning on systems where mbstate_t isn't an integral
An empty initializer still ensures the entire object is
zero-initialized.
2020-04-13 11:58:38 +02:00
Daniel Eklöf
1006608093
alt-screen: use a custom 'saved' cursor when switching to alt screen
This fixes an issue where we failed to restore the cursor correctly
when exiting from the alternate screen, if the client had sent escapes
to save the cursor position while inside the alternate screen.

This was because we used the *same* storage for saving the cursor
position through escapes, as for saving it when entering the alternate
screen.

Fix by using a custom variable dedicated to normal <--> alt screen
switching.
2020-03-16 12:00:25 +01:00
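The fix in the "alt-screen: use a custom 'saved' cursor" commit above comes down to keeping two independent save slots. The sketch below illustrates the idea only; every struct, field, and function name in it is hypothetical.

```c
struct coord {
    int row;
    int col;
};

struct screen_sketch {
    struct coord cursor;
    struct coord saved_cursor;     /* written by the save-cursor escape */
    struct coord alt_saved_cursor; /* written only when entering alt screen */
    int in_alt;
};

/* Escape-driven save/restore uses its own slot. */
static void save_cursor(struct screen_sketch *s)    { s->saved_cursor = s->cursor; }
static void restore_cursor(struct screen_sketch *s) { s->cursor = s->saved_cursor; }

/* Screen switching uses a dedicated slot, so a save-cursor escape sent
 * while in the alternate screen cannot clobber it. */
static void enter_alt(struct screen_sketch *s)
{
    s->alt_saved_cursor = s->cursor;
    s->in_alt = 1;
}

static void leave_alt(struct screen_sketch *s)
{
    s->cursor = s->alt_saved_cursor;
    s->in_alt = 0;
}
```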
Daniel Eklöf
4a169f5643
vt: tag cells that were form-feed:ed, to allow correct text reflow
To handle text reflow correctly when a line has a printable character
in the last column, but still ended with a line break, we need to track the
fact that the slave inserted a line break here.

Otherwise, when the window width is increased, we'll end up pulling up
the next line, when we really should have inserted a line break.
2020-02-10 21:54:37 +01:00
Daniel Eklöf
8c32e3ccf0
vt: ensure we never step outside our parameter and sub-parameter arrays
We only support 16 parameters, and for each parameter, 16
sub-parameters. If we ever hit that limit (or rather, if the client
writes 17 (sub) parameters), log this and stop incrementing the
parameter index variable.

For performance reasons, we implement the following behavior:

* We never increment the parameter index past the supported
  number. This ensures all code *accessing* the parameter list can do
  so without verifying the validity of the index.

* The *first* time we see too many parameters, and the first time we
  see too many sub parameters, log this. Then *never* log again. Even
  if we see too many parameters in a completely different escape. This
  is so that we don't have to keep a "have warned" boolean in the
  terminal struct, but can use a simple function local static
  variable.
2020-02-01 19:44:56 +01:00
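The two rules in the "vt: ensure we never step outside our parameter and sub-parameter arrays" commit above (clamp the index, warn once via a function-local static) can be sketched like this. The limit of 16 is from the commit message. `struct params_sketch`, `next_param()`, and the warning text are hypothetical, and sub-parameters are omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PARAMS 16 /* limit stated in the commit message */

struct params_sketch {
    unsigned v[MAX_PARAMS];
    size_t idx; /* index of the parameter currently being collected */
};

/* Move on to the next parameter. Returns false, and leaves idx alone,
 * once the limit is hit, so readers of v[] never need a bounds check. */
static bool next_param(struct params_sketch *p)
{
    if (p->idx + 1 >= MAX_PARAMS) {
        /* Function-local static: warn the first time only, with no
         * "have warned" flag needed in the terminal struct. */
        static bool have_warned = false;
        if (!have_warned) {
            have_warned = true;
            fprintf(stderr, "warning: more than %d parameters\n", MAX_PARAMS);
        }
        return false;
    }

    p->idx++;
    p->v[p->idx] = 0;
    return true;
}
```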
Daniel Eklöf
07a0c7238c
vt: collect (intermediate): log a warning if user supplied more than two intermediates 2020-02-01 19:29:31 +01:00
Daniel Eklöf
bbb7b60b17
vt: collect (intermediate): log _which_ character we collected 2020-02-01 19:29:14 +01:00
Daniel Eklöf
4a64e4aebc
vt: bug: state machine: csi entry: handle 0x3a/0x3b correctly
0x3a/0x3b are ':' and ';'. These should not only switch to the 'csi
param' state, but also be parsed as a parameter.

This fixes an issue where a multi-parameter escape with the first
parameter omitted was parsed incorrectly - as if the first parameter
wasn't there.

I.e. "\e[;123r" was parsed as "\e[123r"
2020-01-26 00:44:53 +01:00
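The parsing behavior fixed in the "csi entry: handle 0x3a/0x3b correctly" commit above can be illustrated with a small parameter parser. This is a standalone sketch, not foot's state machine: `parse_params()` is a hypothetical function that takes the bytes following "\e[", and it handles only ';' separators (not ':' sub-parameters).

```c
#include <stddef.h>

/* Parse CSI parameter bytes up to the final byte. An omitted parameter
 * is stored as 0. Returns the number of parameters; max must be >= 1. */
static size_t parse_params(const char *s, unsigned *out, size_t max)
{
    size_t n = 1;
    out[0] = 0;

    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            out[n - 1] = out[n - 1] * 10 + (unsigned)(*s - '0');
        } else if (*s == ';') {
            /* A leading ';' still counts: it terminates an (empty)
             * first parameter instead of being skipped. */
            if (n == max)
                break;
            out[n++] = 0;
        } else {
            break; /* final byte, e.g. 'r' */
        }
    }
    return n;
}
```

With this, ";123r" yields two parameters (0 and 123), while "123r" yields one.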
Daniel Eklöf
75b8fc52b8
vt: bug: fix check for error from mbrtowc()
mbrtowc() returns an unsigned value (size_t). Need to cast to signed before
checking if less than zero.

This fixes an issue where invalid utf-8 sequences were treated as valid.
2020-01-23 17:39:25 +01:00
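The pitfall from the "vt: bug: fix check for error from mbrtowc()" commit above is that a `ret < 0` test on a size_t is always false. The sketch below compares against the two documented error values, (size_t)-1 and (size_t)-2, which is an alternative to the signed cast the commit mentions rather than the commit's own code. `decode_ok()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Returns true if buf starts with a complete, valid multibyte character
 * in the current locale's encoding. */
static bool decode_ok(const char *buf, size_t len, wchar_t *wc)
{
    mbstate_t ps;
    memset(&ps, 0, sizeof(ps));

    size_t ret = mbrtowc(wc, buf, len, &ps);

    /* (size_t)-1: invalid sequence, (size_t)-2: incomplete sequence.
     * Both are huge positive numbers, so "ret < 0" would never fire. */
    return ret != (size_t)-1 && ret != (size_t)-2;
}
```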
Daniel Eklöf
300f83e66b
term: factor out character printing to new function term_print() 2020-01-20 18:34:32 +01:00
Daniel Eklöf
5a6cbb8c3e
dcs: initial handling of DCS in general
Add data structure to term->vt. This structure tracks the free-form
data that is passed-through, and the handler to call at the end.

Intermediates and parameters are collected by the normal VT
parser. Then, when we enter the passthrough state, we call dcs_hook().

This function checks the intermediate(s) and parameters, and selects
the appropriate unhook handler (and optionally does some execution
already).

In passthrough mode, we simply append strings to an internal
buffer. This might have to be changed in the future, if we need to
support a DCS that needs to execute as we go.

In unhook (i.e. when the DCS is terminated), we execute the unhook
handler.

As a proof-of-concept, handlers for BSU/ESU (Begin/End Synchronized
Update) have been added (but are left unimplemented).
2020-01-12 11:55:22 +01:00
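The hook / passthrough / unhook flow from the "dcs: initial handling of DCS in general" commit above can be sketched as follows. Only the three-phase structure and the buffered passthrough come from the commit message. The struct layout, the `dcs_put()` name, the demo handler, and the choice of final byte 's' to select it are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct dcs_sketch {
    char *data;  /* passthrough bytes collected so far */
    size_t len;
    void (*unhook_handler)(struct dcs_sketch *dcs);
};

static size_t demo_last_len; /* records what the demo handler saw */

static void demo_unhook(struct dcs_sketch *dcs)
{
    demo_last_len = dcs->len;
}

/* Entering passthrough: reset the buffer and pick a handler. */
static void dcs_hook(struct dcs_sketch *dcs, char final)
{
    dcs->data = NULL;
    dcs->len = 0;
    dcs->unhook_handler = (final == 's') ? demo_unhook : NULL;
}

/* Passthrough: append one byte. Returns -1 on allocation failure. */
static int dcs_put(struct dcs_sketch *dcs, char c)
{
    char *p = realloc(dcs->data, dcs->len + 1);
    if (p == NULL)
        return -1;
    dcs->data = p;
    dcs->data[dcs->len++] = c;
    return 0;
}

/* DCS terminated: run the selected handler, then release the buffer. */
static void dcs_unhook(struct dcs_sketch *dcs)
{
    if (dcs->unhook_handler != NULL)
        dcs->unhook_handler(dcs);
    free(dcs->data);
    dcs->data = NULL;
    dcs->len = 0;
    dcs->unhook_handler = NULL;
}
```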
Daniel Eklöf
56824e459d
Revert "vt: refactor"
This reverts commit a575204bc7.
2019-12-20 23:59:23 +01:00
Daniel Eklöf
a575204bc7
vt: refactor 2019-12-20 23:45:21 +01:00