This would be done cleaner by using destination clipping in pixman,
but since we have multiple threads rendering cells simultaneously,
that is not possible.
We also cannot use source clipping since we need to offset the
destination x,y coordinates with the glyph offsets.
So, roll our own clipping by not allowing the x,y offsets to go
outside the cell boundaries, and adjusting the glyph offset
accordingly.
Closes#21
For performance reasons, we track whether a cell is selected or not
using a bit in a cell's attributes.
This makes it easy for the renderer to determine if the cells should
be rendered as selected or not - it just have to look at the
'selected' bit instead of doing a complex range check against the
current selection.
This works nicely in most cases. But, if the cell is updated, the
'selected' bit is cleared. This results in the renderer rendering the
cell normally, i.e. _not_ selected.
Checking for this, and re-setting the 'selected' bit when the cell is
updated (printed to) is way too expensive as it is in the hot path.
Instead, sync the 'selected' bits just before rendering. This isn't so
bad as it may sound; if there is no selection this is a no-op. Even if
there is a selection, only those cells whose 'selected' bit have been
cleared are dirtied (and thus re-rendered) - these cells would have
been re-rendered anyway.
Instead of storing combining data per cell, realize that most
combinations are re-occurring and that there's lots of available space
left in the unicode range, and store seen base+combining combinations
chains in a per-terminal array.
When we encounter a combining character, we first try to pre-compose,
like before. If that fails, we then search for the current
base+combining combo in the list of previously seen combinations. If
not found there either, we allocate a new combo and add it to the
list. Regardless, the result is an index into this array. We store
this index, offsetted by COMB_CHARS_LO=0x40000000ul in the cell.
When rendering, we need to check if the cell character is a plain
character, or if it's a composed character (identified by checking if
the cell character is >= COMB_CHARS_LO).
Then we render the grapheme pretty much like before.
We only used utf8proc to try to pre-compose a glyph from a base and
combining character.
We can do this ourselves by using a pre-compiled table of valid
pre-compositions. This table isn't _that_ big, and binary searching it
is fast.
That is, for a very small amount of code, and not too much extra RO
data, we can get rid of the utf8proc dependency.
This is basically just a loop that renders additional glyphs on top
off the base glyph.
Since we now need access to the row struct, for the combining
characters, the function prototype for 'render_cell()' has changed.
When rendering the combining characters we also need to deal with
broken fonts; some fonts use positive offsets (as if the combining
character was a regular character), while others use a negative
offset (to be used as if you had already advanced the pen position).
We handle both - a positive offset is rendered just like a regular
glyph. When we see a negative offset we simply add the width of a cell
first.
A lot of the escape sequences on the "CSI Ps ; Ps ; Ps t" form are
expected to return a reply. Thus, not having these implemented means
we will hang if the client sends these escapes.
Not all of these escapes can be meaningfully implemented on Wayland,
and thus this implementation is best effort.
We now support the following escapes:
* 11t - report if window is iconified (always replies "no")
* 13t - report window position (always replies 0,0)
* 13;2t - report text area position (replies with margins, since we
cannot get the window's position)
* 14t - report text area size, in pixels
* 14;2t - report window size, in pixels
* 15t - report screen size, in pixels
* 16t - report cell size, in pixels
* 18t - report text area size, in cells
* 19t - report screen size, in cells
While it *looked* like the selection was working before, it really
wasn't.
When rendering, we're looking at the cells' attributes to determine
whether they have been selected or not.
When copying text however, we use the terminal's 'selection' state,
which consists of 'start' and 'end' coordinates.
These need to be translated when reflowing text.
This lessens the burden on (primarily) the compositor, since we no
longer tear down and re-create the SHM pool when scrolling.
The SHM pool is setup once, and its size is fixed at the maximum
allowed (512MB for now, 2GB would be possible).
This also allows us to mmap() the memfd once. The exposed raw pointer
is simply an offset from the memfd mmapping.
Note that this means e.g. rouge rendering code will be able to write
outside the buffer.
Finally, only do this if the caller explicitly wants to enable
scrolling. The memfd of other buffers are sized to the requested size.
Implemented by truncating the file size and moving the offset
backwards. This means we can only reverse scroll when we've previously
scrolled forward.
TODO: figure out if we can somehow do fast reverse scrolling even
though offset is 0 (or well, less than required for scrolling).
Before, we applied delayed rendering (that is, we gave the client a
chance to do more writes before we scheduled a render refresh) only
when the renderer were idle.
However, with e.g. a high keyboard repeat rate, it is very much
possible to start the render loop and then never break out of it while
receiving keyboard input.
This causes screen flickering, as we're no longer even trying to
detect the clients transaction boundaries.
So, let's rewrite how this is done.
First, we give the user the ability to disable delayed rendering
altogether, by setting either the lower or upper timeout to 0.
Second, when delayed rendering is enabled, we ignore the frame
callback. That is, when receiving input, we *always* reschedule the
lower timeout timer, regardless of whether the render is idle or not.
The render's frame callback handler will *not* render the grid if the
delayed render timers are armed.
This means for longer client data bursts, we may now skip frames. That
is, we're trading screen flicker for the occasional frame hickup.
For short client data bursts we should behave roughly as before.
This greatly improves the behavior of fullscreen, or near fullscreen,
updates of large grids (example, scrolling in emacs in fullscreen,
with a vertical buffer split).
shm_scroll() is fast when memmove() is slow. That is, when scrolling
a *small* amount of lines, shm_scroll() is fast.
Conversely, when scrolling a *large* number of lines, memmove() is
fast.
For now, assume the two methods perform _roughly_ the same given a
certain number of bytes they have to touch.
This means we should use shm_scroll() when number of scroll lines is
less than half the screen. Otherwise we use memmove.
Since we need to repair the bottom scroll region after a shm_scroll,
those lines are added to the count when determining which method to
use.
TODO:
* Check if it's worth to do shm scrolling when we have a top scroll
region.
* Do more performance testing with a) various amount of scrolling
lines, and b) larger bottom scroll regions.
When resizing in alt mode, we never updated the saved 'normal'
cursor. This meant that when we exited alt mode, the cursor would be
positioned where it was in the old pre-resize/reflow grid.
Now, we update the saved cursor in the same way we update visible
cursor. The result is that when we exit the alt screen, the cursor
is restored to the same coordinates it would have been updated to had
the resize been done in the 'normal' screen.