When set to 'auto', use 10-bit surfaces if gamma-correct blending is
enabled, and 8-bit surfaces otherwise.
Note that we may still fallback to 8-bit surfaces (without disabling
gamma-correct blending) if the compositor does not support 10-bit
surfaces.
Closes#2082
This implements gamma-correct blending, which mainly affects font
rendering.
The implementation requires compile-time availability of the new
color-management protocol (available in wayland-protocols >= 1.41),
and run-time support for the same in the compositor (specifically, the
EXT_LINEAR TF function and sRGB primaries).
How it works: all colors are decoded from sRGB to linear (using a
lookup table, generated in the exact same way pixman generates it's
internal conversion tables) before being used by pixman. The resulting
image buffer is thus in decoded/linear format. We use the
color-management protocol to inform the compositor of this, by tagging
the wayland surfaces with the 'ext_linear' image attribute.
Sixes: all colors are sRGB internally, and decoded to linear before
being used in any sixels. Thus, the image buffers will contain linear
colors. This is important, since otherwise there would be a
decode/encode penalty every time a sixel is blended to the grid.
Emojis: we require fcft >= 3.2, which adds support for sRGB decoding
color glyphs. Meaning, the emoji pixman surfaces can be blended
directly to the grid, just like sixels.
Gamma-correct blending is enabled by default *when the compositor
supports it*. There's a new option to explicitly enable/disable it:
gamma-correct-blending=no|yes. If set to 'yes', and the compositor
does not implement the required color-management features, warning
logs are emitted.
There's a loss of precision when storing linear pixels in 8-bit
channels. For this reason, this patch also adds supports for 10-bit
surfaces. For now, this is disabled by default since such surfaces
only have 2 bits for alpha. It can be enabled with
tweak.surface-bit-depth=10-bit.
Perhaps, in the future, we can enable it by default if:
* gamma-correct blending is enabled
* the user has not enabled a transparent background
Before this patch. Wayland surface damage tracking was done on a
per-row basis. That is, even if just one cell was updated, the entire
row was "damaged".
Now, damage is per cell. This hopefully results in lower latencies in
many use cases, and especially on high DPI monitors.
On compositors that forces us to double buffer, we need to re-apply
the last frame’s damage to the current frame (which uses the buffer
from the next-to-last frame).
General cell updates are handled by simply copying from the last
frame’s pixman buffer to the current frame’s.
In an attempt to improve performance, scroll damage were up until now
handled by re-playing the last frame’s scroll damage (on the current
frame’s buffer). This does not work, and resulted in glitches when
scrolling in the scrollback.
This patch does the following:
* grid_render_scroll{,_reverse}() now update the buffer’s "dirty"
region. This means the generic copy-old-frames-buffer handles the
scroll damage (albeit in, potentially, a less efficient way).
* Tracking of, and re-applying old scroll damage is completely
removed.
Closes#1173
This allows a caller to return a buffer (obtained with
shm_get_buffer()) to the pool,
The buffer must not have been used. I.e. it must not have been
attached and committed to a wayland surface.
Up until now, *all* buffers have been tracked in a single, global
buffer list. We've used 'cookies' to separate buffers from different
contexts (so that shm_get_buffer() doesn't try to re-use e.g. a
search-box buffer for the main grid).
This patch refactors this, and completely removes the global
list.
Instead of cookies, we now use 'chains'. A chain tracks both the
properties to apply to newly created buffers (scrollable, number of
pixman instances to instantiate etc), as well as the instantiated
buffers themselves.
This means there's strictly speaking not much use for shm_fini()
anymore, since its up to the chain owner to call shm_chain_free(),
which will also purge all buffers.
However, since purging a buffer may be deferred, if the buffer is
owned by the compositor at the time of the call to shm_purge() or
shm_chain_free(), we still keep a global 'deferred' list, on to which
deferred buffers are pushed. shm_fini() iterates this list and
destroys the buffers _even_ if they are still owned by the
compositor. This only happens at program termination, and not when
destroying a terminal instance. I.e. closing a window in a “foot
--server” does *not* trigger this.
Each terminal instatiates a number of chains, and these chains are
destroyed when the terminal instance is destroyed. Note that some
buffers may be put on the deferred list, as mentioned above.
The initial ref-count is either 1 or 0, depending on whether the
buffer is supposed to be released "immeidately" (meaning, as soon as
the compositor releases it).
Two new user facing functions have been added: shm_addref() and
shm_unref().
Our renderer now uses these two functions instead of manually setting
and clearing the 'locked' attribute.
shm_unref() will decrement the ref-counter, and destroy the buffer
when the counter reaches zero. Except if the buffer is currently
"busy" (compositor owned), in which case destruction is deferred to
the release event. The buffer is still removed from the list though.
shm_get_many() always returns new buffers (i.e. never old, cached
ones). The newly allocated buffers are also marked for immediate
purging, meaning they’ll be destroyed on the next call to either
shm_get_buffer(), or shm_get_many().
Furthermore, we add a new attribute, ‘locked’, to the buffer
struct. When auto purging buffers, look at this instead of comparing
cookies.
Buffer consumers are expected to set ‘locked’ while they hold a
reference to it, and don’t want it destroyed behind their back.
* Break out cursor cell dirtying to separate functions
* Break out handling of double buffering
* Handle buffers with age > 1 (we’re swapping between more than 2
buffers)
* Detect full screen repaints, and skip re-applying old frame’s damage
* Use an allocated array insted of a tll list for old frame’s scroll damage
* When logging frame rendering time, including the amount used for
double buffering.
By default, age all matching buffers that are busy (i.e. in use by the
compositor).
This allows us to detect whether we can apply the current frame’s
damage directly, or if we need to prepare the buffer first (e.g. copy
old buffer, or re-apply last frame’s damage etc).
This can be set to 'none' (the default), 'osd', 'log' or 'both'.
When 'osd' is enabled, we'll render the frame rendering time to a
sub-surface after each frame.
When 'log' is enabled, the frame rendering time is logged on stderr.
Our home rolled clip-to-cell code was, obviously, not correct.
The original problem was that we couldn't use pixman clipping since we
have multiple threads writing to the same pixman image, and thus there
would be races between the threads setting clipping.
The fix is actually simple - just instantiate one pixman
image (referencing the same backing image data) for each rendering
thread.
This lessens the burden on (primarily) the compositor, since we no
longer tear down and re-create the SHM pool when scrolling.
The SHM pool is setup once, and its size is fixed at the maximum
allowed (512MB for now, 2GB would be possible).
This also allows us to mmap() the memfd once. The exposed raw pointer
is simply an offset from the memfd mmapping.
Note that this means e.g. rouge rendering code will be able to write
outside the buffer.
Finally, only do this if the caller explicitly wants to enable
scrolling. The memfd of other buffers are sized to the requested size.
* Impose a maximum memfd size limit. In theory, this can be
2GB (wl_shm_create_pool() is the limiting factor - its size argument
is an int32_t). For now, use 256MB.
This is mainly to reduce the amount of virtual address space used by
the compositor, which keeps at least one mmapping (of the entire
memfd) around. One mmapping *per terminal window* that is.
Given that we have 128TB with 48-bit virtual addresses, we could
probably bump this to 2GB without any issues. However, 256MB should
be enough.
TODO: check how much we typically move the offset when scrolling in
a fullscreen window on a 4K monitor. 256MB may turn out to be too
small.
On 32-bit shm_scroll() is completely disabled. There simply isn't
enough address space.
* Wrapping is done by moving the offset to "the other end" of the
memfd, and copying the buffer contents to the new, wrapped offset.
The "normal" scrolling code then does the actual scrolling. This
means we'll re-instantiate all objects twice when wrapping.
This function "scrolls" the buffer by the specified number of (pixel)
rows.
The idea is move the image offset by re-sizing the underlying memfd
object. I.e. to scroll forward, increase the size of the memfd file,
and move the pixman image offset forward (and the Wayland SHM buffer
as well).
Only increasing the file size would, obviously, cause the memfd file
to grow indefinitely. To deal with this, we "punch" a whole from the
beginning of the file to the new offset. This frees the associated
memory.
Thus, while we have a memfd file whose size is (as seen by
e.g. fstat()) is ever growing, the actual file size is always the
original buffer size.
Some notes:
* FALLOC_FL_PUNCH_HOLE can be quite slow when the number of used pages
to drop is large.
* all normal fallocate() usages have been replaced with ftruncate(),
as this is *much* faster. fallocate() guarantees subsequent writes
wont fail. I.e. it actually reserves (disk) space. While it doesn't
allocate on-disk blocks for on-disk files, it *does* zero-initialize
the in-memory blocks. And this is slow. ftruncate() doesn't do this.
TODO: implement reverse scrolling (i.e. a negative row count).