It may happen that we end up with multiple non-busy, same-sized
buffers for the same cookie (context), and thus eligible for re-use.
Before this patch, we would keep all those buffers around. This is
completely unnecessary. Under normal circumstances, we’ll either be
re-using a single buffer, or swapping between two. In the second case, the
“other” buffer is always busy, and thus not eligible for re-use.
So, if we _do_ detect multiple re-usable buffers, pick the one with
the lowest “age” (increasing the chance of applying damage tracking,
instead of re-drawing everything), and mark the others for purging.
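A minimal sketch of that selection. The struct layout, the flat array
used as a cache, and the name pick_reusable() are all hypothetical,
chosen for illustration only; the real code may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer record (not the real struct). */
struct buffer {
    int width, height;
    unsigned long cookie;
    unsigned age;   /* how stale the buffer content is */
    bool busy;      /* still held by the compositor */
    bool purge;     /* destroy at the next opportunity */
};

/* Return the re-usable buffer with the lowest age, and mark every
 * other re-usable match for purging. Returns NULL if none match. */
struct buffer *
pick_reusable(struct buffer *cache, size_t count,
              int width, int height, unsigned long cookie)
{
    struct buffer *best = NULL;

    for (size_t i = 0; i < count; i++) {
        struct buffer *b = &cache[i];

        if (b->busy || b->purge || b->cookie != cookie ||
            b->width != width || b->height != height)
            continue;

        if (best == NULL) {
            best = b;
        } else if (b->age < best->age) {
            best->purge = true;  /* previous candidate loses */
            best = b;
        } else {
            b->purge = true;
        }
    }

    return best;
}
```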
shm_get_many() always returns new buffers (i.e. never old, cached
ones). The newly allocated buffers are also marked for immediate
purging, meaning they’ll be destroyed on the next call to either
shm_get_buffer(), or shm_get_many().
Furthermore, we add a new attribute, ‘locked’, to the buffer
struct. When auto purging buffers, look at this instead of comparing
cookies.
Buffer consumers are expected to set ‘locked’ on a buffer while they
hold a reference to it and don’t want it destroyed behind their back.
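As a rough illustration of purging with a ‘locked’ flag: the sketch
below uses a hypothetical, stripped-down struct where an ‘alive’ flag
stands in for the real resources. purge_unlocked() is an invented name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer record. */
struct buffer {
    bool locked;  /* a consumer holds a reference */
    bool purge;   /* marked for destruction */
    bool alive;   /* stands in for the real resources */
};

/* Destroy every buffer that is marked for purging, unless it is
 * locked. Returns the number of buffers destroyed. */
size_t
purge_unlocked(struct buffer *cache, size_t count)
{
    size_t destroyed = 0;

    for (size_t i = 0; i < count; i++) {
        struct buffer *b = &cache[i];

        if (!b->alive || !b->purge || b->locked)
            continue;

        b->alive = false;  /* real code would release resources here */
        destroyed++;
    }

    return destroyed;
}
```

A locked buffer simply survives until its consumer clears the flag; the
next purge pass then picks it up.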
This fixes an issue where we ended up "double closing" buffer FDs.
In most setups (especially on compositors with SSDs) this was pretty
rare. And even when it did happen, the FD was normally unused, and
thus nothing bad happened.
However, by quickly resizing the window while using CSDs, it was
fairly easy to trigger this. We sometimes ended up closing the
TIOCSWINSZ timer FD while thinking it was a buffer FD, but most of the
time we just ended up closing _another_ buffer’s pool FD, leading to
an immediate disconnect by the compositor.
The only reason to keep the pool FD open is if we’re going to SHM
scroll the buffer; we need the FD for fallocate(FALLOC_FL_PUNCH_HOLE).
In all other cases, there’s absolutely no need to keep the FD
open. Thus, close it as soon as we’ve instantiated the buffer. This
frees up FDs, and helps keep foot from hitting the FD ulimit.
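A sketch of the pattern, under stated assumptions: a temporary file
from mkstemp() stands in for the memfd, and struct pool, pool_create()
and pool_destroy() are hypothetical names. It relies on the fact that a
mapping stays valid after its FD is closed, and it resets the FD to -1
so that a later cleanup cannot close the same number twice.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct pool {
    int fd;        /* -1 once closed */
    uint8_t *map;  /* NULL when not mapped */
    size_t size;
};

/* Create a file-backed mapping. The FD is kept only when the caller
 * intends to scroll; otherwise it is closed right away. */
bool
pool_create(struct pool *p, size_t size, bool scrollable)
{
    char path[] = "/tmp/pool-sketch-XXXXXX";
    void *m;

    p->map = NULL;
    p->size = 0;
    p->fd = mkstemp(path);
    if (p->fd < 0)
        return false;
    unlink(path);  /* the file lives on until the last reference goes */

    if (ftruncate(p->fd, (off_t)size) < 0)
        goto err;

    m = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, p->fd, 0);
    if (m == MAP_FAILED)
        goto err;

    p->map = m;
    p->size = size;

    if (!scrollable) {
        close(p->fd);
        p->fd = -1;  /* guards against a second close() later */
    }
    return true;

err:
    close(p->fd);
    p->fd = -1;
    return false;
}

/* Safe to call more than once. */
void
pool_destroy(struct pool *p)
{
    if (p->map != NULL)
        munmap(p->map, p->size);
    if (p->fd >= 0)
        close(p->fd);
    p->map = NULL;
    p->fd = -1;
}
```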
* Break out cursor cell dirtying to separate functions
* Break out handling of double buffering
* Handle buffers with age > 1 (we’re swapping between more than 2
buffers)
* Detect full screen repaints, and skip re-applying old frame’s damage
* Use an allocated array instead of a tll list for old frame’s scroll damage
* When logging frame rendering time, include the time spent on
double buffering.
By default, age all matching buffers that are busy (i.e. in use by the
compositor).
This allows us to detect whether we can apply the current frame’s
damage directly, or if we need to prepare the buffer first (e.g. copy
old buffer, or re-apply last frame’s damage etc).
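One possible reading of this, as a sketch. The struct, the function
names, and in particular the mapping from age to action (0: apply
directly, 1: re-apply last frame’s damage, more: copy the old buffer)
are assumptions made for illustration, not taken from the code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer record. */
struct buffer {
    unsigned long cookie;
    unsigned age;
    bool busy;
};

/* Age every busy buffer belonging to the given cookie. */
void
age_busy_buffers(struct buffer *cache, size_t count, unsigned long cookie)
{
    for (size_t i = 0; i < count; i++) {
        if (cache[i].cookie == cookie && cache[i].busy)
            cache[i].age++;
    }
}

enum prep {
    PREP_NONE,                 /* apply current damage directly */
    PREP_REAPPLY_LAST_DAMAGE,  /* buffer is one frame behind */
    PREP_COPY_LAST_BUFFER,     /* buffer is several frames behind */
};

/* Assumed policy: decide how to prepare a buffer based on its age. */
enum prep
prep_for_age(unsigned age)
{
    switch (age) {
    case 0:  return PREP_NONE;
    case 1:  return PREP_REAPPLY_LAST_DAMAGE;
    default: return PREP_COPY_LAST_BUFFER;
    }
}
```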
shm.c:301:26: error: implicit declaration of function 'fallocate' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
can_punch_hole = fallocate(
^
shm.c:302:22: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE'
pool_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 1) == 0;
^
shm.c:302:45: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
pool_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 1) == 0;
^
shm.c:432:9: error: implicit declaration of function 'fallocate' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (fallocate(
^
shm.c:434:13: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
shm.c:434:36: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
shm.c:501:9: error: implicit declaration of function 'fallocate' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (fallocate(
^
shm.c:503:13: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
shm.c:503:36: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
shm.c:597:9: error: implicit declaration of function 'fallocate' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (fallocate(
^
shm.c:599:13: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
shm.c:599:36: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE'
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
^
Our home-rolled clip-to-cell code was, obviously, not correct.
The original problem was that we couldn't use pixman clipping since we
have multiple threads writing to the same pixman image, and thus there
would be races between the threads setting clipping.
The fix is actually simple - just instantiate one pixman
image (referencing the same backing image data) for each rendering
thread.
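The idea can be illustrated without pixman. The plain-C analogue below
is not the actual fix and uses no pixman calls: each "view" (a
hypothetical struct image_view) points at the same pixel data but
carries its own clip rectangle, so setting a clip in one view cannot
race with another. Threads are left out to keep the sketch small.

```c
#include <stdint.h>

/* Half-open rectangle: [x1, x2) x [y1, y2). Assumed to lie inside
 * the image. */
struct clip {
    int x1, y1, x2, y2;
};

/* One of these per rendering thread. 'pixels' is shared between all
 * views; 'clip' is private to each view. */
struct image_view {
    uint32_t *pixels;
    int width, height;
    struct clip clip;
};

/* Fill a rectangle, restricted to this view's own clip. */
void
view_fill(struct image_view *v, int x, int y, int w, int h, uint32_t color)
{
    int x1 = x > v->clip.x1 ? x : v->clip.x1;
    int y1 = y > v->clip.y1 ? y : v->clip.y1;
    int x2 = x + w < v->clip.x2 ? x + w : v->clip.x2;
    int y2 = y + h < v->clip.y2 ? y + h : v->clip.y2;

    for (int row = y1; row < y2; row++) {
        for (int col = x1; col < x2; col++)
            v->pixels[row * v->width + col] = color;
    }
}
```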
This both prevents accidental resizing of the memfd, and allows the
Wayland server to optimize reads from the buffer - it no longer has to
set up SIGBUS handlers.
This lessens the burden on (primarily) the compositor, since we no
longer tear down and re-create the SHM pool when scrolling.
The SHM pool is set up once, and its size is fixed at the maximum
allowed (512MB for now, 2GB would be possible).
This also allows us to mmap() the memfd once. The exposed raw pointer
is simply an offset from the memfd mmapping.
Note that this means e.g. rogue rendering code will be able to write
outside the buffer.
Finally, only do this if the caller explicitly wants to enable
scrolling. The memfds of other buffers are sized to the requested size.
* Impose a maximum memfd size limit. In theory, this can be
2GB (wl_shm_create_pool() is the limiting factor - its size argument
is an int32_t). For now, use 256MB.
This is mainly to reduce the amount of virtual address space used by
the compositor, which keeps at least one mmapping (of the entire
memfd) around. One mmapping *per terminal window* that is.
Given that we have 128TB with 48-bit virtual addresses, we could
probably bump this to 2GB without any issues. However, 256MB should
be enough.
TODO: check how much we typically move the offset when scrolling in
a fullscreen window on a 4K monitor. 256MB may turn out to be too
small.
On 32-bit shm_scroll() is completely disabled. There simply isn't
enough address space.
* Wrapping is done by moving the offset to "the other end" of the
memfd, and copying the buffer contents to the new, wrapped offset.
The "normal" scrolling code then does the actual scrolling. This
means we'll re-instantiate all objects twice when wrapping.
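The offset handling described in this list could look roughly like the
sketch below. A plain byte array stands in for the mmapped memfd, and
struct scroll_pool, pool_data() and pool_scroll_forward() are
hypothetical names; re-instantiating the Wayland and pixman objects is
left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct scroll_pool {
    uint8_t *map;     /* start of the single mapping */
    size_t max_size;  /* fixed upper limit of the pool */
    size_t offset;    /* where the visible buffer starts */
    size_t buf_size;  /* size of the visible buffer, in bytes */
};

/* The exposed raw pointer is just an offset into the mapping. */
uint8_t *
pool_data(const struct scroll_pool *p)
{
    return p->map + p->offset;
}

/* Move the offset forward by 'bytes'. If the buffer would then run
 * past the end of the pool, wrap first: copy the buffer contents to
 * the start of the pool and reset the offset. Returns false if the
 * request can never fit. */
bool
pool_scroll_forward(struct scroll_pool *p, size_t bytes)
{
    if (bytes + p->buf_size > p->max_size)
        return false;

    if (p->offset + bytes + p->buf_size > p->max_size) {
        memmove(p->map, p->map + p->offset, p->buf_size);
        p->offset = 0;
    }

    p->offset += bytes;  /* the "normal" scroll */
    return true;
}
```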
Implemented by truncating the file size and moving the offset
backwards. This means we can only reverse scroll when we've previously
scrolled forward.
TODO: figure out if we can somehow do fast reverse scrolling even
though offset is 0 (or well, less than required for scrolling).
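The constraint can be expressed as simple bookkeeping. This sketch
performs no system calls; struct memfd_model and model_scroll_reverse()
are hypothetical names, and the comment marks where the real code
would call ftruncate().

```c
#include <stdbool.h>
#include <stddef.h>

/* Bookkeeping only; no file is involved. */
struct memfd_model {
    size_t file_size;  /* what fstat() would report */
    size_t offset;     /* start of the visible buffer */
};

/* Reverse scroll by 'rows' rows of 'stride' bytes each. Fails when
 * there is not enough room before the current offset, i.e. when we
 * have not scrolled forward far enough beforehand. */
bool
model_scroll_reverse(struct memfd_model *m, size_t rows, size_t stride)
{
    size_t bytes = rows * stride;

    if (m->offset < bytes)
        return false;

    m->offset -= bytes;
    m->file_size -= bytes;  /* real code: ftruncate() to the new size */
    return true;
}
```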
This function "scrolls" the buffer by the specified number of (pixel)
rows.
The idea is to move the image offset by re-sizing the underlying memfd
object. I.e. to scroll forward, increase the size of the memfd file,
and move the pixman image offset forward (and the Wayland SHM buffer
as well).
Only increasing the file size would, obviously, cause the memfd file
to grow indefinitely. To deal with this, we "punch" a hole from the
beginning of the file to the new offset. This frees the associated
memory.
Thus, while we have a memfd file whose size (as seen by
e.g. fstat()) is ever growing, the amount of memory actually allocated
always matches the original buffer size.
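The invariant can be checked with a small model. The sketch below only
does the arithmetic and makes no system calls; struct memfd_model and
the function names are hypothetical, and the comments mark where the
real code would call ftruncate() and fallocate().

```c
#include <stddef.h>

/* Bookkeeping only; no file is involved. */
struct memfd_model {
    size_t file_size;  /* what fstat() would report */
    size_t offset;     /* start of the visible buffer;
                          [0, offset) has been punched out */
};

/* Bytes still backed by memory: everything after the hole. */
size_t
resident_bytes(const struct memfd_model *m)
{
    return m->file_size - m->offset;
}

/* Scroll forward by 'rows' rows of 'stride' bytes each. */
void
model_scroll_forward(struct memfd_model *m, size_t rows, size_t stride)
{
    size_t bytes = rows * stride;

    /* real code: ftruncate(fd, file_size + bytes) */
    m->file_size += bytes;

    /* real code: fallocate(fd,
     *     FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, new_offset) */
    m->offset += bytes;
}
```

However many rows are scrolled, resident_bytes() stays equal to the
original buffer size while file_size keeps growing.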
Some notes:
* FALLOC_FL_PUNCH_HOLE can be quite slow when the number of used pages
to drop is large.
* all normal fallocate() usages have been replaced with ftruncate(),
as this is *much* faster. fallocate() guarantees subsequent writes
won't fail. I.e. it actually reserves (disk) space. While it doesn't
allocate on-disk blocks for on-disk files, it *does* zero-initialize
the in-memory blocks. And this is slow. ftruncate() doesn't do this.
TODO: implement reverse scrolling (i.e. a negative row count).