The two-pass blend image is created with VK_IMAGE_LAYOUT_UNDEFINED, so
on its first use loadOp=LOAD loads uninitialized memory. This oughtn't
be an issue, as we render onto it before we read it. These renders are
blends, so even opaque content is rendered with reference to an uninit
dst. This too ought to be fine: src*1 + dst*0 = src for all finite dst.
But the blend image pixfmt is VK_FORMAT_R16G16B16A16_SFLOAT, so uninit
pixels can be NaN, inf, or -inf, and now src*1 + dst*0 = NaN/inf/-inf.
This is bad enough assuming the uninitialized blend image holds random
bytes (2048/65536 values are not finite), even worse on any driver/GPU
with a framebuffer compression scheme that so happens to reliably read
NaNs from any uninitialized compressed image...
Most Mesa drivers happen not to do this perfectly valid thing, so this
is only reliably a problem (afaict) for honeykrisp i.e. AGX i.e. Asahi
Linux i.e. Apple Silicon, where after an upgrade to wlroots 0.20, sway
renders a black screen forever, unless you get quite lucky spamming VT
switches, in which case there's flickery garbage on exactly one of the
two swapchain buffers.
The blend image persists across frames, so it suffices to clear before
first real use. Rather than clear by hand, make a loadOp=CLEAR variant
of the render pass and use it for that first frame only. Adding a pass
sounds heavy, but render pass compatibility ignores loadOp and layouts
such that the new pass reuses the pipelines and framebuffer, and costs
one VkRenderPass object but not the usual pipeline/shader (re)compile.
Similar to what we have already done for gles2. To simplify things we
use the staging ring buffer for the vertex buffers by extending the
usage bits, rather than introducing a separate pool.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Implement a ring-buffer that uses timeline points to track and release
allocated spans. We add larger ring-buffers when it fills, and remove
them when they have been unused for many collection cycles.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
This allows using the vulkan renderer on platforms that provide all
the necessary Vulkan extensions.
Tested on a Mali G52 platform with Mesa 26.0.0 and 25.3.5, which only
support Vulkan API 1.0.
When we're reading from a DMA-BUF texture using implicit sync, we
need to (1) wait for any writer to be done and (2) prevent any
writers from mutating the texture while we're still reading. We
were doing (1) but not (2).
Fix this by calling dmabuf_import_sync_file() with DMA_BUF_SYNC_READ
for all DMA-BUF textures we've used in the render pass.
We'll need to grab textures from there in the next commit.
Also rename it to better reflect what it does: synchronize release
fences after a render pass has been submitted.
Fixes d1c88e9 "render/vulkan: add linear single-subpass"
`find_or_create_render_setup` still assumed a single-buffer setup
always implied use of an srgb format
A combined image sampler may need several descriptors in a descriptor
set. We are not currently checking how many descriptors are required,
nor is it presumably guaranteed that such multi-descriptor allocation
will not fail due to fragmentation.
If the pool free counter is not zero, try to allocate but continue with
the next pool and fall back to creating a new pool if the allocation
failed.
Fixes: https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/4010
We pass an alpha multiplier plus a luminance multiplier now.
Fixes the following validation layer error:
vkCmdPushConstants(): is called with
stageFlags (VK_SHADER_STAGE_FRAGMENT_BIT), offset (80), size (72)
but the VkPipelineLayout 0x510000000051 doesn't have a VkPushConstantRange with VK_SHADER_STAGE_FRAGMENT_BIT.
The Vulkan spec states: For each byte in the range specified by offset and size and for each shader stage in stageFlags, there must be a push constant range in layout that includes that byte and that stage (https://docs.vulkan.org/spec/latest/chapters/descriptorsets.html#VUID-vkCmdPushConstants-offset-01795) (VUID-vkCmdPushConstants-offset-01795)
Fixes: 56d95c2ecb ("render/vulkan: introduce wlr_vk_frag_texture_pcr_data")
Perform a primitive garbage collection of buffers that have not been
used in the past 10 seconds, an arbitrarily selected number.
As garbage collection also makes span buffer allocation happen much more
often, logging on allocation activity leads to a lot of log noise so get
rid of that while at it.
The spec for VkMemoryPropertyFlagBits says:
> device coherent accesses may be slower than equivalent accesses
> without device coherence [...] it is generally inadvisable to
> use device coherent or device uncached memory except when really
> needed
We don't really need coherent memory so let's not require it and
invalidate the memory range after mapping instead.
Closes: https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/3868
This change makes it possible to support both the direct srgb-format
pipeline and indirect (color-managed, or non-srgb-format) pipeline
for the same render buffer.