Some systems have a split render/display architecture: the display
side is managed by a KMS driver and the render side is managed by a
separate Vulkan driver. Figuring out that these two drivers can
work together is not trivial.
Currently, the Vulkan renderer tries to find a Vulkan physical
device which matches the DRM device's dev_t. On split render/display,
there is no such device.
The platform bus has historically been abused for situations
where no other bus would make sense (e.g. VKMS, evdi). A new "faux"
bus has been introduced [1] for such devices, so the platform bus
should now be a pretty good hint that all devices are on the same
system-on-chip.
When we don't find a Vulkan physical device and the DRM device is
using the platform bus, fall back to any Vulkan physical device which
also uses the platform bus.
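
A minimal sketch of the selection order described above, assuming each
Vulkan physical device has already been summarized into a small record.
The `sketch_*` types and function are hypothetical and are not the
wlroots or libdrm API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus classification; not a wlroots or libdrm type. */
enum sketch_bus { SKETCH_BUS_PCI, SKETCH_BUS_PLATFORM, SKETCH_BUS_OTHER };

/* Hypothetical summary of one Vulkan physical device. */
struct sketch_phdev {
	bool has_dev_id;  /* whether the driver reported a DRM node */
	uint64_t dev_id;  /* stand-in for that node's dev_t */
	enum sketch_bus bus;
};

/* Returns the index of the chosen physical device, or -1 if none fits. */
static int sketch_pick_phdev(const struct sketch_phdev *devs, size_t len,
		uint64_t drm_dev_id, enum sketch_bus drm_bus) {
	assert(devs != NULL || len == 0);

	/* First pass: exact match on the DRM device number. */
	for (size_t i = 0; i < len; i++) {
		if (devs[i].has_dev_id && devs[i].dev_id == drm_dev_id) {
			return (int)i;
		}
	}

	/* Fallback: only when the DRM device sits on the platform bus. */
	if (drm_bus != SKETCH_BUS_PLATFORM) {
		return -1;
	}
	for (size_t i = 0; i < len; i++) {
		if (devs[i].bus == SKETCH_BUS_PLATFORM) {
			return (int)i;
		}
	}
	return -1;
}
```

The exact match always wins, so systems where render and display share
one device keep their current behavior.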
[1]: https://lore.kernel.org/all/2025021023-sandstorm-precise-9f5d@gregkh/
Closes: https://gitlab.freedesktop.org/wlroots/wlroots/-/work_items/4055
The two-pass blend image is created with VK_IMAGE_LAYOUT_UNDEFINED, so
on its first use loadOp=LOAD loads uninitialized memory. This oughtn't
be an issue, as we render onto it before we read it. These renders are
blends, so even opaque content is rendered with reference to an uninit
dst. This too ought to be fine: src*1 + dst*0 = src for all finite dst.
But the blend image pixfmt is VK_FORMAT_R16G16B16A16_SFLOAT, so uninit
pixels can be NaN, inf, or -inf, and now src*1 + dst*0 = NaN/inf/-inf.
This is bad enough assuming the uninitialized blend image holds random
bytes (2048/65536 values are not finite), even worse on any driver/GPU
with a framebuffer compression scheme that so happens to reliably read
NaNs from any uninitialized compressed image...
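
The arithmetic above can be checked on the CPU. This is a sketch in
plain C float math, which only approximates what GPU blend hardware
does; the `sketch_*` names are made up for illustration. Note that under
IEEE 754 rules, as modeled here, inf*0 is itself NaN:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Opaque blend of one channel: src*1 + dst*0. */
static float sketch_blend_opaque(float src, float dst) {
	return src * 1.0f + dst * 0.0f;
}

/* True if a binary16 bit pattern encodes NaN, +inf or -inf
 * (all five exponent bits set). */
static bool sketch_half_is_nonfinite(uint16_t bits) {
	return (bits & 0x7C00u) == 0x7C00u;
}

/* Counts the non-finite patterns among all 65536 binary16 values. */
static unsigned sketch_count_nonfinite_half(void) {
	unsigned count = 0;
	for (uint32_t bits = 0; bits <= 0xFFFFu; bits++) {
		if (sketch_half_is_nonfinite((uint16_t)bits)) {
			count++;
		}
	}
	return count;
}

/* Usage: finite garbage in dst is harmless, non-finite garbage is not. */
static void sketch_demo(void) {
	assert(sketch_blend_opaque(0.5f, 123.0f) == 0.5f);
	assert(isnan(sketch_blend_opaque(0.5f, NAN)));
	assert(isnan(sketch_blend_opaque(0.5f, INFINITY)));
	assert(sketch_count_nonfinite_half() == 2048);
}
```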
Most Mesa drivers happen not to do this perfectly valid thing, so this
is only reliably a problem (afaict) for honeykrisp i.e. AGX i.e. Asahi
Linux i.e. Apple Silicon, where after an upgrade to wlroots 0.20, sway
renders a black screen forever, unless you get quite lucky spamming VT
switches, in which case there's flickery garbage on exactly one of the
two swapchain buffers.
The blend image persists across frames, so it suffices to clear before
first real use. Rather than clear by hand, make a loadOp=CLEAR variant
of the render pass and use it for that first frame only. Adding a pass
sounds heavy, but render pass compatibility ignores loadOp and layouts
such that the new pass reuses the pipelines and framebuffer, and costs
one VkRenderPass object but not the usual pipeline/shader (re)compile.
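
The per-frame choice between the two passes could look roughly like
this. It is a sketch with hypothetical names (`sketch_blend_image`,
`sketch_begin_blend_pass`) and plain enums standing in for the two
VkRenderPass handles; the real renderer state may be laid out
differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two compatible render passes. */
enum sketch_pass { SKETCH_PASS_LOAD, SKETCH_PASS_CLEAR };

/* Hypothetical per-image state. */
struct sketch_blend_image {
	bool initialized; /* false until the image has been rendered to once */
};

/* Picks the pass for the next frame and records that the image will hold
 * defined contents afterwards. */
static enum sketch_pass sketch_begin_blend_pass(
		struct sketch_blend_image *img) {
	assert(img != NULL);
	if (!img->initialized) {
		img->initialized = true;
		return SKETCH_PASS_CLEAR; /* first real use: clear, don't load */
	}
	return SKETCH_PASS_LOAD; /* later frames keep the persisted contents */
}
```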
Without this check, the reply value might be smaller than
xcb_window_t and will result in an invalid memory read.
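
A sketch of the kind of length check meant here, assuming the property
value arrives as a byte buffer with a known length. `sketch_read_window`
is a hypothetical helper, and `uint32_t` stands in for `xcb_window_t`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies a 32-bit window ID out of a property value,
 * refusing values that are too short to contain one. */
static bool sketch_read_window(const void *value, size_t value_len,
		uint32_t *out) {
	assert(out != NULL);
	if (value == NULL || value_len < sizeof(uint32_t)) {
		return false; /* too short: reading would go out of bounds */
	}
	memcpy(out, value, sizeof(uint32_t));
	return true;
}
```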
Reported-by: Tristan <TristanInSec@gmail.com>
The name pointer points into the drmDevice structure, which is freed by
drmFreeDevice(). The error log was using name after the free, which is
undefined behavior.
Move the error log before drmFreeDevice() so name is still valid when
used in the log message.
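
The ordering issue can be illustrated with a small stand-in. The
`sketch_device` type below is hypothetical and only mimics a structure
that owns its name string; it is not the libdrm API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device record that owns its name string. */
struct sketch_device {
	char *name;
};

static struct sketch_device *sketch_device_create(const char *name) {
	struct sketch_device *dev = calloc(1, sizeof(*dev));
	if (dev == NULL) {
		return NULL;
	}
	dev->name = malloc(strlen(name) + 1);
	if (dev->name == NULL) {
		free(dev);
		return NULL;
	}
	strcpy(dev->name, name);
	return dev;
}

static void sketch_device_free(struct sketch_device *dev) {
	free(dev->name);
	free(dev);
}

/* Formats the error message first, then frees the device. */
static void sketch_fail_and_free(struct sketch_device *dev,
		char *msg, size_t msg_len) {
	assert(dev != NULL && msg != NULL);
	const char *name = dev->name; /* points into memory owned by dev */
	snprintf(msg, msg_len, "Failed to open %s", name);
	sketch_device_free(dev); /* name is dangling from here on */
}
```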
Signed-off-by: Wang Yu <wangyu@uniontech.com>
Send the failed event immediately when a client sends capture_output
or capture_output_region for an output that already has a live frame
owned by that client.
This enforces one frame per (client, output) pair, matching
ext-image-copy-capture-v1's create_frame restriction without adding a
new error code or bumping the interface version.
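
A sketch of the bookkeeping this implies, using a fixed-size table and
integer IDs. All names are hypothetical, and a full table is also
reported as failure purely to keep the sketch short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FRAMES 16

struct sketch_frame {
	bool live;
	int client_id;
	int output_id;
};

struct sketch_manager {
	struct sketch_frame frames[SKETCH_MAX_FRAMES];
};

enum sketch_result { SKETCH_FRAME_CREATED, SKETCH_FRAME_FAILED };

/* Handles a capture request: fails at once if this client already has a
 * live frame for this output, otherwise records a new live frame. */
static enum sketch_result sketch_capture_output(struct sketch_manager *mgr,
		int client_id, int output_id) {
	assert(mgr != NULL);
	struct sketch_frame *free_slot = NULL;
	for (size_t i = 0; i < SKETCH_MAX_FRAMES; i++) {
		struct sketch_frame *f = &mgr->frames[i];
		if (f->live && f->client_id == client_id &&
				f->output_id == output_id) {
			return SKETCH_FRAME_FAILED;
		}
		if (!f->live && free_slot == NULL) {
			free_slot = f;
		}
	}
	if (free_slot == NULL) {
		return SKETCH_FRAME_FAILED; /* sketch-only limit */
	}
	free_slot->live = true;
	free_slot->client_id = client_id;
	free_slot->output_id = output_id;
	return SKETCH_FRAME_CREATED;
}

/* Marks the frame for (client, output) as finished, if there is one. */
static void sketch_frame_done(struct sketch_manager *mgr,
		int client_id, int output_id) {
	for (size_t i = 0; i < SKETCH_MAX_FRAMES; i++) {
		struct sketch_frame *f = &mgr->frames[i];
		if (f->live && f->client_id == client_id &&
				f->output_id == output_id) {
			f->live = false;
		}
	}
}
```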
When a compositor uses color transforms (ICC profiles) the output's
postrender buffer is in the display's color space, not sRGB. A
screenshot client receives this buffer and saves it as an untagged PNG,
which appears oversaturated in non-colormanaged viewers.
To fix this without altering the semantics of the raw output source
(which must deliver the exact hardware-scanned buffer, including
overlays and direct scanout), add an optional, compositor-driven
scene-per-output capture source. This source re-renders the entire scene
graph for a given output with an identity color transform (sRGB), using
a hidden headless output to avoid flicker.
The new function
`wlr_ext_image_capture_source_v1_create_with_scene_output()` takes a
wlr_scene, a reference wlr_output (for dimensions, scale,
renderer/allocator), and an optional wlr_output_layout (for correct
positioning). The source is created on demand in the existing output
capture manager when the compositor has called
`wlr_ext_output_image_capture_source_manager_v1_set_scene()` and
`wlr_ext_output_image_capture_source_manager_v1_set_layout()`.
If the compositor never provides a scene, the manager continues to
create the original raw output source, preserving backward compatibility
and hardware plane capture for compositors that need it.
Current attempts at copying regions outside the first output end up
wrapping into the first output. Fix this by allowing compositors to
expose the layout.
screencopy
When a compositor uses color transforms (ICC profiles), the output's
post-render buffer is in the display's color space, not sRGB. A
screenshot client like grim receives this buffer and saves it as an
untagged PNG, which then appears over-saturated in non-color-managed
viewers.
To avoid this, the output capture source now creates a hidden headless
output that re-renders the same scene graph with an identity color
transform (sRGB). The hidden output is driven entirely within the
capture source and does not affect the real output or cause any visual
flicker.
The pixman renderer cannot be disabled, and other parts of wlroots
do not carry their own dependency on libpixman around. Use a single
global dependency which also satisfies the pixman renderer.
The "restrict" keyword can be used to indicate that no other
pointer will be used to access a chunk of memory while the
restricted pointer is alive. If that promise is not upheld,
undefined behavior is triggered.
It may be difficult to ensure this property, and the property may
be brittle - becoming invalid as code evolves. Just like "inline",
let's just leave optimizations up to the compiler to figure out.
xwayland_surface_associate() asserts that the surface has not
been associated yet. Arbitrary clients can send these messages, so
don't abort when that happens.
This field is difficult to use correctly because its meaning depends
on format.
xcb docs read:
> You should use the corresponding accessor instead of this field.
Replace all uses with the safe accessor.
This fixes potential out-of-bounds array accesses when the format
field isn't what we expect.
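
To illustrate why the raw field is easy to misuse, here is a sketch
that assumes the field counts items of `format` bits each, so the byte
length is the count times `format / 8`. The struct and helpers are
hypothetical stand-ins, not the xcb types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a property reply. */
struct sketch_property_reply {
	uint8_t format;     /* bits per item: 8, 16 or 32 */
	uint32_t value_len; /* number of items, NOT bytes */
};

/* Accessor-style helper: length of the value in bytes. */
static size_t sketch_value_length_bytes(
		const struct sketch_property_reply *reply) {
	assert(reply != NULL);
	return (size_t)reply->value_len * (reply->format / 8);
}

/* True if the value really holds at least `count` 32-bit items. */
static bool sketch_has_u32_items(const struct sketch_property_reply *reply,
		size_t count) {
	return reply->format == 32 &&
		sketch_value_length_bytes(reply) >= count * sizeof(uint32_t);
}
```

With `format == 8` and `value_len == 2`, a naive `value_len >= 1` test
passes even though only two bytes are present, too few for one 32-bit
item.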
Add `wlr_backend_get_libinput` getter, which returns a direct
handle to `struct libinput`.
Exposing the libinput context will allow compositors to load
libinput plugins after creating a `wlr_backend`.
Lua plugins are supported since libinput 1.30.0.
In handling scene buffer output updates, wlroots would send a leave event to
all entered outputs, even those that the scene root for the scene output update
event did not own, leaving the output list inaccurate.
Sending leave events only for the given scene introduces a problem, though:
existing logic to de-duplicate leave events stops us from sending a leave event
when we leave all the outputs in a scene, and when the surface then becomes
visible in another scene, the frame pacing output cannot be selected
accurately. This breaks screen capture for off-screen windows in sway.
So, let us also mark outputs that would have been left but were spared by the
deduplication logic as "suspended" indicating they are ineligible as frame
pacing outputs.
Fixes: https://github.com/swaywm/sway/issues/9094
Fix a memory leak when waiter_init fails in wlr_drm_syncobj_merger_add(),
and prevent the old sync_fd from being closed when sync_file_merge fails
in wlr_drm_syncobj_merger_add_sync_file().
This reverts commit 02abf1cd28.
The change doesn't make sense. It causes a use-after-free when trying
to read the pixel data of the icon. The docs clearly state to use
'xcb_ewmh_get_wm_icon_reply_wipe()' when using the function which
correctly frees the reply *after* processing the pixel data.
If a very large number of clip rects are accumulated in rect_union_add,
rect_union_evaluate can end up being disproportionately expensive. An
extreme number of clip rects does not benefit drawing, so this cost
buys nothing.
Limit the number of rects to 1024 in rect_union_add, switching over to
bounding box mode instead when the limit is exceeded. This leads to a
small 70% reduction in CPU time in the Vulkan renderer on the
stacked/clip200/1024 benchmarks.
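
A sketch of the limit-then-bounding-box idea, with a plain box struct in
place of pixman_box32_t. The `sketch_*` names and the struct layout are
assumptions and do not mirror the actual rect_union code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_RECTS 1024

struct sketch_box { int x1, y1, x2, y2; };

struct sketch_rect_union {
	struct sketch_box bounding; /* valid once has_bounding is set */
	bool has_bounding;
	bool bounding_only;         /* set once the rect limit is exceeded */
	size_t len;
	struct sketch_box rects[SKETCH_MAX_RECTS];
};

static void sketch_rect_union_add(struct sketch_rect_union *ru,
		const struct sketch_box *box) {
	assert(ru != NULL && box != NULL);
	if (box->x1 >= box->x2 || box->y1 >= box->y2) {
		return; /* ignore empty boxes */
	}

	/* The bounding box is always kept up to date. */
	if (!ru->has_bounding) {
		ru->bounding = *box;
		ru->has_bounding = true;
	} else {
		if (box->x1 < ru->bounding.x1) ru->bounding.x1 = box->x1;
		if (box->y1 < ru->bounding.y1) ru->bounding.y1 = box->y1;
		if (box->x2 > ru->bounding.x2) ru->bounding.x2 = box->x2;
		if (box->y2 > ru->bounding.y2) ru->bounding.y2 = box->y2;
	}

	if (ru->bounding_only) {
		return;
	}
	if (ru->len >= SKETCH_MAX_RECTS) {
		/* Too many rects: drop the list, keep only the bounding box. */
		ru->bounding_only = true;
		ru->len = 0;
		return;
	}
	ru->rects[ru->len++] = *box;
}
```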
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
rect_union_add takes a pixman_box32_t by value, and passes it along by
value to internal helpers. render_pass_mark_box_updated, which is the
only caller, receives the pixman_box32_t by reference, so just plumb it
through that way.
Results in a 13% improvement in CPU time when using the Vulkan renderer
on the stacked/clip200/1024 benchmarks on my machine.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
Similar to what we have already done for gles2. To simplify things we
use the staging ring buffer for the vertex buffers by extending the
usage bits, rather than introducing a separate pool.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>
We are spending quite significant CPU time walking through the clip
rects, taking a pixman box, converting it to a wlr box, intersecting it
and ultimately converting it back to a pixman box before adding it to
the rect union.
Just intersect the clip region once ahead of time, and use pixman boxes
the entire way. This also makes it easy to bail early if nothing
intersects.
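
The shape of that change can be sketched with a plain box type standing
in for pixman boxes. The real code presumably relies on pixman's region
operations; the helpers below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_box { int x1, y1, x2, y2; };

/* Writes the intersection of a and b to out; returns false if empty. */
static bool sketch_box_intersect(struct sketch_box *out,
		const struct sketch_box *a, const struct sketch_box *b) {
	out->x1 = a->x1 > b->x1 ? a->x1 : b->x1;
	out->y1 = a->y1 > b->y1 ? a->y1 : b->y1;
	out->x2 = a->x2 < b->x2 ? a->x2 : b->x2;
	out->y2 = a->y2 < b->y2 ? a->y2 : b->y2;
	return out->x1 < out->x2 && out->y1 < out->y2;
}

/* Clips every clip rect against target in a single pass, with no
 * conversion between box types. Non-empty results go to out, which must
 * have room for clip_len boxes. Returns how many were written; zero
 * means nothing intersects and the caller can bail early. */
static size_t sketch_clip_to_target(const struct sketch_box *clip,
		size_t clip_len, const struct sketch_box *target,
		struct sketch_box *out) {
	assert(out != NULL && target != NULL);
	size_t n = 0;
	for (size_t i = 0; i < clip_len; i++) {
		struct sketch_box tmp;
		if (sketch_box_intersect(&tmp, &clip[i], target)) {
			out[n++] = tmp;
		}
	}
	return n;
}
```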
Gives a small 97.95% reduction in CPU time for the Vulkan renderer in
the grid/clip200/1024 benchmark.
Signed-off-by: Kenny Levinsen <kl@kl.wtf>