The Wayland Protocol

Basic Principles

The Wayland protocol is an asynchronous, object-oriented protocol. All requests are method invocations on some object. A request includes an object ID that uniquely identifies an object on the server. Each object implements an interface, and the request includes an opcode that identifies which method in the interface to invoke. The server sends back events to the client; each event is emitted from an object. Events can be error conditions. An event includes the object ID and the event opcode, from which the client can determine the type of event. Events are generated either in response to requests (in which case the request and the event constitute a round trip) or spontaneously when the server state changes. State is broadcast on connect, and events are sent out when state changes. Clients must listen for these changes and cache the state. There is no need (or mechanism) to query server state. The server broadcasts the presence of a number of global objects, which in turn broadcast their current state.
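To make the object/opcode model concrete, here is a minimal sketch of how a receiver might dispatch a decoded message: look up the object by ID, then pick the method from its interface by opcode. The types (struct object, struct interface_desc), the fixed-size object_map and the example "counter" interface are hypothetical and not part of any Wayland library; a real implementation also decodes the arguments according to the XML signature and reports a protocol error instead of returning -1.

```c
#include <stdint.h>
#include <stddef.h>

struct object;
/* A method implementation receives the target object and its decoded arguments. */
typedef void (*handler_fn)(struct object *obj, const uint32_t *args);

/* Hypothetical interface description: a table of methods indexed by opcode. */
struct interface_desc {
    const char *name;
    size_t method_count;
    const handler_fn *methods;
};

/* Every object implements exactly one interface. */
struct object {
    uint32_t id;
    const struct interface_desc *iface;
    int state;
};

#define MAX_OBJECTS 64
static struct object *object_map[MAX_OBJECTS];  /* indexed by object ID */

/* Route a message to object `id`, method `opcode`.
 * Returns 0 on success, -1 for an unknown object or opcode. */
static int dispatch(uint32_t id, uint16_t opcode, const uint32_t *args)
{
    if (id >= MAX_OBJECTS || object_map[id] == NULL)
        return -1;
    struct object *obj = object_map[id];
    if (opcode >= obj->iface->method_count)
        return -1;
    obj->iface->methods[opcode](obj, args);
    return 0;
}

/* Example interface with two methods: opcode 0 adds, opcode 1 resets. */
static void counter_add(struct object *obj, const uint32_t *args)
{
    obj->state += (int)args[0];
}

static void counter_reset(struct object *obj, const uint32_t *args)
{
    (void)args;
    obj->state = 0;
}

static const handler_fn counter_methods[] = { counter_add, counter_reset };
static const struct interface_desc counter_interface = { "counter", 2, counter_methods };
```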

Code Generation

The interfaces, requests and events are defined in protocol/wayland.xml. This XML is used to generate the function prototypes that can be used by clients and compositors. The protocol entry points are generated as inline functions that just wrap the wl_proxy_* functions. The inline functions aren't part of the library ABI, and language bindings should generate their own stubs for the protocol entry points from the XML.
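The following sketch shows roughly what a generated entry point looks like, using a stand-in for the library's marshalling call so that it compiles on its own. The proxy struct, proxy_marshal, the "widget" interface and its "move" request are all hypothetical; in the real library the stubs wrap wl_proxy_* functions, and the argument types come from the interface description rather than from an explicit count.

```c
#include <stdint.h>
#include <stdarg.h>

/* Hypothetical client-side handle for a server object. */
struct proxy {
    uint32_t id;
    uint32_t last_opcode;   /* recorded so the example can be inspected */
    int32_t last_args[4];
    int last_nargs;
};

/* Stand-in for the library's marshalling entry point. This version only
 * records int arguments; a real one writes a message to the connection. */
static void proxy_marshal(struct proxy *p, uint32_t opcode, int nargs, ...)
{
    va_list ap;
    va_start(ap, nargs);
    p->last_opcode = opcode;
    p->last_nargs = nargs;
    for (int i = 0; i < nargs && i < 4; i++)
        p->last_args[i] = va_arg(ap, int);
    va_end(ap);
}

/* What a generator might emit for a hypothetical request
 *   <request name="move"><arg name="x" type="int"/><arg name="y" type="int"/></request>
 * on a hypothetical "widget" interface. The opcode is the request's
 * position in the XML. */
#define WIDGET_MOVE 0

static inline void widget_move(struct proxy *widget, int32_t x, int32_t y)
{
    proxy_marshal(widget, WIDGET_MOVE, 2, x, y);
}
```

Because the stub is static inline, it is compiled into each caller. That is why it is not part of the library ABI, and why a language binding would regenerate equivalent stubs from the XML instead of linking against them.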

Wire Format

The protocol is sent over a UNIX domain stream socket. Currently, the endpoint is named \wayland, but it is subject to change. The protocol is message-based. A message sent by a client to the server is called a request; a message from the server to a client is called an event. Every message is structured as 32-bit words, and values are represented in the host's byte order. The message header has two words:

  1. The sender's object ID (32 bits).
  2. Two 16-bit parts: the upper 16 bits are the message size in bytes, counted from the start of the header (so the minimum value is 8); the lower 16 bits are the request/event opcode.

The payload describes the request/event arguments. Every argument is aligned to 32 bits; where padding is required, the value of the padding bytes is undefined. There is no prefix that describes the type; it is inferred implicitly from the XML specification. The argument types are represented as follows:

  int, uint: the 32-bit value of the signed/unsigned int.
  string: an unsigned 32-bit length, followed by the string contents including the terminating NUL byte, then padding to a 32-bit boundary.
  object: a 32-bit object ID.
  new_id: the 32-bit object ID. On requests, the client decides the ID. The only events with new_id are advertisements of globals, and the server will use IDs below 0x10000.
  array: a 32-bit array size in bytes, followed by the array contents verbatim, then padding to a 32-bit boundary.
  fd: the file descriptor is not stored in the message buffer, but in the ancillary data of the UNIX domain socket message (msg_control).
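A minimal sketch of the header layout and string encoding described above. It assumes, as libwayland does, that the string length counts the terminating NUL, and it zeroes the padding bytes even though their value is undefined on the wire. The helper names (header_word, pad4, put_string, build_message, parse_header) are illustrative, and the caller is assumed to supply a buffer large enough for the message.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Second header word: message size in the upper 16 bits, opcode in the lower. */
static uint32_t header_word(uint16_t size, uint16_t opcode)
{
    return ((uint32_t)size << 16) | opcode;
}

/* Round a byte count up to the next multiple of 4 (32-bit alignment). */
static size_t pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Encode a string argument in host byte order: 32-bit length (including
 * the NUL), the bytes including the NUL, then padding to 32 bits.
 * Returns the number of bytes written. */
static size_t put_string(uint8_t *buf, const char *s)
{
    uint32_t len = (uint32_t)strlen(s) + 1;
    size_t total = 4 + pad4(len);
    memcpy(buf, &len, 4);
    memcpy(buf + 4, s, len);
    memset(buf + 4 + len, 0, total - 4 - len);  /* padding */
    return total;
}

/* Build [object ID][size|opcode][string argument]; returns the message size. */
static size_t build_message(uint8_t *buf, uint32_t object_id, uint16_t opcode,
                            const char *s)
{
    size_t n = 8 + put_string(buf + 8, s);
    uint32_t w1 = header_word((uint16_t)n, opcode);
    memcpy(buf, &object_id, 4);
    memcpy(buf + 4, &w1, 4);
    return n;
}

/* Decode the two header words. */
static void parse_header(const uint8_t *buf, uint32_t *object_id,
                         uint16_t *size, uint16_t *opcode)
{
    uint32_t w1;
    memcpy(object_id, buf, 4);
    memcpy(&w1, buf + 4, 4);
    *size = (uint16_t)(w1 >> 16);
    *opcode = (uint16_t)(w1 & 0xffff);
}
```

For example, a message with object ID 7, opcode 2 and the single string argument "hello" is 20 bytes: 8 bytes of header, 4 bytes of length (6), and 8 bytes of string data (5 characters, the NUL, and 2 padding bytes).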

Interfaces

The protocol includes several interfaces which are used for interacting with the server. Each interface provides requests, events, and errors (which are really just special events) as described above. Specific compositor implementations may provide their own interfaces as extensions, but several are always expected to be present.

Core interfaces:

  wl_display: provides global functionality like object binding and fatal error events
  wl_callback: callback interface for done events
  wl_compositor: core compositor interface, allows surface creation
  wl_shm: buffer management interface with buffer creation and format handling
  wl_buffer: buffer handling interface for indicating damage and object destruction; also provides buffer release events from the server
  wl_data_offer: for accepting and receiving specific mime types
  wl_data_source: for offering specific mime types
  wl_data_device: lets clients manage drag & drop, provides pointer enter/leave events and motion
  wl_data_device_manager: for managing data sources and devices
  wl_shell: shell surface handling
  wl_shell_surface: shell surface handling and desktop-like events (e.g. set a surface to fullscreen, display a popup, etc.)
  wl_seat: cursor setting, motion, button, and key events, etc.
  wl_output: events describing an attached output (subpixel orientation, current mode & geometry, etc.)

Connect Time

There is no fixed-format connect block; instead, the server emits a bunch of events at connect time. These are presence events for global objects: output, compositor, input devices.
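Since clients cannot query the server, a client typically records each global as it is advertised and answers later questions from that record. A minimal sketch, assuming each advertisement carries the global's object ID and interface name and that 0 is never a valid ID. The global_cache type and its functions are hypothetical, and cache_remove_global assumes some event announcing that a global has gone away (outputs, for example, can come and go).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MAX_GLOBALS 32

/* One advertised global object. */
struct global {
    uint32_t id;
    char interface[64];
};

/* Hypothetical client-side cache; zero-initialize before use. */
struct global_cache {
    struct global items[MAX_GLOBALS];
    size_t count;
};

/* Handle a global advertisement. Returns 0 on success, -1 if the cache is full. */
static int cache_add_global(struct global_cache *c, uint32_t id, const char *interface)
{
    if (c->count == MAX_GLOBALS)
        return -1;
    struct global *g = &c->items[c->count++];
    g->id = id;
    strncpy(g->interface, interface, sizeof g->interface - 1);
    g->interface[sizeof g->interface - 1] = '\0';
    return 0;
}

/* Forget a global that went away (swap the last entry into its slot). */
static void cache_remove_global(struct global_cache *c, uint32_t id)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->items[i].id == id) {
            c->items[i] = c->items[--c->count];
            return;
        }
    }
}

/* Answer "is there a global of this interface?" locally.
 * Returns its ID, or 0 if none has been advertised. */
static uint32_t cache_find(const struct global_cache *c, const char *interface)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->items[i].interface, interface) == 0)
            return c->items[i].id;
    return 0;
}
```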

Security and Authentication

This is mostly about access to the underlying buffers. Open questions: we need a new DRM auth mechanism (the grant-to ioctl idea), and we may need to check the command stream. How the client gets the server socket depends on the compositor type: it could be a system-wide name, it could be passed as an fd over the session D-Bus, or the client could be forked by the compositor with the fd already open.
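The last option, where the compositor forks the client with the connection already open, can be sketched with socketpair() and fork(). To keep the example self-contained, the child runs a callback instead of calling exec(); a real compositor would exec the client binary and tell it which descriptor number to use, for instance through an environment variable (later versions of libwayland-client read the WAYLAND_SOCKET variable for this). spawn_client and hello_client are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a connected socket pair, fork, and hand one end to the child.
 * The child never needs to know a socket name: its end is already open.
 * Returns the compositor's end (and stores the child's pid), or -1. */
static int spawn_client(pid_t *pid_out, void (*client_main)(int fd))
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only its own end. A real compositor would exec the
         * client here and pass the descriptor number along. */
        close(sv[0]);
        client_main(sv[1]);
        _exit(0);
    }
    /* Parent (compositor): keep only its own end. */
    close(sv[1]);
    *pid_out = pid;
    return sv[0];
}

/* Example client body: send one byte to show the connection is live. */
static void hello_client(int fd)
{
    char c = 'W';
    if (write(fd, &c, 1) != 1)
        _exit(1);
    close(fd);
}
```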

Creating Objects

The client allocates object IDs using a range protocol: the server tracks how many IDs are left in the current range and sends a new range when the client is about to run out.
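A minimal sketch of the range protocol, assuming ranges are contiguous blocks of a fixed size and that the server grants a new range once fewer than a low-water number of IDs remain. RANGE_SIZE, LOW_WATER and all type and function names are hypothetical; the section does not specify sizes or message names. The client keeps at most one queued range, which suffices because the server only grants again after the queued range has mostly been used.

```c
#include <stdint.h>

/* Hypothetical parameters; not specified by the protocol notes. */
#define RANGE_SIZE 256u
#define LOW_WATER  32u   /* grant a new range when fewer IDs than this remain */

/* Server-side bookkeeping for one client. */
struct server_ids {
    uint32_t next_range;   /* start of the next range to hand out */
    uint32_t remaining;    /* IDs granted to the client but not yet used */
};

/* Called whenever the client creates an object. Returns the start of a
 * new range to send to the client, or 0 if none is needed yet. */
static uint32_t server_note_new_id(struct server_ids *s)
{
    if (s->remaining > 0)
        s->remaining--;
    if (s->remaining >= LOW_WATER)
        return 0;
    uint32_t start = s->next_range;
    s->next_range += RANGE_SIZE;
    s->remaining += RANGE_SIZE;
    return start;
}

/* Client-side allocator: the current range plus at most one queued range. */
struct client_ids {
    uint32_t next, end;                /* current range [next, end) */
    uint32_t queued_start, queued_end; /* queued range; empty if equal */
};

/* Handle the server's "new range" event. */
static void client_on_range(struct client_ids *c, uint32_t start)
{
    c->queued_start = start;
    c->queued_end = start + RANGE_SIZE;
}

/* Returns a fresh ID, or 0 if both ranges are exhausted. */
static uint32_t client_alloc_id(struct client_ids *c)
{
    if (c->next == c->end) {
        if (c->queued_start == c->queued_end)
            return 0;   /* must wait for the server */
        c->next = c->queued_start;
        c->end = c->queued_end;
        c->queued_start = c->queued_end = 0;
    }
    return c->next++;
}
```

Driving both halves in lockstep (the client allocates, the server counts and grants) yields sequential IDs without the client ever running dry, as long as the grant arrives before the remaining LOW_WATER IDs are consumed.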

Compositor

The compositor is a global object, advertised at connect time. See for the protocol description.

Surface

Created by the client. See for the protocol description. Needs a way to set the input region and the opaque region.

Input

Represents a group of input devices, including mice and keyboards. Has a keyboard focus and a pointer focus. Global object. Pointer events are delivered in both screen coordinates and surface-local coordinates. See for the protocol description.

Talk about: keyboard map, change events; xkb on Wayland; multi-pointer Wayland.

A surface can change the pointer image when the surface is the pointer focus of the input device. Wayland doesn't automatically change the pointer image when a pointer enters a surface, but expects the application to set the cursor it wants in response to the pointer focus and motion events. The rationale is that a client has to manage changing pointer images for UI elements within the surface in response to motion events anyway, so we'll make that the only mechanism for changing the pointer image. If the server receives a request to set the pointer image after the surface loses pointer focus, the request is ignored. To the client it will look like it successfully set the pointer image. The compositor will revert the pointer image back to a default image when no surface has the pointer focus for that device. Clients can revert the pointer image back to the default image by setting a NULL image.

What if the pointer moves from a window which has set a special pointer image to a surface that doesn't set an image in response to the motion event? The new surface will be stuck with the special pointer image. We can't just revert the pointer image on leaving a surface, since if we immediately enter a surface that sets a different image, the image will flicker. Broken app, I suppose.
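The pointer image rules above can be summarized as compositor-side logic for one input device. This is a sketch under assumptions: the set-pointer request handler, the structs and the default image are hypothetical names. A request from a surface without focus is dropped without any reply, and a focus change to another surface deliberately leaves the current image in place, which is exactly the "stuck image" case discussed above.

```c
#include <stddef.h>

/* Hypothetical compositor-side types. */
struct surface { int id; };
struct image { const char *name; };

static const struct image default_image = { "default" };

struct input_device {
    struct surface *pointer_focus;     /* NULL when no surface has focus */
    const struct image *pointer_image; /* image currently shown */
};

/* Set-pointer request from a client surface. Requests from a surface that
 * no longer has pointer focus are silently ignored, so the client cannot
 * tell them apart from successful ones. A NULL image reverts to default. */
static void handle_set_pointer(struct input_device *dev, struct surface *from,
                               const struct image *img)
{
    if (from != dev->pointer_focus)
        return;
    dev->pointer_image = img ? img : &default_image;
}

/* Pointer focus change. Entering a new surface keeps the current image
 * (to avoid flicker); only losing focus entirely reverts to the default. */
static void handle_focus_change(struct input_device *dev, struct surface *new_focus)
{
    dev->pointer_focus = new_focus;
    if (new_focus == NULL)
        dev->pointer_image = &default_image;
}
```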

Output

An output is a global object, advertised at connect time or as outputs come and go. See for the protocol description. Outputs are laid out in one big (compositor) coordinate system, so this is basically xrandr over Wayland. The geometry needs a position in the compositor coordinate system, and there should be events to advertise the available modes and requests to move outputs and change modes.
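With outputs laid out in one compositor coordinate system, finding the output that shows a given point and converting to output-local coordinates is a rectangle test followed by a subtraction. A sketch, assuming each output is described by its top-left position and its current mode size; struct output and output_at are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical output record in compositor coordinates. */
struct output {
    int32_t x, y;           /* top-left corner in the compositor coordinate system */
    int32_t width, height;  /* size of the current mode */
};

/* Find the output containing compositor point (px, py) and convert the
 * point to output-local coordinates. Returns the output index, or -1. */
static int output_at(const struct output *outs, size_t n, int32_t px, int32_t py,
                     int32_t *lx, int32_t *ly)
{
    for (size_t i = 0; i < n; i++) {
        const struct output *o = &outs[i];
        if (px >= o->x && px < o->x + o->width &&
            py >= o->y && py < o->y + o->height) {
            *lx = px - o->x;
            *ly = py - o->y;
            return (int)i;
        }
    }
    return -1;
}
```

For two outputs side by side, 1920x1080 at (0, 0) and 1280x1024 at (1920, 0), the compositor point (2000, 100) falls on the second output at local position (80, 100).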

Shared Object Cache

A cache for sharing glyphs, icons and cursors across clients. It lets clients share identical objects. The cache is a global object, advertised at connect time.

  Interface: cache
  Requests: upload(key, visual, bo, stride, width, height)
  Events: item(key, bo, x, y, stride), retire(bo)

Upload by passing a visual, bo, stride, width and height to the cache. Upload returns a bo name, stride, and the x, y location of the object in the buffer. Clients take a reference on the atlas bo. Shared objects are refcounted, and freed by the client (when purging glyphs from the local cache) or when a client exits. The server can't delete individual items from an atlas, but it can throw out an entire atlas bo if it becomes too sparse. The server sends out a retire event when this happens, and clients must throw away any objects from that bo and reupload them. Between the server dropping the atlas and the client receiving the retire event, clients can still legally use the old atlas, since they hold a reference on the bo.

cairo needs to hook into the glyph cache, and maybe also needs a way to create a read-only surface based on an object from the cache (icons):

  cairo_wayland_create_cached_surface(surface-data)
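On the client side, the item and retire events amount to maintaining a local map from keys to atlas locations. A minimal sketch, assuming keys and bo names are 32-bit integers and a fixed-capacity table; the types and function names are hypothetical, and reference counting on the bo itself is left out. A retired atlas drops every entry that points into it, so the next lookup misses and the client re-uploads.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_ITEMS 128

/* Where a shared object lives, as reported by an item event. */
struct cached_item {
    uint32_t key;
    uint32_t bo;      /* atlas buffer object name */
    int32_t x, y;     /* location of the object within the atlas */
    uint32_t stride;
    int valid;
};

/* Hypothetical client-side view of the shared cache; zero-initialize. */
struct object_cache {
    struct cached_item items[MAX_ITEMS];
};

/* item event. Returns 0 on success, -1 if the local table is full. */
static int on_item(struct object_cache *c, uint32_t key, uint32_t bo,
                   int32_t x, int32_t y, uint32_t stride)
{
    for (size_t i = 0; i < MAX_ITEMS; i++) {
        if (!c->items[i].valid) {
            c->items[i] = (struct cached_item){ key, bo, x, y, stride, 1 };
            return 0;
        }
    }
    return -1;
}

/* retire event: forget every object stored in the retired atlas.
 * Returns how many entries were dropped (they must be re-uploaded). */
static size_t on_retire(struct object_cache *c, uint32_t bo)
{
    size_t dropped = 0;
    for (size_t i = 0; i < MAX_ITEMS; i++) {
        if (c->items[i].valid && c->items[i].bo == bo) {
            c->items[i].valid = 0;
            dropped++;
        }
    }
    return dropped;
}

/* Look up a shared object by key; NULL means "upload it first". */
static const struct cached_item *lookup(const struct object_cache *c, uint32_t key)
{
    for (size_t i = 0; i < MAX_ITEMS; i++)
        if (c->items[i].valid && c->items[i].key == key)
            return &c->items[i];
    return NULL;
}
```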

Drag and Drop

Multi-device aware. Orthogonal to the rest of Wayland, as it is its own toplevel object. Since the compositor determines the drag target, it works with transformed surfaces (dragging to a scaled-down window in expose mode, for example). See , and for protocol descriptions.

Issues:

  We can set the cursor image to the current cursor + dragged object, which will last as long as the drag, but maybe a request to attach an image to the cursor would be more convenient?
  Should drag.send() destroy the object? There's nothing to do after the data has been transferred.
  How do we marshal several mime-types? We could make the drag setup a multi-step operation: dnd.create, drag.offer(mime-type1), drag.offer(mime-type2), drag.activate(). The drag object could send multiple offer events on each motion event. Or we could just implement an array type, but that's a pain to work with.
  Middle-click drag to pop up a menu? Ctrl/Shift/Alt drag?
  Send a file descriptor over the protocol to let initiator and source exchange data out of band?
  Action? Specify the action when creating the drag object? Ask for an action?

Sequence of events:

The initiator surface receives a click (which grabs the input device to that surface) and then enough motion to decide that a drag is starting. Wayland has no subwindows, so it's entirely up to the application to decide whether or not a draggable object within the surface was clicked.

The initiator creates a drag object by calling the create_drag method on the dnd global object. As for any client-created object, the client allocates the ID. The create_drag method also takes the originating surface, the device that's dragging and the mime-types supported. If the surface has indeed grabbed the device passed in, the server will create an active drag object for the device. If the grab was released in the meantime, the drag object will be inactive, that is, in the same state as when the grab is released.
In that case, the client will receive a button up event, which lets it know that the drag finished. To the client it will look like the drag was immediately cancelled by the grab ending.

The special mime-type application/x-root-target indicates that the initiator is looking for drag events to the root window as well.

To indicate the object being dragged, the initiator can replace the pointer image with a larger image representing the data being dragged, with the cursor image overlaid. The pointer image will remain in place as long as the grab is in effect, since the initiating surface keeps pointer focus and no other surface receives enter events.

As long as the grab is active (or until the initiator cancels the drag by destroying the drag object), the drag object will send offer events to the surfaces it moves across. As for motion events, these events contain the surface-local coordinates of the device as well as the list of mime-types offered. When a device leaves a surface, the drag object sends an offer event with an empty list of mime-types to indicate that the device left the surface.

If a surface receives an offer event and decides that it's in an area that can accept a drag event, it should call the accept method on the drag object in the event. The surface passes a mime-type in the request, picked from the list in the offer event, to indicate which of the types it wants. At this point, the surface can update the appearance of the drop target to give the user feedback that the drag has a valid target. If the offer event moves to a different drop target (the surface decides the offer coordinates are outside the drop target) or leaves the surface (the offer event has an empty list of mime-types), it should revert the appearance of the drop target to the inactive state. A surface can also decide to retract its drop target (if the drop target disappears or moves, for example) by calling the accept method with a NULL mime-type.
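The target-side rules, choosing a mime-type inside the drop area and retracting when the device leaves the area or the surface, can be sketched as an offer handler that decides when an accept request is due. The drop_target struct, on_offer and the rectangular drop area are hypothetical. The handler reports a change only when the chosen type differs from the previous one, on the assumption that repeating the same accept on every motion event is unnecessary; a NULL result corresponds to calling accept with a NULL mime-type.

```c
#include <stddef.h>
#include <string.h>

struct rect { int x, y, w, h; };

/* Hypothetical drop-target state for one surface. */
struct drop_target {
    struct rect area;              /* region that accepts drops */
    const char *const *accepted;   /* NULL-terminated list of usable types */
    const char *current;           /* type last passed to accept, or NULL */
};

static int in_rect(const struct rect *r, int x, int y)
{
    return x >= r->x && x < r->x + r->w && y >= r->y && y < r->y + r->h;
}

/* Handle an offer event at surface-local (x, y). Writes the chosen type
 * to *mime (NULL means no target) and returns 1 if an accept request
 * should be sent because the choice changed, 0 otherwise. */
static int on_offer(struct drop_target *t, int x, int y,
                    const char *const *offered, size_t n_offered,
                    const char **mime)
{
    const char *choice = NULL;

    /* An empty list means the device left the surface: no target. */
    if (n_offered > 0 && in_rect(&t->area, x, y)) {
        /* Pick the first offered type that this target can consume. */
        for (size_t i = 0; i < n_offered && choice == NULL; i++)
            for (const char *const *a = t->accepted; *a != NULL && choice == NULL; a++)
                if (strcmp(offered[i], *a) == 0)
                    choice = *a;   /* point at our own string, not the event's */
    }

    *mime = choice;
    int changed = (choice != t->current);
    t->current = choice;   /* drives highlighting or reverting the drop target */
    return changed;
}
```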
When a target surface sends an accept request, the drag object will send a target event to the initiator surface. This tells the initiator that the drag currently has a potential target and which of the offered mime-types the target wants. The initiator can change the pointer image or the drag source appearance to reflect this new state. If the target surface retracts its drop target or if the surface disappears, a target event with a NULL mime-type will be sent.

If the initiator listed application/x-root-target as a valid mime-type, dragging into the root window will make the drag object send a target event with the application/x-root-target mime-type.

When the grab is released (indicated by the button release event), if the drag has an active target, the initiator calls the send method on the drag object to send the data to be transferred by the drag operation, in the format requested by the target. The initiator can then destroy the drag object by calling the destroy method.

The drop target receives a data event from the drag object with the requested data.

MIME is defined in RFCs 2045-2049. A registry of MIME types is maintained by the Internet Assigned Numbers Authority (IANA).
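On the initiator side, the target events and the final button release reduce to a small amount of state. A sketch under stated assumptions: drag_state, on_target, target_is_root and on_button_release are hypothetical names, and the mime-type is copied because the event's string need not outlive the handler. Producing the data itself is left to the caller; the function only decides whether a send request should precede destroy.

```c
#include <stddef.h>
#include <string.h>

#define ROOT_TARGET "application/x-root-target"

enum drag_action { DRAG_DESTROY, DRAG_SEND_AND_DESTROY };

/* Hypothetical initiator-side state; zero-initialize before use. */
struct drag_state {
    char target_type[64];   /* mime-type from the latest target event; empty if none */
};

/* target event: NULL means the target retracted or disappeared. */
static void on_target(struct drag_state *d, const char *mime)
{
    if (mime == NULL) {
        d->target_type[0] = '\0';
        return;
    }
    strncpy(d->target_type, mime, sizeof d->target_type - 1);
    d->target_type[sizeof d->target_type - 1] = '\0';
}

/* True if the current target is the root window. */
static int target_is_root(const struct drag_state *d)
{
    return strcmp(d->target_type, ROOT_TARGET) == 0;
}

/* Button release ends the grab: with an active target, send the data in
 * the requested format and then destroy; otherwise just destroy. */
static enum drag_action on_button_release(const struct drag_state *d)
{
    return d->target_type[0] != '\0' ? DRAG_SEND_AND_DESTROY : DRAG_DESTROY;
}
```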