Mirror of https://gitlab.freedesktop.org/pipewire/pipewire.git (synced 2025-10-29 05:40:27 -04:00)
impl-link: add 1 quantum latency for async links
Update the scheduling doc with some information about how async scheduling works, and add something about the latency.

Async links add 1 quantum of latency, so take that into account when aggregating latencies. A source directly linked to an async node does not add latency: we evaluate the tee before incrementing the cycle, so it is effectively executed in the previous cycle and consumed immediately by async nodes. We can do this because the driver source always provides data before the async node, and never concurrently.

Also add a listener on the link for node driver changes, because those can now influence the latency of async nodes.
commit 4dccddd564 (parent f89428d9f8)
4 changed files with 151 additions and 10 deletions

@@ -152,6 +152,7 @@ will then:

- Check the previous cycle. Did it complete? Mark xrun on unfinished nodes.
- Perform reposition requests if any, timebase changes, etc.
- The pending counter of each follower node is set to the required field.
- Update the cycle counter in the driver activation io.
- It then loops over all targets of the driver and atomically decrements the required
  field of the activation record. When the required field is 0, the eventfd is signaled
  and the node can be scheduled (see the sketch after this list).
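
The trigger step is essentially a per-target atomic countdown that ends in an eventfd write. A minimal sketch of that idea (the struct and function names here are hypothetical, not PipeWire's internal API):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical mirror of the activation state described above. */
struct activation {
        _Atomic int32_t pending;  /* reset from 'required' at cycle start */
        int32_t required;         /* number of dependencies to wait for */
        int evfd;                 /* eventfd that wakes the node */
};

/* One dependency completed for this target; the decrement that
 * reaches zero signals the eventfd so the node can be scheduled. */
static void trigger_target(struct activation *a)
{
        if (atomic_fetch_sub(&a->pending, 1) == 1) {
                uint64_t one = 1;
                write(a->evfd, &one, sizeof(one));
        }
}
```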

@@ -186,6 +187,118 @@ fields from all the nodes in the target list of the driver are now 0.

The driver calculates some stats about cpu time etc.

# Async scheduling

When a node has the node.async property set to true, it will be considered an async
node and will be scheduled differently.

Async nodes don't increment the pending counter of their peers, and the upstream peers
also don't increment the async node's pending counter. Only the driver increments the
pending counter of the async node.

This means that async nodes do not depend on any other node and are also not a
dependency for other nodes. It also means that async nodes can be scheduled as
soon as the driver has started the graph.

The completion of an async node does not influence the completion of the graph in
any way. Async nodes are therefore interesting when real-time performance cannot
be guaranteed, for example when the processing threads are not running at real-time
priority.

A link between a port of an async node and another port (async or not) is called an
async link and will have the link.async=true property.

Because async nodes run concurrently with other nodes, a method must be in place
to avoid concurrent access to buffer data. This is done by sending a spa_io_async_buffers
io to the (mixer) ports of an async link. The spa_io_async_buffers has 2 spa_io_buffers
slots.
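
For reference, the io area itself is tiny. Its layout in the SPA headers (spa/node/io.h) is essentially:

```c
struct spa_io_buffers {
        int32_t status;         /* the status code */
        uint32_t buffer_id;     /* a buffer id */
};

struct spa_io_async_buffers {
        struct spa_io_buffers buffers[2];  /* writers use slot (cycle+1)&1,
                                            * readers use slot cycle&1 */
};
```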

The driver will increment a cycle counter for each cycle that it starts. Output ports
will write to the spa_io_async_buffers (cycle+1)&1 slot and input ports will read from
the cycle&1 slot. This way the async node will always consume the output of the previous
cycle and will provide data for the next cycle. Async links therefore always add 1 cycle
of latency in the graph.
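
The slot arithmetic can be made concrete with two small helpers (a sketch; these helper functions are illustrative and not part of the SPA API):

```c
#include <spa/node/io.h>
#include <stdint.h>

/* Output ports fill the slot that the next cycle will read. */
static struct spa_io_buffers *async_write_slot(struct spa_io_async_buffers *aio,
                uint64_t cycle)
{
        return &aio->buffers[(cycle + 1) & 1];
}

/* Input ports consume what was written in the previous cycle. */
static struct spa_io_buffers *async_read_slot(struct spa_io_async_buffers *aio,
                uint64_t cycle)
{
        return &aio->buffers[cycle & 1];
}
```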

A special exception is made for the output ports of the driver node. When the driver is
started, the output port buffers are copied to the previous cycle's spa_io_buffers slot.
This way, the async nodes will immediately pick up the new data from the driver source.
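
In terms of the helpers above, the exception amounts to also publishing the driver's freshly produced buffer in the slot that is read during the current cycle (again an illustrative sketch, not the actual implementation):

```c
/* On the driver's output ports only: after writing slot (cycle+1)&1,
 * mirror it into slot cycle&1 so directly linked async nodes consume
 * the new data in this same cycle instead of one cycle later. */
static void driver_publish_now(struct spa_io_async_buffers *aio, uint64_t cycle)
{
        aio->buffers[cycle & 1] = aio->buffers[(cycle + 1) & 1];
}
```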

Because there are 2 buffers in flight on the spa_io_async_buffers io area, the link needs
to negotiate at least 2 buffers for this to work.

## Example

A, B, C are async nodes and have async links between their ports. The async
link has the spa_io_async_buffers with 2 slots (named 0 and 1) below. All the
slots are empty.

+--------+        +---------+       +---------+
|       A|        | B       |       | C       |
|       0 -(   )-> 0       0 -(   )-> 0       |
|       1  (   )   1       1  (   )   1       |
+--------+        +---------+       +---------+

cycle 0: A produces a buffer AB0 on the output port in the (cycle+1)&1 slot (1).
         B consumes slot cycle&1 (0) with the empty buffer and produces BC0 in slot 1.
         C consumes slot cycle&1 (0) with the empty buffer.

+--------+        +---------+       +---------+
|       A|        | B       |       | C       |
| (AB0) 0 -(   )-> 0 (   ) 0 -(   )-> 0 (   ) |
|       1  (AB0)   1       1  (BC0)   1       |
+--------+        +---------+       +---------+

cycle 1: A produces a buffer AB1 on the output port in the (cycle+1)&1 slot (0).
         B consumes slot cycle&1 (1) with buffer AB0 and produces BC1 in slot 0.
         C consumes slot cycle&1 (1) with buffer BC0.

+--------+        +---------+       +---------+
|       A|        | B       |       | C       |
| (AB1) 0 -(AB1)-> 0 (AB0) 0 -(BC1)-> 0 (BC0) |
|       1  (AB0)   1       1  (BC0)   1       |
+--------+        +---------+       +---------+

cycle 2: A produces a buffer AB2 on the output port in the (cycle+1)&1 slot (1).
         B consumes slot cycle&1 (0) with buffer AB1 and produces BC2 in slot 1.
         C consumes slot cycle&1 (0) with buffer BC1.

+--------+        +---------+       +---------+
|       A|        | B       |       | C       |
| (AB2) 0 -(AB1)-> 0 (AB1) 0 -(BC1)-> 0 (BC1) |
|       1  (AB2)   1       1  (BC2)   1       |
+--------+        +---------+       +---------+

Each async link adds 1 cycle of latency to the chain. Notice how AB0, produced in
cycle 0, is consumed by B in cycle 1 to produce BC1, which arrives at node C in cycle 2.

## Latency reporting

Because the latency is really introduced by the links, the additional cycle of
latency is added when the SPA_PARAM_Latency is copied between the output and
input ports of a link.

It is possible for a sync node A to be linked to another sync node D and an
async node B:

+--------+        +---------+
|       A|        | B       |
| (AB1) 0 -(AB1)-> 0 (AB0) 0 ...
|       1 \(AB0)   1       1
+--------+ \      +---------+
            \
             \         +--------+
              \        | D      |
               -(AB1)-> 0 (AB1) |
                       |        |
                       +--------+

The output latency on A's output port is what A reports. When it is copied to the
input port of B, 1 cycle is added; when it is copied to D, nothing is added.
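
In sketch form the propagation rule looks as follows (names here are illustrative; the actual change is in pw_impl_port_recalc_latency() in the diff below):

```c
#include <spa/param/latency-utils.h>
#include <stdbool.h>

/* Fold a peer port's latency across one link into 'total'. */
static void combine_peer_latency(struct spa_latency_info *total,
                struct spa_latency_info peer,
                bool link_is_async, bool peer_is_our_driver)
{
        if (link_is_async && !peer_is_our_driver) {
                /* async links buffer one cycle, except directly from our driver */
                peer.min_quantum++;
                peer.max_quantum++;
        }
        spa_latency_info_combine(total, &peer);
}
```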
# Remote nodes.

For remote nodes, the eventfd and the activation are transferred from the server

@@ -206,7 +319,8 @@ After they complete (and only when the profiler is active), they will trigger an

extra eventfd to signal the server that the graph completed. This is used by the
server to generate the profiler info.

-## Lazy scheduling
+# Lazy scheduling

Normally, a driver will wake up the graph and all the followers need to process
the data in sync. There are cases where:

@@ -52,8 +52,6 @@ struct impl {
         struct pw_properties *properties;
 
         struct spa_io_buffers io[2];
-
-        bool async;
 };
 
 /** \endcond */

@@ -799,7 +797,7 @@ int pw_impl_link_activate(struct pw_impl_link *this)
             !impl->input.node->runnable || !impl->output.node->runnable)
                 return 0;
 
-        if (impl->async) {
+        if (this->async) {
                 io_type = SPA_IO_AsyncBuffers;
                 io_size = sizeof(struct spa_io_async_buffers);
         } else {

@@ -1200,16 +1198,29 @@ static void node_active_changed(void *data, bool active)
         pw_impl_link_prepare(&impl->this);
 }
 
+static void node_driver_changed(void *data, struct pw_impl_node *old, struct pw_impl_node *driver)
+{
+        struct impl *impl = data;
+        if (impl->this.async) {
+                /* for async links, input and output port latency depends on if the
+                 * output node is directly driving the input node. */
+                pw_impl_port_recalc_latency(impl->output.port);
+                pw_impl_port_recalc_latency(impl->input.port);
+        }
+}
+
 static const struct pw_impl_node_events input_node_events = {
         PW_VERSION_IMPL_NODE_EVENTS,
         .result = input_node_result,
         .active_changed = node_active_changed,
+        .driver_changed = node_driver_changed,
 };
 
 static const struct pw_impl_node_events output_node_events = {
         PW_VERSION_IMPL_NODE_EVENTS,
         .result = output_node_result,
         .active_changed = node_active_changed,
+        .driver_changed = node_driver_changed,
 };
 
 static bool pw_impl_node_can_reach(struct pw_impl_node *output, struct pw_impl_node *input, int hop)

@@ -1496,11 +1507,11 @@ struct pw_impl_link *pw_context_create_link(struct pw_context *context,
         if (this->passive && str == NULL)
                 pw_properties_set(properties, PW_KEY_LINK_PASSIVE, "true");
 
-        impl->async = (output_node->async || input_node->async) &&
+        this->async = (output_node->async || input_node->async) &&
                 SPA_FLAG_IS_SET(output->flags, PW_IMPL_PORT_FLAG_ASYNC) &&
                 SPA_FLAG_IS_SET(input->flags, PW_IMPL_PORT_FLAG_ASYNC);
 
-        if (impl->async)
+        if (this->async)
                 pw_properties_set(properties, PW_KEY_LINK_ASYNC, "true");
 
         spa_hook_list_init(&this->listener_list);

@@ -1551,7 +1562,7 @@ struct pw_impl_link *pw_context_create_link(struct pw_context *context,
         pw_log_info("(%s) (%s) -> (%s) async:%d:%d:%d:%04x:%04x:%d", this->name, output_node->name,
                         input_node->name, output_node->driving,
                         output_node->async, input_node->async,
-                        output->flags, input->flags, impl->async);
+                        output->flags, input->flags, this->async);
 
         pw_impl_port_emit_link_added(output, this);
         pw_impl_port_emit_link_added(input, this);

@@ -1679,7 +1679,7 @@ int pw_impl_port_for_each_link(struct pw_impl_port *port,
 int pw_impl_port_recalc_latency(struct pw_impl_port *port)
 {
         struct pw_impl_link *l;
-        struct spa_latency_info latency, *current;
+        struct spa_latency_info latency, *current, other_latency;
         struct pw_impl_port *other;
         struct spa_pod *param;
         struct spa_pod_builder b = { 0 };

@@ -1702,7 +1702,14 @@ int pw_impl_port_recalc_latency(struct pw_impl_port *port)
                                 port->info.id, other->info.id);
                         continue;
                 }
-                spa_latency_info_combine(&latency, &other->latency[other->direction]);
+                other_latency = other->latency[other->direction];
+                if (l->async && other->node->driver_node != port->node) {
+                        /* we add 1 cycle delay from async links */
+                        other_latency.min_quantum++;
+                        other_latency.max_quantum++;
+                }
+                spa_latency_info_combine(&latency, &other_latency);
 
                 pw_log_debug("port %d: peer %d: latency %f-%f %d-%d %"PRIu64"-%"PRIu64,
                                 port->info.id, other->info.id,
                                 latency.min_quantum, latency.max_quantum,

@@ -1718,7 +1725,15 @@ int pw_impl_port_recalc_latency(struct pw_impl_port *port)
                                 port->info.id, other->info.id);
                         continue;
                 }
-                spa_latency_info_combine(&latency, &other->latency[other->direction]);
+                other_latency = other->latency[other->direction];
+                if (l->async && other->node != port->node->driver_node) {
+                        /* we only add 1 cycle delay for async links that
+                         * are not from our driver */
+                        other_latency.min_quantum++;
+                        other_latency.max_quantum++;
+                }
+                spa_latency_info_combine(&latency, &other_latency);
 
                 pw_log_debug("port %d: peer %d: latency %f-%f %d-%d %"PRIu64"-%"PRIu64,
                                 port->info.id, other->info.id,
                                 latency.min_quantum, latency.max_quantum,

@@ -1022,6 +1022,7 @@ struct pw_impl_link {
 
         void *user_data;
 
+        unsigned int async:1;
         unsigned int registered:1;
         unsigned int feedback:1;
         unsigned int preparing:1;