Add 12 Phase 5 tests for the AECP/AEM entity model: - READ_DESCRIPTOR for existing and non-existent descriptors - AECP packet filtering (wrong EtherType, wrong subtype) - Unsupported AECP message types (ADDRESS_ACCESS, etc.) - Unimplemented AEM commands (REBOOT, etc.) - ACQUIRE_ENTITY and LOCK_ENTITY for legacy mode - Milan ENTITY_AVAILABLE, LOCK_ENTITY (lock/contention/unlock) - Milan LOCK_ENTITY for non-entity descriptors - Milan ACQUIRE_ENTITY returns NOT_SUPPORTED - Milan READ_DESCRIPTOR Also adds Milan test server helper with properly sized entity descriptor for lock state, and AECP/AEM packet builder utility. Updates avb-bugs.md with 3 new bugs found (bugs #6-#8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 KiB
AVB Module Bugs Found via Test Suite
The following bugs were discovered by building a software test harness for the AVB protocol stack. All have been fixed in the accompanying patch series.
1. Heap corruption in server_destroy_descriptors
File: src/modules/module-avb/internal.h
Commit: 69c721006
server_destroy_descriptors() called free(d->ptr) followed by
free(d), but d->ptr points into the same allocation as d
(set via SPA_PTROFF(d, sizeof(struct descriptor), void) in
server_add_descriptor()). This is a double-free / heap corruption
that could cause crashes or memory corruption when tearing down an
AVB server.
Fix: Remove the erroneous free(d->ptr) call.
2. NULL pointer dereference in MSRP notify dispatch
File: src/modules/module-avb/msrp.c, src/modules/module-avb/mvrp.c
Commit: b056e9f85
msrp_notify() unconditionally calls dispatch[a->attr.type].notify()
but the dispatch table entry for AVB_MSRP_ATTRIBUTE_TYPE_TALKER_FAILED
has notify = NULL. If a talker-failed attribute receives a registrar
state change (e.g., RX_NEW triggers NOTIFY_NEW), this crashes with
a NULL pointer dereference. The same unguarded pattern exists in
mvrp_notify().
Fix: Add if (dispatch[a->attr.type].notify) NULL check before
calling, matching the defensive pattern already used in the encode path.
3. MRP NEW messages never transmitted
File: src/modules/module-avb/mrp.h, src/modules/module-avb/mrp.c,
src/modules/module-avb/msrp.c, src/modules/module-avb/mvrp.c
Commit: bc2c41daa
AVB_MRP_SEND_NEW was defined as 0. The MSRP and MVRP event handlers
skip attributes with if (!a->attr.mrp->pending_send), treating 0 as
"no pending send". Since the MRP state machine sets pending_send to
AVB_MRP_SEND_NEW (0) when an attribute in state VN or AN receives a
TX event, NEW messages were silently dropped instead of being
transmitted. This violates IEEE 802.1Q which requires NEW messages to
be sent when an attribute is first declared.
In practice, the attribute would cycle through VN -> AN -> AA over successive TX events, eventually sending a JOINMT instead of the initial NEW. The protocol still functioned because JOINMT also registers the attribute, but the initial declaration was lost.
Fix: Shift all AVB_MRP_SEND_* values to start at 1, so that 0
unambiguously means "no send pending". Update MSRP and MVRP encoders
to subtract 1 when encoding to the IEEE 802.1Q wire format.
4. ACMP error responses sent with wrong message type
File: src/modules/module-avb/acmp.c
Commit: 9f4147104
In handle_connect_tx_command() and handle_disconnect_tx_command(),
AVB_PACKET_ACMP_SET_MESSAGE_TYPE() is called after the goto done
jump target. When find_stream() fails (returns NULL), the code jumps
to done: without setting the message type, so the error response is
sent with the original command message type (e.g.,
CONNECT_TX_COMMAND = 0) instead of the correct response type
(CONNECT_TX_RESPONSE = 1).
A controller receiving this malformed response would not recognize it as a response to its command and would eventually time out.
Fix: Move AVB_PACKET_ACMP_SET_MESSAGE_TYPE() before the
find_stream() call so the response type is always set correctly.
5. ACMP pending_destroy skips controller cleanup
File: src/modules/module-avb/acmp.c
Commit: 9f4147104
pending_destroy() iterates with list_id < PENDING_CONTROLLER
(where PENDING_CONTROLLER = 2), which only cleans up
PENDING_TALKER (0) and PENDING_LISTENER (1) lists, skipping
PENDING_CONTROLLER (2). Any pending controller requests leak on
shutdown.
Fix: Change < to <= to include the controller list.
6. Legacy AECP handlers read payload at wrong offset
File: src/modules/module-avb/aecp-aem.c
handle_acquire_entity_avb_legacy() and handle_lock_entity_avb_legacy()
assign const struct avb_packet_aecp_aem *p = m; where m is the full
ethernet frame (starting with struct avb_ethernet_header). The handlers
then access p->payload to read the acquire/lock fields, but this reads
from m + offsetof(avb_packet_aecp_aem, payload) instead of the correct
m + sizeof(avb_ethernet_header) + offsetof(avb_packet_aecp_aem, payload).
This causes descriptor_type and descriptor_id to be read from the
wrong position, leading to incorrect descriptor lookups.
All other AEM command handlers (e.g., handle_read_descriptor_common)
correctly derive p via SPA_PTROFF(h, sizeof(*h), void).
Fix: Change const struct avb_packet_aecp_aem *p = m; to properly
skip the ethernet header:
const struct avb_ethernet_header *h = m;
const struct avb_packet_aecp_aem *p = SPA_PTROFF(h, sizeof(*h), void);
7. Milan LOCK_ENTITY error response uses wrong packet pointer
File: src/modules/module-avb/aecp-aem-cmds-resps/cmd-lock-entity.c
In handle_cmd_lock_entity_milan_v12(), when server_find_descriptor()
returns NULL, reply_status() is called with p (the AEM packet pointer
past the ethernet header) instead of m (the full ethernet frame).
reply_status() assumes its third argument is the full frame and casts
it as struct avb_ethernet_header *. With the wrong pointer, the
response ethernet header (including destination MAC) is corrupted.
Fix: Change reply_status(aecp, ..., p, len) to
reply_status(aecp, ..., m, len).
8. Lock entity re-lock timeout uses wrong units
File: src/modules/module-avb/aecp-aem-cmds-resps/cmd-lock-entity.c
When a controller that already holds the lock sends another lock request (to refresh it), the expire timeout is extended by:
lock->base_info.expire_timeout +=
AECP_AEM_LOCK_ENTITY_EXPIRE_TIMEOUT_SECOND;
AECP_AEM_LOCK_ENTITY_EXPIRE_TIMEOUT_SECOND is 60 (raw seconds),
but expire_timeout is in nanoseconds. This adds only 60 nanoseconds
instead of 60 seconds. The initial lock correctly uses
AECP_AEM_LOCK_ENTITY_EXPIRE_TIMEOUT_SECOND * SPA_NSEC_PER_SEC.
Fix: Multiply by SPA_NSEC_PER_SEC to match the nanosecond units.