Temporal reaggregation processor should only accept pdata if there are 3 outbound slots available and not 2 #2653

@JakeDern

Description

Pre-filing checklist

  • I searched existing issues and didn't find a duplicate

Component(s)

Rust OTAP dataflow (rust/otap-dataflow/)

Bug Description

The temporal reaggregation processor's accept_pdata implementation gates on having one inbound slot and two outbound slots available, which it treats as the maximum needed to accommodate a single incoming pdata.

However, this does not account for the additional outbound slot the processor needs when a flush signal arrives. The following sequence is therefore possible:

  1. Accept pdata with exactly two outbound slots left
  2. Allocate an outbound slot for non-aggregable data to be passed through
  3. Overflow the existing aggregating batch (via stream count or ID overflow), thereby triggering a flush and using the second outbound slot
  4. Place the incoming data in the pending buffer after flushing
  5. Receive a timer tick/wakeup message while the outbound slots are full, which causes the processor to crash
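
The slot accounting in the steps above can be replayed with a small model. This is only a sketch and not the processor's actual code: `Outcome`, `take`, and `replay_worst_case` are hypothetical names, and the model assumes the worst case in which no outbound slot is released between accepting the pdata and the timer-driven flush.

```rust
/// Hypothetical outcome of replaying the worst-case sequence from the issue.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The gate refused the pdata, so it stays queued upstream (backpressure).
    Backpressured,
    /// Every step that needed an outbound slot found one.
    Completed,
    /// The given step needed an outbound slot but none was free.
    Starved { step: u8 },
}

/// Take one outbound slot if any is free; returns whether it succeeded.
fn take(free: &mut usize) -> bool {
    if *free == 0 {
        false
    } else {
        *free -= 1;
        true
    }
}

/// Replay steps 1-5 with `free_outbound` slots free at accept time and a gate
/// that demands `gate_threshold` free outbound slots.
/// Assumption: no slot is released while the sequence runs.
pub fn replay_worst_case(free_outbound: usize, gate_threshold: usize) -> Outcome {
    // Step 1: the gate decides whether the pdata is accepted at all.
    if free_outbound < gate_threshold {
        return Outcome::Backpressured;
    }
    let mut free = free_outbound;
    // Step 2 (pass-through), step 3 (overflow flush), and step 5 (timer flush)
    // each consume one outbound slot. Step 4 only buffers the incoming data
    // and consumes none.
    for step in [2u8, 3, 5] {
        if !take(&mut free) {
            return Outcome::Starved { step };
        }
    }
    Outcome::Completed
}
```

Under these assumptions, `replay_worst_case(2, 2)` ends in `Outcome::Starved { step: 5 }`, which corresponds to the crash described above, while a threshold of three either defers the pdata (`Backpressured`) or leaves a slot free for the timer flush.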

When step (5) happens we could choose to nack the data, but applying backpressure through the engine's queues is generally a better solution than getting into a situation where we have to nack.

The solution is to require three outbound slots to be available before accepting pdata.
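
As a rough illustration of the proposed gate, the check could look like the following. The real accept_pdata signature is not shown in this issue, so `can_accept_pdata`, its parameters, and the two constants are hypothetical stand-ins for however the processor actually tracks free slots.

```rust
/// Inbound slots needed to hold one incoming pdata (per the issue text).
pub const REQUIRED_INBOUND_SLOTS: usize = 1;

/// Outbound slots needed in the worst case: one for pass-through of
/// non-aggregable data, one for an overflow-triggered flush, and one for a
/// later timer-driven flush.
pub const REQUIRED_OUTBOUND_SLOTS: usize = 3;

/// Hypothetical gate: accept pdata only when the worst-case slot demand can
/// be met, so pressure builds in the engine's queues instead of forcing a nack.
pub fn can_accept_pdata(free_inbound: usize, free_outbound: usize) -> bool {
    free_inbound >= REQUIRED_INBOUND_SLOTS && free_outbound >= REQUIRED_OUTBOUND_SLOTS
}
```

With this threshold, a processor holding exactly two free outbound slots declines new pdata until a third slot frees up.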

Steps to Reproduce

Expected Behavior

Actual Behavior

OTel-Arrow Version

Environment

No response

Configuration

Log Output

Additional Context

No response
