
debug onnx benchmark workflow #13

Closed
Human9000-bit wants to merge 1 commit into enthropy7:main from Human9000-bit:onnx-job-fix

Conversation

@Human9000-bit

Collaborator

No description provided.

@Human9000-bit

Collaborator Author

/bench

@Human9000-bit

Collaborator Author

/bench

@Human9000-bit

Collaborator Author

/bench

@Human9000-bit

Collaborator Author

/bench

@Human9000-bit

Collaborator Author

/bench

@github-actions

github-actions Bot commented Jul 1, 2026


ONNX CPU benchmark

GitHub-hosted runner signal, advisory only. Base and head are compared within the same architecture job.

| arch | case | base p50 | head p50 | delta |
|---|---|---|---|---|
| aarch64 | gemm_relu_256x512x512 | 1.22 ms | 1.21 ms | -0.7% flat |
| aarch64 | small_conv_relu_64 | 194 us | 474 us | +144.3% slower |
| aarch64 | small_dw_pw_64 | 168 us | 165 us | -1.8% faster |
| aarch64 | small_gemm_relu_1x128 | 8 us | 8 us | +0.0% flat |
| aarch64 | small_residual_conv_64 | 439 us | 1.09 ms | +148.5% slower |
| aarch64 | winograd_3x3_20x20_c256 | 3.47 ms | 36.66 ms | +955.4% slower |
| aarch64 | winograd_3x3_80x80_c64 | 6.18 ms | 76.78 ms | +1142.6% slower |
| aarch64 | yolo11n_bus_640 | 79.79 ms | 415.10 ms | +420.3% slower |
| aarch64 | yolov8n_bus_640 | 89.37 ms | 656.39 ms | +634.4% slower |
| x86_64 | gemm_relu_256x512x512 | 1.00 ms | 987 us | -1.6% faster |
| x86_64 | small_conv_relu_64 | 383 us | 144 us | -62.4% faster |
| x86_64 | small_dw_pw_64 | 143 us | 154 us | +7.7% slower |
| x86_64 | small_gemm_relu_1x128 | 7 us | 7 us | +0.0% flat |
| x86_64 | small_residual_conv_64 | 421 us | 236 us | -43.9% faster |
| x86_64 | winograd_3x3_20x20_c256 | 2.52 ms | 2.75 ms | +9.3% slower |
| x86_64 | winograd_3x3_80x80_c64 | 6.00 ms | 5.95 ms | -0.9% flat |
| x86_64 | yolo11n_bus_640 | 86.22 ms | 85.61 ms | -0.7% flat |
| x86_64 | yolov8n_bus_640 | 90.12 ms | 91.66 ms | +1.7% slower |
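
The delta column above appears to be the relative change of head p50 against base p50, followed by a flat/faster/slower label. A minimal sketch of that arithmetic follows. The function names and the 1% "flat" band are assumptions for illustration: the band is merely consistent with the rows shown (-0.9% is flat, -1.6% is faster) and is not taken from the workflow itself.

```python
def delta_percent(base: float, head: float) -> float:
    """Relative change of head versus base, in percent (same time unit for both)."""
    return (head - base) / base * 100.0


def classify(delta: float, flat_band: float = 1.0) -> str:
    """Label a delta. flat_band is an assumed threshold, not the workflow's actual value."""
    if abs(delta) < flat_band:
        return "flat"
    return "slower" if delta > 0 else "faster"


def format_delta(base: float, head: float) -> str:
    """Render a delta cell in the style of the report, e.g. '+144.3% slower'."""
    d = delta_percent(base, head)
    return f"{d:+.1f}% {classify(d)}"


# aarch64 small_conv_relu_64: 194 us -> 474 us
print(format_delta(194, 474))
# x86_64 small_conv_relu_64: 383 us -> 144 us
print(format_delta(383, 144))
```

Recomputing from the rounded values in the table can differ from the reported delta by a few tenths of a percent (1.22 ms to 1.21 ms gives about -0.8%, not -0.7%), presumably because the report uses unrounded timings.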

Head op/kernel summary

| arch | case | ops | dispatched kernels |
|---|---|---|---|
| aarch64 | gemm_relu_256x512x512 | Gemm x1, Relu x1 | Gemm via blocked-mr4 x1 |
| aarch64 | small_conv_relu_64 | Conv_Relu x1 | Conv_Relu via indirect-nhwc-3x3 x1 |
| aarch64 | small_dw_pw_64 | FusedDwPw x1 | - |
| aarch64 | small_gemm_relu_1x128 | Gemm x1, Relu x1 | Gemm via row-gemm x1 |
| aarch64 | small_residual_conv_64 | Conv_Add_Relu x1 | Conv_Add_Relu via indirect-nhwc-3x3 x1 |
| aarch64 | winograd_3x3_20x20_c256 | Conv_Relu x1 | Conv_Relu via indirect-nhwc-3x3 x1 |
| aarch64 | winograd_3x3_80x80_c64 | Conv_Relu x1 | Conv_Relu via indirect-nhwc-3x3 x1 |
| aarch64 | yolo11n_bus_640 | Conv x85, Mul x83, Sigmoid x78, Concat x39, Add x22, Reshape x17, +18 more | Conv via nhwc-gemm-prepacked/pw-gemm x44, Conv via indirect-nhwc-3x3 x34, Conv via dw-nhwc-padded/dw-neon x6, MatMul via blocked-mr8 x2, Conv via nhwc-padded/first-layer-rgb x1, Conv_Add via dw-nhwc-padded/dw-neon x1 |
| aarch64 | yolov8n_bus_640 | Conv x64, Mul x61, Sigmoid x58, Concat x33, Add x17, Reshape x14, +15 more | Conv via indirect-nhwc-3x3 x38, Conv via nhwc-gemm-prepacked/pw-gemm x25, Conv via nhwc-padded/first-layer-rgb x1 |
| x86_64 | gemm_relu_256x512x512 | Gemm x1, Relu x1 | Gemm via blocked-mr4 x1 |
| x86_64 | small_conv_relu_64 | Conv_Relu x1 | Conv_Relu via nhwc-padded/im2col-gemm x1 |
| x86_64 | small_dw_pw_64 | FusedDwPw x1 | - |
| x86_64 | small_gemm_relu_1x128 | Gemm x1, Relu x1 | Gemm via row-gemm x1 |
| x86_64 | small_residual_conv_64 | Conv_Add_Relu x1 | Conv_Add_Relu via nhwc-padded/im2col-gemm x1 |
| x86_64 | winograd_3x3_20x20_c256 | Conv_Relu x1 | Conv_Relu via nhwc-padded/im2col-gemm x1 |
| x86_64 | winograd_3x3_80x80_c64 | Conv_Relu x1 | Conv_Relu via nhwc-padded/im2col-gemm x1 |
| x86_64 | yolo11n_bus_640 | Conv x85, Mul x83, Sigmoid x78, Concat x39, Add x22, Reshape x17, +18 more | Conv via nhwc-padded/im2col-gemm x34, Conv via nhwc-gemm-prepacked/pw-gemm x24, Conv via nhwc-gemm-prepacked/pw-nx16-direct x20, Conv via dw-nhwc-padded/dw-avx-fma x6, MatMul via blocked-mr4 x2, Conv via nhwc-padded/first-layer-rgb x1, +1 more |
| x86_64 | yolov8n_bus_640 | Conv x64, Mul x61, Sigmoid x58, Concat x33, Add x17, Reshape x14, +15 more | Conv via nhwc-padded/im2col-gemm x38, Conv via nhwc-gemm-prepacked/pw-gemm x17, Conv via nhwc-gemm-prepacked/pw-nx16-direct x8, Conv via nhwc-padded/first-layer-rgb x1 |

Raw JSON artifacts include per-run min/p50/avg/p95/p99, optimized node counts, yscv CPU dispatch report, and per-node runner profile summaries.
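
For readers unfamiliar with the min/p50/avg/p95/p99 fields, here is a rough sketch of how such a summary could be derived from a list of per-run timings. It uses the nearest-rank percentile definition, which is an assumption: the benchmark job may interpolate or use a different method, and the `percentile` and `summarize` names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summary statistics named after the fields mentioned in the report."""
    return {
        "min": min(samples),
        "p50": percentile(samples, 50),
        "avg": sum(samples) / len(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Hypothetical timings in microseconds: 1, 2, ..., 100
print(summarize(list(range(1, 101))))
```

The p50 is the figure compared in the benchmark table; the tail percentiles (p95, p99) indicate how noisy a shared runner was during the measurement.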

@Human9000-bit

Collaborator Author

yay
