Feature/rvv support #4359

Open
Sherlockzhangjinge wants to merge 5 commits into alibaba:master from Sherlockzhangjinge:feature/rvv_support

@Sherlockzhangjinge
Contributor

Description

Develop the RVV accelerated version of the methods below:
MNNAvgPoolInt8
MNNFloat2Int8
MNNInt8ScaleToFloat
MNNLineDepthWiseInt8AddBiasScaleUnit
MNNMaxPoolInt8
MNNReluWithSlopeChannelInt8

Module

CPU

Type

  • Feature
  • Bugfix
  • Perf
  • Refact
  • Style
  • Doc
  • Test
  • Chore

Sherlockzhangjinge and others added 3 commits April 8, 2026 19:07
Signed-off-by: typer-J <2236066784@qq.com>

Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>

Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
@wangzhaode wangzhaode self-assigned this Apr 9, 2026
Collaborator

@wangzhaode wangzhaode left a comment

Code Review

🔴 Blocking Issues

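1. Code style check fails (CRLF line endings / mixed indentation / unused code)

Several places violate the project's clang-format rules (.clang-format: UseTab: Never, IndentWidth: 4):

  • MNNAvgPoolInt8.cpp, MNNInt8ScaleToFloat.cpp, MNNLineDepthWiseInt8AddBiasScaleUnit.cpp, MNNMaxPoolInt8.cpp, MNNReluWithSlopeChannelInt8.cpp: all use CRLF (\r\n) line endings, inconsistent with the LF line endings used by the rest of the project, which causes clang-format and diff tools to report errors
  • Int8FunctionsOpt.cpp line 2586: indented with a tab instead of 4 spaces, inconsistent with the other lines in the same file
  • MNNReluWithSlopeChannelInt8.cpp lines 13-18: defines the ALIMAX/ALIMIN macros but never uses them anywhere in the file; this is leftover code
  • MNNFloat2Int8.cpp line 44: trailing whitespace at the end of the line

Suggested fix: run clang-format -i -style=file to format everything uniformly, and convert the line endings of the 5 new files to LF. Remove the unused macro definitions.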


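2. vfcvt.x.f truncating conversion is inconsistent with the reference implementation's roundf() rounding

Files: MNNFloat2Int8.cpp:68, MNNLineDepthWiseInt8AddBiasScaleUnit.cpp:63, MNNReluWithSlopeChannelInt8.cpp:67

RVV's vfcvt.x.f.v instruction truncates toward zero, whereas the reference implementation (Int8FunctionsOpt.cpp:1710, 1669) uses roundf() to round to nearest. For example, 2.7f gives 3 in the reference and 2 with RVV; -2.7f gives -3 in the reference and -2 with RVV. This causes quantization accuracy deviations, so inference results no longer match the CPU reference path.

Suggested fix: add a +/-0.5f offset before vfcvt to emulate rounding. Positive path: vfcvt(v + 0.5f); negative path: vfcvt(v - 0.5f). The branch can be implemented with a vmfgt mask + vmerge.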
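The offset trick suggested above can be modelled in plain scalar C++. This is only a sketch of the arithmetic, not RVV code: `truncToInt`, `roundViaOffset`, and `roundRef` are hypothetical names, and the truncating conversion is an assumption taken from the review comment rather than a statement about what the hardware does.

```cpp
#include <cmath>
#include <cstdint>

// Models a float-to-int conversion that truncates toward zero
// (the behaviour the review comment attributes to the vector path).
inline int32_t truncToInt(float v) {
    return static_cast<int32_t>(v);
}

// Offset trick: push the value half a step away from zero, then truncate.
// The sign test plays the role of the vmfgt mask + vmerge select.
inline int32_t roundViaOffset(float v) {
    const float offset = (v > 0.0f) ? 0.5f : -0.5f;
    return truncToInt(v + offset);
}

// Reference behaviour named in the review: round half away from zero.
inline int32_t roundRef(float v) {
    return static_cast<int32_t>(std::round(v));
}
```

For the values quoted in the review, `roundViaOffset(2.7f)` and `roundViaOffset(-2.7f)` agree with `roundRef`. Inputs extremely close to ±0.5 may still differ, because the floating-point addition itself rounds before the truncation happens.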


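3. MNNFloat2Int8.cpp: vsetvlmax return value is not fixed; small-VLEN devices may go out of bounds

File: MNNFloat2Int8.cpp, line 34

vsetvlmax depends on the hardware VLEN. In the minimum configuration of VLEN=128, the vlmax for e32m2 is only 2. The code then builds a channel index from vid % 4 and gathers from scale[4] via vloxei; when vl_template < 4 it cannot cover all 4 channels, so the scale/zero template is incomplete.

Suggested fix: add a scalar fallback path for vl_template < 4, or switch to the e32m1 type and ensure vlmax >= 4, falling back to the reference implementation otherwise.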
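A minimal sketch of the suggested guard, written as portable scalar C++ so it can be read without an RVV toolchain. All names here (`templateCoversAllChannels`, `float2Int8Scalar`, `float2Int8`) are hypothetical, the quantization formula is an assumed simplification of the reference path, and the vector branch is only a placeholder that reuses the scalar loop.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChannels = 4;  // the 4-channel scale/zero template

// True when a vector of vlTemplate lanes can hold one full 4-channel period.
inline bool templateCoversAllChannels(std::size_t vlTemplate) {
    return vlTemplate >= kChannels;
}

// Hypothetical scalar fallback: channel of element i is i % 4.
inline void float2Int8Scalar(const float* src, int8_t* dst, std::size_t count,
                             const float* scale, const float* zero,
                             int minValue, int maxValue) {
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t c = i % kChannels;
        int v = static_cast<int>(std::round(src[i] * scale[c])) +
                static_cast<int>(zero[c]);
        v = std::max(minValue, std::min(maxValue, v));  // clamp to the int8 range
        dst[i] = static_cast<int8_t>(v);
    }
}

// Dispatcher: returns true if the (placeholder) vector path was selected.
inline bool float2Int8(const float* src, int8_t* dst, std::size_t count,
                       const float* scale, const float* zero,
                       int minValue, int maxValue, std::size_t vlTemplate) {
    if (!templateCoversAllChannels(vlTemplate)) {
        float2Int8Scalar(src, dst, count, scale, zero, minValue, maxValue);
        return false;
    }
    // A real kernel would run the vector loop here; this sketch reuses the
    // scalar loop so that both branches produce identical results.
    float2Int8Scalar(src, dst, count, scale, zero, minValue, maxValue);
    return true;
}
```

Keeping the check in one small predicate means the same condition can guard both the template construction and the main loop.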


Reviewed by qwen3.6-plus-preview

Collaborator

@wangzhaode wangzhaode left a comment

Review Summary

This PR adds RVV (RISC-V Vector) accelerated implementations for 6 Int8 quantization functions: MNNAvgPoolInt8, MNNFloat2Int8, MNNInt8ScaleToFloat, MNNLineDepthWiseInt8AddBiasScaleUnit, MNNMaxPoolInt8, and MNNReluWithSlopeChannelInt8. The RVV intrinsics usage looks correct overall.

Blocking Issues

Two CI checks are currently failing and must be fixed before merge:

  1. Commit Message Format: Two commits do not follow the required [Module:Type] Description format:

    • 670284be implement some methods in Int8FunctionsOpt.cpp with RVV
    • 5f6bce67 Fix duplicate assignment for MNNMaxPoolInt8
    • Please squash/reword to match the format, e.g. [CPU:Feature] Add RVV accelerated Int8 functions
  2. Changed Lines Format (clang-format): Multiple files have formatting issues detected by CI:

    • source/backend/cpu/compute/Int8FunctionsOpt.cpp: extern declarations exceed line width and need wrapping; line 2586 uses a tab character instead of spaces for indentation
    • source/backend/cpu/riscv/rvv/MNNReluWithSlopeChannelInt8.cpp: macro definitions and function signature formatting do not conform to clang-format
    • Other new .cpp files likely have similar issues
    • Fix: run clang-format -i -style=file on all changed files
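For the commit message rule, a local pre-check can catch obvious mismatches before pushing. The sketch below assumes the subject line must look like `[Module:Type] Description`; the exact pattern enforced by CI is not shown in this thread, so the regular expression and the name `looksLikeModuleTypeSubject` are illustrative guesses.

```cpp
#include <regex>
#include <string>

// Returns true when the subject looks like "[Module:Type] Description".
// Assumed pattern: word characters for Module, letters for Type, then a
// single space followed by a non-empty description.
inline bool looksLikeModuleTypeSubject(const std::string& subject) {
    static const std::regex pattern(R"(^\[[A-Za-z0-9_]+:[A-Za-z]+\] \S.*$)");
    return std::regex_match(subject, pattern);
}
```

With this pattern, the example subject from the review passes, while the two flagged commit subjects do not.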

Minor Suggestions

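  • The Chinese comments // ---- 循环外预构造周期向量 ---- ("pre-build the periodic vectors outside the loop") and // ---- 主循环:纯向量运算,无标量介入 ---- ("main loop: pure vector operations, no scalar involvement") in MNNFloat2Int8.cpp: consider using English comments for consistency with the rest of the codebase.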

@Sherlockzhangjinge Could you please fix the above formatting and commit message issues? Thanks!

Reviewed by qwen3.6-plus-preview

…thods by rvv; Fix missing simicolon and format problem

Signed-off-by: typer-J <2236066784@qq.com>

Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>

Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: typer-J <2236066784@qq.com>

Signed-off-by: Sherlockzhangjinge <zjgzhangjinge@outlook.com>

Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Collaborator

@wangzhaode wangzhaode left a comment

Summary

This PR adds RVV (RISC-V Vector Extension) accelerated implementations for 6 Int8 quantization operators: MNNAvgPoolInt8, MNNFloat2Int8, MNNInt8ScaleToFloat, MNNLineDepthWiseInt8AddBiasScaleUnit, MNNMaxPoolInt8, and MNNReluWithSlopeChannelInt8. It also applies clang-format to Int8FunctionsOpt.cpp.

Blocking issues

1. MNNInt8ScaleToFloat_RVV — scale/zero vector length mismatch bug

  • File: source/backend/cpu/riscv/rvv/MNNInt8ScaleToFloat.cpp, lines 15–37
  • v_scale and v_zero are initialized with vl=4 (only 4 elements written), but the main loop uses a much larger vl (e.g., VLMAX_e32m4 = 16 when VLEN=128). Elements beyond index 3 are undefined, producing incorrect results.
  • Suggested fix: Use the same vloxei32 + index-modulo pattern as MNNFloat2Int8_RVV to replicate the 4-element scale/zero pattern across the full vector length.
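The index-modulo replication in the suggested fix can be illustrated with a scalar model. This is a sketch under assumptions: `replicatePeriod4` stands in for an indexed gather with index `i % 4`, `int8ScaleToFloatModel` is a hypothetical name, and the formula `(src - zero) * scale` is an assumed form of the dequantization, not copied from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a gather with index i % 4: every lane of the vl-wide
// template receives a defined value, not just the first four.
inline std::vector<float> replicatePeriod4(const float* table4, std::size_t vl) {
    std::vector<float> lanes(vl);
    for (std::size_t i = 0; i < vl; ++i) {
        lanes[i] = table4[i % 4];
    }
    return lanes;
}

// Strip-mined loop that consumes vl elements per step. vl must be a
// multiple of 4 so that the channel phase stays aligned across steps.
inline void int8ScaleToFloatModel(const int8_t* src, float* dst, std::size_t count,
                                  const float* scale4, const float* zero4,
                                  std::size_t vl) {
    assert(vl >= 4 && vl % 4 == 0);
    const std::vector<float> vScale = replicatePeriod4(scale4, vl);
    const std::vector<float> vZero = replicatePeriod4(zero4, vl);
    for (std::size_t base = 0; base < count; base += vl) {
        const std::size_t n = std::min(vl, count - base);  // tail handling
        for (std::size_t lane = 0; lane < n; ++lane) {
            dst[base + lane] =
                (static_cast<float>(src[base + lane]) - vZero[lane]) * vScale[lane];
        }
    }
}
```

Building the templates once, outside the loop, mirrors the structure the review describes for MNNFloat2Int8_RVV.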

2. Missing #include for struct type definitions — compile error

  • MNNLineDepthWiseInt8AddBiasScaleUnit.cpp uses QuanPostTreatParameters without including its definition.
  • MNNReluWithSlopeChannelInt8.cpp uses QuanPrePostParameters without including its definition.
  • The existing MNNGemmInt8AddBiasScale_16x4_Unit_RVV.cpp correctly includes "../../compute/Int8FunctionsOpt.h" — the new files should do the same.

Minor suggestions

  • Rounding mode: RVV vfcvt_x_f defaults to round-to-nearest-even, while the reference C code uses roundf() (round-half-away-from-zero). Consider using __riscv_vfcvt_x_f_v_i32m4_rm(v, __RISCV_FRM_RMM, vl) if exact parity is needed.
  • MNNFloat2Int8.cpp is missing a trailing newline.
  • The bulk of Int8FunctionsOpt.cpp changes (~500 deleted lines) are clang-format reformatting — consider splitting into a separate commit for easier review.
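The difference between the two rounding modes named in the first suggestion only shows up on exact ties. A small scalar sketch makes that visible; it uses the standard library rather than RVV intrinsics, and the function names are hypothetical.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdint>

// Round to nearest, ties to even: nearbyint under the FE_TONEAREST mode.
inline int32_t roundNearestEven(float v) {
    std::fesetround(FE_TONEAREST);  // the default mode, set explicitly for clarity
    return static_cast<int32_t>(std::nearbyint(v));
}

// Round to nearest, ties away from zero: the behaviour of roundf().
inline int32_t roundHalfAway(float v) {
    return static_cast<int32_t>(std::round(v));
}
```

For 2.5f the first function yields 2 and the second yields 3, while for non-tie inputs such as 2.7f both agree. Whether that tie difference matters depends on how often exact .5 values occur after scaling.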

Reviewed by qwen3.6-plus-preview
