Skip to content

fix: align PTO op overloads with pto-isa#747

Open
HecreReed wants to merge 5 commits into
hw-native-sys:mainfrom
HecreReed:codex/trowexpandadd-optional-tmp
Open

fix: align PTO op overloads with pto-isa#747
HecreReed wants to merge 5 commits into
hw-native-sys:mainfrom
HecreReed:codex/trowexpandadd-optional-tmp

Conversation

@HecreReed

Copy link
Copy Markdown
Collaborator

Summary

  • add optional %tmp to pto.trowexpandadd so PTO IR matches the current pto-isa overload set
  • reuse the existing row-expand binary parser/print path and forward the optional scratch operand in EmitC lowering
  • add a focused lit test and update the row-expand docs/examples to show both 3-operand and 4-operand forms

Validation

  • ninja -C build lib/PTO/Transforms/CMakeFiles/obj.PTOTransforms.dir/PTOToEmitC.cpp.o
  • manual syntax-only compile of lib/PTO/IR/PTO.cpp with -Wno-error=covered-switch-default
  • manual syntax-only compile of lib/PTO/Transforms/PTOToEmitC.cpp
  • ninja -C build ptoas is currently blocked by pre-existing unrelated warnings/errors in lib/PTO/IR/PTOAttrs.cpp, lib/PTO/IR/VPTO.cpp, lib/PTO/Transforms/VPTOPtrNormalize.cpp, and lib/PTO/IR/PTO.cpp:435

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds an optional scratch %tmp operand to the pto.trowexpandadd operation to align the PTO IR signature across A2/A3 and A5 architectures. The changes update the operation definition, custom assembly format, verification logic, EmitC lowering, documentation, and tests. The review feedback recommends restoring the row-major layout check for the destination buffer in the verifier, which was accidentally removed, and simplifying the operands vector initialization in the EmitC lowering code using a more idiomatic push_back pattern.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread lib/PTO/IR/PTO.cpp
Comment on lines 9344 to 9345
if (!isRowMajorTileBuf(src0Ty))
return emitOpError("expects src0 to use row-major layout");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The check verifying that dst uses a row-major layout (isRowMajorTileBuf(dstTy)) was removed during this refactoring. Since verifyTileBufSameValidShape and verifyTRowExpandBinaryCore do not necessarily enforce that dst is row-major, this check should be preserved to prevent invalid IR from passing verification.

Suggested change
if (!isRowMajorTileBuf(src0Ty))
return emitOpError("expects src0 to use row-major layout");
if (!isRowMajorTileBuf(src0Ty))
return emitOpError("expects src0 to use row-major layout");
if (!isRowMajorTileBuf(dstTy))
return emitOpError("expects dst to use row-major layout");

Comment on lines +9877 to +9881
SmallVector<Value, 4> operands;
if (tmp)
operands.assign({dst, src0, src1, tmp});
else
operands.assign({dst, src0, src1});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instead of using assign with duplicated initializer lists, we can initialize the SmallVector with the common operands and conditionally push_back the optional tmp operand. This is cleaner and more idiomatic.

Suggested change
SmallVector<Value, 4> operands;
if (tmp)
operands.assign({dst, src0, src1, tmp});
else
operands.assign({dst, src0, src1});
SmallVector<Value, 4> operands{dst, src0, src1};
if (tmp)
operands.push_back(tmp);

@HecreReed HecreReed changed the title fix: align trowexpandadd optional tmp with pto-isa fix: align PTO op overloads with pto-isa Jun 4, 2026
@reedhecre

reedhecre commented Jun 4, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: fix: align PTO op overloads with pto-isa #747 fix: align PTO op overloads with pto-isa
  • Author: HecreReed
  • Base/Head: main / codex/trowexpandadd-optional-tmp
  • Head SHA: 00158295c29a
  • Trigger: PR 有新提交
  • Generated At: 2026-06-10T14:04:31Z
  • Previous Head SHA: 60a0abedb72a
  • Status: completed

Summary

发现 2 个问题:Coalesce 的 C/Python 绑定把 row/elem 反向映射,且 TQuantOp 的 Python 位置参数调用被无兼容层地破坏。

Findings

  1. P1 Coalesce 的 C/Python 绑定把 `row` 和 `elem` 映射反了 include/pto-c/Dialect/PTO.h:211

方言枚举本身定义的是 row=0, elem=1,但这里新增的公开 C API 却把 MlirPTOCoalesce_Elem = 0MlirPTOCoalesce_Row = 1 写反了。后面的 mlirPTOCoalesceAttrGet()/GetValue() 和 Python 的 CoalesceAttr.get() 都直接按整数 static_cast 转换,所以 pto.CoalesceAttr.get(pto.Coalesce.Row, ctx) 实际会构造出 #pto<coalesce elem>,反之亦然。这会把显式 row/elem coalesce 的语义静默翻转,导致 MGATHER/MSCATTER 结果错误;PR 里新加的 test/samples/Mgather/mgather.pytest/samples/Mscatter/mscatter.py 就会命中这个问题。

  1. P2 `TQuantOp` 的 Python 位置参数 API 被无兼容层地破坏了 include/PTO/IR/PTOOps.td:5263

TQuantOp 这里把 fp 从必选位置参数改成了可选 segment,并且又在 dst 前面插入了 exp/max/scaling。这会改变生成出来的 Python builder 签名;PR 里也已经把样例从旧的 TQuantOp(src, fp, dst, ...) 改成了新的 TQuantOp(src, dst, quant_type, fp=...) 调用方式。问题在于 python/pto/dialects/pto.py 没有像其他兼容包装那样给 TQuantOp 补一个过渡层,所以仓外所有仍在使用旧位置参数形式的 Python 前端代码,合入后都会直接失效。这是明确的 Python API 兼容性回归。

@HecreReed HecreReed force-pushed the codex/trowexpandadd-optional-tmp branch from 28f055b to a1651f5 Compare June 4, 2026 11:58
@HecreReed HecreReed marked this pull request as ready for review June 4, 2026 12:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1651f5f15

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread include/PTO/IR/PTOOps.td
let arguments = (ins
PTODpsType:$src0,
PTODpsType:$src1,
Optional<PTODpsType>:$tmp,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Track the new trowexpandadd scratch operand

When the new 4-operand trowexpandadd form is used, %tmp is lowered as a scratch operand, but TRowExpandAddOp::getEffects still uses PTO_DEFINE_BINARY_EFFECTS(... src0, src1, dst) and never reports a write to tmp unlike the other row-expand ops with optional scratch. The local memory planner discovers scratch buffers from MemoryEffects, so this form can have its temporary treated as non-writing/unused and overlapped with live buffers.

Useful? React with 👍 / 👎.

Comment thread lib/PTO/Transforms/PTOToEmitC.cpp Outdated
Comment on lines +11077 to +11079
if (auto formatAttr = op.getPrintFormatAttr()) {
templateArgVec.push_back(emitc::OpaqueAttr::get(
rewriter.getContext(), printFormatTok(formatAttr.getValue())));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject formatted tprint without tmp

If printFormat is set but no tmp operand is supplied, this lowering emits TPRINT<Format>(src), while the new op contract only describes TPRINT(src) and TPRINT<Format>(src, tmp). The verifier currently accepts that combination, so a user can write IR that passes verification but produces a C++ call with the wrong PTO-ISA overload.

Useful? React with 👍 / 👎.

@HecreReed HecreReed force-pushed the codex/trowexpandadd-optional-tmp branch 3 times, most recently from 129b617 to ed00542 Compare June 6, 2026 15:51
@HecreReed

Copy link
Copy Markdown
Collaborator Author

/review

@reedhecre

Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

收到 /review,正在执行 Codex review。

Findings

Review in progress.

@reedhecre

Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

PR #747 未同步 Python 绑定中的新属性枚举,导致更新后的 mgather/mscatter Python 样例会直接失败。

Findings

  1. P2 Python 侧没有暴露新的 coalesce 属性,更新后的 mgather/mscatter 样例无法运行 lib/Bindings/Python/PTOModule.cpp:671

这个 PR 把 pto.mgather/pto.mscattercoalesce 变成必填属性(include/PTO/IR/PTOOps.td),并在 test/samples/Mgather/mgather.py:31test/samples/Mscatter/mscatter.py:31 开始调用 pto.CoalesceAttr.get(pto.Coalesce.Row, ctx)。但绑定层只补了 QuantType:C API 只新增了 mlirPTOQuantTypeAttr*include/pto-c/Dialect/PTO.h:234, lib/CAPI/Dialect/PTO.cpp:488),pybind 也只导出了 QuantType/QuantTypeAttr(这里的 lib/Bindings/Python/PTOModule.cpp:671),python/pto/dialects/pto.py 也没有重新导出 Coalesce/CoalesceAttr。结果是新的 Python 样例会在运行时因为缺少 pto.Coalesce/pto.CoalesceAttr 符号而直接失败,无法构造这次 PR 引入的新 overload contract。

@HecreReed HecreReed force-pushed the codex/trowexpandadd-optional-tmp branch 3 times, most recently from 23fea8b to c2bf35c Compare June 9, 2026 08:38
@HecreReed

Copy link
Copy Markdown
Collaborator Author

/run a3

@reedhecre

Copy link
Copy Markdown

已接收 /run a3,A3 板测器会处理这条请求。

页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。

@reedhecre

Copy link
Copy Markdown

A3 板测完成(有跳过)

  • 触发方式:manual
  • 源码提交:6cc8ac916e4a
  • 结果汇总:OK 221 / FAIL 0 / SKIP 1
  • 日志:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260609_165606_manual_pr747.log
  • 结果 TSV:/home/zhongxuan/ptoas-board-monitor/runtime/logs/20260609_165606_manual_pr747.tsv
  • 手动指令:/run a3
  • 触发人:HecreReed
  • 触发评论:fix: align PTO op overloads with pto-isa #747 (comment)

@zhangstevenunity zhangstevenunity left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: align PTO op overloads with pto-isa

Thanks for this. First, a heads-up that the change is much larger than the summary: the description only mentions optional %tmp on pto.trowexpandadd, but the diff also adds MXFP8 to pto.tquant, preQuantScalar+reluPreMode to textract/tinsert, a required coalesce + scatterConflict to mgather/mscatter, printFormat+tmp to tprint, custom assembly formats for 5 ops, and unrelated rewrites in VPTOPtrNormalize.cpp / ExpandTileOp.cpp. Please expand the description so reviewers and users see the real surface area.

Requesting changes

1. The 11 new lit tests appear to have never run. The Validation section says ninja ptoas fails (on -Werror, in the very files this PR edits), and the PR sprinkles [[maybe_unused]] over freshly-added dead symbols (see inline on PTOAttrs.cpp / VPTO.cpp). With no working ptoas binary, none of the new test/lit/pto/*.pto cases could have executed. Please land a green ptoas and run the full lit suite before merge.

2. tquant_mxfp8_a5.pto A3 negative check looks unreachable (inline on the test). Its memref operands hit the shouldBypassDecodedMemrefVerifier early-exit, so the A2/A3 MXFP8 gate never runs and --pto-arch=a3 lowers it like a5.

3. coalesce is now required on every arch, but the docs call it A5-only (inline on the ODS). That is both a doc/impl mismatch and a backward-incompatible IR change, made a hard failure by the new up-front mlir::verify in ptoas.cpp.

Non-blocking

4. PTOBC v0 cannot serialize the new operand forms. tools/ptobc/generated/ptobc_opcodes_v0.h pins fixed operand counts (mgather/mscatter=3, tquant=3, trowexpandadd=3, tprint=1, textract/tinsert=4) and Encoder::encodeKnownOpOperands (mlir_encode.cpp:475) throws on mismatch, so the tmp / preQuantScalar / MXFP8 exp+max+scaling overloads can't round-trip to .ptobc. Not exercised by CI, but a real gap — the same one PR #724 ultimately fixed by routing the extended forms through the generic opcode. Worth doing the same or calling it out explicitly.

5. Scope creep. The VPTOPtrNormalize.cpp changes (the computeSubviewElementOffset static/dynamic-offset rewrite + new ConvertPtrNormalizeMemRefCastOp pattern) and the ExpandTileOp.cpp StringMap dangling-StringRef fix look reasonable on their own but are unrelated to op-overload alignment and have no coverage here. They would be safer as separate PRs.

Nice work

  • Emitting the coalesce token straight from the verified attribute (and deleting the old physical-shape re-check in PTOToEmitC) is the right call — it avoids the false-reject that the shape-inference approach in PR #724 had.
  • The multiplyLayoutInts overflow guards in inferLayout are a good hardening.

Happy to re-review once the suite runs clean.

// INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
// See LICENSE in the root of the software repository for the full text of the License.
// RUN: ptoas --pto-arch=a5 %s 2>&1 | FileCheck %s --check-prefix=A5
// RUN: not ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s --check-prefix=A3ERR

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This A3 negative check looks unreachable. The op's operands are all memref, so TQuantOp::verify() returns at the shouldBypassDecodedMemrefVerifier early-exit (PTO.cpp:9018) right after the structural check and before dispatchVerifierByArch — so the A2/A3 gate that prints MXFP8 quantization is only supported on A5 targets (PTO.cpp:9065) never runs. PTOQuantToEmitC has no arch check either, so --pto-arch=a3 lowers this to TQUANT<...MXFP8...> just like a5 and exits 0, which makes not ptoas fail this test.

The gate does work for the normal pre-lowering flow — so to exercise it, give the A3-reject case tile_buf operands (the verifier runs with arch dispatch before PTOViewToMemref). As written, the memref form only ever reaches the bypass, so this RUN line doesn't test what it claims.

Comment thread include/PTO/IR/PTOOps.td Outdated
PTODpsType:$mem,
PTODpsType:$idx,
PTODpsType:$dst,
PTO_CoalesceAttr:$coalesce,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making coalesce a required (non-default) attribute means MGatherOp::verify / MScatterOp::verify now reject a missing coalesce on every arch — there is no isTargetArchA5 gate on the requires attribute 'coalesce' check. But docs/PTO_IR_manual.md labels this a 'Required A5 coalesce mode'.

Two problems: (a) doc vs impl mismatch — it is required on A2/A3 as well; (b) backward-incompatible IR change — any existing pto.mgather/pto.mscatter without coalesce now fails to verify, and the newly-added up-front mlir::verify in ptoas.cpp turns that into an immediate driver error. You updated the two in-repo issue664 tests and the python samples, but out-of-tree IR (and any serialized .ptobc) breaks. If this is intended, please fix the docs; if A2/A3 should keep the previous shape-inferred default, gate the requirement to A5.

Comment thread lib/PTO/IR/PTOAttrs.cpp Outdated
namespace {

constexpr unsigned kI32BitWidth = 32;
[[maybe_unused]] constexpr int32_t kFractalSize16 = 16;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kFractalSize16/32/1024 are added but never referenced (only kFractalSize512 is used) and silenced with [[maybe_unused]]. Please remove them rather than suppress. Adding dead code annotated [[maybe_unused]] to satisfy -Werror is a signal the build isn't actually clean — which matches the PR note that ninja ptoas fails. Same pattern with hasAll/hasAny in VPTO.cpp.

Comment thread lib/PTO/IR/VPTO.cpp Outdated
return success();
}

[[maybe_unused]] static bool hasAll(Value first, Value second, Value third) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code: these hasAll/hasAny free functions are never called. The hasAll/hasAny used later (around line 3111) are local bool variables, not these helpers. Please delete them rather than mark [[maybe_unused]].

@HecreReed HecreReed force-pushed the codex/trowexpandadd-optional-tmp branch 2 times, most recently from 4acf3c2 to 4fad259 Compare June 9, 2026 12:04
@HecreReed

Copy link
Copy Markdown
Collaborator Author

/review

@HecreReed HecreReed force-pushed the codex/trowexpandadd-optional-tmp branch from 8dbce96 to 0015829 Compare June 10, 2026 09:30
@reedhecre

Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

收到 /review,正在执行 Codex review。

Findings

Review in progress.

@reedhecre

Copy link
Copy Markdown

Manual Codex Review

该评论由 /review 手动触发。

Summary

发现 2 个明确的兼容性/合同问题:新的 C API Coalesce 枚举值与 IR 定义相反,且现有 scatter_conflict 枚举值被重排,会让外部调用方或已序列化产物得到错误语义。

Findings

  1. P2 New C API `MlirPTOCoalesce` constants are reversed relative to the IR enum include/pto-c/Dialect/PTO.h:211

PTO_CoalesceEnum in the IR is still Row = 0, Elem = 1, but the new public C enum is declared as Elem = 0, Row = 1. The new helpers in lib/CAPI/Dialect/PTO.cpp then static_cast directly between the C enum and the IR enum, so any external C caller that uses the named constants will silently request the opposite coalesce mode. That means MlirPTOCoalesce_Row produces #pto<coalesce elem> and vice versa, which changes the selected MGATHER/MSCATTER overloads and breaks the new API contract.

  1. P2 `scatter_conflict` renumbers an existing enum and flips semantics for serialized/numeric users include/PTO/IR/PTOAttrs.td:1028

This PR changes the underlying numeric values of the existing scatter_conflict enum from Default=0, Last=1 on origin/main to Last=0, Default=1. Textual spellings still parse, but any persisted PTOBC/bytecode, generic IR, or downstream code that relies on the old numeric values will silently decode to the opposite conflict mode after this change. Because scatter_conflict affects MSCATTER write-conflict behavior, this is a real compatibility regression with incorrect runtime semantics for existing artifacts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants