fix: align PTO op overloads with pto-isa#747
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds an optional scratch %tmp operand to the pto.trowexpandadd operation to align the PTO IR signature across A2/A3 and A5 architectures. The changes update the operation definition, custom assembly format, verification logic, EmitC lowering, documentation, and tests. The review feedback recommends restoring the row-major layout check for the destination buffer in the verifier, which was accidentally removed, and simplifying the operands vector initialization in the EmitC lowering code using a more idiomatic push_back pattern.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (!isRowMajorTileBuf(src0Ty)) | ||
| return emitOpError("expects src0 to use row-major layout"); |
There was a problem hiding this comment.
The check verifying that dst uses a row-major layout (isRowMajorTileBuf(dstTy)) was removed during this refactoring. Since verifyTileBufSameValidShape and verifyTRowExpandBinaryCore do not necessarily enforce that dst is row-major, this check should be preserved to prevent invalid IR from passing verification.
| if (!isRowMajorTileBuf(src0Ty)) | |
| return emitOpError("expects src0 to use row-major layout"); | |
| if (!isRowMajorTileBuf(src0Ty)) | |
| return emitOpError("expects src0 to use row-major layout"); | |
| if (!isRowMajorTileBuf(dstTy)) | |
| return emitOpError("expects dst to use row-major layout"); |
| SmallVector<Value, 4> operands; | ||
| if (tmp) | ||
| operands.assign({dst, src0, src1, tmp}); | ||
| else | ||
| operands.assign({dst, src0, src1}); |
There was a problem hiding this comment.
Instead of using assign with duplicated initializer lists, we can initialize the SmallVector with the common operands and conditionally push_back the optional tmp operand. This is cleaner and more idiomatic.
| SmallVector<Value, 4> operands; | |
| if (tmp) | |
| operands.assign({dst, src0, src1, tmp}); | |
| else | |
| operands.assign({dst, src0, src1}); | |
| SmallVector<Value, 4> operands{dst, src0, src1}; | |
| if (tmp) | |
| operands.push_back(tmp); |
Codex Review该评论由 review 机器人自动更新。
Summary发现 2 个问题:Coalesce 的 C/Python 绑定把 row/elem 反向映射,且 Findings
方言枚举本身定义的是
|
28f055b to
a1651f5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1651f5f15
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let arguments = (ins | ||
| PTODpsType:$src0, | ||
| PTODpsType:$src1, | ||
| Optional<PTODpsType>:$tmp, |
There was a problem hiding this comment.
Track the new trowexpandadd scratch operand
When the new 4-operand trowexpandadd form is used, %tmp is lowered as a scratch operand, but TRowExpandAddOp::getEffects still uses PTO_DEFINE_BINARY_EFFECTS(... src0, src1, dst) and never reports a write to tmp unlike the other row-expand ops with optional scratch. The local memory planner discovers scratch buffers from MemoryEffects, so this form can have its temporary treated as non-writing/unused and overlapped with live buffers.
Useful? React with 👍 / 👎.
| if (auto formatAttr = op.getPrintFormatAttr()) { | ||
| templateArgVec.push_back(emitc::OpaqueAttr::get( | ||
| rewriter.getContext(), printFormatTok(formatAttr.getValue()))); |
There was a problem hiding this comment.
Reject formatted tprint without tmp
If printFormat is set but no tmp operand is supplied, this lowering emits TPRINT<Format>(src), while the new op contract only describes TPRINT(src) and TPRINT<Format>(src, tmp). The verifier currently accepts that combination, so a user can write IR that passes verification but produces a C++ call with the wrong PTO-ISA overload.
Useful? React with 👍 / 👎.
129b617 to
ed00542
Compare
|
/review |
Manual Codex Review该评论由
Summary收到 FindingsReview in progress. |
Manual Codex Review该评论由
SummaryPR #747 未同步 Python 绑定中的新属性枚举,导致更新后的 mgather/mscatter Python 样例会直接失败。 Findings
这个 PR 把 |
23fea8b to
c2bf35c
Compare
|
/run a3 |
|
已接收
页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。 |
A3 板测完成(有跳过)
|
zhangstevenunity
left a comment
There was a problem hiding this comment.
Review: align PTO op overloads with pto-isa
Thanks for this. First, a heads-up that the change is much larger than the summary: the description only mentions optional %tmp on pto.trowexpandadd, but the diff also adds MXFP8 to pto.tquant, preQuantScalar+reluPreMode to textract/tinsert, a required coalesce + scatterConflict to mgather/mscatter, printFormat+tmp to tprint, custom assembly formats for 5 ops, and unrelated rewrites in VPTOPtrNormalize.cpp / ExpandTileOp.cpp. Please expand the description so reviewers and users see the real surface area.
Requesting changes
1. The 11 new lit tests appear to have never run. The Validation section says ninja ptoas fails (on -Werror, in the very files this PR edits), and the PR sprinkles [[maybe_unused]] over freshly-added dead symbols (see inline on PTOAttrs.cpp / VPTO.cpp). With no working ptoas binary, none of the new test/lit/pto/*.pto cases could have executed. Please land a green ptoas and run the full lit suite before merge.
2. tquant_mxfp8_a5.pto A3 negative check looks unreachable (inline on the test). Its memref operands hit the shouldBypassDecodedMemrefVerifier early-exit, so the A2/A3 MXFP8 gate never runs and --pto-arch=a3 lowers it like a5.
3. coalesce is now required on every arch, but the docs call it A5-only (inline on the ODS). That is both a doc/impl mismatch and a backward-incompatible IR change, made a hard failure by the new up-front mlir::verify in ptoas.cpp.
Non-blocking
4. PTOBC v0 cannot serialize the new operand forms. tools/ptobc/generated/ptobc_opcodes_v0.h pins fixed operand counts (mgather/mscatter=3, tquant=3, trowexpandadd=3, tprint=1, textract/tinsert=4) and Encoder::encodeKnownOpOperands (mlir_encode.cpp:475) throws on mismatch, so the tmp / preQuantScalar / MXFP8 exp+max+scaling overloads can't round-trip to .ptobc. Not exercised by CI, but a real gap — the same one PR #724 ultimately fixed by routing the extended forms through the generic opcode. Worth doing the same or calling it out explicitly.
5. Scope creep. The VPTOPtrNormalize.cpp changes (the computeSubviewElementOffset static/dynamic-offset rewrite + new ConvertPtrNormalizeMemRefCastOp pattern) and the ExpandTileOp.cpp StringMap dangling-StringRef fix look reasonable on their own but are unrelated to op-overload alignment and have no coverage here. They would be safer as separate PRs.
Nice work
- Emitting the
coalescetoken straight from the verified attribute (and deleting the old physical-shape re-check inPTOToEmitC) is the right call — it avoids the false-reject that the shape-inference approach in PR #724 had. - The
multiplyLayoutIntsoverflow guards ininferLayoutare a good hardening.
Happy to re-review once the suite runs clean.
| // INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. | ||
| // See LICENSE in the root of the software repository for the full text of the License. | ||
| // RUN: ptoas --pto-arch=a5 %s 2>&1 | FileCheck %s --check-prefix=A5 | ||
| // RUN: not ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s --check-prefix=A3ERR |
There was a problem hiding this comment.
This A3 negative check looks unreachable. The op's operands are all memref, so TQuantOp::verify() returns at the shouldBypassDecodedMemrefVerifier early-exit (PTO.cpp:9018) right after the structural check and before dispatchVerifierByArch — so the A2/A3 gate that prints MXFP8 quantization is only supported on A5 targets (PTO.cpp:9065) never runs. PTOQuantToEmitC has no arch check either, so --pto-arch=a3 lowers this to TQUANT<...MXFP8...> just like a5 and exits 0, which makes not ptoas fail this test.
The gate does work for the normal pre-lowering flow — so to exercise it, give the A3-reject case tile_buf operands (the verifier runs with arch dispatch before PTOViewToMemref). As written, the memref form only ever reaches the bypass, so this RUN line doesn't test what it claims.
| PTODpsType:$mem, | ||
| PTODpsType:$idx, | ||
| PTODpsType:$dst, | ||
| PTO_CoalesceAttr:$coalesce, |
There was a problem hiding this comment.
Making coalesce a required (non-default) attribute means MGatherOp::verify / MScatterOp::verify now reject a missing coalesce on every arch — there is no isTargetArchA5 gate on the requires attribute 'coalesce' check. But docs/PTO_IR_manual.md labels this a 'Required A5 coalesce mode'.
Two problems: (a) doc vs impl mismatch — it is required on A2/A3 as well; (b) backward-incompatible IR change — any existing pto.mgather/pto.mscatter without coalesce now fails to verify, and the newly-added up-front mlir::verify in ptoas.cpp turns that into an immediate driver error. You updated the two in-repo issue664 tests and the python samples, but out-of-tree IR (and any serialized .ptobc) breaks. If this is intended, please fix the docs; if A2/A3 should keep the previous shape-inferred default, gate the requirement to A5.
| namespace { | ||
|
|
||
| constexpr unsigned kI32BitWidth = 32; | ||
| [[maybe_unused]] constexpr int32_t kFractalSize16 = 16; |
There was a problem hiding this comment.
kFractalSize16/32/1024 are added but never referenced (only kFractalSize512 is used) and silenced with [[maybe_unused]]. Please remove them rather than suppress. Adding dead code annotated [[maybe_unused]] to satisfy -Werror is a signal the build isn't actually clean — which matches the PR note that ninja ptoas fails. Same pattern with hasAll/hasAny in VPTO.cpp.
| return success(); | ||
| } | ||
|
|
||
| [[maybe_unused]] static bool hasAll(Value first, Value second, Value third) { |
There was a problem hiding this comment.
Dead code: these hasAll/hasAny free functions are never called. The hasAll/hasAny used later (around line 3111) are local bool variables, not these helpers. Please delete them rather than mark [[maybe_unused]].
4acf3c2 to
4fad259
Compare
|
/review |
8dbce96 to
0015829
Compare
Manual Codex Review该评论由
Summary收到 FindingsReview in progress. |
Manual Codex Review该评论由
Summary发现 2 个明确的兼容性/合同问题:新的 C API Findings
This PR changes the underlying numeric values of the existing |
Summary
%tmptopto.trowexpandaddso PTO IR matches the currentpto-isaoverload setValidation
ninja -C build lib/PTO/Transforms/CMakeFiles/obj.PTOTransforms.dir/PTOToEmitC.cpp.olib/PTO/IR/PTO.cppwith-Wno-error=covered-switch-defaultlib/PTO/Transforms/PTOToEmitC.cppninja -C build ptoasis currently blocked by pre-existing unrelated warnings/errors inlib/PTO/IR/PTOAttrs.cpp,lib/PTO/IR/VPTO.cpp,lib/PTO/Transforms/VPTOPtrNormalize.cpp, andlib/PTO/IR/PTO.cpp:435