
Comprehensive ROCm helper for centralized Windows package integration and installation support #1629

Draft
NeuralFault wants to merge 17 commits into LykosAI:main from NeuralFault:universal-rocm


@NeuralFault
Contributor

@NeuralFault NeuralFault commented May 2, 2026

Introduces significant improvements to AMD GPU (ROCm) support: it consolidates Windows ROCm support behind a shared helper and expands AMD GPU coverage across the current Windows-native ROCm path. The result is a more consistent install and launch flow for ROCm-capable packages, less duplicated package-specific logic, and broader support spanning Vega/GCN5 through the entire RDNA lineup. It also establishes the shared ROCm helper foundation that ComfyUI and Wan2GP now use directly, with the same model intended to be reused by other AMD/ROCm-capable packages going forward.

ROCm Support Integration and Refactoring:

  • Introduced the IRocmPackageHelper dependency to PackageFactory, ComfyUI, and Wan2GP, so ROCm compatibility checks, runtime/install context resolution, Windows-native package installation, and launch environment construction all flow through the same shared service instead of being reimplemented per package. This centralizes the Windows ROCm path and makes it easier to extend the same behavior to additional packages later. [1] [2] [3] [4] [5] [6] [7] [8]

  • Refactored the Windows ROCm path in ComfyUI around the shared helper and a package-owned WindowsRocmProfile, replacing hardcoded install/index handling with shared compatibility, install, and environment policy. This keeps the package-specific behavior limited to the pieces that actually need to remain package-specific while letting the helper own the common ROCm workflow. [1] [2] [3]

  • Updated ROCm support detection and launch environment injection to use the centralized helper throughout package startup. This makes ROCm eligibility checks, EnVar defaults, and runtime-specific overrides consistent across packages, while still allowing package-owned extras such as the ComfyUI-specific COMFYUI_ENABLE_MIOPEN. [1] [2] [3]
    This also fixes a bug where user-set EnVars (SM Settings/Environment Variables) could not override the package-configured EnVars, because the original Windows-ROCm-specific handling in ComfyUI.cs made them immutable. Environment variables are now layered in this order: helper defaults, then package config, then user-set variables. User-set variables are pulled from SettingsManager as the last step, just before they are applied to the package's launch flow, so they take priority over any previously injected variable with the same key. This lets users disable or modify any default variable.
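To make the layering order concrete, here is a minimal Python sketch (the PR's implementation is C#; the function name and the COMFYUI_ENABLE_MIOPEN value are illustrative assumptions). Each layer is applied in turn, so a key set in a later layer replaces the value from an earlier one.

```python
from typing import Dict, Mapping


def build_launch_env(
    helper_defaults: Mapping[str, str],
    package_config: Mapping[str, str],
    user_set: Mapping[str, str],
) -> Dict[str, str]:
    """Merge environment layers; later layers override earlier ones.

    Priority (lowest to highest): helper defaults, package config,
    user-set variables from settings.
    """
    env: Dict[str, str] = {}
    for layer in (helper_defaults, package_config, user_set):
        env.update(layer)  # a key repeated in a later layer replaces the earlier value
    return env


env = build_launch_env(
    helper_defaults={"MIOPEN_FIND_ENFORCE": "1"},
    package_config={"COMFYUI_ENABLE_MIOPEN": "1"},  # value assumed for illustration
    user_set={"MIOPEN_FIND_ENFORCE": "3"},  # the user overrides the helper default
)
print(env)  # → {'MIOPEN_FIND_ENFORCE': '3', 'COMFYUI_ENABLE_MIOPEN': '1'}
```

Because user-set values are merged last, a user can neutralize a default simply by setting the same key to a different value.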

GPU Architecture and Compatibility Improvements:

  • Extended the AMD GPU architecture detection matrix in GpuInfo to cover a wider set of Vega/GCN5, RDNA1, RDNA2, and RDNA3/3.5 GPUs, including related handheld/mobile variants, and added handling for the R9600 RDNA4 Pro GPU, which was previously absent. This broadens Windows ROCm coverage across the supported lineup instead of limiting it to a narrower subset of cards, since TheRock ROCm Technical Preview PyTorch builds now exist for Vega dGPUs, RDNA1 dGPUs, and practically the entire RDNA2 family, so support now spans everything from the Vega 56 up to the RX 9070/R9700. Radeon Instinct Vega/CDNA HPC/datacenter GPUs remain excluded due to the lack of official Windows driver support and/or the need for custom modified drivers. [1] [2]

  • Refactored ROCm compatibility checks in GpuInfo and HardwareHelper to use shared WindowsRocmSupport logic, removing duplicated support tables and consolidating both support detection and architecture-based policy decisions into a single source of truth within the ROCm helper's domain. GpuInfo now only translates GPU names to gfxarch, and HardwareHelper only extracts information about the installed GPU from the OS.
    As a result, adding gfxarch translation, installation indexes, and special environment handling for future GPU releases only requires changes to two dedicated files, and those changes automatically apply to every package wired into the ROCm helper's call paths. [1] [2]
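A minimal Python sketch of that split: a GpuInfo-style name-to-gfxarch lookup and a WindowsRocmSupport-style family check. The table is a small illustrative subset written for this example, not the PR's actual support matrix, and the function names are hypothetical.

```python
from typing import Optional, Tuple

# Illustrative subset: (lowercase name fragment, gfx arch).
# More specific fragments must come before broader ones ("rx 6800m" before "rx 6800").
GPU_NAME_TO_GFX: Tuple[Tuple[str, str], ...] = (
    ("radeon ai pro r9700", "gfx1201"),
    ("rx 9070", "gfx1201"),
    ("rx 9060", "gfx1200"),
    ("rx 7900", "gfx1100"),
    ("rx 6900", "gfx1030"),
    ("rx 6800m", "gfx1031"),
    ("rx 6800", "gfx1030"),
    ("rx 5700", "gfx1010"),
    ("radeon vii", "gfx906"),
    ("vega 64", "gfx900"),
    ("vega 56", "gfx900"),
)


def resolve_gfx_arch(gpu_name: str) -> Optional[str]:
    """GpuInfo-style step: translate an OS-reported GPU name into a gfx arch."""
    name = gpu_name.lower()
    for fragment, arch in GPU_NAME_TO_GFX:
        if fragment in name:
            return arch
    return None


def rocm_family(arch: str) -> Optional[str]:
    """WindowsRocmSupport-style step: classify an arch, or return None if unsupported."""
    if arch.startswith("gfx12"):
        return "rdna4"
    if arch.startswith("gfx11"):
        return "rdna3"  # also covers RDNA3.5 APUs such as gfx1150-gfx1153
    if arch.startswith("gfx103"):
        return "rdna2"
    if arch.startswith("gfx101"):
        return "rdna1"
    if arch in ("gfx900", "gfx906"):
        return "gcn5"  # consumer Vega; CDNA parts such as gfx908/gfx90a stay excluded
    return None
```

Keeping the name table and the family policy in separate functions mirrors the PR's goal: a new GPU usually needs only a new table row, while the policy keyed on the family applies automatically.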

Other Installation and Launch Improvements:

  • Adjusted ComfyUI launch defaults to account for Windows ROCm architecture differences, including defaulting legacy Windows ROCm GPUs to Quad Cross-Attention where that remains the better default. This keeps package launch behavior aligned with the shared architecture policy without hardcoding that policy in multiple places. [1] [2]

  • Updated package launch environment injection to use ROCm helper-generated variables and shared defaults, so ROCm-enabled installs consistently receive the expected runtime configuration on Windows while still leaving room for package-specific additions where needed. [1] [2] [3]
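The two launch-related changes can be sketched together in Python, assuming "legacy" means Vega/GCN5 through RDNA2 (gfx9xx/gfx10xx), as the PR's later notes on RDNA2-and-older gating suggest. The function names are hypothetical. MIOPEN_FIND_ENFORCE=1 and the gfx1152/gfx1153 AOTriton exclusion come from this PR; TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL is the PyTorch variable assumed to be the "AOTriton EnVar" it refers to. The allocator and flash-attention defaults the PR mentions are omitted because their names and values are not given on this page.

```python
from typing import Dict, FrozenSet, List

# Per the PR, the experimental AOTriton switch is withheld from these archs for now.
AOTRITON_EXCLUDED: FrozenSet[str] = frozenset({"gfx1152", "gfx1153"})


def is_legacy_arch(arch: str) -> bool:
    """Assumed policy: Vega/GCN5 (gfx9xx) and RDNA1/RDNA2 (gfx10xx) are 'legacy'."""
    return arch.startswith(("gfx9", "gfx10"))


def helper_default_env(arch: str) -> Dict[str, str]:
    """Partial sketch of helper-owned ROCm launch defaults for a resolved arch."""
    env = {"MIOPEN_FIND_ENFORCE": "1"}  # value from the PR's commit notes (lowered from 3)
    modern = arch.startswith(("gfx11", "gfx12"))  # RDNA3 / RDNA3.5 / RDNA4
    if modern and arch not in AOTRITON_EXCLUDED:
        # Enables PyTorch's experimental AOTriton attention kernels on ROCm.
        env["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"
    return env


def comfyui_attention_args(arch: str) -> List[str]:
    """ComfyUI launch flag derived from the same shared arch policy."""
    if is_legacy_arch(arch):
        return ["--use-quad-cross-attention"]  # sub-quadratic cross-attention
    return []  # RDNA3+ keeps ComfyUI's own default attention selection
```

Both functions read the same arch classification, which is the point of the shared policy: changing what counts as "legacy" updates the env defaults and the ComfyUI flag together.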


In Progress / Considerations for further development while still Draft PR status
  • [Abandoned; to be revisited in a future implementation] Expand the shared Windows ROCm helper wiring into additional packages, particularly A3 WebUI and the SDForge variants (Forge / reForge), so they can reuse the same compatibility checks, package selection, install flow, and launch environment defaults now used by ComfyUI and Wan2GP, bringing the ROCm helper's broader support to these popular packages.
  • [Implemented] Improve SwarmUI Windows ROCm integration so the default ROCm launch environment is passed through when SwarmUI starts its self-launched ComfyUI backend, with user-set variables from SM Settings layered on top. Previously, only user-set variables were passed from SettingsManager, so the variables hardcoded in ComfyUI's original Win/ROCm handling never reached SwarmUI's self-started ComfyUI backend, degrading performance unless the user set them manually in SM Settings.
  • Evaluate adding Flash Attention 2 support for legacy-architecture installs through AMD AITER using custom builds by 0xDELUXA, either as a default path where stable enough or as an optional Package Command action, potentially gated to RDNA2 and older. (RDNA3+ gets its own FA2 via AOTriton, which is enabled by default on modern architectures.)
  • Evaluate adding Sage Attention 1.0 as an optional Package Command install, including any ops patching required for the Windows ROCm environment (sourced from the ComfyUI-Zluda installation scripts) along with triton-windows, potentially gated to RDNA2 and older. (AOTriton Flash Attention 2 is preferable on modern RDNA3+ architectures.)
  • [ROCm runtime implemented by default] Consider defaulting Windows ROCm installs to the runtime-only ROCm module set rather than the full ROCm + SDK module set, with the fuller SDK components installable later through Package Commands if needed. The full SDK stack adds considerable size to the venv, so installing only the core ROCm runtime modules can be preferable, but this needs testing to verify. Installing the SDK modules could then be optional for users who intend to compile or build modules that need it.
  • Consider offering an optimized ROCm-aware bitsandbytes install path through the Package Command menu for packages and workflows that benefit from it, using prebuilt optimized wheels by 0xDELUXA that cover the entire Windows ROCm supported GPU lineup. This would improve fp8 and other low-bit quantization performance on GPUs that support it (support variance across GPUs needs verification).
Other future considerations
  • Refactoring and package integration for future Linux install support, passing the same default environment variables for a better user experience and performance when using AMD GPUs on Linux with Stability Matrix.
  • Expanding package integration to include other WebUI packages such as InvokeAI, AI Toolkit, Trainers, etc.
  • [Implemented] Refactor build-index selection and handling to use TheRock's new multi-arch URL format, consolidating the ROCm/PyTorch installation URLs into a single universal index, with the gfxarch for the current GPU applied in the pip command.

NeuralFault and others added 8 commits April 21, 2026 19:02
- Add initial ROCm helper structure
- Set up ROCm helper foundation
Compile test successful.
- Add initial ROCm helper calls/config
- Removed pre-existing Windows ROCm blocks which will be obsolete following helper implementation
- Windows ROCm install/bootstrap logic into shared ROCm helper
- Add gfx-family mapping for Windows-native TheRock ROCm URLs
- Route ComfyUI Win Rocm installs through helper-resolved ROCm runtime, rocm-sdk, and pytorch setup
- Prevent requirements.txt from overwriting helper-installed ROCm torch packages
- Add helper-owned post-install torch verification and improve unsupported GPU failure handling
…ntime/install/environment API and simplify the ROCm profile/context models around the helper’s real responsibilities

- add a centralized Windows ROCm support map so GPU detection, architecture support checks, and package index resolution all use the same source of truth
- expand AMD architecture detection to cover additional RDNA4, Steam Deck, RDNA1, and Vega-class GPUs used by the Windows ROCm support path
- add a helper-managed Windows ROCm bootstrap flow that installs the ROCm runtime, initializes/reinitializes the SDK, aligns rocm-sdk-devel with the resolved torch build, and verifies both torch ROCm metadata and runtime availability
- centralize ROCm launch environment construction in the helper, including default MIOpen, allocator, flash-attention, and AOTriton settings plus legacy SDP fallback, RDNA1 overrides, and user env override layering
- switch ComfyUI to helper-driven Windows ROCm compatibility and launch env handling, and default legacy Windows ROCm GPUs to quad cross-attention while keeping Comfy-specific MIOpen enablement as a preset
- integrate Wan2GP with the shared Windows ROCm helper for install and launch flows, while updating its Linux ROCm path to use upstream rocm7.2 torch/vision/audio installs
- wire the ROCm helper through package construction and add focused test coverage for ROCm build/version parsing, runtime failure classification, and Windows ROCm support/index resolution
- centralize Windows ROCm architecture classification and legacy-attention fallback policy in WindowsRocmSupport
- move ComfyUI-specific MIOpen env handling out of the helper and into package-owned ROCm config
- reuse shared ROCm policy for ComfyUI quad-attention defaults and helper-managed AOTriton / math SDP / RDNA1 gates
- remove dead ROCm preset plumbing and trim unused RocmPackageProfile surface
- rename helper/package methods for clearer default-policy semantics
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request centralizes Windows ROCm support by introducing a shared IRocmPackageHelper service and associated models, refactoring ComfyUI and Wan2GP to use this new framework. The changes include expanded GPU architecture detection and standardized installation and environment configuration logic. Feedback identifies a missing dependency injection for Wan2GP in the factory and a configuration regression in ComfyUI's memory allocation settings. Further suggestions focus on optimizing the helper by reducing redundant hardware probing, avoiding unnecessary hardware refreshes, and using more appropriate exception types.

Comment thread StabilityMatrix.Core/Helper/Factory/PackageFactory.cs
Comment thread StabilityMatrix.Core/Models/Packages/ComfyUI.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs Outdated
Comment thread StabilityMatrix.Core/Services/Rocm/RocmPackageHelper.cs
NeuralFault and others added 9 commits May 2, 2026 18:32
Was used during debugging and was unintentionally left on.
Add helper-managed ROCm torch compatibility profiles for Windows packages

Enable ComfyUI to use shared ROCm dependency fallback behavior

Add shared Windows ROCm launch notice and experimental-support messaging

Align Wan2GP Windows ROCm disclaimer text with the shared helper messaging

Lower helper-managed MIOPEN_FIND_ENFORCE default from 3 to 1

Add gfx103x borrowed dependency fallback using ROCm-hosted setuptools and mpmath from gfx103x-dgpu due to -all index missing compatible versions

Add helper preinstall and supplemental dependency handling for ROCm torch installs

Add temporary environment override support for helper-owned install steps
- generalize the ROCm helper install path around a new multi-arch TheRock nightly repo index flow
- unify ROCm and PyTorch installation into a single command that automatically selects the correct PyTorch build for the user's GPU architecture
- centralize Windows ROCm GPU and architecture resolution behind one shared helper machine-state path
- trim the ROCm helper API and remove obsolete package-side ROCm install/runtime duplication
- deleted the no-longer-used torch compatibility model, since the A3WebUI/reForge work was rolled back and abandoned, to be revisited for integration in the future
…upport.

- extract the shared Windows ComfyUI ROCm profile from package-local code
- reuse the shared ROCm helper/profile for both direct ComfyUI launch behavior and SwarmUI self-launch pass-through
- inject Windows ROCm ComfyUI env vars into SwarmUI launching so they propagate to the self-launched backend

- Modified Torch ROCm AOTriton activation EnVar to exclude gfx1152/1153 due to no support yet.
…lityMatrix into universal-rocm

Deleted stale models from previous implementation state.
@NeuralFault
Contributor Author

@gemini-code-assist recheck

@gemini-code-assist
Contributor

Thank you for the recheck request. I have reviewed the changes in this pull request, which introduces a centralized IRocmPackageHelper to streamline Windows ROCm support across multiple packages like ComfyUI, Wan2GP, and future integrations. The refactoring effectively consolidates compatibility checks, installation logic, and environment variable management, reducing code duplication and improving maintainability. The extension of the GPU architecture detection matrix to include RDNA4 and additional Vega/RDNA variants is a significant improvement for broad hardware support. The implementation appears robust and follows the established architectural patterns of the project. No further issues identified.

@NeuralFault
Contributor Author

Refactored the installation path to use the new multi-arch index, simplifying torch install handling to a single command: the gfxarch resolved for the user's GPU is inserted into the command, which installs the PyTorch modules and the ROCm runtime in one go: "torch[device-gfx###]" "torchvision[device-gfx###]" torchaudio --index-url https://rocm.nightlies.amd.com/whl-staging-multi-arch/
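A minimal Python sketch of building that single command for a resolved arch. The index URL is the one from this comment; the function name is hypothetical, and it assumes the extra is always named "device-" followed by the exact resolved arch string (e.g. gfx1201), which matches the "device-gfx###" template but may not hold for every arch on the real index.

```python
from typing import List

# Index URL from the PR comment.
MULTI_ARCH_INDEX = "https://rocm.nightlies.amd.com/whl-staging-multi-arch/"


def build_torch_install_args(gfx_arch: str) -> List[str]:
    """pip arguments that pull ROCm-enabled torch/torchvision for one gfx arch."""
    device_extra = f"device-{gfx_arch}"  # assumed naming: "device-" + resolved arch
    return [
        "install",
        f"torch[{device_extra}]",
        f"torchvision[{device_extra}]",
        "torchaudio",
        "--index-url",
        MULTI_ARCH_INDEX,
    ]
```

Usage would look like subprocess.run([sys.executable, "-m", "pip", *build_torch_install_args("gfx1201")], check=True). The command in the comment quotes the bracketed requirements because some shells treat [] as glob characters; passing an argument list bypasses the shell, so no quoting is needed.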

Centralized most of ComfyUI’s Windows ROCm handling into the shared helper domain to make launch environment variable generation and reuse more consistent, and to support env-var passthrough for ComfyUI self-launch backends in SwarmUI. That Swarm-side integration is intentionally gated behind the expected operating environment and Windows ROCm compatibility checks, and is limited to launch-time passthrough rather than expanding Swarm’s install behavior. As part of that, a shared Windows ROCm Comfy profile was extracted so direct ComfyUI launches and Swarm-managed self-launches follow the same ROCm-specific behavior.
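A minimal Python sketch of the launch-time passthrough gate. The function name and parameters are hypothetical, and gfx_arch being non-None stands in for the helper's full Windows ROCm compatibility check. ROCm defaults are added only on Windows with a supported GPU, and user-set variables, which were already passed before this PR, are applied last so they still win.

```python
import sys
from typing import Dict, Mapping, Optional


def swarm_comfy_backend_env(
    base_env: Mapping[str, str],
    rocm_comfy_env: Mapping[str, str],
    user_env: Mapping[str, str],
    gfx_arch: Optional[str],
    platform: str = sys.platform,
) -> Dict[str, str]:
    """Env for SwarmUI's self-launched ComfyUI backend (launch-time only)."""
    env = dict(base_env)
    # Gate: pass the shared ROCm ComfyUI env through only on Windows with a supported arch.
    if platform == "win32" and gfx_arch is not None:
        env.update(rocm_comfy_env)
    # User-set variables from settings are layered last and take priority.
    env.update(user_env)
    return env
```

Nothing here touches Swarm's install behavior; the function only shapes the environment handed to the backend process.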

Added a global notice, printed to the console output of ComfyUI and future integrated packages, that tells users the Windows ROCm configuration is experimental and explains how to report issues for triage, so that issues unrelated to the WebUI package itself are not reported upstream to it. (For Wan2GP, the package install disclaimer is used instead, since I could not get the notice to show in its console.)

Added a temporary exclusion of gfx1152 and gfx1153 from the experimental AOTriton EnVar, since upstream does not support those architectures yet, and added post-install torch verification so helper-managed installs confirm that the resulting torch build actually reports usable ROCm metadata after installation.
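A minimal Python sketch of what a post-install check could look at; the function name is hypothetical and this is not the helper's actual verification logic. It assumes ROCm torch wheels carry "rocm" in the local part of their version string (the text after "+"). Inside the package venv, the three inputs would come from torch.__version__, torch.version.hip, and torch.cuda.is_available(); ROCm builds of PyTorch expose the GPU through the torch.cuda API.

```python
from typing import List, Optional


def torch_rocm_problems(
    torch_version: str, hip_version: Optional[str], gpu_available: bool
) -> List[str]:
    """Return reasons an installed torch build looks unusable for ROCm (empty list = OK)."""
    problems: List[str] = []
    local_tag = torch_version.partition("+")[2]  # text after "+", empty if there is none
    if "rocm" not in local_tag.lower():
        problems.append(f"torch {torch_version!r} has no ROCm local version tag")
    if not hip_version:
        problems.append("torch.version.hip is empty, so this is not a HIP/ROCm build")
    if not gpu_available:
        problems.append("torch.cuda.is_available() is False, so no usable GPU was found")
    return problems
```

Returning a list of reasons rather than a single boolean lets a failed install report whether it got the wrong wheel (metadata checks) or a correct wheel that cannot see the GPU (runtime check).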

Integration with the helper for reForge (and A3WebUI, since it inherits a lot from reForge) was worked on and made functional, but it required rather large dedicated override and fallback functions because of modules missing from the 103x-all index, incompatibility with certain extensions that ship with reForge, and a setuptools issue. So I rolled all of that back and set it aside for now. It should be easier to revisit after the switch to multi-arch install handling, which should remove much of the bloat the helper and package configs had when that work was part of the implementation.

Tested and verified compatibility and functionality on both an R9700 (RDNA4) and a 6900 XT (RDNA2). I could not test older generations since I no longer have my Vega 64, but the special handling for legacy GPUs is already in place.

The last step before marking this ready for review is to evaluate the other attention-mechanism builds and the acceleration library (bnb), along with a possible Package Command menu option to install the ROCm SDK devel module for users who need or want to compile modules or extensions against their installed ROCm.


1 participant