
HwRender: fix ARM64 crash when GPU is suspended during Present (DXGI_ERROR_DEVICE_REMOVED)#11472

Open
etvorun wants to merge 1 commit into dotnet:main from etvorun:fix/arm64-dxgi-device-removed-crash

Conversation

@etvorun

@etvorun etvorun commented Feb 20, 2026


Summary

On ARM64 devices, WPF crashes when the GPU is suspended during an active IDirect3DSwapChain9::Present call. On ARM64, the D3D9-over-DXGI compatibility layer returns DXGI_ERROR_DEVICE_REMOVED (0x887A0005) unchanged instead of translating it to D3DERR_DEVICELOST. WPF's PresentWithD3D did not recognize this code, so it fell through to MIL_THR, which treats 0x887A0005 as fatal and crashes the process. Neither HandlePresentFailure nor the device-lost recovery path was ever reached.

What changed

src/Microsoft.DotNet.Wpf/src/WpfGfx/include/wgx_error.h

  • Added a local #define DXGI_ERROR_DEVICE_REMOVED ((HRESULT)0x887A0005L) guarded by #ifndef, avoiding a new dependency on dxgi.h. The value is a stable DirectX ABI constant.
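The guard can be checked in isolation. The sketch below compiles without Windows headers; the `HRESULT` typedef and the `MakeHresult` helper are assumptions standing in for the Windows SDK's `HRESULT` and `MAKE_HRESULT`. The `static_assert`s confirm that 0x887A0005 decodes as severity 1 (failure), facility 0x87A (DXGI), code 5.

```cpp
#include <cstdint>

// Assumption: stand-in for the Windows SDK typedef so this compiles without
// <windows.h>. On Windows, HRESULT is a 32-bit signed LONG.
typedef std::int32_t HRESULT;

// Same guard pattern as the PR: define the constant only if no header
// (such as dxgi.h or winerror.h) has already provided it.
#ifndef DXGI_ERROR_DEVICE_REMOVED
#define DXGI_ERROR_DEVICE_REMOVED ((HRESULT)0x887A0005L)
#endif

// Hypothetical helper mirroring MAKE_HRESULT(severity, facility, code).
constexpr HRESULT MakeHresult(std::uint32_t severity, std::uint32_t facility,
                              std::uint32_t code)
{
    return static_cast<HRESULT>((severity << 31) | (facility << 16) | code);
}

// 0x887A0005 = severity 1 (failure), facility 0x87A (DXGI), code 5.
static_assert(DXGI_ERROR_DEVICE_REMOVED == MakeHresult(1, 0x87A, 5),
              "unexpected DXGI_ERROR_DEVICE_REMOVED layout");

// Failure HRESULTs have the sign bit set, so FAILED(hr), i.e. hr < 0, holds.
static_assert(DXGI_ERROR_DEVICE_REMOVED < 0, "must be a failure code");
```

Because the value is fixed by the DirectX ABI, the `#ifndef` guard only matters for avoiding a redefinition warning when dxgi.h happens to be included in the same translation unit.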

src/Microsoft.DotNet.Wpf/src/WpfGfx/core/hw/d3ddevice.cpp

  • PresentWithD3D: new else if (hr == DXGI_ERROR_DEVICE_REMOVED) branch placed before the MIL_THR call, converting the error to D3DERR_DEVICELOST so the existing device-lost recovery path handles it.
  • HandlePresentFailure: added DXGI_ERROR_DEVICE_REMOVED to the device-lost if condition, as defense-in-depth for other callers such as PresentWithGDI.
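A simplified model of the two d3ddevice.cpp changes, with a `withFix` flag to compare behavior before and after. `DispatchPresentResult`, `IsFatalAtMilThr`, `IsDeviceLostHr` and `PresentAction` are hypothetical names, not WPF functions; the constants are local stand-ins with the values defined by d3d9.h and the DXGI error codes. Modeling MIL_THR as "any failure other than D3DERR_DEVICELOST is fatal" is an assumption made only to show the ordering of the new branch.

```cpp
#include <cstdint>

// Assumption: local stand-ins for the Windows SDK type and constants so the
// sketch compiles on its own.
typedef std::int32_t HRESULT;
constexpr HRESULT S_OK                      = 0;
constexpr HRESULT S_PRESENT_MODE_CHANGED    = 0x08760877;
constexpr HRESULT S_PRESENT_OCCLUDED        = 0x08760878;
constexpr HRESULT D3DERR_DEVICELOST         = static_cast<HRESULT>(0x88760868u);
constexpr HRESULT DXGI_ERROR_DEVICE_REMOVED = static_cast<HRESULT>(0x887A0005u);

enum class PresentAction { Done, RecoverDeviceLost, Fatal };

// Hypothetical stand-in for MIL_THR: in this model, any failure other than
// D3DERR_DEVICELOST is treated as fatal.
constexpr bool IsFatalAtMilThr(HRESULT hr)
{
    return hr < 0 && hr != D3DERR_DEVICELOST;
}

// Hypothetical model of HandlePresentFailure's device-lost condition after
// the PR. The raw DXGI code is accepted too (defense-in-depth for callers
// that skip the translation). Other codes in the real condition are omitted.
constexpr bool IsDeviceLostHr(HRESULT hr)
{
    return hr == D3DERR_DEVICELOST || hr == DXGI_ERROR_DEVICE_REMOVED;
}

// Hypothetical, simplified model of PresentWithD3D's result dispatch.
constexpr PresentAction DispatchPresentResult(HRESULT hr, bool withFix)
{
    if (hr == S_OK || hr == S_PRESENT_MODE_CHANGED || hr == S_PRESENT_OCCLUDED)
    {
        return PresentAction::Done;  // codes handled before the PR
    }
    else if (withFix && hr == DXGI_ERROR_DEVICE_REMOVED)
    {
        hr = D3DERR_DEVICELOST;  // the new branch, placed before the MIL_THR step
    }

    if (IsFatalAtMilThr(hr))
    {
        return PresentAction::Fatal;  // pre-fix outcome for 0x887A0005
    }
    return IsDeviceLostHr(hr) ? PresentAction::RecoverDeviceLost
                              : PresentAction::Done;
}
```

With `withFix == false`, 0x887A0005 reaches the fatal check unchanged; with `withFix == true`, it is rewritten first and routed to the device-lost recovery path.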

Why

The fix matches the behavior of a previously validated internal source fix. The change is surgical: it introduces no new recovery logic, and WPF reuses the existing TDR/device-lost path (MarkUnusable() → WGXERR_DISPLAYSTATEINVALID → RENDERING_STATUS_DEVICE_LOST → resource recreation on GPU resume).

Recovery sequence

| State | What happens |
|---|---|
| Present returns DXGI_ERROR_DEVICE_REMOVED | Converted to D3DERR_DEVICELOST; HandlePresentFailure calls MarkUnusable(), invalidates GPU resources, and notifies the device manager |
| Same frame (render thread) | CRenderTargetManager::HandlePresentErrors swallows WGXERR_DISPLAYSTATEINVALID and fires RENDERING_STATUS_DEVICE_LOST to the UI thread. No crash, no zombie. |
| Subsequent frames (GPU suspended) | UpdateDisplayState returns WGXERR_DISPLAYSTATEINVALID; render passes are skipped |
| GPU comes back | UpdateDisplayState succeeds; NotifyTierChange fires; render targets are recreated; the window is repainted |
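The recovery sequence in the table can be read as a small state machine. The sketch below is a hypothetical model (`RenderModel`, `OnFrame` and the `Event` values are illustrative, not WPF types): one call per render-thread frame, with `gpuAvailable` standing in for whether the GPU is suspended. It folds "render targets recreated" and "window repainted" into the same frame, which is a simplification.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the recovery sequence; names are illustrative only.
enum class Event { Presented, DeviceLostNotified, FrameSkipped, ResourcesRecreated };

struct RenderModel
{
    bool deviceUsable = true;  // cleared by the MarkUnusable() step
    std::vector<Event> log;    // what happened on each frame, in order

    // One render-thread frame; gpuAvailable is false while the GPU is suspended.
    void OnFrame(bool gpuAvailable)
    {
        if (!deviceUsable)
        {
            if (!gpuAvailable)
            {
                // UpdateDisplayState still fails: skip the render pass.
                log.push_back(Event::FrameSkipped);
                return;
            }
            // GPU is back: recreate render targets, then render normally below.
            deviceUsable = true;
            log.push_back(Event::ResourcesRecreated);
        }

        if (!gpuAvailable)
        {
            // Present failed with DXGI_ERROR_DEVICE_REMOVED, translated to
            // D3DERR_DEVICELOST: mark the device unusable and notify the UI
            // thread (RENDERING_STATUS_DEVICE_LOST) instead of crashing.
            deviceUsable = false;
            log.push_back(Event::DeviceLostNotified);
            return;
        }
        log.push_back(Event::Presented);
    }
};
```

Feeding the frames `true, false, false, true` yields Presented, DeviceLostNotified, FrameSkipped, ResourcesRecreated, Presented, which lines up with the table rows in order.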

Validation

Validated manually on ARM64 hardware by triggering GPU suspension events while a WPF application is actively rendering. The application recovered without crashing after applying this fix. No automated test infrastructure changes are included.

Fixes #11471


On ARM64, the D3D9-over-DXGI compatibility layer surfaces
DXGI_ERROR_DEVICE_REMOVED (0x887A0005) directly when the GPU is
suspended or removed, instead of mapping it to D3DERR_DEVICELOST.

PresentWithD3D only checked for S_OK, S_PRESENT_MODE_CHANGED, and
S_PRESENT_OCCLUDED. Any unrecognised HRESULT fell through to MIL_THR,
which treats 0x887A0005 as fatal, triggering VS_FatalError and a crash.
HandlePresentFailure was never reached.

Fix:
- wgx_error.h: define DXGI_ERROR_DEVICE_REMOVED locally to avoid
  pulling in dxgi.h; the value is a stable DirectX ABI constant.
- PresentWithD3D: add else-if branch that converts
  DXGI_ERROR_DEVICE_REMOVED to D3DERR_DEVICELOST before MIL_THR, so
  the existing device-lost recovery path handles it gracefully.
- HandlePresentFailure: add DXGI_ERROR_DEVICE_REMOVED to the device-lost
  condition block as defense-in-depth for other call sites (PresentWithGDI).

Recovery follows the existing TDR path: MarkUnusable() invalidates GPU
resources, render thread skips the frame and fires
RENDERING_STATUS_DEVICE_LOST, and when the GPU resumes
UpdateDisplayState recreates resources and repaints the window.
@etvorun etvorun requested review from a team and Copilot February 20, 2026 23:03
@dotnet-policy-service dotnet-policy-service Bot added the PR metadata: Label to tag PRs, to facilitate with triage label Feb 20, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR fixes a critical crash on ARM64 devices when the GPU is suspended during active rendering. On ARM64, the D3D9-over-DXGI compatibility layer returns DXGI_ERROR_DEVICE_REMOVED (0x887A0005) instead of the expected D3D9 error code D3DERR_DEVICELOST. WPF's PresentWithD3D did not recognize this code, causing MIL_THR to treat it as fatal and crash the process. The fix converts the DXGI error to the D3D9 equivalent before error processing, allowing the existing device-lost recovery mechanism to handle GPU suspension gracefully.

Changes:

  • Defined DXGI_ERROR_DEVICE_REMOVED constant locally in wgx_error.h to avoid adding a dependency on dxgi.h
  • Added error code translation in PresentWithD3D to convert DXGI_ERROR_DEVICE_REMOVED to D3DERR_DEVICELOST before MIL_THR processing
  • Added DXGI_ERROR_DEVICE_REMOVED to the device-lost condition in HandlePresentFailure as defense-in-depth

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/Microsoft.DotNet.Wpf/src/WpfGfx/include/wgx_error.h | Adds a local definition of the DXGI_ERROR_DEVICE_REMOVED constant (0x887A0005L) with an #ifndef guard |
| src/Microsoft.DotNet.Wpf/src/WpfGfx/core/hw/d3ddevice.cpp | Converts DXGI_ERROR_DEVICE_REMOVED to D3DERR_DEVICELOST in PresentWithD3D before MIL_THR processing; adds DXGI_ERROR_DEVICE_REMOVED to HandlePresentFailure's device-lost condition |


@lindexi lindexi left a comment

Member


LGTM

@amarinov-msft

amarinov-msft commented Jul 1, 2026


@etvorun
Just two potential improvements. Your fix only covers Present, not the other D3D9 call sites that share the same root cause; the same reasoning applies to every other D3D9 call:

  • CheckDeviceState checks D3DERR_DEVICELOST/DEVICEHUNG/DEVICEREMOVED but not DXGI_ERROR_DEVICE_REMOVED. This is the API whose whole job is polling device state; if it returns the raw DXGI code on ARM64, device-loss recovery is missed.
  • Several calls (CreateTexture, GetRenderTargetData, etc.) do MIL_THR(hr) and then route through HandleDIE(hr), and HandleDIE only maps D3DERR_DRIVERINTERNALERROR. A raw DXGI_ERROR_DEVICE_REMOVED from any of these propagates unconverted.

If suspension is caught mid-frame by a draw/resource call or by CheckDeviceState, the crash can still occur. The most robust fix would normalize DXGI_ERROR_DEVICE_REMOVED to D3DERR_DEVICELOST centrally (e.g., in HandleDIE or the MIL_THR normalization) rather than per call site. At minimum, CheckDeviceState should get the same branch.
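The centralized approach suggested in this review could look roughly like the following: one normalization helper that every D3D9 call site (or a HandleDIE-style wrapper) applies before any fatal check, plus a CheckDeviceState-style predicate that uses it. `NormalizeDeviceRemoved` and `IsDeviceStateLost` are hypothetical names, and the constants are local stand-ins with the values defined by d3d9.h and the DXGI error codes.

```cpp
#include <cstdint>

// Assumption: local stand-ins for the SDK type and constants so the sketch
// compiles on its own.
typedef std::int32_t HRESULT;
constexpr HRESULT D3DERR_DEVICELOST         = static_cast<HRESULT>(0x88760868u);
constexpr HRESULT D3DERR_DEVICEREMOVED      = static_cast<HRESULT>(0x88760870u);
constexpr HRESULT D3DERR_DEVICEHUNG         = static_cast<HRESULT>(0x88760874u);
constexpr HRESULT DXGI_ERROR_DEVICE_REMOVED = static_cast<HRESULT>(0x887A0005u);

// Hypothetical central helper: each D3D9 call site (or a HandleDIE-style
// wrapper) passes its raw HRESULT through this before any fatal check, so
// Present, CreateTexture, GetRenderTargetData, etc. all see the D3D9 code.
constexpr HRESULT NormalizeDeviceRemoved(HRESULT hr)
{
    return hr == DXGI_ERROR_DEVICE_REMOVED ? D3DERR_DEVICELOST : hr;
}

// Hypothetical CheckDeviceState-style test: the three D3D9 codes named in
// the review, evaluated after normalization.
constexpr bool IsDeviceStateLost(HRESULT rawHr)
{
    const HRESULT hr = NormalizeDeviceRemoved(rawHr);
    return hr == D3DERR_DEVICELOST
        || hr == D3DERR_DEVICEHUNG
        || hr == D3DERR_DEVICEREMOVED;
}
```

Normalizing in one place means new call sites inherit the translation automatically, at the cost of changing a path shared by every D3D9 call, so it would need the same ARM64 validation as the Present-only fix.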


Labels

PR metadata: Label to tag PRs, to facilitate with triage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HwRender: WPF crashes on ARM64 when GPU is suspended during frame presentation (DXGI_ERROR_DEVICE_REMOVED)

4 participants