Disconnect stale automation peers from UIA to fix memory leak in virtualized ItemsControls#11657
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a memory leak in WPF UI Automation (UIA) for virtualized ItemsControl scenarios by explicitly disconnecting UIA providers for removed automation peers, allowing COM references held by UIA Core to be released and enabling GC of stale peers and their visual subtrees.
Changes:
- Added a helper to disconnect an
AutomationPeer’sElementProxyfrom UIA viaUiaDisconnectProvider. - Updated
UpdateChildrenInternalto disconnect removed children even when noStructureChangedlisteners are registered. - Added the required interop import to call into
UIAutomationCore.dll.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (proxyWeakRef?.Target is ElementProxy proxy) | ||
| { | ||
| UiaDisconnectProvider(proxy); | ||
| } | ||
|
|
||
| peer._elementProxyWeakReference = null; |
03cdc2d to
65afa49
Compare
h3xds1nz
left a comment
There was a problem hiding this comment.
@amarinov-msft I'm really worried about perf with these changes. UIA implementation itself is already a really heavy mess (things like #10676 and more), and this will further increase the weight (an async DispatcherOperation for every disconnected peer), further enumeration over the intermediate collections.
Now, I don't have an exact solution here since in our wpf fork I've reworked this extensively but I think it's worth a further investigation and profiling to fix this.
| if (dispatcher != null && !dispatcher.HasShutdownStarted) | ||
| { | ||
| dispatcher.BeginInvoke(DispatcherPriority.Background, | ||
| new Action(() => UiaDisconnectProvider(proxy))); |
There was a problem hiding this comment.
This creates a closure unnecessarily and a new action everytime. Ideally use the parameter variant, it's an object you're passing so no boxing here.
| // UiaDisconnectProvider must NOT be called during a UIA callback (e.g. during Navigate/FindAll handling). | ||
| // UpdateChildrenInternal is invoked from within UIA callbacks, so defer the actual COM disconnect to a separate dispatcher operation. | ||
| Dispatcher dispatcher = peer.Dispatcher; | ||
| if (dispatcher != null && !dispatcher.HasShutdownStarted) |
There was a problem hiding this comment.
I'm not sure this is even required/true; the UIA callbacks should already run on the Dispatcher synchronously iirc, so that check seems redundant. If invoked from proxy directly, this would already throw if it was the case.
| /// </remarks> | ||
| private static void DisconnectPeerFromUia(AutomationPeer peer) | ||
| { | ||
| if (peer == null) |
There was a problem hiding this comment.
This check is redundant since you're already checking in the calling function.
| } | ||
|
|
||
| [DllImport("UIAutomationCore.dll", EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)] | ||
| private static extern int UiaDisconnectProvider(IRawElementProviderSimple provider); |
There was a problem hiding this comment.
This shouldn't be defined in this file.
| // This must happen regardless of StructureChanged event registration. | ||
| if (removedCount > 0) | ||
| { | ||
| foreach (AutomationPeer removedChild in hs) |
There was a problem hiding this comment.
This is just a sigh for another PR but this whole thing should be turned into a simple merge walk to differentiate old and new and avoid the HashSet allocations altogether. Sometimes this is tens of MBs for something that could have been a simple loop.
The least that can be done here is to repeat this loop over the HashSet only once; by either doing all the StructureChanged stuff or not.
| if (old._elementProxyWeakReference == null) | ||
| continue; | ||
|
|
||
| if (_children == null || !_children.Contains(old)) |
There was a problem hiding this comment.
This basically O(n*m), if this is a top automation peer, it's gonna be pretty heavy.
- Cache static DispatcherOperationCallback; drop closure/Action allocation per call. - Remove redundant Dispatcher null/HasShutdownStarted guard (matches InvalidatePeer pattern). - Remove redundant peer null check inside DisconnectPeerFromUia (callers guarantee non-null). - Move UiaDisconnectProvider P/Invoke onto ElementProxy as a single Disconnect() method; collapse ClearPeer() into it. Use DllImport.UIAutomationCore string constant per codebase convention. - Build HashSet of new children lazily in EnsureChildren to avoid O(n*m) Contains scan; common path stays allocation-free. - Drop redundant disconnect loop from UpdateChildrenInternal (EnsureChildren already handles it on this path); revert the merged loop back to events-only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@h3xds1nz thank you for your review. I tried addressing your comments. While the fundamental question about this issue is resolved (performance vs behavior), please check the updates when you have the opportunity and let me know if they are in the direction that you had in mind. |
h3xds1nz
left a comment
There was a problem hiding this comment.
@amarinov-msft Just a few nits, I guess this is acceptable at the current codebase state to resolve the issue.
| peer._childrenValid = false; | ||
| } | ||
|
|
||
| private static readonly DispatcherOperationCallback _disconnectProxyCallback = static arg => |
There was a problem hiding this comment.
You should be able to use SendOrPostCallback which doesn't return but is special-cased.
| _peer = null; | ||
| } | ||
|
|
||
| [DllImport(DllImport.UIAutomationCore, EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)] |
There was a problem hiding this comment.
When I've mentioned it before, I meant that the common practice is to put it in specialized file; in this case that would be https://github.com/dotnet/wpf/blob/main/src/Microsoft.DotNet.Wpf/src/UIAutomation/UIAutomationProvider/MS/Internal/Automation/UiaCoreProviderApi.cs ... the provider library would have been loaded at this point anyway.
|
@h3xds1nz thank you for your review and effort! However, while testing this - I found a regression introduced by my changes where visible peers are being treated as stale and disconnected. The disconnect-based approach in the PR seems to be fundamentally incompatible with how UIA traverses peers. We probably need to change the approach completely. |
|
@amarinov-msft No worries; I unfortunately cannot test this as I'm unable to replicate the memory leak with the repro whatsoever. Plus WPFAutomationClient report seems invalid, given that there's literally no GC pressure, just forcing a GC manually clears out everything (not that this fix would be an attempt to solve that). |
c0b8c49 to
cff011c
Compare
…item peers ElementProxy now holds a weak reference to data-item automation peers (those for which AutomationPeer.IsDataItemAutomationPeer() returns true: ItemAutomationPeer and its subclasses, DataGridCellItemAutomationPeer, and DateTimeAutomationPeer), matching the weak-reference treatment already applied to UIElement/ContentElement/ UIElement3D peers. This lets UI Automation client references release virtualized item peers - and the controls they transitively root - so they can be collected, fixing unbounded provider-side growth (customer OOM on a ~200k-row DataGrid under continuous UIA querying). To avoid an ElementNotAvailableException if a peer is collected mid-walk or during property readback, a short-lived strong "keep-alive" (PeerKeepAlive) roots every peer touched at StaticWrap/Peer access for a bounded window using a two-bucket rotation. The 9s window was sized from a measured worst-case FindAll+readback (~4.4s under GC stress, validated 16/16 trials clean across GC modes). A new opt-out AppContext switch, Switch.System.Windows.Automation.Peers.UseStrongReferenceForItemAutomationPeers, restores the legacy strong-reference behavior. Supersedes the earlier disconnect-based approach, which had an unrecoverable mid-walk regression. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cff011c to
6604bff
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
510e300 to
95d4f3b
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
95d4f3b to
25feec4
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
25feec4 to
59b1f02
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
59b1f02 to
1d053aa
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1d053aa to
0fbde13
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0fbde13 to
3ae37fc
Compare
PR dotnet#11657 bounded automation peers but ElementProxy bridge objects whose weak peer had been collected stayed rooted by their UI Automation wrapper (CCW), so memory still grew under a sustained UIA client. Add a background sweep that calls UiaDisconnectProvider on ElementProxy instances whose weak peer is gone (which already throw ENA, so the sweep cannot disturb a walk of a live element). The sweep runs on a Background DispatcherTimer, is bounded per tick, and self-stops when idle. Weak proxies cache their runtime id so disconnect can still serve it. Gated by the Switch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect AppContext kill-switch (default off = sweep enabled). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3ae37fc to
a256bd2
Compare
Apply low-risk review suggestions to the data-item peer keep-alive: - Move the (UseStrongReference / IsDataItemAutomationPeer) eligibility guard into PeerKeepAlive.KeepAlive so the two call sites cannot drift, and additionally honor the legacy AutomationWeakReferenceDisallow key (AutomationInteropReferenceType) that the StaticWrap stamp ignored. - Skip the striped-lock write when a peer is already stamped for the current generation (repeat touches are the common case). - Document that IsPeerCollected must not touch Peer, since doing so would hand out a transient strong reference and defeat the sweep. No behavioral change to the leak fix; keep-alive and disconnect semantics are unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #11337
Description
When a UIA client (screen reader, test-automation, inspector) repeatedly enumerates a virtualized
ItemsControl(e.g. a largeDataGrid), WPF accumulates two kinds of never-released objects, producing an unbounded leak:ElementProxyheld its peer with a strong reference, and the UIA CCW held the proxy, so every peer surfaced to a client (and the data row / visual sub-tree it roots) was pinned forever.ElementProxyCCWs — even once a peer is no longer referenced, the proxy itself stays rooted by its UIA COM-callable wrapper (CCW), dragging itsWeakReference, runtime-id and name strings with it.This PR addresses both:
ElementProxynow holdsUIElement/ContentElement/UIElement3Dpeers - and newly data-item peers - viaWeakReference, so UIA references no longer pin recycled/virtualized peers. Data-item behavior is gated by theSwitch.System.Windows.Automation.Peers.UseStrongReferenceForItemAutomationPeersAppContext switch (defaultfalse= weak; settrueto restore legacy strong refs).ElementNotAvailableException), each peer surfaced to or touched by UIA is strong-rooted in a small rotating bucket for a bounded time window (one-to-two windows ≈ 9–18 s) via aBackground-priorityDispatcherTimer, then released. Retention is bounded by the in-flight working set, not the data-set size.Background-priorityDispatcherTimersweep (~2 s cadence, capped at 5000 disconnects/tick) calls the newUiaDisconnectProvideronElementProxyinstances whose weak peer has already been collected, releasing the CCW so the proxy and everything it roots can be reclaimed. Gated bySwitch.System.Windows.Automation.Peers.DisableUiaProviderDisconnect(defaultfalse= sweep enabled). Both timers self-stop when there is nothing to track.A new public API,
AutomationInteropProvider.DisconnectProvider(IRawElementProviderSimple), is added (with matching reference-assembly entry) to exposeUiaDisconnectProviderto the bridge.UIA clients may observe
ElementNotAvailableExceptionwhen accessing an element whose peer has been collected/disconnected — standard UIA contract behavior that well-behaved clients already handle.Customer Impact
WPF applications that expose a virtualized
ItemsControl(DataGrid, ListView, etc.) to UI Automation experience an unbounded memory leak (≈260 MB/min in the customer's 200k-row DataGrid repro), ending inOutOfMemoryExceptionand crashes in long-running sessions. Any app with accessibility active (assistive technology or test automation present) is affected. The customer is currently held on a workaround.Regression
No.
Testing
Controlled customer repro (200k-row DataGrid + a UIA client driving
FindAll~1×/s), memory sampled after forced Gen2 GC:ElementProxyand data-item peers grow without bound.DataGridCellItemAutomationPeerflat/declining), butElementProxyCCWs still grew (~3.7× over 15 min).ElementProxybounded over a 12-min run (2.5k ->3.6k, not growing); process working set flat (≈614–892 MB); no OOM.AppContextswitch restores the prior behavior.Risk
Moderate, well-contained, and fully reversible via two AppContext kill-switches.
The risk areas are:
UseStrongReferenceForItemAutomationPeers=true.ElementNotAvailableException.AutomationInteropProvider.DisconnectProvideris additive (plus a ref-assembly entry) and needs API-review sign-off; it is a thin wrapper over the documentedUiaDisconnectProvider.DispatcheratBackgroundpriority and only disconnects proxies whose peer is already collected (which already throw ENA), so it cannot disturb a walk of a live element. Work is capped per tick (5000). Failed-disconnect handling.AutomationInteropProvider.DisconnectProvidercan occasionally fail transiently (e.g. re-entrant input-sync, or a proxy whose runtime id was never cached becauseGetRuntimeIdwas never called on it).ElementProxy.Disconnect()returns a success/failure result, and the sweep re-queues a failed proxy for a later tick - its CCW is a GC root, so silently dropping it would leak the proxy. A per-proxy attempt counter (failedAttempts) caps this atMaxDisconnectAttempts(8): genuine transients clear within 1–2 ticks and never approach the cap, while a proxy that fails every time is abandoned after 8 tries (leaking only the lightweight shell) so the timer can still self-stop instead of spinning forever. Reversible viaDisableUiaProviderDisconnect=true.ElementNotAvailableExceptionduring steady state (not only on tree changes). Conformant clients (Narrator, NVDA, JAWS) handle this; non-conformant tools that don't catch it could surface new errors.FindItemByPropertyand then callsVirtualizedItemPattern.Realizeafter the keep-alive window (~10–18 s) will getElementNotAvailableException, since the peer was collected and its proxy disconnected. This is within the UIA ENA contract but changes the pattern's previously indefinite realizability. Kill switchDisableUiaProviderDisconnectrestores the old behavior.{7, ProcessId, GetHashCode()}, so identity-hash collisions are possible. BecauseUiaDisconnectProvidermatches by runtime id, a dead proxy's cached id that collides with a live peer's could invalidate a client's view of a healthy element. Pre-existing weakness; low probability, but the disconnect sweep is the first path to exercise it at scale.Microsoft Reviewers: Open in CodeFlow