perf: optimize ColorValidator, up to 3x faster on arrays, 14x on scalars (#5576)
Open
KRRT7 wants to merge 7 commits into plotly:main
- Replace custom fullmatch() shim (which rebuilt regex strings and recompiled on every call via dir() + re.match) with compiled pattern .fullmatch(); the Python 3.4+ compat shim is no longer needed
- Convert named_colors from list to frozenset for O(1) lookups instead of O(n) linear scan through 148 entries
- Merge validate + find_invalid_els into a single pass over arrays, eliminating redundant second iteration
- Call perform_validate_coerce directly for 1-D numpy array elements, skipping the full validate_coerce type-dispatch per element
- Reorder checks: named color lookup (now O(1)) before rare ddk regex

Benchmarks (1000 color strings, 50 iterations):
  List path:  17.71ms → 9.00ms (1.97x faster)
  Numpy path: 29.03ms → 9.49ms (3.06x faster)
  Scalar:     78.3µs → 5.7µs (13.7x faster)
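The first three bullets of this commit can be illustrated with a minimal sketch. The regexes, the small color subset, and the helper names `is_valid_color` and `validate_colors` are assumptions made for illustration; they are not plotly's actual definitions, which cover many more formats.

```python
import re

# Hypothetical, simplified patterns compiled once at import time.
HEX_RE = re.compile(r"#([0-9a-f]{3}|[0-9a-f]{6})")
FUNC_RE = re.compile(r"(rgb|rgba|hsl|hsla|hsv|hsva)\([0-9.,%]+\)")

# A tiny, hypothetical subset of named colors; frozenset gives O(1) membership tests.
NAMED_COLORS = frozenset({"red", "green", "blue", "aliceblue", "rebeccapurple"})


def is_valid_color(value):
    """Return True if value is a color string under the simplified rules above."""
    if not isinstance(value, str):
        return False
    v = value.replace(" ", "").lower()
    # Cheap set lookup first, then the precompiled regexes.
    if v in NAMED_COLORS:
        return True
    return HEX_RE.fullmatch(v) is not None or FUNC_RE.fullmatch(v) is not None


def validate_colors(values):
    """Validate and collect invalid elements in a single pass over the input."""
    invalid = [v for v in values if not is_valid_color(v)]
    if invalid:
        raise ValueError(f"Invalid color(s): {invalid[:10]}")
    return list(values)
```

Calling `Pattern.fullmatch()` on a precompiled pattern avoids rebuilding and recompiling the regex on each call, which is the cost the commit message attributes to the old shim.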
2D+ numpy arrays containing invalid color strings had those strings silently replaced with None instead of raising ValueError. The list path correctly raised for the same input. This was caused by the multidimensional numpy fallback not collecting invalid elements from sub-array results.

Also adds comprehensive tests covering all ColorValidator code paths:
- None and typed_array_spec inputs
- 1D numpy with invalid colors (raise path)
- 2D numpy with invalid colors (now raises, was silently accepting)
- 3-level nested lists (find_invalid_els recursion)
- Numeric numpy fast path with numbers_allowed

Removes dead code (unreachable default arg in find_invalid_els).

100% line coverage on the changed region (lines 1360-1500).
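The behavior this fix aims for can be sketched as a recursive check that propagates invalid elements up from sub-arrays rather than dropping them. This is a simplified illustration under stated assumptions: `find_invalid`, `validate_color_array`, and the three-color set are hypothetical stand-ins, not the validator's real code.

```python
import numpy as np

# Hypothetical subset of accepted colors, for illustration only.
NAMED_COLORS = frozenset({"red", "green", "blue"})


def find_invalid(values):
    """Recursively collect invalid color strings from nested lists or arrays."""
    invalid = []
    for v in values:
        if isinstance(v, (list, tuple, np.ndarray)):
            # Propagate invalid elements found in sub-arrays instead of dropping them.
            invalid.extend(find_invalid(v))
        elif not (isinstance(v, str) and v in NAMED_COLORS):
            invalid.append(v)
    return invalid


def validate_color_array(values):
    """Raise ValueError for any invalid element, regardless of nesting depth."""
    invalid = find_invalid(values)
    if invalid:
        raise ValueError(f"Invalid color element(s): {invalid}")
    return values
```

With this shape, a 2D numpy array and the equivalent nested list take the same code path, so both raise for the same invalid input.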
Three changes to the hot path hit by every fig.show(), write_html(), to_json(), and write_image() call:

1. to_typed_array_spec: replace copy_to_readonly_numpy_array (which copies the array, wraps through narwhals, and sets readonly flag) with a lightweight np.asarray. The input is already a deepcopy from to_dict(), so copying again is pure waste.
2. convert_to_base64: replace is_homogeneous_array (which checks numpy, pandas, narwhals, and __array_interface__) with a direct isinstance(value, np.ndarray) check. In the to_dict() context, data is already validated and stored as numpy arrays.
3. is_skipped_key: replace list scan with frozenset lookup (O(1)).

Profile results (10 traces × 100K points, 20 calls):
  to_typed_array_spec:          1811ms → 1097ms (40% faster)
  copy_to_readonly_numpy_array: 226ms → 0ms (eliminated)
  narwhals from_native:         68ms → 0ms (eliminated)
  is_skipped_key:               41ms → ~0ms (eliminated)
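The three changes rely on standard NumPy and Python behavior, which the short sketch below demonstrates in isolation. The key names in `SKIPPED_KEYS` and the helper `should_encode` are hypothetical; the real skipped keys and encoding logic live in plotly's source.

```python
import numpy as np

arr = np.arange(5, dtype="float64")

# np.array copies by default; np.asarray returns the input itself
# when it is already an ndarray and no dtype conversion is requested.
copied = np.array(arr)
viewed = np.asarray(arr)

# Hypothetical key set; frozenset membership is O(1) versus a linear list scan.
SKIPPED_KEYS = frozenset({"geojson", "layer", "range"})


def is_skipped_key(key):
    """Return True if the key should be left untouched during conversion."""
    return key in SKIPPED_KEYS


def should_encode(value):
    """Direct type check, assuming data was already coerced to numpy arrays."""
    return isinstance(value, np.ndarray)
```

The direct `isinstance` check is only safe under the assumption stated in the commit message: by the time `to_dict()` output reaches this code, array data is already stored as numpy arrays.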
Overview
Optimizes ColorValidator validation with up to 3x speedup on numpy array inputs and fixes a bug where invalid colors in 2D numpy arrays were silently accepted.
Changes
1. ColorValidator optimization (commit 1)
- Replace the custom fullmatch() shim (which called dir(), rebuilt regex strings, and recompiled via re.match() on every call) with compiled .fullmatch(); the Python 3.4 shim is unnecessary since plotly requires ≥3.8
- Convert named_colors from list to frozenset for O(1) lookups instead of O(n) linear scan through 148 entries
- Merge the validate_coerce loop and find_invalid_els second pass into a single pass
- Call perform_validate_coerce directly for 1-D numpy array elements, skipping the full type-dispatch per element
- Reorder checks so the named color lookup (now O(1)) runs before the rare var(--*) ddk regex
2. Bug fix: 2D numpy silent invalid color acceptance (commit 2)
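- 2D+ numpy arrays containing invalid color strings had those strings silently replaced with None instead of raising ValueError. The equivalent list input correctly raised.
- Removes dead code (find_invalid_els default arg that was no longer reachable)
Benchmarks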
ColorValidator (1000 color strings, 50 iterations)
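Timings of this kind can be reproduced with a small harness. The sketch below is not the author's benchmark script: the `bench` helper and the workload are hypothetical, and it compares list against frozenset membership only as a stand-in for the validator itself.

```python
import time


def bench(fn, data, iterations=50):
    """Return the mean wall-clock time per call of fn(data), in milliseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(data)
    return (time.perf_counter() - start) * 1000 / iterations


# Hypothetical workload: 1000 hex color strings checked against 148 names.
colors = [f"#{i:06x}" for i in range(1000)]
named_list = [f"name{i}" for i in range(148)]
named_set = frozenset(named_list)

list_ms = bench(lambda cs: [c in named_list for c in cs], colors)
set_ms = bench(lambda cs: [c in named_set for c in cs], colors)
print(f"list: {list_ms:.3f}ms  frozenset: {set_ms:.3f}ms")
```

Absolute numbers depend on the machine and Python version, so only the ratio between runs on the same machine is meaningful.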
Testing
- … validate_coerce)
- ruff format passes