Optimize JsonCompleter.parse string scanning #6

Merged
default-anton merged 3 commits into main from optimize-parse-string-scanning
Mar 13, 2026

@default-anton
Collaborator

Summary

Optimize JsonCompleter.parse for long streamed string values.

The hot path was Scanners.scan_string. It used to walk ordinary string content one character at a time, even when the next few hundred bytes were plain text. It now jumps to the next significant character (", \, or a control character) and copies the whole plain run as a single slice.

This preserves the existing parsing behavior, escape handling, and unicode/surrogate validation, while doing far less work in Ruby in the common case.

Before / after

Benchmark command:

JSON_COMPLETER_BENCHMARK=1 bundle exec rspec spec/parse_benchmark_spec.rb

Benchmark settings from the spec run:

  • payload bytes: 77690
  • prefixes: 9712
  • iterations: 50
  • chunk size: 8
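
For readers who want to see where the prefix count comes from: a minimal sketch, assuming the benchmark parses every prefix of the payload as it grows by one chunk. That assumption is not confirmed by this PR, but it matches the reported numbers, since 77690 bytes in 8-byte chunks gives 9712 prefixes. The helper name `streamed_prefixes` is hypothetical and not part of the spec file.

```ruby
# Hypothetical helper, not taken from spec/parse_benchmark_spec.rb.
# Assumption: the payload "arrives" chunk_size bytes at a time and every
# intermediate prefix is handed to the parser.
def streamed_prefixes(payload, chunk_size)
  # Prefix lengths: chunk_size, 2 * chunk_size, ..., then the full payload.
  lengths = (chunk_size...payload.bytesize).step(chunk_size).to_a
  lengths << payload.bytesize
  lengths.map { |len| payload.byteslice(0, len) }
end

payload = "a" * 77_690 # stand-in for the real benchmark payload
puts streamed_prefixes(payload, 8).size
puts (77_690.0 / 8).ceil # same count, computed directly
```

Under that assumption, one benchmark iteration exercises the scanner on thousands of partially received strings, which is why per-character work in scan_string dominated.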

parse

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| total runtime | 1.9350s | 1.2476s | 35.5% faster |
| per iteration | 38.700ms | 24.952ms | 35.5% faster |
| allocated objects | 4,381,588 | 2,047,569 | 53.3% fewer |

Relative to complete + JSON.parse

| Metric | Before | After |
| --- | --- | --- |
| speedup | 9.07x | 12.98x |
| allocation reduction | 4.66x | 8.83x |
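
The "Change" figures in the parse table can be re-derived from the before/after values. The snippet below is only a sanity check of that arithmetic; the variable names are illustrative. It reads "35.5% faster" as a 35.5% reduction in runtime, which corresponds to roughly 1.55x the throughput.

```ruby
# Values copied from the parse table above.
before_s = 1.9350
after_s  = 1.2476
before_allocs = 4_381_588
after_allocs  = 2_047_569

# Percentage reduction relative to the "before" value.
time_saved_pct   = ((before_s - after_s) / before_s * 100).round(1)
allocs_saved_pct = ((before_allocs - after_allocs).to_f / before_allocs * 100).round(1)

# The same runtime change expressed as a throughput ratio.
throughput_ratio = (before_s / after_s).round(2)

puts time_saved_pct   # 35.5
puts allocs_saved_pct # 53.3
puts throughput_ratio # 1.55
```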

What changed

  • added fast-path patterns for string scanning
  • appended contiguous plain string runs in slices instead of per character
  • kept escape handling, unicode escapes, and surrogate-pair validation on the slower path where it matters
  • documented the optimization in README.md
  • added a changelog entry
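
To illustrate the kind of work that stays on the slower path: a minimal sketch of decoding a `\uXXXX` escape with surrogate-pair validation. This is not the code from Scanners.scan_string. The method name `decode_unicode_escape` and the choice to return nil for incomplete or invalid input are assumptions made for the example.

```ruby
# Hypothetical helper, not the gem's implementation.
# pos points at the backslash of a \uXXXX escape.
# Returns [decoded_string, next_pos], or nil when the escape is incomplete
# (for example, cut off mid-stream) or is not a valid surrogate pairing.
def decode_unicode_escape(input, pos)
  hex = input[pos + 2, 4]
  return nil unless input[pos, 2] == "\\u" && hex&.match?(/\A\h{4}\z/)

  code = hex.to_i(16)
  # Ordinary BMP code point: no pairing needed.
  return [[code].pack("U"), pos + 6] unless (0xD800..0xDFFF).cover?(code)
  # A low surrogate with no preceding high surrogate is invalid.
  return nil if code >= 0xDC00

  # High surrogate: a second \uXXXX escape holding a low surrogate must follow.
  low_hex = input[pos + 8, 4]
  return nil unless input[pos + 6, 2] == "\\u" && low_hex&.match?(/\A\h{4}\z/)

  low = low_hex.to_i(16)
  return nil unless (0xDC00..0xDFFF).cover?(low)

  combined = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
  [[combined].pack("U"), pos + 12]
end

p decode_unicode_escape('\u0041', 0)
p decode_unicode_escape('\ud83d\ude00', 0)
p decode_unicode_escape('\ud83d', 0)
```

This branching only runs after a backslash has been found, so it costs nothing on long plain runs.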

Why this helps

Most streamed JSON text is boring: long runs of normal characters with only occasional escapes or quotes.

Before:

  • read one character
  • branch on that character
  • append one character
  • repeat thousands of times

After:

  • ask Ruby where the next special character is
  • append the whole plain chunk at once
  • only switch to character-by-character logic near escapes / terminators

Same result, less interpreter overhead, fewer temporary objects.
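
The before/after steps above can be sketched as follows. This is a simplified illustration, not the actual code in Scanners.scan_string. The method names are hypothetical, and the regex is an assumed form of the "special character" pattern (quote, backslash, or control character).

```ruby
# Hypothetical sketch of the two strategies for copying a plain string run.
# Both return [plain_text, position_of_next_special_character].

# Before: one loop iteration, one branch, and one append per character.
def plain_run_per_char(input, start)
  out = +""
  pos = start
  while pos < input.length
    char = input[pos] # allocates a one-character String each time
    break if char == '"' || char == "\\" || char.ord < 0x20

    out << char
    pos += 1
  end
  [out, pos]
end

# After: String#index finds the next special character in C,
# then the whole run is copied as one slice.
def plain_run_sliced(input, start)
  stop = input.index(/["\\\x00-\x1f]/, start) || input.length
  [input[start...stop], stop]
end

# A streamed prefix: opening quote, plain text, then an escape sequence.
json = '"hello streamed world\n and more'
p plain_run_per_char(json, 1)
p plain_run_sliced(json, 1)
```

Both calls return the same run and stop at the backslash, where character-by-character escape handling takes over. The sliced version allocates one string for the run instead of one per character.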

Validation

  • bundle exec rubocop
  • bundle exec rspec
  • JSON_COMPLETER_BENCHMARK=1 bundle exec rspec spec/parse_benchmark_spec.rb

default-anton merged commit 95d68eb into main on Mar 13, 2026
5 checks passed
default-anton deleted the optimize-parse-string-scanning branch on March 13, 2026 21:17