docs: Arlo comparison audit transparency report and guide #2350

Open: nealmcb wants to merge 7 commits into votingworks:main from gwexploratoryaudits:docs/comparison-audit-transparency-report

nealmcb commented Jun 20, 2026


Adds three planning documents for Arlo comparison audit transparency.


docs/transparency-report.md

Analysis of what Arlo currently exports at each phase of a comparison audit, where the transparency gaps are, and prioritized recommendations for closing them. Covers pre-seed, post-seed, and post-audit phases; identifies gaps in machine-readable formats, opportunistic contest risk levels, pre-seed commitment workflow, and per-jurisdiction phase exports.
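
Ordinary SHA-256 hashing is enough to illustrate the pre-seed commitment idea. Publish a hash index of the artifacts before the seed is chosen, and anyone can later confirm that the files were not altered after the seed became known. The sketch below assumes the phase artifacts sit in one local directory. The function names (`build_hash_index`, `verify_hash_index`) and the JSON layout are hypothetical and do not describe an existing Arlo export format.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large CVR exports need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hash_index(phase: str, artifact_dir: Path) -> Dict[str, Any]:
    """Hypothetical index format: the phase name plus a file name -> SHA-256 map."""
    files = sorted(p for p in artifact_dir.iterdir() if p.is_file())
    return {"phase": phase, "sha256": {p.name: sha256_file(p) for p in files}}


def verify_hash_index(index: Dict[str, Any], artifact_dir: Path) -> Dict[str, bool]:
    """Recompute each listed digest; False marks a missing or altered file."""
    results = {}
    for name, digest in index["sha256"].items():
        path = artifact_dir / name
        results[name] = path.is_file() and sha256_file(path) == digest
    return results


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        (folder / "example.csv").write_text("a,b\n1,2\n")
        index = build_hash_index("pre-seed", folder)
        print(json.dumps(index, indent=2, sort_keys=True))
        assert all(verify_hash_index(index, folder).values())
        (folder / "example.csv").write_text("a,b\n1,3\n")  # simulate tampering
        assert not verify_hash_index(index, folder)["example.csv"]
```

Hashing alone proves only that the files did not change after the index was published. The commitment is meaningful only if the index itself is posted publicly before the seed ceremony.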


docs/transparency-implementation-plan.md

Phased implementation plan grounded in two principles:

Software independence — the ability to detect voting system errors without relying on the same software stack that produced them. Requires both mechanical verification (anyone can replicate the sample draw and risk calculation from published artifacts) and human verification (physically present observers independently record board interpretations and compare against Arlo's record).

The blind audit principle — audit boards must interpret each ballot without seeing the CVR. Observers follow along silently using a pre-generated excerpt that joins the retrieval list with the CVR; they never show it to the board or speak during the session.
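
Mechanically, the observer excerpt is a join of the retrieval list against the CVR on the imprinted ballot ID. The minimal sketch below assumes both inputs have already been parsed into Python dictionaries and that the CVR passed in is the publicly posted (anonymized) one. The column names (`batch`, `position`, `imprinted_id`), the line layout, and the name `build_excerpt` are hypothetical placeholders, not Arlo's CSV headers or the planned generator's output format. The sketch keeps retrieval-list order so observers can follow the session. It flags sampled ballots that have no public CVR record instead of silently dropping them.

```python
from typing import Dict, List

RetrievalRow = Dict[str, str]  # hypothetical keys: "batch", "position", "imprinted_id"
CvrVotes = Dict[str, str]      # contest name -> recorded choice

NOTICE = "OBSERVER COPY: DO NOT SHOW TO THE AUDIT BOARD"


def build_excerpt(retrieval: List[RetrievalRow], public_cvr: Dict[str, CvrVotes]) -> List[str]:
    """One printable line per sampled ballot, in retrieval-list order.

    public_cvr maps imprinted ID -> votes, taken from the publicly posted
    (anonymized) CVR. Sampled ballots absent from it are flagged, not dropped.
    """
    lines = [NOTICE]
    for row in retrieval:
        prefix = f"{row['batch']} #{row['position']} [{row['imprinted_id']}]"
        votes = public_cvr.get(row["imprinted_id"])
        if votes is None:
            # For example, a rare ballot style that was aggregated before publication.
            lines.append(f"{prefix}  NO PUBLIC CVR RECORD")
            continue
        choices = "; ".join(f"{contest}: {choice}" for contest, choice in sorted(votes.items()))
        lines.append(f"{prefix}  {choices}")
    return lines


if __name__ == "__main__":
    retrieval = [
        {"batch": "Batch 1", "position": "3", "imprinted_id": "1-1-3"},
        {"batch": "Batch 2", "position": "7", "imprinted_id": "1-2-7"},
    ]
    cvr = {"1-1-3": {"Mayor": "Alice", "Measure A": "Yes"}}
    print("\n".join(build_excerpt(retrieval, cvr)))
```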

Track A — Observer Toolkit (no Arlo changes required)

  • A1 pytest integration test suite: runs a full two-round audit, saving phase artifacts with SHA-256 hashes at each phase transition
  • A2 Official export functions: download and hash phase artifacts for public posting (public posting is required, because observers have no access to the Arlo instance)
  • A3 Independent observer verification scripts (replicate_sample.py, replicate_risk_level.py, end_to_end_verify.py) that replicate the sample draw and risk level from published artifacts alone
  • A4 Test coverage for A3/A5 scripts
  • A5 Observer excerpt generator: joins the retrieval list with the publicly posted (anonymized) CVR to produce a print-ready per-ballot sheet, with a printed notice reminding observers not to show it to the board
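
A script like replicate_sample.py is useful only because the draw is a deterministic function of public inputs. The sketch below illustrates that property with a simplified hash-ordering scheme. Each ballot gets a priority from the SHA-256 of the public seed and its ballot identifier, and the ballots with the smallest priorities are taken without replacement. This is a stand-in built on assumptions, not Arlo's sampler. A real replication script must reproduce Arlo's own algorithm, seed handling, and ballot identifiers exactly. The function names `ticket` and `draw_sample` and the ballot ID format are hypothetical.

```python
import hashlib
from typing import List


def ticket(seed: str, ballot_id: str) -> int:
    """Pseudorandom priority for one ballot, computed only from public inputs."""
    digest = hashlib.sha256(f"{seed},{ballot_id}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def draw_sample(seed: str, ballot_ids: List[str], sample_size: int) -> List[str]:
    """Hypothetical draw without replacement: the sample_size lowest tickets.

    The result depends only on the seed and the set of ballot IDs, so an
    observer holding the published seed and manifest can recompute it.
    """
    ranked = sorted(set(ballot_ids), key=lambda b: (ticket(seed, b), b))
    return ranked[:sample_size]


if __name__ == "__main__":
    # Hypothetical IDs: jurisdiction-batch-position for 3 batches of 50 ballots.
    ballots = [f"J1-B{batch}-{pos}" for batch in range(1, 4) for pos in range(1, 51)]
    seed = "12345678901234567890"
    sample = draw_sample(seed, ballots, 5)
    # Reordering the input list does not change the draw.
    assert sample == draw_sample(seed, list(reversed(ballots)), 5)
    print(sample)
```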

Track B — Arlo Improvements

  • B1 Machine-readable audit report (JSON endpoint) — may be worth implementing before some Track A work
  • B2 Opportunistic contest risk levels: compute and export risk levels for contests that appear on sampled ballots only incidentally (the ballots were drawn for a targeted contest); add a per-contest universe_ballot_count to the sample sizes response
  • B3 Per-contest, per-jurisdiction eligible ballot count endpoint: needed to verify opportunistic risk level denominators — the manifest CSVs give total ballot counts but not the count of ballots containing a given contest, which comes from CVR metadata
  • B4 Pre-seed hash-index JSON endpoint
  • B5 Per-jurisdiction phase exports (new data only per phase)
  • B6 UI transparency checklist panel at each phase transition (soft gate, not a hard block)
  • B8 Trusted timestamping support (approach TBD)
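
Once CVR metadata is available, the B3 denominators are plain aggregate counts. The minimal sketch below assumes CVR records have already been flattened into (jurisdiction, imprinted ID, contests on the ballot) tuples. That record shape and both function names are hypothetical, and an endpoint would compute the same counts server-side from Arlo's own CVR storage. The second function shows one cheap consistency check against the manifest totals, since no contest can appear on more ballots than the jurisdiction's manifest lists.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical flattened CVR metadata: (jurisdiction, imprinted_id, contests on the ballot)
CvrRecord = Tuple[str, str, Set[str]]


def eligible_counts(records: Iterable[CvrRecord]) -> Dict[str, Dict[str, int]]:
    """contest -> jurisdiction -> number of ballots whose CVR includes that contest."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for jurisdiction, _imprinted_id, contests in records:
        for contest in contests:
            counts[contest][jurisdiction] += 1
    return {contest: dict(per_j) for contest, per_j in counts.items()}


def exceeds_manifest(
    counts: Dict[str, Dict[str, int]], manifest_totals: Dict[str, int]
) -> List[Tuple[str, str]]:
    """(contest, jurisdiction) pairs whose count is larger than the manifest total."""
    return sorted(
        (contest, jurisdiction)
        for contest, per_j in counts.items()
        for jurisdiction, n in per_j.items()
        if n > manifest_totals.get(jurisdiction, 0)
    )


if __name__ == "__main__":
    records = [
        ("J1", "1-1-1", {"Mayor", "Measure A"}),
        ("J1", "1-1-2", {"Mayor"}),
        ("J2", "2-1-1", {"Measure A"}),
    ]
    counts = eligible_counts(records)
    print(counts)
    print(exceeds_manifest(counts, {"J1": 2, "J2": 1}))
```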

Also notes CVR anonymization requirements (ballot styles with fewer than roughly 10 ballots must be aggregated before the CVR is publicly released) and explains why observer scripts may appropriately reuse Arlo server library code.
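
The aggregation requirement can be sketched in a few lines, assuming each CVR record carries a ballot-style label and a contest-to-choice mapping. Records from styles with fewer than `threshold` ballots (10 here, following the rough figure above) are withheld and replaced by one pooled tally of their votes. This is a simplified illustration, not necessarily the procedure used by loriinboulder/anonymize_cvr. The record layout and the name `aggregate_rare_styles` are hypothetical. The sketch also shows why the excerpt generator needs a missing-record warning: a sampled ballot from a withheld style has no individual record in the public CVR.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: {"style": str, "votes": {contest: choice}}
Record = Dict[str, Any]


def aggregate_rare_styles(
    records: List[Record], threshold: int = 10
) -> Tuple[List[Record], Dict[str, Dict[str, int]]]:
    """Split records into publishable individual records and one pooled tally.

    Records are kept individually only when their ballot style has at least
    `threshold` ballots; votes from rarer styles are summed into a single
    contest -> choice -> count table. If the pooled group is itself small,
    this step alone may not be enough to protect voter privacy.
    """
    style_sizes = Counter(r["style"] for r in records)
    kept: List[Record] = []
    pooled: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        if style_sizes[r["style"]] >= threshold:
            kept.append(r)
        else:
            for contest, choice in r["votes"].items():
                pooled[contest][choice] += 1
    return kept, {contest: dict(tally) for contest, tally in pooled.items()}


if __name__ == "__main__":
    records = [{"style": "A", "votes": {"Mayor": "Alice"}} for _ in range(12)]
    records += [
        {"style": "B", "votes": {"Mayor": "Bob"}},
        {"style": "B", "votes": {"Mayor": "Alice", "Measure A": "Yes"}},
    ]
    kept, pooled = aggregate_rare_styles(records)
    print(len(kept), pooled)
```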


docs/cloud-testing-deployment-plan.md

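Notes on cloud testing and deployment for running Arlo in a test environment: Heroku (fully supported), VPS, and Docker (currently absent) deployment options, plus Cypress end-to-end and Artillery load-testing tooling.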


Originally drafted with Copilot. Developed with Claude Code.

Adds docs/transparency-report.md — an analysis of what Arlo currently
exports at each phase of a comparison audit, where the transparency
gaps are, and prioritized recommendations for closing them.

Covers three audit phases (pre-seed, post-seed/pre-comparisons,
post-audit), identifies 12 prioritized recommendations, and includes a
summary table mapping each transparency need to current Arlo status and
the gap.

Co-Authored-By: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-Authored-By: Neal McBurnett <nealmcb@gmail.com>

nealmcb and others added 6 commits June 20, 2026 17:00

Survey of Heroku (fully supported), VPS, and Docker (absent) deployment
options; Cypress E2E and Artillery load-testing tooling; fastest path
to a test instance using FLASK_ENV=development + nOAuth.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two-track plan: Track A (observer toolkit: pytest harness, official
export scripts, observer mechanical-verification scripts) and Track B
(Arlo improvements — JSON report, opportunistic contest risk levels,
sampler-inputs artifact, pre-seed hash bundle, per-jurisdiction phase
exports, UI transparency checklist, reproducibility bundle).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add the software-independence framing: computers verifying computers is
not sufficient; human observers physically present during audit board
sessions are the critical missing link.

Add A5 (generate_transcript.py): joins retrieval list with CVR to
produce a right-justified per-ballot transcript (matching the
rightJustifiedBallotList.pdf format) that observers follow silently
during sessions, marking any deviation from what the board says aloud.
No writing required — just listening and marking. Post-session
comparison against the Arlo audit report can be done entirely on paper.

Add blind-audit protocol: audit boards never see the CVR at all;
observer transcript must likewise never be shown to boards.

Add CVR anonymization context referencing loriinboulder/anonymize_cvr
and Branscomb et al. 2018.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tighten software-independence framing; add Arlo itself to the list of
systems that must not leak CVR interpretations to audit boards.

Remove the paragraph stating the excerpt generator uses the unredacted
CVR — resolved by Q5: excerpt generation must use the publicly-posted
(anonymized) CVR, not raw data from the Arlo server.

Revise open questions:
- Q1: redaction timing and overlap with selected ballots need design
- Q3: unaudited contests should prompt "address via other auditing"
- Q4 (was "auth"): reframed as data access — observers have no Arlo
  instance access; export flow must include public posting design
- Q5: answered — excerpt generator is an observer-side tool on public CVR
- Q6: closed — code audit confirmed the audit board UI does not expose
  CVR vote choices to boards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A5 (generate_excerpt.py): specify that the --cvr argument must be the
publicly-posted (anonymized) CVR, not a raw Arlo export. Add rare-style
redaction context to the missing-imprinted-ID warning (step 6).

A2 (export scripts): make explicit that public posting of each phase
bundle is a required workflow step, not optional — observers have no
Arlo instance access and depend on the public artifacts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- B3 reframed: sampler inputs artifact should include per-jurisdiction
  eligible ballot counts per contest (needed to verify opportunistic
  risk levels), not just a JSON repackaging of manifest data
- B7 removed: step-by-step observation is preferred over a single
  reproducibility bundle artifact
- B8 simplified: RFC 3161 specifics removed, approach left as TBD
- p-value → risk level throughout
- deviation → discrepancy throughout
- Observer Toolkit scripts: Arlo server library reuse is fine
- A2 renamed to Official Export Functions (not just scripts)
- Blind-audit principle description softened slightly
- Excerpt format updated (underscore separator, longer decorators)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>