
docs: add a diagram-heavy developer guide for the research codebase #1

Open: levon003 wants to merge 1 commit into main from claude/research-docs-diagrams-9p9xw4

Conversation

@levon003

Owner

What

Adds a docs/ folder that documents HealthBlogRec at the architecture level rather than the docstring level — the goal being to make the choices, layout, and shortcomings of this 2021 research code easy to understand for someone reading the paper and the source side by side.

Per the brief: fewer docstrings, more Mermaid diagrams in Markdown summaries, with candid notes throughout about what more modern approaches would recommend.

New files

| File | Contents |
| --- | --- |
| docs/README.md | Orientation, repo map, suggested reading order, one-paragraph summary |
| docs/glossary.md | The project's vocabulary (USP, initiation, eligible/existing/active, triple, coverage), with a state diagram. Read this first; nothing else parses without it. |
| docs/architecture.md | The whole system on one page (5-stage flow), the two entry points, the end-to-end offline experiment loop, and a package map |
| docs/data-pipeline.md | The heart of the project: the timestamp-ordered "replay history one interaction at a time" simulation that produces training triples and test contexts, the stateful machinery (eligibility/graph/activity), the async writer, and feature dedup |
| docs/modeling.md | The 1563-d feature vector layout, the model zoo (LinearNet/SimNet/ConcatNet/LearnedSimNet/InteractionNet), the training loop, baselines, and the cached offline-evaluation trick used for hyperparameter sweeps |
| docs/modernization.md | Consolidated, candid "what would you do today?" notes, plus a short "what aged well" section |

Also links the guide from the top-level README.md.
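For readers who have not opened docs/data-pipeline.md yet, the "replay history one interaction at a time" idea can be sketched roughly as below. This is not the repository's code. Every name here (`Interaction`, `replay`, `seen`) is hypothetical, and the triple layout of (source, target, sampled negative) is an assumption made only for illustration. The one point it tries to show is the ordering: state is updated after an example is emitted, so each example is built from the past only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record; the real pipeline tracks far more state."""
    timestamp: int
    source: str  # who initiated the interaction (assumed field)
    target: str  # who received it (assumed field)


def replay(interactions, seed=0):
    """Return (source, target, negative) triples in timestamp order.

    The triple layout is an assumption for illustration. State is
    updated only AFTER the triple for an interaction is emitted, so
    each training example is built from strictly earlier history.
    """
    rng = random.Random(seed)
    seen = set()  # crude stand-in for the eligibility/graph/activity state
    triples = []
    for item in sorted(interactions, key=lambda i: i.timestamp):
        # Negatives come only from parties already seen in the past.
        candidates = sorted(seen - {item.source, item.target})
        if candidates:
            triples.append((item.source, item.target, rng.choice(candidates)))
        # Only now does the current interaction become part of history.
        seen.update((item.source, item.target))
    return triples


if __name__ == "__main__":
    history = [
        Interaction(3, "c", "a"),
        Interaction(1, "a", "b"),
        Interaction(2, "b", "c"),
    ]
    for triple in replay(history):
        print(triple)
```

The first interaction yields no triple because there is no earlier history to sample a negative from, which is the kind of cold-start edge a streaming generator has to handle explicitly.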

Approach & accuracy

  • Documentation is grounded in the actual source, with file/line references throughout (e.g. the test-time target fallback whose own comment admits "random might literally be better" at reccontext.py:165, the "FIXME is this reasonable?" cache resize, the reconstructed amp timestamps, and the pointwise-BCE-vs-ranking-metrics mismatch).
  • The "🕰️ Modern take" call-outs are framed as orientation, not bug reports — they explicitly acknowledge the original constraints (fixed dataset, single Slurm cluster, paper deadline, 2021 tooling).
  • All 14 Mermaid diagrams were validated against the mermaid parser (with a JSDOM backend) — 0 parse failures.
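As a rough companion to the diagram check mentioned above, the sketch below shows one way to pull Mermaid blocks out of the Markdown files and count them before handing each block to a real parser. It is an assumption about tooling, not the script used for this PR: the function names and the `docs` default path are hypothetical, and it does no syntax validation itself (that still needs the mermaid parser).

```python
import re
from pathlib import Path

# Build the fence marker programmatically so this example can itself
# sit inside a fenced block without confusing Markdown renderers.
FENCE = "`" * 3

# A block opens with a line like <fence>mermaid and ends at the next
# line that consists only of a fence.
MERMAID_BLOCK = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_mermaid_blocks(markdown_text: str) -> list[str]:
    """Return the body of every Mermaid fenced block, in document order."""
    return [match.group(1) for match in MERMAID_BLOCK.finditer(markdown_text)]


def count_by_file(docs_dir: str = "docs") -> dict[str, int]:
    """Map each Markdown file name in docs_dir to its Mermaid block count."""
    return {
        path.name: len(extract_mermaid_blocks(path.read_text(encoding="utf-8")))
        for path in sorted(Path(docs_dir).glob("*.md"))
    }


if __name__ == "__main__":
    counts = count_by_file()
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("total:", sum(counts.values()))
```

Comparing the printed total with the number of diagrams the parser reports is a cheap way to notice a block that was skipped rather than parsed.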

Docs-only change; no code is touched.

https://claude.ai/code/session_018fRrzqPsGMHL3roZ2E3gVq


Generated by Claude Code

Add a docs/ folder that documents the system at the architecture level
rather than the docstring level: how the pieces fit together, the
project-specific vocabulary, the streaming data-generation pipeline, the
model zoo and offline evaluation, and candid notes on what modern
practice would recommend.

 - docs/README.md       orientation + repo map + reading order
 - docs/glossary.md     USP / initiation / eligible-existing-active /
                        triple / coverage, with a state diagram
 - docs/architecture.md whole-system overview, entry points, experiment
                        loop, package map (Mermaid)
 - docs/data-pipeline.md the timestamp-ordered replay that produces
                        training triples and test contexts, plus the
                        async writer and feature dedup
 - docs/modeling.md     1563-d feature vector, model zoo, training loop,
                        baselines, cached offline evaluation, sweeps
 - docs/modernization.md consolidated "what would you do today" notes

All 14 Mermaid diagrams validated with the mermaid parser. Link the new
guide from the top-level README.

https://claude.ai/code/session_018fRrzqPsGMHL3roZ2E3gVq
