CoReasoning

Teaching and assessing the skill of working with generative AI, decomposed into three independently assessable competencies: Framing, Judging, and Steering.

Most AI-in-education tools optimize for speed-to-answer. CoReasoning optimizes for quality-of-thinking: learners improve intentionally imperfect AI output through structured reasoning loops, and the three skills are scored separately.

📄 Paper (open-source, MIT)

"Framing, Judging, Steering: An Assessable Competency Model for Teaching Students to Reason With Generative AI" by Alexander Apartsin (Holon Institute of Technology) and Yehudit Aperstein (Afeka College of Engineering).

A theoretically grounded competency model with a proof-of-concept instrument and a feasibility demonstration that the three skills dissociate. Own-skill grade effects average +1.02 versus +0.01 for cross-skill effects (N=80 across 10 subjects), and the cleanest pair, Framing and Judging, is uncorrelated (ρ=−0.03, ns). The result replicates across grader backends, and the grader is 92% self-consistent. No learning-outcome claims are made; a human-rater validation study is prepared but has not yet been run.
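The own-skill versus cross-skill contrast can be illustrated with a small sketch. It assumes an "own-skill effect" is the change in a skill's grade when that same skill is manipulated, and a "cross-skill effect" is the change in any other skill's grade. The matrix values and the `dissociation_contrast` helper are hypothetical, chosen for illustration; they are not the paper's data or the repository's analysis code.

```python
from statistics import mean

SKILLS = ["Framing", "Judging", "Steering"]

# Hypothetical effect matrix (illustrative numbers, NOT the paper's data):
# effects[i][j] = change in the grade for skill j when the quality of
# skill i is manipulated.
effects = [
    [1.0, 0.0, 0.1],
    [0.0, 0.9, -0.1],
    [0.1, 0.0, 1.1],
]

def dissociation_contrast(matrix):
    """Return (mean own-skill effect, mean cross-skill effect)."""
    n = len(matrix)
    own = [matrix[i][i] for i in range(n)]  # diagonal cells
    cross = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(own), mean(cross)

own_mean, cross_mean = dissociation_contrast(effects)
print(f"own-skill mean:   {own_mean:+.2f}")
print(f"cross-skill mean: {cross_mean:+.2f}")
```

A large diagonal mean next to a near-zero off-diagonal mean is the pattern the dissociation claim describes.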

📖 Read it (HTML + KaTeX): https://apartsinprojects.github.io/CoReason/
📝 Word: docs/coreasoning.docx · Source: paper/coreasoning.md
📚 Bibliography: paper/references.bib (55 entries, validated)
✉️ Submission package: paper/SUBMISSION/ (cover letter + anticipated reviewer response)

The framework

Framing: turn an ill-defined problem into a well-specified task before invoking AI.
Judging: critically evaluate AI output for errors, gaps, unstated assumptions, and risk.
Steering: iteratively redirect the AI toward a better solution across cycles.

The defining move is separating the pre-generation skill (Framing) from the post-generation corrective skill (Steering), which prior frameworks fuse under "prompting." Each skill is grounded in established theory (metacognitive monitoring and control, self-regulated learning, epistemic vigilance, productive struggle), and the model states five testable propositions about how the skills relate.
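As a minimal sketch of what "scored separately" could look like in code, the three competencies can be kept as distinct fields rather than collapsed into one grade. The `Skill` and `SkillScores` names and the numeric scale are assumptions made for illustration, not types from the repository.

```python
from dataclasses import dataclass
from enum import Enum

class Skill(Enum):
    FRAMING = "framing"    # pre-generation: specify the task
    JUDGING = "judging"    # post-generation: evaluate the output
    STEERING = "steering"  # post-generation: redirect across cycles

@dataclass(frozen=True)
class SkillScores:
    """One learner attempt, with each competency scored on its own."""
    framing: float
    judging: float
    steering: float

    def get(self, skill: Skill) -> float:
        # Look up the field that matches the enum value.
        return getattr(self, skill.value)

# Hypothetical scores on an arbitrary scale.
attempt = SkillScores(framing=4.0, judging=2.0, steering=3.0)
weakest = min(Skill, key=attempt.get)
print(weakest.name)  # JUDGING
```

Keeping the fields apart makes it possible to report the weakest competency for an attempt instead of a single blended score.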

The instrument and CoReasoning Lab

CoReasoning Lab is a prototype learning platform that auto-generates ill-defined problems with seeded flaws, presents deliberately imperfect AI output, runs judge-and-steer cycles, and scores the three skills with rubric-driven LLM evaluators. This repository releases and evaluates the scoring engine: the sixteen prompts at code/artifacts/prompt-debug/originals/*.yaml plus the controlled-generation harness in research/. The interface figures in the paper are representative mockups; static UI mockups live in screens/.
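A quick way to confirm that a checkout contains the full instrument is to count the YAML prompt files. The sketch below uses only the path and the count stated above; the `list_prompt_files` and `check_instrument` helpers are hypothetical and not part of the repository.

```python
from pathlib import Path

PROMPT_DIR = Path("code/artifacts/prompt-debug/originals")
EXPECTED_PROMPTS = 16  # "sixteen prompts" per this README

def list_prompt_files(repo_root: Path) -> list[Path]:
    """Return the instrument's YAML prompt files, sorted by name."""
    return sorted((repo_root / PROMPT_DIR).glob("*.yaml"))

def check_instrument(repo_root: Path) -> bool:
    """True if the checkout contains the expected number of prompts."""
    return len(list_prompt_files(repo_root)) == EXPECTED_PROMPTS

if __name__ == "__main__":
    files = list_prompt_files(Path("."))
    for path in files:
        print(path.name)
    print(f"{len(files)} prompt files found (expected {EXPECTED_PROMPTS})")
```

Run it from the repository root; a count other than sixteen suggests an incomplete checkout.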

Repository map

paper/            manuscript (coreasoning.md), references.bib, figures, build_html.py, SUBMISSION/, reviews/
docs/             rendered site: index.html (KaTeX), coreasoning.docx, assets/ (served via GitHub Pages)
research/
  scripts/        reproducible harness: harness.py, e3_dissociation.py, e3_expand.py,
                  e2_reliability.py, e1_analysis.py, make_figures.py, render_bib.py, batch_openai.py
  results/        grades CSVs + analysis JSONs (N=80 + ablations + robustness)
  experiments/    registry: PROJECT_LOG.md, INDEX.md
  DATASHEET.md    dataset documentation (Gebru-style)
human-study/      prepared human-rater study: CODEBOOK.md, score_agreement.py
code/artifacts/prompt-debug/originals/   the sixteen-prompt scoring instrument (YAML)
screens/          static UI mockups of the prototype

Reproduce

API keys live outside the repo (never committed). Provide them in ~/.config/coreason/.env.all (or set COREASON_ENV_FILE), with OPENAI_API_KEY and/or OPENROUTER_API_KEY / GROQ_API_KEY.
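The key lookup described above might work roughly as follows. This is a sketch under two assumptions that the README does not confirm: that COREASON_ENV_FILE holds the path to an alternative env file, and that the file uses plain KEY=VALUE lines. The helper names are hypothetical; the harness's actual loader may differ.

```python
import os
from pathlib import Path

DEFAULT_ENV_FILE = Path.home() / ".config" / "coreason" / ".env.all"

def resolve_env_file() -> Path:
    """Pick the env file: COREASON_ENV_FILE if set, else the default path."""
    override = os.environ.get("COREASON_ENV_FILE")  # assumed to be a path
    return Path(override).expanduser() if override else DEFAULT_ENV_FILE

def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

def load_keys() -> dict[str, str]:
    """Load keys from the env file without overriding existing variables."""
    path = resolve_env_file()
    if not path.is_file():
        return {}
    loaded = parse_env_text(path.read_text(encoding="utf-8"))
    for key, value in loaded.items():
        os.environ.setdefault(key, value)
    return loaded
```

Using `setdefault` means a key already exported in the shell takes precedence over the file, which keeps one-off overrides simple.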

# dissociation (N=80 across 10 subjects) + analyses
COREASON_GRADER=openai:gpt-4o python research/scripts/e3_dissociation.py 5   # 5 subjects
COREASON_GRADER=openai:gpt-4o python research/scripts/e3_expand.py           # +5 subjects -> N=80
python research/scripts/e1_analysis.py research/results/e3_dissociation_grades.csv
COREASON_GRADER=openai:gpt-4o python research/scripts/e2_reliability.py      # grader test-retest
python research/scripts/make_figures.py                                     # regenerate figures
python paper/build_html.py                                                  # rebuild docs/index.html

A deterministic disk cache (research/data/llm_cache/) makes runs reproducible and free on re-run. See research/README.md and research/experiments/INDEX.md for the full experiment list.
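The idea behind a deterministic disk cache can be sketched as follows: hash everything that determines a response, and reuse the stored file when the same request recurs. The key scheme, the JSON file layout, and the `cache_key` and `cached_call` names are assumptions for illustration; the harness's actual cache format is not documented here.

```python
import hashlib
import json
from pathlib import Path

def cache_key(model: str, prompt: str, params: dict) -> str:
    """Stable hash of everything that determines the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True, ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_call(cache_dir: Path, model: str, prompt: str, params: dict, call):
    """Return the cached response if present; otherwise call and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(model, prompt, params)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = call(model, prompt, params)  # the only step that hits the API
    path.write_text(json.dumps(response, ensure_ascii=False), encoding="utf-8")
    return response
```

Because `sort_keys=True` fixes the serialization order, the same request always maps to the same file, so a re-run reads from disk instead of calling the provider again.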

Status

The conceptual paper and the feasibility demonstration (E1 construct validity, E2 reliability, E3 dissociation at N=80, E4 ground-truth ablation, grader-backend robustness, a harsher-steering ablation, and per-subject breakdowns) are complete. The one open item is the human-rater agreement study, fully prepared in human-study/ and requiring human graders. No learning-outcome claims are made.

License

MIT. The system design, the instrument, the harness, the data, and the figures are open-source. The hero image was generated with the Google Gemini Image API.
