Statistical analysis pipeline for the Human-AI Reliance study (SHARE Lab, University of Waterloo). Processes engagement metrics from the Supabase chat database, identifies non-engagers for sensitivity filtering, and runs the primary/secondary MLM analyses on observed and perceived reliance.
Post-Study-Analysis/
├── HAI_analysis_directionalR_revisionFF_MLM_bootstrap.ipynb ← main analysis notebook
├── schema.sql ← Supabase DDL (context only)
├── requirements.txt
├── README.md
│
├── scripts/
│ ├── fix_duplicates.py # Step 1: dedup raw Supabase exports → merged/
│ ├── engagement_metrics.py # Step 2: compute turns/words/duration → analysis/
│ ├── generate_scoring_yaml.py # Step 3: per-scenario YAML for qualitative scoring
│ └── engagement_thresholds.py # Step 4: exclusion lists + distribution plots
│
├── data/ (gitignored — participant data)
│ ├── comprehensive_theme_file2.csv ← Qualtrics + theme cosine scores
│ ├── comprehensive_language_file2.csv ← Qualtrics + LIWC cosine scores
│ ├── comprehensive_detail_lengths2.csv ← Qualtrics + word/char counts
│ ├── comprehensive_stance_file.csv ← Qualtrics + stance classifications
│ ├── unified_theme_analysis_results_3.xlsx ← combined theme + perceived reliance
│ ├── H-AI_Subjectivity_Study_High_Interaction_filtered.csv ← Qualtrics export, high cond.
│ ├── H-AI_Subjectivity_Study_Low_Interaction_filtered.csv ← Qualtrics export, low cond.
│ ├── H-AI_Subjectivity_Study_Baseline_filtered.csv ← Qualtrics export, baseline
│ └── ai_conflicts_high.xlsx ← AI vs. participant stance conflicts
│
├── raw/ (gitignored — Supabase row exports)
│ ├── conversations_rows.csv
│ ├── messages_rows.csv
│ └── participants_rows.csv
│
├── merged/ (gitignored — fix_duplicates.py output)
│ ├── conversations_merged.csv
│ ├── messages_merged.csv
│ └── participants_merged.csv
│
├── analysis/ (gitignored — engagement_metrics.py + threshold outputs)
│ ├── messages_clean.csv
│ ├── conversation_engagement_metrics.csv
│ ├── conversation_transcripts_for_scoring.csv
│ ├── engagement_thresholds.txt
│ ├── hard_exclusions_test_participants.csv
│ ├── threshold_turns2_words20.csv
│ ├── threshold_turns2_words30.csv
│ ├── exclude_words20.csv ← USE THIS for sensitivity run A
│ ├── exclude_words30.csv ← USE THIS for sensitivity run B
│ ├── prolific_to_response_mapping.csv ← Qualtrics ResponseId ↔ Prolific ID (high cond.)
│ ├── scoring/ (gitignored — per-scenario YAML, all conversations)
│ ├── scoring_words20/ (gitignored — non-engagers removed)
│ └── scoring_words30/
│
├── outputs_updated_directionalR_revisionFF/ (gitignored — notebook output CSVs)
│
└── backups/ (gitignored — pg_dump archive)
HAI_analysis_directionalR_revisionFF_MLM_bootstrap.ipynb — the main statistical analysis. Run all cells in order.
Required input files (gitignored, place in the paths shown):
| File | Path | Source |
|---|---|---|
| `comprehensive_theme_file2.csv` | `data/` | Qualtrics + theme cosine scores |
| `comprehensive_language_file2.csv` | `data/` | Qualtrics + LIWC cosine scores |
| `comprehensive_detail_lengths2.csv` | `data/` | Qualtrics + word/char counts |
| `comprehensive_stance_file.csv` | `data/` | Qualtrics + stance classifications |
| `unified_theme_analysis_results_3.xlsx` | `data/` | Combined theme + perceived reliance (needed for perceived reliance cells 25–31) |
| `H-AI_Subjectivity_Study_*_filtered.csv` | `data/` | Raw Qualtrics exports (all 3 conditions) |

| Cells | Section |
|---|---|
| 1 | Imports, paths, output directory |
| 2 (helpers) | MixedLM fitter, bootstrap CI, Cohen's d |
| 3 (load) | Load CSVs, compute R and FF per modality |
| 3b (filter) | Engagement filter — removes non-engager (response_id, scenario) pairs |
| 4 | Primary: directional reliance R, low vs. high, Holm across 3 modalities |
| X | Alternative reliance (FinalAI − FirstFinal) |
| 5 | Secondary: revision magnitude FF omnibus (baseline/low/high) |
| 6 | Post-hoc FF pairwise contrasts + bootstrap CIs |
| 5/6 (FinalAI) | Alignment to AI final response |
| 7–master | Sensitivity checks (MWU, participant aggregation) |
| 25–31 | Perceived reliance — requires unified_theme_analysis_results_3.xlsx |
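
The Holm correction applied across the three modalities in Cell 4 can be illustrated with a small standalone sketch. This is not the notebook's code (which may well call a library routine); the function name `holm_adjust`, the modality labels, and the p-values are made up for illustration.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values, one per modality (labels are placeholders).
raw = {"modality_1": 0.01, "modality_2": 0.04, "modality_3": 0.03}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
print(adjusted)
```

A modality is declared significant when its adjusted p-value falls below the chosen alpha. Because the family here has only three tests, the smallest p-value is multiplied by 3, the next by 2, and the largest by 1.
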
Approved exclusion criterion (Sharon, 2026-06-03): turns < 2 AND total words < 20 in the high-interactivity chat condition.
- Source list: `analysis/exclude_words20.csv` (28 `(prolific_id, scenario_id)` pairs)
- ID mapping: `analysis/prolific_to_response_mapping.csv`, which maps high-condition Prolific IDs to Qualtrics `ResponseId` (built from the `Q84` column in the high-interaction Qualtrics export)
- Result: 21 `(response_id, scenario)` pairs removed; 7 pairs had no survey response and were already absent from the input CSVs
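
The filtering step described above amounts to translating excluded Prolific IDs into Qualtrics response IDs and then anti-joining against the outcome rows. The following is a minimal sketch on toy data, not the notebook's Cell 3b; the outcome table and all column names other than `prolific_id`, `scenario_id`, and `response_id` are assumptions.

```python
import pandas as pd

# Toy stand-ins for the real files; values and the outcome columns are hypothetical.
exclude = pd.DataFrame({            # plays the role of analysis/exclude_words20.csv
    "prolific_id": ["p1", "p2", "p3"],
    "scenario_id": [1, 2, 1],
})
mapping = pd.DataFrame({            # plays the role of prolific_to_response_mapping.csv
    "prolific_id": ["p1", "p2"],    # p3 has no survey response, so it has no mapping row
    "response_id": ["R_a", "R_b"],
})
outcomes = pd.DataFrame({           # hypothetical row-level outcome table
    "response_id": ["R_a", "R_a", "R_b", "R_c"],
    "scenario": [1, 2, 2, 1],
    "R": [0.1, 0.2, 0.3, 0.4],
})

# Translate Prolific IDs to response IDs; the inner join drops unmapped pairs.
pairs = (
    exclude.merge(mapping, on="prolific_id", how="inner")
    .rename(columns={"scenario_id": "scenario"})[["response_id", "scenario"]]
    .drop_duplicates()
)

# Anti-join: keep only outcome rows whose (response_id, scenario) pair is not excluded.
flagged = outcomes.merge(pairs, on=["response_id", "scenario"], how="left", indicator=True)
filtered = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

print(f"{len(pairs)} pairs matched, {len(outcomes) - len(filtered)} rows removed")
```

The inner join mirrors the situation noted in the list: excluded pairs with no survey response never reach the outcome table, so fewer pairs are removed than appear in the source list.
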

To regenerate the mapping from the Qualtrics export:

    import pandas as pd

    # Load the high-interaction Qualtrics export; Q84 holds the Prolific ID.
    df = pd.read_csv("data/H-AI_Subjectivity_Study_High_Interaction_filtered.csv")
    mapping = df[["ResponseId", "Q84"]].rename(columns={"ResponseId": "response_id", "Q84": "prolific_id"})
    # Keep only 24-character Prolific IDs, one row per participant.
    mapping = mapping[mapping["prolific_id"].str.len() == 24].drop_duplicates("prolific_id")
    mapping.to_csv("analysis/prolific_to_response_mapping.csv", index=False)

The scripts in `scripts/` run separately from the main notebook. They process the raw Supabase exports to compute per-conversation engagement metrics and identify non-engagers.
raw/*.csv → fix_duplicates.py → merged/*.csv
↓
engagement_metrics.py → analysis/
↓
generate_scoring_yaml.py → analysis/scoring*/
↓
engagement_thresholds.py → analysis/exclude_words*.csv
Run from the repo root (with `.venv` activated):

    source .venv/bin/activate
    python scripts/fix_duplicates.py
    python scripts/engagement_metrics.py
    python scripts/generate_scoring_yaml.py
    python scripts/engagement_thresholds.py

Exclusion files produced:
| File | Criterion | Pairs excluded |
|---|---|---|
| `analysis/exclude_words20.csv` | turns < 2 AND words < 20, plus test participants | 28 |
| `analysis/exclude_words30.csv` | turns < 2 AND words < 30, plus test participants | 43 |

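
The two criteria in the table differ only in the word cutoff. A minimal sketch of the rule on toy data follows; the helper `non_engagers` and the column names `turns` and `total_words` are assumptions, not the actual interface of `engagement_thresholds.py`, and the test-participant exclusions are not shown.

```python
import pandas as pd

# Toy per-conversation metrics; values and column names are hypothetical.
metrics = pd.DataFrame({
    "prolific_id": ["p1", "p2", "p3", "p4"],
    "scenario_id": [1, 1, 2, 2],
    "turns": [1, 1, 3, 1],
    "total_words": [12, 25, 8, 40],
})


def non_engagers(df, max_words, min_turns=2):
    """Return (prolific_id, scenario_id) pairs with turns < min_turns AND words < max_words."""
    mask = (df["turns"] < min_turns) & (df["total_words"] < max_words)
    return df.loc[mask, ["prolific_id", "scenario_id"]].reset_index(drop=True)


words20 = non_engagers(metrics, max_words=20)  # stricter definition of a non-engager
words30 = non_engagers(metrics, max_words=30)  # looser definition, flags more pairs

print(len(words20), len(words30))
```

Both conditions must hold for a pair to be flagged, so a long single-turn conversation or a short multi-turn one is retained. Raising the cutoff from 20 to 30 words can only add pairs to the exclusion list, never remove them.
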
Python 3.10+. Install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

No participant data is committed to this repo. All `*.csv`, `*.xlsx`, `*.gz`, `*.backup`, and `analysis/scoring*/` paths are gitignored because they contain Prolific IDs, IP addresses, and full transcripts. Request the Supabase export and Qualtrics exports from the study authors.
- Per-table CSV exports (Supabase dashboard → table → Export to CSV): suffix `_rows.csv`. Feed to `fix_duplicates.py`.
- `pg_dump` cluster backup (`.backup.gz`): not consumed by scripts. Kept in `backups/` for disaster recovery.
All output CSVs land in outputs_updated_directionalR_revisionFF/ (gitignored). Key files:
| File | Contents |
|---|---|
| `all_mod.csv` | Combined outcomes (R, FF) before engagement filter |
| `all_mod_filtered.csv` | Combined outcomes (R, FF) after engagement filter applied |
| `primary_directionalR_results.csv` | Primary family: R low vs. high, Holm-corrected (Cell 4) |
| `rowlevel_alt_reliance_finalAI_minus_firstFinal.csv` | Alternative reliance row-level data (Cell X) |
| `secondary_revisionFF_omnibus.csv` | Secondary family: FF omnibus across all conditions, Holm-corrected (Cell 5) |
| `secondary_revisionFF_posthoc_pairwise.csv` | Post-hoc FF pairwise contrasts + bootstrap CIs (Cell 6) |
| `SENS_A_directionalR_MWU.csv` | Sensitivity A: non-parametric R low vs. high (MWU) |
| `SENS_B_relativeAlignment_RA_MWU.csv` | Sensitivity B: relative alignment (RA) MWU |
| `SENS_C_finalAI_FA_MWU.csv` | Sensitivity C: final-AI alignment (FA) MWU |
| `SENS_D1_revisionFF_KW_omnibus.csv` | Sensitivity D1: FF Kruskal-Wallis omnibus |
| `SENS_D2_revisionFF_pairwise_MWU.csv` | Sensitivity D2: FF pairwise MWU contrasts |
| `perceived_by_condition_means.csv` | Perceived reliance means by condition (low/high) |
| `perceived_by_condition_models.csv` | Perceived reliance MLM: condition effect |
| `perceived_by_observedR_models.csv` | Perceived reliance MLM: observed R as predictor |