Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

This repository contains the official codebase, pre-trained weights, and evaluation environments for the preprint: "Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO". We provide a minimal, standalone reproducible example (MRE) using standard MLPs on LunarLander-v2 to demonstrate the pathology of surrogate hacking and our proposed solution.

Paper

Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

arXiv: https://arxiv.org/abs/2604.13517

TL;DR

We identify and formalize two severe optimization pathologies in multi-timescale RL: Surrogate Objective Hacking (exploiting short-term shaping rewards at the expense of the true objective) and the Paradox of Temporal Uncertainty (irreversible myopic degeneration caused by gradient-free variance routing).

To overcome these fundamental vulnerabilities, we introduce Target Decoupling, a novel architectural and algorithmic intervention that disentangles representation learning from temporal routing, allowing the agent to align with the true long-term objective (γ = 0.999) without collapsing into short-term behavioral traps.
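To make the timescales concrete: a discount factor γ weights a reward k steps ahead by γ^k, so the usual rule of thumb for the effective horizon is 1/(1 - γ). The sketch below is not taken from the repository. The shorter discount factors (0.9, 0.99), the episode length, and the toy reward sequences are assumptions chosen purely for illustration. It shows how a small per-step shaping reward can outweigh a delayed terminal bonus under a short horizon, while γ = 0.999 keeps the terminal bonus visible.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb number of steps over which rewards matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def discounted_return(rewards, gamma: float) -> float:
    """Sum of gamma**t * r_t, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Toy episodes (illustrative numbers, NOT measurements from the paper):
# "hover" collects a small shaping reward each step and never lands;
# "land" pays a small fuel cost each step, then receives a terminal bonus.
STEPS = 300
hover = [0.1] * STEPS
land = [-0.05] * (STEPS - 1) + [100.0]

for gamma in (0.9, 0.99, 0.999):
    print(
        f"gamma={gamma}: horizon~{effective_horizon(gamma):.0f}, "
        f"hover={discounted_return(hover, gamma):.2f}, "
        f"land={discounted_return(land, gamma):.2f}"
    )
```

Under these toy numbers the hovering episode scores higher at γ = 0.9 and γ = 0.99, and the landing episode scores higher only at γ = 0.999.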

Visual Proof: The Ablation Journey

The core contribution of this work is isolating and systematically solving the pathologies of multi-timescale learning. The contrast between Stage 1 and Stage 4 is the clearest: the baseline hovers to avoid crashing while it collects small centering rewards, whereas the decoupled agent commits to a fuel-efficient landing.

Stage 1: Baseline (Hovering and Wasting Fuel). The agent falls into a local optimum. To avoid crashing, it hovers indefinitely, burning fuel and failing the main objective while it collects small, short-term shaping rewards for centering.

Stage 2: Surrogate Hacking (Surrogate Objective Hacking). Routing values dynamically across timescales via Actor-driven attention leads to gradient exploitation. The policy collapses because it minimizes the surrogate loss by manipulating attention weights rather than by improving physical control.

Stage 3: Temporal Paradox (Aimless Wandering). The agent suffers from temporal uncertainty. Unable to confidently attribute credit over long horizons, it fails to commit to a landing strategy and wanders above the landing pad.

Stage 4: Target Decoupling (Intelligent Landing). With the target decoupled, the agent optimizes the long-term goal (γ = 0.999) and executes a fuel-efficient, safe landing, accepting an imperfectly centered touchdown when that saves fuel.
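The Stage 2 failure mode can be illustrated with a few lines of arithmetic. The sketch below is a toy built on assumptions, not the code in 2_surrogate_hacking_attention.py: it supposes that the advantage fed to the policy is a softmax-weighted mix of per-timescale advantages and that the mixing logits are trained through the surrogate objective. The names (`mixed_advantage`, `logit_gradient`, `logits`) and all numbers are hypothetical. With the probability ratio held at 1, meaning the agent's behaviour does not change, gradient ascent on the logits alone still raises the surrogate value.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_advantage(logits, advantages):
    """Attention-weighted advantage: sum_k w_k * A_k with w = softmax(logits)."""
    weights = softmax(logits)
    return sum(w * a for w, a in zip(weights, advantages))


def logit_gradient(logits, advantages):
    """d(mixed advantage)/d(logit_k) = w_k * (A_k - mixed advantage)."""
    weights = softmax(logits)
    mixed = sum(w * a for w, a in zip(weights, advantages))
    return [w * (a - mixed) for w, a in zip(weights, advantages)]


# Hypothetical per-timescale advantages for one fixed action:
# the short horizon looks good (shaping reward), the long horizon looks bad.
advantages = [1.0, 0.2, -0.5]
logits = [0.0, 0.0, 0.0]
ratio = 1.0  # pi_new / pi_old held fixed: the behaviour itself never changes

before = ratio * mixed_advantage(logits, advantages)
for _ in range(200):
    grad = logit_gradient(logits, advantages)
    logits = [l + 0.5 * g for l, g in zip(logits, grad)]  # plain gradient ascent
after = ratio * mixed_advantage(logits, advantages)

print(f"surrogate before: {before:.3f}, after: {after:.3f}")
print("final weights:", [round(w, 3) for w in softmax(logits)])
```

The surrogate rises only because weight moves toward the timescale with the most flattering advantage, which is the kind of shortcut the Stage 2 description refers to.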

Repository Structure

The repository mirrors our 4-stage ablation study. Each stage is a standalone script that uses only standard MLPs, for clarity and ease of reproducibility.

.
├── 1_baseline.py                      # Stage 1: Standard PPO Baseline
├── 2_surrogate_hacking_attention.py   # Stage 2: Introduction of multi-timescale collapse
├── 3_temporal_paradox_variance.py     # Stage 3: Attempted variance reduction
├── 4_target_decoupling_final.py       # Stage 4: Proposed Target Decoupling architecture
├── 5_evaluate_seeds_plot.py           # Multi-seed evaluation and plotting script
├── record_1_baseline.py               # Evaluation script for Stage 1
├── record_2_surrogate.py              # Evaluation script for Stage 2
├── record_3_paradox.py                # Evaluation script for Stage 3
├── record_4_decoupling.py             # Evaluation script for Stage 4
├── weights_stage_1.pth                # Pre-trained weights for Baseline
├── weights_stage_2.pth                # Pre-trained weights for Surrogate Hacking
├── weights_stage_3.pth                # Pre-trained weights for Temporal Paradox
├── weights_stage_4.pth                # Pre-trained weights for Target Decoupling
└── docs/                              # Assets (GIFs, etc.)
    ├── baseline_hovering.gif
    ├── seed_comparison_plot.png
    ├── surrogate_hacking_crash.gif
    ├── temporal_paradox_wandering.gif
    └── target_decoupling_landing.gif

Quick Start

The pre-trained models can be evaluated with the commands below.

Install Dependencies
```
pip install -r requirements.txt
```
Evaluate the Proposed Solution (Stage 4)
Watch the Target Decoupling agent solve the environment:
```
python record_4_decoupling.py
```
Observe the Baseline Pathology (Stage 1)
For contrast, watch the baseline agent hover and waste fuel:
```
python record_1_baseline.py
```
Multi-Seed Evaluation
Run the full comparison across 5 random seeds to reproduce the statistical significance plots:
```
python 5_evaluate_seeds_plot.py
```

Note: You can run any of the standalone stage scripts (1_baseline.py through 4_target_decoupling_final.py) to train that stage from scratch.

Statistical Significance

To validate our claims, we evaluate the Target Decoupling architecture against the Baseline over five random seeds (n=5). Across these seeds, the Target Decoupling agent consistently solves the environment with low variance and avoids the hovering local optimum seen in the Baseline.
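When reading a multi-seed comparison, it helps to be explicit about how per-seed results are summarized. The helper below is a generic sketch, not an excerpt from 5_evaluate_seeds_plot.py. The function name `summarize_seeds` and the example values are hypothetical placeholders, not results from the paper. It reports the mean and the sample standard deviation (n - 1 denominator), the usual spread estimate for a small sample such as n=5.

```python
import statistics


def summarize_seeds(returns_per_seed):
    """Return (mean, sample standard deviation) of per-seed mean returns."""
    if len(returns_per_seed) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(returns_per_seed)
    std = statistics.stdev(returns_per_seed)  # sample std, n - 1 denominator
    return mean, std


# Placeholder values only; substitute the per-seed returns you measure.
placeholder_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, std = summarize_seeds(placeholder_returns)
print(f"mean={mean:.2f}, std={std:.2f}, n={len(placeholder_returns)}")
```

With only five seeds the standard deviation is itself a noisy estimate, so plotting the individual seeds alongside the mean is a reasonable complement to the summary.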

Citation

If you find this code or our insights useful in your research, please consider citing our work:

@misc{sunRepresentationRoutingOvercoming2026b,
  title = {Representation over {{Routing}}: {{Overcoming Surrogate Hacking}} in {{Multi-Timescale PPO}}},
  shorttitle = {Representation over {{Routing}}},
  author = {Sun, Jing},
  year = 2026,
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2604.13517},
  urldate = {2026-04-16},
  copyright = {Creative Commons Attribution 4.0 International},
  keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
}
