PixelPolish

Pixel-level reinforcement learning for blind image enhancement — no ground truth, any modality.

PixelPolish trains a fully-convolutional policy that predicts a per-pixel tone-mapping curve y = clip(α · x^γ + β, 0, 1) and iteratively "polishes" an image over K=5 steps. The policy is optimized with PPO against a composite no-reference reward (CLIP-IQA / MUSIQ / gradient / entropy / EME) — no paired clean targets required. Because the state is just the image and the action is a generic tone curve, the same policy works on RGB, grayscale, and IR inputs.

Low-light RGB → Enhanced. IR → Enhanced. Same policy, no fine-tune.

Why pixel-level RL?

Classical regression	Diffusion / GAN	PixelPolish
Needs paired (degraded, clean) data	Needs large priors / distilled teachers	Needs only unpaired images
One model per degradation type	One model per modality	One model per everything
Deterministic output	Stochastic, often hallucinatory	Curve-based → minimally invasive, reversible

The policy is shared across pixels (FCN), so the receptive field of dilated convs lets each pixel condition its local action on neighborhood context — uniform or spatially varying, up to you.

Features

Modality-agnostic: train on a mix of RGB / grayscale / IR, evaluate anywhere.
No ground truth: composite NR-IQA reward (CLIP-IQA, MUSIQ) + physics-based (Sobel gradient, soft-histogram entropy, Agaian EME).
Two reward modes: scalar per-image (Phase 3) or per-pixel reward map with spatial GAE (Phase 4).
PPO by default, GRPO for ablation: swap via one config line.
Config-driven: configs/base.yaml is the single source of truth; configs/local_dev.yaml lets the same code run on a 6 GB laptop GPU.
Type-hinted, shape-asserted, pytest-covered: 30+ unit / integration tests, CPU-runnable.
Soft pyiqa dependency: the physics-only subset of rewards needs no external models; CLIP-IQA / MUSIQ / NIQE / BRISQUE auto-enable if pyiqa is installed.

Architecture at a glance

           ┌──────────────────────────── PolicyValueFCN ────────────────────────────┐
 x_t  ─▶   │ Conv3 → ReLU → [Dilated3 d=1,2,3,4 × ReLU]₄ → Conv3 → ReLU            │
 [B,C,H,W] │                  │                                    │                 │
           │                  ├──── 1×1 → μ        [B,3,H,W]       │                 │
           │                  ├──── 1×1 → log σ    [B,3,H,W]       │                 │
           │                  └──── 1×1 → V        [B,1,H,W]       │                 │
           └────────────────────────────────────────────────────────┘
                                         │ sample
                                         ▼
                          tanh + affine → (γ, α, β)
                                         │ apply_curve
                                         ▼
                                     x_{t+1}
                                         │ reward = R(x_{t+1}) − R(x_t)
                                         ▼
                              PPO / GRPO update

Core design decisions (more detail in CLAUDE.md):

Dimension	Choice
State	Current image only `[B, C, H, W]`
Action	Continuous per-pixel `[γ ∈ [0.3,3], α ∈ [0.5,2], β ∈ [-0.2,0.2]]`
Reward	Composite NR, relative `ΔR` per step
Episode	Fixed K=5, dense reward
Agent	Pixel-level, FCN parameter sharing (PixelRL-style)
Algorithm	PPO (primary) / GRPO (optional ablation)

Install

git clone https://github.com/<you>/pixelpolish.git
cd pixelpolish
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / remote
source .venv/bin/activate

pip install -r requirements.txt

Requires PyTorch ≥ 2.1 and, optionally, pyiqa for CLIP-IQA / MUSIQ / NIQE / BRISQUE.

Quick start

1. Smoke test (CPU or GPU, no data needed)

python -m scripts.smoke_forward --config configs/base.yaml --overrides configs/local_dev.yaml

Expected: tensor-shape dump + SMOKE TEST OK. This is the Phase-1 validation gate from CLAUDE.md.

2. Train

Remote GPU (RTX 4090, 24 GB):

python -m scripts.train \
    --config configs/base.yaml \
    --data-root /path/to/images \
    --output-dir runs --run-name exp01

Laptop GPU (6 GB VRAM):

python -m scripts.train \
    --config configs/base.yaml \
    --overrides configs/local_dev.yaml \
    --data-root ./data/small \
    --output-dir runs --run-name local01

Per-pixel reward map (Phase 4):

python -m scripts.train \
    --config configs/base.yaml \
    --overrides configs/ablation/ppo_pixel.yaml \
    --data-root /path/to/images

GRPO ablation (Phase 6):

python -m scripts.train \
    --config configs/base.yaml \
    --overrides configs/ablation/grpo.yaml \
    --data-root /path/to/images

CLI dotlist overrides are supported too:

--set train.lr=1e-4 --set train.batch_size=4

3. Evaluate

python -m scripts.eval \
    --config configs/base.yaml \
    --ckpt runs/exp01/checkpoints/final.pt \
    --data-root /path/to/test

Reports NIQE, BRISQUE, CLIP-IQA, MUSIQ (via pyiqa) plus built-in gradient / entropy / EME, with per-metric (input, output, Δ).

4. Visualize

python -m scripts.visualize \
    --config configs/base.yaml \
    --ckpt runs/exp01/checkpoints/final.pt \
    --input path/to/image.png \
    --output outputs/vis01

Writes the input, each intermediate step x_{t+1}, and normalized γ / α / β action maps per step.

5. Run tests

pytest -q

All 30+ tests run on CPU in a few seconds — pyiqa is not required.

Project layout

pixelpolish/
├── CLAUDE.md                 # Design doc (start here)
├── configs/
│   ├── base.yaml             # Full hyperparameters
│   ├── local_dev.yaml        # 6 GB VRAM override
│   └── ablation/
│       ├── ppo_pixel.yaml    # Phase 4: per-pixel reward map
│       └── grpo.yaml         # Phase 6: GRPO
├── src/
│   ├── models/
│   │   ├── policy_fcn.py     # Dilated-CNN backbone + μ/logσ/V heads
│   │   └── actions.py        # Curve bounds, sampling, log-prob, entropy, apply_curve
│   ├── rewards/
│   │   ├── base.py           # RewardFunction ABC + RelativeReward
│   │   ├── physics.py        # Gradient / Entropy / EME
│   │   ├── iqa.py            # pyiqa wrappers (CLIP-IQA, MUSIQ, NIQE, BRISQUE)
│   │   └── composite.py      # Weighted composition, scalar / pixel mode
│   ├── env/
│   │   ├── image_env.py      # Gym-style batched env
│   │   └── degradation.py    # Optional synthetic degradations
│   ├── algorithms/
│   │   ├── rollout.py        # Episode collection with bootstrap value
│   │   ├── advantage.py      # GAE (shape-generic)
│   │   ├── ppo.py            # PPOTrainer, scalar + pixel
│   │   └── grpo.py           # GRPOTrainer, group z-norm
│   ├── data/
│   │   └── dataset.py        # UnpairedImageDataset + MixedModalityDataset
│   └── utils/
│       ├── config.py         # OmegaConf loader with overrides
│       ├── logging.py        # TensorBoard wrapper
│       ├── checkpoints.py    # save / load / prune
│       └── seed.py
├── scripts/
│   ├── smoke_forward.py      # Phase-1 gate
│   ├── train.py              # Main entry
│   ├── eval.py               # NR-IQA metric sweep
│   └── visualize.py          # Save action maps + enhanced steps
└── tests/
    ├── test_actions.py
    ├── test_policy_fcn.py
    ├── test_rewards.py
    ├── test_dataset.py
    ├── test_smoke_forward.py
    └── test_training_loop.py

Configuration cheat sheet

Key knobs in configs/base.yaml:

action:         # per-pixel curve bounds
  gamma_range: [0.3, 3.0]
  alpha_range: [0.5, 2.0]
  beta_range:  [-0.2, 0.2]

reward:
  mode: scalar                # or 'pixel' for Phase 4
  relative: true              # ΔR per step (recommended)
  weights: {gradient: 1.0, entropy: 0.5, eme: 0.5, clipiqa: 1.0, musiq: 0.0}
  scales:  {gradient: 10.0, entropy: 1.0, eme: 0.1, clipiqa: 1.0, musiq: 0.01}

train:
  algorithm: ppo              # or 'grpo'
  batch_size: 8
  episode_length: 5
  ppo_epochs: 4
  clip_ratio: 0.2
  entropy_coef: 0.01          # annealed to entropy_coef_min
  gae_gamma: 0.99
  gae_lambda: 0.95

Roadmap (from `CLAUDE.md`)

Phase 1 — Skeleton: FCN policy + curve actions + smoke pass
Phase 2 — Composite reward (physics + CLIP-IQA / MUSIQ), relative deltas
Phase 3 — PPO with scalar reward, GAE, entropy bonus
Phase 4 — Per-pixel reward map + spatial GAE
Phase 5 — Multi-modality benchmark (RGB low-light + grayscale + IR) vs Zero-DCE / ReLLIE / HE
Phase 6 — GRPO ablation (group z-norm, optional drop critic)

Known limitations & risks

Reward hacking. CLIP-IQA can be exploited by over-smoothing. Mitigations: relative rewards, log all sub-rewards separately, qualitative inspection every N updates. If CLIP-IQA saturates while physics drops → stop and investigate.
Entropy collapse. log σ is floored via config annealing (entropy_coef_min); watch it on TensorBoard.
Pixel-wise value variance. GAE λ=0.95 helps. If unstable, try grpo with drop_critic=true — no critic, group-normalized returns only.

Citation

If this helps your research, please cite:

@software{pixelpolish2026,
  title  = {PixelPolish: Pixel-Level RL for Blind Image Enhancement},
  author = {Your Name},
  year   = {2026},
  url    = {https://github.com/<you>/pixelpolish}
}

References

Furuta et al., PixelRL: Fully Convolutional Network with Reinforcement Learning for Image Processing, IEEE TMM 2020
Schulman et al., PPO: Proximal Policy Optimization, arXiv:1707.06347
DeepSeek, GRPO: Group Relative Policy Optimization, 2024
Wang et al., CLIP-IQA: Exploring CLIP for Assessing the Look and Feel of Images, AAAI 2023
Ke et al., MUSIQ: Multi-Scale Image Quality Transformer, ICCV 2021
Zhang et al., ReLLIE: Deep Reinforcement Learning for Customized Low-Light Image Enhancement, ACM MM 2021
Guo et al., Zero-DCE: Zero-Reference Deep Curve Estimation, CVPR 2020

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.claude		.claude
configs		configs
scripts		scripts
src		src
tests		tests
vis_dicm		vis_dicm
vis_v4		vis_v4
vis_v5		vis_v5
vis_v6		vis_v6
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PixelPolish

Why pixel-level RL?

Features

Architecture at a glance

Install

Quick start

1. Smoke test (CPU or GPU, no data needed)

2. Train

3. Evaluate

4. Visualize

5. Run tests

Project layout

Configuration cheat sheet

Roadmap (from `CLAUDE.md`)

Known limitations & risks

Citation

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PixelPolish

Why pixel-level RL?

Features

Architecture at a glance

Install

Quick start

1. Smoke test (CPU or GPU, no data needed)

2. Train

3. Evaluate

4. Visualize

5. Run tests

Project layout

Configuration cheat sheet

Roadmap (from CLAUDE.md)

Known limitations & risks

Citation

References

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Roadmap (from `CLAUDE.md`)

Packages