Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).
Updated Apr 17, 2024 (Python)
Implementation of Proximal Policy Optimization (PPO), a deep reinforcement learning algorithm, on a continuous-action OpenAI Gym environment (Box2D CarRacing-v0).
Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization
Mirror Descent Policy Optimization
[AAAI 2026] D²PPO: Diffusion Policy Policy Optimization with Dispersive Loss.
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (Moalla et al. 2024). Uses TorchRL and provides extensive tools for studying representation dynamics in policy optimization.
Model-based Policy Gradients
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data".
This repository contains the code for the paper "Local policy search with Bayesian optimization".
Bittensor Subnet 11 — an open skill factory that uses distributed compute and RL to produce state-of-the-art skills for AI agents.
CPPO: Contrastive Perception for Vision Language Policy Optimization
Reinforcement Learning (RL): a hands-on guide to implementing RL algorithms, from Markov Decision Processes (MDPs) to advanced methods such as PPO and DDPG. Build agents, learn the math behind policies, and experiment with real-world applications.
Implementation of, and explorations into, MPO / DMPO.
Paragraph-level Policy Optimization for Vision-Language Deepfake Detection - ICML 2026
Code for Policy Optimization as Online Learning with Mediator Feedback
An implementation of reinforcement learning on CartPole-v0 using policy optimization.
Code accompanying the NeurIPS 2025 paper "Sequential Monte Carlo for Policy Optimization in Continuous POMDPs".
A decision-focused uplift modeling framework that jointly optimizes CATE prediction and treatment allocation policy via a shared-layer neural network, benchmarked against S-Learner, X-Learner, UpliftRank, and GRF on CRITEO-UPLIFT v2.
Universal governance layer for critical infrastructure control systems. Deterministic validation of AI, automation, and human decisions.
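Proximal Policy Optimization (PPO), which the CarRacing and hands-on RL guide entries above implement, maximizes a clipped surrogate objective: the probability ratio between the new and old policy is clipped to [1 - eps, 1 + eps], and the smaller of the clipped and unclipped advantage-weighted terms is kept. The sketch below is a minimal NumPy version that assumes per-sample log-probabilities and advantage estimates are already computed; the function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not taken from any listed repository.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP (to be maximized), averaged over samples.

    Hypothetical helper for illustration; inputs are 1-D arrays of equal length.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)

    # Unclipped and clipped advantage-weighted terms.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Pessimistic (element-wise minimum) bound, averaged over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    lp_old = np.log([0.5, 0.5])
    lp_new = np.log([0.8, 0.8])  # ratio 1.6 for both samples
    adv = np.array([1.0, -1.0])
    # Positive advantage is clipped to 1.2; negative keeps -1.6; mean is -0.2.
    print(ppo_clip_objective(lp_new, lp_old, adv))
```

Taking the minimum means the objective never rewards pushing the ratio beyond the clip range when the advantage is positive, while still fully penalizing a ratio increase on negative-advantage actions; a training loop would typically minimize the negative of this value.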