    Repositories list

    • Public images, logos, from Apart that need a public url
      0000 Updated Apr 16, 2026
    • 📚📚📚📚📚📚📚📚📚 Reading everything
      CSS
      41511 Updated Mar 11, 2026
    • TypeScript
      0200 Updated Jul 22, 2025
    • 🎣 Breaking and entering through language model memory and context
      Python
      MIT License
      0350 Updated May 27, 2025
    • DarkBench

      Public
      Benchmarking Dark Patterns in LLMs (ICLR 2025)
      Python
      MIT License
      91601 Updated Mar 29, 2025
    • A website for partners to engage with Apart.
      HTML
      0000 Updated Feb 9, 2025
    • ✱ Interpreting learned feedback patterns in large language models
      Jupyter Notebook
      MIT License
      2570 Updated Jan 8, 2025
    • TypeScript
      0000 Updated Nov 24, 2024
    • ✱ Interpreting how similar sequence continuation tasks share internal representations ✱
      Jupyter Notebook
      MIT License
      2210 Updated Nov 9, 2024
    • 3cb

      Public
      3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
      Python
      51521 Updated Oct 30, 2024
    • 🌍 Website for NeurIPS2023MI
      CSS
      2100 Updated Aug 19, 2024
    • ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
      Jupyter Notebook
      MIT License
      01800 Updated Aug 16, 2024
    • Python
      0400 Updated Jul 19, 2024
    • How to get started in evaluations and demonstrations research for dangerous capabilities
      MIT License
      1710 Updated May 24, 2024
    • 🦠 DeepDecipher: An open source API to MLP neurons
      Rust
      MIT License
      09460 Updated May 2, 2024
    • 🌍 Website for the Scaling Laws workshop
      CSS
      2100 Updated Mar 22, 2024
    • .github

      Public
      0000 Updated Mar 14, 2024
    • 🚨 METR Task Standard fork for the Code Red Hackathon
      TypeScript
      36100 Updated Feb 29, 2024
    • Jupyter Notebook
      0100 Updated Feb 6, 2024
    • 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
      Python
      Other
      42011 Updated Jan 19, 2024
    • open

      Public
      🌍 Repository to update our open data
      MIT License
      0000 Updated Nov 30, 2023
    • 0000 Updated Oct 28, 2023
    • Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
      Jupyter Notebook
      Apache License 2.0
      52310 Updated Sep 28, 2023
    • 💡 The web app CI/CD for aisafetyideas.com
      Svelte
      35221 Updated Sep 25, 2023
    • n2g

      Public archive
      Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
      Jupyter Notebook
      Apache License 2.0
      5100 Updated Aug 9, 2023
    • 🧠 Starter templates for doing interpretability research
      27500 Updated Jul 16, 2023
    • Cost-effectiveness models, tools, and results for various AI safety field-building programs.
      Python
      MIT License
      4200 Updated Jul 15, 2023
    • 🌍 Website template for academic papers
      JavaScript
      MIT License
      3400 Updated Jun 9, 2023
    • Interpretability Hackathon 2.0 entry
      Jupyter Notebook
      MIT License
      54210 Updated Apr 28, 2023
    • Uses ChatGPT to simulate a townhall discussion between avatars
      Python
      1000 Updated Apr 3, 2023