llvm-obfus

llvm-obfus is an out-of-tree LLVM 21+ pass plugin for policy-driven IR obfuscation.

The project applies native LLVM IR transforms to selected functions. The main production entry point is obf-safe-pipeline, which composes virtualization, structural rewrites, string and constant protection, late indirect dispatch, and final artifact cleanup.

The design goal is simple: make static recovery materially harder while staying inside normal LLVM semantics. The project does not rely on malformed objects, inline-asm traps, EH spoofing, or target-specific parser breaks.

Main Features

Strong Virtualization And MBA Flattening

Protection levels are none, light, strong, vm, and strong_vm.
vm and strong_vm lower selected functions into VM-backed execution paths.
strong_vm implementation bodies continue through later hardening stages, not just the public wrapper.
MBA rewriting owns arithmetic identity diversification across add, sub, xor, and mul, both directly and as part of other transforms such as constant reconstruction and opaque predicates.
Shape families include linear identities (x ^ y = (x | y) - (x & y)), affine wrappers (Encode(x) = a*x + b with odd modular multiplier), polynomial zero terms (depth 3+), and constant-multiplication decomposition.
Entropy thunk interfaces are diversified per function across packed scalar, aggregate pair, and out-parameter forms, and first-hop entropy mixing uses per-site xor, mul_add, rotate_xor, or bit_split selection before MBA shape builders run.
A private BudgetTracker enforces a per-expression IR-instruction cap derived from mba.depth; when the budget is exhausted mid-expansion the engine falls back to the plain LLVM binary operation.
instruction_substitution stays focused on distinct logical rewrites such as boolean identity transformations instead of duplicating MBA arithmetic forms.
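
The linear identity and the affine wrapper named above can be checked numerically. The following is a minimal sketch in Python rather than LLVM IR, assuming 32-bit wrapping arithmetic; the function names are illustrative and do not come from the project.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wrapping arithmetic


def xor_via_mba(x: int, y: int) -> int:
    """Linear identity from the text: x ^ y == (x | y) - (x & y)."""
    return ((x | y) - (x & y)) & MASK32


def make_affine(a: int, b: int):
    """Build Encode(x) = a*x + b (mod 2**32) and its inverse; `a` must be odd."""
    if a % 2 == 0:
        raise ValueError("multiplier must be odd to be invertible mod 2**32")
    a_inv = pow(a, -1, 1 << 32)  # a modular inverse exists only for odd a

    def encode(x: int) -> int:
        return (a * x + b) & MASK32

    def decode(e: int) -> int:
        return (a_inv * (e - b)) & MASK32

    return encode, decode


def add_under_affine(ex: int, ey: int, b: int) -> int:
    """Add two encoded values: Encode(x) + Encode(y) - b == Encode(x + y)."""
    return (ex + ey - b) & MASK32
```

The odd multiplier matters because only odd values are invertible modulo a power of two, which is what lets the wrapper be undone without losing bits.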

Seeded Indirect Dispatch

indirect_dispatch is a late pass in the safe pipeline.
It rewrites supported conditional branches and switch dispatch sites into per-site sequences: a masked blockaddress, unmasking arithmetic, and a final indirectbr.
Each dispatch site derives its masking material from the protected function seed and site index.
The implementation reconstructs targets from same-function deltas in SSA instead of emitting absolute dispatch tables in globals.
This pass does not use the authenticated BLAKE2s runtime used by strings and constant pools.
Unsupported shapes are skipped conservatively: EH personalities, EH pads, invoke, callbr, existing indirectbr, catchswitch, catchreturn, cleanupreturn, resume, musttail, and non-integral program address spaces.
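
As a rough illustration of per-site masking and delta reconstruction, here is a toy model in Python. Everything in it is an assumption made for the sketch: the splitmix64-style mixer, the anchor-plus-delta layout, the 64-bit width, and all names. The actual pass works on blockaddress values in IR and may derive its masks differently.

```python
MASK64 = (1 << 64) - 1


def site_mask(function_seed: int, site_index: int) -> int:
    """Hypothetical per-site mask: a splitmix64-style mix of seed and site index."""
    z = (function_seed + (site_index + 1) * 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def mask_delta(target: int, anchor: int, mask: int) -> int:
    """Build time: store the target as a masked offset from a same-function anchor."""
    return ((target - anchor) & MASK64) ^ mask


def resolve(masked: int, anchor: int, mask: int) -> int:
    """Run time: unmask the delta and add the anchor back to get the branch target."""
    return (anchor + (masked ^ mask)) & MASK64


def dispatch(cond: bool, anchor: int, masked_true: int, masked_false: int, mask: int) -> int:
    """Model of a rewritten conditional branch: pick a masked delta, then resolve it."""
    return resolve(masked_true if cond else masked_false, anchor, mask)
```

Because only deltas relative to an address inside the same function are stored, no table of absolute targets needs to exist in the model, which mirrors the stated design choice.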

Keyed And Integrity-Checked Runtime Strings

String encoding is configured under string_encoding.
authenticated_mode enables the keyed and integrity-checked runtime decode path.
The runtime support lives in runtime/string_auth_runtime.c and handles keyed string and constant-pool recovery.
Lazy decode, eager decode, constructor fallback, and forwarded-pointer cases are handled in the transform.

Constant Pooling

Constant encoding modes are off, mba_inline, keyed_pool, auto, and all.
mba_inline reconstructs constants directly in IR.
keyed_pool moves constants into keyed, integrity-checked pools recovered at use sites.
auto chooses a strategy per use site.

Seed And Key Derivation

The top-level seed is the root build input. Function-selective passes such as indirect_dispatch derive per-function seeds from the module name, function name, and top-level seed; the keyed string and keyed-pool runtime currently uses the top-level seed directly.
authenticated_mode and keyed_pool use a domain-separated BLAKE2s schedule implemented in include/obf/support/auth_encoding.h.
The schedule is build_key(seed) -> function_key(module_id, function_id) -> per-site or per-pool key -> labeled enc and mac subkeys.
Authenticated strings derive distinct keys from descriptor metadata including module_id, a derived function_id, and site_id. Keyed constant pools derive distinct keys from module_id and pool_id.
Authentication uses a keyed BLAKE2s tag over descriptor metadata plus ciphertext, and encryption uses a BLAKE2s-derived XOR keystream with a derived nonce. It does not use AES, ChaCha20, HMAC, or SipHash.
The emitted artifacts store the 32-byte build_key in internal globals and reconstruct derived keys at runtime from descriptor metadata. This is an embedded-key, self-contained runtime: no hardware token, remote service, white-box key split, or entropy-anchor binding is involved.
Integrity verification is fail-closed: descriptor mismatches, tag mismatches, and length mismatches trap in the runtime instead of returning tampered plaintext.
runtime/entropy_anchor.c supports opaque arithmetic and MBA-style transforms; it is separate from the keyed string and constant-pool key schedule. It exposes five deterministic accessor variants (direct, stack_roundtrip, split_recombine, xor_neutral, add_sub_neutral) selected per function and salt.
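
The key schedule and the fail-closed check described in this section can be sketched with Python's hashlib.blake2s. This is not the runtime's wire format: the labels, the descriptor packing, the nonce derivation, and the counter-mode keystream layout are all assumptions chosen for illustration, and the output is not compatible with runtime/string_auth_runtime.c.

```python
import hashlib
import hmac
import struct


def kdf(parent: bytes, label: bytes, context: bytes = b"") -> bytes:
    """Derive a 32-byte child key; `label` (at most 8 bytes) separates domains."""
    return hashlib.blake2s(context, key=parent, person=label).digest()


def site_keys(seed: int, module_id: int, function_id: int, site_id: int):
    """seed -> build_key -> function_key -> site_key -> (enc, mac) subkeys."""
    build_key = kdf(b"", b"build", struct.pack("<Q", seed))
    function_key = kdf(build_key, b"func", struct.pack("<QQ", module_id, function_id))
    site_key = kdf(function_key, b"site", struct.pack("<Q", site_id))
    return kdf(site_key, b"enc"), kdf(site_key, b"mac")


def make_descriptor(module_id: int, function_id: int, site_id: int, length: int) -> bytes:
    """Hypothetical descriptor layout: three 64-bit ids plus a 32-bit length."""
    return struct.pack("<QQQI", module_id, function_id, site_id, length)


def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """BLAKE2s-derived XOR keystream, one 32-byte block per counter value."""
    blocks = (
        hashlib.blake2s(nonce + struct.pack("<I", i), key=enc_key).digest()
        for i in range((length + 31) // 32)
    )
    return b"".join(blocks)[:length]


def xor_with_stream(enc_key: bytes, descriptor: bytes, data: bytes) -> bytes:
    nonce = kdf(enc_key, b"nonce", descriptor)[:12]  # derived nonce
    return bytes(d ^ k for d, k in zip(data, keystream(enc_key, nonce, len(data))))


def seal(enc_key: bytes, mac_key: bytes, descriptor: bytes, plaintext: bytes):
    ciphertext = xor_with_stream(enc_key, descriptor, plaintext)
    tag = hashlib.blake2s(descriptor + ciphertext, key=mac_key).digest()
    return ciphertext, tag


def open_sealed(enc_key: bytes, mac_key: bytes, descriptor: bytes,
                ciphertext: bytes, tag: bytes) -> bytes:
    """Verify before decrypting; raise instead of returning tampered plaintext."""
    expected = hashlib.blake2s(descriptor + ciphertext, key=mac_key).digest()
    *_, length = struct.unpack("<QQQI", descriptor)
    if not hmac.compare_digest(expected, tag) or length != len(ciphertext):
        raise ValueError("integrity check failed")  # the real runtime traps here
    return xor_with_stream(enc_key, descriptor, ciphertext)
```

The sketch raises an exception where the runtime is described as trapping; in both cases no plaintext is produced once a tag, descriptor, or length check fails.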

Stealth ABI And Artifact Cleanup

Public runtime ABI names are generated at build time in build/include/obf/support/runtime_abi_generated.h.
The default public prefix is rt_core_.
Final cleanup strips marker attributes, removes annotation metadata, anonymizes local/internal obfuscation artifacts, and strips local SSA names.
Security gates can fail the build on leaked public obf symbols.

Architecture

Frontend

YAML loading and config parsing live in lib/frontend/.
Profiles are fast, standard, guarded, fortress, and lab.
Profile defaults are applied first; explicit top-level YAML sections override them; --obf-seed overrides the final seed after config loading.
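
The layering order can be pictured as three steps. The sketch below is a Python model, not the loader in lib/frontend/: the dictionary excerpt, the function names, and the choice to merge key by key inside a section (rather than replacing a whole section) are assumptions.

```python
import copy
from typing import Optional

# Hypothetical excerpt of per-profile defaults, for illustration only.
PROFILE_DEFAULTS = {
    "fast": {"mba": {"depth": 1}, "string_encoding": {"min_string_length": 3}},
    "fortress": {"mba": {"depth": 3}, "string_encoding": {"min_string_length": 1}},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(yaml_doc: dict, cli_seed: Optional[int] = None) -> dict:
    """Step 1: profile defaults. Step 2: explicit YAML. Step 3: --obf-seed."""
    config = deep_merge(PROFILE_DEFAULTS[yaml_doc["profile"]], yaml_doc)
    if cli_seed is not None:
        config["seed"] = cli_seed  # the command-line seed wins over the YAML seed
    return config
```

For example, `resolve_config({"profile": "fortress", "seed": 1}, cli_seed=99)` yields a config whose seed is 99 while the fortress defaults stay in place.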

Analysis And Policy

Per-function feature extraction lives in lib/analysis/.
Policy selection lives in lib/policy/.
The pipeline is function-selective rather than blanket-on for the whole module.

Transforms

Core transforms live in lib/transforms/.
VM lowering lives in lib/vm/.
Pass registration and safe-pipeline orchestration live in lib/plugin/.

Runtime

runtime/entropy_anchor.c provides the entropy anchor support object used by builds and tests.
runtime/string_auth_runtime.c provides keyed and integrity-checked decode support for strings and constant pools.

Safe Pipeline Order

obf-safe-pipeline is the integrated pipeline used by the benchmarks and lit coverage. Its current high-level order is:

entropy initialization
VM lowering and call rewriting for vm
VM lowering and call rewriting for strong_vm
post-VM string encoding
constant encoding
opaque GEP
instruction substitution for logical and boolean rewrites
opaque predicates
control flattening
function outlining
bogus control flow
block splitting
additional hardening on strong_vm implementation functions
CFG state cleanup
indirect dispatch
security gate enforcement
artifact cleanup

The late ordering matters. Indirect dispatch runs after the major structural passes so it can rewrite the final dispatch-heavy CFG shapes, including VM implementation functions.

Configuration

Top-level sections currently supported by the loader:

profile
seed
default_level
overrides
targets
block_split
string_encoding
constant_encoding
mba
indirect_dispatch
security
debug_preserve_generated_names

overrides entries match exact function names; targets entries support glob-style wildcard patterns (e.g., "verify_*").

Profile Defaults

| Setting | `fast` | `standard` | `guarded` | `fortress` | `lab` |
| --- | --- | --- | --- | --- | --- |
| `mba.depth` | 1 | 1 | 2 | 3 | 4 |
| `mba.enable_polynomial` | derived | derived | derived | derived | true |
| `mba.enable_multiplication` | derived | derived | derived | derived | true |
| `mba.max_ir_instructions` | derived | derived | derived | derived | 320 |
| `block_split.max_splits_per_function` | 1 | 1 | 2 | 4 | 8 |
| `string_encoding.min_string_length` | 3 | 2 | 2 | 1 | 1 |
| `string_encoding.max_strings_per_module` | 32 | 128 | 256 | 512 | 1024 |
| `string_encoding.prefer_lazy_decode` | true | true | true | false | false |
| `string_encoding.allow_ctor_fallback` | true | true | false | false | false |
| `constant_encoding.max_constants_per_function` | 2 | 4 | 8 | 16 | 32 |
| `security.fail_on_public_obf_symbol` | false | true | true | true | true |

Every profile defaults to authenticated_mode: false, indirect_dispatch.enabled: false, min_bit_width: 8, default_level: none, and constant_encoding.mode: mba_inline. min_instructions_per_block defaults to 2, except in fortress and lab, which use 1. In every profile except lab, the MBA override fields (enable_polynomial, enable_multiplication, max_ir_instructions) are absent and derived from mba.depth: the polynomial and multiplication families are enabled at depth 3 and above, and the IR-instruction budget scales with depth (64 at depth 1, 128 at depth 2, 192 at depth 3, 256 at depth 4). lab sets all three fields explicitly, as the table shows. Explicit top-level YAML keys override profile defaults.
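
The derivation rule for the MBA override fields is small enough to write out. This Python sketch only covers depths 1 through 4, the values the text lists; the function name and the dictionary shape are illustrative, not the loader's API.

```python
from typing import Optional


def derive_mba_settings(depth: int, overrides: Optional[dict] = None) -> dict:
    """Fill absent MBA override fields from mba.depth; explicit keys win."""
    if not 1 <= depth <= 4:
        raise ValueError("this sketch only models the documented depths 1-4")
    derived = {
        "enable_polynomial": depth >= 3,      # polynomial family at depth 3+
        "enable_multiplication": depth >= 3,  # multiplication family at depth 3+
        "max_ir_instructions": 64 * depth,    # 64, 128, 192, 256 for depths 1-4
    }
    derived.update(overrides or {})
    return derived
```

Calling `derive_mba_settings(4, {"max_ir_instructions": 320})` reproduces the lab row of the table: both families enabled and an explicit budget of 320 instead of the derived 256.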

Per-Function Annotations

Protection levels can be set directly in source using LLVM's annotate attribute. The annotation value must be "obf:<level>" where <level> is one of none, light, strong, vm, or strong_vm.

__attribute__((annotate("obf:strong_vm")))
void sensitive_routine(void) { ... }

Annotations rank below explicit overrides entries but above targets rule matching. The automatic security floor applies independently and may raise the level further.
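
The resolution order can be modeled as a short lookup chain. The Python sketch below makes several assumptions that the text does not confirm: overrides is treated as a name-to-level mapping, the first matching targets rule wins, malformed annotations are ignored, and the security floor is left out entirely.

```python
from fnmatch import fnmatchcase
from typing import Optional

LEVELS = ("none", "light", "strong", "vm", "strong_vm")


def resolve_level(name: str, overrides: dict, annotation: Optional[str],
                  targets: list, default_level: str) -> str:
    """Resolve a function's level: overrides > annotation > targets > default."""
    if name in overrides:  # overrides match exact function names
        return overrides[name]
    if annotation is not None:  # annotation strings look like "obf:<level>"
        prefix, _, level = annotation.partition(":")
        if prefix == "obf" and level in LEVELS:
            return level
    for rule in targets:  # targets use glob-style patterns such as "verify_*"
        if fnmatchcase(name, rule["match"]):
            return rule["level"]
    return default_level
```

With a single rule `{"match": "verify_*", "level": "strong_vm"}`, a function named verify_key resolves to strong_vm, an `obf:light` annotation on it lowers that to light, and an overrides entry for verify_key takes priority over both.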

Minimal configuration example:

profile: fortress
seed: 20260601
default_level: none

targets:
  - match: "verify_*"
    level: strong_vm
  - match: "license_*"
    level: strong_vm

string_encoding:
  authenticated_mode: true
  prefer_lazy_decode: true
  allow_ctor_fallback: false

constant_encoding:
  mode: auto
  max_constants_per_function: 8
  min_bit_width: 8

mba:
  depth: 3
  enable_polynomial: true
  enable_multiplication: true
  max_ir_instructions: 320

indirect_dispatch:
  enabled: true
  max_sites_per_function: 4
  max_switch_targets: 8
  target_vm_dispatchers: true
  target_flattened_headers: true

security:
  fail_on_public_obf_symbol: true
  strip_release_markers: true

Build

Requirements:

CMake 3.24+
C++23 compiler
LLVM 21+
Python 3
lit
LLVM tools: opt, clang, clang++, llvm-link, llc, llvm-strip, llvm-nm, llvm-objdump
Optional: strings for benchmark string audits

Configure and build:

cmake -S . -B build -DLLVM_DIR="$(llvm-config --cmakedir)"
cmake --build build

Useful cache variables:

OBF_BENCHMARK_SEED
OBF_RUNTIME_ABI_PREFIX
OBF_BENCHMARK_CLEAN_IR
OBF_BENCHMARK_CLEANUP_PASSES

Usage

Feature report:

obf-feature-report is read-only and emits obf.feature_report.v3 JSON with per-function policy decisions, per-transform strategy details, and MBA shape counters under the mba payload for functions that use MBA rewrites.

opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-feature-report \
  -disable-output input.ll

Policy audit:

obf-audit prints a policy-resolution table and can also write obf.audit.v1 JSON with --obf-audit-out.

opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  --obf-audit-out=audit.json \
  -passes=obf-audit \
  -disable-output input.ll

Full safe pipeline:

opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-safe-pipeline \
  -S input.ll -o output.ll

Isolated indirect dispatch:

opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-indirect-dispatch \
  -S input.ll -o indirect.ll

Other standalone passes:

Read-only/reporting: obf-feature-report, obf-audit.
Transform stages: obf-entropy-init, obf-vm, obf-block-split, obf-string-encode, obf-constant-encode, obf-opaque-gep, obf-instruction-substitute, obf-control-flatten, obf-function-outline, obf-opaque-preds, obf-bogus-cf, obf-indirect-dispatch, obf-cfg-state-cleanup, and obf-artifact-cleanup.

obf-driver currently loads a config and prints a summary. It is not a full compile driver.

Visual Examples (Ghidra)

These screenshots compare one baseline function with one obfuscated function from the license_demo benchmark.

Baseline function: FUN_004008f0 from license_demo.baseline
Obfuscated function: FUN_00400510 from license_demo.obfuscated

What the baseline image shows:

A compact, readable verification-style routine.
Clear control flow (simple bounds and loop structure).
Data-dependent operations that remain semantically recoverable in the decompiler.

What the obfuscated image shows:

Large opaque arithmetic chains with mixed rotates/xors/add-masks.
Decompiler warnings around jump-table recovery and indirect control transfer.
Significantly reduced semantic readability despite valid executable behavior.

Pseudocode comparison (side-by-side decompiler screenshots, not reproduced here): baseline license_demo.baseline / FUN_004008f0 versus obfuscated license_demo.obfuscated / FUN_00400510.

Benchmarks

Benchmark targets build paired baseline and obfuscated artifacts under build/benchmarks/<name>/. The benchmark build passes --obf-seed=${OBF_EFFECTIVE_BENCHMARK_SEED} to opt, so OBF_BENCHMARK_SEED controls the effective benchmark seed for the whole build tree even when a sample benchmark config contains its own seed: entry.

Build benchmark pairs:

cmake --build build --target obf-benchmarks

Per-benchmark artifacts:

<name>.baseline.ll
<name>.obfuscated.ll
<name>.obfuscated.cleaned.ll when OBF_BENCHMARK_CLEAN_IR=ON
<name>.baseline
<name>.obfuscated

Benchmark and analysis targets:

obf-benchmarks builds stripped baseline and obfuscated pairs for the full corpus.
obf-benchmarks-mir emits MIR snapshots for linked benchmark targets such as wpo_demo.
obf-audit-benchmarks audits stripped obfuscated benchmark binaries for leaked symbols and, when strings is available, residual strings.
obf-re-harness scores how much VM structure is recoverable from obfuscated benchmark IR and writes build/re-harness/vm_recovery.json.
obf-seed-diversity verifies seed-driven IR diversity and writes build/diversity/diversity.json.

Current benchmark corpus:

license_demo
config_demo
vm_workflow_demo
wpo_demo

Measure keyed string decode overhead:

python tools/obf-bench/measure_string_auth_overhead.py --build-dir build

The helper writes temporary inputs under build/string-auth-bench/ and reports lazy first-decode cost, lazy steady-state helper cost, and constructor startup impact.

Verification

Requested release sweep:

cmake --build build --target obf-benchmarks obf-seed-diversity obf-unit-tests
ctest --test-dir build --output-on-failure -R "obf-lit|obf-unit-tests"

The lit suite covers 109 tests across MBA engine shapes, opaque predicates, bogus control flow, control flattening, opaque GEP, constant encoding (inline and keyed-pool), string encoding (lazy, eager, auth), indirect dispatch, VM lowering, VM handler and dispatcher polymorphism, seed determinism, safe pipeline ordering, security gates, and artifact cleanup. Every test passes opt -passes=verify, FileCheck, and lli execution validation. An InstCombine collapse audit confirms runtime entropy anchors prevent the simplifier from folding polynomial zeros or opaque predicates.

Repository Layout

include/obf/       public headers
lib/analysis/      feature extraction
lib/frontend/      config loading and annotations
lib/plugin/        pass registration and pipeline wiring
lib/policy/        function-level policy selection
lib/report/        reporting
lib/transforms/    IR transforms
lib/vm/            VM lowering and dispatch
runtime/           runtime support objects
tests/lit/         lit coverage
tests/unit/        unit tests
benchmarks/        corpus, configs, and build targets
tools/             helper tools and scripts
