llvm-obfus is an out-of-tree LLVM 21+ pass plugin for policy-driven IR obfuscation.
The project applies native LLVM IR transforms to selected functions. The main production entry point is `obf-safe-pipeline`, which composes virtualization, structural rewrites, string and constant protection, late indirect dispatch, and final artifact cleanup.
The design goal is simple: make static recovery materially harder while staying inside normal LLVM semantics. The project does not rely on malformed objects, inline-asm traps, EH spoofing, or target-specific parser breaks.
- Protection levels are `none`, `light`, `strong`, `vm`, and `strong_vm`.
- `vm` and `strong_vm` lower selected functions into VM-backed execution paths.
- `strong_vm` implementation bodies continue through later hardening stages, not just the public wrapper.
- MBA rewriting owns arithmetic identity diversification across `add`, `sub`, `xor`, and `mul`, both directly and as part of other transforms such as constant reconstruction and opaque predicates.
- Shape families include linear identities (`x ^ y = (x | y) - (x & y)`), affine wrappers (`Encode(x) = a*x + b` with odd modular multiplier), polynomial zero terms (depth 3+), and constant-multiplication decomposition.
- Entropy thunk interfaces are diversified per function across packed scalar, aggregate pair, and out-parameter forms, and first-hop entropy mixing uses per-site `xor`, `mul_add`, `rotate_xor`, or `bit_split` selection before MBA shape builders run.
- A private `BudgetTracker` enforces a per-expression IR-instruction cap derived from `mba.depth`; when the budget is exhausted mid-expansion the engine falls back to the plain LLVM binary operation.
- `instruction_substitution` stays focused on distinct logical rewrites such as boolean identity transformations instead of duplicating MBA arithmetic forms.
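The linear identity and the affine wrapper named in the list can be checked directly in C. The sketch below only illustrates the arithmetic on 32-bit values; it is not the plugin's IR builder. All helper names (`mba_xor`, `mba_add`, `inverse_mod_2_32`, `affine_encode`, `affine_decode`) are hypothetical, and the `x + y` identity is a standard MBA example that the list does not name explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Linear identity from the list: x ^ y == (x | y) - (x & y). */
static uint32_t mba_xor(uint32_t x, uint32_t y) {
    return (x | y) - (x & y);
}

/* Standard companion identity (assumption, not taken from the project):
   x + y == (x ^ y) + 2 * (x & y), with wraparound modulo 2^32. */
static uint32_t mba_add(uint32_t x, uint32_t y) {
    return (x ^ y) + 2u * (x & y);
}

/* Multiplicative inverse of an odd value modulo 2^32 by Newton iteration.
   For odd a, a*a == 1 (mod 8), so inv = a is correct to 3 low bits and
   every step doubles the number of correct bits. */
static uint32_t inverse_mod_2_32(uint32_t a) {
    assert(a & 1u); /* only odd multipliers are invertible modulo 2^32 */
    uint32_t inv = a;
    for (int i = 0; i < 5; ++i) {
        inv *= 2u - a * inv;
    }
    return inv;
}

/* Affine wrapper: Encode(x) = a*x + b (mod 2^32) with odd a. */
static uint32_t affine_encode(uint32_t x, uint32_t a, uint32_t b) {
    return a * x + b;
}

/* Decode undoes the wrapper: x = a^-1 * (y - b) (mod 2^32). */
static uint32_t affine_decode(uint32_t y, uint32_t a, uint32_t b) {
    return inverse_mod_2_32(a) * (y - b);
}
```

The odd-multiplier requirement is what makes the affine wrapper reversible: an even `a` has no inverse modulo 2^32, so distinct inputs would collide after encoding.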
- `indirect_dispatch` is a late pass in the safe pipeline.
- It rewrites supported conditional branches and switch dispatch sites into per-site masked `blockaddress` plus arithmetic plus `indirectbr` sequences.
- Each dispatch site derives its masking material from the protected function seed and site index.
- The implementation reconstructs targets from same-function deltas in SSA instead of emitting absolute dispatch tables in globals.
- This pass does not use the authenticated BLAKE2s runtime used by strings and constant pools.
- Unsupported shapes are skipped conservatively: EH personalities, EH pads, `invoke`, `callbr`, existing `indirectbr`, `catchswitch`, `catchreturn`, `cleanupreturn`, `resume`, `musttail`, and non-integral program address spaces.
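A source-level analogue of the masked-delta idea can be written with the GNU C "labels as values" extension, which GCC and Clang both support. This is a sketch under stated assumptions, not the pass itself: the real transform works on LLVM IR with `blockaddress` and `indirectbr`, and the names `mix64`, `site_mask`, and `dispatch_demo` as well as the mixing function are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mixer (splitmix64 finalizer); the project's real mask
   derivation is not shown in this document. */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical per-site mask built from a function seed and a site index. */
static uintptr_t site_mask(uint64_t fn_seed, uint32_t site_index) {
    return (uintptr_t)mix64(fn_seed ^ mix64(site_index));
}

/* Three-way dispatch without a switch: each target is held as a masked
   delta from an anchor label inside the same function. */
int dispatch_demo(unsigned selector, uint64_t fn_seed) {
    assert(selector < 3);
    const uintptr_t mask = site_mask(fn_seed, 0);
    const uintptr_t anchor = (uintptr_t)&&case_a;
    const uintptr_t masked_delta[3] = {
        ((uintptr_t)&&case_a - anchor) ^ mask,
        ((uintptr_t)&&case_b - anchor) ^ mask,
        ((uintptr_t)&&case_c - anchor) ^ mask,
    };
    /* Unmask the delta, rebase it on the anchor, and jump indirectly. */
    goto *(void *)(anchor + (masked_delta[selector] ^ mask));

case_a:
    return 10;
case_b:
    return 20;
case_c:
    return 30;
}
```

In this toy version the mask is computed at run time and a compiler may fold it away; the point is only the shape of the sequence: anchor, masked delta, unmask, indirect jump.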
- String encoding is configured under `string_encoding`.
- `authenticated_mode` enables the keyed and integrity-checked runtime decode path.
- The runtime support lives in `runtime/string_auth_runtime.c` and handles keyed string and constant-pool recovery.
- Lazy decode, eager decode, constructor fallback, and forwarded-pointer cases are handled in the transform.
- Constant encoding modes are `off`, `mba_inline`, `keyed_pool`, `auto`, and `all`.
- `mba_inline` reconstructs constants directly in IR.
- `keyed_pool` moves constants into keyed, integrity-checked pools recovered at use sites.
- `auto` chooses a strategy per use site.
- The top-level `seed` is the root build input. Function-selective passes such as `indirect_dispatch` derive per-function seeds from the module name, function name, and top-level seed; the keyed string and keyed-pool runtime currently uses the top-level seed directly.
- `authenticated_mode` and `keyed_pool` use a domain-separated BLAKE2s schedule implemented in `include/obf/support/auth_encoding.h`.
- The schedule is `build_key(seed)` -> `function_key(module_id, function_id)` -> per-site or per-pool key -> labeled `enc` and `mac` subkeys.
- Authenticated strings derive distinct keys from descriptor metadata including `module_id`, a derived `function_id`, and `site_id`. Keyed constant pools derive distinct keys from `module_id` and `pool_id`.
- Authentication uses a keyed BLAKE2s tag over descriptor metadata plus ciphertext, and encryption uses a BLAKE2s-derived XOR keystream with a derived nonce. It does not use AES, ChaCha20, HMAC, or SipHash.
- The emitted artifacts store the 32-byte `build_key` in internal globals and reconstruct derived keys at runtime from descriptor metadata. This is an embedded-key, self-contained runtime: no hardware token, remote service, white-box key split, or entropy-anchor binding is involved.
- Integrity verification is fail-closed: descriptor mismatches, tag mismatches, and length mismatches trap in the runtime instead of returning tampered plaintext.
- `runtime/entropy_anchor.c` supports opaque arithmetic and MBA-style transforms; it is separate from the keyed string and constant-pool key schedule. It exposes five deterministic accessor variants (`direct`, `stack_roundtrip`, `split_recombine`, `xor_neutral`, `add_sub_neutral`) selected per function and salt.
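The shape of the key schedule described in the list (build key, then function key, then per-site key, then labeled `enc` and `mac` subkeys) can be sketched as a chain of domain-separated derivations. The code below is a structural illustration only: it uses a 64-bit stand-in mixer instead of BLAKE2s and 64-bit keys instead of the 32-byte `build_key`, so it provides no security. The domain labels and every function body are assumptions; only the step names come from the schedule above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mixer (splitmix64 finalizer). NOT BLAKE2s and not secure;
   it only makes the derivation chain runnable. */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical domain labels: every derivation step gets its own label,
   so keys from different steps cannot be confused with each other. */
enum { DOM_BUILD = 1, DOM_MODULE, DOM_FUNCTION, DOM_SITE, DOM_ENC, DOM_MAC };

/* Derive a child key from a parent key, a domain label, and an id. */
static uint64_t derive(uint64_t parent, uint32_t domain, uint32_t id) {
    assert(domain != 0);
    return mix64(parent ^ mix64(((uint64_t)domain << 32) | id));
}

static uint64_t build_key(uint64_t seed) {
    return derive(seed, DOM_BUILD, 0);
}

static uint64_t function_key(uint64_t bk, uint32_t module_id, uint32_t function_id) {
    return derive(derive(bk, DOM_MODULE, module_id), DOM_FUNCTION, function_id);
}

static uint64_t site_key(uint64_t fk, uint32_t site_id) {
    return derive(fk, DOM_SITE, site_id);
}

/* Labeled subkeys: one for the keystream, one for the integrity tag. */
static uint64_t enc_subkey(uint64_t sk) { return derive(sk, DOM_ENC, 0); }
static uint64_t mac_subkey(uint64_t sk) { return derive(sk, DOM_MAC, 0); }
```

Because every key is recomputed from the build key and descriptor metadata, two sites in the same function get different `enc` and `mac` subkeys without storing any per-site key material.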
- Public runtime ABI names are generated at build time in `build/include/obf/support/runtime_abi_generated.h`.
- The default public prefix is `rt_core_`.
- Final cleanup strips marker attributes, removes annotation metadata, anonymizes local/internal obfuscation artifacts, and strips local SSA names.
- Security gates can fail the build on leaked public `obf` symbols.
- YAML loading and config parsing live in `lib/frontend/`.
- Profiles are `fast`, `standard`, `guarded`, `fortress`, and `lab`.
- Profile defaults are applied first; explicit top-level YAML sections override them; `--obf-seed` overrides the final seed after config loading.
- Per-function feature extraction lives in `lib/analysis/`.
- Policy selection lives in `lib/policy/`.
- The pipeline is function-selective rather than blanket-on for the whole module.
- Core transforms live in `lib/transforms/`.
- VM lowering lives in `lib/vm/`.
- Pass registration and safe-pipeline orchestration live in `lib/plugin/`.
- `runtime/entropy_anchor.c` provides the entropy anchor support object used by builds and tests.
- `runtime/string_auth_runtime.c` provides keyed and integrity-checked decode support for strings and constant pools.
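One plausible reading of the five accessor variant names (`direct`, `stack_roundtrip`, `split_recombine`, `xor_neutral`, `add_sub_neutral`) is that each returns the same anchor value through a different value-preserving route. The sketch below follows that reading; the anchor variable, the enum, the selection rule, and every function body are assumptions inferred from the names, not the contents of `runtime/entropy_anchor.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical anchor: volatile so the compiler must load it at run time
   and cannot constant-fold expressions that depend on it. */
static volatile uint64_t g_anchor = 0x0123456789ABCDEFULL;

typedef enum {
    ACC_DIRECT,
    ACC_STACK_ROUNDTRIP,
    ACC_SPLIT_RECOMBINE,
    ACC_XOR_NEUTRAL,
    ACC_ADD_SUB_NEUTRAL,
    ACC_COUNT
} accessor_kind;

/* Every variant yields the same value via a different instruction shape. */
static uint64_t read_anchor(accessor_kind kind, uint64_t salt) {
    uint64_t v = g_anchor;
    switch (kind) {
    case ACC_DIRECT:
        return v;
    case ACC_STACK_ROUNDTRIP: {
        volatile uint64_t slot = v; /* store to a stack slot, then reload */
        return slot;
    }
    case ACC_SPLIT_RECOMBINE:
        return (v & 0xFFFFFFFF00000000ULL) | (v & 0x00000000FFFFFFFFULL);
    case ACC_XOR_NEUTRAL:
        return (v ^ salt) ^ salt;
    case ACC_ADD_SUB_NEUTRAL:
        return (v + salt) - salt; /* unsigned wraparound cancels out */
    default:
        assert(!"unknown accessor kind");
        return v;
    }
}

/* Hypothetical deterministic selection from a function id and a salt. */
static accessor_kind pick_accessor(uint32_t function_id, uint64_t salt) {
    return (accessor_kind)((function_id ^ salt) % ACC_COUNT);
}
```

Selecting the variant deterministically from the function and salt keeps builds reproducible for a given seed while still varying the emitted access pattern between functions.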
`obf-safe-pipeline` is the integrated pipeline used by the benchmarks and lit coverage. Its current high-level order is:
- entropy initialization
- VM lowering and call rewriting for `vm`
- VM lowering and call rewriting for `strong_vm`
- post-VM string encoding
- constant encoding
- opaque GEP
- instruction substitution for logical and boolean rewrites
- opaque predicates
- control flattening
- function outlining
- bogus control flow
- block splitting
- additional hardening on `strong_vm` implementation functions
- CFG state cleanup
- indirect dispatch
- security gate enforcement
- artifact cleanup
The late ordering matters. Indirect dispatch runs after the major structural passes so it can rewrite the final dispatch-heavy CFG shapes, including VM implementation functions.
Top-level sections currently supported by the loader:
- `profile`
- `seed`
- `default_level`
- `overrides`
- `targets`
- `block_split`
- `string_encoding`
- `constant_encoding`
- `mba`
- `indirect_dispatch`
- `security`
- `debug_preserve_generated_names`

`overrides` entries match exact function names; `targets` entries support glob-style wildcard patterns (e.g., `"verify_*"`).
| Setting | `fast` | `standard` | `guarded` | `fortress` | `lab` |
|---|---|---|---|---|---|
| `mba.depth` | 1 | 1 | 2 | 3 | 4 |
| `mba.enable_polynomial` | derived | derived | derived | derived | true |
| `mba.enable_multiplication` | derived | derived | derived | derived | true |
| `mba.max_ir_instructions` | derived | derived | derived | derived | 320 |
| `block_split.max_splits_per_function` | 1 | 1 | 2 | 4 | 8 |
| `string_encoding.min_string_length` | 3 | 2 | 2 | 1 | 1 |
| `string_encoding.max_strings_per_module` | 32 | 128 | 256 | 512 | 1024 |
| `string_encoding.prefer_lazy_decode` | true | true | true | false | false |
| `string_encoding.allow_ctor_fallback` | true | true | false | false | false |
| `constant_encoding.max_constants_per_function` | 2 | 4 | 8 | 16 | 32 |
| `security.fail_on_public_obf_symbol` | false | true | true | true | true |
All profiles default to `authenticated_mode: false`, `indirect_dispatch.enabled: false`, `min_instructions_per_block: 2` (`fortress` and `lab` use 1), `min_bit_width: 8`, `default_level: none`, and `constant_encoding.mode: mba_inline`. Outside `lab`, which sets them explicitly, the MBA override fields (`enable_polynomial`, `enable_multiplication`, `max_ir_instructions`) are absent by default and derived from `mba.depth`: the polynomial and multiplication families enable at depth 3+, and the IR-instruction budget scales with depth (64 at depth 1, 128 at depth 2, 192 at depth 3, 256 at depth 4). Explicit top-level YAML keys override profile defaults.
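The derivation rule stated above fits in a few lines. This is a minimal sketch assuming the budget is exactly `64 * depth` over the documented range of depths 1 to 4; the struct and function names are hypothetical, and `budget_take` only models the fall-back-to-plain-operation behavior described for the private `BudgetTracker`, not its real interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the three derived MBA fields. */
typedef struct {
    bool enable_polynomial;
    bool enable_multiplication;
    int max_ir_instructions;
} mba_derived;

/* Derive the fields from mba.depth when no explicit override is given:
   families switch on at depth 3+, budget is 64/128/192/256 for 1..4. */
static mba_derived derive_mba_defaults(int depth) {
    assert(depth >= 1 && depth <= 4); /* only depths 1..4 are documented */
    mba_derived d;
    d.enable_polynomial = depth >= 3;
    d.enable_multiplication = depth >= 3;
    d.max_ir_instructions = 64 * depth;
    return d;
}

/* Hypothetical per-expression budget: remaining IR instructions. */
typedef struct {
    int remaining;
} budget;

/* Returns true and charges the budget if the expansion fits; returns
   false otherwise, in which case the caller would emit the plain
   binary operation instead of the MBA form. */
static bool budget_take(budget *b, int cost) {
    if (cost > b->remaining) {
        return false;
    }
    b->remaining -= cost;
    return true;
}
```

For example, `derive_mba_defaults(2)` leaves both families disabled with a budget of 128, which matches the `guarded` column when no override keys are present.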
Protection levels can be set directly in source using LLVM's `annotate` attribute. The annotation value must be `"obf:<level>"` where `<level>` is one of `none`, `light`, `strong`, `vm`, or `strong_vm`.

    __attribute__((annotate("obf:strong_vm")))
    void sensitive_routine(void) { ... }

Annotations rank below explicit `overrides` entries but above `targets` rule matching. The automatic security floor applies independently and may raise the level further.
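The precedence just described (exact `overrides`, then the source annotation, then glob `targets`, then `default_level`) can be sketched as a small resolver. The types and the function `resolve_level` are hypothetical, first-match-wins ordering among `targets` is an assumption, and POSIX `fnmatch` stands in for whatever glob matcher the plugin uses. The automatic security floor is deliberately left out because its rules are not described here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* LVL_UNSET marks a function that carries no annotation. */
typedef enum {
    LVL_UNSET = -1,
    LVL_NONE,
    LVL_LIGHT,
    LVL_STRONG,
    LVL_VM,
    LVL_STRONG_VM
} level;

/* One config entry: an exact name (overrides) or a glob (targets). */
typedef struct {
    const char *name_or_pattern;
    level lvl;
} rule;

static level resolve_level(const char *fn, level annotation,
                           const rule *overrides, size_t n_overrides,
                           const rule *targets, size_t n_targets,
                           level default_level) {
    assert(fn != NULL);
    /* 1. Exact-name overrides win over everything else. */
    for (size_t i = 0; i < n_overrides; ++i) {
        if (strcmp(overrides[i].name_or_pattern, fn) == 0) {
            return overrides[i].lvl;
        }
    }
    /* 2. A source annotation beats pattern-based targets. */
    if (annotation != LVL_UNSET) {
        return annotation;
    }
    /* 3. Glob targets; the first matching entry is used (assumption). */
    for (size_t i = 0; i < n_targets; ++i) {
        if (fnmatch(targets[i].name_or_pattern, fn, 0) == 0) {
            return targets[i].lvl;
        }
    }
    /* 4. Nothing matched: fall back to default_level. */
    return default_level;
}
```

With a `targets` entry of `"verify_*"` at `strong_vm`, an unannotated `verify_sig` resolves to `strong_vm`, while the same function annotated `obf:light` resolves to `light` unless an `overrides` entry names it exactly.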
Minimal example:

    profile: fortress
    seed: 20260601
    default_level: none
    targets:
      - match: "verify_*"
        level: strong_vm
      - match: "license_*"
        level: strong_vm
    string_encoding:
      authenticated_mode: true
      prefer_lazy_decode: true
      allow_ctor_fallback: false
    constant_encoding:
      mode: auto
      max_constants_per_function: 8
      min_bit_width: 8
    mba:
      depth: 3
      enable_polynomial: true
      enable_multiplication: true
      max_ir_instructions: 320
    indirect_dispatch:
      enabled: true
      max_sites_per_function: 4
      max_switch_targets: 8
      target_vm_dispatchers: true
      target_flattened_headers: true
    security:
      fail_on_public_obf_symbol: true
      strip_release_markers: true

Requirements:
- CMake 3.24+
- C++23 compiler
- LLVM 21+
- Python 3
- `lit`
- LLVM tools: `opt`, `clang`, `clang++`, `llvm-link`, `llc`, `llvm-strip`, `llvm-nm`, `llvm-objdump`
- Optional: `strings` for benchmark string audits
Configure and build:

    cmake -S . -B build -DLLVM_DIR="$(llvm-config --cmakedir)"
    cmake --build build

Useful cache variables:

- `OBF_BENCHMARK_SEED`
- `OBF_RUNTIME_ABI_PREFIX`
- `OBF_BENCHMARK_CLEAN_IR`
- `OBF_BENCHMARK_CLEANUP_PASSES`
Feature report:

- `obf-feature-report` is read-only and emits `obf.feature_report.v3` JSON with per-function policy decisions, per-transform strategy details, and MBA shape counters under the `mba` payload for functions that use MBA rewrites.

      opt -load-pass-plugin build/obf_plugin.so \
        --obf-config=config.yaml \
        -passes=obf-feature-report \
        -disable-output input.ll

Policy audit:
- `obf-audit` prints a policy-resolution table and can also write `obf.audit.v1` JSON with `--obf-audit-out`.

      opt -load-pass-plugin build/obf_plugin.so \
        --obf-config=config.yaml \
        --obf-audit-out=audit.json \
        -passes=obf-audit \
        -disable-output input.ll

Full safe pipeline:

    opt -load-pass-plugin build/obf_plugin.so \
      --obf-config=config.yaml \
      -passes=obf-safe-pipeline \
      -S input.ll -o output.ll

Isolated indirect dispatch:

    opt -load-pass-plugin build/obf_plugin.so \
      --obf-config=config.yaml \
      -passes=obf-indirect-dispatch \
      -S input.ll -o indirect.ll

Other standalone passes:
- Read-only/reporting: `obf-feature-report`, `obf-audit`.
- Transform stages: `obf-entropy-init`, `obf-vm`, `obf-block-split`, `obf-string-encode`, `obf-constant-encode`, `obf-opaque-gep`, `obf-instruction-substitute`, `obf-control-flatten`, `obf-function-outline`, `obf-opaque-preds`, `obf-bogus-cf`, `obf-indirect-dispatch`, `obf-cfg-state-cleanup`, and `obf-artifact-cleanup`.
`obf-driver` currently loads a config and prints a summary. It is not a full compile driver.
These screenshots compare one baseline function with one obfuscated function from the license_demo benchmark.
- Baseline function: `FUN_004008f0` from `license_demo.baseline`
- Obfuscated function: `FUN_00400510` from `license_demo.obfuscated`
What the baseline image shows:
- A compact, readable verification-style routine.
- Clear control flow (simple bounds and loop structure).
- Data-dependent operations that remain semantically recoverable in the decompiler.
What the obfuscated image shows:
- Large opaque arithmetic chains with mixed rotates/xors/add-masks.
- Decompiler warnings around jump-table recovery and indirect control transfer.
- Significantly reduced semantic readability despite valid executable behavior.
Pseudocode comparison:

| Baseline (`license_demo.baseline` / `FUN_004008f0`) | Obfuscated (`license_demo.obfuscated` / `FUN_00400510`) |
|---|---|
| [baseline decompiler screenshot] | [obfuscated decompiler screenshot] |
Benchmark targets build paired baseline and obfuscated artifacts under `build/benchmarks/<name>/`. The benchmark build passes `--obf-seed=${OBF_EFFECTIVE_BENCHMARK_SEED}` to `opt`, so `OBF_BENCHMARK_SEED` controls the effective benchmark seed for the whole build tree even when a sample benchmark config contains its own `seed:` entry.
Build benchmark pairs:

    cmake --build build --target obf-benchmarks

Per-benchmark artifacts:

- `<name>.baseline.ll`
- `<name>.obfuscated.ll`
- `<name>.obfuscated.cleaned.ll` when `OBF_BENCHMARK_CLEAN_IR=ON`
- `<name>.baseline`
- `<name>.obfuscated`
Benchmark and analysis targets:
- `obf-benchmarks` builds stripped baseline and obfuscated pairs for the full corpus.
- `obf-benchmarks-mir` emits MIR snapshots for linked benchmark targets such as `wpo_demo`.
- `obf-audit-benchmarks` audits stripped obfuscated benchmark binaries for leaked symbols and, when `strings` is available, residual strings.
- `obf-re-harness` scores how much VM structure is recoverable from obfuscated benchmark IR and writes `build/re-harness/vm_recovery.json`.
- `obf-seed-diversity` verifies seed-driven IR diversity and writes `build/diversity/diversity.json`.
Current benchmark corpus:
- `license_demo`
- `config_demo`
- `vm_workflow_demo`
- `wpo_demo`
Measure keyed string decode overhead:

    python tools/obf-bench/measure_string_auth_overhead.py --build-dir build

The helper writes temporary inputs under `build/string-auth-bench/` and reports lazy first-decode cost, lazy steady-state helper cost, and constructor startup impact.
Requested release sweep:

    cmake --build build --target obf-benchmarks obf-seed-diversity obf-unit-tests
    ctest --test-dir build --output-on-failure -R "obf-lit|obf-unit-tests"

The lit suite covers 109 tests across MBA engine shapes, opaque predicates, bogus control flow, control flattening, opaque GEP, constant encoding (inline and keyed-pool), string encoding (lazy, eager, auth), indirect dispatch, VM lowering, VM handler and dispatcher polymorphism, seed determinism, safe pipeline ordering, security gates, and artifact cleanup. Every test passes `opt -passes=verify`, `FileCheck`, and `lli` execution validation. An InstCombine collapse audit confirms runtime entropy anchors prevent the simplifier from folding polynomial zeros or opaque predicates.
include/obf/ public headers
lib/analysis/ feature extraction
lib/frontend/ config loading and annotations
lib/plugin/ pass registration and pipeline wiring
lib/policy/ function-level policy selection
lib/report/ reporting
lib/transforms/ IR transforms
lib/vm/ VM lowering and dispatch
runtime/ runtime support objects
tests/lit/ lit coverage
tests/unit/ unit tests
benchmarks/ corpus, configs, and build targets
tools/ helper tools and scripts

