A Yazi-like terminal file explorer built in Modern C++ with a persistent LMDB-backed search engine.
- Project Overview
- Architecture
- Dependencies
- Installation & Build
- Configuration
- Features Deep Dive
- Indexing Architecture Deep Dive
- Complexity Analysis
- Testing
- Project Structure
- Troubleshooting & FAQs
- Future Roadmap
Refined Explorer is a high-performance, terminal-based file manager written in C++17. It is inspired by Yazi and Ranger but goes further by embedding a persistent, full-text search engine directly into the file explorer — powered by LMDB (Lightning Memory-Mapped Database).
| Feature | Standard TUI Explorers | Refined Explorer |
|---|---|---|
| Navigation | ✅ | ✅ |
| File Operations | ✅ | ✅ |
| Filename Search | Basic (find) | ✅ Recursive + Filtered |
| Content Search | ❌ Relies on grep | ✅ Persistent inverted index (LMDB) |
| Real-time Index Updates | ❌ | ✅ FSEvents (macOS) / inotify (Linux) |
| Search Speed | ~300-800ms (grep -R) | ~5-20ms (indexed) |
The application has a clean layered architecture:
┌────────────────────────────────────────────────────┐
│                 Terminal UI Layer                  │
│     (3 panels: metadata | file list | preview)     │
│                      src/tui/                      │
└──────────────────────┬─────────────────────────────┘
                       │ keypresses / commands
┌──────────────────────▼─────────────────────────────┐
│               Application Core Layer               │
│   Navigation · Commands · Selection · Dir Cache    │
│                     src/core/                      │
└────────┬──────────────────────────┬────────────────┘
         │                          │
┌────────▼─────────┐   ┌────────────▼────────────────┐
│  Utility Layer   │   │  Indexing & Search Layer    │
│  text_utils.cpp  │   │  inverted_index.cpp         │
│  format.cpp      │   │  traversal.cpp              │
│  file_utils.cpp  │   │  watcher_mac/linux.cpp      │
│  src/utils/      │   │  src/index/                 │
└──────────────────┘   └─────────────────────────────┘
                                    │
                        ┌───────────▼──────────────┐
                        │  LMDB on-disk Database   │
                        │ ~/.cache/refined-explorer│
                        └──────────────────────────┘
- User launches the app → main.cpp runs loadConfig(), startIndexing(), initializeNavigation(), then navigate().
- Background thread starts crawling the indexingRoot, tokenizing files, and persisting them into LMDB.
- Watcher thread listens for real-time filesystem events (file created/deleted/renamed/modified) and re-indexes changed files.
- User navigates → the UI redraws from the directory cache or a fresh filesystem scan.
- User searches → the query is tokenized, word IDs are looked up in LMDB, inode sets are intersected, and paths are resolved.
| Dependency | Version | Purpose |
|---|---|---|
| C++ Standard | C++17 | std::filesystem, structured bindings, if constexpr |
| POSIX | Standard | stat, opendir, readdir, fcntl, signal, ioctl |
| pthreads | POSIX | std::thread background workers |
| Library | Install | Purpose |
|---|---|---|
| LMDB | brew install lmdb (macOS) / apt install liblmdb-dev (Linux) | Persistent B+ Tree key-value store for the inverted index |
| CoreServices (macOS only) | Included in Xcode | FSEvents API for real-time filesystem monitoring |
| inotify (Linux only) | Built into Linux kernel | Real-time filesystem event monitoring |
| Tool | Version | Purpose |
|---|---|---|
| CMake | 3.10+ | Cross-platform build system |
| Compiler | Apple Clang (macOS) / GCC 11+ (Linux) | C++17 compilation |
| GoogleTest | v1.14.0 | Auto-fetched via FetchContent for unit testing |
Important
On macOS, you must use Apple Clang (not Homebrew GCC). Apple's CoreServices.framework uses "Blocks" syntax (^) which only Clang supports. Set your compiler with CC=clang CXX=clang++ cmake ..
macOS:
brew install lmdb cmake
# Apple Clang is already available via Xcode Command Line Tools
xcode-select --install
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y liblmdb-dev cmake g++ build-essential
# Clone the repo
git clone https://github.com/your-username/refined-explorer.git
cd refined-explorer
# Create build directory
mkdir build && cd build
# Configure (macOS - use Apple Clang explicitly)
CC=clang CXX=clang++ cmake ..
# Build with parallel jobs (adjust -j to your CPU core count;
# nproc is not installed by default on macOS, use $(sysctl -n hw.ncpu) there)
cmake --build . -j$(nproc)
The build outputs two binaries in build/:
- refined_explorer: the main application
- tests_runner: the automated test suite
# Start from current directory
./build/refined_explorer
# Start from a specific path
./build/refined_explorer /Users/you/Documents
Note
On first run, LMDB opens at ~/.cache/refined-explorer/lmdb/. The background indexer begins crawling the indexingRoot defined in config.yml. This is a one-time crawl; subsequent launches re-use the persistent index and only re-index changed files.
Create or edit config.yml in the project root:
performance:
  workers: 5 # Thread count for multi-threaded folder size calculation
  indexing: true # Enable/disable the background LMDB indexer
  indexing_root: /Users/you/Developer # Root directory to index
| Key | Type | Default | Description |
|---|---|---|---|
| performance.workers | int | 4 | Threads for getFolderSizeMT() (directory size computation) |
| performance.indexing | bool | false | Master switch for LMDB indexing |
| performance.indexing_root | string | $HOME | The root directory the indexer crawls |
The config is parsed at startup in system.cpp:loadConfig(). If the file is missing, safe defaults are used.
The UI is divided into three columns rendered with ANSI escape codes directly to stdout, with no ncurses dependency:
┌──────────────────┬────────────────────┬──────────────────────┐
│ LEFT PANEL │ MIDDLE PANEL │ RIGHT PANEL │
│ File Metadata │ File List │ Preview │
│ ─────────────── │ ─────────────── │ ─────────────── │
│ Name: main.cpp │ > src/ │ #include "..." │
│ Size: 712 B │ include/ │ int main() { │
│ User: varalika │ tests/ │ loadConfig(); │
│ Perms: rwxr-xr-x│ CMakeLists.txt │ startIndexing(); │
│ Modified: ... │ README.md │ navigate(); │
└──────────────────┴────────────────────┴──────────────────────┘
- Left Panel: Populated by file_details.cpp. Calls stat() for file metadata (size, timestamps, permissions), getpwuid() for username, getgrgid() for group name.
- Middle Panel: Rendered from app.nav.fileList (the in-memory directory listing from the cache).
- Right Panel: Calls isBinaryFile() (reads first 512 bytes), then renders file content or directory listing.
- Terminal Resize: The SIGWINCH signal is registered to handleResize(). The handler sets a flag, and the main loop calls handleResizeIfNeeded(), which re-queries terminal size with ioctl(TIOCGWINSZ) and redraws.
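The resize flow described in the last bullet can be sketched as follows. This is a minimal illustration, not the project's code: the function names come from the text above, but their signatures, the TermSize struct, the g_resizePending flag, and installResizeHandler() are assumptions made for the example.

```cpp
#include <csignal>
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical flag; sig_atomic_t is safe to write from a signal handler.
static volatile std::sig_atomic_t g_resizePending = 0;

// The handler only records that a resize happened (async-signal-safe).
extern "C" void handleResize(int) { g_resizePending = 1; }

// Hypothetical helper type holding the queried terminal size.
struct TermSize { int rows; int cols; };

// Registers the handler for SIGWINCH (sent when the terminal is resized).
void installResizeHandler() { std::signal(SIGWINCH, handleResize); }

// Called from the main loop. Clears the flag and returns true (filling `out`)
// only when a resize was pending and the terminal size could be queried.
bool handleResizeIfNeeded(TermSize& out) {
    if (!g_resizePending) return false;
    g_resizePending = 0;
    winsize ws{};
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) != 0) return false; // stdout is not a TTY
    out.rows = ws.ws_row;
    out.cols = ws.ws_col;
    return true;
}
```

Keeping the handler down to a single flag write avoids calling non-reentrant functions (such as anything that draws to the terminal) from signal context; the redraw happens later on the main loop.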
Navigation state is managed in NavigatorState (from myheader.h):
struct NavigatorState {
std::string currPath; // Current directory being viewed
std::string prevPath; // Previously visited directory
std::vector<std::string> fileList; // Visible files in currPath
int xcurr = 1; // Cursor row on screen (1-indexed)
int up_screen = 0; // Index of first visible file
int down_screen = 0; // Count of files scrolled past bottom
std::stack<NavState> backStack; // History for ← navigation
std::stack<NavState> forwardStack; // History for redo
};
Cursor & Scroll Algorithm (navigator.cpp:normalizeRange):
The cursor is clamped to always be within [1, min(rowSize, visibleFiles)]. The scroll offset up_screen is clamped to [0, max(0, total - rowSize)].
if (up_screen < 0) up_screen = 0
if (up_screen > maxUp) up_screen = maxUp // maxUp = total - rowSize
if (xcurr < 1) xcurr = 1
if (xcurr > maxX) xcurr = maxX // maxX = min(rowSize, visible)
down_screen = max(0, total - up_screen - rowSize)
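The clamping rules above can be written as a small self-contained function. This is a sketch under assumptions: ScrollState is a hypothetical subset of NavigatorState, the (total, rowSize) parameters are illustrative, and the "visible" count is taken to be the number of entries between up_screen and the bottom of the window.

```cpp
#include <algorithm>

// Hypothetical subset of NavigatorState, just the scroll-related fields.
struct ScrollState {
    int xcurr = 1;        // cursor row on screen (1-indexed)
    int up_screen = 0;    // index of first visible file
    int down_screen = 0;  // files below the visible window
};

// total   = number of entries in the directory
// rowSize = number of rows available in the file-list panel
void normalizeRange(ScrollState& s, int total, int rowSize) {
    const int maxUp = std::max(0, total - rowSize);
    s.up_screen = std::clamp(s.up_screen, 0, maxUp);

    // Entries actually shown in the window after clamping the offset.
    const int visible = std::min(rowSize, total - s.up_screen);
    const int maxX = std::max(1, visible);  // keep the cursor on row 1 for empty dirs
    s.xcurr = std::clamp(s.xcurr, 1, maxX);

    s.down_screen = std::max(0, total - s.up_screen - rowSize);
}
```

For example, with 100 entries and a 20-row panel, a cursor at row 50 with a negative offset is normalized to xcurr = 20, up_screen = 0, down_screen = 80.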
Keybindings:
| Key | Action | Implementation |
|---|---|---|
| ↑ | Cursor up (scroll if at top of window) | navigation.cpp |
| ↓ | Cursor down (scroll if at bottom of window) | navigation.cpp |
| → | Enter directory / open file | navigation.cpp:handleEnterAction() |
| ← / Backspace | Go to parent | Pops backStack |
| :cd <path> | Jump to absolute path | navigator.cpp:navigateToAbsolutePath() |
File: src/core/dir_cache.cpp
To avoid repeated filesystem syscalls (opendir/readdir) when entering the same directory multiple times, entries are cached in memory:
struct CacheState {
std::unordered_map<std::string, std::vector<std::string>> dirCache;
const int max_cache_entries = 1000000;
};
Cache Lookup Logic (getDirectoryCount):
- Check if dirCache[path] exists → if yes, return the cached listing immediately. O(1) average.
- If not, call fs::directory_iterator, collect and sort filenames alphabetically, store in cache, return listing.
- For operations that change directory contents (create, delete, rename, paste), invalidateDirCache(path) removes the stale entry so the next visit does a fresh scan.
Complexity: Cache hit → O(1). Cache miss → O(N log N) where N = number of files in directory (for sorting).
All operations are implemented in src/core/commands.cpp:
Copy:
If selectedFiles is non-empty:
  clipboard ← all paths in selectedFiles
Else:
  clipboard ← [currPath/fileAtCursor]
- Uses app.selection.clipboard (a std::vector<std::string>)
- Complexity: O(S) where S = number of selected files
Paste:
- Uses std::filesystem::copy with overwrite_existing flag.
- Recursive directory copy via copy_options::recursive.
- Complexity: O(F) where F = total files being copied
Delete:
- Uses std::filesystem::remove_all for full recursive deletion.
- Clears the selection state after.
- Complexity: O(F) where F = total files being deleted
Rename:
- Uses POSIX rename() syscall, which is atomic on the same filesystem.
- Complexity: O(1)
Create file:
- Uses std::ofstream to create an empty file.
- Complexity: O(1)
Create directory:
- Uses POSIX mkdir() syscall with permissions 0777 (the effective mode is still reduced by the process umask).
- Complexity: O(1)
The selection state is a std::unordered_set<std::string> of absolute file paths:
struct SelectionState {
std::vector<std::string> clipboard;
std::unordered_set<std::string> selectedFiles;
};
| Key | Action |
|---|---|
| Space | Toggle current file in/out of selectedFiles |
| u | Clear all selections (selectedFiles.clear()) |
| c / d | Operate on all selectedFiles |
Toggle complexity: O(1) average (hash set insert/erase).
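The toggle and the copy-to-clipboard rule can be sketched like this. SelectionState mirrors the struct shown above; toggleSelection() and copyToClipboard() are hypothetical helper names, not functions confirmed by the source.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

struct SelectionState {
    std::vector<std::string> clipboard;
    std::unordered_set<std::string> selectedFiles;
};

// Space key: insert the path if absent, erase it if present.
// Returns true when the path is selected after the call.
bool toggleSelection(SelectionState& sel, const std::string& absPath) {
    auto [it, inserted] = sel.selectedFiles.insert(absPath);
    if (!inserted) sel.selectedFiles.erase(it);
    return inserted;
}

// Copy: use the whole selection, or fall back to the file under the cursor.
void copyToClipboard(SelectionState& sel, const std::string& fileAtCursor) {
    sel.clipboard.clear();
    if (sel.selectedFiles.empty())
        sel.clipboard.push_back(fileAtCursor);
    else
        sel.clipboard.assign(sel.selectedFiles.begin(), sel.selectedFiles.end());
}
```

Because selectedFiles is a hash set, the clipboard order produced here is unspecified; code that needs a stable paste order would have to sort the clipboard.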
Press : to enter command mode. The input loop reads a line of text and calls processCommand(commandLine) in src/core/command_processor.cpp.
The processor splits the input into tokens and dispatches:
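std::vector<std::string> args;
std::string word;
std::stringstream ss(commandLine);
while (ss >> word) args.push_back(word);
if (args.empty()) return; // empty input: avoid reading args[0] (assumes processCommand returns void)
std::string command = args[0];
Full Command Reference: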
| Command | Example | Description |
|---|---|---|
| :rename <new> | :rename report_v2.md | Rename the file under cursor |
| :create_file <n> | :create_file todo.txt | Create a new empty file |
| :create_dir <n> | :create_dir projects | Create a new directory |
| :cd <path> | :cd /Users/me/docs | Jump to an absolute path (supports spaces) |
| :search <q> | :search readme | Recursive filename search |
| :search --file <q> | :search --file config | Search files only |
| :search --dir <q> | :search --dir test | Search directories only |
| :find <tokens> | :find async lambda | Content search via LMDB index (AND) |
| :find --dir <tokens> | :find --dir python script | Content search, filtered to current dir |
| :help | :help | Show in-app help |
| :q | :q | Exit command mode |
| :exit | :exit | Quit the application |
Note
:cd supports paths with spaces because the parser joins all tokens after cd with a space: for (int i = 2; i < args.size(); i++) absPath += " " + args[i];
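The tokenize-then-rejoin behaviour can be sketched as two small functions. splitArgs() and joinCdPath() are hypothetical names chosen for this example; only the whitespace splitting and the join loop are taken from the text above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a command line on whitespace, like the `ss >> word` loop.
std::vector<std::string> splitArgs(const std::string& commandLine) {
    std::vector<std::string> args;
    std::istringstream ss(commandLine);
    for (std::string word; ss >> word;) args.push_back(word);
    return args;
}

// Rebuilds the path argument of `cd` from args[1..], joined by single spaces.
// Returns an empty string when no path was given.
std::string joinCdPath(const std::vector<std::string>& args) {
    if (args.size() < 2) return "";
    std::string absPath = args[1];
    for (std::size_t i = 2; i < args.size(); ++i) absPath += " " + args[i];
    return absPath;
}
```

One consequence of this approach: runs of several spaces inside a path collapse to a single space, so a directory whose name contains two consecutive spaces cannot be reached this way.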
File: src/core/search_engine.cpp, src/core/commands.cpp:searchCommand
This is a recursive directory crawl starting from app.nav.currPath. It does not use the index.
Process:
- Lowercase the query: transform(..., ::tolower)
- Call searchAnything(path, filename, check_file, check_dir) which uses fs::recursive_directory_iterator.
- Time the search and log it.
- Display results with displaySearchResults().
Flags:
- :search <q> → searches both files and directories
- :search --file <q> → check_dir = false
- :search --dir <q> → check_file = false
Complexity: O(N) where N = total files and directories under current path (linear scan).
File: src/index/inverted_index.cpp:search()
This uses the persistent LMDB inverted index for fast multi-word content search with AND semantics: a file matches only if it contains every query word.
Step-by-step process:
Step 1 — Tokenize the query:
// Query: "async lambda"
// After normalizeWord(): tokens = ["async", "lambda"]
Step 2 — Open a read-only LMDB transaction
Step 3 — For each token, look up its ID and collect matching inodes:
db_word2id["async"] → word_id = 42
db_inverted[42] → {ino=1001, ino=2040, ino=5500} // files containing "async"
db_word2id["lambda"] → word_id = 71
db_inverted[71] → {ino=2040, ino=5500} // files containing "lambda"
A fileCounter[ino]++ map counts how many query words each inode matches.
Step 4 — Intersect results (AND semantics):
int required = tokens.size(); // 2
for (auto& [ino, cnt] : fileCounter)
if (cnt == required) // only files matching ALL words
        resolve_path(ino);
Step 5 — Resolve inode → path (macOS):
// macOS-specific volfs path resolution
std::string volPath = "/.vol/" + rootDev + "/" + ino;
fcntl(fd, F_GETPATH, pathBuf); // kernel resolves inode to absolute path
Complexity Summary:
| Operation | Complexity | Note |
|---|---|---|
| Word ID lookup | O(log N) | LMDB B+ tree lookup |
| Inode set retrieval | O(K) | K = number of files with this word |
| Inode intersection | O(W × K) | W = words in query, K = avg posting list size |
| Path resolution | O(R) | R = matching results |
| Total search | ≈ O(W × K) | ~5-20 ms for typical queries |
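The counting-based AND intersection from Steps 3 and 4 can be sketched without LMDB by passing the posting lists in directly. intersectAll() is a hypothetical name; the sketch assumes each inode appears at most once per posting list, which is what a DUPSORT database of unique values would give.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Inode = std::uint64_t;

// postings[i] holds the inodes of files containing the i-th query token.
// Returns the inodes present in every list, sorted ascending.
std::vector<Inode> intersectAll(const std::vector<std::vector<Inode>>& postings) {
    std::vector<Inode> result;
    if (postings.empty()) return result;

    // Count how many query tokens each inode matched.
    std::unordered_map<Inode, std::size_t> fileCounter;
    for (const auto& list : postings)
        for (Inode ino : list) ++fileCounter[ino];

    // Keep only inodes matched by all tokens (AND semantics).
    for (const auto& [ino, cnt] : fileCounter)
        if (cnt == postings.size()) result.push_back(ino);

    std::sort(result.begin(), result.end());
    return result;
}
```

With the posting lists from the example above, {1001, 2040, 5500} for "async" and {2040, 5500} for "lambda", the result is {2040, 5500}. The total work is one pass over every posting list, which matches the O(W × K) estimate.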
The explorer tracks filesystem changes so the index stays fresh when you add, delete, or rename files.
Platform-specific implementation:
| Platform | API Used | File |
|---|---|---|
| macOS | FSEvents (Apple CoreServices) | watcher_mac.cpp |
| Linux | inotify | watcher_linux.cpp |
Why not kqueue on macOS?
kqueue requires one open file descriptor per watched directory. For large trees (100k+ directories), this can exhaust the per-process file descriptor limit. FSEvents monitors entire directory subtrees with a single handle.
macOS FSEvents Flow:
- FSEventStreamCreate() registers a callback for the indexingRoot.
- Runs on a dedicated CFRunLoop thread.
- Events are pushed into app.indexing.eventQueue (mutex-protected).
- Background worker thread consumes events and calls index.indexPath() or index.removePath().
Event Types:
| Flag | Mapped To |
|---|---|
| kFSEventStreamEventFlagItemCreated | WatcherEventType::CREATE |
| kFSEventStreamEventFlagItemRemoved | WatcherEventType::DELETE |
| kFSEventStreamEventFlagItemRenamed | WatcherEventType::RENAME |
| kFSEventStreamEventFlagItemModified | WatcherEventType::MODIFY |
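The hand-off between the watcher callback and the background worker can be sketched as a small mutex-protected queue. WatcherEventType follows the table above; the WatcherEvent struct, the EventQueue class, and its close() method are assumptions made for this example rather than the project's actual types.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <utility>

enum class WatcherEventType { CREATE, DELETE, RENAME, MODIFY };

// Hypothetical event record: what happened and to which path.
struct WatcherEvent {
    WatcherEventType type;
    std::string path;
};

class EventQueue {
public:
    // Called by the watcher thread for every filesystem event.
    void push(WatcherEvent ev) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            q_.push(std::move(ev));
        }
        cv_.notify_one();
    }

    // Called by the worker thread. Blocks until an event arrives or the queue
    // is closed; returns nullopt once closed and fully drained.
    std::optional<WatcherEvent> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        WatcherEvent ev = std::move(q_.front());
        q_.pop();
        return ev;
    }

    // Signals shutdown so a blocked worker can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<WatcherEvent> q_;
    bool closed_ = false;
};
```

A worker loop would then call pop() repeatedly and dispatch CREATE and MODIFY to index.indexPath() and DELETE to index.removePath(); how RENAME is handled is not specified in this document.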
The index is stored in a single LMDB environment at ~/.cache/refined-explorer/lmdb/ with 5 named sub-databases:
db_files : ino (uint64) → mtime (uint64) [unique, INTEGERKEY]
db_word2id : word (str) → word_id (uint32) [unique]
db_id2word : word_id(uint32) → word (str) [unique, INTEGERKEY]
db_inverted : word_id(uint32) →+ ino (uint64) [DUPSORT | DUPFIXED]
db_forward : ino (uint64) →+ word_id(uint32) [DUPSORT | DUPFIXED]
The →+ notation means one key maps to multiple sorted values (DUPSORT).
Storing full path strings for every file was the main source of index bloat. Instead, we store inodes (the file ID assigned by the OS, unique within a filesystem):
- Each inode is a uint64_t (8 bytes) vs a full path string (50-200+ bytes).
- Inodes survive renames: if you rename a file, its inode doesn't change.
- On macOS, inodes are resolved to paths at query time using /.vol/<dev>/<ino> and fcntl(F_GETPATH).
Every token from a filename or file content goes through normalizeWord():
Input: "Hello_World!"
↓ character filter (keep alnum + [@#_-$&])
"" ← '!' is not allowed, so the entire word is rejected
The filter is strict: if any character is disallowed, the entire word is rejected rather than having that character stripped:
for (char c : word) {
    if (!(isalnum(c) || c == '@' || ... ))
        return ""; // reject entire word
}
A word that passes the filter (for example "Hello_World") is then:
- Lowercased: "hello_world"
- Stopword check: removed if in the 100+ word stopword list
- Length limit: rejected if ≥ 128 characters
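Putting the rules together, a self-contained version might look like the sketch below. The allowed symbol set and the 128-character limit come from the text above; the five-word kStopwords set is a tiny stand-in for the project's 100+ word list, and checking the length first (instead of last) is a choice made here that does not change the result.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Illustrative stand-in for the real stopword list (assumption).
static const std::unordered_set<std::string> kStopwords = {"the", "and", "of", "a", "to"};

// Returns the normalized token, or "" if the word should not be indexed.
std::string normalizeWord(const std::string& word) {
    if (word.empty() || word.size() >= 128) return "";  // length limit

    std::string out;
    out.reserve(word.size());
    for (unsigned char c : word) {
        const bool allowed = std::isalnum(c) || c == '@' || c == '#' || c == '_' ||
                             c == '-' || c == '$' || c == '&';
        if (!allowed) return "";                        // reject the entire word
        out.push_back(static_cast<char>(std::tolower(c)));
    }

    if (kStopwords.count(out) != 0) return "";          // stopword check
    return out;
}
```

Here normalizeWord("Hello_World") yields "hello_world", while normalizeWord("Hello_World!") and normalizeWord("The") both yield the empty string.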
On every launch, the indexer does a differential crawl instead of a full re-index:
- getLastSyncTime() reads the last indexed timestamp from db_files[ino=0] (a sentinel metadata entry).
- During traversal, each file's mtime is compared to its stored mtime.
- Only new or modified files are re-indexed.
- setLastSyncTime(now) updates the sentinel.
This makes subsequent startups very fast — only changed files are processed.
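The per-file decision in the differential crawl can be sketched with an in-memory map standing in for db_files. FileTable and needsReindex() are hypothetical names, and treating any mtime difference (not just a newer one) as a change is an assumption of this sketch.

```cpp
#include <cstdint>
#include <unordered_map>

using Inode = std::uint64_t;

// In-memory stand-in for db_files: ino -> last indexed mtime.
using FileTable = std::unordered_map<Inode, std::uint64_t>;

// Returns true when the file is new or its mtime differs from the stored one,
// and records the new mtime so the next crawl skips it.
bool needsReindex(FileTable& files, Inode ino, std::uint64_t mtime) {
    auto it = files.find(ino);
    if (it != files.end() && it->second == mtime) return false;  // unchanged
    files[ino] = mtime;
    return true;
}
```

On a second crawl over an unchanged tree every call returns false, so no file content is read or tokenized again.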
| Feature | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Directory listing (cache hit) | O(1) | O(N) cached | N = files in dir |
| Directory listing (cache miss) | O(N log N) | O(N) | N = files in dir (sort) |
| Cursor move Up/Down | O(1) | O(1) | Simple counter update |
| :cd navigate absolute | O(P) | O(P) | P = path segments |
| :search (filename) | O(N) | O(R) | N = all files in subtree |
| :find (content, LMDB) | O(W × K) | O(R) | W = words, K = avg posting size |
| Word ID lookup (LMDB) | O(log V) | — | V = vocab size, B+ tree |
| Add file to index | O(T × log V) | O(T) | T = tokens in file |
| Remove file from index | O(T × log V) | — | Scans forward index |
| Copy N files | O(F) | — | F = total files copied; wall-clock time also scales with total bytes |
| Delete N files | O(F) | — | Recursive remove |
| Toggle selection | O(1) avg | — | Hash set |
| Binary file detection | O(1) | — | Reads first 512 bytes only |
Query: "async lambda" (in a project with 50,000 files)
grep -r "async" . | grep "lambda" → ~400ms (reads every file)
:find async lambda → ~8ms (LMDB O(log N) lookup)
The project uses GoogleTest (v1.14.0) for automated unit testing.
cd build
./tests_runner
# or with JSON output:
./tests_runner --gtest_output=json:test_results.json
Tests also run automatically after every cmake --build . via the POST_BUILD CMake hook:
add_custom_command(TARGET tests_runner POST_BUILD
COMMAND tests_runner --gtest_output=json:test_results.json
COMMAND ${CMAKE_COMMAND} -E remove_directory "${CMAKE_SOURCE_DIR}/tests/dummy"
COMMENT "Running automated tests and cleaning up..."
)
| Module | File | Cases | What's Tested |
|---|---|---|---|
| TextUtils | test_utils.cpp | 24 | normalizeWord: symbols, stopwords, case, length limits |
| FormatUtils | test_utils.cpp | 13 | humanReadableSize (B→TB), truncateStr edge cases |
| FileUtils | test_file_utils.cpp | 12 | isReadable, isBinaryFile, isDirectory, permissions |
| Navigation | test_navigation.cpp | 13 | normalizeRange, scrollToIndex, isUnderCurrentDir |
| DirCache | test_dir_cache.cpp | 6 | Cache hit/miss, invalidateCache, sorted order |
| Commands | test_commands.cpp | 15 | cd with spaces, create/rename/delete, paste collision |
| Search | test_search.cpp | 6 | Partial match, case insensitive, --file/--dir flags |
| InvertedIndex | test_index.cpp | 12 | LMDB open/close, multi-word AND, persistence, word limits |
| Config | test_config.cpp | 3 | Default values, toggle indexing, worker count |
| Selection | test_selection.cpp | 6 | Single/multi select, clear, clipboard update |
| TOTAL | | 110 | 100% Pass Rate ✅ |
Tests use an isolated tests/dummy/ directory created and filled at test start. This directory is automatically deleted after tests complete (via the CMake POST_BUILD hook), so it never pollutes the workspace.
refined-explorer/
├── CMakeLists.txt # Build system: core_lib, refined_explorer, tests_runner
├── config.yml # Runtime configuration (workers, indexing root)
├── main.cpp # Entry point: loadConfig → startIndexing → navigate
├── include/
│ └── myheader.h # Single project-wide header: all structs, enums, extern declarations
├── src/
│ ├── core/ # Application logic
│ │ ├── command_processor.cpp # Parses & dispatches : commands
│ │ ├── commands.cpp # copy, paste, delete, rename, search
│ │ ├── dir_cache.cpp # Directory listing cache (unordered_map)
│ │ ├── file_details.cpp # Left panel: stat(), getpwuid(), getgrgid()
│ │ ├── file_utils.cpp # isDirectory, isRegularFile, isReadable, isBinaryFile
│ │ ├── navigation.cpp # Key event loop, scroll logic, enter/back
│ │ ├── navigation_init.cpp # Builds backStack from startup path
│ │ ├── navigator.cpp # normalizeRange, scrollToIndex, navigateToAbsolutePath
│ │ ├── search_engine.cpp # :search recursive crawl
│ │ └── system.cpp # loadConfig, signal handlers, getFolderSizeMT
│ ├── index/ # Indexing engine
│ │ ├── inverted_index.cpp # LMDB: open/close/indexPath/removePath/search
│ │ ├── index_runner.cpp # startIndexing: opens LMDB, starts watcher + worker
│ │ ├── traversal.cpp # fd-based recursive directory traversal for crawl
│ │ ├── watcher_mac.cpp # FSEvents watcher (macOS)
│ │ └── watcher_linux.cpp # inotify watcher (Linux)
│ ├── tui/ # Terminal UI rendering
│ │ └── ... # render functions for 3 panels, status bar, help screen
│ └── utils/ # Shared utilities
│ ├── format.cpp # humanReadableSize, truncateStr
│ └── text_utils.cpp # normalizeWord, STOPWORDS set
├── tests/
│ ├── test_main.cpp # GTest environment setup
│ ├── stubs.cpp # Stubs for functions requiring a live terminal
│ ├── test_commands.cpp
│ ├── test_config.cpp
│ ├── test_dir_cache.cpp
│ ├── test_file_utils.cpp
│ ├── test_index.cpp
│ ├── test_navigation.cpp
│ ├── test_search.cpp
│ ├── test_selection.cpp
│ └── test_utils.cpp
├── build/ # CMake build artifacts (generated, not committed)
│ ├── refined_explorer # Main binary
│ ├── tests_runner # Test binary
│ └── test_results.json # Latest test report
├── logs/
│ └── debug.log # Runtime log output from logMessage()
├── README.md # Quick-start guide
├── DOCUMENTATION.md # This file
└── notes.md # Development journal
- Ensure ~/.cache/refined-explorer/ is writable.
- Run: mkdir -p ~/.cache/refined-explorer/lmdb

- Check that indexing: true is set in config.yml.
- Check logs/debug.log for "LMDB Inode-Index opened"; if absent, LMDB failed to open.
- Wait ~5-30 seconds after launch for the initial crawl to complete.
- Try :search as a fallback; it doesn't need the index.

- The file's first 512 bytes contain a null byte or non-printable character. isBinaryFile() uses this heuristic. This is expected for images, .pdf, compiled binaries, etc.

- Switch from Homebrew GCC to Apple Clang: CC=clang CXX=clang++ cmake ..

- Simply rebuild: cd build && cmake .. && cmake --build . -j8
| What | Size | Needed? |
|---|---|---|
| _deps/ (GoogleTest source) | ~23 MB | Only for re-building tests |
| CMakeFiles/ (object files) | ~7.6 MB | Only for incremental builds |
| lib/ (compiled static libs) | ~2.2 MB | Only for linking |
| Binaries (refined_explorer, tests_runner) | ~2.5 MB | Yes, to run |
You can safely delete everything except the binaries and test_results.json if you're not doing active development.
Instead of iterating through posting lists with a hash counter, use SIMD (AVX2/NEON) instructions to do bitwise AND across compressed inode bitsets — targeting sub-millisecond query latency for queries spanning millions of files.
Integrate Poppler (PDF) or a lightweight XML parser (DOCX is ZIP+XML) to extract text content and tokenize it for indexing. This turns the explorer into a universal document search engine.
Instead of storing raw 8-byte inodes, store the delta between consecutive sorted inodes:
Full: [1001, 1006, 1015, 1100]
Delta: [1001, 5, 9, 85]
Small deltas → better compression → smaller index. Especially effective when inodes are clustered.
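The delta transform itself is a few lines. The sketch below uses hypothetical deltaEncode() / deltaDecode() helpers and assumes the input is sorted ascending; note that the deltas only save space once they are stored with a variable-length integer encoding, which is not shown here.

```cpp
#include <cstdint>
#include <vector>

// Replaces each value with its difference from the previous one.
// The first element is stored as-is (difference from 0).
std::vector<std::uint64_t> deltaEncode(const std::vector<std::uint64_t>& sorted) {
    std::vector<std::uint64_t> out;
    out.reserve(sorted.size());
    std::uint64_t prev = 0;
    for (std::uint64_t v : sorted) {
        out.push_back(v - prev);
        prev = v;
    }
    return out;
}

// Inverse transform: a running sum restores the original values.
std::vector<std::uint64_t> deltaDecode(const std::vector<std::uint64_t>& deltas) {
    std::vector<std::uint64_t> out;
    out.reserve(deltas.size());
    std::uint64_t acc = 0;
    for (std::uint64_t d : deltas) {
        acc += d;
        out.push_back(acc);
    }
    return out;
}
```

Encoding [1001, 1006, 1015, 1100] gives [1001, 5, 9, 85], and decoding that list restores the original inodes.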
Build a Trie over the word dictionary:
root
├── a
│ └── s
│ └── y
│ └── n
│ └── c ← word_id=42 (isEnd=true)
└── l
└── ...
A prefix query like "asy" traverses the trie and collects all word_ids below that node. This enables instant search-as-you-type autocomplete in command mode.
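A minimal trie supporting that prefix lookup could look like the following. TrieNode, WordTrie, and idsWithPrefix() are hypothetical names for this sketch; the roadmap item does not prescribe an implementation.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;  // ordered for stable output
    bool isEnd = false;       // true if a dictionary word ends at this node
    std::uint32_t wordId = 0; // valid only when isEnd is true
};

class WordTrie {
public:
    // Adds a word and the word_id it maps to.
    void insert(const std::string& word, std::uint32_t wordId) {
        TrieNode* node = &root_;
        for (char c : word) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->isEnd = true;
        node->wordId = wordId;
    }

    // Returns the word_ids of all words that start with `prefix`.
    std::vector<std::uint32_t> idsWithPrefix(const std::string& prefix) const {
        const TrieNode* node = &root_;
        for (char c : prefix) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        std::vector<std::uint32_t> ids;
        collect(node, ids);
        return ids;
    }

private:
    // Depth-first walk gathering every word_id at or below `node`.
    static void collect(const TrieNode* node, std::vector<std::uint32_t>& ids) {
        if (node->isEnd) ids.push_back(node->wordId);
        for (const auto& entry : node->children) collect(entry.second.get(), ids);
    }

    TrieNode root_;
};
```

The lookup cost is proportional to the prefix length plus the size of the subtree below it, independent of the total vocabulary size.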
Currently limited to a single indexingRoot. Future: support multiple roots with separate LMDB environments or a partition table.
Documentation auto-generated from source code at version: April 2026.