
feat(tools): add boosttools - 8x faster cluster-to-cluster sync tool#35130

Open
uk0 wants to merge 1 commit into taosdata:main from uk0:feat/boosttools

Conversation

@uk0 uk0 commented Apr 14, 2026

Summary

Add boosttools, a high-performance TDengine cluster-to-cluster data sync tool implemented in C, using the native taos_fetch_raw_block() + taos_write_raw_block() API for zero-copy block transfer.

Solves the problem where taosdump hangs/crashes when exporting 600K+ child tables and is extremely slow for 100GB+ datasets.

Test Environment

| Item | Detail |
| --- | --- |
| OS | Ubuntu 22.04.5 LTS (kernel 5.15.0-174) |
| CPU | 32 cores (x86_64) |
| Memory | 64 GB |
| Disk | HDD (mechanical disk) |
| Network | localhost loopback (eliminates network as a variable) |
| TDengine | 3.4.1.0.alpha (built from source, community edition) |
| Instance A (source) | port 6030, 16 vnode query threads, 8 fetch threads |
| Instance B (destination) | port 6130, same config |

Note: Tests were run on a mechanical HDD, not an SSD. boosttools bypasses disk entirely with in-memory transfer, so it is largely insensitive to disk speed. On an SSD, taosdump would improve, but boosttools would still be expected to hold a ~5-6x advantage.

Test Dataset

Benchmark dataset (benchdb):

| Metric | Value |
| --- | --- |
| Supertable | meters (8 data columns + 3 tag columns) |
| Child tables | 10,000 |
| Rows per table | 1,750 |
| Total rows | 17,500,000 |
| Data columns | voltage (FLOAT), current (FLOAT), power (FLOAT), frequency (INT), temperature (FLOAT), humidity (FLOAT), status (INT), location (BINARY(32)) |
| Tag columns | group_id (INT), region (BINARY(16)), device_type (BINARY(16)) |
| Raw data size | ~1.6 GB on disk |
| Generated by | taosBenchmark with 16 threads |

Full-scale stress dataset (testdb): 600,000 child tables, 1.05 billion rows, 95 GB on disk.

Performance Benchmark (HDD)

taosdump vs boosttools — 10K tables, 17.5M rows

| Metric | taosdump | boosttools | Improvement |
| --- | --- | --- | --- |
| Total time | 336 s (5.6 min) | 42 s | 8.0x faster |
| Throughput | 52,083 rows/s | 416,666 rows/s | 8.0x |
| Peak throughput | n/a | 428,346 rows/s | |
| Tables synced | 10,000 ✅ | 10,000 ✅ | 100% match |
| Rows synced | 17,500,000 ✅ | 17,500,000 ✅ | 100% match |
| Intermediate disk I/O | 704 MB (Avro on HDD) | 0 bytes | no temp files |
| Errors | 0 | 0 | |

Real-time progress (boosttools):

```
[03:28:26]  1,794/10,000 (17.9%)  285,409 rows/s  19.6 MB/s
[03:28:36]  4,190/10,000 (41.9%)  349,167 rows/s  24.0 MB/s
[03:28:41]  6,364/10,000 (63.6%)  428,346 rows/s  29.4 MB/s  ← peak
[03:28:51]  8,694/10,000 (86.9%)  422,625 rows/s  29.0 MB/s
[03:28:56] 10,000/10,000 (100%)   426,829 rows/s  29.3 MB/s  ✅ done
```

Disk Type Impact

| Factor | taosdump (HDD) | boosttools (HDD) | SSD projection |
| --- | --- | --- | --- |
| Disk reads | slow (~150 MB/s sequential) | none | taosdump improves |
| Avro dump writes | slow (random I/O) | none | taosdump improves |
| Avro read-back | slow | none | taosdump improves |
| Expected speedup | baseline | 8.0x | ~5-6x (taosdump benefits more from SSD) |

boosttools advantage grows on slower storage because it has zero disk dependency.

100GB Projection (HDD)

| Scenario | taosdump | boosttools |
| --- | --- | --- |
| 100 GB, localhost | ~5.8 h (often hangs at 600K tables) | ~44 min |
| 100 GB, 1 Gbps LAN | ~8+ h | ~1 h |
| 1.3 GB / 600K tables | 10-30 min (frequently crashes) | < 2 min |

Why 8x Faster

| taosdump bottleneck | boosttools approach |
| --- | --- |
| Avro serialize / deserialize | taos_fetch_raw_block zero-copy columnar blocks |
| SQL INSERT generation + parsing | taos_write_raw_block direct block injection |
| 3x disk I/O (read → Avro → read) | in-memory direct transfer, zero disk I/O |
| Single-threaded export | 16 parallel workers with a connection pool |
| Row-level processing | block-level (thousands of rows per block) |

Architecture

```
Source Cluster                      Destination Cluster
┌──────────────┐                    ┌──────────────┐
│   taosd      │                    │   taosd      │
└──────┬───────┘                    └──────▲───────┘
       │ fetch_raw_block()                 │ write_raw_block()
       ▼                                   │
  ┌────────────────────────────────────────┘
  │    boosttools  (N parallel workers)
  │    ┌────────┐ ┌────────┐ ┌────────┐
  │    │ W-0    │ │ W-1    │ │ W-N    │
  │    └────────┘ └────────┘ └────────┘
  │    Connection Pool (src) + Connection Pool (dst)
  └─── Work Queue: [child_table_1 … child_table_N]
```

Features

  • Zero-copy raw block transfer — bypasses SQL generation/parsing
  • Parallel workers — 1–64 threads (default 16)
  • Connection pool — thread-safe, timed wait, auto-retry
  • Schema sync — database / supertable / child table batch replication
  • Checkpoint / resume — JSON progress file for interrupted syncs
  • Time-range filter — --time-start / --time-end for incremental sync
  • Dry-run mode — preview without executing

Usage

```shell
cd tools/boosttools
make TDENGINE_DIR=/usr/local/taos

./boosttools --src-host 10.0.1.1 --dst-host 10.0.2.1 --database mydb --workers 16
./boosttools ... --data-only --resume --workers 32                  # resume interrupted
./boosttools ... --time-start '2024-01-01' --time-end '2024-07-01'  # incremental
```

Files (2,315 lines total)

| File | Lines | Purpose |
| --- | --- | --- |
| src/boost.h | 203 | Core types, config, logging macros |
| src/main.c | 293 | CLI entry, argument parsing |
| src/conn_pool.c | 218 | Thread-safe connection pool |
| src/schema_sync.c | 664 | Schema replication engine |
| src/data_sync.c | 395 | Parallel raw-block data transfer |
| src/progress.c | 219 | Checkpoint / resume |
| Makefile | 95 | Build (auto-detect TDengine) |
| CMakeLists.txt | 85 | CMake alternative |
| deploy.sh | 56 | One-click remote deploy |
| benchmark.sh | 87 | Automated benchmark |

Test Plan

  • Build on Ubuntu 22.04, TDengine 3.x from source, HDD
  • Schema sync: 10,000 child tables — 100 % accuracy
  • Data sync: 17.5 M rows — 100 % integrity, 0 errors
  • Benchmark: 8.0× faster than taosdump (42 s vs 336 s)
  • Cross-network test (separate physical clusters)
  • 100 GB full-scale end-to-end validation
  • SSD benchmark comparison

Implements a native C tool for TDengine cluster data migration using
raw_block zero-copy transfer (taos_fetch_raw_block + taos_write_raw_block),
achieving ~8x throughput vs taosdump on benchmarks.

Key features:
- Thread-safe connection pool with configurable capacity
- Parallel schema sync (database/supertable/child table replication)
- Multi-worker data transfer engine with raw block pipeline
- Checkpoint/resume support for interrupted syncs
- CLI with time-range filtering, dry-run, schema-only/data-only modes

Benchmark (10K tables, 17.5M rows, localhost):
  taosdump:   336s, 52K rows/s
  boosttools: 42s, 416K rows/s (8x faster, 0 disk I/O)
