Add cross-provider multipart upload support by nuwang · Pull Request #333 · CloudVE/cloudbridge

nuwang · 2026-06-26T16:05:36Z

Summary

Adds a clean, provider-agnostic multipart upload capability to CloudBridge. Large objects can now be uploaded reliably and memory-efficiently across AWS S3, Azure Blob, GCP Storage and OpenStack Swift (and therefore on the moto-backed mock provider as well).

Two things are delivered together:

Explicit lifecycle API — obj.create_multipart_upload() → upload_part(n, data) → complete(parts) / abort(). Parts may be uploaded in any order / in parallel; complete assembles them in ascending part-number order.
Transparent handling — upload() / upload_from_file() route inputs above a configurable threshold through the same mechanism, streaming one part at a time so large payloads are never fully buffered. Existing signatures and return values are unchanged.
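The explicit lifecycle can be illustrated with a small in-memory stand-in. This is only a sketch: InMemoryMultipartUpload is a hypothetical class that borrows the method names from the description (upload_part, complete, abort). It is not CloudBridge's implementation, and the real return types may differ.

```python
class InMemoryMultipartUpload:
    """Hypothetical stand-in that mimics the lifecycle described above."""

    def __init__(self):
        self._parts = {}
        self.result = None
        self.aborted = False

    def upload_part(self, part_number, data):
        # Parts may arrive in any order; each is stored under its number.
        self._parts[part_number] = bytes(data)
        return part_number

    def complete(self, parts):
        # Assemble in ascending part-number order, regardless of upload order.
        self.result = b"".join(self._parts[n] for n in sorted(parts))
        return self.result

    def abort(self):
        # Discard everything staged so far.
        self._parts.clear()
        self.aborted = True


upload = InMemoryMultipartUpload()
# Upload parts out of order on purpose.
uploaded = [upload.upload_part(n, d) for n, d in [(3, b"c"), (1, b"a"), (2, b"b")]]
assembled = upload.complete(uploaded)
```

Against a real provider, the equivalent entry point would be obj.create_multipart_upload() as named in the description.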

Why

Large-object handling was inconsistent across providers: Azure and GCP buffered whole files into memory, and Swift's single-PUT path failed above 5 GB; only the native file-upload paths on AWS and Swift handled large files well. This change makes behaviour uniform, safe, and memory-efficient everywhere, and gives callers a single provider-agnostic API.

Design

Follows CloudBridge's three-layer (interface → base → provider) + subservice + @dispatch-event pattern:

Interfaces — new MultipartUpload / UploadPart, BucketObject.create_multipart_upload(), and four BucketObjectService methods.
Base — concrete BaseMultipartUpload (delegates to the _bucket_objects service) + BaseUploadPart; a memory-efficient streaming driver in BaseBucketObject; config knobs CB_MULTIPART_THRESHOLD / CB_MULTIPART_PART_SIZE (env + per-provider override) with a 5 MiB minimum enforced.
Providers — native S3 multipart; Azure block blobs (stage_block/commit_block_list, with a documented no-op abort since Azure has no server-side cancel); GCS objects.compose over temporary part objects with >32-source chaining + cleanup; Swift Static Large Objects with a manifest PUT. upload_from_file() keeps each provider's superior native path where one exists (AWS upload_file, Swift SwiftService); Azure's whole-file in-memory upload is replaced with a streaming path (removing the now-unused create_blob_from_* helpers).
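As an illustration of the Swift mapping, the body of a Static Large Object manifest PUT is a JSON list with one entry per segment, sent with the multipart-manifest=put query parameter. The sketch below only builds that body. The function name and the segment naming scheme are assumptions made for this example, not taken from the PR.

```python
import hashlib
import json


def build_slo_manifest(container, segments):
    """Build the JSON body for a Static Large Object manifest PUT.

    `segments` is an iterable of (part_number, object_name, data) tuples.
    Entries are emitted in ascending part-number order.
    """
    entries = []
    for _, name, data in sorted(segments, key=lambda seg: seg[0]):
        entries.append({
            "path": "/{}/{}".format(container, name),
            "etag": hashlib.md5(data).hexdigest(),
            "size_bytes": len(data),
        })
    return json.dumps(entries)


# Hypothetical segment names; parts are supplied out of order.
manifest_body = build_slo_manifest(
    "my-bucket",
    [(2, "big.bin/000002", b"bb"), (1, "big.bin/000001", b"a")],
)
```

A real implementation would more likely reuse the ETag returned by each segment upload than rehash the data.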

Tests

New object-store tests — roundtrip, out-of-order parts, abort, transparent multipart, single-shot threshold, part-size validation. Written TDD-first against the moto mock; they run on the mock provider in CI without credentials and on aws/azure/gcp/openstack when selected via CB_TEST_PROVIDER. Existing object-store/storage suites remain green; flake8 (project CI config) is clean.

Backward compatibility

upload() / upload_from_file() keep their existing signatures and return semantics; only an internal threshold branch is added. The small-input path is unchanged.

Known limitation

GCS's >32-part compose chaining has no automated coverage (mock CI is AWS-only, and cloud tests use 3 parts). The logic is straightforward but only executes on very large GCS uploads.
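One way to make the chaining testable without GCS is to separate the planning from the API calls. The sketch below is one possible strategy (a tree of intermediate composes), not necessarily the one used in this PR. The function name and the intermediate object names are hypothetical. The limit of 32 source objects per compose request is the GCS limit referred to above.

```python
MAX_COMPOSE_SOURCES = 32  # GCS limit on source objects per compose request


def plan_compose(part_names, final_name, max_sources=MAX_COMPOSE_SOURCES):
    """Return (steps, temporaries) for composing ordered parts into one object.

    Each step is (destination_name, [source_names]). `temporaries` lists every
    object that should be deleted once the final compose succeeds.
    """
    if not part_names:
        raise ValueError("at least one part is required")
    steps = []
    temporaries = list(part_names)
    current = list(part_names)
    level = 0
    while len(current) > max_sources:
        next_level = []
        # Group contiguous runs so the byte order of the parts is preserved.
        for i in range(0, len(current), max_sources):
            group = current[i:i + max_sources]
            dest = "{}.compose-{}-{}".format(final_name, level, i // max_sources)
            steps.append((dest, group))
            next_level.append(dest)
        temporaries.extend(next_level)
        current = next_level
        level += 1
    steps.append((final_name, current))
    return steps, temporaries


parts = ["big.bin.part-{:05d}".format(n) for n in range(1, 71)]
steps, temporaries = plan_compose(parts, "big.bin")
```

Because the plan is plain data, a unit test can check the more-than-32-part path without uploading anything.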

Introduce a MultipartUpload/UploadPart abstraction in the interface and base layers, implemented by all four providers (AWS S3, Azure Blob, GCP Storage, OpenStack Swift) and therefore the moto mock. The explicit lifecycle is initiate -> upload_part(n) -> complete/abort, exposed via BucketObject.create_multipart_upload(). The high-level upload()/upload_from_file() methods now route inputs above a configurable threshold (CB_MULTIPART_THRESHOLD / CB_MULTIPART_PART_SIZE) through the same mechanism, streaming one part at a time so large payloads are never fully buffered. Existing method signatures and return values are preserved. Per-provider mapping: native S3 multipart; Azure block blobs (stage_block/commit_block_list, with a documented no-op abort); GCS compose over temporary part objects with >32-source chaining and cleanup; Swift Static Large Objects with a manifest PUT. Azure's whole-file in-memory upload is replaced with a streaming single-shot path, removing the unused create_blob_from_text/create_blob_from_file helpers. Adds object-store tests (roundtrip, out-of-order parts, abort, transparent multipart, single-shot threshold, part-size validation) that run on the mock provider in CI and on the cloud providers when selected.

The transparent multipart driver previously uploaded parts sequentially, which gave none of multipart's throughput benefit. It now uploads parts across a bounded thread pool (CB_MULTIPART_MAX_CONCURRENCY). To stay safe even on providers whose SDK client/connection is not thread-safe, each worker uploads through its own cloned provider, so no provider state is shared across threads. Reads are coalesced up to the part size so non-final parts are never undersized on short reads. Providers with an efficient, thread-safe native parallel uploader override the driver: AWS uses boto3 upload_fileobj (TransferManager) and Azure uses upload_blob(max_concurrency=...). GCP and OpenStack Swift inherit the base clone-pool driver, which gives Swift safe parallelism despite swiftclient's non-thread-safe connection. Adds a provider-agnostic unit test for the base driver (part ordering, short-read coalescing, bounded concurrency, per-worker clone isolation, abort-on-failure, part-size validation), since the AWS-backed mock provider exercises the native override rather than the base driver.
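The read coalescing and bounded concurrency described here can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's driver: read_full, iter_parts and upload_stream are hypothetical names, upload_part is any callable taking (part_number, data), and the per-worker provider cloning and abort-on-failure handling are left out.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


def read_full(stream, size):
    """Read up to `size` bytes, looping over short reads until EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def iter_parts(stream, part_size):
    """Yield (part_number, data); only the final part may be undersized."""
    number = 1
    while True:
        data = read_full(stream, part_size)
        if not data:
            return
        yield number, data
        number += 1


def upload_stream(stream, part_size, upload_part, max_concurrency=4):
    """Upload parts through a bounded pool; return the sorted part numbers."""
    slots = threading.BoundedSemaphore(max_concurrency)
    futures = []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        for number, data in iter_parts(stream, part_size):
            # Wait for a free slot so at most max_concurrency parts are queued
            # or running (one more may already be read and waiting here).
            slots.acquire()
            future = pool.submit(upload_part, number, data)
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
        return sorted(f.result() for f in futures)


received = {}


def fake_upload_part(number, data):
    # Stand-in for a provider call; records the part and echoes its number.
    received[number] = data
    return number


numbers = upload_stream(io.BytesIO(b"abcdefghij"), 4, fake_upload_part, 2)
```

The semaphore keeps the reader from running ahead of the workers, which is what stops the whole stream from being buffered.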

The abort test asserted the target object is absent after abort, which only holds on AWS, where objects.create() returns a bare handle. GCP, OpenStack and Azure materialise an empty placeholder on create(), so objects.get() returns that empty object rather than None. Assert the provider-agnostic contract instead: after abort the target is absent or empty, but never holds the uploaded part. Also clean up the placeholder so bucket teardown does not leak.

AWS upload_from_file called boto3's upload_file with no TransferConfig, so it used boto3's defaults and ignored CB_MULTIPART_* entirely -- unlike upload(), which builds a TransferConfig from those knobs. Pass a TransferConfig built from the same knobs so both upload paths honour a single configuration.
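A minimal sketch of the idea follows, assuming the three knobs have already been resolved to integers. The helper name is hypothetical; the keyword names are those of boto3's TransferConfig. The dict is built without importing boto3 so the example stays self-contained.

```python
def transfer_config_kwargs(threshold, part_size, max_concurrency):
    """Map the three multipart knobs onto boto3 TransferConfig keyword names."""
    return {
        "multipart_threshold": int(threshold),
        "multipart_chunksize": int(part_size),
        "max_concurrency": int(max_concurrency),
    }


# Illustrative values only; the PR's actual defaults are not stated here.
kwargs = transfer_config_kwargs(8 * 1024 * 1024, 8 * 1024 * 1024, 4)
# With boto3 installed, the same dict could feed both upload paths, e.g.:
#   from boto3.s3.transfer import TransferConfig
#   s3_client.upload_file(path, bucket, key, Config=TransferConfig(**kwargs))
```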

Introduce a provider-agnostic UploadConfig value object (threshold, part_size, max_concurrency) that callers may pass to upload() and upload_from_file() to tune a single transfer. It is deliberately not boto3's TransferConfig, so the abstraction stays provider-neutral; each provider maps the fields onto its native mechanism (boto3 TransferConfig, Azure max_concurrency, the base clone-pool driver for GCP/Swift). The three multipart resolvers now follow precedence: explicit UploadConfig field -> provider/global config -> class default. Providers whose upload_from_file uses a native uploader that manages its own segmenting (GCP resumable, Swift SwiftService) accept the argument for interface consistency but document that it does not affect that path.
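The precedence chain can be sketched as follows. The field names come from the description above, but the dataclass layout, the resolve helper, the use of a plain dict for provider/global config, and the class defaults are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB minimum from the PR description


@dataclass(frozen=True)
class UploadConfig:
    """Per-transfer overrides; None means fall through to the next level."""
    threshold: Optional[int] = None
    part_size: Optional[int] = None
    max_concurrency: Optional[int] = None


# Assumed class defaults, chosen only for this example.
CLASS_DEFAULTS = {
    "threshold": 8 * 1024 * 1024,
    "part_size": 8 * 1024 * 1024,
    "max_concurrency": 4,
}

CONFIG_KEYS = {
    "threshold": "CB_MULTIPART_THRESHOLD",
    "part_size": "CB_MULTIPART_PART_SIZE",
    "max_concurrency": "CB_MULTIPART_MAX_CONCURRENCY",
}


def resolve(field, upload_config=None, provider_config=None):
    """Explicit UploadConfig field -> provider/global config -> class default."""
    provider_config = provider_config or {}
    explicit = getattr(upload_config, field) if upload_config else None
    if explicit is not None:
        value = explicit
    elif CONFIG_KEYS[field] in provider_config:
        value = int(provider_config[CONFIG_KEYS[field]])
    else:
        value = CLASS_DEFAULTS[field]
    if field == "part_size" and value < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MiB")
    return value


provider_cfg = {"CB_MULTIPART_PART_SIZE": "6291456"}  # 6 MiB from config
explicit_size = resolve("part_size", UploadConfig(part_size=16 * 1024 * 1024), provider_cfg)
configured_size = resolve("part_size", None, provider_cfg)
default_workers = resolve("max_concurrency", UploadConfig(), provider_cfg)
```

Keeping UploadConfig free of provider types means each provider can translate the resolved values into its own mechanism.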

nuwang added the safe-to-test label Jun 26, 2026

nuwang had a problem deploying to cloud-integration June 26, 2026 17:33 — with GitHub Actions Failure

github-actions Bot removed the safe-to-test label Jun 26, 2026

nuwang added the safe-to-test label Jun 26, 2026

nuwang temporarily deployed to cloud-integration June 26, 2026 19:15 — with GitHub Actions Inactive

nuwang had a problem deploying to cloud-integration June 26, 2026 19:15 — with GitHub Actions Failure

nuwang added 3 commits June 27, 2026 17:18

nuwang force-pushed the add-multipart-upload branch from c0c9eff to 58ae94a Compare June 27, 2026 11:57

github-actions Bot removed the safe-to-test label Jun 27, 2026

nuwang added the safe-to-test label Jun 27, 2026

nuwang temporarily deployed to cloud-integration June 27, 2026 12:00 — with GitHub Actions Inactive

nuwang had a problem deploying to cloud-integration June 27, 2026 12:00 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 13:38 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 13:50 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 13:58 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 14:13 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 16:06 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 18:04 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 18:41 — with GitHub Actions Failure

nuwang had a problem deploying to cloud-integration June 27, 2026 19:51 — with GitHub Actions Failure

nuwang deployed to cloud-integration June 28, 2026 07:28 — with GitHub Actions Active

github-actions Bot removed the safe-to-test label Jun 28, 2026

nuwang wants to merge 5 commits into main from add-multipart-upload
