
extract --strategy=complete_ways is slow with multi-extract configs on large files #312

Description

@yuiseki

What version of osmium-tool are you using?

osmium-tool 1.19.1 (also reproduced on current master)

What operating system version are you using?

Ubuntu 24.04, gcc 13, x86_64

Tell us something about your system

32 CPU cores, 94 GB RAM, software RAID array

What did you do exactly?

osmium extract --strategy=complete_ways -c tiles_z2.json planet-latest.osm.pbf

The config file defines 16 bounding boxes covering the full planet at z=2 Web Mercator tile boundaries.
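For reference, here is a sketch of how 16 z=2 bounding boxes like these can be derived from the standard Web Mercator tile formulas. The actual `tiles_z2.json` is not shown in this issue, so the output file names and the helper names (`tile_bbox`, `extract_entry`) are hypothetical. The `bbox` array follows osmium extract's [left, bottom, right, top] order. Tile edges from these formulas stop at about ±85.0511° latitude.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Bounding box in degrees, in [left, bottom, right, top] order.
struct BBox {
    double left, bottom, right, top;
};

constexpr double kPi = 3.14159265358979323846;

// Longitude of the west edge of tile column x at zoom z.
double tile_x_to_lon(int x, int z) {
    return x / std::ldexp(1.0, z) * 360.0 - 180.0;
}

// Latitude of the north edge of tile row y at zoom z (inverse Web Mercator).
double tile_y_to_lat(int y, int z) {
    const double n = kPi * (1.0 - 2.0 * y / std::ldexp(1.0, z));
    return std::atan(std::sinh(n)) * 180.0 / kPi;
}

// Rows grow southwards, so the bottom edge of row y is the top edge of row y+1.
BBox tile_bbox(int x, int y, int z) {
    return {tile_x_to_lon(x, z), tile_y_to_lat(y + 1, z),
            tile_x_to_lon(x + 1, z), tile_y_to_lat(y, z)};
}

// Hypothetical: one entry for the config's "extracts" array; names are made up.
std::string extract_entry(int x, int y, int z) {
    const BBox b = tile_bbox(x, y, z);
    char buf[160];
    std::snprintf(buf, sizeof buf,
                  "{\"output\": \"z%d_%d_%d.osm.pbf\", \"bbox\": [%.6f, %.6f, %.6f, %.6f]}",
                  z, x, y, b.left, b.bottom, b.right, b.top);
    return buf;
}
```

Calling `extract_entry(x, y, 2)` for every x and y in 0..3 produces the 16 entries.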

What did you expect to happen?

I expected the runtime not to be dominated by avoidable I/O stalls.

What did happen instead?

Strategy       Elapsed
complete_ways  51m 36s
simple         31m 27s

During Pass 2, disk utilisation reached 100% while CPU was mostly idle, suggesting the main thread was spending much of its time blocked waiting for writes to complete.

What did you do to try analyzing the problem?

I identified two likely causes:

Synchronous write flushing. Extract::write() calls osmium::io::Writer synchronously each time its buffer fills, blocking the main thread while the write is performed.
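A minimal sketch of one way to take the write off the main thread, assuming the fix hands filled buffers to a dedicated writer thread. The issue does not say how the fork does it. `AsyncWriter` is a hypothetical class, not part of osmium-tool or libosmium, and buffers are modelled as `std::string` rather than `osmium::memory::Buffer`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper (not an osmium API): the producer queues each filled
// buffer and a background thread performs the blocking write.
class AsyncWriter {
public:
    using Sink = std::function<void(std::string&&)>;

    // `max_pending` bounds how many buffers may wait in the queue.
    AsyncWriter(Sink sink, std::size_t max_pending)
        : m_sink(std::move(sink)),
          m_max_pending(max_pending),
          m_thread([this] { run(); }) {
        assert(max_pending > 0);
    }

    AsyncWriter(const AsyncWriter&) = delete;
    AsyncWriter& operator=(const AsyncWriter&) = delete;

    ~AsyncWriter() { close(); }

    // Called whenever the producer's buffer fills. Returns once the buffer is
    // queued; blocks only while `max_pending` buffers are already waiting.
    void write(std::string&& buffer) {
        std::unique_lock<std::mutex> lock{m_mutex};
        m_not_full.wait(lock, [this] { return m_queue.size() < m_max_pending; });
        m_queue.push_back(std::move(buffer));
        m_not_empty.notify_one();
    }

    // Writes out everything still queued, then joins the background thread.
    void close() {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_done = true;
        }
        m_not_empty.notify_one();
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    void run() {
        for (;;) {
            std::string buffer;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                m_not_empty.wait(lock, [this] { return m_done || !m_queue.empty(); });
                if (m_queue.empty()) {
                    return;  // closed and fully drained
                }
                buffer = std::move(m_queue.front());
                m_queue.pop_front();
            }
            m_not_full.notify_one();
            m_sink(std::move(buffer));  // the slow write runs off the main thread
        }
    }

    Sink m_sink;
    std::size_t m_max_pending;
    std::mutex m_mutex;
    std::condition_variable m_not_empty;
    std::condition_variable m_not_full;
    std::deque<std::string> m_queue;
    bool m_done = false;
    std::thread m_thread;  // declared last so it starts after the members above
};
```

Bounding the queue is the important design choice. With an unbounded queue, a saturated disk would let pending buffers pile up in memory instead of applying back-pressure to the producer.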

Redundant way node scanning in Pass 1. Pass1::eway() is called independently for each extract on every way, so way.nodes() can be scanned once per extract, i.e. up to 16 times per way with this config.
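A minimal sketch of the single-scan idea, assuming one shared index that maps each node ID to a bitmask of the extracts containing it. `NodeExtractIndex`, `extracts_touched_by_way` and `for_each_extract` are hypothetical names, and the issue does not describe how the fork restructures `Pass1::eway()`. The way's node list is walked once, and per-extract work then runs only for extracts whose bit ended up set.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using NodeId = std::int64_t;
// Bit i set means "this node lies inside extract i" (up to 64 extracts).
using ExtractMask = std::uint64_t;
// Hypothetical lookup from node ID to the extracts that contain the node.
using NodeExtractIndex = std::unordered_map<NodeId, ExtractMask>;

// Scans the way's node list once and returns every extract it touches,
// instead of rescanning the same list once per extract.
ExtractMask extracts_touched_by_way(const std::vector<NodeId>& way_nodes,
                                    const NodeExtractIndex& index) {
    ExtractMask mask = 0;
    for (const NodeId id : way_nodes) {
        const auto it = index.find(id);
        if (it != index.end()) {
            mask |= it->second;
        }
    }
    return mask;
}

// Calls fn(i) for every extract i whose bit is set in `mask`.
template <typename Fn>
void for_each_extract(ExtractMask mask, Fn&& fn) {
    for (std::size_t i = 0; mask != 0; ++i, mask >>= 1) {
        if (mask & 1U) {
            fn(i);
        }
    }
}
```

A 64-bit mask covers the 16 extracts here; more extracts would need a wider type such as `std::bitset`. The `std::unordered_map` keeps the sketch short; at planet scale a dense, ID-indexed array would be the more realistic choice.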

I implemented fixes for both in my fork and measured on the same planet file and config:

Version   Elapsed  vs baseline
upstream  51m 36s
patched   41m 18s  -20%
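The -20% figure can be checked from the elapsed times: 51m 36s is 3096 s and 41m 18s is 2478 s, so the change is (2478 - 3096) / 3096 ≈ -19.96%. A trivial helper (hypothetical names) doing the same arithmetic:

```cpp
// Converts an elapsed time given as minutes and seconds into seconds.
constexpr int to_seconds(int minutes, int seconds) {
    return minutes * 60 + seconds;
}

// Relative change from `baseline` to `value`, in percent
// (negative means the run took less time than the baseline).
constexpr double percent_change(double baseline, double value) {
    return (value - baseline) / baseline * 100.0;
}
```

The same helper puts the upstream complete_ways run about 64% above the simple strategy's 31m 27s.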

I plan to submit a pull request. I'm raising it here first in case anyone has concerns about the approach.
