What version of osmium-tool are you using?
osmium-tool 1.19.1 (also reproduced on current master)
What operating system version are you using?
Ubuntu 24.04, gcc 13, x86_64
Tell us something about your system
32 CPU cores, 94 GB RAM, software RAID array
What did you do exactly?
`osmium extract --strategy=complete_ways -c tiles_z2.json planet-latest.osm.pbf`
The config file defines 16 bounding boxes covering the full planet at z=2 Web Mercator tile boundaries.
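For reference, the 16 z=2 tile bounds follow from the standard inverse Web Mercator (slippy-map) formulas. The C++ sketch below computes them. The names `BBox`, `tile_bbox`, `tile_x_to_lon` and `tile_y_to_lat` are illustrative only, and the sketch assumes the config's `bbox` entries use the order left, bottom, right, top.

```cpp
#include <cassert>
#include <cmath>

// Bounding box in degrees, in the order left, bottom, right, top.
struct BBox {
    double left, bottom, right, top;
};

// Longitude of the western edge of tile column x at zoom z.
double tile_x_to_lon(int x, int z) {
    return x / std::ldexp(1.0, z) * 360.0 - 180.0;
}

// Latitude of the northern edge of tile row y at zoom z (inverse Web Mercator).
double tile_y_to_lat(int y, int z) {
    const double pi = std::acos(-1.0);
    const double n = pi * (1.0 - 2.0 * y / std::ldexp(1.0, z));
    return std::atan(std::sinh(n)) * 180.0 / pi;
}

// Bounds of tile (x, y) at zoom z; y = 0 is the northernmost row.
BBox tile_bbox(int x, int y, int z) {
    assert(z >= 0 && x >= 0 && y >= 0 && x < (1 << z) && y < (1 << z));
    return {tile_x_to_lon(x, z), tile_y_to_lat(y + 1, z),
            tile_x_to_lon(x + 1, z), tile_y_to_lat(y, z)};
}
```

Looping `x` and `y` over `0..3` at `z = 2` yields the 16 boxes. The northern and southern rows end at about ±85.0511°, the Web Mercator latitude limit, so a config meant to cover the whole planet would presumably widen those outer edges to ±90°.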
What did you expect to happen?
I expected the runtime not to be dominated by avoidable I/O stalls.
What did happen instead?
| Strategy | Elapsed |
|---|---|
| complete_ways | 51m 36s |
| simple | 31m 27s |
During Pass 2, disk utilisation reached 100% while CPU was mostly idle, suggesting the main thread was spending much of its time blocked waiting for writes to complete.
What did you do to try analyzing the problem?
I identified two likely causes:

1. Synchronous write flushing. `Extract::write()` calls `osmium::io::Writer` synchronously each time its buffer fills, blocking the main thread while the write is performed.
2. Redundant way node scanning in Pass 1. `Pass1::eway()` is called independently for each extract on every way, causing `way.nodes()` to be scanned multiple times per way.
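For the synchronous flushing, one generic way to take the write off the main thread is a bounded producer/consumer queue drained by a dedicated writer thread. This is a sketch under stated assumptions, not the patch from the fork: `AsyncSink` is a hypothetical name, `std::string` stands in for a filled output buffer, and `write_fn` stands in for whatever performs the actual write (for example a call into `osmium::io::Writer`).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: filled buffers go to a dedicated writer thread through a
// bounded FIFO queue, so the producing (main) thread only blocks when
// max_pending buffers are already waiting to be written.
class AsyncSink {
public:
    AsyncSink(std::function<void(std::string&&)> write_fn, std::size_t max_pending)
        : m_write(std::move(write_fn)), m_max(max_pending), m_thread([this] { run(); }) {
        assert(max_pending > 0);  // a zero bound would deadlock push()
    }

    ~AsyncSink() { close(); }

    // Main thread: enqueue a buffer, waiting only while the queue is full.
    // Must not be called after close().
    void push(std::string buffer) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_not_full.wait(lock, [this] { return m_queue.size() < m_max; });
        m_queue.push_back(std::move(buffer));
        m_not_empty.notify_one();
    }

    // Let the writer drain everything still queued, then join it.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_not_empty.notify_one();
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

private:
    // Writer thread: pop buffers in FIFO order and write them outside the lock.
    void run() {
        for (;;) {
            std::string buffer;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_not_empty.wait(lock, [this] { return m_done || !m_queue.empty(); });
                if (m_queue.empty()) {
                    return;  // closed and fully drained
                }
                buffer = std::move(m_queue.front());
                m_queue.pop_front();
            }
            m_not_full.notify_one();
            m_write(std::move(buffer));  // slow I/O happens without holding the lock
        }
    }

    std::function<void(std::string&&)> m_write;
    std::size_t m_max;
    std::mutex m_mutex;
    std::condition_variable m_not_empty;
    std::condition_variable m_not_full;
    std::deque<std::string> m_queue;
    bool m_done = false;
    std::thread m_thread;  // declared last so it starts after the members it uses
};
```

The queue bound caps memory per extract. Since the main thread still blocks once the queue is full, this overlaps CPU work with I/O and smooths bursts, but it cannot push sustained throughput past what the disk delivers. A real patch would also need to propagate write errors; here an exception thrown by `write_fn` on the writer thread would terminate the process.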
I implemented fixes for both in my fork and measured them against the same planet file and config:
| Version | Elapsed | vs baseline |
|---|---|---|
| upstream | 51m 36s | baseline |
| patched | 41m 18s | -20% |
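For the Pass 1 change, one possible layout (not necessarily the one used in the fork) is a single index mapping each node ID to a bitmask of the extracts containing it. Each way's node list is then walked once, and each node costs one lookup regardless of the number of extracts. `NodeExtractIndex` and `extracts_touching_way` are hypothetical names, and plain `std::int64_t` IDs stand in for libosmium node references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: one index maps each node ID to a bitmask of the
// extracts that contain it (bit i set = node lies in extract i), instead of
// keeping a separate node-ID set per extract.
using ExtractMask = std::uint32_t;  // room for up to 32 extracts

class NodeExtractIndex {
public:
    // Record that node_id falls inside extract number `extract`.
    void add(std::int64_t node_id, std::size_t extract) {
        assert(extract < 32);
        m_masks[node_id] |= ExtractMask{1} << extract;
    }

    // Mask of extracts containing node_id (0 if none).
    ExtractMask lookup(std::int64_t node_id) const {
        const auto it = m_masks.find(node_id);
        return it == m_masks.end() ? ExtractMask{0} : it->second;
    }

private:
    std::unordered_map<std::int64_t, ExtractMask> m_masks;
};

// Walk the way's node list once; the result has bit i set if any node of the
// way lies in extract i, i.e. the way belongs in that extract under complete_ways.
ExtractMask extracts_touching_way(const NodeExtractIndex& index,
                                  const std::vector<std::int64_t>& way_nodes) {
    ExtractMask result = 0;
    for (const std::int64_t id : way_nodes) {
        result |= index.lookup(id);
    }
    return result;
}
```

An `std::unordered_map` keeps the sketch short but would be far too memory-hungry at planet scale; a dense array of masks indexed by node ID, or a similar compact structure, would likely be needed in practice.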
I plan to submit a pull request, but wanted to mention it here first in case the approach raises any concerns.