tilemaker

mirror of https://github.com/systemed/tilemaker.git synced 2026-05-06 08:26:46 -04:00

Author	SHA1	Message	Date
systemed	4baf3caf55	cmake update	2026-03-17 23:53:26 +00:00
AlanIWBFT	bf4f3f35ad	Fix build with Boost 1.89 * `boost::range::sort` conflicts with `boost::sort`. Implementations are moved into a dedicated .cpp file. * `boost::system` is a header only library now and the stub is removed	2026-01-13 22:11:17 +08:00
Richard Fairhurst	13b841d58f	Visvalingam-Whyatt simplification (#772 )	2024-10-19 11:34:29 +01:00
Colin Dellow	fe8399e522	use libdeflate rather than zlib (#769 )	2024-10-13 18:03:44 +01:00
Colin Dellow	ad86ab4a01	remove FAT_TILE_INDEX (#700 )	2024-04-01 17:24:47 +01:00
systemed	9080cdb10e	Merge branch 'way_keys' into v3	2024-01-10 18:01:27 +00:00
Colin Dellow	abb2e95f83	Merge remote-tracking branch 'origin/master' into lua-interop-3	2024-01-07 13:51:46 -05:00
Colin Dellow	a600524c90	extend NextRelation/FindInRelation to nodes (#632 )	2024-01-07 16:51:08 +00:00
Colin Dellow	7405050300	compile streamvbyte as C, not C++ (#634 )	2024-01-06 18:16:12 +00:00
Richard Fairhurst	65829e48cd	GeoJSON as alternative to shapefiles (#630 )	2024-01-01 23:08:08 +00:00
Richard Fairhurst	1a39f26a33	C++ tileserver (#631 )	2024-01-01 23:07:13 +00:00
Colin Dellow	3c1740ad4d	generalize node_keys; add way_keys This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes #402. I'm not too sure if this is generally useful - it's useful for one of my use cases, and I see someone asking about it in https://github.com/systemed/tilemaker/issues/190 and, elsewhere, in https://github.com/onthegomap/planetiler/issues/99 If you feel it complicates the maintainer story too much, please reject. The goal is to reduce memory usage for users doing thematic extracts by not indexing nodes that are only used by uninteresting ways. For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node store. By contrast, if your interest is only to build a railway map, you require only ~8M nodes, needing 70MB of RAM. Or, to build a map of national/provincial parks, 12M nodes and ~120MB of RAM. Currently, a user can achieve this by pre-filtering their PBF using osmium-tool. If you know exactly what you want, this is a good long-term solution. But if you're me, flailing about in the OSM data model, it's convenient to be able to tweak something in the Lua script and observe the results without having to re-filter the PBF and update your tilemaker command to use the new PBF. Sample use cases: ```lua -- Building a map without building polygons, ~ excludes ways whose -- only tags are matched by the filter. way_keys = {"~building"} ``` ```lua -- Building a railway map way_keys = {"railway"} ``` ```lua -- Building a map of major roads way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}` ``` Nodes used in ways which are used in relations (as identified by `relation_scan_function`) will always be indexed, regardless of `node_keys` and `way_keys` settings that might exclude them. A concrete example, given a Lua script like: ```lua function way_function() if Find("railway") ~= "" then Layer("lines", false) end end ``` it takes 13GB of RAM and 100 seconds to process North America. If you add: ```lua way_keys = {"railway"} ``` It takes 2GB of RAM and 47 seconds. Notes: 1. This is based on `lua-interop-3`, as it interacts with files that are changed by that. I can rebase against master after lua-interop-3 is merged. 2. The names `node_keys` and `way_keys` are perhaps out of date, as they can now express conditions on the values of tags in addition to their keys. Leaving them as-is is nice, as it's not a breaking change. But if breaking changes are OK, maybe these should be `node_filters` and `way_filters` ? 3. Maybe the value for `node_keys` in the OMT profile should be expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`? This would avoid issues like https://github.com/systemed/tilemaker/issues/337 4. This also adds a SIGUSR1 handler during OSM processing, which prints the ID of the object currently being processed. This is helpful for tracking down slow geometries.	2023-12-29 18:02:11 -05:00
Colin Dellow	c4518f3cca	faster tag map, faster Find()/Holds(), avoid mallocs Cherry-picked from https://github.com/systemed/tilemaker/pull/604/commits/b3221667a9d2366410dbfdc7f25f3062d7a135ef, https://github.com/systemed/tilemaker/pull/604/commits/5c807a9841b866c6dc403141effd4c9d14459034, https://github.com/systemed/tilemaker/pull/604/commits/13b3465f1c80052aa2d622e3915af08b8c5eae9a and fixed up to work with protozero's data_view structure. Original commit messages below, the timings will vary but the idea is the same: Faster tagmap ===== Building a std::map for tags is somewhat expensive, especially when we know that the number of tags is usually quite small. Instead, use a custom structure that does a crappy-but-fast hash to put the keys/values in one of 16 buckets, then linear search the bucket. For GB, before: ``` real 1m11.507s user 16m49.604s sys 0m17.381s ``` After: ``` real 1m9.557s user 16m28.826s sys 0m17.937s ``` Saving 2 seconds of wall clock and 20 seconds of user time doesn't seem like much, but (a) it's not nothing and (b) having the tags in this format will enable us to thwart some of Lua's defensive copies in a subsequent commit. A note about the hash function: hashing each letter of the string using boost::hash_combine eliminated the time savings. Faster Find()/Holds() ===== We (ab?)use kaguya's parameter serialization machinery. Rather than take a `std::string`, we take a `KnownTagKey` and teach Lua how to convert a Lua string into a `KnownTagKey`. This avoids the need to do a defensive copy of the string when coming from Lua. It provides a modest boost: ``` real 1m8.859s user 16m13.292s sys 0m18.104s ``` Most keys are short enough to fit in the small-string optimization, so this doesn't help us avoid mallocs. An exception is `addr:housenumber`, which, at 16 bytes, exceeds g++'s limit of 15 bytes. It should be possible to also apply a similar trick to the `Attribute(...)` functions, to avoid defensive copies of strings that we've seen as keys or values. avoid malloc for Attribute with long strings ===== After: ``` real 1m8.124s user 16m6.620s sys 0m16.808s ``` Looks like we're solidly into diminishing returns at this point.	2023-12-28 16:43:16 -05:00
Colin Dellow	ae1981b0f0	use vtzero instead of libprotobuf (#625 )	2023-12-28 21:31:01 +00:00
Colin Dellow	12ed2414d9	use protozero (#623 )	2023-12-28 20:07:50 +00:00
Colin Dellow	d62c480deb	be able to render the planet with 32gb of RAM (#618 ) * move OutputObjects to mmap store For the planet, we need 1.3B output objects, 12 bytes per, so ~15GB of RAM. * treat objects at low zoom specially For GB, ~0.3% of objects are visible at low zooms. I noticed in previous planet runs that fetching the objects for tiles in the low zooms was quite slow - I think it's because we're scanning 1.3B objects each time, only to discard most of them. Now we'll only be scanning ~4M objects per tile, which is still an absurd number, but should mitigate most of the speed issue without having to properly index things. This will also help us maintain performance for memory-constrained users, as we won't be scanning all 15GB of data on disk, just a smaller ~45MB chunk. * make more explicit that this is unexpected * extend --materialize-geometries to nodes For Points stored via Layer(...) calls, store the node ID in the OSM store, unless `--materialize-geometries` is present. This saves ~200MB of RAM for North America, so perhaps 1 GB for the planet if NA has similar characteristics as the planet. Also fix the OSM_ID(...) macro - it was lopping off many more bits than needed, due to some previous experiments. Now that we want to track nodes, we need at least 34 bits. This may pose a problem down the road when we try to address thrashing. The mechanism I hoped to use was to divide the OSM stores into multiple stores covering different low zoom tiles. Ideally, we'd be able to recall which store to look in -- but we only have 36 bits, we need 34 to store the Node ID, so that leaves us with 1.5 bits => can divide into 3 stores. Since the node store for the planet is 44GB, dividing into 3 stores doesn't give us very much headroom on a 32 GB box. Ah well, we can sort this out later. * rejig AttributePair layout On g++, this reduces the size from 48 bytes to 34 bytes. There aren't _that_ many attribute pairs, even on the planet scale, but this plus a better encoding of string attributes might save us ~2GB at the planet level, which is meaningful for a 32GB box * fix initialization order warning * add PooledString Not used by anything yet. Given Tilemaker's limited needs, we can get away with a stripped-down string class that is less flexible than std::string, in exchange for memory savings. The key benefits - 16 bytes, not 32 bytes (g++) or 24 bytes (clang). When it does allocate (for strings longer than 15 bytes), it allocates from a pool so there's less per-allocation overhead. * add tests for attribute store ...I'm going to replace the string implementation, so let's have some backstop to make sure I don't break things * rejig isHot Break dependency on AttributePair, just work on std::string * teach PooledString to work with std::string ...this will be useful for doing map lookups when testing if an AttributePair has already been created with the given value. * use PooledString in AttributePair AttributePair has now been trimmed from 48 bytes to 18 bytes. There are 40M AttributeSets for the planet. That suggests there's probably ~30M AttributePairs, so hopefully this is a savings of ~900MB at the planet level. Runtime doesn't seem affected. There's a further opportunity for savings if we can make more strings qualify for the short string optimization. Only about 40% of strings fit in the 15 byte short string optimization. Of the remaining 60%, many are Latin-alphabet title cased strings like `Wellington Avenue` -- this could be encoded using 5 bits per letter, saving us an allocation. Even in the most optimistic case where: - there are 30M AttributePairs - of these, 90% are strings (= 27M) - of these, 60% don't fit in SSO (=16m) - of these, we can make 100% fit in SSO ...we only save about 256MB at the planet level, but at some significant complexity cost. So probably not worth pursuing at the moment. * log timings When doing the planet, especially on a box with limited memory, there are long periods with no output. Show some output so the user doesn't think things are hung. This also might be useful in detecting perf regressions more granularly. * AppendVector: an append-only chunked vector When using --store, deque is nice because growing doesn't require invalidating the old storage and copying it to a new location. However, it's also bad, because deque allocates in 512-byte chunks, which causes each 4KB OS page to have data from different z6 tiles. Instead, use our own container that tries to get the best of both worlds. Writing a random access iterator is new for me, so I don't trust this code that much. The saving grace is that the container is very limited, so errors in the iterator impelementation may not get exercised in practice. * fix progress when --store present * mutex on RelationScan progress output * make NodeStore/WayStore shardable This adds three methods to the stores: - `shard()` returns which shard you are - `shards()` returns how many shards total - `contains(shard, id)` returns whether or not shard N has an item with id X SortedNodeStore/SortedWayStore are not implemented yet, that'll come in a future commit. This will allow us to create a `ShardedNodeStore` and `ShardedWayStore` that contain N stores. We will try to ensure that each store has data that is geographically close to each other. Then, when reading, we'll do multiple passes of the PBF to populate each store. This should let us reduce the working set used to populate the stores, at the cost of additional linear scans of the PBF. Linear scans of disk are much less painful than random scans, so that should be a good trade. * add minimal SortedNodeStore test I'm going to rejig the innards of this class, so let's have some tests. * stop using internal linkage for atomics In order to shard the stores, we need to have multiple instances of the class. Two things block this currently: atomics at file-level, and thread-locals. Moving the atomics to the class is easy. Making the thread-locals per-class will require an approach similar to that adopted in https://github.com/systemed/tilemaker/blob/52b62dfbd5b6f8e4feb6cad4e3de86ba27874b3a/include/leased_store.h#L48, where we have a container that tracks the per-class data. * SortedNodeStore: abstract TLS behind storage() Still only supports 1 class, but this is a step along the path. * SortedWayStore: abstract TLS behind storage() * SortedNodeStore: support multiple instances * SortedWayStorage: support multiple instances * actually fix the low zoom object collection D'oh, this "worked" due to two bugs cancelling each other: (a) the code to find things in the low zoom list never found anything, because it assumed a base z6 tile of 0/0 (b) we weren't returning early, so the normal code still ran Rejigged to actually do what I was intending * AppendVector tweaks * more low zoom fixes * implement SortedNodeStore::contains * implement SortedWayStore::contains * use TileCoordinatesSet * faster covered tile enumeration Do a single pass, rather than one pass per zoom. * add ShardedNodeStore This distributes nodes into one of 8 shards, trying to roughly group parts of the globe by complexity. This should help with locality when writing tiles. A future commit will add a ShardedWayStore and teach read_pbf to read in a locality-aware manner, which should help when reading ways. * add ShardedWayStore Add `--shard-stores` flag. It's not clear yet this'll be a win, will need to benchmark. The cost of reading the PBF blocks repeatedly is a bit higher than I was expecting. It might be worth seeing if we can index the blocks to skip fruitless reads. * fewer, more balanced shards * skip ReadPhase::Ways passes if node store is empty * support multiple passes for ReadPhase::Relations * fix check for first way * adjust shards With this distribution, no node shard is more than ~8.5GB. * Relations: fix effectiveShards > 1 check Oops, bug that very moderately affected performance in the non `--shard-stores` case * extend --materialize-geometries to LayerAsCentroid It turns out that about 20% of LayerAsCentroid calls are for nodes, which this branch could already do. The remaining calls are predominantly ways, e.g. housenumbers. We always materialize relation centroids, as they're expensive to compute. In GB, this saves about 6.4M points, ~102M. Scaled to the planet, it's perhaps a 4.5GB savings, which should let us use a more aggressive shard strategy. It seems to add 3-4 seconds to the time to process GB. * add `DequeMap`, change AttributeStore to use it This implements the idea in https://github.com/systemed/tilemaker/issues/622#issuecomment-1866813888 Rather than storing a `deque<T>` and a `flat_map<T, uint32_t>`, store a `deque<T>` and `vector<uint32_t>`, to save 8 bytes per AttributePair and AttributeSet. capture s(this) Seems to save ~1.5 seconds on GB * fix warning * fix warning, really * fewer shards Shard 1 (North America) is ~4.8GB of nodes, shard 4 (some of Europe) is 3.7GB. Even ignoring the memory savings in the recent commits, these could be merged. * extract option parsing to own file We'd like to have different defaults based on whether `--store` is present. Now that option parsing will have some more complex logic, let's pull it into its own class so it can be more easily tested. * use sensible defaults based on presence of --store * improve test coverage * fixes * update number of shards to 6 This has no performance impact as we never put anything in the 7th shard, and so we skip doing the 7th pass in the ReadPhase::Ways and ReadPhase::Relations phase. The benefit is only to avoid emitting a noisy log about how the 7th store has 0 entries in it. Timings with 6 shards on Vultr's 16-core machine here: https://gist.github.com/cldellow/77991eb4074f6a0f31766cf901659efb The new peak memory is ~12.2GB. I am a little perplexed -- the runtime on a 16-core server was previously: ``` $ time tilemaker --store /tmp/store --input planet-latest.osm.pbf --output tiles.mbtiles --shard-stores real 195m7.819s user 2473m52.322s sys 73m13.116s ``` But with the most recent commits on this branch, it was: ``` real 118m50.098s user 1531m13.026s sys 34m7.252s ``` This is incredibly suspicious. I also tried re-running commit `bbf0957c1e`, and got: ``` real 123m15.534s user 1546m25.196s sys 38m17.093s ``` ...so I can't explain why the earlier runs took 195 min. Ideas: - the planet changed between runs, and a horribly broken geometry was fixed - Vultr gives quite different machines for the same class of server - perhaps most likely: I failed to click "CPU-optimized" when picking the earlier server, and got a slow machine the first time, and a fast machine the second time. I'm pretty sure I paid the same $, so I'm not sure I believe this. I don't think I really believe that a 33% reduction in runtime is explained by any of those, though. Anyway, just another thing to be befuddled by. * --store uses lazy geometries; permit overriding I did some experiments on a Hetzner 48-core box with 192GB of RAM: --store, materialize geometries: real 65m34.327s user 2297m50.204s sys 65m0.901s The process often failed to use 100% of CPU--if you naively divide user+sys/real you get ~36, whereas the ideal would be ~48. Looking at stack traces, it seemed to coincide with calls to Boost's rbtree_best_fit allocator. Maybe: - we're doing disk I/O, and it's just slower than recomputing the geometries - we're using the Boost mmap library suboptimally -- maybe there's some other allocator we could be using. I think we use the mmap allocator like a simple bump allocator, so I don't know why we'd need a red-black tree --store, lazy geometries: real 55m33.979s user 2386m27.294s sys 23m58.973s Faster, but still some overhead (user+sys/real => ~43) no --store, materialize geometries: OOM no --store, lazy geometries (used 175GB): real 51m27.779s user 2306m25.309s sys 16m34.289s This was almost 100% CPU - user+sys/real => ~45) From this, I infer: - `--store` should always default to lazy geometries in order to minimize the I/O burden - `--materialize-geometries` is a good default for non-store usage, but it's still useful to be able to override and use lazy geometries, if it then means you can fit the data entirely in memory	2023-12-28 15:23:35 +00:00
Richard Fairhurst	5acee418ba	PMTiles support (#620 )	2023-12-22 10:45:05 +00:00
Colin Dellow	52b62dfbd5	some memory and concurrency improvements (#612 ) * extract ClipCache to own file Some housekeeping: extract clip_cache.cpp * templatize ClipCache, apply to MultiLineStrings This provides a very small benefit. I think the reason is two-fold: there aren't many multilinestrings (relative to multipolygons), and clipping them is less expensive. Still, it did seem to provide a small boost, so leaving it in. * housekeeping: move test, minunit * --log-tile-timings: verbose timing logs This isn't super useful to end users, but is useful for developers. If it's not OK to leave it in, let me know & I'll revert it. You can then process the log: ```bash $ for x in {0..14}; do echo -n "z$x "; cat log-write-node-attributes.txt \| grep ' took ' \| sort -nk3 \| grep z$x/ \| awk 'BEGIN { min = 999999; max = 0; }; { n += 1; t += $3; if ($3 > max) { max = $3; max_id = $1; } } END { print n, t, t/n, max " (" max_id ")" }'; done z0 1 7.04769 7.04769 7.047685 (z0/0/0) z1 1 9.76067 9.76067 9.760671 (z1/0/0) z2 1 9.98514 9.98514 9.985141 (z2/1/1) z3 1 9.98514 9.98514 9.985141 (z3/2/2) z4 2 14.4699 7.23493 8.610035 (z4/5/5) z5 2 20.828 10.414 13.956526 (z5/10/11) z6 5 6464.05 1292.81 3206.252711 (z6/20/23) z7 13 11306.4 869.727 3275.475707 (z7/40/46) z8 35 15787.1 451.061 2857.506681 (z8/81/92) z9 86 20723.8 240.974 1605.788985 (z9/162/186) z10 277 25456.8 91.9018 778.311785 (z10/331/369) z11 960 28851.3 30.0534 627.351078 (z11/657/735) z12 3477 24031.6 6.91158 451.122972 (z12/1315/1471) z13 13005 13763.7 1.05834 156.074701 (z13/2631/2943) z14 50512 24214.7 0.479385 106.358450 (z14/5297/5916) ``` This shows each zoom's # of tiles, total time, average time, worst case time (and the tile that caused it). In general, lower zooms are slower than higher zooms. This seems intuitively reasonable, as the lower zoom often contains all of the objects in the higher zoom. I would have guessed that a lower zoom would cost 4x the next higher zoom on a per-tile basis. That's sort of the case for `z12->z13`, `z11->z12`, `z10->z11`, and `z9->z10`. But not so for other zooms, where it's more like a 2x cost. Looking at `z5->z6`, we see a big jump from 10ms/tile to 1,292ms/tile. This is probably because `water` has a minzoom of 6. This all makes me think that the next big gain will be from re-using simplifications. This is sort of the mirror image of the clip cache: - the clip cache avoids redundant clipping, and needs to be computed from lower zooms to higher zooms - a simplification cache could make simplifying cheaper, but needs to be computed from higher zooms to lower zooms The simplification cache also has two other wrinkles: 1. Is it even valid? e.g. is `simplify(object, 4)` the same as `simplify(simplify(object, 2), 2)` ? Maybe it doesn't have to be the same, because users are already accepting that we're losing accuracy when we simplify. 2. Rendering an object at `z - 1`, needds to (potentially) stitch together that object from 4 tiles at `z`. If those have each been simplified, we may introduce odd seams where the terminal points don't line up. * more, smaller caches; run destructors outside lock * use explicit types * don't populate unnecessary vectors * reserve vectors appropriately * don't eagerly call way:IsClosed() This saves a very little bit of time, but more importantly, tees up lazily evaluating the nodes in a way. * remove locks from geometry stores Rather than locking on each store call, threads lease a range of the ID space for points/lines/multilines/polygons. When the thread ends, it return the lease. This has some implications: - the IDs are no longer guaranteed to be contiguous - shapefiles are a bit weird, as they're loaded on the main thread -- so their lease won't be returned until program end. This is fine, just pointing it out. This didn't actually seem to affect runtime that much on my 16 core machine, but I think it'd help on machines with more cores. * increase attributestore shards When scaling to 32+ cores, this shows up as an issue. Try a really blunt hammer fix. * read_pbf: less lock contention on status `std::cout` has some internal locks -- instead, let's synchronize explicitly outside of it so we control the contention. If a worker fails to get the lock, just skip that worker's update. * tile_worker: do syscall 1x/thread, not 1x/tile * tilemaker: avoid lock contention on status update If a worker can't get the lock, just skip their update. * Revert "don't eagerly call way:IsClosed()" This reverts commit `3e7b9b62d1`. This commit came about from some experiments that I had done pre-SortedNodeStore. In that world, lazily evaluating the nodes of a way provided a meaningful savings if the way was ultimately discarded by the Lua code. Post-SortedNodeStore, it doesn't seem to matter as much. Which is great, as it means the store is much faster, but also means this commit is just noise. You can see the POC code in https://github.com/cldellow/tilemaker/tree/lazy-way-nodes * update ifdef guard, add comments * lazy way geometries Tilemaker previously stored the 2D geometries it produced from ways. This commit makes Tilemaker use the OSM way store to generate linestrings and polygons that originated with an OSM way. You can get the old behaviour with `--materialize-geometries`, which is a sensible choice if you are not memory constrained. For GB: before (available via `--materialize-geometries`): 2m11s, 9600MB this commit: 2m20s, 6400MB So ~8% slower, but 33% less memory. I think it's probably reasonable for this to be the default, which has nice symmetry with compressed nodes and compressed ways being the default. Building NA with --store still seems OK - 36min. I was concerned that the increased node store lookups could be vulnerable to thrashing. I do see some stuttering during tile writing, but I think the decreased read iops from a smaller geometry store balance out the increased read iops from looking up nodes. A future investigation might be to have SortedWayStore store latplons rather than node IDs -- a bit more memory, but should be less CPU and less vulnerable to thrashing. * improve tile coordinate generation Before writing, we compute the set of tiles to be written. There were two opportunities for improvement here: - determining which tiles were covered by our objects: we previously used a `std::set`, which has poor memory and runtime behaviour. Instead, use a fixed size `std::vector<bool>` -- this takes 64MB at z14, but gives much faster insert/query times - determining if tiles were covered by clipping box: we used boost's intersect algorithm before, which required constructing a TileBbox and was a bit slow. In the case where the tile is contained in a z6 tile that is wholly covered by the clipping box, we can short-circuit This has the most impact when the set of objects or tiles is very large--e.g. Antarctica, North America or bigger. * SortedNodeStore: only do arena allocations On a 48-core server, I noticed lock contention on the mmap allocator. So let's just always use pools of memory, and pick a bigger pool size. This means we'll sometimes allocate memory that we don't use. In the extreme case of Monaco, we only need like 200KB, but we'll allocate several megs. As you scale to larger PBFs, the waste trends to 0%, so this should be fine in practice. * remove TODO * fix Windows build D'oh, clock_gettime is Linux-ish. `std::chrono` may have a cross-platform option, but it's not clear. For now, just omit this functionality on Windows. If we want to expose it, we can explore something in std::chrono or make a wrapper that calls QueryPerformanceCounter on Windows. * sigh * fix bounds check	2023-12-15 17:04:46 +00:00
Colin Dellow	8300b0cdd9	Alternate node store (#590 ) * refactor NodeStore I'd like to add an alternative NodeStore that can be used when the `Type_then_ID` property is present in the PBF. First, a small (?) refactor: - make `NodeStore` an interface, with two concrete implementations - extract the NodeStore related things to their own files - this will cause some churn, as they'll depend on things that also need to get extracted to their own files. Short term pain, hopefully long term gain in faster compile times. Changing the invocations of the functions to be virtual may have impact on performance. Will need to revisit that before committing to virtual methods. * change how work is assigned for ReadPhase::Nodes Currently, when a worker needs work, it gets the next unprocessed block. This means blocks are read sequentially at a global level, but from the perspective of each worker, there are gaps in the blocks they see. For nodes, we'd prefer to give each worker thread contiguous blocks from the underlying PBF. This will enable a more efficient storage for PBFs with the `Sort.Type_then_ID` flag. * add SortedNodeStore SortedNodeStore is uesful for PBFs with the `Sort.Type_then_ID` property, e.g. the planet and Geofabrik exports. It stores nodes in a hierarchy: - Level 1 is groups: there are 256K groups - Level 2 is chunks: each group has 256 chunks - Level 3 is nodes: each chunk has 256 nodes This allows us to store 2^34 nodes, with a fixed overhead of only 2M -- the space required for the level 1 pointers. Groups and chunks store their data sparsely. If a group has 7 chunks, it only uses storage for 7 chunks. On Great Britain's 184M node PBF, it needs ~9.13 bytes per node. Looking up a node can be done in fixed time: First, get some offsets: - Group: `nodeID / 65536` - Chunk: `(nodeID / 65536) / 256` - Position within chunk: `nodeID % 256` For example, Cape Chignecto Provincial Park has ID 4855703, giving: - Group 74 - Chunk 23 - Offset 151 Group 74's chunks may be sparse. To map chunk 23 to its physical location, each group has a 256-bit bitmask indicating which chunks are present. Use its physical location to get its `chunkOffset`. That allows you to get to the `ChunkInfo` struct. From there, do the same thing to get the node data. This design should also let us do some interesting things down the road, like efficiently compressing each chunk using something like delta encoding, zigzag encoding and bit packing. Then, to avoid paying a decompression cost, we'd likely give each worker a cache of uncompressed chunks. * cmake build * tidy up * tweak * tweak * derp * mac/windows build * fix build? I don't understand why these can't be passed as a copy in the Windows and Mac builds. Whatever, try passing a reference. * fix --store I think nested containers may not be wired up quite correctly. Instead, manage the char* buffers directly, rather than as `std::vector<char>` I'll fixup the other aspects (attributing libpopcnt, picking Sorted vs BinarySearch on the fly) later * attribution for libpopcnt * simplify read_pbf All read phases use the same striding-over-batches-of-blocks approach. This required changing how progress is reported, as block IDs are no longer globally montonically increasing. Rather than thread the state into ReadBlock, I just adopted 2 atomic counters for the whole class -- the progress reporter already assumes that it's the only thing dumping to stdout, so the purity of avoiding class-global doesn't buy us anything. * clear allocatedMemory * use scale factor 16, not 8 D'oh, if you get a full group where each chunk is full, you need to be able to express a value _ever so slightly_ larger than 65,536. North America and Europe have examples of this. Use a scale factor of 16, not 8. This'll mean some chunks have up to 15 wasted bytes, but it's not a huge deal. (And I have some thoughts on how to claw it back.) * comment out debug stats * windows build * derp * use SortedNodeStore if PBFs have Sort.Type_then_ID * add --compress-nodes If the user passes `--compress-nodes`, we use [streamvbyte](https://github.com/lemire/streamvbyte) to compress chunks of nodes in memory. The impact on read time is not much: - GB with `--compress-nodes`: 1m42s - without: 1m35s But the impact on memory is worthwhile, even across very different extracts: North America - 5.52 bytes/node vs 8.48 bytes/node 169482 groups, 18364343 chunks, 1757589784 nodes, needed 9706167278 bytes 169482 groups, 18364343 chunks, 1757589784 nodes, needed 14916095182 bytes Great Britain - 5.97 bytes/node vs 9.25 bytes/node 163074 groups, 4871807 chunks, 184655287 nodes, needed 1104024510 bytes 163074 groups, 4871807 chunks, 184655287 nodes, needed 1708093150 bytes Nova Scota - 5.81 bytes/node vs 8.7 bytes/node 26777 groups, 157927 chunks, 12104733 nodes, needed 70337950 bytes 26777 groups, 157927 chunks, 12104733 nodes, needed 105367598 bytes Monaco - 10.43 bytes/node vs 13.52 bytes/node 1196 groups, 2449 chunks, 30477 nodes, needed 318114 bytes 1196 groups, 2449 chunks, 30477 nodes, needed 412258 bytes * build * build * remove __restrict__ to satisfy windows build * remove debug print, small memory optimization * use an arena for small groups * omit needless words * better short-circuiting for Type-then-ID PBFs Track metadata about which blocks have nodes, ways and relations. By default, we assume any block may contain nodes, ways or relations. If the PBF supports Type-then-ID PBFs, do a binary search to find the first blocks with ways and relations. This means ReadPhase::Nodes can stop without scanning ways/relations. In addition to avoiding needless work, it makes it easier to assign each worker a balanced amount of work -- now each worker has only blocks with nodes, which are about the same effort computationally. It also makes ReadPhase::ScanRelations faster, as it scans exactly the blocks with relations, skipping the blocks with ways. Similarly, ReadPhase::Ways is a bit faster, as it doesn't have to read the blocks with relations. For North America, this reduces the time to complete the Nodes and RelationsScan phase from 2m30s to 1m20s. For GB, it reduces the time from 22s to 9s. * ReadPhase::Relations - more parallelism When processing relations for small extracts, there are often fewer blocks than cores. Instead, divide the work more granularly, assigning each of the N threads 1/Nth of the block to process. This saves 4-5 seconds (which is cumulatively ~20% of runtime) for the Canadian province of Nova Scotia. * extract WayStore, BinarySearchWayStore * stub in SortedWayStore ...it just throws a lot of exceptions at the moment. * put SortedNodeStore in a namespace Also replace some `#define`s with `const`s. I'm likely going to reuse some names in SortedWayStore, so namespacing to avoid conflicts. * don't use SortedWayStore if LocationsOnWays present * stub in insertLatpLons/insertNodes * change at() to return a non mmap vector SortedWayStore won't create mmaped vectors, so we need to return the lowest common denominator. This pessimizes performance of BinarySearchWayStore, since it'll have to allocate vectors on demand. Longer term: it might be better to return an iterator that hides the heavy lifting. * begin drawing the rest of the owl * flesh out types * add unit test framework * naive encoding of ways Checkpointing since I have something that works. Future optimizations: - when all high ints are the same, don't encode them - compression * more efficient if high ints are all the same * extract mmap_allocator.cpp This is needed to unit test the way store without dragging in osm_store. * progress on publishGroup checkpointing, going to extract a populateMask(...) function * add populateMask function * finish publishGroup * SortedWayStore: implement at * pass node store into SortedWayStore * fix alignment * better logs * way stores should throw std::out_of_range This is part of the contract, client code will catch it and reject relations that have missing ways. * sortednodestore: throw std::out_of_range * support way compression * remove dead code, robust against empty ways * implement clear() * maybe fix windows build? very unclear why this is needed, but we seem to be getting C2131 on this line. * don't use variable-length arrays on stack Workaround for MSVC * avoid more variable-length arrays * make the other vectors as thread-local * --no-compress-ways, --no-compress-nodes	2023-12-09 13:47:07 +00:00
Richard Fairhurst	988c7708e5	Use CONFIG mode for CMake, fall back to MODULE (#566 )	2023-11-03 13:37:19 +00:00
Richard Fairhurst	3775c309f5	Reinstate CMAKE_DL_LIBS (#562 )	2023-11-03 11:00:14 +00:00
Richard Fairhurst	172ffb4324	Stop protobuf build erroring out (#559 )	2023-11-03 09:27:55 +00:00
FunMiles	5b6bb37daa	Modernize CMake include and linking resolution. (#555 ) * chore: Modernise cmake file. In modern cmake, linking a target to a library with target_link_libraries automatically sets the include paths. This modern approach also resolves issues with transitive dependencies (e.g. protobuf using abseil). * chore: Update the required minimum CMake to a lower working value. * fixed MSVC SQLite3 issue * Use an alias, avoiding second IF(MSVC)	2023-10-21 19:19:06 +01:00
holzgeist	d3c46edacb	fix: CMake build on Arch Linux (#503 ) * added missing include * set required C++ version to 17 Co-authored-by: Tobias Ollmann <tobias@holzgeist.at>	2023-07-16 13:32:06 +01:00
ǝɹʇʇɐʃǝ◖ xıʃǝɟ	2703d73410	Add latomic when needed. (#427 )	2023-03-07 23:49:04 +00:00
ǝɹʇʇɐʃǝ◖ xıʃǝɟ	0e7d647160	Fix manpage in makefiles. (#423 )	2022-09-27 21:02:46 +01:00
ǝɹʇʇɐʃǝ◖ xıʃǝɟ	4d87c15ea9	Avoid git during build with cmake (#341 )	2021-10-08 12:40:34 +01:00
Wouter van Kleunen	474e8bb852	Rely on provided rapidjson dependency + cmake (#306 ) * Rely on packaged rapidjson dependency * Add rapidjson to cmake * Add -pthread option to LIB flags variable in Makefile. * Allow environment setting of CXXFLAGS. * Avoid git during build * Github CI build using ubuntu 18.04 Co-authored-by: Felix Delattre <felix@delattre.de>	2021-09-10 14:01:59 +01:00
Richard Fairhurst	5dff747006	Show version number in --help output (#260 )	2021-07-02 18:23:26 +01:00
Wouter van Kleunen	0f2c6057b7	Non self-intersecting simplify (#239 )	2021-05-21 20:56:29 +01:00
Richard Fairhurst	d400f0b6c8	Merge tile contents (#225 )	2021-04-18 14:07:27 +01:00
Richard Fairhurst	229e9a01ea	Combine linestrings only at specified zoom levels (#221 )	2021-04-10 22:42:19 +01:00
Wouter van Kleunen	ad015fe879	Tilemaker CI for windows, macos and linux with boost	2021-03-20 07:56:28 +00:00
Wouter van Kleunen	264fc51a57	Perform static build of tilemaker	2021-03-07 10:07:23 +00:00
Wouter van Kleunen	0fb2c00589	Allow tilemake to run in compact (32bit) nodeid mode	2021-03-04 08:46:01 +00:00
Wouter van Kleunen	f7d985378e	CI use boost 1.67	2021-02-22 16:39:42 +00:00
Wouter van Kleunen	37ac7308fd	Use boost asio thread pool to schedule tile generation	2021-02-21 21:12:02 +00:00
Richard Fairhurst	8914030c8a	Default to Release build with cmake	2021-02-21 17:06:37 +00:00
Wouter van Kleunen	4ea47ca10e	Store shapes in mmap file and some cleanups	2021-02-20 20:25:15 +00:00
Wouter van Kleunen	20ef218935	Some additional optimizations and cleanup to osm store	2021-02-12 19:57:18 +00:00
Wouter van Kleunen	63c2591885	Store node in mmap file	2021-02-12 11:32:11 +00:00
Michal Fratczak	e5292e9f84	Add missing CPP files to CMakeLists.txt for tilemaker	2019-09-18 14:13:56 +02:00
systemed	42091e97d5	Replace Luabind with kaguya (hooray!)	2016-11-04 16:32:05 +00:00
Shunsuke Shimizu	00621a818d	threads support	2016-07-13 17:59:24 +09:00
alex85k	d4c4534d22	Add cmake build scripts	2015-10-12 00:46:49 +00:00

45 Commits