tilemaker

mirror of https://github.com/systemed/tilemaker.git synced 2026-05-09 01:40:01 -04:00

Author	SHA1	Message	Date
Colin Dellow	5e06647cf3	Remove output object ref (#595 )	2023-12-02 21:04:40 +00:00
Colin Dellow	3b3b8f1d3a	AttributeStore memory tweaks (#583 ) * make AttributeStore::get const I think AttributeStore lives forever, and AttributeSets are immutable once added to it, so we can avoid the copy. * use a string pool for AttributeSet keys There are relatively few unique key values for attributes, e.g. `kind`, `name`, `admin_level`. The Shortbread schema has only ~50 or so. I imagine OMT is similar, but haven't checked. We generate lots of AttributePairs -- on the order of tens of millions for GB, and std::string has an overhead of 32 bytes. By using a string pool and storing only an offset into it, we can save a few hundred MB of RAM. * lock-free reads for keys, vector for pairs This is the groundwork for implementing two future improvements: - hot/cold pairs: there is a bimodal distribution of attribute frequency. `landuse=wood`, `tunnel=0` are often duplicated. `name=Sneed's Seed & Feed` is not. In the future, we'll try to re-use the "hot" pairs to avoid paying the cost of an AttributePair for them. - "short vectors" - similar to the short string optimization, we should be able to pack up to 6 pairs (3 hot, 3 cold) in the overhead that a vector would otherwise use. As it stands, this commit increases memory usage. But we'll claw a lot of it back, and then some. * Have a "hot" shard for popular pairs If a pair looks like it might be re-usable, put it in a special shard and be able to re-use it. The special shard is limited to max 64K items, teeing up future work to have a simple vector for AttributeSets with few pairs. * treat 0 as a sentinel * de-dupe all AttributePairs The stats I was looking at were counting AttributePairs via AttributeSets, which of course presents a misleading image of how many duplicate AttributePairs there are, because by that point, they've already been deduped. De-duping doesn't add that much runtime overhead--and it could probably be improved by someone who knows more C++ concurrency tricks than me. 
* store pointers in pairMaps, optimize debug spew `Tile_Value` is a really memory-expensive object. Since we maintain long-lived references to the canonical AttributePair, we can store pointers to save a bit more memory. Now that value->AttributePairs are guaranteed to be 1:1, we can do our debug statistics on ints, and translate to pairs only when writing to stdout. * use boost::container::flat_map over std::map Doesn't appreciably affect runtime, saves a bit of memory. * don't memoize hash function Now that there is a 1:1 mapping between values and AttributePairs, it's trivial to compute the hash on demand. * output_object: avoid Tile_Value temporaries Also const-ify a few things * defer creating Tile_Value Tile_Value is a big union that takes up 96 bytes, but for our purposes, we're happy with a union of string, float and bool -- which can be expressed in 28 bytes. We need a discriminator variable, but due to alignment, that's free. I also consider `boost::variant<bool, float, string>`, but it seemed to take 40 bytes. I worried that not having a pool of Tile_Values would affect PBF writing time, but it seems unaffected. * adjust headers, remove unneeded rng * any integer 0 <= 25 is eligible for hot pool This is useful for ranks, which run from 1..25 * Use a small vector optimization for pair indexes `vector<uint32_t>` takes 24 bytes just to store its internal pointers. If you actually want to store a `uint32_t` in it, it'll then allocate some memory on the heap, taking a further 32-64 bytes depending on STL and malloc implementations. 56-88 bytes! For a single `uint32_t`! Outrageous. Instead, store references to pair indexes in an array of shorts. If the pairs don't fit in the array, upgrade it to a vector. Since we previously arranged for very popular pairs like `amenity=toilets` to have small indexes, our array of shorts is capable of storing between 4 and 8 pairs before we need to upgrade to a vector. Most AttributeSets will not need to use a vector.
* simplify AttributeKeyStore * use camelCase * re-write to avoid static lifetime AttributeKeyStore/AttributePairStore have the same lifetime as AttributeStore, so just make them owned by it. This results in slightly more convoluted code, but avoids having them floating around as globals. * reduce lock contention * Improve TileCoordinates hash function x ^ y will only use as many bits as max(x, y), but tiles only use the full 32-bit space at z16, so we're leaving a lot of the hash space on the table. * d'oh, avoid looking up the key name needlessly * change AttributeXyz(...) to be last-written wins Previously, if you set the same key to different values, it was not guaranteed that the last value written would win. * remove misleading comment * include deque * include map * return vector, not set set seems a bit like overkill - we already know the items are unique, and the consumer is likely just going to iterate over them * avoid GNU-specific initializer also avoid hardcoding 12 * Revert "Improve TileCoordinates hash function" This reverts commit `7570737715`. Oops, I think this change isn't meaningful, and is a result of me misreading the original code. It might still be an improvement to do something like `hash(x << 16) ^ hash(y)`, since the default TileCoordinate is only 16 bits, but that can be considered independent of this PR. * remove dead code * avoid copying AttributePairs They're long-lived, so pass pointers * OutputObjects - greatly reduce need for locks I'm slowly remembering how to write concurrent code... * AttributeKeyStore: use a TLS cache This should reduce futex contention significantly. I'll apply the same change for AttributePairStore's shard 0, then measure. * AttributePairStore: reduce lock contention * ensure atomics are initialized Per https://stackoverflow.com/questions/36320008/whats-the-default-value-for-a-stdatomic, they aren't initialized by default. Somewhat surprised this didn't result in crashes. 
* don't store duplicate way geometries A common pattern is: ```lua way:Layer("waterway", false) ... way:Layer("waterway_names", false) ``` Previously, we'd process the geometry twice, and store a second copy of it in memory. Instead, re-use the previously stored geometry. This saves another ~1GB of memory for the GB extract. It doesn't seem to affect runtime - I think we only re-use linestrings, and linestrings are relatively cheap to do `is_valid` on. It seems like with the rest of the work on this branch, the `OutputObjectXyz` classes are very thin -- inspecting `geomType` in order to construct the right one was a bit tedious, so I removed them.	2023-11-19 17:59:05 +00:00
systemed	d9bb72b929	Merge branch 'master' into refactor_geometries	2023-11-16 21:16:44 +00:00
Colin Dellow	8b45a8a33e	Avoid copying strings/tag maps (#577 )	2023-11-12 19:03:02 +00:00
systemed	4db8c25417	Merge branch 'master' into refactor_geometries	2023-11-04 12:58:56 +00:00
Richard Fairhurst	dbdb4da097	RestartRelations() to reset relation subscript (#548 )	2023-10-09 11:30:03 +01:00
systemed	0ea137fedb	Per-tile feature limit as per #547	2023-10-03 17:52:02 +01:00
systemed	2249202751	Keep output list as OutputObjects for longer	2023-09-19 22:17:21 +01:00
systemed	22a3e64c68	Store objects sequentially (breaks include_ids)	2023-09-19 18:43:16 +01:00
systemed	adcdb6e892	Merge branch 'master' into refactor_geometries	2023-09-18 18:24:02 +01:00
Richard Fairhurst	439c6b1027	Report OSM ID on Lua processing error (#535 )	2023-09-18 14:22:36 +01:00
Richard Fairhurst	528aa25dec	Support type=boundary relations as native multipolygons (#508 )	2023-07-23 08:57:39 +01:00
systemed	23d8e6da59	Store attribute sets by index	2023-06-18 14:45:04 +01:00
systemed	856951daae	Reinstate output/attribute pair (memory leak)	2023-06-13 12:25:27 +01:00
systemed	ed14fc427d	Break pair into separate output/attribute lists	2023-06-12 10:09:32 +01:00
systemed	0944743852	Rename classes away from OsmStore	2023-06-10 13:56:15 +01:00
Richard Fairhurst	4a85c60e6d	Compile-time option for float ZOrder (#486 )	2023-03-31 15:10:30 +01:00
Fabrizio	57942b9812	Fixed application crashes (#441 )	2022-10-29 14:18:40 +01:00
Richard Fairhurst	ad1fcfbdf3	Support osmium locations-on-ways format (#386 )	2022-02-26 16:58:44 +00:00
Richard Fairhurst	70f89d64ba	Output slow geometry generation and allow user interrupt (#378 )	2022-02-17 10:45:51 +00:00
Richard Fairhurst	965f6ae349	(Non-multipolygon) relation support (#360 )	2022-01-08 10:12:08 +00:00
Richard Fairhurst	a8ee8a96cc	Revert "Use artificial IDs for OutputObject after 1st use (#357 )" (#359 ) This reverts commit `962ad86bfa`.	2022-01-05 00:22:16 +00:00
Richard Fairhurst	962ad86bfa	Use artificial IDs for OutputObject after 1st use (#357 )	2022-01-03 13:05:37 +00:00
Wouter van Kleunen	3f8b4e2f60	Handle nan condition in lua (#329 )	2021-09-27 17:35:17 +01:00
Richard Fairhurst	b45219ff91	Use real relation IDs rather than --newWayID (#318 )	2021-09-13 22:42:44 +01:00
Michael Reichert	ca0fd702c3	Add z_order sorting support (#283 ) * Add support for a z_order field and sort output objects by z_order PostGIS import tools like Osm2pgsql and Imposm can write a z_order field to the database table in order to allow map styles to render features ordered by road class and vertical layer. Tilemaker now gets a Lua callback to set the z_order for an OSM object (default 0) and will sort the vectortile features by z_order. z_order is also taken into account for combining of features. z_order values are limited to 1-byte unsigned integer. * Document new ZOrder callback function * Drop unnecessary variable * Implement z_order sorting for the OpenMapTiles example config This commit also adapts the minimum zoom levels to the latest version.	2021-08-23 13:37:24 +01:00
Richard Fairhurst	aa1398ff51	Tidy logging	2021-07-02 14:45:50 +01:00
Wouter van Kleunen	3b11dff4bb	Perform dissolve on self-intersecting polygons (#249 )	2021-06-22 11:37:36 +01:00
Wouter van Kleunen	631864ab5a	Parallel loading of osm.pbf file (#243 ) * Parallel reading * Fix win build * Simplifier correctly init nodes * Use generator to open input file(s) * Bit more consistent naming of osm_store files * Make number of locks in attribute store function of number of threads * Small optimization, don't use virtual method calls * Restore nodeListPolygon correct * Don't drop self intersecting polygons	2021-06-06 13:02:41 +01:00
Wouter van Kleunen	0f2c6057b7	Non self-intersecting simplify (#239 )	2021-05-21 20:56:29 +01:00
Richard Fairhurst	ead23531c4	Add :Centroid method to get lat/lon from Lua (#230 )	2021-04-26 18:47:10 +01:00
Richard Fairhurst	a73723702b	Refactor shapefile spatial queries (#228 )	2021-04-26 14:48:19 +01:00
systemed	6b111cc217	Shapefile attribute enhancements	2021-04-07 20:42:19 +01:00
Richard Fairhurst	4399349e78	Optionally set minzoom to write attributes (#219 )	2021-04-05 20:06:20 +01:00
Wouter van Kleunen	ad015fe879	Tilemaker CI for windows, macos and linux with boost	2021-03-20 07:56:28 +00:00
Wouter van Kleunen	d9d4989d55	Allow generation of pbf index file	2021-03-12 14:07:13 +00:00
systemed	005f3d7913	Make nodeVec available to Lua way_function	2021-03-06 17:02:09 +00:00
Wouter van Kleunen	0fb2c00589	Allow tilemaker to run in compact (32bit) nodeid mode	2021-03-04 08:46:01 +00:00
Wouter van Kleunen	43dc1a2259	Attribute store sets of attributes	2021-02-26 14:45:16 +00:00
Wouter van Kleunen	664cd22571	Allow initialization of store to prevent rehashing and small optimizations	2021-02-23 16:49:17 +00:00
Wouter van Kleunen	6e7ab9f77d	Store generated geometries in mmap	2021-02-20 09:09:19 +00:00
Wouter van Kleunen	a006a319d0	Small cleanup and configurable storage filename	2021-02-14 16:16:01 +00:00
systemed	04b93a92ca	Use shared key/value dict across OutputObjects	2020-06-28 17:06:39 +01:00
systemed	080f94d471	Add :MinZoom(z) for per-feature zoom control	2020-05-25 11:53:29 +01:00
systemed	eaf21aa673	Remove scale functions now we return metres anyway	2020-05-23 19:32:22 +01:00
systemed	6e713f542b	Consistently use 1TBS [whitespace only, no code changes]	2020-05-23 12:19:56 +01:00
systemed	f02c8cf1a1	Better diagnostics for invalid multipolygons	2020-01-29 11:03:31 +00:00
systemed	06ccc8d043	Rewrite shapefile tags from Lua	2019-03-06 12:11:49 +00:00
Tim Sheerman-Chase	5f4307101a	Rename OSMObject to OsmLuaProcessing, start on doxygen documentation	2018-06-11 12:42:00 +01:00

49 Commits