
49 Commits

Author SHA1 Message Date
Colin Dellow 5e06647cf3 Remove output object ref (#595) 2023-12-02 21:04:40 +00:00
Colin Dellow 3b3b8f1d3a AttributeStore memory tweaks (#583)
* make AttributeStore::get const

I think AttributeStore lives forever, and AttributeSets are immutable
once added to it, so we can avoid the copy.

* use a string pool for AttributeSet keys

There are relatively few unique key values for attributes, e.g.
`kind`, `name`, `admin_level`.

The Shortbread schema has only about 50. I imagine OMT is similar,
but haven't checked.

We generate lots of AttributePairs -- on the order of tens of millions
for GB, and std::string has an overhead of 32 bytes. By using a string
pool and storing only an offset into it, we can save a few hundred MB
of RAM.
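A minimal sketch of the key pool idea, assuming a single-threaded pool where each distinct key is stored once and pairs keep only a 16-bit index into it. `KeyPool` and `CompactPair` are hypothetical names for illustration, not tilemaker's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: intern attribute keys such as "kind" or "admin_level"
// once, and let each pair refer to its key by a small integer index.
class KeyPool {
public:
    // Returns the index for `key`, adding it to the pool on first use.
    uint16_t intern(const std::string& key) {
        auto it = index_.find(key);
        if (it != index_.end()) return it->second;
        assert(keys_.size() <= static_cast<std::size_t>(UINT16_MAX) && "key pool full");
        auto idx = static_cast<uint16_t>(keys_.size());
        keys_.push_back(key);
        index_.emplace(key, idx);
        return idx;
    }

    // Maps an index back to its key, e.g. when writing a tile.
    // The reference is invalidated by later intern() calls that grow keys_.
    const std::string& lookup(uint16_t idx) const { return keys_.at(idx); }

    std::size_t size() const { return keys_.size(); }

private:
    std::vector<std::string> keys_;                    // one copy of each key
    std::unordered_map<std::string, uint16_t> index_;  // key -> index
};

// Each pair carries a 2-byte key index instead of its own std::string.
struct CompactPair {
    uint16_t keyIndex;
    // value fields omitted in this sketch
};
```

With only a few dozen distinct keys, the pool itself is tiny; the saving comes from the tens of millions of pairs no longer each owning a string.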

* lock-free reads for keys, vector for pairs

This is the groundwork for implementing two future improvements:

- hot/cold pairs: there is a bimodal distribution of attribute
  frequency. `landuse=wood`, `tunnel=0` are often duplicated.
  `name=Sneed's Seed & Feed` is not.

  In the future, we'll try to re-use the "hot" pairs to avoid
  paying the cost of an AttributePair for them.

- "short vectors" - similar to the short string optimization,
  we should be able to pack up to 6 pairs (3 hot, 3 cold) in
  the overhead that a vector would otherwise use.

As it stands, this commit increases memory usage. But we'll claw
a lot of it back, and then some.

* Have a "hot" shard for popular pairs

If a pair looks like it might be re-usable, put it in a special
shard and be able to re-use it.

The special shard is limited to max 64K items, teeing up future
work to have a simple vector for AttributeSets with few pairs.
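A rough sketch of a de-duplicating "hot" shard capped at 64K entries, so every hot index fits in 16 bits. The `Pair` layout, `HotShard` name and the fall-back-to-cold behaviour are assumptions for illustration; the real shard is also concurrent, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pair type for the sketch: a key index plus a string value.
using Pair = std::pair<uint16_t, std::string>;

// De-duplicates popular pairs and stops accepting new ones at 64K entries.
class HotShard {
public:
    static constexpr std::size_t kCapacity = 1u << 16;  // 65,536 entries

    // Returns the existing or newly assigned index, or nullopt once full.
    std::optional<uint16_t> add(const Pair& p) {
        auto it = index_.find(p);
        if (it != index_.end()) return it->second;
        if (pairs_.size() >= kCapacity) return std::nullopt;  // caller uses a cold shard
        auto idx = static_cast<uint16_t>(pairs_.size());
        pairs_.push_back(p);
        index_.emplace(p, idx);
        return idx;
    }

    const Pair& get(uint16_t idx) const { return pairs_.at(idx); }
    std::size_t size() const { return pairs_.size(); }

private:
    std::vector<Pair> pairs_;          // canonical copy of each hot pair
    std::map<Pair, uint16_t> index_;   // pair -> index, for de-duplication
};
```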

* treat 0 as a sentinel

* de-dupe all AttributePairs

The stats I was looking at were counting AttributePairs via
AttributeSets, which of course presents a misleading image
of how many duplicate AttributePairs there are, because by
that point, they've already been deduped.

De-duping doesn't add that much runtime overhead--and it could
probably be improved by someone who knows more C++ concurrency
tricks than me.

* store pointers in pairMaps, optimize debug spew

`Tile_Value` is a really memory-expensive object. Since we maintain
long-lived references to the canonical AttributePair, we can store
pointers to save a bit more memory.

Now that value->AttributePairs are guaranteed to be 1:1, we can do our
debug statistics on ints, and translate to pairs only when writing to
stdout.

* use boost::container::flat_map over std::map

Doesn't appreciably affect runtime, saves a bit of memory.

* don't memoize hash function

Now that there is a 1:1 mapping between values and AttributePairs,
it's trivial to compute the hash on demand.
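A small sketch of hashing on demand rather than storing a hash member. The two-field `AttributePair` here is a hypothetical simplification, and the mixing step is modelled on `boost::hash_combine` rather than taken from tilemaker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical pair layout with no cached hash member.
struct AttributePair {
    uint16_t keyIndex;
    std::string value;

    bool operator==(const AttributePair& o) const {
        return keyIndex == o.keyIndex && value == o.value;
    }
};

// Computes the hash from the fields each time it is needed.
struct AttributePairHash {
    std::size_t operator()(const AttributePair& p) const {
        std::size_t h = std::hash<std::string>{}(p.value);
        // boost::hash_combine-style mixing of the key index into the value hash
        h ^= std::hash<uint16_t>{}(p.keyIndex) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using PairSet = std::unordered_set<AttributePair, AttributePairHash>;
```

The trade-off is a little CPU per lookup in exchange for one fewer `size_t` per pair.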

* output_object: avoid Tile_Value temporaries

Also const-ify a few things

* defer creating Tile_Value

Tile_Value is a big union that takes up 96 bytes,
but for our purposes, we're happy with a union of
string, float and bool -- which can be expressed
in 28 bytes. We need a discriminator variable, but
due to alignment, that's free.

I also considered `boost::variant<bool, float, string>`,
but it seemed to take 40 bytes.

I worried that not having a pool of Tile_Values would
affect PBF writing time, but it seems unaffected.
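A sketch of a compact tagged value that defers building a `Tile_Value` until write time. Unlike the commit, which stores the string itself in the union, this sketch assumes strings live in a long-lived pool and holds a pointer to them; `CompactValue` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical compact value: bool, float or (pooled) string, plus a 1-byte tag.
class CompactValue {
public:
    enum class Type : uint8_t { Bool, Float, String };

    static CompactValue fromBool(bool b) { CompactValue v(Type::Bool); v.u_.b = b; return v; }
    static CompactValue fromFloat(float f) { CompactValue v(Type::Float); v.u_.f = f; return v; }
    // Assumption: `s` outlives this value (e.g. it lives in a string pool).
    static CompactValue fromString(const std::string* s) { CompactValue v(Type::String); v.u_.s = s; return v; }

    Type type() const { return type_; }
    bool asBool() const { assert(type_ == Type::Bool); return u_.b; }
    float asFloat() const { assert(type_ == Type::Float); return u_.f; }
    const std::string& asString() const { assert(type_ == Type::String); return *u_.s; }

private:
    explicit CompactValue(Type t) : u_{}, type_(t) {}

    union Storage {
        bool b;
        float f;
        const std::string* s;
    };
    Storage u_;
    Type type_;  // discriminator; fits in the union's trailing padding slot
};
```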

* adjust headers, remove unneeded rng

* any integer 0 <= n <= 25 is eligible for the hot pool

This is useful for ranks, which run from 1..25
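A minimal sketch of the numeric rule above, assuming numeric attribute values are held as `float`. `numericValueIsHot` is a hypothetical name, and any other eligibility rules (for strings or booleans) are not shown.

```cpp
#include <cassert>
#include <cmath>

// True for whole numbers in [0, 25], e.g. ranks. NaN fails every comparison,
// so it is never treated as hot.
bool numericValueIsHot(float v) {
    return v >= 0.0f && v <= 25.0f && std::floor(v) == v;
}
```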

* Use a small vector optimization for pair indexes

`vector<uint32_t>` takes 24 bytes just to store its internal pointers.
If you actually want to store a `uint32_t` in it, it'll then allocate
some memory on the heap, taking a further 32-64 bytes depending on STL
and malloc implementations.

56-88 bytes! For a single `uint32_t`! Outrageous.

Instead, store references to pair indexes in an array of shorts. If
the pairs don't fit in the array, upgrade it to a vector.

Since we previously arranged for very popular pairs like `amenity=toilets`
to have small indexes, our array of shorts is capable of storing between
4 and 8 pairs before we need to upgrade to a vector. Most AttributeSets
will not need to use a vector.
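A simplified sketch of the inline-array-then-vector idea. It assumes four 16-bit inline slots with 0 as the empty-slot sentinel (so real indexes start at 1), and a heap vector that is only allocated on overflow; the commit's packing of 4 to 8 pairs is not reproduced. `PairIndexList` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Up to four indexes that fit in 16 bits live inline; anything else spills
// into a vector that is allocated only when needed.
class PairIndexList {
public:
    void add(uint32_t index) {
        assert(index != 0 && "0 is reserved as the empty-slot sentinel");
        if (!heap_ && index <= UINT16_MAX) {
            for (auto& slot : inline_) {
                if (slot == 0) { slot = static_cast<uint16_t>(index); return; }
            }
        }
        upgrade();
        heap_->push_back(index);
    }

    std::vector<uint32_t> indexes() const {
        if (heap_) return *heap_;
        std::vector<uint32_t> out;
        for (uint16_t slot : inline_)
            if (slot != 0) out.push_back(slot);
        return out;
    }

    bool onHeap() const { return heap_ != nullptr; }

private:
    // Moves the inline indexes into the vector the first time we overflow.
    void upgrade() {
        if (heap_) return;
        heap_ = std::make_unique<std::vector<uint32_t>>();
        for (uint16_t slot : inline_)
            if (slot != 0) heap_->push_back(slot);
        inline_.fill(0);
    }

    std::array<uint16_t, 4> inline_{};             // 8 bytes of inline storage
    std::unique_ptr<std::vector<uint32_t>> heap_;  // null until we overflow
};
```

Holding the vector behind a pointer keeps the fallback cost at one pointer for the common case, instead of the 24-byte `vector` header discussed above.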

* simplify AttributeKeyStore

* use camelCase

* re-write to avoid static lifetime

AttributeKeyStore/AttributePairStore have the same lifetime as
AttributeStore, so just make them owned by it.

This results in slightly more convoluted code, but avoids having
them floating around as globals.

* reduce lock contention

* Improve TileCoordinates hash function

x ^ y will only use as many bits as max(x, y), but tiles
only use the full 32-bit space at z16, so we're leaving
a lot of the hash space on the table.

* d'oh, avoid looking up the key name needlessly

* change AttributeXyz(...) to be last-written wins

Previously, if you set the same key to different values, it was
not guaranteed that the last value written would win.
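A minimal sketch of last-written-wins semantics, assuming attributes are collected into a flat list of (key index, value) entries before being stored. `AttributeSetBuilder` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Setting a key that is already present overwrites its value in place.
class AttributeSetBuilder {
public:
    void set(uint16_t key, std::string value) {
        for (auto& entry : entries_) {
            if (entry.first == key) {
                entry.second = std::move(value);  // last write wins
                return;
            }
        }
        entries_.emplace_back(key, std::move(value));
    }

    const std::string* get(uint16_t key) const {
        for (const auto& entry : entries_)
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::pair<uint16_t, std::string>> entries_;
};
```

A linear scan is fine here because a typical object only sets a handful of attributes.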

* remove misleading comment

* include deque

* include map

* return vector, not set

set seems a bit like overkill - we already know the items are unique,
and the consumer is likely just going to iterate over them

* avoid GNU-specific initializer

also avoid hardcoding 12

* Revert "Improve TileCoordinates hash function"

This reverts commit 7570737715.

Oops, I think this change isn't meaningful, and is a result of me
misreading the original code.

It might still be an improvement to do something like
`hash(x << 16) ^ hash(y)`, since the default TileCoordinate is only 16
bits, but that can be considered independent of this PR.
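A sketch of the `hash(x << 16) ^ hash(y)` idea suggested above, assuming 16-bit tile coordinates. The struct and hasher names are hypothetical; whether this actually improves the distribution depends on the standard library's integer hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Assumption: the default 16-bit tile coordinate type.
using TileCoordinate = uint16_t;

struct TileCoordinates {
    TileCoordinate x, y;
    bool operator==(const TileCoordinates& o) const { return x == o.x && y == o.y; }
};

// Shifts x into the upper 16 bits before combining, so that with an
// identity-style integer hash x and y no longer share the same low bits.
struct TileCoordinatesHash {
    std::size_t operator()(const TileCoordinates& t) const {
        std::hash<uint32_t> h;
        return h(static_cast<uint32_t>(t.x) << 16) ^ h(t.y);
    }
};
```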

* remove dead code

* avoid copying AttributePairs

They're long-lived, so pass pointers

* OutputObjects - greatly reduce need for locks

I'm slowly remembering how to write concurrent code...

* AttributeKeyStore: use a TLS cache

This should reduce futex contention significantly. I'll
apply the same change for AttributePairStore's shard 0, then
measure.
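A sketch of a thread-local cache in front of a mutex-protected key store: repeated lookups from the same thread never take the lock. The class and method names are hypothetical. Each store gets a never-reused id so that a thread's cache cannot leak entries between stores; entries for destroyed stores linger until the thread exits.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

class KeyStore {
public:
    uint16_t key2index(const std::string& key) {
        // One cache per thread per store, keyed by the store's unique id.
        thread_local std::unordered_map<uint64_t, std::unordered_map<std::string, uint16_t>> caches;
        auto& cache = caches[id_];
        auto hit = cache.find(key);
        if (hit != cache.end()) return hit->second;  // fast path: no lock

        uint16_t idx;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = shared_.find(key);
            if (it == shared_.end()) {
                assert(keys_.size() <= static_cast<std::size_t>(UINT16_MAX));
                idx = static_cast<uint16_t>(keys_.size());
                keys_.push_back(key);
                shared_.emplace(key, idx);
            } else {
                idx = it->second;
            }
        }
        cache.emplace(key, idx);
        return idx;
    }

    std::string index2key(uint16_t idx) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return keys_.at(idx);
    }

private:
    static inline std::atomic<uint64_t> nextId_{0};
    const uint64_t id_ = nextId_++;
    mutable std::mutex mutex_;
    std::vector<std::string> keys_;
    std::unordered_map<std::string, uint16_t> shared_;
};
```

Since there are only a few dozen distinct keys, each thread's cache stays small and almost every lookup after warm-up hits it.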

* AttributePairStore: reduce lock contention

* ensure atomics are initialized

Per https://stackoverflow.com/questions/36320008/whats-the-default-value-for-a-stdatomic,
they aren't initialized by default. Somewhat surprised this didn't
result in crashes.
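A minimal illustration of the fix: give atomic members an explicit initial value. The struct and member names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct ShardStats {
    // Before C++20, a default-constructed std::atomic<T> member is left with an
    // indeterminate value (unless the object has static storage duration), so
    // initialize explicitly.
    std::atomic<uint32_t> lookups{0};
    std::atomic<uint32_t> inserts{0};
};
```

From C++20 on, the default constructor value-initializes the contained value, but the explicit initializer keeps older standards safe too.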

* don't store duplicate way geometries

A common pattern is:

```lua
way:Layer("waterway", false)
...
way:Layer("waterway_names", false)
```

Previously, we'd process the geometry twice, and store a second copy of
it in memory. Instead, re-use the previously stored geometry.

This saves another ~1GB of memory for the GB extract. It doesn't
seem to affect runtime - I think we only re-use linestrings, and
linestrings are relatively cheap to do `is_valid` on.

It seems like, with the rest of the work on this branch, the
`OutputObjectXyz` classes are very thin; inspecting `geomType` in
order to construct the right subclass was a bit tedious, so I removed them.
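A sketch of the re-use pattern, assuming each way keeps the id of its stored geometry after the first `Layer()` call. `GeometryStore`, `WayProcessor` and `OutputObject` here are hypothetical stand-ins, not tilemaker's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Linestring = std::vector<std::pair<double, double>>;

struct GeometryStore {
    std::vector<Linestring> linestrings;
    uint32_t add(Linestring ls) {
        linestrings.push_back(std::move(ls));
        return static_cast<uint32_t>(linestrings.size() - 1);
    }
};

struct OutputObject {
    std::string layer;
    uint32_t geometryId;  // several outputs may share one stored geometry
};

class WayProcessor {
public:
    WayProcessor(GeometryStore& store, Linestring geom)
        : store_(store), geom_(std::move(geom)) {}

    // Stands in for way:Layer(name, false) in the Lua profile above.
    void layer(const std::string& name) {
        if (!storedId_) storedId_ = store_.add(geom_);  // build and store once
        outputs_.push_back({name, *storedId_});
    }

    const std::vector<OutputObject>& outputs() const { return outputs_; }

private:
    GeometryStore& store_;
    Linestring geom_;
    std::optional<uint32_t> storedId_;  // set after the first layer() call
    std::vector<OutputObject> outputs_;
};
```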
2023-11-19 17:59:05 +00:00
systemed d9bb72b929 Merge branch 'master' into refactor_geometries 2023-11-16 21:16:44 +00:00
Colin Dellow 8b45a8a33e Avoid copying strings/tag maps (#577) 2023-11-12 19:03:02 +00:00
systemed 4db8c25417 Merge branch 'master' into refactor_geometries 2023-11-04 12:58:56 +00:00
Richard Fairhurst dbdb4da097 RestartRelations() to reset relation subscript (#548) 2023-10-09 11:30:03 +01:00
systemed 0ea137fedb Per-tile feature limit as per #547 2023-10-03 17:52:02 +01:00
systemed 2249202751 Keep output list as OutputObjects for longer 2023-09-19 22:17:21 +01:00
systemed 22a3e64c68 Store objects sequentially (breaks include_ids) 2023-09-19 18:43:16 +01:00
systemed adcdb6e892 Merge branch 'master' into refactor_geometries 2023-09-18 18:24:02 +01:00
Richard Fairhurst 439c6b1027 Report OSM ID on Lua processing error (#535) 2023-09-18 14:22:36 +01:00
Richard Fairhurst 528aa25dec Support type=boundary relations as native multipolygons (#508) 2023-07-23 08:57:39 +01:00
systemed 23d8e6da59 Store attribute sets by index 2023-06-18 14:45:04 +01:00
systemed 856951daae Reinstate output/attribute pair (memory leak) 2023-06-13 12:25:27 +01:00
systemed ed14fc427d Break pair into separate output/attribute lists 2023-06-12 10:09:32 +01:00
systemed 0944743852 Rename classes away from OsmStore 2023-06-10 13:56:15 +01:00
Richard Fairhurst 4a85c60e6d Compile-time option for float ZOrder (#486) 2023-03-31 15:10:30 +01:00
Fabrizio 57942b9812 Fixed application crashes (#441) 2022-10-29 14:18:40 +01:00
Richard Fairhurst ad1fcfbdf3 Support osmium locations-on-ways format (#386) 2022-02-26 16:58:44 +00:00
Richard Fairhurst 70f89d64ba Output slow geometry generation and allow user interrupt (#378) 2022-02-17 10:45:51 +00:00
Richard Fairhurst 965f6ae349 (Non-multipolygon) relation support (#360) 2022-01-08 10:12:08 +00:00
Richard Fairhurst a8ee8a96cc Revert "Use artificial IDs for OutputObject after 1st use (#357)" (#359)
This reverts commit 962ad86bfa.
2022-01-05 00:22:16 +00:00
Richard Fairhurst 962ad86bfa Use artificial IDs for OutputObject after 1st use (#357) 2022-01-03 13:05:37 +00:00
Wouter van Kleunen 3f8b4e2f60 Handle nan condition in lua (#329) 2021-09-27 17:35:17 +01:00
Richard Fairhurst b45219ff91 Use real relation IDs rather than --newWayID (#318) 2021-09-13 22:42:44 +01:00
Michael Reichert ca0fd702c3 Add z_order sorting support (#283)
* Add support for a z_order field and sort output objects by z_order

PostGIS import tools like Osm2pgsql and Imposm can write a z_order field
to the database table in order to allow map styles to render features
ordered by road class and vertical layer. Tilemaker now gets a Lua
callback to set the z_order for an OSM object (default 0) and will sort
the vectortile features by z_order. z_order is also taken into account
for combining of features. z_order values are limited to a 1-byte unsigned
integer.
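A sketch of ordering features by a 1-byte z_order, assuming ascending order (lower values written first) and a stable sort so that features with equal z_order keep their original order. `Feature` and `sortByZOrder` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// z_order is a 1-byte unsigned value, defaulting to 0.
struct Feature {
    std::string name;
    uint8_t zOrder = 0;
};

void sortByZOrder(std::vector<Feature>& features) {
    std::stable_sort(features.begin(), features.end(),
                     [](const Feature& a, const Feature& b) { return a.zOrder < b.zOrder; });
}
```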

* Document new ZOrder callback function

* Drop unnecessary variable

* Implement z_order sorting for the OpenMapTiles example config

This commit also adapts the minimum zoom levels to the latest version.
2021-08-23 13:37:24 +01:00
Richard Fairhurst aa1398ff51 Tidy logging 2021-07-02 14:45:50 +01:00
Wouter van Kleunen 3b11dff4bb Perform dissolve on self-intersecting polygons (#249) 2021-06-22 11:37:36 +01:00
Wouter van Kleunen 631864ab5a Parallel loading of osm.pbf file (#243)
* Parallel reading

* Fix win build

* Simplifier correctly init nodes

* Use generator to open input file(s)

* Bit more consistent naming of osm_store files

* Make number of locks in attribute store a function of the number of threads

* Small optimization, don't use virtual method calls

* Restore nodeListPolygon correct

* Don't drop self intersecting polygons
2021-06-06 13:02:41 +01:00
Wouter van Kleunen 0f2c6057b7 Non self-intersecting simplify (#239) 2021-05-21 20:56:29 +01:00
Richard Fairhurst ead23531c4 Add :Centroid method to get lat/lon from Lua (#230) 2021-04-26 18:47:10 +01:00
Richard Fairhurst a73723702b Refactor shapefile spatial queries (#228) 2021-04-26 14:48:19 +01:00
systemed 6b111cc217 Shapefile attribute enhancements 2021-04-07 20:42:19 +01:00
Richard Fairhurst 4399349e78 Optionally set minzoom to write attributes (#219) 2021-04-05 20:06:20 +01:00
Wouter van Kleunen ad015fe879 Tilemaker CI for windows, macos and linux with boost 2021-03-20 07:56:28 +00:00
Wouter van Kleunen d9d4989d55 Allow generation of pbf index file 2021-03-12 14:07:13 +00:00
systemed 005f3d7913 Make nodeVec available to Lua way_function 2021-03-06 17:02:09 +00:00
Wouter van Kleunen 0fb2c00589 Allow tilemaker to run in compact (32bit) nodeid mode 2021-03-04 08:46:01 +00:00
Wouter van Kleunen 43dc1a2259 Attribute store sets of attributes 2021-02-26 14:45:16 +00:00
Wouter van Kleunen 664cd22571 Allow initialization of store to prevent rehashing and small optimizations 2021-02-23 16:49:17 +00:00
Wouter van Kleunen 6e7ab9f77d Store generated geometries in mmap 2021-02-20 09:09:19 +00:00
Wouter van Kleunen a006a319d0 Small cleanup and configurable storage filename 2021-02-14 16:16:01 +00:00
systemed 04b93a92ca Use shared key/value dict across OutputObjects 2020-06-28 17:06:39 +01:00
systemed 080f94d471 Add :MinZoom(z) for per-feature zoom control 2020-05-25 11:53:29 +01:00
systemed eaf21aa673 Remove scale functions now we return metres anyway 2020-05-23 19:32:22 +01:00
systemed 6e713f542b Consistently use 1TBS
[whitespace only, no code changes]
2020-05-23 12:19:56 +01:00
systemed f02c8cf1a1 Better diagnostics for invalid multipolygons 2020-01-29 11:03:31 +00:00
systemed 06ccc8d043 Rewrite shapefile tags from Lua 2019-03-06 12:11:49 +00:00
Tim Sheerman-Chase 5f4307101a Rename OSMObject to OsmLuaProcessing, start on doxygen documentation 2018-06-11 12:42:00 +01:00