* make AttributeStore::get const
I think AttributeStore lives forever, and AttributeSets are immutable
once added to it, so we can avoid the copy.
* use a string pool for AttributeSet keys
There are relatively few unique key values for attributes, e.g.
`kind`, `name`, `admin_level`.
The Shortbread schema has only ~50 or so. I imagine OMT is similar,
but haven't checked.
We generate lots of AttributePairs -- on the order of tens of millions
for GB, and std::string has an overhead of 32 bytes. By using a string
pool and storing only an offset into it, we can save a few hundred MB
of RAM.
* lock-free reads for keys, vector for pairs
This is the groundwork for implementing two future improvements:
- hot/cold pairs: there is a bimodal distribution of attribute
frequency. `landuse=wood`, `tunnel=0` are often duplicated.
`name=Sneed's Seed & Feed` is not.
In the future, we'll try to re-use the "hot" pairs to avoid
paying the cost of an AttributePair for them.
- "short vectors" - similar to the short string optimization,
we should be able to pack up to 6 pairs (3 hot, 3 cold) in
the overhead that a vector would otherwise use.
As it stands, this commit increases memory usage. But we'll claw
a lot of it back, and then some.
* Have a "hot" shard for popular pairs
If a pair looks like it might be re-usable, put it in a special
shard and be able to re-use it.
The special shard is limited to max 64K items, teeing up future
work to have a simple vector for AttributeSets with few pairs.
* treat 0 as a sentinel
* de-dupe all AttributePairs
The stats I was looking at were counting AttributePairs via
AttributeSets, which of course presents a misleading image
of how many duplicate AttributePairs there are, because by
that point, they've already been deduped.
De-duping doesn't add that much runtime overhead--and it could
probably be improved by someone who knows more C++ concurrency
tricks than me.
* store pointers in pairMaps, optimize debug spew
`Tile_Value` is a really memory-expensive object. Since we maintain
long-lived references to the canonical AttributePair, we can store
pointers to save a bit more memory.
Now that value->AttributePairs are guaranteed to be 1:1, we can do our
debug statistics on ints, and translate to pairs only when writing to
stdout.
* use boost::container::flat_map over std::map
Doesn't appreciably affect runtime, saves a bit of memory.
* don't memoize hash function
Now that there is a 1:1 mapping between values and AttributePairs,
it's trivial to compute the hash on demand.
* output_object: avoid Tile_Value temporaries
Also const-ify a few things
* defer creating Tile_Value
Tile_Value is a big union that takes up 96 bytes,
but for our purposes, we're happy with a union of
string, float and bool -- which can be expressed
in 28 bytes. We need a discriminator variable, but
due to alignment, that's free.
I also consider `boost::variant<bool, float, string>`,
but it seemed to take 40 bytes.
I worried that not having a pool of Tile_Values would
affect PBF writing time, but it seems unaffected.
* adjust headers, remove unneeded rng
* any integer 0 <= 25 is eligible for hot pool
This is useful for ranks, which run from 1..25
* Use a small vector optimization for pair indexes
`vector<uint32_t>` takes 24 bytes just to store its internal pointers.
If you actually want to store a `uint32_t` in it, it'll then allocate
some memory on the heap, taking a further 32-64 bytes depending on STL
and malloc implementations.
56-88 bytes! For a single `uint32_t`! Outrageous.
Instead, store references to pair indexes in an array of shorts. If
the pairs don't fit in the array, upgrade it to a vector.
Since we previously arranged for very popular pairs like `amenity=toilets`
to have small indexes, our array of shorts is capable of storing between
4 and 8 pairs before we need to upgade to a vector. Most AttributeSets
will not need to use a vector.
* simplify AttributeKeyStore
* use camelCase
* re-write to avoid static lifetime
AttributeKeyStore/AttributePairStore have the same lifetime as
AttributeStore, so just make them owned by it.
This results in slightly more convoluted code, but avoids having
them floating around as globals.
* reduce lock contention
* Improve TileCoordinates hash function
x ^ y will only use as many bits as max(x, y), but tiles
only use the full 32-bit space at z16, so we're leaving
a lot of the hash space on the table.
* d'oh, avoid looking up the key name needlessly
* change AttributeXyz(...) to be last-written wins
Previously, if you set the same key to different values, it was
not guaranteed that the last value written would win.
* remove misleading comment
* include deque
* include map
* return vector, not set
set seems a bit like overkill - we already know the items are unique,
and the consumer is likely just going to iterate over them
* avoid GNU-specific initializer
also avoid hardcoding 12
* Revert "Improve TileCoordinates hash function"
This reverts commit 7570737715.
Oops, I think this change isn't meaningful, and is a result of me
misreading the original code.
It might still be an improvement to do something like
`hash(x << 16) ^ hash(y)`, since the default TileCoordinate is only 16
bits, but that can be considered independent of this PR.
* remove dead code
* avoid copying AttributePairs
They're long-lived, so pass pointers
* OutputObjects - greatly reduce need for locks
I'm slowly remembering how to write concurrent code...
* AttributeKeyStore: use a TLS cache
This should reduce futex contention significantly. I'll
apply the same change for AttributePairStore's shard 0, then
measure.
* AttributePairStore: reduce lock contention
* ensure atomics are initialized
Per https://stackoverflow.com/questions/36320008/whats-the-default-value-for-a-stdatomic,
they aren't initialized by default. Somewhat surprised this didn't
result in crashes.
* don't store duplicate way geometries
A common pattern is:
```lua
way:Layer("waterway", false)
...
way:Layer("waterway_names", false)
```
Previously, we'd process the geometry twice, and store a second copy of
it in memory. Instead, re-use the previously stored geometry.
This saves another ~1GB of memory for the GB extract. It doesn't
seem to affect runtime - I think we only re-use linestrings, and
linestrings are relatively cheap to do `is_valid` on.
It seems like with the rest of the work on this branch, the
`OutputObjectXyz` classes are very thin -- inspecting `geomType` in
order to construct the right was a bit tedious, so I removed them.
* Add support for a z_order field and sort output objects by z_order
PostGIS import tools like Osm2pgsql and Imposm can write a z_order field
to the database table in order to allow map styles to render features
ordered by road class and vertical layer. Tilemaker now gets a Lua
callback to set the z_order for an OSM object (default 0) and will sort
the vectortile features by z_order. z_order is also taken into account
for combining of features. z_order values are limited to 1-byte unsigned
integer.
* Document new ZOrder callback function
* Drop unnecessary variable
* Implement z_order sorting for the OpenMapTiles example config
This commit also adapts the minimum zoom levels to the lastest version.
* Parallel reading
* Fix win build
* Simplifier correctly init nodes
* Use generator to open input file(s)
* Bit more consistent naming of osm_store files
* Make number of locks in attribute store function of number of threads
* Small optimization, don't use virtual method calls
* Restore nodeListPolygon correct
* Don't drop self intersecting polygons