Colin Dellow 3c1740ad4d generalize node_keys; add way_keys
This PR generalizes the idea of `node_keys`, adds `way_keys`, and fixes #402.

I'm not too sure if this is generally useful - it's useful for one of my
use cases, and I see someone asking about it in https://github.com/systemed/tilemaker/issues/190
and, elsewhere, in https://github.com/onthegomap/planetiler/issues/99

If you feel it complicates the maintainer story too much, please reject.

The goal is to reduce memory usage for users doing thematic extracts by
not indexing nodes that are only used by uninteresting ways.

For example, North America has ~1.8B nodes, needing 9.7GB of RAM for its node
store. By contrast, if your interest is only to build a railway map, you
require only ~8M nodes, needing 70MB of RAM. Or, to build a map of
national/provincial parks, 12M nodes and ~120MB of RAM.

Currently, a user can achieve this by pre-filtering their PBF using
osmium-tool. If you know exactly what you want, this is a good
long-term solution. But if you're me, flailing about in the OSM data
model, it's convenient to be able to tweak something in the Lua script
and observe the results without having to re-filter the PBF and update
your tilemaker command to use the new PBF.

Sample use cases:

```lua
-- Building a map without building polygons. A leading ~ negates the
-- filter: it excludes ways whose only tags are matched by it.
way_keys = {"~building"}
```

```lua
-- Building a railway map
way_keys = {"railway"}
```

```lua
-- Building a map of major roads
way_keys = {"highway=motorway", "highway=trunk", "highway=primary", "highway=secondary"}
```

Nodes used in ways which are used in relations (as identified by
`relation_scan_function`) will always be indexed, regardless of
`node_keys` and `way_keys` settings that might exclude them.
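
The filter forms in the sample use cases above (`key`, `key=value`, and a leading `~` for negation) can be sketched in C++ roughly as follows. This is a minimal illustration, not the `SignificantTags` implementation in this PR: `TagFilter`, `parseFilter`, `matchesTag` and `wayIsKept` are hypothetical names, and the sketch assumes that a filter list is either entirely positive or entirely negated, that an empty list disables filtering, and that an untagged way is never kept.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of one entry such as "railway",
// "highway=motorway" or "~building". Not tilemaker's actual type.
struct TagFilter {
    bool negated;       // true when the entry started with '~'
    std::string key;    // tag key to match
    std::string value;  // empty means "any value"
};

// Parse a single filter string into its parts.
TagFilter parseFilter(std::string spec) {
    assert(!spec.empty());
    TagFilter f{false, "", ""};
    if (spec[0] == '~') {
        f.negated = true;
        spec.erase(0, 1);
    }
    const auto eq = spec.find('=');
    f.key = spec.substr(0, eq);
    if (eq != std::string::npos) f.value = spec.substr(eq + 1);
    return f;
}

// True when a single tag is matched by the filter (ignoring negation).
bool matchesTag(const TagFilter& f, const std::string& key, const std::string& value) {
    return f.key == key && (f.value.empty() || f.value == value);
}

using Tags = std::map<std::string, std::string>;

// Assumption: all filters are positive, or all are negated.
// Positive: keep the way if any tag matches any filter.
// Negated: keep the way if it has at least one tag that no filter
// matches, i.e. drop ways whose only tags are matched by the filters.
bool wayIsKept(const std::vector<TagFilter>& filters, const Tags& tags) {
    if (filters.empty()) return true;  // assumption: no filtering configured
    const bool negated = filters.front().negated;
    for (const auto& tag : tags) {
        bool matched = false;
        for (const auto& f : filters) {
            if (matchesTag(f, tag.first, tag.second)) { matched = true; break; }
        }
        if (negated ? !matched : matched) return true;
    }
    return false;
}
```

Under these assumptions, `{"~building"}` drops a way tagged only `building=yes` but keeps one that also carries `highway=service`.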

A concrete example, given a Lua script like:

```lua
function way_function()
  if Find("railway") ~= "" then
    Layer("lines", false)
  end
end
```

it takes 13GB of RAM and 100 seconds to process North America.

If you add:

```lua
way_keys = {"railway"}
```

It takes 2GB of RAM and 47 seconds.

Notes:

1. This is based on `lua-interop-3`, as it touches files that branch also
   changes. I can rebase against master after `lua-interop-3` is merged.

2. The names `node_keys` and `way_keys` are perhaps out of date, as they
   can now express conditions on the values of tags in addition to their
   keys. Leaving them as-is is nice, as it's not a breaking change.
   But if breaking changes are OK, maybe these should be
   `node_filters` and `way_filters`?

3. Maybe the value for `node_keys` in the OMT profile should be
   expressed in terms of a negation, e.g. `node_keys = {"~created_by"}`?
   This would avoid issues like https://github.com/systemed/tilemaker/issues/337

4. This also adds a SIGUSR1 handler during OSM processing, which prints
   the ID of the object currently being processed. This is helpful for
   tracking down slow geometries.
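
The SIGUSR1 handler from note 4 could look something like the sketch below. It is a hedged, POSIX-only illustration rather than the code in this commit: `currentObjectId`, `signalsSeen`, `formatId`, `onSigusr1` and `installHandler` are hypothetical names, a single global slot stands in for what a multi-threaded reader would need per thread, and the atomic is assumed to be lock-free on the target platform.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Hypothetical: the worker stores the ID of the object it is processing here.
std::atomic<std::uint64_t> currentObjectId{0};
// Counts how many times the handler has run.
volatile std::sig_atomic_t signalsSeen = 0;

// Format an unsigned integer as decimal digits without calling into stdio,
// which is not async-signal-safe. Returns the number of characters written.
std::size_t formatId(std::uint64_t id, char* buf, std::size_t size) {
    assert(size >= 20);  // a 64-bit value has at most 20 decimal digits
    char tmp[20];
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + id % 10);
        id /= 10;
    } while (id != 0);
    // Digits were produced least-significant first; reverse them into buf.
    for (std::size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];
    return n;
}

extern "C" void onSigusr1(int) {
    char line[21];
    std::size_t n = formatId(currentObjectId.load(), line, 20);
    line[n++] = '\n';
    // write(2) is async-signal-safe; std::cout and printf are not.
    ssize_t ignored = write(STDERR_FILENO, line, n);
    (void)ignored;
    signalsSeen = signalsSeen + 1;
}

// Register the handler; call once before processing starts.
void installHandler() {
    std::signal(SIGUSR1, onSigusr1);
}
```

With a handler like this installed, `kill -USR1 <pid>` from another shell prints the most recently stored ID to stderr.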
2023-12-29 18:02:11 -05:00

/*! \file */
#ifndef _READ_PBF_H
#define _READ_PBF_H
#include <algorithm>
#include <cstdint>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>
#include "osm_store.h"
#include "significant_tags.h"
#include "pbf_reader.h"
#include "tag_map.h"
#include <protozero/data_view.hpp>
class OsmLuaProcessing;
extern const std::string OptionSortTypeThenID;
extern const std::string OptionLocationsOnWays;

struct BlockMetadata {
    long int offset;
    int32_t length;
    bool hasNodes;
    bool hasWays;
    bool hasRelations;

    // We use blocks as the unit of parallelism. Sometimes, a PBF only
    // has a few blocks with relations. In this case, to keep all cores
    // busy, we'll subdivide the block into chunks, and each thread
    // will only process a chunk of the block.
    size_t chunk;
    size_t chunks;
};

struct IndexedBlockMetadata: BlockMetadata {
    size_t index;
};

/**
 * \brief Reads a PBF OSM file and returns objects as a stream of events to a class derived from OsmLuaProcessing
 *
 * The output class is typically OsmMemTiles, which is derived from OsmLuaProcessing
 */
class PbfProcessor
{
public:
    enum class ReadPhase { Nodes = 1, Ways = 2, Relations = 4, RelationScan = 8, WayScan = 16 };

    PbfProcessor(OSMStore &osmStore);

    using pbfreader_generate_output = std::function< std::shared_ptr<OsmLuaProcessing> () >;
    using pbfreader_generate_stream = std::function< std::shared_ptr<std::istream> () >;

    int ReadPbfFile(
        uint shards,
        bool hasSortTypeThenID,
        const SignificantTags& nodeKeys,
        const SignificantTags& wayKeys,
        unsigned int threadNum,
        const pbfreader_generate_stream& generate_stream,
        const pbfreader_generate_output& generate_output,
        const NodeStore& nodeStore,
        const WayStore& wayStore
    );

    // Read tags into a map from a way/node/relation
    template<typename T>
    void readTags(T &pbfObject, PbfReader::PrimitiveBlock const &pb, TagMap& tags) {
        for (uint n=0; n < pbfObject.keys.size(); n++) {
            auto keyIndex = pbfObject.keys[n];
            auto valueIndex = pbfObject.vals[n];
            tags.addTag(pb.stringTable[keyIndex], pb.stringTable[valueIndex]);
        }
    }

private:
    bool ReadBlock(
        std::istream &infile,
        OsmLuaProcessing &output,
        const BlockMetadata& blockMetadata,
        const SignificantTags& nodeKeys,
        const SignificantTags& wayKeys,
        bool locationsOnWays,
        ReadPhase phase,
        uint shard,
        uint effectiveShard
    );
    bool ReadNodes(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& nodeKeys);

    bool ReadWays(
        OsmLuaProcessing& output,
        PbfReader::PrimitiveGroup& pg,
        const PbfReader::PrimitiveBlock& pb,
        const SignificantTags& wayKeys,
        bool locationsOnWays,
        uint shard,
        uint effectiveShards
    );
    bool ScanWays(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& wayKeys);
    bool ScanRelations(OsmLuaProcessing& output, PbfReader::PrimitiveGroup& pg, const PbfReader::PrimitiveBlock& pb, const SignificantTags& wayKeys);
    bool ReadRelations(
        OsmLuaProcessing& output,
        PbfReader::PrimitiveGroup& pg,
        const PbfReader::PrimitiveBlock& pb,
        const BlockMetadata& blockMetadata,
        const SignificantTags& wayKeys,
        uint shard,
        uint effectiveShards
    );

    inline bool relationIsType(const PbfReader::Relation& rel, int typeKey, int val) {
        if (typeKey == -1 || val == -1) return false;
        auto typeI = std::find(rel.keys.begin(), rel.keys.end(), typeKey);
        if (typeI == rel.keys.end()) return false;
        int typePos = typeI - rel.keys.begin();
        return rel.vals[typePos] == val;
    }

    /// Find a string in the dictionary
    static int findStringPosition(const PbfReader::PrimitiveBlock& pb, const std::string& str);

    OSMStore &osmStore;
    std::mutex ioMutex;
};

int ReadPbfBoundingBox(const std::string &inputFile, double &minLon, double &maxLon,
    double &minLat, double &maxLat, bool &hasClippingBox);
bool PbfHasOptionalFeature(const std::string& inputFile, const std::string& feature);
#endif //_READ_PBF_H
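
`relationIsType` compares integer string-table indices rather than strings. Presumably the caller resolves a key and a value to their positions in the block's string table once (for instance via `findStringPosition`, with `-1` meaning "not present"), after which each relation needs only integer comparisons. A minimal sketch of that idea, using hypothetical stand-ins (`MiniBlock`, `MiniRelation`, `findString`, `isType`) for the `PbfReader` types, whose real definitions are not shown in this header:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for PbfReader::PrimitiveBlock and PbfReader::Relation.
struct MiniBlock {
    std::vector<std::string> stringTable;
};
struct MiniRelation {
    std::vector<int> keys;  // indices into stringTable
    std::vector<int> vals;  // indices into stringTable, parallel to keys
};

// Return the index of str in the block's string table, or -1 if absent.
int findString(const MiniBlock& pb, const std::string& str) {
    auto it = std::find(pb.stringTable.begin(), pb.stringTable.end(), str);
    if (it == pb.stringTable.end()) return -1;
    return static_cast<int>(it - pb.stringTable.begin());
}

// Same logic as relationIsType, applied to the stand-in types.
bool isType(const MiniRelation& rel, int typeKey, int val) {
    assert(rel.keys.size() == rel.vals.size());
    if (typeKey == -1 || val == -1) return false;  // string not in this block
    auto typeI = std::find(rel.keys.begin(), rel.keys.end(), typeKey);
    if (typeI == rel.keys.end()) return false;     // relation has no such key
    return rel.vals[typeI - rel.keys.begin()] == val;
}
```

If the string table of a block does not contain the value at all, the lookup returns `-1` and every relation in that block is rejected without any string comparison.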