# Data Model: Privacy-First Maps Application
**Version:** 1.0.0
**Date:** 2026-03-29
**Status:** Ready for development review
---
## Table of Contents
1. [PostGIS Database Schema](#1-postgis-database-schema)
2. [Mobile SQLite Schemas (Drift ORM)](#2-mobile-sqlite-schemas-drift-orm)
3. [Redis Caching Strategy](#3-redis-caching-strategy)
4. [OSM Data Import Pipeline](#4-osm-data-import-pipeline)
---
## 1. PostGIS Database Schema
The PostgreSQL 16+ database with PostGIS 3.4+ stores two primary application tables: `pois` for point-of-interest data served by the Rust API gateway, and `offline_regions` for offline download package metadata. Martin reads separate `openmaptiles`-schema tables for tile generation; those tables are managed by the openmaptiles toolchain and are not documented here.
### 1.1 `pois` Table
Stores POI data extracted from OpenStreetMap via `osm2pgsql` with a custom style file. Queried directly by the Rust gateway for the `GET /api/pois` and `GET /api/pois/{osm_type}/{osm_id}` endpoints.
```sql
CREATE TABLE pois (
osm_id BIGINT NOT NULL,
osm_type CHAR(1) NOT NULL CHECK (osm_type IN ('N', 'W', 'R')),
name TEXT NOT NULL,
category TEXT NOT NULL,
geometry geometry(Point, 4326) NOT NULL,
address JSONB,
tags JSONB,
opening_hours TEXT,
phone TEXT,
website TEXT,
wheelchair TEXT CHECK (wheelchair IN ('yes', 'no', 'limited')), -- NULL passes CHECK implicitly; listing NULL inside IN() would make the check always pass
CONSTRAINT pois_pk PRIMARY KEY (osm_type, osm_id)
);
```
**Column details:**
| Column | Type | Nullable | Description |
|---|---|---|---|
| `osm_id` | `BIGINT` | No | OpenStreetMap element ID |
| `osm_type` | `CHAR(1)` | No | `'N'` (node), `'W'` (way), or `'R'` (relation) |
| `name` | `TEXT` | No | POI name from the OSM `name` tag |
| `category` | `TEXT` | No | Normalized category. One of: `restaurant`, `cafe`, `shop`, `supermarket`, `pharmacy`, `hospital`, `fuel`, `parking`, `atm`, `public_transport`, `hotel`, `tourist_attraction`, `park` |
| `geometry` | `geometry(Point, 4326)` | No | WGS84 point location. For ways and relations, this is the centroid |
| `address` | `JSONB` | Yes | Structured address: `{"street": "...", "housenumber": "...", "postcode": "...", "city": "..."}` |
| `tags` | `JSONB` | Yes | Additional OSM tags as key-value pairs (e.g., `{"cuisine": "italian", "outdoor_seating": "yes"}`) |
| `opening_hours` | `TEXT` | Yes | Raw OSM `opening_hours` tag value |
| `phone` | `TEXT` | Yes | Phone number from `phone` or `contact:phone` tag |
| `website` | `TEXT` | Yes | URL from `website` or `contact:website` tag |
| `wheelchair` | `TEXT` | Yes | Wheelchair accessibility: `'yes'`, `'no'`, `'limited'`, or `NULL` |
**Indexes:**
```sql
-- Spatial index for bounding-box queries (used by GET /api/pois?bbox=...)
CREATE INDEX idx_pois_geometry ON pois USING GIST (geometry);
-- Category index for filtered POI queries (used by GET /api/pois?category=...)
CREATE INDEX idx_pois_category ON pois (category);
-- Composite index for single-POI lookups (covered by the primary key)
-- The PK constraint already creates a unique index on (osm_type, osm_id)
```
**Example query (bounding-box with category filter):**
```sql
SELECT osm_id, osm_type, name, category,
ST_AsGeoJSON(geometry)::json AS geometry,
address, tags, opening_hours, phone, website, wheelchair
FROM pois
WHERE geometry && ST_MakeEnvelope(4.85, 52.35, 4.95, 52.38, 4326)
AND category IN ('cafe', 'restaurant')
ORDER BY name
LIMIT 100 OFFSET 0;
```
**Example query (single POI by type and ID):**
```sql
SELECT osm_id, osm_type, name, category,
ST_AsGeoJSON(geometry)::json AS geometry,
address, tags, opening_hours, phone, website, wheelchair
FROM pois
WHERE osm_type = 'N' AND osm_id = 987654321;
```
### 1.2 `offline_regions` Table
Stores metadata about pre-built offline region packages. Queried by the gateway for `GET /api/offline/regions`. Region packages themselves are stored as files on disk (see Section 4).
```sql
CREATE TABLE offline_regions (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
bbox geometry(Polygon, 4326) NOT NULL,
tiles_size_bytes BIGINT NOT NULL DEFAULT 0,
routing_size_bytes BIGINT NOT NULL DEFAULT 0,
pois_size_bytes BIGINT NOT NULL DEFAULT 0,
last_updated TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
**Column details:**
| Column | Type | Nullable | Description |
|---|---|---|---|
| `id` | `TEXT` | No | URL-safe slug identifier (e.g., `amsterdam`, `netherlands`) |
| `name` | `TEXT` | No | Human-readable region name |
| `description` | `TEXT` | Yes | Region description |
| `bbox` | `geometry(Polygon, 4326)` | No | Region bounding box as a polygon |
| `tiles_size_bytes` | `BIGINT` | No | Size of the MBTiles package in bytes |
| `routing_size_bytes` | `BIGINT` | No | Combined size of all routing profile packages in bytes |
| `pois_size_bytes` | `BIGINT` | No | Size of the POI SQLite package in bytes |
| `last_updated` | `TIMESTAMPTZ` | No | Timestamp of the last data rebuild |
**Indexes:**
```sql
-- Spatial index for finding regions that overlap a given bounding box
CREATE INDEX idx_offline_regions_bbox ON offline_regions USING GIST (bbox);
```
**Example insert:**
```sql
INSERT INTO offline_regions (id, name, description, bbox, tiles_size_bytes, routing_size_bytes, pois_size_bytes, last_updated)
VALUES (
'amsterdam',
'Amsterdam',
'Amsterdam metropolitan area',
ST_MakeEnvelope(4.7288, 52.2783, 5.0796, 52.4311, 4326),
57671680, -- ~55 MB tiles
31457280, -- ~30 MB routing (all profiles combined)
10485760, -- ~10 MB POIs
'2026-03-25T00:00:00Z'
);
```
### 1.3 Full Migration Script
```sql
-- migrations/001_create_pois.sql
BEGIN;
-- Enable PostGIS extension if not already present
CREATE EXTENSION IF NOT EXISTS postgis;
-- POI table
CREATE TABLE IF NOT EXISTS pois (
osm_id BIGINT NOT NULL,
osm_type CHAR(1) NOT NULL CHECK (osm_type IN ('N', 'W', 'R')),
name TEXT NOT NULL,
category TEXT NOT NULL,
geometry geometry(Point, 4326) NOT NULL,
address JSONB,
tags JSONB,
opening_hours TEXT,
phone TEXT,
website TEXT,
wheelchair TEXT CHECK (wheelchair IN ('yes', 'no', 'limited')), -- NULL passes CHECK implicitly; listing NULL inside IN() would make the check always pass
CONSTRAINT pois_pk PRIMARY KEY (osm_type, osm_id)
);
CREATE INDEX IF NOT EXISTS idx_pois_geometry ON pois USING GIST (geometry);
CREATE INDEX IF NOT EXISTS idx_pois_category ON pois (category);
-- Offline regions table
CREATE TABLE IF NOT EXISTS offline_regions (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
bbox geometry(Polygon, 4326) NOT NULL,
tiles_size_bytes BIGINT NOT NULL DEFAULT 0,
routing_size_bytes BIGINT NOT NULL DEFAULT 0,
pois_size_bytes BIGINT NOT NULL DEFAULT 0,
last_updated TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_offline_regions_bbox ON offline_regions USING GIST (bbox);
COMMIT;
```
---
## 2. Mobile SQLite Schemas (Drift ORM)
The Flutter app uses [Drift](https://drift.simonbinder.eu/) (formerly Moor) as a type-safe SQLite ORM. All on-device data is stored in a single Drift-managed database (`app.db`), with a separate MBTiles database for tile caching. Database files are stored in the platform's encrypted storage directory.
### 2.1 `search_history` Table
Stores the last 50 search queries, displayed when the search bar is focused with an empty query. Never transmitted over the network.
**Drift table definition (Dart):**
```dart
// lib/database/tables/search_history_table.dart
class SearchHistory extends Table {
IntColumn get id => integer().autoIncrement()();
TextColumn get query => text()();
RealColumn get latitude => real().nullable()();
RealColumn get longitude => real().nullable()();
IntColumn get timestamp => integer()(); // Unix epoch seconds
}
```
**Generated SQLite schema:**
```sql
CREATE TABLE search_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
query TEXT NOT NULL,
latitude REAL,
longitude REAL,
timestamp INTEGER NOT NULL
);
```
**Drift DAO:**
```dart
// lib/database/daos/search_history_dao.dart
@DriftAccessor(tables: [SearchHistory])
class SearchHistoryDao extends DatabaseAccessor<AppDatabase>
with _$SearchHistoryDaoMixin {
SearchHistoryDao(AppDatabase db) : super(db);
/// Returns the most recent 50 search history items, newest first.
Future<List<SearchHistoryData>> getRecentSearches() {
return (select(searchHistory)
..orderBy([(t) => OrderingTerm.desc(t.timestamp)])
..limit(50))
.get();
}
/// Watches search history as a reactive stream (for Riverpod StreamProvider).
Stream<List<SearchHistoryData>> watchRecentSearches() {
return (select(searchHistory)
..orderBy([(t) => OrderingTerm.desc(t.timestamp)])
..limit(50))
.watch();
}
/// Inserts a new search entry. If the history exceeds 50 items, the oldest
/// entry is deleted.
Future<void> addSearch(String query, {double? lat, double? lon}) async {
await into(searchHistory).insert(SearchHistoryCompanion.insert(
query: query,
latitude: Value(lat),
longitude: Value(lon),
timestamp: DateTime.now().millisecondsSinceEpoch ~/ 1000,
));
// Evict entries beyond the 50-item limit
await customStatement('''
DELETE FROM search_history
WHERE id NOT IN (
SELECT id FROM search_history ORDER BY timestamp DESC LIMIT 50
)
''');
}
/// Deletes a single history entry by ID.
Future<void> deleteSearch(int id) {
return (delete(searchHistory)..where((t) => t.id.equals(id))).go();
}
/// Deletes all search history.
Future<void> clearAll() {
return delete(searchHistory).go();
}
}
```
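The 50-item cap can be exercised against the generated schema directly; a minimal Python `sqlite3` sketch using the same DDL and eviction statement as the DAO above (timestamps are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE search_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        query TEXT NOT NULL,
        latitude REAL,
        longitude REAL,
        timestamp INTEGER NOT NULL
    )
""")

# Insert 60 searches with strictly increasing timestamps
for i in range(60):
    con.execute(
        "INSERT INTO search_history (query, timestamp) VALUES (?, ?)",
        (f"query {i}", 1_700_000_000 + i),
    )

# The same eviction statement the DAO runs after each insert
con.execute("""
    DELETE FROM search_history
    WHERE id NOT IN (
        SELECT id FROM search_history ORDER BY timestamp DESC LIMIT 50
    )
""")

remaining = con.execute(
    "SELECT COUNT(*), MIN(timestamp) FROM search_history"
).fetchone()
print(remaining)  # (50, 1700000010) — the 10 oldest rows were evicted
```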
### 2.2 `favorites` Table
Stores user-saved places. Each favorite belongs to a named group (default: `'Favorites'`). Supports import/export as GeoJSON. Never transmitted over the network.
**Drift table definition (Dart):**
```dart
// lib/database/tables/favorites_table.dart
class Favorites extends Table {
IntColumn get id => integer().autoIncrement()();
IntColumn get osmId => integer().nullable()();
TextColumn get osmType => text().nullable()(); // 'N', 'W', or 'R'
TextColumn get name => text()();
TextColumn get note => text().nullable()();
TextColumn get groupName => text().withDefault(const Constant('Favorites'))();
RealColumn get latitude => real()();
RealColumn get longitude => real()();
TextColumn get addressJson => text().nullable()(); // JSON-encoded address
IntColumn get createdAt => integer()(); // Unix epoch seconds
IntColumn get updatedAt => integer()(); // Unix epoch seconds
}
```
**Generated SQLite schema:**
```sql
CREATE TABLE favorites (
id INTEGER PRIMARY KEY AUTOINCREMENT,
osm_id INTEGER,
osm_type TEXT,
name TEXT NOT NULL,
note TEXT,
group_name TEXT NOT NULL DEFAULT 'Favorites',
latitude REAL NOT NULL,
longitude REAL NOT NULL,
address_json TEXT,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
```
**Drift DAO:**
```dart
// lib/database/daos/favorites_dao.dart
@DriftAccessor(tables: [Favorites])
class FavoritesDao extends DatabaseAccessor<AppDatabase>
with _$FavoritesDaoMixin {
FavoritesDao(AppDatabase db) : super(db);
/// Watches all favorites, grouped by group_name, ordered by name.
Stream<List<FavoriteData>> watchAllFavorites() {
return (select(favorites)
..orderBy([
(t) => OrderingTerm.asc(t.groupName),
(t) => OrderingTerm.asc(t.name),
]))
.watch();
}
/// Watches favorites in a specific group.
Stream<List<FavoriteData>> watchFavoritesByGroup(String group) {
return (select(favorites)
..where((t) => t.groupName.equals(group))
..orderBy([(t) => OrderingTerm.asc(t.name)]))
.watch();
}
/// Returns all distinct group names.
Future<List<String>> getGroups() async {
final query = selectOnly(favorites, distinct: true)
..addColumns([favorites.groupName]);
final rows = await query.get();
return rows.map((row) => row.read(favorites.groupName)!).toList();
}
/// Inserts a new favorite.
Future<int> addFavorite(FavoritesCompanion entry) {
final now = DateTime.now().millisecondsSinceEpoch ~/ 1000;
return into(favorites).insert(entry.copyWith(
createdAt: Value(now),
updatedAt: Value(now),
));
}
/// Updates an existing favorite (name, note, group).
Future<bool> updateFavorite(FavoriteData entry) {
final now = DateTime.now().millisecondsSinceEpoch ~/ 1000;
return update(favorites).replace(entry.copyWith(updatedAt: now));
}
/// Deletes a favorite by ID.
Future<void> deleteFavorite(int id) {
return (delete(favorites)..where((t) => t.id.equals(id))).go();
}
}
```
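The GeoJSON export mentioned above is not specified elsewhere in this section; a minimal sketch of what it could look like (function name and property mapping are assumptions), turning favorite rows into a FeatureCollection:

```python
import json

def favorites_to_geojson(rows: list[dict]) -> str:
    """Serialize favorite rows (dicts keyed by column name) to a GeoJSON string.

    Hypothetical export helper; the column names mirror the favorites table.
    """
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is [longitude, latitude]
                "coordinates": [row["longitude"], row["latitude"]],
            },
            "properties": {
                "name": row["name"],
                "note": row["note"],
                "group": row["group_name"],
                "osm_id": row["osm_id"],
                "osm_type": row["osm_type"],
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

exported = favorites_to_geojson([{
    "name": "Cafe de Pels", "note": None, "group_name": "Favorites",
    "osm_id": 123, "osm_type": "N",
    "latitude": 52.3702, "longitude": 4.8840,
}])
print(exported)
```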
### 2.3 `offline_regions` Table
Tracks downloaded offline regions on the device. Stores bounding box as individual coordinate columns (SQLite has no geometry type). Never transmitted over the network.
**Drift table definition (Dart):**
```dart
// lib/database/tables/offline_regions_table.dart
class OfflineRegions extends Table {
TextColumn get id => text()(); // Matches backend region ID
TextColumn get name => text()();
RealColumn get minLat => real()();
RealColumn get minLon => real()();
RealColumn get maxLat => real()();
RealColumn get maxLon => real()();
IntColumn get tilesSizeBytes => integer()();
IntColumn get routingSizeBytes => integer()();
IntColumn get poisSizeBytes => integer()();
IntColumn get downloadedAt => integer()(); // Unix epoch seconds
IntColumn get lastUpdated => integer()(); // Unix epoch seconds (from backend)
@override
Set<Column> get primaryKey => {id};
}
```
**Generated SQLite schema:**
```sql
CREATE TABLE offline_regions (
id TEXT PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
min_lat REAL NOT NULL,
min_lon REAL NOT NULL,
max_lat REAL NOT NULL,
max_lon REAL NOT NULL,
tiles_size_bytes INTEGER NOT NULL,
routing_size_bytes INTEGER NOT NULL,
pois_size_bytes INTEGER NOT NULL,
downloaded_at INTEGER NOT NULL,
last_updated INTEGER NOT NULL
);
```
**Drift DAO:**
```dart
// lib/database/daos/offline_regions_dao.dart
@DriftAccessor(tables: [OfflineRegions])
class OfflineRegionsDao extends DatabaseAccessor<AppDatabase>
with _$OfflineRegionsDaoMixin {
OfflineRegionsDao(AppDatabase db) : super(db);
/// Watches all downloaded regions.
Stream<List<OfflineRegionData>> watchAll() {
return (select(offlineRegions)
..orderBy([(t) => OrderingTerm.asc(t.name)]))
.watch();
}
/// Returns a single region by ID, or null if not downloaded.
Future<OfflineRegionData?> getById(String regionId) {
return (select(offlineRegions)..where((t) => t.id.equals(regionId)))
.getSingleOrNull();
}
/// Checks if a point falls within any downloaded region.
Future<OfflineRegionData?> findRegionContaining(double lat, double lon) {
return (select(offlineRegions)
..where((t) =>
t.minLat.isSmallerOrEqualValue(lat) &
t.maxLat.isBiggerOrEqualValue(lat) &
t.minLon.isSmallerOrEqualValue(lon) &
t.maxLon.isBiggerOrEqualValue(lon)))
.getSingleOrNull();
}
/// Inserts or replaces a downloaded region record.
Future<void> upsertRegion(OfflineRegionsCompanion entry) {
return into(offlineRegions).insertOnConflictUpdate(entry);
}
/// Deletes a region record. Caller is responsible for deleting the
/// associated files (MBTiles, OSRM data, POI database) from disk.
Future<void> deleteRegion(String regionId) {
return (delete(offlineRegions)..where((t) => t.id.equals(regionId))).go();
}
/// Returns total storage used by all downloaded regions in bytes.
Future<int> getTotalStorageBytes() async {
final query = selectOnly(offlineRegions)
..addColumns([
offlineRegions.tilesSizeBytes.sum(),
offlineRegions.routingSizeBytes.sum(),
offlineRegions.poisSizeBytes.sum(),
]);
final row = await query.getSingle();
final tiles = row.read(offlineRegions.tilesSizeBytes.sum()) ?? 0;
final routing = row.read(offlineRegions.routingSizeBytes.sum()) ?? 0;
final pois = row.read(offlineRegions.poisSizeBytes.sum()) ?? 0;
return tiles + routing + pois;
}
}
```
### 2.4 `settings` Table
Stores key-value user preferences (theme, units, cache size limit, backend URL). Read at app startup by the `settingsProvider`.
**Drift table definition (Dart):**
```dart
// lib/database/tables/settings_table.dart
class Settings extends Table {
TextColumn get key => text()();
TextColumn get value => text()();
@override
Set<Column> get primaryKey => {key};
}
```
**Generated SQLite schema:**
```sql
CREATE TABLE settings (
key TEXT PRIMARY KEY NOT NULL,
value TEXT NOT NULL
);
```
**Known settings keys:**
| Key | Default Value | Description |
|---|---|---|
| `backend_url` | (set at first launch) | Base URL of the self-hosted backend |
| `theme` | `auto` | Map theme: `day`, `night`, `terrain`, `auto` |
| `units` | `metric` | Distance units: `metric`, `imperial` |
| `tile_cache_size_mb` | `500` | Maximum tile cache size in MB |
| `last_viewport_lat` | `52.3676` | Last viewed map center latitude |
| `last_viewport_lon` | `4.9041` | Last viewed map center longitude |
| `last_viewport_zoom` | `12.0` | Last viewed map zoom level |
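No DAO is shown for this table; reads and writes are single-row key lookups and upserts. A Python `sqlite3` sketch of the underlying SQL (on the Dart side this would go through Drift, e.g. `insertOnConflictUpdate`; helper names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE settings (key TEXT PRIMARY KEY NOT NULL, value TEXT NOT NULL)"
)

def set_setting(key: str, value: str) -> None:
    # SQLite upsert: insert, or overwrite the value if the key already exists
    con.execute(
        "INSERT INTO settings (key, value) VALUES (?, ?) "
        "ON CONFLICT (key) DO UPDATE SET value = excluded.value",
        (key, value),
    )

def get_setting(key: str, default: str) -> str:
    row = con.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default

set_setting("theme", "auto")
set_setting("theme", "night")  # second write overwrites, does not duplicate
print(get_setting("theme", "auto"), get_setting("units", "metric"))  # night metric
```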
### 2.5 Tile Cache (MBTiles)
The tile cache is a separate SQLite database (`tile_cache.db`) following the [MBTiles 1.3 specification](https://github.com/mapbox/mbtiles-spec). It is managed independently of the Drift database. MapLibre GL Native reads from this database directly.
**Schema:**
```sql
-- MBTiles metadata table
CREATE TABLE metadata (
name TEXT NOT NULL,
value TEXT NOT NULL
);
-- MBTiles tiles table
CREATE TABLE tiles (
zoom_level INTEGER NOT NULL,
tile_column INTEGER NOT NULL,
tile_row INTEGER NOT NULL,
tile_data BLOB NOT NULL
);
CREATE UNIQUE INDEX idx_tiles ON tiles (zoom_level, tile_column, tile_row);
```
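One caveat the schema does not make explicit: the MBTiles spec stores `tile_row` in TMS order (row 0 at the south edge), while `{z}/{x}/{y}` tile URLs use XYZ order (row 0 at the north edge). Any code reading or writing this database must flip the row; a minimal sketch:

```python
def xyz_to_tms_row(z: int, y: int) -> int:
    """Convert an XYZ tile row to the TMS row stored in the MBTiles tiles table."""
    return (1 << z) - 1 - y

# The flip is its own inverse: applying it twice returns the original row
print(xyz_to_tms_row(14, 5404))                      # 10979
print(xyz_to_tms_row(14, xyz_to_tms_row(14, 5404)))  # 5404
```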
**LRU eviction strategy:**
The app extends the MBTiles schema with a timestamp column to support LRU eviction:
```sql
-- Extension: access tracking for LRU eviction
ALTER TABLE tiles ADD COLUMN last_accessed INTEGER NOT NULL DEFAULT 0;
CREATE INDEX idx_tiles_lru ON tiles (last_accessed ASC);
```
When the cache exceeds the configured size limit (default 500 MB, max 2 GB), the oldest-accessed tiles are evicted:
```sql
-- Evict oldest tiles until the database is under the size limit.
-- Run periodically or when inserting a new tile that exceeds the limit.
DELETE FROM tiles
WHERE rowid IN (
SELECT rowid FROM tiles
ORDER BY last_accessed ASC
LIMIT ? -- number of tiles to evict, calculated by the app
);
```
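The `?` placeholder above is a count the app must compute itself. A sketch of one way to derive it from the current database size and cached tile count (the average-tile-size estimate and round-up policy are assumptions, not specified elsewhere):

```python
import math

def tiles_to_evict(db_size_bytes: int, limit_bytes: int, tile_count: int) -> int:
    """How many least-recently-used tiles to delete to get back under the limit."""
    if db_size_bytes <= limit_bytes or tile_count == 0:
        return 0
    avg_tile_size = db_size_bytes / tile_count
    excess = db_size_bytes - limit_bytes
    # Round up so a single eviction pass is enough to get under the limit
    return min(tile_count, math.ceil(excess / avg_tile_size))

# 600 MB cache, 500 MB limit, 12,000 cached tiles (~50 KB each on average)
print(tiles_to_evict(600 * 1024 * 1024, 500 * 1024 * 1024, 12_000))  # 2000
```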
### 2.6 Offline POI Database
Each downloaded region includes a standalone SQLite POI database (`{region_id}_pois.db`) with the following schema. This is generated on the backend and downloaded by the app as-is.
```sql
CREATE TABLE pois (
osm_id INTEGER NOT NULL,
osm_type TEXT NOT NULL,
name TEXT NOT NULL,
category TEXT NOT NULL,
latitude REAL NOT NULL,
longitude REAL NOT NULL,
address_json TEXT,
tags_json TEXT,
opening_hours TEXT,
phone TEXT,
website TEXT,
wheelchair TEXT,
PRIMARY KEY (osm_type, osm_id)
);
-- FTS5 full-text search index on name and address for offline search
CREATE VIRTUAL TABLE pois_fts USING fts5(
name,
address_text,
content='pois',
content_rowid='rowid',
tokenize='unicode61'
);
-- Triggers to keep FTS index in sync (populated at build time on the backend)
CREATE TRIGGER pois_ai AFTER INSERT ON pois BEGIN
INSERT INTO pois_fts(rowid, name, address_text)
VALUES (
new.rowid,
new.name,
COALESCE(json_extract(new.address_json, '$.street'), '') || ' ' ||
COALESCE(json_extract(new.address_json, '$.city'), '')
);
END;
-- Spatial-like index for bounding-box queries (SQLite has no native spatial index)
CREATE INDEX idx_pois_coords ON pois (latitude, longitude);
CREATE INDEX idx_pois_category ON pois (category);
```
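Offline search against this database needs no PostGIS; a Python `sqlite3` sketch of the bounding-box-plus-category query the app would run (schema abridged to the columns the query touches; names and coordinates are illustrative — the FTS5 path is analogous, matching against `pois_fts`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pois (
        osm_id INTEGER NOT NULL,
        osm_type TEXT NOT NULL,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        latitude REAL NOT NULL,
        longitude REAL NOT NULL,
        PRIMARY KEY (osm_type, osm_id)
    )
""")
con.execute("CREATE INDEX idx_pois_coords ON pois (latitude, longitude)")
con.executemany(
    "INSERT INTO pois VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "N", "Cafe Central", "cafe", 52.3700, 4.8900),
        (2, "N", "Pizzeria Roma", "restaurant", 52.3650, 4.9100),
        (3, "N", "Far Away Cafe", "cafe", 51.9000, 4.4800),  # outside bbox
    ],
)

# Same shape as the online bbox query, using the (latitude, longitude) index
rows = con.execute("""
    SELECT name FROM pois
    WHERE latitude BETWEEN 52.35 AND 52.38
      AND longitude BETWEEN 4.85 AND 4.95
      AND category IN ('cafe', 'restaurant')
    ORDER BY name
""").fetchall()
print([r[0] for r in rows])  # ['Cafe Central', 'Pizzeria Roma']
```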
### 2.7 Drift Database Definition
All tables are combined in a single Drift database class:
```dart
// lib/database/app_database.dart
@DriftDatabase(
tables: [SearchHistory, Favorites, OfflineRegions, Settings],
daos: [SearchHistoryDao, FavoritesDao, OfflineRegionsDao],
)
class AppDatabase extends _$AppDatabase {
AppDatabase() : super(_openConnection());
@override
int get schemaVersion => 1;
@override
MigrationStrategy get migration => MigrationStrategy(
onCreate: (Migrator m) async {
await m.createAll();
},
onUpgrade: (Migrator m, int from, int to) async {
// Future schema migrations go here
},
);
static QueryExecutor _openConnection() {
return NativeDatabase.createInBackground(
File(join(appDocumentsDir, 'app.db')),
);
}
}
```
---
## 3. Redis Caching Strategy
Redis serves as an in-memory response cache for the Rust API gateway, reducing load on PostGIS and upstream services (Martin, Photon, OSRM). Redis is configured with `maxmemory 2gb` and `maxmemory-policy allkeys-lru`.
### 3.1 Tile Cache
Caches raw vector tile protobuf bytes returned by Martin. Tiles are immutable between OSM data imports (weekly), so a 24-hour TTL is appropriate.
| Property | Value |
|---|---|
| **Key pattern** | `tile:{layer}:{z}:{x}:{y}` |
| **Example key** | `tile:openmaptiles:14:8425:5404` |
| **Value** | Raw gzip-compressed protobuf bytes (binary-safe Redis string) |
| **TTL** | 86400 seconds (24 hours) |
| **Eviction** | LRU at 2 GB `maxmemory` |
**Rust pseudocode:**
```rust
async fn get_tile(layer: &str, z: u32, x: u32, y: u32) -> Result<Bytes> {
let cache_key = format!("tile:{layer}:{z}:{x}:{y}");
// Check cache
if let Some(cached) = redis.get_bytes(&cache_key).await? {
return Ok(cached);
}
// Fetch from Martin
let tile_data = martin_proxy.get_tile(layer, z, x, y).await?;
// Store in cache with 24h TTL
redis.set_ex(&cache_key, &tile_data, 86400).await?;
Ok(tile_data)
}
```
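The `{z}:{x}:{y}` components of the key follow the standard Web Mercator (slippy map) tile scheme. For reference when reasoning about key cardinality, a sketch of the lon/lat-to-tile conversion:

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Convert a WGS84 lon/lat to XYZ tile coordinates at the given zoom."""
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Amsterdam city centre at zoom 14
print(lonlat_to_tile(4.9041, 52.3676, 14))  # (8415, 5384)
```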
### 3.2 Search Cache
Caches JSON responses from Photon for identical search queries. The cache key is derived from a hash of all query parameters to ensure that different proximity biases produce distinct cache entries.
| Property | Value |
|---|---|
| **Key pattern** | `search:{sha256_hex(q + lat + lon + limit + lang)}` |
| **Example key** | `search:a3f2b8c1d4e5...` (64-char hex digest) |
| **Value** | JSON response body (GeoJSON FeatureCollection) |
| **TTL** | 3600 seconds (1 hour) |
| **Eviction** | LRU (shared maxmemory pool) |
**Hash construction:**
```rust
fn search_cache_key(q: &str, lat: Option<f64>, lon: Option<f64>, limit: u32, lang: &str) -> String {
let input = format!(
"q={}&lat={}&lon={}&limit={}&lang={}",
q,
lat.map(|v| format!("{:.6}", v)).unwrap_or_default(),
lon.map(|v| format!("{:.6}", v)).unwrap_or_default(),
limit,
lang,
);
let hash = sha256::digest(input);
format!("search:{hash}")
}
```
### 3.3 Route Cache
Caches JSON responses from OSRM. Route requests are less likely to produce cache hits (coordinates vary continuously), but repeated queries for the same origin/destination pair benefit from caching. The TTL is kept short because road conditions can change.
| Property | Value |
|---|---|
| **Key pattern** | `route:{sha256_hex(profile + coordinates + alternatives + steps + geometries + overview)}` |
| **Example key** | `route:b7e9d1f3c2a4...` (64-char hex digest) |
| **Value** | JSON response body (OSRM route response) |
| **TTL** | 1800 seconds (30 minutes) |
| **Eviction** | LRU (shared maxmemory pool) |
**Hash construction:**
```rust
fn route_cache_key(
profile: &str,
coordinates: &str,
alternatives: u32,
steps: bool,
geometries: &str,
overview: &str,
) -> String {
let input = format!(
"profile={}&coords={}&alt={}&steps={}&geom={}&overview={}",
profile, coordinates, alternatives, steps, geometries, overview,
);
let hash = sha256::digest(input);
format!("route:{hash}")
}
```
### 3.4 Health Status Cache
Caches the result of upstream health probes to avoid overwhelming upstream services with health checks. The `/api/health` endpoint reads from cache; a background task refreshes the cache every 30 seconds.
| Property | Value |
|---|---|
| **Key pattern** | `health:{service}` |
| **Example keys** | `health:martin`, `health:photon`, `health:osrm_driving`, `health:osrm_walking`, `health:osrm_cycling`, `health:postgres`, `health:redis` |
| **Value** | JSON object: `{"status": "ok", "latency_ms": 12}` |
| **TTL** | 30 seconds |
| **Eviction** | TTL-based (entries are small, never hit LRU) |
### 3.5 Redis Configuration
```conf
# /etc/redis/redis.conf (relevant settings)
maxmemory 2gb
maxmemory-policy allkeys-lru
save "" # Disable RDB snapshots (cache is ephemeral)
appendonly no # Disable AOF persistence
protected-mode yes
bind 127.0.0.1 # Listen on the loopback interface only; not reachable from outside the host
```
### 3.6 Cache Invalidation
| Event | Invalidation Strategy |
|---|---|
| Weekly OSM data import | Flush all `tile:*` keys (`SCAN` + `UNLINK` in batches). Search and route caches expire naturally via TTL. |
| Manual rebuild | `FLUSHDB` to clear all cached data. The cache warms up organically from incoming requests. |
| Service restart | Redis is configured without persistence; cache starts empty. This is acceptable because the cache is a performance optimization, not a data store. |
---
## 4. OSM Data Import Pipeline
The backend imports OpenStreetMap data from [Geofabrik](https://download.geofabrik.de/) PBF extracts. The import pipeline produces four outputs: vector tile data in PostGIS, POI data in PostGIS, a Photon geocoding index, and OSRM routing graphs. A weekly cron job re-runs the pipeline to incorporate OSM edits.
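The weekly re-run can be driven by a plain cron entry; an illustrative sketch (the wrapper script name and schedule are assumptions — it would invoke the step scripts from Sections 4.2 through 4.6 in order):

```conf
# /etc/cron.d/osm-import — re-run the full import pipeline every Monday at 03:00
# (run_pipeline.sh is a hypothetical wrapper around scripts 01 through 05)
0 3 * * 1  root  /app/scripts/run_pipeline.sh >> /var/log/osm-import.log 2>&1
```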
### 4.1 Pipeline Overview
```
+------------------+
| Geofabrik PBF |
| (e.g. netherlands|
| -latest.osm.pbf)|
+--------+---------+
|
+---------------+---------------+
| | |
v v v
+-------+------+ +-----+------+ +------+------+
| osm2pgsql | | Nominatim | | osrm-extract|
| (tiles + | | import | | osrm-partition
| POIs) | | | | osrm-customize
+-------+------+ +-----+------+ +------+------+
| | |
v v v
+-------+------+ +-----+------+ +------+------+
| PostGIS | | Nominatim | | .osrm files |
| (openmaptiles| | database | | (per profile)|
| schema + | | | | |
| pois table) | +-----+------+ +------+------+
+-------+------+ | |
| v |
| +-----+------+ |
v | Photon | v
+-------+------+ | (reads | +------+------+
| Martin | | Nominatim) | | OSRM server |
| (serves | +------------+ | (serves |
| tiles from | | .osrm files)|
| PostGIS) | +-------------+
+--------------+
```
### 4.2 Step 1: Download OSM PBF Extract
Download the latest PBF extract from Geofabrik for the target region.
```bash
#!/bin/bash
# scripts/01_download.sh
REGION="europe/netherlands"
DATA_DIR="/data/osm"
GEOFABRIK_BASE="https://download.geofabrik.de"
mkdir -p "$DATA_DIR"
# Download the PBF extract. wget -N re-downloads only when the server copy is
# newer; -N is incompatible with -O, so fetch into the data dir and link the
# result to a stable name.
wget -N -P "$DATA_DIR" "${GEOFABRIK_BASE}/${REGION}-latest.osm.pbf"
ln -sf "${DATA_DIR}/$(basename "$REGION")-latest.osm.pbf" "${DATA_DIR}/region.osm.pbf"
# Download the corresponding state file for future diff updates
wget -O "${DATA_DIR}/state.txt" "${GEOFABRIK_BASE}/${REGION}-updates/state.txt"
```
### 4.3 Step 2: Tile Data (osm2pgsql + openmaptiles + Martin)
Import OSM data into PostGIS using the openmaptiles schema, which Martin then serves as vector tiles.
```bash
#!/bin/bash
# scripts/02_import_tiles.sh
PBF_FILE="/data/osm/region.osm.pbf"
PG_CONN="postgresql://maps:maps@postgres:5432/maps"
# Clone openmaptiles toolchain (once)
if [ ! -d "/opt/openmaptiles" ]; then
git clone https://github.com/openmaptiles/openmaptiles.git /opt/openmaptiles
fi
# Import OSM data into PostGIS using openmaptiles schema
# This creates the tables that Martin reads for tile generation
cd /opt/openmaptiles
# osm2pgsql import with openmaptiles mapping
osm2pgsql \
--create \
--slim \
--database "$PG_CONN" \
--style openmaptiles.style \
--tag-transform-script lua/tagtransform.lua \
--number-processes 4 \
--cache 4096 \
--flat-nodes /data/osm/nodes.cache \
"$PBF_FILE"
# Run openmaptiles SQL post-processing to create materialized views
# that Martin serves as tile layers
psql "$PG_CONN" -f build/openmaptiles.sql
echo "Tile data import complete. Martin will serve tiles from PostGIS."
```
Martin is configured to read from PostGIS and serve tiles:
```yaml
# martin/config.yaml
postgres:
connection_string: postgresql://maps:maps@postgres:5432/maps
default_srid: 4326
pool_size: 20
tables:
openmaptiles:
schema: public
table: planet_osm_polygon
srid: 3857
geometry_column: way
geometry_type: GEOMETRY
properties:
name: name
class: class
subclass: subclass
```
### 4.4 Step 3: POI Data (osm2pgsql with Custom Style)
Import POIs into the `pois` table using `osm2pgsql` with a custom Lua tag transform script that normalizes categories and extracts address fields.
```bash
#!/bin/bash
# scripts/03_import_pois.sh
PBF_FILE="/data/osm/region.osm.pbf"
PG_CONN="postgresql://maps:maps@postgres:5432/maps"
# Run the initial migration to create the pois table
psql "$PG_CONN" -f /app/migrations/001_create_pois.sql
# Import POIs using osm2pgsql with a custom Lua transform
osm2pgsql \
--create \
--output=flex \
--style /app/scripts/poi_flex.lua \
--database "$PG_CONN" \
--cache 2048 \
--number-processes 4 \
--flat-nodes /data/osm/nodes.cache \
"$PBF_FILE"
echo "POI import complete."
```
**Custom osm2pgsql Lua flex output script:**
```lua
-- scripts/poi_flex.lua
-- osm2pgsql flex output for POI extraction
local pois = osm2pgsql.define_table({
name = 'pois',
ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
columns = {
{ column = 'name', type = 'text', not_null = true },
{ column = 'category', type = 'text', not_null = true },
{ column = 'geometry', type = 'point', projection = 4326, not_null = true },
{ column = 'address', type = 'jsonb' },
{ column = 'tags', type = 'jsonb' },
{ column = 'opening_hours', type = 'text' },
{ column = 'phone', type = 'text' },
{ column = 'website', type = 'text' },
{ column = 'wheelchair', type = 'text' },
},
})
-- Maps OSM amenity/shop/tourism/leisure tags to normalized categories
local category_map = {
-- amenity
restaurant = 'restaurant',
fast_food = 'restaurant',
cafe = 'cafe',
pharmacy = 'pharmacy',
hospital = 'hospital',
clinic = 'hospital',
fuel = 'fuel',
parking = 'parking',
atm = 'atm',
bank = 'atm',
bus_station = 'public_transport',
hotel = 'hotel',
-- shop
supermarket = 'supermarket',
convenience = 'shop',
clothes = 'shop',
hairdresser = 'shop',
bakery = 'shop',
-- tourism
attraction = 'tourist_attraction',
museum = 'tourist_attraction',
viewpoint = 'tourist_attraction',
-- leisure
park = 'park',
garden = 'park',
playground = 'park',
}
local function get_category(tags)
for _, key in ipairs({'amenity', 'shop', 'tourism', 'leisure'}) do
local val = tags[key]
if val and category_map[val] then
return category_map[val]
end
end
return nil
end
local function build_address(tags)
local addr = {}
if tags['addr:street'] then addr.street = tags['addr:street'] end
if tags['addr:housenumber'] then addr.housenumber = tags['addr:housenumber'] end
if tags['addr:postcode'] then addr.postcode = tags['addr:postcode'] end
if tags['addr:city'] then addr.city = tags['addr:city'] end
if next(addr) then return addr end
return nil
end
local function build_extra_tags(tags)
local extra = {}
-- Keys already extracted into dedicated columns (excluded from the tags JSONB)
local handled = {
'name', 'amenity', 'shop', 'tourism', 'leisure',
'addr:street', 'addr:housenumber', 'addr:postcode', 'addr:city',
'opening_hours', 'phone', 'contact:phone',
'website', 'contact:website', 'wheelchair',
}
local skip = {}
for _, k in ipairs(handled) do skip[k] = true end
for k, v in pairs(tags) do
if not skip[k] and not k:match('^addr:') then
extra[k] = v
end
end
if next(extra) then return extra end
return nil
end
function osm2pgsql.process_node(object)
local tags = object.tags
if not tags.name then return end
local category = get_category(tags)
if not category then return end
pois:insert({
name = tags.name,
category = category,
geometry = object:as_point(),
address = build_address(tags),
tags = build_extra_tags(tags),
opening_hours = tags.opening_hours,
phone = tags.phone or tags['contact:phone'],
website = tags.website or tags['contact:website'],
wheelchair = tags.wheelchair,
})
end
function osm2pgsql.process_way(object)
local tags = object.tags
if not tags.name then return end
local category = get_category(tags)
if not category then return end
if not object.is_closed then return end
pois:insert({
name = tags.name,
category = category,
geometry = object:as_polygon():centroid(),
address = build_address(tags),
tags = build_extra_tags(tags),
opening_hours = tags.opening_hours,
phone = tags.phone or tags['contact:phone'],
website = tags.website or tags['contact:website'],
wheelchair = tags.wheelchair,
})
end
function osm2pgsql.process_relation(object)
local tags = object.tags
if not tags.name then return end
local category = get_category(tags)
if not category then return end
if tags.type ~= 'multipolygon' then return end
pois:insert({
name = tags.name,
category = category,
geometry = object:as_multipolygon():centroid(),
address = build_address(tags),
tags = build_extra_tags(tags),
opening_hours = tags.opening_hours,
phone = tags.phone or tags['contact:phone'],
website = tags.website or tags['contact:website'],
wheelchair = tags.wheelchair,
})
end
```
### 4.5 Step 4: Geocoding (Nominatim + Photon)
Build a Nominatim database from the PBF extract, then point Photon at it to serve geocoding queries.
```bash
#!/bin/bash
# scripts/04_import_geocoding.sh
set -euo pipefail
PBF_FILE="/data/osm/region.osm.pbf"
NOMINATIM_DATA="/data/nominatim"
PHOTON_DATA="/data/photon"
# --- Nominatim Import ---
# Nominatim builds a PostgreSQL database with geocoding data.
# Photon reads from this database to build its Elasticsearch index.
nominatim import \
--osm-file "$PBF_FILE" \
--project-dir "$NOMINATIM_DATA" \
--threads 4
# --- Photon Import ---
# Build Photon's Elasticsearch search index from the Nominatim database.
java -jar /opt/photon/photon.jar \
-nominatim-import \
-host localhost \
-port 5432 \
-database nominatim \
-user nominatim \
-password nominatim \
-data-dir "$PHOTON_DATA" \
-languages en,nl,de,fr
echo "Geocoding index built. Photon is ready to serve."
```
### 4.6 Step 5: Routing (OSRM)
Preprocess the PBF extract into OSRM routing graphs, one per travel profile.
```bash
#!/bin/bash
# scripts/05_import_routing.sh
set -euo pipefail
PBF_FILE="/data/osm/region.osm.pbf"
OSRM_DATA="/data/osrm"
# Process each profile: driving, walking, cycling
for PROFILE in car foot bicycle; do
PROFILE_DIR="${OSRM_DATA}/${PROFILE}"
mkdir -p "$PROFILE_DIR"
cp "$PBF_FILE" "${PROFILE_DIR}/region.osm.pbf"
# Step 1: Extract — parse the PBF and produce an .osrm file
# Uses the appropriate profile from OSRM's bundled profiles
osrm-extract \
--profile /opt/osrm-profiles/${PROFILE}.lua \
--threads 4 \
"${PROFILE_DIR}/region.osm.pbf"
# Step 2: Partition — create a recursive multi-level partition
osrm-partition \
"${PROFILE_DIR}/region.osrm"
# Step 3: Customize — compute edge weights for the partition
osrm-customize \
"${PROFILE_DIR}/region.osrm"
echo "OSRM ${PROFILE} profile ready."
done
echo "All OSRM profiles processed."
```
The OSRM Docker containers are configured to load the preprocessed data:
```yaml
# docker-compose.yml (OSRM services excerpt)
services:
osrm-driving:
image: osrm/osrm-backend:latest
command: osrm-routed --algorithm mld /data/region.osrm
volumes:
- ./data/osrm/car:/data
ports:
- "5001:5000"
osrm-walking:
image: osrm/osrm-backend:latest
command: osrm-routed --algorithm mld /data/region.osrm
volumes:
- ./data/osrm/foot:/data
ports:
- "5002:5000"
osrm-cycling:
image: osrm/osrm-backend:latest
command: osrm-routed --algorithm mld /data/region.osrm
volumes:
- ./data/osrm/bicycle:/data
ports:
- "5003:5000"
```
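Each `osrm-routed` instance serves the standard OSRM HTTP API, but only for the profile its data was extracted with; the profile segment in the URL path is accepted and ignored, so the port is what actually selects the travel mode. A sketch of building a route request against the ports mapped above (coordinates are illustrative):

```python
from urllib.parse import urlencode

# Ports per the docker-compose excerpt above.
PROFILE_PORTS = {"driving": 5001, "walking": 5002, "cycling": 5003}

def osrm_route_url(profile: str, coords: list[tuple[float, float]],
                   host: str = "localhost") -> str:
    # OSRM expects lon,lat pairs joined by ';' in the URL path.
    path = ";".join(f"{lon},{lat}" for lon, lat in coords)
    port = PROFILE_PORTS[profile]
    query = urlencode({"overview": "full", "geometries": "geojson", "steps": "true"})
    return f"http://{host}:{port}/route/v1/{profile}/{path}?{query}"

# Example: drive between two points in central Amsterdam.
url = osrm_route_url("driving", [(4.8952, 52.3702), (4.9041, 52.3676)])
print(url)
```

The response is JSON with a `routes` array; `geometries=geojson` makes route geometry directly consumable by the map renderer.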
### 4.7 Step 6: Build Offline Packages
After the import, generate downloadable offline packages for each configured region.
```bash
#!/bin/bash
# scripts/06_build_offline_packages.sh
set -euo pipefail
PG_CONN="postgresql://maps:maps@postgres:5432/maps"
PACKAGES_DIR="/data/offline_packages"
REGION_ID="amsterdam" # example region; in practice, loop over all configured regions
BBOX="4.7288,52.2783,5.0796,52.4311" # minLon,minLat,maxLon,maxLat
mkdir -p "${PACKAGES_DIR}/${REGION_ID}"
# --- Tiles: extract MBTiles for the bounding box ---
# Use martin-cp (Martin's CLI tool) to export tiles from PostGIS to MBTiles
martin-cp \
--output-file "${PACKAGES_DIR}/${REGION_ID}/tiles.mbtiles" \
--mbtiles-type flat \
--bbox "$BBOX" \
--min-zoom 0 \
--max-zoom 16 \
--source openmaptiles \
--connect "$PG_CONN"
# --- POIs: export to SQLite with FTS5 index ---
# Custom Rust tool or Python script that queries PostGIS and writes SQLite
/app/tools/export_pois_sqlite \
--bbox "$BBOX" \
--pg-conn "$PG_CONN" \
--output "${PACKAGES_DIR}/${REGION_ID}/pois.db"
# --- Routing: tar the OSRM files per profile ---
for PROFILE in car foot bicycle; do
tar -cf "${PACKAGES_DIR}/${REGION_ID}/routing-${PROFILE}.tar" \
-C "/data/osrm/${PROFILE}" \
region.osrm region.osrm.cell_metrics region.osrm.cells \
region.osrm.datasource_names region.osrm.ebg region.osrm.ebg_nodes \
region.osrm.edges region.osrm.fileIndex region.osrm.geometry \
region.osrm.icd region.osrm.maneuver_overrides \
region.osrm.mldgr region.osrm.names region.osrm.nbg_nodes \
region.osrm.partition region.osrm.properties \
region.osrm.ramIndex region.osrm.timestamp \
region.osrm.tld region.osrm.tls region.osrm.turn_duration_penalties \
region.osrm.turn_penalties_index region.osrm.turn_weight_penalties
done
# --- Update offline_regions table with file sizes ---
TILES_SIZE=$(stat -f%z "${PACKAGES_DIR}/${REGION_ID}/tiles.mbtiles" 2>/dev/null || stat -c%s "${PACKAGES_DIR}/${REGION_ID}/tiles.mbtiles")
ROUTING_SIZE=0
for PROFILE in car foot bicycle; do
SIZE=$(stat -f%z "${PACKAGES_DIR}/${REGION_ID}/routing-${PROFILE}.tar" 2>/dev/null || stat -c%s "${PACKAGES_DIR}/${REGION_ID}/routing-${PROFILE}.tar")
ROUTING_SIZE=$((ROUTING_SIZE + SIZE))
done
POIS_SIZE=$(stat -f%z "${PACKAGES_DIR}/${REGION_ID}/pois.db" 2>/dev/null || stat -c%s "${PACKAGES_DIR}/${REGION_ID}/pois.db")
psql "$PG_CONN" <<SQL
INSERT INTO offline_regions (id, name, description, bbox, tiles_size_bytes, routing_size_bytes, pois_size_bytes, last_updated)
VALUES (
'${REGION_ID}',
'Amsterdam',
'Amsterdam metropolitan area',
ST_MakeEnvelope(4.7288, 52.2783, 5.0796, 52.4311, 4326),
${TILES_SIZE},
${ROUTING_SIZE},
${POIS_SIZE},
NOW()
)
ON CONFLICT (id) DO UPDATE SET
tiles_size_bytes = EXCLUDED.tiles_size_bytes,
routing_size_bytes = EXCLUDED.routing_size_bytes,
pois_size_bytes = EXCLUDED.pois_size_bytes,
last_updated = EXCLUDED.last_updated;
SQL
echo "Offline package for ${REGION_ID} built."
```
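The `stat -f%z … || stat -c%s …` dance above works around the BSD/GNU flag split. If a scripting runtime is available on the build host, the same size bookkeeping is portable without fallbacks; a sketch (directory layout mirrors the script, the demo files are stand-ins):

```python
import os
import tempfile

# Sketch: portable package-size bookkeeping, replacing the BSD/GNU `stat`
# fallback. File names mirror the shell script above.
def package_sizes(package_dir: str, profiles=("car", "foot", "bicycle")) -> dict:
    return {
        "tiles": os.path.getsize(os.path.join(package_dir, "tiles.mbtiles")),
        "routing": sum(
            os.path.getsize(os.path.join(package_dir, f"routing-{p}.tar"))
            for p in profiles
        ),
        "pois": os.path.getsize(os.path.join(package_dir, "pois.db")),
    }

# Demo on a throwaway directory with 10-byte stand-in files.
with tempfile.TemporaryDirectory() as d:
    for name in ["tiles.mbtiles", "pois.db",
                 "routing-car.tar", "routing-foot.tar", "routing-bicycle.tar"]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"\0" * 10)
    sizes = package_sizes(d)
    print(sizes)  # {'tiles': 10, 'routing': 30, 'pois': 10}
```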
### 4.8 Weekly Update Cron Job
A cron job runs the full pipeline weekly to incorporate the latest OSM edits.
```bash
# /etc/cron.d/maps-data-update
# Runs every Sunday at 02:00 UTC
0 2 * * 0 root /app/scripts/update_all.sh >> /var/log/maps-update.log 2>&1
```
```bash
#!/bin/bash
# scripts/update_all.sh
# Full weekly data update pipeline
set -euo pipefail
LOGFILE="/var/log/maps-update.log"
exec > >(tee -a "$LOGFILE") 2>&1
echo "=== OSM data update started at $(date -u) ==="
# Step 1: Download latest PBF
/app/scripts/01_download.sh
# Step 2: Import tile data
/app/scripts/02_import_tiles.sh
# Step 3: Import POI data
/app/scripts/03_import_pois.sh
# Step 4: Update geocoding index
/app/scripts/04_import_geocoding.sh
# Step 5: Rebuild OSRM routing graphs
/app/scripts/05_import_routing.sh
# Step 6: Rebuild offline packages
/app/scripts/06_build_offline_packages.sh
# Step 7: Flush the tile cache in Redis (tiles have changed).
# FLUSHDB clears every key in the currently selected database, so this
# assumes the tile cache has a Redis database to itself.
redis-cli -h redis FLUSHDB
# Step 8: Restart services to pick up new data
docker compose restart martin osrm-driving osrm-walking osrm-cycling
echo "=== OSM data update completed at $(date -u) ==="
```
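The flush step uses `FLUSHDB`, which clears every key in the selected Redis database. If tile entries ever share a database with other keys (sessions, rate-limit counters), selective invalidation is safer. A minimal sketch of the key-selection logic, assuming a `tiles:*` naming scheme (an assumption about the cache, not something defined in this section); the actual Redis calls are shown only as a comment:

```python
from fnmatch import fnmatch

# Sketch: select only tile-cache keys for deletion instead of FLUSHDB.
# The "tiles:*" pattern is an assumed key naming scheme.
def keys_to_invalidate(keys: list[str], pattern: str = "tiles:*") -> list[str]:
    return [k for k in keys if fnmatch(k, pattern)]

keys = ["tiles:14/8411/5485", "tiles:12/2102/1371",
        "session:abc", "ratelimit:1.2.3.4"]
print(keys_to_invalidate(keys))
# ['tiles:14/8411/5485', 'tiles:12/2102/1371']

# Against a live instance this becomes SCAN + DEL, e.g.:
#   redis-cli -h redis --scan --pattern 'tiles:*' | xargs -r redis-cli -h redis DEL
```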
**Update frequency rationale:** Weekly updates balance data freshness against the computational cost of a full re-import. The Netherlands PBF is approximately 1.2 GB and takes roughly 30-45 minutes to process through all pipeline stages on an 8-core server with 16 GB RAM. More frequent updates (daily) are possible but increase server load. Less frequent updates (monthly) risk stale data, particularly for business opening hours and new roads.