Architecture
Overview
The Firetiger dataplane is a distributed system for ingesting, storing, and optimizing telemetry data using Apache Iceberg tables. The system consists of a set of cooperating services that together provide a complete data pipeline.
The catalog server manages Apache Iceberg table metadata using a REST catalog interface. It serves as the central metadata repository that tracks table schemas, partitions, and file locations. The ingest server accepts telemetry data via OTLP/HTTP and writes it to object storage in Apache Iceberg format. For distributed deployments, a dedicated commit server handles transaction commits to ensure data consistency across multiple ingest instances. Finally, the datafile optimizer runs periodic compaction jobs to merge small files into larger, more efficient ones.
The deployment follows a distributed architecture where each service runs independently. Clients send telemetry data to ingest servers, which coordinate with commit servers for transactional consistency. All services use the catalog server for metadata management and store data in object storage. The datafile optimizer periodically compacts files for efficiency.
When a client sends telemetry data to the ingest server via OTLP/HTTP, the server transforms it into Apache Iceberg format and coordinates with the commit server to ensure transactional consistency. The catalog server provides metadata management for the Iceberg tables, while the datafile optimizer runs periodic compaction jobs to maintain storage efficiency.
All services communicate over HTTP and store data in S3-compatible object storage. The system uses Apache Iceberg’s table format for ACID transactions, schema evolution, and efficient querying.
Dependency Graph
The following diagram shows the main components of the Firetiger dataplane and their dependency relationships. We explain each component in more detail below.
graph TB
Client[Client]
QueryServer[Query Server]
IngestServer[Ingest Server]
CommitServer[Commit Server]
ManifestOpt[Manifest Optimizer]
DatafileOpt[Datafile Optimizer]
StorageSweeper[Storage Sweeper]
PartitionSweeper[Partition Sweeper]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
Client --> QueryServer
Client --> IngestServer
QueryServer --> CatalogServer
IngestServer --> CatalogServer
IngestServer --> Storage
IngestServer --> CommitServer
QueryServer --> Storage
CommitServer --> CatalogServer
CommitServer --> Storage
CatalogServer --> Storage
PartitionSweeper --> CatalogServer
StorageSweeper --> CatalogServer
StorageSweeper --> Storage
DatafileOpt --> CommitServer
DatafileOpt --> CatalogServer
DatafileOpt --> Storage
ManifestOpt --> CommitServer
ManifestOpt --> CatalogServer
ManifestOpt --> Storage
style Client fill:#e1f5ff
style QueryServer fill:#fff4e1
style IngestServer fill:#fff4e1
style CommitServer fill:#fff4e1
style CatalogServer fill:#e8f5e9
style Storage fill:#f3e5f5
style ManifestOpt fill:#ffe1e1
style DatafileOpt fill:#ffe1e1
style StorageSweeper fill:#ffe1e1
style PartitionSweeper fill:#ffe1e1
Legend:
- Blue: Client applications
- Yellow: Ingest and query services
- Green: Catalog, metadata management
- Purple: Object storage
- Red: Background processes
Dependencies
Before deploying the Firetiger dataplane, you’ll need access to S3-compatible
object storage. Create a bucket (such as my-bucket) with read/write access and
ensure the Firetiger services have appropriate credentials. The bucket URI
format should be s3://my-bucket/firetiger/. The services require standard S3 permissions:
GetObject and PutObject for reading and writing data files, DeleteObject
for removing old files during optimization, and ListBucket for object
discovery.
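If you are running the containers locally, one common approach is to supply these credentials through the standard AWS SDK environment variables and forward them into each container. This is only a sketch: it assumes the Firetiger services resolve credentials via the default AWS credential chain; adjust it to however your deployment provides credentials (for example IAM roles or instance profiles).
export AWS_ACCESS_KEY_ID=...        # placeholder credentials with access to my-bucket
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
# forward them into the containers started in the sections below, for example:
#   docker run -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_REGION ...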
All services listen on port 4317 by default, though you can configure the listening address with the --http {addr}:{port} flag (see the example at the end of this section). Services use HTTP/1.1 for inter-service communication, OTLP/HTTP for telemetry data ingestion, and a REST API for catalog operations.
The services have dependencies on each other, so start them in this order: catalog server first (it only needs object storage access), then the commit server (requires the catalog server), followed by the ingest server (requires both catalog and commit servers), and finally the datafile optimizer (requires catalog and commit servers).
To allow the Docker containers to communicate with each other by name, first create a Docker network:
docker network create firetiger
Also, pick an initial Firetiger release version to use. Find a list of releases here.
export FIRETIGER_VERSION=2025-10-24
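As an example of overriding the default port, the catalog server could be started on port 8080 instead of 4317. This is a sketch: apart from the port mapping and the --http flag, the command matches the catalog server example in the next section.
docker run --rm \
-p 8080:8080 \
--name catalog-server \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run catalog server \
--http 0.0.0.0:8080 \
--catalog s3://my-bucket/firetiger/ \
--namespace firetiger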
Catalog
The catalog server is the central metadata repository for all Apache Iceberg tables in the system. It provides a REST API for managing table schemas, partitions, and file locations. All other services (ingest server, commit server, and datafile optimizer) depend on the catalog to coordinate their operations on shared table metadata.
graph TB
IngestServer[Ingest Server]
CommitServer[Commit Server]
ManifestOpt[Manifest Optimizer]
DatafileOpt[Datafile Optimizer]
StorageSweeper[Storage Sweeper]
PartitionSweeper[Partition Sweeper]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
CommitServer --> CatalogServer
IngestServer --> CatalogServer
ManifestOpt --> CatalogServer
DatafileOpt --> CatalogServer
StorageSweeper --> CatalogServer
PartitionSweeper --> CatalogServer
CatalogServer --> Storage
style CatalogServer fill:#e8f5e9
style Storage fill:#f3e5f5
To start the catalog server, use:
docker run --rm \
-p 4317:4317 \
--name catalog-server \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run catalog server \
--catalog s3://my-bucket/firetiger/ \
--namespace firetiger
This configures the catalog to store metadata in the specified S3 bucket and
sets firetiger as the default namespace for tables. The server exposes a REST
catalog API at /v1/ following the Apache Iceberg specification, allowing you
to list namespaces, list tables within a namespace, and create new tables.
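For a quick sanity check from the host, you can call the REST catalog API directly with curl. The paths below are the standard Apache Iceberg REST catalog routes; treating them as the exact routes of this server (beyond the /v1/ prefix noted above) is an assumption.
# list namespaces
curl http://localhost:4317/v1/namespaces

# list tables in the firetiger namespace
curl http://localhost:4317/v1/namespaces/firetiger/tables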
--catalog
For the catalog server itself, the --catalog flag specifies the backend where the catalog metadata is stored. In the example above, an S3 bucket serves as the backend (s3://...), but the flag can also point to a GCS bucket (gs://...), a local file system (file://...), a temporary in-memory catalog (memory://...), an AWS Glue catalog (glue://...), or another REST catalog server (http://... or https://...).
Other Firetiger services should be directed to this catalog server by setting
the --catalog flag to http://catalog-server:4317.
Query Server
The query server runs the query engine and provides FlightSQL and PromQL APIs for querying telemetry data.
graph TB
Client[Client]
QueryServer[Query Server]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
Client --> QueryServer
QueryServer --> CatalogServer
QueryServer --> Storage
style QueryServer fill:#e8f5e9
style CatalogServer fill:#f3e5f5
style Storage fill:#f3e5f5
To start the query server, use:
docker run --rm \
-p 4319:4317 \
--name query-server \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run query server \
--catalog http://catalog-server:4317 \
--namespace firetiger
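As a hypothetical example of the PromQL API, assuming it follows the standard Prometheus HTTP API routes (an assumption; the actual paths may differ, and the FlightSQL API requires a FlightSQL-capable client rather than curl):
# hypothetical PromQL instant query against the query server's mapped host port
curl "http://localhost:4319/api/v1/query?query=up"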
Ingest
The ingest layer consists of two services that work together to receive telemetry data and store it reliably in Apache Iceberg tables.
Commit Server
The commit server handles distributed transaction commits to ensure data consistency across the dataplane. It runs the same ingest server binary but acts as a dedicated transaction coordinator.
graph TB
IngestServer[Ingest Server]
CommitServer[Commit Server]
ManifestOpt[Manifest Optimizer]
DatafileOpt[Datafile Optimizer]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
CommitServer --> CatalogServer
CommitServer --> Storage
IngestServer --> CommitServer
ManifestOpt --> CommitServer
DatafileOpt --> CommitServer
style CommitServer fill:#e8f5e9
style CatalogServer fill:#f3e5f5
style Storage fill:#f3e5f5
To start the commit server, use:
docker run --rm \
-p 4318:4317 \
--name commit-server \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run ingest server \
--bucket s3://my-bucket/firetiger/ \
--catalog http://catalog-server:4317 \
--namespace firetiger
The commit server receives commit requests from ingest servers and coordinates them by merging transactions.
Ingest Server
The ingest server receives telemetry data via OTLP/HTTP and writes it to object storage through the commit server.
graph TB
Client[Client]
IngestServer[Ingest Server]
CommitServer[Commit Server]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
Client --> IngestServer
IngestServer --> CatalogServer
IngestServer --> CommitServer
IngestServer --> Storage
style IngestServer fill:#e8f5e9
style CommitServer fill:#f3e5f5
style CatalogServer fill:#f3e5f5
style Storage fill:#f3e5f5
To start the ingest server, use:
docker run --rm \
-p 4317:4317 \
--name ingest-server \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run ingest server \
--bucket s3://my-bucket/firetiger/ \
--catalog http://catalog-server:4317 \
--commit-endpoint http://commit-server:4317 \
--namespace firetiger
The ingest server exposes OTLP/HTTP endpoints at /v1/logs, /v1/metrics, and /v1/traces for receiving OpenTelemetry data. When data arrives, the server transforms it into Apache Iceberg format, writes data files to object storage, and sends a commit request to the commit server, which in turn updates table metadata via the catalog server. This ensures that the entire transaction is committed atomically.
The system automatically creates three tables (logs, metrics, and traces)
with appropriate schemas for structured log data, time-series metrics, and
distributed trace spans.
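For example, a minimal OTLP/HTTP JSON payload can be posted to the logs endpoint with curl. The payload follows the standard OTLP JSON encoding; the service name and log body here are arbitrary examples.
curl -X POST http://localhost:4317/v1/logs \
-H "Content-Type: application/json" \
-d '{
  "resourceLogs": [{
    "resource": {
      "attributes": [{"key": "service.name", "value": {"stringValue": "example-service"}}]
    },
    "scopeLogs": [{
      "logRecords": [{
        "timeUnixNano": "1730000000000000000",
        "severityText": "INFO",
        "body": {"stringValue": "hello from curl"}
      }]
    }]
  }]
}'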
Metadata Compaction
The metadata layer manages table metadata and schema evolution. The manifest optimizer runs periodically to compact manifests, keeping them few in number and well organized. In a busy Firetiger deployment, run the manifest optimizer every 1-5 minutes.
graph TB
ManifestOpt[Manifest Optimizer]
CommitServer[Commit Server]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
ManifestOpt --> CatalogServer
ManifestOpt --> CommitServer
ManifestOpt --> Storage
style ManifestOpt fill:#e8f5e9
style CommitServer fill:#f3e5f5
style CatalogServer fill:#f3e5f5
style Storage fill:#f3e5f5
To start the manifest optimizer, use:
docker run --rm \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run manifest optimizer \
--catalog http://catalog-server:4317 \
--commit-endpoint http://commit-server:4317 \
--namespace firetiger
Data Compaction
The datafile optimization layer improves storage efficiency by compacting small files into larger ones, reducing metadata overhead and improving query performance. This happens in two phases: planning and execution.
graph TB
DatafileOpt[Datafile Optimizer]
CatalogServer[Catalog Server]
CommitServer[Commit Server]
Storage[(Object Storage<br/>S3/GCS)]
DatafileOpt --> CatalogServer
DatafileOpt --> CommitServer
DatafileOpt --> Storage
style DatafileOpt fill:#e8f5e9
style CatalogServer fill:#f3e5f5
style CommitServer fill:#f3e5f5
style Storage fill:#f3e5f5
Planning Optimization
The first step generates optimization plans that identify which files should be merged together. Run the planner:
docker run --rm \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
plan datafile optimizer \
--catalog http://catalog-server:4317 \
--namespace firetiger \
--output s3://my-bucket/firetiger/optimizer/plans/
The planner analyzes each table’s data files, groups them by partition and time fields, identifies small files that can be merged efficiently, and creates merge plans that balance file size and merge complexity. Plans are stored as Apache Iceberg manifest files in the specified output path, with each plan containing the list of input files to be merged, target output file specifications, and metadata about the optimization operation.
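Once the planner has finished, the emitted plan files can be inspected directly in object storage, for example with the AWS CLI (assuming it is installed and credentialed; plan file names such as plan-001.avro are illustrative):
# list the plans produced for the logs table under the --output path
aws s3 ls s3://my-bucket/firetiger/optimizer/plans/logs/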
Executing Optimization
The second step executes the optimization plans by reading input files, merging them, and updating table metadata:
docker run --rm \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run datafile optimizer \
--catalog http://catalog-server:4317 \
--commit-endpoint http://commit-server:4317 \
--delete always \
--manifest s3://my-bucket/firetiger/optimizer/plans/logs/plan-001.avro
The optimizer loads the optimization plan from object storage, reads all input data files specified in the plan, merges and sorts the data according to the table’s sort order, writes optimized data files in efficient Parquet format, and commits the transaction via the commit server. The result is Parquet files with efficient columnar layout for analytical queries, optimized page sizes for memory usage, and column statistics for query pruning.
For production deployments, run optimization on a schedule with a planning phase every few minutes (firetiger plan datafile optimizer), followed by an execution phase (firetiger run datafile optimizer) for each plan emitted by the planner. This two-phase approach allows for flexible scheduling and resource allocation, separating the planning overhead from the compute-intensive optimization work.
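A minimal sketch of one such cycle is shown below. It assumes the AWS CLI is available for discovering the emitted plan files and that plans are written as .avro files under the --output path used above.
#!/bin/sh
# one optimization cycle: plan, then execute every plan the planner emitted
PLANS=s3://my-bucket/firetiger/optimizer/plans

docker run --rm --network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
plan datafile optimizer \
--catalog http://catalog-server:4317 \
--namespace firetiger \
--output ${PLANS}/

# listing plan files with the AWS CLI is an assumption; use whatever
# object-storage tooling fits your deployment
for key in $(aws s3 ls --recursive ${PLANS}/ | awk '{print $4}' | grep '\.avro$'); do
docker run --rm --network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run datafile optimizer \
--catalog http://catalog-server:4317 \
--commit-endpoint http://commit-server:4317 \
--delete always \
--manifest "s3://my-bucket/${key}"
done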
Sweepers
The sweeper layer runs periodic tasks to clean up old data files and metadata entries. There are two sweepers: one for storage and one for the table catalog.
Storage Sweeper
The storage sweeper removes unreferenced data files and metadata files from object storage. It should be run every 1-24 hours and sweeps both metadata and data files in a single run to ensure consistency.
graph TB
StorageSweeper[Storage Sweeper]
CatalogServer[Catalog Server]
Storage[(Object Storage<br/>S3/GCS)]
StorageSweeper --> CatalogServer
StorageSweeper --> Storage
style StorageSweeper fill:#e8f5e9
style CatalogServer fill:#f3e5f5
style Storage fill:#f3e5f5
To run the storage sweeper, use:
docker run --rm \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run storage sweeper \
--catalog http://catalog-server:4317 \
--sweep-metadata true \
--sweep-data true
The storage sweeper first sweeps unreferenced metadata files (manifest lists, manifests, and old table metadata JSON files), then sweeps unreferenced data files (Parquet files). By sweeping both types in a single run using the same table metadata view, it avoids race conditions where data files could be mistakenly removed while referenced by new manifests.
Partition Sweeper
The partition sweeper enforces data retention by removing old partitions from
the catalog. Set the --min-retention flag to the desired retention period. It
should be run every 1-24 hours.
graph TB
PartitionSweeper[Partition Sweeper]
CatalogServer[Catalog Server]
PartitionSweeper --> CatalogServer
style PartitionSweeper fill:#e8f5e9
style CatalogServer fill:#f3e5f5
To run the partition sweeper, use:
docker run --rm \
--network firetiger \
public.ecr.aws/firetiger/firetiger:${FIRETIGER_VERSION} \
run partition sweeper \
--catalog http://catalog-server:4317 \
--commit-endpoint http://commit-server:4317 \
--min-retention 720h
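Pulling the recommended cadences together, a crontab along the following lines could drive the background jobs. This is a sketch only: the intervals sit within the ranges suggested above, and --min-retention 720h keeps roughly 30 days of data.
FIRETIGER_VERSION=2025-10-24

# manifest optimizer: every 5 minutes (recommended: every 1-5 minutes)
*/5 * * * * docker run --rm --network firetiger public.ecr.aws/firetiger/firetiger:$FIRETIGER_VERSION run manifest optimizer --catalog http://catalog-server:4317 --commit-endpoint http://commit-server:4317 --namespace firetiger

# storage sweeper: daily (recommended: every 1-24 hours)
0 3 * * * docker run --rm --network firetiger public.ecr.aws/firetiger/firetiger:$FIRETIGER_VERSION run storage sweeper --catalog http://catalog-server:4317 --sweep-metadata true --sweep-data true

# partition sweeper: daily (recommended: every 1-24 hours)
0 4 * * * docker run --rm --network firetiger public.ecr.aws/firetiger/firetiger:$FIRETIGER_VERSION run partition sweeper --catalog http://catalog-server:4317 --commit-endpoint http://commit-server:4317 --min-retention 720h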