CloudFront to Firetiger Integration via Kinesis Data Firehose
This guide explains how to configure AWS CloudFront real-time logs to stream to Firetiger using Kinesis Data Streams and Kinesis Data Firehose.
Prerequisites
- AWS Account with CloudFront distributions
- Firetiger deployment with ingest endpoint accessible from AWS
- IAM permissions to create Kinesis resources and modify CloudFront configurations
- Firetiger API credentials for authentication
Architecture Overview
CloudFront → Kinesis Data Stream → Kinesis Data Firehose → Firetiger Ingest API
Step 1: Create Kinesis Data Stream
First, create a Kinesis Data Stream to receive CloudFront real-time logs:
aws kinesis create-stream \
  --stream-name cloudfront-logs-stream \
  --shard-count 1 \
  --region us-east-1
Wait for the stream to become active:
aws kinesis describe-stream \
  --stream-name cloudfront-logs-stream \
  --region us-east-1
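Rather than re-running describe-stream by hand, the AWS CLI's built-in waiter can block until the stream is ACTIVE. The following is a minimal sketch; the function name wait_for_stream is illustrative and not part of this guide's required setup.

```shell
# Hypothetical helper: block until the stream is ACTIVE, then print its status.
wait_for_stream() {
  stream_name="$1"
  region="${2:-us-east-1}"

  # `wait stream-exists` polls DescribeStream until StreamStatus is ACTIVE.
  aws kinesis wait stream-exists \
    --stream-name "$stream_name" \
    --region "$region" || return 1

  # Print the final status as a sanity check.
  aws kinesis describe-stream-summary \
    --stream-name "$stream_name" \
    --region "$region" \
    --query 'StreamDescriptionSummary.StreamStatus' \
    --output text
}

# Usage: wait_for_stream cloudfront-logs-stream us-east-1
```

The waiter gives up after a bounded number of polls, so a non-zero exit status here usually means the stream name or region is wrong.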
Step 2: Configure CloudFront Real-time Logs
Via AWS Console
- Navigate to CloudFront in AWS Console
- Select your distribution
- Go to the “Telemetry” tab
- Under “Real-time logs”, click “Create configuration”
- Configure the following:
  - Name: firetiger-realtime-logs
  - Sampling rate: 100 (adjust based on volume)
  - Fields: Select all fields or customize based on your needs
  - Endpoint: Select “Kinesis Data Streams”
  - Stream: Select cloudfront-logs-stream
  - IAM Role: Create new role with Kinesis write permissions
Via AWS CLI
# The Kinesis stream and role are passed through --end-points; --fields takes a space-separated list.
aws cloudfront create-realtime-log-config \
  --name firetiger-realtime-logs \
  --sampling-rate 100 \
  --end-points 'StreamType=Kinesis,KinesisStreamConfig={StreamARN=arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream,RoleARN=arn:aws:iam::ACCOUNT_ID:role/CloudFrontRealtimeLogRole}' \
  --fields timestamp c-ip s-ip time-to-first-byte sc-status sc-bytes cs-method cs-protocol cs-host cs-uri-stem cs-bytes x-edge-location x-edge-request-id x-host-header time-taken cs-protocol-version c-ip-version cs-user-agent cs-referer cs-cookie cs-uri-query x-edge-response-result-type x-forwarded-for ssl-protocol ssl-cipher x-edge-result-type c-country
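The CLI command above references a role named CloudFrontRealtimeLogRole without defining it. As a sketch based on the permissions AWS documents for real-time log delivery, that role trusts cloudfront.amazonaws.com and is allowed to describe and write to the stream. The two function names below are illustrative.

```shell
# Hypothetical helper: trust policy letting CloudFront assume the logging role.
cloudfront_log_role_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cloudfront.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Hypothetical helper: permissions policy scoped to one Kinesis stream ARN.
cloudfront_log_role_policy() {
  stream_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStreamSummary",
        "kinesis:DescribeStream",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": "$stream_arn"
    }
  ]
}
EOF
}

# Usage:
# cloudfront_log_role_trust_policy > cf-trust.json
# cloudfront_log_role_policy arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream > cf-policy.json
```

The console flow in this step creates an equivalent role for you, so this is only needed when working purely from the CLI.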
Step 3: Create IAM Role for Kinesis Data Firehose
Create an IAM role that allows Kinesis Data Firehose to read from the stream:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Attach the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogGroup",
        "logs:CreateLogStream"
      ],
      "Resource": "*"
    }
  ]
}
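One way to turn the two documents above into a role is sketched below. It assumes they were saved as firehose-trust.json and firehose-policy.json (both file names, the function name, and the inline policy name are illustrative); the role name firehose-role matches the ARN used in Step 4.

```shell
# Hypothetical helper: create the Firehose role from the two JSON documents above.
create_firehose_role() {
  role_name="${1:-firehose-role}"

  # The trust policy lets firehose.amazonaws.com assume the role.
  aws iam create-role \
    --role-name "$role_name" \
    --assume-role-policy-document file://firehose-trust.json || return 1

  # Attach the Kinesis read and CloudWatch Logs permissions inline.
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name firehose-kinesis-read \
    --policy-document file://firehose-policy.json
}

# Usage: create_firehose_role firehose-role
```

Step 4 reuses this role for the S3 backup bucket, so it will probably also need write access to that bucket, which the policy above does not grant.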
Step 4: Create Kinesis Data Firehose Delivery Stream
To configure the Delivery Stream, you’ll need your organization’s Firetiger Ingest username and password credentials.
These can be found in the Firetiger UI on the /settings page: https://ui.{deployment}.firetigerapi.com/settings (replace {deployment} with your deployment name).
Substitute those values for $FIRETIGER_USERNAME and $FIRETIGER_PASSWORD in the following commands:
Via AWS CLI
export ACCESS_KEY=$(echo -n "$FIRETIGER_USERNAME:$FIRETIGER_PASSWORD" | base64)
# The JSON arguments are single-quoted, so $ACCESS_KEY is spliced in by closing and reopening the quotes.
aws firehose create-delivery-stream \
  --delivery-stream-name cloudfront-to-firetiger \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration '{
    "KinesisStreamARN": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream",
    "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
  }' \
  --http-endpoint-destination-configuration '{
    "EndpointConfiguration": {
      "Url": "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,s-ip,time-to-first-byte,sc-status,sc-bytes,cs-method,cs-protocol,cs-host,cs-uri-stem,cs-bytes,x-edge-location,x-edge-request-id,x-host-header,time-taken,cs-protocol-version,c-ip-version,cs-user-agent,cs-referer,cs-cookie,cs-uri-query,x-edge-response-result-type,x-forwarded-for,ssl-protocol,ssl-cipher,x-edge-result-type,c-country",
      "Name": "Firetiger",
      "AccessKey": "'"$ACCESS_KEY"'"
    },
    "BufferingHints": {
      "IntervalInSeconds": 60,
      "SizeInMBs": 1
    },
    "RequestConfiguration": {
      "ContentEncoding": "GZIP"
    },
    "RetryOptions": {
      "DurationInSeconds": 3600
    },
    "S3Configuration": {
      "BucketARN": "arn:aws:s3:::your-backup-bucket",
      "Prefix": "failed-records/",
      "ErrorOutputPrefix": "error-records/",
      "CompressionFormat": "GZIP",
      "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
    }
  }'
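Because a malformed AccessKey only shows up later as an authentication failure, it can help to build and inspect the key separately. The sketch below uses two illustrative function names; it relies only on printf, base64, and tr.

```shell
# Hypothetical helper: build the Firehose AccessKey from ingest credentials.
make_access_key() {
  # printf avoids the portability quirks of `echo -n`; tr strips the line
  # wraps that some base64 implementations insert into long output.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Hypothetical check: decode a key back to username:password for inspection.
show_access_key() {
  printf '%s' "$1" | base64 --decode
}

# Usage:
# ACCESS_KEY=$(make_access_key "$FIRETIGER_USERNAME" "$FIRETIGER_PASSWORD")
# show_access_key "$ACCESS_KEY"
```

If the decoded value ends with a bare colon, the password variable was empty or misspelled when the key was built.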
Configuration Parameters
- Url: Your Firetiger ingest endpoint, https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis
- AccessKey: Your basic-auth Firetiger ingest credentials, as shown above
- BufferingHints:
  - IntervalInSeconds: How often to send data (60-900 seconds)
  - SizeInMBs: Buffer size before sending (1-128 MB)
- ContentEncoding (under RequestConfiguration): Use GZIP to reduce bandwidth
- S3Configuration: Backup location for failed records
Step 5: Attach CloudFront Distribution to Real-time Log Configuration
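A real-time log configuration is attached per cache behavior, and update-distribution has no dedicated flag for it. Fetch the current distribution config, set RealtimeLogConfigArn to arn:aws:cloudfront::ACCOUNT_ID:realtime-log-config/firetiger-realtime-logs in DefaultCacheBehavior (and in any other cache behavior that should emit logs), save the DistributionConfig object to a file, and submit it together with the ETag returned by the first call. The file name and ETag placeholder below are illustrative:
aws cloudfront get-distribution-config --id YOUR_DISTRIBUTION_ID
aws cloudfront update-distribution \
  --id YOUR_DISTRIBUTION_ID \
  --if-match ETAG_FROM_GET_DISTRIBUTION_CONFIG \
  --distribution-config file://distribution-config.json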
Step 6: Verify Data Flow
After configuration, verify that logs are flowing:
- Generate some traffic to your CloudFront distribution
- Monitor Kinesis Data Stream metrics in CloudWatch
- Check Kinesis Data Firehose metrics for successful deliveries
- Query your data in Firetiger to confirm ingestion
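The CloudWatch checks above can also be run from the CLI. The following sketch wraps get-metric-statistics; the function name and the example timestamps are illustrative, while IncomingRecords (AWS/Kinesis) and DeliveryToHttpEndpoint.Records (AWS/Firehose) are standard CloudWatch metric names.

```shell
# Hypothetical helper: sum a CloudWatch metric over a time window (5-minute buckets).
metric_sum() {
  namespace="$1"; metric="$2"
  dim_name="$3"; dim_value="$4"
  start_time="$5"; end_time="$6"

  aws cloudwatch get-metric-statistics \
    --namespace "$namespace" \
    --metric-name "$metric" \
    --dimensions "Name=$dim_name,Value=$dim_value" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period 300 \
    --statistics Sum \
    --query 'Datapoints[].Sum' \
    --output text
}

# Usage (ISO 8601 timestamps, UTC):
# metric_sum AWS/Kinesis IncomingRecords StreamName cloudfront-logs-stream \
#   2024-12-04T00:00:00Z 2024-12-04T00:15:00Z
# metric_sum AWS/Firehose DeliveryToHttpEndpoint.Records DeliveryStreamName cloudfront-to-firetiger \
#   2024-12-04T00:00:00Z 2024-12-04T00:15:00Z
```

Records arriving in the stream but not at the HTTP endpoint point at the Firehose configuration or credentials rather than at CloudFront.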
Troubleshooting
Common Issues
- Authentication Failures
  - Verify that the AccessKey is the Base64 encoding of your Firetiger ingest username:password
  - Ensure the endpoint URL includes the correct protocol (https)
- No Data Flowing
  - Check CloudWatch Logs for Kinesis Data Firehose error messages
  - Verify IAM roles have correct permissions
  - Ensure CloudFront distribution is attached to the real-time log configuration
- High Error Rate
  - Check the S3 backup bucket for failed records
  - Review error messages in CloudWatch Logs
  - Verify endpoint is accessible from AWS
Advanced Configuration
Custom Field Selection
CloudFront allows you to customize which fields to include in real-time logs. This is useful for reducing data volume and costs. You can configure this when creating or updating your CloudFront real-time log configuration:
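# An update replaces the whole configuration, so resend the sampling rate and endpoint along with the fields.
aws cloudfront update-realtime-log-config \
  --name firetiger-realtime-logs \
  --sampling-rate 100 \
  --end-points 'StreamType=Kinesis,KinesisStreamConfig={StreamARN=arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream,RoleARN=arn:aws:iam::ACCOUNT_ID:role/CloudFrontRealtimeLogRole}' \
  --fields timestamp c-ip sc-status cs-method cs-uri-stem x-edge-location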
Important: When you customize CloudFront fields, you must also update your Kinesis Data Firehose endpoint URL to specify which fields you’re sending. This ensures Firetiger parses the log records correctly.
Configuring Firetiger for Custom Fields
When configuring custom CloudFront fields, add a ?fields= query parameter to your Firehose endpoint URL that lists the fields in the same order as your CloudFront configuration:
export ACCESS_KEY=$(echo -n "$FIRETIGER_USERNAME:$FIRETIGER_PASSWORD" | base64)
# The JSON arguments are single-quoted, so $ACCESS_KEY is spliced in by closing and reopening the quotes.
aws firehose create-delivery-stream \
  --delivery-stream-name cloudfront-to-firetiger \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration '{
    "KinesisStreamARN": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream",
    "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
  }' \
  --http-endpoint-destination-configuration '{
    "EndpointConfiguration": {
      "Url": "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,sc-status,cs-method,cs-uri-stem,x-edge-location",
      "Name": "Firetiger",
      "AccessKey": "'"$ACCESS_KEY"'"
    },
    ...
  }'
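Since the CloudFront field list and the ?fields= parameter must stay in sync, one option is to keep the list in a single variable and derive the URL from it. This is a minimal sketch; fields_url and FIELDS are illustrative names.

```shell
# Hypothetical helper: build the ingest URL from a base URL and a list of fields.
fields_url() {
  base_url="$1"
  shift
  # "$*" joins the remaining arguments with spaces; tr turns them into commas.
  joined=$(printf '%s' "$*" | tr ' ' ',')
  printf '%s?fields=%s\n' "$base_url" "$joined"
}

# Single source of truth for the selected fields (space-separated).
FIELDS="timestamp c-ip sc-status cs-method cs-uri-stem x-edge-location"

# Unquoted on purpose: word splitting yields one argument per field.
fields_url "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis" $FIELDS
# prints https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,sc-status,cs-method,cs-uri-stem,x-edge-location
```

The same $FIELDS value can be passed unquoted to the CloudFront CLI's --fields option, which takes a space-separated list.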
Field Mapping Examples
Default (All Fields):
If you don’t specify a ?fields= parameter, Firetiger expects all 45 standard CloudFront fields in canonical order. This is the recommended configuration for most use cases.
https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis
Minimal Fields:
For cost-sensitive deployments, you can send only essential fields:
https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,sc-status,cs-method,cs-uri-stem
Custom Fields:
Select specific fields based on your analytics needs:
https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,cs-method,cs-host,cs-uri-stem,sc-status,sc-bytes,time-taken,cs-user-agent,x-edge-location,x-edge-response-result-type
Important Notes:
- Field names must match CloudFront field names exactly (e.g., timestamp, c-ip, sc-status)
- Field order must match your CloudFront real-time log configuration exactly; CloudFront always sends fields in canonical order
- Changing fields: If you add or remove fields in your CloudFront configuration, you must update your Kinesis Data Firehose URL to match the new field list. AWS automatically reorders fields in canonical order when you modify the configuration.
- Field names are comma-separated with no spaces
- The query parameter can handle all 45 standard fields
- If you omit fields from your CloudFront configuration, those attributes will not be populated in Firetiger
Example of field ordering:
- Initial CloudFront config: time-to-first-byte, sc-status, c-country (3 fields)
- Firehose URL: ?fields=time-to-first-byte,sc-status,c-country
- Later you add sc-bytes and time-taken to CloudFront
- CloudFront automatically reorders to: time-to-first-byte, sc-status, sc-bytes, time-taken, c-country (canonical order)
- You must update Firehose URL to: ?fields=time-to-first-byte,sc-status,sc-bytes,time-taken,c-country
- The c-country field moves from position 3 to position 5 in the log records
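The reordering in this example can be reproduced mechanically. The sketch below sorts any selection of fields into the canonical order of fields 1-45 as listed in the "CloudFront Log Format Details" section of this guide; the function and variable names are illustrative, and names not in that list are silently dropped.

```shell
# Canonical order of fields 1-45, copied from the field list in this guide.
CANONICAL_FIELDS="timestamp c-ip s-ip time-to-first-byte sc-status sc-bytes
cs-method cs-protocol cs-host cs-uri-stem cs-bytes x-edge-location
x-edge-request-id x-host-header time-taken cs-protocol-version c-ip-version
cs-user-agent cs-referer cs-cookie cs-uri-query x-edge-response-result-type
x-forwarded-for ssl-protocol ssl-cipher x-edge-result-type fle-encrypted-fields
fle-status sc-content-type sc-content-len sc-range-start sc-range-end c-port
x-edge-detailed-result-type c-country cs-accept-encoding cs-accept
cache-behavior-path-pattern cs-headers cs-header-names cs-headers-count
primary-distribution-id primary-distribution-dns-name origin-fbl origin-lbl"

# Hypothetical helper: print the ?fields= value for the given fields in canonical order.
canonical_fields_param() {
  result=""
  # Walk the canonical list and keep each field that was requested.
  for field in $CANONICAL_FIELDS; do
    for wanted in "$@"; do
      if [ "$field" = "$wanted" ]; then
        result="${result:+$result,}$field"
        break
      fi
    done
  done
  printf '?fields=%s\n' "$result"
}

canonical_fields_param c-country time-taken sc-status sc-bytes time-to-first-byte
# prints ?fields=time-to-first-byte,sc-status,sc-bytes,time-taken,c-country
```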
Multi-Region Setup
For global distributions, consider:
- Creating Kinesis streams in multiple regions
- Using cross-region replication
- Configuring regional Firehose delivery streams
CloudFront Log Format Details
Real-time Log Fields
CloudFront real-time logs are delivered as tab-delimited records with 40-69 fields (depending on configuration). Each record in the Kinesis Data Firehose payload is base64-encoded.
Note: AWS has expanded the CloudFront real-time log format over time. Firetiger supports:
- Fields 1-45: Fully parsed with structured field names
- Fields 46-69+: Gracefully ignored (CMCD media streaming fields and future extensions)
The fields are:
Core Fields (1-41)
- timestamp - Unix timestamp with milliseconds (e.g., 1733270958.123)
- c-ip - Client IP address
- s-ip - CloudFront server IP address
- time-to-first-byte - Time to first byte in seconds
- sc-status - HTTP status code
- sc-bytes - Response size in bytes
- cs-method - HTTP method (GET, POST, etc.)
- cs-protocol - Protocol (http/https)
- cs-host - Host header value
- cs-uri-stem - URI path
- cs-bytes - Request size in bytes
- x-edge-location - CloudFront edge location code
- x-edge-request-id - Unique request identifier
- x-host-header - Host header sent to origin
- time-taken - Total time taken in seconds
- cs-protocol-version - HTTP protocol version
- c-ip-version - IP version (IPv4/IPv6)
- cs-user-agent - User agent string
- cs-referer - Referer header
- cs-cookie - Cookie header
- cs-uri-query - Query string
- x-edge-response-result-type - Cache result (Hit, Miss, Error)
- x-forwarded-for - X-Forwarded-For header
- ssl-protocol - SSL/TLS protocol version
- ssl-cipher - SSL/TLS cipher suite
- x-edge-result-type - How request was classified
- fle-encrypted-fields - Field-level encryption
- fle-status - Field-level encryption status
- sc-content-type - Response content type
- sc-content-len - Response content length
- sc-range-start - Range request start
- sc-range-end - Range request end
- c-port - Client port
- x-edge-detailed-result-type - Detailed result type
- c-country - Client country code
- cs-accept-encoding - Accept-Encoding header
- cs-accept - Accept header
- cache-behavior-path-pattern - Cache behavior pattern
- cs-headers - Custom headers
- cs-header-names - Custom header names
- cs-headers-count - Count of custom headers
Extended Fields (42-45, added October 2022)
- primary-distribution-id - Primary distribution identifier
- primary-distribution-dns-name - Primary distribution DNS name
- origin-fbl - Origin first-byte latency (time from edge to origin’s first byte, in seconds)
- origin-lbl - Origin last-byte latency (time from edge to origin’s last byte, in seconds)
CMCD and Extended Fields (46-69+, added April 2024)
Fields 46 and beyond are gracefully ignored but accepted in log records. The known CMCD field names are:
- asn - Autonomous system number
- c-asn - Client autonomous system number
- cmcd-buffer-length - CMCD buffer length (milliseconds)
- cmcd-buffer-starvation - CMCD buffer starvation indicator
- cmcd-content-id - CMCD content identifier
- cmcd-deadline - CMCD playback deadline
- cmcd-encoded-bitrate - CMCD encoded bitrate (kbps)
- cmcd-measured-throughput - CMCD measured throughput (kbps)
- cmcd-next-object-request - CMCD next object request
- cmcd-next-range-request - CMCD next range request
- cmcd-object-duration - CMCD object duration (milliseconds)
- cmcd-object-type - CMCD object type (m=manifest, a=audio, v=video, etc.)
- cmcd-playback-rate - CMCD playback rate
- cmcd-requested-maximum-throughput - CMCD requested max throughput
- cmcd-startup - CMCD startup indicator
- cmcd-stream-type - CMCD stream type (v=VOD, l=live)
- cmcd-streaming-format - CMCD streaming format (d=DASH, h=HLS, etc.)
- cmcd-top-bitrate - CMCD top bitrate (kbps)
- cmcd-version - CMCD version
- r-host - Request host
- sc-range-count - Range request count
- sc-response-body-time - Response body time
- sr-reason - Server reason code
- x-sc-response-body-time - Extended response body time
These fields are only present if configured in your CloudFront real-time log configuration. CMCD fields are primarily used for media streaming analytics and are sent by compatible media players. Firetiger accepts but does not parse these fields.
For complete documentation, see the AWS CloudFront Real-time Logs documentation.
Field Value Conventions
- Fields with no value are represented as - (hyphen)
- Numeric fields use standard decimal notation
- Timestamps use Unix epoch format with decimal seconds (e.g., 1733270958.123)
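To inspect a single record by hand, the conventions above can be applied with base64 and awk. This is a sketch for debugging only, not a description of how Firetiger parses records; decode_record is an illustrative name, and the field names passed to it must match the order configured in CloudFront.

```shell
# Hypothetical helper: decode one base64 record and label its columns.
# Usage: decode_record BASE64_DATA FIELD_NAME...
decode_record() {
  data="$1"
  shift
  printf '%s' "$data" | base64 --decode | awk -F '\t' -v names="$*" '
    BEGIN { n = split(names, name, " ") }
    {
      # Label columns positionally; columns beyond the given names are ignored.
      for (i = 1; i <= NF && i <= n; i++) {
        value = ($i == "-") ? "" : $i   # a lone hyphen means "no value"
        printf "%s=%s\n", name[i], value
      }
    }'
}

# Usage:
# decode_record "$RECORD_DATA" timestamp c-ip sc-status
```

Each output line is name=value, with an empty value where CloudFront sent a hyphen.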
Kinesis Data Firehose Request Format
The HTTP request from Kinesis Data Firehose follows this structure:
{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958000,
  "records": [
    {
      "data": "MTczMzI3MDk1OC4xMjMJMTkyLjE2OC4xLjEJ..."
    }
  ]
}
- requestId: Unique identifier for the Firehose request (matches X-Amz-Firehose-Request-Id header)
- timestamp: Unix timestamp in milliseconds when the request was created
- records: Array of base64-encoded CloudFront log records
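For testing an endpoint without waiting on real traffic, a request body of this shape can be assembled by hand. The sketch below is illustrative: firehose_body is a made-up name, the body is sent uncompressed, and whether a hand-built request is accepted by your Firetiger deployment is an assumption to verify.

```shell
# Hypothetical helper: wrap one tab-delimited log line in a Firehose-style body.
firehose_body() {
  request_id="$1"
  timestamp_ms="$2"
  log_line="$3"
  # Records are base64-encoded; strip any line wraps base64 may add.
  data=$(printf '%s' "$log_line" | base64 | tr -d '\n')
  printf '{"requestId":"%s","timestamp":%s,"records":[{"data":"%s"}]}\n' \
    "$request_id" "$timestamp_ms" "$data"
}

# Usage (X-Amz-Firehose-Access-Key is the header Firehose uses for the AccessKey):
# body=$(firehose_body test-request-1 1733270958000 "$(printf '1733270958.123\t192.168.1.1')")
# curl -sS -X POST "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip" \
#   -H "Content-Type: application/json" \
#   -H "X-Amz-Firehose-Request-Id: test-request-1" \
#   -H "X-Amz-Firehose-Access-Key: $ACCESS_KEY" \
#   --data "$body"
```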
Expected Response Format
Firetiger must respond with the following format:
Success (200 OK):
{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958123
}
Error (4xx/5xx):
{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958123,
  "errorMessage": "Error description"
}
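When testing by hand, a response body can be checked against these two shapes with grep. This is a rough sketch (check_firehose_response is an illustrative name) that matches text patterns rather than parsing JSON, so it suits quick checks only.

```shell
# Hypothetical helper: classify a response body.
# Returns 0 on success, 1 if requestId is missing or different, 2 if errorMessage is present.
check_firehose_response() {
  request_id="$1"
  body="$2"

  # The response must echo the requestId of the request it answers.
  printf '%s\n' "$body" |
    grep -Eq "\"requestId\"[[:space:]]*:[[:space:]]*\"$request_id\"" || return 1

  # An errorMessage key marks the delivery as failed.
  if printf '%s\n' "$body" | grep -q '"errorMessage"'; then
    return 2
  fi
  return 0
}

# Usage:
# check_firehose_response test-request-1 "$response_body" && echo "delivery accepted"
```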