CloudFront to Firetiger Integration via Kinesis Data Firehose

This guide explains how to configure AWS CloudFront real-time logs to stream to Firetiger using Kinesis Data Streams and Kinesis Data Firehose.

Prerequisites

  • AWS Account with CloudFront distributions
  • Firetiger deployment with ingest endpoint accessible from AWS
  • IAM permissions to create Kinesis resources and modify CloudFront configurations
  • Firetiger API credentials for authentication

Architecture Overview

CloudFront → Kinesis Data Stream → Kinesis Data Firehose → Firetiger Ingest API

Step 1: Create Kinesis Data Stream

First, create a Kinesis Data Stream to receive CloudFront real-time logs:

aws kinesis create-stream \
  --stream-name cloudfront-logs-stream \
  --shard-count 1 \
  --region us-east-1

Wait for the stream to become active. The stream-exists waiter polls the stream and returns once its status is ACTIVE:

aws kinesis wait stream-exists \
  --stream-name cloudfront-logs-stream \
  --region us-east-1
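A single shard accepts up to 1,000 records or 1 MB per second, so one shard may not be enough for a busy distribution. The sketch below estimates a shard count from peak request rate, sampling rate, and average record size. The function name and the 1,000-byte default record size are illustrative assumptions, not measured values; check the size of your own records before relying on the result.

```shell
# Estimate Kinesis shards for CloudFront real-time logs.
# Usage: estimate_shards PEAK_REQUESTS_PER_SEC SAMPLING_PERCENT [AVG_RECORD_BYTES]
# AVG_RECORD_BYTES defaults to 1000, an assumed figure.
estimate_shards() {
  rps=$1
  sampling=$2
  record_bytes=${3:-1000}

  # Records per second after sampling, rounded up.
  records=$(( (rps * sampling + 99) / 100 ))
  # Per-shard write limits: 1,000 records/s and 1 MB/s
  # (counted conservatively here as 1,000,000 bytes).
  by_records=$(( (records + 999) / 1000 ))
  by_bytes=$(( (records * record_bytes + 999999) / 1000000 ))

  shards=$by_records
  if [ "$by_bytes" -gt "$shards" ]; then shards=$by_bytes; fi
  if [ "$shards" -lt 1 ]; then shards=1; fi
  echo "$shards"
}
```

For example, `estimate_shards 5000 100` covers 5,000 requests per second at a 100% sampling rate; pass the result to --shard-count when creating the stream.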

Step 2: Configure CloudFront Real-time Logs

Via AWS Console

  1. Navigate to CloudFront in the AWS Console
  2. Select your distribution
  3. Go to the “Telemetry” tab
  4. Under “Real-time logs”, click “Create configuration”
  5. Configure the following:
    • Name: firetiger-realtime-logs
    • Sampling rate: 100 (adjust based on volume)
    • Fields: Select all fields or customize based on your needs
    • Endpoint: Select “Kinesis Data Streams”
    • Stream: Select cloudfront-logs-stream
    • IAM Role: Create new role with Kinesis write permissions

Via AWS CLI

aws cloudfront create-realtime-log-config \
  --name firetiger-realtime-logs \
  --end-points 'StreamType=Kinesis,KinesisStreamConfig={RoleARN=arn:aws:iam::ACCOUNT_ID:role/CloudFrontRealtimeLogRole,StreamARN=arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream}' \
  --fields timestamp c-ip s-ip time-to-first-byte sc-status sc-bytes cs-method cs-protocol cs-host cs-uri-stem cs-bytes x-edge-location x-edge-request-id x-host-header time-taken cs-protocol-version c-ip-version cs-user-agent cs-referer cs-cookie cs-uri-query x-edge-response-result-type x-forwarded-for ssl-protocol ssl-cipher x-edge-result-type c-country \
  --sampling-rate 100
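The command above references a role named CloudFrontRealtimeLogRole, which CloudFront assumes in order to write into the stream. A minimal sketch of creating that role follows, assuming the stream ARN from Step 1. The file names and the policy name cloudfront-kinesis-write are illustrative. The aws iam calls sit inside a function so that nothing is created until you run it.

```shell
# Trust policy: lets CloudFront assume the role.
cat > cloudfront-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cloudfront.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Permissions policy: lets the role describe and write to the stream.
# Replace ACCOUNT_ID before use.
cat > cloudfront-kinesis-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStreamSummary",
        "kinesis:DescribeStream",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream"
    }
  ]
}
EOF

# Create the role and attach the inline policy (not run automatically).
create_cloudfront_realtime_log_role() {
  aws iam create-role \
    --role-name CloudFrontRealtimeLogRole \
    --assume-role-policy-document file://cloudfront-trust-policy.json
  aws iam put-role-policy \
    --role-name CloudFrontRealtimeLogRole \
    --policy-name cloudfront-kinesis-write \
    --policy-document file://cloudfront-kinesis-policy.json
}
```

Run `create_cloudfront_realtime_log_role` once the ACCOUNT_ID placeholder has been replaced, and before creating the real-time log configuration.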

Step 3: Create IAM Role for Kinesis Data Firehose

Create an IAM role named firehose-role that Kinesis Data Firehose can assume, using the following trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "firehose.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

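Attach the following policy, which grants read access to the stream, write access to CloudWatch Logs, and write access to the S3 backup bucket used in Step 4:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogGroup",
        "logs:CreateLogStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-backup-bucket",
        "arn:aws:s3:::your-backup-bucket/*"
      ]
    }
  ]
}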

Step 4: Create Kinesis Data Firehose Delivery Stream

To configure the Delivery Stream, you’ll need your organization’s Firetiger Ingest username and password credentials. These can be found in the Firetiger UI on the /settings page: https://ui.{deployment}.firetigerapi.com/settings (replace {deployment} with your deployment name). Substitute those values for $FIRETIGER_USERNAME and $FIRETIGER_PASSWORD in the following commands:

Via AWS CLI

export ACCESS_KEY=$(printf '%s' "$FIRETIGER_USERNAME:$FIRETIGER_PASSWORD" | base64 | tr -d '\n')
aws firehose create-delivery-stream \
  --delivery-stream-name cloudfront-to-firetiger \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration '{
    "KinesisStreamARN": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream",
    "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
  }' \
  --http-endpoint-destination-configuration '{
    "EndpointConfiguration": {
      "Url": "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,s-ip,time-to-first-byte,sc-status,sc-bytes,cs-method,cs-protocol,cs-host,cs-uri-stem,cs-bytes,x-edge-location,x-edge-request-id,x-host-header,time-taken,cs-protocol-version,c-ip-version,cs-user-agent,cs-referer,cs-cookie,cs-uri-query,x-edge-response-result-type,x-forwarded-for,ssl-protocol,ssl-cipher,x-edge-result-type,c-country",
      "Name": "Firetiger",
      "AccessKey": "'"$ACCESS_KEY"'"
    },
    "BufferingHints": {
      "IntervalInSeconds": 60,
      "SizeInMBs": 1
    },
    "RequestConfiguration": {
      "ContentEncoding": "GZIP"
    },
    "RetryOptions": {
      "DurationInSeconds": 3600
    },
    "S3Configuration": {
      "BucketARN": "arn:aws:s3:::your-backup-bucket",
      "Prefix": "failed-records/",
      "ErrorOutputPrefix": "error-records/",
      "CompressionFormat": "GZIP",
      "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
    }
  }'

Configuration Parameters

  • Url: Your Firetiger ingest endpoint - https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis
  • AccessKey: Your basic-auth Firetiger ingest credentials, as shown above
  • BufferingHints:
    • IntervalInSeconds: How long to buffer data before sending (0-900 seconds)
    • SizeInMBs: Buffer size before sending (1-64 MB for HTTP endpoint destinations)
  • ContentEncoding (under RequestConfiguration): Use GZIP to compress request bodies and reduce bandwidth
  • S3Configuration: Backup location for failed records
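The AccessKey value is the base64 encoding of username:password. A small helper such as the following (the function name is illustrative) makes that encoding explicit and easy to check before it goes into the delivery stream configuration.

```shell
# Build the Firehose AccessKey value: base64 of "username:password".
# printf avoids the trailing newline that echo can add, and tr strips the
# line breaks that some base64 implementations insert into long output.
firetiger_access_key() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Example with placeholder credentials:
firetiger_access_key user pass   # prints dXNlcjpwYXNz
```

In practice, call it as `ACCESS_KEY=$(firetiger_access_key "$FIRETIGER_USERNAME" "$FIRETIGER_PASSWORD")`. Decoding the result with `base64 --decode` should return the original username:password pair.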

Step 5: Attach CloudFront Distribution to Real-time Log Configuration

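Real-time log configurations attach to cache behaviors, not to the distribution as a whole, and update-distribution requires the complete distribution config plus the ETag from the most recent fetch. The commands below use jq to set RealtimeLogConfigArn on the default cache behavior; repeat the assignment for any other cache behaviors you want logged:

aws cloudfront get-distribution-config --id YOUR_DISTRIBUTION_ID > dist.json
jq '.DistributionConfig | .DefaultCacheBehavior.RealtimeLogConfigArn = "arn:aws:cloudfront::ACCOUNT_ID:realtime-log-config/firetiger-realtime-logs"' dist.json > dist-config.json
aws cloudfront update-distribution \
  --id YOUR_DISTRIBUTION_ID \
  --if-match "$(jq -r '.ETag' dist.json)" \
  --distribution-config file://dist-config.json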

Step 6: Verify Data Flow

After configuration, verify that logs are flowing:

  1. Generate some traffic to your CloudFront distribution
  2. Monitor Kinesis Data Stream metrics in CloudWatch
  3. Check Kinesis Data Firehose metrics for successful deliveries
  4. Query your data in Firetiger to confirm ingestion
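Steps 2 and 3 can be checked from the command line. The sketch below sums two CloudWatch metrics over the last 15 minutes, using the stream and delivery stream names from this guide. The function names and the AWS dry-run variable are illustrative assumptions.

```shell
# Sum a CloudWatch metric over the last 15 minutes.
# Usage: metric_sum NAMESPACE METRIC DIMENSION_NAME DIMENSION_VALUE
# Set AWS="echo aws" to print the command instead of running it.
metric_sum() {
  now=$(date +%s)
  ${AWS:-aws} cloudwatch get-metric-statistics \
    --namespace "$1" \
    --metric-name "$2" \
    --dimensions "Name=$3,Value=$4" \
    --start-time "$(( now - 900 ))" \
    --end-time "$now" \
    --period 300 \
    --statistics Sum \
    --region us-east-1
}

check_pipeline_metrics() {
  # Records CloudFront wrote into the Kinesis Data Stream.
  metric_sum AWS/Kinesis IncomingRecords StreamName cloudfront-logs-stream
  # Records Firehose delivered to the HTTP endpoint.
  metric_sum AWS/Firehose DeliveryToHttpEndpoint.Records DeliveryStreamName cloudfront-to-firetiger
}
```

Call `check_pipeline_metrics` after generating traffic. Non-zero sums for both metrics suggest that records are reaching the stream and leaving Firehose; an empty Datapoints list for the first metric points at the CloudFront side of the pipeline.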

Troubleshooting

Common Issues

  1. Authentication Failures
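    • Verify that your Firetiger ingest username and password are correct and that AccessKey holds the base64 encoding of username:password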
    • Ensure the endpoint URL includes the correct protocol (https)
  2. No Data Flowing
    • Check CloudWatch Logs for Kinesis Data Firehose error messages
    • Verify IAM roles have correct permissions
    • Ensure CloudFront distribution is attached to the real-time log configuration
  3. High Error Rate
    • Check the S3 backup bucket for failed records
    • Review error messages in CloudWatch Logs
    • Verify endpoint is accessible from AWS

Advanced Configuration

Custom Field Selection

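CloudFront allows you to customize which fields to include in real-time logs. This is useful for reducing data volume and costs. You can configure this when creating or updating your CloudFront real-time log configuration. An update replaces the entire configuration, so supply the endpoint and sampling rate along with the new field list:

aws cloudfront update-realtime-log-config \
  --name firetiger-realtime-logs \
  --end-points 'StreamType=Kinesis,KinesisStreamConfig={RoleARN=arn:aws:iam::ACCOUNT_ID:role/CloudFrontRealtimeLogRole,StreamARN=arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream}' \
  --fields timestamp c-ip sc-status cs-method cs-uri-stem x-edge-location \
  --sampling-rate 100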

Important: When you customize CloudFront fields, you must also update your Kinesis Data Firehose endpoint URL to specify which fields you’re sending. This ensures Firetiger parses the log records correctly.

Configuring Firetiger for Custom Fields

When configuring custom CloudFront fields, add a ?fields= query parameter to your Firehose endpoint URL that lists the fields in the same order as your CloudFront configuration:

export ACCESS_KEY=$(echo -n "$FIRETIGER_USERNAME:$FIRETIGER_PASSWORD" | base64)
aws firehose create-delivery-stream \
  --delivery-stream-name cloudfront-to-firetiger \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration '{
    "KinesisStreamARN": "arn:aws:kinesis:us-east-1:ACCOUNT_ID:stream/cloudfront-logs-stream",
    "RoleARN": "arn:aws:iam::ACCOUNT_ID:role/firehose-role"
  }' \
  --http-endpoint-destination-configuration '{
    "EndpointConfiguration": {
      "Url": "https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,sc-status,cs-method,cs-uri-stem,x-edge-location",
      "Name": "Firetiger",
      "AccessKey": "'"$ACCESS_KEY"'"
    },
    ...
  }'
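Building the URL by hand is error-prone once the field list grows. A minimal sketch of a helper that assembles it from a host and a list of field names follows; the function name and the example host are illustrative.

```shell
# Build the Firehose endpoint URL for a given field list.
# Usage: firetiger_endpoint_url INGEST_HOST [FIELD ...]
# Pass the fields in the same order as the CloudFront configuration.
firetiger_endpoint_url() {
  host=$1
  shift
  base="https://$host/aws/cloudfront/kinesis"
  if [ "$#" -eq 0 ]; then
    # No ?fields= parameter: Firetiger assumes the default field set.
    echo "$base"
    return
  fi
  # Join the remaining arguments with commas (no spaces).
  fields=$(IFS=,; echo "$*")
  echo "$base?fields=$fields"
}
```

For example, `firetiger_endpoint_url ingest.example.com timestamp c-ip sc-status` prints a URL ending in `?fields=timestamp,c-ip,sc-status`, which can be placed in the Url member of EndpointConfiguration.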

Field Mapping Examples

Default (All Fields):

If you don’t specify a ?fields= parameter, Firetiger expects all 45 standard CloudFront fields in canonical order. This is the recommended configuration for most use cases.

https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis

Minimal Fields:

For cost-sensitive deployments, you can send only essential fields:

https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,sc-status,cs-method,cs-uri-stem

Custom Fields:

Select specific fields based on your analytics needs:

https://ingest.YOUR_FT_DOMAIN/aws/cloudfront/kinesis?fields=timestamp,c-ip,cs-method,cs-host,cs-uri-stem,sc-status,sc-bytes,time-taken,cs-user-agent,x-edge-location,x-edge-response-result-type

Important Notes:

  1. Field names must match CloudFront field names exactly (e.g., timestamp, c-ip, sc-status)
  2. Field order must match your CloudFront real-time log configuration exactly - CloudFront always sends fields in canonical order
  3. Changing fields: If you add or remove fields in your CloudFront configuration, you must update your Kinesis Data Firehose URL to match the new field list. AWS automatically reorders fields in canonical order when you modify the configuration.
  4. Field names are comma-separated with no spaces
  5. The query parameter can handle all 45 standard fields
  6. If you omit fields from your CloudFront configuration, those attributes will not be populated in Firetiger

Example of field ordering:

  • Initial CloudFront config: time-to-first-byte, sc-status, c-country (3 fields)
  • Firehose URL: ?fields=time-to-first-byte,sc-status,c-country
  • Later you add sc-bytes and time-taken to CloudFront
  • CloudFront automatically reorders to: time-to-first-byte, sc-status, sc-bytes, time-taken, c-country (canonical order)
  • You must update Firehose URL to: ?fields=time-to-first-byte,sc-status,sc-bytes,time-taken,c-country
  • The c-country field moves from position 3 to position 5 in the log records
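The reordering in this example can be reproduced mechanically. The sketch below sorts any selection of fields into canonical order, using the order of fields 1-45 as listed in this guide's CloudFront Log Format Details section. The variable and function names are illustrative, and the list should be checked against the AWS documentation if CloudFront's canonical order changes.

```shell
# Fields 1-45 in canonical order, as listed in this guide.
CANONICAL_FIELDS="timestamp c-ip s-ip time-to-first-byte sc-status sc-bytes \
cs-method cs-protocol cs-host cs-uri-stem cs-bytes x-edge-location \
x-edge-request-id x-host-header time-taken cs-protocol-version c-ip-version \
cs-user-agent cs-referer cs-cookie cs-uri-query x-edge-response-result-type \
x-forwarded-for ssl-protocol ssl-cipher x-edge-result-type \
fle-encrypted-fields fle-status sc-content-type sc-content-len sc-range-start \
sc-range-end c-port x-edge-detailed-result-type c-country cs-accept-encoding \
cs-accept cache-behavior-path-pattern cs-headers cs-header-names \
cs-headers-count primary-distribution-id primary-distribution-dns-name \
origin-fbl origin-lbl"

# Print the given fields as a comma-separated list in canonical order.
# Fields not in CANONICAL_FIELDS are dropped with a warning on stderr.
canonical_fields_param() {
  wanted=" $* "
  result=""
  for f in $CANONICAL_FIELDS; do
    case "$wanted" in
      *" $f "*) result="${result:+$result,}$f" ;;
    esac
  done
  for f in "$@"; do
    case " $CANONICAL_FIELDS " in
      *" $f "*) ;;
      *) echo "warning: unknown field: $f" >&2 ;;
    esac
  done
  echo "$result"
}

# The five fields from the example, given in arbitrary order:
canonical_fields_param c-country time-taken sc-bytes time-to-first-byte sc-status
# prints time-to-first-byte,sc-status,sc-bytes,time-taken,c-country
```

The output can be appended to the endpoint URL after `?fields=`.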

Multi-Region Setup

For global distributions, consider:

  1. Creating Kinesis streams in multiple regions
  2. Using cross-region replication
  3. Configuring regional Firehose delivery streams

CloudFront Log Format Details

Real-time Log Fields

CloudFront real-time logs are delivered as tab-delimited records with up to 69 fields (depending on which fields the configuration selects). Each record in the Kinesis Data Firehose payload is base64-encoded.

Note: AWS has expanded the CloudFront real-time log format over time. Firetiger supports:

  • Fields 1-45: Fully parsed with structured field names
  • Fields 46-69+: Gracefully ignored (CMCD media streaming fields and future extensions)

The fields are:

Core Fields (1-41)

  1. timestamp - Unix timestamp with milliseconds (e.g., 1733270958.123)
  2. c-ip - Client IP address
  3. s-ip - CloudFront server IP address
  4. time-to-first-byte - Time to first byte in seconds
  5. sc-status - HTTP status code
  6. sc-bytes - Response size in bytes
  7. cs-method - HTTP method (GET, POST, etc.)
  8. cs-protocol - Protocol (http/https)
  9. cs-host - Host header value
  10. cs-uri-stem - URI path
  11. cs-bytes - Request size in bytes
  12. x-edge-location - CloudFront edge location code
  13. x-edge-request-id - Unique request identifier
  14. x-host-header - Host header sent to origin
  15. time-taken - Total time taken in seconds
  16. cs-protocol-version - HTTP protocol version
  17. c-ip-version - IP version (IPv4/IPv6)
  18. cs-user-agent - User agent string
  19. cs-referer - Referer header
  20. cs-cookie - Cookie header
  21. cs-uri-query - Query string
  22. x-edge-response-result-type - Cache result (Hit, Miss, Error)
  23. x-forwarded-for - X-Forwarded-For header
  24. ssl-protocol - SSL/TLS protocol version
  25. ssl-cipher - SSL/TLS cipher suite
  26. x-edge-result-type - How request was classified
  27. fle-encrypted-fields - Field-level encryption
  28. fle-status - Field-level encryption status
  29. sc-content-type - Response content type
  30. sc-content-len - Response content length
  31. sc-range-start - Range request start
  32. sc-range-end - Range request end
  33. c-port - Client port
  34. x-edge-detailed-result-type - Detailed result type
  35. c-country - Client country code
  36. cs-accept-encoding - Accept-Encoding header
  37. cs-accept - Accept header
  38. cache-behavior-path-pattern - Cache behavior pattern
  39. cs-headers - Custom headers
  40. cs-header-names - Custom header names
  41. cs-headers-count - Count of custom headers

Extended Fields (42-45, added October 2022)

  42. primary-distribution-id - Primary distribution identifier
  43. primary-distribution-dns-name - Primary distribution DNS name
  44. origin-fbl - Origin first-byte latency (time from edge to origin’s first byte, in seconds)
  45. origin-lbl - Origin last-byte latency (time from edge to origin’s last byte, in seconds)

CMCD and Extended Fields (46-69+, added April 2024)

Fields 46 and beyond are gracefully ignored but accepted in log records. The known CMCD field names are:

  46. asn - Autonomous system number
  47. c-asn - Client autonomous system number
  48. cmcd-buffer-length - CMCD buffer length (milliseconds)
  49. cmcd-buffer-starvation - CMCD buffer starvation indicator
  50. cmcd-content-id - CMCD content identifier
  51. cmcd-deadline - CMCD playback deadline
  52. cmcd-encoded-bitrate - CMCD encoded bitrate (kbps)
  53. cmcd-measured-throughput - CMCD measured throughput (kbps)
  54. cmcd-next-object-request - CMCD next object request
  55. cmcd-next-range-request - CMCD next range request
  56. cmcd-object-duration - CMCD object duration (milliseconds)
  57. cmcd-object-type - CMCD object type (m=manifest, a=audio, v=video, etc.)
  58. cmcd-playback-rate - CMCD playback rate
  59. cmcd-requested-maximum-throughput - CMCD requested max throughput
  60. cmcd-startup - CMCD startup indicator
  61. cmcd-stream-type - CMCD stream type (v=VOD, l=live)
  62. cmcd-streaming-format - CMCD streaming format (d=DASH, h=HLS, etc.)
  63. cmcd-top-bitrate - CMCD top bitrate (kbps)
  64. cmcd-version - CMCD version
  65. r-host - Request host
  66. sc-range-count - Range request count
  67. sc-response-body-time - Response body time
  68. sr-reason - Server reason code
  69. x-sc-response-body-time - Extended response body time

These fields are only present if configured in your CloudFront real-time log configuration. CMCD fields are primarily used for media streaming analytics and are sent by compatible media players. Firetiger accepts but does not parse these fields.

For complete documentation, see the AWS CloudFront Real-time Logs documentation.

Field Value Conventions

  • Fields with no value are represented as - (hyphen)
  • Numeric fields use standard decimal notation
  • Timestamps use Unix epoch format with decimal seconds (e.g., 1733270958.123)
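To inspect a record by hand, for example one recovered from the S3 backup bucket, the conventions above can be applied with standard tools. The sketch below decodes one base64-encoded record, splits it on tabs, and labels each value using the field list from your configuration. The function name is illustrative, and the sample input is only the visible prefix of the example record shown in the Kinesis Data Firehose Request Format section.

```shell
# Decode one base64-encoded record and label each value with its field name.
# Usage: decode_cloudfront_record FIELDS_CSV BASE64_DATA
# A value of "-" (no value) is printed as empty.
decode_cloudfront_record() {
  printf '%s' "$2" | base64 --decode | awk -F '\t' -v names="$1" '
    {
      n = split(names, name, ",")
      for (i = 1; i <= n && i <= NF; i++) {
        value = ($i == "-") ? "" : $i
        printf "%s=%s\n", name[i], value
      }
    }'
}

# The visible prefix of the sample record holds the first two fields.
decode_cloudfront_record "timestamp,c-ip" "MTczMzI3MDk1OC4xMjMJMTkyLjE2OC4xLjEJ"
# prints:
#   timestamp=1733270958.123
#   c-ip=192.168.1.1
```

The field list passed as the first argument must be in the same order as the ?fields= parameter, since values are matched to names by position.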

Kinesis Data Firehose Request Format

The HTTP request from Kinesis Data Firehose follows this structure:

{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958000,
  "records": [
    {
      "data": "MTczMzI3MDk1OC4xMjMJMTkyLjE2OC4xLjEJ..."
    }
  ]
}

  • requestId: Unique identifier for the Firehose request (matches X-Amz-Firehose-Request-Id header)
  • timestamp: Unix timestamp in milliseconds when the request was created
  • records: Array of base64-encoded CloudFront log records
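For testing a parser or replaying failed records, it can help to build a request body in this shape yourself. The following is a sketch under the structure shown above; the function name is illustrative, and the request ID and timestamp are supplied by the caller rather than generated.

```shell
# Build a Firehose-style request body from raw tab-delimited log lines.
# Usage: firehose_request_body REQUEST_ID TIMESTAMP_MS LINE [LINE ...]
firehose_request_body() {
  request_id=$1
  timestamp_ms=$2
  shift 2
  records=""
  for line in "$@"; do
    # Each record is the log line plus a newline, base64-encoded on one line.
    data=$(printf '%s\n' "$line" | base64 | tr -d '\n')
    records="${records:+$records,}{\"data\":\"$data\"}"
  done
  printf '{"requestId":"%s","timestamp":%s,"records":[%s]}\n' \
    "$request_id" "$timestamp_ms" "$records"
}
```

Firehose itself sends the access key in the X-Amz-Firehose-Access-Key header and, with ContentEncoding set to GZIP, compresses the body. Whether a hand-built request like this is accepted by your Firetiger endpoint is something to confirm for your deployment.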

Expected Response Format

Firetiger must respond with the following format:

Success (200 OK):

{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958123
}

Error (4xx/5xx):

{
  "requestId": "ed4acda5-034f-9f42-bba1-f29aea6d7d8f",
  "timestamp": 1733270958123,
  "errorMessage": "Error description"
}
