Getting Started with AWS and OTel Collectors

By Eran Barlev - Part 1 of 3: A comprehensive guide to implementing observability with AWS and OpenTelemetry

Observability is crucial for modern cloud-native applications, providing deep insights into system performance, user experience, and business metrics. This guide will walk you through implementing a robust observability solution using Amazon Web Services (AWS) and OpenTelemetry, the industry standard for telemetry data collection.

In this first part, we'll cover the fundamentals of AWS and OpenTelemetry, then dive into the practical implementation of the AWS Distro for OpenTelemetry (ADOT) collector on Amazon EKS. You'll learn how to set up the infrastructure, configure the collector, and instrument your applications to send telemetry data to AWS services like CloudWatch and X-Ray.

What Is AWS?

Amazon Web Services (AWS) is the leading cloud computing platform, offering scalable infrastructure and services for storage, compute, networking, and more. It powers applications for startups and enterprises alike, providing the backbone for modern cloud-native development.

What Is OpenTelemetry?

OpenTelemetry (often abbreviated as OTel) is an open-source observability framework designed to standardize the collection of telemetry data such as logs, metrics, and traces. It enables developers and DevOps teams to collect, process, and export observability data from their applications to various backends.

At the heart of the OpenTelemetry ecosystem is the OTel Collector. The collector is a vendor-agnostic agent that receives telemetry data, processes it (e.g., filtering, batching, transforming), and exports it to the observability platform of your choice (e.g., Amazon CloudWatch, AWS X-Ray, Prometheus, or third-party services like Datadog).

How Do AWS and OpenTelemetry Work Together?

AWS provides a distribution of the OpenTelemetry Collector known as the AWS Distro for OpenTelemetry (ADOT). It's an AWS-supported version of the OTel Collector, preconfigured to integrate seamlessly with AWS services such as Amazon CloudWatch, AWS X-Ray, and Amazon Managed Service for Prometheus.

This combination allows you to instrument your applications using OpenTelemetry SDKs and forward the telemetry data through the ADOT collector to native AWS services, offering deep visibility into system performance and behavior.

Getting Started Using OTel Collectors and AWS

Step 1: Choose Your Environment

Before you begin, decide where you want to run the ADOT collector. Common options include Amazon EKS, Amazon ECS, Amazon EC2, and AWS Lambda.

For this guide, we'll use Amazon EKS as an example.

Step 2: Install the ADOT Collector on EKS

Step 2.1: Prerequisites

Before installing the ADOT collector, ensure you have:

  • A running Amazon EKS cluster with an IAM OIDC provider associated (required for IAM Roles for Service Accounts)
  • The AWS CLI configured with credentials for your account
  • kubectl configured to talk to your cluster
  • Helm v3 installed
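
If you're unsure whether the cluster already has an OIDC provider, you can check and associate one. A quick sketch, assuming eksctl is installed and the CLUSTER_NAME and AWS_REGION variables are set as in Step 2.2:

# Print the cluster's OIDC issuer URL
aws eks describe-cluster --name $CLUSTER_NAME \
    --query cluster.identity.oidc.issuer --output text

# Associate an IAM OIDC provider with the cluster (safe to re-run if one already exists)
eksctl utils associate-iam-oidc-provider \
    --cluster $CLUSTER_NAME --region $AWS_REGION --approve
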
Step 2.2: Create IAM Role for ADOT Collector

Create an IAM role that the ADOT collector will use to send data to CloudWatch and X-Ray:

# Create IAM policy for ADOT collector
cat <<EOF > adot-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "xray:PutTraceSegments",
                "xray:PutTelemetryRecords",
                "xray:GetSamplingRules",
                "xray:GetSamplingTargets",
                "xray:GetSamplingStatisticSummaries"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        }
    ]
}
EOF

# Create the policy
aws iam create-policy \
    --policy-name ADOTCollectorPolicy \
    --policy-document file://adot-policy.json

# Create IAM role and attach policy (replace the region and cluster name with your own)
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_REGION=us-west-2
CLUSTER_NAME=your-eks-cluster-name
OIDC_ID=$(aws eks describe-cluster --name $CLUSTER_NAME \
    --query cluster.identity.oidc.issuer --output text | cut -d'/' -f5)

# Trust policy that lets the collector's Kubernetes service account assume the role (IRSA)
cat <<EOF > adot-trust-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:aws-otel:aws-otel-collector"
                }
            }
        }
    ]
}
EOF

aws iam create-role \
    --role-name ADOTCollectorRole \
    --assume-role-policy-document file://adot-trust-policy.json

aws iam attach-role-policy \
    --role-name ADOTCollectorRole \
    --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/ADOTCollectorPolicy
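
A quick sanity check that the role exists and the policy is attached before moving on:

# Verify the role and its attached policies
aws iam get-role --role-name ADOTCollectorRole --query Role.Arn --output text
aws iam list-attached-role-policies --role-name ADOTCollectorRole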

Step 2.3: Install the ADOT Operator

Use Helm to install the AWS Distro for OpenTelemetry Operator:

# Add the AWS Helm repository
helm repo add aws-observability https://aws-observability.github.io/aws-otel-helm-charts
helm repo update

# Create namespace for ADOT
kubectl create namespace aws-otel

# Install the ADOT Operator
helm install adot-operator aws-observability/adot-operator \
    --namespace aws-otel \
    --set serviceAccount.create=true \
    --set serviceAccount.name=adot-operator
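
Before moving on, confirm the operator came up cleanly. Note that some versions of the OpenTelemetry operator depend on cert-manager for their admission webhooks, so check the operator logs if the pod fails to start:

# Confirm the Helm release and the operator pods
helm list -n aws-otel
kubectl get pods -n aws-otel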

Step 2.4: Create Collector Configuration

Create a ConfigMap with the ADOT collector configuration:

cat <<EOF > adot-collector-config.yaml
apiVersion: v1
kind: ConfigMap  # Kubernetes resource type for storing configuration data
metadata:
  name: adot-collector-config  # Name of the ConfigMap
  namespace: aws-otel  # Kubernetes namespace where this ConfigMap will be created
data:
  config.yaml: |  # The actual OpenTelemetry collector configuration
    receivers:  # Define how the collector receives telemetry data
      otlp:  # OpenTelemetry Protocol receiver (standard protocol)
        protocols:
          grpc:  # gRPC protocol endpoint for receiving data
            endpoint: 0.0.0.0:4317  # Listen on all interfaces, port 4317 (standard OTLP gRPC port)
          http:  # HTTP protocol endpoint for receiving data
            endpoint: 0.0.0.0:4318  # Listen on all interfaces, port 4318 (standard OTLP HTTP port)
    
    processors:  # Define how to process/transform telemetry data before exporting
      batch:  # Batch processor groups multiple telemetry items together
        timeout: 1s  # Maximum time to wait before sending a batch
        send_batch_size: 1024  # Maximum number of items in a batch
      resource:  # Resource processor adds metadata to telemetry data
        attributes:
          - key: service.name  # Add service name attribute to all telemetry
            value: "my-service"  # Value for the service name
            action: upsert  # Create if doesn't exist, update if it does
          - key: service.namespace  # Add namespace attribute
            value: "production"  # Environment/namespace value
            action: upsert
    
    exporters:  # Define where to send the processed telemetry data
      awsemf:  # AWS EMF exporter sends metrics to CloudWatch as embedded metric format
        region: us-west-2  # AWS region where CloudWatch is located
        namespace: "my-service"  # Custom CloudWatch metrics namespace
        log_group_name: "/aws/eks/my-cluster/application"  # Log group that backs the EMF metrics
      awscloudwatchlogs:  # AWS CloudWatch Logs exporter for application logs
        region: us-west-2  # AWS region where CloudWatch Logs is located
        log_group_name: "/aws/eks/my-cluster/application"  # CloudWatch log group name
        log_stream_name: "{PodName}"  # Use pod name as log stream (dynamic)
        endpoint: "https://logs.us-west-2.amazonaws.com"  # CloudWatch Logs endpoint (optional override)
      awsxray:  # AWS X-Ray exporter for distributed tracing
        region: us-west-2  # AWS region where X-Ray is located
        index_all_attributes: true  # Index span attributes so they can be filtered in the X-Ray console
    
    service:  # Define which telemetry types to collect and how to process them
      pipelines:  # Processing pipelines for different telemetry types
        traces:  # Pipeline for distributed traces
          receivers: [otlp]  # Receive traces via OTLP protocol
          processors: [batch, resource]  # Process with batching and resource attribution
          exporters: [awsxray]  # Send traces to AWS X-Ray
        metrics:  # Pipeline for metrics
          receivers: [otlp]  # Receive metrics via OTLP protocol
          processors: [batch, resource]  # Process with batching and resource attribution
          exporters: [awsemf]  # Send metrics to CloudWatch as embedded metric format
        logs:  # Pipeline for logs
          receivers: [otlp]  # Receive logs via OTLP protocol
          processors: [batch, resource]  # Process with batching and resource attribution
          exporters: [awscloudwatchlogs]  # Send logs to CloudWatch Logs
EOF

kubectl apply -f adot-collector-config.yaml  # Apply the ConfigMap to the cluster

Step 2.5: Deploy the ADOT Collector
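
The IAM trust policy created in Step 2.2 expects a service account named aws-otel-collector in the aws-otel namespace. One way to create and annotate it before deploying the collector (the eks.amazonaws.com/role-arn annotation is what triggers IRSA credential injection):

# Create the service account referenced in the IAM trust policy
kubectl create serviceaccount aws-otel-collector -n aws-otel

# Annotate it with the ADOT collector IAM role so EKS injects temporary AWS credentials (IRSA)
kubectl annotate serviceaccount aws-otel-collector -n aws-otel \
    eks.amazonaws.com/role-arn=arn:aws:iam::$ACCOUNT_ID:role/ADOTCollectorRole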

Create the ADOT collector deployment using the Operator:

cat <<EOF > adot-collector-deployment.yaml
apiVersion: opentelemetry.io/v1alpha1  # Custom resource API for OpenTelemetry Operator
kind: OpenTelemetryCollector  # Custom resource type for deploying collectors
metadata:
  name: adot-collector  # Name of the collector instance
  namespace: aws-otel  # Kubernetes namespace
spec:
  mode: daemonset  # Deploy as DaemonSet (one pod per node)
  serviceAccount: aws-otel-collector  # Service account created above, annotated with the ADOTCollectorRole IAM role (IRSA)
  image: amazon/aws-otel-collector:latest  # AWS-provided collector image
  config: |  # Inline collector configuration (alternative to ConfigMap)
    receivers:  # Define data sources
      otlp:  # OpenTelemetry Protocol receiver
        protocols:
          grpc:  # gRPC endpoint for high-performance data ingestion
            endpoint: 0.0.0.0:4317  # Listen on all network interfaces
          http:  # HTTP endpoint for web-based data ingestion
            endpoint: 0.0.0.0:4318  # Standard OTLP HTTP port
    
    processors:  # Data processing pipeline
      batch:  # Group multiple telemetry items for efficient transmission
        timeout: 1s  # Wait up to 1 second to form a batch
        send_batch_size: 1024  # Maximum items per batch
      resource:  # Add contextual metadata to telemetry data
        attributes:
          - key: service.name  # Service identifier
            value: "my-service"  # Your service name
            action: upsert  # Insert or update the attribute
          - key: service.namespace  # Environment identifier
            value: "production"  # Your environment name
            action: upsert
    
    exporters:  # Data destinations
      awsemf:  # Send metrics to CloudWatch as embedded metric format
        region: us-west-2  # Target AWS region
        namespace: "my-service"  # Custom CloudWatch metrics namespace
        log_group_name: "/aws/eks/my-cluster/application"  # Log group that backs the EMF metrics
      awscloudwatchlogs:  # Send application logs to CloudWatch Logs
        region: us-west-2  # Target AWS region
        log_group_name: "/aws/eks/my-cluster/application"  # CloudWatch log group
        log_stream_name: "{PodName}"  # Dynamic log stream naming
      awsxray:  # Send traces to X-Ray for distributed tracing
        region: us-west-2  # Target AWS region
        index_all_attributes: true  # Index span attributes for filtering in the X-Ray console
    
    service:  # Telemetry processing configuration
      pipelines:  # Define processing flows for different data types
        traces:  # Distributed tracing data flow
          receivers: [otlp]  # Source: OTLP protocol
          processors: [batch, resource]  # Processing: batching + metadata enrichment
          exporters: [awsxray]  # Destination: AWS X-Ray
        metrics:  # Metrics data flow
          receivers: [otlp]  # Source: OTLP protocol
          processors: [batch, resource]  # Processing: batching + metadata enrichment
          exporters: [awsemf]  # Destination: CloudWatch metrics
        logs:  # Log data flow
          receivers: [otlp]  # Source: OTLP protocol
          processors: [batch, resource]  # Processing: batching + metadata enrichment
          exporters: [awscloudwatchlogs]  # Destination: CloudWatch Logs
  
EOF

kubectl apply -f adot-collector-deployment.yaml  # Deploy the collector to the cluster

Step 2.6: Verify the Deployment

Check that the ADOT collector is running properly:

# Check if the operator is running
kubectl get pods -n aws-otel

# Check if the collector is running
kubectl get daemonset -n aws-otel

# Check collector logs (label set by the OpenTelemetry operator; adjust the selector if your pods are labeled differently)
kubectl logs -n aws-otel -l app.kubernetes.io/component=opentelemetry-collector --tail=50
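
The SDK examples in the next step point at localhost:4317. For a quick test from your workstation, you can port-forward the collector's OTLP port; the Service name below is an assumption (the operator typically names the Service after the OpenTelemetryCollector resource), so confirm it first:

# List the Services the operator created for the collector
kubectl get svc -n aws-otel

# Forward the OTLP gRPC port so local test apps can send to localhost:4317
# (replace adot-collector-collector with the Service name from the previous command)
kubectl port-forward -n aws-otel svc/adot-collector-collector 4317:4317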

Step 3: Instrument Your Application

Now that the ADOT collector is running, you need to instrument your applications to send telemetry data to it. This involves installing the OpenTelemetry SDK for your programming language and configuring it to export data via OTLP (OpenTelemetry Protocol).
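
Regardless of language, the OpenTelemetry SDKs also honor a set of standard environment variables, which is often the easiest way to point an application at the collector without code changes. A minimal sketch (adjust the values for your service):

# Standard OpenTelemetry environment variables recognized by the SDKs
export OTEL_SERVICE_NAME=my-service
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_RESOURCE_ATTRIBUTES=service.namespace=production,deployment.environment=prod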

Step 3.1: Choose Your Language SDK

OpenTelemetry provides SDKs for multiple programming languages. Here are examples for the most common ones:

Python Example:

# Install the required packages
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc

# Basic Python instrumentation
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource

# Configure resource attributes (metadata about your service)
resource = Resource.create({
    "service.name": "my-python-service",  # Service identifier
    "service.version": "1.0.0",           # Version information
    "service.namespace": "production",     # Environment
    "deployment.environment": "prod"       # Deployment environment
})

# Set up tracing
trace.set_tracer_provider(TracerProvider(resource=resource))  # Create tracer provider with resource info
otlp_trace_exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",  # ADOT collector endpoint (gRPC)
    insecure=True  # Use HTTP instead of HTTPS for local development
)
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(otlp_trace_exporter)  # Batch spans for efficient transmission
)

# Set up metrics
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(
        endpoint="http://localhost:4317",  # ADOT collector endpoint
        insecure=True
    ),
    export_interval_millis=5000  # Export metrics every 5 seconds
)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[metric_reader]))

# Get tracer and meter for your application
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

# Example usage in your application
def process_request(request_data):
    with tracer.start_as_current_span("process_request") as span:  # Create a span for this operation
        span.set_attribute("request.size", len(request_data))  # Add custom attributes
        
        # Create a counter metric
        request_counter = meter.create_counter(
            name="requests_total",  # Metric name
            description="Total number of requests processed"  # Metric description
        )
        request_counter.add(1, {"endpoint": "/api/process"})  # Increment counter with labels
        
        # Your business logic here
        result = perform_processing(request_data)
        
        span.set_attribute("result.status", "success")  # Add result to span
        return result
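
If you prefer not to create every span by hand, the Python distro can auto-instrument common frameworks and libraries. A minimal sketch, assuming your entry point is app.py:

# Install the distro and auto-detect instrumentation for the libraries you already use
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Run the application with auto-instrumentation, exporting OTLP to the collector
OTEL_SERVICE_NAME=my-python-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
opentelemetry-instrument python app.py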

Node.js Example:

# Install the required packages
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/sdk-metrics \
    @opentelemetry/exporter-trace-otlp-grpc @opentelemetry/exporter-metrics-otlp-grpc \
    @opentelemetry/resources @opentelemetry/semantic-conventions

// Basic Node.js instrumentation
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Configure resource attributes
const resource = new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-nodejs-service',  // Service identifier
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',          // Version information
    [SemanticResourceAttributes.SERVICE_NAMESPACE]: 'production',    // Environment
    'deployment.environment': 'prod'                                 // Deployment environment
});

// Initialize the SDK
const sdk = new NodeSDK({
    resource: resource,  // Set resource attributes
    traceExporter: new OTLPTraceExporter({
        url: 'http://localhost:4317',  // ADOT collector endpoint
        headers: {},  // Optional headers
    }),
    metricReader: new PeriodicExportingMetricReader({
        exporter: new OTLPMetricExporter({
            url: 'http://localhost:4317',  // ADOT collector endpoint
        }),
        exportIntervalMillis: 5000,  // Export metrics every 5 seconds
    }),
});

// Start the SDK
sdk.start();

// Example usage in your application
const { trace, metrics } = require('@opentelemetry/api');

async function processRequest(requestData) {
    const tracer = trace.getTracer('my-service');  // Get tracer instance
    const meter = metrics.getMeter('my-service');  // Get meter instance
    
    return await tracer.startActiveSpan('process_request', async (span) => {
        span.setAttribute('request.size', requestData.length);  // Add custom attributes
        
        // Create a counter metric
        const requestCounter = meter.createCounter('requests_total', {
            description: 'Total number of requests processed'
        });
        requestCounter.add(1, { endpoint: '/api/process' });  // Increment counter with labels
        
        try {
            // Your business logic here
            const result = await performProcessing(requestData);
            span.setAttribute('result.status', 'success');  // Add result to span
            return result;
        } catch (error) {
            span.setAttribute('result.status', 'error');  // Mark as error
            span.recordException(error);  // Record the exception
            throw error;
        } finally {
            span.end();  // End the span
        }
    });
}
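
As with Python, Node.js services can be auto-instrumented at startup without touching application code. A sketch, assuming your entry point is app.js:

# Install the auto-instrumentation meta-package
npm install @opentelemetry/auto-instrumentations-node

# Load the instrumentation via --require and export OTLP to the collector
OTEL_SERVICE_NAME=my-nodejs-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
node --require @opentelemetry/auto-instrumentations-node/register app.js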

Step 3.2: Configure for Production

For production deployments, update the OTLP endpoint to point at the collector's Kubernetes Service instead of localhost. Check the exact Service name with kubectl get svc -n aws-otel; the examples below assume a Service named adot-collector in the aws-otel namespace:

# Production configuration - replace with your collector endpoint
otlp_trace_exporter = OTLPSpanExporter(
    endpoint="http://adot-collector.aws-otel.svc.cluster.local:4317",  # Kubernetes service endpoint
    insecure=True  # Plain gRPC inside the cluster; set to False and use a TLS endpoint if your collector terminates TLS
)

// Production configuration
const sdk = new NodeSDK({
    traceExporter: new OTLPTraceExporter({
        url: 'http://adot-collector.aws-otel.svc.cluster.local:4317',  // Kubernetes service endpoint
    }),
    // ... other configuration
});

Step 4: View Your Data in AWS

Once your applications are instrumented and sending data to the ADOT collector, you can view the telemetry data in various AWS services.

Step 4.1: View Distributed Traces in AWS X-Ray

Access X-Ray Console:

  1. Open the AWS Management Console
  2. Navigate to X-Ray service
  3. Go to the Traces section

Key X-Ray Features:

  • Service Map: Visual representation of your distributed system showing service dependencies
  • Trace List: View individual traces with timing and error information
  • Trace Details: Drill down into specific traces to see spans and timing
  • Filtering: Filter traces by service, operation, status, or time range
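
You can also pull recent trace summaries from the AWS CLI, which is handy for scripting checks (timestamps are epoch seconds):

# List trace summaries from the last 10 minutes
aws xray get-trace-summaries \
    --start-time $(($(date +%s) - 600)) \
    --end-time $(date +%s)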

Example X-Ray Trace View:

Service Map:
[Frontend] → [API Gateway] → [Backend Service] → [Database]

Trace Details:
├── Frontend Request (100ms)
│   ├── API Gateway Processing (50ms)
│   │   ├── Backend Service Call (30ms)
│   │   │   └── Database Query (10ms)
│   │   └── Response Processing (20ms)
└── Frontend Response (100ms)

Step 4.2: View Metrics in Amazon CloudWatch

Access CloudWatch Console:

  1. Open the AWS Management Console
  2. Navigate to CloudWatch service
  3. Go to Metrics section

Custom Namespaces:

  • Your application metrics will appear under custom namespaces
  • Look for namespaces like my-service or production
  • Metrics include counters, gauges, and histograms from your application
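
To confirm metrics are arriving, you can also query the custom namespace from the CLI. A sketch, assuming the my-service namespace configured in the exporter and the endpoint dimension added by the sample code (the date commands use GNU syntax):

# List metrics published under the custom namespace
aws cloudwatch list-metrics --namespace my-service

# Pull the last hour of request counts for one endpoint
aws cloudwatch get-metric-statistics \
    --namespace my-service \
    --metric-name requests_total \
    --dimensions Name=endpoint,Value=/api/process \
    --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
    --period 300 --statistics Sum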

Creating CloudWatch Dashboards:

# Example: Create a dashboard for your application metrics
aws cloudwatch put-dashboard \
    --dashboard-name "MyService-Dashboard" \
    --dashboard-body '{
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["my-service", "requests_total", "endpoint", "/api/process"]
                    ],
                    "period": 300,
                    "stat": "Sum",
                    "region": "us-west-2",
                    "title": "Total Requests"
                }
            }
        ]
    }'

CloudWatch Alarms:

# Example: Create an alarm for high error rates
aws cloudwatch put-metric-alarm \
    --alarm-name "HighErrorRate" \
    --alarm-description "Alert when error rate exceeds 5%" \
    --metric-name "error_rate" \
    --namespace "my-service" \
    --statistic Average \
    --period 300 \
    --threshold 5 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2

Step 4.3: View Logs in CloudWatch Logs

Access CloudWatch Logs:

  1. Open the AWS Management Console
  2. Navigate to CloudWatch service
  3. Go to Log groups section
  4. Find your log group: /aws/eks/my-cluster/application

Log Streams:

  • Each pod creates its own log stream
  • Log streams are named using the {PodName} pattern
  • You can filter logs by pod, time range, or search terms
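
The AWS CLI (v2) can also tail the log group directly, which is a quick way to confirm logs are flowing:

# Tail recent application logs from the collector's log group
aws logs tail /aws/eks/my-cluster/application --follow --since 15m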

CloudWatch Logs Insights Queries:

# Example: Find all error logs in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

# Example: Count requests by endpoint
fields @timestamp, @message
| filter @message like /request/
| stats count() by @message

# Example: Find slow requests (>1 second)
fields @timestamp, @message, @duration
| filter @duration > 1000
| sort @duration desc
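
These queries can also be run from the CLI if you want to script them. A sketch using epoch-second timestamps:

# Start a Logs Insights query for error lines in the last hour
QUERY_ID=$(aws logs start-query \
    --log-group-name /aws/eks/my-cluster/application \
    --start-time $(($(date +%s) - 3600)) \
    --end-time $(date +%s) \
    --query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 100' \
    --query queryId --output text)

# Fetch the results once the query has finished running
aws logs get-query-results --query-id $QUERY_ID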

Summary

In this first part of our comprehensive guide, we've covered the essential foundations for implementing observability with AWS and OpenTelemetry:

What we accomplished:

  1. Understanding the basics of AWS and OpenTelemetry integration
  2. Setting up the ADOT collector on Amazon EKS with proper IAM configuration
  3. Instrumenting applications in Python and Node.js
  4. Configuring data export to CloudWatch and X-Ray
  5. Viewing telemetry data in AWS services

Key takeaways:

  • The AWS Distro for OpenTelemetry (ADOT) provides a production-ready collector optimized for AWS services
  • Proper IAM configuration is crucial for secure data transmission
  • Application instrumentation follows OpenTelemetry standards and works across multiple languages
  • AWS services like CloudWatch and X-Ray provide powerful visualization and analysis capabilities

Coming soon:

  • In Part 2, we'll dive deep into Best Practices for Monitoring, covering comprehensive alerting strategies, dashboard creation, and operational excellence
  • In Part 3, we'll explore Best Practices for Using OpenTelemetry and AWS, including security, performance optimization, and cost management

This foundation sets you up for a robust, scalable observability platform that can grow with your applications and provide deep insights into your system's performance and user experience.