Distributed Tracing
In microservices architectures, a single user request may travel through dozens of services. Distributed tracing tracks requests across service boundaries to identify bottlenecks and failures.
How Tracing Works
User Request Flow:
┌───────┐    ┌───────┐    ┌───────┐    ┌───────┐
│Gateway│───▶│ Auth  │───▶│ Order │───▶│Payment│
└───┬───┘    └───┬───┘    └───┬───┘    └───┬───┘
    │            │            │            │
    └────────────┼────────────┼────────────┘
                 ▼
┌─────────────────────────────────────────────┐
│               Trace Timeline                │
│ ├─ Gateway ──────────────────────────────┤  │
│    ├─ Auth ───────┤                      │  │
│       ├─ Order ──────────┤               │  │
│          ├─Payment─┤                     │  │
│                                             │
│  Total: 250ms | Bottleneck: Payment: 120ms  │
└─────────────────────────────────────────────┘
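The summary line of the timeline can be derived from span timings. The following is a minimal sketch, not a real tracing backend: the `Span` class and every start/end value are hypothetical, chosen only so that the total (250ms) and the Payment time (120ms) match the diagram. It assumes the children of a span run one after another and do not overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    start_ms: int
    end_ms: int
    parent: Optional[str] = None  # name of the parent span, None for the root

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Hypothetical timings; only the 250ms total and the 120ms Payment span
# come from the diagram.
spans = [
    Span("Gateway", 0, 250),
    Span("Auth", 10, 50, parent="Gateway"),
    Span("Order", 60, 240, parent="Gateway"),
    Span("Payment", 100, 220, parent="Order"),
]


def self_time_ms(span: Span, all_spans: List[Span]) -> int:
    """Time spent in the span itself, excluding its direct children.

    Assumes children run sequentially (no overlap).
    """
    children = [s for s in all_spans if s.parent == span.name]
    return span.duration_ms - sum(c.duration_ms for c in children)


def find_bottleneck(all_spans: List[Span]) -> Span:
    """Return the span with the largest self time."""
    return max(all_spans, key=lambda s: self_time_ms(s, all_spans))


root = next(s for s in spans if s.parent is None)
bottleneck = find_bottleneck(spans)
print(
    f"Total: {root.duration_ms}ms | "
    f"Bottleneck: {bottleneck.name}: {self_time_ms(bottleneck, spans)}ms"
)
```

Ranking by self time rather than raw duration keeps a parent such as Gateway from being reported as the bottleneck just because it wraps everything else.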
Trace Concepts
- Trace: The entire journey of a request through the system
- Span: A single unit of work within a trace (e.g., a database query)
- Context Propagation: Passing trace IDs across service boundaries
- Sampling: Collecting a subset of traces to reduce overhead
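The concepts above can be sketched in a few lines. This is an illustrative sketch, not a production tracer: `SpanContext`, `start_trace`, `child_of`, `inject`, `extract`, and `should_sample` are hypothetical names. The header layout follows the W3C Trace Context `traceparent` format (`version-traceid-spanid-flags`), and the sampling rule is one simple deterministic scheme assumed here for illustration.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, unique per span
    sampled: bool  # whether this trace is being collected


def should_sample(trace_id: str, ratio: float) -> bool:
    """Sampling: keep roughly `ratio` of all traces.

    The decision depends only on the trace ID, so any service that
    applies the same rule reaches the same decision for a given trace.
    """
    return int(trace_id[16:], 16) < ratio * 2**64


def start_trace(sample_ratio: float = 1.0) -> SpanContext:
    """Trace: create the root span context for a new request."""
    trace_id = secrets.token_hex(16)
    return SpanContext(trace_id, secrets.token_hex(8),
                       should_sample(trace_id, sample_ratio))


def child_of(parent: SpanContext) -> SpanContext:
    """Span: a new unit of work that stays in the parent's trace."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Context propagation, sending side: write the context into headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Context propagation, receiving side: read the context back."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    try:
        sampled = bool(int(parts[3], 16) & 0x01)
    except ValueError:
        return None
    return SpanContext(parts[1], parts[2], sampled)


# Gateway starts a trace and calls Auth over HTTP.
gateway = start_trace()
outgoing_headers: Dict[str, str] = {}
inject(gateway, outgoing_headers)

# Auth receives the request and continues the same trace.
incoming = extract(outgoing_headers)
auth = child_of(incoming)
print(auth.trace_id == gateway.trace_id)  # both spans share one trace ID
```

Because the sampling decision travels in the flags field, downstream services do not need to re-decide; they can simply honor what the first service chose.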
Tracing Tools
Jaeger (by Uber):
┌──────────────────────────────────────┐
│ Jaeger Agent     │ Collects spans    │
│ Jaeger Collector │ Processes spans   │
│ Jaeger Query     │ Search & query    │
│ Jaeger UI        │ Visualization     │
└──────────────────────────────────────┘
Zipkin (by Twitter):
┌─────────────────────────────────────────────┐
│ Collector │ Receives trace data             │
│ Storage   │ MySQL, Elasticsearch, Cassandra │
│ API       │ Query interface                 │
│ UI        │ Dependency graph                │
└─────────────────────────────────────────────┘
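To make "receives trace data" concrete, here is a sketch of the kind of payload a service might send to the Zipkin collector. The field names follow the Zipkin v2 JSON span format as I understand it (timestamps and durations in microseconds, spans posted as a JSON array, by default to `/api/v2/spans` on port 9411); verify against the Zipkin API documentation before relying on it. The helper `zipkin_span`, the service names, and the timings are hypothetical, and nothing is sent over the network.

```python
import json
import secrets
import time
from typing import Optional


def zipkin_span(trace_id: str, name: str, service: str,
                start_us: int, duration_us: int,
                parent_id: Optional[str] = None) -> dict:
    """Build one span dictionary in (assumed) Zipkin v2 JSON shape."""
    span = {
        "traceId": trace_id,                       # shared by the whole trace
        "id": secrets.token_hex(8),                # unique per span
        "name": name,
        "timestamp": start_us,                     # epoch microseconds
        "duration": duration_us,                   # microseconds
        "localEndpoint": {"serviceName": service},
    }
    if parent_id is not None:
        span["parentId"] = parent_id               # links child to parent
    return span


trace_id = secrets.token_hex(16)
now_us = time.time_ns() // 1000

# Hypothetical spans: Order calls Payment, which takes 120ms.
order = zipkin_span(trace_id, "create-order", "order", now_us, 180_000)
payment = zipkin_span(trace_id, "charge", "payment",
                      now_us + 40_000, 120_000, parent_id=order["id"])

# The collector would receive this JSON array in the request body.
payload = json.dumps([order, payment], indent=2)
print(payload)
```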
OpenTelemetry (CNCF Standard):
┌──────────────────────────────────────┐
│ Unified API for traces, metrics,     │
│ and logs. Exports to any backend.    │
└──────────────────────────────────────┘
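The idea behind "one API, any backend" is that application code is instrumented once and the destination is chosen by swapping an exporter. The sketch below illustrates that design with the standard library only. It does not use the real OpenTelemetry packages; `ToyTracer`, `ConsoleExporter`, and `InMemoryExporter` are hypothetical stand-ins, not OpenTelemetry classes.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Protocol


class SpanExporter(Protocol):
    """Anything that can receive a finished span (hypothetical interface)."""

    def export(self, span: dict) -> None: ...


class ConsoleExporter:
    """Backend 1: print spans, useful during local development."""

    def export(self, span: dict) -> None:
        print(span)


class InMemoryExporter:
    """Backend 2: keep spans in a list, useful in tests."""

    def __init__(self) -> None:
        self.spans: List[dict] = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


class ToyTracer:
    """Instrumentation API; it never needs to know which backend is used."""

    def __init__(self, exporter: SpanExporter) -> None:
        self._exporter = exporter

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Export even if the wrapped code raised an exception.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self._exporter.export({"name": name, "duration_ms": elapsed_ms})


# The instrumented code stays the same; only the exporter changes.
memory = InMemoryExporter()
tracer = ToyTracer(memory)
with tracer.start_span("charge-card"):
    sum(range(1000))  # stand-in for real work

print(memory.spans[0]["name"])
```

Replacing `InMemoryExporter()` with `ConsoleExporter()` changes where spans go without touching the code inside the `with` block, which is the property the box above describes.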