Distributed Tracing: Following Requests Across Go Services
In today’s complex microservice architectures, requests often travel through multiple different services until a response is eventually returned to the end user. Each service is like a black box — we know requests go in and responses come out, but what happens inside can be opaque.
This is where distributed tracing comes into play. Distributed tracing allows us to open up that black box and follow the entire journey of a request across all the microservices it touches. Each service adds context and metadata to outbound requests in tracing headers, allowing the full end-to-end path to be reconstructed.
Implementing distributed tracing provides crucial visibility and insight into microservices-based systems. We can analyze performance hotspots, identify failing services, and pinpoint the root cause of errors and high latency. Tracing request flows also helps us better comprehend the complex interactions between services so we can optimize architectures over time.
In this post, we will explore how to implement distributed tracing for a sample Go-based microservice architecture.
Imagine this simple e-commerce platform
A typical request flows through basket -> order -> payment -> picker -> delivery, and some of these steps happen asynchronously. Now assume a bug occurs: how do you trace it, figure out what exactly went wrong, and reproduce it?
Let me tell you about a bug I encountered while working on the grocery ordering squad at an e-commerce startup. A customer service agent shared this issue with me during my on-call round:
“Hello, I’m calling about order #127489 that I placed yesterday. Your website advertised a 30% off promotion for purchases over $50. My order qualified at $89, but the 30% discount wasn’t applied to my receipt as expected.”
When a customer places an order, our system captures their location (latitude and longitude). We shared the order details with our picker application used by staff at the grocery store to assemble orders. If a product was unavailable, the picker app allowed store staff to notify the customer to select a replacement. However, if the customer did not respond within 10 minutes, our system automatically removed the out-of-stock item from their cart.
The bug occurred because removing an item triggered our system to re-check promotions, but without passing the customer’s location data this time. This caused inaccurate promotion application logic since geographic data is required to determine certain regional offers.
As you can see, the logs for one order span multiple microservices, and finding exactly where the error happened requires real engineering effort to trace them. We will not cover the setup of monitoring and log aggregation tools in this article; instead, we will focus on generating correlated logs for the same order.
We have three cases for a request:
1- The client calls the service, and the service returns a response directly.
2- The client calls a service, and that service calls another service synchronously.
3- The client calls a service, and that service publishes a message to a message queue (MQ) to be consumed by another service asynchronously.
For single-service tracing, our logging package can do most of the work: if we add a trace-id to each log message, we can gather all logs for the same request. Let’s build our Go logger package.
First, create a directory named logging inside the pkg directory, and create a file logger.go that will hold the interface for our logger:
package logging
import (
"context"
)
// Logger is the application-facing logging interface; concrete
// implementations wrap a specific logging library.
type Logger interface {
Debugf(msg string, args ...any)
Infof(msg string, args ...any)
Errorf(msg string, args ...any)
Fatalf(msg string, args ...any)
Debug(msg string)
Info(msg string)
Error(msg string)
Fatal(msg string)
WithFields(map[string]string) Logger
WithContext(ctx context.Context) Logger
}
Secondly, let’s implement the interface. I’ll use Zap as the underlying logger; you can install it as follows:
go get go.uber.org/zap
Now let’s create a file called zap.go. It will contain a struct that wraps the Zap library, in case you want to swap it out in the future:
package logging
import (
"context"
"fmt"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
)
// ZapLogger wraps *zap.Logger so the rest of the application depends only on the Logger interface.
type ZapLogger struct {
*zap.Logger
}
func (l *ZapLogger) Debugf(msg string, args ...any) {
l.Logger.Debug(fmt.Sprintf(msg, args...))
}
func (l *ZapLogger) Infof(msg string, args ...any) {
l.Logger.Info(fmt.Sprintf(msg, args...))
}
func (l *ZapLogger) Errorf(msg string, args ...any) {
l.Logger.Error(fmt.Sprintf(msg, args...))
}
func (l *ZapLogger) Fatalf(msg string, args ...any) {
l.Logger.Fatal(fmt.Sprintf(msg, args...))
}
func (l *ZapLogger) Debug(msg string) {
l.Logger.Debug(msg)
}
func (l *ZapLogger) Info(msg string) {
l.Logger.Info(msg)
}
func (l *ZapLogger) Error(msg string) {
l.Logger.Error(msg)
}
func (l *ZapLogger) Fatal(msg string) {
l.Logger.Fatal(msg)
}
// WithFields returns a logger that attaches the given fields to every log entry it writes.
func (l *ZapLogger) WithFields(fields map[string]string) (logger Logger) {
if len(fields) == 0 {
logger = &ZapLogger{
Logger: l.Logger,
}
return
}
zapFields := make([]zapcore.Field, 0)
for k, v := range fields {
zapFields = append(zapFields, zap.String(k, v))
}
clonedLog := l.Logger.With(zapFields...)
logger = &ZapLogger{
Logger: clonedLog,
}
return
}
// WithContext extracts the known context keys (trace ID, span ID, user ID) from ctx and attaches them as fields.
func (l *ZapLogger) WithContext(ctx context.Context) Logger {
if ctx == nil {
return l
}
logger := l.Logger
fields := []zap.Field{}
for key, value := range contextKeys {
if val, ok := ctx.Value(key).(string); ok {
fields = append(fields, zap.String(value, val))
}
}
return &ZapLogger{
Logger: logger.With(fields...),
}
}
func NewZapLogger(logger *zap.Logger) *ZapLogger {
return &ZapLogger{
Logger: logger,
}
}
There are two interesting functions here; let’s explore them:
func (l *ZapLogger) WithFields(fields map[string]string) (logger Logger)
func (l *ZapLogger) WithContext(ctx context.Context) Logger
These two functions let us add fields to a log message directly; the second one extracts fields from the context and adds them to the log message. Hence, we can now attach an identifier to every log message. Let’s call it trace-id.
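As a quick sketch of WithFields usage (the import path, field names and values are just examples, and any *zap.Logger works as the wrapped logger):
package main

import (
	"go.uber.org/zap"

	"example.com/project/pkg/logging" // placeholder import path for the package above
)

func main() {
	zapLogger, _ := zap.NewProduction() // any *zap.Logger works here
	logger := logging.NewZapLogger(zapLogger)

	// Attach static fields directly to the entries written by this logger.
	logger.WithFields(map[string]string{
		"order_id":  "127489",
		"component": "promotion",
	}).Info("recalculating promotions")
}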
Now let’s define the contextKeys map used inside the WithContext method:
type ContextKey string
var contextKeys = map[ContextKey]string{
ContextKey("trace_id"): "traceID",
ContextKey("span_id"): "spanID",
ContextKey("user_id"): "userID",
}
By using a custom ContextKey type rather than a plain string, we gain the following (a short illustration comes right after this list):
- Type safety on the keys
- No collisions with plain string keys set by other packages
- Self-documenting code
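Here is a short illustration of the collision point (using a placeholder import path for the logging package): context.Value compares both the key’s value and its dynamic type, so a plain string key set elsewhere cannot shadow our typed key.
package main

import (
	"context"
	"fmt"

	"example.com/project/pkg/logging" // placeholder import path
)

func main() {
	ctx := context.Background()
	// A plain string key, perhaps set by some unrelated package
	// (go vet even warns against using a basic string as a context key).
	ctx = context.WithValue(ctx, "trace_id", "set-by-someone-else")
	// Our typed key: same text, different type, so it cannot collide.
	ctx = context.WithValue(ctx, logging.ContextKey("trace_id"), "our-trace-id")

	fmt.Println(ctx.Value("trace_id"))                     // set-by-someone-else
	fmt.Println(ctx.Value(logging.ContextKey("trace_id"))) // our-trace-id
}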
Other keys might be helpful depending on the application logic; here is a good list of context keys (a sketch of extending the map follows the list):
- traceID — The overall trace ID that links logs across services
- spanID — The individual segment/span within a trace
- userID — The ID of the current user
- requestID — A unique ID for the in-process request
- api_key — For identifying API keys used in requests
- device_id — For tagging logs from a specific device
- tenant_id — Identify tenant in a multi-tenant application
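If some of these fit your application, the contextKeys map is the single place to register them. A sketch of an extended map (the extra entries are examples, not part of the package above):
var contextKeys = map[ContextKey]string{
	ContextKey("trace_id"):   "traceID",
	ContextKey("span_id"):    "spanID",
	ContextKey("user_id"):    "userID",
	ContextKey("request_id"): "requestID",
	ContextKey("tenant_id"):  "tenantID",
}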
Finally, we can add the last file, where the application logger itself is created. Let’s create the app_logger.go file:
package logging
import (
"context"
"fmt"
"go.elastic.co/ecszap"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
"os"
"strings"
"sync"
)
// LogHandle is the shared application-wide logger handle.
var LogHandle ApplicationLogger

// once guards the one-time initialization of the logger.
var once sync.Once
type ApplicationLogger struct {
logger *ZapLogger
}
func InitLogger(lvl, serviceName, environment string) error {
level, err := parseLevel(lvl)
LogHandle = getLogger(level, serviceName, environment)
return err
}
func getLogger(level zapcore.Level, serviceName string, environment string) ApplicationLogger {
once.Do(func() {
encoderConfig := ecszap.EncoderConfig{
EncodeName: zap.NewProductionEncoderConfig().EncodeName,
EncodeLevel: zapcore.CapitalLevelEncoder,
EncodeDuration: zapcore.MillisDurationEncoder,
EncodeCaller: ecszap.FullCallerEncoder,
}
core := ecszap.NewCore(encoderConfig, os.Stdout, level)
l := zap.New(core, zap.AddCaller())
l = l.With(zap.String("app", serviceName)).With(zap.String("env", environment))
zapLogger := NewZapLogger(l)
LogHandle = ApplicationLogger{
logger: zapLogger,
}
})
return LogHandle
}
func (l *ApplicationLogger) Debugf(msg string, args ...any) {
l.logger.Debugf(msg, args...)
}
func (l *ApplicationLogger) Infof(msg string, args ...any) {
l.logger.Infof(msg, args...)
}
func (l *ApplicationLogger) Errorf(msg string, args ...any) {
l.logger.Errorf(msg, args...)
}
func (l *ApplicationLogger) Fatalf(msg string, args ...any) {
l.logger.Fatalf(msg, args...)
}
func (l *ApplicationLogger) Debug(msg string) {
l.logger.Debug(msg)
}
func (l *ApplicationLogger) Info(msg string) {
l.logger.Info(msg)
}
func (l *ApplicationLogger) Error(msg string) {
l.logger.Error(msg)
}
func (l *ApplicationLogger) Fatal(msg string) {
l.logger.Fatal(msg)
}
func (l *ApplicationLogger) WithFields(fields map[string]string) Logger {
return l.logger.WithFields(fields)
}
func (l *ApplicationLogger) WithContext(ctx context.Context) Logger {
return l.logger.WithContext(ctx)
}
func parseLevel(lvl string) (zapcore.Level, error) {
switch strings.ToLower(lvl) {
case "debug":
return zap.DebugLevel, nil
case "info":
return zap.InfoLevel, nil
case "warn":
return zap.WarnLevel, nil
case "error":
return zap.ErrorLevel, nil
}
return zap.InfoLevel, fmt.Errorf("invalid log level <%v>", lvl)
}
Here we have a factory method that creates a single instance of the logger; logs are written directly to the console (stdout).
We can use this logger as follows:
package main

import (
	"context"
	"log"

	"github.com/google/uuid"

	"example.com/project/pkg/logging" // adjust to your module path
)

func main() {
	envConfig := loadEnvConfig() // application-specific config loading (not shown)
	// Logging
	err := logging.InitLogger(envConfig.DebugLevel(), envConfig.AppName(), envConfig.Environment())
	if err != nil {
		log.Fatal("failed to init logger")
	}
	ctx := context.Background()
	// Generate UUIDs
	traceID := uuid.New()
	spanID := uuid.New()
	userID := uuid.New()
	// Insert into context, using the same ContextKey values the logger looks up
	ctx = context.WithValue(ctx, logging.ContextKey("trace_id"), traceID.String())
	ctx = context.WithValue(ctx, logging.ContextKey("span_id"), spanID.String())
	ctx = context.WithValue(ctx, logging.ContextKey("user_id"), userID.String())
	logging.LogHandle.WithContext(ctx).Info("hello world!")
}
HTTP server middlewares provide an effective way to enrich the request context for logging. Chaining multiple middlewares lets vital information be added incrementally: an AuthMiddleware can inject the user_id, while a TraceMiddleware can extract trace IDs from the incoming headers. This context data then flows through all handlers, automatically correlating log statements with specific requests and users. Keeping the enrichment logic in middlewares therefore lets application code stay clean while still producing rich, traceable structured logs for monitoring.
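As a rough sketch of what such a middleware could look like, assuming the logging package above, the github.com/google/uuid module, and an illustrative X-Trace-ID header name (none of these names are prescribed by the code so far):
package middleware

import (
	"context"
	"net/http"

	"github.com/google/uuid"

	"example.com/project/pkg/logging" // placeholder import path for the package above
)

// TraceMiddleware reads a trace ID from an incoming header (or generates one
// if it is absent) and stores it in the request context under the same
// ContextKey the logger's WithContext method looks up.
func TraceMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		traceID := r.Header.Get("X-Trace-ID") // illustrative header name
		if traceID == "" {
			traceID = uuid.New().String()
		}
		ctx := context.WithValue(r.Context(), logging.ContextKey("trace_id"), traceID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
An AuthMiddleware would follow the same pattern, putting the authenticated user’s ID into the context under ContextKey("user_id"); handlers then simply call logging.LogHandle.WithContext(r.Context()) and every log line carries the trace and user IDs automatically.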
For the second and third cases we need correlated logging between microservices, so context propagation should be standardized across synchronous requests and asynchronous events. Clients and message producers must be made responsible for propagating context, ensuring the essential identifiers are passed in request headers or message metadata.
By making upstream services propagate identifiers, rather than relying solely on downstream services to extract them, context can flow seamlessly across microservice boundaries, both synchronously and asynchronously. Standardizing this propagation allows the entire distributed log narrative to be reconstructed, even as a request crosses network endpoints, queues, caches and service calls. Each microservice can then log effectively, with logs aggregated and interconnected regardless of the dispatch mechanism or deployment topology.
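A minimal sketch of the producing side, under the same assumptions as before (placeholder import path, illustrative X-Trace-ID header, and a hypothetical Message type standing in for whatever your queue client provides):
package propagation

import (
	"context"
	"net/http"

	"example.com/project/pkg/logging" // placeholder import path for the package above
)

// InjectTraceHTTP copies the trace ID from the context onto an outbound HTTP
// request, so the downstream service's TraceMiddleware can pick it up.
func InjectTraceHTTP(ctx context.Context, req *http.Request) {
	if traceID, ok := ctx.Value(logging.ContextKey("trace_id")).(string); ok {
		req.Header.Set("X-Trace-ID", traceID) // illustrative header name
	}
}

// Message is a hypothetical stand-in for your queue client's message type.
type Message struct {
	Headers map[string]string
	Body    []byte
}

// InjectTraceMessage attaches the trace ID as message metadata, so the
// consumer can restore it into its own context before logging.
func InjectTraceMessage(ctx context.Context, msg *Message) {
	if msg.Headers == nil {
		msg.Headers = map[string]string{}
	}
	if traceID, ok := ctx.Value(logging.ContextKey("trace_id")).(string); ok {
		msg.Headers["trace_id"] = traceID
	}
}
On the consuming side the mirror image applies: read the trace ID from the incoming headers or message metadata, put it back into a fresh context under logging.ContextKey("trace_id"), and log through WithContext as before.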
Back to the promotion bug: I was able to reconstruct the whole story of that order based on the unique traceID attribute propagated through each service. This traceID stitched together the entire sequence of transactions, enabling holistic analysis even though the request traversed multiple independently deployed microservices. By correlating the distributed segments of this single request flow, I identified precisely where the processing failed (the promotion re-check that ran without the customer’s location data) and how to address the root cause bug. Structuring logs for end-to-end traceability made the difference between prolonged head-scratching and rapid, targeted diagnosis. As complex as microservices are, ensuring logs travel with context transforms troubleshooting from daunting to delightfully straightforward.
The more feedback and engagement, the more tales I can tell about distributed systems; follow for more interesting stories.