Distributed Tracing with Jaeger in Go

Software engineers use distributed tracing to monitor and debug their applications. It is especially useful for finding which process takes the most time or which function causes errors. Jaeger is one of the systems that provides distributed tracing. This article shows how to run Jaeger in a local environment and trace a Go application.

What is distributed tracing?

According to opentracing.io, distributed tracing is a method used to profile and monitor applications, especially those built using a microservices architecture. It is useful when our application has a performance issue or when we want to improve its performance, and it also helps with root cause analysis of a problem. A trace is presented as a chain of function calls, so we can debug our application more easily or discover an unexpected process flow.

Jaeger is one of the most popular distributed tracing systems. It is an open-source, end-to-end distributed tracing system released by Uber Technologies. Jaeger uses an OpenTracing-compatible data model and instrumentation libraries, so the tracing API and instrumentation are more standardized.

Run Jaeger

To run Jaeger in a local environment, we can use the Jaeger all-in-one Docker image. For other deployment methods, you can see here.

docker run -d -p6831:6831/udp -p16686:16686 jaegertracing/all-in-one:latest

This image already contains the Jaeger UI, collector, query, and agent, which is enough to trace our local app. You can go to http://localhost:16686 to open the Jaeger UI.

Now that we have Jaeger running, we can start to trace our application. In this example, we will trace a Go webserver.

Distributed tracing a Go app

Initialization

To trace our Go application, we first need to initialize the tracer. The following code initializes a Jaeger tracer:

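import (
	"log"

	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	"github.com/uber/jaeger-client-go/config"
)

// .....

// Sample at most 100 traces per second.
cfg := config.Configuration{
	Sampler: &config.SamplerConfig{
		Type:  jaeger.SamplerTypeRateLimiting,
		Param: 100,
	},
}

tracer, closer, err := cfg.New("myservice")
if err != nil {
	log.Fatalf("cannot initialize Jaeger tracer: %v", err)
}
// Flush buffered spans when the surrounding function returns.
defer closer.Close()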

The sampler configuration used here is rate limiting with a param of 100, which means that Jaeger will collect at most 100 traces per second. You may want to change the type and param to suit your needs.
Then, set the Jaeger tracer as the global tracer.

opentracing.SetGlobalTracer(tracer)

We can initialize the Jaeger tracer in the main function of our app.

Trace the processes

To trace the functions in our app, we need to start an OpenTracing span from a context at the start of the function and call its Finish method before the function returns. Note that we need to pass the resulting context to the next function call so that Jaeger knows it is part of a chain of function calls. Below is an example of tracing an HTTP handler.

func getCityHandler(w http.ResponseWriter, r *http.Request) {
    span, ctx := opentracing.StartSpanFromContext(r.Context(), "Handle /get_cities")
    defer span.Finish()

    if !isLoggedIn(ctx, r) {
        w.Write([]byte("you need to login first"))
        return
    }

    countryName := r.FormValue("country_name")
    cities, err := getCityByCountryName(ctx, countryName)
    ...

At the top of the handler, we start an OpenTracing span from the context of the http.Request and defer span.Finish(). Notice that we pass the ctx returned by StartSpanFromContext to isLoggedIn and getCityByCountryName. Those functions also have to start their own OpenTracing span and finish it before they return.

func isLoggedIn(ctx context.Context, r *http.Request) bool {
    span, ctx := opentracing.StartSpanFromContext(ctx, "isLoggedIn")
    defer span.Finish()
    ...


func getCityByCountryName(ctx context.Context, countryName string) ([]City, error) {
    span, ctx := opentracing.StartSpanFromContext(ctx, "getCityByCountryName")
    defer span.Finish()
    ...

Next, call the handler several times and look at the traces in the Jaeger UI. Go to the Jaeger UI and click Find Traces.

Click one of the traces to see its details.

We can see the trace from the start of our HTTP handler and which functions it calls. The length of each span indicates how much time the function took. Explore the Jaeger UI further and you will find it useful in many situations, such as spotting the slowest call in a request.

Summary

Distributed tracing is useful for monitoring your app. It shows traces as a chain of function calls and helps you debug your application more easily. Jaeger is one of the popular systems for distributed tracing. It is easy to run and use, and it implements OpenTracing. We can see the tracing results in the Jaeger UI and find out how much time each function took, which makes it easy to spot the function that is causing a bottleneck.
You can see the complete sample code of the Go webserver below.

The complete code


package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"sync"
	"time"

	_ "github.com/go-sql-driver/mysql"
	"github.com/opentracing/opentracing-go"
	"github.com/uber/jaeger-client-go"
	"github.com/uber/jaeger-client-go/config"
)

var db *sql.DB

func main() {
	dtbs, err := sql.Open("mysql", "myuser:password@/mydb")
	if err != nil {
		panic(err.Error())
	}
	db = dtbs

	tracer, trCloser, err := InitJaeger()
	if err != nil {
		fmt.Printf("error init jaeger %v\n", err)
	} else {
		opentracing.SetGlobalTracer(tracer)
		defer trCloser.Close()
	}

	http.HandleFunc("/get_cities", getCityHandler)

	fmt.Println("listening..")
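	// ListenAndServe only returns on failure, for example when the port is in use.
	if err := http.ListenAndServe(":4560", nil); err != nil {
		panic(err.Error())
	}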
}

func InitJaeger() (opentracing.Tracer, io.Closer, error) {
	cfg := config.Configuration{
		Sampler: &config.SamplerConfig{
			Type:  jaeger.SamplerTypeRateLimiting,
			Param: 100,
		},
	}

	tracer, closer, err := cfg.New("myservice")
	return tracer, closer, err
}

func getCityHandler(w http.ResponseWriter, r *http.Request) {
	span, ctx := opentracing.StartSpanFromContext(r.Context(), "Handle /get_cities")
	defer span.Finish()

	if !isLoggedIn(ctx, r) {
		w.Write([]byte("you need to login first"))
		return
	}

	countryName := r.FormValue("country_name")
	cities, err := getCityByCountryName(ctx, countryName)
	if err != nil {
		w.Write([]byte("got error: " + err.Error()))
		return
	}

	for k, v := range cities {
		lat, long, err := getGeoposition(ctx, v.Name)
		if err != nil {
			w.Write([]byte("got error: " + err.Error()))
			return
		}
		cities[k].Lat = lat
		cities[k].Long = long
	}

	b, err := json.Marshal(cities)
	if err != nil {
		w.Write([]byte("got error: " + err.Error()))
		return
	}

	w.Write(b)
}

type geopos struct {
	Standard struct {
		City string `json:"city"`
	} `json:"standard"`
	Longt string `json:"longt"`
	Latt  string `json:"latt"`
}

func getGeoposition(ctx context.Context, cityName string) (lat, long string, err error) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "getGeoposition")
	defer span.Finish()

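	url := fmt.Sprintf("https://geocode.xyz/%s?json=1", cityName)
	// Build the request with ctx so it is cancelled together with the incoming request.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}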
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return "", "", err
	}

	g := geopos{}
	err = json.Unmarshal(body, &g)
	if err != nil {
		return "", "", err
	}

	return g.Latt, g.Longt, nil
}

type City struct {
	Name        string `json:"name"`
	CountryName string `json:"country_name"`
	Lat         string `json:"lat"`
	Long        string `json:"long"`
}

func getCityByCountryName(ctx context.Context, countryName string) ([]City, error) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "getCityByCountryName")
	defer span.Finish()

	cached := getCityByCountryNameFromCache(ctx, countryName)
	if cached != nil {
		return *cached, nil
	}

	cities, err := getCityByCountryNameFromDB(ctx, countryName)
	if err != nil {
		return nil, err
	}

	setCityByCountryNameFromCache(ctx, countryName, cities)
	return cities, nil
}

func getCityByCountryNameFromDB(ctx context.Context, countryName string) ([]City, error) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "getCityByCountryNameFromDB")
	defer span.Finish()

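	rows, err := db.QueryContext(ctx, "SELECT name, country_name FROM city WHERE country_name = ?", countryName)
	if err != nil {
		return nil, err
	}
	// Release the connection back to the pool once we are done iterating.
	defer rows.Close()

	var cities []City
	for rows.Next() {
		var city City
		if err := rows.Scan(&city.Name, &city.CountryName); err != nil {
			return nil, err
		}
		cities = append(cities, city)
	}
	// rows.Next returns false both at the end of the data and on error, so check which it was.
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return cities, nil
}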

func isLoggedIn(ctx context.Context, r *http.Request) bool {
	span, ctx := opentracing.StartSpanFromContext(ctx, "isLoggedIn")
	defer span.Finish()

	time.Sleep(5 * time.Millisecond)
	return true
}

var cachedCityByCountry = map[string]*[]City{}
var cachedCityByCountryLock sync.RWMutex

func getCityByCountryNameFromCache(ctx context.Context, countryName string) *[]City {
	span, ctx := opentracing.StartSpanFromContext(ctx, "getCityByCountryNameFromCache")
	defer span.Finish()

	cachedCityByCountryLock.RLock()
	defer cachedCityByCountryLock.RUnlock()

	return cachedCityByCountry[countryName]
}

func setCityByCountryNameFromCache(ctx context.Context, countryName string, cities []City) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "setCityByCountryNameFromCache")
	defer span.Finish()

	cachedCityByCountryLock.Lock()
	defer cachedCityByCountryLock.Unlock()

	cachedCityByCountry[countryName] = &cities
}