Monitor Server Metrics With Prometheus and Grafana

Monitor Server Metrics With Prometheus and Grafana

Monitoring your server’s metrics is important to maintain the reliability of your service. You can also save costs by downgrading your server if the average load is much lower than the maximum capacity of your server. There are many ways and tools to monitor your server. One of the most popular is by using Prometheus to scrape and store the metrics. And Grafana to visualize the data. This article will show you how to monitor your server’s metrics with Prometheus and Grafana, especially for Linux servers.

The tools

The tools that we are using are node_exporter, Prometheus server, and Grafana. We will need docker to run the Prometheus server and Grafana.

The node_exporter is a daemon that runs on your target server that exposes the server’s metrics. It needs to be installed on each of your target servers. It will handle /metrics API on default port 9100. The metrics will be scrapped by Prometheus and then stored on the database. Grafana will query the data from Prometheus and visualize it to various kinds of graphs.
The architecture will look like this diagram:

grafana prome node

Expose The Metrics with node_exporter

SSH to your servers and download node_exporter package. If you want another version go to https://prometheus.io/download/#node_exporter.

wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz

Extract the package

tar xzvf node_exporter-0.18.1.linux-amd64.tar.gz

Go to the package folder and run the node_exporter on the background.

cd node_exporter-0.18.1.linux-amd64
./node_exporter &

In production, you may need some tools such as Ansible to automate these processes.

After you install and run node_exporter, hit /metrics on port 9100 of your server to verify it.

curl localhost:9100/metrics

You will see the metrics in something like this:

# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 65486.1
node_cpu_seconds_total{cpu="0",mode="iowait"} 34.49
node_cpu_seconds_total{cpu="0",mode="irq"} 0
...
...

It means that your we can successfully scrape the metrics.

Scrape The Metrics with Prometheus Server

In this tutorial, we will install Prometheus server on docker. If you want to use another method, go to https://prometheus.io/download/.
First, we need to create a configuration file. Copy and paste the following code and save it as prometheus.yml.

global:
  scrape_interval:     10s
  
scrape_configs:
  - job_name: 'node'
    static_configs:
    - targets: ['192.168.100.17:9100']

This is a simple configuration file. The scrape_interval is how frequent the Prometheus server scrapes the target servers, scrape_interval 10s means that Prometheus will scrape every 10s. Put the IP of your servers in targets value. For more information about the configuration, go to https://prometheus.io/docs/prometheus/latest/configuration/configuration/.

Use this command to run Prometheus with docker. Don’t forget to change /path/to/your/prometheus.yml with the location of your file.

docker run -d \
    -p 9090:9090 \
    -v /path/to/your/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

After it is successfully run, go to http://localhost:9090 to see the Prometheus dashboard.

prome dashboard

Click Status > Targets to see if your target servers are successfully scraped by Prometheus.

prome targets

You can already query some metrics from your Prometheus. To do that, click Graph, and put avg by(instance) (node_load5) in the Expression field. It will show the load average of each server.

Visualize The Metrics with Grafana

Grafana is an open-source tool to monitor and analyze your metrics. We want to use Grafana to have a beautiful visualization of your metrics. It has many powerful features for monitoring and analyzing. It also has an alerting feature which is very useful. It is used by thousands of companies.

Use this run to install Grafana with docker. If you want another method, go to https://grafana.com/get.

docker run -d -p 3000:3000 grafana/grafana

Open http://localhost:3000/ to see the Grafana dashboard. Login with default username admin and password admin. After you login, make sure you change your password. You will see this page:

prome targets

Click Add data source.
Select Prometheus.
Input name of the data source and URL of your Prometheus server. Leave other fields as it is for now.

grafana data source

Click Save & Test.

Now go to Grafana Home and click New Dashboard, then click Add Query. Let’s use this query again avg by(instance) (node_load5) and see the graph.

grafana graph

Try playing around with the graph and the dashboard.

Some Useful Query Sample

Below are some of the helpful query samples. For more info, https://prometheus.io/docs/prometheus/latest/querying/basics/#examples.

# cpu usage
100 - (avg by(instance)(irate(node_cpu_seconds_total{mode="idle"}[5m]))*100)

# memory used
node_memory_MemTotal_bytes{} - node_memory_MemAvailable_bytes{}

# disk space used 
1-(node_filesystem_free_bytes{} / node_filesystem_size_bytes{})

# network received rate
rate(node_network_receive_bytes_total{instance=~'$node',device='eth0'}[1m])

# network transmitted rate
rate(node_network_transmit_bytes_total{instance=~'$node',device='eth0'}[1m])

See also

comments powered by Disqus