Zero Downtime Reload With Socketmaster

Nowadays, zero downtime reload is mandatory for most systems, especially those that are accessed around the clock. Stakeholders demand high availability, so downtime during a reload is unwelcome even when it lasts only milliseconds. Socketmaster helps your system reload with zero downtime.

What is socketmaster

Socketmaster is an application that lets us reload our application without downtime. It opens the listening socket itself and runs our application as its child process, passing the socket to the child. On reload, socketmaster starts a new child process and sends a termination signal to the old one. The new process handles incoming requests while the old process finishes its active connections, so no request is lost and zero downtime reload is achieved.
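
To make the mechanism concrete, here is a minimal sketch of how a parent process can hand a listening socket to a child on fd 3 in Go. It is not socketmaster's actual source. It only illustrates the general technique with the standard library's exec.Cmd.ExtraFiles field. The helper name prepareChild, the address, and the ./mygoapp path are assumptions made for illustration.

```go
package main

import (
	"errors"
	"log"
	"net"
	"os"
	"os/exec"
)

// prepareChild (hypothetical helper) opens a TCP listener on addr and builds
// a command whose first extra file, fd 3 in the child, is that socket.
// The command is returned unstarted, together with a cleanup function.
func prepareChild(addr, command string) (*exec.Cmd, func(), error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, err
	}
	tcpLn, ok := ln.(*net.TCPListener)
	if !ok {
		ln.Close()
		return nil, nil, errors.New("not a TCP listener")
	}
	// File returns a duplicate of the listener's file descriptor.
	f, err := tcpLn.File()
	if err != nil {
		ln.Close()
		return nil, nil, err
	}

	cmd := exec.Command(command)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	// ExtraFiles[i] becomes file descriptor 3+i in the child process.
	cmd.ExtraFiles = []*os.File{f}

	cleanup := func() {
		f.Close()
		ln.Close()
	}
	return cmd, cleanup, nil
}

func main() {
	cmd, cleanup, err := prepareChild(":7008", "./mygoapp")
	if err != nil {
		log.Fatalln("prepare child:", err)
	}
	// Run starts the child and waits for it to exit.
	err = cmd.Run()
	cleanup()
	if err != nil {
		log.Fatalln("child exited:", err)
	}
}
```

Because the parent keeps the socket open, a second child can be started with the same descriptor before the first one is told to stop, which is the idea behind the reload described above.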

To install socketmaster, go to https://github.com/zimbatm/socketmaster and download the binary or compile it yourself.
As written in the socketmaster README, there are a few things we need to do to integrate socketmaster with our service.

Your server is responsible for:

  • opening the socket passed on fd 3
  • not crashing
  • gracefully shutdown on SIGTERM (close the listener, wait for the child connections to close)

We will cover them one by one.

To experiment with socketmaster, I created a simple web server in Go. I send continuous requests to the server and reload the service while they are in flight. Every request should get a successful response, even while the server is reloading. Note that although socketmaster is written in Go, it can also manage servers written in other languages.

My web server in Go

I use the following code for the web server. It runs a simple web server that we will integrate with socketmaster to enable zero downtime reload.

package main

import (
    "context"
    "errors"
    "fmt"
    "log"
    "net"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time" // used by myHandler below

    "github.com/gorilla/mux" // assumed: the post does not name the router package
)

func main() {
    // create handler
    r := mux.NewRouter()
    r.HandleFunc("/{path}", myHandler)

    // create listener from the socket that socketmaster passes on fd 3
    f := os.NewFile(uintptr(3), "listener")
    listener, err := net.FileListener(f)
    if err != nil {
        log.Fatalln("error create listener", err)
    }

    srv := http.Server{
        Handler: r,
    }
    go func() {
        fmt.Println("listening...")
        // Serve returns http.ErrServerClosed after Shutdown; anything else is a real error
        if err := srv.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Fatalln("error serve", err)
        }
    }()

    // accept sigterm; signal.Notify needs a buffered channel so the signal is not dropped
    term := make(chan os.Signal, 1)
    signal.Notify(term, syscall.SIGTERM)

    // shutdown when sigterm received
    fmt.Println("got signal", <-term)
    fmt.Println("shutting down..")
    if err := srv.Shutdown(context.Background()); err != nil {
        log.Println("error shutdown", err)
    }
}

The os.NewFile(uintptr(3), "listener") and net.FileListener(f) calls make our web server listen on fd 3. This covers the first requirement: opening the socket passed on fd 3.

For the next requirement, not crashing, the Go HTTP server recovers from panics in handlers by default. But if you want to use your own panic recovery, you can check my post here.

For the last requirement, gracefully shutdown on SIGTERM (close the listener, wait for the child connections to close), see the end of main. signal.Notify delivers SIGTERM to the term channel, and the receive from that channel blocks execution until the signal arrives. Once it does, execution continues to srv.Shutdown, which closes the listener and waits for active connections to finish. By then, socketmaster has already spawned a new process to handle incoming requests.
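
One thing to keep in mind is that Shutdown with context.Background() waits for active connections without any time limit. The sketch below is a variation, not the code used in this post, that bounds the wait with a timeout. The names drainTimeout, newServer and shutdownWithTimeout, the 10 second value, and the placeholder handler are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// drainTimeout is an assumed limit; tune it to your slowest expected request.
const drainTimeout = 10 * time.Second

// newServer builds the HTTP server around the given handler.
func newServer(h http.Handler) *http.Server {
	return &http.Server{Handler: h}
}

// shutdownWithTimeout stops accepting new connections and waits at most d
// for in-flight requests. If d elapses first, it returns the context error.
func shutdownWithTimeout(srv *http.Server, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return srv.Shutdown(ctx)
}

func main() {
	// The listening socket is expected on fd 3, as socketmaster provides it.
	listener, err := net.FileListener(os.NewFile(uintptr(3), "listener"))
	if err != nil {
		log.Fatalln("error create listener", err)
	}
	srv := newServer(http.NotFoundHandler()) // placeholder handler

	go func() {
		if err := srv.Serve(listener); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalln("error serve", err)
		}
	}()

	// Block until SIGTERM arrives.
	term := make(chan os.Signal, 1)
	signal.Notify(term, syscall.SIGTERM)
	<-term

	if err := shutdownWithTimeout(srv, drainTimeout); err != nil {
		log.Println("shutdown did not finish cleanly:", err)
	}
}
```

The trade-off is that a request still running when the timeout expires is cut off once the process exits, so the limit should be longer than the slowest request you expect.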

I use this handler for testing. It will sleep for 3 seconds, then return a response with the current pid.

func myHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Println("handle", r.URL.RequestURI())
    // simulate a slow request so a reload can happen while it is in flight
    time.Sleep(3 * time.Second)
    pid := os.Getpid()
    // report which process served the request
    fmt.Fprintf(w, `{"status": "success", "pid": %d}`, pid)
}
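
For the "not crashing" requirement, the default recovery in the Go HTTP server closes the connection when a handler panics, so the client gets no response. If you would rather return an error response, a handler such as myHandler can be wrapped in a recovery middleware. The sketch below is one possible shape, not the code from the post linked earlier. The names recoverMiddleware, panicky and statusOf are hypothetical, and statusOf exists only to demonstrate the result in memory.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// recoverMiddleware turns a panic in next into a 500 response.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			rec := recover()
			if rec == nil {
				return
			}
			// http.ErrAbortHandler is the documented way to abort a handler,
			// so it is passed on to the server untouched.
			if rec == http.ErrAbortHandler {
				panic(rec)
			}
			log.Println("recovered from panic:", rec)
			http.Error(w, "internal server error", http.StatusInternalServerError)
		}()
		next.ServeHTTP(w, r)
	})
}

// panicky is a hypothetical handler that always panics.
var panicky = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	panic("boom")
})

// statusOf runs h against an in-memory request and returns the status code.
func statusOf(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusOf(recoverMiddleware(panicky), "/abc")) // prints 500
}
```

Note that recover only catches panics raised in the goroutine that serves the request. A panic in a goroutine started by the handler still terminates the process.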

Run the socketmaster & test it

Build the Go app, then use this command to run it with socketmaster.

socketmaster -listen tcp://:7008 -command=./mygoapp

-listen tcp://:7008 makes socketmaster listen on TCP port 7008.
-command=./mygoapp is the command that socketmaster executes. On reload, socketmaster executes the command again to start a new process, then signals the old process.

When run, it produces a log like this:

socketmaster[2969] 2021/02/10 15:39:25 Listening on tcp://:7008
socketmaster[2969] 2021/02/10 15:39:25 Starting ./mygoapp [./mygoapp]
socketmaster[2969] 2021/02/10 15:39:26 [2970] listening...

To reload it, we can send a HUP signal to socketmaster’s pid. In my example, the pid is 2969.

kill -HUP <pid>

To test it, I sent continuous requests to the server, then reloaded it. Below is the result.

203504 :: ~ » curl localhost:7008/abc
{"status": "success", "pid": 332}
203504 :: ~ » curl localhost:7008/abc
{"status": "success", "pid": 332}
203504 :: ~ » curl localhost:7008/abc
{"status": "success", "pid": 368}
203504 :: ~ » curl localhost:7008/abc
{"status": "success", "pid": 368}

I reloaded the server while it was serving the second request.
The first thing to check is that every request got a response from the server. Notice that the pid changes in the third response, because requests that arrive after the reload are handled by a different process.
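
The manual curl test can also be automated. The following sketch sends a fixed number of sequential requests and counts the responses per pid, so a reload shows up as a second pid with zero failures. It assumes the server from this post is reachable at http://localhost:7008/abc. The names reply, parsePID and probe, and the request count, are hypothetical choices for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// reply mirrors the JSON body written by myHandler.
type reply struct {
	Status string `json:"status"`
	PID    int    `json:"pid"`
}

// parsePID extracts the pid field from a response body.
func parsePID(body []byte) (int, error) {
	var r reply
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	return r.PID, nil
}

// probe sends n sequential GET requests to url. It returns the number of
// successful responses per pid and the number of failed requests.
func probe(url string, n int) (perPID map[int]int, failed int) {
	client := &http.Client{Timeout: 10 * time.Second}
	perPID = make(map[int]int)
	for i := 0; i < n; i++ {
		resp, err := client.Get(url)
		if err != nil {
			failed++
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil || resp.StatusCode != http.StatusOK {
			failed++
			continue
		}
		pid, err := parsePID(body)
		if err != nil {
			failed++
			continue
		}
		perPID[pid]++
	}
	return perPID, failed
}

func main() {
	// Reload socketmaster from another terminal while this loop is running.
	perPID, failed := probe("http://localhost:7008/abc", 10)
	fmt.Println("responses per pid:", perPID, "failed:", failed)
}
```

If the reload works as described, the failed count stays at zero and the map contains both the old and the new pid.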

Run it with systemd

Like other server applications, socketmaster is usually run as a daemon, which means it runs in the background. On Linux, systemd can take care of this. Below is a sample unit file for running socketmaster as a daemon. It is saved in /etc/systemd/system/myservice.service.

[Unit]
Description=<description about this service>

[Service]
ExecStart=/usr/local/bin/socketmaster -listen tcp://:7008 -command=./mygoapp
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

To start it

systemctl start myservice  

To reload it

systemctl reload myservice

To get its status

systemctl status myservice

It will show a status like this:

root@d6bc11757efd:/# systemctl status myservice
* myservice.service - <description about this service>
     Loaded: loaded (/etc/systemd/system/myservice.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-02-10 04:21:27 UTC; 1min 7s ago
    Process: 378 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 352 (socketmaster)
      Tasks: 14 (limit: 2227)
     Memory: 2.7M
     CGroup: /docker/d6bc11757efdb63c8a359353764b29e72e752d39b2c6a8a2ca7514d56b6bcb86/system.slice/myservice.service
             |-352 /usr/local/bin/socketmaster -listen tcp://:7008 -command=./mygoapp
             `-379 ./mygoapp

Conclusion

With socketmaster we can reload our application without downtime. There are a few things your application must handle to integrate with socketmaster, but they are simple and should not require much change. Socketmaster is ready to use in production. If you have any questions, leave a comment below.

