Recently we’ve been having issues with one of our clients’ servers rebooting nightly and taking down the application running on it. Rancher should automatically start the application back up once the server is up again, but the rancher-agent container wasn’t starting on its own, so Rancher couldn’t see the server’s status. We had to log into the server and start the agent again by hand.

Here is how you can fix this using systemd, the init system built into CoreOS that starts services at boot.

NOTE: Anywhere you see <URL> in this post, it refers to your Rancher host registration URL. You can get this URL from the Rancher UI under Infrastructure > Hosts > Add Host > Custom; it is the URL at the end of the docker run command shown there.

The Install

To set up a service that starts at boot, we need to create a unit file called rancher-agent.service under the /etc/systemd/system directory.

Enter the following into the file:
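
A minimal sketch of what such a file could contain, written out by a small shell function so your <URL> can be filled in. Everything beyond the docker run command from this post is an assumption: the helper name write_rancher_agent_unit, the dependency on docker.service and network-online.target, the Type=oneshot and RemainAfterExit=yes pairing (chosen because docker run -d returns as soon as the container has started), and the ExecStop line. By default it writes under /etc/systemd/system, so run it as root.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical rancher-agent.service unit file.
#   $1 = the Rancher host registration URL (the <URL> from the Rancher UI)
#   $2 = where to write the unit (defaults to the systemd system directory)
write_rancher_agent_unit() {
  url="$1"
  unit_path="${2:-/etc/systemd/system/rancher-agent.service}"

  if [ -z "$url" ]; then
    echo "usage: write_rancher_agent_unit <URL> [unit_path]" >&2
    return 1
  fi

  # Unquoted EOF so that ${url} is substituted into the ExecStart line.
  cat > "$unit_path" <<EOF
[Unit]
Description=Rancher Agent
# Start only after Docker is up and the network is online,
# since the agent has to reach the Rancher server.
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
# docker run -d exits as soon as the container starts, so treat the start
# as a one-time action and keep the unit marked active afterwards.
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/docker run -d --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.0.2 ${url}
# The leading "-" tells systemd to ignore a failure here (e.g. no such container).
ExecStop=-/usr/bin/docker stop rancher-agent

[Install]
# Lets "systemctl enable" hook the unit into the normal boot sequence.
WantedBy=multi-user.target
EOF
}
```

For example, from a root shell (sudo -i), source the snippet and run write_rancher_agent_unit followed by your <URL>. Passing a second argument writes the unit somewhere else instead, which is handy for reviewing it before installing it.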

The [Unit] section holds metadata about the service, such as its description and what it depends on.
The [Service] section has the details on how to start and stop the service.

Before we can get into the details of the [Service] section, we need to understand how the Rancher Agent works.

When you run the command sudo /usr/bin/docker run -d --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/rancher:/var/lib/rancher rancher/agent:v1.0.2 <URL>, a few things happen:

  1. The rancher/agent container you started exits shortly afterward with exit code 137 (the code Docker reports for a container killed with SIGKILL).
  2. A new rancher/agent container starts in its place, and it is always named rancher-agent.
  3. A rancher agent instance starts up (from the rancher/agent-instance:v0.8.1 image). This is the network agent that communicates with your Rancher application.
  4. If you have a load balancer configured in Rancher, it starts up from the rancher/agent-instance:v0.8.3 image.
  5. Any other Docker containers that should be running on this host start up.
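
To check step 2 on a host, you can ask Docker whether the container named rancher-agent is running. A small sketch; the helper name rancher_agent_running is hypothetical, and the container name comes from step 2 above:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only if the container named
# rancher-agent exists and Docker reports it as running.
rancher_agent_running() {
  # docker inspect prints "true" or "false" for .State.Running. Any other
  # result, including the error when no such container exists, counts as
  # "not running".
  [ "$(docker inspect -f '{{.State.Running}}' rancher-agent 2>/dev/null)" = "true" ]
}
```

For example: if rancher_agent_running; then echo "agent is up"; else echo "agent is down"; fi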

So with all those things in mind, the [Service] section is built around that same docker run command, since running it is what kicks off steps 1 through 5.

Now that we have our unit file written, we need to enable it so it starts after a reboot by running: sudo systemctl enable rancher-agent.service
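
systemctl enable works from the unit's [Install] section (for example a WantedBy= line), so the file needs one for this step to have any effect. To confirm the result, systemctl is-enabled prints the unit's boot state. A sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if rancher-agent.service is set to start at boot.
rancher_agent_enabled() {
  # is-enabled prints a state such as "enabled" or "disabled".
  [ "$(systemctl is-enabled rancher-agent.service 2>/dev/null)" = "enabled" ]
}
```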

Keep in mind that this only sets the service up to start during boot.
To start it now, run: sudo systemctl start rancher-agent.service

There won’t be any output from the start command, but you can see the status of your service by running: systemctl status rancher-agent.service
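
For scripted checks, systemctl is-active --quiet reports the state through its exit code only, and journalctl -u shows the unit's log output when the start fails. A sketch with a hypothetical helper name; reading the journal may require sudo depending on your user's groups:

```shell
#!/bin/sh
# Hypothetical helper: reports whether rancher-agent.service is active and,
# if it is not, prints the unit's most recent log lines to help debugging.
check_rancher_agent() {
  if systemctl is-active --quiet rancher-agent.service; then
    echo "rancher-agent.service is active"
  else
    echo "rancher-agent.service is NOT active; recent log output:" >&2
    journalctl -u rancher-agent.service -n 20 --no-pager >&2
    return 1
  fi
}
```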

If you go to your Rancher application under Infrastructure > Hosts, you should also see your host there.

Troubleshooting

We ran into many issues while working on this, but we won’t get into them here. The important thing to know is how to use systemd to reload and test your unit file.

Whenever you edit the rancher-agent.service file, make sure to follow these steps:

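  1. Stop the current service by running: sudo systemctl stop rancher-agent.service
  2. Reload the unit file(s) by running: sudo systemctl daemon-reload
  3. Start the service by running: sudo systemctl start rancher-agent.service
  4. Check the status: systemctl status rancher-agent.service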
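
If you go through this cycle often, the four steps can be chained into one helper so that each step only runs when the previous one succeeded. A sketch with a hypothetical function name; --no-pager keeps the status output from opening in a pager:

```shell
#!/bin/sh
# Hypothetical helper: apply edits to rancher-agent.service by stopping the
# service, reloading systemd's unit files, starting it again and showing status.
# The && chain stops at the first failing step.
reload_rancher_agent() {
  sudo systemctl stop rancher-agent.service &&
    sudo systemctl daemon-reload &&
    sudo systemctl start rancher-agent.service &&
    systemctl status rancher-agent.service --no-pager
}
```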

Hopefully this helps you troubleshoot your own issues and get your server back up and running.

Feel free to leave a comment below with any questions and we’ll do our best to answer your questions.