5 Supervisor and Application
So far our application requires an event manager and a registry. It may potentially use dozens, if not hundreds, of buckets. While we may think our implementation so far is quite good, no software is bug free, and failures are definitely going to happen.
When things fail, your first reaction may be: "let's rescue those errors". But, as we have learned in the Getting Started guide, in Elixir we don't have the defensive programming habit of rescuing exceptions, as commonly seen in other languages. Instead, we say "fail fast" or "let it crash". If there is a bug that leads our registry to crash, we have nothing to worry about because we are going to set up a supervisor that will start a fresh copy of the registry.
In this chapter, we are going to learn about supervisors and also about applications. We are going to create not one, but two supervisors, and use them to supervise our processes.
5.1 Our first Supervisor
Creating a supervisor is not much different from creating a GenServer. We are going to define a module named KV.Supervisor, which will use the Supervisor behaviour, inside the lib/kv/supervisor.ex file:
defmodule KV.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, :ok)
  end

  @manager_name KV.EventManager
  @registry_name KV.Registry

  def init(:ok) do
    children = [
      worker(GenEvent, [[name: @manager_name]]),
      worker(KV.Registry, [@manager_name, [name: @registry_name]])
    ]

    supervise(children, strategy: :one_for_one)
  end
end
Our supervisor has two children: the event manager and the registry. It's common to give names to processes under supervision so that other processes can access them by name without needing to know their pid. This is useful because a supervised process might crash, in which case its pid will change when the supervisor restarts it. We declare the names of our supervisor's children by using the module attributes @manager_name and @registry_name, then reference those attributes in the worker definitions. While it's not required that we declare the names of our child processes in module attributes, doing so makes them stand out to the reader of our code.
For example, the KV.Registry worker receives two arguments: the first is the name of the event manager and the second is a keyword list of options. In this case, we set the name option to [name: KV.Registry] (using our previously-defined module attribute, @registry_name), guaranteeing we can access the registry by the name KV.Registry throughout the application. It is very common to name the children of a supervisor after the module that defines them, as this association becomes very handy when debugging a live system.
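To see why names matter across restarts, here is a minimal sketch that is separate from the kv project. It supervises an Agent under the hypothetical name :demo_counter, kills it, and looks the name up again. It assumes a recent Elixir release, where Supervisor.start_link/2 accepts child specification maps directly, rather than the worker/2 helper used in this chapter.

```elixir
# Hypothetical stand-in for a named child such as KV.Registry.
children = [
  %{
    id: :demo_counter,
    start: {Agent, :start_link, [fn -> 0 end, [name: :demo_counter]]}
  }
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:demo_counter)
ref = Process.monitor(old_pid)

# Simulate a crash and wait until the old process is really gone.
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# The supervisor restarts the child asynchronously, so poll until the
# name points at a pid other than the old one.
wait_for_restart = fn retry ->
  case Process.whereis(:demo_counter) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _ ->
      Process.sleep(10)
      retry.(retry)
  end
end

new_pid = wait_for_restart.(wait_for_restart)

# Same name, different pid: callers that use the name keep working.
IO.inspect(new_pid != old_pid, label: "pid changed")
IO.inspect(Agent.get(:demo_counter, fn state -> state end), label: "state after restart")
```

Any code that had stored old_pid would now be holding a dead reference, while code that uses the name reaches the fresh process.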
The order children are declared in the supervisor also matters. Since the registry depends on the event manager, we must start the latter before the former. That's why the GenEvent worker must come before the KV.Registry worker in the children list.
Finally, we call supervise/2, passing the list of children and the strategy of :one_for_one.
The supervision strategy dictates what happens when one of the children crashes. :one_for_one means that if a child dies, only that child is restarted. This strategy makes sense for now. If the event manager crashes, there is no reason to restart the registry and vice-versa. However, those dynamics may change once we add more children to the supervisor. The Supervisor behaviour supports many different strategies and we will discuss three of them in this chapter.
If we start a console inside our project using iex -S mix, we can manually start the supervisor:
iex> KV.Supervisor.start_link
{:ok, #PID<0.66.0>}
iex> KV.Registry.create(KV.Registry, "shopping")
:ok
iex> KV.Registry.lookup(KV.Registry, "shopping")
{:ok, #PID<0.70.0>}
When we started the supervisor tree, both the event manager and registry worker were automatically started, allowing us to create buckets without the need to manually start these processes.
In practice though, we rarely start the application supervisor manually. Instead it is started as part of the application callback.
5.2 Understanding applications
We have been working inside an application this entire time. Every time we changed a file and ran mix compile, we could see a Generated kv.app message in the compilation output.
We can find the generated .app file at _build/dev/lib/kv/ebin/kv.app. Let's have a look at its contents:
{application,kv,
             [{registered,[]},
              {description,"kv"},
              {applications,[kernel,stdlib,elixir,logger]},
              {vsn,"0.0.1"},
              {modules,['Elixir.KV','Elixir.KV.Bucket',
                        'Elixir.KV.Registry','Elixir.KV.Supervisor']}]}.
This file contains Erlang terms (written using Erlang syntax). Even though we are not familiar with Erlang, it is easy to guess this file holds our application definition. It contains our application version, all the modules defined by it, as well as a list of applications we depend on, like Erlang's kernel and elixir itself, and logger, which is specified in the application list in mix.exs.
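The same kind of definition can be read back at runtime. As a small sketch, the snippet below inspects the :logger application that ships with Elixir (used here only because it is always available, it is not specific to kv) through Application.load/1 and Application.spec/2.

```elixir
# Loading reads the application's .app file without starting it.
# It may already be loaded, which is fine for our purposes.
case Application.load(:logger) do
  :ok -> :ok
  {:error, {:already_loaded, :logger}} -> :ok
end

# Each key mirrors an entry of the .app file.
IO.inspect(Application.spec(:logger, :vsn), label: "vsn")
IO.inspect(Application.spec(:logger, :applications), label: "applications")
IO.inspect(length(Application.spec(:logger, :modules)), label: "module count")
```

Running the same calls with :kv inside the project console should show the values from kv.app.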
It would be pretty boring to update this file manually every time we add a new module to our application. That's why mix generates and maintains it automatically for us.
We can also configure the generated .app file by customizing the values returned by the application/0 function inside our mix.exs project file. We will get to that in upcoming chapters.
5.2.1 Starting applications
When we define an .app file, which is the application definition, we are able to start and stop the application as a whole. We haven't worried about this so far for two reasons:
* Mix automatically starts our current application for us
* Even if Mix didn't start our application for us, our application does not yet need to do anything when it starts
In any case, let's see how Mix starts the application for us. Let's start a project console with iex -S mix and try:
iex> Application.start(:kv)
{:error, {:already_started, :kv}}
Oops, it's already started.
We can pass an option to mix to ask it to not start our application. Let's give it a try by running iex -S mix run --no-start:
iex> Application.start(:kv)
{:error, {:not_started, :logger}}
Now we get an error because an application that :kv depends on (:logger in this case) hasn't been started. Mix normally starts the whole hierarchy of applications defined in our project's mix.exs file and it does the same for all dependencies if they depend on other applications. But since we passed the --no-start flag, we need to either start each application manually in the correct order or call Application.ensure_all_started as follows:
iex> Application.ensure_all_started(:kv)
{:ok, [:logger, :kv]}
iex> Application.stop(:kv)
18:12:10.698 [info] Application kv exited :stopped
:ok
Nothing really exciting happens but it shows how we can control our application.
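The difference between the two calls can also be seen outside the kv project. The sketch below uses :logger as a stand-in application, since it ships with Elixir; the exact list returned by the first call depends on what is already running in your session.

```elixir
# The first call starts :logger and any of its dependencies that are
# not running yet, and returns the applications it actually started.
{:ok, started} = Application.ensure_all_started(:logger)
IO.inspect(started, label: "started by this call")

# A second call has nothing left to start.
{:ok, []} = Application.ensure_all_started(:logger)

# Application.start/1, on the other hand, reports an error when the
# application is already running.
{:error, {:already_started, :logger}} = Application.start(:logger)
```

This is why Application.ensure_all_started is the more convenient call in scripts: it is safe to invoke regardless of the current state.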
When you run iex -S mix, it is equivalent to running iex -S mix run. So whenever you need to pass more options to mix when starting iex, it's just a matter of typing mix run and then passing any options the run command accepts. You can find more information about run by running mix help run in your shell.
5.2.2 The application callback
Since we spent all this time talking about how applications are started and stopped, there must be a way to do something useful when the application starts. And indeed, there is!
We can specify an application callback function. This is a function that will be invoked when the application starts. The function must return a result of {:ok, pid}, where pid is the process identifier of a supervisor process.
We can configure the application callback in two steps. First, open up the mix.exs file and change def application to the following:
def application do
  [applications: [],
   mod: {KV, []}]
end
The :mod option specifies the "application callback module", followed by the arguments to be passed on application start. The application callback module can be any module that implements the Application behaviour.
Now that we have specified KV as the callback module, we need to change the KV module, defined in lib/kv.ex:
defmodule KV do
  use Application

  def start(_type, _args) do
    KV.Supervisor.start_link
  end
end
When we use Application, we only need to define a start/2 function. If we wanted to specify custom behaviour on application stop, we could define a stop/1 function, as well. In this case, the one automatically defined by use Application is fine.
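To make the callback contract concrete, here is a small sketch with a hypothetical DemoApp module that is not part of kv. It calls the callbacks by hand only to inspect their return values; in a real project the runtime invokes them. It assumes a recent Elixir release where Supervisor.start_link/2 accepts a plain list of children.

```elixir
defmodule DemoApp do
  # Hypothetical callback module, used only for illustration.
  use Application

  def start(_type, _args) do
    # An empty supervisor is enough to satisfy the {:ok, pid} contract.
    Supervisor.start_link([], strategy: :one_for_one)
  end
end

# Called directly here; normally the runtime does this on application start.
{:ok, sup} = DemoApp.start(:normal, [])
IO.inspect(is_pid(sup), label: "start/2 returned a supervisor pid")

# stop/1 was not defined above, so this is the default from `use Application`.
IO.inspect(DemoApp.stop(:ignored_state), label: "default stop/1")
```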
Let's start our project console once again with iex -S mix. We will see a process named KV.Registry is already running:
iex> KV.Registry.create(KV.Registry, "shopping")
:ok
iex> KV.Registry.lookup(KV.Registry, "shopping")
{:ok, #PID<0.88.0>}
Excellent!
5.2.3 Projects or applications?
Mix makes a distinction between projects and applications. Based on the current contents of our mix.exs file, we would say we have a Mix project that defines the :kv application. As we will see in later chapters, there are projects that don't define any application.
When we say "project," you should think about Mix. Mix is the tool that manages your project. It knows how to compile your project, test your project and more. It also knows how to compile and start the application relevant to your project.
When we talk about applications, we talk about OTP. Applications are the entities that are started and stopped as a whole by the runtime. You can learn more about applications in the docs for the Application module, as well as by running mix help compile.app to learn more about the supported options in def application.
5.3 Simple one for one supervisors
We have now successfully defined our supervisor which is automatically started (and stopped) as part of our application lifecycle.
Remember, however, that our KV.Registry is both linking and monitoring bucket processes in the handle_cast/2 callback:
{:ok, pid} = KV.Bucket.start_link()
ref = Process.monitor(pid)
Links are bi-directional, which implies that a crash in a bucket will crash the registry. Although we now have the supervisor, which guarantees the registry will be back up and running, crashing the registry still means we lose all data associating bucket names to their respective processes.
In other words, we want the registry to keep on running even if a bucket crashes. Let's write a test:
test "removes bucket on crash", %{registry: registry} do
  KV.Registry.create(registry, "shopping")
  {:ok, bucket} = KV.Registry.lookup(registry, "shopping")

  # Kill the bucket and wait for the notification
  Process.exit(bucket, :shutdown)
  assert_receive {:exit, "shopping", ^bucket}
  assert KV.Registry.lookup(registry, "shopping") == :error
end
The test is similar to "removes bucket on exit" except that we are being a bit more harsh. Instead of using Agent.stop/1, we are sending an exit signal to shut down the bucket. Since the bucket is linked to the registry, which is in turn linked to the test process, killing the bucket causes the registry to crash, which then causes the test process to crash too:
1) test removes bucket on crash (KV.RegistryTest)
test/kv/registry_test.exs:52
** (EXIT from #PID<0.94.0>) shutdown
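The crash chain can be reproduced in isolation. In the sketch below, an anonymous process stands in for the registry and an Agent stands in for the bucket; both are illustrative stand-ins rather than the real KV modules. The observer uses spawn_monitor/1 so that it is notified of the crash without being linked itself.

```elixir
observer = self()

# Stand-in for the registry: it links to a child, much like KV.Registry
# does when it calls KV.Bucket.start_link.
{owner, owner_ref} =
  spawn_monitor(fn ->
    {:ok, child} = Agent.start_link(fn -> %{} end)
    send(observer, {:child, child})

    # Block forever; only an exit signal can end this process.
    receive do
      :never_sent -> :ok
    end
  end)

child =
  receive do
    {:child, pid} -> pid
  end

# Shut down the child, as the test does with the bucket.
Process.exit(child, :shutdown)

# The exit signal travels over the link and takes the owner down too.
owner_exit_reason =
  receive do
    {:DOWN, ^owner_ref, :process, ^owner, reason} -> reason
  end

IO.inspect(owner_exit_reason, label: "owner exit reason")
```

The observer survives because a monitor only delivers a message, whereas a link propagates the exit signal itself.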
One possible solution to this issue would be to provide a KV.Bucket.start/0 function that invokes Agent.start/1, and use it from the registry, removing the link between registry and buckets. However, this would be a bad idea, because buckets would not be linked to any process after this change. This means that if someone stops the kv application, all buckets would remain alive, yet unreachable.
We are going to solve this issue by defining a new supervisor that will spawn and supervise all buckets. There is one supervisor strategy, called :simple_one_for_one, that is the perfect fit for such situations: it allows us to specify a worker template and supervise many children based on this template.
Let's define our KV.Bucket.Supervisor as follows:
defmodule KV.Bucket.Supervisor do
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, :ok, opts)
  end

  def start_bucket(supervisor) do
    Supervisor.start_child(supervisor, [])
  end

  def init(:ok) do
    children = [
      worker(KV.Bucket, [], restart: :temporary)
    ]

    supervise(children, strategy: :simple_one_for_one)
  end
end
There are two changes in this supervisor compared to the first one.
First, we define a start_bucket/1 function that will receive a supervisor and start a bucket process as a child of that supervisor. start_bucket/1 is the function we are going to invoke instead of calling KV.Bucket.start_link directly in the registry.
Second, in the init/1 callback, we are marking the worker as :temporary. This means that if the bucket dies, it won't be restarted! That's because we only want to use the supervisor as a mechanism to group the buckets. The creation of buckets should always pass through the registry.
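The effect of :temporary can be observed with a short sketch. Note the substitution: instead of the :simple_one_for_one strategy shown in this chapter, it uses DynamicSupervisor, which newer Elixir releases provide for starting children on demand, and a plain Agent stands in for KV.Bucket. Treat it as an illustration of the restart option rather than as the chapter's implementation.

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Agent stands in for KV.Bucket; restart: :temporary means "never restart".
bucket_spec = %{
  id: :demo_bucket,
  start: {Agent, :start_link, [fn -> %{} end]},
  restart: :temporary
}

{:ok, bucket} = DynamicSupervisor.start_child(sup, bucket_spec)
ref = Process.monitor(bucket)

# Shut the bucket down and wait until it is gone.
Process.exit(bucket, :shutdown)

receive do
  {:DOWN, ^ref, :process, ^bucket, _reason} -> :ok
end

# The supervisor handles the exit asynchronously, so poll until it has
# dropped the child from its bookkeeping.
wait_until_empty = fn retry ->
  case DynamicSupervisor.count_children(sup) do
    %{active: 0} ->
      :ok

    _ ->
      Process.sleep(10)
      retry.(retry)
  end
end

:ok = wait_until_empty.(wait_until_empty)

# The supervisor survived the crash and did not start a replacement.
IO.inspect(Process.alive?(sup), label: "supervisor alive")
IO.inspect(DynamicSupervisor.which_children(sup), label: "children")
```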
Run iex -S mix so we can give our new supervisor a try:
iex> {:ok, sup} = KV.Bucket.Supervisor.start_link
{:ok, #PID<0.70.0>}
iex> {:ok, bucket} = KV.Bucket.Supervisor.start_bucket(sup)
{:ok, #PID<0.72.0>}
iex> KV.Bucket.put(bucket, "eggs", 3)
:ok
iex> KV.Bucket.get(bucket, "eggs")
3
Let's change the registry to work with the buckets supervisor. We are going to follow the same strategy we did with the event manager, where we will explicitly pass the buckets supervisor pid to KV.Registry.start_link/3. Let's start by changing the setup callback in test/kv/registry_test.exs to do so:
setup do
  {:ok, sup} = KV.Bucket.Supervisor.start_link
  {:ok, manager} = GenEvent.start_link
  {:ok, registry} = KV.Registry.start_link(manager, sup)

  GenEvent.add_mon_handler(manager, Forwarder, self())
  {:ok, registry: registry}
end
Now let's change the appropriate functions in KV.Registry to take the new supervisor into account:
## Client API

@doc """
Starts the registry.
"""
def start_link(event_manager, buckets, opts \\ []) do
  # 1. Pass the buckets supervisor as argument
  GenServer.start_link(__MODULE__, {event_manager, buckets}, opts)
end

## Server callbacks

def init({events, buckets}) do
  names = HashDict.new
  refs = HashDict.new
  # 2. Store the buckets supervisor in the state
  {:ok, %{names: names, refs: refs, events: events, buckets: buckets}}
end

def handle_cast({:create, name}, state) do
  if HashDict.get(state.names, name) do
    {:noreply, state}
  else
    # 3. Use the buckets supervisor instead of starting buckets directly
    {:ok, pid} = KV.Bucket.Supervisor.start_bucket(state.buckets)
    ref = Process.monitor(pid)
    refs = HashDict.put(state.refs, ref, name)
    names = HashDict.put(state.names, name, pid)
    GenEvent.sync_notify(state.events, {:create, name, pid})
    {:noreply, %{state | names: names, refs: refs}}
  end
end
Those changes should be enough to make our tests pass! To complete our task, we just need to update our supervisor to also take the buckets supervisor as a child.
5.4 Supervision trees
In order to use the buckets supervisor in our application, we need to add it as a child of KV.Supervisor. Notice we are beginning to have supervisors that supervise other supervisors, forming so-called "supervision trees."
Open up lib/kv/supervisor.ex, add an additional module attribute for the buckets supervisor name, and change init/1 to match the following:
@manager_name KV.EventManager
@registry_name KV.Registry
@bucket_sup_name KV.Bucket.Supervisor

def init(:ok) do
  children = [
    worker(GenEvent, [[name: @manager_name]]),
    supervisor(KV.Bucket.Supervisor, [[name: @bucket_sup_name]]),
    worker(KV.Registry, [@manager_name, @bucket_sup_name, [name: @registry_name]])
  ]

  supervise(children, strategy: :one_for_one)
end
This time we have added a supervisor as a child and given it the name of KV.Bucket.Supervisor (again, the same name as the module). We have also updated the KV.Registry worker to receive the bucket supervisor name as an argument.
Also remember that the order in which children are declared is important. Since the registry depends on the buckets supervisor, the buckets supervisor must be listed before it in the children list.
Since we have added more children to the supervisor, it is important to evaluate if the :one_for_one strategy is still correct. One flaw that shows up right away is the relationship between registry and buckets supervisor. If the registry dies, the buckets supervisor must die too, because once the registry dies all information linking the bucket name to the bucket process is lost. If the buckets supervisor is kept alive, it would be impossible to reach those buckets.
We could consider moving to another strategy like :one_for_all. The :one_for_all strategy kills and restarts all children whenever one of the children dies. This change is not ideal either, because a crash in the registry should not crash the event manager. In fact, doing so would be harmful, as crashing the event manager would cause all installed event handlers to be removed.
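The behaviour of :one_for_all can be checked with a small sketch. Two Agents under the hypothetical names :demo_a and :demo_b stand in for the children; only the first one is killed. As before, this assumes a recent Elixir release where Supervisor.start_link/2 accepts child specification maps.

```elixir
children = [
  %{id: :a, start: {Agent, :start_link, [fn -> :a end, [name: :demo_a]]}},
  %{id: :b, start: {Agent, :start_link, [fn -> :b end, [name: :demo_b]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_all)

old_a = Process.whereis(:demo_a)
old_b = Process.whereis(:demo_b)

# Kill only the first child.
Process.exit(old_a, :kill)

# Restarts are asynchronous: poll until the name points at a new pid.
wait_for_new = fn name, old, retry ->
  case Process.whereis(name) do
    pid when is_pid(pid) and pid != old ->
      pid

    _ ->
      Process.sleep(10)
      retry.(name, old, retry)
  end
end

new_a = wait_for_new.(:demo_a, old_a, wait_for_new)
new_b = wait_for_new.(:demo_b, old_b, wait_for_new)

# Both children were replaced, although only one of them crashed.
IO.inspect(new_a != old_a, label: "a restarted")
IO.inspect(new_b != old_b, label: "b restarted")
```

With strategy: :one_for_one, the second name would keep pointing at the original pid.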
One possible solution to this problem is to create another supervisor that will supervise the registry and buckets supervisor with :one_for_all strategy, and have the root supervisor supervise both the event manager and the new supervisor with :one_for_one strategy. The proposed tree would have the following format:
* root supervisor [one_for_one]
  * event manager
  * supervisor [one_for_all]
    * buckets supervisor [simple_one_for_one]
      * buckets
    * registry
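As a rough sketch of that shape, and not the kv implementation, the tree can be assembled from stand-in processes: Agents play the event manager and the registry, and a DynamicSupervisor (available in newer Elixir releases) plays the buckets supervisor. All names prefixed with demo_ are hypothetical.

```elixir
# Inner supervisor: buckets supervisor and registry restart together.
inner_children = [
  {DynamicSupervisor, strategy: :one_for_one, name: :demo_buckets_sup},
  %{id: :registry, start: {Agent, :start_link, [fn -> %{} end, [name: :demo_registry]]}}
]

inner_supervisor = %{
  id: :registry_and_buckets,
  type: :supervisor,
  start: {Supervisor, :start_link, [inner_children, [strategy: :one_for_all]]}
}

# Root supervisor: the event manager is isolated from the inner group.
root_children = [
  %{id: :event_manager, start: {Agent, :start_link, [fn -> [] end, [name: :demo_events]]}},
  inner_supervisor
]

{:ok, root} = Supervisor.start_link(root_children, strategy: :one_for_one)

# Find the inner supervisor among the root's children.
{:registry_and_buckets, inner, :supervisor, _modules} =
  List.keyfind(Supervisor.which_children(root), :registry_and_buckets, 0)

IO.inspect(Supervisor.count_children(root), label: "root")
IO.inspect(Supervisor.count_children(inner), label: "inner")
```

With this layout, a registry crash restarts the buckets supervisor as well, while the event manager and its handlers are left untouched.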
You can take a shot at building this new supervision tree, but we will stop here. This is because in the next chapter we will make changes to the registry that will allow the registry data to be persisted, making the :one_for_one strategy a perfect fit.
Remember, there are other strategies and other options that could be given to the worker/2, supervisor/2 and supervise/2 functions, so don't forget to check out the Supervisor module documentation.