Fixing someone else's elixir project

2020-05-21

Make or break’s website is powered by an API server written in Elixir, using Phoenix.

The technology choice here was probably based on “let’s try something new”, so it’s understandable if the project structure feels awkward.

I barely touched this code, but I remember that deploying it has always been kind of messy. It took ~15 minutes to build and deploy a new version. It started by using Gatling, which seems to create Distillery releases (I don’t know what this means).

At some point, we changed to edeliver, and started using incremental builds to save some time.

This is all deployed to a VPS somewhere, manually managed. No Heroku, no docker, no CI. I have no idea why this is currently so hard/weird to deploy, but my goal is to make this deployable via docker and have its build time be reasonable, given the size of the codebase.

This is the repository I’m going to be looking at:

https://github.com/makeorbreak-io/api

Exploring the codebase

I have looked at the Elixir syntax and made some basic changes to the project. I’ve interviewed someone on an episode of Conversas em código. That’s as far as my knowledge goes. I don’t know how the tooling works, or the runtime, or what the interpreter vs compiler expectations are.

The first thing I did was read the README. Things I noticed:

It talks about mix: mix deps.get, mix ecto.create, etc. I take it that mix is the npm/yarn/cargo/bundler+rake of this language.

It mentions edeliver: edeliver build, release, restart, migrate. This is executed directly on the server, from what I understand. It also mentions something about preserving dependency built files. I suppose this is trying to cache dependencies to make things faster.

It mentions systemd. This means there’s probably a service file somewhere under /etc/systemd/system/.

It lists a set of environment variables. This is good, because it’s an indicator that the configuration is done the way I do it in other languages, so there will be some familiarity here.

Environment variables

There’s a list of environment variables in the README, but these are usually outdated. Let’s see if that’s the case here.

hugopeixoto@laptop:~/w/m/api$ ack System\.get_env -h |
> grep -v '^#' |
> sed -e 's/.*System.get_env("\([^"]*\)").*/\1/' |
> sort -u
AI_CALLBACK_URL
AI_SERVER_HOST
AI_SERVER_TOKEN
DATABASE_URL
DB_URL
GITHUB_TOKEN
HOST
MAILGUN_API_DOMAIN
MAILGUN_API_KEY
POOL_SIZE
PORT
SECRET_KEY_BASE
SLACK_TOKEN

The project references both a DB_URL and a DATABASE_URL. This feels accidental. I’ve created an issue and a PR to normalize it:

https://github.com/makeorbreak-io/api/issues/9
https://github.com/makeorbreak-io/api/pull/10

The other environment variables all seem to match. The readme said something else of note:

Use your preferred method to add the following variables to your environment. You can find an example env file you can source in share/env/env

This means that this is not using dotenv, or anything similar, to manage environment variables while in development. Is there a dotenv package for elixir?

https://github.com/avdi/dotenv_elixir

WARNING: This isn’t the Elixir way.

Elixir has an excellent configuration system and this dotenv implementation has a serious limitation in that it isn’t available at compile time. It fits very poorly into a typical deployment setup using exrm or similar. Configuration management should be built around Elixir’s existing configuration system, and the limitations of this package make it inadequate for most users.

A good example is Phoenix which generates a project where the production config imports the “secrets” from a file stored outside of version control. Even if you’re using this for development, the same approach could be taken.

This is kind of odd, but it matches the config/ directory of the project:

hugopeixoto@laptop:~/w/m/api$ tree config/
config/
├── config.exs
├── dev.exs
├── prod.exs
├── prod.secret.exs
└── test.exs

0 directories, 5 files

There’s a .exs file per environment, with different configs:

hugopeixoto@laptop:~/w/m/api$ ack DATABASE_URL
config/test.exs
7:  url: "#{System.get_env("DATABASE_URL")}-test"

config/prod.secret.exs
8:  System.get_env("DATABASE_URL") ||
10:    environment variable DATABASE_URL is missing.

lib/api/repo.ex
8:  DATABASE_URL environment variable.
11:    {:ok, Keyword.put(opts, :url, System.get_env("DATABASE_URL"))}

So, production uses DATABASE_URL, test uses DATABASE_URL concatenated with -test. What does dev use? There’s no match in config/dev.exs.

What is that last match, lib/api/repo.ex?

defmodule Api.Repo do
  use Ecto.Repo,
    otp_app: :api,
    adapter: Ecto.Adapters.Postgres

  @doc """
  Dynamically loads the repository url from the
  DATABASE_URL environment variable.
  """
  def init(_, opts) do
    {:ok, Keyword.put(opts, :url, System.get_env("DATABASE_URL"))}
  end
end

Let’s compare it with config/test.exs:

use Mix.Config

# Configure your database
config :api, Api.Repo,
  pool: Ecto.Adapters.SQL.Sandbox,
  adapter: Ecto.Adapters.Postgres,
  url: "#{System.get_env("DATABASE_URL")}-test"

So, there’s a module Api.Repo which has an initializer that takes opts and returns {:ok, modified_opts}. What does config in test.exs do? I would assume that it prepares the opts that will be passed to the init function. Since Keyword.put overrides the :url key in opts, that would mean that setting url: in test.exs is useless.
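The override is easy to check in isolation. This is a minimal sketch, not the project’s code: the option values are made up, and a fixed string stands in for the System.get_env call.

```elixir
# Options roughly as config/test.exs would provide them (values are made up).
opts = [pool_size: 10, url: "postgres://localhost/mob-api-test"]

# Stand-in for System.get_env("DATABASE_URL") inside Api.Repo.init/2.
from_env = "postgres://localhost/mob-api"

# Keyword.put/3 drops any existing :url entry before adding the new one.
new_opts = Keyword.put(opts, :url, from_env)
IO.inspect(Keyword.get(new_opts, :url))
# prints "postgres://localhost/mob-api"

# With the variable unset, System.get_env/1 returns nil, which also replaces :url.
unset_opts = Keyword.put(opts, :url, nil)
IO.inspect(Keyword.get(unset_opts, :url))
# prints nil
```

Either way, whatever the config file put under :url never reaches the database connection.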

With some help from Artur Ferreira, I found out that this apparently was a bug in the phoenix generator:

https://github.com/phoenixframework/phoenix/pull/2650

With the previous code, even if we were populating the url: param in the dev.exs file we would have errors being thrown as the DATABASE_URL would be empty and would also empty the param for ecto thus creating an error.

So this means that running tests in this project will accidentally use the same database that is used in development mode. I suspect that they don’t run tests that often.

So, all environments use DATABASE_URL, and every :url in the config files is irrelevant. Before moving on, let me read the config documentation, just so that I understand how it works underneath.

https://hexdocs.pm/mix/Mix.Config.html

This module is deprecated. Use Config and Config.Reader instead.

Oh, cool. Well, this doesn’t point to an upgrade guide or anything like that, so I searched the web for how to migrate, and ended up here:

https://elixir-lang.org/blog/2019/06/24/elixir-v1-9-0-released/

The main feature in Elixir v1.9 is the addition of releases. A release is a self-contained directory that consists of your application code, all of its dependencies, plus the whole Erlang Virtual Machine (VM) and runtime.

Although unrelated, this may be useful when I’m building the docker image.

We also use the work on releases to streamline Elixir’s configuration API. A new Config module has been added to Elixir. The previous configuration API, Mix.Config, was part of the Mix build tool. However, since releases provide runtime configuration and Mix is not included in releases, we ported the Mix.Config API to Elixir. In other words, use Mix.Config has been soft-deprecated in favor of import Config.

Ah, this explains it. Now if only they linked to an upgrade guide. Some more searching led me here:

https://hexdocs.pm/elixir/master/Config.html

import Config

config :some_app,
  key1: "value1",
  key2: "value2"

import_config "#{Mix.env()}.exs"

Migrating from use Mix.Config

The Config module in Elixir was introduced in v1.9 as a replacement to Mix.Config, which was specific to Mix and has been deprecated.

You can leverage Config instead of Mix.Config in two steps. The first step is to replace use Mix.Config at the top of your config files by import Config.

The second is to make sure your import_config/1 calls do not have a wildcard character. If so, you need to perform the wildcard lookup manually.

If you are using releases, see mix release, there is another configuration file called config/releases.exs. While config/config.exs and friends mentioned in the previous section are executed whenever you run a Mix command, including when you assemble a release, config/releases.exs is executed every time your production system boots. Since Mix is not available in a production system, config/releases.exs must not use any of the functions from Mix.
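To make that last paragraph concrete, a config/releases.exs for this project might look like the sketch below. It is an illustration, not the file from the repository: the variable names come from the list gathered earlier, but the default pool size and the exact set of keys are assumptions.

```elixir
# config/releases.exs (sketch): evaluated every time the release boots,
# so it must not call anything from Mix.
import Config

# System.fetch_env!/1 raises if the variable is not set, so a missing
# DATABASE_URL fails at boot instead of silently becoming nil.
# Note: Api.Repo.init/2 in this project also sets :url from DATABASE_URL.
config :api, Api.Repo,
  url: System.fetch_env!("DATABASE_URL"),
  # The default of "10" is an assumption, not a value taken from the project.
  pool_size: String.to_integer(System.get_env("POOL_SIZE", "10"))

config :api, ApiWeb.Endpoint,
  secret_key_base: System.fetch_env!("SECRET_KEY_BASE")
```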

Well, it doesn’t look that different from Mix.Config. I’ll make a PR changing this. While I’m at it, I’ll remove the config/prod.secret.exs file as well: we’re not storing secrets directly in config files, so the split between prod.exs and prod.secret.exs is kind of useless. The secret.exs file is committed to the repository, so even if it had any secrets, they wouldn’t be very secrety. We also had config/prod.secret.exs in .gitignore. Removing that as well.

While doing this, I saw this comment:

Using releases (Elixir v1.9+)

If you are doing OTP releases, you need to instruct Phoenix to start each relevant endpoint:

config :api, ApiWeb.Endpoint, server: true

Then you can assemble a release by calling mix release. See mix help release for more information.

This might be relevant later on.

https://github.com/makeorbreak-io/api/pull/11

Now, what does config do?

https://github.com/elixir-lang/elixir/blob/50995e66404fee889846db7343705f881b2a3f25/lib/elixir/lib/config.ex#L132

It seems to call put_config, which calls Process.put. Who reads this?

So it seems that the init function in our lib/api/repo.ex is called by ecto:

https://github.com/elixir-ecto/ecto/blob/v3.4.4/lib/ecto/repo.ex#L381

Also from that file:

In case the URL needs to be dynamically configured, for example by reading a system environment variable, such can be done via the c:init/2 repository callback:

def init(_type, config) do
  {:ok, Keyword.put(config, :url, System.get_env("DATABASE_URL"))}
end

This is the code we saw earlier. Does this mean that using System.get_env in config/*.exs files will behave differently? To be determined.
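One way to think about the possible difference, as a sketch with made-up names (EnvTiming and DEMO_URL are not part of the project): a value read while code is being compiled is frozen, while a value read inside a function is looked up on every call. Whether config/*.exs behaves like the first case depends on when those files get evaluated.

```elixir
System.put_env("DEMO_URL", "value-at-compile-time")

defmodule EnvTiming do
  # Module attributes are evaluated once, while the module is compiled.
  @frozen System.get_env("DEMO_URL")

  def frozen, do: @frozen

  # The function body runs on every call, so it sees the current environment.
  def live, do: System.get_env("DEMO_URL")
end

System.put_env("DEMO_URL", "value-at-runtime")

IO.puts(EnvTiming.frozen())
# prints value-at-compile-time
IO.puts(EnvTiming.live())
# prints value-at-runtime
```

The init/2 callback is in the second group: it runs when the repo starts, so it always sees the environment of the running system.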

Ecto fetches the config using:

Ecto.Repo.Supervisor.runtime_config(:runtime, __MODULE__, @otp_app, [])

which calls:

config = Application.get_env(otp_app, repo, [])

I am not sure how Process.put and Application.get_env relate. Looking at the Application documentation, it seems that Mix handles passing things to the application when it starts one.
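As far as the public API goes, the link seems to be that a config file is evaluated into a plain keyword list, which is then stored in the application environment. The sketch below reproduces that by hand with Config.Reader.read!/1 and Application.put_all_env/1. The names :demo_app and DemoRepo and the temporary file are made up for illustration, and this is only an approximation of what Mix does internally.

```elixir
# Write a tiny config file to a temporary location.
path = Path.join(System.tmp_dir!(), "demo_config.exs")

File.write!(path, """
import Config

config :demo_app, DemoRepo, url: "postgres://localhost/demo"
""")

# Evaluating the file only returns data; nothing is stored yet.
config = Config.Reader.read!(path)

# Storing it in the application environment makes it visible to
# Application.get_env/3, which is the call Ecto uses.
Application.put_all_env(config)

IO.inspect(Application.get_env(:demo_app, DemoRepo, []))
# prints [url: "postgres://localhost/demo"]

File.rm!(path)
```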

I think that I have a good enough understanding of how environment variables get passed to applications. I also know that I’ll need to change this if I try to mess with Releases, so I will get back to this later.

Time to try to get a basic Dockerfile running.

Dockerfile

Like in other languages, I know that I don’t want to download every dependency if I don’t add new ones, so I’ll start with this:

FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix deps.get

This does not work:

hugopeixoto@laptop:~/w/m/api$ docker build .
[...]
Could not find Hex, which is needed to build dependency :absinthe
Shall I install Hex? (if running non-interactively, use "mix local.hex --force") [Yn]
** (Mix) Could not find an SCM for dependency :absinthe from Api.MixProject
The command '/bin/sh -c mix deps.get' returned a non-zero code: 1

Hex is the package manager, so mix is not enough. Let’s add their suggestion to the Dockerfile:

FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix deps.get

“Successfully built”. So, usually, now I would add the rest of the code and compile it. Let’s try that:

FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix deps.get

ADD . /app/
RUN mix compile

Another error:

Could not find "rebar3", which is needed to build dependency :parse_trans
I can install a local copy which is just used by Mix
Shall I install rebar3? (if running non-interactively, use "mix local.rebar --force") [Yn]
** (Mix) Could not find "rebar3" to compile dependency :parse_trans, please ensure "rebar3" is available
The command '/bin/sh -c mix compile' returned a non-zero code: 1

Rebar seems to be a tool to “create, develop and release Erlang libraries”, as opposed to Elixir, I guess. Let’s add that as well:

FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix deps.get

ADD . /app/
RUN mix compile

This worked, but it compiled all the dependencies in the mix compile step, which seems kind of wasteful. Reading through mix docs, there is a mix deps.compile task, so let’s try that before adding the full app:

FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix deps.get
RUN mix deps.compile

ADD . /app/
RUN mix compile

OK, this seems to work. Here’s the time each step takes:

hugopeixoto@laptop:~/w/m/api$ docker build --no-cache . | grep --line-buffered 'Step\|Success' | ts -i
00:00:00 Step 1/9 : FROM elixir
00:00:00 Step 2/9 : ADD mix.exs mix.lock /app/
00:00:01 Step 3/9 : WORKDIR /app
00:00:00 Step 4/9 : RUN mix local.hex --force
00:00:03 Step 5/9 : RUN mix local.rebar --force
00:00:05 Step 6/9 : RUN mix deps.get
00:00:06 Step 7/9 : RUN mix deps.compile
00:02:10 Step 8/9 : ADD . /app/
00:00:01 Step 9/9 : RUN mix compile
00:00:11 Successfully built b32f0c99fcaf

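Since ts -i prints the time elapsed since the previous line, each duration shows up on the line after the step it belongs to. We can see that the step that takes the longest is mix deps.compile (a bit over two minutes), and we’re able to run it based on mix.exs and mix.lock only, which is great.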

Now, I can run the server directly with mix phx.server. This won’t be making use of the releases feature, but we’ll get there as soon as this is running.

I added CMD mix phx.server to the Dockerfile and tried running it:

hugopeixoto@laptop:~/w/m/api$ docker run -it mob-api
[error] GenServer #PID<0.383.0> terminating
** (RuntimeError) connect raised KeyError exception: key :database not found.
The exception details are hidden, as they may contain sensitive data such as
database credentials. You may set :show_sensitive_data_on_connection_error to
true when starting your connection if you wish to see all of the details
    (elixir 1.10.3) lib/keyword.ex:399: Keyword.fetch!/2
    (postgrex 0.15.3) lib/postgrex/protocol.ex:92: Postgrex.Protocol.connect/1
    (db_connection 2.2.1) lib/db_connection/connection.ex:69: DBConnection.Connection.connect/2
    (connection 1.0.4) lib/connection.ex:622: Connection.enter_connect/5
    (stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: nil
State: Postgrex.Protocol

This makes sense, since I didn’t set any environment variables. I’ll use --env-file to pass a file with the DATABASE_URL var:

hugopeixoto@laptop:~/w/m/api$ cat dockerenv
DATABASE_URL=postgres://mob:secretpassword@172.17.0.1/mob-api

hugopeixoto@laptop:~/w/m/api$ createuser mob -P
Enter password for new role:
Enter it again:

hugopeixoto@laptop:~/w/m/api$ docker run --env-file dockerenv -it mob-api
[info] Running ApiWeb.Endpoint with cowboy 2.7.0 at 0.0.0.0:4000 (http)
[info] Access ApiWeb.Endpoint at http://localhost:4000
^C
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
^Chugopeixoto@laptop:~/w/m/api$

I guess it’s running. If I bind the port, I can then use curl to access it:

hugopeixoto@laptop:~/w/m/api$ docker run -p 4000:4000 --env-file dockerenv -it mob-api
# [in another terminal]

hugopeixoto@laptop:~/w/m/api$ curl -I localhost:4000
HTTP/1.1 404 Not Found
cache-control: max-age=0, private, must-revalidate
content-length: 50348
content-type: text/html; charset=utf-8
date: Fri, 22 May 2020 13:46:45 GMT
server: Cowboy

It’s running. One thing that this is not doing is running database migrations. I’ll leave that for last. Let’s look at the releases feature.

Elixir releases

Going back to the “Elixir v1.9 released” post:

Once a release is assembled, it can be packaged and deployed to a target as long as the target runs on the same operating system (OS) distribution and version as the machine running the mix release command

So there’s a mix release command.

A release does not require the source code to be included in your production artifacts. All of the code is precompiled and packaged. Releases do not even require Erlang or Elixir in your servers, as they include the Erlang VM and its runtime by default.

This means I’ll probably end up using docker multi-stage builds to optimize the final image.

You can start a new project and assemble a release for it in three easy steps:

$ mix new my_app
$ cd my_app
$ MIX_ENV=prod mix release

A release will be assembled in _build/prod/rel/my_app.

Let’s give this a try. I know that I’ll need to do something to the configuration files. I’m also not sure how this will interact with the other commands I currently have in the Dockerfile, but we’ll find out. I’ll start by adding the mix release command right after mix compile. Hopefully it is able to reuse some of the work done there.

hugopeixoto@laptop:~/w/m/api$ cat Dockerfile
FROM elixir

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix deps.get
RUN mix deps.compile

ADD . /app/
RUN mix compile
ENV MIX_ENV=prod
RUN mix release

CMD mix phx.server

hugopeixoto@laptop:~/w/m/api$ docker build . -t mob-api
[...]
Step 11/12 : RUN mix release
 ---> Running in 0359472a81da
** (RuntimeError) environment variable DATABASE_URL is missing.
For example: ecto://USER:PASS@HOST/DATABASE

    (stdlib 3.12.1) erl_eval.erl:680: :erl_eval.do_apply/6
    (stdlib 3.12.1) erl_eval.erl:449: :erl_eval.expr/5
    (elixir 1.10.3) lib/code.ex:341: Code.eval_string_with_error_handling/3
    (stdlib 3.12.1) erl_eval.erl:680: :erl_eval.do_apply/6
    (stdlib 3.12.1) erl_eval.erl:126: :erl_eval.exprs/5
The command '/bin/sh -c mix release' returned a non-zero code: 1

I don’t want to pass the configuration at build time, so I’ll read a bit on the config/releases.exs thing:

https://hexdocs.pm/phoenix/releases.html

You may have noticed that, in order to assemble our release, we had to set both SECRET_KEY_BASE and DATABASE_URL. That’s because config/config.exs, config/prod.exs, and friends are executed when the release is assembled (or more generally speaking, whenever you run a mix command).

However, in many cases, we don’t want to set the values for SECRET_KEY_BASE and DATABASE_URL when assembling the release but only when starting the system in production

Checks out. So how do we solve this?

Luckily, for such use cases, mix release provides runtime configuration, which we can enable in three steps:

OK, I’ve done the second step (using import Config). Unfortunately, I did merge prod.secret.exs with prod.exs. The way I’m interpreting those instructions is that we would end up in a place where running this through mix with MIX_ENV=prod would no longer work, since it would lack secrets. If this is correct, this means that what this is doing, in practice, is making production only runnable through a release. This means I can rename config/prod.exs to config/releases.exs. Let’s try that. I need a stub file because we’re loading config/#{Mix.env}.exs in config/config.exs.

hugopeixoto@laptop:~/w/m/api$ git mv config/prod.exs config/releases.exs
hugopeixoto@laptop:~/w/m/api$ touch config/prod.exs
hugopeixoto@laptop:~/w/m/api$ docker build . -t mob-api
[... nothing new in steps 1 through 10 ...]
Step 11/12 : RUN mix release
 ---> Running in 6cea5ec75e0c
 ==> markus
 Compiling 3 files (.ex)
 Generated markus app
 ===> Compiling parse_trans
 ===> Compiling mimerl
 ==> connection
 Compiling 1 file (.ex)
 Generated connection app
 ===> Compiling metrics
 ===> Compiling unicode_util_compat
 ===> Compiling idna
[... more dependency compiling ...]

Generated api app
* assembling api-0.1.0 on MIX_ENV=prod
* using config/releases.exs to configure the release at runtime
* skipping elixir.bat for windows (bin/elixir.bat not found in the Elixir installation)
* skipping iex.bat for windows (bin/iex.bat not found in the Elixir installation)

Release created at _build/prod/rel/api!

    # To start your system
    _build/prod/rel/api/bin/api start

Once the release is running:

    # To connect to it remotely
    _build/prod/rel/api/bin/api remote

    # To stop it gracefully (you may also send SIGINT/SIGTERM)
    _build/prod/rel/api/bin/api stop

To list all commands:

    _build/prod/rel/api/bin/api

Removing intermediate container 6cea5ec75e0c
 ---> 2f944ef18ac1
Step 12/12 : CMD mix phx.server
 ---> Running in 00da122127e9
Removing intermediate container 00da122127e9
 ---> 1b3ae8ad66eb
Successfully built 1b3ae8ad66eb
Successfully tagged mob-api:latest

Successfully built. On one hand, great, it worked. On the other hand, it recompiled everything, ignoring the previous mix deps.compile command. Maybe it is caused by me setting MIX_ENV=prod so late in the process. Let me try to move it to the beginning of the file and try again:

hugopeixoto@laptop:~/w/m/api$ cat Dockerfile
FROM elixir

ENV MIX_ENV=prod

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix deps.get
RUN mix deps.compile

ADD . /app/
RUN mix compile
RUN mix release

CMD mix phx.server
hugopeixoto@laptop:~/w/m/api$ docker build . -t mob-api
[...]
Step 11/12 : RUN mix release
 ---> Running in 41d5c65d4ec2
* assembling api-0.1.0 on MIX_ENV=prod
* using config/releases.exs to configure the release at runtime
* skipping elixir.bat for windows (bin/elixir.bat not found in the Elixir installation)
* skipping iex.bat for windows (bin/iex.bat not found in the Elixir installation)

Release created at _build/prod/rel/api!

    # To start your system
    _build/prod/rel/api/bin/api start

Once the release is running:

    # To connect to it remotely
    _build/prod/rel/api/bin/api remote

    # To stop it gracefully (you may also send SIGINT/SIGTERM)
    _build/prod/rel/api/bin/api stop

To list all commands:

    _build/prod/rel/api/bin/api

Removing intermediate container 41d5c65d4ec2
 ---> 47a7a3b3b36e

That was it. The docs say that we need to copy the _build/prod/rel/my_app directory. Let’s take a look at what’s there.

hugopeixoto@laptop:~/w/m/api$ docker run --entrypoint bash -it mob-api
root@4c3117a8256d:/app# du -sh _build/prod/rel/api
44M     _build/prod/rel/api
root@4c3117a8256d:/app# find _build/prod/rel/api -type f | wc -l
1846
root@4c3117a8256d:/app# find _build/prod/rel/api -name "*.beam" | wc -l
1561

Small size, but a ton of beam files. I still need to change the CMD from mix phx.server to _build/prod/rel/api/bin/api start.

Phoenix does have a section on containers, which is quite similar to what I’m doing:

https://hexdocs.pm/phoenix/releases.html#containers

I don’t need the asset pipeline part, since I’m dealing with an API-only web service. The rest looks quite similar. I’ll adapt my Dockerfile to use multi-stage builds, like they do. I also need to create a .dockerignore file, because I’m tired of recompiling the app every time I change the Dockerfile.

hugopeixoto@laptop:~/w/m/api$ cat .dockerignore
Dockerfile
.git

And the new Dockerfile, with two stages:

FROM elixir AS build

ENV MIX_ENV=prod

ADD mix.exs mix.lock /app/

WORKDIR /app

RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix deps.get
RUN mix deps.compile

ADD . /app/
RUN mix compile
RUN mix release

FROM alpine:3.9 AS app
RUN apk add --no-cache openssl ncurses-libs

WORKDIR /app

RUN chown nobody:nobody /app
USER nobody:nobody
ENV HOME=/app

COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/api ./


CMD bin/api start

Running the new image:

hugopeixoto@laptop:~/w/m/api$ docker run --env-file dockerenv -it mob-api
/app/releases/0.1.0/../../erts-10.7.2/bin/erl: exec: line 12: /app/erts-10.7.2/bin/erlexec: not found

Well, this didn’t work. What am I missing? The file erlexec does exist. This type of error is usually caused by the executable trying to load a dynamic library (or its loader) and not being able to find it. Let’s see what it is.

hugopeixoto@laptop:~/w/m/api$ docker run --entrypoint sh -it mob-api
/app $ ldd /app/erts-10.7.2/bin/erlexec
      /lib64/ld-linux-x86-64.so.2 (0x7fcf8239b000)
      libm.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fcf8239b000)
      libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7fcf8239b000)
/app $ ls -1 /
app
bin
dev
etc
home
lib
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
/app $ ls /lib64
ls: /lib64: No such file or directory

Well, derp. erlexec expects the glibc loader at /lib64/ld-linux-x86-64.so.2, and alpine ships musl instead, so that path doesn’t exist. This is because I’m using alpine for the app image and the generic (Debian-based) elixir one for the build image. I should be using an elixir:*-alpine image. Let’s change the first line to FROM elixir:alpine AS build and try again. I’ll also need to add build-base, since some dependencies need to compile things.

hugopeixoto@laptop:~/w/m/api$ docker build . -t mob-api | grep --line-buffered 'Step\|Success' | ts -i
00:00:00 Step 1/20 : FROM elixir:alpine AS build
00:00:00 Step 2/20 : RUN apk add --no-cache build-base
00:00:20 Step 3/20 : ENV MIX_ENV=prod
00:00:01 Step 4/20 : ADD mix.exs mix.lock /app/
00:00:00 Step 5/20 : WORKDIR /app
00:00:01 Step 6/20 : RUN mix local.hex --force
00:00:04 Step 7/20 : RUN mix local.rebar --force
00:00:04 Step 8/20 : RUN mix deps.get
00:00:07 Step 9/20 : RUN mix deps.compile
00:02:13 Step 10/20 : ADD . /app/
00:00:00 Step 11/20 : RUN mix compile
00:00:08 Step 12/20 : RUN mix release
00:00:06 Step 13/20 : FROM alpine:3.9 AS app
00:00:00 Step 14/20 : RUN apk add --no-cache openssl ncurses-libs
00:00:00 Step 15/20 : WORKDIR /app
00:00:00 Step 16/20 : RUN chown nobody:nobody /app
00:00:00 Step 17/20 : USER nobody:nobody
00:00:00 Step 18/20 : ENV HOME=/app
00:00:00 Step 19/20 : COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/api ./
00:00:03 Step 20/20 : CMD bin/api start
00:00:00 Successfully built 2bceb3b1061b
00:00:00 Successfully tagged mob-api:latest

hugopeixoto@laptop:~/w/m/api$ docker run --env-file dockerenv -it mob-api
warning: found Jason in your application configuration
for Phoenix JSON encoding, but failed to load the library.

(module Jason is not available).

Ensure Jason exists in your deps in mix.exs,
and you have configured Phoenix to use it for JSON encoding by
verifying the following exists in your config/config.exs:

    config :phoenix, :json_library, Jason

[...]

15:00:41.699 [info] Application api exited:
  Api.Application.start(:normal, []) returned an error:
  shutdown: failed to start child: Api.Repo
** (EXIT) an exception was raised:
** (UndefinedFunctionError) function Ecto.Adapters.Postgres.init/1 is undefined
   (module Ecto.Adapters.Postgres is not available)

Huh. We have a warning and an error. I think that they could both have the same underlying cause (module Jason/Ecto.Adapters.Postgres is not available).

mix.exs has both a list of deps and a def application. If I add :jason and :ecto_sql to the applications list, these errors go away. I am getting other similar errors, though, so I need to continue adding things until it works, I suppose.
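Mix can fill in most of that list on its own: since Elixir 1.4, when application/0 returns :extra_applications instead of :applications, every entry in deps is added to the application list automatically. Below is a minimal sketch of such a mix.exs, assuming the project currently sets :applications explicitly (the file itself is not shown here). The dependency versions and the Elixir requirement are placeholders, and only two of the project’s dependencies are listed.

```elixir
# mix.exs (sketch, not the project's actual file)
defmodule Api.MixProject do
  use Mix.Project

  def project do
    [app: :api, version: "0.1.0", elixir: "~> 1.9", deps: deps()]
  end

  def application do
    [
      mod: {Api.Application, []},
      # Only applications that are NOT in deps/0 go here (OTP and Elixir ones).
      # Everything returned by deps/0 is inferred by Mix.
      extra_applications: [:logger, :runtime_tools]
    ]
  end

  defp deps do
    # Placeholder versions; the real list is longer.
    [{:jason, "~> 1.0"}, {:ecto_sql, "~> 3.4"}]
  end
end
```

With this shape, adding a dependency to deps/0 is enough for it to end up in the release, which would avoid the one-by-one approach.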

OK, that worked. Now it’s complaining that there’s no manifest file:

17:27:31.332 [error] Could not find static manifest at
"/app/lib/api-0.1.0/priv/static/cache_manifest.json". Run "mix phx.digest"
after building your static files or remove the configuration from
"config/prod.exs".

I don’t need this, because there are no static files, so I’ll remove whatever entry they’re mentioning.

hugopeixoto@laptop:~/w/m/api$ docker run -p 4000:4000 --env-file dockerenv -it mob-api
17:32:12.194 [info] Running ApiWeb.Endpoint with cowboy 2.7.0 at :::4000 (http)
17:32:12.195 [info] Access ApiWeb.Endpoint at http://localhost

OK, it boots! Does it work, though? Let’s do a curl.

hugopeixoto@laptop:~/w/m/api$ curl -I localhost:4000
HTTP/1.1 500 Internal Server Error
content-length: 0

Oh no. This is the error that I’m getting:

** (UndefinedFunctionError) function ApiWeb.ErrorView.render/2 is undefined
   (module ApiWeb.ErrorView is not available)

From the phoenix docs:

https://hexdocs.pm/phoenix/custom_error_pages.html

Phoenix has a view called the ErrorView which lives in lib/hello_web/views/error_view.ex. The purpose of the ErrorView is to handle errors in a general way, from one centralized location.

So I should have a lib/api_web/views/error_view.ex file. Let’s confirm:

hugopeixoto@laptop:~/w/m/api$ ls -1 lib/api_web/views/
attendance_view.ex
email_view.ex
error_helpers.ex
layout_view.ex
membership_view.ex
pressentation_view.ex

There’s an error_helpers.ex, but no error_view.ex. All of the other files end in _view.ex, though. Could this have been a migration error? I’ll create the file according to the sample on phoenix docs and try again.

I added this:

defmodule ApiWeb.ErrorView do
  use ApiWeb, :view

  def template_not_found(template, _assigns) do
    Phoenix.Controller.status_message_from_template(template)
  end
end

And now I do get a 404:

hugopeixoto@laptop:~/w/m/api$ curl -I localhost:4000
HTTP/1.1 404 Not Found
cache-control: max-age=0, private, must-revalidate
content-length: 14
content-type: text/html; charset=utf-8
date: Fri, 22 May 2020 18:27:27 GMT
server: Cowboy
x-request-id: FhFtDYZfPjAzAswAAALB

Out of curiosity, I decided to check what our live website currently responds with:

hugopeixoto@laptop:~/w/m/api$ curl -I https://api.makeorbreak.io
HTTP/1.1 500 Internal Server Error
Server: nginx
Date: Fri, 22 May 2020 18:38:54 GMT
Content-Length: 0
Connection: keep-alive

So this wasn’t a new error, I just didn’t expect this to blow up when hitting a non-existing route.

Before moving on, I’d like to check why this is responding with text/html instead of application/json. I’m guessing that the app was not generated with --no-html and all of that. Let’s see what I can find:

hugopeixoto@laptop:~/w/m/api$ ack -w html config/ lib/api_web
config/config.exs
28:  render_errors: [view: ApiWeb.ErrorView, accepts: ~w(html json)],

config/releases.exs
36:# options, see https://hexdocs.pm/plug/Plug.SSL.html#configure/1

lib/api_web/templates/layout/email.html.eex
1:<!doctype html>
2:<html>
9:    html, body {
146:</html>

lib/api_web/router.ex
8:    plug :accepts, ["html"]

Let’s start by removing the html entry from render_errors. It worked:

hugopeixoto@laptop:~/w/m/api$ curl -I localhost:4000
HTTP/1.1 404 Not Found
cache-control: max-age=0, private, must-revalidate
content-length: 21
content-type: application/json; charset=utf-8
date: Fri, 22 May 2020 18:50:29 GMT
server: Cowboy
x-request-id: FhFuTznvV3eEIWIAAAFD

Now it would be nice to test an endpoint that does something, and that preferably touches the database.

Database and GraphQL

This project has graphiql installed, so I tried to access http://localhost:4000/graphiql:

UndefinedFunctionError{
  arity: 2,
  function: :put_options,
  message: nil,
  module: Absinthe.Plug,
  reason: nil
}

I had to add :absinthe_plug to the list of applications in mix.exs. Afterwards, I got:

%ArgumentError{
  message: "The supplied schema: Api.GraphQL.Schema is not a valid Absinthe Schema"
}

Could this be because the database is empty? Let me set up the migrate script that their documentation mentions.

I created the lib/api/release.ex module, and added an entrypoint.sh file with the following:

#!/usr/bin/env sh

bin/api eval "Api.Release.migrate"
bin/api start
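The contents of lib/api/release.ex are not shown here. A minimal sketch of such a module, following the pattern from the Phoenix releases guide, could look like this. It assumes the OTP app is named `:api`, that `:ecto_repos` is set in the app's config, and that the project uses ecto_sql 3.x (which provides `Ecto.Migrator.with_repo/2`).

```elixir
defmodule Api.Release do
  # Assumption: the OTP application is called :api.
  @app :api

  # Runs all pending migrations for every configured repo.
  # Intended to be called through `bin/api eval "Api.Release.migrate"`.
  def migrate do
    # Load the application so its configuration is readable,
    # without starting the whole supervision tree.
    Application.load(@app)

    for repo <- Application.fetch_env!(@app, :ecto_repos) do
      # with_repo/2 starts the repo temporarily, runs the function, and stops it.
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end
end
```

The entrypoint.sh above then calls this through `bin/api eval` before starting the server.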

Running this, I get:

** (Postgrex.Error) ERROR 42501 (insufficient_privilege)
   permission denied for table schema_migrations

Well, I obviously didn’t set this up properly. I created the user but didn’t give it the appropriate grants.

I ended up dropping the database and recreating it with the user mob as its owner. I also made that user a superuser, since some of the migrations need to install extensions. Yay for that. These are the commands I used:

hugopeixoto@laptop:~/w/m/api$ createdb -O mob mob-api
hugopeixoto@laptop:~/w/m/api$ psql
hugopeixoto=# alter user mob with superuser;
ALTER ROLE

This was enough to get the migrations to run without errors, but I’m still getting the same error. I looked at the code that produces the error message The supplied schema: Api.GraphQL.Schema is not a valid Absinthe Schema, and it looks something like this:

  defp get_schema(opts) do
    default = Application.get_env(:absinthe, :schema)
    schema = Keyword.get(opts, :schema, default)

    try do
      Absinthe.Schema.types(schema)
    rescue
      UndefinedFunctionError ->
        raise ArgumentError,
              "The supplied schema: #{inspect(schema)} is not a valid Absinthe Schema"
    end

    schema
  end

So there’s an UndefinedFunctionError being raised, and absinthe hides it behind an ArgumentError. I was getting those undefined function errors when modules were missing from the application list, so maybe I’m still missing an entry there. I’ll take a look at the deps and at the applications being loaded to see if there’s something obvious missing.
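To make the swallowing concrete, here is a small self-contained sketch of the same shape. `SwallowDemo` and `Missing.Schema` are made-up names for illustration, not part of absinthe. The first function discards the original error the way get_schema does; the second is one possible variant that keeps the underlying reason in the message.

```elixir
defmodule SwallowDemo do
  # Same shape as get_schema/1 above: the original error is discarded,
  # so a missing module looks like an invalid schema.
  def hidden(schema) do
    try do
      apply(schema, :types, [])
    rescue
      UndefinedFunctionError ->
        raise ArgumentError, "#{inspect(schema)} is not a valid schema"
    end
  end

  # Variant that binds the original exception and appends its message.
  def visible(schema) do
    try do
      apply(schema, :types, [])
    rescue
      e in UndefinedFunctionError ->
        raise ArgumentError,
              "#{inspect(schema)} is not a valid schema: #{Exception.message(e)}"
    end
  end
end
```

Calling `SwallowDemo.hidden(Missing.Schema)` only says the schema is invalid, while `SwallowDemo.visible(Missing.Schema)` also reports that the module is not available, which is the hint that an application is missing from the release.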

It worked. I added :absinthe and :absinthe_ecto and it started working. Good thing I looked at the code, it helped me find that exception swallowing pattern:

https://github.com/absinthe-graphql/absinthe_plug/blob/6696af35dcce5f47c3647744924fcb132b9b6231/lib/absinthe/plug.ex#L233
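A likely explanation for the missing applications, assuming mix.exs sets `:applications` explicitly (common in projects generated before Elixir 1.4): when that key is present, Mix does not infer applications from the dependency list, and a release only bundles the applications that are listed, plus whatever they depend on. Under `mix phx.server`, every dependency is on the code path, so the modules load anyway. A possible alternative to adding entries one by one is switching to `:extra_applications`, sketched below. The module name, version, and `mod:` entry are placeholders, not taken from the project.

```elixir
# mix.exs (fragment; module name, version and `mod:` are placeholders)
defmodule Api.MixProject do
  use Mix.Project

  def project do
    [app: :api, version: "0.1.0", deps: deps()]
  end

  def application do
    [
      mod: {Api.Application, []},
      # With :extra_applications (and no :applications key), the applications
      # of runtime dependencies such as :absinthe, :absinthe_plug and
      # :absinthe_ecto are inferred from deps/0 and included in the release.
      extra_applications: [:logger]
    ]
  end

  # Dependency list elided; it stays whatever the project already declares.
  defp deps, do: []
end
```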

I can now see GraphiQL on http://localhost:4000/graphiql.

I added some dummy editions to the editions table:

INSERT INTO editions(id, name) VALUES (uuid_generate_v4(), '2018'), (uuid_generate_v4(), '2019');

And tried reading them with the query:

{
  bots {
    id
  }
}

Executing this only displayed an empty string in the results pane, but I did get an exception in the console:

%FunctionClauseError{
  args: nil,
  arity: 2,
  clauses: nil,
  function: :call,
  kind: nil,
  module: Api.GraphQL.Middleware.RequireAdmin
}

This query requires an admin API token, but I’m wondering why it blows up with a 500 instead of returning a proper error. To check if I will be fixing something that was already broken, I’ll run this query against production:

hugopeixoto@laptop:~$ curl https://api.makeorbreak.io/graphql \
> -X POST \
> -H "Content-Type: application/json" \
> -d '{"query":"{ bots { id } }"}' | jq
{
  "errors": [
    {
      "path": [
        "bots"
      ],
      "message": "using this field requires authentication",
      "locations": [
        {
          "line": 1,
          "column": 0
        }
      ]
    }
  ],
  "data": {
    "bots": null
  }
}

This returns a sane response (for a GraphQL response, anyway), with a 200 OK.

So this might mean that the RequireAdmin module needs updating. Let’s look at RequireAdmin:

defmodule RequireAdmin do
  @behaviour Absinthe.Middleware

  def call(%{context: %{current_user: %User{role: "admin"}}} = resolution, _config) do
    resolution
  end

  def call(%{context: %{current_user: %User{role: "participant"}}} = resolution, _config) do
    resolution
    |> Resolution.put_result({:error, %{
      message: "you do not have permission to access this field",
    }})
  end

  def call(%{context: %{current_user: nil}} = resolution, _config) do
    resolution
    |> Resolution.put_result({:error, %{
      message: "using this field requires authentication",
    }})
  end
end

This is implementing an Absinthe.Middleware. What seems to be happening is that there’s no matching call signature. Looking for the place where the current_user is injected in the context, I found this:

defmodule Api.GraphQL.GuardianContext do
  @behaviour Plug

  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _) do
    context = build_context(conn)

    Absinthe.Plug.put_options(conn, context: context)
  end

  @doc """
  Return the current user context based on the authorization header
  """
  def build_context(conn) do
    with ["Bearer " <> token] <- get_req_header(conn, "authorization"),
         {:ok, current_user, _claims} <- ApiWeb.Guardian.resource_from_token(token) do
      %{current_user: current_user}
    else
      _ -> %{}
    end
  end
end

I suspect that we used to return %{current_user: nil}. Checking the old code:

https://github.com/makeorbreak-io/mob-api/blob/master/lib/api/graphql/plugs.ex#L16

def call(conn, _opts) do
  conn
  |> put_private(
    :absinthe,
    %{context:
      %{
        current_user: GuardianPlug.current_resource(conn)
      }
    }
  )
end

This confirms my suspicion. I changed the else clause to _ -> %{current_user: nil} and the 500 is gone. Here’s the pull request:

https://github.com/makeorbreak-io/api/pull/12
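The reason an empty context blows up while a nil user doesn't comes down to how map patterns match: `%{current_user: nil}` requires the key to be present. A stripped-down sketch, with the made-up module `ClauseDemo` standing in for RequireAdmin and plain maps instead of the real structs:

```elixir
defmodule ClauseDemo do
  # Two clauses in the style of RequireAdmin.call/2, reduced to plain maps.
  def call(%{context: %{current_user: %{role: "admin"}}}), do: :allowed
  def call(%{context: %{current_user: nil}}), do: :requires_authentication
end

# The key exists with a nil value, so the second clause matches.
ClauseDemo.call(%{context: %{current_user: nil}})

# With an empty context, the :current_user key is absent and no clause
# matches, so this call would raise FunctionClauseError:
# ClauseDemo.call(%{context: %{}})
```

Returning `%{current_user: nil}` from build_context keeps the middleware clauses as they are. Adding a catch-all clause to the middleware would be another option.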

Now let’s try to get authenticated.

Registering an account: mutations and email sending

There’s something related to registering users in Api.GraphQL.Mutations.Session: a field called :register. I suppose I need to use this to register an account. This is what I tried:

mutation {
  register(email: "hugo.peixoto@gmail.com", password: "secret")
}

And I got this:

** (UndefinedFunctionError) function Argon2.hash_pwd_salt/1 is undefined
   (module Argon2 is not available)

Another application missing. Why is this missing so many applications? Why doesn’t this happen in development mode? After adding argon2, I got the same error for :premailex. After adding those, I got this:

{
  "data": {
    "register": null
  },
  "errors": [
    {
      "exception": {
        "code": "unknown_reason",
        "param": "email"
      },
      "locations": [
        {
          "column": 0,
          "line": 2
        }
      ],
      "message": "unknown_reason",
      "path": [
        "register"
      ]
    }
  ]
}

And no stacktrace to help. I’m guessing this might be related to the lack of email provider configuration. We have this set to use Mailgun in production and Bamboo.LocalAdapter in development. I’ll temporarily set the production adapter to LocalAdapter as well, just to see if it works.

Turns out the error was much simpler: I had already registered that account before. Maybe I double clicked or something, not sure. After truncating the users table, I’m getting this error:

** (MatchError) no match of right hand side value: {:error, :secret_not_found}

This was because SECRET_KEY_BASE was being read in config/config.exs, which is evaluated at build time instead of at runtime, so the value ended up as nil. I moved it to config/releases.exs and it worked.
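A sketch of what reading the secret at runtime can look like in config/releases.exs. The exact keys are assumptions: `:api` as the OTP app name, and Guardian taking its `secret_key` from the same environment variable (the `:secret_not_found` error suggests Guardian was the one missing it).

```elixir
# config/releases.exs (fragment): evaluated when the release boots,
# not when the Docker image is built.
import Config

# fetch_env!/1 raises at boot if the variable is unset, instead of
# silently configuring nil.
secret = System.fetch_env!("SECRET_KEY_BASE")

# Assumption: the OTP app is :api.
config :api, ApiWeb.Endpoint, secret_key_base: secret

# Assumption: Guardian signs tokens with the same secret.
config :api, ApiWeb.Guardian, secret_key: secret
```

Failing loudly at boot makes a missing variable show up in the container logs right away, rather than as a MatchError on the first request that needs a token.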

I was able to go to http://localhost:4000/sent_emails and see a preview of the email that would have been sent.

I’ll stop for now.

Summary

I needed to make a Phoenix/Elixir project deployable. The project was from 2017, and not much care had been given to keeping it up to date. The deployment story was messy and not ideal.

I’ve set up a docker image and used the Elixir Releases feature to create a standalone compiled version of the app. There were some issues with the configuration files, and some issues with missing dependent applications.

I still need to do some config tweaking, but other than that, the docker image is ready to go. I’ll be able to set up a staging environment, which was my main goal.

I checked how much memory the docker container was taking, to have a rough idea of the requirements for this:

hugopeixoto@laptop:~/w/m/mob-api$ docker stats upbeat_dewdney --no-stream --format="{{.MemUsage}}"
96.5MiB / 7.709GiB

That’s ~100MiB, which gives me a round number to work with. I had one of these running for a day, and it reported 22MiB, probably because most of it got paged out. I’m not sure what exactly docker stats reports, but that’s for another day.