# Containerized Execution of Pipelines
Kerblam! can ergonomically run pipelines inside containers for you, making your work easier to reproduce.
If Kerblam! finds a container recipe (such as a Dockerfile) with the same name
as one of your pipes in the `./src/dockerfiles/` folder
(e.g. `./src/dockerfiles/process_csv.dockerfile` for the
`./src/pipes/process_csv.makefile` pipe), it will automatically use it when you
execute that pipeline (e.g. `kerblam run process_csv`) to run the pipeline
inside a container.
Specifically, it will do something similar to this:
- Copy the pipeline to the root of the directory (as it does normally when you
  launch `kerblam run`), as `./executor`;
- Run `docker build -f ./src/dockerfiles/process_csv.dockerfile --tag process_csv_kerblam_runtime .`
  to build the container image;
- Run `docker run --rm -it -v ./data:/data --entrypoint make process_csv_kerblam_runtime -f /executor`.
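The steps above can be sketched as the argument lists that Kerblam! effectively assembles (an illustrative sketch based on the commands shown here; the real tool does more, such as mounting your paths and capturing stdio):

```python
# Sketch of the container commands for a hypothetical "process_csv"
# make pipe; mirrors the docker commands shown above.
pipe = "process_csv"
tag = f"{pipe}_kerblam_runtime"

# Build the image from the pipe's recipe, keeping the build context
# at the project root (the trailing ".").
build_cmd = [
    "docker", "build",
    "-f", f"./src/dockerfiles/{pipe}.dockerfile",
    "--tag", tag,
    ".",
]

# Run the image with the data folder mounted, using `make` as the
# entrypoint and the copied pipe (/executor) as the makefile.
run_cmd = [
    "docker", "run", "--rm", "-it",
    "-v", "./data:/data",
    "--entrypoint", "make",
    tag,
    "-f", "/executor",
]

print(" ".join(build_cmd))
print(" ".join(run_cmd))
```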
This last command runs the container, telling it to execute `make` with
`/executor` as the makefile (via the `-f` flag).
Note that this is not exactly what Kerblam! does: it has additional features
to correctly mount your paths, capture `stdin` and `stdout`, and so on.
If your dockerfile has a `COPY . .` directive, you can effectively have
Kerblam! run your projects in Docker environments: you can tweak your
dependencies and tooling (which might differ from those of your development
environment) and execute in a protected, reproducible environment.
Kerblam! will build the container images without moving the recipes around
(this is what the `-f` flag does).
The `.dockerignore` in the build context (next to the `kerblam.toml`) is shared
by all pipes.
See the 'using a dockerignore' section of the Docker documentation for more.
You can write dockerfiles for both `make` and `sh` pipes.
Kerblam! automatically configures the correct entrypoint and arguments to run
the pipe in the container.
Read the "writing dockerfiles for Kerblam!" section to learn more about how to write dockerfiles that work nicely with Kerblam! (spoiler: it's easier than writing canonical dockerfiles!).
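As a rough mental model, the entrypoint choice can be sketched like this (a hypothetical helper: `make` for make pipes matches the `--entrypoint make` command shown above, while `sh` for shell pipes is an assumption):

```python
# Hypothetical sketch of picking a container entrypoint from the
# pipe file's extension.
def entrypoint_for(pipe_file: str) -> str:
    if pipe_file.endswith(".makefile"):
        return "make"  # matches the `--entrypoint make` call above
    if pipe_file.endswith(".sh"):
        return "sh"    # assumed entrypoint for shell pipes
    raise ValueError(f"unsupported pipe type: {pipe_file}")

print(entrypoint_for("process_csv.makefile"))  # make
```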
For example, you can have the following dockerfile:

```dockerfile
# ./src/dockerfiles/process_csv.dockerfile
FROM ubuntu:latest
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    pip3 install pandas
COPY . .
```
and this dockerignore file:

```
# ./src/dockerfiles/.dockerignore
.git
data
venv
```
and simply run `kerblam run process_csv` to build the container and run
your code inside it.
If you run `kerblam run` without a pipeline (or with a non-existent pipeline), you
will get the list of available pipelines.
You can see at a glance which pipelines have an associated dockerfile, as they
are prefixed with a little whale (🐋):
```
Error: No runtime specified. Available runtimes:
    🐋◾ my_pipeline :: Generate the output data in a docker container
    ◾◾ local_pipeline :: Run some code locally
```
## Default dockerfile
Kerblam! will look for a `default.dockerfile` if it cannot find a container
recipe for the specific pipe (e.g. `pipe.dockerfile`), and use that instead.
You can use this to write a generic dockerfile that works for your
simplest pipelines.
The whale (🐋) emoji in the list of pipes is replaced by a fish (🐟) for
pipes that use the default container, so you can identify them easily:

```
Error: No runtime specified. Available runtimes:
    🐋◾ my_pipeline :: Generate the output data in a docker container
    🐟◾ another :: Run in the default container
```
## Switching backends
Kerblam! runs containers with Docker by default, but you can tell it to use
Podman instead by setting the `execution > backend` option in your
`kerblam.toml`:

```toml
[execution]
backend = "podman" # "docker" by default
```
Podman is slightly harder to set up, but has a few benefits: mainly, it does
not need to run as root, and it is a FOSS program.
For 90% of use cases, you can use `podman` instead of `docker` and it will
work exactly the same.
Podman and Docker images are interchangeable, so you can use Podman with
Docker Hub with no issues.
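Conceptually, switching backends just swaps which binary is invoked, leaving the rest of the invocation unchanged (an illustrative sketch, not Kerblam!'s actual internals):

```python
# Hypothetical sketch: the backend option picks the executable; the
# remaining arguments stay the same.
def container_cmd(backend: str = "docker") -> list[str]:
    if backend not in ("docker", "podman"):
        raise ValueError(f"unknown backend: {backend}")
    return [backend, "run", "--rm", "-it", "my_pipeline_kerblam_runtime"]

print(container_cmd("podman")[0])  # podman
```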
## Setting the container working directory
Kerblam! does not parse your dockerfile or add any magic to the calls it makes based on heuristics. This means that if you copy your code somewhere other than the root of the container, you must tell Kerblam! about it.
For instance, this recipe copies the contents of the analysis into a folder
called `/app`:

```dockerfile
COPY . /app/
```

This one does the same by using the `WORKDIR` directive:

```dockerfile
WORKDIR /app
COPY . .
```
If you change the working directory, let Kerblam! know by setting the
`execution > workdir` option in `kerblam.toml`:

```toml
[execution]
workdir = "/app"
```
In this way, Kerblam! will run the containers with the proper paths.
This option applies to ALL containers managed by Kerblam!:
there is currently no way to configure a different working directory for each
specific dockerfile.
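To illustrate the effect, a custom `workdir` shifts where the copied pipe file ends up inside the container (a sketch; the exact path handling Kerblam! performs is an assumption):

```python
# Hypothetical sketch: how a configured workdir changes the location
# of the copied pipe file ("executor") inside the container.
def executor_path(workdir: str = "/") -> str:
    # The pipe is copied next to your code; with a custom workdir,
    # it therefore lives under that folder.
    return workdir.rstrip("/") + "/executor"

print(executor_path())        # /executor, the default
print(executor_path("/app"))  # /app/executor
```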