How to setup MLflow in production

Get a Machine Learning model into production with MLflow in 10 minutes


I ran into MLflow about a week ago and, after some testing, I consider it by far the software of the year. This is probably influenced by the fact that I’m currently working on moving Machine Learning models into production.

So I’m going to show you how to set up MLflow in a production environment like the one David and I use for our Machine Learning projects.

Tracking Server Setup

The tracking server is the user interface and metastore of MLflow: you can check the status of any run through this web application, and it centralizes the model outputs and configurations in one place.

The first thing we need to configure is the environment.


Let’s create a new Conda environment, which is where MLflow will be installed:

conda create -n mlflow_env 
conda activate mlflow_env

Then we have to install the MLflow library:

conda install python 
pip install mlflow

Run the following command to check that the installation was successful:

mlflow --help

We’d like our Tracking Server to have a Postgres database as a backend for storing metadata, so the first step is to install PostgreSQL:

sudo apt-get install postgresql postgresql-contrib postgresql-server-dev-all

Check the installation by connecting to the database:

sudo -u postgres psql

Once the installation succeeds, let’s create a user and a database for the Tracking Server. A minimal setup, run from the psql prompt we just opened, could look like the following; the names and password match the connection string we’ll pass to mlflow server later:
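CREATE DATABASE mlflow;
CREATE USER mlflow WITH ENCRYPTED PASSWORD 'mlflow';
GRANT ALL PRIVILEGES ON DATABASE mlflow TO mlflow;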


As we’ll need to interact with Postgres from Python, we have to install the psycopg2 library. To ensure the installation succeeds, install the gcc Linux package first:

sudo apt install gcc 
pip install psycopg2

The last step is to create a directory on our local machine where the Tracking Server will log the Machine Learning models and other artifacts. Remember that the Postgres database is only used for storing metadata about those models (imagine stuffing a model binary or a whole virtual environment into a database). This directory is called the artifact URI:

mkdir ~/mlruns


Everything is now set up to run the Tracking Server. Run the following command (-h makes the server listen on all network interfaces; adjust it if you need to restrict access):

mlflow server --backend-store-uri postgresql://mlflow:mlflow@localhost/mlflow --default-artifact-root file:/home/your_user/mlruns -h -p 8000

The Tracking Server should now be available at http://your_server_ip:8000. However, if you press Ctrl-C or exit the terminal, the server will go down.


If you want the Tracking server to be up and running after restarts and be resilient to failures, it is very useful to run it as a systemd service.

You need to go into the /etc/systemd/system folder and create a new file called mlflow-tracking.service with the following content:

[Unit]
Description=MLflow tracking server
After=network.target

[Service]
Restart=on-failure
RestartSec=30
ExecStart=/bin/bash -c 'PATH=/path_to_your_conda_installation/envs/mlflow_env/bin/:$PATH exec mlflow server --backend-store-uri postgresql://mlflow:mlflow@localhost/mlflow --default-artifact-root file:/home/your_user/mlruns -h -p 8000'

[Install]
WantedBy=multi-user.target


After that, you need to reload systemd, then enable and start the service with the following commands:

sudo systemctl daemon-reload 
sudo systemctl enable mlflow-tracking 
sudo systemctl start mlflow-tracking

Check that everything worked as expected with the following command:

sudo systemctl status mlflow-tracking

You can now restart your machine, and the MLflow Tracking Server will come back up automatically.

In order to start tracking everything under this Tracking Server, you need to set the following environment variable in your .bashrc (your_server_ip stands for the address of the machine running the Tracking Server):
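export MLFLOW_TRACKING_URI='http://your_server_ip:8000'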


Remember to activate this change with:

. ~/.bashrc
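To verify that runs are now being tracked against the remote server, a minimal sketch like the following (the parameter and metric names are invented for illustration) should create a new entry you can see in the web UI:

import mlflow

# MLFLOW_TRACKING_URI is read from the environment,
# so this run is logged to the remote Tracking Server
with mlflow.start_run():
    mlflow.log_param('alpha', 0.5)
    mlflow.log_metric('rmse', 0.79)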

Serve a Machine Learning model in production

Once the tracking server is up and the MLFLOW_TRACKING_URI is pointing to it in the .bashrc, it’s time to put your model into production.

Let’s start creating the production environment to run the ML model:


conda create -n production_env 
conda activate production_env 
conda install python 
pip install mlflow 
pip install scikit-learn

Then, let’s clone the official MLflow repository and run one of its examples to show how to ramp up a model. I’ll use the wine-quality example (sklearn_elasticnet_wine), which matches the input columns used in the queries below:

git clone https://github.com/mlflow/mlflow.git
cd mlflow/examples
mlflow run sklearn_elasticnet_wine -P alpha=0.5

This run will generate a new entry in your Tracking Server, along with a new folder in which the model and its configuration are stored (~/mlruns/0/some_uuid). Let’s check it:

ls -al ~/mlruns/0

Get the uuid of the run from the previous output and substitute it for the string “your_model_id” in the following line (of course, you could also find the uuid by searching in the Tracking Server UI):

mlflow models serve -m ~/mlruns/0/your_model_id/artifacts/model -h -p 8001

You have just served your model as an HTTP endpoint on your server’s IP and port 8001 (make sure no other service is already listening there), so it is ready to receive incoming data and return predictions. You can then query your model with a simple curl command (again, your_server_ip stands for your server’s address):

curl -X POST http://your_server_ip:8001/invocations -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}'

Querying from Python with the requests module, or from any other programming language, also works for getting predictions, since the HTTP protocol is language-agnostic:

import pandas as pd
import requests

host = 'your_server_ip'  # the address of the machine serving the model
port = '8001'

url = f'http://{host}:{port}/invocations'

headers = {'Content-Type': 'application/json; format=pandas-split'}

# test_data is a Pandas dataframe with data for testing the ML model;
# here we build it with the same row used in the curl example above
test_data = pd.DataFrame(
    data=[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]],
    columns=['alcohol', 'chlorides', 'citric acid', 'density', 'fixed acidity',
             'free sulfur dioxide', 'pH', 'residual sugar', 'sulphates',
             'total sulfur dioxide', 'volatile acidity'],
)
http_data = test_data.to_json(orient='split')

r =, headers=headers, data=http_data)
print(f'Predictions: {r.text}')


Finally, if you want to serve it in production, the only thing you need to do is add the corresponding systemd configuration (the PATH points to the production_env environment we created above; path_to_your_model, host and port are placeholders for your own values):

[Unit]
Description=MLflow model in production
After=network.target

[Service]
Restart=on-failure
RestartSec=30
ExecStart=/bin/bash -c 'PATH=/path_to_your_conda_installation/envs/production_env/bin/:$PATH exec mlflow models serve -m path_to_your_model -h host -p port'

[Install]
WantedBy=multi-user.target