Dynamic database seeding with Docker

15 Jan 2018

by Magnulf Pilskog

Working with Docker can help your business utilise data more effectively and efficiently.

Our entire application stack is packaged using Docker. And there’s a very good reason for that.

Prior to setting up seeding with Docker, we had issues with outdated examples and leftover data from previous demos. We also found that we were spending too much time manually keeping everything in sync, which, as our customer base grew, was unsustainable.

By working with Docker, we have created a user-friendly tool that anyone in our organization can use, both when resetting a demo laptop, and when getting a new on-site installation up and running quickly.

What is Docker?

Docker is a tool that makes it easy for developers to create, deploy, and run applications. This is done by using something called ‘containers’. Containers enable developers to package an application with all of the parts it requires to run efficiently - libraries and other dependencies, for example - and ship it all out as one package.

The details

We maintain all our demo workspaces, tutorials and help texts in our SaaS-production environment, which makes it easy to keep everything up-to-date. However, when running offline demos or setting up on-site solutions with customers, we need a way to copy data from the production environment to seed these installations. Prior to using Docker, this was an error-prone activity. But, after moving to Docker, we are now able to do this as part of every build.

The process of preparing and applying a data seed includes the following steps.

1. Copy and clean production data

To filter out environment-specific metadata, it’s important to clean up the database dump before it’s distributed. The core of the clean-up is a throwaway MongoDB Docker container. Using a throwaway container for this step is a security precaution: we don’t want the filtered-out data to end up exposed in a lower Docker filesystem layer.

We start the container with the database dump available in a mounted volume, load the dump into MongoDB, run a clean-up script in MongoDB, and finally write the cleaned database dump back to the mounted volume. The Docker container is then discarded.
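The flow above can be sketched with plain docker commands. The dump directory, image tag, database name, and clean_up.js script below are illustrative assumptions, not our actual setup:

```shell
# 1. Start a throwaway MongoDB container with the raw dump in a mounted volume.
docker run -d --name seed-cleanup -v "$PWD/dump:/dump" mongo:3.4

# 2. Load the production dump into the throwaway database.
docker exec seed-cleanup mongorestore /dump/raw

# 3. Run a clean-up script that strips environment-specific metadata.
docker exec seed-cleanup mongo demo_db /dump/clean_up.js

# 4. Dump the cleaned database back out to the mounted volume.
docker exec seed-cleanup mongodump --db demo_db --out /dump/clean

# 5. Discard the container. Because the raw dump only ever lived in a mounted
#    volume and a discarded container, the filtered-out data never ends up in
#    an image layer.
docker rm -f seed-cleanup
```

The key design point is that only the cleaned dump in the volume survives this step.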

2. Package and distribute the seed data

For easy distribution, we package the cleaned database dump, the binary attachments, and the MongoDB restore tools in one Docker image.
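The packaging can be as small as a Dockerfile along these lines (the paths and base image version are assumptions, not our exact setup). Basing the image on the official mongo image is what bundles the restore tools for free:

```dockerfile
# Seed image sketch: the official mongo base image ships mongorestore,
# so the seed data and the restore tool travel together in one image.
FROM mongo:3.4

# Cleaned database dump, restored at seed time with mongorestore.
COPY clean/ /work/demo_seed/

# Binary attachments, copied into the API container's volume at seed time.
COPY attachments/ /work/attachments/
```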

3. Distribution

Once the data is packaged along with the MongoDB client, we upload the image to our private account on Docker Hub. This is currently triggered manually, but automating it is only a matter of configuration. Our developers, salespeople and customers can then update their local installations simply by pulling the latest image from Docker Hub.
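Publishing and fetching the seed image are single commands; the image name matches the one used in the seeding commands below, and the :latest tag is an assumption:

```shell
# After building, publish the seed image (currently triggered manually).
docker push ardoq/demo-seed:latest

# On a demo laptop or on-site installation, updating the seed data is just:
docker pull ardoq/demo-seed:latest
```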

4. Seeding

Since all data is packaged along with the MongoDB client, seeding is simple. It is basically a matter of running two docker commands.

To populate the database, the seed image starts with a link to the MongoDB container, and restores the database using the bundled database dump:

docker run --rm --link ardoq_mongodb_1:mongodb ardoq/demo-seed mongorestore -h mongodb /work/demo_seed/

To populate the binary attachments, the seed image is started again with the data image volumes mounted, and simply copies the files over and exits: 

docker run --rm --volumes-from ardoq_api_1 ardoq/demo-seed:latest cp -r /work/attachments /data

Summary

Keeping on-site installations up-to-date with the latest data used to be time-consuming and error-prone. Using Docker, Ardoq is well on its way to automating the entire process.
