MLCube is a new open source, container-based infrastructure specification introduced to enable reproducibility in Python-based machine learning workflows. It can use tools such as Podman, Singularity and Docker, and execution on remote platforms is also supported. One of the chairs of the MLCommons Best Practices working group that is developing MLCube is Diane Feddema from Red Hat. This introductory article explains how to run the hello world MLCube example using Podman on Fedora Linux.
Yazan Monshed has written a very helpful introduction to Podman on Fedora which gives more details on some of the steps used here.
First, install the necessary dependencies.
sudo dnf -y update
sudo dnf -y install podman git virtualenv \
    policycoreutils-python-utils
Then, following the documentation, set up a virtual environment and get the example code. To ensure reproducibility, use a specific commit, as the project is being actively improved.
virtualenv -p python3 ./env_mlcube
source ./env_mlcube/bin/activate
git clone https://github.com/mlcommons/mlcube_examples.git
cd ./mlcube_examples/hello_world
git checkout 5fe69bd
pip install mlcube mlcube-docker
mlcube describe
Now change the runner command from docker to podman by editing the file $HOME/mlcube.yaml so that the line
docker: docker
becomes
docker: podman
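If you prefer to script this edit rather than change the file by hand, a minimal sketch using GNU sed (as shipped with Fedora) follows. The function name set_podman_runner is hypothetical, and it assumes the setting sits on its own line as docker: docker, possibly indented. It keeps a .bak copy before rewriting the file.

```shell
# Hypothetical helper: switch the MLCube Docker runner command to Podman.
# Assumes the settings file has a line of the form "docker: docker"
# (optionally indented), as described in the article.
set_podman_runner() {
    local settings_file="${1:-$HOME/mlcube.yaml}"
    # Keep a backup before editing in place.
    cp "$settings_file" "$settings_file.bak"
    # Rewrite only a "docker:" key whose value is exactly "docker";
    # the bare "docker:" mapping key above it is left untouched.
    sed -i 's/^\([[:space:]]*docker:[[:space:]]*\)docker[[:space:]]*$/\1podman/' "$settings_file"
}
```

Run set_podman_runner with no argument to edit $HOME/mlcube.yaml (or pass another path), then inspect the file before continuing.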
If you are on a computer with x86_64 architecture, you can get the container using
mlcube configure --mlcube=. --platform=docker
You will see a number of options:
? Please select an image:
  ▸ registry.fedoraproject.org/mlcommons/hello_world:0.0.1
    registry.access.redhat.com/mlcommons/hello_world:0.0.1
    docker.io/mlcommons/hello_world:0.0.1
    quay.io/mlcommons/hello_world:0.0.1
Choose docker.io/mlcommons/hello_world:0.0.1 to obtain the container.
If you are not on a computer with x86_64 architecture, you will need to build the container. Change the file $HOME/mlcube.yaml so that the line
build_strategy: pull
becomes
build_strategy: auto
and then build the container using
mlcube configure --mlcube=. --platform=docker
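The architecture check and the build_strategy change can be combined in one small script. This is a sketch, not part of MLCube: the hypothetical function configure_build_strategy reads the architecture from uname -m unless one is passed as the second argument, and it only rewrites a line of the form build_strategy: pull, using GNU sed.

```shell
# Hypothetical helper: keep "pull" on x86_64, switch to "auto" elsewhere.
# Assumption: the prebuilt hello_world image only targets x86_64,
# as stated in the article.
configure_build_strategy() {
    local settings_file="${1:-$HOME/mlcube.yaml}"
    local arch="${2:-$(uname -m)}"
    if [ "$arch" != "x86_64" ]; then
        # Non-x86_64 hosts need to build the image locally.
        sed -i 's/^\([[:space:]]*build_strategy:[[:space:]]*\)pull[[:space:]]*$/\1auto/' "$settings_file"
        echo "non-x86_64 host ($arch): switching build_strategy to auto"
    else
        echo "x86_64 host: keeping build_strategy: pull"
    fi
}
```

After running it, call mlcube configure --mlcube=. --platform=docker as above.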
To run the tests, you may need to set the SELinux labels on the working directories appropriately. You can check that SELinux is enabled by typing
sudo sestatus
which should give you output similar to
SELinux status: enabled ...
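To make this check usable in a script, you can test the sestatus output for the enabled status. The hypothetical helper below reads sestatus-style text on standard input, so on a real host you would run sudo sestatus | selinux_enabled. It assumes the status line begins with SELinux status, as in the output shown above.

```shell
# Hypothetical helper: exit 0 if sestatus-style input reports SELinux enabled.
selinux_enabled() {
    awk -F: '
        /^SELinux status/ {
            gsub(/[ \t]/, "", $2)          # strip padding around the value
            found = ($2 == "enabled")
        }
        END { exit (found ? 0 : 1) }
    '
}
```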
Josphat Mutai, Christopher Smart and Daniel Walsh explain that you need to be careful in setting appropriate SELinux policies for files used by containers. Here, you will allow the container to read and write to the workspace directory.
sudo semanage fcontext -a -t container_file_t "$PWD/workspace(/.*)?"
sudo restorecon -Rv $PWD/workspace
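The quoted pattern in the semanage command is a regular expression: it labels the workspace directory itself and, through the optional (/.*)? group, everything beneath it. The sketch below uses grep -Ex to illustrate which paths such a pattern would cover. Treating it as a whole-path match is a simplifying assumption for illustration, since SELinux applies file-context expressions with its own regex engine, and matches_fcontext is a hypothetical name. Note also that any . in $PWD acts as a regex wildcard.

```shell
# Hypothetical helper: does PATH match the file-context PATTERN as a whole path?
matches_fcontext() {
    local pattern="$1" path="$2"
    printf '%s\n' "$path" | grep -Eqx -- "$pattern"
}
```

For example, with base=/home/user/mlcube_examples/hello_world, the pattern "$base/workspace(/.*)?" matches $base/workspace and $base/workspace/chats/chat_with_alice.txt, but not $base/workspace2 or $base/Dockerfile.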
Now check the SELinux labels on the directory contents by confirming that
ls -Z
gives output similar to
unconfined_u:object_r:user_home_t:s0 Dockerfile
unconfined_u:object_r:user_home_t:s0 README.md
unconfined_u:object_r:user_home_t:s0 mlcube.yaml
unconfined_u:object_r:user_home_t:s0 requirements.txt
unconfined_u:object_r:container_file_t:s0 workspace
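A scripted version of this check is sketched below. The hypothetical workspace_labeled function expects ls -Z output in the two-column form shown above (security context, then file name) on standard input, for example ls -Z | workspace_labeled, and succeeds only if the workspace entry carries the container_file_t type. It assumes file names contain no spaces.

```shell
# Hypothetical helper: exit 0 if the "workspace" entry has type container_file_t.
workspace_labeled() {
    awk '
        $2 == "workspace" { found = ($1 ~ /:container_file_t:/) }
        END { exit (found ? 0 : 1) }
    '
}
```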
Now run the example:
mlcube run --mlcube=. --task=hello --platform=docker
mlcube run --mlcube=. --task=bye --platform=docker
Finally, check that the output of
cat workspace/chats/chat_with_alice.txt
has text similar to
Hi, Alice! Nice to meet you.
Bye, Alice! It was great talking to you.
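If you want to confirm the result without reading the file, a short sketch follows. The hypothetical check_chat function only checks that lines starting with Hi, Alice! and Bye, Alice! are present, since the exact wording may differ between versions of the example.

```shell
# Hypothetical helper: succeed if both the hello and bye tasks wrote their lines.
check_chat() {
    local chat_file="${1:-workspace/chats/chat_with_alice.txt}"
    grep -q '^Hi, Alice!' "$chat_file" && grep -q '^Bye, Alice!' "$chat_file"
}
```

Run it from the hello_world directory after both mlcube run commands have finished.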
You can create your own MLCube as described here. Contributions to the MLCube examples repository are welcome. Udica is a new project that promises more fine-grained SELinux policy controls for containers that are easy for system administrators to apply. Both projects are under active development, and testing them and providing feedback would help make secure data management on systems with SELinux easier and more effective.
Angel Yocupicio
Thank you very much for creating MLCube, this open source container tool. It is very interesting. I think I will include it in a tune-up tutorial for Fedora 36 coming soon.
Benson Muite
Thanks for your feedback; further tutorials would be great. I am not a developer of MLCube, just a new user.
Michael Rivard
In order for rootless podman containers to access the host’s NVIDIA GPU, I had to run (once, on the host):
. (Search here for
.)
I have never seen this mentioned anywhere else, not even in NVIDIA’s own documentation or user forums.
Benson Muite
Thanks for this. Finding good desktop defaults is helpful, but SELinux also offers many opportunities for customization which are useful when working with sensitive and/or valuable data for which access needs to be controlled.
Jonatas Esteves
On the part about setting SELinux labels for the workspace directory, the commands
and
should not need
.
Also, I think this should be reported as an issue to MLCube. It could have done this automatically for you (or as an option) by just passing a
or
flag when mounting the directory.
Benson Muite
Thanks for your feedback. Passing :Z is one of the suggested options that could be implemented; see https://github.com/mlcommons/mlcube/issues/205. Suggestions on how this would work best in your workflows would be greatly appreciated.
sid
Thanks for the info