# From Virtualisation to Containerisation

### Outline

* **Presentation** (45m)
* **Self-paced Workshop** (2h15)
* BE Cloud & Docker (23/11)
* To be continued with Orchestration & Deployment (08/01)

#### This class will be successful if you understand

* why we need a tool like Docker
* the basics of Docker (containers, images)
* the basics of a container registry
* how to pull an image and run a container
* what a Dockerfile looks like

### The need for Containers in software

#### IT Multimodality

#### The Matrix From Hell

#### Analogy

#### Solution ?

#### Solution !

### Docker

Docker is **a** solution that **standardizes** the packaging and execution of software in isolated environments (**containers**) that share resources and can communicate with each other

> Build, Share, and Run Any App, Anywhere [Docker](https://www.docker.com/)

* Created in 2013
* Open Source
* Not a new idea, but it set a new standard
* Docker is also a company built around its main product (Docker Engine)
  * in charge of developing everything Docker, plus additional paid services (Docker Hub...)

Docker is not the only solution for containers

Docker is some fancy tech on top of Linux kernel capabilities (namespaces, cgroups) that make containers possible

[more info](https://medium.com/@goyalsaurabh66/docker-basics-cb006b9be243)

But Docker is also available on [Windows and macOS](https://www.docker.com/products/docker-desktop) !

https://www.youtube.com/c/AurelieVache/videos

### Containers or Virtual Machines

#### Similarities

* Isolated environments for applications
* Movable between hosts

#### Drawbacks of VMs

* Each VM contains a full OS => installation and resource overhead
* Each VM needs its resources pre-allocated (=> wasted if not used)
* Communication between VMs <=> communication between computers

#### Container vs Virtual Machine

#### Why are docker containers lightweight ?

#### Container vs Virtual Machine, an Analogy

#### Resources allocation in containers

* Containers share the underlying OS kernel
* The container engine can allocate resources (CPU, storage, RAM) on the fly (unlike a VM)
* GPUs are much easier to manage / share with containers

#### Some drawbacks of containers

* Containers are based on Linux tech (Docker makes Windows containers possible though)
* Isolation is not perfect since containers share the host kernel (security and stability concerns)

### Containers for Data Science

#### Multiple People

#### Complex Workflows

#### Multiple Components

#### Data Science is about reproducibility

* Experimental science
* Communicating results
* Hand-offs to other teams
* Deployment and versioning of models

#### So... containers ?

* ... for deployment
* ... for standardized development environments
* ... for dependency management
* ... for complex / large scale workflows

~~it works on my notebook !~~ *here's the model ready to run !*

Build, Ship, Run in Data Science

Deployment

Reproducible development environment

Codespace is actually a container...

Machine Learning [at scale](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#mlops_level_1_ml_pipeline_automation) !

[Netflix and notebook scheduling](https://medium.com/netflix-techblog/scheduling-notebooks-348e6c14cfd6)
https://www.kubeflow.org/

https://polyaxon.com/

### Using Docker in practice

#### Vocabulary of Docker

* **Layer**: Set of read-only files used to provision the system
* **Image**: Read-only stack of layers, a "snapshot" (or blueprint) of an environment
* **Images** can inherit from other **Images**. An image is identified by a *name* and a *tag*
* **Container**: Read-write instance of an **Image**
* **Dockerfile**: Description of the process used to build an Image
* **Container Registry**: Repository of Docker Images
* **Docker Hub**: The main container registry, run by docker.com

#### Workflow

#### Layers, Container, Image

#### Image

#### Layer / Image Analogy

Docker:

```Dockerfile
FROM python:3.11
# ipython is not part of the base image, so install it alongside torch
RUN pip install torch ipython
CMD ["ipython"]
```

```bash
docker build -f Dockerfile -t my-image:1.0 .
# -it attaches an interactive terminal; the tag must match the one used at build time
docker run -it my-image:1.0
```

Python:

```python
class BaseImage:                  # plays the role of python:3.11
    def __init__(self, a):
        self.a = a


class NewImage(BaseImage):        # "FROM BaseImage", plus one extra layer
    def __init__(self, a, b):
        super().__init__(a=a)
        self.b = b


container = NewImage(a=0, b=1)    # "docker run": an instance of the image
```

#### Dockerfile / Layer / Image / Container Analogy

#### Dockerfile

* Used to build Images

```Dockerfile
# Start from an existing image (inheritance)
FROM python:3.11
# Environment variable available at build time and at run time
ENV MYVAR="HELLO"
RUN pip install torch
# Copy files from the build context into the image
COPY my-conf.txt /app/my-conf.txt
ADD my-file.txt /app/my-file.txt
# Document the port the application listens on
EXPOSE 9000
WORKDIR /app
# Create an unprivileged user, then switch to it
RUN useradd --create-home myuser
USER myuser
# bash -c expands ${MYVAR} when the container starts
ENTRYPOINT ["/bin/bash", "-c"]
CMD ["echo ${MYVAR}"]
```

```bash
docker build -f Dockerfile -t my-image:1.0 .
docker run my-image:1.0
```

* Reproducible (if you include static data)
* Can be put under version control (simple text file)

#### Architecture

#### Registry

* Local "registry": all the images stored on your machine
* https://hub.docker.com/
* GCP Container Registry
* Social dimension (share Docker images to speed up development/deployment)

#### In practice
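
To make the layer / image / container vocabulary concrete, here is a minimal sketch in Python. It is a toy model, not how Docker is implemented: the `Image` class and the `make_layer` helper are hypothetical names, and files are reduced to entries in a dictionary. It only illustrates the idea that an image is a stack of read-only layers, and that each container adds its own writable layer on top.

```python
from collections import ChainMap
from types import MappingProxyType


def make_layer(files):
    """Return a read-only layer: a frozen mapping of path -> content."""
    return MappingProxyType(dict(files))


class Image:
    """Toy model of an image: a named, tagged stack of read-only layers."""

    def __init__(self, name, tag, layers):
        self.name = name
        self.tag = tag
        self.layers = tuple(layers)  # oldest layer first

    def extend(self, name, tag, files):
        """Build a child image: the same layers plus one new layer on top."""
        return Image(name, tag, self.layers + (make_layer(files),))

    def run(self):
        """Start a 'container': a fresh writable layer above the image layers."""
        writable = {}
        # ChainMap looks keys up left to right, so the newest layer comes first.
        # Writes always go to the first mapping, i.e. the writable layer.
        return ChainMap(writable, *reversed(self.layers))


base = Image("python", "3.11", [make_layer({"/usr/bin/python": "python 3.11"})])
my_image = base.extend("my-image", "1.0", {"/site-packages/torch": "torch"})

c1 = my_image.run()
c2 = my_image.run()
c1["/tmp/result.txt"] = "written by c1"  # lands in c1's writable layer only
c1["/usr/bin/python"] = "patched"        # shadows the image file inside c1 only

print(c1["/usr/bin/python"])
print(c2["/usr/bin/python"])
print("/tmp/result.txt" in c2)
```

Both containers start from the same image, but the changes made in `c1` are invisible to `c2` and never touch the image itself. This is why many containers can share one image at very little cost.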
### What about multi-container applications ?

#### Docker Compose

* Multi-container applications with networking (communication between containers)
* "Glue" for complex applications and microservices

#### Docker Compose (example)

A database and a webapp

```yaml
services:
  app:
    build: .
    image: takacsmark/flask-redis:1.0
    environment:
      - FLASK_ENV=development
    ports:
      - "5000:5000"
  redis:
    image: redis:7-alpine
```

`docker compose up` starts both containers (you will see that next week)

#### Remember this !

### An analogy...

[https://bernhardwenzel.com/2022/the-mental-model-of-docker-container-shipping/](https://bernhardwenzel.com/2022/the-mental-model-of-docker-container-shipping/)

### Docker and GCP

#### GCP & Docker

* Each GCP project has its own private "Docker Hub", called [Container Registry](https://cloud.google.com/container-registry/) (Google now recommends Artifact Registry as its successor)
* Your images look like this: `eu.gcr.io/project-id/a/b/c:1.0`
* You can use [Google Cloud Build](https://cloud.google.com/cloud-build/) to build images from Dockerfiles remotely
  * `gcloud builds submit --tag gcr.io/[PROJECT_ID]/quickstart-image .`
* To use gcloud credentials with docker: `gcloud auth configure-docker`
* You can even deploy ["virtual machines" with containers directly](https://cloud.google.com/compute/docs/containers)

### Demo time
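
One detail worth having in mind during the demo is how an image name such as `eu.gcr.io/project-id/a/b/c:1.0` breaks down into registry, repository, and tag. The sketch below is a simplified illustration in Python, not Docker's actual parser: `ImageRef` and `parse_image_ref` are hypothetical names, digests (`@sha256:...`) are ignored, and the defaults (`docker.io`, `library/`, `latest`) are assumptions meant to mirror what the Docker CLI does for short names.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag' using simplified Docker-style defaults."""
    # A ':' after the last '/' introduces the tag; an earlier ':' belongs
    # to a registry port such as localhost:5000.
    name, tag = ref, "latest"
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        name, tag = ref[:last_colon], ref[last_colon + 1:]

    first, _, rest = name.partition("/")
    # A first component with '.' or ':' (or 'localhost') is a registry host.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return ImageRef(first, rest, tag)

    # Otherwise assume Docker Hub; official images live under 'library/'.
    repository = name if "/" in name else f"library/{name}"
    return ImageRef("docker.io", repository, tag)


for ref in [
    "python:3.11",
    "my-image",
    "takacsmark/flask-redis:1.0",
    "eu.gcr.io/project-id/a/b/c:1.0",
]:
    print(ref, "->", parse_image_ref(ref))
```

Note that a name without a tag, such as `my-image`, resolves to the `latest` tag. An image built as `my-image:1.0` therefore has to be run with its tag spelled out.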