{"id":52524,"date":"2021-10-14T15:00:28","date_gmt":"2021-10-14T14:00:28","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=52524"},"modified":"2022-02-10T20:45:14","modified_gmt":"2022-02-10T19:45:14","slug":"building-scalable-data-science-applications-using-containers-part-6","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2021\/10\/14\/building-scalable-data-science-applications-using-containers-part-6\/","title":{"rendered":"Building Scalable Data Science Applications using Containers \u2013 Part 6"},"content":{"rendered":"
<\/p>\n
Welcome to the sixth part of this blog series around using containers for Data Science. In parts one<\/a>, two<\/a>, three<\/a>, four<\/a>, and five<\/a>, we provided a number of building blocks that we\u2019ll use here. If this is the first blog you\u2019ve seen, it may be worth skimming the first five parts, or going back and progressing through them. We make a number of assumptions about your familiarity with Docker, storage, and multi-container applications, which were covered previously.<\/p>\n In this article we convert the previous docker-compose application (part five<\/a>) to one that capitalises on a Kubernetes approach \u2013 scalability, resilience, predefined configuration packages with Helm, and so on.<\/p>\n Reviewing the previous Docker approach\u2019s structure, almost everything sits in a container mounting shared storage.<\/p>\n <\/p>\n Kubernetes brings a different dimension to how you might consider a solution, and our approach builds on this. In this article, we won\u2019t stretch the bounds of what Kubernetes can do, but we will show how to take an existing container-based application and gradually migrate that capability to cloud services, with Kubernetes used as an orchestration engine.<\/p>\n This is the revised architecture.<\/p>\n <\/p>\n Things to note about this:<\/p>\n We won\u2019t use hard-coded passwords, host names, and so on within our source as we did in the previous instalment, but we will use configurable variables. This is still less secure than it could be, as environment variable values are still visible within Kubernetes configuration files. A more secure approach might use Azure Key Vault and, say, CSI Secrets<\/a>. However, we want to minimise the length of this blog rather than be distracted by container security. The CSI Secrets link should clarify how to apply this yourself if you need to.<\/p>\n For the purposes of this blog, we assume that:<\/p>\n <\/p>\n All the code for this tutorial can be downloaded here<\/a>.<\/p>\n We\u2019ll hold our application under a single directory tree. Create the aks<\/strong> directory, and then beneath that, create sub-directories called containers\/iload<\/strong> and containers\/worker<\/strong>.<\/p>\n As with the previous instalment, we will use the same classic CIFAR<\/a> image set for our testing. There is a GitHub source that has them in jpg form, which can be downloaded here<\/a>.<\/p>\n Go into your aks directory and clone the repo. You should see something like the following:<\/p>\n <\/p>\n Previously, we used container volumes. In this case, we\u2019ll use blob storage and all containers will reference the same content. Copy the script below into a file called initialprep.sh<\/strong>. Modify the first three lines to refer to the names of an Azure Resource Group, Storage Account, and Blob Container. Note that the iload and worker scripts later in this article expect the blob container to be called cifar<\/strong>, so keep that name unless you also change those scripts. The script will create those resources and upload all the CIFAR images to the Storage Container. If you already have a resource group, Storage Account, and Storage Container, feel free to remove the lines that create them.<\/p>\n When you run this, you should see the resource creation followed by the upload to the repository.<\/p>\n <\/p>\n The previous container-only approach used a Postgres container to record results. Azure provides resilient, scalable services, which are easily configurable, so there\u2019s no need to build our own. 
Let\u2019s provision one of those services and refer to it later.<\/p>\n Below you can see how to list available Postgres SKU types, where the format is (Model_Generation_Cores<\/strong>), so a B<\/strong>asic single core Gen 5<\/strong> server would be \u201cB_Gen5_1<\/strong>\u201d.<\/p>\n Choose the smallest server available. We\u2019ll allocate a basic single core server with 50GB of storage. At the time of writing, this cost around \u00a325\/month. We could also have chosen a much less expensive SQL-DB server for around \u00a35\/month with 2GB of storage, but we\u2019d need to change our SQL slightly. We\u2019ve changed as little as necessary from our previous instalment of this blog, but feel free to make your own optimisations.<\/p>\n Here you can see that I\u2019m provisioning a database called cifardb<\/strong> with an administrator name of \u2018jon<\/strong>\u2019 and a password of \u2018P@ssw0rd123<\/strong>\u2019. It also returns the fully qualified domain name of the server (cifardb.postgres.database.azure.com<\/strong>).<\/p>\n By default, Postgres denies access to all services. You can define private networks to ensure very granular access within and from outside Azure. In this case, we\u2019ll provide default access to any Azure service (e.g. Kubernetes). Note that this does not<\/strong> provide access to any external public endpoint.<\/p>\n <\/p>\n We\u2019re now at the stage where the components to be added are containers. Where we previously used Docker, we\u2019ll now run them on a Kubernetes cluster. The purpose of this article is not to focus on everything Kubernetes can do. Rather, it is to give a simple example of running Data Science services on Azure Kubernetes Service.<\/p>\n There are many publicly available guides to understanding the fundamentals of Kubernetes, as well as the Azure approach to implementing it. Microsoft has a set of modules that will introduce you to many of the concepts here<\/a>.<\/p>\n Create a file called aks.sh<\/strong> containing the following, and place this within the aks directory. Replace the Resource Group, AKS Cluster Name, and Azure Container Registry names with your own choices.<\/p>\n This script creates a Kubernetes cluster and a Container Registry, and then gives the cluster permission to pull images from the registry. Execute that script.<\/p>\n Now we\u2019ll let our local Kubernetes CLI environment (e.g. laptop \/ desktop) connect to our Azure Kubernetes cluster and confirm that we can see services running.<\/p>\n This shows the cluster running and that we can control it from our local environment.<\/p>\n <\/p>\n Let\u2019s confirm where we are in the overall process.<\/p>\n The final part is to add the application. The key thing to consider with this new approach is that where previously we built all the services, a cloud platform allows us to take advantage of commodity capabilities that are already designed to be scalable and resilient. Looking at the diagram of the cloud version of this application, there are three components outstanding, and each of these uses containers.<\/p>\n <\/p>\n The next component in our solution is the queueing mechanism. Previously, we built a RabbitMQ container to manage our requests. We\u2019ll do the same here, but not with a Dockerfile. We could, but let\u2019s show you an alternative approach using Helm. Helm is a Kubernetes package manager that allows you to install and configure applications very easily. 
We could achieve the same by building our own container, but Helm makes the process trivial, and there are many ready-made applications available. The documentation for installing RabbitMQ using Helm can be found here<\/a>, but the two lines below are all I needed to get RabbitMQ installed and running in my environment.<\/p>\n There is some interesting information to note here:<\/p>\n We\u2019ll need these credentials in a minute. In the meantime, let\u2019s see what was deployed in our environment:<\/p>\n As there is no external IP address, use the port forward<\/strong> command so that we can interact with RabbitMQ.<\/p>\n <\/p>\n If we now add the credentials extracted earlier, we can see our running RabbitMQ environment.<\/p>\n <\/p>\n <\/p>\n This process performs two functions. First, it connects to our Postgres environment and creates the CATEGORY_RESULTS table if it doesn\u2019t already exist. Second, it queues all the images that were uploaded to the storage account earlier so they can be classified. In this example we\u2019re running this as a one-off, but you could also take a more sophisticated approach, using a location argument for daily or ad-hoc batches of images (a sketch of this idea follows the iload.py listing below).<\/p>\n Go into the containers\/iload<\/strong> directory and create a file called iload.py<\/strong> containing the following:<\/p>\n As with the previous version of this application, the script extracts all image names in our storage location and adds them to a queue to be classified. The first key difference with this version is that our images aren\u2019t stored on a container\u2019s local disk, but in an Azure storage account, so we\u2019ll need our blob storage credentials.<\/p>\n The second thing to note is that we\u2019re using environment variables within the code. This means that the script can refer to customised and changing services without a need to continually modify the code. You can use the same code against different data sources, queues, or storage accounts.<\/p>\n In the containers\/iload<\/strong> directory create a file called Dockerfile<\/strong> containing the following.<\/p>\n This simply defines a container with Python installed, and relevant libraries to access Azure storage, Postgres, and RabbitMQ.<\/p>\n Within that directory, build the container, and then we\u2019ll move it to our Azure Container Registry.<\/p>\n Now we\u2019ll log in to our Azure Container Registry, tag our local image against a target image in the remote repository, and then push it to Azure. We\u2019ll also confirm that it is there by doing an Azure equivalent of docker images<\/strong> (az acr repository list\u2026). Note that we are prefixing the image tag with the name of the Azure Container Registry (jmcifaracr.azurecr.io<\/strong>).<\/p>\n We\u2019re almost there.<\/p>\n Kubernetes has a number of ways of executing workloads. The two we\u2019re interested in specifically are deployments and jobs. The key difference is that a job is executed once, whereas a deployment is expected to remain operational; if anything happens to the process, Kubernetes will attempt to keep that resource running. In other words, if a container dies, then it will be restarted.<\/p>\n For the iload process, we only want this to load our 60,000 images and then terminate. We don\u2019t want to load the images and have the container restart, only to load them again and again. 
To run this job, we\u2019ll provide a configuration file containing the job details and submit it to Kubernetes.<\/p>\n In the containers\/iload<\/strong> directory, create a file called iload-job.yml<\/strong> with the following:<\/p>\n Let\u2019s spend some time looking at this.<\/p>\n The job is going to process the images just uploaded to the storage container. All variables in the script are defined here. We could run this using different values and keep our source code stable. We are using the RabbitMQ and Postgres credentials shown earlier. In addition, we\u2019re referencing the blob storage key and container created earlier.<\/p>\n Note that the passwords are shown here in clear text. Ideally, we would use something like Azure Key Vault, or a more secure approach using CSI Secrets<\/a>, so that none of this information is exposed outside of the container (a minimal Kubernetes Secret alternative is sketched after the job definition below).<\/p>\n If we kick off that job using kubectl, you will see it being deployed, and a pod created. Once the job completes, you can also see that the container logs show the job\u2019s progress.<\/p>\n If you return to the RabbitMQ dashboard, you will see the queue contents increase from zero to 60,000 items. At its peak, the job added around 3,500 requests per second.<\/p>\n <\/p>\n The final component in our application is the worker process. Its role is to take an item off the queue, classify it, and then record the accuracy of the prediction.<\/p>\n Go into the containers\/worker<\/strong> directory and create a file called worker.py<\/strong> containing the following:<\/p>\n The main logic hasn\u2019t changed since the previous instalment. It takes a request from the queue containing an image\u2019s physical location and its expected category, and returns a predicted category and a confidence value. It also stores these values in a database if desired.<\/p>\n As with the iload process, the key differences here are as follows:<\/p>\n We also added the ability to log results depending on the value of an environment variable, so you might want to play with this to determine the performance impact of logging.<\/p>\n In the containers\/worker<\/strong> directory create a file called Dockerfile<\/strong> containing the following.<\/p>\n Again, this is relatively straightforward. You build a container with the requisite Azure, Python, RabbitMQ, and machine learning libraries installed.<\/p>\n As with the iload process, you need to build a local container, tag it against a target image in the Azure Container Registry, and then push it to Azure.<\/p>\n Now we need to provide a deployment file for the worker process. This defines how it is run within Kubernetes.<\/p>\n In the containers\/worker<\/strong> directory, create a file called worker-deployment.yml<\/strong> containing the following:<\/p>\n Let\u2019s spend a bit of time going through this as well.<\/p>\n First, this is a deployment, and it ensures that there is always a defined number of replicas<\/strong> (or pods in this case) running. This configuration uses a single pod, but when we increase this number later, you\u2019ll see how it affects the environment and performance. Second, each pod is allocated an amount of memory and CPU. Some processes are memory intensive, and others compute-centric. 
You can decide how much to dedicate to each pod type.<\/p>\n Let\u2019s deploy that container and evaluate the performance.<\/p>\n You can see that there is an active deployment and a single worker running. This is the view from the RabbitMQ dashboard \u2013 1.8 requests<\/strong> on average per second.<\/p>\n <\/p>\n Increase the number of parallel workers to 5<\/strong> by modifying the replica<\/strong> count in the worker-deployment.yml<\/strong> file and redeploying it. You will then have 5 pods. Each worker takes a request from the queue, performs the image classification, and writes the content to Postgres.<\/p>\n Performance has now increased to an average of 8.8 requests per second.<\/p>\n <\/p>\n Here is a view of performance after increasing the replica count even further to 20 (35 requests per second).<\/p>\n <\/p>\n And then 35 workers (55 requests per second).<\/p>\n <\/p>\n This isn\u2019t linear scalability, nor is it an invitation to simply increase the number of workers to 500. Each Kubernetes node has a limited amount of physical resource. During our tests, we achieved 70 requests per second after playing with how much memory and CPU were allocated to each pod. This is an exercise for you to consider with your own workloads. What should be understood, though, is that you can scale your service as needed, with the underlying Kubernetes architecture supporting that: more pods, nodes, or clusters as required.<\/p>\n <\/p>\n This article showed how to take an existing multi-container Docker application and migrate it to the Azure Kubernetes Service. Where possible, commodity PaaS capabilities were used (database, storage, etc.). We also showed how to deploy a publicly available, pre-packaged configuration using Helm.<\/p>\n The previous instalment of this blog solely used containers, writing the results to Postgres. We did the same here, but there\u2019s nothing to suggest a need to query the results immediately. If this were performance critical, we might consider writing the results to a file and then batch uploading those to a database at some point for analysis \u2013 much more efficient.<\/p>\n Our application is tiny, and arguably too small to justify an entire Kubernetes environment. However, a Kubernetes environment normally runs many different applications simultaneously within private networks, using well-defined security, performance monitoring, and with much more flexibility in terms of scalability and cost optimisation. Since you are only charged for the Kubernetes environment, not the number of pods, you can run as many or as few applications as you like in that environment, subject to capacity.<\/p>\n You might also want to consider adding a node pool of GPU nodes, which can dramatically improve performance where your applications are able to use the underlying GPU. More information can be found here<\/a>.<\/p>\n The articles in this series have focused on the basics of using containers on Azure to address common data science patterns, assuming an existing interest in using on-premises containers to deliver data science solutions.<\/p>\n We haven\u2019t considered the use of MLOps<\/a>, where you might approach machine learning and data science with the same rigour, governance, and outcome transparency applied to software development. 
Nor have we considered the use of Azure Machine Learning<\/a>, where you might want to replace some of your historical code with PaaS machine learning capabilities and optimised compute.<\/p>\n Future instalments may look at these, incorporating your containers with prebuilt Azure capabilities.<\/p>\n Note:<\/b>\u00a0If you\u2019ve finished this tutorial and created a specific resource group to try it, then you may want to remove it to ensure you\u2019re no longer being charged for resources that are no longer needed.<\/p>\n <\/p>\n Jon is a Microsoft Cloud Solution Architect specialising in Advanced Analytics & Artificial Intelligence with over 30 years of experience in understanding, translating and delivering leading technology to the market. He currently focuses on a small number of global accounts helping align AI and Machine Learning capabilities with strategic initiatives. He moved to Microsoft from IBM where he was Cloud & Cognitive Technical Leader and an Executive IT Specialist.<\/p>\n Jon has been the Royal Academy of Engineering Visiting Professor for Artificial Intelligence and Cloud Innovation at Surrey University since 2016, where he lectures on various topics from machine learning and design thinking to architectural thinking.<\/p>\n Mark has worked at Microsoft for five and a half years with a focus on helping customers adopt cloud native technologies. Before Microsoft, he spent around twenty years in the financial services industry, primarily at major UK banks, where he worked in various roles across operations, engineering and architecture. He loves discovering new technologies, learning them in depth and teaching others.<\/p>\n","protected":false},"excerpt":{"rendered":" In this second article in a two-part miniseries, Jon Machtynger and Mark Whitby convert the previous docker-compose application to one that capitalises on a Kubernetes approach.<\/p>\n","protected":false},"author":430,"featured_media":36918,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[594],"post_tag":[519],"content-type":[],"coauthors":[531,1776],"class_list":["post-52524","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technetuk","tag-technet-uk"],"yoast_head":"\n\n
\n
Let\u2019s Begin<\/h3>\n
$ cd aks<\/strong>\r\n\r\n$ git clone https:\/\/github.com\/YoongiKim\/CIFAR-10-images.git<\/strong>\r\nCloning into 'CIFAR-10-images'...\r\nremote: Enumerating objects: 60027, done.\r\nremote: Total 60027 (delta 0), reused 0 (delta 0), pack-reused 60027\r\nReceiving objects: 100% (60027\/60027), 19.94 MiB | 16.28 MiB\/s, done.\r\nResolving deltas: 100% (59990\/59990), done.\r\nUpdating files: 100% (60001\/60001), done.\r\n\r\n$ tree -L 1 .<\/strong>\r\naks\r\n\u251c\u2500\u2500 CIFAR-10-images\r\n\u2514\u2500\u2500 containers\r\n\r\n2 directories<\/pre>\n
Blob Storage and your image<\/h3>\n
RGNAME=\"rg-cifar<\/strong>\"\r\nSTG=\"cifarimages<\/strong>\"\r\nCON=\"cifar<\/strong>\" # The iload and worker scripts expect this container name\r\nEXPIRES=$(date --date='1 days' \"+%Y-%m-%d\")\r\nIMAGEDIR=\"CIFAR-10-images\"\r\n\r\n# Create environment\r\naz group create -l uksouth -n $RGNAME\r\naz storage account create --name $STG --resource-group $RGNAME --location uksouth --sku Standard_ZRS # Create Storage Account\r\naz storage container create --account-name $STG --name $CON --auth-mode login # Create your storage container\r\n\r\nACCOUNTKEY=$(az storage account keys list --resource-group $RGNAME --account-name $STG | grep -i value | head -1 | cut -d':' -f2 | tr -d [\\ \\\"])\r\n\r\n# Generate a temporary SAS key\r\nSAS=$(az storage container generate-sas --account-key $ACCOUNTKEY --account-name $STG --expiry $EXPIRES --name $CON --permissions acldrw | tr -d [\\\"])\r\n\r\n# Determine your URL endpoint\r\nSTGURL=$(az storage account show --name $STG --query primaryEndpoints.blob | tr -d [\\\"])\r\nCONURL=\"$STGURL$CON\"\r\n\r\n# Copy the files to your storage container\r\nazcopy cp \"$IMAGEDIR\" \"$CONURL?$SAS\" --recursive<\/pre>\n
$ initialprep.sh<\/strong>\r\n{\r\n \"id\": \"\/subscriptions\/f14bca45-bd2d-42f2-8a45-1248ab77ba72\/resourceGroups\/rg-cifar2\",\r\n \"location\": \"uksouth\",\r\n \"managedBy\": null,\r\n \"name\": \"rg-cifar2\",\r\n \"properties\": {\r\n \"provis\r\n\r\nJob 8b0ccc36-2050-0a44-496e-c09d979f3169 summary\r\nElapsed Time (Minutes): 0.8001\r\nNumber of File Transfers: 60025\r\nNumber of Folder Property Transfers: 0\r\nTotal Number of Transfers: 60025\r\nNumber of Transfers Completed: 60025\r\nNumber of Transfers Failed: 0\r\nNumber of Transfers Skipped: 0\r\nTotalBytesTransferred: 83127418\r\nFinal Job Status: Completed<\/pre>\n
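If you want an independent check of the upload beyond the azcopy summary, counting the blobs in the container is enough. Below is a minimal sketch (not part of the original scripts) using the azure-storage-blob library, assuming the storage account name and key are exported as STG_ACNAME and STG_ACKEY to match the environment variables used later in this article.<\/p>\n
# count_blobs.py - rough check that the CIFAR images landed in blob storage (illustrative)\r\nimport os\r\nfrom azure.storage.blob import ContainerClient\r\n\r\nSTG_ACNAME = os.environ['STG_ACNAME'] # Storage Account Name\r\nSTG_ACKEY = os.environ['STG_ACKEY'] # Storage Account Key\r\n\r\nCONNECTION_STRING = 'DefaultEndpointsProtocol=https' + \\\r\n ';EndpointSuffix=core.windows.net' + \\\r\n ';AccountName=' + STG_ACNAME + ';AccountKey=' + STG_ACKEY\r\ncontainer = ContainerClient.from_connection_string(CONNECTION_STRING, container_name='cifar')\r\n\r\n# Count everything under the CIFAR-10-images prefix (the 60,000 images plus the repo's other files)\r\ncount = sum(1 for _ in container.list_blobs(name_starts_with='CIFAR-10-images'))\r\nprint('Blobs found:', count)<\/pre>\n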
Database Storage<\/h3>\n
$ az postgres server list-skus -l uksouth | grep -i id<\/strong>\r\n\r\n \"id\": \"Basic\",\r\n \"id\": \"B_Gen5_1\",\r\n \"id\": \"B_Gen5_2\",\r\n \"id\": \"GeneralPurpose\",\r\n \"id\": \"GP_Gen5_2\",\r\n \"id\": \"GP_Gen5_4\",\r\n \"id\": \"GP_Gen5_8\",\r\n \"id\": \"GP_Gen5_16\",\r\n \"id\": \"GP_Gen5_32\",\r\n \"id\": \"GP_Gen5_64\",\r\n \"id\": \"MemoryOptimized\",\r\n \"id\": \"MO_Gen5_2\",\r\n \"id\": \"MO_Gen5_4\",\r\n \"id\": \"MO_Gen5_8\",\r\n \"id\": \"MO_Gen5_16\",\r\n \"id\": \"MO_Gen5_32\",<\/pre>\n
$ az postgres server create --resource-group rg-cifar --name cifardb --location uksouth --admin-user jon --admin-password \"P@ssw0rd123\" --sku-name B_Gen5_1 --storage-size 51200\r\nChecking the existence of the resource group 'rg-cifar'...<\/strong>\r\n{\r\n.\r\n.\r\n \"administratorLogin\": \"jon\",\r\n \"password\": \"P@ssw0rd123\",\r\n.\r\n \"fullyQualifiedDomainName\": \"cifardb.postgres.database.azure.com\",\r\n.\r\n}\r\n$\r\n\r\n# Allow Azure services (e.g. Kubernetes) to access this\r\n$ az postgres server firewall-rule create --resource-group rg-cifar --server-name cifardb --name \"AllowAllLinuxAzureIps\" --start-ip-address \"0.0.0.0\" --end-ip-address \"0.0.0.0\"\r\n<\/strong>\r\n{\r\n \"endIpAddress\": \"0.0.0.0\",\r\n.\r\n \"startIpAddress\": \"0.0.0.0\",\r\n \"type\": \"Microsoft.DBforPostgreSQL\/servers\/firewallRules\"\r\n}<\/pre>\n
The Kubernetes Cluster<\/h3>\n
RGNAME=rg-cifar<\/strong>\r\nAKSNAME=cifarcluster<\/strong>\r\nACRNAME=jmcifaracr<\/strong>\r\n\r\n# Create an AKS cluster with default settings\r\naz aks create -g $RGNAME -n $AKSNAME --kubernetes-version 1.19.11\r\n\r\n# Create an Azure Container Registry\r\naz acr create --resource-group $RGNAME --name $ACRNAME --sku Basic\r\n\r\n# Attach the ACR to the AKS cluster\r\naz aks update -n $AKSNAME -g $RGNAME --attach-acr $ACRNAME<\/pre>\n
$ aks.sh<\/strong>\r\n{\r\n.\r\n \"kubernetesVersion\": \"1.19.11\",\r\n.\r\n \"networkProfile\": {\r\n \"dnsServiceIp\": \"10.0.0.10\",\r\n.\r\n}<\/pre>\n
$ az aks get-credentials --name cifarcluster --resource-group rg-cifar\r\n\r\n$ kubectl get services -A<\/strong>\r\n\r\nNAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\r\ndefault kubernetes ClusterIP 10.0.0.1 <none> 443\/TCP 27d\r\nkube-system healthmodel-replicaset-service ClusterIP 10.0.243.143 <none> 25227\/TCP 27d\r\nkube-system kube-dns ClusterIP 10.0.0.10 <none> 53\/UDP,53\/TCP 27d\r\nkube-system metrics-server ClusterIP 10.0.133.242 <none> 443\/TCP 27d<\/pre>\n
Sense Check<\/h3>\n
\n
\n
The Queue Process<\/h3>\n
$ helm repo add bitnami https:\/\/charts.bitnami.com\/bitnami\r\n$ helm install rabbitmq bitnami\/rabbitmq<\/strong>\r\n\r\n.\r\n.\r\nCredentials:\r\n echo \"Username : user\"\r\n echo \"Password : $(kubectl get secret --namespace default rabbitmq -o jsonpath=\"{.data.rabbitmq-password}\" | base64 --decode)\"\r\n echo \"ErLang Cookie : $(kubectl get secret --namespace default rabbitmq -o jsonpath=\"{.data.rabbitmq-erlang-cookie}\" | base64 --decode)\"\r\n.\r\n.\r\n.\r\nTo Access the RabbitMQ AMQP port:\r\n echo \"URL : amqp:\/\/127.0.0.1:5672\/\"\r\n kubectl port-forward --namespace default svc\/rabbitmq 5672:5672\r\nTo Access the RabbitMQ Management interface:\r\n echo \"URL : http:\/\/127.0.0.1:15672\/\"\r\n kubectl port-forward --namespace default svc\/rabbitmq 15672:15672<\/pre>\n
\n
$ echo \"Username : user\"\r\nUsername : user<\/strong>\r\n$ echo \"Password : $(kubectl get secret --namespace default rabbitmq -o jsonpath=\"{.data.rabbitmq-password}\" | base64 --decode)\"\r\nPassword : 7TrP8KOVdC<\/strong><\/pre>\n
$ kubectl get services<\/strong>\r\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\r\nkubernetes ClusterIP 10.0.0.1 443\/TCP 27d\r\nrabbitmq ClusterIP 10.0.180.137 5672\/TCP,4369\/TCP,25672\/TCP,15672\/TCP 15m\r\nrabbitmq-headless ClusterIP None 4369\/TCP,5672\/TCP,25672\/TCP,15672\/TCP 15m\r\n\r\n$ kubectl get pods<\/strong>\r\nNAME READY STATUS RESTARTS AGE\r\nrabbitmq-0 1\/1 Running 0 16m<\/pre>\n
$ kubectl port-forward --namespace default svc\/rabbitmq 15672:15672 &<\/strong>\r\n[1] 88032\r\nForwarding from 127.0.0.1:15672 -> 15672\r\nForwarding from [::1]:15672 -> 15672<\/pre>\n
The Initial Load<\/h3>\n
#!\/usr\/bin\/env python\r\nimport sys, os, json, pika\r\nimport psycopg2\r\nfrom azure.storage.blob import ContainerClient\r\n\r\n# Get Environment Vars\r\nRMQ_USER<\/strong>=os.environ[\"RMQ_USER\"] # RabbitMQ Username\r\nRMQ_PASS<\/strong>=os.environ[\"RMQ_PASS\"] # RabbitMQ Password\r\nRMQ_HOST<\/strong>=os.environ[\"RMQ_HOST\"] # RabbitMQ Hostname\r\nSQL_HOST<\/strong>=os.environ[\"SQL_HOST\"] # SQL Hostname\r\nSQL_DB<\/strong>=os.environ[\"SQL_DB\"] # SQL Database\r\nSQL_USER<\/strong>=os.environ[\"SQL_USER\"] # SQL Username\r\nSQL_PASS<\/strong>=os.environ[\"SQL_PASS\"] # SQL Password\r\nSTG_ACNAME<\/strong>=os.environ[\"STG_ACNAME\"] # Storage Account Name\r\nSTG_ACKEY<\/strong>=os.environ[\"STG_ACKEY\"] # Storage Account Key\r\n\r\n# Set up database table if needed\r\ncmd = \"\"\"\r\n CREATE TABLE IF NOT EXISTS CATEGORY_RESULTS (\r\n FNAME VARCHAR(1024) NOT NULL,\r\n CATEGORY NUMERIC(2) NOT NULL,\r\n PREDICTION NUMERIC(2) NOT NULL,\r\n CONFIDENCE REAL);\r\n \"\"\"<\/strong>\r\npgconn = psycopg2.connect(user=SQL_USER, password=SQL_PASS,\r\n host=SQL_HOST, port=\"5432\", database=SQL_DB)\r\ncur = pgconn.cursor()\r\ncur.execute(cmd)\r\ncur.close()\r\npgconn.commit()\r\n\r\n# Load all images in defined storage account\r\nCONNECTION_STRING=\"DefaultEndpointsProtocol=https\" + \\\r\n \";EndpointSuffix=core.windows.net\" + \\\r\n \";AccountName=\"+STG_ACNAME+\";AccountKey=\"+STG_ACKEY\r\nROOT=\"\/CIFAR-10-images\" # This is where the images are held\r\ncontainer = ContainerClient.from_connection_string(CONNECTION_STRING, container_name=\"cifar\")\r\n\r\nrLen = len(ROOT)\r\nclasses = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')\r\n\r\n# Determine the expected category by parsing the directory (after the root path)\r\ndef fnameToCategory(fname):\r\n for c in classes:\r\n if (fname.find(c) > rLen):\r\n return (classes.index(c))\r\n return -1 # This should never happen\r\n\r\nIMGS=[]\r\nblob_list = container.list_blobs()\r\nfor blob in blob_list:\r\n if blob.name.endswith(('.png', '.jpg', '.jpeg')):\r\n cat = fnameToCategory(blob.name)\r\n data = {\"image\" : blob.name, \"category\": cat, \"catName\": classes[cat]}\r\n message = json.dumps(data)\r\n IMGS.append(message)\r\nprint(\"Number of Images to add to queue = \", len(IMGS))\r\n\r\n# Now write them into the queue\r\ncredentials = pika.PlainCredentials(RMQ_USER, RMQ_PASS)\r\nparameters = pika.ConnectionParameters(RMQ_HOST, 5672, '\/', credentials)\r\nconnection = pika.BlockingConnection(parameters)\r\nchannel = connection.channel()\r\nchannel.queue_declare(queue='image_queue', durable=True)\r\n\r\nfor i in IMGS:\r\n channel.basic_publish( exchange='', routing_key='image_queue', body=i,\r\n properties=pika.BasicProperties(delivery_mode=2,)\r\n )\r\n print(\"Queued \", i)\r\n\r\nconnection.close()<\/pre>\n
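As mentioned earlier, a more sophisticated version of this loader could take a location argument so that daily or ad-hoc batches are queued rather than the full image set. The fragment below is an illustrative sketch of that idea and is not part of the original script: it would replace the blob listing loop above, and assumes an extra, optional BLOB_PREFIX environment variable added to the job definition.<\/p>\n
# Illustrative variation on the listing loop in iload.py: only queue blobs under a prefix,\r\n# e.g. BLOB_PREFIX='CIFAR-10-images\/test' for an ad-hoc batch. Relies on the container,\r\n# classes, fnameToCategory and json names defined above.\r\nBLOB_PREFIX = os.environ.get('BLOB_PREFIX', 'CIFAR-10-images') # hypothetical extra variable\r\n\r\nIMGS = []\r\nfor blob in container.list_blobs(name_starts_with=BLOB_PREFIX):\r\n    if blob.name.endswith(('.png', '.jpg', '.jpeg')):\r\n        cat = fnameToCategory(blob.name)\r\n        data = {'image': blob.name, 'category': cat, 'catName': classes[cat]}\r\n        IMGS.append(json.dumps(data))\r\nprint('Number of Images to add to queue = ', len(IMGS))<\/pre>\n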
FROM ubuntu\r\n\r\nRUN apt-get update\r\nRUN apt-get install -y python3 python3-pip\r\n\r\nRUN apt-get update && apt-get install -y poppler-utils net-tools vim\r\nRUN pip install azureml-sdk\r\nRUN pip install azureml-sdk[notebooks]\r\nRUN pip install azure.ai.formrecognizer\r\nRUN pip install azure.storage.blob\r\nRUN pip install jsonify\r\nRUN pip install pika\r\nRUN pip install psycopg2-binary\r\n\r\nADD iload.py \/\r\n\r\nCMD [\"python3\", \".\/iload.py\" ]<\/pre>\n
$ docker build -t iload .<\/strong>\r\n.\r\n.\r\n=> writing image sha256:4ef19e469755572da900ec15514a4a205953a457c4f06f2795b150db3f2b11eb \r\n=> naming to docker.io\/library\/iload<\/pre>\n
# Log in to the Azure Container Registry\r\n$ az acr login -n jmcifaracr<\/strong>\r\nLogin Succeeded\r\n\r\n$ docker tag iload jmcifaracr.azurecr.io\/iload:1.0<\/strong>\r\n\r\n$ docker images<\/strong>\r\nREPOSITORY TAG IMAGE ID CREATED SIZE\r\niload latest 4ef19e469755 32 minutes ago 1.23GB\r\njmcifaracr.azurecr.io\/iload 1.0 4ef19e469755 32 minutes ago 1.23GB\r\n\r\n$ docker push jmcifaracr.azurecr.io\/iload:1.0<\/strong>\r\nThe push refers to repository [jmcifaracr.azurecr.io\/iload]\r\n6dfdee2e824f: Pushed\r\ne35525d1f4bf: Pushed\r\n.\r\n.\r\n4942a1abcbfa: Pushed\r\n1.0: digest: sha256:e9d606e50f08c682969afe4f59501936ad0706c4a81e43d281d66073a9d4ef28 size: 2847\r\n\r\n$ az acr repository list --name jmcifaracr --output table<\/strong>\r\nResult\r\n--------\r\niload<\/pre>\n
apiVersion: batch\/v1\r\nkind: Job\r\nmetadata:\r\n name: iload\r\nspec:\r\n template:\r\n spec:\r\n containers:\r\n - name: iload\r\n image: jmcifaracr.azurecr.io\/iload:1.0\r\n imagePullPolicy: Always\r\n env:\r\n - name: RMQ_USER\r\n value: \"user<\/strong>\"\r\n - name: RMQ_PASS\r\n value: \"7TrP8KOVdC<\/strong>\"\r\n - name: RMQ_HOST\r\n value: \"rabbitmq<\/strong>\"\r\n - name: SQL_HOST\r\n value: \"cifardb.postgres.database.azure.com<\/strong>\"\r\n - name: SQL_DB\r\n value: \"postgres<\/strong>\"\r\n - name: SQL_USER\r\n value: \"jon@cifardb.postgres.database.azure.com<\/strong>\"\r\n - name: SQL_PASS\r\n value: \"P@ssw0rd123<\/strong>\"\r\n - name: STG_ACNAME\r\n value: \"cifarimages<\/strong>\"\r\n - name: STG_ACKEY\r\n value: \"xxxxxxxxxxxxxxxx<\/strong>\"\r\n resources:\r\n requests:\r\n cpu: 500m\r\n memory: 512Mi\r\n limits:\r\n cpu: 500m\r\n memory: 512Mi\r\n restartPolicy: Never<\/pre>\n
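As noted above, the credentials sit in this manifest in clear text. Short of a full Key Vault and CSI Secrets setup, a simple middle ground is to store the sensitive values in a Kubernetes Secret (for example, kubectl create secret generic cifar-secrets --from-literal=RMQ_PASS=... --from-literal=SQL_PASS=... --from-literal=STG_ACKEY=...) and reference them from the job. The fragment below is an illustrative sketch of how those env entries could then look; the secret name cifar-secrets is our own example rather than something created earlier in this article.<\/p>\n
# Illustrative fragment of iload-job.yml: pull sensitive values from a Secret\r\n# called cifar-secrets instead of embedding them in the manifest.\r\nenv:\r\n  - name: RMQ_PASS\r\n    valueFrom:\r\n      secretKeyRef:\r\n        name: cifar-secrets\r\n        key: RMQ_PASS\r\n  - name: SQL_PASS\r\n    valueFrom:\r\n      secretKeyRef:\r\n        name: cifar-secrets\r\n        key: SQL_PASS\r\n  - name: STG_ACKEY\r\n    valueFrom:\r\n      secretKeyRef:\r\n        name: cifar-secrets\r\n        key: STG_ACKEY<\/pre>\n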
$ kubectl apply -f iload-job.yml<\/strong>\r\njob.batch\/iload created\r\n\r\n$ kubectl get pods<\/strong>\r\nNAME READY STATUS RESTARTS AGE\r\niload-gpgqg 1\/1 Running 0 41s\r\nrabbitmq-0 1\/1 Running 0 159m\r\n\r\n$ kubectl get jobs<\/strong>\r\nNAME COMPLETIONS DURATION AGE\r\niload 1\/1 62s 17m\r\n\r\n$ kubectl logs iload-gpgqg<\/strong>\r\n.\r\n.\r\n.\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4992.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4993.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4994.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4995.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4996.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4997.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4998.jpg\", \"category\": 9, \"catName\": \"truck\"}\r\nQueued {\"image\": \"CIFAR-10-images\/train\/truck\/4999.jpg\", \"category\": 9, \"catName\": \"truck\"}<\/pre>\n
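If you prefer the command line to the dashboard, you can also ask RabbitMQ for the queue depth over the forwarded AMQP port (kubectl port-forward --namespace default svc\/rabbitmq 5672:5672, as shown in the Helm output earlier). This is a small sketch rather than part of the original article, using the same pika library and the password extracted from the rabbitmq secret:<\/p>\n
# queue_depth.py - report how many messages are waiting in image_queue (illustrative)\r\nimport os\r\nimport pika\r\n\r\ncredentials = pika.PlainCredentials('user', os.environ['RMQ_PASS']) # password from the Helm output\r\nparameters = pika.ConnectionParameters('127.0.0.1', 5672, '\/', credentials)\r\nconnection = pika.BlockingConnection(parameters)\r\nchannel = connection.channel()\r\n\r\n# passive=True only inspects the queue; it fails if image_queue does not exist yet\r\nstatus = channel.queue_declare(queue='image_queue', passive=True)\r\nprint('Messages waiting:', status.method.message_count)\r\nconnection.close()<\/pre>\n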
#!\/usr\/bin\/env python\r\n\r\nfrom mxnet import gluon, nd, image\r\nimport mxnet as mx\r\nfrom mxnet.gluon.data.vision import transforms\r\nfrom gluoncv import utils\r\nfrom gluoncv.model_zoo import get_model\r\nimport psycopg2\r\nimport pika, time, os, json\r\nfrom azure.storage.blob import ContainerClient\r\n\r\nimport cv2\r\nimport numpy as np\r\n\r\n# Get Environment Vars\r\nRMQ_USER=os.environ[\"RMQ_USER<\/strong>\"] # RabbitMQ Username\r\nRMQ_PASS=os.environ[\"RMQ_PASS<\/strong>\"] # RabbitMQ Password\r\nRMQ_HOST=os.environ[\"RMQ_HOST<\/strong>\"] # RabbitMQ Hostname\r\nSQL_HOST=os.environ[\"SQL_HOST<\/strong>\"] # SQL Hostname\r\nSQL_DB=os.environ[\"SQL_DB<\/strong>\"] # SQL Database\r\nSQL_USER=os.environ[\"SQL_USER<\/strong>\"] # SQL Username\r\nSQL_PASS=os.environ[\"SQL_PASS<\/strong>\"] # SQL Password\r\nSTG_ACNAME=os.environ[\"STG_ACNAME<\/strong>\"] # Storage Account Name\r\nSTG_ACKEY=os.environ[\"STG_ACKEY<\/strong>\"] # Storage Account Key\r\nLOGTODB=int(os.environ[\"LOGTODB<\/strong>\"]) # Log data to Database? (1 = yes; converted to int so the check below works)\r\n\r\n# Location of Images on blob storage\r\nCONNECTION_STRING=\"DefaultEndpointsProtocol=https\" + \\\r\n \";EndpointSuffix=core.windows.net\" + \\\r\n \";AccountName=\"+STG_ACNAME+\";AccountKey=\"+STG_ACKEY\r\n\r\ncontainer = ContainerClient.from_connection_string(CONNECTION_STRING, container_name=\"cifar\")\r\n\r\nclass_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\r\nnet = get_model('cifar_resnet110_v1', classes=10, pretrained=True)\r\n\r\ntransform_fn = transforms.Compose([\r\n transforms.Resize(32), transforms.CenterCrop(32), transforms.ToTensor(),\r\n transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])\r\n ])\r\n\r\ndef predictCategory(fname):\r\n blob_client = container.get_blob_client(fname)\r\n imgStream = blob_client.download_blob().readall()\r\n img = mx.ndarray.array(cv2.imdecode(np.frombuffer(imgStream, np.uint8), -1))\r\n img = transform_fn(img)\r\n \r\n pred = net(img.expand_dims(axis=0))\r\n ind = nd.argmax(pred, axis=1).astype('int')\r\n print('%s is classified as [%s], with probability %.3f.'%\r\n (fname, class_names[ind.asscalar()], nd.softmax(pred)[0][ind].asscalar()))\r\n return ind.asscalar(), nd.softmax(pred)[0][ind].asscalar()\r\n\r\ndef InsertResult(connection, fname, category, prediction, prob):\r\n count=0\r\n try:\r\n cursor = connection.cursor()\r\n qry = \"\"\" INSERT INTO CATEGORY_RESULTS (FNAME, CATEGORY, PREDICTION, CONFIDENCE) VALUES (%s,%s,%s,%s)\"\"\"\r\n record = (fname, category, prediction, prob)\r\n cursor.execute(qry, record)\r\n\r\n connection.commit()\r\n count = cursor.rowcount\r\n\r\n except (Exception, psycopg2.Error) as error :\r\n if(connection):\r\n print(\"Failed to insert record into category_results table\", error)\r\n finally:\r\n cursor.close()\r\n return count\r\n\r\n# Routine to pull message from queue, call classifier, and insert result to the DB\r\ndef callback(ch, method, properties, body):\r\n data = json.loads(body)\r\n fname = data['image']\r\n cat = data['category']\r\n pred, prob = predictCategory(fname)\r\n if (LOGTODB == 1):\r\n count = InsertResult(pgconn, fname, int(cat), int(pred), float(prob))\r\n else:\r\n count = 1 # Ensure the message is ack'd and removed from queue\r\n \r\n if (count > 0):\r\n ch.basic_ack(delivery_tag=method.delivery_tag)\r\n else:\r\n ch.basic_nack(delivery_tag=method.delivery_tag)\r\n\r\npgconn = psycopg2.connect(user=SQL_USER, password=SQL_PASS,\r\n host=SQL_HOST, port=\"5432\", 
database=SQL_DB)\r\ncredentials = pika.PlainCredentials(RMQ_USER, RMQ_PASS)\r\nparameters = pika.ConnectionParameters(RMQ_HOST, 5672, '\/', credentials)\r\nconnection = pika.BlockingConnection(parameters)\r\n\r\nchannel = connection.channel()\r\n\r\nchannel.queue_declare(queue='image_queue', durable=True)\r\nprint(' [*] Waiting for messages. To exit press CTRL+C')\r\n\r\nchannel.basic_qos(prefetch_count=1)\r\nchannel.basic_consume(queue='image_queue', on_message_callback=callback)\r\n\r\nchannel.start_consuming()<\/pre>\n
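The InsertResult routine above writes one row per classified image. As discussed in the conclusions, that is not the most efficient pattern if the results do not need to be queried immediately; one option is to buffer results and write them in batches. The sketch below illustrates the idea only \u2013 it is not part of the original worker, and the message acknowledgements would need to follow each flush for it to be safe.<\/p>\n
# Illustrative batched alternative to InsertResult (not used by the worker above)\r\nfrom psycopg2.extras import execute_values\r\n\r\nBATCH = [] # buffered (fname, category, prediction, confidence) tuples\r\nBATCH_SIZE = 100\r\n\r\ndef InsertResultBatched(connection, fname, category, prediction, prob):\r\n    BATCH.append((fname, category, prediction, prob))\r\n    if len(BATCH) < BATCH_SIZE:\r\n        return 1 # nothing flushed yet\r\n    cursor = connection.cursor()\r\n    execute_values(cursor,\r\n        'INSERT INTO CATEGORY_RESULTS (FNAME, CATEGORY, PREDICTION, CONFIDENCE) VALUES %s',\r\n        BATCH)\r\n    connection.commit()\r\n    cursor.close()\r\n    BATCH.clear()\r\n    return 1<\/pre>\n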
\n
FROM ubuntu\r\n\r\nRUN apt-get update\r\nRUN apt-get install -y python3 python3-pip\r\n\r\nRUN pip3 install --upgrade mxnet gluoncv pika\r\nRUN pip3 install psycopg2-binary\r\n\r\nRUN pip install azureml-sdk\r\nRUN pip install azureml-sdk[notebooks]\r\nRUN pip install azure.ai.formrecognizer\r\nRUN pip install azure.storage.blob\r\nRUN pip install opencv-python\r\n\r\nARG DEBIAN_FRONTEND=noninteractive\r\nRUN apt-get install ffmpeg libsm6 libxext6 -y\r\n\r\n# Add worker logic necessary to process queue items\r\nADD worker.py \/\r\n\r\n# Start the worker\r\nCMD [\"python3\", \".\/worker.py\" ]<\/pre>\n
$ docker build -t worker .<\/strong>\r\n.\r\n.\r\n=> [12\/12] ADD worker.py \/\r\n=> exporting to image\r\n=> => exporting layers\r\n=> => writing image sha256:9716e1e98687cfc3dd5f66640e441e4aa24131ffb3b3bd4c5d0267a06abcc802\r\n=> => naming to docker.io\/library\/worker\r\n\r\n$ docker tag worker jmcifaracr.azurecr.io\/worker:1.0<\/strong>\r\n$ docker images<\/strong>\r\nREPOSITORY TAG IMAGE ID CREATED SIZE\r\nworker latest 9716e1e98687 About a minute ago 2.24GB\r\njmcifaracr.azurecr.io\/worker 1.0 9716e1e98687 About a minute ago 2.24GB\r\niload latest 4ef19e469755 3 hours ago 1.23GB\r\njmcifaracr.azurecr.io\/iload 1.0 4ef19e469755 3 hours ago 1.23GB\r\n\r\n$ docker push jmcifaracr.azurecr.io\/worker:1.0<\/strong>\r\nThe push refers to repository [jmcifaracr.azurecr.io\/worker]\r\n.\r\n.\r\n\r\n$ az acr repository list --name jmcifaracr --output table<\/strong>\r\nResult\r\n--------\r\niload\r\nworker<\/pre>\n
apiVersion: apps\/v1\r\nkind: Deployment<\/strong>\r\nmetadata:\r\n name: worker\r\nspec:\r\n replicas: 1<\/strong>\r\n selector:\r\n matchLabels:\r\n app: worker\r\n template:\r\n metadata:\r\n labels:\r\n app: worker\r\n spec:\r\n containers:\r\n - name: worker<\/strong>\r\n image: jmcifaracr.azurecr.io\/worker:1.0<\/strong>\r\n imagePullPolicy: Always\r\n env:\r\n - name: RMQ_USER<\/strong>\r\n value: \"user\"\r\n - name: RMQ_PASS<\/strong>\r\n value: \"7TrP8KOVdC\"\r\n - name: RMQ_HOST<\/strong>\r\n value: \"rabbitmq\"\r\n - name: SQL_HOST<\/strong>\r\n value: \"cifardb.postgres.database.azure.com\"\r\n - name: SQL_DB<\/strong>\r\n value: \"postgres\"\r\n - name: SQL_USER<\/strong>\r\n value: \"jon@cifardb.postgres.database.azure.com\"\r\n - name: SQL_PASS<\/strong>\r\n value: \"P@ssw0rd123\"\r\n - name: STG_ACNAME<\/strong>\r\n value: \"cifarimages\"\r\n - name: STG_ACKEY<\/strong>\r\n value: \"xxxxxxxx\"\r\n - name: LOGTODB<\/strong>\r\n value: \"1\"\r\n resources:\r\n requests:\r\n cpu: 100m\r\n memory: 128Mi\r\n limits:\r\n cpu: 150m\r\n memory: 128Mi<\/pre>\n
$ kubectl apply -f worker-deployment.yml<\/strong>\r\ndeployment.apps\/worker created\r\n\r\n$ kubectl get deployments<\/strong>\r\nNAME READY UP-TO-DATE AVAILABLE AGE\r\nworker 1\/1 1 1 52s\r\n\r\n$ kubectl get pods<\/strong>\r\nNAME READY STATUS RESTARTS AGE\r\niload-gpgqg 0\/1 Completed 0 110m\r\nrabbitmq-0 1\/1 Running 0 4h29m\r\nworker-5df6cb8cb7-qnwtq 1\/1 Running 0 54s<\/pre>\n
$ kubectl apply -f worker-deployment.yml<\/strong>\r\ndeployment.apps\/worker configured\r\n\r\n$ kubectl get deployments<\/strong>\r\nNAME READY UP-TO-DATE AVAILABLE AGE\r\nworker 1\/1 1 1 52s\r\n\r\n$ kubectl get pods<\/strong>\r\nNAME READY STATUS RESTARTS AGE\r\niload-gpgqg 0\/1 Completed 0 112m\r\nrabbitmq-0 1\/1 Running 0 4h32m\r\nworker-5df6cb8cb7-flqp4 1\/1 Running 0 51s\r\nworker-5df6cb8cb7-hsl2p 1\/1 Running 0 51s\r\nworker-5df6cb8cb7-qnwtq 1\/1 Running 0 3m32s\r\nworker-5df6cb8cb7-v9t6p 1\/1 Running 0 51s\r\nworker-5df6cb8cb7-x4dt4 1\/1 Running 0 51s<\/pre>\n
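Once the queue has drained, the CATEGORY_RESULTS table lets you check how well the classifier performed. The sketch below is not part of the original article; it summarises overall and per-category accuracy, and needs to run somewhere that can reach the database \u2013 for example after adding your client IP as a firewall rule on the Postgres server \u2013 using the same connection details as the worker.<\/p>\n
# accuracy_check.py - summarise the results recorded by the workers (illustrative)\r\nimport os\r\nimport psycopg2\r\n\r\nconn = psycopg2.connect(user=os.environ['SQL_USER'], password=os.environ['SQL_PASS'],\r\n    host=os.environ['SQL_HOST'], port='5432', database=os.environ['SQL_DB'])\r\ncur = conn.cursor()\r\n\r\n# Overall accuracy: proportion of rows where the prediction matched the expected category\r\ncur.execute('SELECT COUNT(*), AVG(CASE WHEN CATEGORY = PREDICTION THEN 1.0 ELSE 0.0 END) FROM CATEGORY_RESULTS')\r\ntotal, accuracy = cur.fetchone()\r\nprint('Images classified:', total, ' Overall accuracy:', accuracy)\r\n\r\n# Accuracy per expected category (0-9 map to the class names used by the worker)\r\ncur.execute('SELECT CATEGORY, AVG(CASE WHEN CATEGORY = PREDICTION THEN 1.0 ELSE 0.0 END) '\r\n    'FROM CATEGORY_RESULTS GROUP BY CATEGORY ORDER BY CATEGORY')\r\nfor category, acc in cur.fetchall():\r\n    print('Category', category, 'accuracy:', acc)\r\n\r\ncur.close()\r\nconn.close()<\/pre>\n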
Conclusions and Considerations<\/h3>\n
About the authors<\/h3>\n