
Integrate Azure Databricks with Azure Machine Learning Workspace

This blog post walks through integrating Azure Databricks with an Azure Machine Learning workspace.

  • Databricks is a unified data and analytics platform built to enable all data personas: data engineers, data scientists and data analysts.
  • It is a managed platform that gives data developers all the tools and infrastructure they need to focus on data analytics, without worrying about managing clusters, libraries, dependencies, upgrades, and other tasks that are not related to deriving insights from data.
  • Higher productivity and collaboration
  • Integrates easily with the whole Microsoft stack
  • Extensive list of data sources
  • Suitable for small jobs too
  • Azure Machine Learning is a separate, modernized service that delivers a complete data science platform.
  • It supports both code-first and low-code experiences.
  • Azure Machine Learning studio is a web portal in Azure Machine Learning that contains low-code and no-code options for project authoring and asset management.

Uses of Integrating Azure Databricks with Azure ML

  • To train a model using Spark MLlib and deploy the model to ACI/AKS.
  • To run automated machine learning (AutoML) jobs using the Azure ML SDK.
  • To use Databricks as a compute target from an Azure Machine Learning pipeline (see the sketch after this list).
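
As a concrete illustration of the last point, here is a minimal sketch, using the Azure ML SDK v1, of attaching a Databricks workspace as an Azure ML compute target. The resource names and the access token are placeholders, not values from this demo:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, DatabricksCompute

ws = Workspace.from_config()  # assumes a config.json for the Azure ML workspace

# Placeholder values -- substitute your own Databricks workspace details
attach_config = DatabricksCompute.attach_configuration(
    resource_group="my-resource-group",        # resource group of the Databricks workspace
    workspace_name="my-databricks-workspace",  # Databricks workspace name
    access_token="<databricks-access-token>",  # Databricks personal access token
)

# Attach the Databricks workspace under a name usable in pipeline steps
databricks_compute = ComputeTarget.attach(ws, "databricks-compute", attach_config)
databricks_compute.wait_for_completion(show_output=True)
```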

Summary Steps

  • Please refer to the GitHub link at the end of this post for the full code.
  • Create a storage account and a Blob Storage container (used for this demo). Other data sources such as ADLS, Cosmos DB, SQL Database, and MySQL Database can also be used.
  • The main script for running this pipeline lives in Azure ML.
  • This pipeline is built in two steps.
  • Data preparation and model building in Azure Databricks. For this I used MLflow; AutoML could be used similarly.
  • Metrics for evaluating model performance, implemented as a Python script in Azure ML.
  • Add both steps into an Azure ML pipeline (see the sketch after this list).
  • Execute the pipeline.
  • Track model performance through the logged metrics and the model registry, and monitor it in Azure ML.
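
To make the summary concrete, the sketch below shows how the two steps could be wired together with the Azure ML SDK v1. The notebook path, script name, compute names, and cluster ID are illustrative placeholders; refer to DatabricksStep_Job.py in the GitHub repo for the actual code:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep, PythonScriptStep

ws = Workspace.from_config()
databricks_compute = ws.compute_targets["databricks-compute"]  # attached Databricks compute
aml_compute = ws.compute_targets["cpu-cluster"]                # Azure ML compute cluster

# Step 1: data preparation and model building, run as a Databricks notebook
prep_and_train = DatabricksStep(
    name="data_preparation_and_training",
    notebook_path="/Users/me/data_prepartion",      # notebook in the Databricks workspace
    run_name="prep_and_train",
    compute_target=databricks_compute,
    existing_cluster_id="<databricks-cluster-id>",  # reuse an existing cluster
    allow_reuse=False,
)

# Step 2: evaluation/metric logging, run as a Python script on Azure ML compute
evaluate = PythonScriptStep(
    name="evaluate_model",
    script_name="Evaluate.py",
    source_directory=".",
    compute_target=aml_compute,
    allow_reuse=False,
)
evaluate.run_after(prep_and_train)  # make evaluation wait for the Databricks step

# Assemble and execute the two-step pipeline
pipeline = Pipeline(workspace=ws, steps=[prep_and_train, evaluate])
run = Experiment(ws, "databricks-aml-pipeline").submit(pipeline)
run.wait_for_completion(show_output=True)
```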

Steps

  • Create an Azure Databricks workspace in the Azure portal as shown below.
  • In the Azure Databricks workspace, click Link Azure ML workspace; the UI shown below will pop up.
  • Here you can create a new ML workspace or link an existing one. When linking the Azure ML workspace, make sure the subscription ID, resource group, and location are the same as those of Azure Databricks.
  • After completing the above step, you can see that Azure ML is linked with Azure Databricks in the UI, as shown below.
  • Launch the Databricks workspace and create a cluster configured as your workload requires.
  • Create a notebook and write the code for data preprocessing and model building. Depending on the data and requirements, these can be done independently or in the same notebook (refer to the Jupyter notebook named “data_prepartion”; an MLflow logging sketch follows this list).
  • Launch the Azure ML workspace and create a compute instance in the Compute section of Azure ML.
  • Launch a Jupyter notebook and create the two script files your design needs. (Refer to the Python script named “DatabricksStep_Job.py”, the main pipeline script, and “Evaluate.py”, which logs metrics for the registered model; an evaluation-script sketch also follows this list.)
  • You can refer to the DatabricksStep_Job.py script in Azure ML; it covers compute creation for both Azure ML and Azure Databricks.
  • A compute target is created for executing the commands in Evaluate.py in Azure ML.
  • A Databricks compute target is attached for executing the commands in the data_prepartion .ipynb file in Azure Databricks.
  • After the pipeline has run successfully, you can trace all the model metrics, logging information, and the registered model in Azure ML.
  • Similarly, you can check each step involved in this pipeline, including the Azure Databricks step, in the pipeline overview in Azure ML.
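
For the data_prepartion notebook, the MLflow pattern looks roughly like the sketch below. The dataset, model, and metric are illustrative stand-ins, not the code from the repo; once the Databricks workspace is linked to Azure ML, these runs can surface in the Azure ML workspace as well:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative data preparation: a toy dataset split into train/test sets
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Track the training run with MLflow; with the workspaces linked, the logged
# metric and model artifact can be inspected from Azure ML
with mlflow.start_run():
    model = LinearRegression().fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    mlflow.log_metric("r2", r2)
    mlflow.sklearn.log_model(model, "model")
```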
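For the evaluation step, a script like Evaluate.py can pick up the pipeline run context and log details of the registered model. A minimal sketch of that pattern (the model name here is a placeholder; the real script is in the repo):

```python
from azureml.core import Run
from azureml.core.model import Model

# Get the context of this pipeline step run and the workspace it belongs to
run = Run.get_context()
ws = run.experiment.workspace

# Look up the registered model (the name is a placeholder) and log details
# so they show up in the pipeline run in Azure ML
model = Model(ws, name="databricks-trained-model")
run.log("model_name", model.name)
run.log("model_version", model.version)

run.complete()
```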

GitHub : https://github.com/vigneshSs-07/Azure_DBK_with_AzureML
