An introduction to cloud analytics
Microsoft Azure is a platform that can cater to your analytical workloads – picking the right tool for the right job is the key. Fortunately, the core services can be broken down into three platform as a service (PaaS) offerings for storing and managing your high-scale data workloads (Azure Data Lake, Azure Databricks and HDInsight) and a well-integrated tool for visualising the results, Microsoft Power BI.
Storing and managing your data
Analytics in the cloud is ultimately about storing your data in the cloud where it can be conveniently processed using powerful services. There are three Azure services for processing your data. One is built by Microsoft and the other two are popular non-Microsoft platforms hosted as first-party services on Azure.
Azure Data Lake Analytics (ADLA) is a massively parallel job service that can ingest file data and dynamically process it into smaller, more manageable data sets. ADLA uses U-SQL, a query language that is a mix of C# and SQL. It is deeply integrated with Visual Studio for development and debugging. It is also integrated with Azure Active Directory, so if you are already using Microsoft for your identity management, it is a convenient way to extend your prior technology investments.
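As a rough mental model of what a massively parallel job does, the Python sketch below splits rows into partitions, aggregates each partition independently, and then merges the partial results. It is not U-SQL and does not call any ADLA API; the sample rows and function names are invented for illustration, and a local thread pool stands in for the service's distributed workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Invented sample data: (region, sale amount) rows, as if parsed from a file.
ROWS = [
    ("north", 120), ("south", 80), ("north", 30),
    ("east", 55), ("south", 20), ("east", 45),
]


def partition(rows, parts):
    """Split rows into `parts` roughly equal slices."""
    return [rows[i::parts] for i in range(parts)]


def aggregate(rows):
    """Sum the sale amounts per region within one partition."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_totals(rows, parts=3):
    """Aggregate each partition concurrently, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(aggregate, partition(rows, parts)))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


print(parallel_totals(ROWS))
```

The point of the shape is that no partition needs to see another partition's rows, which is what lets a job service spread the work across many machines.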
Azure Data Lake Analytics works hand-in-hand with another Azure service called Azure Data Lake Storage (ADLS). ADLS Gen2, which was made available to the public earlier this year, takes many of the features of the original ADLS and builds them on top of Azure Blob Storage. Because Azure Data Lake Storage is compatible with the open Apache Hadoop Distributed File System (HDFS) standard, it also plays well with any platform that uses that standard, such as Databricks or HDInsight.
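In practice, Hadoop- and Spark-based engines usually address ADLS Gen2 data through an `abfss://` URI that names the container (file system), the storage account, and the path. The helper below is a minimal sketch of that layout; the function name and the example account, container, and path are made up for illustration.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI of the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>.
    """
    if not account or not container:
        raise ValueError("account and container are required")
    # Normalise the path so the URI has exactly one slash after the host.
    clean_path = path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, for illustration only.
example_uri = abfss_uri("contosodata", "raw", "/sales/2019/orders.csv")
print(example_uri)
```

A URI like this is what you would hand to a Databricks or HDInsight job as its input or output location, provided the cluster has been granted access to the storage account.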
Azure Databricks is based on the popular Apache Spark analytics platform and makes it easier to work with and scale data processing and machine learning. The team that developed Databricks is in large part the same team that originally created Spark as a cluster-computing framework at the University of California, Berkeley. In 2017, the Databricks team worked with Microsoft to develop Azure Databricks as a first-party Microsoft service that integrates natively with Azure Active Directory and other Azure tools.
If you prefer to process and analyse data using open source frameworks, HDInsight is a platform that combines several of them, including Apache Hadoop, Spark, Kafka, Hive, and Storm. This is the most cost-effective option for Azure-based analytics in the cloud. Using open source frameworks also allows you to enjoy community support and community apps while having access to Azure security and service level agreements (SLAs).
Viewing your data
Housing and analysing your data is only part of the story. To visualise your data, Microsoft provides Power BI, a powerful data visualisation tool that integrates with Data Lake Storage, Databricks, and HDInsight.
Power BI lets you produce dashboards and reports with rich visualisations. There are three components to note when using it:
- Power BI Desktop is a Windows desktop application in which your data analysts build dashboards and reports to share with your wider organisation and business users.
- Data analysts publish their content to the Power BI service, a cloud service where you store the reports you create and share them with other members of your organisation.
- For people in your business who are away from their main devices, there are also iOS, Android and Windows apps that give access to your content from mobile devices wherever and whenever you need it.
Power BI offers a variety of visualisation types out of the box, such as bar charts, pie charts, gauges, KPIs, scatter charts, and maps. Besides these standard charts, Power BI also enables you to create your own custom visualisations. You can share your visualisations with others on a community site or draw inspiration from other people’s charts. In addition, you can give your reports a consistent look and more impact with Report Themes. As with custom visualisations, you can share your custom designs in the community themes gallery.
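A report theme is a small JSON file that Power BI Desktop can import. The sketch below builds one in Python; the theme name and colour values are arbitrary examples, and only a handful of basic keys (`name`, `dataColors`, `background`, `foreground`, `tableAccent`) are shown, so treat it as a starting point rather than a complete theme definition.

```python
import json


def build_theme(name, data_colors, background="#FFFFFF",
                foreground="#333333", table_accent=None):
    """Return a basic Power BI report theme as a dictionary."""
    if not data_colors:
        raise ValueError("at least one data colour is required")
    return {
        "name": name,
        "dataColors": list(data_colors),
        "background": background,
        "foreground": foreground,
        # Default the table accent to the first data colour.
        "tableAccent": table_accent or data_colors[0],
    }


# Arbitrary example palette, not an official theme.
theme = build_theme("Contoso Blue", ["#1F77B4", "#FF7F0E", "#2CA02C"])
theme_json = json.dumps(theme, indent=2)
print(theme_json)
```

Saving `theme_json` to a `.json` file gives you something to import as a report theme in Power BI Desktop, and the same file is what you would share in the themes gallery.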
Summary
Azure analytics in the cloud provides multiple ways to process and analyse your high-scale data, whether you want to use Microsoft solutions or prefer to use open source solutions hosted on Azure. Either way, Azure provides the security, data storage and compute resources to allow you to work with big data in a manner of your choosing through Data Lake Storage, Databricks, and HDInsight. Once your data is processed and analysed, you can use Microsoft Power BI to visualise and present your results on both desktop and mobile platforms and paint a picture of your cloud data.