{"id":18159,"date":"2023-07-07T15:00:00","date_gmt":"2023-07-07T14:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=18159"},"modified":"2023-07-06T14:44:12","modified_gmt":"2023-07-06T13:44:12","slug":"get-started-using-analytics-in-the-cloud","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2023\/07\/07\/get-started-using-analytics-in-the-cloud\/","title":{"rendered":"An introduction to cloud analytics"},"content":{"rendered":"


Microsoft Azure can cater to your analytical workloads; the key is picking the right tool for the right job. Fortunately, the core offerings break down into three platform as a service (PaaS) products for storing and managing your high-scale data workloads (Azure Data Lake<\/a>, Azure Databricks<\/a> and HDInsight<\/a>) and a well-integrated tool for visualising the results, Microsoft Power BI<\/a>.<\/p>\n

Storing and managing your data<\/h2>\n

Analytics in the cloud is ultimately about storing your data in the cloud where it can be conveniently processed using powerful services. There are three Azure services for processing your data. One is built by Microsoft and the other two are popular non-Microsoft platforms hosted as first-party services on Azure.<\/p>\n

Azure Data Lake Analytics (ADLA) is a massively parallel job service that ingests file data and processes it on demand into more manageable datasets. ADLA uses U-SQL, a query language that mixes C# and SQL. It is deeply integrated with Visual Studio for development and debugging. It is also integrated with Active Directory, so if you already use Microsoft for identity management, it is a convenient way to build on your existing technology investments.<\/p>\n

Azure Data Lake Analytics works hand-in-hand with another Azure service called Azure Data Lake Storage (ADLS). ADLS Gen2, which has been generally available since 2019, takes many of the features of the original ADLS and builds them on top of Azure Blob Storage<\/a>. Since Azure Data Lake Storage is compatible with the open Apache Hadoop Distributed File System (HDFS) standard, it also works well with any platform that uses that standard, such as Databricks or HDInsight.<\/p>\n
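
In practice, HDFS compatibility means that Hadoop-style engines address ADLS Gen2 data through a URI rather than a service-specific API. The minimal Python sketch below builds such a URI, assuming the documented abfss scheme and the dfs.core.windows.net endpoint. The helper name abfss_uri and the account, container and path values are hypothetical and used for illustration only.<\/p>\n

```python
from urllib.parse import quote


def abfss_uri(account, container, path=''):
    # Build the URI that Hadoop-compatible engines (for example Spark on
    # Databricks or HDInsight) can use to address a path in ADLS Gen2.
    # 'abfss' is the TLS-secured variant of the ABFS driver scheme.
    clean_path = quote(path.lstrip('/'))  # percent-encode, keep '/' separators
    return f'abfss://{container}@{account}.dfs.core.windows.net/{clean_path}'


# Hypothetical storage account, container and file path.
uri = abfss_uri('contosolake', 'raw', '/sales/2023/orders.parquet')
print(uri)
```

The same string can then be passed to whichever engine reads the data, which is what lets several analytics platforms share one copy of it.<\/p>\n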

Azure Databricks is based on the popular Apache Spark analytics platform and makes it easier to work with and scale data processing and machine learning. The team that developed Databricks is in large part the same team that originally created Spark as a cluster-computing framework at the University of California, Berkeley. In 2017, the Databricks team worked with Microsoft to develop Azure Databricks as a first-party Microsoft service that integrates natively with Active Directory and other Azure tools.<\/p>\n

If you prefer to process and analyse data using open source frameworks, HDInsight is a platform that combines several of them, including Apache Hadoop, Spark, Kafka, Hive, and Storm. Of the three services, HDInsight is the most cost-effective option for Azure-based analytics in the cloud. Using open source frameworks also gives you community support and community-built apps, alongside Azure security and service level agreements (SLAs).<\/p>\n

Viewing your data<\/h2>\n

Housing and analysing your data is only part of the story. To visualise your data, Microsoft provides Power BI, a powerful data visualisation tool that integrates with Data Lake Storage<\/a>, Databricks<\/a>, and HDInsight<\/a>.<\/p>\n

With Power BI, you can produce dashboards and reports with rich visualisations. There are three components to note when using Power BI:<\/p>\n