Data Science News and Insights | Microsoft Fabric Blog

European Fabric Community Conference 2024: Building an AI-powered data platform
Wed, 25 Sep 2024

Get a firsthand look at the latest capabilities we are bringing to the Microsoft Fabric platform.

The post European Fabric Community Conference 2024: Building an AI-powered data platform appeared first on Microsoft Fabric Blog.


Thank you to everyone joining us at the first annual European Microsoft Fabric Community Conference this week in Stockholm, Sweden! Besides seeing the beautiful views of Old Town, attendees are getting an immersive analytics and AI experience across 120 sessions, 3 keynotes, 10 workshops, an expo hall, community lounge, and so much more. They are seeing firsthand the latest capabilities we are bringing to the Fabric platform. For those unable to attend, this blog will highlight the most significant announcements that are already changing the way our customers interact with Fabric. 


Over 14,000 customers have invested in the promise of Microsoft Fabric to accelerate their analytics, including industry leaders like KPMG, Chanel, and Grupo Casas Bahia. For example, Chalhoub Group, a regional luxury retailer with over 750 experiential retail stores, used Microsoft Fabric to modernize its analytics and streamline its data sources into one platform, significantly speeding up its processes.

“It’s about what the technology enables us to achieve—a smarter, faster, and more connected operational environment.”

—Mark Hourany, Director of People Analytics, Chalhoub Group

Check out the myriad ways customers are using Microsoft Fabric to unlock more value from their data.

New capabilities coming to Microsoft Fabric

Since launching Fabric, we’ve released thousands of product updates to create a more complete data platform for our customers. And we aren’t slowing down anytime soon. We’re thrilled to share a new slate of announcements that are applying the power of AI to help you accelerate your data projects and get more done.

Specifically, these updates are focused on making sure Fabric can provide you with: 

  1. AI-powered development: Fabric can give teams the AI-powered tools needed for any data project in a pre-integrated and optimized SaaS environment.
  2. An AI-powered data estate: Fabric can help you access your entire multi-cloud data estate from a single, open data lake, work from the same copy of data across analytics engines, and use that data to power AI innovation.
  3. AI-powered insights: Fabric can empower everyone to better understand their data with AI-powered visuals and Q&A experiences embedded in the Microsoft 365 apps they use every day.

Let’s look at the latest features and integrations we are announcing in each of these areas. 

AI-powered development

With Microsoft Fabric, you have a single platform that can handle all of your data projects with role-specific tools for data integration, data warehousing, data engineering, data science, real-time intelligence, and business intelligence. All of your data teams can work together in the same pre-integrated, optimized experience, and get started immediately with an intuitive UI and low code tools. All the workloads access the same unified data lake, OneLake, and work from a single pool of capacity to simplify the experience and ease collaboration. With built-in security and governance, you can secure your data from any intrusion and ensure only the right people have access to the right data. And as we continue to infuse Copilot and other AI experiences across Fabric, you can not only use Fabric for any application, but also accelerate time to production. In the video below, check out how users can take advantage of Copilot to create end-to-end solutions in Fabric: 

Today, I’m thrilled to share several new enhancements and capabilities coming to the platform and each workload in Fabric.

Fabric platform

We’re building platform-wide capabilities to help you more seamlessly manage DevOps and tackle projects of any scale and complexity. First, we’re updating the UI for deployment pipelines, now in preview, to be more focused, easier to navigate, and smoother in its flow. Next, we’re introducing the Terraform provider for Fabric, in preview, to help customers ensure deployments and management tasks are executed accurately and consistently. The Terraform provider enables users to automate and streamline deployment and management processes using a declarative configuration language. We are also adding support for Azure service principals in Microsoft Fabric REST APIs to help customers automate the deployment and management of Fabric environments. You can manage principal permissions for Fabric workspaces, as well as the creation and management of Fabric artifacts like eventhouses and lakehouses.
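To make the automation scenario concrete, here is a minimal Python sketch of what service-principal-driven workspace management might look like: a client-credentials token request against Microsoft Entra ID, followed by a call to a Fabric REST endpoint. The endpoint path and payload fields shown are illustrative assumptions; consult the Fabric REST API reference for the exact contract before using this in anger.

```python
# Sketch: automating Fabric workspace management with an Azure service principal.
# The token request follows the standard OAuth2 client-credentials flow;
# the Fabric API base URL and payload shape are assumptions for illustration.
import json
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return the (url, form_fields) for a client-credentials token request."""
    url = TOKEN_URL.format(tenant=tenant_id)
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://api.fabric.microsoft.com/.default",
    }
    return url, fields

def build_create_workspace_request(token: str, display_name: str) -> urllib.request.Request:
    """Return a POST request that would create a Fabric workspace (not sent here)."""
    body = json.dumps({"displayName": display_name}).encode()
    return urllib.request.Request(
        f"{FABRIC_API}/workspaces",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    url, fields = build_token_request("my-tenant-id", "my-app-id", "my-secret")
    req = build_create_workspace_request("dummy-token", "Analytics-Dev")
    print(req.full_url)  # the workspace-creation endpoint this sketch targets
```

In practice you would send the token request, extract `access_token` from the JSON response, and pass it to the second helper; the Terraform provider wraps this same API surface in declarative configuration.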

We’re excited to announce the general availability of Fabric Git integration. Sync Fabric workspaces with Git repositories, leverage version control, and collaborate seamlessly using Azure DevOps or GitHub. We are also extending our integration with Visual Studio Code (VS Code). You can now debug Fabric notebooks with the web version of VS Code and integrate Fabric environments as artifacts with the Synapse VS Code extension—allowing you to explore and manage Fabric environments from within VS Code. To learn more about these updates, read the Fabric September 2024 Update blog.

Security and governance

To help organizations govern the massive volumes of data across their data estate, we’re adding more granular data management capabilities, including item tagging and enhancements to domains—both now in preview. We’re introducing the ability to apply tags to Fabric items, helping users more easily find and use the right data. Once applied, data consumers can view, search, and filter by the applied tags across various experiences. We’re also enhancing domains and subdomains with more controls for admins, including the ability to define a default sensitivity label, domain-level export and sharing settings, and insights for admins on tenant domains. Finally, for data owners, we’re adding the ability to search for data by domain, to filter workspaces by domain, and to view domain details in a data item’s location.

Over the past year, we’ve launched a myriad of security features designed to secure your data at every step of the analytics journey. Two of our network security features, trusted workspace access and managed private endpoints, were previously only available in F64 or higher capacities. We’re excited to share that, based on your feedback, we are making these features available in all Fabric capacities. We’re also making managed private endpoints available in trial capacities as part of this release.

We’re also announcing deeper integration with Microsoft Purview, Microsoft’s unified data security, data governance, and compliance solution. Coming soon, security admins will be able to use Microsoft Purview Information Protection sensitivity labels to manage who has access to Fabric items with certain labels—similar to Microsoft 365. Also coming soon, we are extending support for Microsoft Purview Data Loss Prevention (DLP) policies, so security admins can apply DLP policies to detect the upload of sensitive data, like social security numbers, to a lakehouse in Fabric. If detected, the policy will trigger an automatic audit activity, can alert the security admin, and can even show a custom policy tip to data owners to remedy themselves. These capabilities will be available at no additional cost during preview in the near term, but will be part of a new Purview pay-as-you-go consumptive model, with pricing details to follow in the future. Learn more about how to secure your Fabric data with Microsoft Purview by watching the following video: 
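To illustrate the kind of content inspection a DLP policy performs, here is a toy Python sketch that flags values resembling U.S. social security numbers before they would land in a lakehouse table. Real Microsoft Purview DLP policies are configured in the Purview portal rather than coded like this; the sketch only shows the pattern-matching idea.

```python
# Toy sketch of DLP-style content inspection: flag fields that look like
# U.S. social security numbers. This illustrates the detection concept only;
# actual Purview DLP policies are configured, not hand-written.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_rows(rows: list) -> list:
    """Return an audit record for every field matching the SSN pattern."""
    findings = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and SSN_PATTERN.search(value):
                findings.append({"row": i, "column": column, "rule": "ssn"})
    return findings

sample = [
    {"name": "Ada", "note": "ssn 123-45-6789 on file"},
    {"name": "Grace", "note": "no sensitive data"},
]
print(scan_rows(sample))  # [{'row': 0, 'column': 'note', 'rule': 'ssn'}]
```

A matching policy in Purview would then raise the audit event, alert the security admin, and surface a policy tip to the data owner, as described above.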

You can also complement and extend the built-in governance in Fabric by seamlessly connecting your Fabric data to the newly reimagined Purview Data Governance solution—now generally available. This new solution delivers an AI-powered, business-friendly, and unified solution that can seamlessly connect to data sources within Fabric and across your data estate to streamline and accelerate the activation of your modern data governance practice. Purview integrations enable Fabric customers to discover, secure, govern, and manage Fabric items from a single pane of glass within Purview for an end-to-end approach to their data estate. Learn more about these Microsoft Purview innovations.  

Workload enhancements and updates

We’re also making significant updates across the six core workloads in Fabric: Data Factory, Data Engineering, Data Warehouse, Data Science, Real-Time Intelligence, and Microsoft Power BI.

Data Factory

In the Data Factory workload, built to help you solve some of the most complex data integration scenarios, we are simplifying the data ingestion experience with copy job, transforming the dataflow capability, and releasing enhancements for data pipelines. With copy job, now in preview, you can ingest data at petabyte scale without creating a dataflow or data pipeline. Copy job supports full, batch, and incremental copy from any data source to any data destination. Next, we are releasing the Copilot in Fabric experience for Dataflows Gen2 into general availability—empowering everyone to design dataflows with the help of an AI-powered expert. We’re also releasing Fast Copy in Dataflows Gen2 into general availability, enabling you to ingest large amounts of data using the same high-performance backend for data movement used in Data Factory (for example, the “copy” activity in data pipelines, or copy job). Lastly, for Dataflows Gen2, we are introducing incremental refresh into preview, allowing you to limit refreshes to just new or updated data and so reduce refresh times.
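The incremental-refresh idea can be sketched in a few lines of Python: keep a high-watermark timestamp, load only rows modified after it, and advance the watermark. This is a simplified illustration of the concept, not Dataflows Gen2 internals; the column names are hypothetical.

```python
# Sketch of the incremental-refresh pattern: load only rows newer than the
# last high-watermark, then advance the watermark. Column names are
# hypothetical; Dataflows Gen2 manages this state for you.
from datetime import datetime

def incremental_refresh(rows, watermark):
    """Return (rows_to_load, new_watermark)."""
    fresh = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in fresh), default=watermark)
    return fresh, new_watermark

rows = [
    {"id": 1, "modified": datetime(2024, 9, 1)},   # already loaded last run
    {"id": 2, "modified": datetime(2024, 9, 20)},  # new since the watermark
]
fresh, wm = incremental_refresh(rows, datetime(2024, 9, 10))
print([r["id"] for r in fresh], wm)  # [2] 2024-09-20 00:00:00
```

The benefit is exactly what the feature promises: each refresh touches only the delta, so refresh times shrink as historical data grows.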

Along with the dataflow announcements, we’re announcing an array of enhancements for data pipelines in Fabric, including the general availability of the on-premises data gateway integration, the preview of Fabric user data functions in data pipelines, the preview of invoke remote pipeline to call Azure Data Factory (ADF) and Synapse pipelines from Fabric, and a new session tag parameter for Fabric Spark notebook activity to enable high-concurrency Notebook runs. Additionally, we’ve made it easier to bring ADF pipelines into Fabric by linking your existing pipelines to your Fabric workspace. You’ll be able to fully manage your ADF factories directly from the Fabric workspace UI and convert your ADF pipelines into native Fabric pipelines with an open-source GitHub project. 

Data Engineering

For the Data Engineering workload, we’re updating the native execution engine for Fabric Spark and releasing the upgraded Fabric Runtime 1.3 into general availability. The native execution engine enhances Spark job performance by running queries directly on lakehouse infrastructure, achieving up to four times faster performance than traditional Spark based on the TPC-DS 1TB benchmark. The native execution engine can now, in preview, run on Fabric Runtime 1.3, which together further enhance the performance of Spark jobs and queries for both data engineering and data science projects. The engine has been completely rewritten to offer superior query performance across data processing, extract-transform-load (ETL), data science, and interactive queries. We are also excited to announce a new acceleration tab and UI enablement for the native execution engine.

Additionally, we are extending Spark support to mirrored databases, providing a consistent and convenient way to access and explore databases seamlessly with the Spark engine. You can easily add data sources, explore data, perform transformations, and join your data with other lakehouses and mirrored databases. Finally, we are excited to launch T-SQL notebooks into public preview. T-SQL notebooks enable SQL developers to author and run T-SQL code against a connected Fabric data warehouse or SQL analytics endpoint, allowing them to execute complex T-SQL queries, visualize results in real time, and document analytical processes within a single, cohesive interface.

Data Warehouse

We are excited to announce the Copilot in Fabric experience for Data Warehouse is now in preview. This AI assistant experience can help developers generate T-SQL queries for data analysis, explain and add in-line code comments for existing T-SQL queries, fix broken T-SQL code, and answer questions about general data warehousing tasks and operations. Learn more about the Copilot experience for Data Warehouse here. And as mentioned above, we are announcing T-SQL notebooks—allowing you to create a notebook item directly from the data warehouse editor in Fabric and use the rich capabilities of notebooks to run T-SQL queries.

Real-Time Intelligence

In May 2024, we launched a new workload called Real-Time Intelligence that combined Synapse Real-Time Analytics and Data Activator with a range of additional new features, currently in preview, to help organizations make better decisions with up-to-the-minute insights. We are excited to share new capabilities, all in preview, to help you better ingest, analyze, and visualize your real-time data.

First, we’re announcing the launch of the new Real-Time hub user experience, a redesigned and enhanced experience with a new left navigation; a new page called “My Streams” to create and access custom streams; and four new eventstream connectors: Azure SQL Managed Instance – change data capture (MI CDC), SQL Server on Virtual Machine – change data capture (VM CDC), Apache Kafka, and Amazon MSK Kafka. These new sources empower you to build richer, more dynamic eventstreams in Fabric. We’re also enhancing eventstream capabilities by supporting eventhouse as a new destination for your data streams. Eventhouses, equipped with KQL databases, are designed to analyze large volumes of data, particularly in scenarios that demand real-time insight and exploration.

Screenshot of the user interface of the Real-Time hub in Microsoft Fabric. The Real-Time hub is a single place for all data in motion in Fabric and this image shows numerous real-time data sources with filters to help you find specific data sources.

We’re also pleased to announce an upgrade to the Copilot in Fabric experience in Real-Time Intelligence, which translates natural language into KQL, helping you better understand and explore your data stored in Eventhouse. Now, the assistant supports a conversational mode, allowing you to ask follow-up questions that build on previous queries within the chat. With the addition of multi-variate anomaly detection, it’s even easier to discover the unknowns in your high-volume, high-granularity data. You can also have Copilot create a real-time dashboard instantly based on the data in your table, providing immediate insights you can share in your organization.
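To give a flavor of what anomaly detection over high-volume telemetry means, here is a minimal z-score sketch in Python. Fabric’s multivariate anomaly detection in Eventhouse is far more sophisticated (it correlates many signals at once); this only illustrates the core idea of flagging points that deviate strongly from the norm.

```python
# Minimal sketch of the idea behind anomaly detection on telemetry:
# flag points more than k standard deviations from the mean.
# Fabric's multivariate detection is far richer; this is the single-signal
# textbook version, for illustration only.
from statistics import mean, pstdev

def flag_anomalies(values, k=3.0):
    """Return the indices of values more than k population-stddevs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a constant series has no anomalies under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

readings = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
print(flag_anomalies(readings, k=2.0))  # [7] -- the spike at index 7
```

In Real-Time Intelligence the equivalent question can simply be asked of Copilot in natural language, which translates it to KQL over your Eventhouse data.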

Finally, we are upgrading the Data Activator experience to make it easier to define a variety of rules to act in response to changes in your data over time, and the richness of our rules have improved to include more complex time window calculations and responding to every event in a stream. You can set up alerts from all your streaming data, Power BI visuals, and real-time dashboards and now even set up alerts directly on your KQL queries. With these new enhancements, you can make sure action is taken the moment something important happens.
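A Data Activator-style rule with a time-window condition can be sketched as follows. The event shape and rule semantics here are deliberately simplified assumptions; Data Activator lets you define such rules without writing code.

```python
# Sketch of a Data Activator-style rule: trigger when every event inside a
# trailing time window exceeds a threshold. Event shape and semantics are
# simplified for illustration; Data Activator defines rules declaratively.
from datetime import datetime, timedelta

def rule_fires(events, threshold, window):
    """events: list of (timestamp, value) tuples sorted by time.
    Fires if all events within the trailing `window` exceed `threshold`."""
    if not events:
        return False
    end = events[-1][0]
    recent = [v for t, v in events if t > end - window]
    return bool(recent) and all(v > threshold for v in recent)

events = [
    (datetime(2024, 9, 25, 12, 0), 70),  # outside the 5-minute window
    (datetime(2024, 9, 25, 12, 6), 85),
    (datetime(2024, 9, 25, 12, 8), 90),
]
# Only the last two events fall in the window, and both exceed 80.
print(rule_fires(events, threshold=80, window=timedelta(minutes=5)))  # True
```

The same shape of condition, attached to a stream, a Power BI visual, or a KQL query, is what lets Data Activator take action the moment something important happens.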

Learn more about all of these workload enhancements in the Fabric September 2024 Update blog.

Power BI

We’re thrilled to announce new capabilities across Power BI that will make it easier to track and use the KPIs that matter most to you, create organizational apps, and work with Direct Lake semantic models. 

First, we are announcing the preview of Metric sets, which will allow users to promote consistent and reliable metrics across large organizations in Fabric, making it easier for end users to discover and use standardized metrics from corporate models. With Metric sets, trusted creators within an organization can develop standardized metrics, which incorporate essential business logic from Power BI. These creators can organize the metrics into collections, promote and certify them, and make them easily discoverable for end users and other creators. These endorsed and promoted metrics can then be used to build Power BI reports, improving data quality across the organization, and can also be reused in other Fabric solutions, such as notebooks.

A screenshot that shows the new Metric sets experience in Power BI. The image highlights an example metric called Sales Excellence and specifically shows the Revenue Won total and figures associated with the metric.

We’re improving organizational apps in Power BI, a key tool for packaging and securely distributing Power BI reports to your organization. Now in preview, you can create multiple organizational apps in each workspace, and they can contain other Fabric items like notebooks and real-time dashboards. The app interface can even be customized, giving you more control over the color, navigation style, and landing experience.

We’re also making it easier to work with Direct Lake semantic models with new version history for semantic models, similar to the experience found across the Microsoft 365 apps. Power BI users can also now live edit Direct Lake semantic models right from Power BI Desktop. And we’re excited to announce a capability widely asked for by Power BI users: a dark mode in Power BI Desktop. 

A screenshot that shows the dark mode in Power BI desktop. The Power BI Desktop has a blank canvas with a dark background.

Finally, we’re announcing the general availability of OneLake integration for semantic models in Import mode. OneLake integration automatically writes data imported into your semantic models to Delta Lake tables in OneLake so that you can enjoy the benefits of Fabric without any migration effort. Once added to a lakehouse in OneLake, you can use T-SQL, Python, Scala, PySpark, Spark SQL, or R on these Delta tables to consume this data and add business value. All of this value comes at no additional cost as data stored in OneLake for Power BI import semantic models is included in the price of your Power BI licensing.

Learn more about the Power BI announcements in the Power BI September 2024 Feature blog. Also see the AI-powered insights section below for new Copilot experiences for Power BI creators and consumers.

AI-powered data estate

With OneLake, Fabric’s unified data lake, you can create a truly AI-powered data estate to fuel your AI innovation and data culture. OneLake’s shortcuts and mirroring capabilities enable you to access your entire multi-cloud data estate from a single, intuitively organized data lake. With your data in OneLake, you can then work from a single copy across analytics engines, whether you are using Spark, T-SQL, KQL, or Analysis Services and even access that data from other apps like Microsoft Excel or Teams. Today, we are thrilled to share even more capabilities and enhancements coming to OneLake that can help you better connect to and manage your data estate.

One of the biggest benefits of OneLake is the ability to create shortcuts to your data sources, which virtualizes data in OneLake without moving or duplicating it. We are pleased to announce that shortcuts for Google Cloud Storage (GCS) and S3-compatible sources are now generally available. These shortcuts also support the on-premises data gateway, which you can use to connect to your on-premises S3-compatible sources as well as GCS buckets that are protected by a virtual private cloud. We’ve also made enhancements to the REST APIs for OneLake shortcuts, including adding support for all current shortcut types and introducing a new list operation. With these improvements, you can programmatically create and manage your OneLake shortcuts.
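As a rough sketch of what programmatic shortcut creation might look like, the Python below builds the endpoint URL and request payload for an S3-compatible shortcut. The endpoint shape and field names are assumptions for illustration; check the OneLake shortcuts REST API reference for the exact contract.

```python
# Sketch of programmatic OneLake shortcut creation. The endpoint path and
# payload field names are illustrative assumptions -- consult the OneLake
# shortcuts REST API documentation for the real contract.
import json

def shortcut_endpoint(workspace_id: str, item_id: str) -> str:
    """Assumed REST endpoint for shortcuts under a lakehouse item."""
    return (
        "https://api.fabric.microsoft.com/v1/"
        f"workspaces/{workspace_id}/items/{item_id}/shortcuts"
    )

def build_shortcut_payload(name: str, path: str, bucket_url: str, subpath: str) -> dict:
    """Payload for an S3-compatible shortcut (field names assumed)."""
    return {
        "name": name,
        "path": path,  # folder inside the lakehouse, e.g. "Files"
        "target": {
            "s3Compatible": {"location": bucket_url, "subpath": subpath}
        },
    }

payload = build_shortcut_payload("sales", "Files", "https://minio.example.com", "raw/sales")
print(shortcut_endpoint("ws-1", "lh-1"))
print(json.dumps(payload, indent=2))
```

A POST of this payload (with a bearer token, as in any Fabric REST call) would register the shortcut, after which the external data appears in OneLake without being copied.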

We’re also excited to announce further integration with Azure Databricks with the ability to access Databricks Unity Catalog tables directly from OneLake—now in preview. Users can just provide the Azure Databricks workspace URL and select the catalog, and Fabric creates a shortcut for every table in the selected catalog, keeping the data in sync in near real-time. Once your Azure Databricks Catalog item is created, it behaves the same as any other item in Fabric, so you can access the table through SQL endpoints, notebooks, or Direct Lake mode for Power BI reports. Learn more about the OneLake shortcut and Azure Databricks announcements in the Fabric September 2024 Updates blog.

At Microsoft Build last May, we announced an expanded partnership with Snowflake that gives our customers the flexibility to easily connect and work across our tools. Today, I’m excited to share progress on this partnership with the upcoming preview of shortcuts to Iceberg tables. In the coming weeks, Microsoft Fabric engines will be able to consume Iceberg data with no movement or duplication using OneLake shortcuts. Simply point to an Iceberg dataset from Snowflake or another Iceberg-compatible service, and OneLake virtualizes the table as a Delta Lake table for broad compatibility across Fabric engines. This means you can work with a single copy of your data across Snowflake and Fabric. With the ability to write Iceberg data to OneLake from Snowflake, Snowflake customers will have the flexibility to store Iceberg data in OneLake and use it across Fabric.

Finally, we’ve released mirroring support for Snowflake databases into general availability—providing a seamless, no-ETL experience for integrating existing Snowflake data with the rest of your data in Microsoft Fabric. With this capability, you can continuously replicate Snowflake data directly into Fabric OneLake in near real-time, while maintaining strong performance on your transactional workloads. Learn more about Snowflake mirroring in Fabric.

AI-powered insights

With your data teams using the AI-enhanced tools in Fabric to accelerate development of insights across your data estate, you then need to ensure these insights reach those who can use them to inform decisions. With easy-to-understand Power BI reports and AI-powered Q&A experiences, Fabric bridges the gap between data and business results to help you foster a culture that empowers everyone to find data-driven answers.

We’re announcing a richer Copilot experience in Power BI to help you create reports in a clearer, more transparent way. This new experience, now in preview, includes improved conversational abilities that make it easier to give Copilot more context up front, so you can get the report you need on the first try. Copilot will even provide report outlines to improve transparency about the data fields being used. We are also releasing the ability to auto-generate descriptions for measures into general availability. Lastly, report viewers can now use Copilot to summarize a report or page right from the Power BI mobile app, now in preview.

We’re also enhancing email subscriptions for reports by extending dynamic per recipient subscriptions to include both paginated and Power BI reports. With dynamic subscriptions, you can set up a single email subscription that delivers customized reports to each recipient based on the data in the semantic model. For reports that are too large for email format, we are also giving you the ability to deliver Power BI and paginated report subscriptions to a OneDrive or SharePoint location for easy access. Finally, you can now create print-ready, parameterized paginated reports using the Get Data experience in Power BI Report Builder—accessing over 100 data sources.
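The dynamic per-recipient idea is straightforward: one subscription definition is expanded into one customized delivery per recipient, driven by rows in the semantic model. The Python below sketches that expansion; the field names and filter syntax are hypothetical illustrations, not the Power BI implementation.

```python
# Sketch of dynamic per-recipient subscriptions: one definition expands into
# one filtered delivery per recipient, driven by rows in a semantic model.
# Field names and the filter expression syntax are hypothetical.
def expand_subscription(report: str, recipients: list) -> list:
    """Return one delivery per recipient, filtered to that recipient's region."""
    return [
        {
            "to": r["email"],
            "report": report,
            "filter": f"Region eq '{r['region']}'",
        }
        for r in recipients
    ]

rows = [
    {"email": "amy@contoso.com", "region": "West"},
    {"email": "ben@contoso.com", "region": "East"},
]
for delivery in expand_subscription("Quarterly Sales", rows):
    print(delivery["to"], "->", delivery["filter"])
```

Each recipient thus receives only their slice of the report, from a single subscription you maintain in one place.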

Learn more about all of the Power BI announcements in the Power BI September 2024 Feature blog.

Start building your Fabric skills

We are grateful so many of you have decided to grow your skills with Microsoft Fabric. In the past six months alone, more than 17,000 individuals have earned the Fabric Analytics Engineer Associate certification, making it the fastest growing certification in Microsoft’s history. Today, we’re excited to announce a brand-new certification for data engineers coming in late October. The new Microsoft Certified: Fabric Data Engineer Associate certification will help you prove your skills with data ingestion, transformation, administration, monitoring, and performance optimization in Fabric. 

Our portfolio of Microsoft Credentials for Fabric also includes four Microsoft Applied Skills, which are a complement to Microsoft certifications and free of cost. Applied Skills test your ability to complete a real-world scenario in a lab environment and provide you with formal credentials that showcase your technical skills to employers. For Fabric, we have Applied Skills credentials covering implementing lakehouse, data warehouse, data science, and real-time intelligence solutions.

Visit the Fabric Career Hub to get the best free resources to help you get certified and the latest certification exam discounts. Don’t forget to also join the vibrant Fabric community to connect with like-minded data professionals, get all your Fabric technical questions answered, and stay current on the latest product updates, training programs, events, and more. 

And if you want to test your skills, explore Fabric, and win prizes, you can also register for the Microsoft Fabric and AI Learning Hackathon. To learn more, you can join our Ask Me Anything event on October 8. 

Join us at Microsoft Ignite

We are excited to bring even more innovation to the Microsoft Fabric platform at Microsoft Ignite this year. Join us from November 19 through November 21, 2024 either in person in Chicago or online. You will see firsthand the latest solutions and capabilities across all of Microsoft and connect with experts, community leaders, and partners who can help you modernize and manage your own intelligent apps, safeguard your business and data, accelerate productivity, and so much more. 

Explore additional resources for Microsoft Fabric

If you want to learn more about Microsoft Fabric: 

How Azure Maps can help you unlock location intelligence in Power BI
Thu, 12 Sep 2024

The post How Azure Maps can help you unlock location intelligence in Power BI appeared first on Microsoft Fabric Blog.

As organizations realize the value they can unlock from analyzing location data, adoption of location analytics has been surging. In Forrester’s 2023 Data and Analytics Survey, nearly 90% of data and analytics decision-makers said that their organization is currently focusing on building location intelligence (LI) capabilities or will do so in the next two years. A few of the challenges organizations face as they try to adopt location intelligence include:

  • Lack of skills or expertise: Organizations require more practitioners who can understand and effectively leverage location data for business insights.
  • Data integration: Incorporating location data with other business data is critical for extracting meaningful insights and can be challenging due to disparate tools.
  • Data quality and standardization: Organizations must ensure that all location data that they use is accurate and collected while adhering to all data privacy and other regulations.


To help our customers easily access the benefits of location intelligence, we introduced the Azure Maps visual in Fabric, specifically in Power BI, in 2023. Azure Maps enables users to visualize and analyze their location data on maps to uncover patterns and trends. Using it is as simple as enabling the Azure Maps Power BI visual, selecting the Azure Maps icon from the visualizations pane in Power BI, and then diving in to enhance your dashboard with location intelligence. Azure Maps in Power BI comes with a range of capabilities to help you get more from your data, including geocoding, reference layers, traffic layers, and more.


Geocoding is the most fundamental step in location intelligence and entails converting addresses into latitude and longitude points. For example, a retail store can use the geocoding capability to visualize where its customers are located and determine which postal code, state, or country has the highest concentration of its customers. In Power BI, geocoding comes to life when a report creator drags their address data into the “Location” field in the format pane. Addresses are then automatically converted into geographic coordinates and placed on a map. Alternatively, if you already have georeferenced data, you can drag data directly to the “Latitude” and “Longitude” fields to see it on a map. Currently, the Azure Maps visual supports geocoding for up to 30,000 data points in a single visual. Learn more about geocoding here.
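Behind the scenes, geocoding is a call to the Azure Maps Search service. The Python sketch below only constructs such a request rather than sending it, so the exact parameters shown (API version, key-based auth) are assumptions to verify against the current Azure Maps documentation.

```python
# Sketch of address geocoding via the Azure Maps Search API -- the same kind
# of operation the Power BI visual performs for the "Location" field.
# The request is built but not sent; API version and auth style are
# assumptions to check against the Azure Maps docs.
from urllib.parse import urlencode

BASE = "https://atlas.microsoft.com/search/address/json"

def build_geocode_url(address: str, subscription_key: str) -> str:
    """Return the GET URL that would geocode a free-form address."""
    query = urlencode({
        "api-version": "1.0",
        "subscription-key": subscription_key,
        "query": address,
    })
    return f"{BASE}?{query}"

url = build_geocode_url("1 Microsoft Way, Redmond, WA", "YOUR_KEY")
print(url.split("?")[0])  # the search endpoint this request targets
```

The JSON response contains candidate positions (latitude/longitude) for the address, which is exactly what the visual plots when you drop address data into the “Location” field.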
 


Reference layers are another capability offered by Azure Maps, enabling Power BI report creators to add an additional data layer on top of their maps to gain deeper location-based insights. For example, an insurance company can overlay natural disaster data such as historic wildfire or flood occurrences on top of its clients’ locations to understand how risk-prone those clients are, and then use those insights to determine the insurance rates for those customers. Azure Maps enables users to incorporate data from multiple sources, including GeoJSON, KML, WKT, and SHP formats. In the July 2024 update for Power BI, we also introduced data source support for CSV files and dynamic URLs. Learn more about reference layers here.


Traffic layers enable Power BI report creators to overlay real-time traffic data on maps within Power BI by turning on the traffic layer toggle in the format pane. By selecting “Show Incidents,” you can visualize traffic flow and incidents such as road closures and construction on the map. Logistics companies often overlay their telematics data and then use traffic layers to optimize their routes. They can determine where potential delivery delays can occur or where incidents such as harsh braking are likely to take place. Learn more about traffic layers here.


The range selection capability enables report creators to place a pin on a map and define a search area by either time or distance. For example, a coffee chain trying to open a new location in downtown Seattle can use the “Distance” metric to determine how many competitor coffee shops are within a two-mile radius of the city center. Alternatively, the coffee chain can use the “Time” metric to determine how much time it will take for target customers to reach the coffee shop from a given location. With these location-based insights, the coffee chain can make informed decisions, from choosing where to open the next store, to analyzing store performance, to planning inventory to meet customer demand. Learn more about range selection here.
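The distance variant of range selection boils down to a great-circle radius test. Here is a small Python sketch using the haversine formula; the coordinates are illustrative, and the visual of course does this for you interactively.

```python
# Sketch of distance-based range selection: find points within a radius of a
# pin using the haversine great-circle distance. Coordinates are illustrative.
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3958.8 * 2 * asin(sqrt(a))  # 3958.8 = Earth radius in miles

def within_radius(pin, points, radius_miles):
    """Keep only the points inside the search radius around the pin."""
    return [p for p in points if haversine_miles(pin[0], pin[1], p[0], p[1]) <= radius_miles]

pin = (47.6062, -122.3321)                            # downtown Seattle
shops = [(47.6097, -122.3331), (47.6740, -122.1215)]  # nearby shop vs. Redmond
print(len(within_radius(pin, shops, 2.0)))  # 1 -- only the downtown shop qualifies
```

The time-based variant replaces this straight-line radius with drive-time isochrones, which require a routing service rather than a formula.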


The Azure Maps visual also includes other data visualization tools that help report creators tell impactful stories. Heat maps, also known as density maps, are a type of map overlay that represents the density of data using different colors. Heat maps are often used to show data “hot spots” on a map and are a great way to render datasets with a large number of points. For example, a heat map can be created to identify how frequently customers visit shopping centers in different locations. Learn more about heat maps here.
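Under the hood, a heat map is essentially a density count per grid cell before colors are applied. A minimal sketch of that idea in plain Python (illustrative only; the visit coordinates are invented):

```python
# Bucketing point data into grid cells and counting density per cell -
# the "hot spots" a heat map colors most intensely.
from collections import Counter

def density_grid(points, cell_size):
    """Count points per (row, col) grid cell."""
    return Counter((int(y // cell_size), int(x // cell_size)) for x, y in points)

# Hypothetical customer-visit coordinates (x, y) in arbitrary units.
visits = [(0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (1.6, 1.7), (1.8, 1.9)]
grid = density_grid(visits, cell_size=1.0)
hottest = grid.most_common(1)[0]  # the cell with the highest density
```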


Learn more about the Azure Maps visual in Fabric

We are constantly updating the Azure Maps visual to improve location intelligence within Power BI. Upcoming updates will include more location data visualization and intelligence capabilities. As businesses continue to embrace data-driven decision-making, the combination of Azure Maps and Microsoft Fabric will play a crucial role in unlocking the full potential of location data.

To learn more about the Azure Maps visual and to get started, visit our documentation page.

The post How Azure Maps can help you unlock location intelligence in Power BI appeared first on Microsoft Fabric Blog.

Microsoft Fabric, explained for existing Synapse users https://blog.fabric.microsoft.com/en-us/blog/microsoft-fabric-explained-for-existing-synapse-users?ft=All Mon, 20 Nov 2023 16:31:04 +0000 Today, we are announcing the General Availability of Microsoft Fabric.

The post Microsoft Fabric, explained for existing Synapse users appeared first on Microsoft Fabric Blog.

Earlier this year, at Microsoft Build, we introduced Microsoft Fabric in public preview, “the biggest data product announcement since SQL Server.” Today, we are announcing the General Availability of Microsoft Fabric.

Arun explains in detail why we all believe Microsoft Fabric will redefine the current analytics landscape. Here, I will focus on what it means for customers who are using the current Platform-as-a-Service (PaaS) version of Synapse: what it means for your current investments (spoiler: we fully support them), and also how to think about the future.

What happens with PaaS Azure Synapse Analytics

The PaaS offering of Azure Synapse Analytics is an enterprise analytics service designed to accelerate time to insight across data warehouses and big data systems. It brings together the SQL technologies used in enterprise data warehousing, Azure Data Factory pipelines, Apache Spark technologies for big data, and Azure Data Explorer for log and time series analytics.

Microsoft has no current plans to retire Azure Synapse Analytics. Customers can continue to deploy, operate, and expand the PaaS offering of Azure Synapse Analytics. Rest assured, should these plans change, Microsoft will provide you with advance notice and will adhere to the support commitments in our Modern Lifecycle Policy to ensure our customers’ needs are met.

The evolution of Microsoft’s big data analytics products

The next versions of our big data analytics products are now a core part of Microsoft Fabric.

Fabric opens new architectural horizons for our analytical engines. Fabric offers a unified storage abstraction for all your data, OneLake, organized into a logical data mesh, with federated governance and granular control and an intuitive, personalized data hub. All Fabric engines separate storage from compute, and store data in OneLake using a single, open data format.

On this new foundation, we can invent new, unprecedented ways of deploying pipelines, data warehousing, data engineering, data science, observability and real-time analytics technologies, to ultimately simplify and increase the efficiency of our customers’ solutions. Fabric allows us, and our customers, to do more. This is why most of our innovation efforts will be focused on Fabric.

How to think about your current Azure PaaS Synapse Analytics solutions

As mentioned above, there is no immediate need to change anything, as the current platform is fully supported by Microsoft. Your existing solutions will keep working. Your in-progress deployments can continue, all with our full support.

However, you probably have already started thinking about a Microsoft Fabric future for your analytics solutions. The following steps may help you with this thought process.

Understand Microsoft Fabric

Microsoft Fabric represents a significant upgrade to all our analytics engines. All of them are improved, faster, and more scalable, and there is a lot to learn about the new engines and how best to use them. Fabric reimagines collaboration and empowers business users in an unprecedented way. But it is much more than just better engines or better integration.

The unified, open data format means that there is no need to copy data from one engine to another. You can shape data using the technology of your choice, then query it with any other technology.

Fabric introduces completely new ways to make your data part of your analytics landscape. Shortcuts (within Azure or cross-cloud), database mirroring, and seamless access to Dataverse and M365 data are all designed to remove friction and costs.

Understanding these technologies will enable you to make the best out of Fabric, in terms of efficiency, agility and costs.

Our teams have worked hard to produce detailed documentation for all the Fabric concepts, and the best complement to the documentation is hands-on experience. The easiest way to understand Fabric in depth is to try the product: Microsoft Fabric free trial. Arun’s blog spells out clearly how to learn more about Microsoft Fabric.

Understand what it means for your solution

Your analytics solution may use different technologies and engines. Fabric is a complete analytics platform, so you will find, inside Microsoft Fabric, new and enhanced analytics capabilities of the products with which you are familiar today.

Fabric brings new capabilities that have no parallel in the current PaaS Synapse Analytics offering. The Fabric SQL engine can operate, with equal performance, scale, and security, over any OneLake artifact (warehouses, lakehouses, mirrored databases). It also supports cross-artifact operations, removing the need for extra copies, while Power BI, for example, in Direct Lake mode, can now analyze real-time streaming data or Spark output.
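Fabric’s SQL engine is not SQLite, but the idea of cross-artifact querying without ETL or extra copies can be sketched with SQLite’s ATTACH, which lets a single connection join tables from two separate database files in place:

```python
# Illustration (SQLite, not Fabric): one query spanning two databases,
# with no pipeline copying data from one to the other.
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
wh_path = os.path.join(tmp, "warehouse.db")   # stands in for a warehouse artifact
lh_path = os.path.join(tmp, "lakehouse.db")   # stands in for a lakehouse artifact

wh = sqlite3.connect(wh_path)
wh.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
wh.execute("INSERT INTO orders VALUES (1, 10), (2, 20)")
wh.commit()
wh.close()

lh = sqlite3.connect(lh_path)
lh.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
lh.execute("INSERT INTO customers VALUES (10, 'Contoso'), (20, 'Fabrikam')")
lh.commit()
lh.close()

# One connection queries across both databases - no copy, no ETL step.
conn = sqlite3.connect(wh_path)
conn.execute(f"ATTACH DATABASE '{lh_path}' AS lakehouse")
rows = conn.execute(
    "SELECT c.name, COUNT(o.id) FROM orders o "
    "JOIN lakehouse.customers c ON o.customer_id = c.id GROUP BY c.name"
).fetchall()
conn.close()
```

In Fabric the same effect comes from every engine reading the single open-format copy in OneLake, rather than from attaching files.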

All these changes enable simpler, more efficient solutions, removing the need for intermediate steps and multiple data copies. Your solution can get significantly simpler and cheaper.

Below, I use one example of common PaaS Azure Synapse Analytics architectures, together with a possibly more efficient solution in Fabric, to demonstrate such potential simplifications.

Example 1: Data Lake, from Synapse to Fabric

Today, you may prepare your data in an Azure Data Lake Storage Gen2 (ADLSg2) lakehouse (typically using Synapse Spark or Azure Databricks), then use a pipeline to load data into a Synapse SQL Dedicated Pool, then use Power BI or some other BI tool for your report.

(Diagram: Building the Lakehouse - Implementing a Data Lake Strategy with Azure Synapse)

You can keep your current solution intact, and upgrade to Fabric engines.

In Fabric, however, this solution can be simplified:

  • A Data Engineering Lakehouse, in Microsoft Fabric, allows you to use your current ADLSg2 data, as prepared with Synapse Spark or Azure Databricks (via shortcuts).
  • The SQL Analytics Endpoint allows you to apply the security rules from the Dedicated Pool directly over the Lakehouse. There is no need for a dedicated capacity, nor for the pipeline copying from the lake to your warehouse.
  • Using the new Direct Lake mode, Power BI can now operate directly over the Lakehouse, with performance similar to Import mode. Your other BI tools can continue to operate over the SQL Analytics Endpoint.
  • By migrating your Notebooks and Spark Jobs to Fabric Spark, your Lakehouse data will be automatically optimized for all the other Fabric engines (while also being stored in an open format).

To learn more about the Lakehouse pattern in Microsoft Fabric, please visit Lakehouse end-to-end scenario: overview and architecture – Microsoft Fabric | Microsoft Learn

Assess our migration tools and processes

We are investing significant development efforts in migration processes and tooling. And our migration efforts are prioritizing current PaaS Synapse Analytics customers.

The processes and tools we are designing are intended to minimize the friction, disruption and cost for our existing customers.

As you will see in the section on Migration Resources, we are developing tools to:

  • Use your data in-place whenever possible
  • Reuse code investments (pipelines, notebooks) when possible
  • Migrate code (stored procedures, views, notebooks)

These investments are not complete. We will keep posting updates to our migration tools. Join the fast-growing Fabric community, and our specialists, as well as external experts, will be ready to work with you. The Fabric Ideas forum on the community site is the best way to suggest new features, and it is closely monitored by the Microsoft Fabric product teams.

Develop, then plan to deploy a migration strategy

After learning about Fabric and evaluating the product, you will develop enough confidence in the new Fabric engines and the migration technology to move your solution to Fabric. For some of you this may happen soon; for others it may take years.

There is no rush – we will keep supporting your existing solutions – but we are ready for you to migrate whenever the time is right.

When you are ready to move your solution to Fabric, you will be able to exchange your existing 1- or 3-year Synapse Reserved Instance (RI) purchases for 1-year Fabric RI purchases to continue applying your reservation discounts in Fabric. Additionally, if you want to increase the RI commitment for your Fabric portfolio, you will have access to discounts of more than 40% over Fabric Pay-as-you-go pricing.

In the next sections, the product leaders explain how to think about Fabric from the perspective of different PaaS Synapse Analytics workloads.

Data Factory Pipelines

Data Factory in Microsoft Fabric brings Power Query and Azure Data Factory together into a modern, trusted data integration experience that empowers data and business professionals to extract, load, and transform data for their organization. In addition, powerful data orchestration capabilities enable you to build simple to complex data workflows that orchestrate the steps needed for your data integration needs.

Key concepts in Data Factory in Microsoft Fabric include:

  • Get Data and Transformation with Dataflow Generation 2 is an evolution of Dataflow in Power BI. Dataflow Generation 2 is re-architected to leverage Fabric compute engines for data processing and transformation. This enables Dataflow Generation 2 to ingest and transform data at any scale.
  • Data Orchestration with Data Pipelines – For customers familiar with Azure Data Factory (ADF), data pipelines in Microsoft Fabric use the same technology that powers Azure Data Factory. As part of the GA of Fabric, data pipelines in Microsoft Fabric will have most of the activities available in ADF. See here for a list of activities that will be part of data pipelines in Fabric. The SSIS activity will be added to data pipelines by Q2 CY2024.
  • Enterprise-ready Data Movement – From small data to petabyte-scale data, Data Factory provides a serverless and intelligent data movement platform that enables you to move data between diverse data sources and destinations reliably. With support for 170+ connectors, Data Factory in Fabric enables you to move data across multiple clouds, on-premises data sources, and virtual networks (VNets). Intelligent throughput optimization enables the data movement platform to automatically detect the size of the compute needed for data movement.

To enable customers to upgrade to Microsoft Fabric from Azure Data Factory (ADF), we will be supporting the following:

  • Data pipelines activities – We have added many of the activities that you use in ADF into Data Factory in Fabric. In addition, we have added new activities (e.g. Teams, Outlook) for notifications. See here for a list of activities that are available in Data Factory in Fabric.
  • OneLake/Lakehouse connector in Azure Data Factory – ADF customers can now integrate with Microsoft Fabric and bring data into the Fabric OneLake.
  • Azure Data Factory Mapping Dataflow to Fabric – We have put together a guide for ADF customers who are looking at building new data transformations in Fabric. Find out more at https://aka.ms/datafactoryfabric/docs/guideformappingdataflowusers

In addition, customers looking to migrate their ADF mapping dataflows to Fabric can leverage sample code from the Fabric Customer Advisory Team (Fabric CAT) to convert mapping dataflows to Spark code. Find out more at https://github.com/sethiaarun/mapping-data-flow-to-spark

As part of Data Factory in Fabric roadmap, we will be working towards the preview of the following by Q2 CY2024:

  • Mounting of Azure Data Factory in Fabric – This enables customers to mount their existing Azure Data Factory in Microsoft Fabric. All ADF pipelines will work as-is and continue running on Azure, while enabling you to explore Fabric and work out an upgrade plan.
  • Upgrade from Azure Data Factory pipelines to Fabric – We will be working with customers and the community to learn how we can best support upgrades of data pipelines from ADF to Fabric. As part of this, we will deliver an upgrade experience that empowers you to test your existing data pipelines in Fabric by mounting and then upgrading them.

Learn more about how you can upgrade to Data Factory in Fabric – https://aka.ms/datafactoryfabric/upgradetofabric

Synapse Data Warehouse

Fabric Data Warehouse is the next generation of data warehousing in Microsoft Fabric. It is the first transactional data warehouse to natively support an open data format, enabling data engineers and business users to collaborate seamlessly without compromising security or governance. Just like the previous data warehouse generation, it provides multi-table ACID transactional guarantees. It is built on the well-established SQL Server Query Optimizer and Distributed Query Processing engine, but comes with major improvements that address many of the challenges customers face in enabling workloads associated with modern analytics. These improvements were driven by rearchitecting the data warehouse, leveraging IP from both Dedicated and Serverless SQL Pools along with:

  • Separation of storage and compute: data is stored in OneLake and is clearly separated from the compute used by the SQL engine. There is an elastic allocation of compute resources based on demand, as well as use of distinct compute resources for different workload types on top of the same data.
  • Leveraging the infinite compute capabilities of Azure Cloud: giving us the capability of going beyond a limited topology offered by the Synapse Gen2 architecture.
  • Support for open data format: allowing a single copy of the data to be used by all the Fabric workloads such as Data Science, Data Engineering, and Power BI.

With this new architecture, the new engine enables numerous capabilities that were not possible in either Dedicated or Serverless SQL Pools, such as:

  1. Cross database querying without any ETL or data movement.
  2. Cloning without creating copies of the data.
  3. Autoscaling enabling elastic scale up and down of the compute nodes with dynamic resource allocation tailored to data volume, usage, or query complexity.
  4. Enabling a pay-for-what-you-use pricing model.
  5. No knobs performance via automated query optimizations, statistics, and data distributions.

All of this comes with the concepts familiar to SQL users, such as views, stored procedures, SQL security (row-level security, column-level security, dynamic data masking), and the full benefits of the T-SQL tooling ecosystem.
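Conceptually, row-level security acts as a predicate and dynamic data masking as a transform, both applied at query time before results reach the user. A toy illustration of the two ideas in plain Python (not the T-SQL features themselves; the helper names and sample rows are invented):

```python
# Toy sketch of what RLS and dynamic data masking do at query time.

def mask_email(email):
    """Dynamic-data-masking style transform: expose only the first character."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

def rls_filter(rows, user_region):
    """Row-level-security style predicate: users see only their region's rows."""
    return [r for r in rows if r["region"] == user_region]

rows = [
    {"region": "EU", "email": "anna@contoso.com"},
    {"region": "US", "email": "bob@fabrikam.com"},
]
# An EU-scoped user sees only EU rows, with emails masked.
visible = [dict(r, email=mask_email(r["email"])) for r in rls_filter(rows, "EU")]
```

In the real engine these rules are defined declaratively with T-SQL and enforced for every query, rather than applied by application code.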

These architectural changes cannot be backported to either one of the old engines. Because of the open format, your data warehouses cannot be upgraded in place either. Data stored in a proprietary format in Gen2 needs to be extracted and stored in the open format of Fabric.

A migration can be done at your own pace when you are ready to leverage these new capabilities. To enable this, we have made the following available to you now:

In addition, we have started working on an in-product Migration Assistant that will automatically detect and convert your Synapse Gen2 code to Fabric Data Warehouse code. It will also redirect your endpoints, so you don’t have to worry about application migration. We anticipate this will be available in CY24.

Synapse Data Engineering

Fabric Data Engineering is our big data analytics workload in Fabric, empowering data engineers to leverage the power of Apache Spark to transform their data at scale and build out a lakehouse architecture. The Fabric Data Engineering experience targets users of Apache Spark pools in the Azure Synapse Analytics world. Here are some of the key takeaways regarding the Fabric Data Engineering experience:

Runtime for big data workloads

Every Fabric workspace comes pre-wired with a ‘starter pool’ (a default Spark cluster) with a Fabric Runtime that contains up-to-date versions of Spark, Delta, Java, and Python. Just like in Azure Synapse Analytics, customers can also create their own custom clusters with their own configurations and libraries if they want.

The Apache Spark experience in Fabric also contains many new and exciting enhancements:

  • Starter pools in Fabric are automatically kept live, meaning users can enjoy sessions that start within ~15 seconds
  • High concurrency mode in Fabric means multiple notebooks can be attached to a single session, accelerating the start-up times and reducing costs
  • Spark clusters start all the way from a single node, further reducing the costs of getting started with Spark

Simplified lakehouse architecture

Every Fabric workspace also comes pre-wired with OneLake, our SaaSified data lake for the organization. Users can easily create lakehouse items, which are the perfect container for bringing all your data into OneLake using Spark, dataflows, and pipelines. Existing data can be easily included, with no data movement, through the use of shortcuts. We will also automatically discover the metadata of Delta tables for you, making it easy to start working with existing data with zero friction. Additionally, we have reduced the price for Spark in Fabric by almost 40% vs. the retail price of Synapse Spark.

Here are some other exciting things to keep in mind about the lakehouse in Fabric:

  • Every lakehouse comes with a built-in SQL endpoint and Power BI dataset. This means that as soon as you transform your data with Spark, you can start querying it using our SQL engine and Power BI, with no data movement necessary
  • Spark (along with every other Fabric engine) will automatically write the data into the lakehouse with v-order enabled, automatically optimizing it for BI reporting

First Class Developer Experiences

The Synapse Data Engineering experience brings in familiar authoring tools, including notebooks for interactive querying experiences and Spark Job Definitions for submitting batch jobs. These capabilities come with a variety of new enhancements and users even have some new authoring experiences to look forward to:

  • Notebooks in Fabric include numerous usability improvements including auto-save, real time collaboration and commenting, a built-in file system as well as native file format support when checking into git. Users can also make use of light-weight scheduling (in addition to using the pipeline activity).
  • Spark Job Definitions come with retry policy support, making it easier to continuously run long running streaming jobs
  • Native VS Code support makes it easy to work with your Data Engineering items (notebooks, Spark Jobs, lakehouse) all in your favorite IDE, including full debugging support
  • The newly released environment item streamlines the packaging of all of your Spark configurations, libraries, cluster settings and more, and simplifies the reusability of your hardware and software environment across your code artifacts.
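The retry policy for Spark Job Definitions behaves, in spirit, like rerunning a failed job up to a configured limit. A generic sketch of that pattern in plain Python (`run_with_retries` is a hypothetical helper, not the Fabric API):

```python
# Generic retry-on-failure pattern, as a retry policy would apply it to a
# long-running job: rerun on exception, up to a maximum retry count.
import time

def run_with_retries(job, max_retries=3, backoff_seconds=0.0):
    """Invoke job(); on exception, retry up to max_retries times."""
    attempts = 0
    while True:
        try:
            return job()
        except Exception:
            attempts += 1
            if attempts > max_retries:
                raise  # give up after the last allowed retry
            time.sleep(backoff_seconds)  # back off before the next attempt

# A flaky job that fails twice before succeeding.
calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky_job)
```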

To summarize, with Synapse Data Engineering, you can start building on top of your existing Azure Synapse Spark investments quickly and incrementally. Start by leveraging shortcuts to existing data in your data lake and bringing-in your notebooks using the import capability. We are starting work on an in-product migration assistant but in the meantime, please use our newly published Azure Synapse Spark to Fabric Migration Guidance whitepaper.

Synapse Data Science

Synapse Data Science empowers data scientists to explore their data and to build and operationalize predictive models. Coming from the Azure Synapse Analytics world, you will see many familiar constructs, such as Python and R baked into the runtime (including many popular ML packages), the ability to install your own third-party and custom libraries, and the availability of SynapseML, our open-source library for creating massively scalable ML pipelines.

Fabric Data Science offers a variety of new capabilities data scientists can look forward to:

Model & Experiment tracking

Data scientists can leverage experiments and models as readily available items in the Fabric workspace. Support for ML models and experiments allows users to manage models and track experiment runs using standard MLflow APIs. Comparison experiences make it easy to compare different experiment runs, and autologging helps capture key metrics automatically as users author code to train models.

Model batch scoring

To operationalize their ML models, users can leverage the scalable PREDICT function for distributed batch scoring on Spark. This capability exists in Azure Synapse today, so existing Synapse users should feel right at home. The Fabric Data Science experience provides a low-code UI for scoring data and tight integration with the lakehouse, making it easy to enrich data and surface it in Power BI reports with zero friction.
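The pattern PREDICT distributes is simple: apply a trained model to each partition of the data in parallel. A minimal plain-Python sketch of that pattern (the model, rows, and helper names are stand-ins, not the actual PREDICT API):

```python
# Batch-scoring pattern: score rows partition by partition, as each Spark
# executor would when PREDICT fans the work out across the cluster.
def model_predict(row):
    """Hypothetical stand-in for a trained model's predict function."""
    return 1 if row["amount"] > 100 else 0

def score_partition(partition):
    """Score one partition of rows, appending a prediction column."""
    return [dict(row, prediction=model_predict(row)) for row in partition]

partitions = [
    [{"amount": 50}, {"amount": 150}],  # partition 1
    [{"amount": 200}],                  # partition 2
]
scored = [row for part in partitions for row in score_partition(part)]
```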

Data Exploration & Enrichments

Fabric Data Science offers many innovative solutions in the space of exploring and transforming your data. These include:

  • Data Wrangler – a low-code UI for carrying out data transformations that automatically generates Python code
  • Semantic Link – a library enabling seamless connectivity to the Power BI semantic model through data science tools like notebooks
  • Pre-built AI models – newly released public preview capability providing built-in access to Azure AI services like text analytics and translation services

The migration path for a data scientist in Azure Synapse Analytics is like that of a Spark data engineer – they will need to consider their notebooks, Spark pools and data. We recommend starting with the Azure Synapse Spark to Fabric Migration Guidance whitepaper.

Synapse Real-time Analytics

Synapse Real-time Analytics is a robust platform tailored to deliver real-time data insights and observability analytics for a wide range of data types, including time-based observability data such as logs, events, and telemetry. It’s the true streaming experience in Fabric! Building on the same foundation as Azure Synapse Data Explorer, Synapse Real-time Analytics equips both citizen data scientists and professional data engineers with a suite of features and tools to fully unleash the potential of their data.

Rapid Deployment

Experience unmatched efficiency by creating a database, ingesting data, running queries, and generating Power BI reports, all within a 5-minute timeframe. Real-time Analytics puts speed at the forefront, allowing you to dive into data analysis without delay.

Get Data

For an authentic streaming experience in Fabric, the “Get Data” feature has received a modern facelift with an intuitive design and user-friendly interface. It simplifies data ingestion, accepting any data format or structure from various sources in either streaming or batch mode. Your data becomes queryable within seconds.

Query Versatility

Whether you’re a Kusto Query Language (KQL) enthusiast or prefer traditional SQL, Real-time Analytics accommodates your needs. This service enables you to generate quick KQL or SQL queries, ensuring that you can work in your preferred language and obtain results swiftly. It doesn’t matter if you’re working with a small dataset (a few gigabytes), a medium-sized one (a few terabytes), or even massive datasets (in the petabytes range).

Data Exploration

Fabric Real-Time Analytics offers a multitude of innovative solutions for exploring and visualizing your data, including:

KQL Queryset: A workbench for creating, managing, and sharing your queries.

Power BI Report: A one-click option to generate a Power BI report on top of any query or table.

Notebook: Seamlessly connect your Fabric Notebook with the KQL Database for data ingestion and querying.

NL2KQL (Coming Soon): Write your query in natural language, and Fabric will generate and execute the corresponding KQL query for you.

Real-Time Dashboard (Coming Soon): The Fabric Real-Time Dashboard is a collection of tiles that enable native export of Kusto Query Language (KQL) queries as visuals. This allows for easy query modification and visual formatting, enhancing data exploration and delivering superior query and visualization performance.

Fabric Real-Time Analytics is your gateway to real-time insights and a streamlined data analysis experience. Whether you’re pioneering new data horizons or looking to optimize your data analytics solutions, this service is your trusted partner. Stay ahead in the data game and embark on your journey with Fabric Real-Time Analytics today.

For more information on Fabric Real-Time Analytics, visit the general availability blog.

Migration planning

Fabric KQL databases are 100% compatible with Azure Data Explorer (ADX) and Azure Synapse Data Explorer (Preview) and are powered by the same technology. This means that all current applications, SDKs, integrations, and tools that work with ADX will continue to work smoothly with Fabric KQL databases.

There is a broad set of capabilities to support mixed environments and migrations; some are available now and some will light up in the coming months.

  • Available now:
    • Full binary compatibility of APIs, SDKs and tools.
    • Create a database shortcut to host a read only, in place, up to date instance of the database in Fabric.
  • Coming over the next months:
    • Migrate an Azure Synapse Data Explorer pool from a Synapse workspace and attach it to a Fabric workspace
    • Attach an Azure Data Explorer cluster to a Fabric workspace
    • Sync Azure Data Explorer user queries and dashboards into a Fabric workspace query sets and dashboards

Migration Resources

Azure Data Factory

Azure Synapse DW to Fabric Migration Guidance

Azure Synapse Spark to Fabric Migration Guidance

Azure Synapse Data Explorer and Azure Data Explorer to Fabric Migration Guidance
