Data analytics - Microsoft SQL Server Blog
http://approjects.co.za/?big=en-us/sql-server/blog/topic/data-analytics/
Official News from Microsoft’s Information Platform
Thu, 19 Mar 2026 23:32:24 +0000

Announcing SQL Server 2025 (preview): The AI-ready enterprise database from ground to cloud
http://approjects.co.za/?big=en-us/sql-server/blog/2025/05/19/announcing-sql-server-2025-preview-the-ai-ready-enterprise-database-from-ground-to-cloud/
Mon, 19 May 2025 16:00:00 +0000

Announcing SQL Server 2025: empowering customers to develop modern AI applications securely using their data, complete with best-in-class security, performance, and availability.

Organizations are using generative AI to stay ahead of the competition, but the real advantage lies in harnessing the power of your own data securely and at scale.

SQL Server 2025, now in public preview, empowers customers to develop modern AI applications securely using their data, complete with best-in-class security, performance, and availability. It provides built-in, extensible AI capabilities, enhanced developer productivity, and seamless integration with Microsoft Azure and Microsoft Fabric, all within the SQL Server engine using the familiar T-SQL language.

Your data, any model, anywhere 

One of the most exciting new capabilities of SQL Server 2025 is the integration of AI directly into the database engine, enabling more intelligent search. With built-in vector search capabilities, you can perform semantic searches over your own data to find matches based on similarity, alongside the full-text search and filtering you are already using in SQL Server. This built-in capability opens up a host of new use cases, such as discovering deeper connections within large datasets and providing a natural conversational experience across various enterprise systems.
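To make "matches based on similarity" concrete, here is a minimal Python sketch of similarity ranking over embeddings. It is an illustration only, not how the SQL Server engine implements vector search; the document names and the tiny 3-dimensional vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(query_vec, rows, k=2):
    """Return the ids of the k rows whose embeddings are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), row_id) for row_id, vec in rows]
    scored.sort(reverse=True)  # highest similarity first
    return [row_id for _, row_id in scored[:k]]

# Hypothetical embeddings; real ones typically have hundreds of dimensions.
documents = [
    ("invoice-faq", [0.9, 0.1, 0.0]),
    ("billing-guide", [0.8, 0.2, 0.1]),
    ("holiday-policy", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]

print(top_matches(query, documents))
```

The query vector points in nearly the same direction as the two billing-related documents, so they rank first even though no keywords are compared.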

SQL Server 2025 introduces enhanced model management by building model definitions directly into T-SQL, enabling seamless integration with popular AI services such as Azure AI Foundry, Azure OpenAI Service, OpenAI, Ollama, and others. Models are all accessed through REST APIs allowing you to deploy any model securely isolated from the SQL Server engine, anywhere from ground to cloud. As developers test embedding models to find the best fit for their use cases, whether running open-source models on laptops or hosting purpose-trained models, SQL Server 2025 makes it convenient to switch models without needing to change the code. 
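The idea of switching models without changing application code can be sketched in Python as a small registry: callers refer to a logical model name, and only the definition behind that name changes. Everything here is hypothetical (the `ModelDefinition` class, the `doc_embedder` name, the placeholder URLs, and the fake providers that stand in for REST calls); it is not the T-SQL model definition syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ModelDefinition:
    """Hypothetical stand-in for a model definition kept alongside the data."""
    endpoint: str                             # placeholder REST URL
    call: Callable[[str, str], List[float]]   # (endpoint, text) -> embedding

REGISTRY: Dict[str, ModelDefinition] = {}

def register_model(name: str, definition: ModelDefinition) -> None:
    """Create or replace the definition behind a logical model name."""
    REGISTRY[name] = definition

def embed(model_name: str, text: str) -> List[float]:
    """Application code depends only on the logical model name."""
    model = REGISTRY[model_name]
    return model.call(model.endpoint, text)

# Fake providers; a real one would send the text to the endpoint over REST.
def fake_local_model(endpoint: str, text: str) -> List[float]:
    return [float(len(text)), 0.0]

def fake_hosted_model(endpoint: str, text: str) -> List[float]:
    return [0.0, float(len(text))]

register_model("doc_embedder", ModelDefinition("http://localhost:8000/embed", fake_local_model))
first = embed("doc_embedder", "hello")

# Swap the provider: the embed() call site stays exactly the same.
register_model("doc_embedder", ModelDefinition("https://example.invalid/embed", fake_hosted_model))
second = embed("doc_embedder", "hello")

print(first, second)
```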

This release also provides other essential building blocks for AI development and operational retrieval-augmented generation (RAG) patterns powered by AI agents. It includes vector embedding generation and text chunking built into T-SQL, using Disk Approximate Nearest Neighbor (DiskANN) as a vector index for faster, resource-efficient, and accurate results. Additionally, SQL Server 2025 offers seamless integration with popular AI frameworks like LangChain, Semantic Kernel, and Entity Framework Core. 
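Text chunking, one of the building blocks mentioned above, splits long text into pieces small enough to embed. A minimal sketch of fixed-size chunking with overlap follows; the function name and parameters are assumptions for illustration and do not describe the T-SQL chunking function.

```python
def chunk_text(text, chunk_size, overlap=0):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk would then be passed to an embedding model, and the resulting vectors stored next to the source rows for retrieval.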

“With the new semantic search and RAG capabilities in SQL Server 2025, we can empower existing GenAI solutions with data embeddings to create next-generation, more intelligent AI applications. By connecting systems, we deliver a seamless, natural conversational experience across enterprise environments.”

—Markus Angenendt, Data Platform Infrastructure Lead, Kramer & Crew  

Microsoft’s most significant release for SQL developers in the last decade 

We understand that developers need the right tools and interfaces for modern, data-intensive applications. SQL Server 2025 delivers a rich set of feature enhancements that significantly streamline the development process, reduce code complexity, and improve developer productivity. Along with built-in AI capabilities, this makes SQL Server 2025 the most significant release for SQL developers since the introduction of SQL Server 2016 a decade ago.

Enhancing data enrichment is our first area of focus in this release. SQL Server 2025 offers native JSON support, empowering developers to process JSON documents natively. Combined with REST APIs and Regular Expressions (RegEx) enablement, developers can now enrich, validate, and manipulate their datasets with external data sources. This allows for building more dynamic, enterprise-grade applications with richer functionality and enhanced performance. 
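As a rough picture of what combining JSON processing with regular expressions looks like, the Python sketch below parses a JSON document and flags records whose email field fails a pattern. The payload, field names, and the deliberately simple email pattern are assumptions for the example; in SQL Server the equivalent work would be done with T-SQL JSON and RegEx functions.

```python
import json
import re

# Intentionally simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def invalid_emails(json_text):
    """Return the ids of customers whose email is missing or fails the pattern."""
    customers = json.loads(json_text)["customers"]
    return [c["id"] for c in customers if not EMAIL_PATTERN.match(c.get("email", ""))]

# Hypothetical payload, e.g. fetched from an external REST source.
payload = """
{"customers": [
  {"id": 1, "email": "ana@example.com"},
  {"id": 2, "email": "not-an-email"},
  {"id": 3}
]}
"""

print(invalid_emails(payload))
```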

Empowering developers to build real-time, event-driven applications with SQL Server is another scenario that this release unlocks. Change Event Streaming allows users to consume transaction log changes as events directly from SQL Server to Microsoft Azure Event Hubs. This provides a new method to mitigate some of the issues developers have seen with the Input/Output (I/O) overhead of Change Data Capture (CDC). It also opens new possibilities such as developing real-time, event-driven applications powered by AI agents. 
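On the consuming side, an event-driven application typically applies each change event to its own view of the data. The sketch below shows that pattern in Python. The event shape (`op`, `key`, `row`) is a made-up simplification and is not the payload format that Change Event Streaming publishes.

```python
def apply_change_events(events, state=None):
    """Apply insert/update/delete events, in order, to a keyed in-memory view."""
    state = {} if state is None else dict(state)  # do not mutate the caller's dict
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            state[key] = event["row"]
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return state

# Hypothetical stream of changes to an orders table.
events = [
    {"op": "insert", "key": 1, "row": {"status": "new"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "delete", "key": 2},
]

orders = apply_change_events(events)
print(orders)
```

Because events arrive as they are committed, the consumer's view follows the source without polling change tables.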

There’s also excitement on the language and tooling front. We’re thrilled to announce the preview of our new open-source Python driver for SQL Server.1 Built from the ground up, this driver offers Python developers a robust, efficient, and fully open-source solution for connecting to SQL Server, as simple as pip install. In addition, we are bringing AI-powered assistance directly into your workflow with the integration of MSSQL Extension for Visual Studio Code with GitHub Copilot, now in preview. With GitHub Copilot aware of your SQL Server database connection, developers can generate SQL and object-relational mapping (ORM) migrations, explore schemas, optimize queries with intelligent suggestions, and streamline database interactions, all within Visual Studio Code.

“I am genuinely enthusiastic about the AI advancements in SQL Server 2025. These features, along with the enhancements in RegEx and JSON data support, promise to make AI functionalities accessible to a broader range of software applications, and significantly enhance our database operations.” 

—Jacob Saugmann, SQL Specialist, J.H. Schultz Information A/S

Best-in-class security, performance, and availability 

This release builds on SQL Server’s history as an industry leader in database security, performance, and availability. For our enterprise customers, security is non-negotiable. SQL Server has remained the most secure database over the last decade.2 SQL Server 2025 continues the product’s legacy of top-notch security by incorporating modern identity and encryption practices. Support for Microsoft Entra managed identities improves credential management and reduces potential vulnerabilities.3

We’re bringing Optimized Locking to SQL Server, to reduce lock memory consumption and minimize blocking for concurrent transactions through Transaction ID (TID) Locking and Lock After Qualification (LAQ). This capability enables customers to increase uptime and enhance concurrency and scale for SQL Server applications. 

SQL Server 2025 includes over 50 enhancements to the database engine, including key improvements for high availability and disaster recovery (HADR), all based on customer feedback. This release improves application performance with no code changes required, through Intelligent Query Processing (IQP) and columnstore improvements, query processing enhancements, and Query Store support for readable secondaries.

“Security Cache Improvement proved invaluable for high-demand environments like ours, reducing disruption when applying permissions on servers with 20,000–25,000 active connections. This enhancement ensures minimal performance impact, streamlining security management. The ordered non-clustered Columnstore index significantly improved query performance by over 63%, optimizing workloads reliant on analytical processing.”

—Madhab Paudel, Database Engineer, Entain

Cloud agility through Azure 

To build scalable analytics, data needs to be extracted, transformed, normalized, and made available in a central place. SQL Server 2025 will support database mirroring in Fabric, giving you near real-time analytics using a zero extract, transform, and load (ETL) experience and allowing you to offload your analytical workloads to Fabric.3  

Azure is a critical component of SQL Server. With Azure Arc, SQL Server 2025 will continue to offer cloud capabilities that enable customers to better manage, secure, and govern their SQL estate at scale across on-premises and cloud.

“Fabric mirroring for SQL Server 2025 helps MSC to build the bridge to bring our operational data into Microsoft Fabric.”

—Javier Villegas, IT Director of DBA and BI Services, Mediterranean Shipping Company

Get started with SQL Server 2025 today

With every AI-powered query for hybrid search, every millisecond saved in query execution, every change event streamed in real time, SQL Server 2025 is a critical building block for modern data-intensive applications in this AI era. Ready to try it out? Learn more about SQL Server 2025.

SQL Server Management Studio (SSMS) 21, now generally available, is based on Visual Studio 2022 and includes 64-bit support. This modernized version is available from the Visual Studio Installer, offers automatic updates, and introduces Git integration, query editor enhancements, and a new connection experience.

Microsoft Copilot in SSMS, now in preview, is available as an optional workload when installing SSMS 21, and assists customers in writing, editing, and fixing T-SQL queries using natural language. Leveraging database context, it also helps with database administration, maintenance, configuration and more.3



1 The public preview of the Python driver begins June 1, 2025. The alpha version is available on GitHub today.

2 According to the National Institute of Standards and Technology Comprehensive Vulnerability Database, as of December 2024.

3 Although SQL Server 2025 in public preview is free to try, using some features such as Microsoft Entra, Fabric and Copilot in SQL Server Management Studio 21 could incur costs based on usage. Try Azure for free and explore Fabric trial capacity.

SQL Server Integration Services (SSIS) Microsoft Connector for Oracle deprecation
http://approjects.co.za/?big=en-us/sql-server/blog/2025/01/21/sql-server-integration-services-ssis-microsoft-connector-for-oracle-deprecation/
Tue, 21 Jan 2025 16:00:00 +0000

In July 2025, Microsoft will discontinue support for the Microsoft Connector for Oracle in SQL Server Integration Services.

In July 2025, Microsoft will discontinue support for the Microsoft Connector for Oracle in SQL Server Integration Services (SSIS). This blog provides essential details to help customers prepare for this change in advance.

The Microsoft Connector for Oracle enables data export from and import into Oracle databases within an SSIS package. This feature, available in Enterprise editions of SQL Server 2019 and 2022, will remain functional for the lifecycle of the SQL Server product. However, support for this feature will officially end on July 4, 2025. With the deprecation, future product releases will provide no further bug fixes. Additionally, the connector will not be supported in SQL Server 2025 and later versions.

Today, customers are leveraging the Microsoft Connector for Oracle in a variety of scenarios, including integrating Oracle data with other sources and supporting ETL (Extract, Transform, Load) processes to gain valuable insights. We recommend that customers use the SSIS ADO.NET Source and ADO.NET Destination components as the primary alternative solution to the Microsoft Connector for Oracle.

These SSIS ADO.NET components offer similar ETL capabilities. They connect to Oracle databases through a .NET provider, specifically the OracleClient Data Provider, so you can transfer and transform your data efficiently. For further detailed instructions, please refer to the step-by-step guide.

If you need any assistance, please contact Microsoft Support.

Exploring best-in-class connectivity to Oracle with Microsoft Fabric 

The announcement of the deprecation of the SQL Server Integration Services (SSIS) Microsoft Connector for Oracle also presents an opportunity to explore new solutions for modern data integration with Oracle.

Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. It offers a comprehensive suite of services including Data Engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases. 

Data Factory in Microsoft Fabric offers a modern data integration experience with Oracle databases, allowing reading from Oracle databases on-premises or behind a virtual network, and writing to any data destination.

Mirroring in Microsoft Fabric allows users to enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs. You can continuously replicate your existing data estate directly into Fabric’s OneLake, which can be used for all your analytical needs. This feature allows businesses to continuously integrate their existing data estate without complex ETL. 

Let’s explore the details of each of the approaches below. 

Oracle connectivity with Data Factory

Data Factory in Microsoft Fabric provides a modern data integration experience to ingest, prepare, and transform data from a rich set of data sources. It incorporates the simplicity of Power Query, and you can use more than 200 native connectors to connect to data sources on-premises and in the cloud.

One of the powerful features of Data Factory is its ability to configure and manage Oracle database connections in a copy activity. This functionality allows organizations to seamlessly integrate their Oracle databases into their data pipelines, ensuring efficient data movement and transformation. The article "Configure Oracle database in a copy activity" provides comprehensive instructions on how to perform this configuration.

You can leverage the on-premises data gateway to securely connect to your on-premises Oracle database. This gateway acts as a bridge, enabling seamless data movement between on-premises data sources and cloud services. For detailed instructions, please refer to "Move data from Oracle to Fabric Lakehouse via pipeline and on-premises data gateway."

Replicating Oracle data into Fabric’s OneLake with Mirroring 

Mirroring in Microsoft Fabric offers a modern approach to seamlessly accessing and ingesting data from any database or data warehouse into OneLake in Microsoft Fabric. This feature allows businesses to continuously integrate their existing data estate without complex ETL processes. 

Open Mirroring in Fabric is extensible, customizable, and built on the open Delta Lake table format. It enables applications and independent software vendors (ISVs) to write change data directly into a mirrored database in Fabric using public application programming interfaces (APIs). Once the data lands in OneLake, Open Mirroring handles complex data changes, ensuring all mirrored data remains continuously up-to-date and ready for analysis.

We are thrilled to see Oracle GoldenGate streamline the delivery of mirroring solutions in Microsoft Fabric by integrating its data solution into Open Mirroring. As a key partner in our Open Mirroring ecosystem, Oracle GoldenGate offers a powerful and seamless approach to data replication, enabling continuous and efficient integration of data into Microsoft Fabric’s OneLake. This partnership highlights our commitment to providing modern, extensible solutions that simplify data integration and drive value for our customers.

Simplifying Oracle to SQL Server Migration: Leveraging Microsoft SQL Server Migration Assistant (SSMA)

Additionally, if you are looking to migrate Oracle Database to SQL Server, Microsoft SQL Server Migration Assistant (SSMA) is a tool designed to automate database migration. SSMA for Oracle is a comprehensive environment that helps you quickly migrate Oracle databases to SQL Server or Azure SQL Database. The Oracle to SQL Server migration guide provides detailed instructions on how to migrate your Oracle database to SQL Server using SSMA for Oracle. This comprehensive guide ensures a smooth transition, minimizing disruptions and maximizing efficiency.

Looking forward

The deprecation of the SSIS Microsoft Connector for Oracle offers an opportunity to explore and implement more advanced and robust data integration solutions. By considering the ADO.NET components, Microsoft Fabric, or Microsoft SQL Server Migration Assistant for Oracle, organizations can ensure continued efficiency and reliability in their data integration processes. Each of these alternatives brings unique benefits, allowing businesses to choose the one that best aligns with their operational requirements and strategic goals. 

As the landscape of data integration evolves, staying informed about the latest tools and technologies will be crucial for maintaining a competitive edge and achieving seamless data connectivity. By proactively addressing the deprecation and selecting the appropriate alternative, organizations can continue to leverage their data assets effectively and drive business success. 


Resources 

Learn more about Data Factory in Microsoft Fabric and Oracle to SQL Server migration

Announcing Microsoft SQL Server 2025: Enterprise AI-ready database from ground to cloud
http://approjects.co.za/?big=en-us/sql-server/blog/2024/11/19/announcing-microsoft-sql-server-2025-apply-for-the-preview-for-the-enterprise-ai-ready-database/
Tue, 19 Nov 2024 13:30:00 +0000

Microsoft SQL Server 2025, an AI-ready database with built-in security, hybrid AI vector search, and integration with Microsoft Fabric and Microsoft Azure.

The increasing adoption of AI technologies is presenting new challenges for our customers’ data estate and applications. Most organizations expect to deploy AI workloads across a hybrid mix of cloud, edge, and dedicated infrastructure, with privacy and security being more important than ever.

Microsoft SQL Server 2025, now in preview, is an enterprise AI-ready database from ground to cloud that tackles these challenges by bringing AI to customers’ data. This release continues SQL Server’s three decades of innovation in performance and security, adding new AI capabilities. With Microsoft Fabric integration, customers can bring their data into the next generation of data analytics. The release supports hybrid environments across clouds, on-premises datacenters, and edge, leveraging Microsoft Azure innovation for customers’ databases.

Graph describing the three categories of ground-to-cloud features in Microsoft SQL Server 2025: Built-in AI, best-in-class security and performance, and Fabric and Azure Arc connected.

Over the years, SQL Server has evolved well beyond a traditional relational database. With the latest release of SQL Server, we’re enabling customers to build AI applications deeply integrated with the SQL engine. SQL Server 2025 is becoming a vector database in its own right: it combines built-in filtering with vector search, delivers great performance, and is easy for developers to use through T-SQL.

AI built-in

This new version has AI built-in, simplifying AI application development and retrieval-augmented generation (RAG) patterns with secure, performant, and easy-to-use vector support, leveraging familiar T-SQL syntax. With this new capability, you can combine vectors with your SQL data for a hybrid AI vector search.
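A hybrid search combines an ordinary relational filter with a ranking by vector distance. The Python sketch below illustrates the pattern on in-memory rows; the product data, column names, and 2-dimensional embeddings are invented for the example, and the code is not the engine's implementation.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(rows, query_vec, category, max_price, k=3):
    """Filter rows on relational columns, then rank the survivors by vector distance."""
    candidates = [
        r for r in rows
        if r["category"] == category and r["price"] <= max_price
    ]
    candidates.sort(key=lambda r: cosine_distance(r["embedding"], query_vec))
    return [r["name"] for r in candidates[:k]]

# Hypothetical product rows with tiny embeddings.
products = [
    {"name": "trail shoe", "category": "shoes", "price": 80, "embedding": [0.9, 0.1]},
    {"name": "road shoe", "category": "shoes", "price": 120, "embedding": [0.95, 0.05]},
    {"name": "dress shoe", "category": "shoes", "price": 90, "embedding": [0.1, 0.9]},
    {"name": "trail jacket", "category": "jackets", "price": 70, "embedding": [0.9, 0.1]},
]

print(hybrid_search(products, [1.0, 0.0], category="shoes", max_price=100))
```

The road shoe is the closest vector overall but is excluded by the price filter, and the jacket is excluded by category, which is the point of combining both kinds of predicate in one query.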

Build AI applications with your enterprise database

SQL Server 2025 is an enterprise-ready vector database with built-in security and compliance, bringing enterprise AI to your data. It features a native vector store and index powered by DiskANN, a vector search technology using disk storage to efficiently find similar data points in large datasets. It also supports chunking efficiently and enables accurate data retrieval through semantic search. In this latest SQL Server version, flexible AI model management within the engine using Representational State Transfer (REST) interfaces allows you to use AI models from ground to cloud.

In addition, whether customers are working on data preprocessing, model training, or RAG patterns, extensible, low-code tools offer flexible model interfaces within the SQL engine, supported by T-SQL and external REST endpoints. These tools enhance developers’ ability to create various AI applications through seamless integration with popular AI frameworks like LangChain, Semantic Kernel, and Entity Framework Core.

Boost developer productivity

When building data-intensive applications such as AI applications, it’s critical to focus on extensibility, frameworks, and data enrichment to enhance developers’ productivity. We ensure SQL will provide a best-in-class experience for developers by incorporating features such as REST API support, GraphQL integration through Data API Builder, and Regular Expression enablement. Additionally, native JSON support enables developers to more effectively deal with frequently changing schema and hierarchical data, facilitating the creation of more dynamic applications. Overall, we’re enhancing SQL development to be more extensible, performant, and user-friendly. All functionalities are underpinned by the security provided by the SQL Server engine, making it a truly enterprise-ready platform for AI.

Best-in-class security and performance

SQL Server 2025 is an industry leader in database security and performance. Support for Microsoft Entra managed identities improves credential management, reduces potential vulnerabilities, and provides compliance and auditing capabilities. SQL Server 2025 introduces outbound authentication support for MSI (Managed Service Identity) for SQL Server enabled by Azure Arc.

We’re also introducing performance and availability enhancements, extensively battle-tested on Microsoft Azure SQL, to SQL Server. In the new version you can boost workload performance and reduce troubleshooting with enhanced query optimization and query performance execution. Optional Parameter Plan Optimization (OPPO) is designed to enable SQL Server to choose the optimal execution plan based on customer-provided runtime parameter values and to significantly reduce bad parameter sniffing problems that can exist in workloads. Persisted statistics on secondary replicas prevent the loss of statistics during a restart or failover, thereby avoiding potential performance degradation. Regarding query execution, the improvements in batch mode processing and columnstore indexing further establish SQL Server as a mission-critical database for analytical workloads.   

Optimized locking reduces lock memory consumption and minimizes blocking for concurrent transactions through Transaction ID (TID) Locking and Lock After Qualification (LAQ). This capability enables customers to increase uptime and enhance concurrency and scale for SQL Server applications. 

Change event streaming for SQL Server brings real-time application integration with event driven architectures, command query responsibility segregation, and real-time intelligence. This will add new database engine capabilities to capture and publish incremental changes to data and schema to a provided destination such as Azure Event Hubs and Kafka in near real-time.

Microsoft Fabric and Azure Arc connected

In traditional data warehouse and data lake scenarios, integrating all your data involves designing, monitoring, and managing complex ETL (Extract, Transform, Load) processes to transfer operational data from SQL Server. These traditional methods do not support real-time data transfer, resulting in latency that prevents the creation of real-time analytics. Microsoft Fabric offers comprehensive, integrated, and AI-enhanced data analytics services designed to meet modern requirements of analytical workloads. Mirrored SQL Server Database in Fabric is a fully managed, resilient process that simplifies SQL Server data replication to Microsoft OneLake in near real-time. Mirroring will enable customers to continuously replicate data from SQL Server databases running on Azure virtual machines or outside of Azure, serving online transaction processing (OLTP) or operational store workloads directly into OneLake in order to facilitate analytics and insights on the unified Fabric data platform.

Azure continues to be a critical component of SQL Server. With Azure Arc, SQL Server 2025 will continue to offer cloud capabilities that enable customers to better manage, secure, and govern their SQL estate at scale across on-premises and cloud. Capabilities like automatic patching, automatic backups, monitoring, and Best Practices Assessment offer customers more ways to streamline routine tasks and further enhance their business continuity. In addition, Azure Arc simplifies SQL Server licensing by offering a pay-as-you-go option, bringing flexibility and licensing visibility to our customers.

Sign up for the preview today

We’re currently onboarding customers and partners to SQL Server 2025 preview, in advance of general availability in the coming year. 

Register today to apply for the SQL Server 2025 Community Technology Preview (CTP)1 and stay informed about SQL Server 2025 updates.

Microsoft just announced the upcoming release of SQL Server Management Studio (SSMS) 21 Preview 1. This release integrates Microsoft Copilot capabilities into SSMS. The Copilot experience streamlines SQL development by offering real-time suggestions, code completions, and best practice recommendations. If you would like to take part and have an early hands-on experience with this new capability, please use this link to indicate your interest.


1 Some of the new capabilities covered in this blog may not be available in the first CTP version.

Announcing the retirement of SQL Server Stretch Database
http://approjects.co.za/?big=en-us/sql-server/blog/2024/07/03/announcing-the-retirement-of-sql-server-stretch-database/
Wed, 03 Jul 2024 16:00:00 +0000

In July 2024, SQL Server Stretch Database will be discontinued for SQL Server 2022, 2019, and 2017.

Ever since Microsoft introduced SQL Server Stretch Database in 2016, our guiding principles for such hybrid data storage solutions have always been affordability, security, and native Azure integration. Customers have indicated that they want to reduce maintenance and storage costs for on-premises data, with options to scale up or down as needed; to gain greater peace of mind from advanced security features such as Always Encrypted and row-level security; and to unlock value from warm and cold data stretched to the cloud using Microsoft Azure analytics services.

During recent years, Azure has undergone significant evolution, marked by groundbreaking innovations like Microsoft Fabric and Azure Data Lake Storage. As we continue this journey, it remains imperative to keep evolving our approach on hybrid data storage, ensuring optimal empowerment for our SQL Server customers in leveraging the best from Azure.

Retirement of SQL Server Stretch Database 

On November 16, 2022, the SQL Server Stretch Database feature was deprecated from SQL Server 2022. For in-market versions of SQL Server 2019 and 2017, we had added an improvement that allowed the Stretch Database feature to stretch a table to an Azure SQL Database. Effective July 9, 2024, the supporting Azure service, known as SQL Server Stretch Database edition, is retired. Impacted versions of SQL Server include SQL Server 2022, 2019, 2017, and 2016.  

In July 2024, SQL Server Stretch Database will be discontinued for SQL Server 2022, 2019, 2017, and 2016. We understand that retiring an Azure service may impact your current workload and use of Stretch Database. Therefore, we kindly request that you either migrate to Azure or bring your data back from Azure to your on-premises version of SQL Server. Additionally, if you’re exploring alternatives for archiving data to cold and warm storage in the cloud, we’ve introduced significant new capabilities in SQL Server 2022, leveraging its data virtualization suite.

The path forward 

SQL Server 2022 supports a statement named CREATE EXTERNAL TABLE AS SELECT (CETaS). It can help customers archive and store cold data in Azure Storage. The data is stored in Parquet, an open-source file format that handles complex data in large volumes well. With its efficient data compression, Parquet is one of the most cost-effective data storage options. Using OneLake shortcuts, customers can then leverage Microsoft Fabric to realize cloud-scale analytics on archived data.

Our priority is to empower our SQL Server customers with the tools and services that leverage the latest and greatest from Azure. If you need assistance in exploring how Microsoft can best empower your hybrid data archiving needs, please contact us.

New solution FAQs

What’s CETaS? 

CETaS creates an external table and then exports, in parallel, the results of a Transact-SQL SELECT statement.

  • Azure Synapse Analytics and Analytics Platform System support Hadoop or Azure Blob Storage.
  • SQL Server 2022 (16.x) and later versions support CETaS to create an external table and then export, in parallel, the result of a Transact-SQL SELECT statement to Azure Data Lake Storage Gen2, Azure Storage Account v2, and S3-compatible object storage. 
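The general shape of a CETaS statement can be sketched with a small Python helper that assembles the T-SQL text. This is an illustration under stated assumptions: the object names (`ext_orders_2019`, `archive_storage`, `parquet_format`), the location path, and the SELECT are hypothetical, and the sketch assumes the external data source and the Parquet file format objects have already been created on the server.

```python
import re

# Accept only simple, unquoted identifiers so the assembled text stays predictable.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_cetas(table, location, data_source, file_format, select_sql):
    """Assemble a CREATE EXTERNAL TABLE AS SELECT statement as a string."""
    for name in (table, data_source, file_format):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        f"WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n"
        f")\n"
        f"AS\n"
        f"{select_sql};"
    )

# Hypothetical archive of cold rows from an orders table.
statement = build_cetas(
    table="ext_orders_2019",
    location="/archive/orders/2019/",
    data_source="archive_storage",
    file_format="parquet_format",
    select_sql="SELECT * FROM dbo.Orders WHERE OrderDate < '2020-01-01'",
)
print(statement)
```

After the export succeeds, the archived rows can be removed from the source table and queried later through the external table.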

What is Fabric? 

Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. Fabric offers a comprehensive suite of services including Data Engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases.

With Fabric, you don’t need to assemble different services from multiple vendors. Instead, it offers a seamlessly integrated, user-friendly platform that simplifies your analytics requirements. Operating on a software as a service (SaaS) model, Fabric brings simplicity and integration to your solutions. 

Fabric integrates separate components into a cohesive stack. Instead of relying on different databases or data warehouses, you can centralize data storage with Microsoft OneLake. AI capabilities are seamlessly embedded within Fabric, eliminating the need for manual integration. With Fabric, you can easily transition your raw data into actionable insights for business users. 

What are OneLake shortcuts?

Shortcuts in OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise. All Fabric experiences and analytical engines can directly connect to your existing data sources such as Azure, Amazon Web Services (AWS), and OneLake through a unified namespace. OneLake manages all permissions and credentials, so you don’t need to separately configure each Fabric workload to connect to each data source. Additionally, you can use shortcuts to eliminate edge copies of data and reduce process latency associated with data copies and staging. 

Shortcuts are objects in OneLake that point to other storage locations. The location can be internal or external to OneLake. The location that a shortcut points to is known as the target path of the shortcut. The location where the shortcut appears is known as the shortcut path. Shortcuts appear as folders in OneLake and any workload or service that has access to OneLake can use them. Shortcuts behave like symbolic links. They’re an independent object from the target. If you delete a shortcut, the target remains unaffected. If you move, rename, or delete a target path, the shortcut can break. 
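The symbolic-link comparison can be tried out on an ordinary file system. The sketch below is only an analogy and does not call any OneLake or Fabric API. It assumes a POSIX system where creating symbolic links is permitted, and the folder names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target_folder")   # stands in for the target path
    shortcut = os.path.join(root, "shortcut")      # stands in for the shortcut path
    os.mkdir(target)
    os.symlink(target, shortcut, target_is_directory=True)

    # Deleting the link leaves the target untouched.
    os.unlink(shortcut)
    target_survives = os.path.isdir(target)

    # Recreate the link, then delete the target: the link is now dangling.
    os.symlink(target, shortcut, target_is_directory=True)
    os.rmdir(target)
    link_still_listed = os.path.islink(shortcut)   # the link object still exists
    link_resolves = os.path.exists(shortcut)       # but it no longer resolves

print(target_survives, link_still_listed, link_resolves)
```

On a POSIX system this prints `True True False`: the target outlives the deleted link, while a link whose target was removed remains in place but is broken.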

Learn more 


Microsoft Fabric

Bring your data into the era of AI

The post Announcing the retirement of SQL Server Stretch Database appeared first on Microsoft SQL Server Blog.

]]>
SQL Server Integration Services (SSIS) Change Data Capture Attunity feature deprecations http://approjects.co.za/?big=en-us/sql-server/blog/2024/02/28/sql-server-integration-services-ssis-change-data-capture-attunity-feature-deprecations/ Wed, 28 Feb 2024 16:00:00 +0000 This blog provides details to help support customers in modernizing to new solutions well in advance of this change.

The post SQL Server Integration Services (SSIS) Change Data Capture Attunity feature deprecations appeared first on Microsoft SQL Server Blog.

]]>
In December 2025, Microsoft will discontinue support for the Change Data Capture (CDC) components by Attunity and Change Data Capture (CDC) service for Oracle by Attunity of SQL Server Integration Services (SSIS). This blog provides details to help support customers in modernizing to new solutions well in advance of this change. Support will be discontinued for the following components:

  • Change Data Capture (CDC) components by Attunity
  • Change Data Capture (CDC) service for Oracle by Attunity

SQL Server Integration Services

Learn More

Customers using these two features are encouraged to modernize to Data Factory in Microsoft Fabric or Azure Data Factory. Customers can use the incremental data loading capability in Azure Data Factory. Azure Data Factory can be used for on-premises data sources with a self-hosted integration runtime and is fully compatible with all impacted versions of SQL Server.

Data Factory in Microsoft Fabric enables you to move and transform data from various sources to various destinations. It’s a managed cloud service designed specifically for handling complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.

If you need any assistance as you plan your CDC modernization, please contact Microsoft Support.

Learn more about Data Factory in Microsoft Fabric and Azure Data Factory:

Frequently Asked Questions

What’s Data Factory in Microsoft Fabric?

Data Factory in Microsoft Fabric is the next generation of Azure Data Factory, which provides cloud-scale data movement and data transformation services that allow you to solve the most complex ETL scenarios. It's intended to make your experience easy to use, powerful, and truly enterprise-grade. Data Factory empowers you with a modern data integration experience to ingest, prepare, and transform data from a rich set of data sources (for example, databases, data warehouses, Lakehouse, real-time data, and more). Whether you are a citizen or professional developer, you will be able to transform the data with intelligent transformations and leverage a rich set of activities. With Data Factory in Microsoft Fabric, we are bringing Fast Copy (data movement) capabilities to both dataflows and data pipelines. With Fast Copy, you can move data between your favorite data stores quickly. Most importantly, Fast Copy enables you to bring data to your Lakehouse and Data Warehouse in Microsoft Fabric for analytics.

What’s Azure Data Factory?

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. It is a fully managed, serverless data integration solution for ingesting, preparing, and transforming all your data at scale. With Azure Data Factory, you can visually integrate data sources using more than 90 built-in, maintenance-free connectors. The service enables you to create and schedule data-driven workflows, called pipelines, that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

You can use Azure Data Factory to access and integrate data from on-premises data sources. One way to do this is by using a self-hosted integration runtime, which acts as a bridge between your on-premises data sources and the cloud-based Azure Data Factory service. This allows you to create data-driven workflows that can ingest data from your on-premises data stores and move it to the cloud for further processing and transformation.

How fast can I ingest data in Fabric data pipelines?

Fabric Data Factory allows you to develop pipelines that maximize data movement throughput for your environment. These pipelines fully utilize the following resources:

  • Network bandwidth between the source and destination data stores.
  • Source or destination data store input/output operations per second (IOPS) and bandwidth.

This full utilization means you can estimate the overall throughput by measuring the minimum throughput available with the following resources:

  • Source data store
  • Destination data store
  • Network bandwidth in between the source and destination data stores

Meanwhile, we continuously work on innovations to boost the best possible throughput you can achieve. Today, the service can move a 1 TB TPC-DI dataset (Parquet files) into both a Fabric Lakehouse table and Data Warehouse within five minutes, moving 1 billion rows in under one minute. Please note that this performance is only a reference point obtained by running the testing dataset above. The actual throughput still depends on the factors listed previously. In addition, you can always multiply your throughput by running multiple copy activities in parallel, for example by using a ForEach loop.
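The estimate described in this answer, where overall throughput is bounded by the slowest of the source store, the destination store, and the network, can be sketched as a small calculation. The function names and the three sample rates below are hypothetical, and the sketch assumes 1 TB = 1,000,000 MB.

```python
def estimate_throughput(source_mb_s: float, destination_mb_s: float,
                        network_mb_s: float) -> float:
    """Overall throughput is limited by the slowest of the three resources."""
    return min(source_mb_s, destination_mb_s, network_mb_s)


def transfer_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Time needed to move size_mb at a sustained throughput."""
    return size_mb / throughput_mb_s


ONE_TB_IN_MB = 1_000_000  # assumption: decimal units

# Hypothetical measurements in MB/s, invented for illustration.
bottleneck = estimate_throughput(source_mb_s=5000, destination_mb_s=4000,
                                 network_mb_s=3500)
minutes = transfer_seconds(ONE_TB_IN_MB, bottleneck) / 60

# Sustained rate implied by the reference figure of 1 TB in five minutes.
reference_rate = ONE_TB_IN_MB / (5 * 60)

print(f"bottleneck: {bottleneck} MB/s, 1 TB in about {minutes:.1f} minutes")
print(f"reference run implies roughly {reference_rate:.0f} MB/s")
```

With these sample numbers the network is the limiting resource, so raising source or destination IOPS alone would not shorten the copy.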

Where can I find more training resources to get started?



]]>
Azure Data Studio 1.41 release http://approjects.co.za/?big=en-us/sql-server/blog/2023/01/25/azure-data-studio-1-41-release/ Wed, 25 Jan 2023 18:30:00 +0000 A new release of Azure Data Studio to share—introducing 1.41.

The post Azure Data Studio 1.41 release appeared first on Microsoft SQL Server Blog.

]]>
We are less than one month into 2023 and already have a new release of Azure Data Studio to share—introducing 1.41! With this release, we migrated to a new authentication library, made improvements based on user requests and feedback, and addressed a slew of existing issues that had been logged by users—including some that were really old. We would like to express our gratitude to the community for creating issues in GitHub, and for engaging with the engineering team when more information was needed. To those users that provided logs or more detail about their environment and the problem: thank you. We often need additional details to pinpoint the root cause of an issue, and we can do that faster thanks to your help. We will continue to engage with users as we improve the reliability of Azure Data Studio and add new features throughout 2023.

Azure Data Studio

A modern open-source, cross-platform hybrid data analytics tool designed to simplify the data landscape.


Connectivity

The migration from the Active Directory Authentication Library (ADAL) to the Microsoft Authentication Library (MSAL) was a significant undertaking by the team. It was necessary because ADAL support ends in June of this year, and the move brings multiple benefits for environments using Azure Active Directory (AAD). AAD users should notice an improved and more reliable experience, particularly around token refresh and connection stability. The migration also helped us fix an issue in the MySQL extension for AAD.

Additional changes include improved loading of Azure resources and new Dedicated SQL Pools and Azure Synapse Analytics nodes in the Azure tree. Azure Data Studio 1.41 also provides the ability to customize the name of firewall rules for Azure SQL and adds support for connecting to a server alias (versus a server name).

If you have applications that use ADAL, please see the Migrate applications to the Microsoft Authentication Library (MSAL) page for more information.

Object explorer

A new area of focus in this release is Object Explorer (OE), and this will continue to be an area we improve upon in the next few releases. Those with serverless Azure SQL previously reported issues with folders not expanding correctly, and with databases being brought online (thus incurring costs) when it was not expected. Other users noted that expanding OE timed out after 45 seconds. We have addressed all these issues in this release, in addition to adding support for Ledger Views.

Query results

The query results window got a fair bit of attention in this release as we work through the backlog of open issues. First, we introduced a new configuration option to show or hide the action bar in the query results view. The Query Editor > Results: Show Action Bar option can be found in Settings (Ctrl + ,) if you search for Show Action Bar. By default, the action bar is shown in the query results pane, as seen in the screenshot below:

Query Results window with Action Bar text and arrow pointing to the action bar on the right side of the screen.

There are also improvements around opening JSON files and the visibility of the horizontal scroll bar in the query results pane. Azure Data Studio 1.41 now correctly handles line breaks in cells when copying from the results grid and pasting to an editor, and the auto-resizing of columns in the output pane has been updated to better display column contents. Finally, cell selection and navigation in the results grid have been enhanced, and we introduced additional summary details when selecting multiple cells in the results window:

Query Results window with seven cells highlighted and average, count and sum information displayed on the bottom toolbar.

Extensions

Multiple teams have been working on updates to various extensions available from Azure Data Studio.  For SQL Projects, we have improved the experience of finding projects by providing a dropdown that lists saved projects, rather than requiring users to browse to their location. We had reports that differences in schema compare were not highlighted correctly, and that problem has been fixed.

Users of the SQL Migration extension will see an improvement in the migration process as we better support migrations to specific subscriptions (such as government), and the extension now includes the Premium Series Memory Optimized SQL MI SKU as a recommendation where appropriate.

MongoDB and Microsoft Azure continue to build on their partnership by introducing an extension for MongoDB Atlas and Azure Data Studio on the Azure Marketplace. This extension is available in Public Preview as of today, Wednesday, January 25, 2023. Azure Data Studio is a modern open-source, cross-platform hybrid data analytics tool designed to simplify your data landscape, and customers can use it to work with data stored in one or more Azure data services. MongoDB Atlas on Azure provides a fully managed solution for MongoDB in the cloud, and you can now connect to and query data on MongoDB Atlas right from Azure Data Studio. This allows you to interact with data on MongoDB Atlas alongside other data services and provides a unified view of your data estate. If you are an Azure customer who is curious about building applications with MongoDB Atlas and wants to amplify your integrated experience inside Azure Data Studio, try Pay-As-You-Go Atlas on the Azure Marketplace today!

MongoDB Atlas extension landing page in Azure Data Studio.

With this 1.41 release, the Polyglot Notebooks extension will be removed from the Azure Data Studio Extension Marketplace. For a polyglot notebooks experience, we recommend using the Polyglot Notebooks extension in Visual Studio Code.

Odds and ends

Continuing on our path of adding support for arm64, we now include support for arm64 on Windows. Whether you run macOS or Windows, Azure Data Studio 1.41 can now take advantage of arm64, resulting in improved performance.

We are pleased to see users embracing Table Designer and Query Plan Viewer, two features that became generally available (GA) in the November release. In 1.41 we fixed an issue related to opening Table Designer for Ledger tables, and one related to creating a table when another table with the same name already exists.

There were also two requests specific to Query Plan Viewer that got attention in this release. When saving query plan files from Azure Data Studio, we now incrementally append a number to the end of the file for unique naming, and we’ve altered the default folder location when saving plans for a more consistent experience.

Lastly, we had previously announced that we were removing Big Data Cluster functionality from Azure Data Studio. This removal has been delayed until a later release.

Looking forward

We are already at work on the next release of Azure Data Studio and are making plans for what we want to accomplish in 2023. You can expect that we will continue to review backlog issues and address them as they relate to an existing area of focus. We have more changes coming related to the connection dialog and object explorer, and you will also see improvements in user management. Finally, if you see a comment on an issue you opened–whether recent or ages ago–please feel free to respond and provide more information if you are able. 


]]>
The path forward for SQL Server analytics http://approjects.co.za/?big=en-us/sql-server/blog/2022/02/25/the-path-forward-for-sql-server-analytics/ Fri, 25 Feb 2022 18:00:00 +0000 Today, we are announcing changes to SQL Server analytics.

The post The path forward for SQL Server analytics appeared first on Microsoft SQL Server Blog.

]]>
Today, we are announcing changes to SQL Server analytics. This post covers:

  • Customer feedback
  • Retirement of SQL Server 2019 Big Data Clusters
  • Retirement of PolyBase scale-out groups
  • Path forward

Customer feedback

We continue to see increased migration to the cloud, with analytical workloads leading that charge.

Customers have indicated that analytics in the cloud best aligns to employee skillsets, deployment simplicity and manageability, and cloud flexibility and scalability.

When we first introduced cloud analytics in 2017, many were still investing in on-premises analytical workloads. Today, we offer a wealth of cloud-based services that provide users with similar functionality, including Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure SQL, and Azure Machine Learning.

According to the Gartner® 2020 Data and Analytics survey:

  • Analytics, BI, and data science are the most common use cases being accelerated to the cloud due to COVID-19. The organization needs faster delivery of analytics insights to take action. Cloud, with its fast provision and prototyping ability, is an ideal place to start analytics and data science initiatives to nimbly react to the fast pace of changes.¹
  • In the 2020 Gartner Data and Analytics Cloud survey, 74 percent of organizations use or plan to use cloud for analytics, BI and data science.¹

Retirement of SQL Server Big Data Clusters

Today, we are announcing the retirement of SQL Server 2019 Big Data Clusters. All existing users of SQL Server 2019 with Software Assurance will be fully supported on the platform for the next three years, through February 28, 2025. This software will continue to be maintained through SQL Server cumulative updates until that time. In the latest version of SQL Server, we are engineering the best mix of on-premises and in-cloud relational workloads and connectivity to Azure Synapse Analytics for advanced analytics in a flexible, scalable, and integrated environment. Please see below and read our documentation on SQL Server Big Data Clusters to learn more.

Changes to PolyBase support in SQL Server

Today, we are announcing the retirement of PolyBase scale-out groups in Microsoft SQL Server. Scale-out group functionality will be removed from the product in SQL Server 2022. In-market SQL Server 2019, 2017, and 2016 will continue to support the functionality to the end of support for those products.

PolyBase data virtualization will continue to be fully supported as a scale-up feature in SQL Server.

Secondly, Cloudera (CDP) and Hortonworks (HDP) external data sources will also be retired for all in-market versions of SQL Server and will not be included in SQL Server 2022. Moving forward, support for external data sources will be limited to product versions in mainstream support by the respective vendor. You are encouraged to use the new object storage integration functionality available in SQL Server 2022.

In SQL Server 2022, users will need to configure their external data sources to use new connectors when connecting to Azure Storage. The table below summarizes the change:

External Data Source | From | To
Azure Blob Storage | wasb[s] | abs
ADLS Gen 2 | abfs[s] | adls
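When many external data source definitions need updating, the prefix change in the table can be applied mechanically. The sketch below rewrites only the prefix of a location string. The function name `migrate_location` and the sample account and container names are hypothetical, and the sketch assumes the rest of the location can stay as it is. Credentials and other data source options may also need to change for the new connectors, which is not covered here.

```python
# Old prefix -> new prefix, as listed in the table above.
PREFIX_MAP = {
    "wasb": "abs",
    "wasbs": "abs",
    "abfs": "adls",
    "abfss": "adls",
}


def migrate_location(location: str) -> str:
    """Swap a retired connector prefix for its SQL Server 2022 replacement.

    Locations that use any other prefix are returned unchanged.
    """
    scheme, separator, rest = location.partition("://")
    if not separator:
        raise ValueError(f"not a prefixed location: {location!r}")
    new_scheme = PREFIX_MAP.get(scheme.lower())
    if new_scheme is None:
        return location
    return f"{new_scheme}://{rest}"


# Hypothetical locations, invented for illustration.
print(migrate_location("wasbs://archive@contoso.blob.core.windows.net"))
print(migrate_location("abfss://lake@contoso.dfs.core.windows.net"))
```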

The path forward

If you wish to run analytics on-premises, SQL Server 2022 also provides important new capabilities, building upon its data virtualization suite of connectors by providing object storage integration over REST APIs. We will also continue to invest in the Spark SQL connector to ensure first-class connectivity from Apache Spark to all our SQL products. Additionally, we continue to invest in expanding hybrid capabilities with Azure Arc-enabled data services.

Integrating SQL Server with cloud analytics solutions is a critical capability, which is why we are introducing Azure Synapse Link for SQL Server 2022, the latest release of SQL Server, which will be generally available to purchase later this year. This is a major investment in helping you realize cloud-scale analytics in near real-time on your operational data.

Our priority is to empower you with the tools and services that ensure SQL Server integrates seamlessly into the world of analytic workloads in the cloud by blending operational, analytical, and virtual use cases in our flagship database engine. Please contact your Microsoft account manager if you need assistance in exploring how Microsoft can best empower your analytical needs.


¹Gartner Inc.: Use Cloud to Compose Analytics, BI and Data Science Capabilities for Reusability and Resilience, Julian Sun, Joao Tapadinhas, June 10, 2021.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. 


]]>
Meet us at SQLBits 2022 and level up as a data professional http://approjects.co.za/?big=en-us/sql-server/blog/2022/02/10/meet-us-at-sqlbits-2022-and-level-up-as-a-data-professional/ Thu, 10 Feb 2022 16:00:00 +0000 We are excited to be the premium sponsor at this year’s SQLBits 2022, March 8 – 12, in-person in London and virtually.

The post Meet us at SQLBits 2022 and level up as a data professional appeared first on Microsoft SQL Server Blog.

]]>
It has been over two years since we have had the opportunity to meet face-to-face with our data community at a large event and we miss it. From hallway conversations to the energy that comes from solving problems and helping people understand complex concepts, we cannot wait to teach, meet and greet everyone. This is why we are excited to be the premium sponsor at this year’s SQLBits 2022, March 8 – 12, in-person in London and virtually.

As the lead sponsor, we will deliver content including the keynote, five full-day training days, and over fifty general sessions. With so many opportunities to educate, we are bringing the full Azure data team including folks from across the data platform, such as SQL Server, Azure SQL, Cosmos DB, Azure Purview, Azure Synapse Analytics, and Power BI.

Start the week with my team for two day-long training sessions where you will have a unique chance to work directly with Microsoft engineering:

The Hands-on Azure SQL Workshop on March 8 will help you translate your existing SQL Server skills to Azure SQL. Bring your laptop and get ready to learn hands-on. You will gain a foundational knowledge of what to use when, as well as how to configure, monitor, and troubleshoot the “meat and potatoes” of SQL Server in Azure: security, performance, and availability.

Migrate SQL Server to Azure on Wednesday, March 9 will help you migrate your SQL Server environments to Azure. In this session, the Microsoft engineering team will show you everything you need to know, including the tools and knowledge you need to make your migrations seamless, cost-efficient, and optimized for speed.

Other training sessions cover topics such as Azure SQL Database, Synapse Analytics, and Power BI.

All speaker proceeds from these sessions will be given back to a local charity.

The SQLBits event theme this year is Video Games—and in the “Level Up With Azure Data” keynote, Buck Woody has asked me to come talk about SQL Server 2022 and Azure Data. He assures me I will have help with some surprise guests so it should be interesting. It is always a fun keynote when Buck and I are on stage, and this year you really do not want to miss it!

You also have the opportunity to attend the Microsoft general sessions to learn about the entire Azure data platform.

Take a look at some of the learning available at SQLBits 2022

Unified Data Governance with Azure Purview | Gaurav Malhotra, Evangeline White
What’s New in Azure SQL MI | Niko Neugebauer
The fundamentals of building a lakehouse with Synapse | Luke Moloney
SQL Server in Azure Virtual Machine Reimagined | Pam Lahoud
Microsoft Database Innovations | Anna Hoffman
Azure Arc-Enabled Data Services | Jes Schultz, Buck Woody
Azure SQL Database customer success stories for IoT workloads | Silvano Coriani
Azure SQL availability and resiliency | Emily Lisa
Microsoft SQL Server 2022 Deep Dive (two parts) | Pedro Lopes
Modernize your Oracle workloads to Azure Data | Alexandra Ciortea
Empowering every individual with Power BI | Mohammad Ali, Patrick LeBlanc
AMA with the Microsoft Engineering team | Hosted by Bob Ward; “Rockstars” of the engineering team

See all the opportunities to engage with Microsoft engineering by heading over to our blog on Microsoft Tech Community, Ready for SQLBits 2022. And don’t forget to stop by our booth, where you can get your questions answered by members of the Microsoft team.

SQLBits is a marathon of top-quality training from global specialists, with two days of full-day training sessions and three days of general sessions. As always with SQLBits, Saturday, March 12 is free to attend. Meet with community leaders sharing their real-world experience and Microsoft product teams providing deep insights into innovations that meet your needs.

Register today for SQLBits 2022

Join Microsoft at this hybrid event for the latest on the data platform and a chance to see whether Buck Woody or I have the best arcade game skills!

Register to attend, and we’ll see you there, in-person, or virtually!


]]>
PASS Data Community Summit keynote: a bridge to a new universe http://approjects.co.za/?big=en-us/sql-server/blog/2021/11/08/pass-data-community-summit-keynote-a-bridge-to-a-new-universe/ Mon, 08 Nov 2021 18:00:40 +0000 It is almost time for PASS Data Community Summit 2021, a free online conference for the Microsoft data platform professional.

The post PASS Data Community Summit keynote: a bridge to a new universe appeared first on Microsoft SQL Server Blog.

]]>
It is almost time for PASS Data Community Summit 2021, a free online conference for the Microsoft data platform professional. The conference, hosted by Redgate, will include the latest SQL Server and Azure data innovations, practical training, and networking to empower you to transform your career and your organization. This year’s event is coming to you online for free from November 8 – 12, 2021, and we will continue the tradition of a Microsoft day one keynote.

Deliver faster performance than ever before with SQL Server and Azure

Hear directly from Microsoft’s Rohan Kumar and senior Microsoft engineering leaders during the day one kick-off keynote as they take you on a journey to a new universe shaped by our past—and built to take us into a limitless future. The cloud has created a whole new universe and advancements in Microsoft data products and services are your bridge.

You’ll see how you can use your existing SQL Server and Azure skills, and learn about new tools and platforms available from Microsoft to deliver faster performance than ever before. You’ll see how to shape your data so you can harness its power to find a new galaxy of insights, answers, and predictions. And you will hear about new innovations that continue Microsoft’s rich heritage of data integrity and governance.

Additionally, in the special on-demand keynote, Microsoft Azure Data CTO Raghu Ramakrishnan and team will share a technical keynote and demos showing Azure Purview and SQL.

Register for the PASS Data Community Summit

Don’t miss this opportunity to see how Microsoft is uniquely positioned to provide you with an end-to-end data platform seamlessly integrating limitless database scale and performance, unmatched analytics and intelligence, and unified data governance.

After the keynotes, ground your learning with in-depth training in one of more than two dozen sessions Microsoft will be delivering. Hear the latest from the Engineering teams who develop the tools you use every day. After your sessions, don’t forget to visit the virtual exhibit hall where you can connect with our team across SQL Server 2022, Azure SQL, Azure Synapse Analytics, Microsoft Power BI, Azure Arc, and more.

Register for PASS Data Community Summit today.


]]>
What’s new with SQL Server Big Data Clusters—CU13 Release http://approjects.co.za/?big=en-us/sql-server/blog/2021/10/06/whats-new-with-sql-server-big-data-clusters-cu13-release/ Wed, 06 Oct 2021 15:00:09 +0000 Today, we’re proud to announce the release of the latest cumulative update, CU13, for SQL Server Big Data Clusters which includes important changes and capabilities.

The post What’s new with SQL Server Big Data Clusters—CU13 Release appeared first on Microsoft SQL Server Blog.

]]>
SQL Server Big Data Clusters (BDC) is a capability brought to market as part of the SQL Server 2019 release. Big Data Clusters extends SQL Server’s analytical capabilities beyond in-database processing of transactional and analytical workloads by uniting the SQL engine with Apache Spark and Apache Hadoop to create a single, secure, and unified data platform. It runs exclusively on Linux containers orchestrated by Kubernetes and can be deployed in multiple cloud providers or on-premises.

Today, we’re proud to announce the release of the latest cumulative update, CU13, for SQL Server Big Data Clusters which includes important changes and capabilities:

  • Hadoop Distributed File System (HDFS) distributed copy capabilities through azdata
  • Apache Spark 3.1.2
  • SQL Server Big Data Clusters runtime for Apache Spark release 2021.1
  • Password rotation for Big Data Cluster’s auto-generated Active Directory service accounts during BDC deployment
  • Optional parameter to enable Advanced Encryption Standard (AES) on the automatically generated AD accounts

Major improvements in this update are highlighted below, along with resources for you to learn more and get started.

HDFS distributed copy capabilities through azdata

Hadoop HDFS DistCP is a command-line tool that enables high-performance distributed data copy between HDFS clusters. In SQL Server Big Data Clusters CU13, we are surfacing the capability of distcp through the new azdata bdc hdfs distcp command to enable distributed data copy between Big Data Clusters. This enables data migration scenarios between SQL Server Big Data Clusters, supporting both secure and non-secure cluster deployment configurations.

For more information, see:

Apache Spark 3.1.2

Up to cumulative update 12, Big Data Clusters relied on the Apache Spark 2.4 line, which reached its end of life in May 2021. Consistent with our continuous improvement commitment to the Big Data and Machine Learning capabilities of the Apache Spark engine, CU13 brings in the current release of Apache Spark, version 3.1.2.

This new version of Apache Spark brings significant performance benefits on big data processing workloads. Using the reference TPC-DS 10 TB workload in our tests, we were able to reduce runtime from 4.19 hours to 2.96 hours, a 29.36 percent improvement achieved just by switching engines while using the same hardware and configuration profiles, with no additional application optimizations. The mean improvement in individual query runtime is 36 percent.

Individual TPC-DS 10 TB query runtimes between Spark 2.4 and Spark 3.1. Chart shows that average runtimes across all queries are 30 percent lower, highlighting the benefits of using Spark 3.1 with CU13.
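The 29.36 percent figure follows directly from the two total runtimes quoted above. The sketch below reproduces that arithmetic and uses two made-up query timings to show why a mean of per-query improvements can differ from the improvement in total runtime. The `sample_queries` values are hypothetical and are not taken from the benchmark.

```python
def percent_improvement(before: float, after: float) -> float:
    """Relative runtime reduction, as a percentage of the original runtime."""
    return (before - after) / before * 100.0


# Total runtimes in hours, as quoted in the post.
total_gain = percent_improvement(4.19, 2.96)
print(f"total runtime improvement: {total_gain:.2f} percent")

# Hypothetical (before, after) runtimes in seconds for two queries.
sample_queries = [(100.0, 80.0), (10.0, 5.0)]

# Mean of the per-query improvements: every query counts equally.
mean_gain = sum(percent_improvement(b, a) for b, a in sample_queries) / len(sample_queries)

# Improvement of the summed runtime: long-running queries dominate.
overall_gain = percent_improvement(sum(b for b, _ in sample_queries),
                                   sum(a for _, a in sample_queries))

print(f"mean per-query: {mean_gain:.1f} percent, overall: {overall_gain:.1f} percent")
```

In the made-up sample, the short query improves the most, so the per-query mean (35.0 percent) is higher than the overall reduction (22.7 percent). The same effect may explain part of the gap between the 36 percent and 29.36 percent figures reported here.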

Spark 3 is a major release and, as such, contains breaking changes. Following the established best practice in the SQL Server universe, perform a side-by-side deployment of SQL Server Big Data Clusters to validate your current workload with Spark 3 before upgrading. You can use the new azdata HDFS distributed copy capability to copy the subset of your data needed to validate this workload. For more information, see the following articles to help you assess your scenario before upgrading to the CU13 release:

SQL Server Big Data Clusters runtime for Apache Spark release 2021.1

With this release of SQL Server Big Data Clusters, we doubled down on our commitment to release cadence, binary compatibility, and consistency of experiences for data engineers and data scientists through the SQL Server Big Data Clusters runtime for Apache Spark initiative.

The SQL Server Big Data Clusters runtime for Apache Spark is a consistent versioned block of programming language distributions, engine optimizations, core libraries, and packages for Apache Spark.

Here is a summary of the SQL Server Big Data Clusters runtime for Apache Spark release 2021.1 shipped with SQL Server Big Data Clusters CU13:

  • Apache Spark 3.1.2
  • Scala 2.12 for Scala Spark
  • Python 3.8 for PySpark
  • Microsoft R Open 3.5.2 for SparkR and sparklyr

For more information on all included packages and how to use them, see:

Password rotation for Big Data Cluster’s Active Directory service accounts

When a big data cluster is deployed with Active Directory (AD) integration for security, SQL Server creates AD accounts and groups during the deployment. For further information, see auto-generated active directory objects.

Security-sensitive customers usually require hardening measures such as password expiration policies, which let the administrator set user passwords either to never expire or to expire after a certain number of days. Previously, SQL Server Big Data Clusters deployments required rotating the passwords of those auto-generated Active Directory objects manually.

With SQL Server Big Data Clusters CU13, we are releasing the azdata bdc rotate command, which rotates passwords for all auto-generated accounts except the domain service account (DSA). To update the DSA password for SQL Server Big Data Clusters, we are releasing a dedicated operational notebook.

Enable Advanced Encryption Standard (AES) on the automatically generated AD accounts

Today’s enterprise environments face far more challenges than they used to. Using secure, encrypted connections when authenticating with Kerberos significantly lowers the risk of attacks such as Kerberoasting, a type of attack that targets service accounts in Active Directory. Starting with SQL Server Big Data Clusters CU13, we’re enabling Advanced Encryption Standard (AES) support on the auto-generated AD accounts: users can set an optional boolean parameter in the BDC deployment profile to indicate that these AD accounts support Kerberos AES 128-bit and 256-bit encryption.
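To illustrate what setting a boolean in the deployment profile might look like, here is a minimal sketch that patches a profile loaded as a Python dictionary. The post does not name the parameter. The key `enableAES` under `security.activeDirectory` is an assumption here and should be confirmed against the deployment documentation. The sample profile contents, including the domain name, are placeholders.

```python
import json


def enable_aes(profile: dict) -> dict:
    """Return a copy of a deployment profile with the AES flag switched on."""
    updated = json.loads(json.dumps(profile))  # deep copy via a JSON round trip
    ad_section = updated.setdefault("security", {}).setdefault("activeDirectory", {})
    ad_section["enableAES"] = True  # assumed key name; verify before use
    return updated


# Placeholder profile fragment, for illustration only.
profile = {"security": {"activeDirectory": {"domainDnsName": "contoso.local"}}}

patched = enable_aes(profile)
print(json.dumps(patched, indent=2))
```

The function returns a modified copy and leaves the input unchanged, so the original profile stays available for comparison or rollback.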

For more information, see:

Ready to learn more?

Check out the SQL Server Big Data Clusters CU13 release notes to learn more about all the improvements available with the latest update. For a technical deep-dive on Big Data Clusters, read the documentation and visit our GitHub repository.

Follow the instructions on our documentation page to get started and deploy Big Data Clusters.

The post What’s new with SQL Server Big Data Clusters—CU13 Release appeared first on Microsoft SQL Server Blog.

]]>