{"id":34478,"date":"2021-02-16T09:00:59","date_gmt":"2021-02-16T17:00:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?p=34478"},"modified":"2024-01-22T22:51:28","modified_gmt":"2024-01-23T06:51:28","slug":"whats-new-with-sql-server-big-data-clusters","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2021\/02\/16\/whats-new-with-sql-server-big-data-clusters\/","title":{"rendered":"What\u2019s new with SQL Server Big Data Clusters"},"content":{"rendered":"
SQL Server Big Data Clusters<\/a> (BDC) is a new capability brought to market as part of the SQL Server 2019 release. BDC extends SQL Server\u2019s analytical capabilities beyond in-database processing of transactional and analytical workloads by uniting the SQL engine with Apache Spark and Apache Hadoop to create a single, secure, and unified data platform. BDC runs exclusively on Linux containers orchestrated by Kubernetes and can be deployed on multiple cloud providers or on-premises.<\/p>\n Today, we\u2019re announcing the release of the latest cumulative update (CU9) for SQL Server Big Data Clusters, which includes important capabilities:<\/p>\n This announcement highlights some of the major improvements, provides additional context to better understand the design behind these capabilities, and points you to relevant resources to learn more and get started.<\/p>\n SQL Server Big Data Clusters, a feature released as part of SQL Server 2019, is a data platform for operational and analytical workloads. We are announcing new configuration management functionality as part of today\u2019s CU9 release. Workload requirements are constantly changing, and these enhancements will help customers ensure that their Big Data Cluster is always prepared for their needs.<\/p>\n Configuration management is the ability to alter or tune various parts of the Big Data Cluster after deployment and to give users visibility into the cluster\u2019s configuration. This allows administrators to tune the Big Data Cluster to meet their workloads\u2019 needs. Whether an administrator wants to turn on SQL Agent, define the baseline resources for their organization\u2019s Spark jobs, or see which settings are configurable at each scope, configuration management is the one-stop solution for these needs.<\/p>\n To enable this functionality, we are adding new commands to the azdata command-line interface (CLI). 
The azdata CLI, an interface to manage a BDC, now includes post-deployment configuration functionality to set, diff, and apply configuration settings. Customers can configure settings at the cluster, service, and resource scopes and then apply the pending changes. After applying pending configuration changes, customers can monitor the process through azdata or Azure Data Studio. Once the update is completed, the Big Data Cluster is ready for the next workload.<\/p>\n Learn more and get started with configuration management<\/a>.<\/p>\n Data engineers and data scientists often want to experiment with and use a variety of different libraries and packages as part of their workflows. Each language has its own mechanism for this: importing from Maven, installing from the Python Package Index (PyPI) or conda, or installing from the Microsoft R Application Network (MRAN). Before today, customers could import JARs from Maven or reference custom packages stored in Hadoop Distributed File System (HDFS) through Spark job configurations.<\/p>\n Starting in CU9, data engineers and data scientists have added flexibility for their PySpark jobs through job-level virtual environments. They can easily configure a conda virtual environment and get to work with their favorite Python libraries.<\/p>\n Learn how to configure a job-level Spark environment<\/a>.<\/p>\n In SQL Server Big Data Clusters CU8, we introduced a comprehensive encryption at rest feature set that focused on system-managed keys. This brought application-level encryption capabilities to all data stored in the platform, in both SQL Server and HDFS. At that time, the HDFS experience for administrators centered on Azure Data Studio notebooks, which were used to control all aspects of the feature. Starting with CU9, in addition to expanding the notebook experience, we are enabling HDFS encryption zones and HDFS key management through azdata. 
This lets HDFS administrators automate encryption at rest administrative tasks, a highly desirable capability that is consistent with the rest of the SQL Server Big Data Clusters platform.<\/p>\n To learn more about the new notebooks and the new azdata commands, visit the release notes<\/a>.<\/p>\n Check out the SQL Server CU9 release notes<\/a> for Big Data Clusters to learn more about all of the improvements available with the latest update. For a technical deep-dive on Big Data Clusters, read the documentation and visit our GitHub repository<\/a>.<\/p>\n\n
Configuring SQL Server Big Data Clusters to meet your business needs<\/h2>\n
Spark job library management<\/h2>\n
Improving the experience on encryption at rest<\/h2>\n
Ready to learn more?<\/h2>\n