Data warehousing - Microsoft SQL Server Blog

Announcing the retirement of SQL Server Stretch Database

Debbi Lyons — Wed, 03 Jul 2024 16:00:00 +0000

Ever since Microsoft introduced SQL Server Stretch Database in 2016, our guiding principles for such hybrid data storage solutions have always been affordability, security, and native Azure integration. Customers have indicated that they want to reduce maintenance and storage costs for on-premises data, with options to scale up or down as needed, greater peace of mind from advanced security features such as Always Encrypted and row-level security, and they seek to unlock value from warm and cold data stretched to the cloud using Microsoft Azure analytics services.

During recent years, Azure has undergone significant evolution, marked by groundbreaking innovations like Microsoft Fabric and Azure Data Lake Storage. As we continue this journey, it remains imperative to keep evolving our approach on hybrid data storage, ensuring optimal empowerment for our SQL Server customers in leveraging the best from Azure.

Retirement of SQL Server Stretch Database

On November 16, 2022, the SQL Server Stretch Database feature was deprecated from SQL Server 2022. For in-market versions of SQL Server 2019 and 2017, we had added an improvement that allowed the Stretch Database feature to stretch a table to an Azure SQL Database. Effective July 9, 2024, the supporting Azure service, known as SQL Server Stretch Database edition, is retired. Impacted versions of SQL Server include SQL Server 2022, 2019, 2017, and 2016.

In July 2024, SQL Server Stretch Database will be discontinued for SQL Server 2022, 2019, 2017, and 2016. We understand that retiring an Azure service may impact your current workload and use of Stretch Database. Therefore, we kindly request that you either migrate to Azure or bring their data back from Azure to your on-premises version of SQL Server. Additionally, if you’re exploring alternatives for archiving data to cold and warm storage in the cloud, we’ve introduced significant new capabilities in SQL Server 2022, leveraging its data virtualization suite.

SQL Server in Microsoft Fabric? Use CETaS to move data!

The path forward

SQL Server 2022 supports a concept named CREATE EXTERNAL TABLE AS SELECT (CETaS). It can help customers archive and store cold data to Azure Storage. The data will be stored in an open source file format named Parquet. It operates well with complex data in large volumes. With its performant data compression, it turns out to be one of the most cost-effective data storage solutions. Using OneLake shortcuts, customers then can leverage Microsoft Fabric to realize cloud-scale analytics on archived data.

Our priority is to empower our SQL Server customers with the tools and services that leverage the latest and greatest from Azure. If you need assistance in exploring how Microsoft can best empower your hybrid data archiving needs, please contact us.

New solution FAQs

What’s CETaS?

Creates an external table and then exports, in parallel, the results of a Transact-SQL SELECT statement.

Azure Synapse Analytics and Analytics Platform System support Hadoop or Azure Blob Storage.
SQL Server 2022 (16.x) and later versions support CETaS to create an external table and then export, in parallel, the result of a Transact-SQL SELECT statement to Azure Data Lake Storage Gen2, Azure Storage Account v2, and S3-compatible object storage.

What is Fabric?

Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. Fabric offers a comprehensive suite of services including Data engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases.

With Fabric, you don’t need to assemble different services from multiple vendors. Instead, it offers a seamlessly integrated, user-friendly platform that simplifies your analytics requirements. Operating on a software as a service (SaaS) model, Fabric brings simplicity and integration to your solutions.

Fabric integrates separate components into a cohesive stack. Instead of relying on different databases or data warehouses, you can centralize data storage with Microsoft OneLake. AI capabilities are seamlessly embedded within Fabric, eliminating the need for manual integration. With Fabric, you can easily transition your raw data into actionable insights for business users.

What is OneLake shortcuts?

Shortcuts in OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise. All Fabric experiences and analytical engines can directly connect to your existing data sources such as Azure, Amazon Web Services (AWS), and OneLake through a unified namespace. OneLake manages all permissions and credentials, so you don’t need to separately configure each Fabric workload to connect to each data source. Additionally, you can use shortcuts to eliminate edge copies of data and reduce process latency associated with data copies and staging.

Shortcuts are objects in OneLake that point to other storage locations. The location can be internal or external to OneLake. The location that a shortcut points to is known as the target path of the shortcut. The location where the shortcut appears is known as the shortcut path. Shortcuts appear as folders in OneLake and any workload or service that has access to OneLake can use them. Shortcuts behave like symbolic links. They’re an independent object from the target. If you delete a shortcut, the target remains unaffected. If you move, rename, or delete a target path, the shortcut can break.

Learn more

Microsoft Fabric

Bring your data into the era of AI

Explore solutions

The post Announcing the retirement of SQL Server Stretch Database appeared first on Microsoft SQL Server Blog.

PASS Data Community Summit keynote: a bridge to a new universe

SQL Server Team — Mon, 08 Nov 2021 18:00:40 +0000

It is almost time for PASS Data Community Summit 2021, a free online conference for the Microsoft data platform professional. The conference, hosted by Redgate, will include the latest SQL Server and Azure data innovations, practical training, and networking to empower you to transform your career and your organization. This year’s event is coming to you online for free from November 8 – 12, 2021, and we will continue the tradition of a Microsoft day one keynote.

Deliver faster performance than ever before with SQL Server and Azure

Hear directly from Microsoft’s Rohan Kumar and senior Microsoft engineering leaders during the day one kick-off keynote as they take you on a journey to a new universe shaped by our past—and built to take us into a limitless future. The cloud has created a whole new universe and advancements in Microsoft data products and services are your bridge.

You’ll see how you can use your existing SQL Server and Azure skills, and learn about new tools and platforms available from Microsoft to deliver faster performance than ever before. You’ll see how to shape your data so you can harness its power to find a new galaxy of insights, answers, and predictions. And you will hear about new innovations that continue Microsoft’s rich heritage of data integrity and governance.

Additionally, in the special on-demand keynote, Microsoft Azure Data CTO Raghu Ramakrishnan and team will share a technical keynote and demos showing Azure Purview and SQL.

Register for the PASS Data Community Summit

Don’t miss this opportunity to see how Microsoft is uniquely positioned to provide you with an end-to-end data platform seamlessly integrating limitless database scale and performance, unmatched analytics and intelligence, and unified data governance.

After the keynotes, ground your learning with in-depth training in one of more than two dozen sessions Microsoft will be delivering. Hear the latest from the Engineering teams who develop the tools you use every day. After your sessions, don’t forget to visit the virtual exhibit hall where you can connect with our team across SQL Server 2022, Azure SQL, Azure Synapse Analytics, Microsoft Power BI, Azure Arc, and more.

The post PASS Data Community Summit keynote: a bridge to a new universe appeared first on Microsoft SQL Server Blog.

New in Azure Synapse Analytics: CICD for SQL Analytics using SQL Server Data Tools

Kevin Ngo — Thu, 07 Nov 2019 17:00:08 +0000

At Microsoft Ignite 2019, we announced Azure Synapse Analytics, a major evolution of Azure SQL Data Warehouse. The same industry leading data warehouse now provides a whole new level of performance, scale, and analytics capabilities. One of these capabilities is SQL Analytics, which provides a rich set of enterprise data warehousing features.

Today we are announcing the general availability of the highest requested feature for SQL Analytics in Azure Synapse, SQL Server Data Tools (SSDT) database projects. This release includes support for SQL Server Data Tools with Visual Studio 2019 along with native platform integration with Azure DevOps providing built-in continuous integration and deployment (CI/CD) capabilities for enterprise level deployments. This announcement also comes with support for the Schema Compare extension in Azure Data Studio for SQL Analytics. You can now expect a frictionless development and deployment experience on any platform for your analytics solution.

Since announcing preview support for SQL Server Data Tools (SSDT), customers have been able to use popular SQL Server Data Tools features such as Schema Compare, build, and publish for local development of their data warehouse. Although this has helped customers accelerate project development, an automated build, test, and deployment infrastructure is still critical for continuous integration and deployment (CI/CD) scenarios. Without the native integration with Azure DevOps, customers were still forced to manually write PowerShell and TSQL scripts integrated with Azure DevOps for an automated release process.

With SQL Server Data Tools generally available and native Azure DevOps support, you can now set up stable release pipelines without any custom code, and changes to your data warehouse model can be safely and automatically promoted across development, testing, and production environments. Preview customers such as T-Mobile will now be able to accelerate their feature development with Azure Synapse.

“In our current environment, we would have needed hundreds of custom scripts to validate and promote changes across our test and production environments. We’re excited to now simply use SSDT, MSBuild, and the Publish task in Azure DevOps to deploy and release features to production on a consistent and faster cadence.” – Anthony Sabol, Director, Reporting & Analytics at T-Mobile.

Integrate with Microsoft Azure Repos for continuous integration

Data engineers and developers can easily integrate their SQL Server Data Tools database projects with Microsoft Azure Repos.

Configure continuous deployment using Microsoft Azure Pipelines

Changes committed to source control in Azure Repos can automatically be pre-validated using MSBuild and promoted to target environments using Microsoft Azure Pipelines and the built-in SQL Analytics deployment task extension.

Cross platform support for Schema Compare with Azure Data Studio

Azure Data Studio is a cross-platform database tool that now allows you to compare the schema between two data warehouse definitions.

Next steps

Install Visual Studio 2019 for SQL Analytics to get started
Set up source control integration with SQL Analytics by using Azure Repos
Integrate a simple Azure Pipeline for continuous integration and deployment for SQL analytics capabilities within Azure Synapse
Learn more about Azure Data Studio SQL Compare
For feature requests and providing us feedback, please vote on our UserVoice

The post New in Azure Synapse Analytics: CICD for SQL Analytics using SQL Server Data Tools appeared first on Microsoft SQL Server Blog.

SQL Server 2017 achieves top TPC benchmarks for OLTP and DW on Linux and Windows

Victoria Nwobodo — Thu, 16 May 2019 18:00:25 +0000

Once again, SQL Server 2017 has led the pack with three new TPC benchmarks published in April 2019, ranking SQL Server 2017 as the fastest database for online transaction processing (OLTP) and data warehouse (DW) workloads on Windows and Linux.

Together with our partners, SQL Server continues to innovate with high-performing, enterprise-ready solutions that deliver unparalleled price performance. We’re pleased to share the results for the TPC-H (the industry standard for benchmarking data warehouse) performance at 3TB with HPE that currently holds the record as the industry’s fastest 2-socket non-clustered solution, and 10TB with Cisco, as well as the TPC-E (the industry standard for benchmarking online transaction processing) performance at 10TB with Lenovo. Check out the results below.

System

Performance

Database Operating System

HPE Proliant DL380 Gen10

1,244,450 QphH@3000GB

Learn more about price performance

SQL Server 2017 Enterprise Edition

SUSE Linux Enterprise Server 15

Cisco UCS C480 M5 Server

1,651,514 QphH@10000GB

Learn more about price performance

SQL Server 2017 Enterprise Edition

Red Hat Enterprise Linux 7.6

The table above shows the TPC-H data warehousing results published on SQL Server 2017 with configuration (size). The table below shows the TPC-E OLTP results published on SQL Server 2017 with throughput and price/performance ratio.

System

Throughput performance

Price/tpsE

Database Operating System

Lenovo ThinkSystem SR650

7,012.53 tpsE

Learn more about performance price/tpsE

SQL Server 2017 Enterprise Edition

Windows Server 2016 Standard Edition

These new results for TPC-H at 3TB scale prove the superior combination of SQL Server 2017 Enterprise Edition, SUSE Linux Enterprise Server, and HPE ProLiant DL380 Gen10. The benchmark showcases the secure, resilient, and versatile 2P/2U HPE ProLiant DL380 Gen10 as an industry-leading server for multi-workload compute, ideal for structured data analysis such as decision support for large databases.

In addition, a new record for the 10TB non-clustered TPC-H benchmark has been accomplished with a winning combination of SQL Server 2017, Red Hat Linux 7.6, and Cisco UCS C480 M5 Server. Cisco’s 4-socket Cascade Lake system surpassed the previous #1 10TB non-clustered result by 11.6 percent in performance along with a cost reduction.

The TPC-E results showcase SQL Server 2017 blazing the trail together with our strategic partner Lenovo. Lenovo’s ThinkSystem SR650 on a 2-socket Cascade Lake system was the fastest 2-socket system when combined with SQL Server 2017 and Windows Server 2016. It surpassed the previous 2-socket Sky Lake publication by 3 percent, and was also the best overall price performance leader.

Learn more about SQL Server 2017.

All claims in this post were made as of 5/15/2019.

Learn more

SQL Server 2019 is now in public preview.
Learn more about big data clusters in the SQL Server 2019 big data clusters white paper and sign up to try out the new features for big data, analytics, and AI in SQL Server 2019 Big Data Clusters by signing up for the SQL Server 2019 Early Adoption Program.

The post SQL Server 2017 achieves top TPC benchmarks for OLTP and DW on Linux and Windows appeared first on Microsoft SQL Server Blog.

SQL Server 2019 preview combines SQL Server and Apache Spark to create a unified data platform

Asad Khan — Mon, 24 Sep 2018 13:00:37 +0000

Today at Ignite, Microsoft announced the preview of SQL Server 2019. For 25 years, SQL Server has helped enterprises manage all facets of their relational data. In recent releases, SQL Server has gone beyond querying relational data by unifying graph and relational data and bringing machine learning to where the data is with R and Python model training and scoring. As the volume and variety of data increases, customers need to easily integrate and analyze data across all types of data.

Now, for the first time ever, SQL Server 2019 creates a unified data platform with Apache Spark^TM and Hadoop Distributed File System (HDFS) packaged together with SQL Server as a single, integrated solution. Through the ability to create big data clusters, SQL Server 2019 delivers an incredible expansion of database management capabilities, further redefining SQL Server beyond a traditional relational database. And as with every release, SQL Server 2019 continues to push the boundaries of security, availability, and performance for every workload with Intelligent Query Processing, data compliance tools and support for persistent memory. With SQL Server 2019, you can take on any data project, from traditional SQL Server workloads like OLTP, Data Warehousing and BI, to AI and advanced analytics over big data.

SQL Server provides a true hybrid platform, with a consistent SQL Server surface area from your data center to public cloud—making it easy to run in the location of your choice. Because SQL Server 2019 big data clusters are deployed as containers on Kubernetes with a built-in management service, customers can get a consistent management and deployment experience on a variety of supported platforms on-premises and in the cloud: OpenShift or Kubernetes on premises, Azure Kubernetes Service (AKS), Azure Stack (on AKS) and OpenShift on Azure. With Azure Hybrid Benefit license portability, you can choose to run SQL Server workloads on-premises or in Azure, at a fraction of the cost of any other cloud provider.

SQL Server – Insights over all your data

SQL Server continues to embrace open source, from SQL Server 2017 support for Linux and containers to SQL Server 2019 now embracing Spark and HDFS to bring you a unified data platform. With SQL Server 2019, all the components needed to perform analytics over your data are built into a managed cluster, which is easy to deploy and it can scale as per your business needs. HDFS, Spark, Knox, Ranger, Livy, all come packaged together with SQL Server and are quickly and easily deployed as Linux containers on Kubernetes. SQL Server simplifies the management of all your enterprise data by removing any barriers that currently exist between structured and unstructured data.

Here’s how we make it easy for you to break down barriers to realized insights across all your data, providing one view of your data across the organization:

Simplify big data analytics for SQL Server users. SQL Server 2019 makes it easier to manage big data environments. It comes with everything you need to create a data lake, including HDFS and Spark provided by Microsoft and analytics tools, all deeply integrated with SQL Server and fully supported by Microsoft. Now, you can run apps, analytics, and AI over structured and unstructured data – using familiar T-SQL queries or people familiar with Spark can use Python, R, Scala, or Java to run Spark jobs for data preparation or analytics – all in the same, integrated cluster.
Give developers, data analysts, and data engineers a single source for all your data – structured and unstructured – using their favorite tools. With SQL Server 2019, data scientists can easily analyze data in SQL Server and HDFS through Spark jobs. Analysts can run advanced analytics over big data using SQL Server Machine Learning Services: train over large datasets in Hadoop and operationalize in SQL Server. Data scientists can use a brand new notebook experience running on the Jupyter notebooks engine in a new extension of Azure Data Studio to interactively perform advanced analysis of data and easily share the analysis with their colleagues.
Break down data silos and deliver one view across all of your data using data virtualization. Starting in SQL Server 2016, PolyBase has enabled you to run a T-SQL query inside SQL Server to pull data from your data lake and return it in a structured format—all without moving or copying the data. Now in SQL Server 2019, we’re expanding that concept of data virtualization to additional data sources, including Oracle, Teradata, MongoDB, PostgreSQL, and others. Using the new PolyBase, you can break down data silos and easily combine data from many sources using virtualization to avoid the time, effort, security risks and duplicate data created by data movement and replication. New elastically scalable “data pools” and “compute pools” make querying virtualized data lighting fast by caching data and distributing query execution across many instances of SQL Server.

“From its inception, the Sloan Digital Sky Survey database has run on SQL Server, and SQL Server also stores object catalogs from large cosmological simulations. We are delighted with the promise of SQL Server 2019 big data clusters, which will allow us to enhance our databases to include all our big data sets. The distributed nature of SQL Server 2019 allows us to expand our efforts to new types of simulations and to the next generation of astronomical surveys with datasets up to 10PB or more, well beyond the limits of our current database solutions.”- Dr. Gerard Lemson, Institute for Data Intensive Engineering and Science, Johns Hopkins University.

Enhanced performance, security, and availability

The SQL Server 2019 relational engine will deliver new and enhanced features in the areas of mission-critical performance, security and compliance, and database availability, as well as additional features for developers, SQL Server on Linux and containers, and general engine enhancements.

Industry-leading performance – The Intelligent Database

The Intelligent Query Processing family of features builds on hands-free performance tuning features of Adaptive Query Processing in SQL Server 2017 including Row mode memory grant feedback, approximate COUNT DISTINCT, Batch mode on rowstore, and table variable deferred compilation.
Persistent memory support is improved in this release with a new, optimized I/O path available for interacting with persistent memory storage.
The Lightweight query profiling infrastructure is now enabled by default to provide per query operator statistics anytime and anywhere you need it.

Advanced security – Confidential Computing

Always Encrypted with secure enclaves extends the client-side encryption technology introduced in SQL Server 2016. Secure enclaves protect sensitive data in a hardware or software-created enclave inside the database, securing it from malware and privileged users while enabling advanced operations on encrypted data.
SQL Data Discovery and Classification is now built into the SQL Server engine with new metadata and auditing support to help with GDPR and other compliance needs.
Certification Management is now easier using SQL Server Configuration Manager.

Mission-critical availability – High uptime

Always On Availability Groups have been enhanced to include automatic redirection of connections to the primary based on read/write intent.
High availability configurations for SQL Server running in containers can be enabled with Always On Availability Groups using Kubernetes.
Resumable online indexes now support create operations and include database scoped defaults.

Developer experience

Enhancements to SQL Graph include match support with T-SQL MERGE and edge constraints.
New UTF-8 support gives customers the ability to reduce SQL Server’s storage footprint for character data.
The new Java language extension will allow you to call a pre-compiled Java program and securely execute Java code on the same server with SQL Server. This reduces the need to move data and improves application performance by bringing your workloads closer to your data.
Machine Learning Services has several enhancements including Windows Failover cluster support, partitioned models, and support for SQL Server on Linux.

Platform of choice

Additional capabilities for SQL Server on Linux include distributed transactions, replication, Polybase, Machine Learning Services, memory notifications, and OpenLDAP support.
Containers have new enhancements including use of the new Microsoft Container Registry with support for RedHat Enterprise Linux images and Always On Availability Groups for Kubernetes.
You can read more about what’s new in SQL Server 2019 in our documentation.

SQL Server 2019 support in Azure Data Studio

Expanded support for more data workloads in SQL Server requires expanded tooling. As Microsoft has worked with users of its data platform we have seen the coming together of previously disparate personas: database administrators, data scientists, data developers, data analysts, and new roles still being defined. These users increasingly want to use the same tools to work together, seamlessly, across on-premises and cloud, using relational and unstructured data, working with OLTP, ETL, analytics, and streaming workloads.

Azure Data Studio offers a modern editor experience with lightning fast IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data platform user in mind, with built-in charting of query result sets, an integrated notebook, and customizable dashboards. Azure Data Studio currently offers built-in support for SQL Server on-premises and Azure SQL Database, along with preview support for Azure SQL Managed Instance and Azure SQL Data Warehouse.

Azure Data Studio is today shipping a new SQL Server 2019 Preview Extension to add support for select SQL Server 2019 features. The extension offers connectivity and tooling for SQL Server big data clusters, including a preview of the first ever notebook experience in the SQL Server toolset, and a new PolyBase Create External Table wizard that makes accessing data from remote SQL Server and Oracle instances easy and fast.

Getting started

Find additional resources and get started today by visiting the links below:

Preview SQL Server 2019 for Windows, Linux, or Docker.
Sign up for the Early Adoption Program to get advice and support for your project from SQL Server engineering, or to try SQL Server 2019 big data clusters.
Download Azure Data Studio to get started with new SQL Server big data features like data virtualizations and notebooks.

The post SQL Server 2019 preview combines SQL Server and Apache Spark to create a unified data platform appeared first on Microsoft SQL Server Blog.

SQL Server 2008 and 2008 R2 are reaching end of support. What’s next?

SQL Server Team — Fri, 06 Jul 2018 17:00:52 +0000

SQL Server 2008 and 2008 R2 have had a tremendous run. But all good things come to an end, right? On July 9, 2019, Microsoft will end Extended Support, which means no more updates or support of any kind, potentially leaving you vulnerable to security and compliance issues.

The good news is, you still have plenty of time and options to avoid any heartburn caused by the technology “circle of life.” And we’ll lay out all of those options for you in a webinar on July 12.

In this webinar, we’ll show you:

how to migrate your applications and data to make the transition more than just an upgrade
how to avoid business disruptions and adopt the most current security technologies
the advantages of moving your legacy platforms to SQL Server 2017 and Azure
the range of guidance and resources available to help, no matter which path you choose

We want this transition to be as seamless and pain-free as possible for you. Register now and find out all the ways we’ve come up with to make that happen.

The post SQL Server 2008 and 2008 R2 are reaching end of support. What’s next? appeared first on Microsoft SQL Server Blog.

Gartner names Microsoft a leader in the Magic Quadrant for Data Management Solutions for Analytics (DMSA)

SQL Server Team — Tue, 07 Mar 2017 17:47:28 +0000

This post was authored by Rohan Kumar, General Manager, DS SQL Engineering.

We’re excited that Gartner has recognized Microsoft as a leader in the Magic Quadrant for Data Management Solutions for Analytics (DMSA). Gartner defines the DMSA as a system for storing, accessing, processing, and delivering data intended for one of the primary use cases that support analytics.[1] These use cases include supporting ongoing traditional, operational, logical, and context-independent data warehousing.[2] The DMSA thus represents an evolution from the traditional data warehousing approach. Although data warehousing continues to be a major use case, the DMSA encompasses new trends such as data lakes and context-independent data warehouses that enable data science uses cases.[3]

At Microsoft we have been championing a similar evolution to make big data processing and analytics simpler and more accessible to transform data into intelligent action. We do this through SQL Server 2016 and the Cortana Intelligence Suite to offer a comprehensive portfolio of solutions to do data warehousing, big data, and advanced analytics solutions. SQL Server as an example gives organizations a breadth of capabilities to do analytics on-premises, including features like real-time operational analytics, in-memory columnstore, integration with Hadoop via PolyBase, in-database analytics with R Services built in, and fast time to market with reference architectures. In the cloud, we offer Cortana Intelligence Suite, which has SQL Data Warehouse, a truly elastic, scale-out data warehouse; HDInsight, a managed Hadoop service that runs Hortonworks Data Platform; Data Lake Analytics, an on-demand analytics job service to power intelligent action; Data Lake Store, a no-limits data lake to power intelligent action; and DocumentDB, a global scale-out NoSQL service with <10ms guarantees.

In providing customers with these solutions, our goal is to help them realize the full potential of their data and give them the ability to transform their business. As an example, Tangerine uses a data warehouse on-premises with Hadoop in the cloud so that it can query relational and nonrelational data to accelerate its time to insights. With such a solution, Tangerine is looking to transform the financial services industry by building a predictive, context-aware application that gives it information based on the time and where the customer is.

We’re excited that Gartner recognized both our ability to execute and the completeness of vision by placing Microsoft in the leader’s quadrant of the Magic Quadrant for Data Management Solutions for Analytics. Gartner notes that leaders have been able to adapt rapidly to this changing market and have pursued all the primary use cases Gartner identified to support analytics.[4] The push for cloud has also affected the relative ratings among the leaders.[5] In the coming year, we will continue to focus on delivering the highest value to our customers and partners through innovations that make data warehousing, big data, and analytics even more accessible to transform data into intelligent action. You can read the full report, “Magic Quadrant for Data Management Solutions for Analytics,” here.

To learn more:

[1] Source: Gartner’s Magic Quadrant for Data Management Solutions for Analytics, 2017.
[2] Source: Gartner’s Magic Quadrant for Data Management Solutions for Analytics, 2017.
[3] Source: Gartner’s Magic Quadrant for Data Management Solutions for Analytics, 2017.
[4] Source: Gartner’s Magic Quadrant for Data Management Solutions for Analytics, 2017.
[5] Source: Gartner’s Magic Quadrant for Data Management Solutions for Analytics, 2017.

The post Gartner names Microsoft a leader in the Magic Quadrant for Data Management Solutions for Analytics (DMSA) appeared first on Microsoft SQL Server Blog.

Windows Server 2016 and SQL Server 2016: Leveraging Hyper-V large-scale VM performance for in-memory transaction processing

SQL Server Team — Mon, 03 Oct 2016 17:00:23 +0000

This post was authored by Liang Yang, Principal Performance Engineer on the Hyper-V team and Jos de Bruijn, Senior Program Manager on the SQL Server team.

With Windows Server 2016, Microsoft has significantly bumped up the Hyper-V Virtual Machine (VM) scale limit to embrace new scenarios such as running e-commerce large in-memory databases for Online Transaction Processing (OLTP) and Data Warehousing (DW) purposes. In this post on the Windows Server Blog, we highlight the performance of in-memory transaction processing at scale using SQL Server 2016 running in a Windows Server 2016 Hyper-V VM.

The post Windows Server 2016 and SQL Server 2016: Leveraging Hyper-V large-scale VM performance for in-memory transaction processing appeared first on Microsoft SQL Server Blog.

Eight scenarios with Apache Spark on Azure that will transform any business

SQL Server Team — Mon, 29 Aug 2016 15:00:23 +0000

This post was authored by Rimma Nehme, Technical Assistant, Data Group.

Since its birth in 2009, and the time it was open sourced in 2010, Apache Spark has grown to become one of the largest open source communities in big data with over 400 organizations from 100 companies contributing to it. Spark stands out for its ability to process large volumes of data 100x faster, because data is persisted in-memory. Azure cloud makes Apache Spark incredibly easy and cost effective to deploy with no hardware to buy, no software to configure, with a full notebook experience to author compelling narratives, and integration with partner business intelligence tools. In this blog post, I am going to review of some of the truly game-changing usage scenarios with Apache Spark on Azure that companies can employ in their context.

Scenario #1: Streaming data, IoT and real-time analytics

Apache Spark’s key use case is its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for companies to be able to stream and analyze it all in real time. Spark Streaming has the capability to handle this type of workload exceptionally well. As shown in the image below, a user can create an Azure Event Hub (or an Azure IoT Hub) to ingest rapidly arriving data into the cloud; both Event and IoT Hubs can intake millions of events and sensor updates per second that can then be processed in real-time by Spark.

Businesses can use this scenario today for:

Streaming ETL: In traditional ETL (extract, transform, load) scenarios, the tools are used for batch processing, and data must be first read in its entirety, converted to a database compatible format, and then written to the target database. With Streaming ETL, data is continually cleaned and aggregated before it is pushed into data stores or for further analysis.
Data enrichment: Streaming capability can be used to enrich live data by combining it with static or ‘stationary’ data, thus allowing businesses to conduct more complete real-time data analysis. Online advertisers use data enrichment to combine historical customer data with live customer behavior data and deliver more personalized and targeted ads in real-time and in the context of what customers are doing. Since advertising is so time-sensitive, companies have to move fast if they want to capture mindshare. Spark on Azure is one way to help achieve that.
Trigger event detection: Spark Streaming can allow companies to detect and respond quickly to rare or unusual behaviors (“trigger events”) that could indicate a potentially serious problem within the system. For instance, financial institutions can use triggers to detect fraudulent transactions and stop fraud in its tracks. Hospitals can also use triggers to detect potentially dangerous health changes while monitoring patient vital signs and sending automatic alerts to the right caregivers who can then take immediate and appropriate action.
Complex session analysis: Using Spark Streaming, businesses can use events relating to live sessions, such as user activity after logging into a website or application, can be grouped together and quickly analyzed. Session information can also be used to continuously update machine learning models. Companies can then use this functionality to gain immediate insights as to how users are engaging on their site and provide more real-time personalized experiences.

Scenario #2: Visual data exploration and interactive analysis

Using Spark SQL running against data stored in Azure, companies can use BI tools such as Power BI, PowerApps, Flow, SAP Lumira, QlikView and Tableau to analyze and visualize their big data. Spark’s interactive analytics capability is fast enough to perform exploratory queries without sampling. By combining Spark with visualization tools, complex data sets can be processed and visualized interactively. These easy-to-use interfaces then allow even non-technical users to visually explore data, create models and share results. Because wider audience can analyze big data without preconceived notions, companies can test new ideas and visualize important findings in their data earlier than ever before. Companies can identify new trends and new relationships that were not apparent before and quickly drill down into them, ask new questions and find ways to innovate in new and smarter ways.

This scenario is even more powerful when interactive data discovery is combined with predictive analytics (more on this later in this blog). Based on relationships and trends identified during discovery, companies can use logistic regression or decision tree techniques to predict the probability of certain events in the future (e.g., customer churn probability). Companies can then take specific, targeted actions to control or avert certain events.

Scenario #3: Spark with NoSQL (HBase and Azure DocumentDB)

This scenario provides scalable and reliable Spark access to NoSQL data stored either in HBase or our blazing fast, planet-scale Azure DocumentDB, through “native” data access APIs. Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable. DocumentDB is a true schema-free managed NoSQL database service running in Azure designed for modern mobile, web, gaming, and IoT scenarios. DocumentDB ensures 99% of your reads are served under 10 milliseconds and 99% of your writes are served under 15 milliseconds. It also provides schema flexibility, and the ability to easily scale a database up and down on demand.

The Spark with NoSQL scenario enables ad-hoc, interactive queries on big data. NoSQL can be used for capturing data that is collected incrementally from various sources across the globe. This includes social analytics, time series, game or application telemetry, retail catalogs, up-to-date trends and counters, and audit log systems. Spark can then be used for running advanced analytics algorithms at scale on top of the data coming from NoSQL.

Companies can employ this scenario in online shopping recommendations, spam classifiers for real time communication applications, predictive analytics for personalization, and fraud detection models for mobile applications that need to make instant decisions to accept or reject a payment. I would also include in this category a broad group of applications that are really “next-gen” data warehousing, where large amounts of data needs to be processed inexpensively and then served in an interactive form to many users globally. Finally, internet of things scenarios fit in here as well, with the obvious difference that the data represents the actions of machines instead of people.

Scenario #4: Spark with Data Lake

Spark on Azure can be configured to use Azure Data Lake Store (ADLS) as an additional storage. ADLS is an enterprise-class, hyper-scale repository for big data analytic workloads. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts in an enterprise environment to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. Because ADLS is a file system compatible with Hadoop Distributed File System (HDFS), it makes it very easy to combine it with Spark for running computations at scale using pre-existing Spark queries.

The data lake scenario arose because new types of data needed to be captured and exploited by companies, while still preserving all of the enterprise-level requirements like security, availability, compliance, failover, etc. Spark with data lake scenario enables a truly scalable advanced analytics on healthcare data, financial data, business-sensitive data, geo-location coordinates, clickstream data, server log, social media, machine and sensor data. If companies want an easy way of building data pipelines, have unparalleled performance, insure their data quality, manage access control, perform change data capture (CDC) processing, get enterprise-level security seamlessly and have world-class management and debugging tools, this is the scenario they need to implement.

Scenario #5: Spark with SQL Data Warehouse

While there is still a lot of confusion, Spark and big data analytics is not a replacement for traditional data warehousing. Instead, Spark on Azure can complement and enhance a company’s data warehousing efforts by modernizing the company’s approaches to analytics. A data warehouse can be viewed as an ‘information archive’ that supports business intelligence (BI) users and reporting tools for mission-critical functions of company. My definition of mission-critical is any system that supports revenue generation or cost control. If such a system fails, companies would have to manually perform these tasks to prevent loss of revenue or increased cost. Big data analytics systems like Spark help augment such systems by running more sophisticated computations, smarter analytics and delivering deeper insights using larger and more diverse datasets.

Azure SQL Data Warehouse (SQLDW) is a cloud-based, scale-out database capable of processing massive volumes of data, both relational and non-relational. Built on our massively parallel processing (MPP) architecture, SQLDW combines the power of the SQL Server relational database with Azure cloud scale-out capabilities. You can increase, decrease, pause, or resume a data warehouse in seconds with SQLDW. Furthermore, you save costs by scaling out CPU when you need it and cutting back usage during non-peak times. SQLDW is the manifestation of elastic future of data warehousing in the cloud.

Some of the use cases of Spark with SQLDW scenario may include: using data warehouse to get a better understanding of its customers across product groups, then using Spark for predictive analytics on top of that data. Running advanced analytics using Spark on top of the enterprise data warehouse containing sales, marketing, store management, point of sale, customer loyalty, and supply chain data, then run advanced analytics using Spark to drive more informed business decisions at the corporate, regional, and store levels. Using Spark with the data warehousing data, companies can literally do anything from risk modeling, to parallel processing of large graphs, to advanced analytics, text processing – all on top of their elastic data warehouse.

Scenario #6: Machine Learning using R Server, MLlib

Another and probably one of the most prominent Spark use cases in Azure is machine learning. By storing datasets in-memory during a job, Spark has great performance for iterative queries common in machine learning workloads. Common machine learning tasks that can be run with Spark in Azure include (but are not limited to) classification, regression, clustering, topic modeling, singular value decomposition (SVD) and principal component analysis (PCA) and hypothesis testing and calculating sample statistics.

Typically, if you want to train a statistical model on very large amounts of data, you need three things:

Storage platform capable of holding all of the training data
Computational platform capable of efficiently performing the heavy-duty mathematical computations required
Statistical computing language with algorithms that can take advantage of the storage and computation power

Microsoft R Server, running on HDInsight with Apache Spark provides all three things above. Microsoft R Server runs within HDInsight Hadoop nodes running on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR takes advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. With multi-threaded math libraries and transparent parallelization in R Server, customers can handle up to 1000x more data and up to 50x faster speeds than open source R. And if your data grows or you just need more power, you can dynamically add nodes to the Spark cluster using the Azure portal. Spark in Azure also includes MLlib for a variety of scalable machine learning algorithms, or you can use your own libraries. Some of the common applications of machine learning scenario with Spark on Azure are listed in a table below.

Vertical

Sales and Marketing

Finance and Risk

Customer and Channel

Operations and Workforce

Retail

Demand forecasting

Loyalty programs

Cross-sell and upsell

Customer acquisition

Fraud detection

Pricing strategy

Personalization

Lifetime customer value

Product segmentation

Store location demographics

Supply chain management

Inventory management

Financial Services

Customer churn

Loyalty programs

Cross-sell and upsell

Customer acquisition

Fraud detection

Risk and compliance

Loan defaults

Personalization

Lifetime customer value

Call center optimization

Pay for performance

Healthcare

Marketing mix optimization

Patient acquisition

Fraud detection

Bill collection

Population health

Patient demographics

Operational efficiency

Pay for performance

Manufacturing

Demand forecasting

Marketing mix optimization

Pricing strategy

Perf risk management

Supply chain optimization

Personalization

Remote monitoring

Predictive maintenance

Asset management

Examples with just a few lines of code that you can try out right now:

Predict food inspection results using Machine Learning + Spark
Predict building temperature using Machine Learning + Spark

Scenario #7: Putting it all together in a notebook experience

For data scientists, we provide out-of-the-box integration with Jupyter (iPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks. As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.

You can also use Zeppelin notebooks on Spark clusters in Azure to run Spark jobs. Zeppelin notebook for HDInsight Spark cluster is an offering just to showcase how to use Zeppelin in an Azure HDInsight Spark environment. If you want to use notebooks to work with HDInsight Spark, I recommend that you use Jupyter notebooks. To make development on Spark easier, we support IntelliJ Spark Tooling which introduces native authoring support for Scala and Java, local testing, remote debugging, and the ability to submit Spark applications to the Azure cloud.

Scenario #8: Using Excel with Spark

As a final example, I wanted to describe the ability to connect Excel to Spark cluster running in Azure using the Microsoft Open Database Connectivity (ODBC) Spark Driver. Download it here.

Excel is one of the most popular clients for data analytics on Microsoft platforms. In Excel, our primary BI tools such as PowerPivot, data-modeling tools, Power View, and other data-visualization tools are built right into the software, no additional downloads required. This enables users of all levels to do self-service BI using the familiar interface of Excel. Through a Spark Add-in for Excel users can easily analyze massive amounts of structured or unstructured data with a very familiar tool.

Conclusion

Above, I’ve described some of the amazing, game-changing scenarios for real-time big data processing with Spark on Azure. Any company across the globe, from a huge enterprise to a small startup can take their business to the next level with these scenarios and solutions. The question is, what are you waiting for?

The post Eight scenarios with Apache Spark on Azure that will transform any business appeared first on Microsoft SQL Server Blog.

The elastic future of data warehousing

SQL Server Team — Tue, 12 Jul 2016 13:00:02 +0000

This post was authored by Joseph Sirosh, Corporate Vice President, Data Group.

Announcing the general availability of Azure SQL Data Warehouse, an elastic, parallel, columnar data warehouse as a service.

A defining characteristic of cloud computing is elasticity – the ability to rapidly provision and release resources to match what a workload requires – so that a user pays no more and no less than what they need to for the task at hand. Such just-in-time provisioning can save customers enormous amounts of money when their workloads are intermittent and heavily spiked. And in the modern enterprise, there are few workloads that have a desperate need for such elastic capabilities as data warehousing. Traditionally built on-premises with very expensive hardware and software, most enterprise Data Warehouse (DW) systems have very low utilization except during peak periods of data loading, transformation and report generation.

With the general availability of the Azure SQL Data Warehouse, we are delivering the true promise of cloud elasticity to data warehousing. It is a fully managed DW as a Service that you can provision in minutes and scale up to 60 times larger in seconds. With a few clicks in the Azure Portal, you can launch a data warehouse, and start analyzing or querying data at the scale of hundreds of terabytes. Our architecture separates compute and storage so that you can independently scale them, and use just the right amount of each at any given time. A very unique pause feature allows you to suspend compute in seconds and resume when needed while your data remains intact in Azure storage. And SQL Data Warehouse offers an availability SLA of 99.9% – the only public cloud data warehouse service that offers an availability SLA to customers.

According to Gartner, “For years, many data warehousing vendors have been operating from a playbook of tightly balanced storage and compute conﬁguration units. Cloud architectures are forcing a shift in this approach, with vendors starting to decouple storage and compute, and allowing them to independently scale. We believe this approach to be the correct one, and that other vendors in the space will need to adopt this methodology if they are to stay competitive.”¹

Azure SQL Data Warehouse uses an elastic massively parallel processing (MPP) architecture built on top of the industry-leading SQL Server 2016 database engine. It allows you to interactively query and analyze data using the broad set of existing SQL-based tools and business intelligence applications that you use today. It uses column stores for high performance analytics and storage compression, a rich collection of aggregation capabilities of SQL Server, and state of the art query optimization capabilities. With built-in capabilities such as Polybase, it allows you to query Hadoop systems directly, enabling a single SQL-based query surface for all your data.

Azure SQL Data Warehouse is also part of the Cortana Intelligence Suite, which is a fully managed big data and advanced analytics suite to transform your data into intelligent action. It easily integrates with components of the suite such as Azure Data Factory for data integration pipelines, with Azure Machine Learning for predictive analytics, Power BI for business intelligence, HDInsight for big data insights, R and Spark for big data analytics. For an example of such integration, see the airline industry sample on PowerBI.com. This shows a Power BI report based on a real world predictive maintenance solution for a major airline. The data for this report comes from a variety of sources including IoT streams from aircraft engines, air traffic control information, route restrictions and fuel usage data. All this is integrated and landed into a Azure SQL DW and processed with Azure Machine Learning to detect operational anomalies and trends. The report is “live” and you can interact with it and experience Power BI in conjunction with Azure SQL DW and Azure ML.

The distinct capabilities of Azure SQL Data Warehouse include:

Data warehousing as a service

Gone are the pains associated with administering, managing, patching and manual tuning of data warehouses. There are no knobs to turn, no physical or virtual infrastructure to manage and the service is simple, resilient and secure with reliable storage. This enables the focus on driving the analytics and getting the value from your data rather than on managing your data warehousing software and hardware; Azure SQL Data Warehouse handles it all for you.

Unmatched security and access control

With malicious and even insider attacks becoming a key concern for enterprises, an alarm system over your critical enterprise data is a must have to avoid serious damage to your business and reputation. Only Azure SQL Data Warehouse delivers auditing and threat detection built into the service, with advanced machine learning to detect abnormal query patterns and alert you of potential security issues before it is too late. Data at rest is protected by Transparent Data Encryption.

Additionally, SQL Data Warehouse is the only cloud data warehouse service that works seamlessly with Azure Active Directory which currently supports 1.3 billion daily authentications across 600 million user accounts. This enables Single Sign-On (SSO) and role-based access control. You can even have finer-granularity permissions that let you control which operations a user can do on individual columns, tables, views, procedures, and other objects in the database. These features further protect data by ensuring just the right users have access to the right data—a critical capability when centralizing vast amounts of proprietary and sensitive data for analytics in an enterprise.

Multidimensional elasticity

Currently the majority of cloud database and data warehouse services are provisioned with fixed storage and compute resources. Resizing of resources typically compromises availability and/or performance. This means that service users typically end up with over-provisioned and expensive underutilized resources to accommodate possible peak demand or in the worst case, under-provisioned resources unable to handle sudden work overloads.

Unlike existing cloud services which can take anywhere from a couple of hours to a couple of days to do the data warehouse resizing, SQL Data Warehouse’s unique elastic technology decouples storage and compute, enabling each layer to become independently scalable almost instantaneously. This makes it possible to provision one or more data warehouses in minutes, and then independently scale users, data, and workloads in seconds to optimally match the demand. Further, elastic scaling also makes it possible to simultaneously load and query data, because every user and workload can have exactly the resources needed, without contention, and with minimal impact to production queries.

“Getting featured in the iOS App Store was a big deal for a small company like ours as our users increased from 3,000 to 300,000 in 48 hours. To keep up with this 100x increase in workload, we simply added data warehouse compute capacity by moving a slider and our services just scaled in minutes—we didn’t miss an insight,” notes Paul Ohanian, CTO, PoundSand.

Save as you go, with fast pause and resume

Starting and shutting data warehouse clusters may take a considerable amount of time. Leaving the data warehouse running continuously incurs potentially high and unnecessary costs, especially if you are running your jobs periodically and the data warehouse is sitting idle in-between for extended periods of time. Now you can pause your data warehouse for the required time, saving compute costs, and quickly resume it later when needed. You can even write a PowerShell script, then automate the schedule with Azure Automation to automatically pause or resume the cluster based on the specific needs of your business.

“When we learned about the pause and resume capabilities of SQL Data Warehouse and integrated services like Azure Machine Learning and Data Factory, we switched from AWS Redshift, migrating over 7TB of uncompressed data over a week for the simple reasons of saving money and enabling a more straight-forward implementation for advanced analytics. To meet our business intelligence requirements, we load data once or twice a month and then build reports for our customers. Not having the data warehouse service running all the time is key for our business and our bottom line,” said Bill Sabo, managing director of information technology at Integral Analytics.

Seamless querying of structured and unstructured data

An increasing amount of data in today’s rapidly digitizing world is unstructured data such as clickstreams, sensor data, location data, customer support emails and chat transcripts, much of which is harnessed for analysis in big data systems. The ability to integrate and join such data with your core relational enterprise data is often a highly desired capability. With built-in PolyBase technology, SQL Data Warehouse allows you to access and combine both non-relational and relational data. You can run queries on external data in Hadoop or Azure blob storage using familiar SQL, often without making any changes to the existing queries. Underneath, the queries are optimized for optimal execution without any burden on the user for tuning. Furthermore, you can quickly import and export data back and forth between relational tables in SQL Data Warehouse and non-relational data in Hadoop or Azure Blob Storage using simple T-SQL statements. The rich SQL programmability support (stored procs, functions and PolyBase) empower users to query the data however they want.

Christoph Leinemann, senior director data engineering at Jet.com says, “with Azure SQL Data Warehouse, we use PolyBase to ingest data from HDInsight then run thousands of analytical queries per day over tens of billions of records—about 20TB of data. This enables us to monitor price history and market dynamics to adjust pricing and ensure we’re offering our customers the best price.”

Integration with the SQL Server tool ecosystem you already use and love

Azure SQL Data Warehouse already fits into the tool ecosystem you already use, with native JBDC and ODBC connectors, and with a broad set of independent software vendors and partners who already support SQL Server, such as Alteryx, Attunity, Informatica, Redgate and SnapLogic. For BI capabilities, it integrates with the industry-leading Power BI service in Azure, and even with Microsoft Excel. For a beautifully visualized walkthrough of Microsoft Power BI and SQL Server 2016 Reporting Services including Mobile BI, please watch this demo. Microsoft also works with a set of popular BI partners to ensure the tools your teams use work great with SQL Data Warehouse, including Looker Data Sciences, Tableau Software and Qlik Technologies.

Experience modern data warehousing in the cloud for yourself

Today we have thousands of customers who are already using Azure SQL Data Warehouse. Many of these customers are experiencing significant performance gains over existing multi-million dollar data warehouses on-premises. With SQL Data Warehouse, some multi-hour queries in our customer environments finish now in under an hour, and some queries that took five to ten minutes now complete in seconds. Get started with SQL Data Warehouse today and experience the speed, scale, elasticity, security and ease of use of a true modern data warehouse as a service for yourself.

– Joseph

¹Source: Gartner, The Data Warehouse and DMSA Market: Current and Future States, 201, June 16, 2016.

The post The elastic future of data warehousing appeared first on Microsoft SQL Server Blog.