{"id":25471,"date":"2018-11-13T09:00:29","date_gmt":"2018-11-13T17:00:29","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/?p=25471"},"modified":"2018-11-13T07:51:42","modified_gmt":"2018-11-13T15:51:42","slug":"creating-a-data-hub-for-your-analytics-with-polybase","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2018\/11\/13\/creating-a-data-hub-for-your-analytics-with-polybase\/","title":{"rendered":"Creating a data hub for your analytics with PolyBase"},"content":{"rendered":"

Data is the new currency of the digital world, and Microsoft is uniquely positioned to help businesses and consumers get the most out of their data assets. With the leading database in the world, SQL Server, and the rapidly growing Azure cloud data platform, Microsoft is delivering a modern data platform. We are witnessing a paradigm shift in data management: data from different silos is being brought together to create high-value data sets that drive critical business divisions in industry verticals like retail, banking, healthcare, and more.<\/p>\n

SQL Server 2016 introduced a new feature called PolyBase<\/a><\/strong> that enables your SQL Server instance to process Transact-SQL queries that read data from Hadoop. The same query can also join with relational tables in your SQL Server. SQL Server 2019 CTP 2.0 introduces new connectors for PolyBase including SQL Server, Oracle, Teradata, MongoDB, Azure SQL DB, Azure SQL DW, Cosmos DB, and virtually any ODBC-accessible data source.<\/p>\n

With the new connectors, customers can use PolyBase to create a virtual data hub over a wide variety of data sources within the enterprise. It\u2019s very common to find an enterprise scenario where data from an Oracle database needs to be joined with data from another SQL Server instance to serve a business purpose. Starting with SQL Server 2019, you can use PolyBase to build a source-agnostic solution based on external tables. This opens up possibilities such as a modern data warehousing solution that spans SQL Server, Oracle, and Teradata.<\/p>\n

Let\u2019s consider an example in which tables in a heterogeneous data source such as Oracle store information that needs to be joined with tables in SQL Server. You can create an external table in SQL Server that retrieves data from the Oracle database, making the Oracle data available as a virtual part of the SQL Server database. The screenshot below shows how you can create an External Table<\/strong> using the Create External Table Wizard<\/strong> in Azure Data Studio<\/a><\/strong>.<\/p>\n
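The same external table can also be declared by hand in T-SQL. The following is a minimal sketch, not the wizard's actual output: the host name, port, credential, table, and column names are all hypothetical placeholders, and the exact form of the table LOCATION string depends on your Oracle service and schema names.

```sql
-- A master key is needed once per database to protect the credential secret.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ReplaceWithAStrongPassword1!';

-- Hypothetical Oracle login that PolyBase uses to connect.
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = 'oracle_user', SECRET = 'oracle_password';

-- Hypothetical Oracle host and listener port.
CREATE EXTERNAL DATA SOURCE OracleSales
WITH (
    LOCATION = 'oracle://oracle-host.example.com:1521',
    CREDENTIAL = OracleCredential
);

-- Column names and types must be compatible with the Oracle table.
-- The three-part name (service, schema, table) is an assumed example.
CREATE EXTERNAL TABLE dbo.OracleCustomers (
    CUSTOMER_ID   DECIMAL(38)  NOT NULL,
    CUSTOMER_NAME VARCHAR(100)
)
WITH (
    LOCATION = 'XE.SALES.CUSTOMERS',
    DATA_SOURCE = OracleSales
);

-- Join the Oracle data with a local (hypothetical) SQL Server table.
SELECT o.OrderId, c.CUSTOMER_NAME
FROM dbo.Orders AS o
JOIN dbo.OracleCustomers AS c
    ON c.CUSTOMER_ID = o.CustomerId;
```

Once the external table exists, queries address it like any local table, so application code does not need to know that the rows come from Oracle.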

\"\"<\/p>\n

The ability to reference an external source, such as an Oracle database table, from a SQL Server database opens multiple possibilities:<\/p>\n