{"id":37980,"date":"2020-07-31T13:00:28","date_gmt":"2020-07-31T12:00:28","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=37980"},"modified":"2020-12-18T23:49:54","modified_gmt":"2020-12-18T22:49:54","slug":"power-bi-performance-tuning-workflow-part-1","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2020\/07\/31\/power-bi-performance-tuning-workflow-part-1\/","title":{"rendered":"Power BI Performance Tuning Workflow – Part 1"},"content":{"rendered":"
<\/p>\n
This is the first in a two-part blog series covers various tips and tricks on Power BI performance tuning. In this blog, the focus is on best practices related to data preparation and design data models to meet performance requirements.<\/p>\n
Power BI is a <\/span>cloud based analytics platform that provides\u00a0<\/span>self-service<\/span> analytics at enterprise scale, unifying <\/span>data from many sources to create interactive, immersive dashboards and reports that provide actionable insights and drive business results.<\/span>\u00a0<\/span><\/p>\n One of the common issues we hear is that a “<\/span>Power<\/span> BI report is running slow”<\/span>.<\/span>\u00a0<\/span>B<\/span>roadly, the<\/span> areas that can be optimised and tuned <\/span>are as follows:<\/span><\/p>\n \u00a0<\/span>\u00a0<\/span><\/p>\n Import<\/em>\u202f<\/em> mode is the most common mode used to develop models. This mode delivers extremely fast performance thanks to in-memory querying and provides complete DAX support. Because of these strengths, it’s the default mode when creating a new Power BI desktop solution.<\/span>\u00a0<\/span><\/p>\n The Power BI engine offers around 5-10x compression depending on <\/span>the data<\/span> type and values. In a shared <\/span>capacity,<\/span> the import model can store around 1GB of data but can scale using premium capacity that supports larger datasets.\u00a0<\/span>However, you should avoid<\/span> the import model in these situations:<\/span><\/p>\n <\/p>\n This is essentially the opposite to the Import model. Whenever you need real time data for reporting or analytics or you cannot store your data in the Power BI workspace (shared or premium, based on your environment), use Direct Query.<\/span>\u00a0<\/span><\/p>\n If your team or organisation already uses a tabular model, you can continue with AAS or SSAS tabular for live connection scenarios. <\/span>\u00a0<\/span><\/p>\n There are limitations of a Direct Query model, including <\/span>extra network round-trip<\/span>s<\/span>\u00a0to\u00a0<\/span>retrieve the dataset from the source. DAX coverage is also limited and depends on the supportability w.r.t to the various data sources.<\/span>\u00a0<\/span><\/p>\n \u00a0<\/span><\/p>\n \u00a0<\/span><\/p>\n Query caching is a premium feature that provides performance benefits when a dataset is accessed frequently and doesn’t need to be refreshed often.<\/span>\u00a0<\/span><\/p>\n Query caching can also minimise load on premium workspaces because it reuses the existing cached data and overall reduced the number of queries.<\/span><\/p>\n <\/p>\n In Power BI models, there can be intermediate tables used as <\/span>staging layers<\/span> or custom queries that the author hides to prevent access. These hidden tables <\/span>consume<\/span> memory, and one way to improve performance is to disable <\/span>load.<\/span> Don\u2019t get confused with the \u201cHide in report view\u201d option. This only removes from the <\/span>view,<\/span> but the table is still loaded in the model and <\/span>consumes memory<\/span>.\u00a0<\/span>\u00a0<\/span><\/p>\n The Auto date\/time is a data load option in Power BI Desktop. When the date columns are loaded into the model, this feature helps developers to create time-based visuals easily.<\/span><\/p>\n If a data<\/span>\u00a0model has many dat<\/span>e\/<\/span>time fields, this setting can create several internal tables and increase the memory footprint of small models.\u00a0 <\/span>\u00a0<\/span><\/p>\n <\/p>\n GroupKind.Local performs faster than the default setting. When the <\/span>data\u00a0<\/span>is sorted or in continuous fashion\u00a0<\/span>you can speed up your grouping operations considerably.\u202f You can re<\/a><\/span>ad more about this on Microsoft Docs.<\/span><\/p>\n <\/p>\n Star schema<\/span>\u202fis a mature modelling approach widely adopted by relational data warehouses. It requires modellers to classify their model tables as either <\/span>dimension<\/span>\u202for\u202f<\/span>fact<\/span>.<\/span>\u00a0<\/span><\/p>\n A well-structured model design should include tables that are either dimension-type tables or fact-type tables. Avoid mixing the two types of tables into a single table that might have dimension fields or additive measures. We also recommend that you should try to create the right number of tables with the right relationships in place.<\/span><\/p>\n You can read more about Star Schema<\/a> on Microsoft Docs.<\/span><\/p>\n <\/p>\n Integers are a fixed length datatype that use run length encoding, whereas strings use dictionary encoding. Also, if you sort the column data using the integer column, the level of compression will be significant higher. The Power BI segments boundary is 1 million. <\/span>\u00a0<\/span><\/p>\n \u00a0<\/span><\/p>\n If you create models that have higher precision such as numeric or dat<\/span>e\/<\/span>time etc<\/span>, it reduces the compression ratio and increases load times. Wherever possible, find ways to reduce precision without impacting business requirements. <\/span>\u00a0<\/span><\/p>\n For example,<\/span>\u00a0Date has a\u00a0<\/span>precision<\/span> of milliseconds. Use Date only if you need to use Date, Time if you only need to use Time. Also, reduce the precision and round off to values <\/span>wherever<\/span>\u00a0possible.<\/span>\u00a0<\/span><\/p>\n \u00a0<\/span><\/p>\n The source tables or views might have many columns, but in your data model the area of interest might be restricted to a few rows. <\/span>For e<\/span>xample, auditing columns such as Last Modified are not useful for an analytics purpose.\u00a0<\/span>Remove unwanted tables or columns in the model, as this will reduce the model size and improve refresh times. <\/span>\u00a0<\/span><\/p>\n \u00a0<\/span><\/p>\n The bi-directional cross-filtering feature is very powerful, and it allows us to solve complex models with more ease. However, if you have a model which is full of bi-directional filters, any slicing or filtering activity might slow down because of the relationship propagation chain.<\/span>\u00a0<\/span><\/p>\n Also, if not modelled correctly, you might see inconsistent behaviour, particularly in a snowflake architecture.<\/span>\u00a0<\/span><\/p>\n <\/p>\n Not all numeric columns are additive in nature, <\/span>such as<\/span>\u00a0surrogate\u00a0<\/span>columns or<\/span> primary keys. By default, when any numeric column is placed on a visual, Power BI will aggregate that column, increasing the report compute time. It’s recommended that you change the default aggregations to none for such columns. One of the key points is that the fact-type tables always load data at the correct granularity level. If you have a higher granularity, you would miss detailed information. Too low, and it becomes too detailed for reports, and you have a cost-inefficient large model size.<\/span><\/p>\n \u00a0<\/span><\/p>\n Whenever possible, it’s best to develop a model in Import mode. This mode provides the greatest design flexibility and best performance.<\/span>\u00a0<\/span>However, challenges related to large data volumes, or reporting on near real-time data, cannot be solved by Import models. In either of these cases, you can consider a DirectQuery model, providing your data is stored in a single data source that’s\u202f<\/span>supported by DirectQuery mode<\/span><\/a>.<\/span>\u00a0<\/span><\/p>\n Further, you can consider developing a Composite model in the following situations:<\/span><\/p>\n\n
<\/p>\nInappropriate use of\u00a0<\/span>Direct Query and <\/span>Import\u00a0<\/span>\u00a0<\/span><\/h2>\n
<\/span>Direct Query<\/span><\/i>\u202fmode is an alternative to the Import model. The model in this mode consists only of metadata defining the model structure. When the model is queried, queries that are compatible with<\/span> the underlying data sources are fired off in the background to retrieve data.<\/span><\/p>\n
\u00a0<\/span>\u00a0<\/span><\/p>\nConsiderations for Import<\/span><\/h2>\n
\n
Considerations for Live Connection and Direct Query<\/span><\/h2>\n
Data Refresh<\/span><\/h2>\n
\n
Use Query Caching<\/span><\/h2>\n
<\/span><\/p>\nData Loading<\/span><\/h2>\n
Modelling Improvements<\/span><\/h3>\n
<\/p>\nDon\u2019t use Auto Date\/Time <\/span>\u00a0<\/span><\/h3>\n
<\/span><\/p>\nGroupKind.Local<\/span><\/h3>\n
D<\/span>ata models<\/span><\/h2>\n
Star Schema<\/span><\/h3>\n
<\/span><\/p>\nPrefer integers over strings<\/span><\/h3>\n
Avoid high precision\/cardinality columns<\/span><\/h3>\n
Lean Model<\/span><\/h3>\n
Bi-Directional Relationships<\/span><\/h3>\n
<\/span><\/p>\nDefault Aggregations<\/span><\/h3>\n
Composite Model and Aggregations<\/span><\/h3>\n