{"id":38370,"date":"2020-08-04T16:00:18","date_gmt":"2020-08-04T15:00:18","guid":{"rendered":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/?p=38370"},"modified":"2020-12-18T23:49:40","modified_gmt":"2020-12-18T22:49:40","slug":"power-bi-performance-tuning-workflow-part-2","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-gb\/industry\/blog\/technetuk\/2020\/08\/04\/power-bi-performance-tuning-workflow-part-2\/","title":{"rendered":"Power BI Performance Tuning Workflow – Part 2"},"content":{"rendered":"
<\/p>\n
In Part 1 of the series<\/a>, you learned how to ingest, transform, and load data, and how to develop and optimise the Power BI data model. This article provides guidance that enables developers to write optimised DAX expressions and design reports, as well as enabling administrators to deploy and manage Power BI solutions.<\/p>\n <\/p>\n Let’s take a look at DAX Measures versus DAX with Variables:<\/p>\n DAX Measures<\/strong><\/p>\n DAX with Variables<\/strong><\/p>\n In the second scenario, the measure Total Rows in the first scenario is executed twice, whereas it is executed only once in the variables scenario. Under such scenarios, variables can improve performance significantly.<\/p>\n <\/p>\n It’s recommended that measures return BLANK when a meaningful value cannot be returned, as this design allows Power BI to render reports faster. Also, returning BLANK is efficient because report visuals\u2014by default\u2014eliminate groupings when summarisations are set to BLANK.<\/p>\n <\/p>\n A common scenario is to use HASONEVALUE() to check if there is only one value present in a column after applying slicers and filters, and then use the VALUES(column name) DAX function to get the single value.<\/p>\n SELECTEDVALUE () performs both the above steps internally and gets the value if there is only one distinct value present in that column, or returns blank in case there are multiple values available.<\/p>\n <\/p>\n VALUES () will return an error if it encounters multiple values. Normally, users handle it using error functions which are bad for performance. Instead of using that, SELECTEDVALUE () must be used. It is a better approach and returns blank in case of multiple values being encountered.<\/p>\n <\/p>\n Power BI adds a blank value to the column in case it finds a referential integrity violation. For direct queries, Power BI by default adds a blank value to the columns as it does not have a way to check for violations.<\/p>\n DISTINCT ()<\/strong>: does not return blank when encountering an integrity violation. It returns blank only if it is in part of the original data.<\/p>\n VALU<\/span><\/strong>ES ()<\/strong>: includes blank, as it’s added by Power BI due to referential integrity violations.<\/span><\/p>\n The usage of either of the functions should be the same throughout the whole report. Use VALUES () in the whole report if possible so that blank values are not an issue.<\/p>\n <\/p>\n Format functions are done in a single-threaded formula engine and slows down the calculation for large numbers of string values.<\/p>\n <\/p>\n A virtual relationship is simulated with DAX and column equivalency. FILTER|CONTAINS executes slower than\u00a0TREATAS.<\/p>\n <\/p>\n The built-in function ISBLANK () to check for blank values is faster than using the comparison operator \u201c= Blank ()\u201d<\/p>\n <\/p>\n Use (a-b)\/b with variables instead of (a\/b)-1. The performance is the same in both cases usually, but under edge cases when both a and b are blank values, the former will return blanks and filter out the date whereas the latter will return -1 and increase the query space.<\/p>\n <\/p>\n DIVIDE () function has an extra parameter which is returned in cases where the denominator value is zero. For scenarios that might have a zero in the denominator, it is suggested to use DIVIDE () as it will internally check if the denominator is zero. It also checks for ISBLANK (). However, if there can be a guarantee that the denominator will be non-zero, it is better to use \/ because DIVIDE() will perform an additional if ( denominator <> 0 ) check.<\/p>\n <\/p>\n IFERROR () and ISERROR () are sometimes used in measure. These functions force the engine to perform a step by step execution of the row to check for errors. So wherever possible, replace with the in-built function for error checking.<\/p>\n Example: DIVIDE () and SELECTEDVALUE () perform an error check internally and return expected results.<\/p>\n <\/p>\n When the intention is to count table rows, it is recommended that you always use the COUNTROWS function. It’s more efficient and doesn’t consider BLANKs contained in any column of the table.<\/p>\n <\/p>\n SUMMARIZE is used to provide aggregated results by performing groups of actions on the columns. It is recommended to use the SUMMARIZECOLUMNS () function, which is the optimised version. SUMMARIZE should only be used to get just the grouped elements of a table without any measures\/aggregations associated with it.<\/p>\n <\/p>\n Instead of applying a filter to the whole table, it is better to apply filters to a desired column. Also, use ALL with the FILTER function if there is no specific need to keep the current context. To calculate measures while ignoring all the filters applied on a column, use the All (ColumnName) function along with FILTER instead of Table or VALUES ().<\/p>\n <\/p>\n By default, measure expressions are calculated in an iterative manner. Adding Addcolumns will convert it to a nested loop and further slow down the performance.<\/p>\n <\/p>\n <\/p>\n A good starting point is to build a report with few tiles, with a maximum limit of 20 visuals. Every visual generates at least one query against its data source, so a higher number of visuals can throttle the CPU and network.<\/p>\n Tables\/matrix in reports with thousands of rows, many columns and measures can be complex and slow. Also, so many rows can be overwhelming for users to gain good insights.<\/p>\n Slicers\/Filters<\/strong><\/p>\n <\/p>\n The on-premises data gateway (OPDG) acts as a link to provide secure data transfers between on-premises data and Azure data services. One of the cloud services in Power BI uses OPDG to connect to data sources that are connected to data sources on-premises, or those within a private vnet.<\/p>\n <\/p>\n <\/p>\n The on-premises data gateway has settings controlling the resource usage on the machine where the gateway is installed. By default, gateways automatically scale these values, using resources depending on CPU usage. In scenarios related to poor refresh, there are a few settings that can be considered<\/p>\n Read more on how to adjust gateway performance based on server CPU<\/a> on Microsoft Docs.<\/p>\n <\/p>\n Currently the gateway application utilises resources on the gateway machine based on the pre-configured state. The autoscaling application can now scale to use more or less resources depending on the system CPU.<\/p>\n To allow the gateway to scale based on CPU, this configuration \u201cMashupDisableContainerAutoConfig\u201d would need to be\u00a0set to\u00a0false. When this is done, the following configurations are adjusted based on the gateway CPU.<\/p>\n Read more about this on the Power BI blog.<\/a><\/p>\n <\/p>\n <\/p>\n Power BI gateway refresh and data movement returns a large dataset that is temporarily stored on the gateway machine.\u202f It is recommended to have SSD storage for the spooling layer.<\/p>\n <\/p>\n Is your Power BI report is running slow? Nilabja Ball walks us through what you can tune and optimise in the second part of this two part series.<\/p>\n","protected":false},"author":430,"featured_media":36918,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"categories":[594],"post_tag":[519],"content-type":[],"coauthors":[1577],"class_list":["post-38370","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technetuk","tag-technet-uk"],"yoast_head":"\nDAX\u00a0Calculations<\/h2>\n
Variables vs Repeatable Measures<\/h3>\n
<\/p>\n
<\/p>\nHandling Blanks<\/h3>\n
Sales (No Blank<\/span>) =\r\nIF<\/span> (\r\n ISBLANK<\/span>([Sales]),\r\n 0,\r\n [Sales]\r\n)<\/pre>\nSELECTEDVALUE () vs\u00a0HASONEVALUE ()<\/h3>\n
SELECTEDVALUE () vs VALUES ()<\/h3>\n
DISTINCT () vs VALUES ()<\/h3>\n
Avoid FORMAT in measures<\/h3>\n
Optimize Virtual Relationships using TREATAS ()<\/h3>\n
<\/p>\nISBLANK () vs Blank ()<\/h3>\n
Ratio Calculation efficiently<\/h3>\n
DIVIDE () vs \/<\/h3>\n
Avoid IFERROR () and ISERROR ()<\/h3>\n
COUNTROWS () vs COUNT ()<\/h3>\n
SUMMARIZE () Vs\u00a0SUMMARIZECOLUMNS ()<\/h3>\n
FILTER (all (ColumnName)) vs\u00a0FILTER (VALUES ()) vs FILTER(Table)<\/h3>\n
Avoid\u00a0AddColumns () in measure expression<\/h3>\n
Complex Row Level Security<\/h2>\n
\n
<\/p>\nReports and dashboard design<\/h2>\n
Limit Number of Visuals<\/h3>\n
\n
<\/p>\n\n
<\/p>\nSimplify Table or Matrix data<\/h3>\n
\n
\n
<\/p>\n\n
Data Source\/Network Latency<\/h2>\n
<\/p>\nData Gateway\u00a0Configuration<\/h2>\n
\n
Gateway Parallelism<\/h3>\n
\n
AutoScaling Gateway<\/h3>\n
Cluster Load Balancing<\/h3>\n
\n
Use SSD\/Fast Storage<\/h3>\n
Useful Links<\/h2>\n
\n