Announcing the 2nd Preview of the Datasets Deployment pbi-tools release (RC.2). This preview adds support for XMLA refresh tracing, enhanced parameters handling, and the ability to convert "thick" reports into dataset and "thin" report during deployment.

Dowload the Preview

New Features

Enhancements

  • #153 CreateOrOverwrite now default mode for Report/PBIX deployments

For import deployments (thin reports and PBIX files), the default import mode is now CreateOrOverwrite which simplifies the .pbixproj.json deployment manifest.

Before:

      "mode": "Report",
      "source": {
        "type": "Folder",
        "path": "./*"
      },
      "authentication": { ... },
      "options": {
        "import": {
          "nameConflict": "CreateOrOverwrite"
        }
      },
      "environments": { ... }

After:

      "mode": "Report",
      "source": {
        "type": "Folder",
        "path": "./*"
      },
      "authentication": { ... },
      "environments": { ... }

In this case, the entire options section can be omitted.

For reference, any of the valid values list here canbe specified instead of the default CreateOrOverwrite.

Bugfixes

  • #109 pbi-tools info no longer fails when another instance of SSAS rund on the same machine
  • #111 Model-only import fails due to logging
  • #127 Folder or File sources containing spaces aren’t matched (Desktop edition only)
  • #102 Cross-platfom conform resolution of default TEMP path

Refresh Tracing

Do you know what exactly is going on during your dataset refreshes and where most time is spent? SSAS is providing very rich and granular trace events as part of any refresh operation, and those are available to Power BI Premium users via the XMLA endpoint.

However, Power BI Service exposes none of that data, and there is limited tooling available. The tool known to most people today is SQL Server Profiler which ships with SSMS. With SQL Server Profiler, one manually defines a trace session, connects to an XMLA endpoint, and effectively hits Start and Stop to record events for offline analysis. This is what the Events Selection page looks like:

SQL Profiler Events Selection

Whilst this approach is very powerful, it is also quite involved - a user with sufficient privileges explicitly “runs a trace session”. This is likely done only as a one-off or in order to troubleshoot production issues.

pbi-tools now helps to operationalize that same process and make it part of each Production refresh. Refresh tracing can be enabled as part of a pbi-tools deploy refresh. With the feature enabled, trace event logs can be emitted to the console (and hence become part of the CI/CD pipline logs). Furthermore, refresh summary metrics can be generated and written to a CSV file for external processing.

Analyzing refresh behavior and timings alongside using VertiPaq Analyzer should be the two key tools deployed by anyone maintaining a production Power BI dataset. This release of pbi-tools makes accessing the former significantly easier and available to a wider audience.

Setup

As all deployment related settings, Refresh Tracing is configured in a .pbiproj.json file, inside the options/refresh/tracing section of a deployment manifest. See an example file here with an explanation of the various new settings following below:

tracing.enabled Use to turn the Refresh Tracing feature on or off. Default: False
tracing.logEvents Controls live output of trace events to the console (hence, CI/CD log).
tracing.logEvents.filter Filters the events logged to the console. An array of expressions using the format {EventClass}|{EventSubclass}|{ObjectType}. Wildcards are supported. See below for further details.
tracing.summary Controls the generation of summary stats by monitoring ProgressReportEnd events and collecting their IntegerData, Duration, and CpuTime metrics.
tracing.summary.events A string array specifying the EventSubclass names to collect summary stats for. A full list of available values is available here.
tracing.summary.objectTypes A string array specifying the object types to collect summary stats for. Values must match the object names listed here.
tracing.summary.outpath Generates a CSV file containing summary stats at the specified (relative) file path. Skips if no path is provided. (Do ensure the location is writable!)
tracing.summary.console Determines whether or not to print summary stats to the console (in addition to or instead of the CSV file output).

The LogEvents Filter

pbi-tools Refresh Tracing currently subscribes to all four Progress Reports Event Categories:

  • ProgressReportBegin
  • ProgressReportCurrent
  • ProgressReportError
  • ProgressReportEnd

Each Progress Report Begin event begins with a stream of progress events and is terminated with a Progress Report End event. The stream may contain any number of Progress Report Current and Progress Report Error events. (Quoted from the above page.)

In order to limit emitted events to a reasonable selection, each incoming event will have a key generated that concatenates the Event Class, Subclass, and Object type. For instance: ProgressReportBegin|ReadData|Partition.

The filter expressions provided at tracing.logEvents.filter are used to determine whether or not to log an event. Note that those filters do NOT impact summary stats (those have their independent selection mechanism).

Wildcards (*, ?) are allowed, hence the above example would be matched by any of those filter expressions:

  • *|ReadData|Partition
  • *|Partition
  • *
  • ProgressReportBegin|*|*
  • ProgressReport*|ReadData|Partition

This is an example log output generated for the Contoso Sales Model with the settings file above:

|00:00:01.4394826| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Partition</pii>' of table '<pii>Time Intelligence</pii>'." Rows:null (Partition: Partition)
|00:00:01.4416719| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Partition</pii>' of table '<pii>Metric</pii>'." Rows:null (Partition: Partition)
|00:00:01.4419043| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Info-f44d74ec-0372-4982-8ace-63656474683c</pii>' of table '<pii>Info</pii>'." Rows:null (Partition: Info-f44d74ec-0372-4982-8ace-63656474683c)
|00:00:01.4421741| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Product</pii>' of table '<pii>Product</pii>'." Rows:null (Partition: Product)
|00:00:01.4423584| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Date</pii>' of table '<pii>Date</pii>'." Rows:null (Partition: Date)
|00:00:01.4426052| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Promotion</pii>' of table '<pii>Promotion</pii>'." Rows:null (Partition: Promotion)
|00:00:01.4429962| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Customer</pii>' of table '<pii>Customer</pii>'." Rows:null (Partition: Customer)
|00:00:01.4431927| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Time Intelligence</pii>' table started." Rows:null (Partition: Partition)
|00:00:01.4433958| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Time Intelligence</pii>' table." Rows:4 (Partition: Partition)
|00:00:01.4454190| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Partition</pii>' of table '<pii>Time Intelligence</pii>'." Rows:0 (Partition: Partition)
|00:00:01.4460480| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Sales</pii>' of table '<pii>Sales</pii>'." Rows:null (Partition: Sales)
|00:00:01.4525113| [ProgressReportBegin|TabularRefresh] "Started processing partition '<pii>Store</pii>' of table '<pii>Store</pii>'." Rows:null (Partition: Store)
|00:00:01.4527182| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Metric</pii>' table started." Rows:null (Partition: Partition)
|00:00:01.4528967| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Metric</pii>' table." Rows:4 (Partition: Partition)
|00:00:01.7086326| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Partition</pii>' of table '<pii>Metric</pii>'." Rows:0 (Partition: Partition)
|00:00:03.4487734| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Product</pii>' table started." Rows:null (Partition: Product)
|00:00:03.4491366| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Product</pii>' table." Rows:2517 (Partition: Product)
|00:00:03.4492869| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Info</pii>' table started." Rows:null (Partition: Info-f44d74ec-0372-4982-8ace-63656474683c)
|00:00:03.4495241| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Info</pii>' table." Rows:7 (Partition: Info-f44d74ec-0372-4982-8ace-63656474683c)
|00:00:03.5966709| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Info-f44d74ec-0372-4982-8ace-63656474683c</pii>' of table '<pii>Info</pii>'." Rows:0 (Partition: Info-f44d74ec-0372-4982-8ace-63656474683c)
|00:00:03.5968676| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Product</pii>' of table '<pii>Product</pii>'." Rows:0 (Partition: Product)
|00:00:03.5980443| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Store</pii>' table started." Rows:null (Partition: Store)
|00:00:03.5988739| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Date</pii>' table started." Rows:null (Partition: Date)
|00:00:03.5996038| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Store</pii>' table." Rows:306 (Partition: Store)
|00:00:03.7410000| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Store</pii>' of table '<pii>Store</pii>'." Rows:0 (Partition: Store)
|00:00:03.7412612| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Promotion</pii>' table started." Rows:null (Partition: Promotion)
|00:00:03.7414874| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Promotion</pii>' table." Rows:28 (Partition: Promotion)
|00:00:03.7422141| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Promotion</pii>' of table '<pii>Promotion</pii>'." Rows:0 (Partition: Promotion)
|00:00:03.7423440| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Customer</pii>' table started." Rows:null (Partition: Customer)
|00:00:03.7425389| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Date</pii>' table." Rows:2556 (Partition: Date)
|00:00:03.8863450| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Date</pii>' of table '<pii>Date</pii>'." Rows:0 (Partition: Date)
|00:00:03.8866039| [ProgressReportBegin|ReadData] "Reading data for the '<pii>Sales</pii>' table started." Rows:null (Partition: Sales)
|00:00:03.8888622| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Customer</pii>' table." Rows:10000 (Partition: Customer)
|00:00:03.8890133| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:10000 (Partition: Sales)
|00:00:03.8891499| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Customer</pii>' table." Rows:18863 (Partition: Customer)
|00:00:04.0294533| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Customer</pii>' of table '<pii>Customer</pii>'." Rows:0 (Partition: Customer)
|00:00:04.0305165| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:20000 (Partition: Sales)
|00:00:04.0313302| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:30000 (Partition: Sales)
|00:00:04.0315326| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:40000 (Partition: Sales)
|00:00:04.0316618| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:50000 (Partition: Sales)
|00:00:05.8142209| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:60000 (Partition: Sales)
|00:00:05.8144364| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:70000 (Partition: Sales)
|00:00:05.8145502| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:80000 (Partition: Sales)
|00:00:05.8146863| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:90000 (Partition: Sales)
|00:00:05.8148028| [ProgressReportCurrent|ReadData] "Reading data for the '<pii>Sales</pii>' table." Rows:100000 (Partition: Sales)
|00:00:05.8149232| [ProgressReportEnd|ReadData] "Finished reading data for the '<pii>Sales</pii>' table." Rows:100231 (Partition: Sales)
|00:00:05.8160732| [ProgressReportEnd|TabularRefresh] "Finished processing partition '<pii>Sales</pii>' of table '<pii>Sales</pii>'." Rows:0 (Partition: Sales)

Running pbi-tools deploy locally first allows to find the appropriate tracing settings. A recommended configuration can be found in this file.

Analysis

The summary CSV file is particularly useful for further analysis, potentially even trend reporting.

Summary CSV Excel

Pivoting that dataset by Table/Partition and Event allows for a convenient visualization of the various metrics. Note that the timings of each sub type for any given object are not additive since some of those processes run in parallel. The TabularRefresh subclass provides the overall timing for a given object’s refresh.

Refresh Summary Pivot

External Tracing Resources

Deploying “thick” reports

This release adds support for “thick” report deployments - Power BI Desktop projects containing both a dataset and an associated report. This works as an extension to the Dataset deployment mode. Unlike the tradional publishing of thick reports from Power BI Desktop, the pbi-tools approach converts the report into a thin report and deploys the dataset and the related report as separate artifacts. Which has the added benefit that the report can be downloaded from Power BI Service as a thin report, with no danger of the entire dataset being embedded in the exported file.

The feature is enabled using a single setting inside options/dataset and only works if the deployment source is PbixProj folder:

      "options": {
        "dataset": {
          "deployEmbeddedReport": true
        }
      }

A live connection file is generated automatically as part of the deployment and is injected into the PBIX file compiled from sources.

By default, the report name and destination workspace match those of the dataset. However, either setting can be customized for each environment:

        "UAT": {
          "workspace": "Datasets [PROD]",
          "displayName": " [UAT]",
          "refresh": true,
          "report": {
            "skip": false,
            "workspace": "Name-or-ID",
            "displayName": "Report Name.pbix"
          }
        }

Parameter replacement is supported, and report deployment can be disabled in selected environments ("skip": true).

PowerQuery Parameter Enhancements

Deployment manifest parameters previously always had to have string values. When used for dataset PQ expression replacement, that limited the application to “text” parameters, and also prevented the assignment of the M null value (as a language keyword it must not be enclosed in quotes).

This release extends the manifest schema to allow non-string parameters. A parameters object could now look like this:

{
    "Number": 1,
    "Number2": 0.4,
    "Null": null,
    "String": "foo",
    "Bool": true,
    "Date": "#date(2022, 6, 1)",
    "Duration": "#duration(5, 0, 0, 0)"
}

Numeric and Boolean parameters are supported, as well as the null value.

Any parameter values starting with # are considered a literal M expression and are injected verbatim, without any quotation characters, for instance #duration(5, 0, 0, 0).

Furthermore, this release enables correct escaping of literal double-quote characters inside text values. For example, the value abc"DEF will be converted to M as "abc""DEF". Previously, those cases would have emitted invalid M code.

An alternative complex parameter format is also supported, allowing for additional metadata attributes to be added to parameters in a future release. In this case, the parameter payload must be a json object with a value property (the property name is case insensitive) containing the parameter value. For example:

{
    "Number": { "value": 1},
    "Number2": { "Value": 0.4 },
    "Null": { "VALUE": null },
    "String": { "value": "foo" },
    "Bool": { "value": true },
    "Date": { "Value": "#date(2022, 6, 1)" },
    "Duration": { "value": "#duration(5, 0, 0, 0)" }
}

Note that the deployment process does NOT perform type validation. For instance, if a PQ parameter is declared as a number in the model, however, a bool value is assigned to it via the deployment manifest, it will most likely fail the data refresh. The dataset/manifest author needs to ensure that value types are compatible.

Another enhancement relating to deployment parameters is the ability to reference system parameters within explicit manifest parameters (in addition to environment variables). The standard parameter syntax applies: {{PARAM_NAME}}.

System parameters currently supported are:

ENVIRONMENT Contains the effective deployment environment name, for instance “Development”.
PBITOOLS_VERSION Contains the pbi-tools version number, for instance “1.0.0-rc.2+20220703”.
PBIXPROJ_FOLDER Folder sources only: Contains the source folder name, for instance “Contoso Sales”.
FILE_NAME File sources only: Contains the source file name, for instance “Contoso Sales.bim”.
FILE_NAME_WITHOUT_EXT Always available: Either the source folder name or the source file name without the extension, whichever applies.

The example .pbixproj.json file listed above demonstrates some of the new parameters features.

What’s Next

This preview is part of the RC.2 release which will complete the Datasets Deploy Milestone.

The overall objective is to allow service principals to own Power BI datasets throughout their entire lifetime, i.e. allow for fully automated dataset creation, updates, credentials setting, refreshing, and monitoring.
Done
#97 Create Dataset
#97 Update Dataset Schema from BIM/PbixProj sources
#97 Update and freeze dataset parameters during deployment
#97 Refresh Dataset
#147 Gather and report refresh statistics
To Do
#36 Update Dataset schema from Tabular Editor sources
#146 Set parameters specific to environment
#129 Refresh individual tables/partitions
#140 Set dataset users
#135 Bind dataset to gateway
#139 Set Power BI refresh schedule
#141 Export model dependencies from thin report
Set (cloud) credentials (excl OAuth)
Apply refresh policy
Take Ownership of Dataset
Export dataset sources from XMLA endpoint