How to Prevent the Deletion of Datasets in Power BI

Note - This guide is being published in Oct. 2020. This month Microsoft began rolling out the "New Look" to the Power BI service. We have enabled the new look and its underlying features, so any screenshots in this guide are based on the new look. Also note that we are using SSAS Tabular with a data gateway and a live connection from the Power BI service.

Update May 2023 - As of May 2023, the protected dataset approach described here is still the best way to prevent the deletion of datasets (and their associated reports) in your Power BI tenant. Store your datasets in a workspace that only admins have access to, keep all master reports in your Git repository, and share the protected dataset from the admin workspace to user groups in your organization. Hard to believe this is still an issue, but it is.

Overview

Datasets are what we use to build reports in the Power BI service. A dataset references a data source that has been configured in the Power BI service. We use SSAS Tabular to build models that can be consumed by Power BI; we publish these models and then access them through a data gateway. We create the data sources on the data gateway in the Power BI service and then link the datasets to those data sources.

All reports you build in the Power BI service reference a dataset. If a single dataset has 100 reports built off of it and somebody accidentally deletes that dataset, all of the reports tied to it are deleted as well. Power BI is a self-service product and you want your teams to be able to build their own reports. For members of a Power BI workspace to build their own reports, you need to grant those users the Contributor role on their workspace. That role also allows those users to delete datasets, and that is where you can run into trouble.

For some reason Microsoft has not done a great job with options to protect content in Power BI. There is no concept of a recycle bin in the Power BI service; when it's gone, it's gone. The security roles for workspaces are too broad: the main roles are Admin, Contributor, and Viewer (we'll get into this more in a minute), and most users will be in the Contributor role if you want them to create their own reports (we do). Unfortunately the Contributor role is over-permissioned, so we have to take extra steps to protect our content in Power BI.

That is the purpose of this guide: to walk you through the steps to protect content in the Power BI service and reduce the risk of a dataset (and all of the content that depends on it) being deleted, while at the same time enabling your teams to use self-service reporting and workspace management.

Dataset Basics

  • Datasets depend on data sources. We can define a data source on a data gateway, but to create a dataset we first need to create a report in Power BI Desktop. In our environment, the Power BI Desktop report uses a live connection to an SSAS Tabular database/model. When we publish this report to the Power BI service, it creates a dataset. We then link the dataset to the data source.

  • When you delete a dataset in a workspace, all reports that are built from that dataset in that same workspace will be deleted along with it. If a report was created in another workspace that referenced the deleted dataset, that report will become orphaned and no longer work.

  • Datasets are identified by GUIDs, not names. If you delete a dataset and republish it with the same name, as far as Power BI is concerned it is a new dataset and no existing content will be linked to it. Because datasets are identified by GUIDs, it is also possible to have more than one dataset with the same name, which can cause confusion in a workspace.

  • A dataset inherits permissions from the workspace it resides in, but it can also be granted additional permissions beyond the workspace permissions by modifying the dataset permissions (we will explain this below).

Workspace Configuration

To protect our Power BI datasets, we store them in a workspace that only our BI developers have access to. From there we grant users Build permission on the datasets we publish in this workspace, but they will not be able to delete those datasets. Using this approach along with version control (Azure DevOps, Bitbucket, etc.) is the best way to secure and protect content in your environment.

Note - What we are doing here assumes you and your users are using Power BI Pro licenses or Premium capacity. You will need this type of licensing to share content across the organization.

In the Power BI service, create a new workspace named BI Admin. For the members, start by granting Admin rights to your BI developer admin(s). These are the people who will publish, modify, and maintain all of the content in this workspace. In the screenshot below you can see the new workspace with myself added as an Admin, but no content in the workspace yet. While we are at it, we will also add a single user who has only view permissions in the workspace. This is the user we will delegate Build permissions to on the dataset.

PBI_Dataset1.jpg

Publish Dataset

Now we are ready to publish our first dataset to the secure workspace. Remember, we are using an SSAS Tabular model with a live connection; your environment might be different, but the securing process is the same.

Open Power BI Desktop and connect to your data source; for us that is an SSAS Tabular model.

PBI_Dataset2.jpg

When you connect to the SSAS Tabular model you will see all of the tables in the model populate in the report.

PBI_Dataset3.jpg

On the report canvas there is no need to create a visualization. Instead, add a note about the report and the dataset, something that tells you what it is and which team it is for. For example, if this dataset is for the Product team, I might add something like you see in the screenshot below.

PBI_Dataset4.jpg

When you publish your first report for a particular dataset to the Power BI service, it also publishes the dataset that report is connected to. In the case of our report here, that is the SSAS Tabular model for the Adventure Works database. Keep in mind that the dataset in the Power BI service will take the same name as the report that created it. You do not want to rename the dataset after it is published; instead, give your template report the name you want reflected in the dataset, something like "Product Support Dataset".

Now we are ready to publish this to the Power BI service. Click the Publish button on the Home tab, select the target workspace (in this case the BI Admin workspace), and click Select.

PBI_Dataset5.jpg

You will see the following message when the report and dataset have been published.

PBI_Dataset6.jpg

Now open the Power BI workspace we created earlier; you should see your report and your dataset.

PBI_Dataset7.jpg

I understand the naming of the report doesn't make much sense, but since we don't want to rename the dataset after it is published, we give the report a name that best suits the dataset. Now that the dataset is published, the next step is to link it to a data source. Click the Manage gateways option under the settings gear. In the gateway settings, click Add data source and add a data source for the SSAS Tabular model your dataset is based on. You will need all of the connection details of the SSAS server; this also assumes you already have the data gateway up and running.

PBI_Dataset8.jpg

Once the data source has been added, we can go back to the settings of the dataset. Expand the Gateway connection section and, for the target database, select the data source you added in the previous step. As you can see in the image below, we named the data source AdventureWorks.

PBI_Dataset9.jpg

If you recall, earlier we set up a user on the workspace with only view permissions. That user is a member of the Product Support team and wants to build reports from the new dataset we just published. We want to make sure the user can't delete the underlying dataset and, with it, any reports tied to that dataset. So we are going to give the user Build permission on the dataset directly; this allows them to build reports off of the dataset but does not let them modify or delete it. If this user were a Contributor on the workspace they would be able to delete the dataset, which is exactly what we are trying to prevent.

By default the dataset will inherit permissions from the workspace, but we can grant additional permissions on the dataset. Click on the Manage Permissions option of the dataset.

PBI_Dataset10.jpg

Now we can grant the Build permission to our view-only user. This allows that user to select this dataset and build reports from it while working in their own workspace. It's basically sharing the dataset without elevating the user's rights to the dataset beyond exactly what they need.

PBI_Dataset11.jpg

If our user Kevin wants to build a report from this new dataset, he can click New > Report in his team's workspace. From there he can see all of the datasets he has permissions on and select the Product Support dataset we just published and granted Build permission on.

PBI_Dataset12.jpg

Now select the Product Support dataset and click Create. This effectively creates a linked reference to the dataset, allowing the user to create a new report in their workspace.

PBI_Dataset13.jpg

When they save the report, they won't even see the dataset in their workspace, much less be able to delete the master copy of it. This is how we ensure that our users can create and modify their own content without the risk of errantly deleting a dataset and wiping out reports for the rest of the team.

Also, when we want to make changes to the underlying data model in SSAS, all we have to do is update the model, refresh the report in Power BI Desktop, and republish the report to the Power BI service. This preserves all connections from reports to the dataset while also giving those reports access to the new objects published to the SSAS model.

Default Schema and Schema Scope in Stored Procedures

When working in SQL Server, if you reference an object (EXEC, SELECT FROM, etc.) without specifying a schema, it will default to dbo (unless you have assigned a different default schema to that user, though it is more common not to). What happens if you have a procedure in, let's say, a dev schema and you are calling tables from both the dev and dbo schemas inside it? If you include the schema names in your query (SchemaName.TableName) then everything works as expected, but what happens if you omit the schema from a table you are calling, hoping it will resolve to dbo? That depends on where the table exists (in dev, dbo, or both) and which schema the calling object belongs to.
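
Before walking through the script, here is a quick sanity check you can run (it is not part of the demo script): SCHEMA_NAME() with no argument returns the caller's default schema, and sys.database_principals shows the default schema assigned to each user in the database.

-- What is my default schema in this database?
SELECT SCHEMA_NAME() AS MyDefaultSchema;

-- What default schemas are assigned to the users in this database?
SELECT name, default_schema_name
FROM sys.database_principals
WHERE type IN ('S', 'U', 'G'); -- SQL users, Windows users, Windows groups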

I wanted to put together a script that explains the behavior of default schema and schema scope in stored procedures. Let's walk through the following script for a more meaningful explanation.

--======================================================
-- Default Schema and Schema Scope in Stored Procedures 
--======================================================

-- Create a dev schema if you don't have one already

CREATE SCHEMA [dev] AUTHORIZATION [dbo];
GO

--DROP TABLE IF EXISTS dbo.ordertest;
--DROP TABLE IF EXISTS dev.ordertest;

-- Create a table in the dbo schema

CREATE TABLE dbo.ordertest (
ID smallint,
Targets varchar(20) );

INSERT INTO dbo.ordertest
SELECT 1, 'dbo';

-- Create the same table, this time in the dev schema

CREATE TABLE dev.ordertest (
ID smallint,
Targets varchar(20) );

INSERT INTO dev.ordertest
SELECT 1, 'dev';
GO

-- Notice we populate dbo.ordertest with the Targets value of 'dbo' and dev.ordertest with 'dev'

-- Now we create a stored procedure in the dev schema. We create it without specifying the schema on the table in our SELECT.

CREATE PROCEDURE dev.TestSchema AS
    SET NOCOUNT ON;
    BEGIN
        SELECT * FROM ordertest -- No schema specified here
    END;
GO

-- Now let's run a simple query against ordertest without specifying a schema

SELECT * FROM ordertest

-- The result is 'dbo', because dbo is our default schema

-- This time we specify the dev schema and get the value from our dev table

SELECT * FROM dev.ordertest

-- What happens when we call our stored procedure in dev, which performs a SELECT against the table without a schema?

EXEC dev.TestSchema;

-- Returns 'dev', because the procedure lives in the dev schema, so an unqualified table name is resolved against the procedure's schema before falling back to dbo.

So if we have two tables with the same name in a database, each in a different schema, and you reference the table in a SELECT without specifying the schema, you will get the table in the dbo schema because that is the default schema. If you put that same unqualified SELECT inside a stored procedure that belongs to the dev schema, executing the procedure returns the results from the dev table, because dev is the schema scope for that procedure. This can be confusing if you are expecting the result from dbo, since that is the behavior you are used to. Now what happens if we delete the table in dev and call the procedure again?

-- What happens if we delete the dev table and then run the procedure again?

DROP TABLE IF EXISTS dev.ordertest;

EXEC dev.TestSchema;

-- Now it returns dbo.

Now the query returns the result from dbo because the table no longer exists in the dev schema (the scope of the stored procedure), so name resolution once again falls back to dbo. This is one more reason to always include schema names in all of your queries; that way you never run into the situation we have illustrated here.
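
If you want the procedure to always read from dbo no matter which schema it lives in, the fix is simply to schema-qualify the table. Here is a minimal sketch against the demo objects above (CREATE OR ALTER requires SQL Server 2016 SP1 or later; on older versions use ALTER PROCEDURE instead):

-- Recreate the demo procedure with an explicit schema so the result no longer
-- depends on the schema the procedure itself belongs to
CREATE OR ALTER PROCEDURE dev.TestSchema AS
    SET NOCOUNT ON;
    BEGIN
        SELECT * FROM dbo.ordertest -- Always resolves to the dbo table
    END;
GO

EXEC dev.TestSchema; -- Returns 'dbo', whether or not dev.ordertest exists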

Mirrored Database in Disconnected State - Have you Changed the Mirror Endpoint Owner Recently?

Change Mirroring Endpoint Owners and GRANT CONNECT Permissions on the Endpoint

Let's say you have an ex-employee who at some point created a database mirror on one of your SQL Servers. That employee leaves the company, so you need to remove them from the different environments they had rights to. When you go to remove the account, you find out they are the owner of a mirroring endpoint. You use ALTER AUTHORIZATION to change the owner of the mirroring endpoint to the sa account. Simple enough, right?

Did you know that when you change the ownership of a mirroring endpoint, SQL Server drops any existing GRANT CONNECT permissions on that endpoint? The next time you restart your server, all mirrored databases will be in a disconnected state and inaccessible until you grant CONNECT permission on the mirroring endpoint back to the SQL service account.

Follow this guide to change an endpoint owner and grant permission on the endpoint.

-- Find out who owns the mirroring endpoint

SELECT
   me.name AS EndpointName    
  ,sp.name AS EndpointOwner -- Name of the principal that created and owns this endpoint
  ,me.type_desc AS EndpointDescription
  ,me.endpoint_id AS EndpointID
  ,me.state_desc AS EndpointState
  ,role_desc AS RoleDescription -- Can be none, partner, witness, or all
FROM sys.database_mirroring_endpoints me
INNER JOIN sys.server_principals sp ON me.principal_id = sp.principal_id

Let's assume the owner of the endpoint is a former employee and we would like to remove their account from the server. Before we can remove the account, we first need to change ownership of the endpoint they own. But before doing that, let's look at the permissions on the server endpoints.

-- Check permissions on server endpoints

SELECT
   e.type_desc AS EndpointType
  ,e.name AS EndpointName
  ,sp.[state_desc] AS PermissionState
  ,sp.[permission_name] AS PermissionName
  ,SUSER_NAME(sp.grantor_principal_id) AS GrantedBy
  ,SUSER_NAME(sp.grantee_principal_id) AS GrantedTo
FROM sys.server_permissions sp
INNER JOIN sys.endpoints e ON sp.major_id = e.endpoint_id
WHERE sp.class_desc = 'ENDPOINT';

You should see a list of permissions on all of the endpoints on the server. We are looking for an endpoint of type DATABASE_MIRRORING with the GRANT CONNECT permission assigned to the SQL service account. If you don't see this permission, then you probably changed the owner of the mirroring endpoint at some point without granting CONNECT back to the SQL service account. Keep reading.
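
If the list is long, you can narrow it down to just the CONNECT permissions on mirroring endpoints. This is the same check as above, just filtered:

-- Show only CONNECT permissions on database mirroring endpoints
SELECT
   e.name AS EndpointName
  ,sp.[state_desc] AS PermissionState
  ,sp.[permission_name] AS PermissionName
  ,SUSER_NAME(sp.grantee_principal_id) AS GrantedTo
FROM sys.server_permissions sp
INNER JOIN sys.endpoints e ON sp.major_id = e.endpoint_id
WHERE sp.class_desc = 'ENDPOINT'
   AND e.type_desc = 'DATABASE_MIRRORING'
   AND sp.[permission_name] = 'CONNECT';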

-- Change endpoint owner to sa

ALTER AUTHORIZATION ON ENDPOINT::Mirroring TO sa; -- Mirroring is the mirroring endpoint name

-- Let's check the owner again

SELECT
   me.name AS EndpointName    
  ,sp.name AS EndpointOwner
  ,me.type_desc AS EndpointDescription
  ,me.endpoint_id AS EndpointID
  ,me.state_desc AS EndpointState
  ,role_desc AS RoleDescription
FROM sys.database_mirroring_endpoints me
INNER JOIN sys.server_principals sp ON me.principal_id = sp.principal_id

-- Now let's check the server permissions on the endpoint

SELECT
   e.type_desc AS EndpointType
  ,e.name AS EndpointName
  ,sp.[state_desc] AS PermissionState
  ,sp.[permission_name] AS PermissionName
  ,SUSER_NAME(sp.grantor_principal_id) AS GrantedBy
  ,SUSER_NAME(sp.grantee_principal_id) AS GrantedTo
FROM sys.server_permissions sp
INNER JOIN sys.endpoints e ON sp.major_id = e.endpoint_id
WHERE sp.class_desc = 'ENDPOINT';

Do you see a GRANT CONNECT permission for your service account? No? If you restarted your SQL Server right now, your mirrors would go into a disconnected state and the mirrored databases would be inaccessible until you issued GRANT CONNECT on the mirroring endpoint.

-- GRANT CONNECT on Mirroring endpoint

GRANT CONNECT ON ENDPOINT::Mirroring TO [YourDomain\DomainAccount];
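
The account you grant CONNECT to here should be the account the SQL Server service runs under (replace [YourDomain\DomainAccount] with that login). If you are not sure which account that is, you can look it up with sys.dm_server_services (available in SQL Server 2008 R2 SP1 and later):

-- Which accounts are the SQL Server services running under?
SELECT servicename, service_account
FROM sys.dm_server_services;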

-- Now let's check the server permissions on the endpoint once again

SELECT
   e.type_desc AS EndpointType
  ,e.name AS EndpointName
  ,sp.[state_desc] AS PermissionState
  ,sp.[permission_name] AS PermissionName
  ,SUSER_NAME(sp.grantor_principal_id) AS GrantedBy
  ,SUSER_NAME(sp.grantee_principal_id) AS GrantedTo
FROM sys.server_permissions sp
INNER JOIN sys.endpoints e ON sp.major_id = e.endpoint_id
WHERE sp.class_desc = 'ENDPOINT';

You should see the GRANT CONNECT permission now. Follow these steps to ensure a disruption-free change of your mirroring endpoint owner!

Here are some additional reference commands you can use if you need to further troubleshoot your database mirror.

-- STOP and START a mirroring endpoint

ALTER ENDPOINT Mirroring STATE=STOPPED -- Mirroring is the name of the mirror endpoint

ALTER ENDPOINT Mirroring STATE=STARTED -- Mirroring is the name of the mirror endpoint

-- Resume mirroring after entering a suspended state

ALTER DATABASE <DatabaseName> SET PARTNER RESUME

-- Create a new mirroring endpoint

CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022) -- 5022 is the default port
    FOR DATABASE_MIRRORING (ROLE = PARTNER);
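
Finally, a quick status check you can run at any point to confirm whether your mirrored databases are synchronized or sitting in a disconnected or suspended state:

-- Check the current state of all mirroring sessions on this instance
SELECT
   DB_NAME(database_id) AS DatabaseName
  ,mirroring_role_desc AS MirroringRole
  ,mirroring_state_desc AS MirroringState -- e.g. SYNCHRONIZED, DISCONNECTED, SUSPENDED
  ,mirroring_partner_instance AS PartnerInstance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL; -- Only databases that are part of a mirroring session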