The first step in any report is to acquire the data that will drive the report visualizations.
This data can come from a variety of sources:
- Delimited Text Files
- Excel Spreadsheets
- PDF Files
- Online Services
The list of connectors is quite extensive and growing every month with regularly scheduled updates.
Clicking on Get Data provides a list of connectors.
For our demonstration, we will be pulling data from two different sources:
- An Excel Workbook named “xlsx” that contains a table named “MasterCustomer”. This table contains information about our customers; like name, contact information, location, etc.
- A comma-delimited text file named “txt” that holds invoice data (i.e., sales transactions.)
Loading the Invoice Data
Starting with the “InvoiceData.txt” file, we click Get Data then browse out to the text file.
We’re presented with a preview of the file’s content so we can verify that we have selected the correct file and that the proper delimiter is being used to parse the data.
We can click Load to load the data directly into Power BI’s Data Model, or you can click Transform Data to open Power Query and perform any needed transformations on the data in preparation for loading to the Data Model.
Even if you don’t think the data requires any transformations, it’s a good idea to load it into Power Query first just to ensure that the data is being interpreted correctly (ex: dates are being read as dates and not text or numbers.)
Power Query can be used to remove unwanted columns, filter unwanted rows, and even enhance the data by creating calculations, such as calculating the Profit from the difference between Sale Price and Cost.
If you have experience using Power Query in Excel, those experiences are transferrable to Power BI. The program is 99.9% the same in both Excel and Power BI.
In the above image, several transformation steps were performed. These steps can be seen listed on the right side of the window.
- SOURCE – The sales data was pulled from the source file.
- PROMOTED HEADERS – The first row in the data was promoted to a Header Row status.
- CHANGED TYPE – All columns were examined and assigned appropriate data types.
Although Power BI is good at performing certain transformations automatically, like detecting data types, it’s a good idea to glance over the results to verify the actions are within your expectations.
Sometimes it’s not that an incorrect data type is assigned, rather a “not as efficient” data type is assigned. Such as a fractional data type is assigned to a column of whole numbers. The fractional data type would work, but if there’s no need to store fractions, a whole number data type will utilize memory more efficiently.
We could also perform operations like combining columns. In our text file, the “Year”, “Month”, and “Day” data is stored in separate columns.
We would like that information to be combined and transformed into a single “Date” column.
Select the columns “Month”, “Day”, and “Year” in that order, then click Transform -> Merge Columns.
We can define a Custom separator as a “/” (forward slash) and name the newly merged column “Date”.
The newly merged column appears as follows:
The data type is not set correctly; see the “ABC” icon in the upper left corner of the column?
We can set this column to a “Date” data type by clicking the “ABC” icon and selecting “Date”.
Notice that all the performed steps are listed in the Applied Steps panel on the right.
If someone updates the source text file, we will be able to click a button and refresh the data. All of the recorded steps will be executed on the new data to update our Data Model.
Loading the Customer Information
The customer information is located in the Excel file named “CustomerMaster.xlsx”.
Bring the new data in by selecting Home -> New Source -> Excel Workbook.
In the preview windows, select the table named “MasterCustomer1” and click OK.
PRO TIP: If given the choice between selecting a sheet and selecting a table, it’s considered a best practice to select the table since it will confine the selection to the data range, and it will detect the header row if present.
A transformation we can apply to the customer data is to split the information in the “CityProvince” column into separate columns: one column for “City” and another column for “Province”.
We can see that there is a “space & open parentheses” between the city and the province.
Select the “CityProvince” column then click Transform -> Split Column – By Delimiter.
Set the delimiter to a “space & open parentheses” and click OK.
We now have the cities and the provinces in separate columns.
One small issue is the leftover closed parentheses at the end of the province names.
We will remove them by replacing all “)” characters with nothing.
Select the “CityProvince.2” column and click Transform -> Replace Values.
In the Replace Values dialog box, enter a closing parenthesis in the “Value to Find” field and leave the “Replace With” field empty.
Now with perfectly separated cities and provinces, rename the columns “City” and “Province” respectively.
Lastly, we’ll remove the unneeded columns from the table by selecting Home -> Choose Columns.
Uncheck the unwanted columns and click OK.
Removing columns that do not contribute to the visualizations or aid in underlying functionality is a great way to reduce the size and complexity of the Data Model while simultaneously improving performance.
As with the invoice data, all the transformation steps for the customer data have been recorded and will execute automatically when the data is refreshed.
Rename the Query “MasterCustomer”.
We are ready to load the results of the Power Query queries into the Data Model.
Click the button labeled “Close & Apply” on the Home ribbon.
After the data is loaded into the Data Model, switch to the Data View and examine the results of the query output.
Expanding the two titles on the right in the Fields panel shows all the loaded columns for each table.
Editing the Data Transformations
If you need to return to the Power Query Editor to modify the original transformations, select the Data View, then right-click the query name and select Edit Query.