Unfortunately, the results of the combining process are not perfect because the data was not presented in proper Excel Data Tables, so certain transformations were made based on assumptions that may not have been correct.
Headers are not correct, there are blank lines (nulls) in the data, file names are listed with the records.
This is where things get a bit technical because we must examine at a deeper level what took place from the time we clicked the sad robot until now.
Examining Applied Steps
If you look at the right side of the Power Query Editor you will see a list of applied steps.
If you click from step to step through the list, you can see “snapshots in time”, or how the data looked after that specific step had completed. It’s almost like a time machine where you can go back in time and visit the data at any point in its processing.
Let’s quickly summarize the steps in our query:
- Connect to the data source (the folder)
- Filter out the files that don’t have “_Data” in their names
- Filter out non-Excel files
Here is where Power Query took over after we clicked the sad robot button.
- Filtered the folder contents to visible files only (those without an active “Hidden” attribute)
- Invoke a custom function that opens the sample file and determines where the data is located and attempt to locate the header rows. This extracts the file contents and stores them as nested tables in a new column.
- Renames the “Name” column to “Source.Name”
- Deletes all columns except “Source.Name” and the nested table’s column
- Expands the contents of the nested tables to create a new table displaying the file’s contents in a single table
- Attempt to assign proper data types to the columns (this step fails miserably due to incorrectly defined headers)
Adding Our Transformations
We know we need to “tweak” the query. The query made a gallant effort, but it fell short in a few key areas.
The question becomes where do we want to add our transformations?
If we click “Transform Sample File” in the queries panel, we can see how Power Query processed the first file.
When we click “Report Files”, we see how all files were processed through the same steps applied to the sample file.
We need to determine if we want to adjust the steps made at the early stages of the query when working with the sample file, or at the later stage when we are the contents of the remaining files in the folder.
Word to the Wise: Many tasks can be performed at most any point during processing, but certain tasks can only be performed at certain points in time. The only way to know for sure when something needs to be performed is to experiment and see what works best for your given scenario. I know that’s not exactly concrete rocket-science level advice, but you’d be surprised how much of Power Query comes down to trial and error. The more you work with Power Query, the easier it becomes to know intuitively what works and when it works best.
Reducing and Renaming Data
Suppose we want the following additional transformations:
- “Source.Name” column to be called “Company”
- Company names should not contain “_Data.xlsx” in their names
Begin by “backing up” in the Applied Steps list to the “Removed Others Columns1” step.
Rename the “Source.Name” column to “Company” (Transform (tab) -> Any Column (group) -> Rename) and click Insert if prompted.
Remove all the unwanted text from the company names by selecting Transform (tab) -> Text Column (group) -> Extract -> Text Before Delimiter. Click Insert if prompted to insert a step.
In the Text Before Delimiter dialog box, enter “_Data” in the Delimiter field.
We could get away with just an underscore character here, but we’re “future-proofing” this, remember?
Once we progress to the next step in the Applied Steps list, we see a more fully-formed “Company” column.
The problem now is that in our last step, was attempting to data type a column named “Source.Name”, but we have prematurely changed it to “Company” without informing it of such an action.
This step of data typing is unnecessary at this point since the column headings have not been handled properly, so we will delete this step and render this error irrelevant. Click the “X” to delete the “Changed Type” step.