Data can be messy!
What we mean by “messy” is that when you receive data, whether it be from an email attachment, a download from a website or database, or just another user’s Excel file, the data will likely not be in the format or structure you require for proper use.
Common data issues like separating names, isolating values, standardizing text casing, etc. can occupy much of our time to correct. Often, we spend more time fixing the data than we spend analyzing the data.
Many tools have been developed over the years to help Excel users deal with messy data. In the recent few years, a new tool has emerged that makes almost any data correction issue as simple as a single click to correct.
Welcome to the world of Power Query.
Let’s look at just a few of the almost countless possibilities Power Query has to offer in terms of turning messy data into perfect data.
In the “old days”, correcting messy data came down to writing complex formulas to transform data. Imagine writing the following formula to extract a number from within a set of text.
For many users, this is an impossible task. The user would type the needed data by manually examining the source material. If this constituted hundreds or thousands of records, the time required could be substantial.
On the other hand, if we use the Get and Transform tools (also known as “Power Query”) the task is reduced to just a few clicks… and NO formulas.
The best part of this occurs when we add new data. A single click to tell Power Query to refresh the data will apply all the transformations to the newly added data.
Let’s look at some examples of common data transformation issues.
Upcoming Power Query Course (Summer 2020)
These and many other data transformation skills can be acquired in my upcoming Power Query course.
This course is ideal for those that work with data and want to automate the reconstruction of their data and simplify their ongoing analysis.
Sign up to my Newsletter if you’d like to get notified once my complete Power Query course becomes available in June:
The Data and the Mission
Below is a snapshot of our data which is formatted as an official Excel Table named “TEmployees”.
The transformations we require are:
- Split the Department and Position into 2 separate columns
- Extract the Employee Number from the middle of the Unique Code and format it as E-## (ex: E-70)
- Split the First and Last names from the Full Name into separate columns and correct casing issues
If we were to perform these tasks with the traditional formula-based approach, our formulas would get quite complicated and we would need to be well versed in dozens of different functions and tools.
Transformation #1: Split Text by Delimiter
Our first transformation is to separate the Department and the Position into 2 columns.
Because the Department and Position are separated by “space – forward slash – space” characters, we can leverage these characters as a delimiter to assist in the separation process.
- Click in the table and select Data (tab) -> Get & Transform (group) -> From Table/Range.
- Rename the transformation to “ProperData”.
- Select the Department/Position column and click Home (tab) -> Transform (group) -> Split Column -> By Delimiter.
- In the Split Column by Delimiter dialog box, select Custom and use a forward slash (/) as the delimiter.
- Click OK.
- Once the split has been performed, another “Changed Type” step is added. This is to reassess the newly split data for possible new data types. We could delete this step as it is not creating anything we didn’t already possess.
- Technically, we could have used a “ / “ (space, forward slash, space) as the custom delimiter. This would have allowed us to eliminate the upcoming “Trim” step.
Removing the Leading & Trailing Spaces
If you select to the right of one of the Department names, you will be presented with a preview window at the bottom on the screen.
If you then click to the right of the data in the preview window you will notice that the cursor does not rest directly next to the last letter; there is a trailing space.
We can also discover that there is a leading space before the Position in the second column.
Trimming the Data
To remove the trailing and leading spaces, select the Department column, hold CTRL and select the Position column. Select Transform (tab) -> Text Column (group) -> Format -> Trim.
I prefer using the “space – forward slash – space” approach as it reduces the query by a step.
NOTE: We will deal with the column headings at the end. For now, we can leave them a bit messy.
Transformation #2: Extract Employee ID from Unique Code
Our next step involves the extraction of the Employee ID from within the Unique Code column. Notice that the format is “last name – employee ID – first name”.
Select the Unique Code column and click Transform (tab) -> Text Column (group) -> Split Column -> By Non-Digit to Digit.
This separates the Last Name from the Employee ID/First Name.
Next, select the column with Employee ID and Last Name and click Transform (tab) -> Text Column (group) -> Split Column -> By Digit to Non-Digit.
This separates the Employee ID from the First Name.
Getting rid of the unneeded bits
As we only need the Employee ID, we will remove the newly separated First Name and Last Name columns.
Click the column of Last Names, then press CTRL and click the column of First Names.
Press the Delete key to remove the selected columns.
Formatting the Employee ID
Remember, one of the requirements was to format the Employee IDs with an “E-“ prefix.
Select the column that contains Employee IDs and click Transform (tab) -> Text Column (group) -> Format -> Add Prefix.
In the Prefix dialog box, enter a Value of “E-“ and click OK.
We now have our properly formatted Employee IDs.
Transformation #3: Separate Names and Format Casing
Our final set of transformations is to separate the contents of the Full Name column into a First Name column and Last Name column whilst ignoring any middle name information.
There is a wealth of options sitting withing the Extract feature in Power Query.
We don’t want to transform what we have into a new set of data; we want to leave the original Full Name column while adding additional columns for First Name and Last Name.
For this operation, we will use the version of Extract located on the Add Column ribbon.
Extracting First Names
Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text Before Delimiter.
In the Text Before Delimiter dialog box, enter a space in the Delimiter field and click OK.
We are presented with a new column that contains all text up to the first space in the Full Name text.
Extracting Last Names
Select the Full Name column and click Add Column (tab) -> From Text (group) -> Extract -> Text After Delimiter.
In the Text After Delimiter dialog box, enter a space in the Delimiter field. Expand the Advanced Options and set the Scan for the Delimiter option to “From the end of the input” and click OK.
Starting from the end and “looking” backward is necessary because of the existence of middle names in some of our records.
If we began our search for a space from left-to-right, we would stop before a middle name and extract the middle and last names.
We are presented with a new column that contains all text after the last space in the Full Name text.
Format the Names with Proper Casing
The next step is to format the newly added First Name and Last Name columns so that the first letter is upper-case while the remaining letters are lower-case.
Select the First Name and Last Name columns then click Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.
We now have our properly formatted names.
- We no longer require the original column of Full Names, so we can select the Full Name column and press Delete.
- Rename the column headings with more meaningful names.
Sending the Results to Excel
To send the transformation results back to Excel as a finished table, select Home (tab) -> Close (group) -> Close & Load (lower part of button) -> Close & Load to…
You can load the results to a table on a new sheet, or an existing sheet.
The results are as follows.
All three requirements have been satisfied and we didn’t have to write a single formula to get the job done.
Updating the Results with New Data
Because Power Query “remembers” the steps you performed on the original data, you can easily repeat those steps when you receive new data.
If we add a few new records to the original data set…
We can click Data (tab) -> Queries & Connections (group) -> Refresh All to absorb the newly added records into our query results table.
Now sit back and bask in the glory that is a fully-automated query.
Feel free to Download the Workbook HERE.
Get the ULTIMATE Excel Power Query Course
Use Power Query Like an EXPERT From the Start
Learn anytime that fits your schedule.
Download files. Practice. Apply.