The first step is to connect to the data source. For this example, we will connect to the U.S. Energy Information Administration.
The information we need is in a table that is part of the overall webpage.
In the “old days”, we would transfer the information from the website into Excel by highlighting the webpage table and copy/paste the data into Excel.
If you’ve ever done this, you know what a hit-or-miss proposition this can be. It’s the 50/50/90 Rule: if you have a 50/50 chance of winning, you’ll lose 90% of the time.
Even if you were to successfully transfer the information into Excel, the information is not linked to the webpage. When the webpage changes, you will need to recopy/paste (and potentially fix) the updated information.
- Begin by copying the URL from the webpage (assuming you are previewing the page in a browser). If not, you can type it into the next step’s URL prompt.
- Select Data (tab) -> Get & Transform (group) -> From Web.
- In the From Web dialog box, paste the URL into the URL field and click OK.
The Navigator window displays the components of the webpage in the left panel.
If you want to ensure you are on the correct webpage, click the tab labeled Web View to get a preview of the page in a traditional HTML format.
It is unlikely that the listed components on the left will be presented with obvious names as to which item goes to which webpage component. You may need to click from one item to the next, previewing each item in the right-side preview panel, in order to determine which belongs to the desired table.
If the data does not require any further transformations, you can click Load/Load To… to send the data directly to Excel. This will allow you to select the destination of the results data, such as a table on a new or existing worksheet, the Data Model, or create a “Connection Only” to the source data.
- In the Navigator dialog box, select the arrow next to Load and click Load To…
- In the Import Data dialog box, select “Existing worksheet” and point to a cell on your desired destination worksheet (like cell A1 on “Sheet1”).
The result is a table that is connected to a query. The Queries & Connections panel (right) lists all existing queries in this file.
If you hover over a query, an information window will appear giving you the following information:
- a preview of the data
- the number of imported columns
- the last refresh date/time
- how the data was loaded or connected to the Excel file
- the location of the source data
Although the data looks correct, we have some structural issues with the results that will cause problems with further analysis.
The empty cells in the Product column will cause problems when sorting, filtering, charting, or pivoting the data.
We need to make some adjustments to the data.
- Double-click (or right-click and choose Edit) the listed query to activate the Power Query Editor.
We want to fill down the listed products into the lower, empty cells of the Product column.
- In the Power Query Editor, select the Product column and click Transform (tab) -> Any Column (group) -> Fill -> Down.
The reason the product names failed to repeat down through the empty cells is that Power Query did not interpret the cells as empty. There may be some artifact from the webpage that exists in the cell that we can’t see.
We will replace all the “fake empty” cells with null values. This should allow the Fill Down operation to work as expected.
- Select the Changed Type step in the Query Settings panel (right).
- Select the Product column and click Transform (tab) -> Any Column (group) -> Replace Values.
- Tell Power Query that you wish to insert this new step into the existing query by clicking Insert.
- In the Replace Values dialog box, leave the “Value To Find” field empty and type “null” (no quotes) in the “Replace With” field. Click OK when finished.
If you select the previously created “Filled Down” step at the bottom of the Query Settings panel you will see the updated results of the query.
- Update the query name to “Spot Prices”.
- Click the top part of the Close & Load button at the far-left of the Home
If we were to graph the data, and the data were to change, we can refresh our graph by clicking Data (tab) -> Queries & Connections (group) -> Refresh All or right-click on the data and select Refresh.
There are some controllable options available by selecting Data (tab) -> Queries & Connections (group) -> Refresh All -> Connection Properties…
Some of the more popular options include:
- Refresh the data every N number of minutes
- Refresh the data when opening the file
- Opting for participation during a Refresh All operation