Merging multiple files with Power Query

A common problem in businesses is having several versions of the same spreadsheet being used by multiple different people making it difficult to combine them for any centralised reporting. It can be done using VB, but not particularly easily, but then Power Query came along and made this a quite simple step-by-step process.

 

There are several different variations but we will use an example where we want to pickup every CSV file in a specific folder and merge all the data into one sheet. We’ll use some of the more common Power Query features to cleanse the data before loading it.

 

We’ve used CSV files containing data on football matches from some of Europe’s top leagues, freely downloadable from Football-Data.co.uk – a terrific website renowned for their historical football stats.

 

1. Go to the Data tab, New Query – From File and there are various options for the most common file types. In this case we will go for the ‘From Folder’ option.

 

 

2. Browse to your folder location and click OK and you will be given some information about each of the files in the folder. Click the Edit button to load up the Query Editor screen.

 

 

3. Click on the ‘Combine Binaries’ option next to the Content header (circled below) to expand the query to return the data from the files rather than just the file information.

 

 

4. Depending on which version of Excel you have, you may need to promote your first row into headers. Earlier versions have a ‘Promote Headers’ option and Excel 2016 has ‘Use First Row As Headers’. In this instance Power Query has automatically detected the headers, so this step wasn’t required.

 

 

5. You can change data types by highlighting a column and dropping down the Data Type menu. In this case making the Date column a Date format.

 

 

6. You can double-click a column header and type in a new name if you want to rename a column. If you have the Formula Bar visible you can view the DAX formula being written for each step, as per the example below.

 

 

7. Highlight any columns you don’t want to load and select Remove Columns. Alternatively you can highlight the ones you want to keep and select Remove Other Columns.

 

 

8. You can go to the Add Column tab and create custom columns.

 

 

In this example, HomeGoals and AwayGoals have already had their data types switched to Whole Number and TotalGoals is being created as a simple sum of the two.

 

 

9. You can reorder any columns as you require, just click on the column title and drag the column left or right.

 

10. Depending again on your version of Power Query, you may find that the title row has been appended from all of the files in the folder. To get around this you need to filter them out. E.g. if you filtered the HomeTeam column, chose Text Filters – Does Not Include then you could filter out any other instances of the ‘HomeTeam’ column header within the data extract.

 

 

From the Home tab, you can now Close & Load the query to return the cleansed dataset containing all divisions.

Leave a Reply

Your e-mail address will not be published. Required fields are marked *