How to copy paste tabular data into an Excel worksheet and unstack it using Power query.
Watch the video below:
In the last video we learnt how to get tabular data from a PDF file into an Excel worksheet using Power Query. Today we’ll do a similar action but this time we’ll copy and paste the tabular PDF data directly into an Excel worksheet and observe what happens.
I’ll select the sample PDF tabular data from my file and simply paste it into my Excel worksheet. The data gets pasted and with a few steps of editing I can clean my data and get into a tabular form. But now let’s take another tabular data and copy and paste it also into the same worksheet. Now things look different. The data is pasted in a stacked manner – one below the other. So this can also happen when you copy and paste PDF tabular data.
Can we unstack this data quickly and easily to get it into the normal tabular form? Yes! Power Query can help us achieve our goal quickly and easily.
We’ll first click inside the pasted data and then click the Data tab; next we click the drop down arrow of ‘Get Data’ from the ‘Get & transform data’ group, select ‘From Other sources’ and click on ‘From Table/Range’. A new window called ‘Create Table’ appears and in the text-box below ‘Where is the data for your table?’, the complete range of pasted data is shown. Also ‘My Table has headers’ check-box is ticked. Since our data has no headers we’ll uncheck the box and click on ‘OK’. Now Power Query fires up and our data is displayed in the Power Query editor with the header ‘Column 1’. How can we now manipulate this data and unstack it?
The first action that we do is add a column by clicking on ‘Add Column’. We select ‘Index Column’ and from the drop-down options we select ‘From 0’. Now we notice that we get a complete indexation of our data numerically starting with 0.
With the ‘Add Column’ selected, we click on the drop-down arrow of ‘Standard’ and select ‘Modulo’. In the Modulo window that pops up, we enter the value of 3 because our data consists of a set of 3. We then click OK and now a new column with the header ‘Modulo’ has been created and our index values have been converted to 0,1,2.
The next step is to select the Modulo column, click on transform and select Pivot Column. In the new window called Pivot Column under the header ‘Values Column’ in the comb-box we let Column 1 remain. In the ‘Advanced Options’ under ‘Aggregate Value Function’ we select ‘Don’t Aggregate’ and click on OK.
Now we notice that we have 3 more columns with the headers 0, 1 and 2. Each of these columns has the relevant value for the index 0, 1 and 2. For example, row 1 under column 0 has the value 1 , row 2 of column 1 has the value 2001-02 and the row 3 in column 2 has the value 100.
We select the 3 columns with the headers 0, 1 and 2. From transform we go to ‘Fill’ and from the drow-down options we select ‘Up’.
We can now see how our data has been created properly in a tabular form but with duplicates. To remove the duplicates we right-click column with the header 2 and select ‘Remove Duplicates’. Now our data looks good! We now remove or delete the Index column because we do not need it any more. On the right-hand side of the window of the query editor we can view every step we have done.
From the Home tab we will click on Close & Load drop-down arrow and select Click & Load to… In the new window Import data we will select Table and enter the range under the option Existing Worksheet where we wish to place the data in our worksheet. We finally click on OK. Our data is now in our Excel worksheet. We nan rename the column headers in our Excel worksheet or we can edit our query to do the same.
If you now add more data to your PDF file, you can easily and quickly update the data in the Excel worksheet by just clicking on ‘Refresh’ in the ‘External Table Data’ group. To learn all this in a much easy manner watch the video above.