How to analyze large amounts of data using Power Query and Power Pivot in Excel.
Watch the video below:
Today we will learn how to analyze large amounts of data in Excel. By large amounts of data we mean the amount of data that breaches the total number of rows in an Excel worksheet i. e. 1048576 rows. If we open a file containing more than 1048576 rows of data, we get a message: File not loaded completely. On clicking OK we can view the data up to row 1048576. But our file contains more than 1048576 rows of data.
Let’s open the two large CSV files in Power Query. We’ll select the Data tab and click on the drop-down arrow next to Get Data from the Get & Transform Data group. We will next select From File and finally click on From Folder. A new dialogue box with the title Folder opens. We browse for our folder, open the folder and although both the CSV files are not visible, we click on open. Now we can see the folder path and both the files have been retrieved. We click on Transform Data. The files attributes like file name, file extension, date created, folder path, etc are now available in the Power Query Editor.
Next to the header Content are two parallel arrows pointing in the downward direction and when we hover our mouse over these arrows we see the hint Combine Files. We click on these arrows. A window with the title Combine files opens.
We can see the comment: ‘Specify the settings for each file. Learn more’ Below this line we can read ‘Sample File:’ and in the drop-down area we see ‘First File’ displayed. Below that we have the headers ‘File Origin’, ‘Delimiter’ and ‘data Type Connection’. Below this region we have the data displayed. We click OK.
A window called ‘Evaluating Query’ is visible and after some time data is visible. The time taken to display the data depends on how large our files are. For the time being we can view only the top 1000 rows of the file.
We right-click on the column header Source_Name and select Remove from the offered options. We can now make changes to our data if required. You can also format the dates under the data column by right-clicking on the Date header and selecting ‘Using Locale…’.
But we’ll add a new column by clicking on the Add Column tab and then selecting Custom Column. In the Custom Column window that pops up under the New column name in the text box, we’ll enter ‘Year’, Below the ‘Custom column formula, we’ll write the formula ‘=Date.Year([Date])’ as shown in the video and finally click OK.
In the new custom column ‘Year’ we have now extracted the Year from our date in the Date column.
We click on the Home tab. From the ‘Close & Load’ drop-down arrow we select ‘Close & Load to…’ A new window called Import Data pops up.
In this Import Data Window we select the option ‘Only Create Connection’. We also check the check-box with the caption ‘Add data to the Data Model’. Finally we click on OK.
On the right-hand-side of the Excel worksheet under Queries & Connections under Other Queries you can view how the data is being loaded. In the end we can observe that 5 million rows of data has been loaded. We can also observe the whole action in the status bar.
Next we click on Power Pivot tab and select Manage in the Data Model group. In a few moments a message is displayed ‘Preparing the PowerPivot window, please wait…’ We can finally view our data in the Excel worksheet with the status bar displaying ‘5-million-rows-of-data’,
We can now analyze our data by clicking on the drop-down arrow of the Pivot Table tab and selecting Pivot Table from the options offered. A new window called ‘Create Pivot Table’ opens. We select the option ‘New Worksheet’ and the location is ‘Sheet1’!$A$1 and we click on OK.
The pivot table structure appears on the Excel worksheet and the pivot table fields appear on the right-hand side of the worksheet. We click on the drop-down arrow next to 5-million-rows-of-data to display the headers like Region, Product, Date, Sales and Year. We drag the Region field to the Columns area, the Year field to the Rows area and the Sales field to the Values area.
The Sales in different regions and Years along with the totals is displayed on the Excel worksheet. We can now also insert a chart into our Excel worksheet by first clicking inside the data area and then selecting the Insert tab. We then click on the drop-down arrow below Pivot Chart in the Charts group and select PivotChart. Now we can select an appropriate chart type to be placed next to our data in our worksheet.
We can further analyze our data using slicers and timeline, for example.
In this manner we can analyze large amounts of data using Power query and Power Pivot.