Last week, we announced Power BI for Office 365 at the Worldwide Partner Conference. Sign up now to be notified when the Power BI Preview becomes available.
We also announced that Microsoft Power Query for Excel has reached General availability and is available for download. Along with Power Pivot and Power View, you may now start using Power Query for your day to day BI scenarios.
To show the breadth of possibilities with Power Query, we thought it would be fun to take a cue from Hans Rosling and build a macroeconomic and demographic mashup built using Power Query and Power View. We will be using data from GapMinder extensively in order to build our visualization.
The end result we are looking to build is a visualization that shows a time series scatter plot/bubble chart across the following dimensions:
- Population by country, since 1980
- Health by country (using Body Mass Index as a health indicator), since 1980
- Wealth by country (using GDP Per Capita as a wealth indicator), since 1980
Ultimately, we are interested in building something like the visualization below that will allow us to view the progress of the various countries over time.
Acquiring and consuming health data (Body Mass Index)
The first step is to consume the health data. You can find the health data via a search on GapMinder. The dataset we are interested in is Body Mass Index (BMI), men, Kg/m2. This dataset can be found by searching for it on the GapMinder data page.
Use “From Web” on the Power Query ribbon to consume data from the web
The “From Web” option in the Power Query ribbon is used to consume data from the web. Even though the data we are consuming is an Excel spreadsheet, the fact that it is coming from the web will allow you to use the “From Web” option.
Once you paste in the URL, you can click OK and you will be taken into the Power Query query editor. After going through the tables that are presented, we find that the “Data” table shown in the left navigator is the one that is of interest.
Promote first row as headers
The next thing we need to do is to promote the first row in the data table to be proper headers. This can be done by using the table-level transforms menu that is available by clicking in the table glyph in the upper left corner of the table.
After applying the Use First Row As Headers transform, we are left with the following table.
Unpivot the pivoted data
The next challenge we need to deal with is to get the data into a better shape that is going to be useful for analysis. The current view of the data is really a pivot-table–and “unpivoting” this data would be a great way to get this data into a shape that is better for merging and mashing up with other data.
In order to unpivot, we first need to multi-select the year columns (hold down the Ctrl key while selecting to multi-select).
The next step is to right click on the column header to reveal the transforms menu. We can simply choose Unpivot to get to unpivoted data.
The result of unpivoting will be a simple table that looks like this:
Clean up column names and data types
At this point, we can clean up our table and column names. We can also change the data types of the last two columns to be Number types.
Fetching and consuming Wealth and Population data
The steps for consuming the Wealth (Income per person (GDP/capita, PPP# inflation-adjusted)) and Population (Population, total) data are largely the same as the steps we went through to consume the BMI data in the previous step.
The only additional step we need to do is to make sure that we filter out years prior to 1980 for these two datasets. Our BMI dataset only has values starting at year 1980 – so we will only look at 1980 or later for all three datasets.
The GDP data should look like this after reshaping using Power Query:
Similarly, the Population data should look like this after reshaping using Power Query:
Merging the 3 tables together
Now that we have the three tables that we need, we can use Power Query to merge these three tables together. The Merge experience can be reached from the Power Query ribbon. Clicking on the Merge button will bring up the following dialog here you can select the tables that you would like to merge together.
In the first step, we are going to merge BMI and GDP Per Capita – so select the tables as shown above. We are going to use the Country and Year as the common columns between the tables as the basis of our merge–you can multi-select these (make sure you pick Country first and then Year as the order matters).
A status message at the bottom will show you whether the merge might be successful. We can simply click on OK to see the final result.
We are left with a new query called Merge1 that includes a new column that can be expanded by clicking on the glyph in the column header next to the column name:
We can select GDP Per Capita since we have the other columns already in the table. That will give us a mashup that includes BMI and GDP Per Capita for a given country for a given year.
We can perform the same series of steps to merge this table (called Merge1) with Population. After cleaning up the final result, we are left with the final data table that can be used for visualization.
Visualizing using Power View
The only thing left for us to do is to visualize the data using Power View. You can do this by clicking on Power View from the Insert tab in Excel. Once in Power View, you can choose to visualize a Scatter Plot and configure your fields as shown.
We can focus on just the countries with population over 60 million or so to zero in on the larger countries, and to see how health, wealth and population might correlate and trend over time.
It is that simple to consume, mashup and visualize data coming from different sources using Excel, Power Query and Power View. Hope you have enjoyed this fun little mashup with real world statistics.
Hope you have enjoyed this walkthrough on mashups and visualizations using Power Query and Power View. Let us know what you think!