Search
  • Reviewed by Maslows

COVID-19: Build your own Power BI dashboard – Revised from DataChant

Follow @gilra

Note: Due to changes in the COVID-19 dataset that was published by Johns Hopkins (described here), I revised the original tutorial. Keep reading if you want to learn how to build your own Power BI Dashboard to analyze the spread of COVID-19 with the revised dataset.

Here are the main changes:

  1. Recovered cases are no longer provided.

  2. There are two new CSV files for confirmed and death cases.

  3. US States were removed from all data sources.

  4. If you follow my original tutorial, you will not find the deprecated files.

Update – March 29, 2020: Due to wrong confirmed cases for Guyana between 17-23 March, I added a correction step in the report. You can download the latest version here (Thank you, Mike Carlo, for finding this issue).

The Revised Tutorial

This tutorial will focus on the data prep steps using Power Query. You can download the report file here. The online report is available here.

The Data Source

Johns Hopkins University published here a great interactive dashboard that analyzes the spread of COVID-19 by country. The source data is based on the most reliable resources (e.g. CDC, Who) and is stored in a GitHub repository here as two separate CSV files (Confirmed cases and Deaths). In this tutorial, you will learn how to combine and transform this data to build your own refreshable report on Power BI.

Go to https://github.com/CSSEGISandData/COVID-19 and select the folder case_covid_19_data (or click here).

Select the folder case_covid_19_time_series (or click here)

Select time_series_covid19_confirmed_global.csv (or click here). Select the Raw button as shown in the screenshot below (Note: This screenshot is outdated, but highlights the location of the Raw button).

You will see comma-separated values displayed in your browser. Copy the URL address: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

Confirmed Cases

Open Power BI Desktop and in the Get Data drop-down select Web.

In the From Web dialog box, paste the URL https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv into the URL text box, and click OK.

Click Transform Data on the next screen.

Rename the query Confirmed.

In Home tab, select Use First Row as Headers. This step will promote the first row. You should note that after this step you would need to harden the queries to be able to recognize new columns in the future (e.g. tomorrow when John Hopkins updates the CSV files). In the last section of this tutorial, you will learn how to do it.

You should note at this stage, that the 4 first columns identify the region/country, province/state and longitude and latitude, but the rest of the columns are all confirmed cases by date. This is not a good format for analysis in Power BI or Excel. We are going to change it.

Click and select the first 4 columns (using Ctrl key or Shift + Right cursor). Now Right-click on the header of one of the 4 columns and select Unpivot Other Columns in the shortcut menu.

Rename the last two columns Date and Confirmed.

Importing the second CSV

To bring the second CSV files (Death cases), go back to the time series folder on Github here and click on the CSV file time_series_covid19_deaths_global.csv

Select the Raw button.

After you clicked on Raw, copy the URL address from the browser (in the same way you did it for the previous CSV file).

Back in Power Query Editor, right-click on the Confirmed query and select Duplicate in the shortcut menu.

Rename the new query Deaths. In Applied Steps, select the Source step and click the settings icon (the cogwheel).

In the Comma-Separated Values dialog box, paste the new URL that you previously pasted into the URL box, and click OK.

Combining the Files

Now, that you have 2 queries Confirmed and Deaths. It’s time to bring them together. First, let’s ensure we will not load the separate tables into the report. Right-click on each query and uncheck Enable Load.

Next, select Deaths query and rename the last column Deaths.

Go back to Confirmed query and in Home tab, select Merge Queries and then Merge Queries as New.

In the Merge dialog box, select Deaths in the second drop-down menu. Using the CTRL key, select the three columns Province/State, Country/Region, and Date. Make sure you select them in the right order. You will notice the little numbers 1,2,3 appearing in their headers.

Now, in the Deaths table of the Merge dialog box, select the same columns on the same order. Make sure you see the 1,2,3 numbers. Before you click OK, you may see the Privacy Levels dialog box.

If Privacy levels dialog box appear, select Public in the drop-down box as shown below and click Save. Then click OK on the Merge dialog box.

Expand the Deaths column by clicking on the expand button on the right side of the column header. In the pane that opens, unselect all fields and keep only Deaths selected. Click OK. This step will expand the number of death cases by the corresponding location and date of the confirmed cases.

Rename the column Deaths.1 as Deaths.

Rename the Merge1 query as COVID-19 Cases and click Close & Apply to load the data. You can now start building your visuals (which will not be covered in this tutorial).

Fixing the Queries – Make It Refreshable

If you will go back to your report tomorrow, you will notice that the refresh will not work. The queries will not load the daily new cases as a new column on the right-end.

But this issue can be easily fixed. In each one of the queries Confirmed and Deaths, go to Applied Steps and select the Source step. Then, in the Formula Bar you can see the Columns section with an explicit static number of columns. (Note: If you don’t have the Formula Bar visible, you can activate it in the View tab).

Luckily, if we remove Columns=xx from the formula, we will resolve the issue. For example, here if the formula for the Confirmed query before and after the change.

Before:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Columns=58, Encoding=65001, QuoteStyle=QuoteStyle.None])

After:

= Csv.Document(Web.Contents("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"),[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.None])

After you remove the Columns element from the formula on Confirmed and Deaths you can also delete the two Change Type steps in Applied Steps, as shown below (This is recommended as a best-practice. Read more about it here, or in Chapter 10 of my book).

Make sure you repeat these steps for Confirmed and Deaths.

Finally, go to Covid-19 Cases query and change the types of Confirmed and Deaths columns to Whole Number as shown in the screenshot.

That’s it. You are ready to go. Click Close & Apply and start building Check out the quick report I have built below.

This tutorial will focus on the data prep steps using Power Query. You can download the report file here. The online report is available here.

Stay safe!!!

The post COVID-19: Build your own Power BI dashboard – Revised appeared first on DataChant.

0 views