COVID-19 & Time Intelligence in Power BI from DataChant
In my last tutorial here, you learned how to import COVID-19 confirmed and death cases from Johns Hopkins dataset to build your own Power BI dashboard using Power Query. In this tutorial, we will go quickly through the DAX measures that will help you gain better analytics capabilities using Power BI time intelligence.
Challenge #1 – Cumulative Amounts
Johns Hopkins dataset is provided as cumulative numbers. While it is easy to create line charts that will display the cumulative number of confirmed cases and deaths, it is challenging to apply simple visuals like A Card that will show the latest total confirmed cases or a line chart with the daily number of confirmed cases by date.
Let’s demonstrate the challenge. Open your Power BI report and drag and drop the Confirmed column into the canvas. Then change the visual into a Card visual.
You will find that the number of confirmed cases is too high.
Why is the number incorrect? Because when you drag and drop the Confirmed column, Power BI applies a simple SUM operation on all the values in the Confirmed column. Since these amounts are cumulative, you end up with a much higher amount. To be more clear, let’s say that the cumulative confirmed cases are represented in this sequence 1,2,4,8,16 (e.g. on day 1 we have one case. On day 5 we have 16 cases). By dragging and dropping the Confirmed column into the Card you would have 31 cases (1+2+4+8+16=31) instead of 16.
For a quick & dirty fix, you can follow these steps:
Drag and drop the Date column into the Filters on this visual. Select Top N as Filter type. Enter 1 as Show items next to the Top drop-down menu. Drag and drop Date column again. This time into By value and set it as Latest Date.
Dragging and dropping amounts to visuals is a bad practice in Power BI. In our case, we would need a smarter way to represent the actual confirmed cases by date. Ideally, if Johns Hopkins would report the data by the actual daily cases, instead of cumulative numbers, your report would be simpler.
So, how would you address this challenge in a better and more scalable way without changing the data source? The answer is DAX.
Total Confirmed = VAR sum_confirmed = SUMX ( VALUES( 'Covid-19 Cases'[Country/Region] ), SUMX ( VALUES ('Covid-19 Cases'[Province/State] ), CALCULATE ( MAX ( 'Covid-19 Cases'[Confirmed] ) ) ) ) RETURN IF (sum_confirmed = 0, BLANK(), sum_confirmed )
In the Modeling tab, click New measure. Make sure Home table is set to Covid-19 Cases and enter the DAX expression above.
Now you can create a new Card visual and drag and drop the Total Confirmed measure. The latest confirmed cases were calculated correctly without the need to set the filter in the visual.
The measure above iterates over all the countries and for each country it iterates over all provinces/states (Note: US states were excluded from this dataset, so for United States, you would only have a single blank value as a state). For each province/state we find the maximal value of confirmed cases and sum up all the maximal values by country and province/state.
Here is an alternative way you can implement Total Confirmed. This time by applying an explicit filter on the latest date and summing up all the amounts for the last date in Covid-19 Cases table:
Total Confirmed = VAR latest_date = MAX ( 'Covid-19 Cases'[Date]) RETURN CALCULATE ( SUM ( 'Covid-19 Cases'[Confirmed] ), FILTER ( 'Covid-19 Cases', 'Covid-19 Cases'[Date] = latest_date ) )
But, I find the second option risky as it assumes that all countries and provinces/states will be reported on the same date. If for some reason, the cases of one country are missing on the latest date, the confirmed cases of the previous day will not be included in the calculations.
Note: The first measure can be improved by moving the countries and provinces/states into a separate dimension table.
Challenge 2: Daily Confirmed Cases
Now let’s move to the next challenge. How can we show the daily confirmed cases over time? When our data is cumulative we would need to build a measure that subtracts the confirmed cases amount on each day by the amount on the previous day. To do it right, it’s time to include a calendar table in our model.
In Power BI Desktop, Modeling tab, select New table and enter the following expression in the formula bar.
Calendar = CALENDAR ( MIN ( 'Covid-19 Cases'[Date] ), MAX ( 'Covid-19 Cases'[Date] ) )
The new table includes a unique date in each row, starting from the earliest date in Covid-19 Cases table and ending with the latest. Having a calendar table is the first step for efficient time intelligence in Power BI. Once you create this table, you would need to create a relationship between its Date column and the date column in your other fact tables (in our case – our fact table is the Covid-19 Cases).
In the Model view (highlighted in the left pane below), click Date in Calendar and drag and drop it to Date in Covid-19 Cases.
You can now create a line chart. Select the Line chart visual. Drag and drop Date column to Axis in the Visualizations pane. Drag and drop Total Confirmed to Values.
You can see in the Axis section that the Date column turned into a hierarchy of Year, Quarter, Month and Day. To drill down in all levels to see the confirmed cases by date click on the two-pronged devil’s fork.
Another option to see the line chart only by dates is to switch the Date from Date Hierarchy to Date as shown in this screenshot.
Now that we have a trend line, we would like to create a new measure for the daily confirmed cases by date as illustrated here:
In the Modeling tab, click New measure. Make sure Home table is set to Covid-19 Cases and enter the following DAX expression:
Daily New Cases = VAR current_day = [Total Confirmed] VAR prev_day = CALCULATE ( [Total Confirmed], DATEADD ( 'Calendar'[Date], -1, DAY ) ) RETURN IF ( ISBLANK ( prev_day ), BLANK(), current_day - prev_day )
The DAX expression above calculates the daily new cases by summing up the total cumulative confirmed cases subtracted by the total cumulative cases from the previous day. The previous day calculation is done using DATEADD that offsets our calendar table by one day to filter our cases to show only the previous date.
The ISBLANK function above is used to skip the edge case of countries that start reporting the cumulative amount with a number that is higher than their daily new confirmed cases. In DAX, subtracting a number by BLANK() will result in the original number. Since we want to ignore the first confirmed cases that can be too high, we apply that logic.
Challenge #3: Estimated Active Cases
Monitoring the spread of the Coronavirus by measuring only the confirmed and death cases can be depressing. Let’s think positively and take a look at the recovered cases. In my original tutorial, we imported the recovered cases CSV file. Since then, Johns Hopkins removed the file and later on put it back, stating that the recovery numbers are not consistently reliable for all countries. I thought it would be nice to skip this dataset altogether and create our own measure for estimated recovered cases.
Let’s make an assumption that it takes an average of X days to recover from the virus. So, on any given day, the number of recovered cases equal to the number of confirmed cases we had X days ago minus the number of new deaths in the passing period.
To calculate the estimated recovered cases, we first need to calculate the total deaths. As you recall from the challenge above, a simple SUM will not work on cumulative amounts. Here is the measure you would need (similarly to the Total Confirmed measure).
Total Deaths = VAR sum_deaths = SUMX ( VALUES( 'Covid-19 Cases'[Country/Region] ), SUMX ( VALUES ('Covid-19 Cases'[Province/State] ), CALCULATE ( MAX ( 'Covid-19 Cases'[Deaths] ) ) ) ) RETURN IF (sum_deaths = 0, BLANK(), sum_deaths )
Now, let’s create a measure that will store the X days to recover. For now, let’s assume it takes 17 days to recover. We can change this number later.
Create a new measure with the following expression:
Estimated Days to Recover = 17
Now create the following measure for the estimated recovered cases:
Estimated Recovered = VAR prev_period = DATEADD ( 'Calendar'[Date], - [Estimated Days to Recover], DAY ) RETURN CALCULATE ( [Total Confirmed] + [Total Deaths], prev_period ) - [Total Deaths]
Now, you can create a new measure for the estimated active cases by subtracting the total confirmed cases by the estimated reovered cases.
Estimated Active Cases = [Total Confirmed] - [Estimated Recovered]
From here, you can create line charts that will show you the estimated active cases over time. See how China shows the bell-shaped curve, while the other countries on April 3rd are still in exponential growth for their active cases.
To turn the static estimated 17 days recovery time into a dynamic parameter that you can change in a slicer, you can create a What-If Parameter that will be described in my next tutorial, including a measure that can align all countries to the same timeline of days since the virus started as shown here:
Hope you find this tutorial useful. Stay tuned to learn how I built this report. Let’s flatten the curve!