How to extract Live Coronavirus data using APIs in Python?

Aaron Ginder
6 min read · Apr 15, 2020

If you are anything like me, then when a significant event occurs you become curious about the data sitting behind the key measures. Have you ever looked at a visualisation shown by the Government during a daily Coronavirus briefing and wondered about the underlying data? Have you wanted to create your own visualisations and derive your own insights? If the answer is yes, look no further: I’ll show you how to extract daily Coronavirus data for 249 countries in less than 20 lines of Python code and less than 2 minutes.

Application Programming Interfaces

Before we delve into the solution promised in the title, we must understand Application Programming Interfaces (APIs). Webopedia describes an API as a set of routines and procedures that developers can ‘consume’ to support the building of software applications. APIs are crucial to gaining access to data and are generally easy to embed in a program. We will be using ProgrammableWeb to locate an API that provides a data feed of Coronavirus information.

For those new to APIs, think of one as a restaurant menu: there is a clear, ordered structure (starters at the front, main courses in the middle, desserts and drinks at the back). In the same way you might run a finger down the menu, we use code to navigate through the data in an API response. And just as we might ask the waiter for more information about a dish, we can calculate extra fields from the data the API returns.

Submitting API Requests & Getting a Response

After locating a live API such as the corona-api, we must send a request to that web server to get a response containing COVID-19 data. The diagram below illustrates, at a high level, how API requests work. You send a request from your laptop to the website of interest and the server responds to that request. The Python requests library lets us do just that: it sends the request and receives the response, which we can then convert to JSON format.

[Diagram: the web request-response flow between client and server]
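As a minimal sketch of this request-response pattern (the timeout and error handling here are additions beyond the article's own code, and the function name is mine):

```python
import requests

def fetch_json(url):
    """Send a GET request and return the parsed JSON body."""
    response = requests.get(url, timeout=10)  # don't hang forever on a slow server
    response.raise_for_status()               # surface 4xx/5xx errors early
    return response.json()                    # response body -> Python dicts/lists
```

Calling fetch_json on an endpoint that returns JSON hands you back nested Python dictionaries and lists to navigate with ordinary indexing.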

Step-by-Step Guide:

1. Getting a list of countries where coronavirus data exists

Firstly, we must use the corona-api to understand which countries we can retrieve data for.


import requests
from datetime import date

todays_date = date.today()  # stamped on the export file name in step 4

# Base endpoint of the corona-api countries feed
# (the original article's link was stripped; this URL is assumed)
base_url = "https://corona-api.com/countries"

get_countries = requests.get(url=base_url).json()['data']


[{'coordinates': {'latitude': 33, 'longitude': 65}, 'name': 'Afghanistan', 'code': 'AF', 'population': 29121286, 'updated_at': '2020-04-14T17:28:01.801Z', 'today': {'deaths': 2, 'confirmed': 49}, 'latest_data': {'deaths': 23, 'confirmed': 714, 'recovered': 40, 'critical': 0, 'calculated': {'death_rate': 3.221288515406162, 'recovery_rate': 5.602240896358544, 'recovered_vs_death_ratio': None, 'cases_per_million_population': 165}}}, {'coordinates': {'latitude': 0, 'longitude': 0}, 'name': 'Åland Islands', 'code': 'AX', 'population': 26711, 'updated_at': '2020-04-14T17:28:01.801Z', 'today': {'deaths': 0, 'confirmed': 0}, 'latest_data': {'deaths': 0, 'confirmed': 0, 'recovered': 0, 'critical': 0, 'calculated': {'death_rate': None, 'recovery_rate': None, 'recovered_vs_death_ratio': None, 'cases_per_million_population': 0}}}, ...


The requests.get method submits the request to the website and the .json() method converts the response into a JSON object. If you are unfamiliar with JSON, it is a lightweight, text-based format for exchanging structured data.
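To make the JSON idea concrete, here is a cut-down example using only the standard library; the string below mimics the shape of the API response rather than reproducing real figures:

```python
import json

# A trimmed-down string shaped like the API's response body
raw = '{"data": {"name": "Afghanistan", "code": "AF", "latest_data": {"confirmed": 714, "deaths": 23}}}'

payload = json.loads(raw)  # JSON text -> nested Python dicts
country = payload["data"]
print(country["name"])                      # Afghanistan
print(country["latest_data"]["confirmed"])  # 714
```

Once parsed, the response behaves like any other Python dictionary, so the usual indexing and .get() lookups apply.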

Do not be fazed by the messy look of this JSON response. The key ‘data’ that we sliced holds a list of countries, each carrying geographic coordinates, the country code, population, dates and coronavirus statistics for the current day in a nested dictionary format. We have already retrieved key statistics for the current day’s Coronavirus data!
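As an illustration of navigating that nested structure, the sketch below uses a hand-written sample entry (the figures mirror the Afghanistan record above) and shows that the API's calculated death_rate is simply deaths divided by confirmed cases, times 100:

```python
# A sample entry shaped like one element of get_countries
sample = {
    "name": "Afghanistan",
    "code": "AF",
    "population": 29121286,
    "today": {"deaths": 2, "confirmed": 49},
    "latest_data": {
        "deaths": 23,
        "confirmed": 714,
        "calculated": {"death_rate": 3.221288515406162},
    },
}

confirmed = sample["latest_data"]["confirmed"]
deaths = sample["latest_data"]["deaths"]

# Recompute the death rate by hand: deaths / confirmed * 100
manual_rate = deaths / confirmed * 100
print(round(manual_rate, 2))  # 3.22, matching the API's calculated field
```

Chained square-bracket lookups like this are all that is needed to pull any statistic out of the response.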

2. Use the country code to search for a specific country and generate the target urls where requests will be sent

If we want to get a daily timeline of historic data, we must append the country code to the base url. For example, sending a request to {base_url}/gb will only return a response for the UK because the country code “gb” refers to the UK.

To do this for each country, we create a list of all country codes by parsing the get_countries response from step 1. Then, we must loop through this list and add the code on to the end of the url to generate a list of the urls that we will use to access the data for each country.


# Collect every country code, then build one request url per code
country_codes = [country['code'] for country in get_countries]
request_urls = [f"{base_url}/{code}" for code in country_codes]


country_codes = ['AF', 'AL', 'AX', 'AS', 'DZ', 'AD', 'AO', 'AI', ...]
request_urls = ['{base_url}/AF', '{base_url}/AL', ...]


First, we collect the country codes, using the key ‘code’ to access the value for each country and storing the results in the country_codes list.

Second, we iterate through the country_codes list to create request_urls. Having a list of all the possible urls opens the door to concurrency, for example using the requests-futures library to make asynchronous calls to the web server; however, that is outside the scope of this article.
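Although concurrency is out of scope here, as a rough sketch of the idea the standard-library concurrent.futures module can fan the requests out across threads. The fetch function and urls below are stand-ins, not the article's real endpoints:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for requests.get(url).json(); swap in a real call here
    return {"url": url}

request_urls = [
    "https://example.com/countries/AF",
    "https://example.com/countries/AL",
]

# pool.map preserves input order, so results line up with request_urls
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, request_urls))

print(len(results))  # 2
```

Because pool.map returns results in the same order as its input, each entry still corresponds to one country code.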

3. Use the request_urls object to make the requests and receive a response

We must loop through the request_urls list and make the API request for each target url.


covid_19_data = []
for url in request_urls:
    country_data = requests.get(url=url).json()
    covid_19_data.append(country_data)
    print(f"Data collected for {country_data['data']['name']}")


covid_19_data = [{'data': {'coordinates': {'latitude': 33, 'longitude': 65}, 'name': 'Afghanistan', 'code': 'AF', 'population': 29121286, 'updated_at': '2020-04-15T10:21:11.506Z', 'today': {'deaths': 2, 'confirmed': 70}, 'latest_data': {'deaths': 25, 'confirmed': 784, 'recovered': 43, 'critical': 0, 'calculated': {'death_rate': 3.188775510204082, 'recovery_rate': 5.48469387755102, 'recovered_vs_death_ratio': None, 'cases_per_million_population': 20}}, 'timeline': [{'updated_at': '2020-04-15T10:18:59.805Z', 'date': '2020-04-15', 'deaths': 25, 'confirmed': 784, 'active': 716, 'recovered': 43, 'new_confirmed': 70, 'new_recovered': 3, 'new_deaths': 2, 'is_in_progress': True}, {'updated_at': '2020-04-14T21:33:12.000Z', 'date': '2020-04-14', 'deaths': 23, 'confirmed': 714, 'recovered': 40, 'new_confirmed': 49, 'new_recovered': 8, 'new_deaths': 2, 'active': 651}, ...]}, '_cacheHit': True}, ...]

Data collected for Afghanistan
Data collected for Albania
Data collected for American Samoa
...
Data collected for Yemen


Following the same process as step 1, we use requests.get(url) to make the request for each url, convert the response to JSON format and append the JSON object to the covid_19_data list. As you can see from the output above, the timeline shows the updated_at timestamp for each day and the coronavirus figures associated with that day.

4. Exporting the data to csv

The final step is to export this data to csv. We create a DataFrame from our data object using pd.json_normalize(), generate a date stamp for today and insert this date into the file_name. You should specify a file name of your choice after the date. Copy and paste the path of the directory where you want to save the file, and use the df.to_csv method to export the data to the destination path with the file name you have given.


from datetime import date
import pandas as pd

# Flatten the nested JSON records into a tabular DataFrame
df = pd.json_normalize(covid_19_data)

todays_date = date.today()
file_name = f"{todays_date}_insert_file_name_here.csv"
file_save_destination = f"C:\\directory\\folder\\{file_name}"

df.to_csv(file_save_destination, index=False)
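To see what pd.json_normalize actually does with a nested record, here is a cut-down example (the record below mimics the shape of one covid_19_data element; nested keys become dot-separated column names):

```python
import pandas as pd

# One record shaped like an element of covid_19_data
records = [{"data": {"name": "Afghanistan", "code": "AF",
                     "latest_data": {"confirmed": 714, "deaths": 23}}}]

df = pd.json_normalize(records)
print(list(df.columns))
# ['data.name', 'data.code', 'data.latest_data.confirmed', 'data.latest_data.deaths']
print(df.loc[0, "data.latest_data.confirmed"])  # 714
```

Each level of nesting is folded into the column name, which is what makes the export to a flat csv possible.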


Make sure you use double backslashes in the file path to stop Python interpreting single backslashes as the start of escape sequences.
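Two alternatives sidestep the escaping problem entirely; both are standard Python, and the directory below is just the article's placeholder path:

```python
from pathlib import Path

file_name = "2020-04-15_insert_file_name_here.csv"

# Raw string: the r prefix stops Python treating backslashes as escapes
raw_path = r"C:\directory\folder" + "\\" + file_name

# pathlib builds the right separator for the current OS automatically
lib_path = Path("C:/directory/folder") / file_name
print(lib_path.name)  # 2020-04-15_insert_file_name_here.csv
```

pathlib is generally the more portable choice, since the same code then works on Windows, macOS and Linux.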

Run the code to see your newly created csv file, which you can open in Excel!

Concluding Remarks

Now that we have a csv of Coronavirus data for 249 countries, we have a starting point for analysis. The data runs from late January up until today for all 249 countries, and the code can be run on a daily basis to collect the COVID-19 figures for that given day. It is worth noting that the time at which data becomes available varies between countries. For the UK, the daily coronavirus briefing is at 5pm, so the figures become available shortly after that.

Keep your eyes peeled for a future post that provides a guide to parsing the timeline JSON data into a flat, tabular format supported by data visualisation software such as Tableau or libraries like matplotlib.

Happy API requesting!



Aaron Ginder

An enthusiastic technologist looking to share my passion for cloud computing, programming and software engineering.