How to extract Live Coronavirus data using APIs in Python?

Application Programming Interfaces

Before we delve into the solution promised by the article title, we must understand Application Programming Interfaces (APIs). Webopedia states that APIs are a set of routines and protocols that developers can ‘consume’ to support the building of software applications. APIs are crucial for gaining access to data and are generally easy to embed in a program. We will be using ProgrammableWeb to locate an API that provides a data feed of Coronavirus information.

Submitting API Requests & Getting a Response

After locating a live API such as the corona-api, we must send a request to that web server to get a response containing COVID-19 data. The diagram below illustrates an aggregated view of how API requests work: you send a request from your laptop to the website of interest, and the server responds to that request. The Python requests library allows us to do just that, and it also lets us convert the response data to JSON format.

Web request–response cycle
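As a minimal sketch of this request–response pattern with the requests library (using the same corona-api countries endpoint we rely on in step 1; the status-code check is my addition for illustration):

import requests

# Send a request to the API and inspect the response
response = requests.get("https://corona-api.com/countries")
print(response.status_code)   # 200 means the server answered the request successfully
payload = response.json()     # convert the JSON body of the response to Python objects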

Step-by-Step Guide:

1. Getting a list of countries where coronavirus data exists

Firstly, we must use the corona-api to understand which countries we can retrieve data for.

import requests
from datetime import date

todays_date = date.today()  # date stamp used later when naming the output file

# Request the list of countries the API holds data for
base_url = "https://corona-api.com/countries"
get_countries = requests.get(url=base_url).json()['data']
[{'coordinates': {'latitude': 33, 'longitude': 65}, 'name': 'Afghanistan', 'code': 'AF', 'population': 29121286, 'updated_at': '2020-04-14T17:28:01.801Z', 'today': {'deaths': 2, 'confirmed': 49}, 'latest_data': {'deaths': 23, 'confirmed': 714, 'recovered': 40, 'critical': 0, 'calculated': {'death_rate': 3.221288515406162, 'recovery_rate': 5.602240896358544, 'recovered_vs_death_ratio': None, 'cases_per_million_population': ...}}}, {'coordinates': {'latitude': 0, 'longitude': 0}, 'name': 'Åland Islands', 'code': 'AX', 'population': 26711, 'updated_at': '2020-04-14T17:28:01.801Z', 'today': {'deaths': 0, 'confirmed': 0}, 'latest_data': {'deaths': 0, 'confirmed': 0, 'recovered': 0, 'critical': 0, 'calculated': {'death_rate': None, 'recovery_rate': None, 'recovered_vs_death_ratio': None, 'cases_per_million_population': 0}}}, ...]
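As a quick, optional sanity check, we can print the name and code of the first few countries in the response (the fields 'name' and 'code' are taken from the output above):

# Optional sanity check: print the first few country names and ISO codes
for country in get_countries[:5]:
    print(country['name'], country['code'])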

2. Use the country code to search for a specific country and generate the target URLs where requests will be sent

If we want to get a daily timeline of historic data, we must append the country code to the base URL. For example, sending a request to https://corona-api.com/countries/gb returns a response for the UK only, because the country code “gb” refers to the UK.

# Extract the ISO country code for each country in the response
country_codes = [country['code'] for country in get_countries]

# Build the request URL for each country code
request_urls = [f"{base_url}/{code}" for code in country_codes]
country_codes = ['AF', 'AL', 'AX', 'AS', 'DZ', 'AD', 'AO', 'AI' ...]
request_urls = ['https://corona-api.com/countries/AF', 'https://corona-api.com/countries/AL', ...]
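To request data for a single country rather than the full list, the same pattern works with one code. For example (assuming the UK is reported with the code “GB” by the countries endpoint, and using the 'name' and 'latest_data' fields shown in the output above):

# Example: request data for a single country (the UK) by its ISO code
uk_url = f"{base_url}/GB"
uk_data = requests.get(url=uk_url).json()['data']
print(uk_data['name'], uk_data['latest_data']['confirmed'])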

3. Use the request_urls object to make the requests and receive a response

We must loop through the request_urls list and make the API request for each target url.

covid_19_data = []
# Request each country's data in turn and store the JSON response
for url in request_urls:
    country_data = requests.get(url=url).json()
    covid_19_data.append(country_data)
    print(f"Data collected for {country_data['data']['name']}")
covid_19_data = [{'data': {'coordinates': {'latitude': 33, 'longitude': 65}, 'name': 'Afghanistan', 'code': 'AF', 'population': 29121286, 'updated_at': '2020-04-15T10:21:11.506Z', 'today': {'deaths': 2, 'confirmed': 70}, 'latest_data': {'deaths': 25, 'confirmed': 784, 'recovered': 43, 'critical': 0, 'calculated': {'death_rate': 3.188775510204082, 'recovery_rate': 5.48469387755102, 'recovered_vs_death_ratio': None, 'cases_per_million_population': 20}}, 'timeline': [{'updated_at': '2020-04-15T10:18:59.805Z', 'date': '2020-04-15', 'deaths': 25, 'confirmed': 784, 'active': 716, 'recovered': 43, 'new_confirmed': 70, 'new_recovered': 3, 'new_deaths': 2, 'is_in_progress': True}, {'updated_at': '2020-04-14T21:33:12.000Z', 'date': '2020-04-14', 'deaths': 23, 'confirmed': 714, 'recovered': 40, 'new_confirmed': 49, 'new_recovered': 8, 'new_deaths': 2, 'active': 651}, ...]}, '_cacheHit': True}, ...]

Data collected for Afghanistan
Data collected for Albania
Data collected for American Samoa
...
Data collected for Yemen
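The loop above assumes every request succeeds. A slightly more defensive sketch (the status check and the pause are my additions, not part of the original walkthrough, and assume nothing about the API's rate limits):

import time

covid_19_data = []
for url in request_urls:
    response = requests.get(url=url)
    response.raise_for_status()   # stop early if the server did not return a 2xx status
    covid_19_data.append(response.json())
    time.sleep(0.1)               # brief pause between requests to be polite to the API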

4. Exporting the data to csv

The final step is to export this data to a csv file. We create a DataFrame from our data object using pd.json_normalize(). We generate a date stamp for today and insert this date into the file_name. You should specify a file name of your choice after the date. Copy and paste the path of the directory where you want to save the csv file and use the df.to_csv method to export the data to that destination path with the file name you have given.

from datetime import date
import pandas as pd

# Flatten the nested JSON into a tabular DataFrame
df = pd.json_normalize(covid_19_data)

# Build a date-stamped file name and the full destination path
todays_date = date.today()
file_name = f"{todays_date}_insert_file_name_here.csv"
file_save_destination = f"C:\\directory\\folder\\{file_name}"

# Write the DataFrame to csv
df.to_csv(file_save_destination,
          index=False,
          header=True)
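Note that pd.json_normalize(covid_19_data) leaves each country's 'timeline' list as a single column. If you would rather have one row per country per day, a sketch using the record_path argument is shown below; it assumes the 'data' -> 'timeline' structure shown in step 3 and reuses the placeholder directory from above.

# Optional: one row per country per day, assuming the 'data' -> 'timeline' structure above
timeline_df = pd.json_normalize(
    covid_19_data,
    record_path=['data', 'timeline'],
    meta=[['data', 'name'], ['data', 'code']],
)
timeline_df.to_csv(f"C:\\directory\\folder\\{todays_date}_timeline.csv", index=False)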

Concluding Remarks

Now that we have a csv containing Coronavirus data for 249 countries, we have a starting point for our analysis. The data runs from late January up until today for all 249 countries. This code can be run on a daily basis to collect the COVID-19 figures for that given day. It is worth noting that the point at which data becomes available varies between countries. For the UK, the daily coronavirus briefing is at 5pm, so the figures become available shortly after that.
