Lost Lives of Children: The Unfinished Agenda

Over 128 MILLION children under the age of 5 have died since 2000. In 2020 alone, over 4.5 million children died, and the overwhelming majority of these deaths were preventable. Over the last year, the losses due to COVID-19 have been brought to our attention on a daily basis. Yet the recurring, but silent, tragedy of childhood deaths has become even more silent. Through these visualizations, we bring attention to the child survival agenda, in the hope of prompting action to help eliminate these avoidable deaths.

Key Visualization

Citation: Akhil Kumar, Aashish Gupta, S V Subramanian. Lost Lives of Children: The Unfinished Agenda. https://doi.org/10.7910/DVN/PADMQZ. Apr 2021. Geographic Insights Lab at the Harvard Center for Population and Development Studies; Center for Geographic Analysis at Harvard University, Cambridge, MA.


Data Sources and Metrics

We obtained mortality data from UNICEF that show the total deaths occurring every year for 195 countries. We restricted our time series prediction to low and lower-middle income countries, as classified by the World Bank as of June 2020, using data from the earliest year available for each country (in some cases 1955) through 2019. We also calculated the cumulative Years of Life Lost (YLL) for these countries. In total, we calculated 3 metrics; their full names and definitions can be found in the table below.

Metric Name: Definition

Predicted Daily Deaths: The average number of daily deaths in 2021 within each month for each country. Predicted with linear regression from full historical under-5 death data for each country.

Cumulative Deaths: The total number of yearly deaths for each country from 2000 to the end of 2020, with 2020 being the only predicted year.

Years of Life Lost: The total number of years of life lost from 2000 to the end of 2020, assuming all children live up to 5 years of age.

Data Processing and Analysis

The raw data used for processing came from UNICEF; its overall structure is described in supplementary table 2 in the Excel file. We predicted 2020 and 2021 values using semi-log regression. The equation used to derive the rate of change, r, in child deaths between years was ln(deaths) = a + r(year), where ln is the natural logarithm. We assumed that the year-to-year rate of change, r, applied to within-year changes as well. The number of child deaths in a month, t, is then calculated as:

deaths_t = deaths_{t-1} * e^{r/12}
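
As an illustration only, the semi-log fit and the monthly recurrence can be sketched in Python. This is not the VBA code from our repository: the function names fit_semilog and project_monthly, the input series, and the starting monthly value of 1000 are all hypothetical, and how the first monthly value is derived from the annual data is an assumption left open here.

```python
import math

def fit_semilog(years, deaths):
    """Ordinary least-squares fit of ln(deaths) = a + r * year; returns (a, r)."""
    n = len(years)
    logs = [math.log(d) for d in deaths]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    r = sxy / sxx              # yearly rate of change
    a = mean_y - r * mean_x    # intercept
    return a, r

def project_monthly(start_deaths, r, months):
    """Apply deaths_t = deaths_{t-1} * e^(r/12) for the given number of months."""
    projected = []
    current = start_deaths
    for _ in range(months):
        current *= math.exp(r / 12)
        projected.append(current)
    return projected

# Made-up series (not UNICEF data): deaths decline at a rate of 0.03 per year.
years = list(range(2010, 2020))
deaths = [100_000 * math.exp(-0.03 * (y - 2010)) for y in years]

a, r = fit_semilog(years, deaths)

# Hypothetical starting monthly count, projected forward 12 months.
monthly = project_monthly(1000.0, r, 12)
```

Because the monthly factor is e^(r/12), twelve monthly steps compound to exactly one year of change, e^r, which is what the assumption that r applies within the year implies.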

To calculate the average daily deaths from the predicted monthly death count, we assumed an even distribution of deaths within each month. We calculated the Years of Life Lost metric by taking the average life expectancy for each country for every year from 2000 to 2020, subtracting 5 from it, and multiplying the result by the total number of deaths for that year or month. To calculate cumulative deaths, we only included deaths from the beginning of 2000 to the end of 2020.

To process the data, we used Excel's Visual Basic for Applications (VBA), an event-driven programming language developed by Microsoft and built into Microsoft Excel. VBA is different from formulas or other front-end Excel functions and works between and across Excel workbooks and sheets.

Our scripts, raw data, and processed data can be found on the Harvard Dataverse repository. To start processing the data, download the entire repository as a .zip file and unzip it to your computer. We include the final processed data for you to use in addition to the raw data and the data processing scripts. To replicate our final processed data, follow the flowchart below.
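
For readers who prefer code to prose, the daily-average, Years of Life Lost, and cumulative calculations described in this section can be sketched in Python. This is a minimal sketch and not the VBA scripts in the repository: the function names, the record layout, and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import calendar

def average_daily_deaths(monthly_deaths, year, month):
    """Spread a monthly death count evenly across the days of that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_deaths / days_in_month

def years_of_life_lost(deaths, life_expectancy, assumed_age=5):
    """YLL = deaths * (life expectancy - 5), per the definition in this section."""
    return deaths * (life_expectancy - assumed_age)

def cumulative_totals(records, first_year=2000, last_year=2020):
    """Sum deaths and YLL over records of (year, deaths, life_expectancy)."""
    total_deaths = 0.0
    total_yll = 0.0
    for year, deaths, life_expectancy in records:
        if first_year <= year <= last_year:
            total_deaths += deaths
            total_yll += years_of_life_lost(deaths, life_expectancy)
    return total_deaths, total_yll

# Made-up records for one hypothetical country (not UNICEF data).
records = [
    (1999, 1100, 59.0),  # excluded: before 2000
    (2000, 1000, 60.0),
    (2001, 900, 61.0),
]

daily = average_daily_deaths(3100, 2021, 1)           # January has 31 days
total_deaths, total_yll = cumulative_totals(records)  # 2000 and 2001 only
```

With these made-up inputs, the daily average is 3100 / 31 = 100, cumulative deaths are 1000 + 900 = 1900, and Years of Life Lost are 1000 × 55 + 900 × 56 = 105,400.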