This post was originally published by Asmi Kumar at Towards Data Science
Using satellite imagery and neural networks to predict asset wealth, with Rwanda as a case study
Economic livelihood is difficult to estimate. Even in today’s world, there is a lack of clear data to identify impoverished areas, which leads to insufficient resource distribution — money, food, medicine, and access to education. We produce an ample amount of resources to feed, clothe, and house up to 10 billion people, yet hundreds of millions still suffer in poverty.
One approach to help alleviate this problem is to create a model utilizing computer vision to map and predict poverty in the African country of Rwanda, one small enough to provide an abundant and diverse but not overwhelming dataset.
How do we complete this task? There are several key steps:
- Download Demographic and Health Surveys (DHS), nightlight satellite imagery, and daytime satellite imagery
- Test whether nightlights can predict wealth accurately
- Test whether basic features of daytime imagery can also predict wealth accurately, and extract image features
- Construct a convolutional neural network (CNN) leveraging a combined dataset of daytime and nightlight images, and apply transfer learning
- Construct maps showing the predicted distributions of wealth
In this article, we will learn how to develop a scalable method to predict poverty in rural areas using a CNN that identifies image features. We will utilize both daytime and nighttime satellite imagery to create an accurate and inexpensive method to estimate asset wealth at the cluster level (an area 10-by-10 kilometers). The following outlines capture the main goals:
There are three components: nightlights, daytime imagery, and DHS surveys.
Downloading Rwanda DHS data from the official DHS page and constructing clusters requires registering for access. These surveys provide representative household data for health, population, and nutrition, and they list asset scores — a measure of wealth — by assigning values on a scale of -2 to 6 to common assets like electricity and technological devices. This data will serve as our “ground truth” data and the labeling system for extracting corresponding daytime and nighttime imagery. We also use a geographic dataset, a shapefile, that reports the coordinates of every designated cluster. Specifically, we requested the RWHR61FL.ZIP (household surveys) and RWGE61FL.ZIP (shapefile) files on the DHS website linked above.
The 2010 nightlights file is a single large image containing nightlights intensities from around the world. To extract Rwanda only, we utilize the shapefile downloaded prior.
Lastly, we obtain daytime images from Google Maps Platform, using an API key. Like nightlights, these are extracted based on the locations provided in the DHS dataset, containing valuable features of landscape and activity at the cluster level. Using the Maps Static API and the Rwanda shapefile, we ping the service for tens of thousands of satellite images. We use 400-by-400-pixel images that represent 1 square kilometer per image.
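As a rough illustration of the download step, the sketch below builds a Maps Static API request URL for one 400-by-400-pixel satellite tile. The `center`, `zoom`, `size`, `maptype`, and `key` query parameters belong to the Maps Static API; the helper name `static_map_url`, the example coordinates, the placeholder key, and the zoom level of 16 are assumptions for illustration and are not taken from the post.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(lat, lon, api_key, zoom=16, size_px=400):
    """Build the request URL for one square satellite tile centred on (lat, lon)."""
    params = {
        "center": f"{lat:.6f},{lon:.6f}",
        "zoom": zoom,  # assumed zoom level; the post does not state one
        "size": f"{size_px}x{size_px}",
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


# Hypothetical coordinates near Kigali; replace the key with a real one before requesting.
url = static_map_url(-1.9441, 30.0619, api_key="YOUR_API_KEY")
```

Building the URL separately from the HTTP request makes it easy to generate the full list of tiles first and then download them in batches.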
Data organization scheme
After the necessary data is collected, the goal becomes to understand whether the nightlights data can be used to predict wealth. First, we merge the DHS and nightlights data and then fit a model of wealth on the nightlights. The average nighttime luminosity for each of the DHS clusters is computed by taking the average of the luminosity values for the nightlights’ locations surrounding the cluster center.
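The averaging step can be sketched as follows, assuming the nightlights image has already been loaded as a 2-D NumPy array and the cluster center has already been converted to a pixel row and column. The function name `mean_luminosity` and the window size are hypothetical; the conversion from latitude and longitude to pixel indices depends on the raster's georeferencing and is not shown.

```python
import numpy as np


def mean_luminosity(raster, row, col, half_width=5):
    """Average the pixels in a square window centred on (row, col).

    The window is clipped at the raster edges so that clusters near the
    border still get a value.
    """
    r0 = max(row - half_width, 0)
    r1 = min(row + half_width + 1, raster.shape[0])
    c0 = max(col - half_width, 0)
    c1 = min(col + half_width + 1, raster.shape[1])
    return float(raster[r0:r1, c0:c1].mean())


# Toy 5x5 raster standing in for the nightlights image.
toy_raster = np.arange(25).reshape(5, 5)
centre_value = mean_luminosity(toy_raster, 2, 2, half_width=1)
```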
(left) overlay of asset scores and nightlights; (right) regression model to illustrate the relationship between average cluster wealth and corresponding cluster nightlight luminosities
The overlay visualization suggests that nightlight luminosity is a good indicator of higher wealth: brighter areas contain clusters of cool-colored dots, which represent higher asset wealth. To understand the asset scores of darker areas and to improve the regression model, we ask: what meaningful information can we extract from satellite images?
Daytime images, which are constantly recorded, automatically updated, and available in vast numbers, prove to be a valuable resource.
Extracting basic features
To test whether daytime imagery can predict cluster wealth, we first extract basic features. Images are encoded so that every pixel is composed of three numerical values between 0 and 255, corresponding to levels of red, green, and blue. From each of these three color layers, we extract five basic features: the max, min, mean, median, and standard deviation of the pixel values.
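A minimal sketch of this feature extraction, assuming each image is available as a height-by-width-by-3 NumPy array of RGB values. The function name `basic_color_features` is hypothetical, and the ordering of the 15 features is an arbitrary choice.

```python
import numpy as np


def basic_color_features(image):
    """Return 15 features: max, min, mean, median, and std for each of R, G, B."""
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)  # one row per pixel
    stats = [
        pixels.max(axis=0),
        pixels.min(axis=0),
        pixels.mean(axis=0),
        np.median(pixels, axis=0),
        pixels.std(axis=0),
    ]
    # Stack to shape (3 channels, 5 statistics), then flatten channel by channel.
    return np.stack(stats, axis=1).ravel()


# Toy image: pure red, the same size as the tiles described in the text.
toy_image = np.zeros((400, 400, 3), dtype=np.uint8)
toy_image[..., 0] = 255
features = basic_color_features(toy_image)
```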
Daytime images are then merged with the DHS data, and a model of wealth as a function of these basic daytime features is fitted. The linear regression model to predict average cluster wealth yields an R² value (coefficient of determination) of 0.558.
Basic image features — asset wealth predictions when the model is trained on basic daytime imagery features, yielding an R² value of 0.558
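For readers unfamiliar with the metric, the sketch below fits an ordinary least-squares model with NumPy and computes R² as one minus the ratio of residual to total sum of squares. The function names and the synthetic data are assumptions for illustration; the post does not say whether its R² values are computed in-sample or with cross-validation.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefficients...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply the fitted intercept and coefficients to new feature rows."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for (features, wealth) pairs: 100 clusters, 15 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 15))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=100)
coef_demo = fit_linear(X_demo, y_demo)
r2_demo = r_squared(y_demo, predict_linear(coef_demo, X_demo))
```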
Incorporating a pre-trained neural network
Next, we extract features with deep learning. We first obtain a CNN (8 layers, VGG-F) pre-trained on ImageNet, a large annotated database designed for object recognition research.
Original VGG-F architecture, in which there is an input that is analyzed through convolutional and pooling layers. The model extracts 4096 features from the image.
In learning to classify each image into one of 1,000 categories, the model learns to identify low-level features like edges and corners, which are critical to computer vision tasks. With Keras, we feed the daytime satellite images into this model and obtain a 4096-dimensional feature vector for each image; these features can then be used to predict asset wealth. The R² value increases to 0.689. Thus, we conclude that extracting features with deep learning does indeed increase the model’s accuracy.
CNN features — asset wealth predictions when the model is trained on daytime imagery features from CNN, yielding an R² value of 0.689
Up to this point, we have only incorporated daytime images. To obtain even better predictions, we should also include nighttime images in a clever way.
To increase the model’s accuracy further, we apply a transfer learning step: instead of directly using the image features extracted by the pre-trained CNN, we retrain it to predict nightlights from the daytime imagery and then use the resulting features, which are presumably more appropriate for the final prediction task of estimating asset wealth. There are three steps:
Estimating nightlight intensities
The nightlights data record nighttime luminosities as integers in the range 0 to 63. When predicting nightlights, we group these luminosities into three classes: low (0 to 2), medium (3 to 34), and high (35 to 63).
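The three-class grouping can be sketched with `numpy.digitize`, using bin edges taken from the ranges above. The function name `nightlight_class` and the integer labels 0, 1, and 2 for low, medium, and high are assumptions for illustration.

```python
import numpy as np


def nightlight_class(luminosity):
    """Map luminosities in 0..63 to 0 (low, 0-2), 1 (medium, 3-34), or 2 (high, 35-63)."""
    # digitize returns how many bin edges are <= each value, so the edges
    # 3 and 35 mark where the medium and high classes begin.
    return np.digitize(luminosity, bins=[3, 35])


classes = nightlight_class(np.array([0, 2, 3, 34, 35, 63]))
```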
Repurposing the model
We now repurpose the model as a feature extractor for daytime satellite imagery by retraining the last layer; this was the nightlight intensity classification layer. Essentially, what we are asking is: given an input of daytime imagery, what would the same area look like at night?
The model learns a nonlinear mapping from each inputted satellite image to a certain vector representation and has filters that “slide” across the image, pinpointing features that gradually grow more complex. For example, the model originally learns basic features such as edges and vertices but eventually looks for much more intricate features such as roads, waterways, and buildings.
The fully-connected layers are converted into convolutional layers followed by an average pooling layer, which allows the network to make multiple evaluations of a single image via convolutions and average the results to produce one feature vector summarizing each image. Convolutional layers evaluate the input multiple times, outputting multiple feature vectors. Several evaluations of the image help process different parts of each input image.
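To make the idea concrete, the toy NumPy sketch below applies one dense layer to every window of a larger feature map, which is what the equivalent convolutional layer computes, and then averages the resulting vectors. All shapes, the stride, and the function name `dense_as_convolution` are assumptions chosen for readability; they are not the dimensions of VGG-F.

```python
import numpy as np


def dense_as_convolution(feature_map, weights, bias, window, stride=1):
    """Apply a dense layer to every window-by-window patch, then average-pool.

    feature_map: array of shape (height, width, channels)
    weights:     array of shape (window * window * channels, n_outputs)
    Returns the per-patch outputs and their average (one vector per image).
    """
    height, width, _ = feature_map.shape
    outputs = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = feature_map[top:top + window, left:left + window, :]
            outputs.append(patch.reshape(-1) @ weights + bias)
    outputs = np.stack(outputs)  # one feature vector per evaluated patch
    return outputs, outputs.mean(axis=0)


rng = np.random.default_rng(0)
toy_map = rng.normal(size=(8, 8, 4))          # larger than the 6x6 window
toy_weights = rng.normal(size=(6 * 6 * 4, 5))
toy_bias = rng.normal(size=5)
patch_outputs, image_vector = dense_as_convolution(toy_map, toy_weights, toy_bias, window=6)
```

When the input is exactly the size of the window, the loop runs once and the result matches the original dense layer; larger inputs produce several evaluations that are averaged into a single vector.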
Obtaining final feature vectors
We then average these feature vectors to obtain one vector for each cluster, which is used as input in ridge regression models for estimating consumption expenditure and asset wealth.
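A minimal sketch of this last step, assuming the per-image feature vectors are stacked in a NumPy array alongside an array of cluster identifiers. The function names and the regularization strength `alpha` are hypothetical, and the ridge solution uses the standard closed form with an unpenalized intercept rather than any particular library.

```python
import numpy as np


def cluster_means(features, cluster_ids):
    """Average the image feature vectors that belong to the same cluster."""
    ids, inverse = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)
    sums = np.zeros((len(ids), features.shape[1]))
    np.add.at(sums, inverse, features)  # accumulate rows per cluster
    counts = np.bincount(inverse)
    return ids, sums / counts[:, None]


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Toy example: three images, two of which belong to cluster 7.
toy_features = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]])
toy_ids, toy_means = cluster_means(toy_features, [7, 7, 9])
```

Predictions for new clusters are then `X_new @ coef + intercept`; in practice `alpha` would be chosen by cross-validation.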
Convolutional filters overlaid on satellite imagery (source: Science)
We analyze whether these features, rather than the ones collected using just daytime imagery, do a better job of predicting average cluster wealth. And they do! The R² value increases even further to 0.718.
Transfer learning — asset wealth predictions when the model is trained on daytime imagery features with transfer learning, yielding an R² value of 0.718. The model is now more reliable, indicated by the closeness of the red and dashed gray lines and high R² value.
The transfer learning model for estimating economic well-being improves upon the existing nightlights-only method. The improvement matters most for populations near and below the poverty line, where nightlights show little variation and have low predictive power. Nightlights also cannot distinguish well between poor, dense areas and wealthy, sparse areas, both of which have low nightlight levels.
A normalized confusion matrix is produced following the application of transfer learning. Matrix values are greatest along the diagonal, indicating a relatively high true positive rate. As is evident from the figure, the lowest nightlight class has the highest accuracy (a major distinction from the nightlights-only model), and the highest nightlight class has the second-highest accuracy.
Transfer learning confusion matrix, as nightlights fail to effectively characterize low-brightness areas
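For reference, a normalized confusion matrix for the three nightlight classes can be computed as in the sketch below. Normalizing each row by the number of true examples in that class is an assumption (the post does not say which axis was normalized), and the function name and toy labels are hypothetical.

```python
import numpy as np


def normalized_confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes; each row sums to 1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (y_true, y_pred), 1)  # tally each (true, predicted) pair
    row_totals = counts.sum(axis=1, keepdims=True)
    # Leave rows for classes with no true examples as zeros instead of dividing by zero.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


# Toy labels: 0 = low, 1 = medium, 2 = high.
toy_matrix = normalized_confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

With row normalization, each diagonal entry is the fraction of that true class predicted correctly, which is the per-class accuracy discussed above.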
Additionally, we can build a heatmap of our wealth predictions by overlaying them onto a map of Rwanda. The results closely resemble the earlier overlay of asset wealth and nightlight data, suggesting that the model performs well.
Heatmap of predicted wealth — lighter areas correspond to less impoverishment, and darker areas correspond to more impoverishment.
The approach presented demonstrates that we can utilize CNNs with daytime and nighttime satellite imagery in coordination with survey data to accurately pinpoint areas of high and low economic well-being — and thus, poverty — in specific places. As the method is scalable and inexpensive, we are looking to expand the studies and results to more countries.
The full code is available at this GitHub repository.