Two tools every Data Scientist should use for their next ML project


This post was originally published by Braden Riggs at Towards Data Science

Uber’s Manifold

Photo by Dan Gold on Unsplash

For my project, I am creating an ensemble. An ensemble is a collection of machine learning models that each train and predict on the same data. Its advantage is that it combines a range of different strategies for finding a solution and uses a majority vote that democratizes the classification across all the models. This is useful because, whilst an individual model may predict some portions of the data well, it may struggle on others. Hence, an ensemble is just the machine learning version of the strength-in-numbers adage.

In order for an ensemble to perform well, the individual models that make it up must have diversity of prediction. Diversity of prediction is a fancy way of saying that the models can’t all be predicting exactly the same thing for the exact same points; rather, they should perform well on different selections of points. This raises the question, however: how do you know whether the parts of your ensemble are diversifying their predictions? This is where Manifold, from the transportation tech giant Uber, comes in.
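To make the majority vote and the idea of diversity a little more concrete, here is a minimal sketch in plain Python. The predictions, the function names `majority_vote` and `disagreement`, and the choice of pairwise disagreement as a diversity measure are all assumptions made for illustration; they are not taken from my project or from Manifold.

```python
from collections import Counter
from itertools import combinations

def majority_vote(predictions):
    """predictions: one list of labels per model, all the same length."""
    # zip(*predictions) groups every model's vote for the same data point.
    # With an odd number of models and two classes there are no ties.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def disagreement(preds_a, preds_b):
    """Fraction of points on which two models predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

# Hypothetical hard predictions from three models on five data points
model_preds = [
    ["true", "false", "true", "true", "false"],
    ["true", "true", "true", "false", "false"],
    ["false", "false", "true", "true", "false"],
]

ensemble = majority_vote(model_preds)

# Pairwise disagreement: 0.0 means two models always agree (no diversity)
pairwise = {
    (i, j): disagreement(model_preds[i], model_preds[j])
    for i, j in combinations(range(len(model_preds)), 2)
}

print(ensemble)
print(pairwise)
```

A summary number like this tells you how often models disagree, but not where in the data they disagree, which is the gap a visual tool fills.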

Uber’s Manifold is an open-source, long-term project that aims to provide a model-agnostic visual debugging tool for machine learning. In layman’s terms, Manifold allows you to visualize which subsets of the data your model or models are underperforming on and which features are causing ambiguity.

As you can imagine this is very useful when working on ensembles. The tool creates a widget output that can be interacted with within your notebook for quick analysis. It is important to note, however, that this tool currently only works in classic Jupyter notebooks. It doesn’t function on Jupyter Lab or Google’s Colab.

Manifold works by using k-means clustering, a neighbor grouping technique, to separate the prediction data into segments of similar performance. You can imagine this as splitting the data into subcategories of similarity. The models are then plotted along each segment: the further to the left a model sits, the better it performed on that segment. You can see this in the randomly generated example below:

Manifold’s performance comparison widget within my Jupyter Notebook. Mousing over the lines provides values and insight into the results. Image by Author.
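The segmentation idea can be sketched in a few lines. This is a toy version of k-means (Lloyd's algorithm) written for illustration, not Manifold's own implementation; the loss values, the function name `kmeans`, and the simple evenly spaced initialization are all assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    # Deterministic init for this sketch: take k points spread through the list
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Hypothetical per-point log-loss, one column per model: (model_0, model_1)
losses = [
    (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),  # both models do well
    (1.6, 0.2), (1.4, 0.3), (1.5, 0.1),    # model_0 struggles
    (0.2, 1.7), (0.1, 1.5), (0.3, 1.6),    # model_1 struggles
]

# Each point receives the index of the segment it falls into
segments = kmeans(losses, k=3)
print(segments)
```

Points that land in the same segment are ones the models find similarly easy or hard, which is what makes a per-segment comparison of models meaningful.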

In the example above we have three models and the input data has been split into four segments. Using log-loss as our performance metric we can see that model_1 performs poorly on segment_0, whereas model_2 performs poorly on segment_2. The shape of the lines represents the performance distribution and the height of the lines represents the relative data point count at that log-loss. So again, for example, on model_1 in segment_1, we can see that there is a low but intense concentration of points with a log loss of 1.5.
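For readers unfamiliar with the metric, the log-loss of a single point is the negative natural log of the probability the model assigned to the correct class. The sketch below uses a hypothetical helper, `log_loss_per_point`, and made-up probabilities to show what a value near 1.5 means in practice.

```python
import math

def log_loss_per_point(pred, truth, eps=1e-15):
    """pred: dict mapping class label -> probability; truth: the correct label."""
    # Clip the probability so that log(0) can never occur
    p = min(max(pred[truth], eps), 1 - eps)
    return -math.log(p)

# A confident, correct prediction gives a small loss (about 0.105)
confident_right = log_loss_per_point({"false": 0.1, "true": 0.9}, "true")

# Giving the true class only about 22% probability yields a loss near 1.5
confident_wrong = log_loss_per_point({"false": 0.777, "true": 0.223}, "true")

print(round(confident_right, 3), round(confident_wrong, 3))
```

So a cluster of points at a log-loss of 1.5 is a group for which the model put only around a fifth of its probability on the right answer.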

Manifold also offers a feature attribution view:

Manifold’s feature attribution widget within my Jupyter Notebook. Mousing over the lines provides values and insight into the results. Image by Author.

The feature attribution view highlights the distribution of features for each segment. In the example above, data group 0 includes clusters two and three, and we are comparing them to data group 1, which includes clusters zero and one. The x-axis shows the feature values and the y-axis shows the intensity of the cause. Feature_0 highlights these differences at small intervals, whereas feature_1 highlights the histogram of feature values. Because this is an interactive widget, the values aren’t shown unless moused over. If you are interested in a closer look, check out the example here.

So how do we integrate Manifold in our project?

Manifold is still in the early stages of development, and the tool still has some bugs and quirks. However, this should not discourage you from trying it in your own project. In my case, I needed to install a few packages to get it to work in my Jupyter notebook. This required some trial and error but eventually resulted in the following commands:

!jupyter nbextension install --py --sys-prefix widgetsnbextension
!jupyter nbextension enable --py --sys-prefix widgetsnbextension
!pip install mlvis
!jupyter nbextension install --py --symlink --sys-prefix mlvis
!jupyter nbextension enable --py --sys-prefix mlvis

It wasn’t sufficient to just install the nbextension packages; I also had to enable them. From here we can import a few tools for our demo:

from mlvis import Manifold
import sys, json, math
from random import uniform

To use the Manifold framework, your data needs to be grouped into three specific formats. The first group is all of your x-values, which must be in a list of dictionaries:

#Example of x-values
x = [
  {'feature_0': 21, 'feature_1': 'B'},
  {'feature_0': 36, 'feature_1': 'A'}
]

The second group is your different models’ predictions, which must be a list of lists, where each inner list holds one model’s predictions:

#Example of model predictions
yPred = [
  [{'false': 0.1, 'true': 0.9}, {'false': 0.8, 'true': 0.2}],
  [{'false': 0.3, 'true': 0.7}, {'false': 0.9, 'true': 0.1}],
  [{'false': 0.6, 'true': 0.4}, {'false': 0.4, 'true': 0.6}]
]

The final group is the ground truth values or actual correct y-values, which are in a list of values:

#Example of ground truth
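# One label per data point, using the same class names as the keys in yPred
# (the two values below are illustrative)
yTrue = [
  'true',
  'false'
]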

Once your data is in this format we can pass the values into the Manifold object and execute to get the widget, which looks like the examples above:

Manifold(props={'data': {
    'x': x,
    'yPred': yPred,
    'yTrue': yTrue
}})

Using the Manifold tool, you can then visually evaluate how your different models are performing on the same data. In my case, this was very helpful for building the ensemble because it allowed me to understand which models performed well where, and which data clusters were the hardest for the models to classify. Manifold also helped me evaluate the diversity of prediction for each model within the ensemble, allowing me to construct a more robust apparatus that was able to classify over a range of different inputs.
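In practice, model output rarely arrives in the three structures described earlier. The following is a minimal sketch of a conversion step, assuming features come as rows of values and each model returns a list of class probabilities per point. The helper name `to_manifold_inputs`, its parameters, and the example labels are hypothetical; only the `x`, `yPred` and `yTrue` layout comes from the examples in this post.

```python
def to_manifold_inputs(rows, feature_names, model_probs, class_names, labels):
    """Hypothetical helper: reshape plain lists into the x / yPred / yTrue layout."""
    # x: one dictionary of feature name -> value per data point
    x = [dict(zip(feature_names, row)) for row in rows]
    # yPred: one list per model, each holding a class -> probability dict per point
    y_pred = [
        [dict(zip(class_names, probs)) for probs in one_model]
        for one_model in model_probs
    ]
    # yTrue: one ground-truth label per data point
    y_true = list(labels)

    # Basic consistency checks before anything is visualized
    for index, one_model in enumerate(y_pred):
        if len(one_model) != len(x):
            raise ValueError(f"model {index} has {len(one_model)} predictions for {len(x)} points")
    if len(y_true) != len(x):
        raise ValueError("yTrue must contain one label per data point")
    unknown = set(y_true) - set(class_names)
    if unknown:
        raise ValueError(f"labels not present in class_names: {sorted(unknown)}")
    return {"x": x, "yPred": y_pred, "yTrue": y_true}

data = to_manifold_inputs(
    rows=[[21, "B"], [36, "A"]],
    feature_names=["feature_0", "feature_1"],
    model_probs=[
        [[0.1, 0.9], [0.8, 0.2]],
        [[0.3, 0.7], [0.9, 0.1]],
        [[0.6, 0.4], [0.4, 0.6]],
    ],
    class_names=["false", "true"],
    labels=["true", "false"],  # illustrative labels
)

# In a classic Jupyter notebook with mlvis installed, this dictionary
# could then be passed on as: Manifold(props={'data': data})
print(data["x"])
```

Failing early on mismatched lengths or unknown labels is a design choice for the sketch: it is easier to debug a Python exception than a widget that renders nothing.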
