Deploy scikit-learn models in .NET Core applications


This post was originally published by George Novack at Towards Data Science

Use ONNX and the ONNX Runtime to share a single model across programming languages and technology stacks.

One frequent pain point for organizations trying to deliver ML models to production is the discrepancy between the tools and technologies used by Data Scientists and those used by Application Developers. Data Scientists will most likely work in Python, using Machine Learning frameworks like scikit-learn, TensorFlow, or PyTorch to build and train their models, while Application Developers will often use programming languages like Java, C#, or JavaScript to build enterprise applications, leveraging popular Web Application frameworks like Spring or ASP.NET Core.

There are many ways to bridge this gap. A couple of the most common approaches are:

  1. Have the Data Scientist send the model’s trained parameters to the Application Developer, who then must rewrite the model using the language of the web application where the model will be used.
  2. Develop a separate suite of web applications to host the deployed models using a Python web framework like Flask or Django.

Neither of these is ideal. With the first approach, the manual rewrite step leads to slower cycle times, duplication of logic, and an increase in the probability of human error. The second approach, while more appealing than the first, is still not great, as it hinders the ability to share development patterns, libraries, and core logic (such as security, logging, or common integration logic) across all applications.

In this article, we’ll look at a better way to bridge the technology gap between Data Scientists and App Developers using the ONNX model format and the ONNX Runtime. Specifically, we’ll show how you can build and train a model using scikit-learn, then use that same model to perform real-time inference in a .NET Core Web API.

Integrating the ML model and application development lifecycles. (Image by author)

What is ONNX?

Open Neural Network Exchange, or ONNX, is an open ML model format, similar to the pickle format often used to save and load scikit-learn models, or the SavedModel format for TensorFlow models. ONNX, however, is framework-agnostic, meaning you can produce ONNX format models from just about any popular Machine Learning framework.

In addition to the ONNX model format, we’ll also be using the ONNX Runtime, an open-source runtime that will allow us to run our ONNX model within our .NET Core application. We’ll be using the C# APIs, but the ONNX Runtime also provides APIs for several other languages, including Python, Java, and JavaScript.

You can read more about the ONNX project and the frameworks supported here:

And you can learn more about how to use the ONNX Runtime with different languages and platforms here:

Build an ONNX Model

First, we’ll build and train a scikit-learn model using the California Housing dataset. Nothing special here, just a GradientBoostingRegressor trained to predict the price of a house given a few data points about the neighborhood, such as median income, the average number of bedrooms, and so on.

We’ll need to install the sklearn-onnx library, which will allow us to convert the sklearn model into the ONNX format:

pip install skl2onnx

Then we’ll use the convert_sklearn() function to do the conversion:

The initial_types parameter defines the dimensions and data types of the model input. This model takes 8 inputs of type float. The None in the input dimension [None,8] indicates an unknown batch size.

Note: There are some limitations when converting scikit-learn models to the ONNX format. You can find details about these limitations and the sklearn-onnx library here:

Perform Inference with an ONNX Model

Now for the ASP.NET Core application that will use our ONNX model and expose it as a REST API endpoint, enabling real-time inference as a service. We’ll create an empty ASP.NET Core Web API using the dotnet command-line tool:

dotnet new webapi

Next, we’ll install the Microsoft.ML.OnnxRuntime NuGet package, which will allow us to load and score the ONNX model within the .NET Core application:

dotnet add package Microsoft.ML.OnnxRuntime

Before we can score the model, we need to start an InferenceSession and load the model object into memory. Add the following line to the ConfigureServices method in Startup.cs:

If you’re not familiar with ASP.NET Core or dependency injection, the above line simply creates a singleton instance of type InferenceSession and adds it to the application’s Service Container. This means that the Inference Session will be created only once when the application starts up, and that the same Session will be reused by subsequent calls to the inference API endpoint we’ll create shortly.

You can find more in-depth information about the Service Container, service lifetimes, and dependency injection in general here: Dependency Injection in ASP.NET Core

Notice that in the code above, we’re loading the .onnx model file from the local filesystem. While this works fine for our example, in a production application you’d probably be better off downloading the model object from an external model repository/registry such as MLflow in order to separate version control of the ML model from that of the application.

Now that the application knows how to load our ONNX model into memory, we can create an API Controller class, which, in ASP.NET Core applications, is simply a class that defines the API endpoints that will be exposed by our application.

Here’s the Controller class for the inference endpoint:

A few notes about the above class:

  • The [Route("/score")] attribute specifies that we can make requests to this controller’s endpoints via the route /score
  • Notice that the class’s constructor accepts an object of type InferenceSession as a parameter. When a request arrives, ASP.NET Core creates an instance of our controller class, passing in the singleton instance of InferenceSession that we registered earlier in Startup.cs.
  • The actual scoring of our model on the inputs happens when we call _session.Run()
  • The classes HousingData and Prediction are simple data classes used to represent the request and response bodies, respectively. We’ll define both below.

Here is the HousingData class that represents the JSON body of the incoming API request to our endpoint. In addition to the object’s properties, we’ve also defined an AsTensor() method that converts the HousingData object into an object of type Tensor<float> so that we can pass it into our ONNX model:
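The request-to-tensor step amounts to placing the eight features in one row, in the same column order the model was trained on. The sketch below shows that idea in Python rather than C#. The field names are snake_case versions of scikit-learn's California Housing feature names and are an assumption; the property names in the original C# class may differ.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class HousingData:
    # Field order mirrors the dataset's column order:
    # MedInc, HouseAge, AveRooms, AveBedrms,
    # Population, AveOccup, Latitude, Longitude.
    med_inc: float
    house_age: float
    ave_rooms: float
    ave_bedrms: float
    population: float
    ave_occup: float
    latitude: float
    longitude: float

    def as_batch(self) -> List[List[float]]:
        """Return the features as a 1 x 8 batch (one row, eight columns)."""
        return [[float(value) for value in astuple(self)]]
```

The shape 1 x 8 matches the [None, 8] input declared at conversion time, with a batch size of one. Getting the column order wrong would not raise an error; it would silently produce poor predictions, so it is worth pinning the order down in one place.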

And here is the Prediction class that defines the structure of the response with which our API endpoint will reply:

Testing out the API

And that’s it. We’re now ready to run our ASP.NET Core Web API and test out our inference endpoint. Use the dotnet run command at the root of the project to start the application. You should see a line like this in the output indicating which port the application is listening on:

Now listening on: http://[::]:80

Now that the app is up and running, you can use your favorite tool for making API requests (mine is Postman) to send it a request like so:

Sending a request to the /score endpoint via Postman. (Image by author)
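The same request can also be sent from a script. This is a minimal sketch using only Python's standard library. It assumes the endpoint accepts a POST with a JSON body and that the app listens on port 80 as in the log line above. The JSON keys are placeholders: the real ones must match the property names of the HousingData class.

```python
import json
import urllib.request


def build_score_request(features, base_url="http://localhost:80"):
    """Build (but do not send) a JSON POST request for the /score route."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the controller action handles POST
    )


def score(features, base_url="http://localhost:80"):
    """Send the request and return the decoded JSON response."""
    request = build_score_request(features, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the application running, score({"medInc": 8.3, "houseAge": 41.0, ...}) would return whatever JSON structure the Prediction class serializes to, where the keys shown here are hypothetical.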

And there you have it! We’re now able to get predictions from our model in real-time via API requests. Try tweaking the input values and see how the predicted value changes.


