Announcing: Hummingbird, a library for accelerating inference with traditional machine learning models
Traditional machine learning (ML), such as linear regression and decision trees, is extremely popular. As the chart below from the 2019 Kaggle Survey shows, the most popular ML algorithms are still traditional ones (shown in green).

[Figure: 2019 Kaggle Survey, most popular ML algorithms (traditional methods in green)]

Recently, ever-increasing interest in deep learning and neural networks has led to a proliferation of processing frameworks that are highly specialized and optimized for running these types of computations. Frameworks like TensorFlow, PyTorch, and ONNX Runtime are built around the idea of a computational graph that models the dataflow of individual units, with tensors as their basic computational unit. These frameworks can run efficiently on hardware accelerators (e.g., GPUs), and their prediction performance can be further optimized with compiler frameworks such as TVM.

Unfortunately, traditional ML libraries and toolkits (such as Scikit-Learn, ML.NET, and H2O) are usually developed to run in CPU environments. While they may exploit multi-core parallelism to improve performance, they do not use a common abstraction (such as tensors) to represent their computations. The lack of this common abstraction means that for these frameworks to make use of hardware acceleration, one would need a separate implementation for each (operator) x (hardware backend) combination, which does not scale. As a result, traditional ML often misses out on the accelerations that deep learning and neural networks enjoy.

Announcing: Hummingbird

We are announcing Hummingbird, a library for accelerating inference (scoring/prediction) in traditional machine learning models. Internally, Hummingbird compiles traditional ML pipelines into tensor computations to take advantage of the optimizations that are being implemented for neural network systems. This allows users to seamlessly leverage hardware acceleration without having to re-engineer their models.

This first open-source release of Hummingbird currently supports converting the following tree-based models to PyTorch:

  • scikit-learn: DecisionTreeClassifier, RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier, and ExtraTreesClassifier
  • XGBoost: XGBClassifier and XGBRegressor
  • LightGBM: LGBMClassifier and LGBMRegressor

You can see a complete list of our supported operators here. We are experimenting with many frameworks and backends, and we will continue to release additional operators and features in the upcoming weeks.
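Hummingbird is available on PyPI; the package is published as hummingbird-ml, so it can be installed with a simple pip install hummingbird-ml.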

The code

Here’s an example of a RandomForestClassifier in scikit-learn:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# Create and train a RandomForestClassifier model
X, y = load_breast_cancer(return_X_y=True)
X = X.astype('|f4')  # cast the features to 32-bit floats
skl_model = RandomForestClassifier(n_estimators=500, max_depth=7)
skl_model.fit(X, y)
# Execute prediction using scikit-learn model
pred = skl_model.predict(X)

To enable Hummingbird and execute the scikit-learn model on PyTorch, users only need to add:

from hummingbird.ml import convert

And change the prediction code as follows:

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

# Execute prediction on CPU using PyTorch
pred_cpu_hb = model.predict(X)

The translated model can then be seamlessly executed on GPU as well:

model.to('cuda')
pred_gpu_hb = model.predict(X)
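
The same flow applies to the other supported libraries listed above. For example, here is a minimal sketch for a LightGBM model; the random dataset and hyperparameters are purely illustrative, and we assume LightGBM is installed alongside Hummingbird:

import numpy as np
from lightgbm import LGBMClassifier
from hummingbird.ml import convert

# Train a small LightGBM model on random data (illustrative only)
X_lgbm = np.random.rand(1000, 20).astype('float32')
y_lgbm = np.random.randint(2, size=1000)
lgbm_model = LGBMClassifier(n_estimators=100, max_depth=5)
lgbm_model.fit(X_lgbm, y_lgbm)

# Convert the trained model to PyTorch with Hummingbird and predict
hb_model = convert(lgbm_model, 'pytorch')
pred_lgbm_hb = hb_model.predict(X_lgbm)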

From here, you can experiment with different parameters, see speedups between CPU and GPU, and compare against your initial model. Also, check out some of our sample notebooks that provide additional examples and benchmarking functionality. You can see the documentation here.
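
As a rough starting point for such a comparison, the sketch below simply times the predict calls. The bench helper is ours; it assumes a CUDA-enabled PyTorch build, reuses skl_model and X from the example above, and ignores warm-up and data-transfer effects that a careful benchmark (like the one in our sample notebooks) would control for:

import time
from hummingbird.ml import convert

def bench(fn, runs=10):
    # Average wall-clock time of fn() over several runs
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

cpu_model = convert(skl_model, 'pytorch')   # Hummingbird model on CPU
gpu_model = convert(skl_model, 'pytorch')
gpu_model.to('cuda')                        # Hummingbird model on GPU

print("scikit-learn :", bench(lambda: skl_model.predict(X)))
print("HB (CPU)     :", bench(lambda: cpu_model.predict(X)))
print("HB (GPU)     :", bench(lambda: gpu_model.predict(X)))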

The details

Hummingbird works by reconfiguring algorithmic operators so that we can perform more regular computations that are amenable to vectorized and GPU execution. Each operator is slightly different, and we incorporate multiple strategies. The example below explains one of Hummingbird’s strategies for translating a decision tree into tensor computations using GEMM (GEneral Matrix Multiplication), where we implement the traversal of the tree with matrix multiplications. (GEMM is one of the three tree-conversion strategies we currently support.)

Below, we have a simple decision tree:

[Figure: a simple decision tree]

In this example, the tree takes as input a feature vector with six elements (x ∈ R^6) and has four decision nodes (orange) and five leaf nodes (blue). We translate the decision tree into a neural network with two layers.

And now, the transformed tree:

[Figure: the transformed decision tree]

  • The first step takes all the features (x1 – x6) and evaluates all the conditions (nodes) of the tree together in a single matrix multiplication.
  • The second step puts all the leaf nodes (ℓ1 – ℓ5) together and evaluates them with another matrix multiplication.

Although this leads to redundant computation, because we check all conditions rather than only the ones on the path actually taken, this redundancy is exactly what enables vectorized execution. To offset the extra work, we batch tensor operations and minimize the number of kernel invocations, on top of the optimizations built into the tensor runtimes.
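
To make this concrete, here is a minimal NumPy sketch of the GEMM strategy for a toy tree with two decision nodes and three leaves. The tree, the matrix names (A, B, C, D, E), and the code are our own illustration of the technique, not Hummingbird’s actual implementation:

import numpy as np

# Toy tree over 2 features:
#   node n0: x0 < 0.5 ? go left (leaf l0) : go to n1
#   node n1: x1 < 2.0 ? go left (leaf l1) : go right (leaf l2)

# A maps features to decision nodes: A[i, k] = 1 if node k tests feature i
A = np.array([[1., 0.],     # x0 is tested by n0
              [0., 1.]])    # x1 is tested by n1
B = np.array([0.5, 2.0])    # thresholds of n0 and n1

# C encodes the tree structure: C[k, j] = 1 if leaf j is in the left subtree
# of node k, -1 if it is in the right subtree, 0 if node k is not on its path
C = np.array([[ 1., -1., -1.],    # n0: l0 left; l1, l2 right
              [ 0.,  1., -1.]])   # n1: l1 left; l2 right
D = np.array([1., 1., 0.])        # per leaf: number of "go left" decisions on its path
E = np.array([[0.], [1.], [2.]])  # per-leaf output values (class ids here)

def predict(X):
    T = (X @ A < B).astype(np.float32)          # evaluate all node conditions at once
    leaves = ((T @ C) == D).astype(np.float32)  # one-hot indicator of the reached leaf
    return leaves @ E                           # map leaves to output values

X = np.array([[0.3, 5.0],    # -> l0
              [0.8, 1.0],    # -> l1
              [0.8, 5.0]])   # -> l2
print(predict(X).ravel())    # expected: [0. 1. 2.]

Because every row of X is pushed through the same few matrix operations, large batches of inputs can be evaluated with a handful of GEMM kernels instead of many scattered branch instructions.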

Performance

We ran the RandomForestClassifier example above on an NVIDIA P100 GPU-enabled VM. You can see the notebook here for the full example, which includes imports and test-data setup.

[Figure: notebook code]

For RandomForestClassifier with these parameters, Hummingbird provides a ~5x speedup on CPU, and ~50x speedup on GPU.

The chart below shows additional performance data for RandomForestClassifier, LGBMClassifier, and XGBClassifier. We tested Hummingbird on several of the datasets in NVIDIA’s gbm-bench, with an average speedup of 65x from scikit-learn to PyTorch. The chart reports the average of 5 runs with a batch size of 10K predictions, run on an NVIDIA P100 VM with 6 CPU cores.

[Figure: batch-experiment speedup chart]

Our tech report provides additional details, including a full performance breakdown with per-operator results across varied batch sizes and a variety of devices. Hummingbird is competitive with, and even outperforms (by up to 3x), hand-crafted kernels on micro-benchmarks, while enabling seamless end-to-end acceleration (with speedups of up to 1200x) of ML pipelines.

Next steps

In the upcoming months, we look forward to adding many more operators, input formats, and backends, as outlined in our roadmap. We will soon release our linear and logistic regressors. We are investigating how best to integrate Hummingbird with existing platforms and are currently integrating it with ONNX and its converters. We welcome contributions and collaborators.

Applied Research; Insightful Impact

We take an ‘applied research’ approach. Our scientists collaborate with our engineering and product teams to solve hard technical problems.

Coupled with a direct line to data, resources, engineers, and the voice of the customer, GSL retains the freedom to investigate novel and strategic solutions. This gives us an intrinsic advantage over solely academic or solely product organizations.

Underpinning this is our mindset to take risks, disrupt, and pursue tricky problems, which enables a unique mix of empowerment, access, and innovation.

GSL has pioneered Microsoft’s commitment to open source; we consider OSS a great opportunity to create a culture of mind-share, showcase the lab’s talent, engage with future lab members, and amplify individual contributions.

Insightful impact

Our labs provide the lens that gives Azure Data the visibility to ‘see around the bend’. GSL’s unique collaboration with UW–Madison, our deep involvement in OSS projects and meetups, and a focus on industry and scientific developments give us a measured approach to exploring, validating, and disproving trends for senior leadership.

The advantage of doing research at Microsoft is direct access to a powerful repository of telemetry and data collected from our operational services. Affectionately referred to as a ‘candy store for researchers’, this repository lets us collect data and estimate the potential impact of a technical solution at the pre-hypothesis stage.

We then drive the design and building of our systems based on real-world workloads, rigorous due diligence on viability, and the cost of ownership.

“In GSL, I get to work closely with real products and live systems, exploring research problems with the full power of Azure (both massive amounts of data and computational resources).”

– Karla Saur, Senior Research SDE

Intellectual honesty is the driver in building an inclusive and diverse team: voices are heard on the merits of their ideas, not on their loudness. We believe in a multiplicity of views, and evidence-based decision making is critical, especially when operating at the edge of the known world, where we think outside the box because we live outside the box.

Lab fast facts

[Figure: lab fast facts]

“The recipe to fulfill GSL’s ambitious mission? Simple: take a few bright minds, let them have fun on hard technical problems, boil into a product-solid consistency, add a zest of OSS, and publications to taste. Serve hot to our customers.”

– Carlo Curino, GSL Principal Scientist Lead
