Reinforcement Learning Open-Source Fest

Introducing students to open-source reinforcement learning programs and software development.

Program dates: May – August 2023

2023 Alumni

Micky Yun Chan

Micky recently completed his master’s degree in computer science through the Erasmus Mundus Software Engineers for Green Deal (SE4GD) programme. He believes that open source development can and will play a big part for many years to come, and he likes to explore random new open source projects in his free time. One of his goals is to create a successful open source project.

Outside of programming, he likes to play board games and real-time strategy LAN games.

Demo: Testing Infrastructure for VowpalWabbit

Several improvements can be proposed for the current approach to end-to-end testing in Vowpal Wabbit. The existing methodology generates expected output by executing the exact same command that is under evaluation, potentially introducing challenges to the robustness of the tests. Additionally, the current approach operates without assumptions about the nature of the data it is trained on. Furthermore, it lacks the capability to facilitate the implementation of tests using hyperparameter grids, which can result in increased implementation costs for Vowpal Wabbit.

To address these concerns, a new domain-specific language has been developed for writing end-to-end test configurations. The language supports hyperparameter grids, pluggable data simulators, and assertion functions, thereby enhancing the testing process.
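As a rough illustration of the idea (not the project's actual DSL), a test configuration that combines a hyperparameter grid, a pluggable data simulator, and an assertion function could be expanded as sketched below. Every name here (`expand_grid`, `linear_simulator`, `train_and_score`, `assert_loss_below`, `run_test`) is hypothetical, and the toy learner merely stands in for the system under test.

```python
import itertools
import random


def expand_grid(grid):
    """Yield one dict per point in the Cartesian product of the grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def linear_simulator(n, seed=0):
    """Hypothetical data simulator: y = 2x plus a little Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.01)))
    return data


def train_and_score(data, learning_rate, passes):
    """Toy stand-in for the learner under test: 1-D SGD, returns mean squared loss."""
    w = 0.0
    for _ in range(passes):
        for x, y in data:
            w -= learning_rate * 2.0 * (w * x - y) * x
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def assert_loss_below(threshold):
    """Hypothetical assertion factory: the check encodes what the simulated data allows."""
    def check(loss):
        assert loss < threshold, f"loss {loss:.4f} is not below {threshold}"
    return check


def run_test(grid, simulator, assertion):
    """Run the learner once per grid point and apply the assertion to each result."""
    data = simulator(200)
    results = {}
    for params in expand_grid(grid):
        loss = train_and_score(data, **params)
        assertion(loss)
        results[tuple(sorted(params.items()))] = loss
    return results


results = run_test(
    grid={"learning_rate": [0.05, 0.1], "passes": [5, 10]},
    simulator=linear_simulator,
    assertion=assert_loss_below(0.01),
)
```

Because the simulator controls how the data is generated, the assertion can state what a correct learner should achieve, rather than comparing against output recorded from the same command.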


Stelios Stavroulakis

Stelios Stavroulakis is a PhD student at the University of California, Irvine. For the past 5 years, he’s worked on reinforcement learning, focusing on applications in privacy, warehouse scheduling, and large language model hallucination control. His theoretical work explores the intersection between reinforcement learning and game theory. Stelios aims to apply his academic expertise to a wide range of open problems in industry.

Demo: Optimizing In-Context Learning in LLMs

Our project focused on improving the way large language models (LLMs) respond by choosing the right examples for in-context learning.

We addressed this by viewing it as a contextual bandit problem. Using a method called reduction to regression [1], we upper-bound the regret of the contextual bandit algorithm by the performance of the regressor. Additionally, we used CappedIGW [2] so that our algorithm can handle a large action space effectively. A standout feature of this method is that it is regressor-agnostic, which allows the incorporation of powerful transformer models that excel at extracting semantic information from text. While we explored various techniques such as BERT and sentence transformers, progressive example fine-tuning (PEFT) emerged as the most promising approach.
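For intuition, the reduction in [1] turns regressor predictions into an action distribution with inverse gap weighting. The sketch below shows that rule for a small, finite action set, written for rewards (higher is better). It is an illustration only and is not the CappedIGW variant from [2], which targets large action spaces; the function name and the example scores are made up.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Turn per-action reward predictions into sampling probabilities.

    Each non-greedy action gets probability 1 / (K + gamma * gap), where gap is
    how far its prediction falls below the best one. The greedy action receives
    the remaining probability mass.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    probs[best] = 1.0 - sum(probs)
    return probs


# Hypothetical regressor scores for three candidate in-context examples.
probs = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0)
uniform = inverse_gap_weighting([0.9, 0.5, 0.1], gamma=0.0)
```

A larger `gamma` concentrates probability on the action the regressor prefers, while `gamma = 0` gives uniform exploration. Any regressor that outputs a score per action can be plugged in, which is the regressor-agnostic property mentioned above.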

In light of our preliminary findings, there’s clear promise in improving in-context learning via interaction. By decoupling prompt learning from the LLM, we pave the way for tailored personalization and for striking a balance between computational expense and performance.

[1] Foster, D. J., & Rakhlin, A. (2020). Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles.

[2] Rucker, M., Zhu, Y., & Mineiro, P. (2023). Infinite Action Contextual Bandits with Reusable Data Exhaust. arXiv preprint arXiv:2302.08551.


  • Sharvani Somayaji

    Sharvani is a senior-year undergraduate student studying Electrical and Electronics Engineering at the National Institute of Technology Karnataka, India. Her interests lie in AI, robotics, and web development, particularly NLP and reinforcement learning. She likes to contribute to open source and collaborate with the community. Outside of work, she loves playing badminton, jogging, and drawing.

    Demo: Improve flatbuffer parser support in VW

    Vowpal Wabbit works with several kinds of files: examples, caches, and models. FlatBuffers is an efficient cross-platform serialization library for languages including C++, C#, C, Go, and Java. Improving FlatBuffers parser support in Vowpal Wabbit will provide a new high-performance alternative to the existing input data formats. This project focuses on reducing the serialized size of the current FlatBuffers format and measuring performance.


    Ivoline Ngong

    Ivoline is a 2nd-year Ph.D. student in Computer Science at the University of Vermont and a research scientist at OpenMined. Her research interests broadly revolve around all aspects of machine learning: theory, algorithms, and applications. Currently, she focuses on provable fairness and privacy-preserving machine learning, including differential privacy, federated learning, and secure multiparty computation. When she’s not working, she can usually be found in the kitchen whipping up a savory meal or getting lost in the plot of a good book.

    Demo: Compiler Optimizations using Reinforcement Learning

    Compilers use expert-picked sequences of optimizations when converting human-written programs into executable binaries. These heuristics are developed by experts who spend hours tweaking compiler knobs to produce smaller and faster binaries. By replacing these complex heuristics with reinforcement learning (RL) policies, we aim to enable compilers to optimize code automatically, without fixed parameters or a predefined ordering. Specifically, we tackle the phase-ordering problem, reducing code size and runtime in the LLVM compiler using the RL environments, intermediate representations, and datasets provided by CompilerGym. By integrating the gym environment with Vowpal Wabbit RL agents such as contextual bandits, we obtain code size reductions. Furthermore, benchmarking different deep learning-based RL agents and experimenting with different observation spaces shows promising results. Future work can focus on making datasets and observation features more representative and on developing better reinforcement learning agents.


    Shaokun Zhang

    Shaokun Zhang is a first-year Ph.D. student in the College of Information Sciences and Technology at Pennsylvania State University. His primary research interest is automated machine learning (AutoML), with a current focus on AutoML in the data stream setting. He has contributed to several open-source projects, such as the AutoML library FLAML at Microsoft Research. Shaokun is passionate about research and coding, and he hopes to do impactful work in AI research during his academic career. Outside of work, he loves traveling, reading, and running.

    Demo: AutoML Extensions

    In machine learning, AutoML is the process of automating the application of machine learning (ML) models to real-world problems. It automates the selection, composition, and parameterization of machine learning models. It aims to let non-experts make use of machine learning models and to free people from the repetitive work of model development. In VW, the goal of the AutoML reduction is to give users a hands-off way to get an optimal learning configuration without prior experience using VW or an in-depth understanding of their dataset. A “configuration” is the basic element of AutoML and a general term that can be extended to any aspect of VW. We define it as a set of namespace interactions in a contextual bandit problem. However, VW previously supported only quadratic interactions in the AutoML reduction, which greatly limited its application scope and feature richness. We rewrote the configuration data structures in VW so that they can store interactions of arbitrary size and provide a more flexible interface for future development. We also designed a mechanism that extends AutoML to add and drop cubic interactions, which greatly enlarges the search space and expands the application scope of the AutoML algorithm.
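To make the notion of a configuration concrete, here is a minimal Python sketch that models one as a set of namespace interactions of arbitrary size. It is not VW's C++ data structure. The helper names are hypothetical, and treating an interaction as an order-insensitive tuple that may repeat a namespace is an assumption made for illustration.

```python
from itertools import combinations_with_replacement


def interactions(namespaces, order):
    """All candidate interactions of a given order, e.g. order 2 -> ('a', 'b')."""
    return set(combinations_with_replacement(sorted(namespaces), order))


def add_interaction(config, interaction):
    """Return a new configuration with one interaction added."""
    return config | {tuple(sorted(interaction))}


def drop_interaction(config, interaction):
    """Return a new configuration with one interaction removed."""
    return config - {tuple(sorted(interaction))}


namespaces = ["a", "b", "c"]
quadratics = interactions(namespaces, 2)  # candidate pairs
cubics = interactions(namespaces, 3)      # candidate triples

# A configuration is just a set of interactions, of any size.
config = frozenset({("a", "b")})
config = add_interaction(config, ("c", "a", "b"))  # stored as ('a', 'b', 'c')
config = drop_interaction(config, ("a", "b"))
```

With three namespaces there are 6 candidate quadratics but 10 candidate cubics, which shows how quickly the search space grows once cubic interactions are allowed.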


    Songlin Jiang

    Songlin Jiang is from China and completed his bachelor’s degree in summer 2022 in the specialized class for fundamentals and theories of Computer Science and Technology (Hons) at Lanzhou University. Songlin is also an incoming (fall 2022) student in the Security and Cloud Computing (SECCLO) Erasmus Mundus joint master’s programme, supported by a full-ride scholarship. He will study at two universities in the European Union, starting his first year at Aalto University in Finland. Songlin is enthusiastic about open source. He is an openSUSE member, recognized for his continued contributions during and after Google Summer of Code 2021. His research interests are distributed machine learning and related security issues. As Moore’s Law is starting to fail, he believes that distributed systems will be the future infrastructure to support the growing need for computing power in machine learning.

    Demo: Native CSV parsing

    Our project introduces a native CSV parser that lets VW read the CSV format. The parser follows the RFC 4180 and MIME standards, with a CSV header format adapted to VW’s training and prediction needs. The project reaches 100% test and code coverage, and the parsing performance is comparable to that of the VW format parser. We also wrote a tutorial on using the CSV parser. Here are the design details:

    Because delimiter-separated files are often given a `.csv` extension even when they use a non-comma field separator, the parser allows specifying the CSV field separator. In addition, the parser always needs a CSV header to work correctly. If the header is missing or does not suit VW’s parsing needs, users can supply the correct one through command line options, so they do not need to edit a dataset after downloading it from the Internet.

    For the header format, we use `_label` and `_tag` to mark the label and tag columns, and `|` to separate the namespace from the feature name in every other column. To ensure that there is always an equivalent CSV file for any VW format file, the label data format is the same as that of VW labels. The parser also supports scaling namespace values by a ratio specified in command line options. Since multi-line examples often mean that different lines have different schemas, CSV is unsuitable for them. However, if all the cells in a line are empty, the parser still treats it as an empty line. In this case, users can still express multi-line examples in CSV files, although this is not listed as supported.
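The header convention above can be illustrated with a small standalone reader built on Python's `csv` module. This is a sketch of the idea, not VW's parser. The function name is hypothetical, and three behaviours are assumptions made for illustration: skipping empty cells, converting feature values to floats, and placing columns without a `|` in an empty namespace.

```python
import csv
import io


def parse_csv_examples(text, separator=","):
    """Parse CSV text into dicts holding a label, a tag, and per-namespace features."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader)
    examples = []
    for row in reader:
        example = {"label": None, "tag": None, "namespaces": {}}
        for name, cell in zip(header, row):
            if cell == "":
                continue  # assumption: an empty cell means the feature is absent
            if name == "_label":
                example["label"] = cell
            elif name == "_tag":
                example["tag"] = cell
            else:
                # 'namespace|feature'; a column without '|' gets an empty namespace
                namespace, _, feature = name.rpartition("|")
                example["namespaces"].setdefault(namespace, {})[feature] = float(cell)
        examples.append(example)
    return examples


# A semicolon-separated file that still carries a .csv-style header.
data = (
    "_label;_tag;user|age;user|visits;item|price\n"
    "1;ex1;25;3;9.5\n"
    "-1;ex2;40;;2.0\n"
)
examples = parse_csv_examples(data, separator=";")
```

Passing the separator as a parameter mirrors the design choice described earlier: the file does not need to be edited just because it uses a delimiter other than a comma.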


  • Muhammad Ammar Abid

    Muhammad Ammar Abid is a senior-year undergraduate at the NUCES (FAST) Peshawar campus in Pakistan. He has completed more than 150 online courses, spending more than 800 hours over the past few years learning anything beneficial, new, and exciting related to technology, and his learning journey continues. Ammar’s goal is to empower himself and other people through learning technology, guided by the famous saying that knowledge is something nobody can take away from you. Ammar’s final-year project, Real Time Pakistani Currency Detection for Visually Impaired, aims to address the lack of accessible currency-detection products in Pakistan. Ammar believes that his collaboration on Vowpal Wabbit will empower people around the globe and help them achieve more. His future priority is to end up at a diverse and inclusive place where his collaboration can solve real-world problems while empowering people in the process.

    Demo: TensorBoard and TensorWatch Integration

    Visualization helps people process large amounts of information. TensorBoard provides visualization and tooling that can be used effectively for machine learning experimentation. Integrated into the Vowpal Wabbit ecosystem, it could help users focus more on the problem and extract meaningful results with the help of visualization. So far, we have extended the Vowpal Wabbit Python bindings to support outputting progress updates and model details to TensorBoard.


    Wilson Cheung

    Wilson Cheung recently completed his M.S. in Analytics at the Georgia Institute of Technology. He previously worked as a data scientist at Booz Allen Hamilton, where he developed many artificial intelligence and data engineering capabilities to address technological problems faced by government clients in the healthcare and defense industries. He currently works as a data scientist at Amazon Web Services and hopes to develop his career toward AI-based personalization using the reinforcement learning and bandit-based methods he learned while participating in RLOS.

    Demo: Extend FairLearn to include RL bias analysis using VW

    Evaluating a learned policy in an online setting through contextual bandits can have many unforeseen consequences for responsible AI. In particular, because the objective of many RL systems is to maximize accumulated reward, any policy can carry latent harms throughout the online evaluation process. Using logged contextual bandit data generated via Vowpal Wabbit under an assumed logging policy, we compute FairLearn-supported fairness metrics to identify these harms and use counterfactual analysis to assess the quality of evaluation policies. Lastly, we identify ways to mitigate the impact of these harms under the reduced weighted multi-class classification setting.


    Mónika Farsang

    Mónika Farsang is from Budapest, Hungary. She recently graduated from the Budapest University of Technology and Economics, where she studied Mechatronics Engineering. Her specialization was in the field of Intelligent Embedded Systems. She is passionate about solving challenging problems and likes to contribute to open-source projects, because she loves to see that science and technology can become more accessible to everyone. Her research interests are machine learning, in particular reinforcement learning, bio-inspired solutions, and robotics. She hopes to join a PhD program, where she can continue her studies in this area.

    Demo: Safe Contextual Bandits

    Contextual bandit algorithms optimize the mean of the reward distribution without paying attention to worst-case scenarios. However, there are many safety-critical domains where this kind of behavior is undesirable. Consequently, our goal is to implement safe contextual bandits that focus specifically on the worst cases. To achieve this, we use conditional value at risk (CVaR), which is the expected return in the worst q% of cases. With CVaR, we optimize the average of the tail instead of the average of the whole distribution, to maintain safety and avoid choosing bad actions. CVaR has a dual mathematical representation, which allows off-policy learning by passing modified rewards to the contextual bandit. Leveraging this formulation, we optimize the cut-off value of the distribution online. To optimize the cut-off point, we use a no-regret algorithm called FreeGrad, which is practical because it requires no hyperparameter tuning. Our results demonstrate that optimizing the CVaR of the distribution achieves better worst-case behavior than typical contextual bandit policies.
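The relationship between the tail average and the cut-off value can be checked numerically. The sketch below computes the CVaR of a batch of rewards twice: directly, as the mean of the worst fraction q, and through the standard dual form, which maximizes c - E[max(c - r, 0)] / q over the cut-off c. It is an offline illustration with made-up rewards. The project optimizes the cut-off online with FreeGrad, whereas this sketch simply tries every sample value as a candidate cut-off.

```python
import math


def empirical_cvar(rewards, q):
    """Mean of the worst q-fraction of rewards (the lower tail)."""
    ordered = sorted(rewards)
    k = max(1, math.ceil(q * len(ordered)))
    return sum(ordered[:k]) / k


def dual_objective(rewards, q, cutoff):
    """c - E[max(c - r, 0)] / q for a given cut-off c."""
    shortfall = sum(max(cutoff - r, 0.0) for r in rewards) / len(rewards)
    return cutoff - shortfall / q


def dual_cvar(rewards, q):
    """Maximize the dual objective over the cut-off.

    The objective is piecewise linear and concave in the cut-off, so for a
    finite sample a maximizer can be found among the sample values.
    """
    return max(dual_objective(rewards, q, c) for c in rewards)


rewards = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
direct = empirical_cvar(rewards, q=0.2)  # mean of the two worst rewards
dual = dual_cvar(rewards, q=0.2)
```

For a fixed cut-off, the dual objective is an average of per-example terms, which is what makes it possible to feed modified rewards to an ordinary contextual bandit learner.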


    Nishant Kumar

    Nishant is currently pursuing his bachelor’s in Electronics Engineering from the Indian Institute of Technology (BHU), Varanasi, India. As an undergraduate researcher, most of his work has been broadly in AI, with a particular focus on Reinforcement Learning, Multiagent systems and communication, and game AI. He also specializes in writing code to develop and maintain intelligent systems. He has worked on a diverse set of problems in AI, from implementing efficient RL agents using C++ code to enabling and enhancing parallel processing capabilities in machine learning libraries. Apart from that, he likes contributing to open-source code and reading about cybersecurity, blockchain, astronomy, and human psychology.

    Demo: VW Parallel parsing improvements

    Vowpal Wabbit is known for its blazing-fast performance. However, VW’s parsers can be a bottleneck for most operations, so an effective way to multithread the parsers is required to unleash their true potential. Last year, parallel parsing support for the text input format was added. This project builds on that work with a more efficient way to read and write the cache, and with support for multiple passes, multiline examples, and JSON/DsJSON input formats.


    Milena Mathew

    Milena Mathew is an undergraduate majoring in Electrical Engineering and Computer Science at UC Berkeley. She’s broadly interested in the intersection of computer science and physics and has previously worked on applying machine learning techniques to problems in the natural sciences. After graduating, Milena hopes to work in industrial R&D. In her free time, you can typically find her tackling the latest crossword or baking a batch of cookies.

    Demo: Safe Contextual Bandits

    Contextual bandit algorithms are typically designed to maximize the expected reward over time. However, in systems with a safety constraint, avoiding bad actions may be valued over purely maximizing reward. We aim to account for both goals by developing a chance-constrained policy optimizing learner. Chance-constrained policy optimization takes advantage of additional observed feedback to determine the probability of a decision violating a constraint while still optimizing for reward. We implemented the learner in Coba, a contextual bandit algorithm benchmarking application, and added functionality to accommodate multiple observations. We aim to show the feasibility of this approach by comparing the amount of constraint-violating behavior in our learner with that of traditional contextual bandit algorithms.


    Krystal Maughan

    Krystal Maughan is from Trinidad and Tobago. She is currently pursuing a PhD in Computer Science with a minor in Pure Mathematics at the University of Vermont, focusing on isogeny-based cryptography with Christelle Vincent and Joe Near. She has previously published on fairness and privacy at workshops at NeurIPS and MD4SG, contributed to open source at Haskell.org through Google Summer of Code and through Mozilla’s RustReach, and interned at Apple, Microsoft, Mercury (a Haskell fintech), and Autodesk. She graduated with a Bachelor’s in Film, Photography and Visual Arts, and a double minor in Art History and Technical Theatre, from Ithaca College. Before grad school, she worked in Hollywood as a lighting and camera technician for high-speed lighting and camera R&D tech startups, did a workshop at the Jet Propulsion Laboratory, and learned to sail. She wants to continue isogeny and mathematical cryptography research after her PhD.

    Demo: Integrate Estimator Library into Azure Machine Learning (AML) Pipeline

    In this work, we create a complete end-to-end pipeline for the user. The pipeline loads the data as a stream of decisions made by a reinforcement learning system, together with information about the policies we want to estimate counterfactually. It then runs estimators from the estimators library as configured by the user, using local and distributed computation on the compute clusters of Azure Machine Learning (AML). Finally, it visualizes locally the aggregated counterfactual results for the given policies and estimators, aggregated by number of events.


    Jui Pradhan

    Jui Pradhan is a final-year student pursuing a B.E. in Computer Science and an M.Sc. in Economics at BITS Pilani. Her current research interests include artificial intelligence, federated learning, algorithmic game theory, and optimization. She has worked on projects at the intersection of reinforcement learning, NLP, and information retrieval. Before interning at Microsoft as a Vowpal Wabbit contributor, she was a Google Summer of Code mentor for Sugar Labs and contributed to several open-source projects. Outside of work, she loves to paint, write, and explore different forms of creativity and art. After graduating in 2023, she hopes to work on impactful AI-driven projects and to work part-time on research projects at academic research labs.

    Demo: Integrate Estimator Library into Azure Machine Learning (AML) Pipeline

    The estimator library is a collection of estimators for off-policy evaluation. The prototype of the estimator library lacked a clear structure, which made it hard to install and consume. Moreover, for researchers to contribute new estimators, they need to know which interfaces to use for their problem type (cb, ccb, slates, ca, etc.). Therefore, as part of this project, we added interfaces for each problem type, worked on CI improvements, added new estimators, added tests, structured the estimator library, and finally released it to PyPI as vw-estimators. Our work enables the estimator library to be consumed by the AML pipeline and by end users.


    Manav Singhal

    Manav Singhal is a senior year undergraduate student at the National Institute of Technology Karnataka, India studying Electrical and Electronics Engineering. His current research interests lie in improving the deployment of reinforcement learning in real-world scenarios and increasing the interpretability of machine learning models. Besides work, Manav loves reading, traveling, and running! After graduating, he wishes to pursue his graduate studies in Computer Science.

    Demo: Empirical Analysis of Privacy Preserving Learning

    In many real-world learning scenarios, privacy constraints (for example, the General Data Protection Regulation) mean that one cannot use the user feature mapping directly for personalization. To uphold user privacy, we study the effect of using aggregated data for learning. We explore the notion of “aggregation” by saving, after training, only those features that have been seen for at least a certain threshold number of users. This project compares the performance of the model without aggregation (public model) and the model with aggregation (private model) to understand how much this filtering affects the learning process.
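One way to picture this kind of aggregation is a filter over learned weights that keeps a feature only if enough distinct users contributed to it. The sketch below is a simplified illustration under that assumption and is not the project's implementation. The function name, the `(user_id, features)` log format, and the example feature names are all hypothetical.

```python
from collections import defaultdict


def aggregate_features(weights, interactions, min_users):
    """Keep only weights whose feature was seen for at least `min_users` distinct users."""
    users_per_feature = defaultdict(set)
    for user_id, features in interactions:
        for feature in features:
            users_per_feature[feature].add(user_id)
    return {
        feature: weight
        for feature, weight in weights.items()
        if len(users_per_feature[feature]) >= min_users
    }


# Hypothetical training log: (user id, features present in that interaction).
interactions = [
    ("u1", ["city=Paris", "device=mobile"]),
    ("u2", ["city=Paris"]),
    ("u3", ["city=Paris", "zip=75001"]),
    ("u1", ["zip=75001"]),
]
weights = {"city=Paris": 0.4, "device=mobile": -0.1, "zip=75001": 0.7}

private_weights = aggregate_features(weights, interactions, min_users=2)
```

Here `device=mobile` is dropped because only one user ever produced it, so its weight could reveal something about that single user.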


    Varun Suryan

    Varun Suryan grew up in northern India. He is a final-year Ph.D. student in Computer Science at the University of Maryland. His interests lie in reinforcement learning (RL), multi-armed bandits, and robotics. His Ph.D. focuses on improving the sample efficiency of RL agents with the help of simulators. He attended the Indian Institute of Technology Jodhpur and Virginia Tech for his B.Tech. and MS in Mechanical Engineering and Computer Engineering respectively. Varun is passionate about technology and loves collaborating with people from various domains. In the future, he wishes to pursue his work in RL and AI. In his spare time, he runs and plays tennis.

    Demo: AutoML for Online Contextual Bandits

    We propose the ChaChaCB algorithm for making online feature interaction choices for contextual bandits. This is crucial for online learning services, which can benefit significantly from online AutoML-style algorithms that choose hyperparameters and configurations automatically. Currently, most of the tuning is done manually. This problem has been studied before in the full-information (supervised) setting, where each configuration has access to the revealed feedback from the environment. However, bandits present unique challenges, because not every configuration receives feedback from the environment. By using importance weights to update the loss bounds for a subset of configurations, ChaChaCB performs competitively with several baselines. Further, we plan to integrate ChaChaCB as a learner in Coba, a standardized framework for testing contextual bandit learners.


    Vishal Vinod

    Vishal Vinod is a Computer Science master’s student at the University of California, San Diego. His current research interests are in continual learning, 3D computer vision and domain adaptation to improve the performance of autonomous systems in real-world scenarios. Vishal is strongly motivated by AI4SocialGood and aims to work on socially impactful AI research applications. Apart from research, he is involved in the open-source community and reads up on developmental economics.

    Demo: VW feature transformation without redeploying the source

    Feature transformations are necessary to prototype, mutate, or compare the performance of a trained model. Currently, creating a feature mutation in VW requires implementing a new C++ reduction for each mutation, which makes it harder to work on new ones. This project simplifies feature modification by implementing a generic reduction, so that example-modification functions can be registered with the interface and used without implementing a reduction for each transformation. This allows mutation functions such as feature deletion, dropout, normalization, logarithmic mutation, and feature binarization to be registered on the stack for the model in memory without redeploying the source. The generic reduction enables feature engineering pipelines that use only callable functions to modify the example. It also allows comparing the performance of models with and without mutations in a single run for benchmarking, and it avoids the step of transforming the data by other means.
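The registration pattern described here can be sketched in plain Python: transformations are ordinary callables stored in a registry and applied in sequence, so adding one does not require a new component. This is an analogy rather than VW's C++ interface. The registry, the decorator, the dict-based example, and the use of `log1p` for the logarithmic mutation are all assumptions made for illustration.

```python
import math

# Hypothetical registry mapping a transformation name to a callable.
TRANSFORMS = {}


def register(name):
    """Decorator that registers an example-modification function under a name."""
    def decorator(fn):
        TRANSFORMS[name] = fn
        return fn
    return decorator


@register("log")
def log_transform(example):
    """Replace each positive value v with log(1 + v)."""
    return {k: math.log1p(v) if v > 0 else v for k, v in example.items()}


@register("binarize")
def binarize(example):
    """Map every non-zero value to 1.0 and every zero to 0.0."""
    return {k: 1.0 if v != 0 else 0.0 for k, v in example.items()}


@register("drop_price")
def drop_price(example):
    """Delete a single feature from the example."""
    return {k: v for k, v in example.items() if k != "price"}


def apply_pipeline(example, names):
    """Apply registered transformations in order, leaving the input untouched."""
    for name in names:
        example = TRANSFORMS[name](example)
    return example


example = {"price": 9.0, "clicks": 0.0, "age": 30.0}
transformed = apply_pipeline(example, ["drop_price", "binarize"])
```

Because each transformation returns a new dict, the original example stays available, which makes it easy to compare results with and without a mutation in the same run.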


  • Milind Agarwal

    Milind Agarwal is a combined undergraduate and master’s student in Computer Science at Johns Hopkins University. His current research interests are natural language processing and machine translation for low-resource and endangered language settings. Before interning at Microsoft, he worked in several academic research labs at Johns Hopkins, gaining experience in a wide variety of fields including NLP, machine translation, computational biology, data visualization, and software development. After graduating in 2021, he hopes to join a Ph.D. program where he can continue to work on challenging NLP problems.

    Demo: Challenge: Contextual Bandit Data Visualization with Jupyter Notebooks

    Exploratory data analysis and data visualization have become an essential part of any data scientist’s toolkit. Visualizations not only let you kickstart your analysis by making the patterns in your data easy to understand, but also help you visually inspect your policies to understand their behaviour. We present cb_visualize, a Python-based visualization library specialized for contextual bandit feature and policy visualizations. The library offers robust visualizations for data exploration, training, feature importance, and action distributions, and supports common contextual bandit dataset formats used by Vowpal Wabbit, such as text, JSON, and DsJSON. We hope that this toolkit will be an asset for researchers and customers alike to better present and understand their data and analyses.


    Sharad Chitlangia

    Sharad Chitlangia is a senior-year undergraduate student at BITS Pilani Goa, where he studies Electronics. He is specializing in the field of Artificial Intelligence and has previously worked heavily at the intersection of Machine Learning and Systems, and in Explainable AI. Aside from work, he spends a lot of time in the open source community and works on improving accessibility, especially in AI research.

    Demo: Challenge: Pushing the Limits of VW with Flatbuffers

    VowpalWabbit is known for its ability to solve complex machine learning problems extremely fast. Through this project, we aim to take this ability even further by introducing FlatBuffers. FlatBuffers is an efficient cross-platform serialization library known for its memory access efficiency and speed. We develop FlatBuffers schemas for input examples so that they can be stored as binary buffers, and show a performance increase of 30% or more compared to traditional formats.


    Harish Kamath

    Harish Kamath is a Computer Science/Math undergraduate at Georgia Tech. His passions focus on reinforcement learning, generalization in learning, and making newer technologies cheaper, faster, and more accessible. Outside of work, he loves playing and watching basketball, dance, and running! Once he graduates, he hopes to end up somewhere where he can make the biggest lasting impact for the most people.

    Demo: Challenge: Conversion of VowpalWabbit models into ONNX format

    Currently, ONNX is the leading standard for representing machine learning models across platforms and frameworks. It describes a model as a computational graph consisting of standard operators from an operator set that is constantly evolving to accommodate new types of models and operations. Being able to describe a model in ONNX format is important because it allows (1) models to be optimized and run across different architectures using a single runtime, and (2) models created in different frameworks to interact with each other. Although other leading frameworks such as TensorFlow and PyTorch have mature tools for converting to ONNX format, VowpalWabbit does not yet have this capability built in. This project focuses on introducing this functionality to VowpalWabbit, so that we can combine the fast model training and inference speed of VW with the representational capacity of other frameworks. We introduce new sparse operators that are used to instantiate VW regression models efficiently in ONNX format, show that regression and contextual bandit models can be translated directly with these operators, and give an example of such models being run in RLClientLib to show that they can now be ported to any inference framework.


    Cassandra Marcussen

    Cassandra Marcussen is a junior at Columbia University studying Mathematics and Computer Science. Her interests lie in artificial intelligence, theoretical computer science, and contributing to technology within these fields through efficient computing and low-level optimizations. Cassandra is enthusiastic about open source code, and has loved working on an impactful open source system such as Vowpal Wabbit. In the future, she wishes to pursue graduate studies in Computer Science.

    Demo: Challenge: Parallelized Parsing

    Modern machines often utilize many threads to achieve good performance. Currently, VW uses a single thread to read in and parse input, and a single thread to learn. The parse thread presents a bottleneck, slowing down VW as a whole. By extracting the input reading into a separate thread and extending the parser to support many threads, VW can better utilize resources, achieve better performance, and have an improved design by separating logical components into independent modules. This project focuses on improving performance and design for the text input format, and also ensures compatibility with the cache input format.
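The split between reading, parsing, and learning can be sketched with a thread pool: lines are handed to several parser threads, and the learner consumes the parsed examples in their original order. This is a structural illustration in Python, not VW's C++ design. The simplified `label | name:value` line format and all function names are assumptions, and in CPython the global interpreter lock limits the speed-up for CPU-bound parsing.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_line(line):
    """Parse a simplified 'label | name:value ...' line into (label, features)."""
    label_part, _, feature_part = line.partition("|")
    features = {}
    for token in feature_part.split():
        name, _, value = token.partition(":")
        features[name] = float(value) if value else 1.0  # bare name -> value 1.0
    return float(label_part), features


def learn_all(lines, num_parse_threads=4):
    """Parse on a pool of threads and consume the results in input order."""
    consumed = []
    with ThreadPoolExecutor(max_workers=num_parse_threads) as pool:
        # pool.map yields results in the order of the input lines, so the
        # learner sees examples in the same order as a single-threaded run.
        for label, features in pool.map(parse_line, lines):
            consumed.append((label, features))  # stand-in for a learner update
    return consumed


parsed = learn_all(["1 | a:0.5 b", "-1 | c:2"])
```

Keeping the parser a pure function of one line is what lets it run on many threads while the learner remains a separate, independent module.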


    Newton Mwai Kinyanjui

    Newton Mwai Kinyanjui is from Nairobi, Kenya. He is currently pursuing his Ph.D. at Chalmers University of Technology in Sweden with Fredrik Johansson, working on causal inference and reinforcement learning for improved decision making in healthcare. He graduated from Carnegie Mellon University Africa with a Master of Science in Electrical and Computer Engineering.

    Demo: Challenge: Library of contextual bandit estimators

    Estimators are used in off-policy evaluation. One common estimator is inverse propensity scoring (IPS); others include doubly robust (DR) and PseudoInverse. These estimators work better or worse in different settings. This project explores reference implementations of each and allows comparison between them to aid understanding. We extend the estimators library and implement an interface to help researchers and data scientists test different estimators quickly and easily.
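As a concrete example of what such an estimator computes, the IPS estimate reweights each logged reward by the ratio between the probability that the evaluated policy would pick the logged action and the probability with which the logging policy actually picked it. The sketch below is a minimal standalone version and is not the library's interface. The function name, the tuple layout of the log, and the toy policies are assumptions.

```python
def ips(logged, target_policy):
    """Inverse propensity scoring estimate of a policy's average reward.

    `logged` holds (context, action, reward, logged_probability) tuples and
    `target_policy(context, action)` returns the probability that the policy
    being evaluated would choose `action` in `context`.
    """
    total = 0.0
    for context, action, reward, logged_prob in logged:
        total += reward * target_policy(context, action) / logged_prob
    return total / len(logged)


# Toy log: a uniform logging policy over two actions; action 0 always pays 1.
logged = [
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
]


def always_zero(context, action):
    """Deterministic policy that always plays action 0."""
    return 1.0 if action == 0 else 0.0


def uniform(context, action):
    """Policy identical to the logging policy."""
    return 0.5


value_always_zero = ips(logged, always_zero)
value_uniform = ips(logged, uniform)
```

Evaluating the logging policy itself recovers the plain average of the logged rewards, while the policy that always plays action 0 is credited with the reward it would have earned had it been deployed.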


    Mark Rucker

    Mark Rucker is currently a 2nd-year PhD student at the University of Virginia, with a previous 8-year career as an enterprise software engineer. Mark’s PhD research explores how reinforcement learning models can be used to encourage health behavior change in individuals managing chronic health conditions. This research combines state-of-the-art machine learning with web and mobile app development to support in-situ randomized controlled trials of behavior change interventions. After graduation, Mark hopes to return to industry to develop high-quality products that deeply impact people’s lives.

    Demo: Challenge: COBA: A Modern Benchmarking Package for Reproducible Contextual Bandit Research

    Performance benchmarking on well-defined problems is a pillar of modern machine learning research. With clear problems and metrics, benchmarking has allowed the research community to maintain a high level of independent effort while still making real and meaningful progress over time. The elegance of benchmarking (define, measure, repeat), however, belies real engineering challenges such as software maintenance, data distribution, statistical aggregation, and reproducibility, to name a few. These challenges are especially salient in contextual bandit research, where one needs not only a data set but also a harness to emulate interaction with the data. In an effort to reduce these burdens without losing any of benchmarking’s benefits, we present COBA, an ultra-lightweight Python package for benchmarking contextual bandit algorithms. COBA uses a small set of clean and consistent interfaces to satisfy four core use cases: (1) creating reproducible benchmarks, (2) sharing reproducible benchmarks, (3) evaluating custom algorithms, and (4) exploring evaluation results.