November 26, 2022

2022 Data-driven Optimization Workshop

Location: Beijing, China

Speaker Ruibin Bai

Ruibin Bai
The University of Nottingham Ningbo China
Talk title: Machine learning assisted hyper-heuristics for online combinatorial optimization problems

  • In the past decade, considerable advances have been made in the field of computational intelligence and operations research. However, the majority of these optimization approaches have been developed for deterministically formulated problems, whose parameters are often assumed to be perfectly predictable prior to problem-solving. In practice, this strong assumption unfortunately contradicts the reality of many real-world problems, which are subject to different levels of uncertainty. The solutions derived from these deterministic approaches can rapidly deteriorate during execution due to over-optimization without explicit consideration of these uncertainties. To address this research gap, two data-driven hyper-heuristic frameworks are investigated. This talk will present the main ideas of the methods and their performance on two combinatorial optimization problems: a real-world container terminal truck routing problem with uncertain service times and the well-known online 2D strip packing problem. The talk will also briefly describe a port digital twin system developed by our team for the purpose of integrated optimization of multiple port operations.

  • Ruibin Bai is Professor at the University of Nottingham Ningbo China. He is a Senior Member of IEEE, currently serving as a Board Member for EJOR and an Associate Editor for Networks. He was awarded the Zhejiang Provincial “Outstanding Young Scientist” Fund in 2016. His research areas include computational intelligence, reinforcement learning, combinatorial optimisation, transportation optimisation and digital twins.


Speaker Hu Fu

Hu Fu
Shanghai University of Finance and Economics
Talk title: Oblivious Online Contention Resolution Schemes

  • Contention resolution schemes (CRSs) are powerful tools for obtaining “ex post feasible” solutions from candidates that are drawn from “ex ante feasible” distributions. Online contention resolution schemes (OCRSs), their online counterparts, have found myriad applications in Bayesian and stochastic problems, such as prophet inequalities and stochastic probing. When the ex ante distribution is unknown, it was not previously known whether good CRSs/OCRSs exist given no samples (in which case the scheme is called oblivious) or only a few samples from the distribution.

    In this work, we give a simple 1/e-selectable oblivious single-item OCRS by mixing two simple schemes evenly, and show, via a Ramsey theory argument, that it is optimal. On the negative side, we show that no CRS or OCRS with O(1) samples can be Ω(1)-balanced/selectable (i.e., preserve every active candidate with a constant probability) for graphic or transversal matroids. (A toy simulation illustrating the selectability notion is sketched at the end of this entry.)

  • Hu Fu is an associate professor at the Institute for Theoretical Computer Science, Shanghai University of Finance and Economics. Before SHUFE, he was an assistant professor at the University of British Columbia. He obtained his PhD in computer science from Cornell University, and was a postdoc at the Microsoft Research New England lab and Caltech. His research interests lie in the intersection of economics and computation, and in online algorithms.
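
To make the selectability notion above concrete, here is a minimal Monte Carlo sketch (not from the talk or the paper) that estimates how selectable a given single-item online scheme is: items arrive one by one, item i is active independently with probability x[i], and the scheme may irrevocably select at most one active item. The ex ante vector x and the toy scheme accept_with_half are illustrative assumptions; the actual 1/e-selectable scheme from the talk is different.

```python
import random

def estimate_selectability(x, scheme, trials=200_000):
    """Monte Carlo estimate of min_i P(item i selected | item i active)
    for a single-item OCRS: item i is active independently w.p. x[i],
    items arrive in order, and at most one item may be selected."""
    n = len(x)
    active_cnt = [0] * n
    picked_cnt = [0] * n
    for _ in range(trials):
        taken = False
        for i in range(n):
            if random.random() < x[i]:          # item i is active
                active_cnt[i] += 1
                if not taken and scheme(i, x):
                    picked_cnt[i] += 1
                    taken = True
    return min(p / a for p, a in zip(picked_cnt, active_cnt) if a > 0)

def accept_with_half(i, x):
    """A toy (sub-optimal) oblivious scheme: accept any active item w.p. 1/2."""
    return random.random() < 0.5

if __name__ == "__main__":
    x = [0.25, 0.25, 0.25, 0.25]                # ex ante feasible: sum(x) <= 1
    print(estimate_selectability(x, accept_with_half))
```

Plugging a candidate scheme into the harness gives an empirical estimate of its selectability, which is a convenient way to sanity-check selectability claims numerically.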


Speaker Furong Huang

Furong Huang
The University of Maryland
Talk title: Efficient Machine Learning at the Edge in Parallel

  • Since the beginning of the digital age, the size and quantity of data sets have grown exponentially because of the proliferation of data captured by mobile devices, vehicles, cameras, microphones, and other Internet of Things (IoT) devices. Given this boom in personal data, major advances in areas such as healthcare, natural language processing, computer vision, and more have been made with the use of deep learning. Federated Learning (FL) is an increasingly popular setting to train powerful deep neural networks with data derived from an assortment of devices. It is a framework for use cases involving training machine learning models on edge devices without transmitting the collected data to a central server.

    In this talk, I will address some major challenges of efficient machine learning at the edge in parallel. Model efficiency, data efficiency, and learning-paradigm efficiency will be discussed in turn. As highlights, I will introduce our recent progress on model compression via tensor representation, data efficiency through the lens of generalization analysis, and a decentralized federated learning framework via wait-free model communication. (A minimal federated averaging sketch, for context, appears at the end of this entry.)

  • Furong Huang is an Assistant Professor in the Department of Computer Science at the University of Maryland. She works on statistical and trustworthy machine learning, reinforcement learning, graph neural networks, deep learning theory, and federated learning, with specialization in domain adaptation, algorithmic robustness, and fairness. Furong is a recipient of the NSF CRII Award, the MLconf Industry Impact Research Award, the Adobe Faculty Research Award, and three JP Morgan Faculty Research Awards. She is a Finalist of AI in Research – AI Researcher of the Year for the Women in AI Awards North America 2022. She received her Ph.D. in electrical engineering and computer science from UC Irvine in 2016, after which she completed a postdoctoral position at Microsoft Research NYC.
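
For context, the sketch below shows one round of plain federated averaging (FedAvg), the canonical FL training loop in which edge devices train locally and only model weights are aggregated. It is a generic illustration and not the wait-free decentralized framework or the tensor-compression methods highlighted in the talk; all function names and the toy linear-regression task are assumptions for illustration.

```python
import numpy as np

def local_update(weights, data, lr=0.1, epochs=1):
    """One client's local training step (toy linear regression via gradient descent)."""
    X, y = data
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, client_data):
    """One synchronous FedAvg round: clients train locally on their own
    data, and only model weights (never raw data) are averaged centrally,
    weighted by local data size."""
    client_ws = [local_update(global_w, d) for d in client_data]
    sizes = np.array([len(d[1]) for d in client_data], dtype=float)
    return np.average(client_ws, axis=0, weights=sizes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(5):                          # 5 simulated edge devices
        X = rng.normal(size=(50, 2))
        y = X @ true_w + 0.1 * rng.normal(size=50)
        clients.append((X, y))
    w = np.zeros(2)
    for _ in range(30):
        w = fedavg_round(w, clients)
    print(w)                                    # approaches true_w
```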


Speaker Longbo Huang

Longbo Huang
Tsinghua University
Talk title: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

  • We generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have α-th moments (1 < α ≤ 2) bounded by σ^α, while the variances may not exist. Specifically, we design an algorithm HTINF: when the heavy-tail parameters α and σ are known to the agent, HTINF simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori. When α and σ are unknown, HTINF achieves a log T-style instance-dependent regret in stochastic cases and an o(T) no-regret guarantee in adversarial cases. We further develop an algorithm AdaTINF, achieving O(σK^(1-1/α) T^(1/α)) minimax optimal regret even in adversarial settings, without prior knowledge of α and σ. This result matches the known regret lower bound (Bubeck et al., 2013), which assumes a stochastic environment with both α and σ known. To our knowledge, the proposed HTINF algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and AdaTINF is the first algorithm that can adapt to both α and σ to achieve an optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation. (A toy robust-UCB-style baseline for heavy-tailed stochastic bandits, for context, is sketched at the end of this entry.)

  • Dr. Longbo Huang is an associate professor (with tenure) at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University, Beijing, China. Dr. Huang’s research focuses on AI for Decisions. He serves or has served on 50+ TPCs and 10+ organizing committees for ACM/AI/IEEE conferences, including as General Chair for ACM Sigmetrics 2021 and TPC co-chair for ITC 2022, WiOpt 2020, and NetEcon 2020. Dr. Huang serves on the editorial boards of ACM ToMPECS and IEEE/ACM ToN. He is a senior member of ACM and IEEE, an IEEE ComSoc Distinguished Lecturer, and an ACM Distinguished Speaker. Dr. Huang received the Outstanding Teaching Award from Tsinghua University in 2014, the Google Research Award and the Microsoft Research Asia Collaborative Research Award in 2014, and was selected into the MSRA StarTrack Program in 2015. Dr. Huang won the ACM SIGMETRICS Rising Star Research Award in 2018.
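
For context, here is a toy robust-UCB-style baseline for heavy-tailed stochastic bandits in the spirit of Bubeck et al. (2013), using truncated empirical means with a widened confidence bonus. It is not the HTINF or AdaTINF algorithm from the talk; the truncation thresholds and constants are illustrative, and the moment bound u, the exponent alpha, and the reward-maximization convention are assumptions of this sketch.

```python
import math
import random

def truncated_mean(samples, u, alpha, delta):
    """Truncated empirical mean for rewards with E[|X|^alpha] <= u
    (1 < alpha <= 2): the s-th sample is kept only if it lies below a
    slowly growing threshold, which limits the effect of heavy tails."""
    est = 0.0
    for s, x in enumerate(samples, start=1):
        threshold = (u * s / math.log(1.0 / delta)) ** (1.0 / alpha)
        if abs(x) <= threshold:
            est += x
    return est / len(samples)

def robust_ucb(arms, T, u=1.0, alpha=1.5):
    """Toy robust-UCB loop for heavy-tailed stochastic bandits (reward
    maximization): pull each arm once, then pull the arm maximizing
    truncated mean + widened confidence bonus. `arms` is a list of
    callables, each returning one reward sample."""
    K = len(arms)
    history = [[] for _ in range(K)]
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1                                   # initial round-robin
        else:
            delta = t ** -2.0                           # confidence level
            def index(i):
                n = len(history[i])
                bonus = 4.0 * u ** (1.0 / alpha) * \
                        (math.log(1.0 / delta) / n) ** ((alpha - 1.0) / alpha)
                return truncated_mean(history[i], u, alpha, delta) + bonus
            k = max(range(K), key=index)
        history[k].append(arms[k]())
    return [len(h) for h in history]                    # pull counts per arm
```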


Speaker Shaofeng Jiang

Shaofeng Jiang
Peking University
Talk title: Online Facility Location with Predictions

  • In this talk, we present a nearly optimal algorithm for online facility location (OFL) with (untrusted) predictions. In OFL, n demand points arrive in order and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection costs from demand points to assigned facilities, plus the facility opening cost.

    We additionally assume access to an untrusted predictor that can suggest the facility a demand point should be assigned to. With access to this predictor but without knowing the error of the prediction, our algorithm achieves an O(1) ratio when the error is small, which bypasses an Ω(log n / log log n) worst-case lower bound. Furthermore, our algorithm still maintains an O(log n) ratio even when the error is unbounded, nearly matching the aforementioned lower bound. (For context, a sketch of the classical randomized OFL algorithm, without predictions, appears at the end of this entry.)

    This talk is based on joint work with Erzhi Liu, You Lyu, Zhihao Tang and Yubo Zhang.

  • Shaofeng Jiang is an assistant professor at the Center on Frontiers of Computing Studies, Peking University. He obtained his PhD from the University of Hong Kong in 2017. He was a postdoctoral researcher at Weizmann Institute of Science, and an assistant professor at Aalto University. His research interest is theoretical computer science, with a focus on algorithms for massive datasets and foundations of data science.
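
For context, the sketch below is the classical randomized online facility location algorithm (in the spirit of Meyerson's algorithm), with no predictions: each arriving demand opens a new facility with probability proportional to its distance to the nearest open facility, capped at 1. It is not the prediction-augmented algorithm from the talk; the uniform facility cost f and the one-dimensional demand points are simplifying assumptions.

```python
import random

def online_facility_location(demands, f):
    """Meyerson-style randomized OFL (no predictions): when a demand
    arrives, open a facility at it with probability min(d / f, 1), where
    d is its distance to the closest currently open facility; otherwise
    connect it to that facility. Returns the open facilities and total cost."""
    facilities = []
    cost = 0.0
    for p in demands:
        if not facilities:
            facilities.append(p)                        # first demand always opens
            cost += f
            continue
        d = min(abs(p - q) for q in facilities)         # 1-D distance for simplicity
        if random.random() < min(d / f, 1.0):
            facilities.append(p)
            cost += f
        else:
            cost += d
    return facilities, cost

if __name__ == "__main__":
    demands = [random.uniform(0, 100) for _ in range(1000)]
    facilities, cost = online_facility_location(demands, f=5.0)
    print(len(facilities), cost)
```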


Speaker Yan Jin

Yan Jin
Huazhong University of Science and Technology
Talk title: End-to-end Reinforcement Learning for the Large-scale Traveling Salesman Problem

  • The Traveling Salesman Problem (TSP) is one of the most studied routing problems arising in practical logistics applications. Traditional approaches not only rely on hand-crafted expert rules but are also time-consuming due to iterative search. This limits their application in time-sensitive scenarios, e.g., on-call routing and ride-hailing services. We propose an end-to-end approach based on hierarchical reinforcement learning for addressing the large-scale TSP. Using a divide-and-conquer strategy, the upper-level policy chooses a small subset of the remaining cities to be traversed, while the lower-level policy uses a Transformer model on the chosen cities to solve a shortest path with prescribed starting and ending cities. These two policies are jointly trained by reinforcement learning algorithms, and TSP solutions can be generated directly without any search procedure. The proposed approach takes advantage of the inference efficiency of the Transformer model and provides highly competitive results. (An illustrative sketch of the divide-and-conquer inference loop appears at the end of this entry.)

  • Dr. Yan Jin is an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, China. She received the Ph.D. degree in Computer Science from University of Angers, France, in 2016. Her research focuses on the design of effective algorithms for solving combinatorial optimization problems and practical applications, including reinforcement learning, machine learning based search approaches, hybrid evolutionary algorithms and heuristics/metaheuristics. She has published more than 20 papers in the fields of Metaheuristics, Evolutionary Computation, Combinatorial Optimization and Artificial Intelligence. She visited Queen Mary University of London in 2017 and Polytechnique Montreal in 2018 as a short-term visiting scholar. She was a StarTrack Visiting Faculty at Microsoft Research Asia in 2021.
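
The sketch below illustrates the divide-and-conquer inference loop described in the abstract. The functions upper_policy and lower_policy are hypothetical placeholders standing in for the learned policies (the real lower-level policy is a Transformer that also prescribes the segment's end city); here they are replaced by simple distance-based rules just so the loop runs end to end.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two cities given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def upper_policy(current_city, remaining, k=10):
    """Placeholder for the learned upper-level policy: here it simply
    returns the k unvisited cities closest to the current city."""
    return sorted(remaining, key=lambda c: dist(current_city, c))[:k]

def lower_policy(start, chosen):
    """Placeholder for the learned Transformer-based lower-level policy;
    here a greedy nearest-neighbour ordering from `start` stands in."""
    path, current, left = [], start, set(chosen)
    while left:
        nxt = min(left, key=lambda c: dist(current, c))
        path.append(nxt)
        left.remove(nxt)
        current = nxt
    return path

def hierarchical_tsp(cities):
    """Divide-and-conquer inference loop: the upper level repeatedly picks
    a small subset of unvisited cities, the lower level orders them, and
    the resulting segments are concatenated into a full tour (no search)."""
    tour = [cities[0]]
    remaining = set(cities[1:])
    while remaining:
        subset = upper_policy(tour[-1], remaining)
        segment = lower_policy(tour[-1], subset)
        tour.extend(segment)
        remaining -= set(subset)
    return tour

if __name__ == "__main__":
    cities = [(random.random(), random.random()) for _ in range(200)]
    print(len(hierarchical_tsp(cities)))     # 200: every city appears once
```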


Speaker Yuko Kuroki

Yuko Kuroki
The University of Tokyo
Talk title: Combinatorial Pure Exploration with Limited Observation and Beyond

  • Although most combinatorial optimization models require exact parameters as inputs, it is often impossible to obtain them due to privacy issues or system constraints, and we need to deal with such uncertainty. In this talk, I will introduce recent work on combinatorial pure exploration with limited feedback for solving such combinatorial optimization problems under uncertainty. We first study combinatorial pure exploration over graphs with full-bandit feedback, which aims to identify a dense component in a network given only a noisy evaluation of each sampled subgraph (the corresponding offline problem is the densest subgraph problem). We then study general combinatorial pure exploration with full-bandit or partial-linear feedback, which works for general combinatorial structures including size-k subsets, matchings, and paths. Finally, I will discuss further extensions and several open problems for future work in this line of research. (A toy full-bandit feedback environment is sketched at the end of this entry.)

  • Yuko Kuroki is a research associate at The University of Tokyo. Her research interests include optimization, machine learning, and data mining, especially related to graph problems. She received her Ph.D. (2021) from The University of Tokyo under the supervision of Prof. Masashi Sugiyama, with the Dean’s Award for Outstanding Achievement. She is a visiting scientist at the RIKEN Center for Advanced Intelligence Project (AIP).
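
A minimal sketch of what full-bandit feedback means in this setting (an illustrative toy, not code from the talk): the learner queries a super arm, i.e., a subset of base arms, and observes only a single noisy evaluation of the whole subset rather than per-arm observations.

```python
import random

class FullBanditEnv:
    """Toy full-bandit feedback environment for combinatorial pure
    exploration: each base arm has an unknown mean, and a query on a
    super arm (subset of base arms) returns only one noisy number for
    the whole subset, never per-arm observations."""

    def __init__(self, means, noise=1.0):
        self.means = means
        self.noise = noise
        self.queries = 0

    def pull(self, super_arm):
        self.queries += 1
        value = sum(self.means[i] for i in super_arm)
        return value + random.gauss(0.0, self.noise)

if __name__ == "__main__":
    env = FullBanditEnv(means=[0.2, 0.9, 0.5, 0.7])
    print(env.pull({1, 3}))     # one noisy value for the subset {1, 3} only
```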


Speaker Weiran Shen

Weiran Shen
Renmin University of China
Talk title: Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality

  • Optimizing strategic decisions (a.k.a. computing equilibrium) is key to the success of many non-cooperative multi-agent applications. However, in many real-world situations, we may face the exact opposite of this game-theoretic problem: instead of prescribing the equilibrium of a given game, we may directly observe the agents’ equilibrium behaviors but want to infer the underlying parameters of an unknown game. This research question, also known as inverse game theory, has been studied in multiple recent works in the context of Stackelberg games. Unfortunately, existing works exhibit quite negative results, showing both statistical and computational hardness under the assumption of perfectly rational follower behavior. Our work relaxes the perfect-rationality assumption to the classic quantal response model, a more realistic model of bounded rationality (recalled at the end of this entry). Interestingly, we show that the smoothness brought by such a bounded-rationality model actually leads to provably more efficient learning of the follower’s utility parameters in general Stackelberg games. Systematic empirical experiments on synthesized games confirm our theoretical results and further suggest robustness beyond the strict quantal response model.

  • Weiran Shen is an Assistant Professor at the Gaoling School of Artificial Intelligence, Renmin University of China. He obtained his Bachelor’s degree from the Department of Electronic Engineering and his Ph.D. from IIIS, both at Tsinghua University. He was a postdoctoral researcher at Carnegie Mellon University from 2019 to 2020. His research interests include multi-agent systems, game theory, mechanism design, and machine learning. He has published more than 20 papers in top-tier conferences and journals in related research areas. He has served as a PC and SPC member of multiple international conferences, including AAAI, IJCAI, ICML, NeurIPS, and AAMAS. His research in mechanism design has been adopted by online advertising platforms such as Baidu and ByteDance, and increased Baidu’s revenue by about 5%.
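
For reference, the standard logit quantal response model of bounded rationality (a textbook formulation; the exact parameterization used in the talk may differ): given the leader's strategy x, the follower plays each action with probability proportional to its exponentiated utility, and the rationality parameter λ interpolates between uniform play (λ = 0) and the perfectly rational best response (λ → ∞).

```latex
% Logit quantal response of a follower with utility u_f and
% rationality parameter \lambda \ge 0, given leader strategy x.
\[
  \Pr[a \mid x] \;=\;
    \frac{\exp\bigl(\lambda\, u_f(x, a)\bigr)}
         {\sum_{a'} \exp\bigl(\lambda\, u_f(x, a')\bigr)}
\]
```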


Speaker Lei Song

Lei Song
Microsoft Research Asia
Talk title: Deep Reinforcement Learning in Supply Chain Optimizations

  • There are plenty of optimization problems in industry, e.g., inventory management, vehicle routing, etc. However, the scale of these problems often makes traditional OR approaches infeasible in these scenarios. In this talk, I will present the potential of deep reinforcement learning in solving large-scale optimization problems in industry. I will start with two case studies with our industrial partners, together with some preliminary results: one on inventory management optimization for a retailer, and the other on real-time scheduling of on-call orders for a logistics company. Finally, I will end the talk by introducing the main challenges we are facing.

  • Lei Song is currently a principal researcher in the Machine Learning group at MSRA. His research interests include AI technologies and their applications in industry. Before joining MSR Asia, Lei worked for a leading e-commerce company in China, where he worked on optimizing supply chain management by leveraging AI technologies. He also has extensive research experience in academia and has authored dozens of research papers at international conferences. Lei received his PhD in computer science from the IT University of Copenhagen in 2012, and then joined the Max Planck Institute for Informatics and the University of Technology Sydney as a research assistant.


Speaker Siwei Wang

Siwei Wang
Microsoft Research Asia
Talk title: Thompson Sampling in Combinatorial Multi-armed Bandits with Independent Arms

  • Existing methods for combinatorial multi-armed bandits mainly focus on the UCB approach. To make the algorithm efficient, they usually use the sum of the upper confidence bounds of the base arms to represent the upper confidence bound of a super arm. However, when the outcomes of different base arms are independent, this upper confidence bound can be much larger than necessary, which leads to a much higher regret upper bound (in regret minimization problems) or complexity upper bound (in pure exploration problems). To deal with this challenge, we explore the idea of Thompson Sampling (TS), which uses independent random samples instead of upper confidence bounds, and design TS-based algorithms for both regret minimization and pure exploration problems. In TS-based algorithms, the sum of independent random samples within a super arm will not exceed its tight upper confidence bound with high probability. Hence they resolve the above challenge and achieve lower regret/complexity upper bounds than existing efficient UCB-based algorithms. (A minimal combinatorial Thompson Sampling sketch appears at the end of this entry.)

  • Dr. Siwei Wang obtained his bachelor’s and Ph.D. degrees from the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He is now a senior researcher at the MSRA Theory Center. His research focuses on developing solutions to address challenges in control and decision-making systems in artificial intelligence and machine learning. Typically, he models and analyzes these systems through the lens of mechanism design, mathematical optimization, and algorithms. His recent studies mostly concern the design and analysis of online learning models, e.g., bandit problems and reinforcement learning.
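
A minimal sketch of combinatorial Thompson Sampling with Bernoulli base arms, Beta posteriors, and semi-bandit feedback, which illustrates the idea of combining independent posterior samples instead of upper confidence bounds. The top-k oracle, the prior, and all parameter names are illustrative assumptions rather than the exact algorithms analyzed in the talk.

```python
import random

def combinatorial_ts(arm_prob, k, T):
    """Generic combinatorial Thompson Sampling for Bernoulli base arms:
    each round, draw one independent Beta posterior sample per base arm,
    let an offline oracle (here: top-k) pick the super arm under the
    sampled means, then observe semi-bandit feedback and update."""
    n = len(arm_prob)
    alpha = [1] * n                  # Beta(1, 1) priors
    beta = [1] * n
    total_reward = 0
    for _ in range(T):
        theta = [random.betavariate(alpha[i], beta[i]) for i in range(n)]
        super_arm = sorted(range(n), key=lambda i: theta[i], reverse=True)[:k]
        for i in super_arm:          # semi-bandit feedback on chosen base arms
            reward = 1 if random.random() < arm_prob[i] else 0
            total_reward += reward
            alpha[i] += reward
            beta[i] += 1 - reward
    return total_reward

if __name__ == "__main__":
    print(combinatorial_ts([0.9, 0.8, 0.3, 0.2, 0.1], k=2, T=5000))
```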


Speaker Junchi Yan

Junchi Yan
Shanghai Jiao Tong University
Talk title: Machine Learning for Combinatorial Optimization: Some Empirical Studies

  • In this talk, I will present our lab’s recent progress and empirical results on machine learning for combinatorial optimization, including some results on EDA. This has been an emerging topic in both the machine learning and operations research communities. I will also discuss potential future directions for this exciting area.

  • Junchi Yan is an associate professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests are machine learning, combinatorial optimization, and computer vision. He is leading a national key research and development project on machine learning for combinatorial optimization and an NSFC outstanding young talent program on graph computing.


Speaker Zhijie Zhang

Zhijie Zhang
Fuzhou University
Talk title: Optimization from Structured Samples for Coverage and Influence Functions

  • We revisit the optimization from samples (OPS) model, which studies the problem of optimizing objective functions directly from sample data. Previous results showed that we cannot obtain a constant approximation ratio for the maximum coverage problem using polynomially many independent samples of the form {(S_i, f(S_i))}_{i=1}^t (BRS, STOC17), even if coverage functions are (1-ϵ)-PMAC learnable from these samples (BDF+, SODA12). In this work, to circumvent this impossibility result for OPS, we propose a stronger model called optimization from structured samples (OPSS), where the data samples encode structural information about the functions. We show that under the OPSS model, the maximum coverage problem admits a constant approximation under mild assumptions on the sample distribution. We further generalize the result and show that influence maximization also admits a constant approximation under this model. (A toy illustration of the sample model appears at the end of this entry.)

  • Zhijie Zhang is currently an associate professor at the School of Mathematics and Statistics, Fuzhou University. He is broadly interested in theoretical computer science, combinatorial optimization, and algorithm design. His recent research topics include submodular maximization and online algorithms. His work has been published in SODA, ICML, AAAI, and other venues. He recently won the Best Paper Award at COCOON 2022.
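
A toy illustration of the sample model (an assumption-laden sketch, not code from the paper): a small coverage function together with i.i.d. samples of the form (S_i, f(S_i)) drawn from a product distribution. In the plain OPS model these pairs are all the algorithm sees; the OPSS model of the talk additionally exposes structural information about the sampled sets.

```python
import random

# Ground-set elements covered by each set in a toy coverage instance.
SETS = {
    0: {1, 2, 3},
    1: {3, 4},
    2: {4, 5, 6},
    3: {6, 7},
}

def coverage(S):
    """Coverage function f(S) = number of ground elements covered by S."""
    covered = set()
    for i in S:
        covered |= SETS[i]
    return len(covered)

def draw_ops_samples(t, p=0.5):
    """Draw t samples of the form (S_i, f(S_i)), where S_i follows a
    product distribution (each set included independently w.p. p).
    These pairs are the only input an OPS algorithm receives."""
    samples = []
    for _ in range(t):
        S = frozenset(i for i in SETS if random.random() < p)
        samples.append((S, coverage(S)))
    return samples

if __name__ == "__main__":
    for S, value in draw_ops_samples(5):
        print(sorted(S), value)
```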


Speaker Yuan Zhou

Yuan Zhou
Tsinghua University
Talk title: Joint Pricing and Inventory Management with Demand Learning

  • In the problem of joint pricing and inventory management, the retailer simultaneously makes a price decision and an inventory order-up-to decision at the beginning of each review period. Demand is modeled as either a parametric or nonparametric function of the price.

    In this talk, I will introduce two of my recent works advancing this problem: the first deals with fixed ordering costs under the backlogging setting, with a parametric (generalized linear) demand model; the second studies nonparametric demand models with censored demand and lost sales. The techniques involved include a novel UCB analysis over trajectories of (s,S,p) policies, and a noisy comparison oracle constructed for censored demand models. (A toy simulation of one review period under censored demand appears at the end of this entry.)

    This talk is based on the following two papers:
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3632475
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3750413

  • Dr. Yuan Zhou is an Associate Professor at the Yau Mathematical Sciences Center, Tsinghua University. Before joining Tsinghua, he was an Assistant Professor at the University of Illinois Urbana-Champaign, an Assistant Professor at Indiana University Bloomington, and an Applied Mathematics Instructor at the Massachusetts Institute of Technology. He obtained his Ph.D. from Carnegie Mellon University.

    Dr. Zhou’s research interests include but are not limited to online learning and decision making (e.g., bandits and reinforcement learning), combinatorial optimization, and their applications to revenue management problems such as dynamic pricing, inventory control, assortment optimization, etc. He currently serves as an Associate Editor for Operations Research Letters.
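
A toy simulation of one review period under censored demand with lost sales (an illustrative sketch only; the linear demand form, the noise, and all parameter names are assumptions, not the models from the two papers): the retailer sets a price and an order-up-to level, the true demand is never revealed, and only censored sales are observed.

```python
import random

def review_period(inventory, price, order_up_to,
                  a=50.0, b=2.0, holding=0.5, lost_sale=3.0):
    """One review period of a toy joint pricing / inventory model with
    lost sales: raise stock to `order_up_to`, demand is a noisy decreasing
    function of price, and only censored sales min(demand, stock) are
    observed (unmet demand is lost and never revealed to the retailer)."""
    stock = max(inventory, order_up_to)                        # order-up-to decision
    demand = max(0.0, a - b * price + random.gauss(0, 5.0))    # true demand (hidden)
    sales = min(demand, stock)                                 # censored observation
    leftover = stock - sales
    profit = price * sales - holding * leftover - lost_sale * max(demand - sales, 0.0)
    return leftover, sales, profit                             # demand itself is not returned

if __name__ == "__main__":
    inventory = 0.0
    for t in range(5):
        inventory, sales, profit = review_period(inventory, price=10.0, order_up_to=40.0)
        print(f"period {t}: sales={sales:.1f}, leftover={inventory:.1f}, profit={profit:.1f}")
```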