Slides

2/28/2024: Feature learning of neural network by mean field Langevin dynamics: Optimization and generalization, Taiji Suzuki

Abstract: In this talk, I will discuss the feature learning ability of neural networks from statistical and optimization perspectives. In particular, I will present recent developments in the theory of the mean-field Langevin dynamics (MFLD) and its application to neural network training. MFLD is a nonlinear generalization of the gradient Langevin dynamics (GLD) that minimizes an entropy-regularized convex function defined on the space of probability distributions, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. In the first half, I will present the convergence result for MFLD and explain how the convergence of MFLD is connected to the duality gap through the log-Sobolev inequality of the so-called proximal Gibbs measure. I will also address the time-space discretization of MFLD and show that, unlike in existing work, the discretization error can be bounded uniformly in time. In the latter half, I will discuss the generalization error analysis of neural networks trained by MFLD. For a binary classification problem, we obtain a general test classification error bound that yields a fast learning rate based on a local Rademacher complexity analysis. By applying this general framework to the k-sparse parity problem, we demonstrate how feature learning improves the sample complexity compared with kernel methods. Finally, we discuss how the anisotropic structure of the input affects the sample complexity and computational complexity. If the data is well aligned with the target function, both the sample and computational complexities are significantly reduced.

Bio: Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the team leader of the “Deep Learning Theory” team at RIKEN AIP. He received his Ph.D. in information science and technology from the University of Tokyo in 2009. He worked as an assistant professor in the Department of Mathematical Informatics at the University of Tokyo from 2009 to 2013, and then as an associate professor in the Department of Mathematical and Computing Science at the Tokyo Institute of Technology from 2013 to 2017. He has served as an area chair of premier conferences such as NeurIPS, ICML, ICLR, and AISTATS, as a program chair of ACML 2019, and as an action editor of the Annals of Statistics. He received the Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists’ Prize, and the Outstanding Achievement Award from the Japan Statistical Society in 2017. He is interested in deep learning theory, nonparametric statistics, high-dimensional statistics, and stochastic optimization. In particular, he works on deep learning theory from several aspects, such as representation ability, generalization ability, and optimization. He has also worked on stochastic optimization for accelerating large-scale machine learning, including variance reduction methods, Nesterov’s acceleration, federated learning, and non-convex noisy optimization.
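As a rough, illustrative sketch (not part of the talk), the noisy gradient descent on a two-layer network mentioned in the abstract above, i.e., a finite-particle discretization of MFLD, might look as follows; the network width, step size, regularization strength, and synthetic target are arbitrary choices made here for illustration.

```python
# A minimal sketch (assumed parameters, not from the talk) of noisy gradient
# descent on a two-layer mean-field network, a finite-particle discretization of MFLD.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 512          # samples, input dimension, number of neurons (particles)
eta, lam = 0.05, 1e-3          # step size and regularization strength (illustrative)

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])            # a toy 2-sparse parity-like target
W = rng.standard_normal((m, d))           # each row is one neuron (particle)

def forward(W, X):
    # Mean-field scaling: average over neurons instead of summing.
    return np.tanh(X @ W.T).mean(axis=1)

for step in range(2000):
    resid = forward(W, X) - y                             # squared-loss residual
    # Gradient of the empirical risk with respect to each particle.
    grad = ((resid[:, None] * (1 - np.tanh(X @ W.T) ** 2)).T @ X) / (n * m)
    grad += lam * W                                       # L2 part of the regularizer
    # Langevin step: gradient descent plus Gaussian noise of scale sqrt(2 * eta * lam).
    W -= eta * grad + np.sqrt(2 * eta * lam) * rng.standard_normal(W.shape)

print("training error:", np.mean(np.sign(forward(W, X)) != y))
```

The weight-decay term and the Gaussian noise of scale sqrt(2·eta·lam) together play the role of the L2-plus-entropy regularization in the mean-field objective; removing the noise recovers plain gradient descent.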
1/25/2024: Recent Advances in Coresets for Clustering, Shaofeng Jiang

Abstract: Coresets are a popular data reduction technique. Roughly speaking, a coreset is a tiny proxy of the dataset such that, for every feasible solution, the objective function evaluated on the coreset approximates that on the original dataset. Coresets are particularly useful for dealing with big data, since they can usually be constructed efficiently in sublinear computation models, including streaming and parallel computing.

The study of coresets for clustering has been very fruitful, and nearly tight bounds have recently been obtained for well-known problems such as k-median and k-means and their variants. In this talk, I will introduce recent advances in coresets for clustering, with a focus on presenting several fundamental sampling techniques, including importance sampling and hierarchical uniform sampling, for the construction of coresets. I will conclude the talk by discussing future directions for the study of coresets (and beyond).

Bio: Shaofeng Jiang is an assistant professor at Peking University. He obtained his PhD from the University of Hong Kong and, before joining PKU, worked as a postdoctoral researcher at the Weizmann Institute of Science and as an assistant professor at Aalto University. His research interests generally lie in theoretical computer science, with a focus on sublinear algorithms.

Slides

11/28/2023: Textbooks Are All You Need, Yin Tat Lee