Virtual Seminar Series: Steve Hanneke (TTIC)
October 30, 2020
2:00 PM - 3:30 PM
Join us for a "virtual" seminar this semester with Steve Hanneke, Research Assistant Professor, Toyota Technological Institute at Chicago.
Please use the link below to join via Zoom.
Meeting ID: 975 3576 7242
Password: IDStalk20
Paper: Multi-task Learning: Optimal Rates and a No-Free-Lunch Theorem
Abstract: Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from N related learning tasks are to be combined towards improving performance on any single such learning task. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance guarantees that account for the contribution from multiple tasks, the vast majority of analyses result in guarantees that improve at best in the number n of samples per task, but most often include terms that do not improve as the number of tasks N grows. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses might be somehow unfavorable; however, as we show, the picture happens to be more nuanced, with interestingly hard regimes that might appear otherwise favorable.
In particular, we consider a seemingly favorable classification scenario where all tasks share a common optimal classifier h* in a given function class of finite VC dimension, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of N and n. Some of our main results are as follows:
- We show that, even though such regimes admit minimax rates accounting for both n and N, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large N for n fixed.
- With a bit of additional information, namely, a ranking of tasks according to their relevance to the target task, a simple rank-based procedure can achieve near-optimal excess risk guarantees, which improve with both n and N. Interestingly, the optimal aggregation may exclude data from some tasks, even though they all have the same optimal classifier h*.
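The rank-based idea above can be illustrated with a toy sketch (not the paper's actual procedure): tasks are assumed pre-ranked by relevance, and for each prefix of the ranking we pool the data, fit a simple threshold classifier (a VC class of dimension 1 with a shared optimal threshold h*), and keep the prefix that does best on a held-out target sample. The noise model, the threshold class, and the prefix-selection rule are all invented here for illustration.

```python
import random

random.seed(0)

def make_task(n, noise):
    """All tasks share h*(x) = [x > 0]; labels are flipped w.p. `noise`."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        y = int(x > 0)
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def fit_threshold(data):
    """ERM over threshold classifiers h_t(x) = [x > t]."""
    best_t, best_err = 0.0, float("inf")
    for t in [x for x, _ in data]:
        err = sum(int(x > t) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def error(t, data):
    return sum(int(x > t) != y for x, y in data) / len(data)

# Tasks ranked from most to least relevant (increasing label noise),
# mimicking the ranking the talk assumes is given as side information.
tasks = [make_task(n=50, noise=nz) for nz in (0.05, 0.1, 0.2, 0.45)]
target_holdout = make_task(n=200, noise=0.0)

# Pool data from the top-k ranked tasks and pick the best prefix k.
best_k, best_err = 1, float("inf")
for k in range(1, len(tasks) + 1):
    pooled = [pt for task in tasks[:k] for pt in task]
    t_hat = fit_threshold(pooled)
    err = error(t_hat, target_holdout)
    if err < best_err:
        best_k, best_err = k, err

print(best_k, round(best_err, 3))
```

Note that the selected prefix may well exclude the noisiest tasks even though every task shares the same optimal threshold, echoing the talk's observation that optimal aggregation can discard data from some tasks.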
Based on joint work with Samory Kpotufe.
Bio: Steve Hanneke is a Research Assistant Professor at the Toyota Technological Institute at Chicago. His research explores the theory of machine learning, with a focus on reducing the number of training examples sufficient for learning. His work develops new approaches to supervised, semi-supervised, active, and transfer learning, and also revisits the basic probabilistic assumptions at the foundation of learning theory. Steve earned a Bachelor of Science degree in Computer Science from UIUC in 2005 and a PhD in Machine Learning from Carnegie Mellon University in 2009 with a dissertation on the theoretical foundations of active learning.
Date posted
Oct 23, 2020
Date updated
Oct 23, 2020