Optimizing the Optimizers
When? | December 10, 2016 |
Where? | Barcelona, Spain |
Organizers | Maren Mahsereci, Alex Davies, Philipp Hennig |
Official Site | https://nips.cc/Conferences/2016 |
Optimization problems in machine learning have aspects that make them more challenging than traditional settings, such as stochasticity and parameters with side effects (e.g., the batch size and structure). The field has invented many different approaches to deal with these demands. Unfortunately, and intriguingly, this extra functionality seems invariably to necessitate the introduction of tuning parameters: step sizes, decay rates, cycle lengths, batch sampling distributions, and so on. Such parameters are absent, or at least less prominent, in classic optimization methods. But getting them right is frequently crucial, and it requires inconvenient human “babysitting”.
Recent work has increasingly tried to eliminate such fiddle factors, typically by statistical estimation. This includes the automatic selection of external parameters like the batch size or batch structure, which have not traditionally been treated as part of the optimization task. Several different strategies have now been proposed, but they are not always compatible with each other, and they lack a common framework that would foster both conceptual and algorithmic interoperability. This workshop aims to provide a forum for the nascent community studying automated parameter tuning in optimization routines.
Among the questions to be addressed by the workshop are:
- Is the prominence of tuning parameters a fundamental feature of stochastic optimization problems? Why do classic optimization methods manage to do well with virtually no free parameters?
- In which precise sense can the “optimization of optimization algorithms” be phrased as an inference / learning problem?
- Should, and can, parameters be inferred at design-time (by a human), at compile-time (by an external compiler with access to a meta-description of the problem), or at run-time (by the algorithm itself)?
- What are generic ways to learn the parameters of algorithms, and what are the inherent difficulties in doing so? Is the goal to specialize to a particular problem, or to generalize over many problems?
Schedule
The workshop will be held on Saturday, 10 December, in Area 2.
Time | Event | Material |
---|---|---|
09:00-09:10 | Opening Remarks | |
09:10-09:30 | Matt Hoffman (DeepMind) | |
09:30-10:00 | David Duvenaud (U Toronto) | slides |
10:00-10:30 | Stephen J Wright (U of Wisconsin) | slides |
10:30-11:00 | Coffee Break | |
11:00-11:30 | Samantha Hansen (Spotify) | slides |
11:30-12:00 | Spotlights | (see below) |
12:00-12:45 | Poster Session | |
12:45-14:15 | Lunch Break | |
14:15-14:40 | Matteo Pirotta (Politecnico di Milano) | |
14:40-15:00 | Ameet Talwalkar (UCLA) | slides |
15:00-15:30 | Coffee Break | |
15:30-15:50 | Ali Rahimi (Google) | |
15:50-16:20 | Mark Schmidt (UBC) | |
16:20-17:00 | Panel Discussion | |
Accepted Papers
(in alphabetical order, by first author’s surname)
- Ömer Deniz Akyildiz, Víctor Elvira, Jesus Fernandez-Bes, and Joaquín Miguez. On the Relationship between Online Optimizers and Recursive Filters
- Matt Bonakdarpour and Panagiotis (Panos) Toulis. Statistical Perspectives of Stochastic Optimization
- Anirban Chaudhuri, David Wolpert, and Brendan Tracey. Stochastic Optimization and Machine Learning: Cross-Validation for Cross-Entropy Method
- Kamil Ciosek and Shimon Whiteson. Off-Environment RL with Rare Events
- Guilherme França and José Bento. Tuning Over-Relaxed ADMM
- Tobias Glasmachers. Small Stochastic Average Gradient Steps
- Ke Li and Jitendra Malik. Learning to Optimize
- Ben London. Generalization Bounds for Randomized Learning with Application to Stochastic Gradient Descent
- Matteo Pirotta and Marcello Restelli. Cost-Sensitive Approach for Batch Size Optimization