MIA'21


- Program (Paris time, UTC+1) -

Monday 11 January

09:50am-10:00am - Welcome message
Morning chair: Gabriele Steidl
10:00am-10:40am - Rémi Gribonval (Inria, ENS de Lyon, France) Privacy and utility in compressive statistical learning (abstract) (slides)
10:40am-11:20am - Ninon Burgos (CNRS, Brain and Spine Institute, ARAMIS Lab, France) Improving the interpretability of computer-assisted analysis tools in neuroimaging (abstract) (slides)
11:20am-12:00pm - Michal Irani (Weizmann Institute of Science, Israel) “Deep Internal learning” - Deep Learning with Zero Examples (abstract) (slides)
Lunch break
Afternoon chair: Michael Möller
02:00pm-02:40pm - Thomas Pock (Technische Universität Graz, Austria) Learning optimal discretizations of the total variation (abstract) (slides)
02:40pm-03:20pm - Coloma Ballester (Universitat Pompeu Fabra, Spain) Adversarial approaches for inverse problems (abstract) (slides)
Coffee break with virtual croissants
03:40pm-04:20pm - Stefania Petra (Universität Heidelberg, Germany) A geometric first-order multilevel method for discrete tomography (abstract) (slides)
04:20pm-05:00pm - Dejan Slepčev (Carnegie Mellon University, USA) Consistency of higher-order derivatives on random point clouds and semi-supervised learning (abstract) (slides)

Tuesday 12 January

Morning chair: Julie Delon
10:00am-10:40am - Elsa Cazelles (CNRS, IRIT Toulouse, France) Streaming computation of optimal weak transport barycenters (abstract) (slides)
10:40am-11:20am - Lénaïc Chizat (CNRS, Université Paris-Sud, France) Faster Wasserstein distance estimation with the Sinkhorn divergence (abstract) (slides)
11:20am-12:00pm - Jan Lellmann (Universität zu Lübeck, Germany) Dynamical Optimal Transport and Functional Lifting (abstract) (slides)
Lunch break
Afternoon chair: Joachim Weickert
02:00pm-02:40pm - Blanche Buet (Université Paris Sud, France) Weak and approximate curvatures of a measure: a varifold perspective (abstract) (slides)
02:40pm-03:20pm - Robert Beinert (Technische Universität Berlin, Germany) Bilinear and Quadratic Inverse Problems (abstract) (slides)
Coffee break with virtual pains au chocolat
03:40pm-04:20pm - Joan Bruna Estrach (Courant Institute and NYU Center for Data Science, USA) On Depth-Separation for Neural Networks (abstract) (slides)
04:20pm-05:00pm - Soledad Villar (Johns Hopkins University, USA) Dimensionality reduction, regularization, and generalization in overparameterized regressions (abstract) (slides)

Wednesday 13 January

Morning chair: Christoph Schnörr
10:00am-10:40am - Silvia Villa (Università di Genova, Italy) Zero order stochastic (abstract) (slides)
10:40am-11:20am - Matthias Hein (Universität Tübingen, Germany) Robust Detection of Out-of-distribution data (abstract) (slides)
PhD prize session. Chair: Raymond Chan
11:20am-12:00pm - Leon Bungert (Friedrich-Alexander University Erlangen-Nuremberg) L-infinity eigenvalue problems on graphs and their continuum limit (abstract) (slides)
Lunch break
Afternoon chair: Jalal Fadili
02:00pm-02:40pm - Pierre Chainais (École Centrale de Lille, France) DPPs for efficient random subsampling (abstract) (slides)
02:40pm-03:20pm - Claire Boyer (Sorbonne Université, France) Sampling rates for l1-synthesis (abstract) (slides)
Coffee break with virtual chouquettes
03:40pm-04:20pm - Francis Bach (Inria, ENS Paris, France) On the convergence of gradient descent for wide two-layer neural networks (abstract) (slides)
04:20pm-05:00pm - Adam Oberman (McGill University, Canada) Regularization for deep learning and a Lyapunov approach to accelerated SGD (abstract) (slides)

- Abstracts -

Francis Bach - On the convergence of gradient descent for wide two-layer neural networks

Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)
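To make the setting concrete, here is a minimal sketch, assuming a ReLU (positively homogeneous) activation, a squared loss, and the mean-field 1/m output scaling; the names and step-size convention are illustrative assumptions, not the speaker's code.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 2000                        # samples, input dimension, hidden width (large)
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] * X[:, 1])                # an arbitrary target to fit

W = rng.normal(size=(m, d))                   # hidden-layer weights (one "particle" per row)
a = rng.normal(size=m)                        # output weights
lr = 0.2

for _ in range(1000):
    H = np.maximum(X @ W.T, 0.0)              # ReLU features, shape (n, m)
    pred = H @ a / m                          # mean-field 1/m scaling of the output layer
    err = pred - y                            # residual of the squared loss
    grad_a = H.T @ err / (n * m)
    grad_W = ((err[:, None] * (H > 0) * a[None, :]).T @ X) / (n * m)
    a -= lr * m * grad_a                      # steps rescaled by m so each neuron moves at O(1)
    W -= lr * m * grad_W

print("training MSE:", np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2))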

Coloma Ballester - Adversarial approaches for inverse problems: from image restoration and colorization to anomaly detection

Generative adversarial approaches are generative models that aim to capture the probability distribution of given data so that new samples can be drawn from it. In this talk, we will discuss them together with their use to tackle several imaging problems. First, a method for image inpainting will be described, which also uses a variational term to incorporate perceptual information in the completion. Then, I will present a method for image colorization based on an adversarial strategy that captures geometric, perceptual, and semantic information. Finally, an adversarial strategy for anomaly detection will be discussed.
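For orientation, the minimax objective underlying such adversarial approaches, written here in standard GAN notation (generator G, discriminator D; an illustrative assumption, not the specific losses of the talk):

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big].
\]

For inverse problems such as inpainting or colorization, G additionally receives the degraded image, and the adversarial term is typically combined with reconstruction and perceptual terms as described above.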

Robert Beinert - Bilinear and Quadratic Inverse Problems: Non-Convex Regularization, Tensorial Liftings, Tensor-Free Algorithms

Bilinear and quadratic inverse problems arise in various formulations in imaging and physics, such as blind deconvolution, deautoconvolution, parallel imaging in MRI, or phase retrieval. Although the corresponding forward operators are only slightly non-linear, in many instances the requirements for applying the well-established non-linear regularization theory are not fulfilled. Our central idea to overcome this issue is the universal property, which enables us to lift bilinear/quadratic operators to linear mappings. This immediately gives us access to the linear regularization theory. On the downside, a simple lifting causes an additional non-convex rank-one constraint, which is as challenging to handle as the original non-linear problem. Therefore, we use the tensorial lifting indirectly and generalize the required concepts from convex analysis, like subdifferentials and Bregman distances, to the new framework. At the same time, we allow the regularization functional to be non-convex in a manner comparable with the non-linearity. Using a source-wise representation, we derive convergence rates for the regularized solutions with respect to the actual noise level, which has, for instance, interesting consequences for deautoconvolution. To solve a bilinear/quadratic inverse problem numerically, we adapt the methodology behind PhaseLift, i.e. we lift the problem to the tensor product and relax the non-convex rank-one constraint by the nuclear norm heuristic to obtain a convex minimization problem, which can be solved by any convex minimization method. Unfortunately, a direct implementation is most often impracticable due to the tremendous dimension of the tensor product. For first-order proximal algorithms like primal-dual or forward-backward splittings, which turn out to be closely related to singular value thresholding, we can efficiently exploit low-rank representations. On the basis of a subspace iteration or an augmented Lanczos process, the required computations on the tensor product can then be performed completely tensor-free. We apply the proposed method to the two-dimensional masked Fourier phase retrieval problem.
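As a concrete instance of the lifting and relaxation mentioned above, the PhaseLift construction for phase retrieval (standard formulation, with notation assumed here) reads

\[
y_i = |\langle a_i, x \rangle|^2 = \langle a_i a_i^{*}, x x^{*} \rangle,
\qquad
\min_{X \succeq 0}\; \|X\|_{*}
\quad \text{s.t.} \quad \langle a_i a_i^{*}, X \rangle = y_i, \; i = 1, \dots, m,
\]

where the quadratic problem in x becomes linear in the lifted variable X, and the non-convex rank-one constraint X = x x^* is relaxed to the nuclear norm; the tensor-free algorithms of the talk avoid ever storing X explicitly.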

Joan Bruna Estrach - On Depth-Separation for Neural Networks

Neural networks define functional spaces with appealing behavior in the high-dimensional regime, experimentally validated across many applications. However, these spaces are poorly understood from the approximation and optimization points of view, especially as the networks become deeper. In this talk, we will review existing results on the approximation and optimization properties of deep networks, highlighting the techniques employed to establish so-called ‘depth separation’ results, which show exponential approximation lower bounds for the class of shallow networks together with polynomial upper bounds for deeper architectures. We will also present recent extensions that highlight the role of input geometry, such as homogeneity and symmetry, and conclude with a collection of open questions that relate depth separation with scale separation, as well as optimization on deeper models.

Claire Boyer - Sampling rates for l1-synthesis

This work investigates the problem of signal recovery from undersampled noisy sub-Gaussian measurements under the assumption of a synthesis-based sparsity model. Solving the l1-synthesis basis pursuit allows one to simultaneously estimate a coefficient representation as well as the sought-for signal. However, due to linear dependencies among redundant dictionary atoms, it might be impossible to identify a specific representation vector, although the actual signal is still successfully recovered. We will study the problem of signal recovery from a non-uniform, signal-dependent perspective. By utilizing results from linear inverse problems and convex geometry, we identify the sampling rate describing the phase transition of this problem, and propose a "tight" estimated upper bound. This is extracted from joint work with Maximilian März (TU Berlin), Jonas Kahn and Pierre Weiss (CNRS, Toulouse).
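For reference, the l1-synthesis basis pursuit in question can be written (with assumed notation: measurement matrix A, redundant dictionary D, noise level η) as

\[
\hat z \in \arg\min_{z} \; \|z\|_1 \quad \text{s.t.} \quad \|A D z - y\|_2 \le \eta,
\qquad \hat x = D \hat z,
\]

so the signal estimate \(\hat x = D \hat z\) can be accurate even when the representation \(\hat z\) itself is not identifiable, which is the non-uniform, signal-dependent phenomenon studied in the talk.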

Blanche Buet - Weak and approximate curvatures of a measure: a varifold perspective

We propose a natural framework for the study of surfaces and their different discretizations based on varifolds. Varifolds were introduced by Almgren to carry out the study of minimal surfaces. Though mainly used in the context of rectifiable sets, they turn out to be well suited to the study of discrete-type objects as well. The varifold structure is flexible enough to adapt to both regular and discrete objects, and it allows one to define variational notions of mean curvature and second fundamental form based on the divergence theorem. Thanks to a regularization of these weak formulations, we propose a notion of discrete curvature (actually a family of discrete curvatures associated with a regularization scale) relying only on the varifold structure. We prove convergence properties under a natural growth assumption: the scale of regularization must be large with respect to the accuracy of the discretization. We performed numerical computations of mean curvature and Gaussian curvature on point clouds in R^3 to illustrate this approach. Joint work with: Gian Paolo Leonardi (Trento) and Simon Masnou (Lyon).
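The weak formulation alluded to rests on the tangential divergence theorem (standard statement, notation assumed): for a closed smooth d-dimensional surface M with mean curvature vector H,

\[
\int_M \operatorname{div}_M X \, d\mathcal{H}^d \;=\; - \int_M H \cdot X \, d\mathcal{H}^d
\qquad \text{for every smooth vector field } X,
\]

and the left-hand side still makes sense for a general varifold; convolving this identity with a kernel at a regularization scale is what yields the approximate curvatures computed on point clouds.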

Leon Bungert - L-infinity eigenvalue problems on graphs and their continuum limit

In the first part of the talk, I discuss nonlinear eigenvalue problems and their relation to gradient flows in some generality. Then I focus on two different L-infinity eigenvalue problems in more detail, show their relations to distance functions, and discuss graph-based discrete-to-continuum limits for these models using monotone schemes and Gamma-convergence.
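A prototypical example of such an L-infinity eigenvalue problem (a standard model stated here for orientation, with assumed notation, not necessarily the exact problems of the talk) is the minimization of the Rayleigh quotient

\[
\Lambda_1 \;=\; \min_{u \neq 0,\; u|_{\partial \Omega} = 0} \; \frac{\|\nabla u\|_{L^\infty(\Omega)}}{\|u\|_{L^\infty(\Omega)}},
\]

whose value equals the reciprocal of the inradius of Ω and is attained by multiples of the distance function to the boundary, which is one way the relation to distance functions arises; on a graph, the gradient is replaced by a nonlocal difference operator.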

Ninon Burgos - Improving the interpretability of computer-assisted analysis tools in neuroimaging

Neuroimaging offers an unmatched description of the brain’s structure and physiology, which explains its crucial role in the understanding, diagnosis, and treatment of neurological disorders. However, the information provided by neuroimages is not always easy to extract and interpret. I will present two approaches to make computer-assisted analysis tools more interpretable. The first one explores how visualisation techniques can improve the interpretability of neural networks used for computer-aided diagnosis. The second approach is an anomaly detection framework that generates patient-specific maps summarising the pathology’s distribution in the brain.

Elsa Cazelles - Streaming computation of optimal weak transport barycenters

We present an alternative to the standard Wasserstein barycenter problem for probability distributions, based on optimal weak mass transport. The main advantage of our proposal, termed the weak barycenter, is that it provides a framework for aggregating (or combining) a set of probability measures for which we can construct an iterative algorithm suitable for streaming distributions, including discrete ones. Moreover, the weak barycenter is defined in a natural way when compared to the Wasserstein barycenter; in particular, both satisfy a similar fixed-point equation. We provide a theoretical analysis of the weak barycenter problem and of the algorithms to compute it in two settings: (i) for a finite number of measures and (ii) for a population of probability measures distributed according to a given law on the Wasserstein space. Our approach is illustrated on synthetic examples, validated in experiments using 2D real-world data, and compared to the classical optimal mass transport setting.
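For comparison, the classical Wasserstein barycenter μ of measures ν_1, …, ν_n with weights λ_i satisfies the fixed-point equation (standard result, notation assumed; the weak-transport analogue mentioned above is the object of the talk)

\[
\mu \;=\; \Big( \sum_{i=1}^{n} \lambda_i \, T_{\mu \to \nu_i} \Big)_{\#} \mu,
\]

where T_{μ→ν_i} is the optimal transport map from μ to ν_i (assumed to exist, e.g. for absolutely continuous μ); iterating this map is the basis of fixed-point algorithms for barycenters.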

Pierre Chainais - DPPs for efficient random subsampling

Subsampling is a very common task in machine learning (e.g. feature selection, regression on a budget) and signal processing (e.g. discretization, interpolation, Monte-Carlo methods). We study the potential of random subsampling for quadrature and interpolation. To this aim, we focus on functions living in RKHSs. We use nodes sampled either from a determinantal point process (DPP) or from continuous volume sampling (CVS). A projection DPP is parametrized by a kernel: we use a truncated and saturated version of the RKHS kernel. Thanks to this link between the two kernels, along with DPP machinery, we prove relatively tight bounds on the quadrature error that depend on the spectrum of the RKHS kernel. We experimentally compare DPPs to existing kernel-based quadratures such as herding, Bayesian quadrature, or leverage score sampling. Numerical results confirm the interest of DPPs, and even suggest faster rates than our bounds in particular cases. Since the quadrature problem and the interpolation problem are closely connected in RKHSs, we also show bounds on the quality of the approximation obtained when using DPPs or CVS for interpolation of functions in an RKHS. These theoretical guarantees motivate increased interest in random subsampling techniques such as DPPs and CVS.
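A minimal sketch of the kernel-quadrature step under simple assumptions (Gaussian RKHS kernel on [0,1], uniform subsampling standing in for the DPP/CVS node sampling of the talk; names and parameters are illustrative, not the authors' code):

import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(X, Y, sigma=0.3):
    # Gaussian RKHS kernel k(x, y) = exp(-|x - y|^2 / (2 sigma^2)) on [0, 1]
    return np.exp(-(X[:, None] - Y[None, :]) ** 2 / (2 * sigma ** 2))

f = lambda x: np.sin(2 * np.pi * x) + x ** 2          # integrand
ref = np.mean(f(rng.uniform(size=200_000)))           # Monte-Carlo reference for its integral

nodes = rng.uniform(size=15)                          # quadrature nodes (DPP/CVS in the talk)

# kernel mean embedding of the uniform measure at the nodes, z_j = E_x[k(x_j, x)]
z = gauss_kernel(nodes, rng.uniform(size=20_000)).mean(axis=1)

# optimal RKHS quadrature weights: solve (K + tiny ridge) w = z
K = gauss_kernel(nodes, nodes)
w = np.linalg.solve(K + 1e-10 * np.eye(len(nodes)), z)

print("quadrature:", w @ f(nodes), " reference:", ref)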

Lénaïc Chizat - Faster Wasserstein distance estimation with the Sinkhorn divergence

The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting. This quantity is usually estimated with the plug-in estimator, defined via a discrete optimal transport problem. It can be solved to ε-accuracy by adding an entropic regularization of order ε and using for instance Sinkhorn's algorithm. In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms. We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels, of order ε^(1/2), which leads to improved computational complexity bounds and a speedup in practice. We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities. This is joint work with Pierre Roussillon, Flavien Léger, François-Xavier Vialard and Gabriel Peyré.
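A minimal sketch of the debiased estimator, assuming discrete measures, a squared-distance cost and a fixed number of Sinkhorn iterations (names and parameters are illustrative, not the authors' code):

import numpy as np

def entropic_ot(a, b, C, eps, n_iter=500):
    # OT_eps(a, b) = <P, C> + eps * KL(P | a x b), with P computed by Sinkhorn iterations
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return np.sum(P * C) + eps * np.sum(P * np.log(P / (a[:, None] * b[None, :])))

def sinkhorn_divergence(a, x, b, y, eps):
    # S_eps(a, b) = OT_eps(a, b) - 0.5 * OT_eps(a, a) - 0.5 * OT_eps(b, b)
    C = lambda u, v: (u[:, None] - v[None, :]) ** 2
    return (entropic_ot(a, b, C(x, y), eps)
            - 0.5 * entropic_ot(a, a, C(x, x), eps)
            - 0.5 * entropic_ot(b, b, C(y, y), eps))

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 200), rng.uniform(0.3, 1.3, 200)     # two point clouds on the line
a = np.full(200, 1 / 200); b = np.full(200, 1 / 200)
print(sinkhorn_divergence(a, x, b, y, eps=0.2))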

Rémi Gribonval - Privacy and utility in compressive statistical learning

In compressive statistical learning, a dataset is summarized into a sketch vector capturing the information needed for a given learning task. Sketches are typically built as empirical averages of random features of the training data, leading to a finite dimensional kernel mean embedding of the empirical probability distribution of the dataset. The talk will discuss how to ensure the utility of such embeddings for a given learning task, and how to make the sketch differentially private. The overall goal is to keep only the information needed to learn, but not more.
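A minimal sketch of such a sketch vector, assuming random Fourier features and a Gaussian toy dataset (all names, sizes and the bandwidth are illustrative assumptions, not the speaker's code):

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 10, 256                 # dataset size, data dimension, sketch size
X = rng.normal(size=(n, d))               # the dataset to be summarized

W = rng.normal(scale=1.0, size=(m, d))    # random frequencies (the scale acts as a kernel bandwidth)

# sketch = empirical average of complex random Fourier features,
# i.e. a finite-dimensional kernel mean embedding of the empirical distribution
z = np.exp(1j * X @ W.T).mean(axis=0)

print(z.shape)                            # (256,): the dataset is summarized by m numbers

Learning (for instance fitting a mixture model) is then carried out from z alone; one natural route to differential privacy is to add suitably calibrated noise to z, and the utility of the resulting private sketch is the subject of the talk.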

Matthias Hein - Robust Detection of Out-of-distribution data

Current deep neural networks for image recognition do not know when they don't know: non-task-related images are assigned to one of the classes with high confidence. Thus the machine learning system cannot trigger human intervention or switch to a safe state when it is applied outside of its specification. I will discuss our recent work to tackle this problem via certifiably adversarially robust out-of-distribution detection.

Michal Irani - “Deep Internal learning” - Deep Learning with Zero Examples

In this talk I will show how complex visual inference tasks can be performed with deep learning, in a totally unsupervised way, by training on a single image -- the test image itself. The strong recurrence of information inside a single natural image provides powerful internal examples which suffice for self-supervision of deep networks, without any prior examples or training data. This new paradigm gives rise to true “Zero-Shot Learning”. I will show the power of this approach on a variety of problems, including super-resolution, image segmentation, transparent layer separation, image dehazing, image retargeting, and more. I will further show how self-supervision can be used for “Mind-Reading” from very little fMRI data.

Jan Lellmann - Dynamical Optimal Transport and Functional Lifting

Functional lifting methods provide a tool for approximating solutions of difficult non-convex problems by embedding them into a larger space. One successful strategy is to embed into a space of probability measures, which is then equipped with a suitable metric based on Wasserstein distances. In this talk, we will show how this strategy can be viewed as a generalization of the Benamou-Brenier approach to dynamical optimal transport. By modifying the underlying continuity equation, this allows one in particular to incorporate functionals depending on higher-order derivatives and with non-scalar domain and range.
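For reference, the Benamou-Brenier formulation that this strategy generalizes (standard statement, notation assumed):

\[
W_2^2(\mu_0, \mu_1) \;=\; \min_{(\rho, v)} \int_0^1 \!\!\int_\Omega |v_t(x)|^2 \, d\rho_t(x)\, dt
\quad \text{s.t.} \quad \partial_t \rho_t + \operatorname{div}(\rho_t v_t) = 0, \;\; \rho_0 = \mu_0, \; \rho_1 = \mu_1,
\]

and it is this continuity-equation constraint that is modified in the lifting framework to handle higher-order derivatives and non-scalar domain and range.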

Adam Oberman - Regularization approaches to deep learning and a Lyapunov approach to accelerated SGD

Stefania Petra - A geometric first-order multilevel method for discrete tomography

In this talk, I will present a geometric multilevel optimization approach, choosing discrete tomography as a case study. In particular, our method is motivated by variational models that arise as discretizations of some underlying infinite-dimensional problem. Such problems naturally lead to a hierarchy of models of varying discretization levels. We employ multilevel optimization to take advantage of this hierarchy: while working at the fine level, we compute the search direction based on a coarse model. Importing a few concepts from information geometry into our setup, we propose a smoothing operator that uses only first-order information and incorporates constraints smoothly. We show that the proposed algorithm is well suited to the ill-posed reconstruction problem in discrete tomography and demonstrate its efficiency on several large-scale examples.

Thomas Pock - Learning optimal discretizations of the total variation

In this work, we study a general framework of discrete approximations of the total variation for image reconstruction problems. The framework, for which we can show consistency in the sense of Γ-convergence, unifies and extends several existing discretization schemes. In addition, we propose algorithms for learning discretizations of the total variation in order to achieve the best possible reconstruction quality for particular image reconstruction tasks. Interestingly, the learned discretizations significantly differ between the tasks, illustrating that there is no universal best discretization of the total variation.
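As a point of reference for what is being generalized and learned, the most common fixed discretization is the isotropic total variation with forward differences; a minimal sketch (an assumed baseline for illustration, not the learned schemes of the talk):

import numpy as np

def tv_isotropic(u):
    # standard isotropic TV: sum over pixels of the Euclidean norm of the forward-difference gradient
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences (Neumann boundary)
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences (Neumann boundary)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

u = np.zeros((64, 64)); u[16:48, 16:48] = 1.0   # indicator of a 32x32 square
print(tv_isotropic(u))                          # close to the square's perimeter of 128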

Dejan Slepčev - Consistency of higher-order derivatives on random point clouds and applications to semi-supervised learning

Silvia Villa - Implicit regularization: from inverse problems to classification

Soledad Villar - Dimensionality reduction, regularization, and generalization in overparameterized regressions

Overparameterization in deep learning has been shown to be powerful: very large models can fit the training data perfectly and yet generalize well. The investigation of overparameterization has brought back the study of linear models, which, like more complex models, exhibit a "double descent" behavior. This involves two features: (1) the risk (out-of-sample prediction error) can grow arbitrarily when the number of samples n approaches the number of parameters p (from either side), and (2) the risk decreases with p for p > n, sometimes achieving a lower value than the lowest risk for p < n. The divergence of the risk at p = n is related to the condition number of the empirical covariance of the features. For this reason, it can be avoided with regularization. In this work we show that performing a PCA-based dimensionality reduction also avoids the divergence at p = n; we provide a finite upper bound for the variance of the estimator that decreases with p. This result is in contrast to recent work showing that a different form of dimensionality reduction, one based on the population covariance instead of the empirical covariance, does not avoid the divergence. We connect these results to an analysis of adversarial attacks, which become more effective as they raise the condition number of the empirical covariance of the features. We show that ordinary least squares is arbitrarily susceptible to data-poisoning attacks in the overparameterized regime, unlike in the underparameterized regime, and how regularization and dimensionality reduction improve its robustness. We also translate these results on the highly overparameterized linear regression regime to Gaussian processes.
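A toy numerical sketch of the two phenomena, under simple assumptions (i.i.d. Gaussian features, a fixed dense signal; all names and sizes are illustrative, not the authors' experiments): min-norm least squares blows up near p = n, while regressing on the top empirical principal components does not.

import numpy as np

rng = np.random.default_rng(0)
n, p_max, sigma = 100, 200, 0.5
beta = rng.normal(size=p_max) / np.sqrt(p_max)         # true coefficients

def test_risk(p, k_pca=None, n_test=2000):
    Xtr = rng.normal(size=(n, p))
    Xte = rng.normal(size=(n_test, p))
    ytr = Xtr @ beta[:p] + sigma * rng.normal(size=n)
    f_te = Xte @ beta[:p]                              # noiseless test responses
    if k_pca is None:
        pred = Xte @ (np.linalg.pinv(Xtr) @ ytr)       # min-norm ordinary least squares
    else:
        _, _, Vt = np.linalg.svd(Xtr, full_matrices=False)
        V = Vt[:k_pca].T                               # top-k empirical principal directions
        pred = Xte @ V @ (np.linalg.pinv(Xtr @ V) @ ytr)
    return np.mean((pred - f_te) ** 2)

for p in (50, 90, 100, 110, 150, 200):
    print(p, round(test_risk(p), 2), round(test_risk(p, k_pca=min(p, 50)), 2))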