Some applications of concentration inequalities to machine learning: from Banach space geometry to computational efficiency (Slides)

Statistical machine learning methods based on reproducing kernels have enjoyed a large popularity since the beginning of the century. In a nutshell, they consist in mapping the observed data into a Hilbert space (called "feature space"), and trying to estimate a linear prediction function in that space (which generally corresponds to a nonlinear and possibly highly complex function when seen in the original input space). Because the function searched for is linear in the Hilbert space, this opens the door to use in that framework a wide array of methods and ideas coming from classical statistics and numerical linear algebra, in general combined with a suitable regularization to avoid overfitting the data. To analyze the statistical convergence of such methods, a powerful tool is to use concentration inequalities for sums of Hilbert or Banach vector-valued sums, some of which were developed as early as in the 1980s. I will explain in a simplified setting how such tools, combined with with suitable operator perturbation inequalities, can allow to analyze a large panel of kernel-based algorithms, as well as some very recent extensions of them where computational efficiency becomes an important bottleneck (typically in a "big data" context). This field was pioneered by the works of De Vito, Caponnetto, Rosasco, Smale and Zhou (between others), and I will present some contributions in collaborations with N. Krämer, N. Mücke, P. Mathé and O. Zadorozhnyi.

Quantitative results and a second order equation for Schrödinger bridges (Slides)

If you observe the arrangement of a very large collection of independent Brownian particles at times 0 and 1, what does their configuration look like at t=1/2?" The goal of this talk is to present some new results centred around this question, which goes back to the early works of Schrödinger, and to outline how this question is tightly connected to many interesting others.

Renewal in Hawkes processes with self-excitation and inhibition (Slides)

In this talk we consider Hawkes processes on the positive real line exhibiting both self-excitation and inhibition. Each point of this point process impacts its future intensity by the addition of a signed reproduction function. The aim is to establish limit theorems for Hawkes process with signed reproduction functions by using renewal techniques, extending previous results (of Reynaud-Bouret and Roy) for nonnegative reproduction function. This is joint work with Carl Graham, Laurence Marsalle and Viet Chi Tran.

Various models around the Cucker-Smale model and their flocking results (Slides)

In a first part of this talk, we will focus on the modeling
of collective behavior and present the usual flocking properties of
the deterministic Cucker-Smale system. We will then propose different
ways to introduce noise in these systems and therefore extend the
notion of flocking to those simple stochastic models.
In a second part, we will look at a few particular cases of those
stochastic Cucker-Smale-inspired models, focusing on their asymptotic
behaviour, and looking for flocking or ergodicity.

Approximate Optimal Designs for Multivariate Polynomial Regression (Slides)

We introduce a new approach aiming at computing approximate optimal designs for multivariate polynomial regressions on compact (semi-algebraic) design spaces. We use the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically the approximate optimal design problem. The geometry of the design is recovered via semidefinite programming duality theory. This work shows that the hierarchy converges to the approximate optimal design as the order of the hierarchy increases. Furthermore, we provide a dual certificate ensuring finite convergence of the hierarchy and showing that the approximate optimal design can be computed numerically with our method. As a byproduct, we revisit the equivalence theorem of the experimental design theory: it is linked to the Christoffel polynomial and it characterizes finite convergence of the moment-sum-of-square hierarchies.

Contextual bandits, ad auctions, and a new chaining algorithm

This talk is about contextual bandits with a continuous number of arms, and several variants with stronger forms of feedback. One important motivation lies in second-price ad auctions with reserve price. We will first describe this problem together with well-known bandit algorithms. In a second part, we will explain how to construct a new algorithm with the so-called chaining technique when the learner has access to a stronger feedback, leading to improved and sometimes optimal regret guarantees. The latter results are joint work with Nicolo Cesa-Bianchi, Pierre Gaillard, and Claudio Gentile (COLT 2017,

Graph Wavelets (Slides)

We present a multiscale approach to construct a data-adapted basis-like set of functions F which allows for a decomposition of every square-integrable function defined on the vertices of a finite undirected weighted graph. To this end, we follow the idea of Coulhon et al. (2012) for constructing localized frames on relatively general spaces. The construction of the set F is based on the orthonormal basis obtained by the normalized eigenvectors of the graph Laplace matrix and uses a spectral localization and a splitting operation. We analyze some properties of this set and show that it is a tight frame, especially a Parseval frame. By this we have a decomposition formula with respect to the frame elements for every function defined on the vertices of the graph. We have a look at the application of this approach to the setting of denoising. That is we observe noisy values of an unknown function \(y_i = f(x_i) + \varepsilon_i\)  at a finite number of points \(D = \{x_i \in R^d, i = 1...n\}\) and we want to recover f at the given points. We use a neighborhood graph representation of D which approximates the structure of the underlying space hidden in the data. Having the frame decomposition we use thresholding methods to the coefficients to derive an estimate of f. Furthermore we show that the considered random neighborhood graphs satisfy with high probability a doubling volume condition as well as a local Poincaré inequality under some assumptions on the underlying space and the sampling. These two properties are essentially for the spatial localization of the frame elements in the setting of Coulhon et al. (2012).

Testing the Equality in Distribution of Two Random Graphs (Slides)

We consider two inhomogenous Erdös- Renyi graphs on the same set of vertices and study the problem of testing if the random edges of these graphs have the same underlying stochastic structure. In this framework, such a structure is fully described by a symmetric n by n matrix with entries in [0,1] and zero diagonal: entry (i,j) is the probability of edge (i,j) to occur and the edges occur independently. Now, identifying the two graphs with such matrices P and Q, respectively, we want to describe the smallest Frobenius and spectral distances between P and Q such that there is a test whose total error probability does not exceed a given level. In particular, we focus on the dependence of this distances on the number n of vertices and the number M of collected samples per graph, i.e. observed adjacency matrices. The talk will be based on joint research with Alexandra Carpentier, Ulrike von Luxburg and Debarghya Ghoshdastidar.

Conditional McKean Lagrangian models (Slides)

In this talk, we discuss the theoretical problems related to the class of conditional McKean Lagrangian models. These models provide prototypical and simplified cases of the more general class of Lagrangian Stochastic Models which have been introduced to describe and simulate the dynamic of a generic particle of a turbulent fluid flow. These models are mathematically defined as a class of Langevin models endowing nonlinear (in the McKean-Vlasov sense) components of conditional type, this particular nonlinearity modeling the possible interaction between the particle and the macroscopic flow. After a short presentation on the applications and mathematical issues related to the general class of Lagrangian Stochastic Models for turbulent flows, we will present some mathematical results concerning the wellposedness problem and propagation of chaos property related to Conditional McKean Lagrangian models (joint work with Mireille BOSSY and Denis TALAY, TOSCA, INRIA Sophia-Antipolis Méditerranée), and concerning density estimates for these models (work in progress with Stéphane MENOZZI, LaMME, Evry, and HSE, Moscow).

Noise sensitivity of Lévy driven stochastic differential equations: estimates and applications (Slides)

The topic of this talk is induced by the following question: can the deviation between the solutions of two different Lévy driven SDE’s be controlled in terms of the characteristics of the underlying Lévy processes? In the case of SDE’s with additive noise we give the estimate for the deviation between the solutions in terms of the coupling distance for Lévy measures, which is based on the notion of the Wasserstein distance. Such estimate can be applied, for example, to the analysis of the low-dimensional conceptual climate models with paleoclimate data.

Adaptive strategies for nonparametric active learning (Slides)

This talk will focus on the problem of active learning in the setting of binary classification under smoothness assumptions on the regression function \(\eta(x) = P(Y = 1 | x)\)or the decision boundary (the 1/2-level set of \(\eta\)). It was shown in the last decade that in these two nonparametric settings, under appropriate assumptions characterizing the hardness of a classification problem, active learning can achieve strictly faster minimax rates (for the excess risk) than passive strategies (i.e. the usual batch classification setup). While the problem of adaptivity is well understood in the passive setting, active learning strategies so far required access to key distributional parameters of the problem (i.e. non-adaptive strategies, Castro & Nowak (2007)) or strong and unnatural technical assumptions (i.e. stronger than in the corresponding passive setting, Minsker (2012)). In this talk, we will present a generic strategy for adaptivity in the context of nested smoothness classes, which yields the first adaptive active learning strategy in both the smooth-\(\eta\) and smooth boundary settings. This is joint work with Alexandra Carpentier and Samory Kpotufe.

An optimistic point of view on classification (Slides)

Classification rules can be severely affected by the presence of disturbing observations in the training sample. Looking for an optimal classifier with such data may lead to unnecessarily complex rules. So, simpler effective classification rules could be achieved if we relax the goal of fitting a good rule for the whole training sample but only consider a fraction of the data. In this paper we introduce a new method based on trimming to produce classification rules with guaranteed performance on a significant fraction of the data. The take-away message will be : try to be otpmistic. If the data are too far from your model, then remove the data. In particular we adresse medical data. This work is a joint work with E. del Barrio and M. Agullo-Marin from university of Valladolid.

KL-UCB algorithms over P[0,1], an asymptotic and minimax analysis (Slides)

We will introduce some UCB (upper confidence bound) algorithms to solve bandit problems. In particular we will present how to turn an upper confidence bound on the mean of a random variable into an efficient algorithm, in a parametric and non-parametric setting. The non-parametric upper confidence bound are inspired from the empirical likelihood methods.

Post hoc inference via joint family-wise error rate control (Slides)

The goal of this work is to devise multiple testing procedures providing a statistical guarantee on any candidate set of hypotheses, including user-defined and/or data-driven candidate sets. We introduce a general methodology relying on the control of a multiple testing error rate that we call the joint Family-Wise Error Rate (JER). Our construction generalizes existing so-called "post hoc" procedures under positive dependence proposed by Goeman and Solari (Statistical Science, 2011). We propose a generic approach to build JER-controlling procedures in the situation where the joint null distribution of the test statistics is known or can be sampled from. Our theoretical statements are supported by numerical experiments. This is joint work with Gilles Blanchard and Etienne Roquain. A preprint is available at:

Quasi-stationarity with moving boundaries (Slides)

The goal will be to understand if we can extend quasi-stationarity (and all the other associated notions) considering moving absorbing boundaries. First of all, we will define the notions related with quasi-stationarity in the non-moving case which will be relevant for the lecture. Then some results of existence (or non existence) of these notions will be given in the case where the boundaries move periodically.

Completing partially observed random point patterns (Slides)

Starting from a partial observation of a random point configuration, we want to infer on the entire configuration. We link the related conditional distributions to integration-by-parts formulas. The latter describe how to add single points to a point configuration. We show that both mechanisms are essentially equivalent.

Collective motion of living organisms: the Vicsek model (Slides)

In this talk we will present the Vicsek model for self-driven particles, in particular its time-continuous version. We will study its mean-field limit, as well as the large-scale behaviour of the model. In particular, we find that the equilibrium distributions change according to whether the density is above or below a given threshold. We will also present a generalization of the classic model, with the interaction of two different populations.