# Stochastic processes and statistical machine learning, II

## Abstracts

Here you will be able to find the abstracts of the talks.

#### Mickaël Albertus (Toulouse)

##### Raking-ratio empirical process (slides)

In this presentation, I introduce the empirical process associated with the Raking-Ratio method, a statistical method widely used in many fields (economics, statistics, computer science, ...) which exploits auxiliary information given by the probabilities of belonging to the sets of a given partition. I give non-asymptotic and asymptotic results concerning this process. I end my presentation with an extension of this study to the case where the auxiliary information is learned from an estimate.
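As a rough illustration (not taken from the talk), a single raking-ratio step can be sketched as follows: given a sample and the known probabilities of the cells of a partition, the uniform empirical weights are rescaled cell by cell so that the weighted cell frequencies match the auxiliary information. All names below are hypothetical.

```python
import numpy as np

def rake_weights(categories, target_probs):
    """One raking-ratio step: rescale uniform empirical weights so that
    the weighted frequency of each cell of a partition matches a known
    (auxiliary) probability."""
    n = len(categories)
    weights = np.full(n, 1.0 / n)
    for k, p_k in enumerate(target_probs):
        mask = categories == k
        weights[mask] *= p_k / weights[mask].sum()  # ratio adjustment of cell k
    return weights

# Toy example: the sample over-represents cell 0; after raking, the
# weighted cell probabilities equal the auxiliary ones exactly.
rng = np.random.default_rng(0)
cats = rng.choice(2, size=1000, p=[0.7, 0.3])
w = rake_weights(cats, [0.5, 0.5])
```

After this step the raked empirical measure integrates the auxiliary information exactly, which is the object whose fluctuations the raked empirical process describes.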

#### François Bachoc (Toulouse)

##### Uniformly valid confidence intervals post-model-selection (slides)

We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (2013). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.

#### Manon Costa (Toulouse)

##### Emergence of homogamy in a two-loci stochastic population model (slides)

This talk deals with the emergence of a specific mating preference pattern called homogamy in a population. Individuals are characterized by their genotype at two haploid loci, and the population dynamics is modelled by a non-linear birth-and-death process. The first locus codes for a phenotype, while the second locus codes for homogamy defined with respect to the first locus: two individuals are more (resp. less) likely to reproduce with each other if they carry the same (resp. a different) trait at the first locus. Initial resident individuals do not feature homogamy, and we are interested in the probability and time of invasion of a mutant presenting this characteristic under a large population assumption.
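The two-locus dynamics are beyond a short sketch, but the underlying object, a non-linear birth-and-death process, can be simulated with a standard Gillespie scheme. The one-type logistic rates below are a hypothetical caricature for illustration, not the talk's model.

```python
import numpy as np

def logistic_birth_death(n0, b, d, c, t_max, rng):
    """Gillespie simulation of a logistic birth-and-death process:
    individual birth rate b, individual death rate d + c * n (competition),
    so the population fluctuates around the equilibrium (b - d) / c."""
    t, n = 0.0, n0
    while t < t_max and n > 0:
        birth_rate = b * n
        death_rate = (d + c * n) * n
        total = birth_rate + death_rate
        t += rng.exponential(1.0 / total)          # time to the next event
        n += 1 if rng.random() * total < birth_rate else -1
    return n

rng = np.random.default_rng(1)
final = logistic_birth_death(n0=50, b=2.0, d=1.0, c=0.01, t_max=20.0, rng=rng)
```

In the talk's setting the birth rates additionally depend on the genotypes of both partners, which is what creates the mating-preference dynamics.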

This is joint work with C. Coron, F. Laroche, H. Leman and C. Smadi.

#### Jens Fischer (Potsdam)

##### Randomized network segregation (slides)

Segregation based on race, political and religious views or social norms is a recurring phenomenon throughout history. In 1969, Thomas Schelling proposed a model for this phenomenon accounting for the preferences of each individual regarding their neighbors. He showed that segregation arises in his model both for agents with heterophobic and for agents with heterophilic preferences.

In this talk, I am going to present a model based on random discrete time dynamics of graphs and I am going to compare it to Schelling's model. Our model, inspired by Henry, Prałat and Zhang (2011), also exhibits segregation in almost surely finite time and we obtain upper bounds for the expected time to segregation.

#### Gersende Fort (Toulouse)

##### Stochastic Approximation-based algorithms, when the Monte Carlo bias does not vanish (slides)

Stochastic Approximation algorithms, of which stochastic gradient descent methods with decreasing step sizes are an example, are iterative methods to compute the root of a non-explicit function. They rely on a Monte Carlo approximation of this function. Nevertheless, in many applications this random approximation is biased, with a bias which, mainly for reasons of computational cost, does not vanish along the iterations: the convergence of the algorithm towards the roots may then fail.
In this talk, we will motivate the use of such algorithms by computational issues in statistical learning, with an emphasis on penalized inference in latent variable models. We will address the convergence of stochastic approximation-based algorithms for the optimization of a convex composite function: sufficient conditions for the convergence of perturbed proximal-gradient methods, possibly accelerated, will be given. We will also outline the parallel with Stochastic Expectation-Maximization algorithms (MCEM and SAEM, for example).
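As a minimal illustration of a perturbed proximal-gradient scheme (a generic sketch, not the talk's algorithm), the code below minimizes a convex composite function, least squares plus an l1 penalty, replacing the exact gradient by a minibatch Monte Carlo estimate and using decreasing step sizes; all names are made up.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def stochastic_prox_grad(A, y, lam, n_iter=2000, batch=32, seed=0):
    """Perturbed proximal-gradient method for the composite problem
    min_x (1/2n) * ||A x - y||^2 + lam * ||x||_1,
    where the smooth part's gradient is replaced by a minibatch (Monte
    Carlo) estimate and decreasing step sizes damp the gradient noise."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for t in range(1, n_iter + 1):
        idx = rng.choice(n, size=batch)            # noisy gradient estimate
        g = A[idx].T @ (A[idx] @ x - y[idx]) / batch
        step = 0.5 / np.sqrt(t)                    # decreasing step size
        x = soft_threshold(x - step * g, step * lam)
    return x
```

Here the minibatch gradient is unbiased; the talk's point is precisely what happens when such a Monte Carlo approximation carries a non-vanishing bias.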

##### Numerical cost of the posterior Bayesian mean with a Langevin diffusion (slides)

We consider a statistical translation model where the distribution of the observations is $$e^{-U(.-\theta_0)}$$, where $$U$$ is a convex potential from $$\mathbb{R}^d$$ to $$\mathbb{R}_+$$ and $$\theta_0$$ is a hidden parameter of $$\mathbb{R}^d$$. We design a statistical paradigm of prior/posterior distribution that makes it possible to recover the unknown parameter $$\theta_0$$ under mild assumptions when we observe $$n$$ i.i.d. realizations of this model. In particular, we obtain a minimax quadratic risk $$r_n$$ for the posterior mean and some estimates of the tail of the posterior distribution. Then, we design a numerical probabilistic procedure, based on a Langevin diffusion, for estimating the posterior mean, and provide an evaluation of the simulation length needed to obtain an estimator that approaches the target mean with an accuracy below $$r_n$$.
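A minimal sketch of the numerical step (an unadjusted Langevin discretization with a flat prior and a Gaussian potential, as a toy stand-in for the talk's setting; all names and parameters are hypothetical):

```python
import numpy as np

def langevin_posterior_mean(obs, grad_U, h=1e-3, n_steps=50000, burn=5000, seed=0):
    """Unadjusted Langevin algorithm targeting the posterior
    pi(theta) proportional to exp(-sum_i U(x_i - theta)) (flat prior),
    then averaging the trajectory after burn-in to estimate the
    posterior mean."""
    rng = np.random.default_rng(seed)
    theta, total, count = 0.0, 0.0, 0
    for k in range(n_steps):
        drift = grad_U(obs - theta).sum()          # = d/dtheta log pi(theta)
        theta += h * drift + np.sqrt(2.0 * h) * rng.standard_normal()
        if k >= burn:
            total += theta
            count += 1
    return total / count

# Gaussian potential U(x) = x^2 / 2: the posterior is centered at the
# sample mean, which the Langevin trajectory average should recover.
rng = np.random.default_rng(2)
x = 1.5 + rng.standard_normal(200)                 # hidden theta_0 = 1.5
est = langevin_posterior_mean(x, grad_U=lambda z: z)
```

The quantitative question of the talk is how large `n_steps` must be so that the discretization and averaging errors fall below the statistical risk $$r_n$$.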

This is a joint work with Fabien Panloup and Clément Pellegrini.

#### Paula Gordaliza (Toulouse)

##### A central limit theorem for Lp transportation cost on the real line with application to fairness assessment in machine learning (slides)

We provide a Central Limit Theorem for the Monge-Kantorovich distance between two empirical distributions with sizes $$n$$ and $$m$$, $$\mathcal{W}_p(P_n,Q_m), \ p\geq1,$$ for observations on the real line. In the case $$p>1$$ our assumptions are sharp in terms of moments and smoothness. We prove results dealing with the choice of centering constants. We provide a consistent estimate of the asymptotic variance which enables us to build two-sample tests and confidence intervals certifying the similarity between two distributions. These are then used to assess a new criterion of data set fairness in classification.
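On the real line the empirical transportation cost has a closed form: the optimal coupling matches order statistics. A minimal sketch for equal sample sizes (a standard fact, not the talk's contribution):

```python
import numpy as np

def wasserstein_p(x, y, p=2):
    """Empirical p-Wasserstein (Monge-Kantorovich) distance on the real
    line for equal sample sizes: the optimal coupling matches order
    statistics, so W_p(P_n, Q_n)^p = (1/n) * sum_i |x_(i) - y_(i)|^p."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

# Sanity check: shifting a sample by 1 gives W_p = 1 exactly.
rng = np.random.default_rng(4)
x = rng.standard_normal(500)
dist = wasserstein_p(x, x + 1.0)
```

The CLT of the talk describes the fluctuations of this statistic around its centering, which is what makes the two-sample tests possible.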

#### Joseph Lam (Magdeburg)

##### A minimax near-optimal algorithm for adaptive rejection sampling (slides)

Rejection Sampling is a fundamental Monte-Carlo method. It is used to sample from distributions admitting a probability density function which can be evaluated exactly at any given point, albeit at a high computational cost. However, without proper tuning, this technique implies a high rejection rate. Several methods have been explored to cope with this problem, based on the principle of adaptively estimating the density by a simpler function, using the information of the previous samples. Most of them either rely on strong assumptions on the form of the density, or do not offer any theoretical performance guarantee.

We give the first theoretical lower bound for the problem of adaptive rejection sampling and introduce a new algorithm which guarantees a near-optimal rejection rate in a minimax sense.
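For context, plain (non-adaptive) rejection sampling looks as follows; the acceptance rate $$1/M$$ is exactly what adaptive schemes improve by tightening the envelope around the target. This is a generic sketch with a toy target, not the algorithm of the talk.

```python
import numpy as np

def rejection_sample(target_pdf, propose, proposal_pdf, M, n, rng):
    """Plain rejection sampling: draw z from the proposal and accept it
    with probability target_pdf(z) / (M * proposal_pdf(z)); valid as long
    as target_pdf <= M * proposal_pdf everywhere.  The acceptance rate
    is 1 / M."""
    out = []
    while len(out) < n:
        z = propose(rng)
        if rng.random() * M * proposal_pdf(z) <= target_pdf(z):
            out.append(z)
    return np.array(out)

# Toy target: Beta(2, 2), whose density 6 z (1 - z) is bounded by 1.5 on
# [0, 1], sampled with a uniform proposal and envelope constant M = 1.5.
rng = np.random.default_rng(3)
beta_pdf = lambda z: 6.0 * z * (1.0 - z)
samples = rejection_sample(beta_pdf, lambda r: r.random(), lambda z: 1.0,
                           M=1.5, n=2000, rng=rng)
```

Adaptive schemes replace the fixed envelope by one estimated from the previously drawn samples, which is where the minimax analysis of the talk applies.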

#### Abhishake Rastogi (Potsdam)

##### Convergence analysis of Tikhonov regularization for non-linear statistical inverse learning problems (slides)

We study a non-linear statistical inverse learning problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization, MOR) approach to reconstruct the estimator of the quantity for the non-linear ill-posed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the ansatz of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions.
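In the simplest special case, a linear (identity) operator and a quadratic misfit, the Tikhonov scheme over an RKHS reduces to kernel ridge regression. The sketch below shows that special case only, not the non-linear setting of the talk; kernel and parameters are illustrative.

```python
import numpy as np

def kernel_tikhonov(X, y, lam, gamma):
    """Tikhonov regularization over an RKHS in the linear, direct-
    regression case (kernel ridge regression): minimize
    (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2.
    By the representer theorem the minimizer is f = sum_i alpha_i K(., x_i)."""
    n = len(X)
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)   # Gaussian kernel matrix
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    def predict(x_new):
        k = np.exp(-gamma * (x_new[:, None] - X[None, :]) ** 2)
        return k @ alpha
    return predict

# Noiseless toy design: with a small lam the estimator nearly interpolates.
X = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * X)
f_hat = kernel_tikhonov(X, y, lam=1e-6, gamma=10.0)
```

In the talk's setting the data misfit involves a non-linear operator applied to $$f$$, so the Tikhonov functional is no longer quadratic in the coefficients and the analysis is correspondingly harder.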

#### Jérôme Stenger (Toulouse)

##### Optimal uncertainty quantification of a risk measurement on moment class (slides)

In Uncertainty Quantification, the model (an industrial computer code) is often considered as an input-output black box, possibly deterministic. A major topic of interest is to assess the uncertainties tainting the results of the computer simulation. In order to track down the uncertainties affecting the inputs (i.e. physical parameters), we model them as random variables. We propagate these uncertainties through the model, so that the output is also a random variable characterized by its distribution. In the context of choosing safety margins, the measure of risk we are interested in is the quantile of the output distribution.

In this work we gain robustness on the quantification of the risk measurement by accounting for all sources of uncertainty tainting the inputs of a computer code. To that end, we evaluate the maximum quantile over a class of unimodal distributions satisfying constraints on their moments. Two options are available when dealing with such complex optimization problems: one can either optimize under constraints or, preferably, reformulate the objective function. We identify a well-suited parameterization for computing the optimal quantile, based on the theory of canonical moments. It allows an effective, constraint-free optimization.
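The uncertainty-propagation step (before any optimization over the moment class) can be sketched by plain Monte Carlo; the "computer code" and input model below are toy stand-ins, not an industrial code.

```python
import numpy as np

def output_quantile(code, input_sampler, level, n=10000, seed=0):
    """Monte Carlo propagation of input uncertainty through a black-box
    computer code: sample the inputs, run the code, and return the
    requested quantile of the output distribution."""
    rng = np.random.default_rng(seed)
    outputs = np.array([code(input_sampler(rng)) for _ in range(n)])
    return np.quantile(outputs, level)

# Hypothetical stand-ins for the code and the input distribution.
toy_code = lambda x: x[0] ** 2 + 3.0 * x[1]
toy_inputs = lambda r: r.normal(size=2)
q95 = output_quantile(toy_code, toy_inputs, 0.95)
```

The talk's problem is then the outer maximization of such a quantile over all input distributions in the moment class, which the canonical-moments parameterization makes tractable.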

##### Online nonparametric regression with kernels

In this talk we introduce the general concept of online learning and consider the setting of online nonparametric regression for arbitrary deterministic sequences with the square loss. We investigate the problem for the case when the forecaster is the online version of Ridge regression estimator which belongs to RKHS generated by some kernel. We present the theoretical analysis of the regret in this case as well as discuss computational efficiency.

The talk is based on the work with Sébastien Gerschinovitz, Pierre Gaillard, Alessandro Rudi.
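A naive version of this forecaster, refitting kernel ridge regression from scratch at each round at $$O(t^3)$$ cost (an efficient implementation, one concern of the talk, would update the solution incrementally), can be sketched as follows; the data and parameters are illustrative.

```python
import numpy as np

def online_kernel_ridge(xs, ys, lam, gamma):
    """Online nonparametric regression with the square loss: before round
    t the forecaster is the kernel ridge estimator fitted on rounds
    1..t-1; it predicts y_t, then y_t is revealed.  This naive version
    refits from scratch at every round."""
    preds = np.zeros(len(xs))
    for t in range(1, len(xs)):
        Xp, yp = xs[:t], ys[:t]
        K = np.exp(-gamma * (Xp[:, None] - Xp[None, :]) ** 2)
        alpha = np.linalg.solve(K + lam * np.eye(t), yp)
        k = np.exp(-gamma * (xs[t] - Xp) ** 2)
        preds[t] = k @ alpha
    return preds

# Deterministic toy sequence: a smooth signal observed in order.
xs = np.linspace(0.0, 1.0, 60)
ys = np.sin(2.0 * np.pi * xs)
preds = online_kernel_ridge(xs, ys, lam=0.1, gamma=5.0)
```

The regret of the talk compares the cumulative square loss of such predictions to that of the best function in the RKHS chosen in hindsight.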

#### Alexander Zass (Potsdam)

##### Entropy method for Gibbs point processes

In 1988, Hans-Otto Georgii introduced a new method for proving the existence of infinite-volume Gibbs measures, which uses the specific entropy as a Lyapunov function.

In this talk, we will use this approach in the framework of marked Gibbs point processes with unbounded interaction. Some examples from stochastic geometry will also be discussed.