# Piecewise Deterministic Markov Processes and Sampling

The goal of the workshop is to bring together people working on PDMPs and on various Monte Carlo methods with a momentum and/or a memory: Hamiltonian Monte Carlo, lifted Markov chains, zig-zag process, etc.

### When ?

In 2017, January 25-27. The workshop will start Wednesday at 2pm and end Friday at noon (see the detailed program below).

### Where ?

In Cermics, École des Ponts ParisTech, Marne-la-Vallée, near Paris. Access directions are available in French or in English; the workshop will take place in Bât. Coriolis.

### Registration

Registration is free but mandatory. Please register on dedicated page.

### Program

#### Wednesday, Jan. 25

13h-14h: Registration, coffee

• 14h-14h50: J. Bierkens, Zig-Zag Sampling for Doubly Intractable Distributions
• 15h-15h50: P. Monmarché, Piecewise deterministic simulated annealing

Coffee break

• 16h30-17h20: S. Vollmer, The Bouncy Particle Sampler: A Non-Reversible Rejection-Free Markov Chain Monte Carlo Method

#### Thursday, Jan 26

9h30-10h: Morning coffee

• 10h-10h50: O. Mangoubi, Rapid mixing bounds for Hamiltonian Monte Carlo and geodesic walks, via positive curvature and momentum
• 11h-11h50: K. Życzkowski, Sampling quantum states with hit and run algorithm

Lunch break

• 14h-14h50: F. Panloup, Weighted Multilevel Langevin simulation of invariant distributions
• 15h-15h50: A. Durmus, High dimensional sampling with the Unadjusted Langevin Algorithm

Coffee Break

• 16h30-17h20: B. Cloez, Properties of stochastic algorithms through homogeneous-time Markov process properties

#### Friday, Jan. 27

• 9h-9h50: P. Fearnhead, Continuous-time Importance Sampling, and Bayesian Analysis of Big Data

Coffee break

• 10h30-11h20 : S. Gadat, Second order processes reinforced by their memory
• 11h30-12h20: A. Duncan, The PDMP as a Monte Carlo method and sampling from restricted domains

### Abstracts

##### J. Bierkens: Zig-Zag Sampling for Doubly Intractable Distributions

In important models in Bayesian statistics the computation of the likelihood function is intractable. The corresponding posterior distributions are referred to as doubly intractable distributions. For example this situation occurs when inferring the temperature in an Ising model, or the parameters in an exponential random graph model (ERGM).

Existing methodology to deal with such models rely on the exchange algorithm of Møller (2006) and variations thereof. In order for this class of algorithms to be asymptotically exact it is necessary to draw perfect samples from the forward model using e.g. the Propp & Wilson (1996) methodology.

It turns out that, when applying the Zig-Zag Sampler (Bierkens et al., 2016) to problems with intractable likelihood, it suffices to obtain unbiased estimates from the forward model, which greatly enlarges the scope of possible models which can be dealt with, as well provides significant gains in computational efficiency.

This is currently work in progress in collaboration with Antonietta Mira (Lugano) and Gareth Roberts (Warwick).

##### B. Cloez: Properties of stochastic algorithms through homogeneous-time Markov process properties

In this talk, we consider some stochastic algorithms represented by an inhomogeneous (discrete time) Markov chain. We are interested in its long time behavior. We provide sufficient conditions to ensure that some of its asymptotic properties can be related to the ones of a homogeneous (continuous time) Markov process. Renowned examples such as a bandit algorithms, Monte-Carlo method, decreasing step Euler schemes are included in our framework. Our results are related to functional limit theorems, but the approach differs from the standard “Tightness/Identification” argument; our method is unified and based on the notion of pseudotrajectories on the space of probability measures. This based on the ODE-method on the probability space and can be seen as a “Markov-method”. Among the approximated Markov process, there is some classical diffusion processes but also some PDMP like switching ODE systems or some processes with jumps.

##### A. Duncan: The PDMP as a Monte Carlo method and sampling from restricted domains

While piecewise deterministic Markov processes (PDMPs) have been around for over 50 years, their potential use as a continuous time Monte Carlo (CTMC) method has only recently been explored. The success of this methodology has motivated a number of theoretical questions on PDMPs, particularly related to long-time behaviour. Focusing on the one-dimensional case, in the first part of this talk, we explore some of these questions, and in particular characterise the performance of PDMP samplers and compare them to other continuous time Monte Carlo schemes. In the second part of this talk, we return to the multidimensional case and explore the effectiveness of PDMP schemes to sample from distributions supported on bounded or semibounded domains.

##### A. Durmus: High dimensional sampling with the Unadjusted Langevin Algorithm

Recently, the problem of designing MCMC sampler adapted to high-dimensional distributions and with sensible theoretical guarantees has received a lot of interest. The applications are numerous, including large-scale inference in machine learning, Bayesian nonparametrics, Bayesian inverse problem, aggregation of experts among others. When the density is L-smooth (the log-density is continuously differentiable and its derivative is Lipshitz), we will advocate the use of a “rejection- free” algorithm, based on the discretization of the Euler diffusion with either constant or decreasing stepsizes. We will present several new results allowing convergence to stationarity under different conditions for the log-density (from the weakest, bounded oscillations on a compact set and super-exponential in the tails to the log concave). When the density is strongly log-concave, the convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality for Lipschitz functions will be reported. Finally, based on optimzation techniques we will propose new methods to sample from high dimensional distributions. In particular, we will be interested in densities which are not continuously differentiable. Some Monte Carlo experiments will be presented to support our findings.

##### P. Fearnhead: Continuous-time Importance Sampling, and Bayesian Analysis of Big Data

I will discuss how versions of importance sampling, that aim to unbiasedly estimate transition densities of continuous-time stochastic processes, can be constructed. These importance sampling algorithms are based upon simulating piecewise deterministic Markov processes. One application of these ideas is their use within a sequential Monte Carlo (SMC) algorithm that samples from a target distribution of interest. Excitingly, for Bayesian applications, these SMC algorithms have the potential to scale well to big data settings: as the computational cost per effective sample size may not increase with the amount of data.

##### S. Gadat: Second order processes reinforced by their memory

Second order methods are expected to be optimal for convex minimization problems. In this work, we develop a second order stochastic dynamical system inspired from the Heavy Ball differential equation. After some remainders on the deterministic system, we study a stochastic adaptation that leads to a second order Markov process, degenerated on the kinetic component. The regularity of such a system is shown as well as its ergodicity. We then study the large deviation properties associated to the invariant measures of the system at low temperature.

##### O. Mangoubi: Rapid mixing bounds for Hamiltonian Monte Carlo and geodesic walks, via positive curvature and momentum.

Most existing methods for analyzing geometric” MCMC algorithms such as Hamiltonian Monte Carlo (HMC) and geodesic walks use conductance bounds. Unfortunately, conductance bounds cannot capture improvements obtained from momentum or positive curvature. To take advantage of positive curvature and momentum, our analysis of HMC and geodesic walks instead uses probabilistic coupling bounds, obtained via comparison geometry.

Specifically, we obtain rapid mixing bounds for HMC in an important class of strongly log-concave target distributions, showing that HMC is faster than many competitor algorithms such as the Langevin MCMC algorithm in this regime.

We also introduce the geodesic walk, an implementable Markov chain that achieves a volume-measure stationary distribution on a Riemannian manifold $\mathcal{M}$ without computationally intensive Metropolis-Hastings corrections. For positively-curved $\mathcal{M}$, we show that the geodesic walk has mixing time $\mathcal{O}^*(\frac{\mathfrak{M}_2}{\mathfrak{m}_2})$, where $\mathfrak{M}_2$ and $\mathfrak{m}_2$ are, respectively, upper and lower bounds on the sectional curvature of $\mathcal{M}$.

(Joint work with Aaron Smith)

##### P. Monmarché: Piecewise deterministic simulated annealing

In order to sample a given probability law with density exp(-V/T), many ergodic Markov processes are available, and the efficiency of the corresponding algorithms is related to the speed of convergence to equilibrium of these processes. Kinetic processes (that is, processes (X,Y) where X is the position and Y=dX/dt is the velocity) are good candidates, since the velocity acts as an “instantaneous memory”: the process have some inertia and can't go back immediatly to the place it has just been, thus exploring the space more efficiently. We will introduce a velocity jump process, which is a kinetic PDMP designed in such a way it samples a given target law, and we will study its speed of convergence, in particular in the regime T→0, and give conditions for a simulated annealing procedure to succeed.”

##### F. Panloup: Weighted Multilevel Langevin simulation of invariant distributions

In this joint work with G. Pagès, we investigate a weighted Multilevel Richardson-Romberg extrapolation for the ergodic approximation of invariant distributions of diffusions. We first obtain some rate of convergence results under weak confluence assumptions on the diffusion. In a second part, under stronger confluence assumptions and with the help of some second order expansions of the asymptotic error, we go deeper in the study by optimizing the choice of the parameters involved by the method. In particular, for a given $\varepsilon>0$, we exhibit some semi-explicit parameters for which the number of iterations of the Euler scheme required to attain a Mean-Squared Error lower than $\varepsilon^2$ is about $\varepsilon^{-2}\log(\varepsilon^{-1})$. Finally, we discuss the so-called weak-confluence assumption and provide some numerical tests on a high dimensional diffusion motivated by a statistical problem.

##### S. Vollmer: The Bouncy Particle Sampler: A Non-Reversible Rejection-Free Markov Chain Monte Carlo Method

Markov chain Monte Carlo methods have become standard tools in statistics to sample from complex probability measures. Many available techniques rely on discrete-time reversible Markov chains whose transition kernels build up over the Metropolis-Hastings algorithm. We explore and propose several original extensions of an alternative approach introduced recently in Peters and de With (2012) where the target distribution of interest is explored using a continuous-time Markov process. In the Metropolis-Hastings algorithm, a trial move to a region of lower target density, equivalently “higher energy”, than the current state can be rejected with positive probability. In this alternative approach, a particle moves along straight lines continuously around the space and, when facing a high energy barrier, it is not rejected but its path is modified by bouncing against this barrier. The resulting non-reversible Markov process provides a rejection-free MCMC sampling scheme. We propose several original techniques to simulate this continuous-time process exactly in a wide range of scenarios of interest to statisticians. When the target distribution factorizes as a product of factors involving only subsets of variables, such as the posterior distribution associated to a probabilistic graphical model, it is possible to modify the original algorithm to exploit this structure and update in parallel variables within each clique. We present several extensions by proposing methods to sample mixed discrete-continuous distributions and distributions restricted to a connected smooth domain. We also show that it is possible to move the particle using a general flow instead of straight lines. We demonstrate the efficiency of this methodology through simulations on a variety of applications and show that it can outperform Hybrid Monte Carlo schemes in interesting scenarios.

##### K. Życzkowski : Sampling quantum states with hit and run algorithm

We introduce a notion of a set $X$ accessible in $k$ steps with respect to a given collection of vectors. This property leads to an algorithm which may be considered as a derandomization of the hit and run algorithm applied to generate a sequence of random points covering $X$ uniformly.

The set $\Omega_N$ of quantum states of size $N$ consist of positive hermitian density of order $N$ with a fixed trace, $Tr(\rho)=1$. For $N=2$ this set is equivalent to a $3$-disk, called Bloch ball, while in higher dimensions the structure of this convex set is more complicated. To study composed systems we take $N=M^2$. In the theory of quantum entanglement a special role is played by a subset of $\Omega_{M^2}$ containing states with positive partial transpose, $(T\otimes 1)\rho >0$. To analyze spectral properties of such density matrices we apply a variant of the 'hit and run' algorithm and formulate a conjecture of the average spectral density of states in this subset.