**Speaker(s)**: Dr Tugkan Batu

**Organiser**: Dr Enrico H Gerding

**Time**: 02/02/2009 14:00-15:00

**Location**: B32/3077

## Abstract

In this talk, I will describe algorithms for several fundamental statistical inference tasks. The main focus of this research is the sample complexity of each task as a function of the domain size for the underlying discrete probability distributions. The algorithms are given access only to i.i.d. samples from the distributions and make inferences based on these samples. The inference tasks studied here are: (i) similarity to a fixed distribution (i.e., goodness-of-fit); (ii) similarity between two distributions (i.e., homogeneity); (iii) independence of joint distributions; and (iv) entropy estimation. For each of these tasks, an algorithm with sublinear sample complexity is presented (e.g., a goodness-of-fit test on a discrete domain of size $n$ is shown to require $O(\sqrt{n}polylog(n))$ samples). Accompanying lower bound arguments show that all but one of these algorithms achieve a near-optimal performance. Given some extra information on the distributions (such as the distribution is monotone or unimodal with respect to a fixed total order on the domain), the sample complexity of these tasks become polylogarithmic in the domain size.

## Speaker Biography

## Dr Tugkan Batu

Tugkan is a lecturer in the Department of Mathematics at the London School of Economics. Before that he has held several postdoctoral fellow positions:

- School of Computing Science at Simon Fraser University, with Funda Ergun, Cenk Sahinalp, and Petra Berenbrink; - Department of Computer Sciences at the University of Texas at Austin, with David Zuckerman; - Department of Computer and Information Science at University of Pennsylvania, with Sampath Kannan.

He completed his Ph.D. in the Computer Science Department at Cornell University in August 2001. His advisor was Ronitt Rubinfeld. He got his B.S. degree from the Computer Engineering and Information Science Department at Bilkent University in 1996.