# Information, Information Processing, Causal Inference and Modeling Abstracts

Thomas Augustin

## On Imprecise Probability and Imprecise Information

The talk surveys some recent work on the relation between imprecise probability and imprecise information. We focus on some dynamical aspects and the handling of non-randomly coarsened / missing outcomes, in order to prepare a discussion of the relevance of our results for causal inference.

In the first part we investigate the power of imprecise priors in handling prior-data conflict in generalized Bayesian inference, looking in particular at the dynamic behavior under sequential updating. We present a class of models where the imprecision in the posterior distributions behaves in the desired way: while the imprecision shrinks steadily under prior-data agreement, it first increases under prior-data conflict and scarce sample information, before eventually the sample information fully dominates the original prior information. We also discuss this result in the context of decision making and finally try to extend it to interval-valued / missing data.

The second part is devoted to regression models under non-randomly coarsened / missing outcomes. We sketch a random set likelihood approach to handle categorical outcomes, which also allows us to incorporate weak auxiliary information. Then we discuss a sufficiency-type result for generalized linear models under interval-valued response variables, which promises to make cautious data completion computationally tractable.
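The qualitative behavior described in the first part can be illustrated with a toy imprecise Beta-Binomial model in the spirit of this line of work (the concrete prior set below is made up for illustration): the prior is a set of Beta distributions indexed by a prior mean y0 in an interval and a pseudo-count n0 in an interval, and the imprecision is the width of the resulting set of posterior expectations. Since the posterior mean (n0*y0 + s)/(n0 + n) is monotone in each parameter separately, the extremes sit at the corners of the (n0, y0) rectangle.

```python
from itertools import product

def post_mean(n0, y0, s, n):
    """Posterior expectation of the success probability after
    s successes in n trials, for prior parameters (n0, y0)."""
    return (n0 * y0 + s) / (n0 + n)

def imprecision(n0_range, y0_range, s, n):
    """Width of the set of posterior expectations; the extremes are
    attained at the corners of the (n0, y0) rectangle because the
    posterior mean is monotone in each parameter separately."""
    vals = [post_mean(n0, y0, s, n)
            for n0, y0 in product(n0_range, y0_range)]
    return max(vals) - min(vals)

# Illustrative prior set: prior mean in [0.4, 0.6], pseudo-count in [1, 5].
n0_range, y0_range = (1.0, 5.0), (0.4, 0.6)
prior_width = imprecision(n0_range, y0_range, s=0, n=0)  # 0.2

# Prior-data agreement: observed frequency 0.5 lies inside [0.4, 0.6].
agree = [imprecision(n0_range, y0_range, s=n // 2, n=n)
         for n in range(0, 21, 2)]

# Prior-data conflict: every trial a success, observed frequency 1.0.
conflict = [imprecision(n0_range, y0_range, s=n, n=n)
            for n in range(0, 21, 2)]

print(prior_width, agree[1], conflict[1], conflict[-1])
```

Under agreement the width shrinks from the start, while under conflict it first grows beyond the prior width and only shrinks once the sample dominates, which is exactly the dynamic behavior the abstract refers to.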

If time allows we also will present some ideas on imprecise imputation techniques in the matching/fusion of different data sets with some common matching variables and block-wise missing specific variables.

Some Related References:

Augustin, T., Walter, G., Coolen, F.P.A. Statistical inference. In T. Augustin, F.P.A. Coolen, G. de Cooman, and M.C.M. Troffaes, editors, Introduction to Imprecise Probabilities, pages 135-189. Wiley, Chichester, 2014.

Plass, J., Cattaneo, M., Augustin, T., Schollmeyer, G., Heumann, C. Towards a reliable categorical regression analysis for non-randomly coarsened observations: An analysis with German labour market data. Technical Report 206, Department of Statistics, LMU Munich, 2017. (https://epub.ub.uni-muenchen.

Plass, J., Cattaneo, M., Schollmeyer, G., Augustin, T. On the testability of coarsening assumptions: A hypothesis test for subgroup independence. International Journal of Approximate Reasoning, 90:292--306, 2017.

Endres, E., Fink, P., Augustin, T. Imprecise imputation: a nonparametric micro approach considering the natural uncertainty of statistical matching with categorical data. Technical Report, Department of Statistics, LMU Munich, to appear March 2018.

Gert de Cooman

## Stochastic processes with imprecise probability models

I present an approach to dealing with aspects of imprecision and indecision in modelling uncertainty in dynamical processes. In order to make the discussion as simple as possible, I focus on discrete-time, finite-state processes.

The model is surprisingly elegant and simple, and mathematically powerful, but steps away from the main body of work on stochastic processes in a number of respects, as it combines a number of ideas that have emerged in the foundations literature on (imprecise) probabilities in recent decades. First of all, it refrains from using a sample space, measure-theoretic approach to modelling uncertainty in dynamic processes (along the lines of Fermat's solution to the problème des points). Rather, it builds on event trees populated with local (immediate prediction) models in each of the nodes of the tree to represent this uncertainty (along the lines of Pascal's and Huygens' solutions). The martingale-theoretic connection between the two approaches has been a topic of study in recent years, and goes back to the work of Jean Ville in the late 1930s: it details how to convert the local probabilities in the nodes of the event tree into a global probability model on the sample space of the leaves of the tree.

Secondly, the local models in the nodes are allowed to be imprecisely specified, in a variety of forms: credal sets, lower expectations, sets of desirable gambles (and, why not, choice functions), leading to what is called an imprecise probability tree. It is a topic of current study how these local models can be combined coherently to yield global sample space models in various ways that generalise the Ville approach, under various more or less restrictive assumptions. A more or less simple-minded view of how this variety comes about starts from the idea that an imprecise probability tree (an event tree with imprecise local models) can be used to generate various collections of precise probability trees (event trees with precise local models) by appropriately choosing, in each of the nodes of the tree, a precise probability model in the local credal set. There are many ways and principles to guide the choice of these precise local models, each leading to specific interpretations and causality considerations for the resulting (more or less conservative) global model, and each leading to different formulas and methods for inferences about them.

All this will be illustrated using one of the simplest examples available: discrete-time, discrete-space Markov chains, where we give examples of inferences, and of interesting results that can be obtained.
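A minimal sketch of how such inferences can be computed, assuming the local credal sets are given by probability intervals on the transition probabilities (the two-state chain and its bounds below are invented for illustration): the lower expectation of a function of the state after several steps is obtained by backward recursion with the lower transition operator, where each local minimisation is solved greedily.

```python
def lower_exp(f, lo, hi):
    """Minimise sum_y p[y]*f[y] over the credal set
    {p : lo <= p <= hi, sum(p) = 1} by greedy allocation:
    start from the lower bounds, then pour the remaining
    probability mass into the cheapest states first."""
    p = list(lo)
    mass = 1.0 - sum(lo)
    for y in sorted(range(len(f)), key=lambda y: f[y]):
        extra = min(hi[y] - lo[y], mass)
        p[y] += extra
        mass -= extra
    return sum(p[y] * f[y] for y in range(len(f)))

def lower_prevision(f, LO, HI, steps):
    """Backward recursion for an imprecise Markov chain: apply the
    lower transition operator 'steps' times; entry x of the result is
    the lower expectation of f(X_steps) given X_0 = x."""
    for _ in range(steps):
        f = [lower_exp(f, LO[x], HI[x]) for x in range(len(f))]
    return f

# Two-state chain with interval-valued transition probabilities
# (rows index the current state; numbers are purely illustrative).
LO = [[0.4, 0.5], [0.2, 0.6]]
HI = [[0.5, 0.6], [0.4, 0.8]]
# Lower probability of being in state 0 after two steps:
print(lower_prevision([1.0, 0.0], LO, HI, 2))
```

The same recursion with a maximisation in place of the minimisation gives the upper expectation, so the two runs bracket every precise Markov chain compatible with the interval bounds.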

Erik Hoel

## Causal models, information theory, and emergence

Recent tools and methods have revealed that information theory can be used to analyze the structure of causal models. This provides a mathematical basis for the previously amorphous concept of emergence, by assessing how causal models of the same system can contain greater information at higher scales than lower ones.
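A minimal sketch of the underlying quantity, effective information (EI): the mutual information between a uniform ("maximum entropy") intervention distribution over causes and the resulting effect distribution of a transition matrix. The micro/macro example below is a standard toy construction, not taken from the talk: three micro states mix noisily among themselves while a fourth is fixed, and coarse-graining the noisy group into one macro state yields a deterministic model with higher EI.

```python
import math

def effective_info(tpm):
    """Effective information of a transition probability matrix, in
    bits: mutual information between a uniform intervention on the
    cause and the induced effect distribution."""
    n = len(tpm)
    # Effect distribution under the uniform intervention: average row.
    effect = [sum(row[j] for row in tpm) / n for j in range(n)]
    ei = 0.0
    for row in tpm:
        for j, p in enumerate(row):
            if p > 0:
                ei += (p / n) * math.log2(p / effect[j])
    return ei

# Micro model: states 0-2 are noisy among themselves, state 3 is fixed.
micro = [[1/3, 1/3, 1/3, 0]] * 3 + [[0, 0, 0, 1]]
# Macro model after grouping {0,1,2} -> A and {3} -> B: deterministic.
macro = [[1, 0], [0, 1]]
print(effective_info(micro), effective_info(macro))
```

Here the macro model attains the full 1 bit of EI while the micro model falls short of it, which is the sense in which the higher-scale causal model carries more information.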

Dominik Janzing

## Causality as a tool for merging joint distributions

If X,Y,Z are three sets of variables, the distributions P(X,Y) and P(Y,Z) do not uniquely determine the joint distribution P(X,Y,Z). One can argue that adding causal information can render the problem unique under certain conditions and assumptions. More generally speaking, causal information thus entails statistical relations between variables that have never been observed together, as already shown by [1]. This way, causal statements can be empirically tested without resorting to interventions. I will elaborate on this idea [2] and speculate whether this property even defines the essential empirical content of causality in cases where interventions are hard to define (e.g. if a disease is considered 'a consequence of one's Age', what is an intervention on the variable 'Age'?).
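As a minimal sketch of the merging step (with made-up numbers, and `merge` a hypothetical helper rather than the method of [2]): if the causal structure implies X ⟂ Z | Y, for instance a chain X → Y → Z, then the joint distribution is uniquely determined by P(x,y,z) = P(x,y) P(z|y), provided both input tables agree on the marginal of Y.

```python
def merge(pxy, pyz):
    """Compute P(x,y,z) = P(x,y) * P(z|y).  This is the unique joint
    distribution consistent with both inputs once the causal structure
    implies X ⟂ Z | Y (e.g. a chain X -> Y -> Z); the two input
    tables must agree on the marginal P(Y)."""
    py = [sum(row) for row in pyz]  # marginal of Y taken from P(Y,Z)
    return [[[pxy[x][y] * pyz[y][z] / py[y]
              for z in range(len(pyz[0]))]
             for y in range(len(py))]
            for x in range(len(pxy))]

# Toy binary example; both tables share the marginal P(Y) = (0.4, 0.6).
pxy = [[0.3, 0.2], [0.1, 0.4]]   # rows: x, columns: y
pyz = [[0.3, 0.1], [0.2, 0.4]]   # rows: y, columns: z
pxyz = merge(pxy, pyz)
```

Marginalising `pxyz` over Z recovers `pxy` exactly, and marginalising over X recovers `pyz`, so the predicted statistical relation between X and Z, which were never observed together, is the empirically testable content of the causal assumption.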

[1] Tsamardinos & Triantafillou: Towards integrative causal analysis of heterogeneous data sets and studies, JMLR 2012.

[2] Janzing: Causality as a tool for merging joint distributions. In preparation

David Krakauer

## Evolution: organic and cultural dimensions of individual information accumulation

The theory of evolution posits that the dominant causes of change and stasis are ancestry/history and selection/environment (to include the null case when selection disappears under drift). Change and stasis are measured in terms of time-dependent distributions over types (e.g. genotypes, phenotypes). Information theory provides a natural means of measuring evolutionary processes, gives us insight into appropriate least action principles, and suggests a measure of complexity. Furthermore, information theory can provide some insights into the natural scales of an evolving system (e.g. organism, population) and help to define the conditions under which evolution can occur in relation to secret or private adaptive information. When considering cultural evolution a number of these constraints are lifted, but a few essential features of information accumulation remain. I shall discuss these ideas in relation to the organic evolution of genomes, organisms, and cultural artifacts such as constitutions.

Sarah Marzen

## Designing lossy predictive sensors of memoryful environments

Much machine learning research today centers on recurrent neural networks that can predict their input. Here, we argue that we can draw inspiration from biology when designing such recurrent neural networks. We review theoretical and empirical reasons to believe that biological sensors form lossy predictive representations of sensory input. We then move on to some lessons for designing such biological sensors. First, we highlight the importance of choosing the right objective function. Then, we discuss whether or not large randomly wired sensors might be "good enough" at forming predictive representations.

Kun Zhang

## Causal discovery and data heterogeneity

Causal discovery aims to reveal the underlying causal model from observational data. Recently various types of independence, including conditional independence between observed variables and independence between causes and noise, have been exploited to solve this problem. In this talk I will show how causal discovery and latent variable learning can greatly benefit from heterogeneity or nonstationarity of the data: data heterogeneity improves the identifiability of the causal model, and allows us to identify the true causal model even in the presence of a large number of hidden variables that are causally related. I will then present a set of corresponding approaches to causal discovery, together with their pros and cons. Finally, I will discuss how the problem of causal discovery from nonstationary or heterogeneous data is related to human causal learning and human representation learning.