Causality detection based on information-theoretic approaches in time series analysis (Hlavackova-Schindler, Palus, Vejmelka & Bhattacharya 2007)
<bibtex>@article{hlaváčková2007causality,
title={Causality detection based on information-theoretic approaches in time series analysis}, author={Hlav{\'a}{\v{c}}kov{\'a}-Schindler, K. and Palu{\v{s}}, M. and Vejmelka, M. and Bhattacharya, J.}, journal={Physics Reports}, volume={441}, number={1}, pages={1--46}, year={2007}, publisher={Elsevier}
} </bibtex>
Great map to the literature. I found in it what I have been doing until now and why that is wrong. It was here that I finally sat down and thought through what it means, in the calculation of entropy, to weight each probability by its own log and take the sum.
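To pin that down, here is a minimal sketch (my own, not from the paper) of the discrete Shannon entropy: normalize the counts to probabilities, weight each probability by its own log, sum, and negate.

<source lang="python">
# Minimal sketch: discrete Shannon entropy from bin counts.
# Natural log would give nats; log2 gives bits.
import numpy as np

def shannon_entropy(counts):
    """Entropy H(X) in bits from a vector of bin counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()      # normalize counts to probabilities
    p = p[p > 0]         # convention: 0 * log(0) = 0
    return -np.sum(p * np.log2(p))

# A fair coin: shannon_entropy([50, 50]) -> 1.0 bit
</source>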
I got introduced to the Kullback-Leibler divergence (KLD) (1951) as an alternative to mutual information. "Mutual information is the KLD of the product P(X)P(Y) from the joint distribution P(X,Y)", as demonstrated in <bibtex>@article{gelman2003bayesian,
title={Bayesian data analysis. Texts in statistical science}, author={Gelman, A. and Carlin, J.B. and Stern, H.S. and Rubin, D.B.}, journal={Boca Raton (Florida): Chapman \& Hall/CRC Press}, volume={200}, pages={696}, year={2003}
} </bibtex>
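As a concrete reading of that sentence, here is a sketch (mine, not from either reference) that computes mutual information directly as the KLD of the product of marginals P(X)P(Y) from the joint P(X,Y), starting from a 2-D contingency table:

<source lang="python">
# Minimal sketch: I(X;Y) = KLD( P(X,Y) || P(X)P(Y) ), estimated from counts.
import numpy as np

def mutual_information(joint_counts):
    """I(X;Y) in bits from a 2-D table of joint counts."""
    pxy = np.asarray(joint_counts, dtype=float)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal P(X), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal P(Y), row vector
    prod = px @ py                        # product distribution P(X)P(Y)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / prod[mask]))

# Independent X and Y give ~0; a deterministic relation gives I = H(X).
</source>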
By the definition of the "norm of the mutual information" I asked, "Is this the average MI added in per time step in the window?"
At the bottom of p 190 is the discussion of why I should be using transfer entropy instead of mutual information.
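My reading of the argument: mutual information is symmetric and ignores what X's own past already explains, while transfer entropy conditions on it. A rough histogram estimator, assuming first-order histories (k = l = 1), which is my simplification rather than anything prescribed by the paper:

<source lang="python">
# Minimal sketch: transfer entropy T(Y -> X) = I(X_{n+1}; Y_n | X_n),
# estimated with a naive joint histogram and first-order histories.
import numpy as np

def transfer_entropy(x, y, bins=8):
    """Histogram estimate of T(Y -> X) in bits, with k = l = 1."""
    x_next, x_past, y_past = x[1:], x[:-1], y[:-1]
    joint, _ = np.histogramdd((x_next, x_past, y_past), bins=bins)
    p_xyz = joint / joint.sum()           # p(x_{n+1}, x_n, y_n)
    p_xz  = p_xyz.sum(axis=2)             # p(x_{n+1}, x_n)
    p_z   = p_xyz.sum(axis=(0, 2))        # p(x_n)
    p_zy  = p_xyz.sum(axis=0)             # p(x_n, y_n)
    te = 0.0
    for i, j, k in zip(*np.nonzero(p_xyz)):
        num = p_xyz[i, j, k] * p_z[j]
        den = p_xz[i, j] * p_zy[j, k]
        te += p_xyz[i, j, k] * np.log2(num / den)
    return te
</source>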
p. 196 discusses partitioning and other approaches besides the naive binning I have been using.
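One alternative to fixed-width bins that comes up in this literature is marginal equiquantization: place the bin edges at quantiles so every bin holds roughly the same number of samples. A sketch of that idea (my own, details not taken from the paper):

<source lang="python">
# Minimal sketch: marginal equiquantization, i.e. equal-occupancy bins,
# as an alternative to the fixed-width bins I have been using.
import numpy as np

def equiquantal_labels(x, n_bins=8):
    """Map a 1-D series to integer bin labels with roughly equal occupancy."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior edges
    return np.searchsorted(edges, x)      # labels in 0 .. n_bins-1

# Fixed-width bins for comparison:
#   np.digitize(x, np.linspace(x.min(), x.max(), n_bins + 1)[1:-1])
</source>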
It then talks about learning methods, and then about kernel methods, which I am clueless about.
The paper wraps up, as promised, with a discussion of Granger causality, briefly: "for a pair of stationary, weakly dependent, bivariate time series X, Y, Y is a Granger cause of X if the distribution of X, given past observations of X and Y, differs from the distribution of X given past observations of X only."
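In regression terms (my own gloss, not the paper's notation): fit X's next value from its own past alone, then from its own past plus Y's past; if adding Y's past improves the prediction, Y Granger-causes X. A toy version with order-1 models:

<source lang="python">
# Minimal sketch: Granger causality as "does adding Y's past improve
# the prediction of X?", with order-1 linear models and no significance
# test (a real analysis would pick the lag order and use an F-test).
import numpy as np

def granger_improvement(x, y):
    """Fractional reduction in residual variance when Y's past is added."""
    x_next, x_past, y_past = x[1:], x[:-1], y[:-1]
    ones = np.ones_like(x_past)
    restricted = np.column_stack([ones, x_past])            # past of X only
    full       = np.column_stack([ones, x_past, y_past])    # plus past of Y
    def rss(A):
        beta = np.linalg.lstsq(A, x_next, rcond=None)[0]
        return np.sum((x_next - A @ beta) ** 2)
    return (rss(restricted) - rss(full)) / rss(restricted)

# Near 0: Y's past adds nothing.  Clearly positive: evidence that Y Granger-causes X.
</source>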
"Theoretically, for a good entropy estimator, the condition of consistency seems to be important." I have no idea what this means, but it looks important, no?
Bibliography
<bibtex>@article{schreiber2000measuring,
title={Measuring information transfer}, author={Schreiber, T.}, journal={Physical Review Letters}, volume={85}, number={2}, pages={461--464}, year={2000}, publisher={APS}
} </bibtex> Schreiber walks through conditional mutual information as a Markov process.
<bibtex>@article{hlaváčková2007causality,
title={Causality detection based on information-theoretic approaches in time series analysis}, author={Hlav{\'a}{\v{c}}kov{\'a}-Schindler, K. and Palu{\v{s}}, M. and Vejmelka, M. and Bhattacharya, J.}, journal={Physics Reports}, volume={441}, number={1}, pages={1--46}, year={2007}, publisher={Elsevier}
} </bibtex>
- shows that transfer entropy is equivalent to conditional mutual information
<bibtex>@article{paluš1996coarse,
title={Coarse-grained entropy rates for characterization of complex time series}, author={Palu{\v{s}}, M.}, journal={Physica D: Nonlinear Phenomena}, volume={93}, number={1-2}, pages={64--77}, year={1996}, publisher={Elsevier}
} </bibtex>
- "In order to obtain an asymptotic entropy estimate onf an m-dimensiona dynamical system, large amounts of data are necessary. To avoid this, Palus proposed to compute "course-grained entropy rates" as relative measures of "information creation" and of regularity and predictability of studied processes"