Revision as of 16:06, 2 March 2009

In [[Bayesian probability]], the '''Jeffreys prior''', named after [[Harold Jeffreys]], is a non-informative (objective) prior distribution on parameter space that is proportional to the square root of the determinant of the [[Fisher information]] matrix:

<math>p(\vec\theta) \propto \sqrt{\det \mathcal{I}(\vec\theta)}.</math>

It has the key feature that it is invariant under reparameterization of the parameter vector <math>\vec\theta</math>. In particular, for an alternate parameterization <math>\vec\varphi</math> we can derive

<math>p(\vec\varphi) \propto \sqrt{\det \mathcal{I}(\vec\varphi)}</math>

from

<math>p(\vec\theta) \propto \sqrt{\det \mathcal{I}(\vec\theta)}</math>

using the change of variables theorem, the definition of Fisher information, and the fact that the product of determinants is the determinant of the matrix product:

<math>p(\vec\varphi) = p(\vec\theta)\left|\det\frac{\partial\theta_i}{\partial\varphi_j}\right| \propto \sqrt{\det \mathcal{I}(\vec\theta)}\,\left|\det\frac{\partial\theta_i}{\partial\varphi_j}\right| = \sqrt{\det\!\left[\left(\frac{\partial\theta_i}{\partial\varphi_j}\right)^{\mathsf T} \mathcal{I}(\vec\theta)\,\frac{\partial\theta_i}{\partial\varphi_j}\right]} = \sqrt{\det \mathcal{I}(\vec\varphi)}.</math>

In the simpler case of a single parameter we can derive

<math>p(\varphi) = p(\theta)\left|\frac{d\theta}{d\varphi}\right| \propto \sqrt{\mathcal{I}(\theta)}\,\left|\frac{d\theta}{d\varphi}\right| = \sqrt{\mathcal{I}(\theta)\left(\frac{d\theta}{d\varphi}\right)^2} = \sqrt{\mathcal{I}(\varphi)}.</math>

From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, such as those obtained through a limit in conjugate families of distributions, is that it does not depend on the set of parameter variables chosen to describe parameter space.

However, generally, the Jeffreys prior violates the [[likelihood principle]] in that inferences about <math>\vec\theta</math> depend upon more than just the probability of the observed data as a function of <math>\vec\theta</math>. That is, also relevant for inferences is the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over the chosen universe.

==Logarithmic prior==
For the parameter space of positive numbers (<math>0 < \theta < \infty</math>), the Jeffreys prior is also known as the '''logarithmic prior''', as it can be described as the [[uniform distribution]] on the [[log scale]]: it is the unique (up to a constant multiple) prior on the positive reals that is ''scale''-invariant. It can also be described as the distribution such that <math>p(\theta) \propto 1/\theta</math>.

As with the infinite uniform "distribution", which is the unique ''translation''-invariant distribution on the reals, it is an [[improper prior]].
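The invariance under reparameterization can be illustrated with a standard single-parameter example. For an [[exponential distribution]] with rate parameter <math>\lambda</math>, the log-likelihood of one observation is <math>\log\lambda - \lambda x</math>, so

<math>\mathcal{I}(\lambda) = \operatorname{E}\!\left[-\frac{\partial^2}{\partial\lambda^2}\left(\log\lambda - \lambda x\right)\right] = \frac{1}{\lambda^2},</math>

and the Jeffreys prior is the scale-invariant <math>p(\lambda) \propto 1/\lambda</math>. Reparameterizing by the mean <math>\mu = 1/\lambda</math> gives

<math>p(\mu) = p(\lambda)\left|\frac{d\lambda}{d\mu}\right| \propto \mu \cdot \frac{1}{\mu^2} = \frac{1}{\mu},</math>

which is exactly the form obtained by computing the Fisher information directly in terms of <math>\mu</math>.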
==References==
- Jeffreys, H. (1946). "An Invariant Form for the Prior Probability in Estimation Problems". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 186 (1007): 453–461. doi:10.1098/rspa.1946.0056.
- Jeffreys, H. (1939). Theory of Probability. Oxford University Press.