TY - GEN
T1 - Information bounds for the risk of Bayesian predictions and the redundancy of universal codes
AU - Barron, Andrew
AU - Clarke, Bertrand
AU - Haussler, David
PY - 1993
Y1 - 1993
AB - Several diverse problems have solutions in terms of an information-theoretic quantity for which we examine the asymptotics. Let Y_1, Y_2, ..., Y_N be a sample of random variables with distribution depending on a (possibly infinite-dimensional) parameter θ. The maximum of the mutual information I_N = I(θ; Y_1, Y_2, ..., Y_N) over choices of the prior distribution of θ provides a bound on the cumulative Bayes risk of prediction of the sequence of random variables for several choices of loss function. This same quantity is the minimax redundancy of universal data compression and the capacity of certain channels. General bounds for this mutual information are given. A special case concerns the estimation of binary-valued functions with Vapnik-Chervonenkis dimension d_VC, for which the information is bounded by d_VC log N. For smooth families of probability densities with a Euclidean parameter of dimension d, the information bound is (d/2) log N plus a constant. The prior density proportional to the square root of the Fisher information determinant is the unique continuous density that achieves a mutual information within o(1) of the capacity for large N. The Bayesian procedure with this prior is asymptotically minimax for the cumulative relative entropy risk.
UR - http://www.scopus.com/inward/record.url?scp=0027307823&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027307823&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0027307823
SN - 0780308786
T3 - Proceedings of the 1993 IEEE International Symposium on Information Theory
SP - 54
BT - Proceedings of the 1993 IEEE International Symposium on Information Theory
PB - IEEE
T2 - Proceedings of the 1993 IEEE International Symposium on Information Theory
Y2 - 17 January 1993 through 22 January 1993
ER -