TY - GEN

T1 - Information bounds for the risk of Bayesian predictions and the redundancy of universal codes

AU - Barron, Andrew

AU - Clarke, Bertrand

AU - Raussler, David

PY - 1993

Y1 - 1993

N2 - Several diverse problems have solutions in terms of an information-theoretic quantity for which we examine the asymptotics. Let Y1, Y2,..., YN be a sample of random variables with distribution depending on a (possibly infinite-dimensional) parameter θ. The maximum of the mutual information IN = I(θ; Y1, Y2,..., YN) over choices of the prior distribution of θ provides a bound on the cumulative Bayes risk of prediction of the sequence of random variables for several choices of loss function. This same quantity is the minimax redundancy of universal data compression and the capacity of certain channels. General bounds for this mutual information are given. A special case concerns the estimation of binary-valued functions with Vapnik-Chervonenkis dimension dvc, for which the information is bounded by dvc log N. For smooth families of probability densities with a Euclidean parameter of dimension d, the information bound is (d/2) log N plus a constant. The prior density proportional to the square root of the Fisher information determinant is the unique continuous density that achieves a mutual information within o(1) of the capacity for large N. The Bayesian procedure with this prior is asymptotically minimax for the cumulative relative entropy risk.

AB - Several diverse problems have solutions in terms of an information-theoretic quantity for which we examine the asymptotics. Let Y1, Y2,..., YN be a sample of random variables with distribution depending on a (possibly infinite-dimensional) parameter θ. The maximum of the mutual information IN = I(θ; Y1, Y2,..., YN) over choices of the prior distribution of θ provides a bound on the cumulative Bayes risk of prediction of the sequence of random variables for several choices of loss function. This same quantity is the minimax redundancy of universal data compression and the capacity of certain channels. General bounds for this mutual information are given. A special case concerns the estimation of binary-valued functions with Vapnik-Chervonenkis dimension dvc, for which the information is bounded by dvc log N. For smooth families of probability densities with a Euclidean parameter of dimension d, the information bound is (d/2) log N plus a constant. The prior density proportional to the square root of the Fisher information determinant is the unique continuous density that achieves a mutual information within o(1) of the capacity for large N. The Bayesian procedure with this prior is asymptotically minimax for the cumulative relative entropy risk.

UR - http://www.scopus.com/inward/record.url?scp=0027307823&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027307823&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0027307823

SN - 0780308786

T3 - Proceedings of the 1993 IEEE International Symposium on Information Theory

SP - 54

BT - Proceedings of the 1993 IEEE International Symposium on Information Theory

PB - Publ by IEEE

T2 - Proceedings of the 1993 IEEE International Symposium on Information Theory

Y2 - 17 January 1993 through 22 January 1993

ER -