Features a broad introduction to recent research on Turing’s formula and presents modern applications in statistics, probability, information theory, and other areas of modern data science
Turing's formula is perhaps the only known method for estimating underlying distributional characteristics beyond the range of observed data without making any parametric or semiparametric assumptions. This book presents a clear introduction to Turing's formula and its connections to statistics. It covers topics relevant to a variety of fields, including information theory, statistics, probability, computer science (including artificial intelligence and machine learning), big data, biology, ecology, and genetics. The author examines many core statistical issues within modern data science from Turing's perspective. A systematic approach to long-standing problems such as entropy and mutual information estimation, diversity index estimation, domains of attraction on general alphabets, and tail probability estimation is presented in light of the most up-to-date understanding of Turing's formula. With numerous exercises and examples throughout, the book summarizes the known properties of Turing's formula and explains how and when it works well; discusses the approach derived from Turing's formula for estimating a variety of quantities, which come mainly from information theory but are also important for machine learning and ecological applications; and uses Turing's formula to estimate certain heavy-tailed distributions.
In summary, this book:
• Features a unified and broad presentation of Turing’s formula, including its connections to statistics, probability, information theory, and other areas of modern data science
• Provides a presentation on the statistical estimation of information theoretic quantities
• Demonstrates, from Turing's perspective, the estimation of several statistical functions, such as Simpson's indices, Shannon's entropy, general diversity indices, mutual information, and Kullback–Leibler divergence
• Includes numerous exercises and examples throughout with a fundamental perspective on the key results of Turing’s formula
Statistical Implications of Turing's Formula is an ideal reference for researchers and practitioners who need a review of the many critical statistical issues of modern data science. This book is also an appropriate learning resource for biologists, ecologists, and geneticists who work with the concept of diversity and its estimation, and it can be used as a textbook for graduate courses in mathematics, probability, statistics, computer science, artificial intelligence, machine learning, big data, and information theory.
Zhiyi Zhang, PhD, is Professor of Mathematics and Statistics at The University of North Carolina at Charlotte. He is an active consultant in both industry and government on a wide range of statistical issues, and his current research interests include Turing's formula and its statistical implications; probability and statistics on countable alphabets; nonparametric estimation of entropy and mutual information; tail probability and biodiversity indices; and applications involving extracting statistical information from low-frequency data space. He earned his PhD in Statistics from Rutgers University.
Keywords: Turing's formula; statistics; probability; information theory; data science; computer science; artificial intelligence; machine learning; big data; biology; ecology; genetics; Simpson's indices; Shannon's entropy; general diversity indices; mutual information; Kullback–Leibler divergence; Data Analysis; Methods & Statistics in Ecology