Sir David Roxbee Cox was born on 15 July 1924 in Birmingham, United Kingdom, where he attended Handsworth Grammar School. (The aeronautical engineer Harold R. Cox was a distant cousin.) David received his Master of Arts in mathematics at St John’s College, Cambridge. Referring to his time as a student in Cambridge, he very often mentioned Harold Jeffreys, and to a lesser extent J. O. Irwin. Sir Harold Jeffreys, FRS (1891–1989) was a mathematician, statistician, geophysicist and astronomer. His book Theory of Probability (1939) discussed the objective Bayesian view of probability, which Sir David referred to even in his last seminars, see also . Jeffreys was also involved in mathematical physics, David Cox’s favorite subject in his very early years; Cox eventually concentrated on statistics in the early 1950s. As for Joseph Oscar Irwin (1898–1982), he was a key figure in the middle third of the 20th century, linking theoretical statistics to applications in medicine, an area that Sir David respected throughout his research career. Moreover, Irwin was one of the very few statisticians who worked with both Pearson and Fisher and was able to maintain cordial relations with these strong personalities of 20th-century statistics . I certainly think that both Jeffreys and Irwin influenced Sir David’s statistical line of thought in his later important work. David Cox obtained his PhD from the University of Leeds in 1949, supervised by Prof. Henry Daniels, FRS, and Prof. Bernard Welch, a founder of the Industrial and Agricultural Research Section of the RSS. His dissertation was entitled “Theory of Fibre Motion”. Below is a list of milestones in his career:
Royal Aircraft Establishment;
Wool Industries Research Association of Science and Technology;
Assistant Lecturer in Mathematics, University of Cambridge;
Visiting University of North Carolina, Princeton, and Berkeley;
Reader in Statistics, Birkbeck College, London;
Professor of Statistics, Birkbeck College, London;
Member of Technical Staff, Bell Laboratories;
Professor of Statistics, Imperial College of Science and Technology, London;
Head of Department of Mathematics, Imperial College;
SERC Senior Research Fellow;
Warden, Nuffield College, Oxford;
Honorary Fellow of Nuffield College.
David Cox married Joyce Drummond in 1947, and they had four children. He was knighted in 1985 and received the Copley Medal, the Royal Society’s highest award, in 2010.
David Cox served as President of the Royal Statistical Society (1980–82) and of the International Statistical Institute (1995–97). In this capacity I had the honor to meet him at the 51st Session of the ISI in Istanbul and to discuss in detail the satellite conference on Industrial Statistics we held in Athens ; this was one of several occasions on which I met him, in various countries. I still remember that discussion and the comments and questions he raised when I listed the papers presented at the Athens satellite conference. Later I became aware of the contributions to industry he had made in his first posts, at the Royal Aircraft Establishment and the Wool Industries Research Association. At that time, in 1949, he published his first two papers  (part of his doctoral dissertation, related to industrial problems), and the discussion of quality control ideas . In 1998 he visited Greece, where the Department of Statistics of the University of Business and Economics awarded him an honorary doctorate. A complete list of about 384 publications of Sir David Cox can be found on the internet.
David Cox was doctoral advisor to several distinguished statisticians, among them David Hinkley (with whom he published the book Theoretical Statistics in 1974), Peter McCullagh (who received the 1983 Karl Pearson Prize of the ISI) and Henry Wynn (in design theory; in 1977 Wynn became the first RSS president elected by a contested vote). Sir David authored a great number of pioneering works, offering an elegant statistical background and appropriate solutions to real-life problems. Most of us have worked with a range of his concepts and methods, including the Cox process, Cox models and Cox’s direction. Cox’s 1972 survival analysis paper accounted for over 26 % of the citations to papers in Series B of the Journal of the Royal Statistical Society, that is, more than 50 000 citations! He was awarded the International Prize in Statistics, recognizing him specifically for his 1972 paper , in which he developed the proportional hazards model that today bears his name, and which changed the way we understand and analyze risk factors.
We shall try to provide here a compact review of his work, or at least of the part that has received a great number of citations and covers different fields of statistics.
1 Experiment design – regression
Following the line of thought of  and his pioneering work, Sir David worked in his early research on the book Planning of Experiments , one of his favorite texts. The book is devoted to all sorts of experimental design models, and although there are discussions on error reduction, it does not contain an optimal design approach of the kind treated in  by S. D. Silvey, a close colleague of Sir David, or later by his student H. P. Wynn in . The experiment design point of view was also discussed, among several very helpful statistical ideas for the cancer problem, in the papers [48, 49], which account for over 2000 citations. Cox came back to the design of experiments and regression in the paper , written for the 150th anniversary of the RSS, with a list of 22 essential points stated in an appendix, points that offered vital lines of thought for the interested researcher. It was, I think, while trying to address point 22, concerned with “the prediction (via intervals) of future values”, that I came across the idea that for applications, and for “future observations”, it is better to adopt tolerance intervals rather than confidence intervals, see [45, 46]. Adopting regression and working on the general definition of residuals, Cox and E. J. Snell  presented an application of their method to a nonlinear model for leukemia data, where crude and modified residuals are evaluated. On the subject of regression, the paper  offers what are, in my opinion, very nice “miscellaneous and isolated comments”. The paper  can be considered as a continuation of their existing joint work, treating the problem of variable selection. Therein a series of criteria are mentioned, especially Mallows’s statistic, recommending the general points a researcher should follow.
The medical line of thought, for the practical applications, is also present in this paper: the relation between time to death and the level of some prescribed “dose” is considered, and the possible analyses and classification of variables such as age and sex are discussed. The problem of selection of variables in linear regression was essential at that time (see, e.g., Hocking ), and appropriate “routines” were discussed in . Working with mixture experiments, Cox presented new such models in . In principle, the centroid of a constrained region is the reference mixture, while the effect of the $i$-th component is measured along a line connecting this centroid to the corresponding vertex ; it is the direction of this line that is known as Cox’s direction. I believe that all this demonstrates an essential characteristic of D. R. Cox’s scientific life: he was present, during all his active years, with his own contributions to various problems, at the right time, offering new ideas and clarifying existing ones. Sir David returned to experiment design theory in the book , this time with a modern notation, discussing recent methods (in Chapter 8), nonlinear design, and optimal designs (Chapter 7). Although the spirit of  was preserved, the presentation of the work is different, with the addition of the new ideas that had emerged since then.
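The geometry of Cox’s direction described above can be illustrated with a short sketch. It assumes the usual convention for mixture experiments: when the proportion of one component is changed, the remaining components keep the relative proportions they have in the reference mixture, so the point moves along the line joining the reference mixture to the vertex of that component. The function name and the example mixture are illustrative only and are not taken from Cox’s paper.

```python
def move_along_cox_direction(reference, i, delta):
    """Shift component i of a reference mixture by delta along Cox's direction.

    The other components are rescaled so that they keep the relative
    proportions they have in the reference mixture and the total stays 1.
    """
    tol = 1e-12
    if abs(sum(reference) - 1.0) > 1e-9:
        raise ValueError("reference proportions must sum to 1")
    if reference[i] >= 1.0:
        raise ValueError("reference mixture must not already be the vertex")
    new_xi = reference[i] + delta
    if not -tol <= new_xi <= 1.0 + tol:
        raise ValueError("shifted proportion must stay within [0, 1]")
    # Common scaling factor applied to every component other than i.
    scale = (1.0 - new_xi) / (1.0 - reference[i])
    return [new_xi if j == i else s * scale for j, s in enumerate(reference)]


if __name__ == "__main__":
    reference = [0.2, 0.3, 0.5]          # hypothetical reference mixture
    for delta in (-0.1, 0.0, 0.1, 0.8):  # delta = 0.8 reaches the vertex
        print(delta, move_along_cox_direction(reference, 0, delta))
```

Evaluating a fitted response along such a path, one component at a time, gives the effect of that component measured relative to the reference mixture.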
2 Survival analysis – binary data
The sigmoid curve is known as the “logistic curve”, due to Adolphe Quetelet’s student Pierre François Verhulst (1804–1849). It was J. Berkson who devoted his statistical work to “logit models” , following Bliss’s pioneering work  on “probit models”, and then later D. Finney coined the term bioassay .
The binary response problem was extensively discussed by Cox . Later, in , he cemented the theory of the binary response problem, so useful in biostatistics and crucial in data analysis. In this way a systematic and strong framework was constructed for binary data, analogous to the least squares method and extending probit analysis. The “covariate paper”  is concerned with the existence of the MLE in the binary response problems treated in the Analysis of Binary Data . Some years later an improved version of the 1969 book was published . As computing technology was changing rapidly, binary analysis became increasingly popular. Going from one variable to two, the problem can be simply described as follows.
Let $Y_1$ and $Y_2$ be two dependent Bernoulli variables. Let $x$ be a covariate associated to the distribution of $Y_1$ and $Y_2$. In  Sir David works within the framework of the analysis of multivariate binary data, adopts logistic models (see [23, Table 2]), and views as a special case the joint distribution of $(Y_1, Y_2)$. The possible outcomes are $(0, 0)$, $(0, 1)$, $(1, 0)$, $(1, 1)$. Eventually the bivariate distribution of $(Y_1, Y_2)$ can be expressed as a product of (ordinary) logistic functions, and thus the likelihood function and the information matrix can be evaluated (see also ).
The year 1972 was crucial for David Cox’s research. He presented important results in [22, 23], which changed the way of thinking about what a risk factor is in survival analysis. This paved the way for powerful scientific research and discoveries, which had a lasting impact on human health worldwide. He introduced statistics in medical applications that J. O. Irwin (see the introduction) could not even have imagined! His mark on research is so great that his 1972 paper is one of the three most-cited papers in statistics and ranked 16th in Nature’s list of the top 100 most-cited papers of all time for all fields. So indeed 1972 was a golden year for Sir David. Since then we have referred to Cox’s proportional hazards model. We shall refer to his paper for the essential newly proposed relation (9) therein: the typical hazard function $\lambda(t)$ is, in principle, specified by the assumed probability model to identify etiological agents for the risk problem under investigation. Let us suppose, at first, that two explanatory variables, $x_1$ and $x_2$, say, are of interest, and that these do not vary with time. We can assume that $\lambda(t)$ is a linear function of $x_1$ and $x_2$, $\lambda(t) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, say. Recall that we assume that $\lambda(t) \geq 0$, and this might not be the case for the postulated linear function. If it is assumed that the function is linear, with an extra linear term to account for time, there are still problems, even if the parameters can be estimated. Not only is it difficult to define how the hazard function depends on time, but if it is also assumed to be non-monotonic (neither increasing nor decreasing with time), then it is difficult to find an appropriate explicit function to include in the model. The Cox proportional hazards model provides the solution. It defines $\lambda_i(t)$, with $\mathbf{x}_i$, $i = 1, \dots, n$, the vector of covariates associated to each individual $i$, as

$$\lambda_i(t) = \lambda_0(t) \exp(\beta^{\top} \mathbf{x}_i). \tag{1}$$

Here $\beta$ is a vector of unknown parameters, and $\lambda_0(t)$ an unknown function, which provides the hazard function at $\mathbf{x}_i = 0$, known as the baseline function. The above relation (1) is revolutionary. In Sir David’s words: “My model is used to compute the probability of anything from earthquakes to bankruptcies”. Definition (1) and the related theory and computations are widely used in the analysis of survival data. They enable researchers to easily identify the risks of specific factors for mortality. Certainly the model can be applied to other “survival outcomes”, as in electronics among groups of materials with disparate characteristics, or in economics when risk factors are under investigation. The whole analysis is based on maximum likelihood, as all his work is “Fisherian”. It is remarkable how he treats the likelihood here, ignoring some of its terms.
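The remark about ignoring some terms of the likelihood can be made concrete with a minimal sketch of the partial log-likelihood. It assumes right-censored data with no tied event times; each observed event contributes the ratio of its own relative risk to the sum of the relative risks of all individuals still under observation, so the unknown baseline function cancels. The function name and data layout are illustrative assumptions, not code from any of the cited works.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood of a proportional hazards model (no tied times).

    beta       : list of regression coefficients
    times      : observed times (event or censoring), one per individual
    events     : 1 if the time is an observed event, 0 if censored
    covariates : list of covariate vectors, one per individual
    """
    # Linear predictor beta' x_i for each individual.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals only appear in risk sets
        # Risk set: everyone whose observed time is at least t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk_sum)
    return loglik


if __name__ == "__main__":
    # Hypothetical toy data: three individuals, one covariate.
    times = [1.0, 2.0, 3.0]
    events = [1, 0, 1]
    covariates = [[0.5], [1.0], [-1.0]]
    for b in (-1.0, 0.0, 1.0):
        print(b, cox_partial_loglik([b], times, events, covariates))
```

Maximizing this function over the coefficients, for instance by Newton's method, gives the estimates without ever specifying the baseline function.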
3 Stochastic processes
Although David Cox agreed (see ) that there is too much in his paper , “Doubly stochastic Poisson process, all sorts of tests to do with empirical series, of points events …”, this paper is certainly the first to be mentioned among his contributions to the field of stochastic processes. Most of these ideas were present in his doctoral dissertation, while his interest in queues (see ) originated from his work in the textile industry. Today some people regard queueing theory as a branch of operational research, but nobody denies that it is inextricably linked to stochastic processes, including the adoption of Kendall’s notation, introduced in his excellent work .
The realistic line of thought, rather than the technicalities, is clear in the two papers [12, 10], published in the same proceedings volume, where it is shown how a non-Markov process can be built into a Markov process. David Cox remained faithful to “the spirit of Bartlett’s great masterpiece , which is a difficult read, but not because of an overelaborate mathematical formalism” . The covariance counting problem in physics was successfully tackled as a stochastic process . We recall the pioneering work of Maurice Bartlett (1910–2002), devoted mainly to the analysis of data with spatial and temporal patterns; he is also known for Bartlett’s method in the analysis of time series. In his book on stochastic processes , Bartlett summarizes all his work on the subject, and Sir David refers to it as a “masterpiece”. Bartlett sometimes criticized Fisher, but he was a pioneer in the field (see also ). Sir David expressed in  the opinion that one can study stochastic processes without a heavy mathematical background and without adopting an overelaborate mathematical formalism, even for renewal theory [17, 30].
I feel that there are times when mathematical technicalities are not helpful at all to experimentalists, and so Cox’s line of thought is well accepted. Still, there are cases, like the theories of stochastic birth-death processes, where technicalities can be useful for modeling processes of carcinogenesis. Moreover, since the pioneering seminar of Karl Pearson in 1896 on “Regression, Heredity and Panmixia”, linear algebra has become an important tool in statistics. Then a new chance was offered for more mathematical methods to enter statistical theory, and some indeed proved useful. In principle, I believe, it takes time for a mathematically oriented idea to be absorbed in practical problems, if it is adopted at all.
4 The separate families problem
A very interesting problem, known as the “separate families of hypotheses”, was introduced in [15, 16]. Cox returned to this problem later, in . A compact formulation of the problem reads as follows: Let $X_i$, $i = 1, \dots, n$, be independent identically distributed (i.i.d.) random variables from a population with density function $f$. Let $\alpha$ be a parameter with values in a parameter space $\Omega_\alpha$, and $\beta$ be a parameter with values in a different parameter space $\Omega_\beta$. Consider the distribution functions $F_\alpha$ and $G_\beta$ associated with the parameter spaces $\Omega_\alpha$ and $\Omega_\beta$, respectively, as well as the resulting families of distributions $\mathcal{F} = \{F_\alpha : \alpha \in \Omega_\alpha\}$ and $\mathcal{G} = \{G_\beta : \beta \in \Omega_\beta\}$. It is assumed that all the distribution functions are associated with the same baseline measure. The problem is to test, under smoothness conditions on $F_\alpha$ and $G_\beta$,

$$H_F : \text{the distribution belongs to } \mathcal{F} \quad \text{versus} \quad H_G : \text{the distribution belongs to } \mathcal{G}.$$

The method is applicable for the “one-hit” or the “two-hit” models in the binary response theory , so essential for statistics problems concerned with cancer. The paper  considered the difference of the log-likelihoods for $H_F$ and $H_G$, denoted $L_{FG}$, with $\hat{\alpha}$ being the estimate of the parameter $\alpha$, and the expected value of $L_{FG}$ with respect to $F_{\hat{\alpha}}$, $E_{\hat{\alpha}}(L_{FG})$ say. Thus, the paper worked with the test statistic $T_F = L_{FG} - E_{\hat{\alpha}}(L_{FG})$. It was really a very interesting line of thought, based on fundamental statistical principles.
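The construction of the test statistic described above, the difference of the maximized log-likelihoods minus its expected value under the first family, can be sketched numerically. The example below is an illustration under stated assumptions and not Cox’s own derivation: the two separate families are taken to be the exponential and the log-normal, and the expectation, which Cox obtained analytically, is replaced here by a Monte Carlo average over samples simulated from the fitted exponential model. All function names are hypothetical.

```python
import math
import random


def exp_loglik_at_mle(xs):
    """Maximized log-likelihood of the exponential family (rate = 1 / mean)."""
    n = len(xs)
    rate = n / sum(xs)
    return n * math.log(rate) - rate * sum(xs)


def lognormal_loglik_at_mle(xs):
    """Maximized log-likelihood of the log-normal family."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n


def loglik_difference(xs):
    """Difference of maximized log-likelihoods, exponential minus log-normal."""
    return exp_loglik_at_mle(xs) - lognormal_loglik_at_mle(xs)


def centred_difference(xs, n_sim=500, seed=0):
    """Observed difference minus its simulated mean under the fitted exponential."""
    rng = random.Random(seed)
    n = len(xs)
    rate = n / sum(xs)
    simulated = [
        loglik_difference([rng.expovariate(rate) for _ in range(n)])
        for _ in range(n_sim)
    ]
    return loglik_difference(xs) - sum(simulated) / n_sim


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.lognormvariate(0.0, 0.5) for _ in range(50)]  # toy data
    print(centred_difference(sample))
```

A markedly negative value indicates that the exponential family fits worse than would be expected if it were true; turning this into a formal test also requires the variance of the statistic, which is not computed in this sketch.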
5 Other fields
The concepts of marginal and conditional likelihood were clarified by David Cox in the paper , where he also treated the likelihood of the proportional hazards model and proved that it fits the partial likelihood definition he proposed. He also worked on the concept of likelihood in , where he proved that maximum likelihood estimation in a simple model retains high efficiency in the presence of modest amounts of overdispersion.
Dealing with sequential likelihood ratio tests, David Cox proposed in , under mild assumptions, an easy-to-handle method based on an approximation providing numerical evaluations. Moreover, in , he devised a unified method under which sequential tests can be obtained for composite hypotheses. Therein he considered the problem of discriminating between the hypotheses $H_0$ and $H_1$ concerning two different Bernoulli trials. There are several papers based on ; I think  tried to offer a mathematical justification for this excellent paper on sequential analysis, where the calculations eventually establish the validity of the main argument of sequential analysis, namely that there is a saving in sampling units, and provide a discussion of sufficiency and invariance.
An interesting contribution to sampling is the two-stage sampling , which provided food for thought for a two-stage optimal experimental design , while the sequential procedures offered a solution to the estimation problem for nonlinear optimal experimental designs. His contribution to time series is not limited to the paper , very rich in ideas; we should also mention , where the trend is investigated and a smooth parametric function of time, involving known constants and unknown parameters, is introduced. Importantly, in  a point process which is a generalization of a Poisson process, also known as a Cox process, was introduced. It is interesting that David Cox uses the same notation, lambda, for this function, as in the proportional hazards model (see ).
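The idea of a Cox process mentioned above, a Poisson process whose rate is itself random, can be illustrated with a small simulation. The sketch below assumes the simplest special case, in which the rate is drawn once from a gamma distribution and then held constant over the observation window (a mixed Poisson process); the gamma choice and all names are illustrative assumptions.

```python
import random


def simulate_cox_process(horizon, shape, scale, rng):
    """Simulate event times on [0, horizon) for a Poisson process whose
    rate is drawn from a gamma(shape, scale) distribution."""
    rate = rng.gammavariate(shape, scale)
    events = []
    t = rng.expovariate(rate)  # waiting time to the first event
    while t < horizon:
        events.append(t)
        t += rng.expovariate(rate)  # exponential gaps given the drawn rate
    return rate, events


if __name__ == "__main__":
    rng = random.Random(2022)
    counts = [len(simulate_cox_process(10.0, 2.0, 1.0, rng)[1]) for _ in range(2000)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    # For an ordinary Poisson process the two numbers would be close;
    # the random rate makes the counts overdispersed.
    print("mean count:", mean, "variance of counts:", var)
```

The excess of the variance over the mean is the overdispersion that distinguishes such a process from an ordinary Poisson process.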
There is no textbook on regression analysis that does not refer to the Box and Cox transformations, and to the masterpiece source  for teaching all sorts of statistics subjects to graduate students. Both J. Tukey and M. S. Bartlett, in a discussion of , stated: “the authors have made a major step forward”. It is indeed a marvelous contribution, widely adopted, especially in applications.
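For readers who have not met it, the Box and Cox transformation mentioned above is the power family $(y^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$, with $\log y$ as the limiting case $\lambda = 0$, defined for positive $y$. A minimal sketch follows; the function name is illustrative, and the choice of $\lambda$ by maximum likelihood, which is the core of the original paper, is not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transformation of a positive value y."""
    if y <= 0:
        raise ValueError("the transformation is defined for positive y only")
    if lam == 0:
        return math.log(y)  # limiting case as lam tends to 0
    return (y ** lam - 1.0) / lam


if __name__ == "__main__":
    for lam in (-1.0, 0.0, 0.5, 1.0):
        print(lam, [round(box_cox(y, lam), 4) for y in (0.5, 1.0, 2.0, 4.0)])
```

Subtracting 1 and dividing by $\lambda$ is what makes the family continuous in $\lambda$ at zero, so that the logarithm appears as a member of the family rather than as a separate case.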
We have tried to survey briefly a small but, we believe, representative part of David Cox’s extensive scientific research, which spans almost all fields of statistics. One should emphasize that all his papers (despite often including the word “notes” in the title) are rich in new, pioneering ideas, and always provide helpful examples.
Sir David was particularly known for adopting a pragmatic rather than a dogmatic perspective on the Bayesian/frequentist controversy, and he described this position in his very interesting RSS seminars and the accompanying videos. He also referred to “foundation” with the well-known comment of Fisher about “building a basement”. His line of thought clearly referred to “theoretical statistics” and not to “mathematical statistics”; needless to say, he was faithful to this line of thought until the end. Model adequacy was crucial to him, though he probably did not persuade everybody working in the field of medical statistics. He received many honors: the Guy Medal in Gold of the RSS, the inaugural International Prize in Statistics, and the Copley Medal of the Royal Society (as Carl Friedrich Gauss once did!). Sir David will be remembered as an incredibly generous and supportive friend. I had the honor to receive his friendly comments and advice in many discussions, and especially at the ISI Session in Istanbul while discussing industrial statistics and a cancer problem. Only a small sample of his over 350 papers is mentioned here. He was the editor of the journal Biometrika for an extraordinary span of 25 years, from 1966 to 1991, and was a co-editor with Professor D. M. Titterington, head of the Department of Statistics of Glasgow University in the 1980s (and my supervisor in Glasgow!), of a volume dedicated to the centennial anniversary of Biometrika. In his words, “people say theoretical work in statistics should be motivated by applications because it’s a practical subject” .
That is in accordance with his good relationship with John Tukey during his visit to the USA, but it mainly provides evidence for the general line of thought David Cox followed, often stressing how hard it was for him to get to grips with ideas and to solve the problems, so impressive to us, that he formulated in his pioneering work in statistics. He did not hesitate to work on the improvement of his own books: he returned to the experiment design book  and revised it with N. Reid , and with E. Snell he revised the Analysis of Binary Data .
As Professor F. Downton stated in his discussion for the  paper: “Professor Cox has been too modest”, and he lived in modesty all his productive life, one could add.
Acknowledgements I would like to thank Alex Rigas for helpful discussions during the preparation of this paper. I am also grateful to Steven Roy and Fernando Pestana da Costa for their help.
[1] M. S. Bartlett, Some evolutionary stochastic processes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 11, 211–229 (1949)
[2] M. S. Bartlett, An introduction to stochastic processes. Cambridge University Press, London (1955)
[3] J. Berkson, Why I prefer logits to probits. Biometrics 7, 327–339 (1951)
[4] C. I. Bliss, The method of probits. Science 79, 38–39, 409–410 (1934)
[5] G. E. P. Box and D. R. Cox, An analysis of transformations (with discussion). J. R. Stat. Soc. Ser. B. Stat. Methodol. 26, 211–252 (1964)
[6] A. Brearley and D. R. Cox, An outline of statistical methods for use in the textile industry. Wool Industries Research Association, Leeds (1949)
[7] D. R. Cox, The theory of drafting wool slivers. Proc. Roy. Soc. London Ser. A 197, 28–51 (1949)
[8] D. R. Cox, Estimation by double sampling. Biometrika 39, 217–227 (1952)
[9] D. R. Cox, Sequential tests for composite hypotheses. Proc. Cambridge Philos. Soc. 48, 290–299 (1952)
[10] D. R. Cox, The analysis of non-Markovian stochastic processes by the inclusion of supplementary variables. Proc. Cambridge Philos. Soc. 51, 433–441 (1955)
[11] D. R. Cox, Some statistical methods connected with series of events (with discussion). J. R. Stat. Soc. Ser. B. Stat. Methodol. 17, 129–164 (1955)
[12] D. R. Cox, A use of complex probabilities in the theory of stochastic processes. Proc. Cambridge Philos. Soc. 51, 313–319 (1955)
[13] D. R. Cox, Planning of experiments. A Wiley Publication in Applied Statistics, John Wiley & Sons Inc., New York (1958)
[14] D. R. Cox, The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B. Stat. Methodol. 20, 215–242 (1958)
[15] D. R. Cox, Tests of separate families of hypotheses. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, Univ. California Press, Berkeley, 105–123 (1961)
[16] D. R. Cox, Further results on tests of separate families of hypotheses. J. R. Stat. Soc. Ser. B. Stat. Methodol. 24, 406–424 (1962)
[17] D. R. Cox, Renewal theory. Methuen & Co. Ltd., London (1962)
[18] D. R. Cox, Large sample sequential tests for composite hypotheses. Sankhyā Ser. A 25, 5–12 (1963)
[19] D. R. Cox, Notes on some aspects of regression analysis (with discussion). J. Roy. Statist. Soc. Ser. A 131, 265–279 (1968)
[20] D. R. Cox, Analysis of binary data. Chapman & Hall, London (1969)
[21] D. R. Cox, A note on polynomial response functions for mixtures. Biometrika 58, 155–159 (1971)
[22] D. R. Cox, Regression models and life-tables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 34, 187–220 (1972)
[23] D. R. Cox, The analysis of multivariate binary data. Appl. Stat. 21, 113–120 (1972)
[24] D. R. Cox, Partial likelihood. Biometrika 62, 269–276 (1975)
[25] D. R. Cox, Some remarks on overdispersion. Biometrika 70, 269–274 (1983)
[26] D. R. Cox, Present position and developments: Some personal views. Design of experiments and regression. J. Roy. Statist. Soc. Ser. A 147, 306–315 (1984)
[27] D. R. Cox, A return to an old paper: “Tests of separate families of hypotheses”. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75, 207–215 (2013)
[28] D. R. Cox and V. Isham, A bivariate point process connected with electronic counters. Proc. Roy. Soc. London Ser. A 356, 149–160 (1977)
[29] D. R. Cox and P. A. W. Lewis, The statistical analysis of series of events. Methuen & Co. Ltd., London (1966)
[30] D. R. Cox and H. D. Miller, The theory of stochastic processes. Methuen & Co. Ltd., London (1965)
[31] D. R. Cox and N. Reid, The theory of the design of experiments. Monographs on Statistics and Applied Probability 86, Chapman & Hall, Boca Raton (2000)
[32] D. R. Cox and W. L. Smith, Queues. Chapman & Hall, London (1991)
[33] D. R. Cox and E. J. Snell, A general definition of residuals. J. R. Stat. Soc. Ser. B. Stat. Methodol. 30, 248–275 (1968)
[34] D. R. Cox and E. J. Snell, The choice of variables in observational studies. Appl. Stat. 23, 51–59 (1974)
[35] D. R. Cox and E. J. Snell, Analysis of binary data. Second ed., Monographs on Statistics and Applied Probability 32, Chapman & Hall, London (1989)
[36] D. J. Finney, Probit analysis. Third ed., Cambridge University Press, London (1971)
[37] R. A. Fisher, Design of experiments. Oliver and Boyd, Edinburgh (1935)
[38] W. J. Hall, R. A. Wijsman and J. K. Ghosh, The relationship between sufficiency and invariance with applications in sequential analysis. Ann. Math. Statist. 36, 575–614 (1965)
[39] R. R. Hocking, The analysis and selection of variables in linear regression. Biometrics 32, 1–49 (1976)
[40] J. O. Irwin, The place of mathematics in medical and biological statistics. J. Roy. Statist. Soc. Ser. A 126, 1–44 (1963)
[41] H. Jeffreys, On the relation between direct and inverse methods in statistics. Proc. Roy. Soc. London Ser. A 160, 325–348 (1937)
[42] D. G. Kendall, Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain. Ann. Math. Statistics 24, 338–354 (1953)
[43] C. P. Kitsos, Design and inference in nonlinear problems. Unpublished PhD thesis, Glasgow University (1986)
[44] C. P. Kitsos and L. Edler, Industrial statistics. Physica-Verlag (1997)
[45] C. P. Kitsos and T. L. Toulias, Confidence and tolerance regions for the signal process. Recent patterns on signal processing 2, 149–155 (2012)
[46] C. P. Kitsos and V. Zarikas, On the best predictive general linear model for data analysis: A tolerance region algorithm for prediction. J. Appl. Science 13, 513–524 (2013)
[47] L. R. La Motte, The SELECT routines: A program for identifying best subset regression. Appl. Stat. 21, 92–93 (1972)
[48] R. Peto, M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson, J. Peto and P. G. Smith, Design and analysis of randomized clinical trials requiring prolonged observation of each patient. Part I. Br. J. Cancer 34, 585–612 (1976)
[49] R. Peto, M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson, J. Peto and P. G. Smith, Design and analysis of randomized clinical trials requiring prolonged observation of each patient. Part II. Br. J. Cancer 35, 1–39 (1977)
[50] N. Reid, A conversation with Sir David Cox. Statist. Sci. 9, 439–455 (1994)
[51] M. J. Silvapulle, On the existence of maximum likelihood estimators for the binomial response models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 43, 310–313 (1981)
[52] S. D. Silvey, Optimal design. Monographs on Applied Probability and Statistics, Chapman & Hall, London (1980)
[53] H. P. Wynn, Results in the theory and construction of D-optimum experimental designs. J. R. Stat. Soc. Ser. B. Stat. Methodol. 34, 133–147, 170–186 (1972)
Cite this article
Christos P. Kitsos, Sir David Cox: A wise and noble statistician (1924–2022). Eur. Math. Soc. Mag. 124 (2022), pp. 27–32. DOI 10.4171/MAG/86