Tutorials in Biostatistics: Statistical Modelling of Complex Medical Data
Abstract
The development and use of statistical methods has grown exponentially over the last two decades. Nowhere is this more evident than in their application to biostatistics and, in particular, to clinical medical research. To keep abreast with the rapid pace of development, the journal Statistics in Medicine alone is published 24 times a year. Here and in other journals, books and professional meetings, new theory and methods are constantly presented. However, the transitions of the new methods to actual use are not always as rapid. There are problems and obstacles. In such an applied interdisciplinary field as biostatistics, in which even the simplest study often involves teams of researchers with varying backgrounds and which can generate massive complicated data sets, new methods, no matter how powerful and robust, are of limited use unless they are clearly understood by practitioners, both clinical and biostatistical, and are available with well-documented computer software.
In response to these needs Statistics in Medicine initiated in 1996 the inclusion of tutorials in biostatistics. The main objective of these tutorials is to generate, in a timely manner, brief well-written articles on biostatistical methods; these should be complete enough so that the methods presented are accessible to a broad audience, with sufficient information given to enable readers to understand when the methods are appropriate, to evaluate applications and, most importantly, to use the methods in their own research.
At first tutorials were solicited from major methodologists. Later, both solicited and unsolicited articles were, and are still, developed and published. In all cases major researchers, methodologists and practitioners wrote and continue to write the tutorials. Authors are guided by four goals. The first is to develop an introduction suitable for a well-defined audience (the broader the audience the better). The second is to supply sufficient references to the literature so that the readers can go beyond the tutorial to find out more about the methods. The referenced literature is, however, not expected to constitute a major literature review. The third goal is to supply sufficient computer examples, including code and output, so that the reader can see what is needed to implement the methods. The final goal is to make sure the reader can judge applications of the methods and apply the methods. The tutorials have become extremely popular and heavily referenced, attesting to their usefulness. To further enhance their availability and usefulness, we have gathered a number of these tutorials and present them in this two-volume set.
Each volume has a brief preface introducing the reader to the aims and contents of the tutorials. Here we present an even briefer summary. We have arranged the tutorials by subject matter, starting in Volume 1 with 18 tutorials on statistical methods applicable to clinical studies, both observational studies and controlled clinical trials. Two tutorials discussing the computation of epidemiological rates such as prevalence, incidence and lifetime rates for cohort studies and capture-recapture settings begin the volume. Propensity score adjustment methods and agreement statistics such as the kappa statistic are dealt with in the next two tutorials. A series of tutorials on survival analysis methods applicable to observational study data are next. We then present five tutorials on the development of prognostic or clinical prediction models. Finally, there are six tutorials on clinical trials. These range from designing and analysing dose response studies and Bayesian data monitoring to analysis of longitudinal data and generating simple summary statistics from longitudinal data. All these are in the context of clinical trials. In all tutorials, the reader is given guidance on the proper use of methods.
The subject-matter headings of Volume 1 are, we believe, appropriate to the methods. The tutorials are, however, often broader. For example, the tutorials on the kappa statistic and survival analysis are useful not only for observational studies, but also for controlled clinical studies. The reader will, we believe, quickly see the breadth of the methods.
Volume 2 contains 16 tutorials devoted to the analysis of complex medical data. First, we present tutorials relevant to single data sets. Seven tutorials give extensive introductions to and discussions of generalized estimating equations, hierarchical modelling and mixed modelling. A tutorial on likelihood methods closes the discussion of single data sets. Next, two extensive tutorials cover the concepts of meta-analysis, ranging from the simplest conception of a fixed effects model to random effects models, Bayesian modelling and highly involved models involving multivariate regression and meta-regression. Genetic data methods are covered in the next three tutorials. Statisticians must become familiar with the issues and methods relevant to genetics. These tutorials offer a good starting point. The next two tutorials deal with the major task of data reduction for functional magnetic resonance imaging data and disease mapping data, covering the complex data methods required by multivariate data. Complex and thorough statistical analyses are of no use if researchers cannot present results in a meaningful and usable form to audiences beyond those who understand statistical methods and complexities. Readers should find the methods for presenting such results discussed in the final tutorial simple to understand.
Before closing this preface to the two volumes we must state a disclaimer. Not all the tutorials that are in these two volumes appeared as tutorials. Three were regular articles. These are in the spirit of tutorials and fit well within the theme of the volumes.
We hope that readers enjoy the tutorials and find them beneficial and useful.