This task view collects information on R packages for experimental design
and analysis of data from experiments. Please feel free to
suggest enhancements, and please send information on new packages or major
package updates if you think they belong here. Contact details are given on my
Web page.
Experimental design is applied in many areas, and methods have been tailored
to the needs of various fields. This task view starts out with a section on
the most general packages, continues with specific sections on agricultural and
industrial experimentation, computer experiments, and experimentation in the
clinical trial context, and closes with a section on various special
experimental design packages that have been developed for other specific purposes.
Of course, the division into fields is not always clear-cut, and some packages from
the more specialized sections can also be applied in general contexts.
You may also notice that my own experience is mainly from industrial experimentation
(in a broad sense), which may explain a somewhat biased view on things.
Experimental designs for general purposes
There are a few packages for creating and analyzing experimental designs
for general purposes: First of all, the standard (generalized) linear model
functions in the base package stats are of course very important for analyzing
data from designed experiments (especially functions lm(), aov()
and the methods and functions for the resulting linear model objects). These are
concisely explained in Kuhnert and Venables (2005, p. 109 ff.); Vikneswaran (2005)
points out specific usages for experimental design (using function contrasts(),
multiple comparison functions and some convenience functions like
model.tables(), replications() and plot.design()). Lalanne (2009)
provides an R companion to the well-known book by Montgomery (2005); he so far covers the first
few chapters only and (understandably!) does not keep pace with the fast development
of R regarding experimental design facilities.
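As a minimal sketch of the base R facilities named above, the following lines analyse the npk data set that ships with R (a 2^3 factorial in six blocks). The choice of data set and model formula is an illustrative assumption and is not taken from the cited references.

```r
# Classical N, P, K factorial experiment shipped with R (package datasets)
data(npk)

# Number of replications of each factor level in the design
reps <- replications(~ . - yield, npk)
reps

# Contrast coding currently in use for factor N
contrasts(npk$N)

# Analysis of variance with blocks and all treatment interactions
fit <- aov(yield ~ block + N * P * K, data = npk)
summary(fit)

# Tables of means for the model terms
model.tables(fit, type = "means")

# Plot of the mean response at each level of each factor
plot.design(npk)
```

The same fitted object can be passed on to further functions for linear model objects, e.g. for multiple comparisons.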
GAD handles general balanced analysis of variance models with fixed and/or random effects
and also nested effects (the latter can only be random); the package authors cite Underwood (1997) for this work.
The package is quite valuable, as many
users have difficulties with using the R packages for handling random or mixed effects.
granova offers some interesting non-standard graphical representations for results of simply-structured
experiments (one-way and two-way layouts, paired data).
-
Package AlgDesign creates full
factorial designs with or without additional quantitative variables, creates mixture
designs (i.e., designs where the levels of factors sum to 1 = 100%; only lattice designs are created) and creates
D-, A-, or I-optimal designs exactly or approximately.
NOTE: Bob Wheeler, the author and maintainer of AlgDesign, would like to retire from this job and is looking
for an "heir" whom he can entrust with continuing the package. Please contact Bob if you
are interested.
-
Package conf.design creates designs with certain interaction effects confounded with blocks (function
conf.design()) and combines existing designs in several ways
(e.g., useful for Taguchi's inner and outer array designs in industrial experimentation).
-
Package planor generates regular fractional factorial designs with fixed and mixed levels
and quite flexible randomization structures. The package's flexibility
comes at the price of a certain complexity and, for larger designs, high computing time.
-
Package crossdes creates and analyses cross-over designs of various types (including
Latin squares, mutually orthogonal Latin squares and Youden squares) that can, for example,
be used in sensometrics.
-
Package DoE.base provides full factorial designs with or without blocking
(function fac.design()) and orthogonal arrays (function oa.design())
for main effects experiments
(those listed by Kuhfeld 2009 up to 144 runs, plus a few additional ones).
There is also some experimental functionality
for assessing the quality of orthogonal arrays.
Package DoE.base also forms the basis of a suite of related packages (cf. Groemping 2009).
Together with FrF2 (cf. below) and DoE.wrapper, it provides the workhorse
of the GUI package RcmdrPlugin.DoE (beta version; tutorial available in Groemping 2011), which integrates
design of experiments functionality into the R-Commander (package "Rcmdr", Fox 2005)
for the benefit of those R users who cannot or do not want to do command line programming.
The role of package DoE.wrapper in that suite is to wrap
functionality from other packages into the input and output structure of the package suite
(so far for response surface designs with package rsm (cf. also below),
design of computer experiments with packages lhs and DiceDesign (cf. also below),
and D-optimal designs with package AlgDesign (cf. also above)).
-
Package dae provides various utility functions around experimental design
and R factors, e.g. a randomization routine that can handle various nested structures
(according to Bailey 1981) and functions for combining several factors into one
or dividing one factor into several factors.
Furthermore, the package provides features for post-processing
objects returned by the aov() function, e.g. extraction of Yates effects
for 2-level experiments.
-
blockTools
assigns units to blocks in order to end up with homogeneous sets
of blocks in case of too small block sizes.
Experimental designs for agricultural and plant breeding experiments
agricolae offers extensive functionality on experimental design
especially for agricultural and plant breeding experiments, which can also be useful
for other purposes. It supports planning of lattice designs, factorial designs,
randomized complete block designs, completely randomized designs,
(Graeco-)Latin square designs, balanced incomplete block designs and alpha designs.
There are also various analysis facilities for experimental data, e.g. treatment
comparison procedures and several non-parametric tests, but also some quite specialized
possibilities for specific types of experiments.
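To illustrate what planning one of the design types listed above involves, here is a base R sketch of a randomized complete block design. It does not use agricolae and is not that package's implementation; the treatment labels, the number of blocks and the seed are hypothetical.

```r
set.seed(42)  # arbitrary seed, only to make the layout reproducible
treatments <- c("T1", "T2", "T3", "T4")  # hypothetical treatment labels
n_blocks <- 3                            # hypothetical number of blocks

# Each block receives every treatment once, in independently randomized order
rcbd <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(block = b,
             plot = seq_along(treatments),
             treatment = sample(treatments))
}))
rcbd$block <- factor(rcbd$block)

# Check: every treatment occurs exactly once in every block
table(rcbd$block, rcbd$treatment)
```

Randomizing separately within each block is what distinguishes this layout from a completely randomized design, where the treatments would be permuted over all plots at once.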
Experimental designs for industrial experiments
Some further packages especially handle designs for industrial experiments
that are often highly fractionated, intentionally confounded and have few extra degrees
of freedom for error.
Fractional factorial 2-level designs are particularly important in industrial
experimentation.
-
Package FrF2 is the most comprehensive R package for
creating such designs. It generates regular fractional factorial
designs for factors with 2 levels as well as Plackett-Burman type screening designs.
Regular fractional factorials default to maximum resolution minimum aberration designs
and can be customized in various ways, supported by an
incorporated catalogue of designs (including the designs catalogued by Chen, Sun and Wu 1993,
and further larger designs catalogued in Block and Mee 2005 and Xu 2009;
the additional package FrF2.catlg128 provides a very large complete catalogue
for resolution IV 128 run designs with up to 23 factors for special purposes).
Analysis-wise, FrF2 provides simple graphical analysis tools (normal and half-normal effects plots
(modified from BsMD, cf. below), main effects
plots and interaction plot matrices similar to those in Minitab software, and a cube
plot for the combinations of three factors). It can also show the alias structure
for regular fractional factorials of 2-level factors, regardless of whether they have been
created with the package or not.
Fractional factorial 2-level plans can also be created by other R packages,
namely BHH2 and qualityTools (but do not use function pbDesign() from
version 1.54 of that package!), or with a little bit more complication
by packages conf.design, planor or AlgDesign.
-
Package BHH2 accompanies the 2nd edition of the book by Box, Hunter and Hunter
and provides several of its data sets. It can generate full and fractional factorial
two-level designs from a number of factors and a list of defining relations
(function ffDesMatrix(), less convenient than package FrF2).
It also provides several functions for analyzing data from 2-level factorial
experiments: the function anovaPlot() assesses effect sizes relative to residuals, and
the function lambdaPlot() assesses the effect of Box-Cox transformations on
statistical significance of effects.
-
BsMD provides Bayesian charts as
proposed by Box and Meyer (1986) as well as effects plots (normal, half-normal and
Lenth) for assessing which effects are active in a fractional factorial experiment
with 2-level factors.
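The idea behind a regular fractional factorial 2-level design and its alias structure can be sketched in base R without any of the packages above. The example builds a half fraction of a design in four factors from the generator D = ABC; the factor names and the generator are illustrative assumptions, and the packages listed above construct and catalogue such designs far more comprehensively.

```r
# Full factorial in three basic 2-level factors, coded -1/+1
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# The generator D = ABC yields a regular half fraction with 8 runs
# (resolution IV, defining relation I = ABCD)
design$D <- with(design, A * B * C)

# The four main-effect columns are mutually orthogonal:
# X'X equals 8 times the identity matrix
X <- as.matrix(design)
crossprod(X)

# Alias check: the two-factor interaction columns AB and CD coincide
all(design$A * design$B == design$C * design$D)

# Randomize the run order before conducting the experiment
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
design[sample(nrow(design)), ]
```

Because AB and CD share one column, their effects cannot be separated from these 8 runs; this is the kind of information an alias structure display summarizes.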
Apart from tools for planning and analysing factorial designs, R also offers support for
response surface optimization for quantitative factors (cf. e.g. Myers and Montgomery 1995):
-
Package rsm supports sequential
optimization with first order and second order response surface models (central composite
or Box-Behnken designs), offering
optimization approaches like steepest ascent and visualization of the response
function for linear model objects. Also, coding for response surface investigations is
facilitated.
-
Package DoE.wrapper enhances design creation from package rsm
with the possibilities of automatically choosing the cube portion of central
composite designs and of augmenting
an existing (fractional) factorial 2-level design with a star portion.
-
Package Vdgraph implements a variance dispersion graph (Vining 1993) for response
surface designs created by package rsm.
-
Package qualityTools can also create central composite designs
and can visualize response surfaces.
In some industries, mixtures of ingredients are important; these require special designs,
because the quantitative factors have a fixed total.
Mixture designs are handled by packages AlgDesign (function gen.mixture(), lattice designs),
qualityTools (function mixDesign(), lattice designs and simplex centroid designs), and
mixexp (several small functions for simplex centroid,
simplex lattice and extreme vertices designs as well as for plotting).
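The fixed-total constraint of mixture designs can be made concrete with a base R sketch of a {q, m} simplex lattice, in which each of q components takes the proportions 0, 1/m, ..., 1 and the proportions of every run sum to 1. The helper name simplex_lattice() is hypothetical and does not belong to any of the packages above.

```r
# {q, m} simplex lattice design for q mixture components
simplex_lattice <- function(q, m) {
  # all combinations of the integer numerators 0..m for the q components
  grid <- expand.grid(rep(list(0:m), q))
  # keep only the combinations whose numerators sum to m
  grid <- grid[rowSums(grid) == m, , drop = FALSE]
  # convert numerators to proportions
  lattice <- grid / m
  names(lattice) <- paste0("x", seq_len(q))
  rownames(lattice) <- NULL
  lattice
}

# Three components with proportions 0, 1/2 and 1
design <- simplex_lattice(q = 3, m = 2)
design

# Every run is a valid mixture: the proportions sum to 1
rowSums(design)
```

The number of runs is choose(q + m - 1, m); enumerating the full grid first is simple but only practical for small q and m.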
Occasionally, supersaturated designs can be useful.
The two small packages mkssd and mxkssd provide fixed level and mixed level
k-circulant supersaturated designs.
Experimental designs for computer experiments
Computer experiments with quantitative factors require special types of
experimental designs: it is often possible to include many different
levels of the factors, and replication will usually not be beneficial. Also, the
experimental region is often too large to assume that a linear or quadratic model adequately
represents the phenomenon under investigation. Consequently, it is desirable to fill
the experimental space with points as well as possible (space-filling designs) in such
a way that each run provides additional information even if some factors turn out to be
irrelevant.
The lhs package provides Latin hypercube designs for this purpose.
Furthermore, the package provides ways to analyse such computer experiments with
emphasis on what follow-up experiments to conduct. Another package with similar orientation
is the DiceDesign package, which adds further ways to construct space-filling
designs and some measures to assess the quality of designs for computer experiments. The
package DiceKriging provides the kriging methodology which is often used for
creating meta models from computer experiments, the package DiceEval creates
and evaluates meta models (among others kriging ones), and the package DiceView
provides facilities for viewing sections of multidimensional meta models.
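The space-filling property of a Latin hypercube design can be sketched in a few lines of base R: each factor's range is cut into n equal strata, and every stratum contains exactly one of the n runs. The helper name latin_hypercube() is hypothetical; this is a plain random construction, not the algorithm of the lhs or DiceDesign packages, which also offer optimized variants.

```r
# Random Latin hypercube sample of n points in k dimensions on the unit cube
latin_hypercube <- function(n, k) {
  sapply(seq_len(k), function(j) {
    # one uniformly placed point in each of the n strata, in random order
    (sample(n) - runif(n)) / n
  })
}

set.seed(123)  # arbitrary seed, only for reproducibility of this sketch
lhd <- latin_hypercube(n = 10, k = 3)

# Stratum index of every point, sorted within each column:
# each column contains every stratum 1..10 exactly once
apply(lhd, 2, function(x) sort(ceiling(x * 10)))
```

Because every one-dimensional projection covers all n strata, no run is wasted even if some factors turn out to be irrelevant.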
Package tgp is another package dedicated to planning and analysing
computer experiments. Here, emphasis is on Bayesian methods.
The package can for example be used with various kinds of (surrogate) models for
sequential optimization, e.g. with an expected improvement criterion for optimizing a noisy
blackbox target function. Packages plgp and dynaTree enhance the
functionality offered by tgp with particle learning facilities and learning for
dynamic regression trees.
Package BatchExperiments is also designed for computer
experiments, in this case specifically for experiments with algorithms to be run
under different scenarios. The package is described in a technical report by
Bischl et al. (2012).
Experimental designs for clinical trials
This task view only covers specific design of experiments packages; there may be some
grey areas. Please also consult the ClinicalTrials task view.
-
experiment contains tools for clinical experiments,
e.g., a randomization tool, and it provides a few special analysis options for clinical
trials.
-
Package gsDesign implements group sequential designs.
-
Package gsbDesign evaluates operating characteristics for group sequential Bayesian designs.
-
Package asd implements adaptive seamless designs.
-
Package TEQR provides toxicity equivalence range designs (Blanchard and Longmate 2010) for phase I clinical trials.
-
The DoseFinding package provides functions for the design and analysis
of dose-finding experiments (for example pharmaceutical Phase II clinical trials);
it combines the facilities of the "MCPMod" package (maintenance discontinued;
described in Bornkamp, Pinheiro and Bretz 2009) with a special type of optimal designs for
dose finding situations (MED-optimal designs, or D-optimal designs, or a mixture of both;
cf. Dette et al. 2008).
Experimental designs for special purposes
Various further packages handle special situations in experimental design:
-
Package desirability provides ways to combine several target criteria into a desirability function in order to simplify
multi-criteria analysis; desirabilities are also offered as part of package qualityTools.
-
ldDesign suggests appropriate designs for linkage disequilibrium studies.
-
odprism is an acronym for optimal design and performance of random intercept and slope models.
-
osDesign designs studies nested in observational studies.
-
qtlDesign is for quantitative trait locus designs.
-
Package SensoMineR contains special designs for
sensometric studies, e.g., for the triangle test.
-
Package support.CEs provides tools for creating stated choice designs
for market research investigations.
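The combination of several target criteria into one desirability, mentioned in the list above, can be sketched in base R. The functions d_max() and overall() are hypothetical helpers in the spirit of Derringer and Suich (1980), not the interface of the desirability or qualityTools packages, and the response values and limits are made up for illustration.

```r
# Larger-is-better desirability: 0 below `low`, 1 above `target`,
# and a power curve with exponent `scale` in between
d_max <- function(y, low, target, scale = 1) {
  d <- ((y - low) / (target - low))^scale
  d[y < low] <- 0
  d[y > target] <- 1
  d
}

# Overall desirability: geometric mean of the individual desirabilities
overall <- function(...) {
  d <- cbind(...)
  apply(d, 1, prod)^(1 / ncol(d))
}

# Hypothetical responses for three candidate factor settings
yield  <- c(60, 80, 95)
purity <- c(0.99, 0.95, 0.90)

d_yield  <- d_max(yield,  low = 50,   target = 100)
d_purity <- d_max(purity, low = 0.90, target = 1.00)

# A setting that is unacceptable on any single criterion gets overall 0
overall(d_yield, d_purity)
```

The geometric mean is used so that a zero desirability on one criterion cannot be compensated by good values on the others.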
Key references for packages in this task view
-
Atkinson, A.C. and Donev, A.N. (1992). Optimum Experimental Designs. Oxford: Clarendon Press.
-
Bailey, R.A. (1981). A unified approach to design of experiments. Journal of the Royal Statistical Society, Series A 144, 214-223.
-
Ball, R.D. (2005). Experimental Designs for Reliable Detection of Linkage Disequilibrium in Unstructured Random Population Association Studies. Genetics 170, 859-873.
-
Bischl, B., Lang, M., Mersmann, O., Rahnenfuehrer, J. and Weihs, C. (2012). Computing on high performance clusters with R: Packages BatchJobs and BatchExperiments. Technical Report 1/2012, TU Dortmund, Germany.
-
Blanchard, M.S. and Longmate, J.A. (2010). Toxicity equivalence range design (TEQR): A practical Phase I design. Contemporary Clinical Trials, doi:10.1016/j.cct.2010.09.011.
-
Block, R. and Mee, R. (2005). Resolution IV Designs with 128 Runs. Journal of Quality Technology 37, 282-293.
-
Bornkamp, B., Pinheiro, J. C. and Bretz, F. (2009). MCPMod: An R Package for the Design and Analysis of Dose-Finding Studies. Journal of Statistical Software 29(7), 1-23.
-
Box, G. E. P., Hunter, W. G. and Hunter, J. S. (2005). Statistics for Experimenters (2nd edition). New York: Wiley.
-
Box, G. E. P. and Meyer, R. D. (1986). An Analysis for Unreplicated Fractional Factorials. Technometrics 28, 11-18.
-
Box, G. E. P. and Meyer, R. D. (1993). Finding the Active Factors in Fractionated Screening Experiments. Journal of Quality Technology 25, 94-105.
-
Chasalow, S. and Brand, R. (1995). Generation of Simplex Lattice Points. Journal of the Royal Statistical Society, Series C 44, 534-545.
-
Chen, J., Sun, D.X. and Wu, C.F.J. (1993). A catalogue of two-level and three-level fractional factorial designs with small runs. International Statistical Review 61, 131-145.
-
Collings, B. J. (1989). Quick Confounding. Technometrics 31, 107-110.
-
Cornell, J. (2002). Experiments with Mixtures. Third Edition. Wiley.
-
Daniel, C. (1959). Use of Half Normal Plots in Interpreting Two Level Experiments. Technometrics 1, 311-340.
-
Derringer, G. and Suich, R. (1980). Simultaneous Optimization of Several Response Variables. Journal of Quality Technology 12, 214-219.
-
Dette, H., Bretz, F., Pepelyshev, A. and Pinheiro, J. C. (2008). Optimal Designs for Dose Finding Studies. Journal of the American Statistical Association 103, 1225-1237.
-
Fedorov, V.V. (1972). Theory of Optimal Experiments. Academic Press, New York.
-
Fox, J. (2005). The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software 14(9), 1-42.
-
Gramacy, R.B. (2007). tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software 19(9), 1-46.
-
Groemping, U. (2009). Design of Experiments in R. Presentation at UseR! 2009 in Rennes, France.
-
Groemping, U. (2011). Tutorial for designing experiments using the R package RcmdrPlugin.DoE. Reports in Mathematics, Physics and Chemistry, Department II, Beuth University of Applied Sciences Berlin.
-
Hoaglin, D., Mosteller, F. and Tukey, J. (eds., 1991). Fundamentals of Exploratory Analysis of Variance. Wiley, New York.
-
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Cross-Over Trials. Chapman and Hall, London.
-
Johnson, M.E., Moore, L.M. and Ylvisaker, D. (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131-148.
-
Kuhfeld, W. (2009). Orthogonal arrays. Website courtesy of SAS Institute Inc., accessed August 4th 2010. URL http://support.sas.com/techsup/technote/ts723.html
-
Kuhnert, P. and Venables, B. (2005). An Introduction to R: Software for Statistical Modelling & Computing. URL http://CRAN.R-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip
(PDF document (about 360 pages) of lecture notes in combination with the data sets and R scripts)
-
Kunert, J. (1998). Sensory Experiments as Crossover Studies. Food Quality and Preference 9, 243-253.
-
Lalanne, C. (2009). R Companion to Montgomery's Design and Analysis of Experiments. Manuscript, downloadable at URL http://www.aliquote.org/articles/tech/dae/dae.pdf
(The file accompanies the book by Montgomery 2005 (cf. below).)
-
Lenth, R.V. (1989). Quick and Easy Analysis of Unreplicated Factorials. Technometrics 31, 469-473.
-
Lenth, R.V. (2009). Response-Surface Methods in R, Using rsm. Journal of Statistical Software 32(7), 1-17.
-
Mee, R. (2009). A Comprehensive Guide to Factorial Two-Level Experimentation. New York: Springer.
-
Montgomery, D. C. (2005, 6th ed.). Design and Analysis of Experiments. New York: Wiley.
-
Myers, R. H. and Montgomery, D. C. (1995). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. New York: Wiley.
-
Plackett, R.L. and Burman, J.P. (1946). The design of optimum multifactorial experiments. Biometrika 33, 305-325.
-
Rosenbaum, P. (1989). Exploratory Plots for Paired Data. The American Statistician 43, 108-109.
-
Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989). Design and analysis of computer experiments. Statistical Science 4, 409-435.
-
Santner, T.J., Williams, B.J. and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. Springer, New York.
-
Sen, S., Satagopan, J.M. and Churchill, G.A. (2005). Quantitative Trait Locus Study Design from an Information Perspective. Genetics 170, 447-464.
-
Stein, M. (1987). Large Sample Properties of Simulations Using Latin Hypercube Sampling. Technometrics 29, 143-151.
-
Stocki, R. (2005). A Method to Improve Design Reliability Using Optimal Latin Hypercube Sampling. Computer Assisted Mechanics and Engineering Sciences 12, 87-105.
-
Underwood, A.J. (1997). Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge.
-
Vikneswaran (2005). An R companion to "Experimental Design". URL http://CRAN.R-project.org/doc/contrib/Vikneswaran-ED_companion.pdf
(The file accompanies the book "Experimental Design with Applications in Management, Engineering
and the Sciences" by Berger and Maurer, 2002.)
-
Vining, G. (1993). A Computer Program for Generating Variance Dispersion Graphs. Journal of Quality Technology 25, 45-58. Corrigendum in the same volume, pp. 333-335.
-
Xu, H. (2009). Algorithmic Construction of Efficient Fractional Factorial Designs With Large Run Sizes. Technometrics 51, 262-277.