Design of Experiments#

Related modules

mqr.doe

Detailed examples

nklsxn/mqr-guide

Introduction#

The goal of experimental design is to efficiently collect enough data to make conclusions about a particular set of questions. In the case of ANOVA, the question is whether subsets of a sample have different means. In the case of regression, the question is whether or not the terms in an equation have non-zero coefficients.

While it is possible to devise bespoke null hypotheses and then derive the distributions of the related statistics, many problems can be answered with a set of standard designs and their corresponding null hypotheses. MQR focuses on the standard designs that can be analysed with linear regression. So, while ANOVA and regression are data analysis tools, design of experiments is about data collection. Standard factorial, fractional factorial and central composite designs are demonstrated below. For one or more design and process parameters, these experimental designs identify

  • main effects, which are linear terms like \(x\) in \(f(x, y) = a x + b y + c\),

  • interactions, which are non-linear terms involving variables that change one another’s effect on the response, like \(x y\) in \(f(x, y) = a x + b y + c x y + d\), and

  • quadratic effects, which are second-order non-linearities within a single variable (or curvature along an axis), like \(y^2\) in \(f(x, y) = a x^2 + b y^2 + c x y + d\).
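
As a rough illustration of how a design separates these terms, the sketch below evaluates a made-up response at the four corners of a \(2^2\) design and recovers each coefficient as a contrast. The function `response` and its coefficients (2, 3, 0.5 and 10) are assumptions chosen for the example. The code uses only the standard library and does not call mqr.

```python
from itertools import product

# Hypothetical response with known coefficients (illustration only).
def response(x, y):
    return 2.0 * x + 3.0 * y + 0.5 * x * y + 10.0

# The four corner points of a two-level, two-factor design in coded units.
corners = list(product([-1, 1], repeat=2))
n = len(corners)

# In coded units, each coefficient is a signed average of the responses.
main_x = sum(x * response(x, y) for x, y in corners) / n
main_y = sum(y * response(x, y) for x, y in corners) / n
interaction = sum(x * y * response(x, y) for x, y in corners) / n

print(main_x, main_y, interaction)
```

A two-level design cannot separate quadratic terms from the constant, because \(x^2 = 1\) at every corner. That is why centre points and axial points are added when curvature matters.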

Additionally, MQR provides tools to randomise runs, partly randomise runs (as in split-plot designs), and label point types to allow screening for curvature.

The main features of this module are:

  • easy concatenation of test runs, allowing screening to be expanded to more detailed tests,

  • labelling with point types and block labels, for easy management in analysis,

  • transforming from coded variables to physical values or category labels, to assist experimental technique,

  • easy randomisation, and

  • creation of standard designs (factorial, etc.) using pyDOE3 functions.

For further information, see the pyDOE3 documentation.

Experimental workflow#

The module is designed to fit into the Improve part of a DMAIC approach:

  • (Define the process with process maps and FMEA (mqr.process), resulting in experimental variables.)

  • (Measure the response/dimension of interest and check the characteristics of the measurement system: mqr.msa.)

  • (Analyse the experimental data to see if the process is capable: mqr.process and mqr.inference.)

  • Improve the process by showing which variables give the desired response.

    1. Design an experiment using the tools in mqr.doe (or pyDOE3 directly), exercising the variables identified in the earlier steps.

    2. Randomise the runs, possibly in blocks.

    3. Save the design to a design file, and append the experimental observations as new columns to make an experiment file.

    4. (optional) Instead of creating a new file, get the DataFrame version of a design and enter data directly into the notebook as extra columns on the DataFrame: design['Response'] = np.array([...]).

    5. Load the experiment file (if you created one) ready for analysis with ANOVA and regression tools (mqr.anova).

    6. Make changes to the process, guided by the model.

    7. Verify that the response changed using an appropriate statistical test: mqr.inference.

  • (Control the process capability using tools like mqr.spc.)
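
Steps 3 to 5 of the Improve stage can be sketched as a file round trip. The example below is a minimal sketch using only the standard library and an in-memory buffer, so it runs without mqr or pandas. The design rows and the `observations` values are invented for illustration, and the column name `Response` is just a convention borrowed from step 4. With pandas, the same steps would normally go through `DataFrame.to_csv` and `pandas.read_csv`.

```python
import csv
import io

# A small randomised design as it might appear in a saved design file
# (values invented for this sketch).
design_csv = (
    "run,PtType,x1,x2\n"
    "3,1,-1.0,1.0\n"
    "1,1,-1.0,-1.0\n"
    "4,1,1.0,1.0\n"
    "2,1,1.0,-1.0\n"
)

# Hypothetical measurements, listed in the order the runs were executed.
observations = [12.1, 9.8, 15.3, 13.0]

# Read the design and append the observations as a new column.
rows = list(csv.DictReader(io.StringIO(design_csv)))
for row, value in zip(rows, observations):
    row['Response'] = value

# Write the experiment "file" (here an in-memory buffer).
experiment = io.StringIO()
writer = csv.DictWriter(experiment, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)

print(experiment.getvalue())
```

Keeping the run number in the file means the observations stay matched to their design points even though the rows are in randomised order.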

Experimental designs#

The main type is mqr.doe.Design. The Design type is a class that looks a lot like a pandas.DataFrame. The difference is that most operations in mqr.doe treat the various columns (“Runs”, “Blocks”, etc.) in particular ways, and to manage that treatment, Design was not implemented as a single DataFrame. To get the final DataFrame, call the mqr.doe.Design.to_df method. In a notebook, a Design is displayed like the DataFrame that would result from calling Design.to_df(). The examples below show how to create basic experimental designs.

Note that the convenience functions from_* reflect the interface of pyDOE3: mqr.doe.Design.from_ccdesign calls ccdesign. Alternatively, call those functions directly and pass the levels to mqr.doe.Design.from_levels.

Full factorial designs#

Full factorial designs test every combination of the levels of every variable, so all main effects and interactions can be estimated.

names = ['x1', 'x2', 'x3']
levels = [2, 3, 2]

mqr.doe.Design.from_fullfact(names, levels)
x1 x2 x3
1 -1.0 -1.0 -1.0
2 1.0 -1.0 -1.0
3 -1.0 0.0 -1.0
4 1.0 0.0 -1.0
5 -1.0 1.0 -1.0
6 1.0 1.0 -1.0
7 -1.0 -1.0 1.0
8 1.0 -1.0 1.0
9 -1.0 0.0 1.0
10 1.0 0.0 1.0
11 -1.0 1.0 1.0
12 1.0 1.0 1.0
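
To make the run ordering concrete, the sketch below rebuilds the same twelve runs with the standard library. It is not how mqr or pyDOE3 implement the design. The helper `coded_levels` is hypothetical and assumes that levels are spaced evenly between -1 and 1, which matches the two-level and three-level columns in the output above.

```python
from itertools import product

def coded_levels(n):
    """Evenly spaced coded levels from -1 to 1 (assumes n >= 2)."""
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

names = ['x1', 'x2', 'x3']
levels = [2, 3, 2]

# product() varies the last factor fastest, so reverse the factors going in
# and reverse each run coming out to make x1 change fastest.
axes = [coded_levels(n) for n in reversed(levels)]
runs = [tuple(reversed(run)) for run in product(*axes)]

for number, run in enumerate(runs, start=1):
    print(number, dict(zip(names, run)))
```

The number of runs is the product of the level counts, here 2 × 3 × 2 = 12, which is why full factorial designs grow quickly as factors are added.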

Fractional factorial designs#

Fractional factorial designs use Yates labels to express the desired aliasing. For example, this design has every combination of two levels for x1 and x2, but confounds x3 with the x1 * x2 interaction.

names = ['x1', 'x2', 'x3']
generator = 'a b ab'

mqr.doe.Design.from_fracfact(names, generator)
PtType x1 x2 x3
1 1 -1.0 -1.0 1.0
2 1 1.0 -1.0 -1.0
3 1 -1.0 1.0 -1.0
4 1 1.0 1.0 1.0
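
The aliasing in this design can be checked by hand. The following sketch rebuilds the four runs with the standard library, taking the generator 'a b ab' to mean that the third column is the product of the first two. It is an illustration of the idea rather than the pyDOE3 implementation.

```python
from itertools import product

# Base factors a and b form a full 2^2 design, with a changing fastest.
base = [(a, b) for b, a in product([-1, 1], repeat=2)]

# The generator 'ab' sets the third column to the product of a and b.
runs = [(a, b, a * b) for a, b in base]

for x1, x2, x3 in runs:
    print(x1, x2, x3)

# x3 is identical to the x1 * x2 column in every run, so the two
# effects cannot be told apart from these four runs alone.
aliased = all(x3 == x1 * x2 for x1, x2, x3 in runs)
print(aliased)
```

The design halves the run count of a \(2^3\) full factorial, at the cost of being unable to distinguish the main effect of x3 from the x1 * x2 interaction.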

A screening design that checks for curvature#

This design adds three centre points to a \(2^2\) full-factorial design to check for curvature. The plus operator concatenates runs. The mean of the centre points in this design can be compared to the mean of the corner points using an appropriate hypothesis test (like mqr.inference.mean.test_2sample), or the centre points can be given a separate block label that is used as a categorical variable in ANOVA.

Note that corner points have type 1 and centre points have type 0.

names = ['x1', 'x2']
levels = [2, 2]

fullfact = mqr.doe.Design.from_fullfact(names, levels)
centres = mqr.doe.Design.from_centrepoints(names, 3)

fullfact + centres # '+' concatenates the runs from each design
PtType x1 x2
1 1 -1.0 -1.0
2 1 1.0 -1.0
3 1 -1.0 1.0
4 1 1.0 1.0
5 0 0.0 0.0
6 0 0.0 0.0
7 0 0.0 0.0
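
The comparison of centre and corner means can be sketched as below. The response values are invented for illustration, and the code only computes the difference of means. Deciding whether that difference is significant still needs a hypothesis test like the one mentioned above.

```python
from statistics import mean

# (PtType, x1, x2, response); the responses are hypothetical.
runs = [
    (1, -1.0, -1.0, 10.0),
    (1,  1.0, -1.0, 14.0),
    (1, -1.0,  1.0, 12.0),
    (1,  1.0,  1.0, 16.0),
    (0,  0.0,  0.0, 15.0),
    (0,  0.0,  0.0, 15.5),
    (0,  0.0,  0.0, 14.5),
]

# Select runs by point type: 1 for corners, 0 for centre points.
corner_mean = mean(r[3] for r in runs if r[0] == 1)
centre_mean = mean(r[3] for r in runs if r[0] == 0)

# Without curvature, the centre response equals the average of the corners.
curvature = centre_mean - corner_mean
print(corner_mean, centre_mean, curvature)
```

For a model with only main effects and an interaction, the average over the four corners equals the value at the centre, so a difference that is large relative to the scatter of the centre points suggests a quadratic term.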

Central composite design#

Assuming curvature was detected or is known to exist from physical reasoning, this central composite design characterises all main effects, interactions and quadratic curvature along both axes. The axial points (points lying on an axis) are designed to quantify quadratic effects. Axial points are always labelled with point type 2.

The \(\sqrt{2}\) magnitude of the axial points is deliberate and creates desirable properties in the variance of the predicted response. See “sphericity” in [1] and [2] for more details.

The centres argument adds centre points, which are useful for estimating error and therefore increase the power of the ANOVA statistical tests. This example adds three centre point runs after the factorial corner points, and another three after the axial runs.

names = ['x1', 'x2']
centres = (3, 3)

mqr.doe.Design.from_ccdesign(names, centres)
PtType x1 x2
1 1 -1.000000 -1.000000
2 1 1.000000 -1.000000
3 1 -1.000000 1.000000
4 1 1.000000 1.000000
5 0 0.000000 0.000000
6 0 0.000000 0.000000
7 0 0.000000 0.000000
8 2 -1.414214 0.000000
9 2 1.414214 0.000000
10 2 0.000000 -1.414214
11 2 0.000000 1.414214
12 0 0.000000 0.000000
13 0 0.000000 0.000000
14 0 0.000000 0.000000
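
The choice of \(\sqrt{2}\) can be checked numerically. The sketch below uses two textbook rules for the axial magnitude, which are general background rather than anything specific to MQR: the rotatable choice (the fourth root of the number of factorial runs) and the spherical choice (the square root of the number of factors). For two factors they coincide.

```python
import math

k = 2                # number of factors
n_corners = 2 ** k   # runs in the full-factorial part

# Rotatable choice: fourth root of the number of factorial runs.
alpha_rotatable = n_corners ** 0.25
# Spherical choice: axial points at the same radius as the corners.
alpha_spherical = math.sqrt(k)

# Distance from the centre to a corner point and to an axial point.
corner_radius = math.hypot(1.0, 1.0)
axial_radius = math.hypot(alpha_spherical, 0.0)

print(alpha_rotatable, alpha_spherical)
print(corner_radius, axial_radius)
```

With the axial points at the same radius as the corners, all non-centre runs lie on a circle around the centre point.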

Axial points#

Central composite designs can be constructed manually using factorial designs, centre points and axial points. This example constructs the central composite design from the previous example. Here, though, the as_block method labels the corner and axial points as separate blocks, so they can be run at different times.

import numpy as np

names = ['x1', 'x2']
levels = [2, 2]

fullfact = mqr.doe.Design.from_fullfact(names, levels)
axial = mqr.doe.Design.from_axial(names, magnitude=np.sqrt(2))
centres = mqr.doe.Design.from_centrepoints(names, 3)

fullfact.as_block(1) + centres.as_block(1) + axial.as_block(2) + centres.as_block(2)
PtType Block x1 x2
1 1 1 -1.000000 -1.000000
2 1 1 1.000000 -1.000000
3 1 1 -1.000000 1.000000
4 1 1 1.000000 1.000000
5 0 1 0.000000 0.000000
6 0 1 0.000000 0.000000
7 0 1 0.000000 0.000000
8 2 2 -1.414214 0.000000
9 2 2 1.414214 0.000000
10 2 2 0.000000 -1.414214
11 2 2 0.000000 1.414214
12 0 2 0.000000 0.000000
13 0 2 0.000000 0.000000
14 0 2 0.000000 0.000000

Custom designs#

MQR does not expose the Latin hypercube or Box-Behnken designs from pyDOE3, but they can be constructed easily. This example calls pyDOE3 directly, resulting in a NumPy array of levels. The design is a Box-Behnken design for three factors, and contains no centre points.

import pyDOE3

names = ['x1', 'x2', 'x3']
levels = pyDOE3.bbdesign(len(names), 0)

mqr.doe.Design.from_levels(names, levels)
x1 x2 x3
1 -1.0 -1.0 0.0
2 1.0 -1.0 0.0
3 -1.0 1.0 0.0
4 1.0 1.0 0.0
5 -1.0 0.0 -1.0
6 1.0 0.0 -1.0
7 -1.0 0.0 1.0
8 1.0 0.0 1.0
9 0.0 -1.0 -1.0
10 0.0 1.0 -1.0
11 0.0 -1.0 1.0
12 0.0 1.0 1.0

Practicalities#

These features help with the practicalities of running experiments.

Replication#

Runs can be replicated.

names = ['x1', 'x2']
levels = [2, 2]

design = mqr.doe.Design.from_fullfact(names, levels)
design.replicate(3)
PtType x1 x2
1 1 -1.0 -1.0
2 1 -1.0 -1.0
3 1 -1.0 -1.0
4 1 1.0 -1.0
5 1 1.0 -1.0
6 1 1.0 -1.0
7 1 -1.0 1.0
8 1 -1.0 1.0
9 1 -1.0 1.0
10 1 1.0 1.0
11 1 1.0 1.0
12 1 1.0 1.0

The optional argument label adds a column that labels replicates. This might be helpful for managing, e.g., a split-plot structure when experimental constraints restrict the randomisation of replicates.

names = ['x1', 'x2']
levels = [2, 2]

design = mqr.doe.Design.from_fullfact(names, levels)
design.replicate(3, label='Rep')
PtType Rep x1 x2
1 1 1 -1.0 -1.0
2 1 2 -1.0 -1.0
3 1 3 -1.0 -1.0
4 1 1 1.0 -1.0
5 1 2 1.0 -1.0
6 1 3 1.0 -1.0
7 1 1 -1.0 1.0
8 1 2 -1.0 1.0
9 1 3 -1.0 1.0
10 1 1 1.0 1.0
11 1 2 1.0 1.0
12 1 3 1.0 1.0

Randomisation#

Randomise the runs of a design by calling mqr.doe.Design.randomise_runs. Randomisation uses the NumPy random number generator, so seeding that generator will seed the MQR randomisation. Blocks or factor levels can be kept in order by passing the names of the blocks or factors as arguments.

This example is the blocked central composite design from above. The runs within each block are randomised while the blocks themselves stay in order.

names = ['x1', 'x2']
levels = [2, 2]

fullfact = mqr.doe.Design.from_fullfact(names, levels)
axial = mqr.doe.Design.from_axial(names, magnitude=np.sqrt(2))
centres = mqr.doe.Design.from_centrepoints(names, 3)
design = fullfact.as_block(1) + centres.as_block(1) + axial.as_block(2) + centres.as_block(2)

# Seeding with 0 for repeatable docs;
# in practice, use something like the date-time or a randomly generated seed.
np.random.seed(0)
design.randomise_runs('Block')
PtType Block x1 x2
7 0 1 0.000000 0.000000
5 0 1 0.000000 0.000000
3 1 1 -1.000000 1.000000
2 1 1 1.000000 -1.000000
4 1 1 1.000000 1.000000
1 1 1 -1.000000 -1.000000
6 0 1 0.000000 0.000000
9 2 2 1.414214 0.000000
12 0 2 0.000000 0.000000
14 0 2 0.000000 0.000000
10 2 2 0.000000 -1.414214
8 2 2 -1.414214 0.000000
11 2 2 0.000000 1.414214
13 0 2 0.000000 0.000000
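
The idea of randomising within ordered blocks can be sketched independently of mqr. The example below is an assumption-laden illustration, not MQR's implementation: it uses Python's `random` module instead of NumPy, and represents each run as a (run number, block) pair.

```python
import random

# (run number, block) pairs for a 14-run design split into two blocks.
runs = [(n, 1) for n in range(1, 8)] + [(n, 2) for n in range(8, 15)]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

shuffled = []
for block in sorted({b for _, b in runs}):
    members = [r for r in runs if r[1] == block]
    rng.shuffle(members)       # shuffle inside the block only
    shuffled.extend(members)   # blocks stay in their original order

for run_number, block in shuffled:
    print(run_number, block)
```

Restricting the shuffle to each block keeps block effects, such as a change of day or batch, separate from the run order inside the block.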

Transforms#

Writing down exactly which values correspond to each level is convenient for careful experimental technique. These values can be read off a screen or printed while conducting an actual experiment.

First, define a transform for each factor that maps its coded levels to physical values or labels. A transform can be a callable object, such as a lambda, a function or an mqr.doe.Transform, or it can be a dict that gives a mapped value for every coded level. (When mqr.doe constructs a transform from a map of levels, as in the example below, it assumes the transform is affine.)

This example transforms x1 with an affine transform, x2 with a decreasing affine transform (the physical value falls as the coded variable rises), and x4 with categorical labels. The factor x3 is left in coded units.

from mqr.doe import Transform

names = ['x1', 'x2', 'x3', 'x4']
levels = [2, 2, 2, 2]
design = mqr.doe.Design.from_fullfact(names, levels)

transforms = {
    'x1': Transform.from_map({-1:100, 1:110}),
    'x2': lambda x: -x + 5,
    'x4': {-1: 'low', 1: 'high'},
}
design.transform(**transforms)
PtType x1 x2 x3 x4
1 1 100.0 6.0 -1.0 low
2 1 110.0 6.0 -1.0 low
3 1 100.0 4.0 -1.0 low
4 1 110.0 4.0 -1.0 low
5 1 100.0 6.0 1.0 low
6 1 110.0 6.0 1.0 low
7 1 100.0 4.0 1.0 low
8 1 110.0 4.0 1.0 low
9 1 100.0 6.0 -1.0 high
10 1 110.0 6.0 -1.0 high
11 1 100.0 4.0 -1.0 high
12 1 110.0 4.0 -1.0 high
13 1 100.0 6.0 1.0 high
14 1 110.0 6.0 1.0 high
15 1 100.0 4.0 1.0 high
16 1 110.0 4.0 1.0 high
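
To show what an affine transform built from two mapped levels looks like, here is a minimal sketch. The helper `affine_from_map` is hypothetical and is not part of mqr. It fits the straight line through the two points in the map, which is the assumption the text above describes for transforms constructed from a map.

```python
def affine_from_map(level_map):
    """Return f(x) = slope * x + intercept through the two mapped points."""
    (x0, y0), (x1, y1) = sorted(level_map.items())
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return lambda x: slope * x + intercept

# Same map as the x1 transform above: coded -1 is 100, coded 1 is 110.
to_physical = affine_from_map({-1: 100, 1: 110})

print(to_physical(-1), to_physical(0), to_physical(1))
```

Because the line is defined for every coded value, the same transform also gives physical settings for centre points and for axial points that lie outside the ±1 range.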

Saving to a file#

There are a few options for saving designs to files. If the Design object itself needs to be stored in a file or database, use Python’s pickle library, or some similar serialisation.

Below is an example that saves the design to a CSV or Excel file. The index_label argument in DataFrame.to_csv(...) and DataFrame.to_excel(...) gives a name to the index column, which holds the run numbers, in the saved file.

names = ['x1', 'x2']
levels = [2, 2]
design = mqr.doe.Design.from_fullfact(names, levels)

# np.random.randint(0, 2**32-1)
np.random.seed(1294194915) # Randomly generated seed
frozen_design = design.randomise_runs().to_df()

# Run these to create the actual files
# frozen_design.to_csv(
#     'doe-section6-1294194915.csv',
#     index_label='run')
# frozen_design.to_excel(
#     'doe-section6-1294194915.xlsx',
#     index_label='run')

References#