Reading Research: Does Size Matter?

Following up on Reading Research part one, where I review key aspects of the book Experimental Psychology by Myers and Hansen, I thought it would be interesting to tackle the question, “Does size matter?”  Of course, by size, I mean the sample size in a research study, often referred to as “n.”

Many introductory books make the point that small sample sizes are a red flag.  Small samples might provide flawed information.  A small group might be composed of unique or unusual individuals: subjects who do not reflect the majority of the population.

Yet Skinner, undoubtedly one of the best-known names in psychology, did experiments with only a few subjects.  His work is held in high regard for being tightly controlled.  Much of it has held up over many decades.

[Image: B. F. Skinner]

That presents quite a discrepancy to resolve.  Large samples are purportedly good, yet Skinner’s exceptionally controlled research used small samples.  Assuming that both premises are true, an explanation must exist.

Generally, large samples are beneficial.  One reason is that large samples are more likely to reflect the whole of the population.  There is another reason illustrated by the following example.

Let us pretend that we want to know whether one dog training technique is better than another.  We randomly divide the dogs into two groups.  Group A learns a task with our technique.  Group B learns with a different technique.  We train each group of dogs and compare the results.  This research design is called a two independent groups design.

Statistics are used to analyze the data.  A standard t-test is a probable choice for this study.  Each statistical test makes assumptions in its calculations.  Standard t-tests assume that the data we collect follow a normal curve (bell curve).  Without enough participants, there isn’t enough data to make a solid, fully formed bell curve.  As the diagram below shows, without a fully formed curve, it is hard to judge whether the curves are similar to or different from one another.  As a rule of thumb, standard t-tests call for at least twenty subjects in each group, and thirty is better.

[Figure: normal curve]
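For readers who like to see the arithmetic, here is a minimal sketch of the two-group comparison described above.  The scores are invented for illustration, and the function name `students_t` is my own, not something from the book.  It computes only the t statistic and degrees of freedom; turning that into a p-value needs a t table or a statistics package.

```python
import math
from statistics import mean, variance

def students_t(group_a, group_b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    degrees_of_freedom = n_a + n_b - 2
    return t, degrees_of_freedom

# Hypothetical data: number of trials each dog needed to learn the task.
group_a = [12, 15, 11, 14, 13, 16, 12, 15]  # our technique
group_b = [17, 19, 16, 18, 20, 17, 21, 18]  # the other technique

t, df = students_t(group_a, group_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

With only eight made-up dogs per group, this sample falls well short of the twenty-per-group rule of thumb, so treat it as a demonstration of the calculation rather than a sound study.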

There are other forms of research.  Some researchers prefer to focus on details.  These details can be lost when data is pooled or averaged.  Instead of blending the results of many subjects, “small n” researchers focus intently on the individual responses of a few.

Such an approach can offer key insights.  For example, if we measured how dogs learn new skills, blended results might create a gently sloping curve.  Individual results could paint a jagged process of breakthroughs and setbacks.

[Figure: averaged results compared with individual results]
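The way averaging hides individual patterns can be sketched in a few lines.  Everything here is hypothetical: five imaginary dogs each fail a task (0) until a sudden breakthrough session and succeed (1) from then on.  For simplicity the sketch models breakthroughs only, not setbacks.

```python
# Hypothetical breakthrough session for each dog (assumed values, not real data).
breakthrough_session = {"dog_1": 2, "dog_2": 4, "dog_3": 5, "dog_4": 7, "dog_5": 9}
sessions = list(range(1, 11))

# Each individual record is a sharp step: 0 before the breakthrough, 1 afterwards.
individual = {
    dog: [1 if session >= start else 0 for session in sessions]
    for dog, start in breakthrough_session.items()
}

# The group average for each session blends those steps into a gradual climb.
group_average = [
    sum(scores[i] for scores in individual.values()) / len(individual)
    for i in range(len(sessions))
]

for dog, scores in individual.items():
    print(dog, scores)
print("average", group_average)
```

No single dog in this sketch improves gradually, yet the averaged line rises in small steps across the ten sessions.  That is the kind of detail a small n researcher wants to keep in view.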

There are a number of small n designs, including ABA, multiple baseline, changing criterion and discrete trial designs.

Here are a couple of examples to show how some of these designs work.

In an ABA design, the subject acts as both the experimental and control group.  Assume that we want to test a new anxiety treatment.  A baseline is measured during phase one (the first A in ABA).  Treatment is then given during the second phase (B).  Finally, treatment is discontinued (the second A).  An ABA design allows us to see if the treatment has an effect.  We can also see if the results disappear when treatment stops.  There are many variations of the ABA design such as ABABA, ABACADA and so on.  The repeated reversals help researchers check whether the order of the phases, rather than the treatment itself, is driving the result.
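As a rough sketch of how ABA data might be summarized, the snippet below compares the average score in each phase for a single subject.  The daily anxiety ratings are invented, and comparing phase means is only one simple way to look at such data; researchers often inspect the plotted scores directly.

```python
from statistics import mean

# Hypothetical daily anxiety ratings (0 = calm, 10 = severe) for ONE subject.
phases = [
    ("A1 baseline", [8, 7, 8, 9, 8]),
    ("B treatment", [6, 5, 4, 4, 3]),
    ("A2 withdrawal", [5, 6, 7, 7, 8]),
]

phase_means = {label: mean(scores) for label, scores in phases}

# Drop from baseline to treatment: did anxiety fall while treatment was given?
treatment_effect = phase_means["A1 baseline"] - phase_means["B treatment"]
# Rise after withdrawal: did anxiety climb back once treatment stopped?
rebound = phase_means["A2 withdrawal"] - phase_means["B treatment"]

for label, value in phase_means.items():
    print(f"{label}: {value:.1f}")
print(f"drop during treatment: {treatment_effect:.1f}")
print(f"rise after withdrawal: {rebound:.1f}")
```

A drop during B followed by a rise in the second A phase is the pattern that suggests the treatment, and not something else going on in the subject's life, produced the change.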

Discrete trials are common in conditioning experiments.  For example, we might want to know if dogs discriminate sounds better with one ear versus the other.  In other words, we want to know if dogs are left or right “eared.”  Dogs learn to discriminate a tone.  Probe tones are presented to the left or right ear.  The dog’s responses (how quickly it discriminates with either ear) are measured.  Comparisons are made.  A response is measured over many treatment conditions.  In this case, hearing is measured across a number of manipulations.  Humans who participated in a similar experiment each performed over 2000 trials.  The sample might be small, but the volume of data is massive.  It requires meticulous record keeping and data analysis.
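To give a feel for how much data a single subject can generate, here is a small simulation.  It is a sketch only: the reaction times are randomly generated, and the "true" averages of 420 ms and 400 ms and the 40 ms spread are assumptions I picked for illustration, not findings from any study.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumed values for one imaginary subject (not real data).
TRUE_MEAN_MS = {"left": 420.0, "right": 400.0}
SPREAD_MS = 40.0
NUMBER_OF_TRIALS = 2000

# Each trial: a probe tone goes to one ear and we record the reaction time.
trials = []
for _ in range(NUMBER_OF_TRIALS):
    ear = rng.choice(["left", "right"])
    reaction_ms = rng.gauss(TRUE_MEAN_MS[ear], SPREAD_MS)
    trials.append((ear, reaction_ms))

# Sort the record by ear, then compare the averages.
by_ear = {"left": [], "right": []}
for ear, reaction_ms in trials:
    by_ear[ear].append(reaction_ms)

mean_by_ear = {ear: mean(times) for ear, times in by_ear.items()}
for ear, value in mean_by_ear.items():
    print(f"{ear} ear: {len(by_ear[ear])} trials, mean {value:.0f} ms")
```

One subject, two thousand rows of data.  With that many trials, even a small difference between the ears stands out clearly from the trial-to-trial noise.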

The question should not be “does size matter?”  That is an overly simplistic question.

Of course, size matters.  But bigger is not always better when it comes to sample sizes.  What matters is whether the size of the sample works with the type of study and the statistical analysis used.

The various types of research are like tools.  A hammer is no better or worse than a screwdriver.  Using a hammer to drive a screw is fraught with problems.  It is similar with studies.  Different types of research serve different purposes, and they need to be used correctly.  Keep looking at sample sizes.  Also, look to see if that sample matches the type of research.  It can be helpful to grab a few studies, look up the sample size and look up the type of study.  Start becoming familiar with the jargon.

I would highly recommend Experimental Psychology to anyone wanting a deeper understanding.  My blog posts highlight only a few small sections.  The book is well worth the investment.

Part one on reading research:  Internal Validity can be found here.
Part three – coming soon.