Beyond pedometers: minutes of physical activity • Name:
Week of          Minutes
Mon 30 Jun 08    139
Mon 07 Jul 08    189
Mon 14 Jul 08    239
Mon 21 Jul 08    137
Mon 28 Jul 08    302
Mon 04 Aug 08    143
Mon 11 Aug 08    348
Mon 18 Aug 08    103
Mon 25 Aug 08    224
Mon 01 Sep 08    165
Mon 08 Sep 08    226
Mon 15 Sep 08    186
Mon 22 Sep 08    234
Mon 29 Sep 08    293
Mon 06 Oct 08    113
Mon 13 Oct 08    358
Mon 20 Oct 08    133
Mon 27 Oct 08    322
Mon 03 Nov 08    152
Mon 10 Nov 08    346
Part I: Basic Statistics
In October 2008 the United States Department of Health and Human Services
released the first set of recommendations on physical activity.
The basic recommendation is for adults to get 150 minutes of moderately intense activity per week.
The second column of the table indicates the number of minutes of
running per week done by Lee Ling.
Running is one form of intense activity. There are many
others, including walking, engaging in sports, strength training, or any
activity that raises your heart rate and gets you sweating for at least
twenty minutes a day.
The intensity level of activities varies with the activity and the fitness
of the person. Running is noted to be a vigorous activity in an
appendix to the main report.
Use the number of minutes of running per week (Minutes) for the following
basic statistics.
_________ Calculate the sample standard deviation sx.
_________ Calculate the sample Coefficient of Variation.
_________ Determine the class width. Use five classes (bins or intervals).
Fill in the following table with the class upper limits in the first column,
the frequencies in the second column, and the relative frequencies in the third column
Bins (x)    Frequency f    RF p(x)
Sums:
Sketch a histogram of the relative frequency data.
__________________ What is the shape of the distribution?
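The basic statistics and the five-class frequency table can be cross-checked outside the spreadsheet. The following is a minimal Python sketch, not the method the course uses (the course uses OpenOffice Calc). It assumes that each class includes its upper limit, and the variable names are illustrative only.

```python
import statistics

# Weekly minutes of running, copied from the table at the top of the test.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

n = len(minutes)
mean = statistics.mean(minutes)
sx = statistics.stdev(minutes)   # sample standard deviation (n - 1 divisor)
cv = sx / mean                   # sample coefficient of variation

# Class width for five classes: range divided by the number of classes.
num_classes = 5
class_width = (max(minutes) - min(minutes)) / num_classes
upper_limits = [min(minutes) + class_width * k
                for k in range(1, num_classes + 1)]

# Assumption: a value equal to a class upper limit is counted in that class.
frequencies = []
lower = float("-inf")
for upper in upper_limits:
    frequencies.append(sum(1 for m in minutes if lower < m <= upper))
    lower = upper
relative_frequencies = [f / n for f in frequencies]

print(f"n = {n}, mean = {mean:.1f}, sx = {sx:.2f}, CV = {cv:.3f}")
print(f"class width = {class_width}")
for upper, f, rf in zip(upper_limits, frequencies, relative_frequencies):
    print(f"{upper:8.0f} {f:4d} {rf:6.2f}")
print(f"{'Sums:':>8} {sum(frequencies):4d} {sum(relative_frequencies):6.2f}")
```

The two sums on the last line are a quick sanity check: the frequencies should total the sample size and the relative frequencies should total 1.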
__________________
On 19 July 2008 I ran in the INS Half-Marathon.
That run occurred during the week of 14 July, a week with 239 minutes of running.
Use the sample mean x̄ and sample standard deviation sx above
to calculate the z-score for
239 minutes of running during the week of 14 July.
_________ Is the z-score for
239 minutes an ordinary or extraordinary value?
__________________
During the week of 13 October I racked up 358 minutes of running.
I was celebrating 30 years of running – I first ran in 1978.
Use the sample mean x̄ and sample standard deviation sx above
to calculate the z-score for
358 minutes of running.
_________ Is the z-score for
358 minutes an ordinary or extraordinary value?
_________
Toughie: How many minutes of running would I have to do in one week
to attain an extraordinary number of minutes of running?
Use z = 2 to find the number of minutes x that I would have to run to reach "extraordinary."
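The z-score questions all use the same relation, z = (x − x̄)/sx, and the "toughie" simply solves it for x. A small Python sketch of both directions follows. The function names are hypothetical, and the cutoff of |z| > 2 for "extraordinary" is taken from the z = 2 hint above.

```python
import statistics

# Weekly minutes of running, copied from the table at the top of the test.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

mean = statistics.mean(minutes)
sx = statistics.stdev(minutes)   # sample standard deviation


def z_score(x):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - mean) / sx


def x_from_z(z):
    """Inverse of z_score: the x value that sits z standard deviations out."""
    return mean + z * sx


def describe(z):
    """Assumed rule: beyond two standard deviations counts as extraordinary."""
    return "extraordinary" if abs(z) > 2 else "ordinary"


for x in (239, 358):
    z = z_score(x)
    print(f"x = {x}: z = {z:.2f} ({describe(z)})")

print(f"minutes needed to reach z = 2: {x_from_z(2):.0f}")
```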
_________ Calculate the standard error of the sample mean x̄
for the number of minutes of running per week.
_________ Find tcritical for a confidence level c of 95%
for the number of minutes of running per week.
_________ Determine the margin of error E for the sample mean x̄.
Write out the 95% confidence interval for the population mean μ
for the number of minutes of running per week.
p(_____________ < μ < ___________) = 0.95
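The chain standard error → tcritical → margin of error → confidence interval can be sketched in Python as a check on the spreadsheet. The standard library has no equivalent of TINV, so this sketch substitutes a numerical approach: it integrates the Student's t density with Simpson's rule and finds tcritical by bisection. The helper names `t_pdf`, `two_tailed_p`, and `t_critical` are illustrative, not part of any library.

```python
import math
import statistics

# Weekly minutes of running, copied from the table at the top of the test.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed area beyond |t|, the quantity TDIST(ABS(t);df;2) reports."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)   # Simpson's rule on [0, |t|]


def t_critical(alpha, df):
    """Bisection for the t whose two-tailed area is alpha, like TINV(alpha;df)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = len(minutes)
mean = statistics.mean(minutes)
se = statistics.stdev(minutes) / math.sqrt(n)   # standard error of the mean
tc = t_critical(1 - 0.95, n - 1)                # confidence level c = 0.95
margin = tc * se                                # margin of error E
low, high = mean - margin, mean + margin

print(f"SE = {se:.2f}, tc = {tc:.3f}, E = {margin:.2f}")
print(f"p({low:.1f} < mu < {high:.1f}) = 0.95")
```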
_________
The United States Department of Health and Human Services recommends 150 minutes of moderately intense activity
per week. Use 150 minutes as the population mean μ.
Is the number of minutes of running per week done by Lee Ling statistically significantly
different from the μ = 150 minutes recommendation?
______________________
Using Lee Ling's data above and a population mean μ = 150,
determine the t-statistic.
______________________
Using Lee Ling's data above and a population mean μ = 150,
determine the p-value.
Keep three decimal places in your answer.
______________________
Using Lee Ling's data above and a population mean μ = 150,
determine the maximum confidence level c for which the difference is statistically significant.
Keep three decimal places in your answer.
_________
Is Lee Ling exceeding the United States Department of Health and Human Services minimum physical activity
minutes per week guideline by a statistically significant amount?
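The one-sample test against μ = 150 follows the formulas in the tables at the end of the test: t = (x̄ − μ)/(sx/√n), a two-tailed p-value from that t, and a maximum confidence level of 1 − p. The Python sketch below is a hedged cross-check. Because the standard library has no TDIST, it assumes a Simpson's-rule integration of the t density as a stand-in, and the helper names are illustrative.

```python
import math
import statistics

# Weekly minutes of running, copied from the table at the top of the test.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]
mu = 150        # recommended minutes per week, used as the population mean
alpha = 0.05    # risk of a type I error


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * \
        (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed area beyond |t|, approximated with Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


n = len(minutes)
mean = statistics.mean(minutes)
sx = statistics.stdev(minutes)

t_stat = (mean - mu) / (sx / math.sqrt(n))
p_value = two_tailed_p(t_stat, n - 1)
max_confidence = 1 - p_value          # largest c at which the result holds
significant = p_value < alpha

print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}, max c = {max_confidence:.3f}")
print("statistically significant" if significant else "not statistically significant")
```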
Part II: Hypothesis Testing using the t-test
Last spring term I was lazy and my running slacked off. My weekly minutes of running were low,
and some weeks I did not run at all.
With the end of the school term in mid-May I planned to put myself on a stricter regimen of running.
With only six weeks of renewed effort by the week of 23 June, could I prove that my
running duration in minutes per week had improved?
Use a t-test for a difference of two independent sample means in this portion of the test.
The sample means are the weekly minutes of running for spring versus summer.
Note that for two of my weeks in spring the total weekly running time was actually zero minutes.
I did not run during those two particular weeks.
Steps
Date             Spring minutes (x)    Date             Summer minutes (y)
Mon 25 Feb 08    194                   Mon 19 May 08    331
Mon 03 Mar 08    0                     Mon 26 May 08    182
Mon 10 Mar 08    141                   Mon 02 Jun 08    261
Mon 17 Mar 08    238                   Mon 09 Jun 08    187
Mon 24 Mar 08    86                    Mon 16 Jun 08    207
Mon 31 Mar 08    88                    Mon 23 Jun 08    176
Mon 07 Apr 08    80
Mon 14 Apr 08    61
Mon 21 Apr 08    49
Mon 28 Apr 08    58
Mon 05 May 08    104
Mon 12 May 08    0
_________ Calculate the sample mean x̄ of the
number of minutes of running per week during the spring term.
_________ Calculate the sample mean ȳ of the
number of minutes of running per week during the summer.
_________ Are the sample means for the two samples mathematically different?
__________________ What is the p-value? Use the difference of means for independent samples TTEST function to determine the p-value for this two sample data. Keep three decimal places in your answer.
__________________ Is the difference in the means statistically significant
at a risk of a type I error alpha α = 0.05?
__________________ Would we "fail to reject" or "reject" a null hypothesis of no difference
in the means?
__________________ What is the maximum level of confidence we can have that the
difference is statistically significant? Keep three decimal places in your answer.
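The two-sample comparison can be checked with a short Python sketch. It uses the standard error formula from the tables at the end of the test, sqrt(sx²/nx + sy²/ny). As an assumption, it uses the Welch-Satterthwaite degrees of freedom, which is the usual choice for an unequal-variance t-test and is believed to be what the TTEST function with type 3 applies. The table's simpler approximation (smaller n − 1) would give a slightly different p-value. The p-value helper is a numerical stand-in for TDIST, and all names are illustrative.

```python
import math
import statistics

spring = [194, 0, 141, 238, 86, 88, 80, 61, 49, 58, 104, 0]   # x
summer = [331, 182, 261, 187, 207, 176]                       # y


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * \
        (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed area beyond |t|, approximated with Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


nx, ny = len(spring), len(summer)
mean_x, mean_y = statistics.mean(spring), statistics.mean(summer)
var_x, var_y = statistics.variance(spring), statistics.variance(summer)

# Standard error for two independent samples.
se = math.sqrt(var_x / nx + var_y / ny)
t_stat = (mean_x - mean_y) / se

# Assumption: Welch-Satterthwaite degrees of freedom (may be non-integer).
df = se ** 4 / ((var_x / nx) ** 2 / (nx - 1) + (var_y / ny) ** 2 / (ny - 1))

p_value = two_tailed_p(t_stat, df)
max_confidence = 1 - p_value

print(f"spring mean = {mean_x:.1f}, summer mean = {mean_y:.1f}")
print(f"t = {t_stat:.3f}, df = {df:.2f}, p-value = {p_value:.3f}")
print("reject" if p_value < 0.05 else "fail to reject", "the null hypothesis")
```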
Part III: Linear Regression (best fit or least squares line)
Data table
Minutes of running    Total daily steps
34                    10101
49                    14200
41                    8660
24                    8675
34                    10864
65                    15489
60                    14690
74                    14725
114                   18660
50                    16763
In October 2008 my last pedometer that could withstand running and rain finally failed.
Without a pedometer, can minutes of running be used to estimate daily total steps?
Running is very regular and produces, for Lee Ling, 154 steps per minute.
The complication is that, before he runs each day, Lee Ling walks around campus.
Can a linear regression be used to predict total daily steps just from his daily run?
This section explores this question.
_________ Calculate the slope of the linear regression (best fit line).
_________ Calculate the y-intercept of the linear regression (best fit line).
_________ Is the relation between minutes of running and total daily steps positive, negative, or neutral?
_________ Calculate the linear correlation coefficient r for the data.
______________ Is the correlation none, weak/low, moderate, strong/high, or perfect?
______________ Determine the coefficient of determination.
______________ What percent of the variation in
the total daily steps
is "explained" by the variation in
the minutes of running?
_________ Use the slope and intercept to predict
the number of total daily steps for 100 minutes of running.
_________ Use the slope and intercept to determine
the number of minutes of running required to produce 12000 steps.
_________ On a day on which I do not run, how many steps am I predicted to get?
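The regression questions can be checked with the least-squares formulas written out directly: slope = Sxy/Sxx, intercept = ȳ − slope·x̄, and r = Sxy/√(Sxx·Syy). This is a minimal Python sketch of the same quantities the spreadsheet SLOPE, INTERCEPT, and CORREL functions report. The helper names `predict_steps` and `minutes_for_steps` are hypothetical.

```python
import math

run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
daily_steps = [10101, 14200, 8660, 8675, 10864,
               15489, 14690, 14725, 18660, 16763]

n = len(run_minutes)
mean_x = sum(run_minutes) / n
mean_y = sum(daily_steps) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
syy = sum((y - mean_y) ** 2 for y in daily_steps)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(run_minutes, daily_steps))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)      # linear correlation coefficient
r_squared = r ** 2                  # coefficient of determination


def predict_steps(minutes):
    """Predicted total daily steps for a given number of running minutes."""
    return intercept + slope * minutes


def minutes_for_steps(steps):
    """Running minutes the line predicts are needed for a given step count."""
    return (steps - intercept) / slope


print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"steps at 100 minutes: {predict_steps(100):.0f}")
print(f"minutes for 12000 steps: {minutes_for_steps(12000):.1f}")
print(f"steps on a day with no run: {predict_steps(0):.0f}")
```

Note that the prediction for a day with no run is just the y-intercept, and that 100 minutes lies within the range of the data (maximum 114), so that prediction is an interpolation.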
One intention of any course is that a student should be able to
learn and employ new concepts in the field even after the course is over.
In a linear regression analysis a correlation coefficient near
zero means no linear relation exists between the variables.
You can run a statistical test to determine whether the
correlation coefficient r is
statistically significantly different from zero.
If the difference of r from zero is statistically significant,
then you will have proved that a relationship exists.
If you fail to reject a null hypothesis of r equals zero,
then there is no evidence in the data that minutes of running and total steps are related.
To run the hypothesis test, you will calculate a
t-critical (tc), a t-statistic (t), and then a p-value
using the t-statistic and the TDIST function.
For this test:
sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: =r*SQRT((n−2)/(1−r^2)) where r is the correlation coefficient
p-value: =TDIST(ABS(t-statistic);n−2;2)
Note that n−2 is used in these formulas. This is the degrees
of freedom for a correlation hypothesis test.
_________ Determine the sample size n
by counting the number of data pairs.
_________ Determine t-critical using an alpha of α = 0.05
and n − 2 degrees of freedom.
_________
Determine the t-statistic using the formula noted further above,
remembering to use n − 2 for the degrees of freedom.
_________ Determine the p-value using the TDIST function,
remembering to use n − 2 for the degrees of freedom.
________ Is the correlation between
my minutes of running and my daily total steps statistically significant?
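The correlation test can be sketched end to end in Python. The t-statistic here uses the standard form for testing a correlation against zero, t = r·√((n − 2)/(1 − r²)), with n − 2 degrees of freedom. TINV and TDIST are replaced by numerical stand-ins (Simpson's rule plus bisection), which is an assumption of this sketch and not how the spreadsheet computes them. All helper names are illustrative.

```python
import math

run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
daily_steps = [10101, 14200, 8660, 8675, 10864,
               15489, 14690, 14725, 18660, 16763]
alpha = 0.05


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * \
        (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed area beyond |t|, approximated with Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def t_critical(alpha, df):
    """Bisection for the t whose two-tailed area equals alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = len(run_minutes)                 # number of data pairs
mean_x = sum(run_minutes) / n
mean_y = sum(daily_steps) / n
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
syy = sum((y - mean_y) ** 2 for y in daily_steps)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(run_minutes, daily_steps))
r = sxy / math.sqrt(sxx * syy)

df = n - 2                           # degrees of freedom for this test
t_crit = t_critical(alpha, df)
t_stat = r * math.sqrt(df / (1 - r ** 2))
p_value = two_tailed_p(t_stat, df)

print(f"n = {n}, r = {r:.3f}")
print(f"t-critical = {t_crit:.3f}, t-statistic = {t_stat:.3f}, p-value = {p_value:.3f}")
print("significant" if p_value < alpha else "not significant", "at alpha = 0.05")
```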
For a retrospective look at pedometer data, see also the pedometer mini-studies.
Tables of Formulas and OpenOffice Calc functions
Basic Statistics
Statistic or Parameter
Symbol
Equations
OpenOffice
Square root
=SQRT(number)
sample size n
n
=COUNT(data)
sample mean
x̄
Σx/n
=AVERAGE(data)
Sample standard deviation
sx or s
=STDEV(data)
Sample Coefficient of Variation
CV
sx / x̄
=STDEV(data)/AVERAGE(data)
Formula to calculate a z value from an x value
z
=STANDARDIZE(x;x̄;sx)
Confidence interval statistics for a single sample
Statistic or Parameter
Symbol
Equations
OpenOffice
Sample size
n
n
=COUNT(data)
Degrees of freedom
df
n − 1
=COUNT(data)-1
Find a tcritical value from a confidence level c
tc
=TINV(1-c;df)
Standard error of a sample mean x̄
SE
=STDEV(data)/SQRT(n)
Standard error of a sample proportion p
SE
=SQRT(p*q/n)
Calculate a margin of error for the mean E using tcritical and the standard error SE.
E
=tc*SE
Calculate a confidence interval for a population mean μ from a sample mean
x̄ and a margin of error E
x̄ - E < μ < x̄ + E
Calculate a confidence interval for a population proportion P from a sample proportion p and a margin of error E
p - E < P < p + E
Hypothesis testing for a sample mean versus a known population mean
Statistic or Parameter
Symbol
Equations
OpenOffice
Relationship between confidence level c and alpha α for two-tailed tests
1 − c = α
Calculate t-critical for a two-tailed test
tc
=TINV(α;df)
Calculate a t-statistic
t
=(x̄ - μ)/(sx/SQRT(n))
Calculate a two-tailed p-value from a t-statistic
p-value
= TDIST(ABS(t);df;2)
Hypothesis testing for paired data samples
Statistic or Parameter
Symbol
Equations
OpenOffice
Calculate a p-value for the difference of the means from two samples of paired data
=TTEST(data_range_x;data_range_y;2;1)
Hypothesis testing and confidence intervals for two independent samples
Statistic or Parameter
Symbol
Equations
OpenOffice
Degrees of freedom (approx.)
df
[smaller sample n] − 1
=COUNT(smaller sample)-1
Calculate t-critical for a two-tailed test
tc
=TINV(α;df)
Calculate the standard error SE for two independent samples
SE
=sqrt((sx^2/nx)+(sy^2/ny))
Calculate a margin of error E for two independent samples using tcritical and the standard error SE.
E
=tc*SE
Calculate the difference between two sample means
x̄d
x̄ − ȳ
=average(data set x)-average(data set y)
Calculate a confidence interval for a population mean difference μd
from a sample mean difference x̄d and a margin of error E
x̄d − E < μd < x̄d + E
Calculate a p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown