Beyond pedometers, minutes of physical activity

Week of	Minutes
Mon 30 Jun 08	139
Mon 07 Jul 08	189
Mon 14 Jul 08	239
Mon 21 Jul 08	137
Mon 28 Jul 08	302
Mon 04 Aug 08	143
Mon 11 Aug 08	348
Mon 18 Aug 08	103
Mon 25 Aug 08	224
Mon 01 Sep 08	165
Mon 08 Sep 08	226
Mon 15 Sep 08	186
Mon 22 Sep 08	234
Mon 29 Sep 08	293
Mon 06 Oct 08	113
Mon 13 Oct 08	358
Mon 20 Oct 08	133
Mon 27 Oct 08	322
Mon 03 Nov 08	152
Mon 10 Nov 08	346

In October 2008 the United States Department of Health and Human Services released the first set of recommendations on physical activity. The basic recommendation is for adults to get 150 minutes of moderately intense activity per week. The second column of the table indicates the number of minutes of running per week done by Lee Ling. Running is one form of intense activity. There are many others, including walking, engaging in sports, strength training, or any activity that raises your heart rate and gets you sweating for at least twenty minutes a day. The intensity level varies with the activity and the fitness of the person. Running is noted to be a vigorous activity in an appendix to the main report.

Part I: Basic Statistics

Use the number of minutes of running per week (Minutes) for the following basic statistics.
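As a cross-check on the spreadsheet work, the basic statistics can also be sketched in Python. The snippet below assumes the twenty weekly values are typed in from the Minutes column above; the variable names are illustrative and not part of the test. The 150-minute guideline is used only as a reference point for a z-score, since the guideline is stated for moderate activity and running is vigorous.

```python
import statistics

# Weekly minutes of running, copied from the Minutes column
# (weeks of 30 Jun 08 through 10 Nov 08).
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

n = len(minutes)                     # sample size, like =COUNT(data)
mean = statistics.mean(minutes)      # like =AVERAGE(data)
median = statistics.median(minutes)
sx = statistics.stdev(minutes)       # sample standard deviation, like =STDEV(data)
cv = sx / mean                       # coefficient of variation

# z-score of 150 minutes relative to this sample,
# like =STANDARDIZE(150; mean; sx). The choice of 150 is an assumption
# made for illustration only.
z_150 = (150 - mean) / sx

print(f"n = {n}, mean = {mean:.1f}, median = {median:.1f}")
print(f"sx = {sx:.2f}, CV = {cv:.3f}, z for 150 minutes = {z_150:.2f}")
```

The sample standard deviation (dividing by n − 1) is used because the weeks are treated as a sample, which matches the behavior of =STDEV.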

Part II: Hypothesis Testing using the t-test

Last spring term I was lazy and my running slacked off. My weekly minutes of running were low; some weeks I did not run at all. With the end of the school term in mid-May I planned to put myself on a stricter regimen of running. With only six weeks of renewed effort by the week of 23 June, could I prove that my running duration in minutes per week had improved? Use a t-test for a difference of two independent sample means in this portion of the test. The sample means are the mean weekly minutes of running for spring versus summer. Note that for two of my weeks in spring the total weekly running time was actually zero minutes. I did not run during those two particular weeks.
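The two-sample comparison can be sketched in Python as a check on the spreadsheet formulas. The snippet assumes the spring and summer values are copied from the table of spring (x) and summer (y) minutes in this test, uses the approximate degrees of freedom from the formula tables (smaller sample size minus one), and hard-codes t-critical from a standard t table because the standard library has no equivalent of =TINV. All variable names are illustrative.

```python
import math
import statistics

# Weekly minutes of running, copied from the spring (x) and summer (y) table.
spring = [194, 0, 141, 238, 86, 88, 80, 61, 49, 58, 104, 0]
summer = [331, 182, 261, 187, 207, 176]

mean_x, mean_y = statistics.mean(spring), statistics.mean(summer)
sx, sy = statistics.stdev(spring), statistics.stdev(summer)
n_x, n_y = len(spring), len(summer)

# Approximate degrees of freedom: smaller sample size minus one.
df = min(n_x, n_y) - 1

# Standard error for two independent samples.
se = math.sqrt(sx**2 / n_x + sy**2 / n_y)

# Difference of the sample means (spring minus summer) and its t-statistic.
mean_diff = mean_x - mean_y
t_stat = mean_diff / se

# ASSUMPTION: two-tailed t-critical for alpha = 0.05 and df = 5, read from a
# standard t table in place of =TINV(0.05;5).
t_critical = 2.571

# Margin of error and confidence interval for the difference of the means.
margin = t_critical * se
ci_low, ci_high = mean_diff - margin, mean_diff + margin

significant = abs(t_stat) > t_critical

print(f"means: spring = {mean_x:.1f}, summer = {mean_y:.1f}")
print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}")
print(f"95% CI for the difference: {ci_low:.1f} to {ci_high:.1f}")
print("significant at alpha = 0.05:", significant)
```

If the confidence interval for the difference excludes zero, that agrees with the t-statistic lying beyond t-critical. The p-value itself is left to =TTEST(x data;y data;2;3) in the spreadsheet.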

Part III: Linear Regression (best fit or least squares line)

In October 2008 my last pedometer that could withstand running and rain finally failed. Without a pedometer, can minutes of running be used to estimate daily total steps? Running is very regular and produces, for Lee Ling, 154 steps per minute. The complication is that before Lee Ling runs each day he walks around campus. Can a linear regression be used to predict total daily steps just from his daily run? This section explores this question.

Steps
Date	Spring minutes (x)	Date	Summer minutes (y)
Mon 25 Feb 08	194	Mon 19 May 08	331
Mon 03 Mar 08	0	Mon 26 May 08	182
Mon 10 Mar 08	141	Mon 02 Jun 08	261
Mon 17 Mar 08	238	Mon 09 Jun 08	187
Mon 24 Mar 08	86	Mon 16 Jun 08	207
Mon 31 Mar 08	88	Mon 23 Jun 08	176
Mon 07 Apr 08	80
Mon 14 Apr 08	61
Mon 21 Apr 08	49
Mon 28 Apr 08	58
Mon 05 May 08	104
Mon 12 May 08	0

Minutes of running	Total daily steps
34	10101
49	14200
41	8660
24	8675
34	10864
65	15489
60	14690
74	14725
114	18660
50	16763
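The least squares line for the table above can be sketched in Python with the usual sums of squares, which should agree with =SLOPE and =INTERCEPT in the spreadsheet. The data are copied from the table; the variable names and the helper predict_steps are illustrative assumptions, not part of the test.

```python
# Daily minutes of running (x) and total daily steps (y), copied from the table.
run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
total_steps = [10101, 14200, 8660, 8675, 10864,
               15489, 14690, 14725, 18660, 16763]

n = len(run_minutes)
mean_x = sum(run_minutes) / n
mean_y = sum(total_steps) / n

# Sum of squared deviations in x and sum of cross products of deviations.
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(run_minutes, total_steps))

slope = sxy / sxx                     # like =SLOPE(y data; x data)
intercept = mean_y - slope * mean_x   # like =INTERCEPT(y data; x data)


def predict_steps(minutes_run):
    """Predicted total daily steps for a given number of minutes of running."""
    return intercept + slope * minutes_run


print(f"slope = {slope:.2f} steps per minute, intercept = {intercept:.1f} steps")
print(f"predicted steps for a 60-minute run: {predict_steps(60):.0f}")
```

The slope can be compared with the 154 steps per minute quoted for running, and the intercept read as an estimate of the steps on a day with no run. Predictions are only reasonable for run lengths inside the range of the data, 24 to 114 minutes.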

One intention of any course is that a student should be able to learn and employ new concepts in the field even after the course is over. In a linear regression analysis a correlation coefficient near zero means no linear relation exists between the variables. You can run a statistical test to determine whether the correlation coefficient r is statistically significantly different from zero. If the difference of r from zero is statistically significant, then you will have evidence that a linear relationship exists. If you fail to reject a null hypothesis of r equals zero, then there is no evidence in the data that minutes of running and total steps are linearly related.

To run the hypothesis test, you will calculate a t-critical (t_c), a t-statistic (t), and then a p-value using the t-statistic and the TDIST function.

For this test:
sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}}$
p-value: =TDIST(ABS(t-statistic);n−2;2)

Note that n−2 is used in these formulas. This is the degrees of freedom for a correlation hypothesis test.

_________ Determine the sample size n by counting the number of data pairs.
_________ Determine t-critical using an alpha of α = 0.05 and n − 2 degrees of freedom.
_________ Determine the t-statistic using the formula noted further above, remembering to use n − 2 for the degrees of freedom.
_________ Determine the p-value using the TDIST function, remembering to use n − 2 for the degrees of freedom.
_________ Is the correlation between my minutes of running and my daily total steps statistically significant?
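The steps of the correlation test can be sketched in Python as follows. The data are copied from the table of minutes of running and total daily steps. Because the standard library has no equivalent of =TDIST, the p-value is approximated here by integrating the t density with Simpson's rule; that substitution, the helper names t_pdf and two_tailed_p, and the hard-coded t-critical from a standard t table are assumptions of this sketch.

```python
import math

# Daily minutes of running (x) and total daily steps (y), copied from the table.
run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
total_steps = [10101, 14200, 8660, 8675, 10864,
               15489, 14690, 14725, 18660, 16763]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, intervals=2000):
    """Approximate =TDIST(ABS(t);df;2) with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / intervals             # intervals must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total   # area between -|t| and +|t|
    return 1 - central_area


n = len(run_minutes)                  # number of data pairs
df = n - 2                            # degrees of freedom for the correlation test

mean_x = sum(run_minutes) / n
mean_y = sum(total_steps) / n
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
syy = sum((y - mean_y) ** 2 for y in total_steps)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(run_minutes, total_steps))

r = sxy / math.sqrt(sxx * syy)        # like =CORREL(y data; x data)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_value = two_tailed_p(t_stat, df)

# ASSUMPTION: two-tailed t-critical for alpha = 0.05 and df = 8, read from a
# standard t table in place of =TINV(0.05;8).
t_critical = 2.306

print(f"n = {n}, df = {df}, r = {r:.3f}")
print(f"t = {t_stat:.2f}, t-critical = {t_critical}, p-value = {p_value:.4f}")
print("significant at alpha = 0.05:", p_value < 0.05)
```

The two decision rules should agree: the t-statistic exceeds t-critical in absolute value exactly when the p-value is below α.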

For a retrospective look at pedometer data, see also the pedometer mini-studies.

Tables of Formulas and OpenOffice Calc functions

Basic Statistics
Statistic or Parameter	Symbol	Equations	OpenOffice
Square root			=SQRT(number)
Sample size	n		=COUNT(data)
Sample mean	x̄	Σx/n	=AVERAGE(data)
Sample standard deviation	sx or s		=STDEV(data)
Sample coefficient of variation	CV	sx / x̄	=STDEV(data)/AVERAGE(data)
Formula to calculate a z value from an x value	z		=STANDARDIZE(x;x̄;sx)

Confidence interval statistics for a single sample
Statistic or Parameter	Symbol	Equations	OpenOffice
Sample size	n	n	=COUNT(data)
Degrees of freedom	df	n − 1	=COUNT(data)-1
Find a t_critical value from a confidence level c	t_c		=TINV(1-c;df)
Standard error of a sample mean x̄	SE		=STDEV(data)/SQRT(n)
Standard error of a sample proportion p	SE		=SQRT(p*q/n)
Calculate a margin of error for the mean E using t_critical and the standard error SE.	E		=t_c*SE
Calculate a confidence interval for a population mean μ from a sample mean x̄ and a margin of error E		x̄ − E < μ < x̄ + E
Calculate a confidence interval for a population proportion P from a sample proportion p and a margin of error E		p - E < P < p + E
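The single-sample confidence interval rows above can be sketched in Python using the weekly Minutes data from the start of this test. This is an illustration under stated assumptions: t-critical for a 95% confidence level and 19 degrees of freedom is hard-coded from a standard t table in place of =TINV(1-c;df), and the variable names are illustrative.

```python
import math
import statistics

# Weekly minutes of running, copied from the Minutes column.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

n = len(minutes)
df = n - 1                                   # degrees of freedom
mean = statistics.mean(minutes)
se = statistics.stdev(minutes) / math.sqrt(n)   # standard error of the mean

# ASSUMPTION: t-critical for c = 0.95 and df = 19, read from a standard
# t table in place of =TINV(0.05;19).
t_critical = 2.093

margin = t_critical * se                     # margin of error E
lower, upper = mean - margin, mean + margin  # mean - E < mu < mean + E

print(f"mean = {mean:.1f}, SE = {se:.2f}, E = {margin:.1f}")
print(f"95% confidence interval: {lower:.1f} < mu < {upper:.1f}")
```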

Hypothesis testing for a sample mean versus a known population mean
Statistic or Parameter	Symbol	Equations	OpenOffice
Relationship between confidence level c and alpha α for two-tailed tests		1 − c = α
Calculate t-critical for a two-tailed test	t_c		=TINV(α;df)
Calculate a t-statistic	t		=(x̄ - μ)/(sx/SQRT(n))
Calculate a two-tailed p-value from a t-statistic	p-value		= TDIST(ABS(t);df;2)

Hypothesis testing for paired data samples
Statistic or Parameter	Symbol	Equations	OpenOffice
Calculate a p-value for the difference of the means from two samples of paired data			=TTEST(data_range_x;data_range_y;2;1)

Hypothesis testing and confidence intervals for two independent samples
Statistic or Parameter	Symbol	Equations	OpenOffice
Degrees of freedom (approx.)	df	[smaller sample n] − 1	=COUNT(smaller sample)-1
Calculate t-critical for a two-tailed test	t_c		=TINV(α;df)
Calculate the standard error SE for two independent samples	SE		=sqrt((sx^2/n_x)+(sy^2/n_y))
Calculate a margin of error E for two independent samples using t_critical and the standard error SE.	E		=t_c*SE
Calculate the difference between two sample means	x̄_d	x̄ − ȳ	=average(data set x)-average(data set y)
Calculate a confidence interval for a population mean difference μ_d from a sample mean difference x̄_d and a margin of error E			x̄_d − E < μ_d < x̄_d + E
Calculate a p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown			=TTEST(x data;y data;2;3)

Linear regression statistics
Statistic or Parameter	Symbol	Equations	OpenOffice
Slope	b		=SLOPE(y data; x data)
Intercept	a		=INTERCEPT(y data; x data)
Correlation	r		=CORREL(y data; x data)
Coefficient of Determination	r²		=(CORREL(y data; x data))^2

Z-scores diagram

Bins (x)	Frequency f	RF p(x)





Sums:
