Beyond pedometers: minutes of physical activity • Name:

Week of | Minutes
Mon 30 Jun 08 | 139
Mon 07 Jul 08 | 189
Mon 14 Jul 08 | 239
Mon 21 Jul 08 | 137
Mon 28 Jul 08 | 302
Mon 04 Aug 08 | 143
Mon 11 Aug 08 | 348
Mon 18 Aug 08 | 103
Mon 25 Aug 08 | 224
Mon 01 Sep 08 | 165
Mon 08 Sep 08 | 226
Mon 15 Sep 08 | 186
Mon 22 Sep 08 | 234
Mon 29 Sep 08 | 293
Mon 06 Oct 08 | 113
Mon 13 Oct 08 | 358
Mon 20 Oct 08 | 133
Mon 27 Oct 08 | 322
Mon 03 Nov 08 | 152
Mon 10 Nov 08 | 346

Part I: Basic Statistics

In October 2008 the United States Department of Health and Human Services released the first set of recommendations on physical activity. The basic recommendation is for adults to get 150 minutes of moderately intense activity per week. The second column of the table indicates the number of minutes of running per week done by Lee Ling. Running is one form of intense activity. There are many others, including walking, engaging in sports, and strength training: any activity that raises your heart rate and gets you sweating for at least twenty minutes a day. The intensity level varies with the activity and with the fitness of the person. Running is noted to be a vigorous activity in an appendix to the main report.

Use the number of minutes of running per week (Minutes) for the following basic statistics.

Data sheet (OpenOffice.org Calc, html)

  1. _________ What level of measurement is the data?
  2. _________ Determine the sample size n.
  3. _________ Determine the minimum.
  4. _________ Determine the maximum.
  5. _________ Calculate the range.
  6. _________ Calculate the midrange.
  7. _________ Determine the mode.
  8. _________ Determine the median.
  9. _________ Calculate the sample mean x̄.
  10. _________ Calculate the sample standard deviation sx.
  11. _________ Calculate the sample Coefficient of Variation.
  12. _________ Determine the class width. Use five classes (bins or intervals).
  13. Fill in the following table with the class upper limits in the first column, the frequencies in the second column, and the relative frequencies in the third column
Bins (x) | Frequency f | RF p(x)
Sums:
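The Part I statistics above can be cross-checked outside the spreadsheet. What follows is a minimal Python sketch, not part of the original test (which uses OpenOffice.org Calc). The variable names are illustrative assumptions, and each bin is assumed to count the values less than or equal to its class upper limit.

```python
import statistics

# Weekly minutes of running from the table above (30 Jun 08 to 10 Nov 08).
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

n = len(minutes)
minimum = min(minutes)
maximum = max(minutes)
value_range = maximum - minimum
midrange = (maximum + minimum) / 2
has_mode = len(set(minutes)) < n       # False when no value repeats
median = statistics.median(minutes)
mean = statistics.mean(minutes)
sd = statistics.stdev(minutes)         # sample standard deviation (n - 1 divisor)
cv = sd / mean                         # sample coefficient of variation

# Five classes: class width and class upper limits.
classes = 5
width = value_range / classes
upper_limits = [minimum + width * (i + 1) for i in range(classes)]

# Frequency of each class: values above the previous limit, up to this one.
frequencies = []
lower = float("-inf")
for upper in upper_limits:
    frequencies.append(sum(1 for m in minutes if lower < m <= upper))
    lower = upper
relative_frequencies = [f / n for f in frequencies]

print("n, min, max:", n, minimum, maximum)
print("range, midrange, median:", value_range, midrange, median)
print("mean, sd, CV:", mean, round(sd, 2), round(cv, 3))
for upper, f, rf in zip(upper_limits, frequencies, relative_frequencies):
    print(upper, f, rf)
```

The frequencies should sum to n and the relative frequencies to 1, which is a quick check on the "Sums:" row.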
  1. Sketch a histogram of the relative frequency data.
  2. __________________ What is the shape of the distribution?
  3. __________________ On 19 July 2008 I ran in the INS Half-Marathon. That run occurred during the week of 14 July, a week with 239 minutes of running. Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for 239 minutes of running during the week of 14 July.
  4. _________ Is the z-score for 239 minutes an ordinary or extraordinary value?
  5. __________________ During the week of 13 October I racked up 358 minutes of running. I was celebrating 30 years of running: I first ran in 1978. Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for 358 minutes of running.
  6. _________ Is the z-score for 358 minutes an ordinary or extraordinary value?
  7. _________ Toughie: How many minutes of running would I have to do in one week to attain an extraordinary number of minutes of running? Use z = 2 to find the number of minutes x that I would have to run to reach "extraordinary."
  8. _________ Calculate the standard error of the sample mean x̄ for the number of minutes of running per week.
  9. _________ Find t-critical for a confidence level c of 95% for the number of minutes of running per week.
  10. _________ Determine the margin of error E for the sample mean x̄.
  11. Write out the 95% confidence interval for the population mean μ for the number of minutes of running per week.
    p(_____________ < μ < ___________) = 0.95
  12. _________ The United States Department of Health and Human Services recommends 150 minutes of moderately intense activity per week. Use 150 minutes as the population mean μ. Is the number of minutes of running per week done by Lee Ling statistically significantly different from the μ = 150 minutes recommendation?
  13. ______________________ Using Lee Ling's data above and a population mean μ = 150, determine the t-statistic.
  14. ______________________ Using Lee Ling's data above and a population mean μ = 150, determine the p-value. Keep three decimal places in your answer.
  15. ______________________ Using Lee Ling's data above and a population mean μ = 150, determine the maximum confidence level c for which the difference is statistically significant. Keep three decimal places in your answer.
  16. _________ Is Lee Ling exceeding the United States Department of Health and Human Services minimum physical activity minutes per week guidelines by a statistically significant amount?
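The z-score, standard error, and t-statistic questions above are plain arithmetic once the mean and standard deviation are known. The following Python sketch illustrates that arithmetic under stated assumptions: Python is not part of the original test, the names are illustrative, and |z| ≥ 2 is taken as the cutoff for "extraordinary," following the z = 2 hint in the toughie question. The t-critical value and p-value are left to the spreadsheet's TINV and TDIST functions.

```python
import math
import statistics

# Weekly minutes of running from the Part I table.
minutes = [139, 189, 239, 137, 302, 143, 348, 103, 224, 165,
           226, 186, 234, 293, 113, 358, 133, 322, 152, 346]

n = len(minutes)
mean = statistics.mean(minutes)
sd = statistics.stdev(minutes)   # sample standard deviation


def z_score(x):
    """z = (x - sample mean) / sample standard deviation."""
    return (x - mean) / sd


def is_extraordinary(z, cutoff=2.0):
    """Assumed rule: a value is extraordinary when |z| reaches the cutoff."""
    return abs(z) >= cutoff


z_239 = z_score(239)             # week of 14 July
z_358 = z_score(358)             # week of 13 October

# Invert the z-score formula for z = 2: x = mean + z * sd.
x_extraordinary = mean + 2 * sd

# Standard error of the mean and t-statistic against mu = 150.
mu = 150
standard_error = sd / math.sqrt(n)
t_statistic = (mean - mu) / standard_error

print("z(239), z(358):", round(z_239, 2), round(z_358, 2))
print("minutes needed for z = 2:", round(x_extraordinary, 1))
print("SE, t:", round(standard_error, 2), round(t_statistic, 2))
```

The margin of error is the standard error multiplied by t-critical, so the confidence interval follows directly once `=TINV(0.05;19)` is read from the spreadsheet.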

Part II: Hypothesis Testing using the t-test

Last spring term I was lazy and my running slacked off. My weekly minutes of running were low, and some weeks I did not run at all. With the end of the school term in mid-May I planned to put myself on a stricter regimen of running. With only six weeks of renewed effort by the week of 23 June, could I prove that my running duration in minutes per week had improved? Use a t-test for a difference of two independent sample means in this portion of the test. The sample means are the weekly minutes of running for spring versus summer. Note that for two of my weeks in spring the total weekly running time was actually zero minutes: I did not run during those two weeks.

Steps
Date | Spring minutes (x) | Date | Summer minutes (y)
Mon 25 Feb 08 | 194 | Mon 19 May 08 | 331
Mon 03 Mar 08 | 0 | Mon 26 May 08 | 182
Mon 10 Mar 08 | 141 | Mon 02 Jun 08 | 261
Mon 17 Mar 08 | 238 | Mon 09 Jun 08 | 187
Mon 24 Mar 08 | 86 | Mon 16 Jun 08 | 207
Mon 31 Mar 08 | 88 | Mon 23 Jun 08 | 176
Mon 07 Apr 08 | 80 |  |
Mon 14 Apr 08 | 61 |  |
Mon 21 Apr 08 | 49 |  |
Mon 28 Apr 08 | 58 |  |
Mon 05 May 08 | 104 |  |
Mon 12 May 08 | 0 |  |
  1. _________ Calculate the sample mean x̄ number of minutes of running per week during the spring term.
  2. _________ Calculate the sample mean ȳ number of minutes of running per week during the summer.
  3. _________ Are the sample means for the two samples mathematically different?
  4. __________________ What is the p-value? Use the difference of means for independent samples TTEST function to determine the p-value for this two sample data. Keep three decimal places in your answer.
  5. __________________ Is the difference in the means statistically significant at a risk of a type I error alpha α = 0.05?
  6. __________________ Would we "fail to reject" or "reject" a null hypothesis of no difference in the means?
  7. __________________ What is the maximum level of confidence we can have that the difference is statistically significant? Keep three decimal places in your answer.
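The two-sample comparison above can be sketched in Python using only the standard library. This is a hedged illustration, not the test's own method: it assumes that Calc's `=TTEST(x data;y data;2;3)` corresponds to Welch's unequal-variance t-test with Welch-Satterthwaite degrees of freedom (which differs from the simpler "[smaller sample n] − 1" approximation in the formula tables), and it approximates the t-distribution tail by numerical integration. The function names are illustrative assumptions, so small differences from the spreadsheet's p-value are possible.

```python
import math
import statistics

spring = [194, 0, 141, 238, 86, 88, 80, 61, 49, 58, 104, 0]
summer = [331, 182, 261, 187, 207, 176]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


nx, ny = len(spring), len(summer)
mean_x = statistics.mean(spring)
mean_y = statistics.mean(summer)
var_x = statistics.variance(spring)   # sample variances (n - 1 divisor)
var_y = statistics.variance(summer)

# Standard error for two independent samples: sqrt(sx^2/nx + sy^2/ny).
standard_error = math.sqrt(var_x / nx + var_y / ny)
t_statistic = (mean_x - mean_y) / standard_error

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch = (var_x / nx + var_y / ny) ** 2 / (
    (var_x / nx) ** 2 / (nx - 1) + (var_y / ny) ** 2 / (ny - 1)
)

p_value = two_tailed_p(t_statistic, df_welch)
max_confidence = 1 - p_value

print("means:", round(mean_x, 1), round(mean_y, 1))
print("t, df:", round(t_statistic, 2), round(df_welch, 2))
print("p-value, max confidence:", round(p_value, 3), round(max_confidence, 3))
```

The difference is statistically significant at α = 0.05 when the p-value is below 0.05, and the maximum level of confidence is 1 minus the p-value.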

Part III: Linear Regression (best fit or least squares line)

[Figure: scatter plot titled "Minutes of running versus the total steps in a specific day," with ten days' worth of data displayed as circles. x-axis: minutes of running (24 to 114). y-axis: total daily steps (8660 to 18660).]

Data table

Minutes of running | Total daily steps
34 | 10101
49 | 14200
41 | 8660
24 | 8675
34 | 10864
65 | 15489
60 | 14690
74 | 14725
114 | 18660
50 | 16763

In October 2008 my last pedometer that could withstand running and rain finally failed. Without a pedometer, can minutes of running be used to estimate daily total steps? Running is very regular and produces, for Lee Ling, 154 steps per minute. The complication is that Lee Ling also walks around campus each day before he runs. Can a linear regression be used to predict total daily steps just from his daily run? This section explores this question.

  1. _________ Calculate the slope of the linear regression (best fit line).
  2. _________ Calculate the y-intercept of the linear regression (best fit line).
  3. _________ Is the relation between minutes of running and total daily steps positive, negative, or neutral?
  4. _________ Calculate the linear correlation coefficient r for the data.
  5. ______________ Is the correlation none, weak/low, moderate, strong/high, or perfect?
  6. ______________ Determine the coefficient of determination.
  7. ______________ What percent of the variation in the total daily steps is "explained" by the variation in the minutes of running?
  8. _________ Use the slope and intercept to predict the number of total daily steps for 100 minutes of running.
  9. _________ Use the slope and intercept to determine the number of minutes of running required to produce 12000 steps.
  10. _________ On a day on which I do not run, how many steps am I predicted to get?
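The regression questions above reduce to the least-squares formulas behind Calc's SLOPE, INTERCEPT, and CORREL functions. Here is a minimal Python sketch of those formulas; Python and the variable names are assumptions for illustration and not part of the original test.

```python
import math

# Data table: minutes of running (x) and total daily steps (y).
run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
total_steps = [10101, 14200, 8660, 8675, 10864, 15489, 14690, 14725, 18660, 16763]

n = len(run_minutes)
mean_x = sum(run_minutes) / n
mean_y = sum(total_steps) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
syy = sum((y - mean_y) ** 2 for y in total_steps)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(run_minutes, total_steps))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Predict steps from minutes, and invert the line to get minutes from steps.
steps_at_100_minutes = intercept + slope * 100
minutes_for_12000_steps = (12000 - intercept) / slope
steps_with_no_run = intercept        # prediction at x = 0

print("slope, intercept:", round(slope, 2), round(intercept, 1))
print("r, r^2:", round(r, 3), round(r_squared, 3))
print("steps at 100 min:", round(steps_at_100_minutes))
print("minutes for 12000 steps:", round(minutes_for_12000_steps, 1))
```

The intercept doubles as the prediction for a day without a run, since it is the value of the line at zero minutes of running.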

One intention of any course is that a student should be able to learn and employ new concepts in the field even after the course is over. In a linear regression analysis a correlation coefficient near zero means that no linear relation exists between the variables. You can run a statistical test to determine whether the correlation coefficient r is statistically significantly different from zero. If the difference of r from zero is statistically significant, then you have evidence that a relationship exists. If you fail to reject a null hypothesis of r equals zero, then there is no evidence in the data that minutes of running and total steps are related.

To run the hypothesis test, you will calculate a t-critical (tc), a t-statistic (t), and then a p-value using the t-statistic and the TDIST function.

For this test:
sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: $t = r\sqrt{\dfrac{n-2}{1-r^{2}}}$
p-value: =TDIST(ABS(t-statistic);n−2;2)

Note that n−2 is used in these formulas. This is the degrees of freedom for a correlation hypothesis test.

  1. _________ Determine the sample size n by counting the number of data pairs.
  2. _________ Determine t-critical using an alpha of α = 0.05 and n − 2 degrees of freedom.
  3. _________ Determine the t-statistic using the formula noted further above, remembering to use n − 2 for the degrees of freedom.
  4. _________ Determine the p-value using the TDIST function, remembering to use n − 2 for the degrees of freedom.
  5. ________ Is the correlation between my minutes of running and my daily total steps statistically significant?
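The correlation test above can also be sketched in Python with only the standard library. This is an illustration under assumptions, not the test's own procedure: the t-distribution tail is approximated by Simpson's rule and t-critical is found by bisection, standing in for Calc's TDIST and TINV, so results may differ from the spreadsheet in the later decimal places. All function names are hypothetical.

```python
import math

run_minutes = [34, 49, 41, 24, 34, 65, 60, 74, 114, 50]
total_steps = [10101, 14200, 8660, 8675, 10864, 15489, 14690, 14725, 18660, 16763]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def t_critical(alpha, df):
    """Bisect for the t value whose two-tailed p-value equals alpha."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid      # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


n = len(run_minutes)
mean_x = sum(run_minutes) / n
mean_y = sum(total_steps) / n
sxx = sum((x - mean_x) ** 2 for x in run_minutes)
syy = sum((y - mean_y) ** 2 for y in total_steps)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(run_minutes, total_steps))
r = sxy / math.sqrt(sxx * syy)

df = n - 2                                   # degrees of freedom for this test
t_statistic = r * math.sqrt(df / (1 - r ** 2))
tc = t_critical(0.05, df)
p_value = two_tailed_p(t_statistic, df)
significant = abs(t_statistic) > tc          # equivalent to p_value < 0.05

print("n, r:", n, round(r, 3))
print("t-critical, t-statistic:", round(tc, 3), round(t_statistic, 3))
print("p-value:", round(p_value, 3), "significant:", significant)
```

Comparing |t| with t-critical and comparing the p-value with α are two views of the same decision, so the two checks should always agree.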

For a retrospective look at pedometer data, see also the pedometer mini-studies.

Tables of Formulas and OpenOffice Calc functions

Basic Statistics
Statistic or Parameter | Symbol | Equations | OpenOffice
Square root |  |  | =SQRT(number)
Sample size | n |  | =COUNT(data)
Sample mean | x̄ | Σx/n | =AVERAGE(data)
Sample standard deviation | sx or s |  | =STDEV(data)
Sample Coefficient of Variation | CV | sx / x̄ | =STDEV(data)/AVERAGE(data)
Calculate a z value from an x value | z | z = (x − x̄)/sx | =STANDARDIZE(x;x̄;sx)
Confidence interval statistics for a single sample
Statistic or Parameter | Symbol | Equations | OpenOffice
Sample size | n |  | =COUNT(data)
Degrees of freedom | df | n − 1 | =COUNT(data)-1
Find a t-critical value from a confidence level c | tc |  | =TINV(1-c;df)
Standard error of a sample mean x̄ | SE | SE = sx/√n | =STDEV(data)/SQRT(n)
Standard error of a sample proportion p | SE | SE = √(pq/n) | =SQRT(p*q/n)
Calculate a margin of error for the mean E using t-critical and the standard error SE | E | E = tc × SE | =tc*SE
Calculate a confidence interval for a population mean μ from a sample mean x̄ and a margin of error E |  | x̄ − E < μ < x̄ + E |
Calculate a confidence interval for a population proportion P from a sample proportion p and a margin of error E |  | p − E < P < p + E |
Hypothesis testing for a sample mean versus a known population mean
Statistic or Parameter | Symbol | Equations | OpenOffice
Relationship between confidence level c and alpha α for two-tailed tests |  | 1 − c = α |
Calculate t-critical for a two-tailed test | tc |  | =TINV(α;df)
Calculate a t-statistic | t | t = (x̄ − μ)/(sx/√n) | =(x̄ - μ)/(sx/SQRT(n))
Calculate a two-tailed p-value from a t-statistic | p-value |  | =TDIST(ABS(t);df;2)
Hypothesis testing for paired data samples
Statistic or Parameter | Symbol | Equations | OpenOffice
Calculate a p-value for the difference of the means from two samples of paired data |  |  | =TTEST(data_range_x;data_range_y;2;1)
Hypothesis testing and confidence intervals for two independent samples
Statistic or Parameter | Symbol | Equations | OpenOffice
Degrees of freedom (approx.) | df | [smaller sample n] − 1 | =COUNT(smaller sample)-1
Calculate t-critical for a two-tailed test | tc |  | =TINV(α;df)
Calculate the standard error SE for two independent samples | SE | SE = √(sx²/nx + sy²/ny) | =SQRT((sx^2/nx)+(sy^2/ny))
Calculate a margin of error E for two independent samples using t-critical and the standard error SE | E | E = tc × SE | =tc*SE
Calculate the difference between two sample means | x̄d | x̄d = x̄ − ȳ | =AVERAGE(data set x)-AVERAGE(data set y)
Calculate a confidence interval for a population mean difference μd from a sample mean difference x̄d and a margin of error E |  | x̄d − E < μd < x̄d + E |
Calculate a p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown |  |  | =TTEST(x data;y data;2;3)
Linear regression statistics
Statistic or Parameter | Symbol | Equations | OpenOffice
Slope | b |  | =SLOPE(y data; x data)
Intercept | a |  | =INTERCEPT(y data; x data)
Correlation | r |  | =CORREL(y data; x data)
Coefficient of Determination | r² |  | =(CORREL(y data; x data))^2

[Figure: z-scores diagram]