Is environmentally friendlier faster? • Name:

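| Route | Speed (kph) |
| --- | --- |
| Airport OAB | 10.53 |
| Genesis-Sant | 8.66 |
| Koahn-airport | 9.24 |
| Koahn-Sant | 9.57 |
| Lida OAB | 9.50 |
| LTLR | 11.11 |
| NBFS-KasEl-Sant | 9.34 |
| Palipowe KasEl OAB | 8.44 |
| Palipowe-KasEl-Lida | 9.81 |
| Santarch | 9.63 |
| Santarch | 10.96 |
| U-airport | 8.45 |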

Part I: Basic Statistics

END Stumptown

Former Nike shoe designer Andrew Estey felt that a more environmentally friendly running shoe could be produced. The current design of running shoes uses a lot of petrochemical resources and energy. Andrew was sure there was a better way, starting with a simpler design, recycled materials, and organic inputs such as bamboo fiber uppers. Andrew formed his own company, Environmentally Neutral Design (END), and eventually released the END Stumptown running shoe. Being simpler, the Stumptown is lighter than the typical sort of shoes I have been running in for over thirty years. Shoes like my Mizuno Wave Renegade IVs are 915 grams for the pair, while the Stumptowns are only 665 grams for the pair. I began running in the END Stumptown shoes in March and I feel faster when I run in them. This examination uses speed data I have been gathering in the END Stumptown shoes.

In the first section of the examination you are asked to make basic calculations on the speed of the new END Stumptown shoes in kilometers per hour along different routes that I typically run. My route names include abbreviations and nicknames for locations. KasEl refers to passing through the intersection of Kaselehlie and Elenieng at AMCRES. OAB is an abbreviation for "out-and-back"; NBFS is "Nett Bridge Far Side." Sant refers to the Spanish name for Kolonia: Santiago de la Ascension. In my running logs Sant means passing the Spanish wall.

Data sheet

  1. _________ What level of measurement is the data?
  2. _________ Determine the sample size n.
  3. _________ Determine the minimum.
  4. _________ Determine the maximum.
  5. _________ Calculate the range.
  6. _________ Calculate the midrange.
  7. _________ Determine the mode.
  8. _________ Determine the median.
  9. _________ Calculate the sample mean x̄.
  10. _________ Calculate the sample standard deviation sx.
  11. _________ Calculate the sample Coefficient of Variation.
  12. _________ Determine the class width. Use five classes (bins or intervals).
  13. Fill in the following table with the class upper limits in the first column, the frequencies in the second column, and the relative frequencies in the third column

| Bins (x) | Frequency f | RF p(x) |
| --- | --- | --- |
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |
|  |  |  |
| Sums: |  |  |

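As a cross-check on the spreadsheet work, the frequency table can be sketched in Python. This is only an illustrative sketch, not the method the exam requires: the speeds are copied from the route table above, the class width is taken as (maximum − minimum) / 5, and it assumes that a value equal to a class upper limit is counted in that class.

```python
# Speeds (kph) copied from the END Stumptown route table above.
speeds = [10.53, 8.66, 9.24, 9.57, 9.50, 11.11,
          9.34, 8.44, 9.81, 9.63, 10.96, 8.45]

num_classes = 5
low, high = min(speeds), max(speeds)
class_width = (high - low) / num_classes

# Class upper limits. The last limit is pinned to the maximum so that
# floating-point rounding cannot push the largest value out of the table.
upper_limits = [low + class_width * k for k in range(1, num_classes)] + [high]

# Assumption: a value equal to an upper limit belongs to that class.
frequencies = [0] * num_classes
for x in speeds:
    for i, limit in enumerate(upper_limits):
        if x <= limit:
            frequencies[i] += 1
            break

relative_frequencies = [f / len(speeds) for f in frequencies]

for limit, f, rf in zip(upper_limits, frequencies, relative_frequencies):
    print(f"{limit:7.3f}  {f:2d}  {rf:.3f}")
print(f"Sums:    {sum(frequencies):2d}  {sum(relative_frequencies):.3f}")
```

The frequencies should sum to the sample size n and the relative frequencies to 1, which is a quick way to catch a miscounted class.
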
  1. Sketch a histogram of the relative frequency data.
  2. __________________ What is the shape of the distribution?
  3. __________________ On 01 January 2009, New Year's Day, I ran an odd route. Every time I heard pots banging, I ran toward the noise, juggling as I ran. I wound up running all over Kolonia. The resulting run was very slow, only 5.84 kph. Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for 5.84 kph.
  4. _________ Is the z-score for 5.84 kph an ordinary or extraordinary value?
  5. __________________ On 07 April 2009 I ran in an older pair of Mizuno Wave Renegade IV running shoes. I ran one way to the Nett river bridge at 11.46 kph. Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for 11.46 kph.
  6. _________ Is the z-score for 11.46 kph an ordinary or extraordinary value?
  7. _________ Calculate the standard error of the sample mean x̄.
  8. _________ Find t-critical for a confidence level c of 95%.
  9. _________ Determine the margin of error E for the sample mean x̄.
  10. Write out the 95% confidence interval for the population mean μ
    p(_____________ < μ < ___________) = 0.95
  11. _________ My average speed in the Mizuno Wave Renegade IV shoes is 9.05 kph. Based on the confidence interval above, is the mean speed in the newer, lighter END Stumptown shoes different from μ = 9.05 kph?
  12. ___________ Using the END Stumptown speed data above and a population mean μ = 9.05 kph, determine the t-statistic.
  13. ___________ Using the END Stumptown speed data above and a population mean μ = 9.05 kph, determine the p-value.
  14. ___________ Using the END Stumptown speed data above and a population mean μ = 9.05 kph, determine the maximum confidence level c for which the difference is statistically significant.
  15. ___________ Based on the hypothesis test above, at an alpha of 0.05, are the END Stumptown shoes statistically significantly faster than the Mizuno Wave Renegade IV shoes?
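
The basic statistics and z-score steps in Part I can also be sketched outside the spreadsheet. The Python below is a minimal sketch using only the standard library. It mirrors =AVERAGE, =STDEV, and =STANDARDIZE from the formula tables. The cutoff of |z| > 2 for "extraordinary" is an assumption made for illustration and is not stated in the text of this exam.

```python
import math
import statistics

# Speeds (kph) copied from the END Stumptown route table in Part I.
speeds = [10.53, 8.66, 9.24, 9.57, 9.50, 11.11,
          9.34, 8.44, 9.81, 9.63, 10.96, 8.45]

n = len(speeds)
mean = statistics.mean(speeds)
sx = statistics.stdev(speeds)        # sample standard deviation (divides by n - 1)
cv = sx / mean                       # sample coefficient of variation
standard_error = sx / math.sqrt(n)   # standard error of the sample mean


def z_score(x, mean, sx):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - mean) / sx


def classify(z, cutoff=2.0):
    """Assumed convention (hypothetical cutoff): |z| > 2 is 'extraordinary'."""
    return "extraordinary" if abs(z) > cutoff else "ordinary"


for label, x in [("New Year's Day run", 5.84), ("Mizuno run to Nett bridge", 11.46)]:
    z = z_score(x, mean, sx)
    print(f"{label}: z = {z:.2f} ({classify(z)})")
```

The margin of error and confidence interval still need a t-critical value, which the standard library does not provide; the =TINV function in the formula tables covers that step.
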

Part II: Hypothesis Testing using the t-test

In part two you will run a paired data two sample hypothesis test on whether the END Stumptown shoes are associated with a statistically significantly different speed from the Mizuno Wave Renegade IV shoes. At the end of part one you ran a hypothesis test against a population mean speed of 9.05 kph. The problem with that hypothesis test is that my speed varies with the length of the route. By comparing my speed in the END shoes versus the speed in the Mizuno shoes on the same route, a more accurate hypothesis test can be run. Use a paired t-test for two samples to determine whether the lighter END Stumptown shoes are faster for me than the heavier Mizuno Wave Renegade 4 shoes.

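| Route | END (kph) | MZ WR 4 (kph) |
| --- | --- | --- |
| Airport OAB | 10.53 | 8.47 |
| Koahn-Sant | 9.57 | 9.37 |
| Lida OAB | 9.50 | 8.81 |
| LTLR | 11.11 | 9.01 |
| NBFS-KasEl Sant | 9.34 | 8.83 |
| Palipowe-KasEl-Lida | 9.81 | 8.67 |
| Santarch | 10.30 | 9.61 |
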
  1. _________ Calculate the sample mean speed for the END Stumptown shoes using the above data.
  2. _________ Calculate the sample mean speed for the Mizuno Wave Renegade IV shoes using the above data.
  3. _________ Are the sample means for the two samples mathematically different?
  4. __________________ What is the p-value? Use the difference of means for paired data TTEST function =TTEST(END-data;MZWR4-data;2;1) to determine the p-value for this two sample data.
  5. __________________ Is the difference in the means statistically significant at a risk of a type I error alpha α = 0.05?
  6. __________________ Would we fail to reject or reject a null hypothesis of no difference in the sample means?
  7. __________________ What is the maximum level of confidence we can have that the difference is statistically significant?
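
To see what a paired test works with, the sketch below computes the per-route differences and the paired t-statistic in Python. It is a hedged illustration, not a replacement for the TTEST function: it stops at the t-statistic and degrees of freedom, because the p-value requires the t-distribution, which the Python standard library does not include.

```python
import math
import statistics

# Speeds (kph) copied from the paired route table in Part II.
end_speeds = [10.53, 9.57, 9.50, 11.11, 9.34, 9.81, 10.30]
mizuno_speeds = [8.47, 9.37, 8.81, 9.01, 8.83, 8.67, 9.61]

# Paired data: subtract route by route, then treat the differences
# as a single sample tested against a mean difference of zero.
differences = [e - m for e, m in zip(end_speeds, mizuno_speeds)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)   # sample standard deviation
standard_error = sd_difference / math.sqrt(n)
t_statistic = mean_difference / standard_error
degrees_of_freedom = n - 1

print(f"mean difference = {mean_difference:.3f} kph")
print(f"t = {t_statistic:.3f} with {degrees_of_freedom} degrees of freedom")
```

Pairing by route removes the route-to-route variation in speed, which is why this test is more sensitive than the single-sample test at the end of Part I.
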

Part III: Linear Regression (best fit or least squares line)

[Figure: scatter plot "Distance versus speed for END Stumptown shoes"; x-axis: Distance (km), y-axis: Speed (kph)]

Data table

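| Route | Distance (km) | Speed (kph) |
| --- | --- | --- |
| Santarch | 4.06 | 9.63 |
| Santarch | 4.06 | 10.96 |
| LTLR | 4.76 | 11.11 |
| Lida OAB | 5.71 | 9.50 |
| Genesis-Sant | 6.75 | 8.66 |
| Airport OAB | 7.90 | 10.53 |
| NBFS-KasEl Sant | 9.15 | 9.34 |
| Koahn-Sant | 9.99 | 9.57 |
| Palipowe KasEl OAB | 10.44 | 8.44 |
| Palipowe-KasEl-Lida | 12.90 | 9.81 |
| Koahn-airport | 13.48 | 9.24 |
| U airport | 16.53 | 8.45 |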

Runners tend to run at the fastest speed for which they can still finish the distance. Thirty years of running have trained me to know how fast I can go for a given distance. To go farther I have to run a little slower. This last section of the final explores this relationship between the length of a run in kilometers and the speed at which I run.

  1. _________ Calculate the slope of the linear regression (best fit line).
  2. _________ Calculate the y-intercept of the linear regression (best fit line).
  3. _________ Is the relation between distance and speed positive, negative, or neutral?
  4. _________ Calculate the linear correlation coefficient r for the data.
  5. ______________ Is the correlation none, weak/low, moderate, strong/high, or perfect?
  6. ______________ Determine the coefficient of determination.
  7. ______________ What percent of the variation in the distance "explains" the variation in the speed?
  8. _________ Use the slope and intercept to predict the speed for an 8 kilometer run.
  9. _________ Use the slope and intercept to determine the distance I would be predicted to run at 9 kph.
  10. _________ If I run zero kilometers, what speed does the regression predict?
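
The least squares calculations behind =SLOPE, =INTERCEPT, and =CORREL can be sketched by hand in Python. This is a minimal sketch under the usual least squares definitions, with the data copied from the Part III table. The helper names `predict_speed` and `predict_distance` are hypothetical and introduced only for this illustration.

```python
import math

# Data copied from the Part III table: distance (km) and speed (kph).
distance_km = [4.06, 4.06, 4.76, 5.71, 6.75, 7.90,
               9.15, 9.99, 10.44, 12.90, 13.48, 16.53]
speed_kph = [9.63, 10.96, 11.11, 9.50, 8.66, 10.53,
             9.34, 9.57, 8.44, 9.81, 9.24, 8.45]

n = len(distance_km)
mean_x = sum(distance_km) / n
mean_y = sum(speed_kph) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in distance_km)
syy = sum((y - mean_y) ** 2 for y in speed_kph)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance_km, speed_kph))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2


def predict_speed(distance):
    """Speed (kph) predicted by the best fit line for a given distance (km)."""
    return intercept + slope * distance


def predict_distance(speed):
    """Distance (km) at which the best fit line predicts the given speed (kph)."""
    return (speed - intercept) / slope


print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```

Predicting a distance from a speed simply inverts the line, so it only makes sense when the slope is not zero.
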

One intention of any course is that a student should be able to learn and employ new concepts in the field even after the course is over. In a linear regression analysis a correlation coefficient near zero means no linear relation exists between the variables. You can run a statistical test to determine whether the correlation coefficient r is statistically significantly different from zero. If the difference of r from zero is statistically significant, then the data provide evidence that a relationship exists. If you fail to reject a null hypothesis of r equals zero, then there is no evidence in the data that distance and speed are related.

To run the hypothesis test, you will calculate a t-critical (tc), a t-statistic (t), and then a p-value using the t-statistic and the TDIST function.

For this test:
The sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: $t = r\sqrt{\dfrac{n-2}{1-r^{2}}}$
p-value: =TDIST(ABS(t-statistic);n−2;2)

Note that n−2 is used in these formulas. This is the degrees of freedom for a correlation hypothesis test.
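
The t-statistic for a correlation, t = r·√((n − 2)/(1 − r²)), is short enough to sketch as a Python function. The function name `correlation_t_statistic` and the example values are hypothetical and are not taken from the exam data. The p-value step is left to TDIST, since the Python standard library has no t-distribution.

```python
import math


def correlation_t_statistic(r, n):
    """t-statistic for testing whether a correlation r differs from zero.

    r is the linear correlation coefficient and n is the number of data
    pairs; the test has n - 2 degrees of freedom.
    """
    if n <= 2:
        raise ValueError("need more than two data pairs")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical illustration values, not the exam answer.
example_t = correlation_t_statistic(r=-0.5, n=12)
print(f"t = {example_t:.4f} with {12 - 2} degrees of freedom")
```

The sign of t follows the sign of r, which is why the p-value formula above takes ABS(t-statistic) before calling TDIST.
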

  1. _________ Determine the sample size n by counting the number of data pairs.
  2. _________ Determine t-critical using an alpha of α = 0.05 and n − 2 degrees of freedom.
  3. _________ Determine the t-statistic using the formula noted above.
  4. _________ Determine the p-value using the TDIST function, remembering to use n − 2 for the degrees of freedom.
  5. ________ Is the correlation between my distance and speed statistically significant?

Tables of Formulas and OpenOffice Calc functions

Basic Statistics

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Square root |  |  | =SQRT(number) |
| Sample size | n |  | =COUNT(data) |
| Sample mean | x̄ | Σx/n | =AVERAGE(data) |
| Sample standard deviation | sx or s |  | =STDEV(data) |
| Sample Coefficient of Variation | CV | sx / x̄ | =STDEV(data)/AVERAGE(data) |
| Calculate a z value from an x value | z | (x − x̄)/sx | =STANDARDIZE(x;x̄;sx) |

Confidence interval statistics for a single sample

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Sample size | n |  | =COUNT(data) |
| Degrees of freedom | df | n − 1 | =COUNT(data)-1 |
| Find a t-critical value from a confidence level c | tc |  | =TINV(1-c;df) |
| Standard error of a sample mean x̄ | SE | sx/√n | =STDEV(data)/SQRT(n) |
| Standard error of a sample proportion p | SE | √(pq/n) | =SQRT(p*q/n) |
| Calculate a margin of error for the mean E using t-critical and the standard error SE | E | tc·SE | =tc*SE |
| Calculate a confidence interval for a population mean μ from a sample mean x̄ and a margin of error E |  | x̄ − E < μ < x̄ + E |  |
| Calculate a confidence interval for a population proportion P from a sample proportion p and a margin of error E |  | p − E < P < p + E |  |

Hypothesis testing for a sample mean versus a known population mean

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Relationship between confidence level c and alpha α for two-tailed tests |  | 1 − c = α |  |
| Calculate t-critical for a two-tailed test | tc |  | =TINV(α;df) |
| Calculate a t-statistic | t | (x̄ − μ)/(sx/√n) | =(x̄ - μ)/(sx/SQRT(n)) |
| Calculate a two-tailed p-value from a t-statistic | p-value |  | =TDIST(ABS(t);df;2) |

Hypothesis testing for paired data samples

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Calculate a p-value for the difference of the means from two samples of paired data |  |  | =TTEST(data_range_x;data_range_y;2;1) |

Hypothesis testing and confidence intervals for two independent samples

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Degrees of freedom (approx.) | df | [smaller sample n] − 1 | =COUNT(smaller sample)-1 |
| Calculate t-critical for a two-tailed test | tc |  | =TINV(α;df) |
| Calculate the standard error SE for two independent samples | SE | √(sx²/nx + sy²/ny) | =sqrt((sx^2/nx)+(sy^2/ny)) |
| Calculate a margin of error E for two independent samples using t-critical and the standard error SE | E | tc·SE | =tc*SE |
| Calculate the difference between two sample means | x̄d | x̄ − ȳ | =average(data set x)-average(data set y) |
| Calculate a confidence interval for a population mean difference μd from a sample mean difference x̄d and a margin of error E |  | x̄d − E < μd < x̄d + E |  |
| Calculate a p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown |  |  | =TTEST(x data;y data;2;3) |

Linear regression statistics

| Statistic or Parameter | Symbol | Equations | OpenOffice |
| --- | --- | --- | --- |
| Slope | b |  | =SLOPE(y data; x data) |
| Intercept | a |  | =INTERCEPT(y data; x data) |
| Correlation | r |  | =CORREL(y data; x data) |
| Coefficient of Determination | r² |  | =(CORREL(y data; x data))^2 |

[Figure: z-scores diagram]