Steps

During this term many of you have had a chance to wear a pedometer and count your steps. Focusing the final examination on statistics from pedometers seems like a natural way to end the term.

Part I: Basic Statistics

The data in the first table comes from pedometers worn by second-grade students at Rural Elementary School (RES). Each value is a student's average daily step count.

Part II: Hypothesis Testing using the t-test

A study was conducted to determine whether the number of steps differs between Rural Elementary School (RES) students and Urban Elementary School (UES) students. The data in the table is the average daily steps for each student.

Part III: Linear Regression (best fit or least squares line)

The data in this section explores the relationship between cadence (steps per minute) and speed (centimeters per second). In distance running theory, the optimal cadence is around 180 steps per minute. This should produce the highest sustainable speed for distance running. As a joggler, my cadence is synchronized to my juggling. This section of the examination explores whether my cadence as measured by a pedometer is related to my speed as measured by a Global Positioning System (GPS) receiver.

Data
Student	Steps
Gabriel	7095
Layoleen	6593
MJ Marcus	4151
Sabias	6793
Alex	5036
Elsihmer	5404
Misty	3922
Mairalinda	4361
Ernest Junior	5091
Lyllone	3042
AllyJean	4236
Felix Junior	1424
Mark	8966
Mersein	7402
Mayoleen	7618

Steps
RES students	Steps (x)	UES students	Steps (y)
Gabriel	7095	Frankie	16269
Layoleen	6593	Shawna	3712
MJ Marcus	4151	Alan	9713
Sabias	6793	Danielle	6293
Alex	5036	Kevin	9529
Elsihmer	5404	Rejazy	3747
Misty	3922	Alexandra	1019
Mairalinda	4361	Marlin	9669
Ernest Junior	5091	Anjelu	2914
Lyllone	3042	AJ	3798
AllyJean	4236	Arianne	1823
Felix Junior	1424	Burt	4482
Mark	8966	Aiyumi	7095
Mersein	7402	Trumaine	4071
Mayoleen	7618	Vincent	7616
		Faith	4123
		Hart	10068
		Kelsey	3309
		Hagadauroma	8828
		Beatrice	6840

Cadence versus speed
Cadence (steps/min)	Speed (cm/s)
148.0	234
150.0	250
151.3	239
151.4	239
152.4	250
153.2	224
153.6	256
154.7	250
154.8	235
155.4	290
155.4	240
155.5	276
159.3	245
159.7	248

One intention of any course is that a student should be able to learn and employ new concepts in the field even after the course is over. In a linear regression analysis, a correlation coefficient near zero suggests that no linear relationship exists between the variables. You can run a statistical test to determine whether the correlation coefficient r is significantly different from zero. If the difference of r from zero is statistically significant, then the data provide evidence that a linear relationship exists. If you fail to reject a null hypothesis that the correlation is zero, then the data do not provide sufficient evidence that cadence and speed are linked.

To run the hypothesis test, you will calculate a t-critical (t_c), a t-statistic (t), and then a p-value using the t-statistic and the TDIST function.

For this test:
sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}}$
p-value: =TDIST(ABS(t-statistic);n−2;2)

Note that n−2 is used in these formulas. This is the degrees of freedom for a correlation hypothesis test.

_________ Determine the sample size n by counting the number of data pairs.
_________ Determine t-critical using an alpha of α = 0.05 and n − 2 degrees of freedom.
_________ Determine the t-statistic using the formula noted further above, remembering to use n − 2 for the degrees of freedom.
_________ Determine the p-value using the TDIST function, remembering to use n − 2 for the degrees of freedom.
_________ Is the correlation between my cadence and my speed statistically significant?
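
As a rough cross-check outside the spreadsheet, the steps above can be sketched in Python using only the standard library. The helper names (`correlation`, `t_pdf`, `two_tailed_p`, `t_critical`) are illustrative and not part of the course materials. The p-value and t-critical are approximated numerically (Simpson's rule and bisection) as stand-ins for TDIST and TINV, so they may differ from Calc in the last decimal places.

```python
import math

# Cadence (steps/min) and speed (cm/s) pairs copied from the table above.
cadence = [148.0, 150.0, 151.3, 151.4, 152.4, 153.2, 153.6,
           154.7, 154.8, 155.4, 155.4, 155.5, 159.3, 159.7]
speed = [234, 250, 239, 239, 250, 224, 256,
         250, 235, 290, 240, 276, 245, 248]


def correlation(xs, ys):
    """Pearson correlation coefficient r (the quantity CORREL returns)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Approximate TDIST(ABS(t); df; 2) with Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def t_critical(alpha, df):
    """Approximate TINV(alpha; df) by bisection on the two-tailed p-value."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p still too large, so t-critical lies further out
        else:
            hi = mid
    return (lo + hi) / 2


n = len(cadence)                 # number of data pairs
df = n - 2                       # degrees of freedom for the correlation test
r = correlation(cadence, speed)
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
t_c = t_critical(0.05, df)
p_value = two_tailed_p(t_stat, df)
significant = abs(t_stat) > t_c  # equivalent to p_value < 0.05

print(f"n = {n}, r = {r:.4f}, t = {t_stat:.4f}")
print(f"t-critical = {t_c:.4f}, p-value = {p_value:.4f}")
print("significant" if significant else "not significant", "at alpha = 0.05")
```

Comparing the absolute value of the t-statistic against t-critical and comparing the p-value against α should always lead to the same decision, which makes the two a useful check on each other.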

Tables of Formulas and OpenOffice Calc functions

Basic Statistics
Statistic or Parameter	Symbol	Equations	OpenOffice
Square root			=SQRT(number)
Sample size	n		=COUNT(data)
Sample mean	x̄	Σx/n	=AVERAGE(data)
Sample standard deviation	sx or s		=STDEV(data)
Sample coefficient of variation	CV	sx / x̄	=STDEV(data)/AVERAGE(data)
Formula to calculate a z value from an x value	z		=STANDARDIZE(x;x̄;sx)
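
For readers working outside OpenOffice, the basic statistics in this table can be sketched with Python's `statistics` module. The variable names are illustrative, and the choice of Mark's value for the z-score is only an example; `statistics.stdev` is the sample standard deviation (n − 1 in the denominator), which is assumed to match STDEV.

```python
import statistics

# Average daily steps for the RES students, copied from the data table.
res_steps = [7095, 6593, 4151, 6793, 5036, 5404, 3922, 4361,
             5091, 3042, 4236, 1424, 8966, 7402, 7618]

n = len(res_steps)                 # =COUNT(data)
x_bar = statistics.mean(res_steps) # =AVERAGE(data)
sx = statistics.stdev(res_steps)   # =STDEV(data), sample standard deviation
cv = sx / x_bar                    # =STDEV(data)/AVERAGE(data)

# z value for one x value (Mark's 8966 steps), as in =STANDARDIZE(x;mean;sx)
z_mark = (8966 - x_bar) / sx

print(f"n = {n}, mean = {x_bar:.1f}, sx = {sx:.1f}")
print(f"CV = {cv:.3f}, z for 8966 steps = {z_mark:.2f}")
```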

Confidence interval statistics for a single sample
Statistic or Parameter	Symbol	Equations	OpenOffice
Sample size	n	n	=COUNT(data)
Degrees of freedom	df	n − 1	=COUNT(data)-1
Find a t_critical value from a confidence level c	t_c		=TINV(1-c;df)
Standard error of a sample mean x̄	SE		=STDEV(data)/SQRT(n)
Standard error of a sample proportion p	SE		=SQRT(p*q/n)
Calculate a margin of error for the mean E using t_critical and the standard error SE.	E		=t_c*SE
Calculate a confidence interval for a population mean μ from a sample mean x̄ and a margin of error E		x̄ − E < μ < x̄ + E
Calculate a confidence interval for a population proportion P from a sample proportion p and a margin of error E		p - E < P < p + E
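
The chain from standard error to margin of error to confidence interval can be sketched in Python as follows. This is a minimal illustration using the RES data at a 95% confidence level. The value `t_c = 2.1448` is an assumption standing in for the result of =TINV(0.05;14); it is entered by hand because the Python standard library has no inverse t function.

```python
import math
import statistics

# Average daily steps for the RES students, copied from the data table.
res_steps = [7095, 6593, 4151, 6793, 5036, 5404, 3922, 4361,
             5091, 3042, 4236, 1424, 8966, 7402, 7618]

t_c = 2.1448  # assumed: two-tailed t-critical for c = 0.95 and df = 14

n = len(res_steps)
df = n - 1                                      # =COUNT(data)-1
x_bar = statistics.mean(res_steps)
se = statistics.stdev(res_steps) / math.sqrt(n) # =STDEV(data)/SQRT(n)
margin = t_c * se                               # E = t_c * SE
low, high = x_bar - margin, x_bar + margin      # x̄ − E < μ < x̄ + E

print(f"df = {df}, SE = {se:.1f}, E = {margin:.1f}")
print(f"95% confidence interval: {low:.0f} < mu < {high:.0f}")
```

If the sample size changes, `t_c` has to be looked up again, since it depends on the degrees of freedom.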

Hypothesis testing for a sample mean versus a known population mean
Statistic or Parameter	Symbol	Equations	OpenOffice
Relationship between confidence level c and alpha α for two-tailed tests		1 − c = α
Calculate t-critical for a two-tailed test	t_c		=TINV(α;df)
Calculate a t-statistic	t		=(x̄ - μ)/(sx/SQRT(n))
Calculate a two-tailed p-value from a t-statistic	p-value		=TDIST(ABS(t);df;2)

Hypothesis testing for paired data samples
Statistic or Parameter	Symbol	Equations	OpenOffice
Calculate a p-value for the difference of the means from two samples of paired data			=TTEST(data_range_x;data_range_y;2;1)

Hypothesis testing and confidence intervals for two independent samples
Statistic or Parameter	Symbol	Equations	OpenOffice
Degrees of freedom (approx.)	df	[smaller sample n] − 1	=COUNT(smaller sample)-1
Calculate t-critical for a two-tailed test	t_c		=TINV(α;df)
Calculate the standard error SE for two independent samples	SE		=SQRT((sx^2/n_x)+(sy^2/n_y))
Calculate a margin of error E for two independent samples using t_critical and the standard error SE.	E		=t_c*SE
Calculate the difference between two sample means	x̄_d	x̄ − ȳ	=AVERAGE(data set x)-AVERAGE(data set y)
Calculate a confidence interval for a population mean difference μ_d from a sample mean difference x̄_d and a margin of error E		x̄_d − E < μ_d < x̄_d + E
Calculate a p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown			=TTEST(x data;y data;2;3)
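
The two-sample formulas in this table can be sketched in Python using the RES and UES data. This is an illustration under stated assumptions, not the TTEST calculation itself: it uses the approximate degrees of freedom from the table (smaller sample size minus one), and `t_c = 2.1448` is entered by hand as the assumed result of =TINV(0.05;14).

```python
import math
import statistics

# Average daily steps, copied from the RES and UES columns of the data table.
res = [7095, 6593, 4151, 6793, 5036, 5404, 3922, 4361,
       5091, 3042, 4236, 1424, 8966, 7402, 7618]
ues = [16269, 3712, 9713, 6293, 9529, 3747, 1019, 9669, 2914, 3798,
       1823, 4482, 7095, 4071, 7616, 4123, 10068, 3309, 8828, 6840]

t_c = 2.1448  # assumed: two-tailed t-critical for alpha = 0.05 and df = 14

n_x, n_y = len(res), len(ues)
df = min(n_x, n_y) - 1                     # [smaller sample n] − 1
s_x, s_y = statistics.stdev(res), statistics.stdev(ues)

se = math.sqrt(s_x ** 2 / n_x + s_y ** 2 / n_y)  # standard error
x_d = statistics.mean(res) - statistics.mean(ues)  # difference of the means
margin = t_c * se                                  # E = t_c * SE
low, high = x_d - margin, x_d + margin

# If zero lies inside the interval, the difference is not significant
# at the chosen alpha.
zero_inside = low < 0 < high

print(f"df = {df}, SE = {se:.1f}, difference = {x_d:.1f}")
print(f"95% confidence interval: {low:.0f} < mu_d < {high:.0f}")
print("interval includes zero" if zero_inside else "interval excludes zero")
```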

Linear regression statistics
Statistic or Parameter	Symbol	Equations	OpenOffice
Slope	b		=SLOPE(y data; x data)
Intercept	a		=INTERCEPT(y data; x data)
Correlation	r		=CORREL(y data; x data)
Coefficient of Determination	r²		=(CORREL(y data; x data))^2

Z-scores diagram

Bins (x)	Frequency f	RF p(x)





Sums:
