MS 150 Statistics project

[Image: a matrix of green numbers on a black screen]


Ideas under consideration due at start of fourth week of class
First draft due at midterm
Second draft due at time of test three
Final draft due roughly a week prior to the end of the term.

We all walk in an almost invisible sea of data. I walk into a school fair and notice a jump rope contest. The number of jumps for each jumper until they foul out is being recorded on the wall. Numbers. With a mode, median, mean, and standard deviation. Then I notice that faster jumpers attain higher jump counts than slower jumpers. I can begin to predict jump counts based on the starting rhythm of the jumper. I use my stopwatch to record the time and total jump count. I later find that a linear correlation does exist, and I am able to show by a t-test that the faster jumpers have statistically significantly higher jump counts. I later incorporate this data in the fall 2007 final.

I walked into a store back in 2003 and noticed that Yamasa soy sauce appeared to cost more than Kikkoman soy sauce. I recorded prices and volumes, working out the cost per milliliter. I eventually showed that the mean price per milliliter for Yamasa is higher than that for Kikkoman. I also ran a survey of students and determined that students prefer Kikkoman to Yamasa.

My son likes articulated mining dump trucks. I find pictures of Terex dump trucks on the Internet. I write to Terex in Scotland and ask them about how the prices vary for the dump trucks, explaining that I teach statistics. "Funny you should ask," a Terex sales representative replied in writing. "The dump trucks are basically priced by a linear relationship between horsepower and price." The representative included a complete list of horsepower and price.

One term I learned that a new Cascading Style Sheets level 3 color specification for hue, saturation, and lightness was available for HyperText Markup Language web pages. The hue was based on a color wheel with cyan at the 180° middle of the wheel. I knew that Newton had put green in the middle of the red-orange-yellow-green-blue-indigo-violet rainbow, but green is at 120° on a hue color wheel. And there is no cyan in Newton's rainbow. Could the middle of the rainbow actually be at 180° cyan, or was Newton correct to say the middle of the rainbow is at 120° green? I used a hue analysis tool to analyze the image of an actual rainbow taken by a digital camera here on Pohnpei. This allowed an analysis of the true center of the rainbow.
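The hue angles quoted above can be checked with a short sketch using Python's standard colorsys module. The function name hue_degrees is made up for this illustration, and the pure green and pure cyan RGB values are assumed test colors, not measurements from the rainbow image.

```python
import colorsys

def hue_degrees(red, green, blue):
    """Return the hue of an RGB color (components 0.0 to 1.0) in degrees."""
    # colorsys.rgb_to_hls returns (hue, lightness, saturation),
    # with hue expressed as a fraction of a full turn around the wheel.
    hue, _lightness, _saturation = colorsys.rgb_to_hls(red, green, blue)
    return round(hue * 360)

green_hue = hue_degrees(0.0, 1.0, 0.0)  # pure green
cyan_hue = hue_degrees(0.0, 1.0, 1.0)   # pure cyan

print("green:", green_hue, "degrees")
print("cyan:", cyan_hue, "degrees")
```

Running the same function on pixel values sampled across a rainbow photograph would be one way to approach the question, though the tool actually used for the analysis is not described here.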

While researching sakau consumption in markets here on Pohnpei I found differences in means between markets, and I found a variation with distance from Kolonia. I asked some of the markets to share their cup tally sheets with me, and a number of them obliged. The data proved interesting.

The point is that data is all around us all the time. You might not go into statistics professionally, yet you will always live in a world filled with numbers and data. For one sixteen-week term in your life I want you to walk with an awareness of the data around you. At midterm you will turn in a proposed ratio level data set with basic statistics. You pick the data; you decide on the sample. At term's end you will add a 95% confidence interval for your data set and turn in a final, completed project.

Numbers flow all around you. A sea of data pours past your senses daily. The world is numbers. Watch for numbers to happen around you. See the matrix. When you observe numbers happening, record them.

Project report

Cite your data sources. Describe the sampling procedure using complete sentences. Use statistical terminology and use that terminology correctly. Was the sample a random sample or a convenience sample? What were the circumstances that led to obtaining the data? Write up the procedure in complete sentences. Prepare the write-up in a word processing program. Copy and paste your data tables and charts from a spreadsheet into the word processing document.

Ratio level data is preferred. If you opt to work with nominal or ordinal level data, please meet with your instructor for guidance and advice on how to best proceed.

Find something original, something unique to your life. Avoid doing a project on an example used in class such as favorite color, car counts, step counts, or other in-class examples.

Statistics to report in the first, second, and final draft include:

The items below will appear in the final draft. The corresponding material is not covered until after midterm. If confidence intervals are done by test two, then the second draft should include confidence intervals.

Statistics project marking rubric
[S]ources and sampling
2 Sources cited and sampling procedure described
1 Source cited, no sampling procedure
[C]ompleteness of the statistical analysis
+1 Per appropriate and correctly calculated statistic. Frequency table, histogram chart, and others as specified above are worth more than a single point. If the source is unidentified, or the sampling procedure is unclear, or the data is not clearly labeled in terms of both what the data is measuring and the units of measurement, then judging whether a statistic is appropriate or correct may be impossible and can result in no points for completeness of the statistical analysis.
[U]niqueness
2 Unique data showing inspiration and originality
1 Commonly chosen data
[R]ange distribution
2 Data shows a variety of values well distributed across the range
1 Data has only a few values or is not well distributed across the range
[V]alidity
2 Statistically valid and usable data
1 Statistically invalid or unusable data
[E]ffort
3 High fruit: data required planning, forethought, sustained effort over time. Not easily obtained.
2 Low hanging fruit: Data easily available in a single contact with minimal planning and effort
1 Fallen fruit: Found a stick on the ground on the day of the assignment and called it a statistick
[D]ata discussion (second and final drafts only)
2 Thorough discussion of: the data, data outliers (if any), potential implications of the data, ideas for future extensions or expansion of the data
1 Weak or incomplete discussion
[F]ormat (second and final drafts only)
2 Document is well laid out, table columns have headers with labels and units, table headers are aligned with data cell contents, tables and cells have borders, and the document appears to have been done in a word processing program
1 Minor format issues
For any of the above…
0 Completely missing the mark for that item

Sample report

This report would be done in a word processing program such as OpenOffice.org Writer or Microsoft Word, with tables and charts cut and pasted in from a spreadsheet program such as Calc or Excel. The report below includes material such as confidence intervals which would not appear until the second draft or final report.

Jump Rope Contest Statistical Report

Data gathered and analyzed by: Dana Lee Ling

Data
Jumps
102
79
68
66
61
69
42
45
79
22
43
13
24
10
11
107
17
34
8
20
58
26
45
40
111
105
213

On Thursday 08 November 2007 a jump rope contest was held at a local elementary school festival. Contestants jumped with their feet together, a double-foot jump. The data seen in the table is the number of jumps for twenty-seven female jumpers. Participants jumped without stopping until they missed a jump, fouled on their rope, or stopped of their own accord. This was a solo jump rope contest: jumpers spun their own rope. The jumpers ranged in age from approximately six years old to twelve years old.

If a jumper made two or more attempts, only her highest number of jumps was retained in the data table. This was also the procedure used on the sheets of paper on the wall at the school on the day of the contest. Data was gathered by the author from the sheets on the wall, not from his own counts. The names of the jumpers were not recorded as the jumpers were minors. The table includes all of the young women who jumped that day.

Note that while the number of jumps data is discrete, the range and diversity of values permit treatment of the data as if it were continuous data. This data represents a convenience sample and is not a random sample of rope jumpers in general.

Statistic: Value (Jumps)
1. sample size n: 27
2. minimum: 8
3. maximum: 213
4. range: 205
5. midrange: 110.5
6. mode: 45 (79 also occurs twice)
7. median: 45
8. sample mean x̄: 56.22
9. standard deviation sx: 44.65
10. coefficient of variation: 0.79
11. class width for five bins: 41
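The table values can be reproduced outside a spreadsheet. The following is a minimal sketch using Python's standard statistics module. The function name basic_statistics and the dictionary keys are illustrative choices, not part of the assignment. Note that statistics.multimode reports every value tied for most frequent, so it lists 79 alongside 45.

```python
import statistics

# Jump counts copied from the data table in this report.
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10, 11,
         107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

def basic_statistics(data, bins=5):
    """Return the basic statistics from the table as a dictionary."""
    low, high = min(data), max(data)
    mean = statistics.mean(data)
    sx = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "n": len(data),
        "minimum": low,
        "maximum": high,
        "range": high - low,
        "midrange": (low + high) / 2,
        "modes": statistics.multimode(data),  # all values tied for most frequent
        "median": statistics.median(data),
        "mean": mean,
        "sx": sx,
        "cv": sx / mean,  # coefficient of variation
        "class width": (high - low) / bins,
    }

summary = basic_statistics(jumps)
for name, value in summary.items():
    print(name, value)
```

A spreadsheet remains the expected tool for the project; the sketch is only a way to cross-check the reported values.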

The large coefficient of variation indicates that the jump data is widely spread around the mean. Jumpers are inconsistent in the number of jumps attained; the data shows a lot of variability.

The maximum of 213 jumps is an outlier (z-score = 3.51), an unusually high number of jumps for that day. A highly proficient jumper who had made a previous attempt and achieved 102 consecutive jumps returned and accomplished 213 consecutive jumps.
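The z-score quoted above is the distance of a value from the sample mean, measured in sample standard deviations. A minimal sketch of that calculation follows. The function name z_score is illustrative, and the cutoff of 3 for calling a value an outlier is an assumed convention that this report does not state explicitly.

```python
import statistics

# Jump counts copied from the data table in this report.
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10, 11,
         107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

OUTLIER_CUTOFF = 3  # assumed convention, not taken from the report

def z_score(value, data):
    """Return (value - sample mean) / sample standard deviation."""
    return (value - statistics.mean(data)) / statistics.stdev(data)

z_max = z_score(max(jumps), jumps)
print("z-score of the maximum:", round(z_max, 2))
print("beyond the assumed cutoff:", abs(z_max) > OUTLIER_CUTOFF)
```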

12. Histogram table

Bins (x)    Frequency f    RF p(x)
49          15             0.56
90          7              0.26
131         4              0.15
172         0              0.00
213         1              0.04
Sums:       27             1.00
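The histogram table can be rebuilt from the raw data by adding the class width repeatedly to the minimum to get the class upper limits, then counting the values at or below each limit. The sketch below assumes, as the table appears to, that a value equal to an upper limit belongs to that class. The function name histogram_table is illustrative.

```python
# Jump counts copied from the data table in this report.
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10, 11,
         107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

def histogram_table(data, bins=5):
    """Return class upper limits, frequencies, and relative frequencies."""
    low, high = min(data), max(data)
    width = (high - low) / bins
    upper_limits = [low + width * (i + 1) for i in range(bins)]
    upper_limits[-1] = high  # guard against rounding in the last limit
    frequencies = [0] * bins
    for value in data:
        # Place each value in the first class whose upper limit covers it.
        for i, upper in enumerate(upper_limits):
            if value <= upper:
                frequencies[i] += 1
                break
    n = len(data)
    relative = [f / n for f in frequencies]
    return upper_limits, frequencies, relative

limits, freqs, rel_freqs = histogram_table(jumps)
for upper, f, rf in zip(limits, freqs, rel_freqs):
    print(upper, f, round(rf, 2))
print("Sums:", sum(freqs), round(sum(rel_freqs), 2))
```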

13. Histogram chart

[Figure: Jump distribution relative frequency histogram. x-axis: class upper limits (jumps) 49, 90, 131, 172, 213; y-axis: relative frequency from 0.0 to 0.6]

14. Histogram shape: skewed right, bimodal

The skewed histogram illustrates that jump counts above 131 jumps are very rare events.

Statistic: Value (Jumps)
15. Standard error SE: 8.59
16. t-critical for a confidence level of 95%: 2.06
17. Margin of error E: 17.66
18. The 95% confidence interval for this data is: 38.56 ≤ μ ≤ 73.88
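The confidence interval follows from the standard error (sx divided by the square root of n), the margin of error (t-critical times the standard error), and the sample mean plus or minus that margin. The sketch below uses only the Python standard library, so the t-critical value is entered by hand as 2.0555, an assumed t-table value for 26 degrees of freedom that the table shows rounded to 2.06. The function name confidence_interval is illustrative.

```python
import math
import statistics

# Jump counts copied from the data table in this report.
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10, 11,
         107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

# Assumed t-table value for 95% confidence with n - 1 = 26 degrees of
# freedom; the report displays this rounded to 2.06.
T_CRITICAL = 2.0555

def confidence_interval(data, t_critical):
    """Return (standard error, margin of error, lower bound, upper bound)."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    margin = t_critical * standard_error
    return standard_error, margin, mean - margin, mean + margin

se, margin, lower, upper = confidence_interval(jumps, T_CRITICAL)
print("SE:", round(se, 2))
print("E:", round(margin, 2))
print("95% CI:", round(lower, 2), "to", round(upper, 2))
```

In a spreadsheet the t-critical value would normally come from a built-in inverse t function, not from a typed constant.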

The population mean can be estimated, with 95% confidence, to lie between 39 and 74 jumps. Observations suggested that faster jumpers achieved higher jump counts than slower jumpers. Future research could examine whether the jump rate is correlated with the total number of jumps.