Ideas under consideration due at start of fourth week of class (5 points)
First draft due at midterm (25 points)
Second draft due at time of test three, must be an integrated word processing document (25 points)
Final draft as integrated word processing document due roughly a week prior to the end of the term. (40 points)
We all walk in an almost invisible sea of data. I walked into a school fair and noticed a jump rope contest. The number of jumps for each jumper until they fouled out was being recorded on the wall. Numbers. With a mode, median, mean, and standard deviation. Then I noticed that faster jumpers attained higher jump counts than slower jumpers. I saw that I could begin to predict jump counts based on the starting rhythm of the jumper. I used my stopwatch to record the time and total jump count. I later found that a linear correlation does exist, and I was able to show by a t-test that the faster jumpers have statistically significantly higher jump counts. I later incorporated this data in the fall 2007 final.
I walked into a store back in 2003 and noticed that Yamasa soy sauce appeared to cost more than Kikkoman soy sauce. I recorded prices and volumes, working out the cost per milliliter. I eventually showed that the mean price per milliliter for Yamasa is higher than Kikkoman. I also ran a survey of students and determined that students prefer Kikkoman to Yamasa.
My son likes articulated mining dump trucks. I found pictures of Terex dump trucks on the Internet. I wrote to Terex in Scotland and asked them how the prices vary for the dump trucks, explaining that I teach statistics. "Funny you should ask," a Terex sales representative replied in writing. "The dump trucks are basically priced by a linear relationship between horsepower and price." The representative included a complete list of horsepower and price.
One term I learned that a new Cascading Style Sheets level 3 color specification for hue, saturation, and lightness was available for HyperText Markup Language web pages. The hue was based on a color wheel with cyan at the 180° middle of the wheel. I knew that Newton had put green in the middle of the red-orange-yellow-green-blue-indigo-violet rainbow, but green is at 120° on a hue color wheel. And there is no cyan in Newton's rainbow. Could the middle of the rainbow actually be at 180° cyan, or was Newton correct to say the middle of the rainbow is at 120° green? I used a hue analysis tool to analyze the image of an actual rainbow taken by a digital camera here on Pohnpei. This allowed an analysis of the true center of the rainbow.
While researching sakau consumption in markets here on Pohnpei I found differences in means between markets, and I found a variation with distance from Kolonia. I asked some of the markets to share their cup tally sheets with me, and a number of them obliged. The data proved interesting.
The point is that data is all around us all the time. You might not go into statistics professionally, yet you will always live in a world filled with numbers and data. For one sixteen-week term in your life I want you to walk with an awareness of the data around you. At midterm you will turn in a proposed ratio level data set with basic statistics. You pick the data - you decide on the sample. At term's end you will add a 95% confidence interval for your data set and turn in a final, completed project.
Numbers flow all around you. A sea of data pours past your senses daily. The world is numbers. Watch for numbers to happen around you. See the matrix. When you observe numbers happening, record them.
Ratio level data is preferable. If you opt to work with nominal or ordinal level data, please meet with your instructor for guidance and advice on how to best proceed.
Find something original, something unique to your life. Avoid doing a project on an example used in class such as favorite color, car counts, step counts, or other in-class examples.
Statistics to report in the first, second, and final drafts vary according to the level of measurement of the project. Ratio level data is preferred.
Note that at the nominal level, the data is usually reported in a frequency table. That is, the data table and the frequency table are one and the same table.
The items below will appear in the final draft. The corresponding material is not covered until after midterm. If confidence intervals are done by test three, then the second draft should include confidence intervals.
If you found two variable data, then perform a linear regression on the data. Report the slope, intercept, and correlation. Whether or not basic statistics should be reported for one of the variables depends on whether statistics such as the mode, median, and mean have a "meaning" to the study. In most cases the meaning or impact of the study is in the relationship (slope, intercept, r) and not in the basic statistics.
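As a rough illustration of the two variable case, the sketch below computes the slope, intercept, r, and r² by the usual least-squares formulas. The function name `linear_regression` and the five x,y pairs are made up for this example and do not come from any of the studies described above. A spreadsheet would normally do the same work.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical illustrative data only, not real measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept, r = linear_regression(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.2f}, r^2 = {r * r:.2f}")
```

In a report, these four values would usually sit beside the XY scattergraph, with a sentence on what the slope means in the units of the study.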
Start of week four [5] | ||
---|---|---|
5 | Submission of description of project, potential sample and sampling method. Should include the appropriate answers to the who, what, where, when, why, and how questions concerning the project and the data to be gathered. In effect, a project topic statement. |
First draft at test two/midterm [25] | ||
---|---|---|
5 | Description of project concept, sample, and sampling method. Should include the appropriate answers to the who, what, where, when, why, and how questions concerning the data. What was measured? How was it measured? When? Where? If people were involved, who were they and why were they selected? Is the sample a good random sample or a convenience sample? | |
5 | Sample data recorded and reported | |
 | Single variable x data | Two variable x,y data |
5 | Basic statistics reported (see list above) | Slope, intercept, r, r² |
5 | Frequency table | XY scattergraph |
5 | Histogram chart (done correctly) | |
Second draft at test three: first draft requirements plus: [30] | ||
---|---|---|
5 | Turned in as a report done using word processing software; document is well laid out, text is double spaced, tables are single spaced, table columns have a header with label and units, table headers are aligned with data cell contents, tables are appropriately separated, and tables and cells have borders |
Final draft includes first and second draft requirements plus optional bonus: [35] | ||
---|---|---|
+5 | For a study using unique and original data that required planning, forethought, and a sustained effort over time to acquire; data that has a variety of values; a study that has thoughtful implications; a thorough study that includes a discussion of data outliers (if any) and potential future extensions or impacts of the study. |
This report would be done in a word processing program such as OpenOffice.org Writer or Microsoft Word, with tables and charts cut and pasted in from a spreadsheet program such as Calc or Excel. The report below includes material such as confidence intervals which would not appear until the second draft or final report.
Jump Rope Contest Statistical Report
Data gathered and analyzed by: Dana Lee Ling
Jumps |
---|
102 |
79 |
68 |
66 |
61 |
69 |
42 |
45 |
79 |
22 |
43 |
13 |
24 |
10 |
11 |
107 |
17 |
34 |
8 |
20 |
58 |
26 |
45 |
40 |
111 |
105 |
213 |
On Thursday 08 November 2007 a jump rope contest was held at a local elementary school festival. Contestants jumped with their feet together, a double-foot jump. The data seen in the table is the number of jumps for twenty-seven female jumpers. Participants jumped without stopping until they missed a jump, fouled on their rope, or stopped of their own accord. This was a solo jump rope contest; jumpers spun their own rope. The jumpers ranged in age from approximately six years old to twelve years old.
If a jumper made two or more attempts, only her highest number of jumps was retained in the data table. This was also the procedure used on the sheets of paper on the wall at the school on the day of the contest. Data was gathered by the author from the sheets on the wall, not from his own counts. The names of the jumpers were not recorded as the jumpers were minors. The table includes all of the young women who jumped that day.
Note that while the number of jumps data is discrete, the range and diversity of values permit treatment of the data as if it were continuous data. This data represents a convenience sample and is not a random sample of rope jumpers in general.
Statistic | Value (Jumps) |
---|---|
1. sample size n | 27 |
2. minimum | 8 |
3. maximum | 213 |
4. range | 205 |
5. midrange | 110.5 |
6. mode | 45 and 79 (each occurs twice) |
7. median | 45 |
8. sample mean x | 56.22 |
9. standard deviation sx | 44.65 |
10. coefficient of variation | 0.79 |
11. class width for five bins | 41 |
The large coefficient of variation indicates that the jump data is spread widely around the mean. Jumpers are inconsistent in the number of jumps attained; the data shows a lot of variability.
The maximum of 213 jumps is an outlier (z-score = 3.51), an unusually high number of jumps for that day. A highly proficient jumper who had made a previous attempt and achieved 102 consecutive jumps returned and accomplished 213 consecutive jumps.
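For readers who want to check the basic statistics outside a spreadsheet, here is a minimal sketch using only the Python standard library. The variable names are illustrative choices. The five-bin class width follows the table above, and the mode step lists every value tied for the highest count.

```python
from collections import Counter
from statistics import mean, median, stdev

# Jump counts copied from the data table in this report
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10,
         11, 107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

n = len(jumps)
minimum, maximum = min(jumps), max(jumps)
data_range = maximum - minimum
midrange = (maximum + minimum) / 2

# Every value tied for the highest frequency counts as a mode
counts = Counter(jumps)
top_count = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == top_count)

middle = median(jumps)
x_bar = mean(jumps)
sx = stdev(jumps)              # sample standard deviation (divides by n - 1)
cv = sx / x_bar                # coefficient of variation
class_width = data_range / 5   # five bins, as in the report
z_max = (maximum - x_bar) / sx # z-score of the largest value

print(n, minimum, maximum, data_range, midrange, modes, middle)
print(round(x_bar, 2), round(sx, 2), round(cv, 2), class_width, round(z_max, 2))
```

The z-score line uses the same calculation as the outlier check in the paragraph above: the distance of the maximum from the mean, measured in standard deviations.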
12. Histogram table
Bins (x) | Frequency f | RF p(x) |
---|---|---|
49 | 15 | 0.56 |
90 | 7 | 0.26 |
131 | 4 | 0.15 |
172 | 0 | 0.00 |
213 | 1 | 0.04 |
Sums: | 27 | 1.00 |
13. Histogram chart
14. Histogram shape: skewed right, bimodal
The skewed histogram illustrates that jump counts above 131 jumps are very rare events.
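The histogram table can be rebuilt with a short sketch like the one below. It assumes, as the table suggests, that each bin is labeled by its upper limit and that a value equal to an upper limit falls in that bin. The variable names are illustrative.

```python
# Jump counts copied from the data table in this report
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10,
         11, 107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

bins = 5
low, high = min(jumps), max(jumps)
width = (high - low) / bins

# Upper class limits: minimum + width, minimum + 2 * width, and so on
upper_limits = [low + width * (i + 1) for i in range(bins)]

# Count each value in the first bin whose upper limit it does not exceed
frequencies = [0] * bins
for value in jumps:
    for i, upper in enumerate(upper_limits):
        if value <= upper:
            frequencies[i] += 1
            break

relative = [f / len(jumps) for f in frequencies]

for upper, f, p in zip(upper_limits, frequencies, relative):
    print(f"{upper:6.0f} {f:3d} {p:5.2f}")
print(f"Sums:  {sum(frequencies):3d} {sum(relative):5.2f}")
```

With a class width that is not a whole number, floating point rounding could leave the maximum just above the last upper limit, so the sums should always be checked against n and 1.00.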
Statistic | Value (Jumps) |
---|---|
15. Standard error SE | 8.59 |
16. tcritical for a confidence level of 95%: | 2.06 |
17. Margin of error E: | 17.66 |
18. The 95% confidence interval for this data is: | 38.56 ≤ μ ≤ 73.88 |
The population mean can be estimated to be in the range between 39 and 74 jumps. Observations suggested that faster jumpers achieved higher jump counts than slower jumpers. Future research could examine whether the jump rate is correlated to the total number of jumps.
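The confidence interval arithmetic can be sketched as follows. The Python standard library has no Student's t distribution, so the t-critical value here is an assumption typed in from a t-table for 26 degrees of freedom (about 2.0555, which the report rounds to 2.06). In a spreadsheet, the two-tailed function TINV(0.05, 26) should return the same value.

```python
from math import sqrt
from statistics import mean, stdev

# Jump counts copied from the data table in this report
jumps = [102, 79, 68, 66, 61, 69, 42, 45, 79, 22, 43, 13, 24, 10,
         11, 107, 17, 34, 8, 20, 58, 26, 45, 40, 111, 105, 213]

n = len(jumps)
x_bar = mean(jumps)
sx = stdev(jumps)                 # sample standard deviation

standard_error = sx / sqrt(n)

# Assumed value: t-critical for 95% confidence with n - 1 = 26 degrees
# of freedom, taken from a t-table rather than computed here
t_critical = 2.0555

margin_of_error = t_critical * standard_error
lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"SE = {standard_error:.2f}, E = {margin_of_error:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Using the rounded value 2.06 instead would shift the margin of error slightly, so the unrounded t-critical is the better choice when reproducing the table.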