Coastal damage • Name:

Data
Wave height (m)
0.5
1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.6
1.7
1.7
1.8
1.8
1.9
1.9
2.0
2.1
2.1
2.2
2.2
2.2
2.2
2.3
2.3
2.4
2.4
2.5
2.6
3.0
3.5

Part I: Basic Statistics

A coastal engineering consultant for the state of Kosrae, Doug Ramsay, released a report in May 2008 on the coastal damage in Kosrae from November 2007 to February 2008. The quoted text in this final examination comes directly from that report. The data is based on data presented in the report. The data, however, has been interpreted from graphs and charts and is not the original data. The data is effectively "based on real events" but is not the actual original data, and the data presented in this final examination may differ in significant ways from the actual data.

"Over the period between November 2007 and February 2008 Kosrae experienced high sea levels which caused severe flooding of coastal land and further coastal erosion, particularly along the southern coastline of Malem."

The data in this first section of the final examination derives from Figure 6 of the report. The figure depicts the frequency of wave heights in meters. Wave heights on the coast of Kosrae are generally between 2 and 2.5 meters. "...this is fairly typical of wave conditions experienced off the north east coast of Kosrae during the trade wind season. During this period significant wave heights are [almost] all less than 3 meters, suggesting no severe storm event, although significant wave heights in excess of 2 meters have coincided with high spring tides particularly in November and February and from wave directions that would have affected the Malem coast."

  1. ratio What level of measurement is the data?
  2. 30 Determine the sample size n.
  3. 0.5 Determine the minimum.
  4. 3.5 Determine the maximum.
  5. 3 Calculate the range.
  6. 2 Calculate the midrange.
  7. 2.2 Determine the mode.
  8. 1.95 Determine the median.
  9. 1.93 Calculate the sample mean x̄.
  10. 0.61 Calculate the sample standard deviation sx.
  11. 0.32 Calculate the sample Coefficient of Variation.
  12. 0.5 Determine the bin width. Use six classes (bins or intervals).
  13. Fill in the following table with the class upper limits in the first column, the frequencies in the second column, and the relative frequencies in the third column.
Bins (x) | Frequency f | RF p(x)
1.0 | 2 | 0.07
1.5 | 5 | 0.17
2.0 | 9 | 0.30
2.5 | 11 | 0.37
3.0 | 2 | 0.07
3.5 | 1 | 0.03
Sums: | 30 | 1.00
  14. Sketch a histogram of the relative frequency data.
  15. skewed left What is the shape of the distribution?
  16. 2.57 Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for the 3.5 meter wave event.
  17. extraordinary Is the z-score for a 3.5 meter wave an ordinary or extraordinary value?
  18. -2.35 Use the sample mean x̄ and sample standard deviation sx above to calculate the z-score for a 0.5 meter wave.
  19. extraordinary Is the z-score for a 0.5 meter wave an ordinary or extraordinary value?
  20. 0.11145 Calculate the standard error of the sample mean x̄.
  21. 2.0452 Find tcritical for a confidence level c of 95% for the data.
  22. 0.2279 Determine the margin of error E for the sample mean x̄.
  23. Write out the 95% confidence interval for the population mean μ wave height:
    p(1.71 < μ < 2.16) = 0.95
  24. Yes The population mean wave height μ for Kosrae from 1986 to 1999 was 1.9 meters. Is this a possible population mean for the wave heights of November 2007 to February 2008?
  25. No Are the wave heights of November 2007 to February 2008 statistically significantly higher than the wave heights from 1986 to 1999?
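The Part I calculations can be cross-checked outside of Calc. The sketch below is a minimal example, not the method the examination expects; it uses only Python's standard `math` and `statistics` modules (Python 3.8 or later for `statistics.multimode`). It assumes that each class includes its upper limit, as Calc's FREQUENCY function does, and it copies the 95% t-critical value 2.0452 (TINV(0.05;29)) from Part I instead of computing it, because the standard library has no t distribution. The helper `z_score` is a hypothetical name introduced for illustration.

```python
import math
import statistics

# Part I wave heights (m)
heights = [0.5, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.8, 1.8,
           1.9, 1.9, 2.0, 2.1, 2.1, 2.2, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4,
           2.5, 2.6, 3.0, 3.5]

n = len(heights)
lo, hi = min(heights), max(heights)
value_range = hi - lo
midrange = (lo + hi) / 2
modes = statistics.multimode(heights)   # every most-frequent value
median = statistics.median(heights)
mean = statistics.mean(heights)
sx = statistics.stdev(heights)          # sample (n - 1) standard deviation, like STDEV
cv = sx / mean

# Six classes; each class includes its upper limit, as Calc's FREQUENCY does
classes = 6
width = value_range / classes
uppers = [lo + width * (k + 1) for k in range(classes)]
freqs = []
prev = -math.inf
for upper in uppers:
    freqs.append(sum(1 for x in heights if prev < x <= upper))
    prev = upper
rel_freqs = [f / n for f in freqs]


def z_score(x):
    """Standardize x with the sample mean and sample standard deviation."""
    return (x - mean) / sx


# 95% confidence interval for the population mean.
# t_crit is TINV(0.05;29) copied from Part I: the standard library has no t distribution.
t_crit = 2.0452
se = sx / math.sqrt(n)
margin = t_crit * se
ci = (mean - margin, mean + margin)

print(f"n={n} min={lo} max={hi} range={value_range} midrange={midrange}")
print(f"mode={modes} median={median} mean={mean:.2f} sx={sx:.2f} CV={cv:.2f}")
for upper, f, rf in zip(uppers, freqs, rel_freqs):
    print(f"{upper:.1f} | {f} | {rf:.2f}")
print(f"z(3.5)={z_score(3.5):.2f}  z(0.5)={z_score(0.5):.2f}")
print(f"SE={se:.5f}  E={margin:.4f}  CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Computing the bin limits as the minimum plus multiples of the width keeps them exact here (multiples of 0.5), so values such as 1.5 and 2.5 land in the class whose upper limit they equal.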

Part II: Hypothesis Testing using the t-test

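1980 | 2000 | 2000 (continued)
1 | 1 | 4
1 | 1 | 4
1 | 1 | 4
1 | 1 | 4
1 | 1 | 4
2 | 2 | 4
2 | 2 | 4
2 | 2 | 5
2 | 2 | 5
2 | 2 | 5
2 | 2 | 5
2 | 2 | 5
3 | 2 | 5
3 | 3 | 6
3 | 3 | 6
3 | 3 | 6
3 | 3 | 7
4 | 3 | 7
4 | 3 | 8
4 | 3 |
4 | 3 |
5 | 3 |
5 | 3 |
6 | 4 |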

In January 2000 Doug Ramsay produced the report "Reducing the impact on sand mining of Kosrae and the provision of a long-term supply of sand aggregate for construction: A proposed way forward" for the Development Review Commission. The report noted the increasing demand for sand aggregate used in construction. The growing population of Kosrae is only one cause of the increased demand; over the period from 1980 to 2000 the number of rooms per home also increased. Kosraens shifted from a median of 2.5 rooms per home in 1980 to a median of 3.8 rooms per home in 2000 (page 7).

The data table is based on Figure 2 on page eight of the sand mining report. Figure 2 represents a sample of 1/25 of the homes on Kosrae in 1980 and 2000. Use these two samples to determine whether the sample means are statistically significantly different. Note that the first column is the 1980 data. The second and third columns are both 2000 data; the 2000 column was split in two to fit on the page.

  1. 2.75 Calculate the sample mean x̄ of the number of rooms per home in 1980.
  2. 3.56 Calculate the sample mean ȳ of the number of rooms per home in 2000.
  3. Yes Are the sample means for the two samples mathematically different?
  4. 23 Calculate the degrees of freedom as the count of the smaller of the two samples minus one.
  5. 0.0458 What is the p-value? Use the difference of means for independent samples TTEST function =TTEST(data_range_x;data_range_y;2;3) to determine the p-value for this two sample data.
  6. Yes Is the difference in the mean number of rooms per home statistically significant at a risk of a type I error alpha α = 0.05?
  7. Reject Would we "fail to reject" or "reject" a null hypothesis of no difference in the mean number of rooms per home between the two samples?
  8. 0.9542 What is the maximum level of confidence we can have that the difference is statistically significant?
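As a cross-check on the TTEST answer, here is a minimal Python sketch of an unequal-variance (Welch) two-sample t-test on the Part II data. It assumes that Calc's TTEST type 3 is the Welch test with Welch–Satterthwaite (possibly non-integer) degrees of freedom; the small-sample df rule from question 4 is not used here. Because the standard library has no t distribution, the p-value is obtained by integrating the Student t density with Simpson's rule, a swapped-in numerical technique rather than what Calc does internally. The helpers `two_tailed_p` and `welch_t_test` are hypothetical names.

```python
import math
import statistics

# Rooms per home from the Part II table; the two 2000 columns are combined.
rooms_1980 = [1] * 5 + [2] * 7 + [3] * 5 + [4] * 4 + [5] * 2 + [6]
rooms_2000 = [1] * 5 + [2] * 8 + [3] * 10 + [4] * 8 + [5] * 6 + [6] * 3 + [7] * 2 + [8]


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, like TDIST(ABS(t);df;2), found by integrating the
    Student t density from 0 to |t| with Simpson's rule (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * total * h / 3


def welch_t_test(x, y):
    """Unequal-variance two-sample t-test; df may be non-integer."""
    vx = statistics.variance(x) / len(x)   # squared standard error of x-bar
    vy = statistics.variance(y) / len(y)   # squared standard error of y-bar
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df, two_tailed_p(t, df)


t, df, p = welch_t_test(rooms_1980, rooms_2000)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

With these data the sketch gives a p-value close to 0.0458, which agrees with the TTEST answer; 1 − p then gives the maximum confidence asked for in the last question.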

Part III: Linear Regression (best fit or least squares line)

"The most significant component of sea level is that of the astronomical tide, the actions of the moon and sun on water levels which are most commonly observed causing the daily rise and fall of water levels, and the Spring-Neap tidal cycle every two weeks." [Spring tides are the higher tides during the full and new moon, caused by the alignment of the sun, moon, and earth. Neap tides occur at first and last quarter, when the sun, moon, and earth form a right angle. The distance to the sun also contributes to the height of the tide. The sun is closest to the earth during northern hemisphere winter (perihelion). Perihelion causes higher maximum tide heights.]

"There are two further astronomical characteristics that are important in this discussion. Firstly there is the lunar cycle, known as the lunar declinational cycle, which reaches a maximum twice every tropical month of 27.32 days when the moon is at a maximum angle (or declination) north of the equator and again at a maximum angle south of the equator. When the moon is at maximum declination this causes higher tides on Kosrae. Secondly, due to the elliptical orbit of the moon around the earth every 27.5 days, there are times when the moon is closer to the earth than others. When the moon's orbit is at its closest point to the earth (once every 27.5 days), this is called the lunar perigee, and again results in higher tide levels."

"When the Spring tides coincide with, or are close to, the time when the moon is in perigee and/or at its maximum lunar declination, high spring tides can be quite a bit higher than normal (often known as King tides)."

The data table presents the number of days to or from the closest lunar perigee for a spring tide and the maximum height of that spring tide. According to the paragraphs above, the fewer the number of days between a spring tide (full or new moon) and a lunar perigee, the higher the maximum tide height. The higher the tide, the greater the erosion and thus the coastal damage to the shoreline in Kosrae. This section explores whether this relationship is supported by the data derived from Figure 2 of the report.

Figure: Days to/from lunar perigee versus maximum tide height in Kosrae (xy scatter plot of the data table below with a linear regression line; x-axis: Days, y-axis: Maximum height in mm)

Data table

Days | Maximum height (mm)
11 | 895
4 | 1038
10 | 958
3 | 1063
9 | 1000
3 | 1083
7 | 958
7 | 895
  1. −18.29 Calculate the slope of the linear regression (best fit line).
  2. 1109.74 Calculate the y-intercept of the linear regression (best fit line).
  3. negative Is the relation between days to/from lunar perigee and maximum tide height positive, negative, or neutral?
  4. −0.80 Calculate the linear correlation coefficient r for the data.
  5. Strong Is the correlation none, weak/low, moderate, strong/high, or perfect?
  6. 1109.74 Use the slope and intercept to predict the maximum tide height when the spring tide coincides with lunar perigee (zero days between spring tide and lunar perigee).
  7. 5.9984 Use the slope and intercept to determine the number of days between a spring tide and lunar perigee that would produce a predicted tide height of 1000 mm.

One intention of any course is that a student should be able to learn and employ new concepts in the field even after the course is over. In a linear regression analysis, a correlation coefficient near zero means that no linear relationship exists between the variables. You can run a statistical test to determine whether the correlation coefficient r is statistically significantly different from zero. If the difference of r from zero is statistically significant, then the data provide evidence that a relationship exists. If you fail to reject a null hypothesis of r equals zero, then there is no evidence in the data that the days between a spring tide and lunar perigee and the maximum tide height are linked.

To run the hypothesis test, you will calculate a t-critical (tc), a t-statistic (t), and then a p-value using the t-statistic and the TDIST function.

For this test:
sample size n is the number of data pairs
t-critical: =TINV(α;n−2) where α = 0.05
t-statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$, i.e. =r*SQRT(n-2)/SQRT(1-r^2)
p-value: =TDIST(ABS(t-statistic);n−2;2)

Note that n−2 is used in these formulas. This is the degrees of freedom for a correlation hypothesis test.

  1. Determine the sample size n by counting the number of data pairs.
  2. Determine t-critical using an alpha of α = 0.05 and n − 2 degrees of freedom.
  3. Determine the t-statistic using the formula noted further above, remembering to use n − 2 for the degrees of freedom.
  4. Determine the p-value using the TDIST function, remembering to use n − 2 for the degrees of freedom.
  5. ________ Is the correlation between days between a spring tide and lunar perigee and the maximum tide height statistically significant?

Tables of Formulas and OpenOffice Calc functions

Basic Statistics
Statistic or Parameter | Symbol | Equations | OpenOffice
Square root | | | =SQRT(number)
Sample size | n | | =COUNT(data)
Sample mean | x̄ | $\bar{x} = \Sigma x / n$ | =AVERAGE(data)
Sample standard deviation | sx or s | | =STDEV(data)
Sample coefficient of variation | CV | $CV = s_x / \bar{x}$ | =STDEV(data)/AVERAGE(data)
z-score of an x value from the sample mean | z | $z = (x - \bar{x}) / s_x$ | =STANDARDIZE(x;x̄;sx)
Confidence interval statistics for a single sample
Statistic or Parameter | Symbol | Equations | OpenOffice
Sample size | n | | =COUNT(data)
Degrees of freedom | df | $df = n - 1$ | =COUNT(data)-1
t-critical value from a confidence level c | tc | | =TINV(1-c;df)
Standard error of a sample mean x̄ | SE | $SE = s_x / \sqrt{n}$ | =STDEV(data)/SQRT(n)
Standard error of a sample proportion p | SE | $SE = \sqrt{pq/n}$ | =SQRT(p*q/n)
Margin of error for the mean from tcritical and the standard error SE | E | $E = t_c \cdot SE$ | =tc*SE
Confidence interval for a population mean μ from a sample mean x̄ and a margin of error E | | $\bar{x} - E < \mu < \bar{x} + E$ |
Confidence interval for a population proportion P from a sample proportion p and a margin of error E | | $p - E < P < p + E$ |
Hypothesis testing for a sample mean versus a known population mean
Statistic or Parameter | Symbol | Equations | OpenOffice
Relationship between confidence level c and alpha α for two-tailed tests | | $1 - c = \alpha$ |
t-critical for a two-tailed test | tc | | =TINV(α;df)
t-statistic | t | $t = (\bar{x} - \mu) / (s_x / \sqrt{n})$ | =(x̄ - μ)/(sx/SQRT(n))
Two-tailed p-value from a t-statistic | p-value | | =TDIST(ABS(t);df;2)
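The t-statistic row above can be illustrated with the Part I wave heights, testing them against the 1986 to 1999 population mean of 1.9 meters quoted in Part I. This is a minimal sketch using only the standard library; it copies the 95% t-critical value 2.0452 (TINV(0.05;29)) from Part I rather than computing it, and it makes the two-tailed decision by comparing |t| with t-critical instead of computing a p-value.

```python
import math
import statistics

# Part I wave heights (m)
heights = [0.5, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.8, 1.8,
           1.9, 1.9, 2.0, 2.1, 2.1, 2.2, 2.2, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4,
           2.5, 2.6, 3.0, 3.5]

mu = 1.9          # 1986 to 1999 population mean wave height quoted in Part I
t_crit = 2.0452   # TINV(0.05;29), the 95% t-critical value used in Part I

n = len(heights)
# t = (x-bar - mu) / (sx / sqrt(n)), the t-statistic row of the table
t_stat = (statistics.mean(heights) - mu) / (statistics.stdev(heights) / math.sqrt(n))
reject = abs(t_stat) > t_crit   # two-tailed decision at alpha = 0.05
print(f"t = {t_stat:.3f}; reject H0: {reject}")
```

A t-statistic well inside ±2.0452 is the hypothesis-test counterpart of 1.9 lying inside the Part I confidence interval.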
Hypothesis testing for paired data samples
Statistic or Parameter | Symbol | Equations | OpenOffice
p-value for the difference of the means from two samples of paired data | p-value | | =TTEST(data_range_x;data_range_y;2;1)
Hypothesis testing and confidence intervals for two independent samples
Statistic or Parameter | Symbol | Equations | OpenOffice
Degrees of freedom (approx.) | df | [smaller sample n] − 1 | =COUNT(smaller sample)-1
t-critical for a two-tailed test | tc | | =TINV(α;df)
Standard error SE for two independent samples | SE | $SE = \sqrt{s_x^2/n_x + s_y^2/n_y}$ | =SQRT((sx^2/nx)+(sy^2/ny))
Margin of error E for two independent samples from tcritical and the standard error SE | E | $E = t_c \cdot SE$ | =tc*SE
Difference between two sample means | x̄d | $\bar{x}_d = \bar{x} - \bar{y}$ | =AVERAGE(data set x)-AVERAGE(data set y)
Confidence interval for a population mean difference μd from a sample mean difference x̄d and a margin of error E | | $\bar{x}_d - E < \mu_d < \bar{x}_d + E$ |
p-value for the difference of the means for two independent samples (data unpaired, independent) where the population standard deviations are unknown | p-value | | =TTEST(x data;y data;2;3)
Linear regression statistics
Statistic or Parameter | Symbol | Equations | OpenOffice
Slope | b | | =SLOPE(y data; x data)
Intercept | a | | =INTERCEPT(y data; x data)
Correlation | r | | =CORREL(y data; x data)
Coefficient of determination | r² | | =(CORREL(y data; x data))^2

Z-scores diagram