Stats - Large Data Sets
What data does Pearson’s large data set contain?
Information from weather stations.
When did the weather stations in Pearson’s large data set record data?
- May-Oct 1987
- May-Oct 2015
What does tr/trace mean in relation to total rainfall?
Less than 0.05mm rain.
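Trace entries need converting before the rainfall column can be summarised numerically. A minimal sketch, assuming the raw values arrive as strings and that treating trace as 0 is acceptable for the calculation at hand. The function name `parse_rainfall` and the sample readings are hypothetical.

```python
def parse_rainfall(raw):
    """Convert a raw daily rainfall entry to millimetres.

    Assumption: 'tr' (trace, under 0.05 mm) is treated as 0.0.
    Returns None for entries that cannot be read as a number.
    """
    text = raw.strip().lower()
    if text in ("tr", "trace"):
        return 0.0
    try:
        return float(text)
    except ValueError:
        return None


# Made-up example entries, not taken from the data set.
readings = ["0.4", "tr", "12.2", "n/a"]
print([parse_rainfall(r) for r in readings])
```

Whether trace should count as 0 or as some small positive value is a modelling choice, so it is worth stating which one was used.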
To what accuracy is total sunshine recorded?
The nearest tenth of an hour.
What are the two different units for daily mean windspeed?
$\text{knots}/\text{kn}$ or the Beaufort scale.
What are the units for humidity?
A percentage of water-air saturation.
What does 100% represent in terms of humidity?
The maximum amount of water saturation in the air.
What is an Okta?
A unit describing how much of the sky is covered by cloud.
One Okta represents what fraction of the sky being obscured by cloud?
One eighth.
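Since an Okta is an eighth of the sky, converting a reading to a fraction is a single division. A small sketch, assuming readings are whole numbers from 0 to 8. The function name `okta_to_fraction` is hypothetical.

```python
from fractions import Fraction


def okta_to_fraction(oktas):
    """Return the fraction of sky covered for a whole-number Okta reading."""
    # Assumption: valid readings are the integers 0 (clear) to 8 (overcast).
    if not isinstance(oktas, int) or not 0 <= oktas <= 8:
        raise ValueError("Okta reading must be a whole number from 0 to 8")
    return Fraction(oktas, 8)


print(okta_to_fraction(1))
print(okta_to_fraction(4))
```

Only whole-number readings are accepted here, which matches cloud cover being treated as a discrete variable.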
How is mean visibility measured?
As the greatest horizontal distance at which an object can be seen in daylight.
What are the two ways wind direction is measured?
- A cardinal direction
- A bearing
What two data points about wind direction are recorded (not units)?
- Max gust direction
- Average wind direction
2022-05-04
What shouldn’t you say as a reason why a simple random sample might not give the desired number of samples?
Don’t say it’s because there might be repeats.
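One way to see why repeats are not the issue: a simple random sample is drawn without replacement, so no day can be picked twice. A minimal sketch using Python's `random.sample`. The day numbering and the set of days with no reading are made up for illustration, on the assumption (a commonly given reason, not stated in these notes) that some selected days may have no value available.

```python
import random

# Hypothetical setup: 184 days (May to October), numbered 1 to 184.
rng = random.Random(0)
days = list(range(1, 185))

# Assumed days with no recorded value (made up for this sketch).
missing = {10, 57, 120}

# random.sample draws without replacement, so every chosen day is distinct.
chosen = rng.sample(days, 30)
assert len(set(chosen)) == len(chosen)

# Days that actually have a value to use.
usable = [d for d in chosen if d not in missing]
print(len(chosen), len(usable))
```

The sample always contains 30 distinct days. Any shortfall comes from the filtering step, not from repeats.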
If a small sample of the large data set seems to agree with a hypothesis, should you jump to the conclusion that it’s good evidence for that hypothesis being true?
No, the sample size might be too small.
2022-05-08
What is a mnemonic to remember the weather stations in the large data set?
Look! LLarge HHadron Collider, Bussin’ Job Piranha!
Is cloud cover (in Oktas) a discrete or a continuous variable?
Discrete.
Using the mnemonic, can you list all weather stations in the large data set?
- Leuchars
- Leeming
- Heathrow
- Hurn
- Camborne
- Beijing
- Jacksonville
- Perth
2022-05-16
What is a bad reason for stating you can’t model a variable in the large data set with a normal distribution?
Saying that it is discrete.
Why can’t you model the wind speed (using the Beaufort scale) with a normal distribution for the large data set?
Because it is non-numeric (not because it is discrete!!)
Why can’t you model the daily rainfall with a normal distribution for the large data set?
Because the distribution is not bell-shaped.
What’s another reason apart from values being non-numeric that a variable can’t be modelled using a normal distribution?
The actual distribution that describes it might not be symmetric.
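A quick, rough check for symmetry is to compare the mean with the median: in a symmetric distribution they sit close together. A minimal sketch using the standard `statistics` module. The rainfall values are made up for illustration and are not taken from the large data set, and `mean_median_gap` is a hypothetical helper name.

```python
import statistics


def mean_median_gap(values):
    """Return mean minus median. A large positive gap hints at positive skew."""
    return statistics.mean(values) - statistics.median(values)


# Hypothetical daily rainfall readings in mm: many dry days, a few wet ones.
rainfall = [0.0, 0.0, 0.0, 0.2, 0.4, 1.0, 3.6, 12.8]

gap = mean_median_gap(rainfall)
print(gap > 0)
```

A mean well above the median suggests a long right tail, which is one sign that a bell-shaped model would fit poorly. This is only a heuristic, so a histogram is still worth looking at.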
2022-05-17
What fairly obvious thing do you get less of the sunnier it is?
Rainfall!