Category Archives: Deliverable

Week 7 Lab 8

For this deliverable, you should draw a curve by hand.
Look at this lab’s videos to understand how to carry out the exercises.

Due before next class, on Wednesday 10/14

This deliverable comes from the OpenIntro Statistics book, p. 142.

Respond to the questions by replying to this post.

1. What percent of a standard normal distribution N(μ = 0, σ = 1) is found in each region? Hint: be sure to draw a graph and use the distribution tables on the Resources page.
(a) Z score below −1.35

(b) Z score above 1.48

(c) Between Z score of −0.4 and Z score of 1.5

(d) Z score below −2 or Z score above 2
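
As a way to check table lookups, the four kinds of region in this question can be sketched in Python. This is a minimal sketch that assumes Python 3.8+ and its standard `statistics.NormalDist` class; the helper names are hypothetical, and the Z values are illustrative only, not the ones from the exercise.

```python
from statistics import NormalDist

# Standard normal distribution N(mu = 0, sigma = 1)
STD_NORMAL = NormalDist(mu=0, sigma=1)


def percent_below(z):
    """Percent of the distribution to the left of z."""
    return 100 * STD_NORMAL.cdf(z)


def percent_above(z):
    """Percent of the distribution to the right of z."""
    return 100 * (1 - STD_NORMAL.cdf(z))


def percent_between(z_low, z_high):
    """Percent of the distribution between two Z scores."""
    return 100 * (STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low))


def percent_outside(z):
    """Percent in both tails: below -|z| or above +|z|."""
    return 100 * 2 * STD_NORMAL.cdf(-abs(z))


# Illustrative Z values (hypothetical, not the exercise values)
print(round(percent_below(-1.0), 2))         # about 15.87
print(round(percent_above(1.0), 2))          # about 15.87
print(round(percent_between(-1.0, 1.0), 2))  # about 68.27
print(round(percent_outside(1.96), 2))       # about 5.0
```

Drawing the curve by hand first makes it easier to pick the right helper: shade the region, then decide whether it is a left tail, a right tail, a middle band, or both tails.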

2. Sophia, who took the Graduate Record Examination (GRE), scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score for the Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score for the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal.
(a) Write down the short-hand for these two normal distributions.
(b) What is Sophia’s Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section?
Draw a standard normal distribution curve and mark these two Z-scores.
(c) What do these Z-scores tell you?
(d) Relative to others, which section did she do better on?
(e) Find her percentile scores for the two exams.
(f) What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?
(g) Explain why simply comparing raw scores from the two sections could lead to an incorrect conclusion as to which section a student did better on.
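
The Z-score and percentile steps in this question can be sketched as follows. This is a minimal sketch assuming a nearly normal distribution and Python's standard `statistics.NormalDist`; the function names and the example test (mean 100, standard deviation 15, score 110) are hypothetical and are not Sophia's numbers.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile(x, mean, sd):
    """Percent of the distribution at or below x, assuming N(mean, sd)."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(x)


# Hypothetical test: mean 100, standard deviation 15, observed score 110
z = z_score(110, mean=100, sd=15)
pct = percentile(110, mean=100, sd=15)

print(round(z, 2))          # Z score of the observed value
print(round(pct, 1))        # percentile: percent who scored at or below 110
print(round(100 - pct, 1))  # percent who scored higher than 110
```

Because Z scores are unit-free, two scores from tests with different means and standard deviations can be compared directly once both are converted.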

Week 12 Lab 13

Deliverable for 2-Sample Hypothesis Tests

For this lab, you will be using the formulas for 2-sample Means and Proportions, and the Z score tables for one and two tails. The formulas and the table are listed at the end of this page.

Respond to all of the following questions.

Q1. A representative survey of New York City reports on attitudes toward premarital sex by gender. Female and male respondents reported different rates of disapproval, and we want to know if those different disapproval rates are statistically significant.

Carry out a 2-sample hypothesis test for proportions to determine whether the female disapproval rate of 40% (N=450) is statistically different from the male disapproval rate of 35% (N=417). Use an alpha level of 0.05.

Note: Your answer should include the null hypothesis and what the result means.

Q2. What would your answer be if the alpha level were 0.10?

Q3. A scale measuring satisfaction with family life has been administered to a random sample of married respondents. The sample is divided into respondents with no children (N=120) and with children (N=98). Those with no children reported an average satisfaction score of 14.5, with a standard deviation of 0.6. Those with children reported an average satisfaction score of 12.6, with a standard deviation of 0.5.

While it looks like there is a difference, is this difference statistically significant? Use an alpha level of 0.10.

Note: Your answer should include the null hypothesis and what the result means.

Q4. What would your answer be if the alpha level were 0.01?


Formula for 2-sample hypothesis test for Means

Formula for 2-sample hypothesis test for Proportions

Alpha level Z scores for One or Two tailed tests
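
A minimal sketch of both 2-sample tests follows. It assumes the common large-sample Z forms: a pooled proportion for the proportions test and an unpooled standard error for the means test. The formulas used in this course may differ in detail, so treat this as a cross-check rather than the reference. All function names and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_proportions(p1, n1, p2, n2):
    """Z statistic for H0: p1 == p2, using a pooled proportion (assumed form)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sample_z_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for H0: mean1 == mean2, using an unpooled standard error (assumed form)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def critical_z(alpha, tails=2):
    """Critical Z value for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    return NormalDist().inv_cdf(1 - alpha / tails)


def reject_null_two_tailed(z, alpha):
    """True when |z| exceeds the two-tailed critical value."""
    return abs(z) > critical_z(alpha, tails=2)


# Hypothetical samples (not the values from the questions)
z_prop = two_sample_z_proportions(p1=0.30, n1=200, p2=0.20, n2=200)
z_mean = two_sample_z_means(mean1=10, sd1=2, n1=100, mean2=9, sd2=2, n2=100)

print(round(z_prop, 3), reject_null_two_tailed(z_prop, 0.05))
print(round(z_prop, 3), reject_null_two_tailed(z_prop, 0.01))
print(round(z_mean, 3), reject_null_two_tailed(z_mean, 0.01))
```

In this made-up example the same proportions result is significant at alpha 0.05 but not at 0.01, which is the kind of contrast Q2 and Q4 ask about.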



Week 11 Lab 12

One sample Hypothesis testing for Means and Proportions

For this lab we need to have the following inputs to work with.

Formula for one-sample hypothesis testing for means

Formula for one-sample hypothesis testing for proportions

Alpha level Z scores for One or Two tailed tests
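
A minimal sketch of the two one-sample tests follows, assuming the usual large-sample Z forms: the sample standard deviation divided by the square root of n for means, and the null-hypothesis proportion in the standard error for proportions. Some textbooks use slightly different denominators, so the course formulas take precedence. The function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_mean(sample_mean, pop_mean, sd, n):
    """Z statistic for H0: the population mean equals pop_mean."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))


def one_sample_z_proportion(p_hat, p0, n):
    """Z statistic for H0: the population proportion equals p0."""
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


def critical_z(alpha, tails=2):
    """Critical Z value for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    return NormalDist().inv_cdf(1 - alpha / tails)


def two_tailed_p_value(z):
    """Probability of a Z at least this extreme in either direction under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical samples (not the values from the questions)
z_mean = one_sample_z_mean(sample_mean=52, pop_mean=50, sd=10, n=100)
z_prop = one_sample_z_proportion(p_hat=0.55, p0=0.50, n=400)

print(round(z_mean, 2), round(z_prop, 2))
print(round(critical_z(0.05), 3))            # two-tailed critical value
print(round(two_tailed_p_value(z_prop), 4))  # compare with alpha
```

The decision rule is the same either way: reject the null hypothesis when the absolute Z statistic exceeds the critical value, or equivalently when the p-value is below alpha.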

Respond to all of the following questions.

Q 1. For each of the following situations, state whether the parameter of interest is a mean or a proportion. It may be helpful to examine whether individual responses are numerical or categorical.


(a) In a survey, one hundred college students are asked how many hours per week they spend on the Internet.
(b) In a survey, one hundred college students are asked: “What percentage of the time you spend on the Internet is part of your course work?”
(c) In a survey, one hundred college students are asked whether or not they cited information from Wikipedia in their papers.
(d) In a survey, one hundred college students are asked what percentage of their total weekly spending is on alcoholic beverages.
(e) In a sample of one hundred recent college graduates, it is found that 85 percent expect to get a job within one year of their graduation date.

Q 2. For each of the following situations, state whether the parameter of interest is a mean or a proportion.


(a) A poll shows that 64% of Americans personally worry a great deal about federal spending and the budget deficit.
(b) A survey reports that local TV news has shown a 17% increase in revenue within a two year period while newspaper revenues decreased by 6.4% during this time period.
(c) In a survey, high school and college students are asked whether or not they use geolocation services on their smart phones.
(d) In a survey, smart phone users are asked whether or not they use a web-based taxi service.
(e) In a survey, smart phone users are asked how many times they used a web-based taxi service over the last year.

Q. 3 A Pew Research sample shows that 37% of American adults support increased usage of coal. We want to know if this proportion is different from census information showing that 50% of the population supports increased use of coal. Is the difference statistically significant at an alpha level of 0.05?

Q. 4 State the null and the research hypothesis

The state of Wisconsin would like to understand the fraction of its adult residents that consumed alcohol in the last year, specifically if the rate is different from the national rate of 70%. To help them answer this question, they conduct a random sample of 852 residents and ask them about their alcohol consumption. How would you state a null and a research hypothesis to test whether the rate in Wisconsin is different from the national rate?

Q. 5 A simple random sample of 1028 US adults in March 2013 shows that 56% support nuclear arms reduction. Previous information from population data showed that only half of the population supported arms reduction. Is there evidence that the difference between the proportions is significant at a 5% significance level?

Also, does the data indicate that a majority of Americans supported nuclear arms reduction at the moment of the survey?

Week 9 Lab 10

Confidence intervals

Due by Mon, Oct 26

For this deliverable, you will be using the formula for calculating Confidence Intervals for sample proportions.

Remember that this formula includes three elements:

  1. The sample proportion.
  2. The Z value that corresponds to the alpha value or confidence level.
  3. The standard error, that is, the standard deviation of the sampling distribution.
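
The three elements above combine as: sample proportion ± Z value × standard error. A minimal sketch, assuming the usual large-sample standard error sqrt(p(1 − p)/n) and Python's standard `statistics.NormalDist` for the Z value; the function name and the example poll (60% of 500 respondents) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Confidence interval for a population proportion (large-sample Z form)."""
    # Z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 60% of 500 respondents
low, high = proportion_confidence_interval(0.60, 500, confidence=0.95)
print(round(low, 4), round(high, 4))
```

When a study reports its standard error directly, that value can replace the `se` line and the rest of the calculation stays the same.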

For the Z values, it is handy to use the following table that includes the values associated with the most common confidence levels used.

Q.1 In New York City on October 23rd, 2014, a doctor who had recently been treating Ebola patients in Guinea went to the hospital with a slight fever and was subsequently diagnosed with Ebola. Soon thereafter, an NBC 4 New York/The Wall Street Journal/Marist Poll found that 82% of New Yorkers favored a “mandatory 21-day quarantine for anyone who has come in contact with an Ebola patient”. This poll included responses of 1,042 New York adults between Oct 26th and 28th, 2014.

A) Knowing the sample proportion, what is the proportion of all New York adults who supported a quarantine for anyone who has come into contact with an Ebola patient? Use a 95% confidence level. Write your answer.

Q.2 In 2013, the Pew Research Foundation reported that “45% of U.S. adults report that they live with one or more chronic conditions”. However, this value was based on a sample, so it may not be a perfect estimate for the population parameter. The study reported a standard error of about 1.2%, and a normal model may reasonably be used in this setting. Create a 95% confidence interval to estimate the proportion of U.S. adults who live with one or more chronic conditions. Interpret what the confidence interval tells us.

Week 8 Lab 8 cont

This deliverable continues with the Z-score exercises introduced in last week’s lab.

This deliverable comes from the OpenIntro Statistics book, p. 142.

Respond to the questions by replying to this post.

1. In triathlons, it is common for racers to be placed into age and gender groups. Friends Leo and Mary both completed the Hermosa Beach Triathlon, where Leo competed in the Men, Ages 30 – 34 group while Mary competed in the Women, Ages 25 – 29 group. Leo completed the race in 1:22:28 (4948 seconds), while Mary completed the race in 1:31:53 (5513 seconds). Obviously Leo finished faster, but they are curious about how they did within their respective groups. Can you help them? Here is some information on the performance of their groups:
• The finishing times of the Men, Ages 30 – 34 group have a mean of 4313 seconds with a standard deviation of 583 seconds.
• The finishing times of the Women, Ages 25 – 29 group have a mean of 5261 seconds with a standard deviation of 807 seconds.
• The distributions of finishing times for both groups are approximately Normal.
Remember: a better performance corresponds to a faster finish.

a) Write down the short-hand for these two normal distributions.
b) What are the Z-scores for Leo’s and Mary’s finishing times? What do these Z-scores tell you?
c) Did Leo or Mary rank better in their respective groups? Explain your reasoning.
d) What percent of the triathletes did Leo finish faster than in his group?
e) What percent of the triathletes did Mary finish faster than in her group?
f) If the distributions of finishing times are not nearly normal, would your answers to parts (b) – (e) change? Explain your reasoning.

2. The average daily high temperature in June in LA is 77 °F with a standard deviation of 5 °F. Suppose that the temperatures in June closely follow a normal distribution.


(a) What is the probability of observing an 83 °F temperature or higher in LA during a randomly chosen day in June?
(b) How cool are the coldest 10% of the days (days with lowest average high temperature) during June in LA?
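
Both questions in this lab rely on tail areas and cutoffs of a normal distribution with a given mean and standard deviation. A minimal sketch, assuming Python's standard `statistics.NormalDist`; the function names and the example distribution (mean 50, standard deviation 10) are hypothetical and are not the triathlon or temperature values.

```python
from statistics import NormalDist


def prob_at_or_above(x, mean, sd):
    """Upper-tail probability: share of values at or above x under N(mean, sd)."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(x)


def cutoff_for_lowest(fraction, mean, sd):
    """Value below which the given fraction of the distribution falls."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(fraction)


# Hypothetical distribution: mean 50, standard deviation 10
print(round(prob_at_or_above(60, 50, 10), 4))     # share at or above 60
print(round(cutoff_for_lowest(0.25, 50, 10), 2))  # cutoff for the lowest 25%
```

For finishing times, where a lower value is better, the share of racers someone finished faster than is the upper tail above their time, so `prob_at_or_above` is the relevant helper there.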

Week 6 Lab 7

For this deliverable, you will work in Excel, but you should post your answers here.

Due before class on Monday 10/5 (the class following next class)

Note: As with the previous deliverable, you are expected to work in Excel to answer all the questions, but you do not need to upload your Excel document. You should post all answers as a response to this post.

Instructions

I. Download the database named “Lab_6_the-counted-2016_simplified.csv”. You have already used this simplified version of the database “The Counted”. Next, you should open and compute the data in Excel, but answer the questions in this post.

II. Download the instructions and follow them in order to complete the computation. They are called “Lab7 Measures of dispersion.docx”.

III. After computing all requested tasks in Excel, please answer the questions below.

Questions (answer all of them and consult the Lab 7 video to complete the answers)

1. Are there many modes for the age variable? Which ones are they/is it? What does it mean?

2. Is the age variable skewed? If so, how and why?

3. What are the minimum, maximum and range values of age?

4. What is the interquartile range? What does this mean? Hint: watch the video to inform your explanation.

5. What is the standard deviation of the age variable? How do you interpret the standard deviation?
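
The lab itself is done in Excel, but the same measures of dispersion can be cross-checked outside Excel. A minimal sketch using Python's standard `statistics` module; the `ages` list is a small hypothetical sample, not the data from "The Counted". The "inclusive" quantile method is used because it interpolates the same way as Excel's PERCENTILE and QUARTILE functions.

```python
from statistics import multimode, quantiles, stdev

# Hypothetical ages (not the lab data set)
ages = [22, 25, 25, 31, 34, 40, 47, 58]

# All modes, in case several values tie for most frequent
modes = multimode(ages)

# Minimum, maximum, and range
age_min, age_max = min(ages), max(ages)
age_range = age_max - age_min

# Quartiles and interquartile range (middle 50% of the data)
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# Sample standard deviation (divides by n - 1)
sd = stdev(ages)

print(modes, age_min, age_max, age_range)
print(q1, q3, iqr)
print(round(sd, 2))
```

A list with more than one entry in `modes` means the variable is multimodal; a single entry means one age occurs most often.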


Week 5 Lab 6 Deliverable

For this deliverable, you will use the skills you have learned in Excel to process data.

Due before next class.

Note: For this deliverable, you are expected to work in Excel to answer all the questions, but you do not need to upload your Excel document.

In this lab, you will use Excel to calculate measures of central tendency using The Guardian’s data set that compiles police killings in the U.S. in 2016. You will compute the mode, median, mean, and percentiles using Excel commands.

Instructions

Download the database named “Lab_6_the-counted-2016_simplified.csv”. Open the data in Excel. It is a simplified version of the data you used in the last lab. Do the computations in Excel, but answer the questions in this post.

Questions (answer all of them)

1. What is the most common age of those killed by law enforcement in this data set? In any blank cell, type =mode(C:C) to get the answer.

2. What is the most common day of the month reported in this data set?

3. What is the typical age of those killed by law enforcement? In any blank cell, type =median(C:C) to get the answer. What does this mean?

4. What is the typical day of the month reported in the data set? What does this mean?

5. What is the average age of those killed by law enforcement? In any blank cell, type =average(C:C) to get the answer.

6. Is the age variable skewed in these data? How do you know? Hint: Today’s lecture video shows how to tell if your data is skewed.

7. What age is the 25th percentile in these data? In any blank cell, type =percentile(C:C, .25) to get the answer. What does this mean?
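
For anyone who wants to cross-check the Excel results, the same measures can be sketched in Python with the standard `statistics` module. The `ages` list below is a small hypothetical sample, not column C of the lab file, and the mean-versus-median comparison is only a rule of thumb for skew.

```python
from statistics import mean, median, mode, quantiles

# Hypothetical ages (stand-in for column C of the spreadsheet)
ages = [22, 25, 25, 31, 34, 40, 47, 58]

most_common = mode(ages)  # like =mode(C:C)
middle = median(ages)     # like =median(C:C)
average = mean(ages)      # like =average(C:C)

# 25th percentile; "inclusive" interpolates like Excel's =percentile(C:C, .25)
p25 = quantiles(ages, n=4, method="inclusive")[0]

# Rule of thumb: a mean above the median suggests right (positive) skew
if average > middle:
    skew_hint = "right-skewed"
elif average < middle:
    skew_hint = "left-skewed"
else:
    skew_hint = "roughly symmetric"

print(most_common, middle, average, p25, skew_hint)
```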

Week 5 Lab 5 Cont

This deliverable is a hands-on task that you should complete in an Excel file and upload

Submit an Excel file as the deliverable

To complete today’s deliverable (due before next class), please download the database below and replicate today’s lecture on how to create graphs in Excel.

The database below is the deliverable from Lab 4.

After creating all charts, rename, save and upload your file to the upload page.

At the end of the deliverable, you should have an Excel file with the following charts:

  • One Bar chart of Country of Birth
  • One Pie chart of Gender
  • One Column chart of Major
  • One Pie chart of Liking stats or not
  • Three column charts of Liking stats by gender (one clustered, one stacked, one 100% stacked)
  • One Scatterplot of the Age variable

In total, the deliverable Excel file will have 8 charts.


Week 4 Lab 5 Deliverable

This deliverable is a hands-on task that you should complete in an Excel file and upload

Due before next class

Submit an Excel file as the deliverable

To complete today’s deliverable (due before next class), please download the document below and follow the instructions.

Instructions for the deliverable are here:

You should complete the whole exercise in an Excel file.

After you complete the required tasks, save your work and name it with your name and the name of the lab (Lab#-Yourname.xlsx).

Next, please upload the resulting Excel file through the “Uploads” page.

Where to find the database?

The deliverable uses a database created by the newspaper The Guardian. To download and work on the database, go to the “Resources” page.

You can also download it here:

Important hints!!!

All data is located in the Excel sheet called “in”.

The range of the data: data starts in row 2 and ends in row 1094.

More information about the database

The database was created by a research project called “The Counted” by The Guardian. You can visit the project page in this link:

https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/the-counted-police-killings-us-database

For more details about the database, visit this link:

https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/about-the-counted


Week 3 Lab 3 Deliverable

Each group should answer all questions together and submit one survey and one group response.

Include a header with the names of all group members.

Due before next class

Creating a Survey on Intimate Partner Violence

In this lab, you will design a survey and identify the levels of measurement of your variables.

Your research question is:

What is the prevalence of Psychological/Emotional Intimate Partner Violence in Brooklyn?

Assume you will give this survey to a representative sample of household members in Brooklyn.

1. Create a survey using Google Forms that will give you data to answer your research question. Make sure you follow the best practices we have discussed in class.

Your survey should have between 7 and 10 questions, including socio-demographic information. Consult the “Intimate Partner Violence and Sexual Violence Victimization Assessment Instruments for Use in Healthcare Settings”, by the CDC for the questions on emotional or psychological violence (download it from the Resources page)

Send the survey link to [email protected]

2. Identify the unit of analysis of the research and list all your questions here. For each question identify whether it is numerical or categorical, and what response options there are.

Sampling Strategy (who you will give the survey to). Answer all the following questions:

3. What is your study population?

4. Which representative sampling strategy will you use? Why?

5. Discuss in detail how you will draw your sample (your procedure.)


Week 2 Lab 2 Deliverable

Answer all questions in your own words. Be as short and precise as you can.


Due before next class

Questions

  1. The Stanford Open Policing project gathers, analyzes, and releases records from traffic stops by law enforcement agencies across the United States. Their goal is to help researchers, journalists, and policymakers investigate and improve interactions between police and the public. The following is an excerpt from a summary table created based on the data collected as part of this project.

a) What variables were collected on each individual traffic stop in order to create the summary table above?

b) State whether each variable is numerical or categorical. If numerical, state whether it is continuous or discrete. If categorical, state whether it is ordinal or not.

c) Suppose we wanted to evaluate whether vehicle search rates are different for drivers of different races. In this analysis, which variable would be the response variable and which variable would be the explanatory variable?


2. A study is designed to test the effect of light level on exam performance of students. The researcher believes that light levels might have different effects on males and females, so wants to make sure both are equally represented in each treatment. The treatments are fluorescent overhead lighting, yellow overhead lighting, no overhead lighting (only desk lamps).

a) What is the response variable?

b) What is the explanatory variable? What are its levels?


3. What type of variable is telephone area code? Choose only one answer.

a) Numerical, continuous

b) Numerical, discrete

c) Categorical

d) Categorical, ordinal