This assignment covers material in chapters 16–18 of the Unit Notes.  All statistical analysis and graphs must be done using SPSS.  Data for Q1 and Q2 are given in each question and must be typed into SPSS. The data file for Q3 is available on the Data Files page on LMS.  Relevant SPSS output must be included with your assignment. Total marks = 54.
1. [12 marks] A single gene controls two human physical characteristics: the ability to roll one’s tongue (or not) and whether one’s ear lobes are free of (or attached to) the neck.  Genetic theory says that people will have neither, one, or both of these traits in the ratios 1:3:3:9.  A class of Biology students collected data on themselves and reported the following frequencies:

    Tongue, Earlobe          Count
    Non-curling, Attached    10
    Curling, Attached        22
    Non-curling, Free        31
    Curling, Free            59

Does the distribution among these students appear to be consistent with genetic theory?  Answer by testing an appropriate hypothesis at a 5% significance level.
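The expected counts and chi-square statistic that SPSS reports for Question 1 can be cross-checked by hand. The following is a minimal sketch in Python, not a substitute for the required SPSS output. It assumes that the ratios 1:3:3:9 apply to the four categories in the order they are listed in the table, and it takes the 5% critical value for 3 degrees of freedom (7.815) from standard chi-square tables.

```python
# Observed counts from the Question 1 table, in the order listed there.
observed = {
    "Non-curling, Attached": 10,
    "Curling, Attached": 22,
    "Non-curling, Free": 31,
    "Curling, Free": 59,
}

# Assumption: the theoretical ratios 1:3:3:9 map onto the categories
# in the same order as the table.
ratio = [1, 3, 3, 9]

total = sum(observed.values())

# Expected count for each category under the genetic theory.
expected = [total * r / sum(ratio) for r in ratio]

# Pearson chi-square goodness-of-fit statistic.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed.values(), expected))

# Degrees of freedom = number of categories - 1.
df = len(ratio) - 1

# Upper 5% point of the chi-square distribution with 3 df (standard tables).
CRITICAL_5PCT_DF3 = 7.815

for name, e in zip(observed, expected):
    print(f"{name:24s} observed={observed[name]:3d} expected={e:.3f}")
print(f"chi-square = {chi_sq:.3f} on {df} df; 5% critical value = {CRITICAL_5PCT_DF3}")
```

All expected counts should be checked against the usual minimum-count condition before the statistic is interpreted.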
2. [15 marks] Hepatitis C is a blood-borne infection with potentially serious consequences. Identification of social and environmental risk factors is important because Hepatitis C can go undetected for years after infection. A study conducted in Texas in 1991-2 examined whether the incidence of hepatitis C was related to whether people had tattoos and where they obtained their tattoos.  Data were obtained from existing medical records of patients who were being treated for conditions that were not blood-related disorders.  The patients were classified according to hepatitis C status (whether they had it or not) and tattoo status (tattoo from tattoo parlour, tattoo obtained elsewhere, or no tattoo). The data are summarised in the following table.

    Tattoo?               Has Hep C    No Hep C
    Tattoo (parlour)      17           35
    Tattoo (elsewhere)    8            53
    No tattoo             22           491

(a) In any association between hepatitis C status and tattoo status, which variable would be the explanatory variable? Justify your answer. [2]
(b) If a simple random sample is not available, a sample may be treated as if it was randomly selected provided that the sampling process was unbiased with respect to the research question.  On the information provided above, is it reasonable to treat the data as a random sample for the purposes of investigating a possible relation between tattoos and hepatitis C?  Briefly discuss. [2]
(c) Assuming that any concerns about data collection can be resolved, evaluate the evidence that hepatitis C status and tattoo status are related in the relevant population.  If you conclude that there is a relationship, describe it.  Use a 1% significance level. [11]
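For Question 2(c), the expected counts in the SPSS Crosstabs output can be cross-checked with a short calculation. The sketch below is a hedged illustration in Python, not the required SPSS analysis. It computes the expected count for each cell as (row total × column total) / grand total, flags any cell with an expected count below 5, and uses the fact that for 2 degrees of freedom the upper-tail chi-square probability is exactly exp(-x/2). The variable names are illustrative only.

```python
import math

# Observed counts from the Question 2 table: columns are (Has Hep C, No Hep C).
rows = ["Tattoo (parlour)", "Tattoo (elsewhere)", "No tattoo"]
table = [
    [17, 35],
    [8, 53],
    [22, 491],
]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square statistic summed over all six cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(table) - 1) * (len(table[0]) - 1)

# For df = 2 only, the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

# Cells whose expected count falls below the usual threshold of 5.
small_cells = [
    (rows[i], j, round(e, 2))
    for i, exp_row in enumerate(expected)
    for j, e in enumerate(exp_row)
    if e < 5
]

print(f"chi-square = {chi_sq:.2f} on {df} df, p = {p_value:.3g}")
print("cells with expected count < 5:", small_cells)
```

Any cells flagged by the last line are relevant to the assumptions of the test and are worth commenting on in the written answer.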
3. [27 marks] The Framingham Study was planned as a 20-year cohort study of adult health in Framingham, Massachusetts, USA, commencing in 1948.  A total of 1406 adults aged 45 to 62 were selected from regional census data, using systematic sampling.  Fourteen variables were recorded over the period of the study, and the data are available in the file Framingham.xlsx.  For the purposes of this question, you are investigating the relationship, if any, between the following variables:

    AGE     The person’s age last birthday (i.e., whole years only)
    CHOL    Cholesterol level in milligrams per decilitre (mg/dL)

(a) Obtain the following SPSS output, treating AGE as the explanatory variable:
    (i) scatter plot [3]
    (ii) standardised residual plot [2]
    (iii) distribution of residuals [3]
    (iv) regression analysis. [2, total = 10]
(b) What statistical model is assumed in the regression analysis? [2]
(c) As far as possible, assess the extent to which the model is appropriate to the data. [4]
(d) Using a suitable hypothesis test at a 5% significance level, evaluate the evidence that average Cholesterol levels are related to Age in the underlying population.  If you find that there is a relationship, describe it. [5]
(e) Provide a 90% prediction interval for the cholesterol level that can be expected in a 52-year-old person from this region. [4]
(f) Briefly discuss whether a linear relationship with Age provides a useful basis for predicting Cholesterol levels in this region, for adults within the age range of the data.

This assignment is supposed to be done in SPSS. References are not required, as it is all calculations. The data for Questions 1 and 2 are given in the questions; for Question 3, a separate data file has been uploaded.

