Python data analysis project

Introduction

Use five-fold cross-validation to separate the data into training and testing sets.
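
To make the five-fold split concrete, here is a minimal sketch using only the standard library. The helper name five_fold_splits and the row count of 100 are placeholders chosen for illustration, not part of the assignment. The rows are shuffled once and cut into five folds, and each fold serves as the testing set exactly once while the other four form the training set.

```python
import random

def five_fold_splits(n_rows, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx

# n_rows=100 is a placeholder; use the number of rows in your own data.
splits = list(five_fold_splits(n_rows=100))
for k, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {k}: {len(train_idx)} training rows, {len(test_idx)} testing rows")
```

If scikit-learn is available, KFold or StratifiedKFold from sklearn.model_selection produce the same kind of split with less code.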

Data Dictionary description

Variable    Definition                  Key
survival    Survival                    0 = No, 1 = Yes
pclass      Ticket class                1 = 1st, 2 = 2nd, 3 = 3rd
sex         Sex
age         Age in years
sibsp       # of siblings/spouses
parch       # of parents/children
fare        Passenger fare
embarked    Port of Embarkation         C = Cherbourg, Q = Queenstown, S = Southampton
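
The keys in the data dictionary can be used to check the coded columns and to attach readable labels. The sketch below assumes pandas is installed and that the column names match the data dictionary exactly (the actual file may use different capitalisation). The four rows are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Made-up rows for illustration only; replace with the real data file.
df = pd.DataFrame({
    "survival": [0, 1, 1, 0],
    "pclass": [3, 1, 2, 3],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 0.75, None],   # fractional age means under 1 year
    "sibsp": [1, 1, 0, 0],
    "parch": [0, 0, 1, 0],
    "fare": [7.25, 71.28, 19.26, 8.05],
    "embarked": ["S", "C", "S", "Q"],
})

# Keys copied from the data dictionary.
PCLASS_LABELS = {1: "1st", 2: "2nd", 3: "3rd"}
EMBARKED_LABELS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Flag any value that falls outside the documented keys.
codes_ok = (
    df["survival"].isin([0, 1]).all()
    and df["pclass"].isin(list(PCLASS_LABELS)).all()
    and df["embarked"].dropna().isin(list(EMBARKED_LABELS)).all()
)
print("coded columns match the data dictionary:", codes_ok)

# Attach readable labels next to the coded values.
df["pclass_label"] = df["pclass"].map(PCLASS_LABELS)
df["embarked_port"] = df["embarked"].map(EMBARKED_LABELS)
print(df[["pclass", "pclass_label", "embarked", "embarked_port"]])

# Count missing values per column before any modelling.
print(df.isna().sum())
```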

Variable Notes

pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower
age: Age is fractional if less than 1. 
sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.
 

Analysis requirement:

1. How is the chance of survival related to gender?
2. How is the chance of survival related to age?
3. How is the chance of survival related to socio-economic status?
5. What are the three most important factors related to the chance of survival?
6. What is the average prediction accuracy?
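
One possible way to approach the questions above is sketched below, assuming pandas, NumPy and scikit-learn are installed. The assignment does not name a model, so the random forest, the age bands, the median fill for missing ages and the helper names survival_rate_by, prepare_features and evaluate are all choices made here for illustration. The demo table is randomly generated with random labels, so the numbers it prints carry no meaning; swap in the real data to get actual answers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# embarked is left out to keep the sketch short.
FEATURES = ["pclass", "sex", "age", "sibsp", "parch", "fare"]

def survival_rate_by(df, column):
    """Share of survivors (mean of the 0/1 column) in each group."""
    return df.groupby(column, observed=True)["survival"].mean()

def prepare_features(df):
    """Numeric features: sex as 0/1, missing ages filled with the median."""
    X = df[FEATURES].copy()
    X["sex"] = (X["sex"] == "female").astype(int)
    X["age"] = X["age"].fillna(X["age"].median())
    return X

def evaluate(df, seed=0):
    """Return mean five-fold accuracy and the three largest importances."""
    X, y = prepare_features(df), df["survival"]
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Refit on all rows only to read off the feature importances.
    importances = pd.Series(model.fit(X, y).feature_importances_, index=FEATURES)
    return scores.mean(), importances.sort_values(ascending=False).head(3)

# Randomly generated placeholder data; NOT the real passenger records.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "pclass": rng.integers(1, 4, n),
    "sex": rng.choice(["male", "female"], n),
    "age": rng.uniform(0.5, 70, n).round(1),
    "sibsp": rng.integers(0, 3, n),
    "parch": rng.integers(0, 3, n),
    "fare": rng.uniform(5, 100, n).round(2),
})
demo["survival"] = rng.integers(0, 2, n)  # random labels

# Questions 1 to 3: survival rate by gender, age band and ticket class.
demo["age_band"] = pd.cut(demo["age"], bins=[0, 12, 18, 40, 60, 120])
for column in ["sex", "age_band", "pclass"]:
    print(survival_rate_by(demo, column), "\n")

# Questions 5 and 6: top three factors and average prediction accuracy.
mean_accuracy, top3 = evaluate(demo)
print(f"mean five-fold accuracy: {mean_accuracy:.3f}")
print(top3)
```

Feature importances from a random forest are only one way to rank factors; coefficients from a logistic regression or permutation importance may give a different order, so it is worth stating which method was used when reporting the top three.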