Titanic_Survival_Analysis

Titanic Case Study:

Analyzing Titanic Survival Variables

A comprehensive analysis of the Titanic disaster, focusing on identifying key factors that influenced passenger survival rates using machine learning and data visualization techniques.

Date: 17/Jun/2024

Summary

Introduction:

Titanic Photo

Title: Analyzing Titanic Survival Variables

Industry_Focus: Maritime Safety and Transportation

Problem Statement: To identify the key factors that influenced passenger survival during the Titanic disaster.

Business Use-Case: Improving safety measures and emergency protocols for maritime travel by understanding the variables that affected survival rates in a historical context. This analysis aims to provide insights that can be applied to modern safety regulations and passenger management.

Goals Metrics:

Deliverables:

Dataset List:

Websites to scrape data_needed:

Data_Preparation

1. Cleaning and Processing Data

The main dataset is the train.csv file, which is used to build the machine learning models.

Input

import pandas as pd

# Load the CSV file using a raw string
file_path = r"train.csv"
data = pd.read_csv(file_path)

# Display the headers of the table
print(data.columns)

Output

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

Some column headers are not self-explanatory, so the data dictionary was consulted for the meaning of each column.
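One way to keep the column meanings next to the code is a plain lookup table. The sketch below assumes train.csv is the standard Kaggle Titanic training file and paraphrases its public data dictionary; the name COLUMN_DESCRIPTIONS is illustrative and not part of the original analysis.

```python
# Column meanings, paraphrased from the public Kaggle Titanic data dictionary.
# Assumption: train.csv is the standard Kaggle Titanic training file.
COLUMN_DESCRIPTIONS = {
    "PassengerId": "Unique row identifier",
    "Survived": "Survival flag (0 = No, 1 = Yes)",
    "Pclass": "Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)",
    "Name": "Passenger name",
    "Sex": "Sex of the passenger",
    "Age": "Age in years",
    "SibSp": "Number of siblings/spouses aboard",
    "Parch": "Number of parents/children aboard",
    "Ticket": "Ticket number",
    "Fare": "Passenger fare",
    "Cabin": "Cabin number",
    "Embarked": "Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)",
}

# Print the dictionary as an aligned two-column listing.
for column, description in COLUMN_DESCRIPTIONS.items():
    print(f"{column:<12} {description}")
```

SibSp and Parch are the two headers most often misread: both are counts of relatives travelling with the passenger, not flags.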

2. Handling Missing Values and Outliers
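A minimal sketch of how this step might look with pandas. The choices here are assumptions for illustration, not the method used in the original analysis: median imputation for Age, most-frequent-value imputation for Embarked, a HasCabin indicator in place of the sparse Cabin column, and a 1.5 x IQR rule to flag high Fare values. The small sample frame uses invented values so the example runs without train.csv.

```python
import pandas as pd

# Tiny stand-in sample with the same column names as train.csv (values invented).
sample = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0, None, 54.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05, 512.33],
    "Cabin": [None, "C85", None, "C123", None, "E46"],
    "Embarked": ["S", "C", "S", None, "S", "S"],
})


def handle_missing_and_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute missing values and flag Fare outliers."""
    out = df.copy()

    # Inspect how many values are missing per column before imputing.
    print(out.isna().sum())

    # Age: fill gaps with the median, which is robust to a skewed distribution.
    out["Age"] = out["Age"].fillna(out["Age"].median())

    # Embarked: fill gaps with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])

    # Cabin: keep only whether a cabin was recorded (1) or not (0).
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # Fare: flag values above Q3 + 1.5 * IQR rather than dropping rows.
    q1, q3 = out["Fare"].quantile([0.25, 0.75])
    upper_bound = q3 + 1.5 * (q3 - q1)
    out["FareOutlier"] = out["Fare"] > upper_bound

    return out


cleaned = handle_missing_and_outliers(sample)
print(cleaned)
```

Flagging outliers instead of removing them keeps every passenger in the data, which matters when the high fares themselves may carry information about survival.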

3. Creating derived variables (e.g., family size, fare per person)
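The two derived variables named in the heading could be computed as sketched below. The definitions are assumptions: family size is taken as SibSp + Parch + 1 (the passenger plus relatives aboard), and fare per person divides Fare by that family size, which is only an approximation because a ticket may be shared by people outside the family. The IsAlone flag and the helper name are illustrative additions.

```python
import pandas as pd

# Tiny stand-in sample with the same column names as train.csv (values invented).
sample = pd.DataFrame({
    "SibSp": [1, 0, 3],
    "Parch": [0, 0, 2],
    "Fare": [14.0, 8.05, 30.0],
})


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add family-size and fare-per-person columns."""
    out = df.copy()

    # Family size: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Fare per person: approximate, assuming the fare covers the whole family.
    out["FarePerPerson"] = out["Fare"] / out["FamilySize"]

    # Convenience flag for passengers travelling without relatives.
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)

    return out


features = add_derived_features(sample)
print(features)
```

Because FamilySize is always at least 1, the division cannot hit a zero denominator.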