Exploratory Data Analysis Projects
Top 100 IMDb Movie Analysis
This project explores IMDb movie data through exploratory data analysis (EDA). It involves cleaning, processing, and visualizing key movie attributes such as ratings, genres, and box office performance. The analysis uncovers trends in movie success, factors that influence audience ratings, and broader patterns in the film industry. Built with Python (Pandas, Matplotlib, and Seaborn), the project draws data-driven conclusions about what makes movies popular.
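The cleaning and grouping steps described above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the column names (title, genre, rating, gross), the "$100.0M" gross format, and the toy rows are all assumptions made for the example.

```python
import pandas as pd

# Toy rows with hypothetical column names and values; not the project's data.
raw = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film C"],
    "genre": ["Drama", "Crime, Drama", "Action, Crime", "Action, Crime"],
    "rating": ["9.0", "8.5", "8.0", "8.0"],
    "gross": ["$100.0M", "$50.5M", None, None],
})


def clean_movies(df):
    """Drop duplicate rows and convert text columns to usable types."""
    out = df.drop_duplicates().copy()
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    # Assumed format: gross is a string such as "$100.0M" (millions of USD).
    out["gross_musd"] = pd.to_numeric(
        out["gross"].str.replace(r"[$M,]", "", regex=True), errors="coerce"
    )
    # A movie can list several genres; keep them as a list per row.
    out["genre"] = out["genre"].str.split(", ")
    return out.drop(columns="gross")


def mean_rating_by_genre(df):
    """Average rating per genre, counting a movie once for each of its genres."""
    return (
        df.explode("genre")
        .groupby("genre")["rating"]
        .mean()
        .sort_values(ascending=False)
    )


movies = clean_movies(raw)
genre_ratings = mean_rating_by_genre(movies)
print(genre_ratings)
```

The resulting Series can be passed to a Matplotlib or Seaborn bar plot to compare genres visually.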
Credit Approval/Disapproval Analysis
This credit approval analysis project uses Exploratory Data Analysis (EDA) to identify key factors influencing loan defaults for a consumer finance company. Through comprehensive data cleaning, outlier detection, and feature engineering, I uncovered critical risk indicators including income-credit mismatches, employment status anomalies, and external credit score thresholds. My analysis revealed that applicants with credit amounts exceeding four times their annual income, unemployed individuals, and those with low external credit scores presented the highest default risks. The project delivers actionable insights that enable the company to optimize its loan approval process, minimize financial losses, and maintain a healthy portfolio while still approving creditworthy applicants.
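The income-credit mismatch indicator can be illustrated with a short feature-engineering sketch. The column names (annual_income, credit_amount, defaulted) and the toy values are hypothetical; only the 4x threshold comes from the summary above.

```python
import pandas as pd

# Toy applications with hypothetical column names; not the company's data.
loans = pd.DataFrame({
    "annual_income": [50_000, 40_000, 30_000, 80_000, 25_000, 60_000],
    "credit_amount": [100_000, 200_000, 150_000, 160_000, 90_000, 300_000],
    "defaulted": [0, 1, 1, 0, 1, 0],
})


def flag_income_credit_mismatch(df, threshold=4.0):
    """Add the credit-to-income ratio and flag rows above the threshold."""
    out = df.copy()
    out["credit_income_ratio"] = out["credit_amount"] / out["annual_income"]
    out["high_ratio"] = out["credit_income_ratio"] > threshold
    return out


flagged = flag_income_credit_mismatch(loans)

# Share of defaults among flagged and unflagged applicants.
default_rate_by_flag = flagged.groupby("high_ratio")["defaulted"].mean()
print(default_rate_by_flag)
```

Comparing the default rate between the two groups is one simple way to check whether a candidate risk indicator separates applicants at all before using it in an approval rule.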
Titanic Dataset Analysis
The Titanic Dataset Analysis explores key factors influencing passenger survival using Exploratory Data Analysis (EDA) and prepares the data for machine learning. Through data cleaning, feature engineering, and statistical tests, the project identifies Sex, Pclass, and Embarked as the most significant predictors of survival. Females, first-class passengers, and those who embarked at Cherbourg had higher survival rates, while males in third class had the highest fatality rate. The processed dataset is ready for predictive modeling and provides a solid foundation for classification tasks.
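A comparison of survival rates across Sex, Pclass, and Embarked might be sketched as follows. The eight rows are made up for illustration and assume the usual Survived column (1 = survived); filling missing Embarked values with the most frequent port is one common choice, not necessarily the one used in this project.

```python
import pandas as pd

# Toy passengers for illustration only; not rows from the real dataset.
passengers = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female", "male", "male"],
    "Pclass": [1, 3, 3, 1, 3, 2, 2, 3],
    "Embarked": ["C", "S", "S", "C", "Q", "S", None, "S"],
    "Survived": [1, 0, 1, 1, 0, 1, 0, 0],
})


def prepare(df):
    """Fill missing embarkation ports with the most frequent port."""
    out = df.copy()
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out


def survival_rate(df, by):
    """Mean of Survived per group, i.e. the share of survivors."""
    return df.groupby(by)["Survived"].mean()


clean = prepare(passengers)
rate_by_sex = survival_rate(clean, "Sex")
rate_by_class = survival_rate(clean, "Pclass")
rate_by_sex_class = survival_rate(clean, ["Sex", "Pclass"])
print(rate_by_sex_class)
```

Grouping by two columns at once makes combined effects visible, such as the gap between first-class females and third-class males.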
World Population Analysis
This project explores global population trends using the World Bank's dataset covering 266 countries and regions from 1960 to 2023. The analysis identifies key growth patterns, demographic shifts, and regional variations while assessing the impact of historical events such as pandemics and conflicts. Key insights include India surpassing China as the most populous country, Africa's rapid growth, and Europe's population decline due to aging demographics. Urbanization trends, density variations, and migration effects are also analyzed. The dataset was cleaned, processed, and scaled for better readability, making it a useful resource for population forecasting and policy planning.
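The cleaning, scaling, and growth calculation could be sketched as below. The example assumes a wide layout with one column per year, and the country names, codes, and population figures are invented for illustration. Scaling to millions is one plausible reading of "scaled for better readability".

```python
import pandas as pd

# Toy wide-format table; countries and numbers are invented for illustration.
wide = pd.DataFrame({
    "Country Name": ["Country X", "Country Y"],
    "Country Code": ["XXX", "YYY"],
    "1960": [1_000_000, 4_000_000],
    "1961": [1_100_000, 3_900_000],
    "1962": [1_210_000, None],
})


def to_long(df):
    """Reshape to one row per country and year, dropping missing values."""
    long = df.melt(
        id_vars=["Country Name", "Country Code"],
        var_name="year",
        value_name="population",
    )
    long["year"] = long["year"].astype(int)
    long = long.dropna(subset=["population"])
    # Scale to millions so large counts are easier to read.
    long["population_millions"] = long["population"] / 1e6
    return long.sort_values(["Country Code", "year"]).reset_index(drop=True)


def add_growth(long):
    """Year-over-year growth in percent, computed within each country."""
    out = long.copy()
    out["growth_pct"] = out.groupby("Country Code")["population"].pct_change() * 100
    return out


population = add_growth(to_long(wide))
print(population)
```

The first year of each country has no growth value because there is no earlier year to compare against.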