Car Crash Data Project

October 20, 2024

Why

I’ve noticed that local news outlets frequently report statistics about car crashes in my city, often claiming that "90% of accidents involve taxis." That may be possible, since taxis are heavily used here, but I’m skeptical about how accurate the number is. Claims like this shape public perception, so I want to see whether the data really supports it. That’s why I’m building this project: to gather, analyze, and verify real data and find out whether taxis are truly involved in such a high percentage of accidents.

How

I could simply use an Excel file to store all the data, but that wouldn’t teach me anything new. Instead, I want to challenge myself and learn more about databases and data analysis. Here are the steps I plan to follow:

Phase 1: Database Setup and Basic Analytics 
Create a database in SQL Server
Write a Python script to load data
Create a draft in Power BI
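
As a rough sketch of what the Phase 1 load script could look like, the example below creates a crash table, inserts a few rows, and computes the share of crashes that involve a taxi. Everything in it is an assumption for illustration: the table name `crashes`, the columns, and the sample rows are made up, and it uses Python's built-in `sqlite3` as a stand-in so it runs without a server. The real project targets SQL Server, where the connection and some of the SQL would differ.

```python
import sqlite3

# Made-up sample rows for illustration only: (crash_date, location, involves_taxi)
SAMPLE_CRASHES = [
    ("2024-10-01", "Main St", 1),
    ("2024-10-03", "Harbor Rd", 0),
    ("2024-10-07", "Market Sq", 1),
    ("2024-10-12", "Main St", 0),
]


def create_schema(conn):
    """Create the (hypothetical) crashes table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS crashes (
            id INTEGER PRIMARY KEY,
            crash_date TEXT NOT NULL,
            location TEXT NOT NULL,
            involves_taxi INTEGER NOT NULL CHECK (involves_taxi IN (0, 1))
        )
        """
    )


def load_crashes(conn, rows):
    """Insert crash rows using a parameterized query."""
    conn.executemany(
        "INSERT INTO crashes (crash_date, location, involves_taxi) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def taxi_share(conn):
    """Return the percentage of recorded crashes that involve a taxi."""
    total, taxis = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(involves_taxi), 0) FROM crashes"
    ).fetchone()
    return 100.0 * taxis / total if total else 0.0


# In-memory database as a stand-in for SQL Server
conn = sqlite3.connect(":memory:")
create_schema(conn)
load_crashes(conn, SAMPLE_CRASHES)
print(f"{taxi_share(conn):.1f}% of sample crashes involve a taxi")
```

Keeping `involves_taxi` as an explicit column means the "90%" claim can be checked with a single query, and the same number can later be reproduced in the Power BI draft.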

Phase 2: Advanced Data Population

☐  Incorporate AI for data auto-fill
☐  Migrate the database to Azure
☐  Update the Power BI report

Phase 3: Real-World Applications and Collaboration 

☐  Try to collaborate with local authorities
☐  Publish an anonymized version on Kaggle

Key Consideration

Data Source

For this project, my primary data source will be Facebook.