Readme

The Journey of a Nation: Telling a Story with U.S. Census Data
As a resident of the United States, I often wonder what our nation really looks like in terms of population and geography. To answer some of my questions, I decided to do a project that breaks down recent census data to better understand the demographics of our country.
Overall Summary
For this project I used the 2017 U.S. Census dataset available at [Kaggle.com](https://www.kaggle.com/muonneutrino/us-census-demographic-data). The dataset organizes demographic information by state and by the counties within each state. It provides information about population, ethnicity, income, employment, and modes of transportation to work. A detailed explanation of the dataset's features can be found at the link above.
Some of the skills I practiced during this project were:
- Data Acquisition: since this was my first major project as a data scientist, I spent a lot of time learning which data sources are valid and reliable. I also had to decide which subset of the data best fit my purpose (the original Kaggle repository has datasets broken down by county and by census tract).
- Data Wrangling: although the data for this project was fairly clean and straightforward overall, I still had to use several methods to make it more digestible and easier to understand. Some of these, like breaking the data into regions and divisions, quickly taught me advanced feature engineering (and how to properly use Stack Overflow).
- Data Visualization and Storytelling: I love seeing beautiful pictures tell a story. This project gave me (and continues to give me) many chances to tell the story of data through visualizations. While working on it, I developed a huge love of Seaborn and its ability to make information visually easy to comprehend.
- Statistical Analysis/Descriptive Statistics: sure, visualizations are great, but sometimes we need hard numbers to clarify a point. This project let me apply the theorems and ideas from my college statistics classes to real-world data.
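As a hedged illustration of the descriptive-statistics side, the sketch below summarizes household income by region with pandas. It assumes a county-level table with a `Region` column (derived during feature engineering) and an `Income` column; the helper name `summarize_income` and the sample values are hypothetical and made up for illustration, not figures from the census data.

```python
import pandas as pd


def summarize_income(df: pd.DataFrame, group_col: str = "Region",
                     value_col: str = "Income") -> pd.DataFrame:
    """Return count, mean, median, and standard deviation of value_col per group.

    Hypothetical helper; the default column names are assumptions.
    """
    return (
        df.groupby(group_col)[value_col]
        .agg(["count", "mean", "median", "std"])
        .sort_values("median", ascending=False)  # highest median income first
    )


# Made-up example rows for illustration only (not real census values).
sample = pd.DataFrame({
    "Region": ["South", "South", "West", "West", "Northeast"],
    "Income": [40000, 50000, 60000, 70000, 80000],
})
print(summarize_income(sample))
```

Reporting the median alongside the mean is a deliberate choice here: income distributions tend to be skewed, so the two can tell different stories.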
Prerequisites/Process
Exploratory Data Analysis (EDA)
As I stated previously, this was my first major project in the world of data science, and I learned a great deal about EDA in a very short time. Before looking deeper into what the data told us, I had to do some basic cleaning: null values were normalized, columns were combined, and new columns were generated.
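A minimal sketch of that kind of cleaning pass in pandas. Two things here are assumptions rather than the project's actual code: "normalizing" nulls is interpreted as filling missing numeric values with each column's median, and the combined column joins the `County` and `State` columns into one label. The function name `basic_clean`, the `ChildPoverty` column, and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing numeric values and add a combined "County, State" label.

    Hypothetical helper; returns a new DataFrame and leaves the input untouched.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # Replace missing numeric values with each column's median (one possible choice).
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # Combine two text columns into one readable location label.
    out["Location"] = out["County"] + ", " + out["State"]
    return out


# Made-up rows for illustration only (not real census values).
raw = pd.DataFrame({
    "State": ["State A", "State A", "State B"],
    "County": ["County 1", "County 2", "County 3"],
    "ChildPoverty": [20.0, np.nan, 10.0],
})
clean = basic_clean(raw)
print(clean)
```

The median is used instead of the mean because it is less sensitive to outlier counties; dropping incomplete rows would be a reasonable alternative when only a handful are affected.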
Feature Engineering
The original dataset was massive and difficult to use for analytical purposes. To make the data more "bite-sized," I broke the information down into the geographic regions and divisions defined by the U.S. Census Bureau. I also created several new features to help understand the data better, such as the top 75% of income and population percentages by gender and ethnicity.
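The region and division step can be sketched as a lookup table. The mapping below follows the Census Bureau's four regions and nine divisions for the 50 states plus DC. The function name `add_geo_and_gender_features`, the column names `State`, `TotalPop`, `Men`, and `Women`, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# U.S. Census Bureau divisions -> (region, states), covering the 50 states plus DC.
CENSUS_DIVISIONS = {
    "New England": ("Northeast", ["Connecticut", "Maine", "Massachusetts",
                                  "New Hampshire", "Rhode Island", "Vermont"]),
    "Middle Atlantic": ("Northeast", ["New Jersey", "New York", "Pennsylvania"]),
    "East North Central": ("Midwest", ["Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"]),
    "West North Central": ("Midwest", ["Iowa", "Kansas", "Minnesota", "Missouri",
                                       "Nebraska", "North Dakota", "South Dakota"]),
    "South Atlantic": ("South", ["Delaware", "District of Columbia", "Florida", "Georgia",
                                 "Maryland", "North Carolina", "South Carolina",
                                 "Virginia", "West Virginia"]),
    "East South Central": ("South", ["Alabama", "Kentucky", "Mississippi", "Tennessee"]),
    "West South Central": ("South", ["Arkansas", "Louisiana", "Oklahoma", "Texas"]),
    "Mountain": ("West", ["Arizona", "Colorado", "Idaho", "Montana", "Nevada",
                          "New Mexico", "Utah", "Wyoming"]),
    "Pacific": ("West", ["Alaska", "California", "Hawaii", "Oregon", "Washington"]),
}

# Flatten into state -> division and state -> region lookups.
STATE_TO_DIVISION = {s: div for div, (_, states) in CENSUS_DIVISIONS.items() for s in states}
STATE_TO_REGION = {s: region for region, states in CENSUS_DIVISIONS.values() for s in states}


def add_geo_and_gender_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add Division, Region, and gender-percentage columns (hypothetical helper)."""
    out = df.copy()
    # States outside the 50 + DC (e.g. Puerto Rico) map to NaN.
    out["Division"] = out["State"].map(STATE_TO_DIVISION)
    out["Region"] = out["State"].map(STATE_TO_REGION)
    out["PctMen"] = 100 * out["Men"] / out["TotalPop"]
    out["PctWomen"] = 100 * out["Women"] / out["TotalPop"]
    return out


# Made-up rows for illustration only (not real census values).
counties = pd.DataFrame({
    "State": ["Texas", "Ohio", "Puerto Rico"],
    "TotalPop": [1000, 200, 400],
    "Men": [480, 100, 190],
    "Women": [520, 100, 210],
})
features = add_geo_and_gender_features(counties)
print(features)
```

Once `Region` and `Division` exist, grouping by either one gives a much smaller, easier-to-plot view than thousands of individual counties.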
Testing/Main Purpose of Project
My overall question for this project was: Can we use the U.S. Census data to tell a story of our country?
Through visualizations and analytical questions, I began to tell the story of the U.S. through its demographic data. I have only scratched the surface, and I look forward to expanding this project with data from previous years and advanced visualization tools such as Tableau to tell a more in-depth story of the United States over time.
Final Bits
This is an ongoing project and will be updated as more information is added.