In a recent job interview for an economic analyst position, I was asked to run a statistical analysis and create a presentation on a dataset of my choosing from the Opportunity Insights collection. Unfortunately I wasn’t the right fit for the job, but I wanted to share my findings and the process of working with data in a different way from what I’m used to.
Going into the project I was pretty nervous, since this didn’t really fall within my geography background. This was only made worse by the fact that I was moving across the country during part of the time I had to prepare for the interview. Luckily I was able to plan my time accordingly and, maybe more importantly, get the Wi-Fi set up quickly so I could start working and create a presentation I could be proud of.
Before getting started I talked with some of my colleagues from Miami University to get ideas on what data to investigate, and I ended up settling on a dataset titled: Commuting Zone Life Expectancy Estimates by Gender and Income Quartile. I chose it because I wanted to explore whether there are any links between commuting zone diversity and average life expectancy. I was particularly interested in this dataset because it is broken down by gender and income quartile, allowing for a deeper dive into any correlations in the data.
The first step in examining this potential relationship was to create a robust definition of diversity using the demographics dataset, which had information on four demographic groups: Black, White, Asian, and Hispanic. To measure diversity across commuting zones in a standardized way, I used the Shannon diversity index, which let me convert the four group shares into a single number per zone. There are some flaws in this methodology: four categories are obviously not representative of the true diversity of the country. Still, it worked as a starting point, and further, more representative research could come later if needed. In R I then joined the demographics dataset, with its new Shannon diversity index field, to the commuting zone life expectancy dataset by gender and income quartile. This combined data could then be plotted out as the graphs below. For the sake of conciseness in the interview I decided to focus on female life expectancy in particular, though I did compare it to male life expectancy briefly.
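The index and the join described above can be sketched in a few lines. This is only an illustration in Python, not the R code I actually used: the Shannon index is the standard H = -Σ p·ln(p) over the group shares, while the zone ids, field names, shares, and life expectancy values below are all made-up placeholders rather than numbers from the Opportunity Insights data.

```python
import math

def shannon_index(shares):
    """Shannon diversity index H = -sum(p * ln(p)) over the nonzero shares.

    Shares are normalized first, so raw counts work as well as proportions.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    h = 0.0
    for s in shares:
        p = s / total
        if p > 0:  # a group with zero share contributes nothing
            h -= p * math.log(p)
    return h

# Placeholder demographic shares keyed by a hypothetical commuting zone id.
demographics = {
    100: {"black": 0.25, "white": 0.25, "asian": 0.25, "hispanic": 0.25},
    200: {"black": 0.0, "white": 1.0, "asian": 0.0, "hispanic": 0.0},
}

# Placeholder life expectancy rows: one per zone, gender, and income quartile.
life_expectancy = [
    {"cz": 100, "gender": "F", "quartile": 1, "le": 80.0},
    {"cz": 100, "gender": "F", "quartile": 4, "le": 88.0},
    {"cz": 200, "gender": "F", "quartile": 1, "le": 81.0},
    {"cz": 300, "gender": "F", "quartile": 1, "le": 79.0},  # no demographics row
]

# One diversity value per commuting zone.
diversity = {cz: shannon_index(list(groups.values()))
             for cz, groups in demographics.items()}

# Inner join on the zone id: rows without a demographics match are dropped.
joined = [dict(row, shannon=diversity[row["cz"]])
          for row in life_expectancy
          if row["cz"] in diversity]

for row in joined:
    print(row)
```

With four groups the index ranges from 0 (a single group makes up the whole zone) to ln(4), about 1.386, when all four shares are equal. Because the life expectancy data has one row per zone, gender, and income quartile, each zone's diversity value is repeated across those rows after the join.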
The scatter plot isn’t all that informative on its own, but I liked how well it represented the methodology. In it, each commuting zone’s diversity value is shown four times, once for each income quartile, paired with the life expectancy for that quartile.
The next graph shows the same data as the scatter plot, but smoothed to bring out the trends. Surprisingly, there is a decline in life expectancy associated with increased diversity. Even more surprisingly, this seems to hold true even in the upper income quartile, potentially indicating medical racism.
I also looked at the flip side of the diversity question by plotting life expectancy against the whiteness of an area. Here there is a positive association across all groups. This suggests the decreasing life expectancy associated with diverse areas could be a result of segregation of health services.
Finally, I looked at male life expectancy to compare it against the trends seen in female life expectancy, and this is where I found one of the more interesting results. While a decline in life expectancy appears in the top income quartile of the female graph, it does not appear in the male graph. This could indicate the presence of medical sexism.
One thing to keep in mind when interpreting these results is that the commuting zone dataset covered all commuting zones with populations of 25,000 or more. This could create some misleading trends when comparing smallish, and likely more homogeneous, cities to very large, and likely more diverse, cities. Overall, though, there were some interesting findings buried in the data, and it was fun to try to piece them together. I must say I was intimidated going into this project, but I came out of it with new confidence, realizing that my data science skills don’t just apply to geography but can apply on a broader scale, tackling different issues and questions than I’m used to.