This week’s #MakeoverMonday, Week 46, is Diversity in Tech and covers several key technology companies and their breakdown of employees by gender and ethnicity. Starting this week and moving forward, this #MakeoverMonday Diary will take on a slightly different approach. In doing a couple of time-boxed posts now, it has quickly become clear that the approach of trying to complete the project in a set amount of time, while also taking notes and documenting my steps along the way, hinders my ultimate goal of becoming a better analyst. What’s important to me is that each week I’m learning and growing my analytical skills and also taking the time required to share my learnings with others, who may be looking to either begin building analytical skills of their own or improve upon their current skill set. Let’s get started!
Step 1. Know and Understand the Data
After first looking over the original visualization (above), which I liked quite a bit, I flipped over to data.world to download the data set and become familiar with it. The fields included in the data were Date, Type (of company) and Company (name), as well as nine columns for the percentage of employees who were Female, Male, White, Latino, etc. The Date field contained five values, but I had already determined my focus would be on the latest data only, so I added a data source filter to remove the previous four time periods. Under Type, I was only interested in Tech and Social Media, so I used another data source filter to exclude Entity and Government. I also needed to keep Country for some later calculations. One last filter on Company kept only the Tech and Social Media companies…as well as U.S. Population, which is needed for the calculations we’ll get to.
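For readers who want to see the filter logic outside of Tableau, here is a minimal Python sketch of the same three conditions. The rows are hypothetical: the date labels and the "Example Agency" and "Example Entity" names are placeholders of mine, not values from the data set, and the percentage columns are left out.

```python
# Hypothetical rows; dates and the two "Example" names are placeholders.
rows = [
    {"Date": "2017", "Type": "Tech", "Company": "Nvidia"},
    {"Date": "2016", "Type": "Tech", "Company": "Nvidia"},
    {"Date": "2017", "Type": "Country", "Company": "U.S. Population"},
    {"Date": "2017", "Type": "Government", "Company": "Example Agency"},
    {"Date": "2017", "Type": "Entity", "Company": "Example Entity"},
]

TECH_TYPES = {"Tech", "Social Media"}
BASELINE = "U.S. Population"


def latest_relevant_rows(rows):
    """Keep the latest period only, and only tech/social rows plus the baseline."""
    latest = max(row["Date"] for row in rows)
    return [
        row
        for row in rows
        if row["Date"] == latest
        and (row["Type"] in TECH_TYPES or row["Company"] == BASELINE)
    ]


filtered = latest_relevant_rows(rows)
print([row["Company"] for row in filtered])
```

Filtering up front like this keeps every later calculation working against one time period and one baseline row.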
Step 2. Keep It Simple
Now that I had a good feeling for the data, it was time to think about design. Earlier, I mentioned that I liked the original viz quite a bit. So, in an effort to keep it simple, my approach was to stick with a similar layout, but really emphasize where companies were either overrepresented or underrepresented for a specific gender or ethnicity. In the original viz, I found it a bit inconvenient to always have to go back and reference the very top row (USA Population) to see if a company had more or fewer employees than the US Population for a given gender or ethnicity. This is where those previously mentioned calculations would come in, but first we’ll touch on color.
Step 3. Effective Use of Color
Going back to the original viz, once you looked past the Gender section (to the right), it didn’t make a ton of sense to me why each ethnicity needed its own color. It was more confusing than anything…did the color actually mean anything or was it there just because? So, in my version of the viz, I stuck with the maroon and gold of the Gender section, letting anything in my viz that signaled overrepresentation be colored gold and anything that signaled underrepresentation be colored maroon. This way it would be extremely easy for the user to understand, at a glance, the breakdown across companies. And to make it even easier yet, I added a highlight when hovering on a company name. This action highlights the row you hover over while also adding the value next to each bar. In an attempt to keep the view clean, I went this route as opposed to adding permanent labels on all bars like in the original. Lastly, to avoid the clutter of any sort of color legend, I tied the colors into the title.
Title with color tied throughout the viz
Step 4. Choosing the Right Chart Type
So what would be an effective chart type that could achieve the goal of emphasizing where companies were either overrepresented or underrepresented for a specific gender or ethnicity? Given the two-color approach, I felt an effective way to do this would be to use a diverging bar chart and focus on how each company differed from the US Population. So for each field (Female, Male, etc.) I needed to calculate the difference between the company’s figure and the US Population’s figure. For example, women make up 51% of the US Population and 17% of employees at Nvidia. But to simplify a bit, I took the percentages out of the equation and instead went with counts per 100 people. So, we could say:
- For every 100 people in the US, 51 are female
- For every 100 employees at Nvidia, 17 are female
- 17 minus 51 is negative 34, so:
- At Nvidia, for every 100 employees, there is an underrepresentation of 34 females. And conversely, males would be overrepresented by 34 for every 100 employees.
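The arithmetic above can be sketched in a few lines of Python. The Female figures (51 and 17) come from this post; the Male figures (49 and 83) are my assumption that the two gender columns sum to 100, and the "neutral" label for a gap of zero is also an assumption rather than something in the viz.

```python
# Counts per 100 people. Female values are from the post;
# Male values assume the two gender columns sum to 100.
US_POPULATION = {"Female": 51, "Male": 49}
NVIDIA = {"Female": 17, "Male": 83}


def representation_gap(company, baseline):
    """Company count minus baseline count, per 100 people, for each group."""
    return {group: company[group] - baseline[group] for group in baseline}


def gap_color(gap):
    """Gold for overrepresentation, maroon for underrepresentation."""
    if gap > 0:
        return "gold"
    if gap < 0:
        return "maroon"
    return "neutral"  # assumption: no color rule was stated for an exact match


gaps = representation_gap(NVIDIA, US_POPULATION)
for group, gap in gaps.items():
    print(f"{group}: {gap:+d} per 100 employees ({gap_color(gap)})")
# Female: -34 per 100 employees (maroon)
# Male: +34 per 100 employees (gold)
```

The sign of the gap does double duty: it sets which side of zero the bar diverges to and which of the two colors it gets.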
For reference, I included these figures in my tooltips (see below).
There’s likely a more efficient way of going about the calculations, but since each gender and ethnicity was its own field, I created six calculations, one for each field that would be included in my visualization. And once it came time to move on to the tooltip, several more calculations came into play in order to get the color coding to work. This approach worked here, but if there’s a quicker, easier way of tackling this part of the project and you happen to be reading this, I’m all ears!! So anyway, after going the diverging bar route, here’s what the view started to look like.
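On the question of a quicker way: one possibility, sketched here in Python rather than Tableau, is to reshape the data so each gender or ethnicity becomes a row value instead of its own column. A single calculation can then cover every group. This is only an illustration of the idea, not how the viz was built; the Male figures again assume the gender columns sum to 100, and the helper names are hypothetical.

```python
# Only the two groups with figures quoted in the post are used here.
GROUPS = ["Female", "Male"]

wide_rows = [
    {"Company": "U.S. Population", "Female": 51, "Male": 49},
    {"Company": "Nvidia", "Female": 17, "Male": 83},
]


def to_long(rows, groups):
    """Turn one row per company into one row per company and group."""
    return [
        {"Company": row["Company"], "Group": group, "Per100": row[group]}
        for row in rows
        for group in groups
    ]


def add_gap(long_rows, baseline_name="U.S. Population"):
    """Attach the gap versus the baseline to every non-baseline row."""
    baseline = {
        row["Group"]: row["Per100"]
        for row in long_rows
        if row["Company"] == baseline_name
    }
    return [
        dict(row, Gap=row["Per100"] - baseline[row["Group"]])
        for row in long_rows
        if row["Company"] != baseline_name
    ]


result = add_gap(to_long(wide_rows, GROUPS))
for row in result:
    print(row["Company"], row["Group"], row["Gap"])
```

With the data in this long shape, adding another group means adding a name to the list rather than writing another calculation.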
With the addition of a ‘sort by’ parameter and the highlight action mentioned earlier, I was starting to like how the visualization was coming together. It encouraged exploration, while providing a quick snapshot of the entire picture. It was easy to see, for instance, that Latinos were underrepresented at all companies (in the above image), while Asians were overrepresented at all companies. The user could sort the data in various ways and also had the option of seeing more detail about a particular company if that was of interest, either through the highlight action or through the tooltips.
My final visualization is below and the interactive version can be found here. My hope is that this post and future posts are helpful to those who are early on in their analytical and #dataviz journeys and are looking to either build their skills from the ground up or improve upon their existing skills. If you have any questions or feedback at all, whether it’s something you liked or something you did not like, please don’t hesitate to reach out to me through Twitter at @JtothaVizzo. Thanks for reading and have a great day!