#MakeoverMonday Week 46 Diary

This week’s #MakeoverMonday, Week 46, is Diversity in Tech and covers several key technology companies and their breakdown of employees by gender and ethnicity. Starting this week, the #MakeoverMonday Diary will take a slightly different approach. After a couple of time-boxed posts, it has quickly become clear that trying to complete the project in a set amount of time, while also taking notes and documenting my steps along the way, hinders my ultimate goal of becoming a better analyst. What’s important to me is that each week I’m learning and growing my analytical skills, and also taking the time to share what I learn with others who may be looking to either begin building analytical skills of their own or improve upon their current skill set. Let’s get started!

[Image: original visualization]

Step 1. Know and Understand the Data

After first looking over the original visualization (above), which I liked quite a bit, I flipped over to data.world to download the data set and become familiar with it. The fields included in the data were Date, Type (of company) and Company (name), as well as nine columns for the percentage of employees who were Female, Male, White, Latino, etc. The Date field contained five values, but I had already determined my focus would be on the latest data only, so I added a data source filter to remove the previous four time periods. Under Type, I was only interested in Tech and Social Media, so I used another data source filter to exclude Entity and Government. I also needed to keep Country for some later calculations. One last filter on Company kept only the Tech and Social Media companies…as well as U.S. Population, again needed for those calcs that we’ll get to.
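As a rough sketch outside of Tableau, the same filtering logic could look like this in Python. The field names (Date, Type, Company) come from the data set described above; the sample rows are made up purely to show the shape, and treating "Country" as the Type of the U.S. Population row is an assumption.

```python
from datetime import date

# Types to keep. "Country" is assumed to be the Type of the U.S. Population row.
KEEP_TYPES = {"Tech", "Social Media", "Country"}


def filter_rows(rows, keep_types=KEEP_TYPES):
    """Keep only rows from the latest Date whose Type is one we need."""
    latest = max(row["Date"] for row in rows)
    return [
        row for row in rows
        if row["Date"] == latest and row["Type"] in keep_types
    ]


# Made-up sample rows, only to illustrate the filter.
sample = [
    {"Date": date(2017, 1, 1), "Type": "Tech", "Company": "ExampleCo"},
    {"Date": date(2018, 1, 1), "Type": "Tech", "Company": "ExampleCo"},
    {"Date": date(2018, 1, 1), "Type": "Government", "Company": "ExampleAgency"},
    {"Date": date(2018, 1, 1), "Type": "Country", "Company": "U.S. Population"},
]

kept = filter_rows(sample)
print([row["Company"] for row in kept])
```

Filtering at this stage plays the same role as a data source filter: everything downstream only ever sees the latest period and the relevant types.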

Step 2. Keep It Simple

Now that I had a good feeling for the data, it was time to think about design. Earlier, I mentioned that I liked the original viz quite a bit. So, in an effort to keep it simple, my approach was to stick with a similar layout, but really emphasize where companies were either overrepresented or underrepresented for a specific gender or ethnicity. In the original viz, I found it a bit inconvenient to have to always go back and reference the very top row (USA Population) to see if a company had more or fewer employees than the US Population for a given gender or ethnicity. This is where those previously mentioned calculations would come in, but first we’ll touch on color.

Step 3. Effective Use of Color

Going back to the original viz, once you looked past the Gender section (to the right), it didn’t make a ton of sense to me why each ethnicity needed its own color. It was more confusing than anything…did the color actually mean anything or was it there just because? So, in my version of the viz, I stuck with the maroon and gold of the Gender section, letting anything in my viz that signaled overrepresentation be colored gold and anything that signaled underrepresentation be colored maroon. This way it would be extremely easy for the user to understand, at a glance, the breakdown across companies. And to make it even easier yet, I added a highlight when hovering on a company name. This action highlights the row you hover over while also adding the value next to each bar. In an attempt to keep the view clean, I went this route as opposed to adding permanent labels on all bars like in the original. Lastly, to avoid the clutter of any sort of color legend, I tied the colors into the title.

Title with color tied throughout the viz

Step 4. Choosing the Right Chart Type

So what would be an effective chart type for emphasizing where companies were either overrepresented or underrepresented for a specific gender or ethnicity? Given the two-color approach, I felt an effective way to do this would be to use a diverging bar chart and focus on each company’s difference from the US Population. So for each field (Female, Male, etc.) I needed to calculate the difference between a company’s share of employees and that group’s share of the US Population. For example, women make up 51% of the US Population and 17% of employees at Nvidia. To simplify a bit, I took the percentages out of the equation and instead expressed everything as counts per 100 people. So, we could say:

  • For every 100 people in the US, 51 are female
  • For every 100 employees at Nvidia, 17 are female
  • 17 minus 51 is negative 34, so:
    • At Nvidia, for every 100 employees, there is an underrepresentation of 34 females. And conversely, males would be overrepresented by 34 for every 100 employees.
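The arithmetic in the bullets above can be sketched in a few lines of Python. The function names are hypothetical, the male figures (83 and 49) are simply the complements of the female figures quoted above, and the gold/maroon mapping follows the color choice described in Step 3.

```python
def gap_per_100(company_pct, population_pct):
    """Difference per 100 people; negative means underrepresented."""
    return company_pct - population_pct


def describe(gap):
    """Return a (label, color) pair for a gap, using the viz's two colors."""
    if gap > 0:
        return "overrepresented", "gold"
    if gap < 0:
        return "underrepresented", "maroon"
    return "at parity", None


# Nvidia vs. the US Population, per 100 people.
female_gap = gap_per_100(17, 51)
male_gap = gap_per_100(83, 49)  # complements of the female figures (assumed)

print(female_gap, describe(female_gap))
print(male_gap, describe(male_gap))
```

The sign of the gap is what drives both the direction of the diverging bar and its color.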

For reference, I included these figures in my tooltips (see below).

[Image: tooltip]

There’s likely a more efficient way of going about the calculations, but since each gender and ethnicity was its own field, I created six calculations, one for each field that would be included in my visualization. And once it came time to move on to the tooltip, several more calculations came into play in order to get the color coding to work. This approach worked here, but if there’s a quicker, easier way of tackling this part of the project and you happen to be reading this, I’m all ears!! So anyway, after going the diverging bar route, here’s what the view started to look like.
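On the question of a quicker way: one common option is to reshape the data so that each gender or ethnicity becomes a row rather than a column, which lets a single calculation cover every group. In Tableau this kind of reshaping is usually done with a pivot on the data source. The Python sketch below only illustrates the idea and is not the method used in the viz; the function name is hypothetical, and the male figures are assumed to be the complements of the female ones quoted earlier.

```python
def gaps_long(rows, fields, baseline_name="U.S. Population"):
    """Reshape wide rows into one (Company, Group, Gap) record per field."""
    baseline = next(r for r in rows if r["Company"] == baseline_name)
    records = []
    for row in rows:
        if row["Company"] == baseline_name:
            continue  # the baseline is compared against, not plotted
        for field in fields:
            records.append({
                "Company": row["Company"],
                "Group": field,
                "Gap": row[field] - baseline[field],
            })
    return records


# Female figures are from the Nvidia example; male figures are complements.
wide = [
    {"Company": "U.S. Population", "Female": 51, "Male": 49},
    {"Company": "Nvidia", "Female": 17, "Male": 83},
]

for record in gaps_long(wide, fields=["Female", "Male"]):
    print(record)
```

With the data in this long shape, one gap calculation and one color rule apply to every group, instead of one calculation per field.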

[Image: the diverging bar view in progress]

With the addition of a ‘sort by’ parameter and the highlight action mentioned earlier, I was starting to like how the visualization was coming together. It encouraged exploration, while providing a quick snapshot of the entire picture. It was easy to see, for instance, that Latinos were underrepresented at all companies (in the above image), while Asians were overrepresented at all companies. The user could sort the data various ways and also had the option of seeing more detail about a particular company if that was of interest; either through the highlight action or through the tooltips.

My final visualization is below and the interactive version can be found here. My hope is that this post and future posts are helpful to those who are early on in their analytical and #dataviz journeys and are looking to either build their skills from the ground up or improve upon their existing skills. If you have any questions or feedback at all, whether it’s something you liked or something you did not like, please don’t hesitate to reach out to me through Twitter at @JtothaVizzo. Thanks for reading and have a great day!

[Image: final Week 46 visualization]

 

 

 


#MakeoverMonday Week 45 Diary

My second #MakeoverMonday Diary looks at an Aging America, as the U.S. Census Bureau projects that for the first time in U.S. history, adults aged 65+ will outnumber children aged under 18 by the year 2034. I felt like the original viz was straightforward and did a nice job of showing the anticipated shift. However, I wanted more detail than just total adults and total children, so my goal was to include something that resembled the top part of the original viz, while also adding more detail below. Let’s get started…

[Image: original visualization]

9:11am – My first step was to pivot the data from its original format, which would leave me with one column for age and a second for the population of each age. However, this would also leave me with duplicate data, so it was important to then filter the data appropriately for my analysis. After digging through the data a bit, I decided that I would not be using the Race field, so I threw a Data Source Filter on that to keep only “All Races.” This way I wouldn’t have to deal with all of the other Race options once I began my analysis. I did not do this with the Sex and Origin fields, as my viz would include sheets both at the highest level and at a more detailed level, so I chose to filter those at the worksheet level instead. It was important to keep an eye on my filters to ensure I wasn’t reporting data that had been duplicated. Once inside Tableau, I started with two quick calculations, first to parse out the word “age” from the age field and then to group the ages so I had my children under 18 and adults 65+ age groups. Then it was time to replicate the original viz. The only differences here were that I went with a stepped line chart and included actual populations for each group as opposed to percent of total population, like in the original. After adding some text and a highlight circle focused on the point when adults 65+ outnumber children under 18, here’s what the top part of my viz looked like.
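The pivot, parse and group steps described above can be sketched in Python as follows. This is only an illustration under assumptions: the column labels ("age 5", "age 70"), the Year field and the population numbers are made up, since the exact layout of the source file isn’t shown here, and the function names are hypothetical.

```python
import re


def parse_age(label):
    """Pull the numeric age out of a label such as 'age 17' (format assumed)."""
    match = re.search(r"\d+", label)
    if match is None:
        raise ValueError(f"no age found in {label!r}")
    return int(match.group())


def age_group(age):
    """Bucket an age into the two groups of interest, plus everyone else."""
    if age < 18:
        return "Children under 18"
    if age >= 65:
        return "Adults 65+"
    return "Ages 18-64"


def pivot_row(row, id_fields):
    """Turn one wide row into long records with Age and Population columns."""
    ids = {name: row[name] for name in id_fields}
    return [
        {**ids, "Age": parse_age(column), "Population": value}
        for column, value in row.items()
        if column not in id_fields
    ]


# Made-up wide row; real data would have one column per age.
wide_row = {"Year": 2034, "Race": "All Races", "age 5": 100, "age 70": 120}

# Keep only "All Races" so totals are not counted more than once.
long_rows = [
    r for r in pivot_row(wide_row, ("Year", "Race")) if r["Race"] == "All Races"
]
groups = [age_group(r["Age"]) for r in long_rows]
print(long_rows)
print(groups)
```

Filtering to a single Race value after the pivot mirrors the Data Source Filter: without it, the total rows and the per-race rows would both feed into the sums.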

[Image: top part of the viz]

9:42am – So, now that we had this high-level overview, it was time to show some detail and find out which groups were projected to cause such a giant shift. Again, after playing around with the data, I felt the Origin field would succeed in telling the story here and, since it only had a few values (Hispanic, Not Hispanic and Total), would require fewer visuals than telling the story through the Race field, which had seven values. The differences in population projections between people with Hispanic origins and those without Hispanic origins were quite jaw-dropping at first glance, which is why I went this route. While adult males and females 65+ with Hispanic origins are projected to close the gap on children under 18 with Hispanic origins, the trend was much more gradual than that of people without Hispanic origins.

9:57am – Unfortunately, I’ve been getting pulled away with a few phone calls, so the timing of the diary this week may not make a ton of sense. Either way, we’ll forge on!! So, as we mentioned above, the projections for people with and without Hispanic origins were vastly different. Below are the final visuals displaying each Origin group for both Females and Males.

Females and Males with Hispanic Origins

Females and Males without Hispanic Origins

10:22am – As I was just about to publish the viz, a last-minute idea came to me: change the title to a sort of diverging color palette, so that it aligned better with the rest of the viz. My hope was that this would also help show the projected shift in population. Here is a before and after of the title.

Before

After

Bringing it all together now, below is my final viz for #MakeoverMonday Week 45. The interactive version can be found here. Before we wrap this up, I want to thank Neil Richards again for the awesome color palette. If you haven’t seen it yet, his Color Palette viz can be found here. It’s such a nice resource if you’re anything like me and struggle with putting together colors that complement one another. Thanks again Neil!!

Thank you for reading and have a wonderful day!!

[Image: final Week 45 visualization]