A method to approach a data visualization task

data-science data-visualization tableau

This article is part of a university assignment. In it, I present a simple way to tackle a data visualization problem from scratch, when you are given a dataset and want to find some insights.

I use Tableau Public for the demonstration.

Dataset: American Community Survey

The following process (from Danyel Fisher and Miriah Meyer, Making Data Visual: A Practical Guide to Using Visualization for Insight) is applied to every question:

  1. Refine the question into one or more tasks
  2. For each task:
    • Identify the components of the task:
      • Objects: Things or events of the task.

      • Measures: Variables measured for the objects; they can be existing attributes or values computed from the data.

      • Groupings (or partitions): Subsets of the data, defined by filters.

      • Actions: Specify what to do with the data (compare, identify, characterize).

    • Look for ambiguous components (those that the dataset cannot address directly).
    • For each ambiguous component, define a proxy by creating a new question that addresses the component, then return to step 1.
    • If there is no ambiguous component, the task is actionable and can be addressed by visualization.
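
As a minimal sketch, the components above could be recorded in a small data structure before opening any visualization tool. The `Task` class and its field names are hypothetical, chosen here for illustration; they come neither from the book nor from Tableau.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One task refined from a question (hypothetical structure)."""
    action: str            # e.g. "identify", "compare", "characterize"
    objects: List[str]
    measures: List[str]
    groupings: List[str]
    # Components the dataset cannot address directly; each needs a proxy.
    ambiguous: List[str] = field(default_factory=list)

    def is_actionable(self) -> bool:
        # A task is actionable once no ambiguous component remains.
        return not self.ambiguous


# "Earns more" is not a column in the dataset, so it is ambiguous.
vague = Task(
    action="identify",
    objects=["people"],
    measures=["earns more"],
    groupings=["sex", "race"],
    ambiguous=["earns more"],
)

# Proxy: treat Annual income > 50k as high income.
refined = Task(
    action="identify",
    objects=["people and their Annual income"],
    measures=["number of high-income and low-income people"],
    groupings=["high-income", "low-income"],
)

print(vague.is_actionable())    # False
print(refined.is_actionable())  # True
```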

Note: All the questions are for analytical purposes only. I hold no bias regarding gender, race, or social class.

Question 1: Is it true that you will earn more if you are a white man?

  • Task: Define high-income as Annual income > 50k and low-income as Annual income <= 50k, then identify the number of high-income and low-income people

    • Action: Identify

    • Object: people and their Annual income

    • Measure: number of high-income and low-income

    • Grouping: Filter people with high-income and low-income

The number of high-income people (1221) is about one third of the number of low-income people (3778).
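
As a quick arithmetic check of this comparison, the sketch below uses only the two counts reported above. The variable names are illustrative.

```python
# Counts reported in the text.
high_income = 1221   # Annual income > 50k
low_income = 3778    # Annual income <= 50k

total = high_income + low_income
ratio = low_income / high_income    # low-income people per high-income person
high_share = high_income / total    # fraction of the dataset that is high-income

print(total)                # 4999
print(round(ratio, 2))      # 3.09
print(f"{high_share:.1%}")  # 24.4%
```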

  • Task: Identify the number of males and females

    • Action: Identify

    • Object: people and their Sex

    • Measure: number of male and female in Sex

    • Grouping: Filter people with Sex male and Sex female

The number of males (3371) is about double the number of females (1628).

  • Task: Identify the number of male and female for each category of Annual income:

    • Action: Identify

    • Object: people

    • Measure: Number of male and female in Sex for each category of Annual income (high-income and low-income)

    • Grouping: Filter male and female

Males account for about 84% (1025 / 1221) of people with high income. This answers one part of the question: you are more likely to earn more money if you are a man. However, since males outnumber females about two to one and low-income people outnumber high-income people about three to one in the dataset, this conclusion is not firm.
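
One way to take the unequal group sizes into account is to compare the share of high-income people within each sex, rather than the share of each sex within the high-income group. The sketch below uses only counts reported above and assumes every record is labeled either male or female (3371 + 1628 and 1221 + 3778 both equal 4999, which is consistent with that assumption). The count of high-income females is derived here, not read from the visualization.

```python
# Counts reported in the text.
males, females = 3371, 1628
high_income = 1221
high_income_males = 1025

# Derived, assuming Sex takes only these two values in the dataset.
high_income_females = high_income - high_income_males

rate_male = high_income_males / males          # share of males with high income
rate_female = high_income_females / females    # share of females with high income

print(high_income_females)    # 196
print(f"{rate_male:.1%}")     # 30.4%
print(f"{rate_female:.1%}")   # 12.0%
```

Under these assumptions the within-group rates still differ, which suggests the gap is not only an effect of there being more males in the dataset.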

  • Task: Identify the number of people for 2 groups of race (White and the rest) according to each category of Annual income: high-income and low-income

    • Action: Identify

    • Object: People

    • Measure: Number of people in 2 race groups, White and the rest (Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other), for each category of Annual income (<= 50k and > 50k)

    • Grouping: Filter Race and Annual income group

The number of White people is about 4 times the number of Non-White people. However, the percentage of high-income people is much higher among White people than among Non-White people (25.99% and 15.51% respectively).

In conclusion, according to the calculations from the dataset, if you are a white man, you have a higher chance of making more money.


Question 2: What is the impact of age and level of education on annual income?

Task: Identify the number of old, middle-aged and young people:

  • Action: Identify

  • Object: people and their Age

  • Measure: Number of people for each Age range

  • Grouping: Filter Age for old (Age > 60), middle-age (30 < Age <= 60) and young (Age <= 30)
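
The grouping rule above can be written as a small function. This is a Python sketch for illustration only; in Tableau the same rule could be built as a calculated field or a group, and the function name is hypothetical.

```python
def age_group(age: int) -> str:
    """Map an age in years to the three groups used in this task."""
    if age > 60:
        return "old"
    if age > 30:
        return "middle-age"
    return "young"


# The boundary values show which side of each cut-off they fall on.
for age in (30, 31, 60, 61):
    print(age, age_group(age))
```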

The middle-age group has the highest percentage of high-income people compared with the old and young groups.

Task: Identify the number of people in each level of education:

  • Action: Identify

  • Object: people and their Level of education

  • Measure: Number of people in each Level of education

  • Grouping: Filter Level of education

Since this visualization seems too complicated, and the Levels of education with the most people are 9 and 10 (high-school graduation and some-college respectively), I divide Level of education into 2 groups: college-level (Level of education >= 10) and non-college-level (Level of education < 10).

Task: Identify the number of people in 2 groups of Level of education

  • Action: Identify

  • Object: people and their Level of education

  • Measure: Number of people in each group of Level of education

  • Grouping: Filter in 2 groups of Level of education (college-level and non-college-level)

It is easier to visualize now. In the low-income group, the percentages of college-level and non-college-level people are almost equal. However, college-level people account for a larger percentage of people in the high-income group.

Task: Show the relation between Age and Level of education with Annual income:

  • Action: Show

  • Object: people and their Age, Level of education and Annual income

  • Measure: Age, Level of education and Annual income

  • Grouping: Filter into 6 groups by combining 2 categories: Age (old, middle-age, young) and Level of education (college-level and non-college-level)
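
As a sketch of how the 6 groups arise, the snippet below encodes the two-way education split (Level of education >= 10 counts as college-level) and pairs it with the three age groups. The function and constant names are hypothetical, and the group labels are plain strings chosen for illustration.

```python
from itertools import product

AGE_GROUPS = ("old", "middle-age", "young")
EDUCATION_GROUPS = ("college-level", "non-college-level")

# Level 10 is "some-college" according to the text.
COLLEGE_THRESHOLD = 10


def education_group(level: int) -> str:
    """Collapse the numeric Level of education into 2 groups."""
    if level >= COLLEGE_THRESHOLD:
        return "college-level"
    return "non-college-level"


# Every pairing of an education group with an age group: 2 x 3 = 6 groups.
combined = [f"{edu} {age}" for edu, age in product(EDUCATION_GROUPS, AGE_GROUPS)]

print(len(combined))  # 6
for label in combined:
    print(label)
```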

In the high-income group, college-level middle-age people contribute the highest percentage, ahead of every other group.

In conclusion, Age and Level of education have an impact on Annual income. You are likely to earn more if you are a middle-aged person with a college background.


Question 3: Do people tend to be divorced or single if they work more than normal people?

Task: Define normal people’s work hours per week

  • Action: Define

  • Object: people and their Work hours per week

  • Measure: Average (mean) of all people’s Work hours per week

  • Grouping: None

The result shows that normal people usually work 40 hours per week.

Task: Identify the average work hours per week of single people and the rest

  • Action: Identify

  • Object: People and their Work hours per week

  • Measure: Average of all people’s Work hours per week

  • Grouping: Divide into single (Divorced, Never-married, Separated and Widowed) and married (Married civilian spouse, Married spouse in armed forces, Married-spouse-absent)
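
The grouping and the measure of this task can be sketched in Python as follows. The category names are taken from the text; the raw labels in the dataset may be spelled differently, and the rows below are hypothetical values for illustration only, not figures from the dataset.

```python
# Category names as written in the text (raw dataset labels may differ).
SINGLE = {"Divorced", "Never-married", "Separated", "Widowed"}
MARRIED = {
    "Married civilian spouse",
    "Married spouse in armed forces",
    "Married-spouse-absent",
}


def marital_group(status: str) -> str:
    """Collapse a marital status into the 2 groups used in this task."""
    if status in SINGLE:
        return "single"
    if status in MARRIED:
        return "married"
    raise ValueError(f"unknown marital status: {status!r}")


# Hypothetical rows: (marital status, work hours per week).
rows = [
    ("Divorced", 40),
    ("Never-married", 30),
    ("Married civilian spouse", 45),
    ("Married-spouse-absent", 41),
]

# Collect the hours of each group, then average them.
hours_by_group = {}
for status, hours in rows:
    hours_by_group.setdefault(marital_group(status), []).append(hours)

average_hours = {g: sum(h) / len(h) for g, h in hours_by_group.items()}
print(average_hours)  # {'single': 35.0, 'married': 43.0}
```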

It turns out married people work more than single people. Let’s dissect the group using Relationship.

Task: Identify the average work hours per week of single people and the rest, broken down by Relationship

  • Action: Identify

  • Object: People and their Work hours per week

  • Measure: Average of all people’s Work hours per week

  • Grouping: Divide into single (Divorced, Never-married, Separated and Widowed) and married (Married civilian spouse, Married spouse in armed forces, Married-spouse-absent), then break each group down by Relationship

It is clear now that Husband and Not-in-family people in the Married group work the most (which is reasonable, since they have to support their children or their family). Own-child people in the Single group work the least.

Therefore, to answer the question, people don’t work more if they are single.


Question 4: Do people from outside the USA have to work more but earn less than people from the USA?

Task: Display the average work hours per week and the annual income of people from outside the USA and people from the USA

  • Action: Display

  • Object: People

  • Measure: Average of all people’s Work hours per week and average of High income (a new variable created using IF [Annual income] = '>50K' THEN 1 ELSE 0 END), by Native country

  • Grouping: Divide into USA (Native country = United-States) and non-USA (Native country != United-States)
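
The reason the 0/1 variable is useful is that its average over a group equals the fraction of high-income people in that group. Below is a Python sketch of the same idea. The function names, the country placeholders, the `<=50K` label and the rows are all hypothetical and for illustration only; only the `>50K` label and the `United-States` value come from the text.

```python
def high_income_flag(annual_income: str) -> int:
    """Python counterpart of: IF [Annual income] = '>50K' THEN 1 ELSE 0 END"""
    return 1 if annual_income == ">50K" else 0


def origin_group(native_country: str) -> str:
    """Split people into USA and non-USA by Native country."""
    return "USA" if native_country == "United-States" else "non-USA"


# Hypothetical rows: (Native country, Annual income).
rows = [
    ("United-States", ">50K"),
    ("United-States", "<=50K"),
    ("Country-A", "<=50K"),
    ("Country-A", "<=50K"),
    ("Country-B", ">50K"),
    ("Country-B", "<=50K"),
]

flags = {}
for country, income in rows:
    flags.setdefault(origin_group(country), []).append(high_income_flag(income))

# Averaging a 0/1 flag gives the share of high-income people per group.
shares = {group: sum(values) / len(values) for group, values in flags.items()}
print(shares)  # {'USA': 0.5, 'non-USA': 0.25}
```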

The numbers on the map show the average work hours for each country, and the color shows the percentage of high-income people. The average work hours are nearly the same for every country (about 40 hours per week) except Thailand (~80 hours per week, but this number is not representative because there is only 1 person from Thailand in the dataset). Iran has the highest percentage of high-income people (with only 3 samples in the dataset), but there are no significant differences between the other countries.

In conclusion, people from all over the world work about the same amount of time per week, and the dataset shows no evidence that people from outside the USA have to work more but earn less than people from the USA.
