Beyond Multivariable Thinking: Pictures as Data[1]

Authors: Stacey Hancock and Jade Schmidt

Montana State University

In this activity, you will use Tableau to re-create some data visualizations using world data from gapminder.org. We will then explore pictures as data on Gapminder’s Dollar Street (https://www.gapminder.org/dollar-street).

First, watch the following TEDx talk, “Using photos as data to understand how people live,” from the beginning of the video through time 11:50:

https://www.gapminder.org/videos/using-photos-as-data-to-understand-how-people-live/

1.     What are the observational units on Dollar Street? (Select one)

A.    Income

B.    People

C.    Households

D.    Countries

2.      For each home on Dollar Street, how many categories/things do photographers capture by photo?

3.      In order for the number of homes on Dollar Street to be proportional to the world population, out of 100 homes, we need _____ homes from Asian countries.

4.      Using the picture data on Dollar Street, Anna Rosling Rönnlund demonstrates that two households from different countries at the same income level generally have [more or less?] in common than two households from different income levels within the same country. (Select one)

A.    More

B.    Less

Next, download the data set gapminder.csv[2] from the course website and open it in Tableau:

http://www.math.montana.edu/courses/s216/#data

This file contains a subset of the variables available from Gapminder for 124 different countries for the years 1950 to 2014. Use these data to answer the following questions.

Open a new sheet.

Tableau automatically treats numeric fields as continuous, but we want to treat year in the data set as a categorical variable. Use the drop-down menu next to Year under Dimensions to convert Year to a discrete variable.

5.     What are the observational units for the downloaded data set (gapminder.csv)? (Select one)

A.    Income

B.    People

C.    Households

D.    Countries

6.     List all of the quantitative variables available in the data set. (Hint: the note above tells you to treat year as a categorical variable.)

7.     Which type of plot would be most appropriate to determine how many countries are landlocked and how many have a coastline in the dataset? (Select one)

A.    Side-by-side boxplots

B.    Bar graph

C.    Histogram

D.    Segmented bar graph

8.     Do regions differ in the distribution of main religion among countries? Create a segmented bar graph that allows us to compare the distribution of main religion within regions in 2014. To only plot data from 2014, drag Year to Filter, make sure 2014 is the only box checked, and click OK. Ensure the legend is visible.

9.     Use your segmented bar graph to determine which region has the highest percentage of countries where Muslim is the main religion. What is this percentage?
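For reference, a segmented bar graph displays conditional percentages: within each region, the percentage of countries in each main-religion category. The minimal Python sketch below shows that calculation outside Tableau. The rows, region names, and religion labels are hypothetical placeholders, not values from gapminder.csv, and the function name `segment_percentages` is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows (region, year, main religion).
# These are placeholders, NOT values from gapminder.csv.
rows = [
    ("Region A", 2014, "Religion X"),
    ("Region A", 2014, "Religion X"),
    ("Region A", 2014, "Religion Y"),
    ("Region A", 1950, "Religion Y"),
    ("Region B", 2014, "Religion Y"),
    ("Region B", 2014, "Religion Y"),
    ("Region B", 2014, "Religion X"),
    ("Region B", 2014, "Religion Y"),
]


def segment_percentages(rows, year):
    """Return {region: {religion: percent of that region's rows}} for one year."""
    counts = defaultdict(Counter)
    for region, row_year, religion in rows:
        if row_year == year:  # plays the role of the Year filter
            counts[region][religion] += 1

    result = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        # Each segment is a share of its own region, so segments sum to 100.
        result[region] = {rel: 100 * n / total for rel, n in counter.items()}
    return result


percentages = segment_percentages(rows, 2014)
for region, segments in sorted(percentages.items()):
    print(region, segments)
```

Because each bar is scaled to 100% within its region, regions with different numbers of countries can be compared directly.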

10.  Go to the Dollar Street data (https://www.gapminder.org/dollar-street). Click the drop-down menu for Families and search for Worship Places. Select four families (one from each region) that ‘live’ in a similar place on Dollar Street. Write a sentence or two comparing the places of worship for the four families, and provide a screenshot of the picture of Worship Places for each selected family. Hint: you can use the drop-down menu for ‘the World’ to select one region at a time to ensure you pick a family from each region.

Families compared:

Income range for families compared:

Comparison of places of worship:

Pictures of Worship Places chosen:

11.  Create side-by-side boxplots that display the same variables as the chart shown at time 7:15 of the video for the year 2014 (population is not a variable plotted).

12.  Use your side-by-side boxplots to order regions from smallest (1) to largest (4) based on median income per person.

(smallest median income per person)

1

2

3

4

(largest median income per person)

Does this appear to match the plot at time 7:15 of the video? Explain your answer.
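The ordering in question 12 rests on the center line of each boxplot, which marks the group median. As a rough illustration of the same idea in code, the Python sketch below computes a median per group and sorts the groups from smallest to largest. The region names and income values are hypothetical placeholders, not values from gapminder.csv, and `order_by_median` is a made-up helper name.

```python
from statistics import median

# Hypothetical toy incomes per person, grouped by region.
# These are placeholders, NOT values from gapminder.csv.
incomes = {
    "Region A": [1200, 3400, 800, 5600],
    "Region B": [22000, 41000, 18000],
    "Region C": [7000, 9500, 3000, 15000, 12000],
}


def order_by_median(groups):
    """Return (group, median) pairs sorted from smallest to largest median."""
    medians = {name: median(values) for name, values in groups.items()}
    return sorted(medians.items(), key=lambda item: item[1])


ranking = order_by_median(incomes)
for rank, (region, med) in enumerate(ranking, start=1):
    print(rank, region, med)
```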

13.  We now want to see if a relationship exists between income per person and life expectancy, and determine if that relationship has changed between 1950 and 2014.

a)     Create a scatterplot to answer this question with income per person on the x-axis and life expectancy on the y-axis. Filter to only show the years 1950 and 2014, color the points by year, and size the points based on the country’s population. Add a logarithmic trend line for each year to the plot as well. Ensure the legend is visible.

Hint 1: We would prefer to assess linear trends rather than logarithmic trends. We can do this by plotting income per person on the log scale. In Tableau, right-click on the x-axis and click Edit axis. In the General tab, change Range to Fixed with a starting point of 200 and an end point of 200,000. Check the box for Logarithmic under Scale. Change the title to Log of Income Per Person.

Hint 2 (optional): On the Marks card, under Shape, you may choose to have your points filled, and under Size, you can use the slider to increase the size of the points to see countries more easily.
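Why does Hint 1 work? A logarithmic trend of the form y = a + b log(x) is a straight line once x is replaced by log(x), so a log-scale axis turns a logarithmic trend into a linear one. The Python sketch below illustrates this with a small least-squares fit. The data are hypothetical and were constructed to follow y = 20 + 15 log10(x) exactly; they are not values from gapminder.csv, and `fit_line` is a made-up helper name. The base of the logarithm only rescales the slope, so base 10 is used here for readability.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data built to follow y = 20 + 15 * log10(income) exactly.
# These are placeholders, NOT values from gapminder.csv.
income = [500, 2000, 8000, 32000, 128000]
life_exp = [20 + 15 * math.log10(x) for x in income]

# Transform x to the log scale, then fit an ordinary straight line.
log_income = [math.log10(x) for x in income]
intercept, slope = fit_line(log_income, life_exp)

print(round(intercept, 6), round(slope, 6))
```

On the log scale, the fitted slope is the change in life expectancy associated with a tenfold increase in income per person, which gives the two years' trend lines a common basis for comparison.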

b)    Write a short paragraph summarizing the information displayed in this plot.

c)     Does it appear that log(income per person) in 2014 is more impactful, less impactful, or has about the same impact on life expectancy as it did in 1950?

A.    More impactful

B.    Less impactful

C.    About the same amount of impact

What features of the plot are you using to determine your answer? Explain.

14.  Select one ‘topic’ that was photographed for Dollar Street and has not been previously discussed in this activity.

Topic:

a)     Select three families from different countries with similar incomes. Write a sentence or two comparing the topic for those three families and provide screenshots of the pictures selected.

Income range:

Countries:

Comparison of pictures:

Pictures selected:

b)    Select three families from the same country with different incomes (one on the low end, one in the middle, one on the high end). Write a sentence or two comparing the topic for those three families.

Country:

Incomes:

Comparison of pictures:

Pictures selected:

Do there appear to be larger differences between the families in (a) or the families in (b)? Explain.

15.  Reflecting on this activity, describe one advantage and one disadvantage of using the categorical and quantitative variables in this data set to gain information about the world. Do the same for the picture data.

[1] For the instructor guide and solutions to this activity, please email Stacey Hancock at stacey.hancock@montana.edu.

[2] Data source: https://github.com/syntagmatic/gapminder-csv