Beyond Multivariable Thinking: Pictures as Data[1]
Authors: Stacey Hancock and Jade Schmidt
Montana State University
In this activity, you will use Tableau to re-create some data visualizations using world data from gapminder.org. We will then explore pictures as data on Gapminder’s Dollar Street (https://www.gapminder.org/dollar-street).
First, watch the following TEDx talk, “Using photos as data to understand how people live,” from the beginning of the video through time 11:50:
https://www.gapminder.org/videos/using-photos-as-data-to-understand-how-people-live/
Answer questions 1-4 about this video.
1. What are the observational units on Dollar Street? (Select one)
A. Income
B. People
C. Households
D. Countries
2. For each home on Dollar Street, how many categories/things do photographers capture by photo?
3. In order for the number of homes on Dollar Street to be proportional to the world population, out of 100 homes, we need _____ homes from Asian countries.
4. Using the picture data on Dollar Street, Anna Rosling Rönnlund demonstrates that two households from different countries at the same income level generally have [more or less?] in common than two households from different income levels within the same country. (Select one)
A. More
B. Less
Download the Gapminder data file[2] from the course webpage and load the data into Tableau:
http://www.math.montana.edu/courses/s216/#data
This file contains a subset of the variables available from Gapminder for 124 different countries for the years 1950 to 2014. Use these data to answer the following questions.
• Click “Text file” under “To a File” in the “Connect” menu on the left. Select your downloaded gapminder.csv file.
• Open a new sheet.
• Tableau automatically treats a number as continuous, but we want to treat year in the data set as a categorical variable, so use the drop-down menu next to Year under Dimensions to convert Year to a discrete variable.
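Optional illustration (not part of the Tableau steps): the choice between treating Year as continuous or as discrete can be sketched in Python. This is only a minimal sketch. The sample text, the column names, and the values are made up for illustration and are not taken from gapminder.csv, and the helper load_rows is hypothetical.

```python
import csv
import io

# Made-up sample; the column names and values are assumptions,
# not the contents of the real gapminder.csv.
SAMPLE_CSV = """country,year,life_expectancy
Country A,1950,50.0
Country A,2014,70.0
Country B,2014,65.5
"""

def load_rows(text):
    """Parse CSV text, keeping year as a label and life expectancy as a number."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = str(row["year"])  # categorical: a label, not a quantity
        row["life_expectancy"] = float(row["life_expectancy"])  # quantitative
        rows.append(row)
    return rows

rows = load_rows(SAMPLE_CSV)

# The distinct year labels play the role of the categories of a discrete variable.
year_levels = sorted({row["year"] for row in rows})
print(year_levels)
```

Keeping year as a text label means it can only be used to group or filter rows, which mirrors what a discrete Year does in Tableau.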
5. What are the observational units for the downloaded data set (gapminder.csv)? (Select one)
A. Income
B. People
C. Households
D. Countries
6. List all of the quantitative variables available in the data set. (Hint: the note above tells you to treat year as a categorical variable.)
7. Which type of plot would be most appropriate to determine how many countries are landlocked and how many have a coastline in the data set? (Select one)
A. Side-by-side boxplots
B. Bar graph
C. Histogram
D. Segmented bar graph
8. Do regions differ in the distribution of main religion among countries? Create a segmented bar graph that allows us to compare the distribution of main religion within regions in 2014. To only plot data from 2014, drag Year to Filter, make sure 2014 is the only box checked, and click OK. Ensure the legend is visible.
9. Use your segmented bar graph to determine which region has the highest percentage of countries where Muslim is the main religion. What is this percentage?
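Optional illustration: the segments in the bar graph from questions 8 and 9 are percentages computed within each region, after filtering to a single year. The sketch below shows that calculation in Python on made-up records. The region and religion labels, the counts, and the helper segment_percentages are all hypothetical and are not the Gapminder data.

```python
from collections import Counter, defaultdict

# Made-up records of (region, main religion, year); not the Gapminder data.
records = [
    ("Region A", "Religion X", 2014),
    ("Region A", "Religion Y", 2014),
    ("Region A", "Religion X", 2014),
    ("Region A", "Religion X", 2014),
    ("Region B", "Religion Y", 2014),
    ("Region B", "Religion Y", 1950),
    ("Region B", "Religion X", 2014),
]

def segment_percentages(records, year):
    """Percentage of countries with each main religion, within each region."""
    counts = defaultdict(Counter)
    for region, religion, rec_year in records:
        if rec_year == year:  # plays the same role as the Year filter
            counts[region][religion] += 1
    return {
        region: {rel: 100 * n / sum(c.values()) for rel, n in c.items()}
        for region, c in counts.items()
    }

percentages = segment_percentages(records, 2014)
for region, segments in percentages.items():
    print(region, segments)
```

Because each region's percentages sum to 100, regions with different numbers of countries can be compared directly, which is the point of a segmented bar graph.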
10. Go to the Dollar Street data (https://www.gapminder.org/dollar-street). Click the drop-down menu for Families and search for Worship Places. Select four families (one from each region) that ‘live’ in a similar place on Dollar Street. Write a sentence or two comparing the places of worship for the four families and provide a screenshot of the picture of Worship Places for each selected family. Hint: you can use the drop-down menu for ‘the World’ to select one region at a time to ensure you pick a family from each region.
Families compared:
Income range for families compared:
Comparison of places of worship:
Pictures of Worship Places chosen:
11. Create side-by-side boxplots that display the same variables as the chart shown at time 7:15 of the video for the year 2014 (population is not a variable plotted).
12. Use your side-by-side boxplots to order regions from smallest (1) to largest (4) based on median income per person.
(smallest median income per person)
1
2
3
4
(largest median income per person)
Does this appear to match the plot at time 7:15 of the video? Explain your answer.
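Optional illustration: the center line of each boxplot in question 12 is the group median, so ordering regions by median is a group-then-summarize calculation. The sketch below uses made-up incomes for three hypothetical regions. The labels, the values, and the helper regions_by_median are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median

# Made-up (region, income per person) pairs for a single year.
incomes = [
    ("Region A", 1200), ("Region A", 3000), ("Region A", 800),
    ("Region B", 25000), ("Region B", 40000),
    ("Region C", 9000), ("Region C", 7000), ("Region C", 15000),
]

def regions_by_median(pairs):
    """Return regions ordered from smallest to largest median, plus the medians."""
    groups = defaultdict(list)
    for region, value in pairs:
        groups[region].append(value)
    medians = {region: median(values) for region, values in groups.items()}
    return sorted(medians, key=medians.get), medians

order, medians = regions_by_median(incomes)
print(order)
print(medians)
```

The median is used rather than the mean because income distributions tend to be right-skewed, and a few very high incomes would pull a mean upward.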
13. We now want to see if a relationship exists between income per person and life expectancy, and determine if that relationship has changed between 1950 and 2014.
a) Create a scatterplot to answer this question with income per person on the x-axis and life expectancy on the y-axis. Filter to only show the years 1950 and 2014, color the scatterplot by year, and size the points on the scatterplot based on the country’s population. Add a logarithmic trend line for each year to the plot as well. Ensure the legend is visible.
Hint 1: We would prefer to assess linear trends rather than logarithmic trends. We can do this by plotting income per person on the log scale. In Tableau, right-click on the x-axis and click Edit Axis. In the General tab, change Range to Fixed with a starting point of 200 and an end point of 200,000. Check the box for Logarithmic under Scale. Change the title to Log of Income Per Person.
Hint 2 (optional): On the Marks card, under Shape, you may choose to have your points filled, and under Size, you can use the slider to increase the size of the points to see countries more easily.
b) Write a short paragraph summarizing the information displayed in this plot.
c) Does it appear that log(income per person) in 2014 has more impact, less impact, or about the same impact on life expectancy as it did in 1950?
A. More impactful
B. Less impactful
C. About the same amount of impact
What features of the plot are you using to determine your answer? Explain.
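Optional illustration: Hint 1 in question 13 relies on the fact that a logarithmic trend in income per person is a straight line once income is put on the log scale. The sketch below checks this in Python with made-up values that follow an assumed rule, life expectancy = 10 + 12 × log10(income), exactly. The rule, the values, and the helper fit_line are hypothetical and do not describe the Gapminder data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up incomes spanning the axis range suggested in Hint 1.
incomes = [200, 2000, 20000, 200000]
# Assumed relationship, for illustration only.
life = [10 + 12 * math.log10(v) for v in incomes]

# Fit a straight line to life expectancy against log10(income).
log_incomes = [math.log10(v) for v in incomes]
slope, intercept = fit_line(log_incomes, life)
print(round(slope, 6), round(intercept, 6))
```

With a base-10 log scale, the slope is the change in predicted life expectancy associated with a tenfold increase in income per person, so comparing the slopes of the two trend lines is one way to compare the years.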
14. Select one ‘topic’ that was photographed for Dollar Street that has not been previously discussed in this activity.
Topic:
a) Select three families from different countries with similar incomes. Write a sentence or two comparing the topic for those three families and provide screenshots of the pictures selected.
Income range:
Countries:
Comparison of pictures:
Pictures selected:
b) Select three families from the same country with different incomes (one on the low end, one in the middle, one on the high end). Write a sentence or two comparing the topic for those three families.
Country:
Incomes:
Comparison of pictures:
Pictures selected:
Do there appear to be larger differences between the families in (a) or the families in (b)? Explain.
15. Reflecting on this activity, write one advantage to using the categorical and quantitative variables in this data set for gaining information about the world, and one disadvantage. Do the same for the picture data.
[1] For the instructor guide and solutions to this activity, please email Stacey Hancock at stacey.hancock@montana.edu.
[2] Data source: https://github.com/syntagmatic/gapminder-csv