Plot with a purpose

The art of data presentation is an essential skill for conservationists and ecologists. A well-designed plot draws attention to the relationship, trend or other information being presented rather than to peripheral information. A poorly designed plot confuses the viewer and obfuscates the purpose.

In this chapter we demonstrate how failure to structure your graphics results in obfuscation of a plot’s purpose, introduce the plotting software Veusz and ggplot2, and provide case studies of plot design, including a video tutorial using Veusz.

Don’t obfuscate the purpose

Figure 60: In Shaanxi Province, China, the numbers of (a) newly licensed Chinese giant salamander farms and (b) salamander hatchlings produced during 2004–2012. This is a modified version of Figure 2 from Cunningham et al. (2016).

There is more than one way to draft a plot. This variation (Figure 60) on Figure 2 from Cunningham et al. (2016) illustrates the effects of a range of questionable design decisions—all of which can be commonly seen in the scientific literature—and how they obfuscate the plot’s purpose. These decisions compromise several of our principles for good design.

Different symbol and line types for the two variables. This obfuscates the plot’s purpose and compromises simplicity by introducing unnecessary complexity (use of separate figure parts identifies the two variables, and therefore different symbols and line types are unwarranted).

Axis labels emphasized with bold. This introduces an unnecessary hierarchy in the text, drawing attention to elements of peripheral rather than principal interest.

Vertical orientation of horizontal axis tick mark labels. Legibility is compromised by reducing font size unnecessarily (there is room to accommodate the x-axis tick mark labels horizontally at the same font size as the other elements), and the labels are difficult to read in vertical orientation (there is room to orient them horizontally or at an angle).

Absence of text indicating subjects of parts (a) and (b). This compromises communication of the figure’s purpose and accessibility, requiring the viewer to read the caption to know what is in each part.

Unnecessary labelling of horizontal axis in part (a), and labels above plots. The figure appears cluttered, compromising good composition. As the horizontal axes of the two parts are identical, the tick labels and axis label of part (a) are redundant (without them the plots could be moved closer together), and placing the labels for each part above rather than within the plots requires more page space.

Plotting software

Spreadsheets, although useful for data entry, do not facilitate the detailed manipulation required for scientific data plots and do not export graphics in formats (such as SVG) suitable for high quality publication. There is a plethora of free software available for scientific data plotting, of which we have found that R with RStudio and Veusz both produce good quality publication-ready graphics. These two tools work in fundamentally different ways, R being command-line driven and Veusz being mouse-driven.

R with RStudio

R is an environment for statistics and graphics. There are several ways to run R, but the most convenient way is probably with RStudio. Within R there are several plotting systems, of which the ggplot2 package is popular because of its consistency and because its use can be integrated with data analysis as part of the Tidyverse collection of packages (of which ggplot2 is a part).

Run these commands to install ggplot2 in R. To run this and other R code in this chapter, copy and paste the R code into a new R script file in RStudio. You can then re-purpose the code to suit your own requirements.

# Install and use all Tidyverse packages
# Uncomment (i.e. remove '#' from) the next line only if the tidyverse package is not already installed
# install.packages("tidyverse")
library(tidyverse)

# Alternatively, install and use only the ggplot2 package
# Uncomment the next line only if ggplot2 is not already installed
# install.packages("ggplot2")
library(ggplot2)

There is much help available for using R and ggplot2.

Veusz

If you want to produce only one or a few plot figures, and are unfamiliar with R, Veusz is a good option. It is designed specifically for the production of publication-ready data plots.

The Veusz help system includes a tutorial and example documents that illustrate the construction of a range of plot types. Plot study 1 includes a video introduction to using Veusz.

Plot studies

The following plot studies illustrate the design of a range of plot types, each of which presents a particular type of data. The design of these figures avoids obfuscation of the respective purposes by adhering to our design principles, including that of accessibility—being interpretable in greyscale without requiring modification.

Study 1 Bar chart
Study 2 Grouped bar chart
Study 3 Two-part line plot
Study 4 Multi-line plot

Study 1 Bar chart

Figure 61: Number of tree species per country (of a total of 129 taxa that occurred at altitudes >1,500 m and in more than one country) assessed using the IUCN Red List categories and criteria (modified from Figure 2 in Tejedor Garavito et al. (2015); plotted with Veusz).

A bar chart (Figure 61) summarizes the frequency of a categorical variable.

Purpose: The number of tree species occurring at altitudes >1,500 m in the Andes and in more than one country (of a total of 129 taxa) that were assessed using the IUCN Red List categories and criteria differed between these six South American countries, being highest in Ecuador and lowest in Argentina.

The figure is presentable in greyscale (Figure 61)—although you could colour the bars if you wish, perhaps in light blue, a colour that would be presentable if the figure was reproduced in greyscale—and the following additional points are of note:

  • No hierarchy of visual elements is required: the same font size is used for all lettering, with no need for bold or italics for emphasis

  • Country names are set at 45° to aid readability and to avoid congesting the text

  • All bars have the same grey shading, as they are measures of the same variable

  • There is no need in this case to label the horizontal axis, as the names of the countries are sufficiently self-explanatory

Figure 62: Plot Study 1 demonstrates how to draft this plot in Veusz.

Figure 61 was plotted with Veusz. Alternatively, we can plot this figure with the ggplot command in the R package ggplot2. In the first code chunk we import the data and draw a first draft of the bar chart.

Figure 63: Default ggplot style.

# The read_csv command is part of the Tidyverse package readr
# If the tidyverse package is not installed, remove '#' from the next line
# install.packages("tidyverse")
library(tidyverse)

# Read the data
data <- read_csv("country,number_of_species
                 Ecuador,96
                 Peru,87
                 Colombia,52
                 Bolivia,43
                 Venezuela,18
                 Argentina,7")

# Default ggplot figure style
# Note that the package is called ggplot2 but the command is 'ggplot'
ggplot(data, aes(country, number_of_species)) + # Specify x first, then y
  geom_bar(stat = "identity") + # Plot type
  theme_gray()

In the second code chunk, we improve our first attempt, re-ordering the bars by decreasing number of species (we don’t need to import the data again).

Figure 64: Default ggplot style, with countries reordered.

# Reorder countries by no. of species (in descending order)
ggplot(data, aes(reorder(country, -number_of_species), number_of_species)) + 
  geom_bar(stat = "identity") +
  theme_gray()
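
The reordering above relies on base R's reorder(), which returns a factor whose levels are sorted by a second variable; ggplot draws a discrete axis in level order. The following minimal sketch illustrates this with toy values (the names site and count are hypothetical and are not part of the study data):

```r
# Toy data (hypothetical values, for illustration only)
site <- c("A", "B", "C")
count <- c(1, 3, 2)

# reorder() sorts the factor levels by the second argument (ascending);
# negating the values therefore gives descending order of count
ordered_site <- reorder(site, -count)

levels(ordered_site) # Level order is the order in which bars would be drawn
```

Here the level with the largest count comes first, which is why the tallest bar is drawn at the left of the chart.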

In our final version we add further code to prepare the figure for publication: we switch to a black and white theme, add a y-axis label, adjust the bars, text sizes and grid lines, and then save two versions, one suitable for publication in print and one suitable for onscreen display.

Figure 65: Plot study 1 formatted for publication with ggplot.

# To format the figure for publication, modify the theme
ggplot(data, aes(reorder(country, -number_of_species), number_of_species)) + 
  geom_bar(stat = "identity", width = 0.6, fill = "dark grey", colour = "black", size = .1) + 
  # Set bar width & fill, colour & thickness of bounding line
  theme_bw() + # black & white theme
  ylab("Number of species") + # Set the y-axis label
  scale_y_continuous(expand = c(0, 0), limits = c(0,100)) + # Remove space below graph
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1.1, size = 8), # Angle x-axis text  
    axis.text.y = element_text(size = 8), # Size y-axis numbers
    axis.ticks = element_blank(), # No ticks
    axis.title.x = element_blank(), # No x-axis title
    axis.title.y = element_text(size = 8), # Size y-axis title
    panel.grid.minor = element_blank(), # No minor grid lines
    panel.grid.major.x = element_blank(), # No major x-axis grid lines
    panel.grid.major.y = element_line(colour = "dark grey", size = 0.2),
    text=element_text(size = 10, family = "Assistant Regular")
  )
# Use Scalable Vector Graphics (SVG) for publication quality
# (ggsave needs the svglite package to write SVG files)
ggsave("Plot_study_1.svg", width = 8, height = 8, units = "cm")

# Use jpg or png for on screen, setting DPI as required
ggsave("Plot_study_1.jpg", width = 8, height = 8, units = "cm", dpi = 300)
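
When choosing a DPI for the onscreen version, it can help to estimate the pixel dimensions that a given physical size implies. The sketch below uses a hypothetical helper, cm_to_pixels (not a ggplot2 function), and assumes only that 1 inch = 2.54 cm; the exact pixel count written by a graphics device may differ by a pixel because of rounding.

```r
# Hypothetical helper (not part of ggplot2): approximate pixel dimension
# of a raster image exported at a given physical size and resolution
cm_to_pixels <- function(size_cm, dpi) {
  round(size_cm / 2.54 * dpi) # Convert cm to inches, then inches to pixels
}

cm_to_pixels(8, 300) # An 8 cm wide figure at 300 DPI is about 945 pixels wide
cm_to_pixels(8, 72)  # The same figure at 72 DPI is about 227 pixels wide
```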

Study 2 Grouped bar chart

A grouped bar chart (Figure 66) summarizes the frequency of a categorical variable that has two or more groupings.

Figure 66: Mean (± SE) recruitment of trees (<10 cm DBH) under the parent crown for three autochorous tree species (i.e. that disperse their seed without an external vector) and 19 tree species dispersed by the bonobo Pan paniscus (i.e. zoochorous) at LuiKotale; the dotted line is the threshold for self-replacement of the parent (a modified version of Figure 2 in Beaune (2015); plotted with Veusz).

Purpose: The mean number of young trees recruited under the parent crown is below the threshold for self-replacement for 18 of the 19 zoochorous tree species whose seeds are dispersed by the bonobo, whereas the mean number of young trees recruited for the three autochorous (i.e. dispersed by the plant’s own means) tree species is above the threshold.

The graphic produced by Veusz has one minor style issue that we would like to correct: the legend text and labels are too far apart. The option to resolve this is not available in Veusz, but it is only a few minutes’ work to edit an SVG version of the figure using Inkscape (Figure 67).

Figure 67: Edited version of Figure 66, with an aesthetically more pleasing legend.

The figure is interpretable in greyscale (Figure 66) without requiring modification, and the following additional points are of note:

  • The figure is designed at a width of 110 mm, for 2/3 page width in the chosen journal.

  • The same font size is used for all lettering except for species names, which require a smaller font because of space constraints.

  • Abbreviations of species names are set at 45° to aid readability.

  • Use of colour and position identifies the two groups of trees.

  • Horizontal grid lines have not been used, so as not to distract from the horizontal line at 1.

Figure 68: Greyscale version of Figure 67.
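
Before committing to a pair of fill colours, one rough way to check that they will remain distinguishable in greyscale is to compare their approximate brightness. The sketch below is an illustration under stated assumptions: grey_value is a hypothetical helper, and it uses the Rec. 601 luma weights, whereas printers and image software may convert colour to grey differently.

```r
# Hypothetical helper: approximate brightness of R colours on a 0-255 scale
# (0 = black, 255 = white), using Rec. 601 luma weights (an assumption;
# other greyscale conversions give somewhat different values)
grey_value <- function(colours) {
  channels <- col2rgb(colours) # 3 x n matrix of red, green, blue (0-255)
  as.numeric(c(0.299, 0.587, 0.114) %*% channels)
}

fills <- c("dark green", "light blue") # The two fill colours used for the groups
grey_value(fills) # A large difference suggests the groups stay distinct in grey
```

A dark fill paired with a light fill, as here, gives well-separated grey values; two colours of similar brightness would be hard to tell apart once colour is removed.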

Let’s plot the figure with ggplot:

# We need the tidyverse package
library(tidyverse)

# Read the data (one row per species, so no reshaping is needed)
data <- read_csv(
  "species,mean_number_of_poles,SE,position,chory
HM,2.5,0.3726779962,1,Autochorous
ScZ,3.4,0.8717797887,2,Autochorous
StZ,1.2,0.3252462513,3,Autochorous
AM,0.4,0.3055050463,4,Zoochorous
BW,0.4,0.4,5,Zoochorous
CS,0,0,6,Zoochorous
CD,0,0,7,Zoochorous
DS,2.6,0.9451631253,8,Zoochorous
EO,0.8,0.5426273532,9,Zoochorous
FS,0,0,10,Zoochorous
GL,0,0,11,Zoochorous
GO,0.3,0.2108185107,12,Zoochorous
GS,0.5,0.2687419249,13,Zoochorous
IG,0,0,14,Zoochorous
IGr,0,0,15,Zoochorous
KG,0,0,16,Zoochorous
LF,0,0,17,Zoochorous
LS,0.4,0.2449489743,18,Zoochorous
MA ,0,0,19,Zoochorous
MY,0.1,0.1,20,Zoochorous
PL,0.5,0.2236067977,21,Zoochorous
PE,0.1,0.1,22,Zoochorous"
) 

data # Look at the data (shows the top 10 rows)
## # A tibble: 22 x 5
##    species mean_number_of_poles    SE position chory      
##    <chr>                  <dbl> <dbl>    <dbl> <chr>      
##  1 HM                       2.5 0.373        1 Autochorous
##  2 ScZ                      3.4 0.872        2 Autochorous
##  3 StZ                      1.2 0.325        3 Autochorous
##  4 AM                       0.4 0.306        4 Zoochorous 
##  5 BW                       0.4 0.4          5 Zoochorous 
##  6 CS                       0   0            6 Zoochorous 
##  7 CD                       0   0            7 Zoochorous 
##  8 DS                       2.6 0.945        8 Zoochorous 
##  9 EO                       0.8 0.543        9 Zoochorous 
## 10 FS                       0   0           10 Zoochorous 
## # … with 12 more rows
# Sizes of line and rectangle elements are in mm; text sizes are in points
ggplot(data, aes(reorder(species, position), mean_number_of_poles, fill = chory)) + # Specify x first, then y
  geom_bar(stat = "identity", width = 0.8, colour = "black", size = 0.2) + # Set bar width & colour
  geom_errorbar(aes(ymin=mean_number_of_poles-SE, ymax=mean_number_of_poles+SE),
    width = .2, size = 0.2) + # Error bar
  scale_y_continuous(expand = c(0, 0), limits = c(0,4.5)) + # Remove space below graph
  scale_fill_manual(values=c("dark green", "light blue")) +    
  theme_bw(base_family = "Assistant") + # black & white theme, Assistant font
  ylab("Mean number ± SE") + # Set the y-axis label
  xlab("Species") + # Set the x-axis label
  geom_hline(aes(yintercept = 1), linetype = "dashed", size = 0.2) + # Dashed line at y = 1
  theme(
    line = element_line(size = 0.2), # Thickness of all lines, mm
    rect = element_rect(size = 0.2), # Thickness of all rectangles, mm
    text = element_text(size = 8), # Size of all text, points
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Angle x-axis text
    axis.title.x = element_text(size = 10), # Size x-axis title
    axis.title.y = element_text(size = 10), # Size y-axis title
    panel.grid.minor = element_blank(), # No minor grid lines
    panel.grid.major.x = element_blank(), # No major x-axis grid lines
    panel.grid.major.y = element_blank(),
    legend.position = c(0.85, .805),
    legend.title = element_blank(),
    legend.box.background = element_rect(size = 0.2), # Box for legend
    legend.key.size = unit(4, unit = 'mm'),
    legend.text = element_text(size = 8),
    legend.margin = margin(1, 1, 1, 1, unit = 'mm')
  )

Figure 69: Plot study 2 formatted for publication with ggplot.

# Use Scalable Vector Graphics (SVG) for publication quality
ggsave("Plot_study_2.svg", width = 12, height = 7, units = "cm")

# Use jpg or png for on screen, setting DPI as required
ggsave("Plot_study_2.jpg", width = 12, height = 7, units = "cm", dpi = 204)

Similarly to Veusz, ggplot does not space the legend items in the most pleasing manner but this can be resolved by a minor edit of the SVG file, using Inkscape (Figure 70).

Figure 70: Edited version of Figure 69, with an aesthetically more pleasing legend.

Study 3 Two-part line plot

Figure 71: The numbers of (a) newly licensed Chinese giant salamander farms and (b) salamander hatchlings produced during 2004–2012 in Shaanxi Province, China (a modified version of Figure 2 in Cunningham et al. (2016), plotted with Veusz).

A line plot (Figure 71) illustrates change in a variable over time.

Purpose: From 2004 to 2012 new farms for the Chinese giant salamander were licensed each year, and from 2006 to 2011 there was an increase in the total number of hatchlings produced each year.

The ways in which this figure adheres to our design principles are described in Applying the framework to a data plot.

Let’s plot the figure with ggplot:

# We need the tidyverse and cowplot packages
# Cowplot can combine figures
library(tidyverse)
library(cowplot)
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
# Read data ('NaN' is 'not a number'; i.e. missing data)
data <- read_csv(
  "year,new_licences,hatchlings
  2004,18,NaN
  2005,5,NaN
  2006,16,115000
  2007,8,180000
  2008,18,250000
  2009,17,500000
  2010,18,600000
  2011,16,800000
  2012,25,NaN"
)

data # Look at the data (shows the top 10 rows)
## # A tibble: 9 x 3
##    year new_licences hatchlings
##   <dbl>        <dbl>      <dbl>
## 1  2004           18        NaN
## 2  2005            5        NaN
## 3  2006           16     115000
## 4  2007            8     180000
## 5  2008           18     250000
## 6  2009           17     500000
## 7  2010           18     600000
## 8  2011           16     800000
## 9  2012           25        NaN

Figure 72: Plot study 3 formatted for publication with ggplot.

# Plot of new_licences assigned to 'licences'
licences <- ggplot(data, aes(year, new_licences)) + # Specify x first, then y
  geom_point(size = 0.2) + # Data points
  geom_line(size = 0.2) + # Line
  theme_bw(base_family = "Assistant") + # black & white theme, Assistant font 
  ylab("Number") + # y-axis label
  scale_x_continuous(breaks=c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012)) +
  theme(
    axis.text.x = element_blank(), # No x-axis tick labels
    axis.text.y = element_text(size = 8), # Size y-axis tick labels
    axis.title.x = element_blank(), # No x-axis title
    axis.title.y = element_text(size = 10), # Size y-axis title
    panel.grid.minor = element_blank(), # No minor grid lines
    panel.grid.major.x = element_blank(), # No major x-axis grid lines
    panel.grid.major.y = element_line(colour = "grey", size = 0.2) # Major y-axis grid lines
  ) +
  annotate("text", label = "(a) New licences", x = -Inf, y = Inf, hjust = -0.05, vjust = 2.5,
    family = "Assistant", size = 3.53) # x = -Inf & y = Inf anchor the label at the top-left corner,
    # hjust & vjust tune position; size is in mm (10 points = 3.53 mm)

# Plot of number of hatchlings assigned to 'hatchlings'
hatchlings <- ggplot(data, aes(year, hatchlings)) + 
  geom_point(size = 0.2) + 
  geom_line(size = 0.2) + 
  theme_bw(base_family = "Assistant") +
  ylab("Number (× 100,000)") + # Tick labels below are in units of 100,000
  scale_x_continuous(breaks=c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012)) +  
  scale_y_continuous(breaks = c(200000, 400000, 600000, 800000), 
    labels = c("2", "4", "6", "8")) +
  theme(
    axis.text.x = element_text(size = 8), # x-axis tick
    axis.text.y = element_text(size = 8), 
    axis.title.x = element_blank(), 
    axis.title.y = element_text(size = 10), 
    panel.grid.minor = element_blank(), 
    panel.grid.major.x = element_blank(), 
    panel.grid.major.y = element_line(colour = "grey", size = 0.2), 
    plot.margin = unit(c(0, 0, 0, 0), "cm") # Remove margins of (b) to close up plots
  ) +
  annotate("text", label = "(b) Hatchlings", x = -Inf, y = Inf, hjust = -0.05, vjust = 2.5,
    family = "Assistant", size = 3.53)  

# Put plots together vertically, keeping the result so that it can be saved
combined <- plot_grid(licences, hatchlings, ncol = 1, align = 'v')
combined # Display the combined figure

# Use Scalable Vector Graphics (SVG) for publication quality
ggsave("Plot_study_3.svg", plot = combined, width = 8, height = 10, units = "cm")

# Use jpg for on screen, setting DPI as required
ggsave("Plot_study_3.jpg", plot = combined, width = 8, height = 10, units = "cm", dpi = 300)
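
The text size passed to annotate() in the code above is given in mm, whereas theme text sizes are in points. The conversion quoted in the comments (10 points = 3.53 mm) can be reproduced with the sketch below. The helper pt_to_mm is hypothetical; it assumes 72 points per inch by default, and ggplot2's own conversion constant (.pt) is based on 72.27 points per inch, which gives a marginally smaller value.

```r
# Hypothetical helper: convert a text size in points to mm
# (1 inch = 25.4 mm; 72 points per inch is assumed by default)
pt_to_mm <- function(points, points_per_inch = 72) {
  points / points_per_inch * 25.4
}

round(pt_to_mm(10), 2) # 3.53, the size used for the 10 point panel labels
round(pt_to_mm(8), 2)  # 2.82, the equivalent of 8 point text

# With 72.27 points per inch the result for 10 points is 3.51
round(pt_to_mm(10, points_per_inch = 72.27), 2)
```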

Figure 73: Mean number of meals per day that contained animals (based on 24-hour recall surveys during September 2011–June 2012) consumed in a village on the Masoala Peninsula, Madagascar (modified from Fig. 1 in Borgerson (2016); plotted with Veusz).

Study 4 Multi-line plot

A multi-line plot (Figure 73) illustrates the change in categories of a variable over time.

Purpose: The number of meals that contained animals was generally greater in the summer months than in the winter months; most of these meals contained fish, and fewer contained domesticated or forest animals.

The figure is interpretable in greyscale (Figure 74) without requiring modification, and the following additional points are of note:

Figure 74: Greyscale version of Figure 73.

  • The relative contributions of fish and domesticated and forest animals to the diet are indicated with the use of colour, and therefore only one symbol and line type (representing food) are required.

  • A horizontal axis label is not required because it is clear from the abbreviations that the tick marks are months.

  • Names of months are set at 45° to aid readability.

Let’s plot the figure with ggplot:

Figure 75: Plot study 4 formatted for publication with ggplot.

# We need the tidyverse package
library(tidyverse)

# Read data, then convert it from 'wide' to 'long' with gather
data <- read_csv(
  "month_text,month_num,Fish,Forest_animals,Domesticated_animals,Total
  Sep.,3,1.3046,0.0355,0.0254,1.3655
  Oct.,4,1.2108,0.0228,0.1282,1.3618
  Nov.,5,1.4377,0,0.0863,1.524
  Dec.,6,1.144,0,0.2957,1.4397
  Jan.,7,1.161,0.0112,0.1798,1.352
  Feb.,8,1.0676,0.0036,0.0641,1.1353
  Mar.,9,1.1779,0.0071,0.0676,1.2526
  Apr.,10,0.9605,0.1356,0.0847,1.1808
  May,11,0.8333,0.125,0.1012,1.0595
  June,12,0.7977,0.0058,0.0694,0.8729"
) %>% 
  gather(key = "type", value = "meals", Fish, Forest_animals, Domesticated_animals, Total) 

data # Look at the data (shows the top 10 rows)

ggplot(data, aes(factor(month_num), meals, group = type)) +
  geom_line(aes(colour = type), size = .2) + # Plot lines by type
  geom_point(aes(colour = type), size = .4) + # Plot points by type
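  scale_colour_manual(
    # Colour lines; values are named so that each colour stays with its type
    # whatever the order of the legend breaks
    values = c(Domesticated_animals = "#DF5F24", Fish = "#00AAF2",
      Forest_animals = "#48AB28", Total = "black"),
    breaks = c("Total", "Fish", "Domesticated_animals", "Forest_animals"), # Order legend
    labels = c("Total", "Fish", "Domesticated animals", "Forest animals")) + # Rename legend items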
  scale_x_discrete(labels = c("Sep.", "Oct.", "Nov.", "Dec.", "Jan.", "Feb.", "Mar.", "Apr.", "May", "June")) + # Label x-axis
  ylab("Mean no. of meals per day") + #  Add y-axis label
  theme_bw(base_family = "Assistant") + # Theme the figure
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.y = element_line(colour = "dark grey", size = 0.1),
    panel.grid.major.y = element_line(colour = "dark grey", size = 0.1),
    axis.ticks = element_line(size = .2, colour = "black"),
    axis.text.x = element_text(size = 8, angle = 45),
    axis.text.y = element_text(size = 8),
    legend.position = c(0.29, .493),
    legend.title = element_blank(),
    legend.box.background = element_rect(size = 0.2), 
    legend.key.size = unit(4, unit = 'mm'),
    legend.text = element_text(size = 8),
    legend.margin = margin(1, 1, 1, 1, unit = 'mm'),
    plot.margin = unit(c(0.01, 0.01, 0, 0.03), "cm")
  ) 

# Save as publication quality SVG format
ggsave("Plot_study_4.svg", height = 8.0, width = 8.0, units = "cm")
# Use jpg or png for on screen, setting DPI as required
ggsave("Plot_study_4.jpg", width = 8, height = 8, units = "cm", dpi = 300)
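
The code above reshapes the data from wide to long with gather. In tidyr 1.0.0 and later, pivot_longer is the suggested replacement for gather, so readers with a recent installation may prefer it. The sketch below is an alternative rather than the method used for Figure 75, and it uses a two-row extract of the data (the object names wide and long are ours).

```r
library(tidyr) # pivot_longer() requires tidyr 1.0.0 or later

# A two-row extract of the wide data used in Plot study 4
wide <- data.frame(
  month_text = c("Sep.", "Oct."),
  month_num = c(3, 4),
  Fish = c(1.3046, 1.2108),
  Forest_animals = c(0.0355, 0.0228),
  Domesticated_animals = c(0.0254, 0.1282),
  Total = c(1.3655, 1.3618)
)

# One row per month and type, as produced by gather(key = "type", value = "meals", ...)
long <- pivot_longer(
  wide,
  cols = c(Fish, Forest_animals, Domesticated_animals, Total),
  names_to = "type",   # Column that receives the former column names
  values_to = "meals"  # Column that receives the values
)

long # Look at the reshaped data
```

The rows come out in a different order from gather (grouped by month rather than by type), which makes no difference to the plot because ggplot groups the lines by type.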

Page built: 2019-06-11