The art of data presentation is an essential skill for conservationists and ecologists. A well-designed plot draws attention to the relationship, trend or other information being presented rather than to peripheral information. A poorly designed plot confuses the viewer and obscures its purpose.
In this chapter we demonstrate how failure to structure your graphics results in obfuscation of a plot’s purpose, introduce the plotting software Veusz and ggplot2, and provide case studies of plot design, including a video tutorial using Veusz.
Figure 62: In Shaanxi Province, China, the numbers of (a) newly licensed Chinese giant salamander farms and (b) salamander hatchlings produced during 2004–2012. This is a modified version of Figure 2 from Cunningham et al. (2016).
There is more than one way to draft a plot. This variation (Figure 62) on Figure 2 from Cunningham et al. (2016) illustrates the effects of a range of questionable design decisions—all of which can be commonly seen in the scientific literature—and how they obfuscate the plot’s purpose. These decisions compromise several of our principles for good design.
Different symbol and line types for the two variables: This obfuscates the plot’s purpose and compromises simplicity by introducing unnecessary complexity (use of separate figure parts identifies the two variables, and therefore different symbols and line types are unwarranted).
Axis labels emphasized with bold: This introduces an unnecessary hierarchy in the text, drawing attention to elements of peripheral rather than principal interest.
Vertical orientation of horizontal axis tick mark labels: Legibility is compromised by reducing font size unnecessarily (there is room to accommodate the x-axis tick mark labels horizontally at the same font size as the other elements), and the labels are difficult to read in vertical orientation (there is room to orient them horizontally or at an angle).
Absence of text indicating subjects of parts (a) and (b): This compromises communication of the figure’s purpose and accessibility, requiring the viewer to read the caption to know what is in each part.
Unnecessary labelling of horizontal axis in part (a), and labels above plots: The figure appears cluttered, compromising good composition. As the horizontal axes of the two parts are identical, the tick labels and axis label of part (a) are redundant (without them the plots could be moved closer together), and placing the labels for each part above rather than within the plots requires more page space.
Spreadsheets, although useful for data entry, do not facilitate the detailed manipulation required for scientific data plots and do not export graphics in formats (such as SVG) suitable for high quality publication. There is a plethora of free software available for scientific data plotting, of which we have found that R with RStudio and Veusz both produce good quality publication-ready graphics. These two tools work in fundamentally different ways, R being command-line driven and Veusz being mouse-driven.
R is an environment for statistics and graphics. There are several ways to run R, but the most convenient way is probably with RStudio. Within R there are several plotting systems, of which the ggplot2 package is popular because of its consistency and because its use can be integrated with data analysis as part of the Tidyverse collection of packages (of which ggplot2 is a part).
Run these commands to install ggplot2 in R. To run this and other R code in this chapter, copy and paste the R code into a new R script file in RStudio. You can then re-purpose the code to suit your own requirements.
# Install and use all Tidyverse packages
# Uncomment (i.e. remove '#') the next line only if the tidyverse package is not already installed
# install.packages("tidyverse")
library(tidyverse)
# Alternatively, install and use only the ggplot2 package
# Uncomment the next line only if ggplot2 is not already installed
# install.packages("ggplot2")
library(ggplot2)
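If you prefer not to comment and uncomment the install lines by hand, you can first check which packages are missing. The following is a minimal sketch using base R only; the helper name missing_packages is our own illustration and is not part of R or the Tidyverse.

```r
# Hypothetical helper: return the packages in 'pkgs' that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "cowplot"))
to_install                       # character(0) if both are already installed
# install.packages(to_install)   # Uncomment to install whatever is missing
```

The install line is left commented out, as in the code above, so that nothing is downloaded until you choose to run it.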
There is much help available for using R and ggplot2. We have found the following particularly useful:
R for Data Science is a good beginning; it includes an introductory chapter to ggplot2
Cookbook for R provides recipes for common tasks in data analysis, including a chapter on using ggplot2
Data Visualization with ggplot2 is a Cheat Sheet that summarizes the package’s functionality
r-statistics.co is an educational resource for R that includes tutorials for ggplot2 and extensive examples
BBC Visual and Data Journalism cookbook for R graphics is an R package and cookbook for creating publication-ready graphics in the BBC in-house style
If you want to produce only one or a few plot figures, and are unfamiliar with R, Veusz is a good option. It is designed specifically for the production of publication-ready data plots.
The Veusz help system includes a tutorial and example documents that illustrate the construction of a range of plot types. Plot study 1 includes a video introduction to using Veusz.
The following plot studies illustrate the design of a range of plot types, each of which presents a particular type of data. The design of these figures avoids obfuscation of the respective purposes by adhering to our design principles, including that of accessibility—being interpretable in greyscale without requiring modification.
Study 1 Bar chart
Study 2 Grouped bar chart
Study 3 Two-part line plot
Study 4 Multi-line plot
Figure 63: Number of tree species per country (of a total of 129 taxa that occurred at altitudes >1,500 m and in more than one country) assessed using the IUCN Red List categories and criteria (modified from Figure 2 in Tejedor Garavito et al. (2015); plotted with Veusz).
A bar chart (Figure 63) summarizes the frequency of a categorical variable.
Purpose: The number of tree species occurring at altitudes >1,500 m in the Andes and in more than one country (of a total of 129 taxa) that were assessed using the IUCN Red List categories and criteria differed between these six South American countries, being highest in Ecuador and lowest in Argentina.
The figure is presentable in greyscale (Figure 63), although you could colour the bars if you wish, perhaps in light blue, a colour that would remain presentable if the figure were reproduced in greyscale. The following additional points are of note:
No hierarchy of visual elements is required: the same font size is used for all lettering, with no need for bold or italics for emphasis
Country names are set at 45º to aid readability and to avoid congesting the text
All bars have the same grey shading, as they are measures of the same variable
There is no need in this case to label the horizontal axis, as the names of the countries are sufficiently self-explanatory
Figure 64: Plot Study 1 demonstrates how to draft this plot in Veusz.
Figure 63 was plotted with Veusz. Alternatively, we can plot this figure with the ggplot command in the R package ggplot2. In the first code chunk we import the data and draw a first draft of the bar chart.
Figure 65: Default ggplot style.
# The read_csv command is part of the Tidyverse package readr
# If the tidyverse package is not installed, remove '#' from the next line
# install.packages("tidyverse")
library(tidyverse)
# Read the data
data <- read_csv("country,number_of_species
Ecuador,96
Peru,87
Colombia,52
Bolivia,43
Venezuela,18
Argentina,7")
# Default ggplot figure style
# Note that the package is called ggplot2 but the command is 'ggplot'
ggplot(data, aes(country, number_of_species)) + # Specify x first, then y
geom_bar(stat = "identity") + # Plot type
theme_gray()
In the second code chunk, we improve our first attempt, re-ordering the bars by decreasing number of species (we don’t need to import the data again).
Figure 66: Default ggplot style, with countries reordered.
# Reorder countries by no. of species (in descending order)
ggplot(data, aes(reorder(country, -number_of_species), number_of_species)) +
geom_bar(stat = "identity") +
theme_gray()
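To see why negating the counts gives bars in descending order, it can help to inspect what reorder does outside ggplot. This is a small sketch in base R using the same six values; the vector names are ours.

```r
country <- c("Ecuador", "Peru", "Colombia", "Bolivia", "Venezuela", "Argentina")
number_of_species <- c(96, 87, 52, 43, 18, 7)

# Default: character data are treated as a factor with alphabetical levels
levels(factor(country))

# reorder() sorts the levels by the second argument in ascending order,
# so negating the counts puts the largest count first
ordered_country <- reorder(country, -number_of_species)
levels(ordered_country)
```

The first call lists the countries alphabetically, which is the bar order in the default plot; the second lists them from Ecuador to Argentina, which is the order used in the reordered plot.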
In our final version we add further code to ready the figure for publication: we modify the theme to black and white, add axis labels, modify a few details to ensure the plot is suitable for publication, and then save two versions, one suitable for publication in print and one suitable for onscreen display.
Figure 67: Plot study 1 formatted for publication with ggplot.
# To format the figure for publication, modify the theme
ggplot(data, aes(reorder(country, -number_of_species), number_of_species)) +
geom_bar(stat = "identity", width = 0.6, fill = "dark grey", colour = "black", size = .1) +
# Set bar width & fill, colour & thickness of bounding line
theme_bw() + # black & white theme
ylab("Number of species") + # Set the y-axis label
scale_y_continuous(expand = c(0, 0), limits = c(0,100)) + # Remove space below graph
theme(
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1.1, size = 8), # Angle x-axis text
axis.text.y = element_text(size = 8), # Size y-axis numbers
axis.ticks = element_blank(), # No ticks
axis.title.x = element_blank(), # No x-axis title
axis.title.y = element_text(size = 8), # Size y-axis title
panel.grid.minor = element_blank(), # No minor grid lines
panel.grid.major.x = element_blank(), # No major x-axis grid lines
panel.grid.major.y = element_line(colour = "dark grey", size = 0.2),
text=element_text(size = 10, family = "Assistant Regular")
)
# Use scalable vector graphics (svg) for publication quality
ggsave("Plot_study_1.svg", width = 8, height = 8, units = "cm")
# Use jpg or png for on screen, setting DPI as required
ggsave("Plot_study_1.jpg", width = 8, height = 8, units = "cm", dpi = 300)
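When choosing a DPI for the onscreen version, it may help to work out the pixel dimensions that a given physical size implies. The following sketch is plain arithmetic in base R; the function name cm_to_pixels is hypothetical, and the exact rounding applied by a graphics device may differ slightly.

```r
# Approximate pixel dimension of a raster image for a given size and resolution
cm_to_pixels <- function(cm, dpi) {
  cm / 2.54 * dpi   # 2.54 cm per inch
}

cm_to_pixels(8, 300)   # about 945 pixels per side
cm_to_pixels(8, 96)    # about 302 pixels per side
```

An 8 cm figure saved at 300 DPI is therefore roughly 945 pixels wide, which is generous for a web page but appropriate if the image may also be printed.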
A grouped bar chart (Figure 68) summarizes the frequency of a categorical variable that has two or more groupings.
Figure 68: Mean (± SE) recruitment of trees (<10 cm DBH) under the parent crown for three autochorous tree species (i.e. that disperse their seed without an external vector) and 19 tree species dispersed by the bonobo Pan paniscus (i.e. zoochorous) at LuiKotale; the dotted line is the threshold for self-replacement of the parent (a modified version of Figure 2 in Beaune (2015); plotted with Veusz).
Purpose: The mean number of young trees recruited under the parent crown is below the threshold for self-replacement for 18 of the 19 zoochorous tree species whose seeds are dispersed by the bonobo, whereas the mean number of young trees recruited for three autochorous (i.e. dispersed by the plant’s own means) tree species is above the threshold.
The graphic produced by Veusz has one minor style issue that we would like to correct (the legend text and labels are too far apart), but the option to resolve this is not available in Veusz. It is, however, only a few minutes’ work to edit an SVG version of the figure using Inkscape (Figure 69).
Figure 69: Edited version of Figure 68, with an aesthetically more pleasing legend.
The figure is interpretable in greyscale (Figure 70) without requiring modification, and the following additional points are of note:
The figure is designed at a width of 110 mm, for 2/3 page width in the chosen journal.
The same font size is used for all lettering except for species names, which require a smaller font because of space constraints.
Abbreviations of species names are set at 45º to aid readability.
Use of colour and position identifies the two groups of trees.
Horizontal grid lines have not been used, so as not to distract from the horizontal line at 1.
Figure 70: Greyscale version of Figure 69.
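One rough way to check that two fill colours will remain distinguishable in greyscale is to compare their approximate grey values. The sketch below uses base R only. The helper name grey_value and the choice of Rec. 601 luma weights are our assumptions, and image software may convert to greyscale differently. The two colours tested are the fills used in the ggplot version of this figure.

```r
# Approximate grey value (0 = black, 255 = white) using Rec. 601 luma weights
grey_value <- function(colour) {
  rgb <- col2rgb(colour)   # 3 x n matrix of red, green and blue (0-255)
  as.vector(c(0.299, 0.587, 0.114) %*% rgb)
}

grey_value(c("darkgreen", "lightblue"))   # about 59 and 205
```

The large difference between the two values suggests that the bars of the two groups should still be distinguishable when the figure is printed in greyscale.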
Let’s plot the figure with ggplot:
# We need the tidyverse package
library(tidyverse)
# Read data (already in 'long' format, so no reshaping is needed)
data <- read_csv(
"species,mean_number_of_poles,SE,position,chory
HM,2.5,0.3726779962,1,Autochorous
ScZ,3.4,0.8717797887,2,Autochorous
StZ,1.2,0.3252462513,3,Autochorous
AM,0.4,0.3055050463,4,Zoochorous
BW,0.4,0.4,5,Zoochorous
CS,0,0,6,Zoochorous
CD,0,0,7,Zoochorous
DS,2.6,0.9451631253,8,Zoochorous
EO,0.8,0.5426273532,9,Zoochorous
FS,0,0,10,Zoochorous
GL,0,0,11,Zoochorous
GO,0.3,0.2108185107,12,Zoochorous
GS,0.5,0.2687419249,13,Zoochorous
IG,0,0,14,Zoochorous
IGr,0,0,15,Zoochorous
KG,0,0,16,Zoochorous
LF,0,0,17,Zoochorous
LS,0.4,0.2449489743,18,Zoochorous
MA,0,0,19,Zoochorous
MY,0.1,0.1,20,Zoochorous
PL,0.5,0.2236067977,21,Zoochorous
PE,0.1,0.1,22,Zoochorous"
)
data # Look at the data (shows the top 10 rows)
## # A tibble: 22 x 5
## species mean_number_of_poles SE position chory
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 HM 2.5 0.373 1 Autochorous
## 2 ScZ 3.4 0.872 2 Autochorous
## 3 StZ 1.2 0.325 3 Autochorous
## 4 AM 0.4 0.306 4 Zoochorous
## 5 BW 0.4 0.4 5 Zoochorous
## 6 CS 0 0 6 Zoochorous
## 7 CD 0 0 7 Zoochorous
## 8 DS 2.6 0.945 8 Zoochorous
## 9 EO 0.8 0.543 9 Zoochorous
## 10 FS 0 0 10 Zoochorous
## # … with 12 more rows
# All sizes for line and rectangle elements are in mm
ggplot(data, aes(reorder(species, position), mean_number_of_poles, fill = chory)) + # Specify x first, then y
geom_bar(stat = "identity", width = 0.8, colour = "black", size = 0.2) + # Set bar width & colour
geom_errorbar(aes(ymin=mean_number_of_poles-SE, ymax=mean_number_of_poles+SE),
width = .2, size = 0.2) + # Error bar
scale_y_continuous(expand = c(0, 0), limits = c(0,4.5)) + # Remove space below graph
scale_fill_manual(values=c("dark green", "light blue")) +
theme_bw(base_family = "Assistant") + # black & white theme, Assistant font
ylab("Mean number ± SE") + # Set the y-axis label
xlab("Species") + # Set the x-axis label
geom_hline(aes(yintercept = 1), linetype = "dashed", size = 0.2) + # Dashed line at y = 1
theme(
line = element_line(size = 0.2), # Thickness of all lines, mm
rect = element_rect(size = 0.2), # Thickness of all rectangles, mm
text = element_text(size = 8), # Size of all text, points
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8), # Angle x-axis text
axis.title.x = element_text(size = 10), # Size x-axis title
axis.title.y = element_text(size = 10), # Size y-axis title
panel.grid.minor = element_blank(), # No minor grid lines
panel.grid.major.x = element_blank(), # No major x-axis grid lines
panel.grid.major.y = element_blank(),
legend.position = c(0.85, .805),
legend.title = element_blank(),
legend.box.background = element_rect(size = 0.2), # Box for legend
legend.key.size = unit(4, unit = 'mm'),
legend.text = element_text(size = 8),
legend.margin = margin(1, 1, 1, 1, unit = 'mm')
)
Figure 71: Plot study 2 formatted for publication with ggplot.
# Use scalable vector graphics (svg) for publication quality
ggsave("Plot_study_2.svg", width = 12, height = 7, units = "cm")
# Use jpg or png for on screen, setting DPI as required
ggsave("Plot_study_2.jpg", width = 12, height = 7, units = "cm", dpi = 204)
Similarly to Veusz, ggplot does not space the legend items in the most pleasing manner but this can be resolved by a minor edit of the SVG file, using Inkscape (Figure 72).
Figure 72: Edited version of Figure 71, with an aesthetically more pleasing legend.
Figure 73: The numbers of (a) newly licensed Chinese giant salamander farms and (b) salamander hatchlings produced during 2004–2012 in Shaanxi Province, China (a modified version of Figure 2 in Cunningham et al. (2016), plotted with Veusz).
A line plot (Fig. 73) illustrates change in a variable over time.
Purpose: From 2004 to 2012 new farms for the Chinese giant salamander were licensed each year, and from 2006 to 2011 there was an increase in the total number of hatchlings produced each year.
The ways in which this figure adheres to our design principles are described in Applying the framework to a data plot.
Let’s plot the figure with ggplot:
# We need the tidyverse and cowplot packages
# Cowplot can combine figures
library(tidyverse)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
# Read data ('NaN' is 'not a number'; i.e. missing data)
data <- read_csv(
"year,new_licences,hatchlings
2004,18,NaN
2005,5,NaN
2006,16,115000
2007,8,180000
2008,18,250000
2009,17,500000
2010,18,600000
2011,16,800000
2012,25,NaN"
)
data # Look at the data (shows the top 10 rows)
## # A tibble: 9 x 3
## year new_licences hatchlings
## <dbl> <dbl> <dbl>
## 1 2004 18 NaN
## 2 2005 5 NaN
## 3 2006 16 115000
## 4 2007 8 180000
## 5 2008 18 250000
## 6 2009 17 500000
## 7 2010 18 600000
## 8 2011 16 800000
## 9 2012 25 NaN
Figure 74: Plot study 3 formatted for publication with ggplot.
# Plot of new_licences assigned to 'licences'
licences <- ggplot(data, aes(year, new_licences)) + # Specify x first, then y
geom_point(size = 0.2) + # Data points
geom_line(size = 0.2) + # Line
theme_bw(base_family = "Assistant") + # black & white theme, Assistant font
ylab("Number") + # y-axis label
scale_x_continuous(breaks=c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012)) +
theme(
axis.text.x = element_blank(), # No x-axis tick labels
axis.text.y = element_text(size = 8), # Size y-axis tick labels
axis.title.x = element_blank(), # No x-axis title
axis.title.y = element_text(size = 10), # Size y-axis title
panel.grid.minor = element_blank(), # No minor grid lines
panel.grid.major.x = element_blank(), # No major x-axis grid lines
panel.grid.major.y = element_line(colour = "grey", size = 0.2) # Major y-axis grid lines
) +
annotate("text", label = "(a) New licences", x = -Inf, y = Inf, hjust = -0.05, vjust = 2.5,
family = "Assistant", size = 3.53) # x = -Inf, y = Inf anchor the label at the top-left corner;
# hjust & vjust tune position; size is in mm (10 points = 3.53 mm)
# Plot of number of hatchlings assigned to 'hatchlings'
hatchlings <- ggplot(data, aes(year, hatchlings)) +
geom_point(size = 0.2) +
geom_line(size = 0.2) +
theme_bw(base_family = "Assistant") +
ylab("Number") +
scale_x_continuous(breaks=c(2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012)) +
scale_y_continuous(breaks = c(200000, 400000, 600000, 800000),
labels = c("2", "4", "6", "8")) + # Tick labels are in units of 100,000
theme(
axis.text.x = element_text(size = 8), # x-axis tick
axis.text.y = element_text(size = 8),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey", size = 0.2),
plot.margin = unit(c(0, 0, 0, 0), "cm") # Remove margins of (b) to close up plots
) +
annotate("text", label = "(b) Hatchlings", x = -Inf, y = Inf, hjust = -0.05, vjust = 2.5,
family = "Assistant", size = 3.53)
plot_grid(licences, hatchlings, ncol = 1, align = 'v') # Put plots together vertically
# Use scalable vector graphics (svg) for publication quality
ggsave("Plot_study_3.svg", width = 8, height = 10, units = "cm")
# Use jpg for on screen, setting DPI as required
ggsave("Plot_study_3.jpg", width = 8, height = 10, units = "cm", dpi = 300)
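The size argument of annotate is given in mm, whereas theme text sizes are given in points, which is why the code above uses 3.53 for 10-point text. The conversion can be sketched in base R as follows; the function name pt_to_mm is hypothetical, and the second call uses the 72.27 points per inch on which, to our understanding, ggplot2 bases its own .pt conversion constant.

```r
# Convert a font size in points to mm
pt_to_mm <- function(pt, pt_per_inch = 72) {
  pt / pt_per_inch * 25.4   # 25.4 mm per inch
}

pt_to_mm(10)          # about 3.53, using 72 points per inch
pt_to_mm(10, 72.27)   # about 3.51, using 72.27 points per inch
```

The two conventions differ by less than 1%, so either value gives annotation text that is visually indistinguishable from 10-point axis titles.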
Figure 75: Mean number of meals per day that contained animals (based on 24-hour recall surveys during September 2011–June 2012) consumed in a village on the Masoala Peninsula, Madagascar (modified from Fig. 1 in Borgerson (2016); plotted with Veusz).
A multi-line plot (Figure 75) illustrates the change in categories of a variable over time.
Purpose: The number of meals that contained animals was generally greater in the summer months than in the winter months, and the majority of meals contained fish, and to a lesser degree domesticated and forest animals.
The figure is interpretable in greyscale (Fig. 76) without requiring modification, and the following additional points are of note:
Figure 76: Greyscale version of Figure 75.
The relative contributions of fish and domesticated and forest animals to the diet is indicated with the use of colour, and therefore only one symbol and line type (representing food) are required.
A horizontal axis label is not required because it is clear from the abbreviations that the tick marks are months.
Names of months are set at 45º to aid readability.
Let’s plot the figure with ggplot:
Figure 77: Plot study 4 formatted for publication with ggplot.
# We need the tidyverse package
library(tidyverse)
# Read data, then convert it from 'wide' to 'long' with gather
data <- read_csv(
"month_text,month_num,Fish,Forest_animals,Domesticated_animals,Total
Sep.,3,1.3046,0.0355,0.0254,1.3655
Oct.,4,1.2108,0.0228,0.1282,1.3618
Nov.,5,1.4377,0,0.0863,1.524
Dec.,6,1.144,0,0.2957,1.4397
Jan.,7,1.161,0.0112,0.1798,1.352
Feb.,8,1.0676,0.0036,0.0641,1.1353
Mar.,9,1.1779,0.0071,0.0676,1.2526
Apr.,10,0.9605,0.1356,0.0847,1.1808
May,11,0.8333,0.125,0.1012,1.0595
June,12,0.7977,0.0058,0.0694,0.8729"
) %>%
gather(key = "type", value = "meals", Fish, Forest_animals, Domesticated_animals, Total)
data # Look at the data (shows the top 10 rows)
ggplot(data, aes(factor(month_num), meals, group = type)) +
geom_line(aes(colour = type), size = .2) + # Plot lines by type
geom_point(aes(colour = type), size = .4) + # Plot points by type
scale_colour_manual(values = c(Domesticated_animals = "#DF5F24", Fish = "#00AAF2",
Forest_animals = "#48AB28", Total = "black"), # Named, so colours do not depend on legend order
breaks = c("Total", "Fish", "Domesticated_animals", "Forest_animals"), # Order legend
labels = c("Total", "Fish", "Domesticated animals", "Forest animals")) + # Rename legend items
scale_x_discrete(labels = c("Sep.", "Oct.", "Nov.", "Dec.", "Jan.", "Feb.", "Mar.", "Apr.", "May", "June")) + # Label x-axis
ylab("Mean no. of meals per day") + # Add y-axis label
theme_bw(base_family = "Assistant") + # Theme the figure
theme(
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.y = element_line(colour = "dark grey", size = 0.1),
panel.grid.major.y = element_line(colour = "dark grey", size = 0.1),
axis.ticks = element_line(size = .2, colour = "black"),
axis.text.x = element_text(size = 8, angle = 45),
axis.text.y = element_text(size = 8),
legend.position = c(0.29, .493),
legend.title = element_blank(),
legend.box.background = element_rect(size = 0.2),
legend.key.size = unit(4, unit = 'mm'),
legend.text = element_text(size = 8),
legend.margin = margin(1, 1, 1, 1, unit = 'mm'),
plot.margin = unit(c(0.01, 0.01, 0, 0.03), "cm")
)
# Save as publication quality SVG format
ggsave("Plot_study_4.svg", height = 8.0, width = 8.0, units = "cm")
# Use jpg or png for online, setting DPI as required
ggsave("Plot_study_4.jpg", width = 8, height = 8, units = "cm", dpi = 300)
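The gather step in the code above reshapes the data from 'wide' (one column per type of food) to 'long' (one row per month and type). In more recent versions of tidyr the same reshaping is usually written with pivot_longer. The sketch below applies it to a two-month, two-column subset of the data; the object names wide and long are ours, and it assumes tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Two months and two food types from the table above, in 'wide' format
wide <- data.frame(
  month_text = c("Sep.", "Oct."),
  Fish = c(1.3046, 1.2108),
  Forest_animals = c(0.0355, 0.0228)
)

# One row per month and type, with the same columns that
# gather(key = "type", value = "meals", ...) would produce
long <- pivot_longer(wide, cols = c(Fish, Forest_animals),
                     names_to = "type", values_to = "meals")
long
```

The rows come out in a different order from gather (grouped by month rather than by type), which does not affect the plot because ggplot groups the lines by the type column.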
Page built: 2019-07-12