Building ecoinformatics at BigCB



@_inundata  
karthik.github.io/bigcb_demo



Access all available ecological data



Liberating 400+ million observation records
Full text of 100k articles
Data from papers in > 200 journals

Data from Dryad

library(rdryad)
dryaddat <- download_url("10255/dryad.1759")
# Get a file given the URL
file <- dryad_getfile(dryaddat)
dim(file)
## [1] 131  30

Records from Berkeley's museum

library(ecoengine)
pinus_data <- ee_observations(genus = "Pinus", georeferenced = TRUE, page = 1:25)
nrow(pinus_data$data)
## [1] 625

Download fisheries time series data

library(rfisheries)
library(plyr)
library(reshape2)
library(ggplot2)
species <- of_species_codes()
who <- c("TUX", "COD", "VET", "NPA")
# Landings for each species code
by_species <- lapply(who, function(x) of_landings(species = x))
names(by_species) <- who
dat <- melt(by_species, id = c("catch", "year"))[, -5]
# Name the columns before saving so the CSV has meaningful headers
names(dat) <- c("catch", "year", "type", "a3_code")
write.csv(dat, file = "dat.csv")
# plot the data
ggplot(dat, aes(year, catch)) + geom_line() + facet_wrap(~a3_code, scales = "free_y")

Resolve taxonomic names

library(taxize)
temp <- gnr_resolve(names = c("Helianthos annus", "Homo saapiens"))
temp[, -c(1, 4)]
##                  matched_name data_source_title
## 1        Helianthus annuus L. Catalogue of Life
## 2         Helianthus annus L.               EOL
## 3            Helianthus annus               EOL
## 4            Helianthus annus     uBio NameBank
## 5 Homo sapiens Linnaeus, 1758 Catalogue of Life
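
The idea behind fuzzy name resolution can be illustrated offline with base R. This is only a sketch and not how the resolver behind `gnr_resolve()` works: it assumes a tiny hand-written reference list (`reference_names`) and a hypothetical helper, `closest_name()`, that picks the reference name with the smallest edit distance to the submitted name.

```r
# Hypothetical reference list; a real resolver queries large taxonomic databases.
reference_names <- c("Helianthus annuus", "Homo sapiens", "Pinus contorta")

# Return the reference name with the smallest edit distance
# to the query (case-insensitive). Illustrative helper only.
closest_name <- function(query, reference = reference_names) {
  distances <- utils::adist(tolower(query), tolower(reference))
  reference[which.min(distances)]
}

misspelled <- c("Helianthos annus", "Homo saapiens")
resolved <- vapply(misspelled, closest_name, character(1), USE.NAMES = FALSE)
data.frame(submitted = misspelled, matched = resolved)
```

Plain edit distance is enough for this toy list; a real service also has to handle authorship strings, synonyms, and many data sources, which is why the slide delegates to taxize.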

Get complete and current taxonomic records

classification(c("Helianthus annuus"), db = "ncbi")
## $`Helianthus annuus`
##                    name         rank
## 1    cellular organisms      no rank
## 2             Eukaryota superkingdom
## 3         Viridiplantae      kingdom
## 4          Streptophyta       phylum
## 5        Streptophytina      no rank
## 6           Embryophyta      no rank
## 7          Tracheophyta      no rank
## 8         Euphyllophyta      no rank
## 9         Spermatophyta      no rank
## 10        Magnoliophyta      no rank
## 11      Mesangiospermae      no rank
## 12       eudicotyledons      no rank
## 13           Gunneridae      no rank
## 14         Pentapetalae      no rank
## 15             asterids     subclass
## 16          campanulids      no rank
## 17            Asterales        order
## 18           Asteraceae       family
## 19          Asteroideae    subfamily
## 20 Heliantheae alliance      no rank
## 21          Heliantheae        tribe
## 22           Helianthus        genus
## 23    Helianthus annuus      species
## 
## attr(,"class")
## [1] "classification"
## attr(,"db")
## [1] "ncbi"

Interactively visualize and analyze data

Explore these data interactively, including any plots you might make.



Visualize your data


Link to full map

Interactive figures




Document and upload your data

Easily deposit data alongside analysis

Our tools make it easy to deposit your data after publication. This includes generating metadata and obtaining DOIs.



Sharing data - (figshare)

Using figshare's API, it is possible to share figures, data, and any other object generated in R, and to obtain a data citation.


library(rfigshare)
id <- fs_create("Fisheries dataset", "A dataset containing catch for 4 important commercial fish species",
    "dataset")
fs_upload(id, "dat.csv")


Tools we develop are extremely easy to install

install.packages("ecoengine", dependencies = TRUE)
# Requires R version 3.0.1 or higher

Overlaying species occurrence data with climate data

library("rWBclimate")
usmex <- c(273:284, 328:365)
# Download KMLs and read them in
usmex.basin <- create_map_df(usmex)
# Download temperature data
temp.dat <- get_historical_temp(usmex, "decade")
temp.dat <- subset(temp.dat, temp.dat$year == 2000)
# Bind temperature data to map data frame
usmex.map.df <- climate_map(usmex.basin, temp.dat, return_map = FALSE)

Download occurrence records from various sources

library(spocc)
library(ggplot2)
splist <- c("Acer saccharum", "Abies balsamea", "Arbutus xalapensis", "Betula alleghaniensis",
    "Chilopsis linearis", "Conocarpus erectus", "Populus tremuloides", "Larix laricina")
## get data from bison and ecoengine
splist <- sort(splist)
out <- occ(query = splist, from = c("bison", "ecoengine"), limit = 100)
## scrub names (step not shown on the slide), then flatten the results.
## Assumption: out_df comes from occ2df(); the slide does not show how it was built.
out_df <- occ2df(out)
## Assumption: a `common` column of common names must be added to out_df
## before plotting; the slide does not show that step either.
usmex.map <- ggplot() + geom_polygon(data = usmex.map.df, aes(x = long, y = lat,
    group = group, fill = data, alpha = 1)) + scale_fill_continuous("Average annual \n temp: 1990-2000",
    low = "yellow", high = "red") + guides(alpha = FALSE) + theme_bw(10)
## Overlay the occurrence records
usmex.map <- usmex.map + geom_point(data = out_df, aes(y = latitude, x = longitude,
    group = common, colour = common)) + xlim(-125, -59) + ylim(5, 55)
print(usmex.map)

Mean temperature and latitude


Capturing the entire workflow (1/2)


Capturing the entire workflow (2/2)


Automatically include provenance information in your manuscript


The version of data and code used to generate this version of the manuscript is available at commit reference `r markdown_link()`

When the document is knitted in R, this becomes:

The version of data and code used to generate this version of the manuscript is available at commit reference e403e67
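
The helper behind that inline call is not shown on the slide. As a rough sketch, a function like the hypothetical `commit_link()` below could ask git for the current short commit hash and wrap it in a Markdown link. Both the function name and the `repo_url` placeholder are assumptions for illustration, not the author's implementation.

```r
# Hypothetical stand-in for the slide's markdown_link(); the real helper is not shown.
# repo_url is an assumed placeholder, not the author's repository.
commit_link <- function(hash = NULL, repo_url = "https://github.com/user/repo") {
  if (is.null(hash)) {
    # Ask git for the abbreviated hash of the current commit.
    hash <- system2("git", c("rev-parse", "--short", "HEAD"), stdout = TRUE)
  }
  # Build a Markdown link pointing at that commit.
  sprintf("[%s](%s/commit/%s)", hash, repo_url, hash)
}

commit_link("e403e67")
```

Called with no arguments inside a git working copy, the function falls back to the repository's current commit, so every rendered copy of the manuscript points at the exact code and data that produced it.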

ropensci.org

ropensci on GitHub
@ropensci on Twitter
Questions or comments to: karthik dot ram at berkeley dot edu
