Fostering open science with R

Karthik Ram

@_inundata


The Royal Society of London, 1660

Nullius in verba



An article about [computational] science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete set of instructions [and data] which generated the figures.

- David Donoho (Stanford University)


A reproducibility crisis






Science

A published research paper

A published research paper: PDF + code + data + raw markdown + statistical implementation...

The way we do science is also changing

The three pillars of scientific practice

Experimentation, theory and computing

The four pillars of scientific practice

Experimentation, theory, computing, and data-intensive science


How and where these data were obtained is often a black box

Data Life Cycle

Source: Michener, 2006 Ecoinformatics.



Open Science



Data and computer code should be made available at an early stage
Article

Open data + code

Source: Wolkovich et al. Global Change Biology, 2012.





Enable access to scientific data repositories, full-text of articles, and science metrics and also facilitate a culture shift in the scientific community.



More info @ ropensci.org/packages

      
Data
Treebase
Fishbase
GBIF
Dryad

Journals
PLOS
Springer
Mendeley
textmine
pensoft

Data Viz
rMaps
plot.ly

Data Publication
figshare
git2r
rdat
DataONE
rAltmetric
EML

Access to a variety of scientific data

400+ million observation records
Full text of 100k articles
Data from papers in > 200 journals

Search full text of 100k+ open access articles - rplos


library(rplos)
plot_throughtime(list("reproducible science"), 500)


Accessing data behind papers - rdryad

library(rdryad)
dryaddat <- download_url("10255/dryad.1759")
# Get a file given the URL
file <- dryad_getfile(dryaddat)
dim(file)
## [1] 131  30

50+ years of fisheries data

library(rfisheries)
species <- of_species_codes()
# Returns 11k species of commercially important fish
who <- c("TUX", "COD", "VET", "NPA")
species_data <- plyr::ldply(who,function(x) of_landings(species = x))
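
The `plyr::ldply` call above applies a function to each species code and stacks the resulting data frames into one. A rough base R equivalent is sketched below. It does not call the API: `fake_landings()` is a hypothetical stand-in for `of_landings()`, and its column names and values are made-up placeholders for illustration only.

```r
# Hypothetical stand-in for of_landings(): returns a tiny data frame per code.
# Column names and values are placeholders, not real fisheries data.
fake_landings <- function(code) {
  data.frame(species = code, year = c(2000, 2001), catch = c(1, 2),
             stringsAsFactors = FALSE)
}

who <- c("TUX", "COD", "VET", "NPA")

# lapply() gives one data frame per code; do.call(rbind, ...) stacks them
stacked <- do.call(rbind, lapply(who, fake_landings))

nrow(stacked)            # two rows per species code
unique(stacked$species)  # the species codes, in input order
```

The same pattern should work with any function that returns one data frame per input, as long as all the returned data frames share the same columns.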

World Bank climate portal rWBclimate

library(rWBclimate)
eu_basin <- create_map_df(Eur_basin)
eu_basin_dat <- get_ensemble_temp(Eur_basin, "annualanom", 2080, 2100)

Data Viz

Interactively visualize and analyze data



Taxon specific databases - AntWeb

library(AntWeb)
acd <- aw_data(genus = "acanthognathus")
aw_map(acd)

Interactive figures - plotly




Altmetrics: an alternative to using only citations, NOT an alternative to citations

Document and upload your data

Easily deposit data alongside analysis



Sharing data - rfigshare

Using figshare's API, it is possible to share figures, data, and any other object generated in R, and to obtain a data citation.


library(rfigshare)
id <- fs_create("Fisheries dataset", "A dataset containing catch for 4 important commercial fish species", "dataset")
fs_upload(id, "dat.csv")
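
The upload step above assumes a file named `dat.csv` already exists on disk. A minimal base R sketch of how such a file might be written from a data frame follows. The `species_data` object here is a small made-up stand-in with assumed column names and placeholder values, not the output of `of_landings()`, and the file goes to a temporary directory rather than the working directory.

```r
# Stand-in for the landings data frame; column names and values are
# illustrative placeholders, not real fisheries data.
species_data <- data.frame(
  species = c("COD", "COD", "TUX", "TUX"),
  year    = c(2000, 2001, 2000, 2001),
  catch   = c(10, 12, 5, 7)
)

# Write to a temporary directory so the sketch leaves the working dir alone
out_file <- file.path(tempdir(), "dat.csv")
write.csv(species_data, out_file, row.names = FALSE)

# Read the file back to confirm the round trip preserved the table shape
check <- read.csv(out_file, stringsAsFactors = FALSE)
stopifnot(identical(dim(check), dim(species_data)))
```

Setting `row.names = FALSE` keeps R's automatic row index out of the file, so the deposited CSV contains only the data columns.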

The rOpenSci workflow



The scientific workflow





Roadmap for 2014-15

Made possible by generous support from

ropensci.org


karthik.github.io/useR2014
