R for reproducible scientific analysis
- Use the escape key to cancel incomplete commands or running code (Ctrl+C if you’re using R from the shell).
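- Basic arithmetic operations follow the standard order of precedence: brackets, then exponents (^ or **), then division and multiplication, then addition and subtraction.
- Scientific notation is available, e.g. 2e-3.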
- Anything to the right of a # is a comment; R will ignore it.
- Functions are denoted by function_name(). Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Mathematical functions: exp, sin, log, log10, log2, etc.
- Comparison operators: <, <=, >, >=, ==, !=
- Use all.equal to compare floating-point numbers!
- <- is the assignment operator. Anything to the right is evaluated, then stored in the variable named on the left.
- ls lists all variables and functions you’ve created.
- rm can be used to remove them.
- When assigning values to function arguments, you must use =.
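The points above can be tried out in a few lines. This is a minimal sketch; the variable names are illustrative and not part of the lesson.

```r
# Precedence: ^ binds tighter than *, which binds tighter than +
result <- 3 + 5 * 2 ^ 2          # evaluated as 3 + (5 * 4)
small  <- 2e-3                   # scientific notation for 0.002

# Nested calls: the inner expression is evaluated first
root_of_exp <- sqrt(exp(2))

# Floating-point comparison: == can be fooled by rounding error
exact  <- (0.1 + 0.2) == 0.3                   # FALSE
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE

# Function arguments are assigned with =, variables with <-
rounded <- round(3.14159, digits = 2)

ls()           # lists the variables created above
rm(small)      # removes one of them
```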
- To create a new project, go to File -> New Project
- Install the packrat package to create self-contained projects.
- install.packages to install packages from CRAN.
- library to load a package into R.
- packrat::status to check whether all packages referenced in your scripts have been installed.
- To access help for a function type ?function_name or help(function_name).
- Use quotes for special operators, e.g. ?"+"
- Use fuzzy search if you can’t remember a name: ‘??search_term’
- CRAN task views are a good starting point.
- Stack Overflow is a good place to get help with your code.
- ?dput will dump the data you are working with so others can load it easily.
- sessionInfo() will give details of your setup that others may need for debugging.
Individual values in R must be one of 5 data types; multiple values can be grouped in data structures.
- typeof(object) gives information about an item’s data type.
- There are 5 main data types:
  - ?numeric real (decimal) numbers
  - ?integer whole numbers only
  - ?character text
  - ?complex complex numbers
  - ?logical TRUE or FALSE values
- ?NaN “not a number” for undefined values (e.g. 0/0)
- ?NULL a data structure that doesn’t exist
NA can occur in any atomic vector. Inf can only occur in numeric or complex type vectors. Atomic vectors are the building blocks for all other data structures. A NULL value will occur in place of an entire data structure (but can occur as list elements).
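A short sketch of how these types and special values show up in practice. The variable names are illustrative. Note that typeof() reports R's numeric type as "double".

```r
# typeof() on one value of each main data type
types <- c(typeof(1.5), typeof(1L), typeof("a"), typeof(1+2i), typeof(TRUE))
# "double" "integer" "character" "complex" "logical"

undefined <- 0 / 0               # NaN: the result is not a number
unbounded <- 1 / 0               # Inf

# NA can sit inside any atomic vector, here a character vector
with_missing <- c("a", NA)
missing_flags <- is.na(with_missing)

# NULL stands in for a whole structure, but may also be a list element
nothing <- NULL
holder  <- list(a = 1, b = NULL)
```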
Basic data structures in R:
- atomic ?vector (can only contain one type)
- ?list (containers for other objects)
- ?data.frame two-dimensional objects whose columns can contain different types of data
- ?matrix two-dimensional objects that can contain only one type of data
- ?factor vectors that contain predefined categorical data
- ?array multi-dimensional objects that can only contain one type of data
Remember that matrices are really atomic vectors underneath the hood, and that data.frames are really lists underneath the hood (this explains some of the weirder behaviour of R).
?vector() All items in a vector must be the same type.
- Items can be converted from one type to another using coercion.
- The concatenate function ‘c()’ will append items to a vector.
- seq(from=0, to=1, by=1) will create a sequence of numbers.
- Items in a vector can be named using the names() function.
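The vector operations above, sketched with made-up values:

```r
# Mixing types in c() silently coerces everything to one type
mixed <- c(1, "a", TRUE)                 # all become character

# Explicit coercion from character to numeric
as_numbers <- as.numeric(c("1", "2", "3"))

# seq() builds a regular sequence; c() appends to it
steps  <- seq(from = 0, to = 1, by = 0.25)
longer <- c(steps, 2)

# names() attaches a label to each item
scores <- c(90, 85)
names(scores) <- c("alice", "bob")
```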
?factor() Factors are a data structure designed to store categorical data.
- levels() shows the valid values that can be stored in a vector of type factor.
?list() Lists are a data structure designed to store data of different types.
?matrix() Matrices are a data structure designed to store 2-dimensional data.
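A minimal sketch of a factor, a list and a matrix, using invented example data:

```r
# Factor: levels are sorted alphabetically by default
sizes <- factor(c("small", "large", "small"))
size_levels <- levels(sizes)             # "large" "small"

# List: each element can have a different type and length
record <- list(name = "sample1", values = c(1.2, 3.4), ok = TRUE)

# Matrix: one type only, filled column by column
grid <- matrix(1:6, nrow = 2, ncol = 3)
grid_dims <- dim(grid)                   # 2 3
```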
Data Frames
- ?data.frame is a key data structure. It is a list of vectors.
- cbind() will add a column (vector) to a data.frame.
- rbind() will add a row (list) to a data.frame.
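As an illustration of cbind() and rbind() on a data.frame, here is a sketch with a small invented table. The column names are hypothetical.

```r
df <- data.frame(id = 1:2, species = c("cat", "dog"),
                 stringsAsFactors = FALSE)

# Add a column: the vector must have one value per existing row
df <- cbind(df, weight = c(4.2, 9.5))

# Add a row: a list, because the columns have different types
df <- rbind(df, list(id = 3L, species = "fox", weight = 6.1))

# A data.frame really is a list underneath
is_list <- is.list(df)
```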
Useful functions for querying data structures:
- ?str structure, prints out a summary of the whole data structure
- ?typeof tells you the type inside an atomic vector
- ?class what is the data structure?
- ?head print the first n elements (rows for two-dimensional objects)
- ?tail print the last n elements (rows for two-dimensional objects)
- ?dimnames retrieve or modify the row names and column names of an object
- ?names retrieve or modify the names of an atomic vector or list (or columns of a data.frame)
- ?length get the number of elements in an atomic vector
- ?dim get the dimensions of an n-dimensional object (won’t work on atomic vectors or lists)
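These query functions can be compared side by side on one object. A sketch with an invented data.frame:

```r
df <- data.frame(x = 1:10, y = letters[1:10])

str(df)                        # compact summary of every column
structure_kind <- class(df)    # "data.frame"
column_type    <- typeof(df$x) # "integer"
first_rows     <- head(df, n = 3)
last_rows      <- tail(df, n = 3)
column_names   <- names(df)    # "x" "y"
shape          <- dim(df)      # 10 rows, 2 columns
n_items        <- length(df$x) # 10
```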
- read.csv to read in data in a regular structure
  - sep argument to specify the separator
    - “,” for comma separated
    - “\t” for tab separated
  - Other arguments:
    - header=TRUE if there is a header row
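A self-contained sketch of reading tab-separated data. The file is a temporary one written on the spot, so its path and contents are purely illustrative.

```r
# Write a tiny tab-separated file to read back in
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\talpha", "2\tbeta"), path)

# sep gives the separator; header = TRUE treats the first line as column names
dat <- read.csv(path, sep = "\t", header = TRUE)
```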
- Elements can be accessed by:
  - [ single square brackets:
    - extract single elements or subset vectors
    - e.g. x[1] extracts the first item from vector x.
    - extract single elements of a list. The returned value will be another list.
    - extract columns from a data.frame
  - [ with two arguments to:
    - extract rows and/or columns of matrices and data.frames
    - e.g. x[1,2] will extract the value in row 1, column 2.
    - e.g. x[,2] will extract the entire second column of values.
  - [[ double square brackets to extract items from lists.
  - $ to access columns or list elements by name
  - negative indices skip elements
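The subsetting operators above behave differently depending on the structure. The following sketch uses invented objects to show each case.

```r
x <- c(a = 10, b = 20, c = 30)
first         <- x[1]        # first item, name kept
without_first <- x[-1]       # negative index skips an element

lst <- list(n = 1:3, s = "text")
sub_list <- lst[1]           # single brackets on a list return a list
item     <- lst[["n"]]       # double brackets return the element itself

m <- matrix(1:6, nrow = 2)
cell       <- m[1, 2]        # row 1, column 2
second_col <- m[, 2]         # entire second column

df <- data.frame(id = 1:3, val = c(2.5, 3.5, 4.5))
by_name    <- df$val         # column as a vector
one_column <- df["val"]      # column as a one-column data.frame
```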
- Put code whose parameters change frequently in a function, then call it with different parameter values to customize its behavior.
- The value of the last line of a function is returned, or you can use return() explicitly.
- Any code written in the body of the function will first look for variables defined inside the function, before looking outside it.
- Document Why, then What, then lastly How (if the code isn’t self explanatory)
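A minimal sketch of the function guidelines above. The function names and the temperature example are assumptions chosen for illustration.

```r
# Why: measurements arrive in Fahrenheit but the analysis uses Celsius.
# What: convert a numeric vector of Fahrenheit values to Celsius.
fahr_to_celsius <- function(temp_f) {
  offset <- 32                 # local variable, found inside the function first
  (temp_f - offset) * 5 / 9    # value of the last expression is returned
}

# The same idea with an explicit return()
celsius_to_kelvin <- function(temp_c) {
  return(temp_c + 273.15)
}

# Call with different parameter values to customise behaviour
boiling  <- fahr_to_celsius(212)
freezing <- fahr_to_celsius(32)
```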
- Figures can be created with the grammar of graphics:
  - ggplot to create the base figure
  - aesthetics specify the data axes, shape, color, and data size
  - geometry functions specify the type of plot, e.g. geom_point, geom_line
  - geometry functions also add statistical transforms, e.g. geom_smooth
  - scale functions change the mapping from data to aesthetics
  - facet functions stratify the figure into panels
  - aesthetics apply to individual layers, or can be set for the whole plot inside ggplot
  - theme functions change the overall look of the plot
  - order of layers matters!
  - ggsave to save a figure.
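The layers described above could be combined as follows. This is a sketch that assumes the ggplot2 package is installed; it uses R's built-in mtcars data, and the output file name is arbitrary.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +   # aesthetics for the whole plot
  geom_point(aes(color = factor(cyl))) +             # aesthetic for this layer only
  geom_smooth(method = "lm", formula = y ~ x) +      # adds a statistical transform
  scale_x_log10() +                                  # changes the data-to-axis mapping
  facet_wrap(~ am) +                                 # one panel per value of am
  theme_bw()                                         # overall look

# Save the figure to a temporary location
out_file <- file.path(tempdir(), "mpg_vs_wt.png")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

Because layers are drawn in order, placing geom_smooth before geom_point would draw the points on top of the fitted line instead of underneath it.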
- Most functions and operations apply to each element of a vector.
- * applies element-wise to matrices.
- %*% for true matrix multiplication.
- any returns TRUE if any element of a vector is TRUE.
- all returns TRUE if all elements of a vector are TRUE.
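The difference between element-wise and matrix multiplication is easiest to see on a small example. A sketch with invented values:

```r
v <- c(1, 2, 3)
doubled <- v * 2                 # applied to each element: 2 4 6

m <- matrix(1:4, nrow = 2)       # columns are (1, 2) and (3, 4)
elementwise <- m * m             # each cell squared
product     <- m %*% m           # true matrix multiplication

some_large   <- any(v > 2)       # TRUE: at least one element exceeds 2
all_positive <- all(v > 0)       # TRUE: every element exceeds 0
```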
- Use if (condition) to start a conditional statement, else if (condition) to provide additional tests, and else to provide a default.
- The bodies of the branches of conditional statements are normally enclosed in curly braces; indentation is for readability only.
- Use == to test for equality.
- X && Y is only true if both X and Y are TRUE.
- X || Y is true if either X or Y, or both, are TRUE.
- Zero is considered FALSE; all other numbers are considered TRUE.
- Nest loops to operate on multi-dimensional data.
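A short sketch of conditionals and nested loops. The helper functions classify and in_range are hypothetical names used only for this example.

```r
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

# && is TRUE only when both sides are TRUE
in_range <- function(x, lo, hi) x >= lo && x <= hi

# Numbers used as conditions: zero counts as FALSE
zero_branch <- if (0) "treated as TRUE" else "treated as FALSE"

# Nested loops visit every cell of a two-dimensional object
m <- matrix(1:6, nrow = 2)
total <- 0
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    total <- total + m[i, j]
  }
}
```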
- write.table to write out objects in regular format
- Set quote=FALSE so that text isn’t wrapped in " characters.
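A sketch of writing a data.frame with write.table. The row.names = FALSE argument is an addition not mentioned above; it is included here only to keep row numbers out of the file.

```r
df <- data.frame(id = 1:2, name = c("alpha", "beta"))
out_path <- tempfile(fileext = ".csv")

# quote = FALSE leaves text unquoted; sep chooses the delimiter
write.table(df, file = out_path, sep = ",", quote = FALSE, row.names = FALSE)

written <- readLines(out_path)
```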
- Use the xxply family of functions to apply functions to groups within some data.
  - the first letter (a for array, l for list, d for data.frame) corresponds to the input data structure
  - the second letter denotes the output data structure
- Anonymous functions (those not assigned a name) are used inside the plyr family of functions on groups within data.
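For instance, ddply takes a data.frame in and returns a data.frame. The sketch below assumes the plyr package is installed and uses an invented table with an anonymous function applied to each group.

```r
library(plyr)

df <- data.frame(group = c("a", "a", "b"), value = c(1, 3, 10))

# d = data.frame in, d = data.frame out; the anonymous function
# receives the rows belonging to one group at a time
means <- ddply(df, "group", function(d) {
  data.frame(mean_value = mean(d$value))
})
```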
- ?select to extract variables by name.
- ?filter to return rows with matching conditions.
- ?group_by to group data by one or more variables.
- ?summarize to summarise multiple values to a single value.
- ?mutate to add new variables to a data.frame.
- Combine operations using the %>% pipe operator.
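The verbs above chain together naturally with the pipe. This sketch assumes the dplyr package is installed; the data and column names are invented.

```r
library(dplyr)

animals <- data.frame(
  species = c("cat", "cat", "dog", "dog"),
  weight  = c(4, 6, 10, 30),
  age     = c(2, 3, 5, 1)
)

result <- animals %>%
  select(species, weight) %>%            # keep two variables
  filter(weight < 20) %>%                # drop the heaviest animal
  mutate(weight_g = weight * 1000) %>%   # add a derived variable
  group_by(species) %>%                  # one group per species
  summarize(mean_g = mean(weight_g))     # one value per group
```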
- ‘?gather’ convert data from wide to long format.
- ‘?spread’ convert data from long to wide format.
- ‘?separate’ split a single value into multiple values.
- ‘?unite’ merge multiple values into a single value.
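A round trip through the four functions, as a sketch that assumes the tidyr package is installed. The table is invented. Newer tidyr releases mark gather and spread as superseded by pivot_longer and pivot_wider, but both still work.

```r
library(tidyr)

wide <- data.frame(id = 1:2, y2001 = c(10, 20), y2002 = c(11, 21))

# Wide to long: one row per id/year combination
long <- gather(wide, key = "year", value = "count", y2001, y2002)

# Long back to wide
back <- spread(long, key = year, value = count)

# Split "y2001" after the first character, then merge the pieces again
parts  <- separate(long, col = year, into = c("prefix", "yr"), sep = 1)
joined <- unite(parts, col = "year", prefix, yr, sep = "")
```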
- Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
- Write tests before writing code in order to help determine exactly what that code is supposed to do.
- Know what code is supposed to do before trying to debug it.
- Make it fail every time.
- Make it fail fast.
- Change one thing at a time, and for a reason.
- Keep track of what you’ve done.
- Be humble
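As one way to program defensively, a function can check its inputs and fail fast with stopifnot. The centre function below is a hypothetical example, with its tests written alongside it.

```r
# Subtract the mean from every value, refusing input that makes no sense
centre <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x))
  x - mean(x)
}

# Tests state what the code is supposed to do
stopifnot(identical(centre(c(1, 2, 3)), c(-1, 0, 1)))

# Bad input fails immediately instead of producing a misleading result
outcome <- tryCatch(centre("a"), error = function(e) "rejected")
```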