
Working With Data Sets

Suppose you would like to combine two sets of values into a single object.

dai <- c(0,1,2,5,10,20)   # dai might indicate days after inoculation
pinf <- c(0,0,0,5,25,80)  # pinf might indicate percent infection

 

Try

rbind(dai, pinf)
cbind(dai, pinf)
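To see how the two results differ, you can check their dimensions and labels. This short sketch recreates dai and pinf so it runs on its own; by_row and by_col are illustrative names. rbind() stacks the vectors as rows and cbind() places them side by side as columns, and both use the object names as labels.

```r
dai  <- c(0, 1, 2, 5, 10, 20)
pinf <- c(0, 0, 0, 5, 25, 80)

by_row <- rbind(dai, pinf)   # each vector becomes a row
by_col <- cbind(dai, pinf)   # each vector becomes a column

dim(by_row)       # 2 6: two rows, six columns
rownames(by_row)  # "dai" "pinf"
dim(by_col)       # 6 2: six rows, two columns
colnames(by_col)  # "dai" "pinf"
```

The cbind() result is simply the transpose of the rbind() result, so the choice depends on whether you want each variable as a row or as a column.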

 

Compare to

rbind(c(dai,pinf))
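In this call c() runs first and joins the two vectors into a single vector of 12 values, so rbind() receives only one argument and returns a one-row matrix without the dai and pinf labels. A quick check (one_row is an illustrative name):

```r
dai  <- c(0, 1, 2, 5, 10, 20)
pinf <- c(0, 0, 0, 5, 25, 80)

length(c(dai, pinf))            # 12 values in one vector
one_row <- rbind(c(dai, pinf))  # a single row holding all 12 values
dim(one_row)                    # 1 12
rownames(one_row)               # NULL: no dai/pinf labels
```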

 

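You can create arrays with the array() function; the first argument is the value used to fill the array and the second is a vector giving its dimensions:

z5 <- array(0, c(5,3))     # 5 x 3 array filled with 0
z6 <- array(-1, c(5,3,2))  # 5 x 3 x 2 array filled with -1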

 

Sometimes arrays with entries 0 or -1 are useful for starting values, where the array will later get filled in with other values. For example, 0 might be the starting infection level or -1 might be a placeholder indicating that infection levels haven’t been entered yet.
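As a minimal sketch of the placeholder idea, assume a hypothetical 5 x 3 array named infect that holds infection levels for 5 plants on 3 dates, with -1 meaning "not entered yet". Counting the remaining -1 entries shows how much is still missing:

```r
# Hypothetical example: -1 marks an infection level not entered yet
infect <- array(-1, c(5, 3))

infect[1, ] <- c(0, 5, 10)   # enter all three dates for plant 1
infect[2, 1] <- 2            # enter the first date for plant 2

sum(infect == -1)            # 11 of the 15 entries are still placeholders
```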

Data Structures

Venables et al. (2007) recommend keeping variables in data frames for easier organization. Create a data frame named df5 from dai and pinf, and print it to see what it looks like:

df5 <- data.frame(dai, pinf)

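All columns of a data frame must have the same length. If you supply vectors of different lengths, data.frame() either stops with an error or, when the longer length is an exact multiple of the shorter one, silently recycles the shorter vector, which can produce surprising results. Check the following example.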

var1 <- 1:10

The above command assigns the integers 1 through 10 to the object named var1.

var2 <- 1:2
df6 <- data.frame(var1, var2)

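Did df6 look the way you expected it to? Because 10 is a multiple of 2, data.frame() recycled var2 (1, 2, 1, 2, ...) to fill all ten rows, without any warning. This shows how variables of different lengths can produce confusing outcomes when data.frame() is used.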
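The sketch below (var3 is a hypothetical extra vector) shows both behaviours: var2 is recycled to fill ten rows, while a length-3 vector, which does not divide 10 evenly, makes data.frame() stop with an error. tryCatch() captures that error message instead of halting the script; the exact wording may vary with your R version and language settings.

```r
var1 <- 1:10
var2 <- 1:2
df6 <- data.frame(var1, var2)
df6$var2   # 1 2 1 2 1 2 1 2 1 2: var2 was recycled five times

# 10 is not a multiple of 3, so data.frame() refuses to recycle
var3 <- 1:3
msg <- tryCatch(data.frame(var1, var3),
                error = function(e) conditionMessage(e))
msg        # the error message, as a character string
```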

You can refer to dai within df5 as df5$dai, df5[[1]], or df5[1]. The first two return dai as a plain numeric vector; df5[1] returns a one-column data frame, so the name dai is kept only with df5[1].
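A short check of the three forms, recreating df5 so the sketch is self-contained (a, b, and d are illustrative names):

```r
dai  <- c(0, 1, 2, 5, 10, 20)
pinf <- c(0, 0, 0, 5, 25, 80)
df5 <- data.frame(dai, pinf)

a <- df5$dai     # numeric vector
b <- df5[[1]]    # the same numeric vector
d <- df5[1]      # a data frame with one column

identical(a, b)  # TRUE
is.data.frame(d) # TRUE
names(d)         # "dai"
```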

Lists can also be used to gather objects together and the objects don't need to be of the same length.

list6 <- list(var1, var2)

Compare list6[[1]] and list6[[2]].
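A sketch that inspects the two components; list6n is a hypothetical named version showing that you can also give list components names and extract them with $:

```r
var1 <- 1:10
var2 <- 1:2
list6 <- list(var1, var2)

length(list6)       # 2 components
length(list6[[1]])  # 10: the first component is var1
length(list6[[2]])  # 2: the second component is var2
class(list6[1])     # "list": single brackets return a sub-list

# Hypothetical named version
list6n <- list(var1 = var1, var2 = var2)
list6n$var2         # 1 2
```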

In Windows, the interactive R environment (Rgui) has a data editor available from the Edit menu.

Moving to and from Files

Reading Data into R

R users can construct data sets in programs such as Microsoft Excel or Notepad and then bring those files into R for further analyses. Data input in R is handled by several functions, including:

  • read.table
  • read.csv
  • read.csv2
  • read.delim
  • read.delim2

For further information on the differences amongst these methods, use the help() or ? options in R, for example: ?read.table.
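According to the R help page, read.csv() is read.table() with different defaults (for example header = TRUE and sep = ","). The sketch below uses the text argument to read a small inline data set, so no file is needed, and compares the two calls; txt holds a few rows in the same layout as the example files.

```r
txt <- "MAX,MIN,MAXC,MINC
48.6,32.0,9.2,0.0
49.3,46.0,9.6,7.8"

a <- read.csv(text = txt)                              # csv defaults
b <- read.table(text = txt, header = TRUE, sep = ",")  # same settings spelled out
all.equal(a, b)   # TRUE
```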

The primary information R needs to bring in the data is the location of the file (e.g., a folder on your hard drive or elsewhere) and how the data are formatted. The following examples illustrate some of these differences. For this tutorial, a Windows system was used and a “temp” folder was created on the C: drive; we also show how to read data from a folder such as “My Documents”, whose name contains a space. Examples for importing on a Mac OSX or Linux computer are also provided. To save the four example files, right-click each link and select 'Save Link As' or 'Save Target As'.

The two .txt files are spain1.txt and spain2.txt. The two .csv files are spain2.csv and spain3.csv; spain3.csv has missing data coded as "?". Remember where you saved these files, because later you will need to tell R where to find them.

Each example file has a heading row naming the maximum and minimum temperature in °F (MAX, MIN) and in °C (MAXC, MINC). The files include 141 rows and four columns. Here is an example of reading a .txt file with no missing data.

Working in Windows With No Spaces in File Path

# spain1 will be the name of the data frame in R.
# read.table() looks for the file in the location you give;
# header=TRUE indicates that the file has a heading row
# (header=FALSE is the default).
spain1 <- read.table('c:/temp/spain1.txt', header=TRUE)
# To examine the first five rows:
spain1[1:5,]

 

Output

   MAX  MIN MAXC MINC
1 48.6 32.0  9.2  0.0
2 49.3 46.0  9.6  7.8
3 50.7 40.3 10.4  4.6
4 51.8 48.2 11.0  9.0
5 53.2 44.2 11.8  6.8
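To see what header=TRUE changes, the sketch below writes a small file with the first two rows shown above to a temporary location (tempfile(), so nothing is written to c:/temp) and reads it back both ways. Without a header, R names the columns V1 to V4 and treats the heading line as a row of data.

```r
# Write a small whitespace-separated file with a heading line
path <- tempfile(fileext = ".txt")
writeLines(c("MAX MIN MAXC MINC",
             "48.6 32.0 9.2 0.0",
             "49.3 46.0 9.6 7.8"), path)

with_header <- read.table(path, header = TRUE)
no_header   <- read.table(path, header = FALSE)

names(with_header)  # "MAX" "MIN" "MAXC" "MINC"
names(no_header)    # "V1" "V2" "V3" "V4"
nrow(with_header)   # 2
nrow(no_header)     # 3: the heading line became a data row
unlink(path)        # remove the temporary file
```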

Working in Windows When the File Path or Name Contains Spaces

If the file path contains spaces, as with a folder named “My Documents”, type the path exactly as it appears inside the quotes; no escape character is needed. Do not put “\” before the spaces: in an R string, “\ ” is an unrecognized escape and causes an error. For example:

# Replace username with your own Windows user name
spain2 <- read.table(
  'c:/Documents and Settings/username/My Documents/spain2.txt',
  header=TRUE)
# Now read in spain2 using the .csv format; read.csv() assumes
# header=TRUE, so if the file had no heading row you would
# need to set header=FALSE
spain2 <- read.csv('c:/temp/spain2.csv')
# Again, as a check, examine a subset of the data:
spain2[1:5,]

 

Output

   MAX  MIN MAXC MINC
1 47.8 42.1  8.8  5.6
2 46.0 41.0  7.8  5.0
3 45.3 39.2  7.4  4.0
4 48.9 38.5  9.4  3.6
5 46.0 32.7  7.8  0.4

Working in Mac OSX or Linux When the File Path or Name Contains No Spaces

Mac OSX and Linux share the same syntax, and the commands are the same as for Windows except for how file paths are written. Because a typical user does not have "write" access to the root directory, store the files in your home directory (/home/username on Linux, /Users/username on Mac OSX). In this example, we created a "temp" folder in the home directory. To minimize typing and reduce the chance of mistakes, the tilde "~" is used as shorthand for the home directory.

# On a Unix-like system (Linux or Mac OSX) with a temp folder
# in your home directory, read in the data set as:
spain1 <- read.table('~/temp/spain1.txt', header=TRUE)
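R expands the tilde itself before opening the file. You can see the full path it will use with path.expand(); the result depends on your account, and file.exists() reports whether the file is actually there.

```r
p <- path.expand("~/temp/spain1.txt")
p               # something like /home/<you>/temp/spain1.txt on Linux
file.exists(p)  # TRUE only if spain1.txt was saved in that folder
```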

 

Paths with spaces need no special treatment here either: type them as they are inside the quotes. (A Unix shell requires a "\" before each space, but in an R string "\ " is an unrecognized escape and causes an error.) Suppose the file is called "field work.csv" and is in a folder called "2003 Data" inside the Documents folder of your home directory. The path would be "~/Documents/2003 Data/field work.csv".
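R passes a quoted path to the operating system as written, so spaces can be typed directly. The sketch below checks this without touching your real Documents folder: it creates a folder named "2003 Data" inside R's temporary directory, writes a small made-up "field work.csv" there, and reads it back.

```r
dir_with_space <- file.path(tempdir(), "2003 Data")
dir.create(dir_with_space, showWarnings = FALSE)

csv_path <- file.path(dir_with_space, "field work.csv")
write.csv(data.frame(MAX = c(48.6, 49.3), MIN = c(32.0, 46.0)),
          csv_path, row.names = FALSE)

field <- read.csv(csv_path)   # spaces in the path need no escaping
field
```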

Once the data are imported, the commands are the same for the different operating systems. The only difference is in the directory structure syntax used to indicate the location of the files.

Working With Missing Data Values

# In spain3.csv, two observations in the first row are missing;
# they were coded as "?" in the original data entry. R uses NA
# to represent missing observations, but instead of changing
# "?" to NA before reading in the table, you can tell R what
# missing data look like and R will convert "?" to NA:
spain3 <- read.csv('c:/temp/spain3.csv', na.strings="?")
spain3[1:5,]

 

Output

   MAX  MIN MAXC MINC
1   NA 42.1   NA  5.6
2 46.0 41.0  7.8  5.0
3 45.3 39.2  7.4  4.0
4 48.9 38.5  9.4  3.6
5 46.0 32.7  7.8  0.4
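Once "?" has been converted to NA, the column is read as numbers and R's summary functions treat those cells as missing. A self-contained sketch using the first two rows shown above, with the missing values written as "?" (txt and spain3_demo are illustrative names):

```r
txt <- "MAX,MIN,MAXC,MINC
?,42.1,?,5.6
46.0,41.0,7.8,5.0"

spain3_demo <- read.csv(text = txt, na.strings = "?")
is.numeric(spain3_demo$MAX)          # TRUE: "?" no longer forces text
colSums(is.na(spain3_demo))          # MAX 1, MIN 0, MAXC 1, MINC 0
mean(spain3_demo$MAX)                # NA
mean(spain3_demo$MAX, na.rm = TRUE)  # 46
```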

Working With Many Data Files

If you work with many data files it may be convenient to set a working directory using setwd().

For example:

setwd('c:/temp')
#Read in the spain2.txt file from this folder as:
spain2 <- read.table('spain2.txt', header=TRUE)
spain2[1:5,]

 

Output

   MAX  MIN MAXC MINC
1 47.8 42.1  8.8  5.6
2 46.0 41.0  7.8  5.0
3 45.3 39.2  7.4  4.0
4 48.9 38.5  9.4  3.6
5 46.0 32.7  7.8  0.4
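setwd() returns the previous working directory, which makes it easy to switch back when you are done. A sketch that uses R's temporary directory in place of c:/temp and a small made-up file, demo.csv:

```r
old_wd <- setwd(tempdir())   # switch folders, remembering the old one
getwd()                      # now R's temporary directory

# Files in the working directory can be named without a path
write.csv(data.frame(MAX = 47.8, MIN = 42.1), "demo.csv", row.names = FALSE)
demo <- read.csv("demo.csv")

setwd(old_wd)                # return to where you started
```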

 
