Data input and output - Loading data from a file

Keywords: Excel

Problem

You want to load data from a file.

Solution

Text files with delimiters

The simplest way to input data is to save it as a text file with delimiters, such as tabs or commas.

data <- read.csv("datafile.csv")

# Import a CSV file without a header
data <- read.csv("datafile-noheader.csv", header=FALSE)
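You can try read.csv() without a file on disk by passing the contents through the text argument instead of a filename. The sketch below is minimal and reuses the example data listed later on this page. With header=FALSE, R names the columns V1, V2, and so on.

```r
# CSV contents given inline instead of a file (same rows as datafile.csv)
csv_text <- '"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

data <- read.csv(text = csv_text)
str(data)

# Without a header line, R assigns the default names V1, V2, ...
noheader_text <- '"Currer","Bell","F",2
"Dr.","Seuss","M",49'
data2 <- read.csv(text = noheader_text, header = FALSE)
names(data2)   # "V1" "V2" "V3" "V4"
```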

The function read.table() is a more general function that lets you set the delimiter, whether or not there is a header row, whether strings are enclosed in quotes, and more. Run ?read.table for more details.

data <- read.table("datafile-noheader.csv",
                   header=FALSE,
                   sep=","         # Use sep="\t" for tab-delimited files
)
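This sketch reads a tab-delimited file. It writes a small file to a temporary path, which exists only for the example, and reads it once with read.table() and once with read.delim(). read.delim() is a read.table() wrapper that presets header=TRUE and sep="\t".

```r
# Write a small tab-delimited file to a temporary location
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("First\tLast\tNumber",
             "Currer\tBell\t2",
             "Dr.\tSeuss\t49"), tsv_file)

# General form: choose the separator and the header setting explicitly
d1 <- read.table(tsv_file, header = TRUE, sep = "\t")

# Convenience wrapper with header=TRUE and sep="\t" already set
d2 <- read.delim(tsv_file)

all.equal(d1, d2)   # TRUE for this simple file
```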

Opening a file chooser

On some platforms, file.choose() opens a dialog window for selecting a file; on others, it simply prompts the user to type in a filename.

data <- read.csv(file.choose())
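file.choose() is meant for interactive use. A script that must also run non-interactively, for example with Rscript, can guard the call with interactive(). The sketch below is minimal, and both the helper choose_data_file() and its fallback filename are hypothetical.

```r
# Hypothetical helper: use the file chooser only in an interactive session,
# otherwise fall back to a fixed path (e.g. when the script runs via Rscript)
choose_data_file <- function(default = "datafile.csv") {
  if (interactive()) file.choose() else default
}

path <- choose_data_file()
# data <- read.csv(path)
```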

Treating strings as factors or characters

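Traditionally, read.csv() and read.table() convert strings in the data to factors by default. Every text column is then loaded as a factor, even where it makes more sense to treat it as character strings. Since R 4.0.0 the default is stringsAsFactors=FALSE, so this conversion no longer happens unless you ask for it. To keep strings as characters regardless of the R version, use stringsAsFactors=FALSE: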

data <- read.csv("datafile.csv", stringsAsFactors=FALSE)

# Convert a column into a factor
data$Sex <- factor(data$Sex)
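The sketch below compares the column classes under both settings. Because the default changed in R 4.0.0, it sets stringsAsFactors explicitly each time. The inline text stands in for datafile.csv.

```r
csv_text <- '"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Set stringsAsFactors explicitly so the result does not depend on the R version
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

sapply(as_chr, class)  # First/Last/Sex are "character", Number is "integer"
sapply(as_fct, class)  # First/Last/Sex are "factor", Number is "integer"

# Convert just one column to a factor afterwards
as_chr$Sex <- factor(as_chr$Sex)
levels(as_chr$Sex)     # "F" "M" (NA is not a level)
```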

Alternatively, you can load the strings as factors and then convert selected columns to character:

data <- read.csv("datafile.csv")

data$First <- as.character(data$First)
data$Last  <- as.character(data$Last)

# Another way: Convert two columns named "First" and "Last"
stringcols <- c("First","Last")
data[stringcols] <- lapply(data[stringcols], as.character)
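Below is a self-contained version of the lapply() conversion, with a small data frame standing in for the loaded file. It also shows a related pitfall. as.numeric() applied to a factor returns the internal level codes, not the labels, so numeric-looking factors should go through as.character() first.

```r
data <- data.frame(First  = factor(c("Currer", "Dr.")),
                   Last   = factor(c("Bell", "Seuss")),
                   Number = c(2L, 49L))

# Assigning to data[stringcols] replaces the selected columns in place
stringcols <- c("First", "Last")
data[stringcols] <- lapply(data[stringcols], as.character)
sapply(data, class)   # "character" "character" "integer"

# Pitfall: levels sort as strings ("10", "20", "5"), so codes are not values
f <- factor(c("10", "5", "20"))
as.numeric(f)                 # 1 3 2 (level codes)
as.numeric(as.character(f))   # 10 5 20 (the actual values)
```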

Importing Files from the Web

You can also load data directly from a URL. The (long) URLs below point to the example files used on this page.

data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile.csv")


# Read CSV files without headers
data <- read.csv("http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/datafile-noheader.csv", header=FALSE)

# Manual addition of headers
names(data) <- c("First","Last","Sex","Number")
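When a file has no header, read.csv() also accepts a col.names argument, which sets the names at read time instead of afterwards. To keep the sketch runnable offline it reads from inline text. With the URL above, you would pass the URL as the first argument instead.

```r
noheader_text <- '"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21'

# Supply the column names while reading instead of assigning names() later
data <- read.csv(text = noheader_text, header = FALSE,
                 col.names = c("First", "Last", "Sex", "Number"))
names(data)
```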

The data files used above are:

datafile.csv:

"First","Last","Sex","Number"
"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21

datafile-noheader.csv:

"Currer","Bell","F",2
"Dr.","Seuss","M",49
"","Student",NA,21

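The last row of these files contains a quoted empty string and an unquoted NA. The sketch below writes datafile.csv to a temporary path, which is an assumption of the example, and reads it back. It shows how read.csv() treats the two values with the default na.strings = "NA".

```r
csv_file <- tempfile(fileext = ".csv")
writeLines(c('"First","Last","Sex","Number"',
             '"Currer","Bell","F",2',
             '"Dr.","Seuss","M",49',
             '"","Student",NA,21'), csv_file)

data <- read.csv(csv_file, stringsAsFactors = FALSE)
data$First[3]        # "" (an empty string in a character column, not missing)
is.na(data$Sex[3])   # TRUE (the unquoted NA matches na.strings)
```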
Fixed-width text file

Suppose your data has fixed-width columns, like this:

  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
    ""   Student   NA     21

One way to read it is to simply use read.table() with strip.white=TRUE, which removes extra spaces.

read.table("clipboard", header=TRUE, strip.white=TRUE)
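Reading from "clipboard" depends on the platform and on what was last copied. For a reproducible sketch, the same call can take the table through the text argument. According to ?read.table, strip.white only takes effect when sep is given. With the default whitespace separator, runs of spaces already count as a single delimiter, so the call below works with or without it.

```r
fw_text <- '  First     Last  Sex Number
 Currer     Bell    F      2
    Dr.    Seuss    M     49
    ""   Student   NA     21'

# Default sep="" splits on any run of whitespace; strip.white is kept to
# mirror the call above but has no effect without an explicit sep
d <- read.table(text = fw_text, header = TRUE, strip.white = TRUE,
                stringsAsFactors = FALSE)
d
```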

However, your data file may have columns that contain spaces, or columns with no spaces separating them. In the following example, the scores column packs six different measurements together, each ranging from 0 to 3.

subject  sex  scores
   N  1    M  113311
   NE 2    F  112231
   S  3    F  111221
   W  4    M  011002

In this case, you may need to use the read.fwf() function. If you read the column names from the file, read.fwf() requires them to be separated by a delimiter such as a single tab, space, or comma. If they are separated by multiple spaces, as in this example, you will have to assign the column names directly.

# Assign the column names directly
read.fwf("myfile.txt",
         c(7,5,-2,1,1,1,1,1,1), # Column widths; -2 means skip those 2 characters
         skip=1,                # Skip the first line (it contains the header)
         col.names=c("subject","sex","s1","s2","s3","s4","s5","s6"),
         strip.white=TRUE)      # Strip leading and trailing spaces from each field
#>   subject sex s1 s2 s3 s4 s5 s6
#> 1    N  1   M  1  1  3  3  1  1
#> 2    NE 2   F  1  1  2  2  3  1
#> 3    S  3   F  1  1  1  2  2  1
#> 4    W  4   M  0  1  1  0  0  2

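# If the first line looked like this:
# subject,sex,scores
# you might try header=TRUE. This fails here, because the header supplies
# fewer names than the 8 columns the widths produce: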
read.fwf("myfile.txt", c(7,5,-2,1,1,1,1,1,1), header=TRUE, strip.white=TRUE)
#> Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names, : more columns than column names
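The read.fwf() call above can be made self-contained by recreating myfile.txt at a temporary path, which is an assumption of this sketch. The widths take 7 characters for subject and 5 for sex, skip 2, then read six 1-character score columns. Extra arguments such as strip.white and stringsAsFactors are passed on to read.table().

```r
# Recreate the example fixed-width file in a temporary location
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("subject  sex  scores",
             "   N  1    M  113311",
             "   NE 2    F  112231",
             "   S  3    F  111221",
             "   W  4    M  011002"), fwf_file)

d <- read.fwf(fwf_file,
              widths = c(7, 5, -2, 1, 1, 1, 1, 1, 1),  # negative width = skip
              skip = 1,                                # skip the header line
              col.names = c("subject", "sex", "s1", "s2", "s3", "s4", "s5", "s6"),
              strip.white = TRUE,
              stringsAsFactors = FALSE)

d$subject                      # "N  1" "NE 2" "S  3" "W  4"
rowSums(d[paste0("s", 1:6)])   # total score per subject
```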

Excel file

The read.xls function in the gdata package reads Excel files.

library(gdata)
data <- read.xls("data.xls")

For more on reading Excel spreadsheets, see http://cran.r-project.org/doc/manuals/R-data.html#Reading-Excel-spreadsheets.

For how to install packages, see Basics - Install and use R packages.

SPSS data

The read.spss function in the foreign package can read SPSS files.

library(foreign)
data <- read.spss("data.sav", to.data.frame=TRUE)

Link to the original text: http://www.cookbook-r.com/Data_input_and_output/Loading_data_from_a_file/

Posted by bigdaddysheikh on Sun, 26 May 2019 14:11:01 -0700