Importing data

You can get a cheatsheet here.

1 The process of importing data

R makes importing data quite simple, but you still have to check that each step does what you intend.

  1. Determine whether the values in the data file are separated by commas or tabs. You can do this in either of two ways:

    • Load it in your favorite text editor (not word processor) and look at it.
    • Single-click the file name in the Files tab, choose View File, and then look at it within RStudio.
  2. Import the data as described below.

  3. Use glimpse() (or another command described on the page about examining data) to determine the data type of each column.

  4. If the data types are as you would want them to be, then you are done.

  5. If the data types are not as you would want them to be, then re-import the data but add the col_types argument (as described below). Then go back to step 3.
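The steps above can be sketched end to end as follows. This is a minimal illustration, not a prescribed recipe: the file contents, the column names (id, score, section), and the temporary path are all hypothetical, and the sketch assumes the readr and dplyr packages are installed.

```r
library(readr)  # read_lines(), read_csv(), cols(), col_character()
library(dplyr)  # glimpse()

# Hypothetical sample file, written to a temporary location so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,section",
             "1,88.5,101",
             "2,92.0,102"), csv_path)

# Step 1: look at the first raw lines to see which separator is used
print(read_lines(csv_path, n_max = 2))

# Step 2: import the data
scores <- read_csv(csv_path, show_col_types = FALSE)

# Step 3: examine the data type of each column
glimpse(scores)

# Step 5: section is a label rather than a quantity, so re-import it as character
scores <- read_csv(csv_path,
                   col_types = cols(section = col_character()))

# Back to step 3: confirm that the types are now as intended
glimpse(scores)
```

Columns that are not named in cols() are still guessed by read_csv(), so only the columns that need correcting have to be listed.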

2 Importing from a CSV/TSV file

CSV means “comma-separated values” and TSV means “tab-separated values.” Apart from the name of the import function, working with one is the same as working with the other, so what we say about CSV holds equally for TSV.

A CSV file is a file (often exported from Excel or a database) containing rows of numeric and string values. You need to import the contents of this file into R so that you can manipulate and analyze it.

The following is the general structure of the command that reads a CSV file into R. It tells R to read the file my_data.csv (which must be in your working directory for the command to work exactly as written) and to store the result in my_data, a data frame that you can then manipulate with R commands in RStudio:

library(readr)  # provides read_csv(); library(tidyverse) also loads it
my_data <- read_csv("my_data.csv")

For explanatory purposes, we are going to read a file student_info.csv and put the results into st_info:

st_info <- read_csv("../data/class202405/student_info.csv",
                    show_col_types = FALSE)

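Notice that this file is not in the working directory: the path starts with ../, which moves up one directory before descending into data/class202405. The show_col_types = FALSE argument suppresses the column-specification message that read_csv() otherwise prints.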

Throughout this and other pages, we will use this st_info data frame.
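If an import fails with a "file does not exist" error, it can help to check how R resolves the relative path before calling read_csv(). The following is a small sketch using only base R; the path is the one from the example above, and whether the file exists depends on your own directory layout.

```r
# Path from the example above, relative to the working directory
data_path <- "../data/class202405/student_info.csv"

# Where R is currently working; relative paths start from here
print(getwd())

# The full path that the relative path resolves to (no error if it is missing)
print(normalizePath(data_path, mustWork = FALSE))

# TRUE if the file can be found from the working directory, FALSE otherwise
found <- file.exists(data_path)
print(found)
```

If the last line prints FALSE, either change the working directory or adjust the path before importing.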

2.1 How working with TSV files differs

For the above, substitute read_tsv() for read_csv() if you are dealing with a file with tab-separated values. Everything else is the same.
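As a minimal sketch of that substitution, the example below writes a small tab-separated file to a temporary location and reads it back. The file contents and the column names (name, year, gpa) are hypothetical and exist only to make the sketch runnable.

```r
library(readr)  # read_tsv()

# Hypothetical tab-separated sample file ("\t" is the tab character)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("name\tyear\tgpa",
             "Ana\t2\t3.7",
             "Ben\t3\t3.2"), tsv_path)

# Same call shape as read_csv(), only the function name changes
tsv_data <- read_tsv(tsv_path, show_col_types = FALSE)
print(tsv_data)
```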

2.2 Adding type information with col_types

Suppose that you have imported a table and the income column is represented as a character string instead of an integer and the zip column is represented as a double instead of a string. You would fix both of these problems with the following import statement:

my_table <- read_csv("data/this-is-my-data.csv",
                     col_types = cols(
                       income = col_integer(),
                       zip = col_character()
                     ))

Another commonly used column type is col_double(), for numbers that can have a fractional part.
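To see the effect of col_types, it can help to import the same file twice and compare the resulting column types. The sketch below does this with a hypothetical three-row file written to a temporary location; the names and values are made up for illustration. In this particular file read_csv() guesses both income and zip as doubles, so both need correcting.

```r
library(readr)  # read_csv(), cols(), col_integer(), col_character()

# Hypothetical sample file with an income column and a zip column
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,income,zip",
             "Ana,52000,90210",
             "Ben,61000,10001"), csv_path)

# First import: let read_csv() guess every column type
guessed <- read_csv(csv_path, show_col_types = FALSE)
print(sapply(guessed, class))

# Second import: state the types we actually want for income and zip
typed <- read_csv(csv_path,
                  col_types = cols(
                    income = col_integer(),
                    zip = col_character()
                  ))
print(sapply(typed, class))
```

Comparing the two printed results shows which columns changed type; the name column is left to be guessed in both imports.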

3 Importing from an Excel file

Except in unusual circumstances, it is almost always easier, more straightforward, and less error-prone to export a CSV file from Excel than to read directly from an Excel file.

For that reason, we do not cover reading Excel files directly here.