Importing data
You can get a cheatsheet here.
1 The process of importing data
R makes this process quite simple, but you still have to verify that each step works as intended.
1. Determine whether the file containing the data separates values with commas or with tabs. You can do this in either of two ways:
   - Open it in your favorite text editor (not a word processor) and look at it.
   - Single-click the file name in the Files tab, choose View File, and then look at it within RStudio.
2. Import the data as described below.
3. Use glimpse() (or another command described on the page about examining data) to determine the data type of each column. If the data types are what you want them to be, you are done.
4. If the data types are not what you want them to be, re-import the data, this time adding the col_types argument (as described below). Then go back to step 3.
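The four steps can be sketched in R as follows. This is a minimal, self-contained sketch under stated assumptions: the readr and tibble packages are installed (library(tidyverse) loads both), and the file it reads is a small hypothetical table (the columns id, income, and zip and their values are invented for illustration) written to a temporary location. It imports the file, inspects the guessed types with glimpse(), and re-imports with col_types when a type is not what we want.

```r
library(readr)   # read_csv(), cols(), col_integer(), col_character()
library(tibble)  # glimpse(); also loaded by library(tidyverse)

# Hypothetical data written to a temporary file so the sketch runs anywhere.
# "n/a" is not one of readr's default missing-value markers ("" and "NA"),
# so the whole income column is guessed as character.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,income,zip",
  "1,52000,02134",
  "2,n/a,10001",
  "3,61000,94110"
), path)

# Step 2: import and let readr guess each column's type.
guessed <- read_csv(path, show_col_types = FALSE)

# Step 3: inspect the guessed types.
glimpse(guessed)

# Step 4: re-import with explicit types for the problem columns.
# zip is usually guessed as a number, which drops the leading zero of 02134;
# treating "n/a" as missing lets income be parsed as an integer.
fixed <- read_csv(path,
                  col_types = cols(
                    income = col_integer(),
                    zip = col_character()
                  ),
                  na = c("", "NA", "n/a"))

# Back to step 3: check the types again.
glimpse(fixed)
```

Declaring only the problem columns in cols() leaves readr to keep guessing the rest (here, id).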
2 Importing from a CSV/TSV file
CSV means “comma-separated values” and TSV means “tab-separated values.” Working with one is exactly the same as working with the other, so what we say about CSV holds equally for TSV.
A CSV file is a plain-text file (often exported from Excel or a database) containing rows of numeric and string values. You need to import the contents of this file into R so that you can manipulate and analyze it.
The following is the general structure of the command that reads a CSV file into R. It tells R to read the file my_data.csv (which must be in your working directory if you use this command exactly as written!) and put the result into my_data, which you can then manipulate with R commands in RStudio:
my_data <- read_csv("my_data.csv")
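To see why the working directory matters, the sketch below creates a scratch directory, writes a small hypothetical my_data.csv into it (the name and score columns are invented), makes that directory the working directory, and then runs the command exactly as written. It assumes readr is installed; read_csv() comes from readr, which library(tidyverse) also loads. The original working directory is restored at the end.

```r
library(readr)

# Scratch directory containing a small, hypothetical my_data.csv.
scratch <- file.path(tempdir(), "import-demo")
dir.create(scratch, showWarnings = FALSE)
writeLines(c("name,score", "Ana,91", "Ben,78"),
           file.path(scratch, "my_data.csv"))

old_wd <- setwd(scratch)    # setwd() returns the previous working directory
getwd()                     # the directory R resolves "my_data.csv" against
file.exists("my_data.csv")  # TRUE here, so the bare file name works

my_data <- read_csv("my_data.csv", show_col_types = FALSE)
my_data

setwd(old_wd)               # restore the original working directory
```

If file.exists() returns FALSE for your own file, either change the working directory or give read_csv() a path that leads to the file.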
For explanatory purposes, we are going to read a file student_info.csv and put the results into st_info:
st_info <- read_csv("../data/class202405/student_info.csv",
                    show_col_types = FALSE)
Notice that this file is not in the working directory: the path begins with ../, which tells R to go up one level from the working directory and then down into data/class202405/.
Throughout this and other pages, we will use this st_info data frame.
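The sketch below rebuilds a layout like the one the path above implies, inside a temporary directory, so the same relative path can be tried. The work folder name and the columns of student_info.csv are hypothetical. Because show_col_types = FALSE hides readr's column-type message, the sketch also calls readr's spec() to display the column specification afterward. It assumes readr is installed.

```r
library(readr)

# Hypothetical layout:  <root>/work/                       (working directory)
#                       <root>/data/class202405/student_info.csv
root <- file.path(tempdir(), "course-demo")
dir.create(file.path(root, "work"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data", "class202405"),
           recursive = TRUE, showWarnings = FALSE)
writeLines(c("student_id,name,year", "101,Ana,2", "102,Ben,3"),
           file.path(root, "data", "class202405", "student_info.csv"))

old_wd <- setwd(file.path(root, "work"))

# "../" climbs from work/ to <root>/, then descends into data/class202405/.
st_info <- read_csv("../data/class202405/student_info.csv",
                    show_col_types = FALSE)

# The guessed column types are still recorded even though the message was hidden.
spec(st_info)

setwd(old_wd)
```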
2.1 How working with TSV files differs
If your file has tab-separated values, substitute read_tsv() for read_csv() in the commands above. Everything else is the same.
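As a quick check that only the function name changes, the sketch below writes the same small hypothetical table once with commas and once with tabs, reads each file with the matching function, and compares the results. It assumes readr is installed.

```r
library(readr)

# Hypothetical rows: a header plus two records.
rows <- list(c("name", "score"), c("Ana", "91"), c("Ben", "78"))

csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(vapply(rows, paste, character(1), collapse = ","), csv_path)
writeLines(vapply(rows, paste, character(1), collapse = "\t"), tsv_path)

from_csv <- read_csv(csv_path, show_col_types = FALSE)
from_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# Compare column by column, which sidesteps the metadata readr attaches
# to each result (such as its column specification).
identical(names(from_csv), names(from_tsv))
all(mapply(identical, from_csv, from_tsv))
```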
2.2 Adding type information with col_types
Suppose that you have imported a table in which the income column was read as a character string instead of an integer, and the zip column was read as a double instead of a string. You would fix both problems with the following import statement:
my_table <- read_csv("data/this-is-my-data.csv",
                     col_types = cols(
                       income = col_integer(),
                       zip = col_character()
                     ))
Another commonly used column type is col_double(), for numbers that can have a fractional part.
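The sketch below applies all three column types to a small hypothetical table written to a temporary file (the income, zip, and rate columns and their values are invented). By default readr guesses whole numbers as doubles and would typically turn zip codes into numbers, so col_integer() and col_character() change what you get; col_double() states the type of rate explicitly. It assumes readr is installed.

```r
library(readr)

# Hypothetical table; the zip codes have leading zeros worth keeping.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "income,zip,rate",
  "52000,02134,0.25",
  "61000,01002,0.31"
), path)

my_table <- read_csv(path,
                     col_types = cols(
                       income = col_integer(),  # whole numbers stored as integers
                       zip = col_character(),   # keeps "02134" rather than 2134
                       rate = col_double()      # numbers with a fractional part
                     ))

# Class of each column: income "integer", zip "character", rate "numeric".
sapply(my_table, class)
```

readr also accepts a compact string, so col_types = "icd" would request the same three types in column order; the cols() form is easier to read when only some columns need fixing.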
3 From an Excel file
Except in unusual circumstances, it is almost always easier, more straightforward, and less error-prone to export a CSV file from Excel than to read directly from an Excel file.
For that reason, we do not cover reading Excel files here.