Finding data

1 Built-in datasets

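R has many built-in datasets that you can experiment with. They live in the datasets package, which is part of the base installation, so you do not need to install it. (There is no need to run install.packages("datasets"); datasets is a base package and is not installed from CRAN.)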

If you want to confirm in RStudio that datasets is available, you can do so as follows:

  1. Look in the bottom right panel of RStudio.
  2. Click on the Packages tab.
  3. At the top of this list of packages are the User Library packages. Scroll down that tab until you get to the end of that list.
  4. You should now see the System Library packages.
  5. In this set of packages, you should see datasets.
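  6. If its checkbox is ticked, the package is already loaded, so leave it alone. If it isn’t, tick it to load the package. (Ticking the box loads a package that is already installed; it does not install anything.)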
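
The same check can be done from the console instead of the Packages tab. This is a minimal sketch using only base R functions; the variable names are illustrative.

```r
# Is the datasets package installed in any library on this machine?
installed <- "datasets" %in% rownames(installed.packages())

# Is it currently attached to the search path (that is, loaded for use)?
attached <- "package:datasets" %in% search()

print(c(installed = installed, attached = attached))
```

In a default R session you should expect both values to be TRUE, since datasets is one of the packages R normally attaches at startup.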

Now, go back to the R Console. R normally attaches datasets automatically at startup, so the following call is optional, but running it does no harm:

library("datasets")

You now have lots of datasets that you can experiment with. But which datasets? Execute the following to find out:

data()
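
Once you know a dataset's name, you can use it directly. The sketch below uses mtcars, one of the datasets shipped in the datasets package, purely as an illustrative choice; any other name listed by data() should work the same way.

```r
# List only the datasets that ship with the datasets package
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Built-in datasets are available by name; no import step is needed
head(mtcars)   # first six rows
str(mtcars)    # column names and types
dim(mtcars)    # number of rows and columns

# Each dataset has its own help page describing its variables:
# ?mtcars
```

Calling data() with no arguments lists datasets from every attached package, which is why the package argument is used here to narrow the list.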

If you want to get more detail on all of these datasets, then go to this page.

2 Public data sets on the web

2.1 Data.gov

This is the “home of the U.S. government’s open data,” and there is a lot of it. This page has their most popular data sets. This link points to popular CSV-formatted data sets from the federal government; there were over 4,700 of them in September 2023.

If you’re looking for some interesting data to play with, this isn’t a bad place to start.
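
Once you have downloaded a CSV file from a site like this, base R can read it with read.csv(). The sketch below is self-contained: it writes a tiny stand-in file to a temporary location so that it runs as-is. The file contents, column names, and variable names are made up for illustration; in practice you would pass the path of the file you actually downloaded.

```r
# Create a small stand-in CSV (hypothetical data, for illustration only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("state,year,value",
             "OH,2021,10.5",
             "OH,2022,11.2",
             "PA,2021,8.9"),
           csv_path)

# Read the file into a data frame; replace csv_path with your own file's path
downloaded <- read.csv(csv_path, stringsAsFactors = FALSE)

str(downloaded)       # check that column types were inferred sensibly
head(downloaded)      # look at the first few rows
summary(downloaded)   # quick summaries of each column
```

Running str() first is a cheap way to catch columns that were read as text when you expected numbers.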

2.2 Kaggle

Kaggle has been around since 2010. It is a gathering place for people interested and active in machine learning or artificial intelligence. As such, it has over 250,000 datasets available for download and exploration.

Those datasets are available for exploration on this page. As of September 2023, over 7,000 CSV datasets had the highest possible usability rating, while over 18,000 had a usability rating of 8.0 or higher.

2.3 Google datasets

As you might expect from Google, this search tool is meant to provide a single search location for datasets available across the Web. As of February 2023, Google states that it provides access to over 45 million datasets.

The best way to learn about this tool is to play around with it. Here is what I got when I typed in “global temperature” and then set the “Download format” to “Tabular”. Notice the filtering tools across the top and the datasets available down the left side of the screen.