Importing Data

Introduction

The data that we work with comes in many forms from many different sources. However, all of them can be easily imported into R, where they can be combined (if necessary) and analyzed. This resource covers the most common ways in which we can import data for our visualization purposes.

Note Some of these techniques will require you to install a package the first time that you try it out. The code for installing a package is as follows:

Code in R
install.packages("NAMEOFPACKAGE")

Static files

Static files are the easiest to input. As long as the file that you are entering into R is in the same folder as your project, you will not have to enter any long folder structure.

Standard .csv files

Code in R
library(tidyverse)
NameofDataset <- read_csv("NameofFile.csv")

read_csv is a powerful function from the tidyverse package. Some extra features as follows:

  • skip - Number of lines to skip before reading data. This is useful if the file has a header that you don’t need
  • col_types - R, like Excel, will sometimes make a best guess at what type of data is being inputted. Unlike Excel, however, you can tell R what to do; this is most useful when you have numbers with a leading 0 that are supposed to be ID’s. In this case, if you input read_csv("FILENAME.csv", col_types="c"), it will make sure that all columns are first read in as characters (which you can convert to numbers later on)

Excel files

Code in R
library(readxl)
NameofXLDataset <- read_xlsx("NameofFile.xlsx", sheet = "Sheet1", skip=0, col_types = "text")

Here, we rely on the readxl package. Note that this call would read in the sheet of the Excel file called Sheet 1, would not skip any rows, and would read in each column as a text field.

RData files (internal to Research Team)

R has its own specialized data type. While not useful externally, many of the internal scripts from the Research Team take advantage of this data type; it’s main strength is that you can save many files of different types as one RData file. For instance, you could save the dataframe and graph for a file in one place to share.

The call to load doesn’t require any library. It will automatically load the dataframes in the same way that they showed up in R before.

Code in R
load("FileName.RData")

Google Sheets

Reading Google Sheets into R is pretty easy thanks to the googlesheets4 package. We will use it to input the sample dataset found here.

If you need to, install the googlesheets4 package as follows.

Code in R
## Only run this once
install.packages("googlesheets4")

Then, you should be able to read in the sheet as follows:

  1. Look at the Share and Copy Link URL for the sheet.

  2. Copy the link within the read_sheet function of googlesheets4. Example below will give you the titanic dataset.

Code in R
library(googlesheets4)
titanic <- read_sheet("https://docs.google.com/spreadsheets/d/1BP9w6LHEx5z5FDRgVO_Q3mU8MMnU1yCEtiAU4RhBlEY/edit?usp=sharing", skip = 1)

## Here, I purposefully used the `skip` command to remove the header seen in the Google Sheet.

Note that, when you run this, your console will ask you to authorize your Google account. Once you have done that, it will pull in the dataset in the background, and will store the credentials for future work in R.

Salesforce

Installation and Authentication

In a similar manner to google sheets, gathering data from salesforce requires you to install a special package named salesforcer

Code in R
install.packages("salesforcer")

## Only run to install the package the first time you need it

Once you have done that, you’ll need to authenticate yourself to make sure that R knows who you are in salesforce. The easiest way to do that is to log into salesforce in your internet browser, and then run the following in R.

Code in R
library(salesforcer)
sf_auth()

Running a report

The easiest way to get data from Salesforce to R is through the sf_run_report command, which will take a report from Salesforce and download it as a dataset.

Steps to import a dataset into Salesforce as follows (note that the links and code below will only work for EL employees with the correct Salesforce Credentials):

  1. Log into Salesforce and make or find a report that you are interested in analyzing. One that I return to from time to time is this report on our current opportunities with partners.
    Note: R likes tidy data sets (where each row is a separate piece of data). If your report has group rows, then you will need to remove them before you go on to the next step.
  2. Save the report (if you made it or edited it). You will need the ID for the report - this can be found in the URL for the report, as seen below.
  3. Run the following code in R, taking the Report ID that you are using.
Code in R
SalesforceData <- sf_run_report("00OQQ000001k6uf2AA")

Surveys

Other than Dimensions, where dataframes must be pulled manually from the back-end, we also gather data using two other survey platforms that have packages built in R for automated access. Each one allows you to find and pull survey data all from within R.

As a note, the data that each survey platform pulls may not be tidy as regards your use of the data, as each row is a survey entry, as opposed to an answer to a singular question. Understanding how pivots work will allow you to easily reframe the data if necessary for analysis and visualization.

Note - the instructions for each platform require you to have an access token or other method of authorization within Survey Monkey or Qualtrics. Please see someone on the research team to discuss your data needs and gain access.

Survey Monkey

The svmkR package was developed by Survey Monkey as an interface for R. It can technically create surveys as well as download them, but we are far more interested in using it to grab information from surveys that already exist.

Steps to use as follows:
1. Run the following code with the token provided by someone on the Research Team. You only need to run this once, as it will install the package and provide the access token.

Code in R
## If the following does not run, it means that you also need to install devtools using
## install.packages("devtools")

devtools::install_github("soubhikbarari/svmkR")
usethis::edit_r_profile()
## this will open up an R Profile where you can save your passwords. Paste the following inside this and save:
options(sm_oauth_token = "ACCESSTOKEN")
  1. The following code will grab all of the surveys that exist within surveymonkey. Find the ID for the survey you are interested in.
Code in R
library(svmkR)
surveys <- browse_surveys(200) # see your most recent 200 surveys
  1. Once you have that, you can download the survey of your choice by running the following.
Code in R
SurveyMonkeySurvey <- fetch_survey_obj(OBJECTIDNUMBER) %>%
  parse_survey()

To learn more about this package, see the documentation here

Qualtrics

The qualtRics package was developed by the rOpenSci community in order to foster more reproducible and open systems for data analysis. In order to use it, you will need a token that is particular to your account in Qualtrics. You can find and use it by doing the following:

  1. After logging into EL’s Qualtrics portal, click on your initals in the top right to access “Account Settings”. If you scroll down, you should see “Token” with a long string of characters (and if not, you can click “Generate Token” to create one). You will need that string of characters below.

  2. Run the following to install the package and set the token.

Code in R
install.packages("qualtRics")
library(qualtRics)

qualtrics_api_credentials(api_key = "TOKENFROMSTEP1", 
                          base_url = "eleducation.qualtrics.com",
                          install = TRUE)

## Only run the following if you immediately wish to use the Qualtrics platform - it will reload R using the credentials that were just placed there
readRenviron("~/.Renviron")
  1. As with Survey Monkey, to see all surveys in Qualtrics and their current status run the following:
Code in R
surveys <- all_surveys()
  1. Find the id for the survey you are interested in:
Code in R
Qualtricssurvey <- fetch_survey("id of survey")
  1. When you look at this survey, you will notice that the questions are all in a subheader. In order to pull them out, you will need a different data file, which you can pull in later when wrangling your data (probably through a join)
Code in R
questions <- extract_colmap(Qualtricssurvey)