R Basics / Course Business ! We’ll be using a sample dataset in class today: ! ! ! CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats background still available. Thanks to everyone who’s responded! R Basics R Basics R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help R Commands ! Simplest way to interact with R is by typing in commands at the > prompt: R STUDIO R R as a Calculator ! Typing in a simple calculation shows us the result: ! ! ! 608 + 28 What’s 11527 minus 283? Some more examples: ! ! ! 400 / 65 2 * 4 5 ^ 2 (division) (multiplication) (exponentiation) Functions ! More complex calculations can be done with functions: sqrt(64) ! What the function is (squareroot) Can often read these left to right (“square root of 64”) ! ! In parenthesis: What we want to perform the function on What do you think this means? ! abs(-7) Arguments ! ! Some functions have settings (“arguments”) that we can adjust: round(3.14) - ! Rounds off to the nearest integer (zero decimal places) round(3.14, digits=1) - One decimal place Nested Functions Nested Functions ! We can use multiple functions in a row, one inside another - - sqrt(abs(-16)) “Square root of the absolute value of -16” Don't get scared when you see multiple parentheses! - Can often just read left to right - R first figures out the thing nested in the middle • Can you round off the square root of 7? ! Using Multiple Numbers at Once ! When we want to use multiple numbers, we concatenate them c(2,6,16) ! A list of the numbers 2, 6, and 16 Sometimes a computation requires multiple numbers - ! - ! mean(c(2,6,16)) Also a quick way to do the same thing to multiple different numbers: - sqrt(c(16,100,144)) R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Course Documents: Sample Data: Week 2 ! ! Reading plausible versus implausible sentences “Scott chopped the carrots with a knife.” Measure reading time on final word “Scott chopped the carrots with a spoon.” Note: Simulated data; not a real experiment. Course Documents: Sample Data: Week 2 ! ! ! ! ! Reading plausible versus implausible sentences Reading time on critical word 36 subjects Each subject sees 30 items (sentences) —half plausible, half implausible Interested in changes over time, so we’ll track serial position (trial 1 vs trial 2 vs trial 3…) Reading in Data ! Make sure you have the dataset at this point if you want to follow along: Course Documents " Sample Data " Week 2 Reading in Data – RStudio ! Navigate to the folder in lower-right More -> Set as Working Directory ! ! Open a “comma-separated value” file: - experiment <-read.csv('week2.csv') Name of the “dataframe” we’re creating (whatever we want to call this dataset) read.csv is the function name File name Reading in Data – Regular R ! Read in a “comma-separated value” file: - experiment <- read.csv('/Users/ scottfraundorf/Desktop/week2.csv') Name of the “dataframe” we’re creating (whatever we want to call this dataset) Folder & file name read.csv is the function name • Drag & drop the file into R to get the full folder & filename Looking at the Data: Summary ! A “big picture” of the dataset: summary(experiment) ! ! summary() is a very important function! ! ! Basic info & descriptive statistics Check to make sure the data are correct Looking at the Data: Summary ! A “big picture” of the dataset: summary(experiment) ! ! We can use $ to refer to a specific column/variable in our dataset: ! summary(experiment$ItemName) Looking at the Data: Raw Data ! Let’s look at the data! experiment ! ! Ack! That’s too much! How about just a few rows? ! ! head(experiment) head(experiment, n=10) Reading in Data: Other Formats ! Excel: - - ! library(gdata) experiment <- read.xls('/Users/ scottfraundorf/Desktop/week2.xls') SPSS: - - library(foreign) experiment <- read.spss('/Users/ scottfraundorf/Desktop/ week2.spss', to.data.frame=TRUE) R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help R Scripts ! Save & reuse commands with a script R File -> New Document R STUDIO R Scripts ! ! Run commands without typing them all again R Studio: ! ! ! R: - - ! Code -> Run Region -> Run All: Run entire script Code -> Run Line(s): Run just what you’ve highlighted/selected Highlight the section of script you want to run Edit -> Execute Keyboard shortcut for this: - Ctrl+Enter (PC), ⌘+Enter (Mac) R Scripts ! Saves times when re-running analyses ! Other advantages? Some: - Documentation for yourself - Documentation for others - Reuse with new analyses/experiments - Quicker to run—can automatically perform one analysis after another ! R Scripts—Comments Add # before a line to make it a comment - Not commands to R, just notes to self (or other readers) ! Can also add a # to make the rest of a line a comment • • summary(experiment$Subject) #awesome R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Descriptive Statistics ! Remember how we referred to a particular variable in a dataframe? - ! Combine that with functions: - - - ! $ mean(experiment$RT) median(experiment$RT) sd(experiment$RT) Or, for a categorical variable: - - levels(experiment$ItemName) summary(experiment$Subject) Descriptive Statistics ! We often want to look at a dependent variable as a function of some independent variable(s) tapply(experiment$RT, experiment $Condition, mean) - “Split up the RTs by Condition, then get the mean” - ! ! ! Try getting the mean RT for each item How about the median RT for each subject? To combine multiple results into one table, “column bind” them with cbind(): ! cbind( tapply(experiment$RT, experiment$Condition, mean), tapply(experiment$RT, experiment$Condition, sd) ) Descriptive Statistics ! Can have 2-way tables... - tapply(experiment$RT, list(experiment$Subject, experiment$Condition), mean) 1st variable is rows, 2nd is columns ...or more! - ! - tapply(experiment$RT, list(experiment$ItemName, experiment$Condition, experiment $TestingRoom), mean) Descriptive Statistics ! Contingency tables for categorical variables: - xtabs (~ Subject + Condition, data=experiment) R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Subsetting Data ! ! Often, we want to examine or use just part of a dataframe Remember how we read our dataframe? - ! experiment <- read.csv(...) Create a new dataframe that's just a subset of experiment: - experiment.LongRTsRemoved <subset(experiment, RT < 2000) New dataframe name Original dataframe Inclusion criterion: RT less than 2000 ms Subsetting Data: Logical Operators ! Try getting just the observations with RTs 200 ms or more: experiment.ShortRTsRemoved <subset(experiment, RT >= 200) - ! Why not just delete the bad RTs from the spreadsheet? ! ! ! ! Easy to make a mistake / miss some of them Faster to have the computer do it We’d lose the original data No documentation of how we subsetted the data Subsetting Data: AND and OR What if we wanted only RTs between 200 and 2000 ms? - Could do two steps: ! experiment.Temp <subset(experiment, RT >= 200) - experiment.BadRTsRemoved <subset(experiment.Temp, RT <= 2000) ! One step with & for AND: - experiment2 <- subset(experiment, RT >= 200 & RT <= 2000) - Subsetting Data: AND and OR ! ! What if we wanted only RTs between 200 and 2000 ms? One step with & for AND: - ! experiment2 <- subset(experiment, RT >= 200 & RT <= 2000) | means OR: - - experiment.BadRTs <subset(experiment, RT < 200 | RT > 2000) Logical OR (“either or both”) Subsetting Data: == and != ! Get a match / equals: - ! Words/categorical variables need quotes: - ! experiment.FirstTrials <subset(experiment, SerialPosition == 1) Note DOUBLE equals sign experiment.ImplausibleSentences <subset(experiment, Condition=='Implausible') != means “not equal to”: - experiment.BadSubjectRemoved <subset(experiment, Subject != Drops subject “S23” 'S23') Subsetting Data: %in% ! ! ! Sometimes our inclusion criteria aren't so mathematical Suppose I just want the “Ducks”, “Hornet”, and “Panther” items We can check against any arbitrary list: - ! experiment.SpecialItems <subset(experiment, ItemName %in% c('Ducks', 'Hornet', 'Panther')) Or, keep just things that aren't in a list: - experiment.NonNativeSpeakersRemoved <- subset(experiment, Subject %in% c('S10', 'S23') == FALSE) Logical Operators Review ! Summary - - - - - - - - - > >= < <= & | == != %in% Greater than Greater than or equal to Less than Less than or equal to AND OR Equal to Not equal to Is this included in a list? R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Assignment Remember the pointing arrow used to create dataframes and subsets? - e.g., experiment <- read.csv(...) ! This is the assignment operator. It saves results or values in a variable ! - - x <- sqrt(64) CriticalTrialsPerSubject <- 30 Remember, typing a name by itself shows you the current value: - - ! CriticalTrialsPerSubject Assigning a new value overwrites the old Assignment ! We can use this to create new columns in our dataframe: - - ! experiment$ExperimentNumber <- 1 Here, the same number (1) is assigned to every trial Or, compute a value for each row: - - - experiment$LogSerialPosition <log(experiment$SerialPosition) For each trial, takes the log serial position and saves that into LogSerialPosition Similar to an Excel formula ifelse() IF YOU WANT DESSERT, EAT YOUR PEAS … OR ELSE! ifelse() ! ! ! ! ifelse(): Use a test to decide which of two values to assign: Function name experiment$Half <- ifelse( experiment$SerialPosition <= 15, If serial position IS <= 15… “Half” is 1 1, If it’s NOT, “Half” is 2 2) Possible to nest ifelse() if we need more than 2 categories: experiment$Third <ifelse(experiment$SerialPosition <= 10, 1, ifelse(experiment $SerialPosition <= 20, 2, 3)) ifelse() ! ! ! Instead of specific numbers, can use other columns or a formula: experiment$RT.Fixed <- ifelse( experiment$TestingRoom==2, experiment$RT + 100, experiment$RT) How can we check if this worked? ! head(experiment) Fixed RTs are indeed 100 ms longer for TestingRoom 2 Which do you like better? - experiment$Half <-ifelse(experiment $SerialPosition <= 15, 1, 2) - ! Shorter & faster to write vs: - CriticalTrialsPerSubject <- 30 - - - - experiment$Half <- ifelse(experiment $SerialPosition <= (CriticalTrialsPerSubject / 2) , 1, 2) Explains where the 15 comes from—helpful if we come back to this script later We can also refer to CriticalTrialsPerSubject variable later in the script & this ensure it’s consistent Easy to update if we change the number of critical trials R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Referring to Specific Cells ! So far, we’ve seen how to Create a new dataframe that’s a subset of an existing dataframe Modify a dataframe by creating an entire column - - ! What if we want to modify a dataframe by adjusting some existing values? ! ! ! e.g., replace all RTs above 2000 ms with the number 2000 (“fencing”) Creating a new subset won’t work because we want to change the original dataframe Need a way to edit specific values Referring to Specific Cells ! Use square brackets [ ] to refer to specific entries in a dataframe: Row, column experiment[3,7] - - ! Omit the row or column number to mean all rows or all columns: experiment[3,] Row 3, all columns - experiment[,4] All rows in column 4 ! Can also use column names: - ! ! experiment[,'RT'] All rows in the RT column Remember c()? We can check multiple rows: - experiment[c(1:4),] Logical Indexing ! We can look at rows or columns that meet a specific criterion... - ! experiment[experiment$RT < 200,] Can use this as another way to subset: experiment.ShortRTsRemoved <experiment[experiment$RT > 200, ] - Actually, subset() just does this - ! But we can also set values this way - - experiment[experiment$RT < 200, 'RT'] <- 200 In the dataframe experiment, find the rows where RT < 200, and set the column RT to 200 R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Types ! R treats continuous & categorical variables differently: These are different data types: - Numeric - Factor: Variable w/ fixed set of categories (e.g., treatment vs. placebo) - Character: Freely entered text (e.g., open response question) ! Types R's heuristic when reading in data: - Letters anywhere in the column → factor - No letters, purely numbers → numeric ! Type Conversion: Numeric → Factor Sometimes we need to correct this - Room 4 is not “twice as much” Room 2 ! ! Create a new column that's the factor (categorical) version of TestingRoom: - ! experiment$Room.Factor <as.factor(experiment$TestingRoom) Or, just overwrite the old column: - experiment$TestingRoom <as.factor(experiment$TestingRoom) Conversion: Character → Factor When ifelse() results in words, R creates a character variable rather than a factor - Need to convert it ! Wrong: ! - ! experiment$FaveRoom <ifelse(experiment$TestingRoom==3, 'My favorite room', 'Not favorite') Right: - experiment$FaveRoom <as.factor(ifelse(experiment $TestingRoom== 3, 'My favorite room', 'Not favorite')) Type Conversion: Factor → Numeric ! To change a factor to a number, need to turn it into a character first: - experiment$Age.Numeric <as.numeric(as.character(experiment$ Age)) R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help NA ! ! We might have run into some problems trying to change Age into a numerical variable... NA means “not available”... - - - Characters that don't convert to numbers Missing data in a spreadsheet Invalid computations NA ! If we try to do computations on a set of numbers where any of them is NA, we get NA as a result... - ! sd(experiment$Age.Numeric) R wants you to think about how you want to treat these missing values NA – Solutions ! To ignore the NAs when doing a specific computation, use na.rm=TRUE: - ! To get a copy of the dataframe that excludes all rows with an NA (in any column): - ! mean(experiment$Age.Numeric,na.rm=TRUE) experiment.NoNAs <- na.omit(experiment) Change NAs to something else with logical indexing: - experiment[is.na(experiment$ Age.Numeric)==TRUE, ]$Age.Numeric <- 23 R Basics ! ! ! ! ! ! ! ! ! ! R commands & functions Reading in data Saving R scripts Descriptive statistics Subsetting data Assigning new values Referring to specific cells Types & type conversion NA values Getting help Getting Help ! Get help on a specific known function: - - ! ?sqrt ?write.csv Try to find a function on a particular topic: - ??logarithm External resources: - - ling-R-lang-L < https://mailman.ucsd.edu/mailman/listinfo/ling-r-lang-l > - Mailing list for using R in language research Wrap-Up ! Can use R for: Reading in data Descriptive statistics Subsetting data Creating new variables - - - - ! Additional practice: Baayen (2008) p. 20 - - • Use commands on pp. 1-2 to get the data Next week: Mixed effects models! • Baayen, Davidson, & Bates (2008): Introduced mixed effects models to cog psych. Covers fixed and random effects
© Copyright 2025 Paperzz