Outline

Installation
Types
Basic operations with numbers
Vectors
Matrices
Data frames
Reading/writing ASCII files
Descriptive statistics
Plots
Conditional statements
Loops
Functions
Libraries
Distributions

Installation

You can install R and R libraries, and find many materials and manuals, at the R website.

To install R, follow the instructions under Getting Started. Once R is installed, you should find the R icon among your programs. Click on the icon to open the R console.

Back to Outline

Types

R supports several types of variables. The basic ones are: logical (TRUE/FALSE), integer, numeric (double-precision, used for real numbers), character (used to store text), and factor (reserved for variables that can take on a limited set of values, e.g., ethnicity). The following example illustrates the creation of, and basic operations with, these types of variables.

  # numeric
  x=1.1
  str(x)
  class(x)
  
  # integer
  x=1
  class(x) # by default a numeric type was created but we can coerce it to integer
  x=as.integer(x)
  class(x)
  
  # logical
  x= 1.1 >2 
  x
  class(x)
  !x  # the exclamation mark returns the negation of the logical value
  isTRUE(x)
  isTRUE(!x)
  
  # character
  x='hello' # you can use either single or double quotes to create a character
  class(x)
  print(x)
  show(x)
  x="hello"
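Factors are listed among the basic types but are not used in the example. A minimal sketch follows; the category labels are made up for illustration and are not course data.

```r
# factor (hypothetical labels, for illustration only)
grp <- factor(c("A", "B", "A", "C", "B", "A"))
class(grp)        # "factor"
levels(grp)       # the limited set of values the variable can take
table(grp)        # counts per level
as.integer(grp)   # internally, each value is stored as an integer code
```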

Back to Outline

Basic Operations with `numeric` and `integer`

 x=2
 x+10
 x-10
 x*4
 x^2
 sqrt(x)
 log(x) # natural log
 log(100,base=10)

Back to Outline

Vectors

The following code shows how to create vectors, subset them (i.e., extract single or multiple elements) and modify them (replacement).

  x=c(1,10,15,100)
  x[3] # extracting one element
  x[3]=99 # replacing one element
  x[-3] # `-` can be used to extract all but some entries
  
  # Sequence
  x=1:10 # creates a sequence from 1 to 10
  x
  x[3]=1000
  x
  
  # Indexing and replacement can also be done with TRUE/FALSE
  x=1:4
  x[c(TRUE,FALSE,FALSE,FALSE)]
 
  # Vectors can be of any type
  x=c("a","b","hello")
  x
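In practice the TRUE/FALSE index is usually produced by a comparison rather than typed by hand. A short sketch of logical indexing and replacement; the threshold 12 and the name `big` are arbitrary choices for illustration.

```r
x <- c(1, 10, 15, 100)
big <- x > 12     # logical vector, one TRUE/FALSE per element
big
x[big]            # extracts the elements where the condition holds
x[big] <- 0       # replaces those same elements
x
```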

Back to Outline

Matrices

A matrix is a two-dimensional array that holds values of the same type (e.g., numeric, logical). The following code illustrates how to create, subset and modify a matrix. Matrix operations will be covered in the course.

  x1=1:10
  x2=11:20
  x3=21:30
  
  X=cbind(x1,x2,x3) # Binds columns
  dim(X)
  nrow(X)
  ncol(X)
  X
  
  ## Subsetting 
  X[1,] # returns the first row
  X[,2] # returns the second column
  X[1:2,2:3] # returns the block defined by rows 1 and 2 and columns 2 and 3
  
  ## Replacement
  X[2,3]=1000
  X
  
  ## Try: Z=rbind(x1,x2,x3); dim(Z)
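Because a matrix holds a single type, binding columns of different types forces R to convert them to a common type. A small sketch of that behavior, with made-up values:

```r
M <- cbind(1:3, c("a", "b", "c"))  # integer column bound to a character column
dim(M)
is.character(M)   # TRUE: the numbers were converted to text
M[1, 1]           # "1", a character string, not the number 1
```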

Data Frames

Vectors and matrices can store data of a single type (e.g., numeric, integer, character). In statistics we often need data tables that store variables of different types. For instance, we may want to store in a single data table: sex ("M"/"F", stored as character), age and weight (both numeric). We can do this using data frames. Strictly speaking, data.frames are lists; however, unlike the general list, data.frames are two-dimensional arrays, much like matrices, with the flexibility that they can store different types in the columns.

Back to Outline

   N=100
   x1=sample(c("F","M"),size=N,replace=TRUE)
   x2=runif(min=25,max=60,n=N) # samples N values from a uniform distribution with support on [25,60]
   DATA=data.frame(sex=x1,age=x2)
   DATA$height=ifelse(DATA$sex=="F",170,175)+rnorm(n=N,sd=sqrt(40)) # adding a new variable can be done this way
   
   head(DATA)    # prints the first rows of the data to the screen
   tail(DATA)    # prints the last rows of the data to the screen
   str(DATA)     # tells you the structure (class, dimensions) of the object
   fix(DATA)     # opens the data frame in a spreadsheet-like editor
   summary(DATA) # most objects in R have a summary method, note summaries depend upon the type.
   
   ## Indexing  
   DATA[,1]
   DATA$sex  # you can index by variable name, same for replacement.
   
   DATA[1,1]
   DATA$sex[1]
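The statement that data frames are lists with a two-dimensional layout can be checked directly. A minimal sketch on a small, made-up data frame (the name `df` and its values are assumptions for illustration):

```r
df <- data.frame(sex = c("F", "M", "F"),
                 age = c(30.5, 41, 27),
                 stringsAsFactors = FALSE)

is.list(df)         # TRUE: a data frame is a list of columns
dim(df)             # rows and columns, as for a matrix
sapply(df, class)   # each column keeps its own type

df[["age"]]         # list-style extraction of one column
females <- df[df$sex == "F", ]  # matrix-style row subsetting with a logical index
females
```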

Writing/reading ASCII files

  # Writing
   write.table(DATA,file='DATA.txt') # writes the data to an ASCII file
   list.files(pattern='\\.txt$') # lists the files in the current folder whose names end in .txt
  
  # Reading
   DATA2=read.table('DATA.txt',header=TRUE) # you can add sep="," or sep="\t" for comma- and tab-separated files, respectively
   head(DATA)
   head(DATA2)
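As a sketch of the `sep` argument in use, the following writes and reads a comma-separated file. It uses a temporary file and a small made-up data frame so that it does not touch the working folder; the names `d`, `d2` and `tmp` are illustrative.

```r
d <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.25))
tmp <- tempfile(fileext = ".csv")   # temporary file path

write.table(d, file = tmp, sep = ",", row.names = FALSE)  # comma-separated, no row names
d2 <- read.table(tmp, header = TRUE, sep = ",")           # read it back with the same separator

d2
unlink(tmp)   # remove the temporary file
```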

Back to Outline

Descriptive Statistics

   summary(DATA$age)
   table(DATA$sex)
   quantile(DATA$age,probs=0.08)
   isTall<-ifelse(DATA$height>median(DATA$height),">median","<median")
   table(DATA$sex,isTall)

Plots

   barplot(table(DATA$sex))
   hist(DATA$age)
   boxplot(height~sex,data=DATA)
   plot(height~age,data=DATA)
   plot(density(DATA$height))

Back to Outline

Conditional Statements

In programming, conditional statements are used to execute one piece of code or another depending on a condition.

 x=1
 y=2
 
 if(x>y){
   print("X is greater than Y!")
 }
 
 ## IF-ELSE
 if(x>y){
   print("X is greater than Y!")
 }else{
   print("Y is greater than X!")
 }

 ## IF-ELSE
 x=3
 if(x>y){
   print("X is greater than Y!")
 }else{
   print("Y is greater than X!")
 }
 
 
 ## We can evaluate multiple conditions at a time by nesting if statements or by evaluating them jointly
 
 x=TRUE
 y=FALSE
 
 if(x){
  if(y){
    print("Both X and Y are TRUE!")
  }else{
    print("X is TRUE and Y is FALSE")
  }
 }else{
   if(y){
    print("X is FALSE and Y is TRUE")
   }else{
    print("Both X and Y are FALSE")
   }
 }

 ## Alternatively
 
 if(x&&y){ print("Both X and Y are TRUE") }
 if(x&&!y){ print("X is TRUE and Y is FALSE") }
 if((!x)&&y){ print("X is FALSE and Y is TRUE") }
 if((!x)&&(!y)){ print("Both X and Y are FALSE") }
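Two related points may be worth a sketch. An `else if` chain handles more than two outcomes, including the case where the two values are equal, and `ifelse()` applies a condition to every element of a vector. The function name `compare` is hypothetical.

```r
compare <- function(x, y){
  if(x > y){
    "X is greater than Y"
  }else if(x < y){
    "X is smaller than Y"
  }else{
    "X equals Y"
  }
}

compare(1, 2)
compare(2, 2)

# if() expects a single TRUE/FALSE; ifelse() works element by element
v <- ifelse(c(1, 5, 3) > 2, "greater", "not greater")
v
```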

Back to Outline

Loops

In many applications we need to repeat a task a fixed number of times or until something happens. For this you can use the for and while loops.

 for(i in 1:10){
   print(i)
 }
 
 ## We can iterate over any vector
 for(i in c("a","b","zzz")){
    print(i)
 }

 ## While loop
 x=0
 while(x<=10){
  x=x+1
  print(x)
 }
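To contrast the two cases, here is a sketch with a fixed number of repetitions (a running sum) and a loop that runs until a condition stops holding (doubling a value until it exceeds 100). The variable names and the threshold are arbitrary.

```r
# Fixed number of repetitions: sum of the integers 1 to 10
total <- 0
for(i in 1:10){
  total <- total + i
}
total

# Repeat until something happens: double v until it exceeds 100
v <- 1
steps <- 0
while(v <= 100){
  v <- v * 2
  steps <- steps + 1
}
v
steps
```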

Back to Outline

Functions

A function takes a number of arguments, carries out some computations and (often) returns an object. sin, cos, log and summary are examples of functions that return a value.

   x=100
   sin(x)
   cos(x)

You can easily create your own functions. Recall that, in a simple linear regression, the ordinary least-squares (OLS) estimate of the regression coefficient equals the covariance between x and y divided by the variance of x. The following example returns OLS estimates of the intercept and regression coefficient in a simple linear regression.

  myOLS=function(x,y){
    b=cov(x,y)/var(x)
    a=mean(y)-mean(x)*b
    return(c(a,b))
  }
  
  # simulating a simple data set
  pred=rnorm(100)
  response=100+.5*pred + rnorm(100)
  
  myOLS(x=pred,y=response)
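One way to check a function like this is to compare its output with R's built-in `lm()`. The sketch below repeats the definition so that it runs on its own; the seed is an arbitrary choice made only so the simulated data are reproducible.

```r
myOLS <- function(x, y){
  b <- cov(x, y)/var(x)       # slope
  a <- mean(y) - mean(x)*b    # intercept
  return(c(a, b))
}

set.seed(1)                   # arbitrary seed
pred <- rnorm(100)
response <- 100 + .5*pred + rnorm(100)

fit <- lm(response ~ pred)    # built-in least-squares fit
coef(fit)                     # intercept and slope from lm()
myOLS(x=pred, y=response)     # should agree up to rounding error
```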

Back to Outline

Libraries

The basic installation of R comes with several functions for computation, basic statistical analyses, descriptive statistics, etc. Specialized code is contributed by developers in the form of libraries. To use a library you first need to install it and then load it into the environment.

   install.packages(pkgs='BGLR', repos='https://cran.r-project.org/') # installs the BGLR package from the CRAN repository.

Now that the package is installed you can load it into your environment.

  library(BGLR)
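Installation is needed only once, while loading is needed in every session. A common pattern, sketched below, installs a package only if it is missing and then loads it. The helper name `load_or_install` is hypothetical, and the demonstration call uses the stats package (which ships with R) so that nothing is downloaded.

```r
load_or_install <- function(pkg, repos = "https://cran.r-project.org/"){
  # requireNamespace() returns FALSE if the package is not installed
  if(!requireNamespace(pkg, quietly = TRUE)){
    install.packages(pkgs = pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)  # load by name given as a string
  invisible(TRUE)
}

load_or_install("stats")   # already installed, so it is simply loaded
```

Calling `load_or_install("BGLR")` would, under the same assumptions, install BGLR from CRAN the first time and only load it afterwards.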

Back to Outline

Distributions

The stats package, which is included in R, contains functions for the probability function, cumulative distribution function, quantile function, and random variable generation for many probability distributions. Each function name consists of a prefix followed by the root name of the distribution.
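As a quick illustration of the naming scheme, the sketch below applies the four prefixes to the root name norm (the normal distribution). The numeric arguments and the seed are arbitrary choices.

```r
dnorm(0, mean=0, sd=1)      # d: density f(0)
pnorm(1.96, mean=0, sd=1)   # p: cumulative probability P(X<=1.96)
qnorm(0.975, mean=0, sd=1)  # q: the value x such that P(X<=x)=0.975
set.seed(1)                 # arbitrary seed
rnorm(3, mean=0, sd=1)      # r: three random draws

# q undoes p for a continuous distribution
p <- pnorm(1.5)
x_back <- qnorm(p)
x_back
```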

Probability function. Prefix d

Calculates the probability density function (p.d.f.) for continuous distributions, f(x), and the probability mass function (p.m.f.) for discrete distributions, f(x)=P(X=x).

# For a discrete distribution (e.g.,binomial distribution)
# Example. Suppose there are 10 multiple choice questions in an EPI class exam. Each question has 5 possible answers,
# and only one of them is correct. The student fails the course if she/he gets fewer than 6 correct answers. 
# The probability of passing the course if the student attempts to answer every question at random is

dbinom(6,10,0.2)+dbinom(7,10,0.2)+dbinom(8,10,0.2)+dbinom(9,10,0.2)+dbinom(10,10,0.2)

# For a continuous distribution (e.g.,normal distribution)
# Example. In a certain population, BMI has a normal distribution with mean=27.5 and sd=5
x <- seq(12.5,42.5,length=1000) # creates a sequence of values between 12.5 and 42.5.
y <- dnorm(x,mean=27.5, sd=5) # evaluates the density function for the values of x.
plot(x,y,type="l",main='Normal distribution with mean=27.5 and sd=5',ylab='f(x)')

Cumulative distribution. Prefix p

Calculates the cumulative distribution function (c.d.f.) for the random variable X

F(x) = P(X <= x)

# In our EPI class example, the probability of failing the course is P(X<6)=P(X<=5)
pbinom(5,10,0.2)
# Thus the probability of passing is 1-P(X<=5)
1 - pbinom(5,10,0.2)
# or
pbinom(5,10,0.2,lower.tail=FALSE)

# Normal distribution
# In our BMI example, a person is declared obese if her/his BMI is greater than or equal to 30.
1-pnorm(30,27.5,5) # Probability that a randomly chosen person is obese
# or
pnorm(30,27.5,5,lower.tail=FALSE)
# Standardizing
z <- (30-27.5)/5
1-pnorm(z)
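The c.d.f. also gives the probability of an interval as a difference, P(a < X <= b) = F(b) - F(a). A sketch using the BMI example; the interval from 25 to 30 is an arbitrary choice for illustration.

```r
# Probability that BMI falls between 25 and 30 (mean=27.5, sd=5)
p_between <- pnorm(30,27.5,5) - pnorm(25,27.5,5)
p_between

# Same result after standardizing both limits
z_low  <- (25-27.5)/5
z_high <- (30-27.5)/5
pnorm(z_high) - pnorm(z_low)
```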

Special problem

Response to Selection

In a certain population of plants, the height of the plant has a Normal distribution with mean=5.3 feet and a sd=0.71. We select plants that are 6.0 feet or taller to intercross to form a new generation of plants.

What is the proportion p_s of selected individuals?

What is the selection differential S?

Quantile. Prefix q

For continuous distributions, it calculates the inverse c.d.f. of the distribution, x = F^-1(p) where p = F(x).

# Example. In testing Ho in a certain experiment, we get an F-statistic=6.02 that has an F-distribution with 
# 3 and 20 d.f. in numerator and denominator, respectively. Reject Ho at a level 0.05 if 6.02 > qf(0.95,3,20)
qf(0.95,3,20) # Which is smaller than 6.02 hence rejecting Ho

# Example. A sample of n=50 students was taken randomly from a population of heights with unknown standard deviation.
# The sample mean=165.4 and sample sd=8.3. Null hypothesis Ho: Mean=163. Reject Ho at a level 0.05 if t0 > qt(0.95,49)
t0=(165.4-163)/(8.3/sqrt(50)) # t-statistic
qt(0.95,49) # 1.68 is smaller than t0=2.04 thus Ho is rejected.
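An equivalent way to reach the same decisions is to compare a p-value with the significance level, using the p prefix instead of q. A sketch reusing the numbers from the two examples; the names `pF` and `pT` are illustrative.

```r
# F-test: P(F >= 6.02) with 3 and 20 d.f.
pF <- pf(6.02, 3, 20, lower.tail=FALSE)
pF
pF < 0.05   # TRUE means Ho is rejected at level 0.05

# One-sided t-test: P(T >= t0) with 49 d.f.
t0 <- (165.4-163)/(8.3/sqrt(50))
pT <- pt(t0, 49, lower.tail=FALSE)
pT
pT < 0.05   # TRUE means Ho is rejected at level 0.05
```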

For discrete distributions, which have a step c.d.f. that is not invertible, the quantile is defined as the smallest value x such that F(x)>=p, where F is the distribution function (c.d.f.).

# In our EPI class example, P(X<=3)=0.879, P(X<=4)=0.967 and P(X<=5)=0.994, 
# so the smallest 'x' such that P(X<=x)>=0.9 is 4
qbinom(0.9,10,0.2)
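The definition can be verified by searching the c.d.f. directly. A sketch that recomputes the quantile by hand and compares it with `qbinom`; the names `Fx` and `q_manual` are illustrative.

```r
x  <- 0:10                    # possible numbers of correct answers
Fx <- pbinom(x, 10, 0.2)      # c.d.f. evaluated at each value
round(Fx[1:6], 3)             # P(X<=0), ..., P(X<=5)

q_manual <- min(x[Fx >= 0.9]) # smallest x with F(x) >= 0.9
q_manual
q_manual == qbinom(0.9,10,0.2)
```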

Random variable. Prefix r

Simulates random variables having a specified distribution with given parameters.

x1 <- rnorm(10000,10,2.2)   # draw 10,000 samples from a normal distribution with mean=10 and sd=2.2
x2 <- rnorm(10000,11.5,3.5)   # draw 10,000 samples from a normal distribution with mean=11.5 and sd=3.5
plot(density(x1),ylab="Density",col="red")
lines(density(x2),col="blue")
legend("topright",legend=c("mean=10, sd=2.2","mean=11.5, sd=3.5"),col=c("red","blue"),lty=1)
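Simulated values change from run to run unless the random number generator is seeded. The sketch below fixes a seed (123 is an arbitrary choice) and checks that the sample mean and standard deviation are close to the parameters used in the simulation.

```r
set.seed(123)                 # arbitrary seed, for reproducibility
x <- rnorm(10000,10,2.2)
mean(x)                       # close to 10
sd(x)                         # close to 2.2

# Resetting the same seed reproduces the same draws
set.seed(123)
a <- rnorm(3)
set.seed(123)
b <- rnorm(3)
identical(a, b)
```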

Back to Outline
