# Basic commands
## Packages
Packages supplement the built-in functions of `R`. To list the installed packages:
```{r}
library()
```
Let us, for example, install the `AER` (Applied Econometrics with `R`) package and the `ISLR2` package. The `ISLR2` package comes with the datasets used by the textbook. The `dependencies=NA` option (the default) specifies that if the package depends on other packages for its operation, these should be installed as well (if they have not already been installed). Setting `dependencies=TRUE` additionally installs the packages that the package merely suggests, for example those needed only for its examples and vignettes.
```{r eval=FALSE, include=FALSE}
install.packages("AER", dependencies=NA)
install.packages("ISLR2", dependencies=NA)
```
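Re-running `install.packages()` every time the document is knitted is slow, so it can help to install only what is missing. The following is a minimal sketch: the helper name `missing_packages` is hypothetical (not part of `R`), and the `interactive()` guard is an assumption that installation should only happen in an interactive session.

```{r}
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("AER", "ISLR2"))
to_install

# Install only in an interactive session, and only if something is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = NA)
}
```

`requireNamespace()` only checks whether a package can be loaded; it does not attach it, so `library()` is still needed before using the package.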
To get an overview of an installed package:
```{r}
help(package="ISLR2")
```
## Working directory
Get working directory:
```{r}
getwd()
```
Set the working directory with `setwd()`. The command below opens its help page:
```{r}
?setwd
```
Or in RStudio, use
* Session->Set Working Directory, or
* Tools->Global Options.
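
As a small sketch of `setwd()` in action, the chunk below switches to the session's temporary directory and then back. Using `tempdir()` as the target is only an assumption made so that the example runs on any machine; in practice the argument would be the path to your project folder.

```{r}
old_wd <- setwd(tempdir())  # setwd() returns the previous directory invisibly
getwd()                     # now the temporary directory
setwd(old_wd)               # restore the original working directory
getwd()
```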
## Vectors and matrices
Generate a vector:
```{r}
x<-c(1,2,3)
x
typeof(x)
```
Check the length:
```{r}
length(x)
```
A character vector:
```{r}
x<-c("No","Yes")
x
typeof(x)
```
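A vector holds elements of a single type, so mixing numbers and strings inside `c()` converts everything to character. A brief sketch (the name `mixed` is chosen only for illustration):

```{r}
mixed <- c(1, "Yes")
mixed
typeof(mixed)             # "character": the number 1 became the string "1"
as.numeric(c("1", "2"))   # convert strings that look like numbers back
```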
Generate a matrix:
```{r}
X<-matrix(c(1,2,3,4),ncol=2)
X
typeof(X)
```
Note: `R` fills in a matrix on a column-by-column basis.
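To fill a matrix row by row instead, `matrix()` accepts the `byrow` argument. A short sketch with the same four numbers (the name `X_byrow` is only illustrative):

```{r}
# Same four numbers, filled row by row instead of column by column
X_byrow <- matrix(c(1, 2, 3, 4), ncol = 2, byrow = TRUE)
X_byrow
dim(X_byrow)   # number of rows and columns
```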
Add a vector as another column:
```{r}
x=c(5,6)
Y=cbind(X,x)
Y
```
Add a vector as another row:
```{r}
y=c(7,8,9)
rbind(Y,y)
```
The `ls()` function allows us to look at a list of all of the objects, such as data and functions, that we have saved so far.
```{r}
ls()
```
The `rm()` function can be used to delete any that we don't
want.
```{r chunk4}
ls()
rm(x)
ls()
```
It's also possible to remove all objects at once:
```{r}
rm(list = ls())
```
Random matrix: generate eight independent $N(0,1)$ random variables arranged in 4 columns:
```{r}
X=matrix(rnorm(8),ncol=4)
X
```
Choose the mean and the standard deviation:
```{r}
X=matrix(rnorm(8,mean=1,sd=.1),ncol=2)
X
```
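Because `rnorm()` draws new random numbers on every call, the matrices above change each time the document is knitted. Fixing the seed with `set.seed()` makes the draws reproducible. A minimal sketch; the seed value `1` is arbitrary:

```{r}
set.seed(1)
first <- rnorm(3)
set.seed(1)               # resetting the seed restarts the same sequence
second <- rnorm(3)
identical(first, second)  # TRUE
```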
Picking specific elements:
```{r}
X[1,2]
```
Pick an entire column (first column):
```{r}
X[,1]
```
Pick an entire row:
```{r}
X[1,]
```
Pick rows 3 & 4:
```{r}
X[c(3,4),]
```
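A few further indexing patterns may be useful. The sketch below uses an illustrative matrix `A` (not one of the objects defined above) so that the results are deterministic:

```{r}
A <- matrix(1:12, nrow = 4)   # illustrative 4 x 3 matrix
A[-1, ]                # all rows except the first
A[, c(1, 3)]           # columns 1 and 3
A[A[, 1] > 2, ]        # rows whose first-column entry exceeds 2
A[1, ]                 # a single row is returned as a plain vector
A[1, , drop = FALSE]   # drop = FALSE keeps it as a 1 x 3 matrix
```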
Sequences:
```{r}
?seq
x=seq(1,10,by=2)
x
```
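Besides `by`, a sequence can be specified by its length, and the colon operator is a shortcut for steps of one. A brief sketch:

```{r}
1:5                          # shorthand for seq(1, 5)
seq(0, 1, length.out = 5)    # five equally spaced points from 0 to 1
rev(seq(1, 10, by = 2))      # the sequence above in reverse order
```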
Matrix algebra operations:
```{r}
X=matrix(seq(-1,-4,by=-1),ncol=2)
Y=matrix(seq(1,4),ncol=2)
X
Y
```
Matrix addition:
```{r}
X+Y
```
Matrix product:
```{r}
X%*%Y
```
Transpose:
```{r}
t(X)
```
Element-by-element operations:
```{r}
sqrt(Y)
```
```{r}
X*Y
```
```{r}
1/Y
```
```{r}
Y^X
```
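The matrix product `%*%` and the element-by-element product `*` are easy to confuse, so it can help to compare them side by side. The sketch below redefines `X` and `Y` with the same values as above; the use of `solve()` to compute a matrix inverse is an addition for illustration, and the names `prod_matrix`, `prod_element` and `Y_inv` are hypothetical.

```{r}
X <- matrix(seq(-1, -4, by = -1), ncol = 2)
Y <- matrix(seq(1, 4), ncol = 2)
prod_matrix  <- X %*% Y   # rows of X times columns of Y
prod_element <- X * Y     # entry-by-entry product
prod_matrix
prod_element
Y_inv <- solve(Y)         # matrix inverse; Y is invertible since det(Y) = -2
Y_inv %*% Y               # the identity matrix, up to rounding
```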
# Working with data
## Data frames
The basic object that is used by `R` to store data is a data frame: tabular data consisting of rows (observations) and columns (variables).
```{r}
x=c(1,2,3,4)
y=c("male","male","female","female")
X=cbind(x,y)
```
When combining `x` and `y` in a matrix, `x` is converted to character strings, because all elements of a matrix must have the same type:
```{r}
X
typeof(X)
```
Data frames can have variables (columns) of different types. There are relationships between the columns: each row is an observation.
```{r}
Data=data.frame(years=x,gender=as.factor(y))
typeof(Data)
Data
```
Note that `gender` is now a factor. Factors are variables that take on a limited number of values, called levels, and are used to represent categorical data. They can be created from integers or from character strings.
```{r}
class(Data$years)
class(Data$gender)
```
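To see what a factor stores, one can inspect its levels and the underlying integer codes. A small sketch, using an illustrative vector `g` with the same values as `y`:

```{r}
g <- as.factor(c("male", "male", "female", "female"))
levels(g)       # distinct levels, sorted alphabetically: "female" "male"
as.integer(g)   # underlying integer codes: 2 2 1 1
table(g)        # number of observations per level
```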
Apply the `names()` and `summary()` commands to `Data`:
```{r}
names(Data)
summary(Data)
```
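Columns of a data frame are accessed with `$`, and rows can be selected with a logical condition. The sketch below rebuilds the same small data frame under the illustrative name `df_demo` so that it runs on its own:

```{r}
df_demo <- data.frame(years = c(1, 2, 3, 4),
                      gender = as.factor(c("male", "male", "female", "female")))
str(df_demo)                            # compact overview of column types
df_demo$years                           # one column as a vector
df_demo[df_demo$gender == "female", ]   # only the rows for females
nrow(df_demo)                           # number of observations
ncol(df_demo)                           # number of variables
```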
## Load data
Data can be loaded from external files using:
* `read.table()`
* `read.csv()`
* `read.xlsx()` (not part of base `R`; provided by add-on packages such as `openxlsx`)
We load data from a text file, `Auto.data`:
```{r}
Auto <- read.table("Auto.data")
```
Once the data has been loaded, the `View()` function can be used to view it in a spreadsheet-like window. The `head()` function can also be used to view the first few rows of the data.
```{r eval=FALSE, include=FALSE}
View(Auto)
head(Auto)
```
Using the option `header = T` (or `header = TRUE`) in the `read.table()` function
tells `R` that the first line of the file contains the variable names, and using the option `na.strings` tells `R` that any time it sees a particular character or set of characters (such as a question mark), it should
be treated as a missing element of the data matrix. The `stringsAsFactors = T` argument tells `R` that any variable containing character strings should be interpreted as a qualitative variable, and that each distinct character string represents a distinct level for that qualitative variable.
```{r}
Auto <- read.table("Auto.data", header = T, na.strings = "?", stringsAsFactors = T)
# View(Auto)
```
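The effect of these options can be checked without the `Auto.data` file. The sketch below writes a tiny made-up file to a temporary location and reads it back; the file contents and the name `toy_auto` are invented purely for illustration.

```{r}
tmp <- tempfile(fileext = ".data")
writeLines(c("mpg horsepower name",
             "18 130 chevrolet",
             "25 ? ford"), tmp)
toy_auto <- read.table(tmp, header = TRUE, na.strings = "?",
                       stringsAsFactors = TRUE)
str(toy_auto)                 # horsepower is numeric, name is a factor
is.na(toy_auto$horsepower)    # the "?" entry was read as NA
unlink(tmp)                   # remove the temporary file
```

Without `na.strings = "?"`, the question mark would force the whole `horsepower` column to be read as text rather than as numbers.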
An easy way to load data from Excel into `R` is to save it as a csv (comma-separated values) file, and then use the `read.csv()` function.
```{r}
Auto <- read.csv("Auto.csv", na.strings = "?", stringsAsFactors = T)
# View(Auto)
```
The `dim()` function tells us that the data has $397$ observations, or rows, and nine variables, or columns:
```{r}
dim(Auto)
```
There are various ways to deal with the missing data. In this case, only five of the rows contain missing observations,
and so we choose to use the `na.omit()` function to simply remove these rows.
```{r}
Auto <- na.omit(Auto)
dim(Auto)
```
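Before dropping rows, it is often worth counting how many values are missing and where. A minimal sketch on a made-up data frame (`toy_na` and its values are invented for illustration):

```{r}
toy_na <- data.frame(mpg = c(18, NA, 25), horsepower = c(130, 165, NA))
colSums(is.na(toy_na))          # missing values per column
sum(!complete.cases(toy_na))    # rows with at least one missing value
toy_clean <- na.omit(toy_na)    # keep only the complete rows
dim(toy_clean)
```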
Once the data are loaded correctly, we can use `names()` to check the variable names.
```{r}
names(Auto)
```
Many `R` packages come with bundled data sets. The `ISLR2` package contains the `Boston` data set on housing values in the Boston area:
```{r}
library(ISLR2)
?Boston
```
Quick inspection of the data:
```{r}
summary(Boston)
```
The first 4 observations:
```{r}
Boston[1:4,]
```
The same first 4 observations, using `head()`:
```{r}
head(Boston,4)
```
The last 4 observations:
```{r}
tail(Boston,4)
```