-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy pathBasicStatsCode.Rmd
More file actions
88 lines (68 loc) · 2.43 KB
/
BasicStatsCode.Rmd
File metadata and controls
88 lines (68 loc) · 2.43 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
---
title: "Basic Stats"
author: "Lilly Winfree, Danielle Robinson"
date: "May 26"
output:
pdf_document: default
html_document: default
ioslides_presentation: default
beamer_presentation: default
slidy_presentation: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
##Let's learn how to read in data and then do some basic statistical analysis.
Read in data:
```{r}
data1<-read.table("example_lengths_for_R.csv", header=TRUE, sep=',')
```
Reminder! Name your variables carefully!
Note: When we read in the data, we need to specify if there is a **header** (or column titles) and how the data is separated, or **sep**. This data is a csv so it is separated by ',' - sometimes data can be separated by '|' for example.
We can look at the data quickly using **summary()**, **nrow()** to see the row numbers, **colnames()** for column names:
```{r}
summary(data1)
nrow(data1)
colnames(data1)
```
We can subset to different columns with **$** and only see select rows with **[:]**
```{r}
data1$coverlip.code[1:10]
data1$treatment[1:10]
summary(data1$values)
summary(data1$values[data1$treatment=='Control'])
```
Let's plot our data!
```{r}
plot(data1$treatment, data1$values, xlab='treatment', ylab='length values')
```
Note that **xlab** and **ylab** mean x and y labels.
We can also use **hist()** to make a histogram:
```{r}
hist(data1$values[data1$treatment=='Control'], xlab='Length Values',
main='Histogram of Length for Control Group')
```
Note that **main** means title here.
Now let's look at how to do some basic stats:
```{r}
mean(data1$values[data1$treatment=='Treatment A'])
mean(data1$values[data1$treatment=='Control'])
mean(data1$values[data1$treatment=='Treatment B'])
median(data1$values[data1$treatment=='Control'])
var(data1$values[data1$treatment=='Control'])
min(data1$values[data1$treatment=='Control'])
max(data1$values[data1$treatment=='Control'])
```
Notice how summary() gave us the same info as the above functions
OK let's do a t-test now! We can use **t.test** to compare the means between 2 of the groups
```{r}
t.test(data1$values[data1$treatment=='Control'], data1$values[data1$treatment=='Treatment B'])
```
What does this tell us?
How do we change the code to look at the other groups?
We can also do an ANOVA with this data using **aov** and plot the ANOVA information:
```{r}
results<-aov(values ~ treatment, data=data1)
summary(results)
```
See ANOVA doc for more info on how an ANOVA works.