This repository has been archived by the owner on Sep 15, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
R4SME 09 logistic.Rmd
138 lines (107 loc) · 3.73 KB
/
R4SME 09 logistic.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
title: "R for SME 9: Logistic regression (basics)"
author: Andrea Mazzella [link](https://github.com/andreamazzella)
output: html_notebook
---
## Basics
Load packages
```{r}
library(haven)
library(epiDisplay)
library(magrittr)
library(tidyr)
library(dplyr)
```
## Data import, exploration, management
1.
Make sure you have the mortality.dta dataset in the same folder as this .rmd
2.
Import data
```{r}
mortality<-read_dta("./mortality.dta")
mortality %<>% mutate_if(is.labelled,as_factor)
```
Explore data
```{r}
glimpse(mortality)
summary(mortality)
View(mortality)
```
## Data analysis
3.
Outcome variable: died (3-year-mortality; binary, 0/1)
Exposure variable: vimp (visual impairment; binary, "Normal" / "Visually impaired")
- Examine the association between visual impairment and death
```{r}
# Frequency table with %
mortality %$% tabpct(vimp, died, percent = "row", graph = F)
# OR
mortality %$% cc(vimp, died, graph = F)
```
4.
Perform a logistic regression to examine the association between visual impairment and death.
```{r Logistic regression of a binary exposure}
logit_vimp <- glm(died ~ vimp, data = mortality, family = binomial()) # create the model
logistic.display(logit_vimp) # get a more readable output
```
Output:
- OR with 95%CI for outcome in exposed vs unexposed, Wald's test and LR-test p-value
- (final) log-likelihood - (no intermediate iterations shown in R, unlike in stata)
- number of observations
- AIC value
6.
Unlike Stata, R already gives you the answers in OR - no need to show useless log(OR).
8.
Explore the association between microfilaria and death with crosstabulation:
- Exposure: microfilarial load/mg - mfgrp (categorical variable: Uninfected, <10, 10-49, 50, NA)
```{r}
mortality %$% summary(mfgrp)
mortality %$% tabpct(mfgrp, died, percent = "row", graph = F)
```
9.
Now explore the same with logistic regression.
Unlike Stata, R understands that this variable is categorical, so the formula is the same as for a binary variable.
```{r}
logit_mfgrp <- glm(died ~ mfgrp,
data = mortality,
family = binomial())
logistic.display(logit_mfgrp)
```
13.
To perform a likelihood ratio test (LRT), you need to use lrtest() to divide the log likelihood from a logistic regression model *with* the variable (that you have already defined) and the log likelihood from a logistic regression model *without* the variable - that you need to calculate now.
Caution: mfgrp has missing data, so this new model will have more observations than the first - and the LRT test can only work when the two models have the same number of observations. So you need to
```{r}
# Check for missing values
mortality %$% summary(mfgrp)
# Logistic regression model without the covariate (and removing observations with a missing value in mfgrb)
logit_0 <- mortality %>%
drop_na(mfgrp) %>%
glm(died ~ 1,
data = .,
family = binomial())
# Likelihood ratio test (LRT)
lrtest(logit_0,logit_mfgrp)
```
14.
Run a logistic regression model to check the association of age and death.
Exposure variable: agegrp (categorical: 15-34, 35-54, 55-64, 65+)
```{r}
# Check for missing values
mortality %$% summary(agegrp)
# Logistic regression model for age
logit_age <- mortality %>%
glm(died ~ agegrp,
data = .,
family = binomial())
logistic.display(logit_age)
```
How do we use two exposure variables in a logistic regression model?
- Use both visual impairment (vimp) and age (agegrp)
```{r}
# Logistic regression model with 2+ exposures
logit_vimp_age <- mortality %>%
glm(died ~ vimp + agegrp,
data = .,
family = binomial())
logistic.display(logit_vimp_age)
```