generated from byu-transpolab/template_bookdown
-
Notifications
You must be signed in to change notification settings - Fork 1
/
04-findings.Rmd
54 lines (45 loc) · 2.77 KB
/
04-findings.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
---
output:
html_document: default
pdf_document: default
---
# Findings
```{r random_clusters, echo=FALSE, message=FALSE, warning=FALSE}
source("R/gps2trips.R")
source("_targets.R")
source("R/MyAnalysis.R")
library(kableExtra)
library(tidyverse)
tar_load(matchStats)
tar_load(errorPlot)
```
The error between the two numbers of clusters per day was calculated by taking the
difference in algorithm clusters and manual clusters. Then, the percent error was
found by dividing the integer error by the number of manual clusters, since
that was treated as the "goal" for the algorithm to match. The percent error is
indicated by the "pctError" column.
```{r showMatchStats, echo=FALSE, message=FALSE, warning=FALSE, fig.cap = "Example final comparison table"}
errorTableHead <- head(matchStats)
errorTableHead %>% select(
eps, minpts, delta_t, entr_t, date, manual, clusters, error, pctError
)
```
Figure 3.1 visualizes the percent error between algorithm clusters and manual clusters
for all 10 dates. The black line represents the overall trend line.
```{r showErrorPlot, echo=FALSE, fig.cap= "delta_t versus percent error by date", message=FALSE, warning=FALSE}
errorPlot
```
Most of the dates appear to follow the same trend with a decrease in error when delta_t
is equal to 10.78 seconds. However, this is when the error is the largest for February 17th and
May 27th, so is not the best option. The percent error is close to 0 for all dates when delta_t
is equal to 106.3 seconds and 144.7 seconds because delta_t defines how long the minpts must
be in eps radius for something to count as a trip. It is a lot more likely that a trip/activity
is occurring if someone stays put for over a minute than only 10 seconds. For example,
someone could be at a red light for 10 seconds, but that should not count as its own separate trip.
Having delta_t be closer to 100 seconds removes potential error from situations such as that.
The trend line also shows that the larger delta_t gets, the larger the percent error gets. However, this is likely due
to the outliers of February 17th and May 27th. Without those dates, the trend line would likely not start as low. It is
also important to consider the possible gaps in this analysis: falsely identifying manual clusters in the GeoJSON software and
the constant parameters. Manual clusters would be more accurate if confirmed by the respondent via a journal. Also,
the algorithm would potentially predict a different number of clusters for each delta_t if the three
constant parameters were different. Therefore, more rounds of this study should be conducted to confirm these results. Further applications of this method should be done with more dates, confirmed correct manual clusters, more than 20 draws for delta_t, and different options for the constant parameters.