-
Notifications
You must be signed in to change notification settings - Fork 0
/
noc2noc.Rmd
196 lines (162 loc) · 6.66 KB
/
noc2noc.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
---
title: "10 closest Occupations in terms of skills"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
source_code: "https://github.com/bcgov/noc_noc_whos_there"
runtime: shiny
resource_files:
- data/skills_original.xlsx
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(readxl)
library(janitor)
library(gt)
library(gtExtras)
library(here)
#functions-----------------
noc_link <- function(x){
clean_x <- x%>%
str_replace_all(":","")%>%
str_replace_all(" ","+")
gt_hyperlink(url=paste0("https://google.com/search?btnI=&q=statistique+canada+noc+2021+", clean_x), text=x)
}
skill_link <- function(x){
clean_x <- x%>%
str_replace_all(" ","+")
link <- paste0("https://www.google.com/search?q=onetonline.org+find+descriptor+result+", clean_x, "&btnI=I")
htmltools::a(href = link, x)
}
#read in the data----------------------
skills_raw <- read_excel(here::here("data","skills_original.xlsx"),
col_types = c("text","text","text","numeric"))%>%
mutate(noc2021=str_pad(noc2021, width=5, side="left", pad="0"))%>%
pivot_wider(id_cols=contains("noc"), names_from = contains("name"), values_from = contains("value"))%>%
clean_names()%>%
unite(NOC, noc2021, noc2021_title, sep = ": ")%>%
column_to_rownames(var="NOC")
```
```{css}
.chart-shim {
overflow: auto;
}
```
10 closest occupations in terms of skills:
=====================================
Inputs {.sidebar}
-------------------------------------
```{r}
selectInput(
"noc",
"Choose your current occupation by either scrolling through the list below OR click in the box, hit backspace and then search by keyword or NOC code:",
rownames(skills_raw),
selected = "41401: Economists and economic policy researchers and analysts",
multiple = FALSE
)
```
* The table to the right gives the 10 closest occupations (columns) to the chosen occupation in terms of 35 skills (rows).
* In each cell is the difference in skill required: positive values indicate a skill surplus (blue), negative indicates a skill deficit (red).
* You can choose between an unrestricted search (all other occupations considered), or a restricted search (no skill deficits greater than one allowed).
```{r}
radioButtons(
"restrict",
"Should the search be restricted?",
choices = c("yes","no"),
selected = "no"
)
```
* You can also control the breaks used for the colour scheme.
```{r}
sliderInput(
inputId = "range",
label = "Skill differences in this range have white background",
min = -5,
max = 5,
value = c(-1,1),
step = .1
)
dataset <- reactive({
if(input$restrict=="no"){
skills_raw%>%
scale()%>%
as.data.frame()
}else{
chosen_skill_df <- skills_raw[rownames(skills_raw)==input$noc,]%>%
slice(rep(1:n(), each = nrow(skills_raw))) #creates a dataframe of same dimension as skills_raw by replicating chosen NOC
lesser_skill <- skills_raw<(chosen_skill_df+1) #a logical matrix, containing TRUE if skill less than chosen NOC skill plus 1.
logical_vec <- apply(lesser_skill, MARGIN = 1, all) #test that row has lower skill than chosen NOC for all 35 skills
skills_raw[logical_vec,]%>% #keep only the NOCs where skill is lower than chosen NOC skill+1 for all 35 skills.
scale()%>%
as.data.frame()
}
})
skills_pca <- reactive({
prcomp(dataset()) #principal component analysis
})
first_five <- reactive({
skills_pca()[["x"]][,1:5]%>% #keep only the first 5 principal components.
as.data.frame()
})
q <- reactive({
first_five()[rownames(first_five())==input$noc,] #the location of the chosen NOC in 5D space.
})
nn <- reactive({
temp <- dbscan::kNN(first_five(), k = 11, sort=TRUE, query = q()) #11 nearest neighbors to query (own NOC included)
rownames(first_five())[as.vector(temp[["id"]])] #the names of the closest occupations
})
chosen_noc <- reactive({
skills_raw[rownames(skills_raw)==input$noc,] #the skill profile of the chosen NOC (35D)
})
eleven_chosen <- reactive({
chosen_noc()[rep(1, 11), ] #replicate the skill profile of the chosen NOC
})
closest <- reactive({
df <- skills_raw[rownames(skills_raw) %in% nn(), ] #the nearest neighbour NOCs to the chosen NOC
df[nn(), ] #orders the dataframe correctly
})
difference <- reactive({
(-(closest()-eleven_chosen()))%>% #the difference in skills (note double negative because column names inherited from closest())
mutate(across(everything(), \(x) round(x, 2)))%>% #rounded to the hundredth
t()%>% #transposed
as.data.frame()%>%
rownames_to_column(var="Skill")%>%
mutate(Skill=str_to_title(str_replace_all(Skill, "_"," ")))%>%
select(-input$noc)%>% #get rid of own occupation
mutate(
Skill = map(Skill, skill_link),#add hyperlinks to skills
Skill = map(Skill, ~ gt::html(as.character(.x))))
})
```
Column
-------------------------------------
### **`r renderUI({input$noc})`**
```{r}
gt::render_gt({
gt(difference())|>
data_color(
columns = -Skill,
method="bin",
bins=c(-10, input$range[1]-.01, input$range[2]+.01, 10),
palette = c("red","white","blue")
)%>%
tab_options(column_labels.font.size = 10,
table.font.size = 10)%>%
cols_label_with(columns=-1, fn = ~ noc_link(.x))#add hyperlinks to nocs
})
```
FAQs:
=====================================
### Why are the ten closest occupations not necessarily in the same cluster as the chosen occupation?
* In a related project we cluster occupations on the basis of their skill profile: this is done via AGNES, an agglomerative hierarchical clustering algorithm.
* In **both** the clustering exercise and in this application, proximity is based on euclidean distance using the first 5 principal components of the 35D data.
* The 10 closest occupations will not necessarily share the same cluster membership: if an occupation is close to the edge of its cluster then it is likely that occupations from other clusters may in fact be closer than occupations that are in the same cluster.
### Why am I given the option of a restricted vs. unrestricted search?
* Some users may wish to make a quick transition to a new occupation: with a restricted search no major skill upgrades are required.
* Other users may have a longer timeline where they can upgrade skills before changing occupations: with an unrestricted search the closest occupations may require a substantial skill upgrade.
### Why am I given the option to control the colouring of the cells?
* It allows you control how big of a difference in skill is required to be highlighted by colour: e.g.
- If you set the range -5 to 5 all the cells will be white.
- If you set the range -.1 to .1 almost all the cells will **not** be white.