noc2noc.Rmd

---
title: "10 closest Occupations in terms of skills"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: "https://github.com/bcgov/noc_noc_whos_there"
runtime: shiny
resource_files:
- data/skills_original.xlsx
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(readxl)
library(janitor)
library(gt)
library(gtExtras)
library(here)
#functions-----------------
noc_link <- function(x){
  clean_x <- x%>%
    str_replace_all(":","")%>%
    str_replace_all(" ","+")
  gt_hyperlink(url=paste0("https://google.com/search?btnI=&q=statistique+canada+noc+2021+", clean_x), text=x)
}

skill_link <- function(x){
  clean_x <- x%>%
    str_replace_all(" ","+")
  link <- paste0("https://www.google.com/search?q=onetonline.org+find+descriptor+result+", clean_x, "&btnI=I")
  htmltools::a(href = link, x)
}
#read in the data----------------------
skills_raw <- read_excel(here::here("data","skills_original.xlsx"),
                             col_types = c("text","text","text","numeric"))%>%
  mutate(noc2021=str_pad(noc2021, width=5, side="left", pad="0"))%>%
  pivot_wider(id_cols=contains("noc"), names_from = contains("name"), values_from = contains("value"))%>%
  clean_names()%>%
  unite(NOC, noc2021, noc2021_title, sep = ": ")%>%
  column_to_rownames(var="NOC")
```

    ```{css}
    .chart-shim {
      overflow: auto;
    }
    ```

10 closest occupations in terms of skills:
=====================================  

Inputs {.sidebar}
-------------------------------------
```{r}
selectInput(
  "noc",
  "Choose your current occupation by either scrolling through the list below OR click in the box, hit backspace and then search by keyword or NOC code:",
  rownames(skills_raw),
  selected = "41401: Economists and economic policy researchers and analysts",
  multiple = FALSE
)
```

* The table to the right gives the 10 closest occupations (columns) to the chosen occupation in terms of 35 skills (rows).
* In each cell is the difference in skill required: positive values indicate a skill surplus (blue), negative indicates a skill deficit (red).

* You can choose between an unrestricted search (all other occupations considered), or a restricted search (no skill deficits greater than one allowed).

```{r}
radioButtons(
  "restrict",
  "Should the search be restricted?",
  choices = c("yes","no"),
  selected = "no"
)
```

* You can also control the breaks used for the colour scheme. 

```{r}
sliderInput(
    inputId = "range",
    label = "Skill differences in this range have white background",
    min = -5,
    max = 5,
    value = c(-1,1),
    step = .1
  )

dataset <- reactive({
  if(input$restrict=="no"){
  skills_raw%>%
    scale()%>%
    as.data.frame()
  }else{
    chosen_skill_df <- skills_raw[rownames(skills_raw)==input$noc,]%>% 
      slice(rep(1:n(), each = nrow(skills_raw))) #creates a dataframe of same dimension as skills_raw by replicating chosen NOC  
    lesser_skill <- skills_raw<(chosen_skill_df+1) #a logical matrix, containing TRUE if skill less than chosen NOC skill plus 1. 
    logical_vec <- apply(lesser_skill, MARGIN = 1, all) #test that row has lower skill than chosen NOC for all 35 skills
    skills_raw[logical_vec,]%>% #keep only the NOCs where skill is lower than chosen NOC skill+1 for all 35 skills.
      scale()%>%
      as.data.frame()
   } 
})

skills_pca <- reactive({
  prcomp(dataset()) #principal component analysis
})

first_five <- reactive({
  skills_pca()[["x"]][,1:5]%>% #keep only the first 5 principal components.
  as.data.frame()
})

q <- reactive({
  first_five()[rownames(first_five())==input$noc,] #the location of the chosen NOC in 5D space.
}) 

nn <- reactive({
  temp <- dbscan::kNN(first_five(), k = 11, sort=TRUE,  query = q()) #11 nearest neighbors to query (own NOC included)
  rownames(first_five())[as.vector(temp[["id"]])] #the names of the closest occupations
})

chosen_noc <- reactive({
  skills_raw[rownames(skills_raw)==input$noc,] #the skill profile of the chosen NOC (35D)
})

eleven_chosen <- reactive({
  chosen_noc()[rep(1, 11), ] #replicate the skill profile of the chosen NOC
})  

closest <- reactive({
  df <- skills_raw[rownames(skills_raw) %in% nn(), ] #the nearest neighbour NOCs to the chosen NOC
  df[nn(), ] #orders the dataframe correctly
})

difference <- reactive({
  (-(closest()-eleven_chosen()))%>% #the difference in skills (note double negative because column names inherited from closest())
  mutate(across(everything(), \(x) round(x, 2)))%>% #rounded to the hundredth
  t()%>% #transposed
  as.data.frame()%>%
  rownames_to_column(var="Skill")%>% 
  mutate(Skill=str_to_title(str_replace_all(Skill, "_"," ")))%>%  
  select(-input$noc)%>% #get rid of own occupation
  mutate(
        Skill = map(Skill, skill_link),#add hyperlinks to skills
        Skill = map(Skill, ~ gt::html(as.character(.x))))
})


```
 
Column
-------------------------------------
    
### **`r renderUI({input$noc})`**
    
```{r}
gt::render_gt({
gt(difference())|>
  data_color(
    columns = -Skill,
    method="bin",
    bins=c(-10, input$range[1]-.01, input$range[2]+.01, 10),
    palette = c("red","white","blue")
 )%>%
    tab_options(column_labels.font.size = 10,
                table.font.size = 10)%>%
    cols_label_with(columns=-1, fn = ~ noc_link(.x))#add hyperlinks to nocs
})
```

FAQs:
=====================================  

### Why are the ten closest occupations not necessarily in the same cluster as the chosen occupation?

* In a related project we cluster occupations on the basis of their skill profile: this is done via AGNES, an agglomerative hierarchical clustering algorithm.
* In **both** the clustering exercise and in this application, proximity is based on euclidean distance using the first 5 principal components of the 35D data.
* The 10 closest occupations will not necessarily share the same cluster membership: if an occupation is close to the edge of its cluster then it is likely that occupations from other clusters may in fact be closer than occupations that are in the same cluster. 

### Why am I given the option of a restricted vs. unrestricted search?

* Some users may wish to make a quick transition to a new occupation: with a restricted search no major skill upgrades are required.
* Other users may have a longer timeline where they can upgrade skills before changing occupations: with an unrestricted search the closest occupations may require a substantial skill upgrade. 

### Why am I given the option to control the colouring of the cells?

* It allows you control how big of a difference in skill is required to be highlighted by colour: e.g.
  - If you set the range -5 to 5 all the cells will be white.
  - If you set the range -.1 to .1 almost all the cells will **not** be white.