Fixed broken links, mainly on Automation section #1312

Draft: wants to merge 13 commits into base `main-flask`

28 changes: 14 additions & 14 deletions content/blog/last_semester.md
aliases:
- /blog/last_semester
---
## **Last semester: Diving into the World of Causal Inference and Virtual Machines**

During the previous semester at Tilburg Science Hub, we have been learning, exploring, and innovating. We've been busy developing new content and adopting new tools that we're excited to share with you.

Our main focus has been creating new content in the field of causal inference, and we have also delved into the world of virtual machines. Let's take a closer look at the resources we've created in these two areas.

### **Causal Inference**

Causal inference has been the most important subject of our exploration. We have put together a comprehensive set of tools to guide you through this complex but captivating subject:

- [Introduction to Instrumental Variables Estimation](../topics/Analyze/causal-inference/instrumental-variables/iv.md): We start with a thorough introduction to IV (Instrumental Variable) estimation, a fundamental concept in the field of causal inference.

- [Doing Calculations with Regression Coefficients Using deltaMethod](../topics/Analyze/Regression/linear-regression/deltamethod.md): We show you how to handle regression coefficients with precision through the deltaMethod.

- [Impact evaluation](../topics/Analyze/causal-inference/did/impact-evaluation.md): We take a closer look at impact evaluation through regressions, uncovering how interventions and policies can be rigorously analyzed to make informed decisions.

- [Synthetic Controls](../topics/Analyze/causal-inference/synthetic-control/synth-control.md): Discover the power of the Synthetic Control Method, an invaluable tool for causal inference in diverse research scenarios.

- [Fixed-Effects Estimation in R with the fixest Package](../topics/Analyze/causal-inference/panel-data/fixest.md): Lastly, we dive into fixed-effects estimation using the `fixest` package in R, which is particularly useful for panel data analyses (and it's super fast!).

### **Virtual Machines**

Besides our focus on causal inference, we've also explored virtual machines (VMs). Think of them as a supercomputer that you can rent on demand. We've created some building blocks to help you set up virtual machines and run environments in the cloud.

- [Configure a VM with GPUs in Google Cloud](../topics/Automation/Replicability/cloud-computing/config-VM-GCP.md): Learn how to use the computing power of GPUs on Google Cloud by setting up a customized VM to meet your research requirements.

- [Import and run a Python environment on Google Cloud with Docker](../topics/Automation/Replicability/cloud-computing/google_cloud_docker.md): Explore the world of containerization and Docker to import and run Python environments on Google Cloud, enhancing the reproducibility and efficiency of your work.

- [Export a Python environment with Docker and share it through Docker Hub](../topics/Automation/Replicability/Docker/dockerhub.md): Learn to export Python environments with Docker and streamline collaboration by sharing them on Docker Hub, ensuring easy access for fellow researchers.

### **Enhancing Your Research Skills in Causal Inference and Virtual Machines**
Our aim is to support your exploration of causal inference and virtual machines. These resources are designed to equip you with the knowledge and tools needed to excel in your research.

Curious about what we will be working on next semester? Keep an eye on our blog!
14 changes: 7 additions & 7 deletions content/examples/keywords-finder.md

## Overview

This is a template for a reproducible [Dockerized](../topics/Automation/Replicability/Docker/docker.md) application, based on [R](../topics/Computer-Setup/software-installation/RStudio), that **finds keywords and/or sentences in multiple `PDF` files**.

{{% summary %}}
- We first use `R` to convert the `PDF` files into plain text files (`.txt`).
- Then, a second `R` script searches those converted text files for the previously defined keywords and/or sentences.
- Matches are reported in an Excel file, stating which keyword or sentence was found in which file.
{{% /summary %}}

`Docker` is used to run the process described above inside an isolated container (_see [this building block](../topics/Automation/Replicability/Docker/) to learn more about containers_). This way, you can run this application without even having `R` installed on your computer, and it will run smoothly regardless of which operating system (OS) you're using.

## Motivating Example

In many situations, we use `Ctrl + F` (or `Command + F` on Mac) to find words or sentences in `PDF` files. However, this can be highly time-consuming, especially when applied across multiple files and/or different keywords or sentences. For instance, we first applied this application in legal research, where we needed to check which of over 10,000 court rulings referenced a specific law.
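The core of that keyword search can be sketched in a few lines of `R`. This is a simplified, self-contained illustration, not the template's actual scripts: the document texts and keywords below are made up, and in the real workflow the texts come from converted `PDF` files.

```r
# Toy example: search a set of plain-text "documents" for keywords
# (in the real template, the texts are PDF files converted to .txt)
texts <- list(
  ruling_001 = "The court refers to Article 6 of the Data Protection Act.",
  ruling_002 = "No specific statute is cited in this decision."
)
keywords <- c("Data Protection Act", "Article 6")

# For every document, record which keywords occur in it
matches <- do.call(rbind, lapply(names(texts), function(doc) {
  found <- keywords[sapply(keywords, grepl, x = texts[[doc]], fixed = TRUE)]
  if (length(found) == 0) return(NULL)
  data.frame(file = doc, keyword = found, row.names = NULL)
}))

print(matches)
```

In the template, a result table like `matches` is what gets written to the Excel report.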

## Get The Workflow
16 changes: 7 additions & 9 deletions content/examples/reproducible-workflow-airbnb.md
- platform-independent (Mac, Linux, Windows)
- across a diverse set of software programs (Stata, Python, R)
- producing an entire (mock) paper, including modules that
- download data from Kaggle,
- prepare data for analysis,
- run a simple analysis,
- produce a paper with output tables and figures.

## How to run it

### Dependencies

- Install [Python](/get/python/).
- Anaconda is recommended. [Download Anaconda](https://www.anaconda.com/download).
  - Check availability: type `anaconda --version` in the command line.
- Install Kaggle package.
- [Kaggle API](https://github.com/Kaggle/kaggle-api) instruction for installation and setup.
Open your command line tool:
- if not, type `cd yourpath/airbnb-workflow` to change your directory to `airbnb-workflow`
- Type `make` in the command line.



### Directory structure

Make sure `makefile` is put in the present working directory. The directory structure for the Airbnb project is shown below.

```text
├── data
…
```
- **src**: all source code.
- Three parts: **data_preparation**, **analysis**, and **paper** (including TeX files).


<!-- {{% codeblock %}}

[js-link](code.js)
- /deltaMethod/package
---

## Overview

`deltaMethod` is an R function that approximates the standard error of a transformation $g(X)$ of a random variable $X = (x_{1}, x_{2}, \ldots)$, given estimates of the mean and the covariance matrix of $X$. The approximation is given by the formula:
$Cov(g(X)) = g'(\mu)Cov(X)[g'(\mu)]^T$, where $\mu$ is an estimate of the mean of $X$.

Having the regression coefficients might not be enough if we want to interpret combined effects between them. Merely summing the coefficients does not tell us whether the resulting effect is significant, nor does it give us its standard error.
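To see what `deltaMethod` does under the hood, the standard error of a sum of two coefficients can be computed by hand from the covariance matrix and compared with the function's output. This is a self-contained sketch on simulated data (variable names `x1`, `x2` are made up; it assumes the `car` package is installed). For a linear combination the delta-method formula is exact, so the two numbers agree:

```r
library(car)

# Simulated data with two regressors
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Delta method by hand for g(b) = b_x1 + b_x2:
# the gradient of g is (0, 1, 1), so Var(g) = g' V g
g <- c(0, 1, 1)
se_manual <- sqrt(t(g) %*% vcov(fit) %*% g)

# The same result via deltaMethod
dm <- deltaMethod(fit, "x1 + x2")
c(manual = as.numeric(se_manual), deltaMethod = dm$SE)
```

For non-linear transformations, such as the examples below, `deltaMethod` applies the same formula with the gradient evaluated at the estimated coefficients.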

To obtain the combined effects of coefficients we can use the `deltaMethod` function of the R `car` package. To illustrate how to implement it we use two example datasets: the ["Boston Housing" dataset example](#example-1) available in the `MASS` package of R and the ["hernandez.nitrogen" example](#example-2), a dataset of an agricultural trial of corn with nitrogen fertilizer available in the `agridat` package.

## Implementation

### Example 1

Let's first load the Boston dataset and required packages.

{{% codeblock %}}

```R
library(MASS)
library(car)

data(Boston)
#dataset description
?Boston
```

{{% /codeblock %}}

Next, we perform a non-linear regression analysis. We regress the natural logarithm of the `medv` variable, which is the median value of owner-occupied homes in $1000s, on:

- `rad`: the index of accessibility to radial highways, included as factor variable
- `dis`: the weighted mean of distances to five Boston employment centres
- `nox`: the nitrogen oxide concentration
- `chas`: the Charles River dummy variable
- `crim`: per capita crime rate

{{% codeblock %}}

```R
mod <- lm(log(medv) ~ as.factor(rad) + dis + nox + chas + crim + I(crim^2), data = Boston)

summary(mod)
```

{{% /codeblock %}}

_Output:_

<p align = "center">
<img src = "../images/deltamethod.png" width="350">
</p>

Now let's calculate the combined effect of the first order crime rate per capita (`crim`) with the second order `crim`. The `deltaMethod` function takes as first argument the regression object, followed by the names of the coefficients between quotes:

{{% codeblock %}}

```R
deltaMethod(mod, "crim + `I(crim^2)`", rhs=1)
```

{{% /codeblock %}}

_Output:_

<p align = "center">
<img src = "../images/deltamethod1.png" width="450">
</p>
Alternatively, we can calculate the effect of the Charles River dummy (`chas`) multiplied by the combined crime effect, this time using a heteroskedasticity-consistent covariance matrix (`vcov = hccm`):
```R
deltaMethod(mod, "chas*(crim + `I(crim^2)`)", vcov = hccm)
```

{{% /codeblock %}}

_Output:_

<p align = "center">
<img src = "../images/deltamethod2.png" width="450">
</p>
{{% codeblock %}}

```R
library(agridat)
library(car)
#load dataset
df <- hernandez.nitrogen
```

{{% /codeblock %}}

This is a simple profit-maximization problem: given the quantity of nitrogen applied and the corn yield obtained, as well as input and sale prices, we aim to maximize profit.

We first create an income variable by subtracting the cost of fertilization (`nitro*input_price`) from the revenue from selling the corn (`yield*sale_price`):

`income = yield*sale_price - nitro*input_price`

{{% codeblock %}}

```R
input_price <- 0.665

df <- df %>% mutate(income = yield*sale_price - nitro*input_price)
```

{{% /codeblock %}}

Next, we run a quadratic regression with income as the dependent variable and nitrogen use as the independent variable.
{{% codeblock %}}

```R
mod1 <- lm(income ~ nitro + I(nitro^2), data = df)

summary(mod1)
```

{{% /codeblock %}}

_Output:_

<p align = "center">
<img src = "../images/deltamethod3.png" width="350">
</p>
{{% codeblock %}}

```R
deltaMethod(mod1, "-nitro /(2*`I(nitro^2)`)", rhs=1)
#equivalently we can also use the beta parameters
deltaMethod(mod1, "-b1/(2*b2)", parameterNames = paste("b", 0:2, sep = ""))
```

{{% /codeblock %}}

_Output:_

<p align = "center">
<img src = "../images/deltamethod4.png" width="450">
</p>
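The expression passed to `deltaMethod` here is simply the vertex of the fitted quadratic. A quick derivation: the fitted model is $income = b_{0} + b_{1} \cdot nitro + b_{2} \cdot nitro^{2}$, so the first-order condition for a maximum is

$\frac{\partial \, income}{\partial \, nitro} = b_{1} + 2 b_{2} \cdot nitro = 0 \implies nitro^{*} = -\frac{b_{1}}{2 b_{2}}$

which is exactly the transformation whose standard error we estimated with `deltaMethod`.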

The optimum nitrogen quantity is 166, as shown by the significant estimate of the `deltaMethod`.

## See also

[deltaMethod documentation](https://www.rdocumentation.org/packages/car/versions/3.1-2/topics/deltaMethod)