This project is a template structure for data science projects. It is intended mainly for teams within organizations, but it should serve you just as well if you are coding solo, taking part in a data hackathon, or working in an academic group on exploratory analysis, statistical analysis, or algorithm modelling.
This is a standalone template that you can use as a starting point for any data science project. It is not a framework, a library, or a package, and it is not intended to be a one-size-fits-all solution: treat it as a starting point for building your own project structure.
If you like this project, please consider giving it a ⭐️!
- Jon as the core developer
People who have contributed to this course in the past:
- Karina Moura. A stellar project manager!
- Natalia Del Coco. She is taking her van to the mountains for a while.
- Sara Luxmoore. She can be seen doing cool research-related stuff in Italy these days.
Follow the instructions below to make use of this template.
- Create a new repository on GitHub using this template. You can do this by clicking on the green "Use this template" button on the top right of this page.
- Give your project a name and description. You can also choose to make the repository private if you wish.
  - Leave "Include all branches" unchecked.
- GitHub will copy the files from this repository into your new repository and it will trigger an Actions workflow. This workflow will customize labels (to include emojis!) as well as Issues and Pull Request templates for your project.
  - If you are not familiar with GitHub Actions, you can read more about it here.
- Clone your new repository to your computer and start working on it!
Once you have cloned your new repository to your computer, you might want to do the following:
- Update the `README.md` file to remove all things related to this template and add information about your project.
- Update the `LICENSE` file to reflect the license you want to use for your project. You can find a list of open-source licenses here.
- Modify the name of the `src/python/pkg_name` folder to reflect the name of your project. You can also remove the `pkg_name` folder if you are not planning on using custom Python packages.
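The folder rename in the last step can be sketched in Python. This is a minimal illustration and not part of the template: the helper name `rename_package` is hypothetical, and it assumes the package still lives at `src/python/pkg_name`. It only renames the folder, so any imports that mention `pkg_name` still need updating by hand.

```python
import keyword
from pathlib import Path


def rename_package(repo_root, new_name, old_name="pkg_name"):
    """Rename src/python/<old_name> to src/python/<new_name>.

    Hypothetical helper for illustration; it assumes the folder layout
    described in this README and touches nothing but the folder name.
    """
    # A Python package name must be a valid, non-keyword identifier.
    if not new_name.isidentifier() or keyword.iskeyword(new_name):
        raise ValueError(f"{new_name!r} is not a valid Python package name")

    python_dir = Path(repo_root) / "src" / "python"
    source = python_dir / old_name
    target = python_dir / new_name

    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    source.rename(target)
    return target
```

Calling `rename_package(".", "my_project")` from the repository root would move `src/python/pkg_name` to `src/python/my_project`. Inside a Git repository, `git mv` achieves the same result while staging the change.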
Click on the links below to learn how to best use this template, and how to contribute to it.
✋ How to contribute
If you want to propose changes to the template, follow the steps below:
- Set up your environment by following the instructions in the Dev Setup section.
- Create a new branch from `develop` and give it a meaningful name. Best practices involve using the following format: `<your-username>/<issue-number>-<short-description>`. For example, if you are working on issue #3, you could name your branch `jonjoncardoso/3-update-github-action`. Remember the GitFlow workflow!
- Make your changes and commit them to your branch. Remember to commit often and to write meaningful commit messages. If you are working on a specific issue, you can use the following format: `<gitmoji> #<issue-number> <commit-message>`. For example, if you are working on issue #3, you could write `📝 #3 Update GitHub Action`.
  - To add emojis on Windows, just type `Win + .` and then select the emoji you want. On Mac, use the world (globe) key or `⌘ + Ctrl + Space`.
  - You can find a list of gitmojis here. If you are not sure what to write, you can use `📝` for documentation, `🐛` for bug fixes, `🌟` for new features, and `♻️` for refactoring. You can also use `🔧` for general changes. If you are not sure, just ask!
- When you are done, push all your commits and then open a pull request to merge your branch into `develop`. You can do this by clicking on the "Compare & pull request" button on GitHub. Make sure to add a meaningful title and description to your pull request. If you are working on a specific issue, you can use the following format: `#<issue-number> <pull-request-title>`. For example, if you are working on issue #3, you could write `#3 Update GitHub Action`. Mark @jonjoncardoso as a reviewer.
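The three naming formats above can be sketched as small Python helpers. The function names (`slugify`, `branch_name`, `commit_message`, `pull_request_title`) are hypothetical and only illustrate the conventions; the template itself does not ship or enforce them.

```python
import re


def slugify(text):
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def branch_name(username, issue, description):
    """Build <your-username>/<issue-number>-<short-description>."""
    return f"{username}/{issue}-{slugify(description)}"


def commit_message(gitmoji, issue, message):
    """Build <gitmoji> #<issue-number> <commit-message>."""
    return f"{gitmoji} #{issue} {message}"


def pull_request_title(issue, title):
    """Build #<issue-number> <pull-request-title>."""
    return f"#{issue} {title}"


# Reproduces the issue #3 examples from the steps above.
print(branch_name("jonjoncardoso", 3, "Update GitHub Action"))
print(commit_message("📝", 3, "Update GitHub Action"))
print(pull_request_title(3, "Update GitHub Action"))
```

The slug step is one possible choice: the format above only asks for a short description, so any readable, hyphen-separated wording would do.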
🧰 Dev Setup
- Install Python 3.9 or higher on your computer.
- Create a new conda environment:

      conda create -y -n=venv-ds-workflow python=3.10.8

- Activate the environment and make sure you have `pip` installed inside that environment:

      # the exact `activate` command will vary depending on your OS
      conda activate venv-ds-workflow

  💡 Remember to activate this particular `conda` environment whenever you reopen VSCode/the terminal.
- Install required libraries:

      pip install -r requirements.txt

Now, whenever you open a Jupyter Notebook, you should see the `venv-ds-workflow` kernel available. You can also run `jupyter kernelspec list` to see all the kernels available on your computer.
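To confirm the setup above took effect, a quick check along these lines may help. It is a sketch, not part of the template: `environment_problems` is a hypothetical helper, and it assumes the environment was activated with `conda activate`, which sets the `CONDA_DEFAULT_ENV` variable.

```python
import os
import sys

EXPECTED_ENV = "venv-ds-workflow"  # environment name used in the conda command above
MIN_PYTHON = (3, 9)                # minimum version stated in the first step


def environment_problems(version=sys.version_info, env=None):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper: `version` and `env` are parameters so the check
    can be exercised without touching the real interpreter or shell.
    """
    env = os.environ if env is None else env
    problems = []

    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )

    # conda sets CONDA_DEFAULT_ENV to the active environment on activation.
    active = env.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(f"active conda environment is {active!r}, expected {EXPECTED_ENV!r}")

    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("warning:", problem)
```

Running the script inside the activated environment should print nothing. A warning usually means the terminal or VSCode window was reopened without reactivating the environment.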
- Clone this repository to your computer.
- Open a terminal and navigate to the root of this repository.
- Ensure you have R version 4.2.2 or higher.
- Open the R console in this same directory and install the `renv` package:

      install.packages("renv")

- Run `renv::restore()` to install all the packages needed for this project.
If using quarto is not your thing, you can just ignore this section. If you want to use quarto, follow the steps below:
- Install Quarto on your computer.
- Run the following command to start the website locally:

      quarto preview . --render all --no-browser

  This will read the instructions from `_quarto.yml` and render the website locally.
- Open your browser and navigate to `http://localhost:<port>/`. That's it!
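If you would rather launch the preview from a Python script, the command can be assembled as in the sketch below. `quarto_preview_command` is a hypothetical helper, not something this template provides. It only checks that `_quarto.yml` is present and returns the same arguments as the command above.

```python
from pathlib import Path


def quarto_preview_command(project_dir="."):
    """Return the argument list for the preview command shown above.

    Hypothetical helper: raises FileNotFoundError when the directory has
    no _quarto.yml, since Quarto reads the website settings from it.
    """
    config = Path(project_dir) / "_quarto.yml"
    if not config.is_file():
        raise FileNotFoundError(f"no _quarto.yml found in {project_dir!r}")
    return ["quarto", "preview", str(project_dir), "--render", "all", "--no-browser"]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Quarto then prints the local address, including the port, in the terminal.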
⚒️ (Advanced) Jon's full setup
I, @jonjoncardoso, like to use R on VSCode (WSL Ubuntu) instead of RStudio. It is a weird setup if you come from R, but it's a good setup for when you need to switch between R and Python all the time. Feel free to just ignore this stuff but if you want to replicate my setup, just follow the steps below:
- Install VSCode
- Install WSL on Windows
- Install WSL extension on VSCode
- Open VSCode and open a new WSL window (type `Ctrl+Shift+P` and then type `WSL: New Window`)
- Open the Ubuntu terminal on VSCode and install R
When doing R
- Install the R extension on VSCode
- Install Quarto
- Install the Quarto extension on VSCode
- When running R notebooks (either `.Rmd` or `.qmd`) manually, you will see that some plots do not render with adequate size. To fix this, follow these instructions.
When doing Python
- Install the Python extension on VSCode
- Install the Jupyter extension on VSCode
I also use the following VSCode Extensions: