EBI Genome Bioinformatics: Scaling Things Up

This is the code repository used for the "Scaling Things Up" section of the EBI course Genome Bioinformatics, known in previous years as "NGS Bioinformatics".

This section follows the previous three days of the course, which covered command-line tools and basic bioinformatics commands for indexing files and aligning FASTQ files to a reference genome. Here we reuse the commands learnt on those days and run them with parallelisation and job scheduling.

The following README is a copy of the 2021 Google Docs walkthrough of the interactive part of the session.

Parallelisation

Run git clone on this repository
Go into the folder you just cloned, then into the “Parallelisation” folder
Open the align_all_extra_fqs.sh script. What do you think the script will do?
Do you think the script will take a long time to run? What command could we use to time how long a script takes?
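As a hint for the timing question, the shell's `time` keyword can be placed in front of any command. A minimal sketch, assuming a bash shell and using `sleep 1` as a stand-in for the real script:

```shell
#!/usr/bin/env bash
# `time` runs the command that follows it, then reports real (wall-clock),
# user, and sys time. `sleep 1` is a placeholder for a long-running script.
time sleep 1

# For the course script, the equivalent would be:
#   time bash align_all_extra_fqs.sh
```

The "real" figure is the one to compare between the serial and parallel runs.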

Modify the script so that, instead of running each alignment, it echoes each alignment command into a file called align_commands.sh
Run the resulting script using the parallel command; you can also use the time command to measure how long it takes to run
How long did it take when using parallel to run the command?
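The echo-and-parallel steps above can be sketched as follows. This is an illustration, not the actual contents of align_all_extra_fqs.sh: the names `reference.fa` and `sample_*.fq` are hypothetical placeholders, the `samtools sort` step is an assumption about how the BAM files are produced, and the commented-out run line assumes GNU parallel is installed.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: replace with the reference and FASTQ files from the course.
ref="reference.fa"
fastqs="sample_1.fq sample_2.fq sample_3.fq"

# Start from an empty command file so reruns do not append duplicates.
: > align_commands.sh

# Instead of running each alignment, write the command out as one line of text.
for fq in $fastqs; do
    echo "bwa mem $ref $fq | samtools sort -o ${fq%.fq}.bam -" >> align_commands.sh
done

# Inspect the generated commands before running anything.
cat align_commands.sh

# GNU parallel reads one command per line from stdin and runs several at once.
# To run and time the generated commands:
#   time parallel < align_commands.sh
```

Writing the commands to a file first makes it easy to check exactly what will be run before handing the list to parallel.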

Job Schedulers

Remove the echo we added to align_all_extra_fqs.sh so that it will run everything in a for loop
Do you remember how to submit a job with Slurm? (hint: it's the sbatch command followed by what you want to run)
Run squeue to see your job running. It should be listed along with its JOBID.
We will now kill our job using the scancel command followed by the JOBID. For me, this is scancel 8. Find your JOBID with squeue and cancel the job
Remove the BAM files we generated here
Edit the align_all_extra_fqs.sh file to submit each bwa mem command to Slurm
See all the jobs running at once
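The Slurm steps above can be sketched in the same way. This is an illustration under stated assumptions, not the course's solution: the names `reference.fa` and `sample_*.fq` are hypothetical, the `samtools sort` step is an assumption about how the BAM files are produced, and the `DRY_RUN` switch and `submitted_commands.txt` file are added here so the sketch can be tried on a machine without Slurm. It uses `sbatch --wrap`, which wraps a command string in a minimal batch script.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: replace with the reference and FASTQ files from the course.
ref="reference.fa"
fastqs="sample_1.fq sample_2.fq sample_3.fq"

# Dry run by default: record each sbatch command instead of submitting it.
# Set DRY_RUN=0 on a machine with Slurm to submit for real.
DRY_RUN="${DRY_RUN:-1}"

: > submitted_commands.txt
for fq in $fastqs; do
    align="bwa mem $ref $fq | samtools sort -o ${fq%.fq}.bam -"
    if [ "$DRY_RUN" = "1" ]; then
        # Record what would be submitted, one job per FASTQ file.
        echo "sbatch --job-name=align_${fq%.fq} --wrap=\"$align\"" >> submitted_commands.txt
    else
        # --wrap turns the command string into a one-line batch script.
        sbatch --job-name="align_${fq%.fq}" --wrap="$align"
    fi
done

cat submitted_commands.txt

# Once jobs are submitted:
#   squeue -u "$USER"   # list your queued and running jobs with their JOBIDs
#   scancel <JOBID>     # cancel one job, e.g. scancel 8
```

Submitting one job per FASTQ file lets the scheduler decide how many alignments run at once, rather than a single job working through the for loop serially.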
