Parse SyRi into R and plot with gggenomes

Niklas Schandry

About

I was looking for a way to plot syri-output, similar to what plotsr does, but with easier costumization and in R. I could not find anything, so I wrote something. The files included here in data for demonstration are the plotsr example files.

This requires ‘tidyverse’ (only dplyr and vroom) and gggenomes. This repo also comes with a snapshot that can be used with renv::restore()

renv::install("tidyverse","thackl/gggenomes")

library(tidyverse)
library(gggenomes)
library(magrittr)
source("functions/parse_syri.R")

The calculation of polygons for curves between sequences is directly lifted from GENESPACE

Input

The script expects the syri output to be named genomeA_on_genomeB.syri.out, and will split based on this. There is no flexibility here.

Running

In this example, genomeA is col and genomeB is ler.

dat <- parse_syri("data/col_on_ler.syri.out",
                  order = data.frame(bin_id = c("col","ler")))

## Created seqtab

## Created links

## Calculating polygons

Plotting

gggenomes::gggenomes(seqs = dat$seqs,
                     links = dat$links) + 
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.6
  ) +
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.8
  ) +
  geom_seq(linewidth = 1) + 
  geom_bin_label(size=7) +
  syri_plot_fills  +
  ggtitle("Synteny between Col and Ler")

Options

No resizing

By default, short syntenic regions larger than 5000bp are resized to make them visible. Since this does not reflect the original input, this can be disabled:

dat <- parse_syri("data/col_on_ler.syri.out",
                  order = data.frame(bin_id = c("col","ler")),
                  resize_polygons = F)

## Created seqtab

## Created links

## Calculating polygons

gggenomes::gggenomes(seqs = dat$seqs,
                     links = dat$links) + 
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.6
  ) +
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.8
  ) +
  geom_seq(linewidth = 1) + 
  geom_bin_label(size=7) +
  syri_plot_fills  +
  ggtitle("Synteny between Col and Ler without resizing")

Minimum resize size

Only regions larger than min_polygon_feat_size are resized (default 5000), this can be modified to also include smaller regions

dat <- parse_syri("data/col_on_ler.syri.out",
                  order = data.frame(bin_id = c("col","ler")),
                  resize_polygons = T,
                  min_polygon_feat_size = 1000)

## Created seqtab

## Created links

## Calculating polygons

Naturally, this will create a busier plot.

gggenomes::gggenomes(seqs = dat$seqs,
                     links = dat$links) + 
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.6
  ) +
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.8
  ) +
  geom_seq(linewidth = 1) + 
  geom_bin_label(size=7) +
  syri_plot_fills  +
  ggtitle("Synteny between Col and Ler, resizing regions larger than 999bp")

Resize output size

Regions are resized to have a certain length relative to the chromosome, controlled by resize_polygons_size, which defaults to 0.003 (0.3%) of the chromosome length. Altering this will make resized regions larger, or smaller.

dat <- parse_syri("data/col_on_ler.syri.out",
                  order = data.frame(bin_id = c("col","ler")),
                  resize_polygons = T,
                  resize_polygons_size = 0.01)

## Created seqtab

## Created links

## Calculating polygons

This will produce wider polygons for resized links.

gggenomes::gggenomes(seqs = dat$seqs,
                     links = dat$links) + 
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.6
  ) +
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.8
  ) +
  geom_seq(linewidth = 1) + 
  geom_bin_label(size=7) +
  syri_plot_fills  +
  ggtitle("Synteny between Col and Ler")

Multiple genomes

Comparing two genomes is nice, but more are better.

parse_syri can handle multiple outputs in one go:

file_list <- list.files("data", full.names = T)
syri_order <- data.frame(bin_id = c("col", "ler", "cvi", "eri"))
dat <- parse_syri(file_list, order = syri_order)

## Created seqtab

## Created links

## Calculating polygons

Making a plot from this works the same way of making a plot of only one comparison. The order of sequences is set via the order argument to parse_syri()

gggenomes::gggenomes(seqs = dat$seqs,
                     links = dat$links) + 
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type == "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.6
  ) +
  geom_polygon(
    data = dat$polygons %>% filter(direct) %>% filter(type != "SYN"),
    aes(
      x = x,
      y = y,
      fill = type,
      group = link_grp
    ),
    alpha = 0.8
  ) +
  geom_seq(linewidth = 1) + 
  geom_bin_label(size=7) +
  syri_plot_fills  +
  ggtitle("Synteny between Col - Ler - Cvi - Eri")

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
functions		functions
parse_files/figure-gfm		parse_files/figure-gfm
renv		renv
.gitignore		.gitignore
README.md		README.md
parse.Rmd		parse.Rmd
renv.lock		renv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parse SyRi into R and plot with gggenomes

About

Input

Running

Plotting

Options

No resizing

Minimum resize size

Resize output size

Multiple genomes

About

Releases

Packages

Languages

nschan/syri_gggenomes

Folders and files

Latest commit

History

Repository files navigation

Parse SyRi into R and plot with gggenomes

About

Input

Running

Plotting

Options

No resizing

Minimum resize size

Resize output size

Multiple genomes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages