
Commit

Merge pull request #26 from databio/dev
Rename repository to `gtars`
nleroy917 authored Jun 11, 2024
2 parents f40fc8c + 09a0716 commit 506dabe
Showing 70 changed files with 288 additions and 578 deletions.
2 changes: 1 addition & 1 deletion .vscode/settings.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"rust-analyzer.linkedProjects": [
"./genimtools/Cargo.toml",
"./gtars/Cargo.toml",
"./bindings/Cargo.toml",
]
}
24 changes: 12 additions & 12 deletions README.md
@@ -1,10 +1,10 @@
<h1 align="center">
<img src="genimtools/docs/logo.svg" alt="genimtools logo" height="100px">
<img src="gtars/docs/logo.svg" alt="gtars logo" height="100px">
</h1>

`genimtools` is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, [`geniml`](https://github.com/databio/geniml), a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well.
`gtars` is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, [`geniml`](https://github.com/databio/geniml), a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well.

`genimtools` provides three things:
`gtars` provides three things:

1. A rust library crate.
2. A command-line interface, written in rust.
@@ -14,24 +14,24 @@

This repo is organized like so:

1. A rust library crate (`/genimtools/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
2. A rust binary crate (in `/genimtools/main.rs`), a small wrapper command-line interface for the library crate.
1. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
2. A rust binary crate (in `/gtars/main.rs`), a small wrapper command-line interface for the library crate.
3. A rust crate (in `/bindings`) that provides Python bindings, and a resulting Python package, so that it can be used within Python.

This repository is a work in progress, and still in early development.

## Installation
To install `genimtools`, you must have the rust toolchain installed. You can install it by following the instructions [here](https://www.rust-lang.org/tools/install).
To install `gtars`, you must have the rust toolchain installed. You can install it by following the instructions [here](https://www.rust-lang.org/tools/install).

You may build the binary locally using `cargo build --release`. This will create a binary in `target/release/genimtools`. You can then add this to your path, or run it directly.
You may build the binary locally using `cargo build --release`. This will create a binary in `target/release/gtars`. You can then add this to your path, or run it directly.

## Usage
`genimtools` is very early in development, and as such, it does not have a lot of functionality yet. However, it does have a few useful tools. To see the available tools, run `genimtools --help`. To see the help for a specific tool, run `genimtools <tool> --help`.
`gtars` is very early in development, and as such, it does not have a lot of functionality yet. However, it does have a few useful tools. To see the available tools, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.

Alternatively, you can link `genimtools` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:
Alternatively, you can link `gtars` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:
```toml
[dependencies]
genimtools = { git = "https://github.com/databio/genimtools" }
gtars = { git = "https://github.com/databio/gtars" }
```
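For reproducible builds, a git dependency can also be pinned to a specific tag or commit. As a sketch, pinning to the (abbreviated) merge commit that introduced the `gtars` name would look like this — adjust the `rev` value to whatever revision you actually want to track:

```toml
[dependencies]
# Pinning to a specific commit keeps builds reproducible even as the
# repository's default branch moves forward.
gtars = { git = "https://github.com/databio/gtars", rev = "506dabe" }
```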

## Testing
@@ -42,13 +42,13 @@ To run the tests, run `cargo test`.
If you'd like to add a new tool, you can do so by creating a new module within the src folder.

### New public library crate tools
If you want this to be available to users of `genimtools`, you can add it to the `genimtools` library crate as well. To do so, add the following to `src/lib.rs`:
If you want this to be available to users of `gtars`, you can add it to the `gtars` library crate as well. To do so, add the following to `src/lib.rs`:
```rust
pub mod <tool_name>;
```
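As a concrete sketch of what this registration enables (the module name `overlap` below is purely hypothetical, used only for illustration), declaring a module `pub` in `lib.rs` exposes its public items to downstream users under the crate namespace:

```rust
// Minimal sketch of exposing a new tool module from the library crate.
// `overlap` is a hypothetical tool name, not a real gtars module.
pub mod overlap {
    /// A public function that downstream crates would reach as
    /// `gtars::overlap::describe()` once the module is declared in lib.rs.
    pub fn describe() -> &'static str {
        "overlap: computes overlaps between genomic intervals"
    }
}

fn main() {
    // In the real crate lib.rs has no `main`; this just demonstrates
    // calling the newly exposed public API.
    println!("{}", overlap::describe());
}
```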

### New binary crate tools
Finally, if you want to have command-line functionality, you can add it to the `genimtools` binary crate. This requires two steps:
Finally, if you want to have command-line functionality, you can add it to the `gtars` binary crate. This requires two steps:
1. Create a new `cli` using `clap` inside the `interfaces` module of `src/cli.rs`:
```rust
pub fn make_new_tool_cli() -> Command {
    // ... (body elided in this view)
}
```
8 changes: 4 additions & 4 deletions bindings/Cargo.toml
@@ -1,16 +1,16 @@
[package]
name = "genimtools-py"
version = "0.0.13"
name = "gtars-py"
version = "0.0.14"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "genimtools"
name = "gtars"
crate-type = ["cdylib"]

[dependencies]
anyhow = "1.0.82"
genimtools = { path = "../genimtools" }
gtars = { path = "../gtars" }
pyo3 = { version = "0.21", features=["anyhow", "extension-module"] }
numpy = "0.21"
# pyo3-tch = { git = "https://github.com/LaurentMazare/tch-rs" }
10 changes: 5 additions & 5 deletions bindings/README.md
@@ -1,16 +1,16 @@
# genimtools
This is a python wrapper around the `genimtools` crate. It provides an easy interface for using `genimtools` in python. It is currently in early development, and as such, it does not have a lot of functionality yet, but new tools are being worked on right now.
# gtars
This is a python wrapper around the `gtars` crate. It provides an easy interface for using `gtars` in python. It is currently in early development, and as such, it does not have a lot of functionality yet, but new tools are being worked on right now.

## Installation
You can get `genimtools` from PyPI:
You can get `gtars` from PyPI:
```bash
pip install genimtools
pip install gtars
```

## Usage
Import the package, and use the tools:
```python
import genimtools as gt
import gtars as gt

gt.prune_universe(...)
```
1 change: 0 additions & 1 deletion bindings/genimtools/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion bindings/genimtools/ailist/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion bindings/genimtools/models/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion bindings/genimtools/tokenizers/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion bindings/genimtools/utils/__init__.py

This file was deleted.

1 change: 0 additions & 1 deletion bindings/genimtools/vocab/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions bindings/genimtools/vocab/__init__.pyi

This file was deleted.

1 change: 1 addition & 0 deletions bindings/gtars/__init__.py
@@ -0,0 +1 @@
from .gtars import * # noqa: F403
File renamed without changes.
1 change: 1 addition & 0 deletions bindings/gtars/ailist/__init__.py
@@ -0,0 +1 @@
from .gtars.ailist import * # noqa: F403
File renamed without changes.
1 change: 1 addition & 0 deletions bindings/gtars/models/__init__.py
@@ -0,0 +1 @@
from .gtars.models import * # noqa: F403
File renamed without changes.
1 change: 1 addition & 0 deletions bindings/gtars/tokenizers/__init__.py
@@ -0,0 +1 @@
from .gtars.tokenizers import * # noqa: F403
@@ -376,7 +376,7 @@ class FragmentTokenizer:
:param path: The path to the universe file. This should be a BED file.
"""

def tokenize_fragments(self, file_path: str, out_path: str = None, filter: List[str] = None) -> None:
def tokenize_fragments_to_gtoks(self, file_path: str, out_path: str = None, filter: List[str] = None) -> None:
"""
Tokenize a file containing fragments.
1 change: 1 addition & 0 deletions bindings/gtars/utils/__init__.py
@@ -0,0 +1 @@
from .gtars.utils import * # noqa: F403
File renamed without changes.
2 changes: 1 addition & 1 deletion bindings/pyproject.toml
@@ -3,7 +3,7 @@ requires = ["maturin>=1.3,<2.0"]
build-backend = "maturin"

[project]
name = "genimtools"
name = "gtars"
requires-python = ">=3.8"
classifiers = [
"Programming Language :: Rust",
2 changes: 1 addition & 1 deletion bindings/src/ailist/mod.rs
@@ -1,4 +1,4 @@
use genimtools::ailist::{AIList, Interval};
use gtars::ailist::{AIList, Interval};
use pyo3::{prelude::*, pyclass};

use crate::models::PyInterval;
14 changes: 5 additions & 9 deletions bindings/src/lib.rs
@@ -5,19 +5,16 @@ mod ailist;
mod models;
mod tokenizers;
mod utils;
mod vocab;

pub const VERSION: &str = env!("CARGO_PKG_VERSION");

#[pymodule]
fn genimtools(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
let vocab_module = pyo3::wrap_pymodule!(vocab::vocab);
fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
let tokenize_module = pyo3::wrap_pymodule!(tokenizers::tokenizers);
let ailist_module = pyo3::wrap_pymodule!(ailist::ailist);
let utils_module = pyo3::wrap_pymodule!(utils::utils);
let models_module = pyo3::wrap_pymodule!(models::models);

m.add_wrapped(vocab_module)?;
m.add_wrapped(tokenize_module)?;
m.add_wrapped(ailist_module)?;
m.add_wrapped(utils_module)?;
@@ -28,11 +25,10 @@ fn genimtools(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
let sys_modules: &Bound<'_, PyDict> = binding.downcast()?;

// set names of submodules
sys_modules.set_item("genimtools.vocab", m.getattr("vocab")?)?;
sys_modules.set_item("genimtools.tokenizers", m.getattr("tokenizers")?)?;
sys_modules.set_item("genimtools.ailist", m.getattr("ailist")?)?;
sys_modules.set_item("genimtools.utils", m.getattr("utils")?)?;
sys_modules.set_item("genimtools.models", m.getattr("models")?)?;
sys_modules.set_item("gtars.tokenizers", m.getattr("tokenizers")?)?;
sys_modules.set_item("gtars.ailist", m.getattr("ailist")?)?;
sys_modules.set_item("gtars.utils", m.getattr("utils")?)?;
sys_modules.set_item("gtars.models", m.getattr("models")?)?;

// add constants
m.add("__version__", VERSION)?;
2 changes: 1 addition & 1 deletion bindings/src/models/region.rs
@@ -5,7 +5,7 @@ use pyo3::exceptions::PyTypeError;
use pyo3::prelude::*;

use anyhow::Result;
use genimtools::common::models::region::Region;
use gtars::common::models::region::Region;

use crate::models::PyUniverse;

2 changes: 1 addition & 1 deletion bindings/src/models/region_set.rs
@@ -6,7 +6,7 @@ use numpy::ndarray::Array;
use numpy::{IntoPyArray, PyArray1};

use anyhow::Result;
use genimtools::common::utils::extract_regions_from_bed_file;
use gtars::common::utils::extract_regions_from_bed_file;

use crate::models::{PyRegion, PyTokenizedRegion, PyUniverse};

2 changes: 1 addition & 1 deletion bindings/src/models/universe.rs
@@ -5,7 +5,7 @@ use pyo3::prelude::*;
use anyhow::Result;

use crate::models::PyRegion;
use genimtools::common::models::Universe;
use gtars::common::models::Universe;

#[pyclass(name = "Universe")]
#[derive(Clone, Debug)]
68 changes: 61 additions & 7 deletions bindings/src/tokenizers/fragments_tokenizer.rs
@@ -1,20 +1,34 @@
use gtars::tokenizers::FragmentTokenizer;
use gtars::tokenizers::TreeTokenizer;
use pyo3::prelude::*;

use super::PyTokenizedRegionSet;
use super::PyUniverse;

#[pyclass(name = "FragmentTokenizer")]
pub struct PyFragmentTokenizer {
pub tokenizer: genimtools::tokenizers::FragmentTokenizer,
pub tokenizer: gtars::tokenizers::FragmentTokenizer<TreeTokenizer>,
pub universe: Py<PyUniverse>, // this is a Py-wrapped version of self.tokenizer.universe for performance reasons
}

#[pymethods]
impl PyFragmentTokenizer {
#[new]
pub fn new(path: String) -> PyResult<Self> {
let path = std::path::Path::new(&path);
let tokenizer = genimtools::tokenizers::FragmentTokenizer::try_from(path)?;
Ok(PyFragmentTokenizer { tokenizer })
Python::with_gil(|py| {
let path = std::path::Path::new(&path);
let tokenizer = gtars::tokenizers::TreeTokenizer::try_from(path)?;
let frag_tokenizer = FragmentTokenizer::new(tokenizer);
let py_universe: PyUniverse = frag_tokenizer.tokenizer.universe.to_owned().into();
let py_universe_bound = Py::new(py, py_universe)?;
Ok(PyFragmentTokenizer {
tokenizer: frag_tokenizer,
universe: py_universe_bound,
})
})
}

pub fn tokenize_fragments(
pub fn tokenize_fragments_to_gtoks(
&self,
file: String,
out_path: Option<String>,
@@ -26,9 +40,49 @@ impl PyFragmentTokenizer {
match filter {
Some(filter) => self
.tokenizer
.tokenize_fragments_with_filter(path, out_path, filter),
None => self.tokenizer.tokenize_fragments(path, out_path),
.tokenize_fragments_to_gtoks_with_filter(path, out_path, filter),
None => self.tokenizer.tokenize_fragments_to_gtoks(path, out_path),
}?;
Ok(())
}

pub fn tokenize_fragments(
&self,
file: String,
filter: Option<Vec<String>>,
) -> PyResult<Vec<PyTokenizedRegionSet>> {
let path = std::path::Path::new(&file);
match filter {
Some(filter) => {
let tokenized_region_sets = self
.tokenizer
.tokenize_fragments_with_filter(path, filter)?;
Python::with_gil(|py| {
let py_tokenized_regions_sets = tokenized_region_sets
.into_iter()
.map(|trs| PyTokenizedRegionSet {
ids: trs.ids,
curr: 0,
universe: self.universe.clone_ref(py),
})
.collect();
Ok(py_tokenized_regions_sets)
})
}
None => {
let tokenized_region_sets = self.tokenizer.tokenize_fragments(path)?;
Python::with_gil(|py| {
let py_tokenized_regions_sets = tokenized_region_sets
.into_iter()
.map(|trs| PyTokenizedRegionSet {
ids: trs.ids,
curr: 0,
universe: self.universe.clone_ref(py),
})
.collect();
Ok(py_tokenized_regions_sets)
})
}
}
}
}
6 changes: 3 additions & 3 deletions bindings/src/tokenizers/tree_tokenizer.rs
@@ -1,13 +1,13 @@
use genimtools::tokenizers::traits::SpecialTokens;
use gtars::tokenizers::traits::SpecialTokens;
use pyo3::prelude::*;
use pyo3::types::PyAny;

use anyhow::Result;

use std::path::Path;

use genimtools::common::models::RegionSet;
use genimtools::tokenizers::{Tokenizer, TreeTokenizer};
use gtars::common::models::RegionSet;
use gtars::tokenizers::{Tokenizer, TreeTokenizer};

use crate::models::{PyRegion, PyTokenizedRegionSet, PyUniverse};
use crate::utils::extract_regions_from_py_any;
8 changes: 4 additions & 4 deletions bindings/src/utils/mod.rs
@@ -4,7 +4,7 @@ use pyo3::prelude::*;
use pyo3::types::{PyAny, PyIterator};

use anyhow::Result;
use genimtools::common::models::{Region, RegionSet};
use gtars::common::models::{Region, RegionSet};

// this is for internal use only
pub fn extract_regions_from_py_any(regions: &Bound<'_, PyAny>) -> Result<RegionSet> {
@@ -20,7 +20,7 @@ pub fn extract_regions_from_py_any(regions: &Bound<'_, PyAny>) -> Result<RegionS
.into());
}

let regions = genimtools::common::utils::extract_regions_from_bed_file(regions);
let regions = gtars::common::utils::extract_regions_from_bed_file(regions);
match regions {
Ok(regions) => return Ok(RegionSet::from(regions)),
Err(e) => return Err(pyo3::exceptions::PyValueError::new_err(e.to_string()).into()),
@@ -55,13 +55,13 @@ pub fn extract_regions_from_py_any(regions: &Bound<'_, PyAny>) -> Result<RegionS

#[pyfunction]
pub fn write_tokens_to_gtok(filename: &str, tokens: Vec<u32>) -> PyResult<()> {
genimtools::io::write_tokens_to_gtok(filename, &tokens)?;
gtars::io::write_tokens_to_gtok(filename, &tokens)?;
Ok(())
}

#[pyfunction]
pub fn read_tokens_from_gtok(filename: &str) -> PyResult<Vec<u32>> {
let tokens = genimtools::io::read_tokens_from_gtok(filename)?;
let tokens = gtars::io::read_tokens_from_gtok(filename)?;
Ok(tokens)
}
