Yevgnen/pybrat

Introduction

pybrat is a parser for data annotated with brat.

Installation

From pip

pip install pybrat

From source

pip install git+https://github.com/Yevgnen/pybrat

Usage

Fetch sample data

git clone https://github.com/nlplab/brat.git

Parse annotated data

Below is an example of parsing BioNLP-ST_2011 data:

# -*- coding: utf-8 -*-

import dataclasses

from pybrat.parser import BratParser, Entity, Event, Example, Relation

# Initialize a parser.
brat = BratParser(error="ignore")
examples = brat.parse("brat/example-data/corpora/BioNLP-ST_2011")

# The parser returns dataclasses.
assert len(examples) == 80
assert all(isinstance(x, Example) for x in examples)
assert all(isinstance(e, Entity) for x in examples for e in x.entities)
assert all(isinstance(e, Relation) for x in examples for e in x.relations)
assert all(isinstance(e, Event) for x in examples for e in x.events)

id_ = "BioNLP-ST_2011_EPI/PMID-19377285"
example = next(x for x in examples if x.id == id_)
print(example.text)
print(len(example.entities), next(iter(example.entities)))
print(len(example.relations), next(iter(example.relations)))
print(len(example.events), next(iter(example.events)))

# Use dataclasses.asdict to convert examples to dictionaries.
examples = [*map(dataclasses.asdict, examples)]
assert all(isinstance(x, dict) for x in examples)
assert all(isinstance(e, dict) for x in examples for e in x["entities"])
assert all(isinstance(e, dict) for x in examples for e in x["relations"])
assert all(isinstance(e, dict) for x in examples for e in x["events"])

print(examples[0])
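
For context on what the parser reads, here is a minimal sketch of how a single text-bound annotation line in brat's standoff format (`ID<TAB>TYPE START END<TAB>TEXT`) could be split into fields. This is not pybrat's implementation: `TextBound` and `parse_text_bound` are hypothetical names used only for illustration, and the sketch assumes a contiguous span (no `;`-separated fragments).

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    """Hypothetical container for one text-bound annotation (not pybrat's Entity)."""

    id: str
    type: str
    start: int
    end: int
    text: str


def parse_text_bound(line: str) -> TextBound:
    """Parse one 'T' line of a brat .ann file, assuming a contiguous span."""
    # Fields are tab-separated: ID, "TYPE START END", and the covered text.
    ann_id, type_and_span, text = line.rstrip("\n").split("\t", 2)
    # The middle field is space-separated: type name, start offset, end offset.
    ann_type, start, end = type_and_span.split(" ")
    return TextBound(ann_id, ann_type, int(start), int(end), text)


tb = parse_text_bound("T1\tOrganization 0 4\tSony")
print(tb)
```

A real parser also has to handle relations, events, attributes, and discontinuous spans, which is what `BratParser` is for.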

Helper scripts

The pybrat-convert script converts brat examples into JSON files.

pybrat-convert -i brat/example-data/corpora/BioNLP-ST_2011 -o ./output --error ignore
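
If you prefer to do a similar conversion from Python, the sketch below shows one way to write a list of dataclass instances to a JSON file using only the standard library. It is not the implementation of `pybrat-convert`: `dump_examples`, the `examples.json` file name, and the stand-in `Doc` dataclass are assumptions made for illustration, and the sketch assumes every field is JSON-serializable.

```python
import dataclasses
import json
import tempfile
from pathlib import Path
from typing import Any, Iterable


def dump_examples(examples: Iterable[Any], output_dir: str) -> Path:
    """Write dataclass instances to <output_dir>/examples.json (hypothetical layout)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "examples.json"
    with path.open("w", encoding="utf-8") as f:
        # asdict recursively converts nested dataclasses into plain dicts.
        json.dump(
            [dataclasses.asdict(x) for x in examples],
            f,
            ensure_ascii=False,
            indent=2,
        )
    return path


@dataclasses.dataclass
class Doc:
    """Stand-in for a parsed example; replace with the objects returned by the parser."""

    id: str
    text: str


with tempfile.TemporaryDirectory() as tmp:
    written = dump_examples([Doc(id="doc-1", text="Sony")], tmp)
    loaded = json.loads(written.read_text(encoding="utf-8"))

print(loaded)
```

With pybrat installed, the list returned by `brat.parse(...)` could be passed in place of the `Doc` instances, provided its fields serialize cleanly.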

Contribution

Formatting Code

To keep the codebase compliant with PEP 8, please format and check your changes with flake8, black, and isort.