chapter_08 #21
priti-chahal
started this conversation in
General
What is semanticClimate?
semanticClimate is a project that aims to convert climate-related IPCC documents into a semantic form that both humans and computers can understand.
Why are such conversions required?
The goal is to extract useful information from the IPCC documents and make it accessible for everyone to read and understand. This is achieved by creating dictionaries of different types for different purposes.
WG3 (Mitigation)
We are currently using the following tools:
py4ami
docanalysis
gensim (keyword extraction)
Environment and origins:
Windows 11
Python version 3.8.0
docanalysis version 0.2.0
py4ami version 0.0.45
What we started with and our environment.
Software used:
py4ami
docanalysis
How we set up the software:
pip install docanalysis
docanalysis --help
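To confirm that an installation matches the versions listed under "Environment and origins", a small check along these lines may help. It is a minimal sketch using only the Python standard library (`importlib.metadata`, available from Python 3.8); the function name `check_versions` is hypothetical and not part of either tool.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in the environment list of this post.
EXPECTED = {"docanalysis": "0.2.0", "py4ami": "0.0.45"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found = None
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "differs or missing"
        print(f"{name}: installed={found}, expected={EXPECTED[name]} -> {status}")
```

A different installed version is not necessarily a problem, but command-line flags may have changed between releases.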
py4ami has been used for:
docanalysis has been used for:
The protocol:
Download the IPCC/ar6/wg3 Chapter* pdf:
Download Peter’s semanticClimate GitHub repository:
git clone https://github.com/petermr/semanticClimate.git
What is chapter 08 about?
Summary of the Chapter
Creation of raw HTML (using py4ami):
Convert PDF to HTML:
fulltext.html
Creation of sections
Extraction of dictionaries
Creation of manual dictionary from pdf:
Creation of abbreviation dictionary from pdf
Method to create the abbreviation dictionary:
See the docanalysis tutorial for help.
mkdir wiki_hackathon
cd wiki_hackathon
mkdir Chapter08
cd Chapter08
mkdir sections
cd sections
mkdir 0_main_body
docanalysis --project_name wiki_hackathon --output dict_search_5.csv --make_json dict_search_5.json --make_ami_dict entities --extract_abb ip_3_8_urban_abb
where,
--project_name - the name of the project (here, wiki_hackathon)
--output - a CSV file for the dictionary search (not used here, but it must be created)
--make_json - supply this as shown; not currently used, but required
--make_ami_dict - uses the entities created in the above command
--extract_abb - the abbreviation dictionary that is the output
Validating the created dictionary:
python -m py4ami.pyamix DICT --dict <dict_path> --validate
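The extraction and validation commands above can also be assembled from a script, which makes it easier to repeat them for other chapters. The sketch below only builds the argument lists, using exactly the flags shown in this post; the helper names and the dictionary path in the usage lines are hypothetical, and nothing is executed unless the lists are passed to `subprocess.run`.

```python
import shlex


def extraction_command(project, csv_out, json_out, abb_dict):
    """Build the argv list for the abbreviation-extraction run shown above."""
    return [
        "docanalysis",
        "--project_name", project,
        "--output", csv_out,
        "--make_json", json_out,
        "--make_ami_dict", "entities",
        "--extract_abb", abb_dict,
    ]


def validation_command(dict_path):
    """Build the argv list for validating a dictionary with py4ami."""
    return ["python", "-m", "py4ami.pyamix", "DICT", "--dict", dict_path, "--validate"]


if __name__ == "__main__":
    cmd = extraction_command(
        "wiki_hackathon", "dict_search_5.csv", "dict_search_5.json", "ip_3_8_urban_abb"
    )
    print(shlex.join(cmd))
    # "path/to/dictionary" is a placeholder for <dict_path>, not a real file.
    print(shlex.join(validation_command("path/to/dictionary")))
    # To actually run a command: subprocess.run(cmd, check=True)
```

Passing a list rather than one long string avoids quoting problems if a project or file name ever contains spaces.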
Keywords/phrases: keywords are extracted using gensim.
keywords.csv
Introduction.md
Table of contents.md
FAQs.md
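To give a rough idea of what a keyword-extraction step produces, here is a minimal stand-in that ranks words by frequency and writes them in CSV form. It is not the gensim method used for this chapter: it uses only the standard library, the stop-word list and sample sentence are made up for illustration, and the column names are an assumption rather than the layout of the real keywords.csv.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would need a fuller one.
STOPWORDS = {"the", "and", "of", "in", "to", "a", "is", "for", "are", "on", "by", "with"}


def top_keywords(text, n=10):
    """Return the n most frequent non-stop-word tokens with their counts."""
    tokens = re.findall(r"[a-z][a-z-]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


def write_keywords_csv(pairs, handle):
    """Write (keyword, count) pairs as CSV rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["keyword", "count"])
    writer.writerows(pairs)


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the chapter.
    sample = "Urban systems and urban mitigation options reduce urban emissions."
    buffer = io.StringIO()
    write_keywords_csv(top_keywords(sample, 3), buffer)
    print(buffer.getvalue())
```

Frequency counting ignores context, so graph-based methods such as those in gensim will usually give different and often better rankings.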
Annotation of HTML using dictionaries:
The HTML is annotated with the dictionaries using py4ami.
method:
py4ami HTML --annotate --dict <dict_path> --inpath <html_path> --outpath <outdir_path> --color <color>
where,
dict_path - dictionary used for annotation.
html_path - HTML file to be annotated.
outdir_path - output directory for the annotated HTML file.
color - color to highlight annotations (symbolic name, e.g. YELLOW, or RGB, e.g. '#ff7700').
annotatedfulltext.html
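Conceptually, annotation means finding dictionary terms in the text and wrapping each match in a highlighted element. The sketch below illustrates that idea on plain text using the standard library; it is not how py4ami is implemented, it does not parse existing HTML, and the function name `annotate` and the sample term are hypothetical.

```python
import html
import re


def annotate(text, terms, color="YELLOW"):
    """Escape plain text and wrap each dictionary term in a highlighted <span>."""
    escaped = html.escape(text)
    if not terms:
        return escaped
    # Longest terms first so multi-word entries win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(html.escape(t)) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f'<span style="background-color: {color}">{m.group(1)}</span>',
        escaped,
    )


if __name__ == "__main__":
    # "GHG" stands in for an entry from an abbreviation dictionary.
    print(annotate("GHG emissions in cities", ["GHG"], color="#ff7700"))
```

As with the `--color` option above, the color can be a symbolic name or an RGB value, since both are valid CSS colors.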
Chapter 08 Google Colab notebook: click here