- Kevin Scannell (http://crubadan.org/languages/ka, CC-BY 4.0)
- National Parliamentary Library of Georgia (http://www.nplg.gov.ge/gwdict/index.php)
- Other Georgian eBooks/websites (Crawler)
The crawler is written in PHP and uses MySQL as its database. The code is located in the crawler folder.
Before running the script, configure the database and run the migrations. First, rename the file .env.example to .env and specify the database credentials.
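As a rough sketch, the database section of the .env file might look like the following (the key names here are assumptions; check .env.example for the actual ones used by this project):

```ini
; Hypothetical database credentials for the crawler
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=crawler
DB_USERNAME=root
DB_PASSWORD=secret
```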
Install the Composer dependencies:
composer install
Then run the migrations:
composer migrate
This command crawls URLs only inside the specified domain and ignores external URLs:
php cmd crawl --project-name="My Project" --profile=internal "http://www.nplg.gov.ge/gwdict/index.php"
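The internal profile boils down to a same-host check against the start URL. The crawler itself is PHP; the Python snippet below is only an illustrative sketch of the rule, not the project's actual code:

```python
from urllib.parse import urlparse

def is_internal(link: str, start_url: str) -> bool:
    """Sketch of the 'internal' profile: keep a link only if its
    host matches the host of the start URL."""
    return urlparse(link).netloc == urlparse(start_url).netloc

start = "http://www.nplg.gov.ge/gwdict/index.php"
print(is_internal("http://www.nplg.gov.ge/gwdict/index.php?a=list", start))  # True
print(is_internal("http://example.com/page", start))  # False
```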
This command crawls all links:
php cmd crawl --project-name="My Project" --profile=all "http://www.nplg.gov.ge/gwdict/index.php"
This command crawls links on any domain that ends with the --domain suffix:
php cmd crawl --project-name="My Project" --profile=domain --domain=.ge "http://www.nplg.gov.ge/gwdict/index.php"
Only links whose domain ends with the .ge suffix will be crawled.
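The domain profile is a simple suffix match on the link's host. Again, the project is PHP; this Python sketch only illustrates the matching rule:

```python
from urllib.parse import urlparse

def matches_domain(link: str, suffix: str) -> bool:
    """Sketch of the 'domain' profile: keep a link only if its
    host ends with the given suffix (e.g. '.ge')."""
    return urlparse(link).netloc.endswith(suffix)

print(matches_domain("http://www.nplg.gov.ge/gwdict/index.php", ".ge"))  # True
print(matches_domain("http://example.com/page", ".ge"))  # False
```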
This command crawls only URLs that start with the --subset prefix:
php cmd crawl --project-name="My Project" --profile=subset --subset="http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1" "http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1"
Only links whose URL starts with the prefix http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1 will be crawled.
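The subset profile reduces to a string-prefix check on the full URL. As with the sketches above, this Python snippet is illustrative only, not the crawler's PHP code:

```python
def matches_subset(link: str, prefix: str) -> bool:
    """Sketch of the 'subset' profile: keep a link only if the
    full URL starts with the given prefix."""
    return link.startswith(prefix)

prefix = "http://www.nplg.gov.ge/gwdict/index.php?a=list&d=1"
print(matches_subset(prefix + "&t=2", prefix))  # True
print(matches_subset("http://www.nplg.gov.ge/other", prefix))  # False
```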
You can resume a stopped project with this command:
php cmd crawl --project-id={id}
To show all available options, run: php cmd help crawl
- Fix incorrect entries and add more words
- Add tests
- Add a notification on completion
Please see the LICENSE file included in this repository for a full copy of the MIT license, under which this project is released.