wikiextract is a word extractor for Wikipedia articles. It can extract
words longer than 4 characters from a given Wikipedia page or list of
pages and save them to a file you can later use as the source for
generating diceware passwords.
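To illustrate the idea, here is a minimal Go sketch of the filtering step. It is not wikiextract's actual code. It assumes the article text has already been fetched and stripped to plain text, that words are split on any non-letter rune, and that the list is lowercased, deduplicated and sorted. All of these are assumptions, and the tool may tokenize differently. The function name extractWords is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// extractWords is a hypothetical helper: it returns the unique, lowercased
// words in text that are longer than minLen characters. Splitting on any
// non-letter rune is an assumption, not wikiextract's documented behavior.
func extractWords(text string, minLen int) []string {
	seen := make(map[string]bool)
	fields := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	for _, f := range fields {
		w := strings.ToLower(f)
		// Count runes, not bytes, so accented letters count as one character.
		if len([]rune(w)) > minLen {
			seen[w] = true
		}
	}
	words := make([]string, 0, len(seen))
	for w := range seen {
		words = append(words, w)
	}
	sort.Strings(words)
	return words
}

func main() {
	text := "Wikipedia is a free online encyclopedia, written and maintained by volunteers."
	for _, w := range extractWords(text, 4) {
		fmt.Println(w)
	}
}
```

With a threshold of 4, short words such as "free" (exactly 4 letters) are dropped. Only words of 5 or more characters are kept.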
First install the dependencies:
- Go 1.22 or above.
- make.
- scdoc.
Switch to the latest stable tag, v1.0.0, then compile and install:
git checkout v1.0.0
make
sudo make install
$ wikiextract --help
NAME:
wikiextract - a simple word extractor for Wikipedia articles
USAGE:
wikiextract [global options]
VERSION:
1.0.0
GLOBAL OPTIONS:
--input-url value, -u value [ --input-url value, -u value ] the URL of the Wikipedia page
--input-file value, -f value a file containing a list of URLs
--output value, -o value the path to the output file
--help, -h show help
--version, -v print the version
$ wikiextract -u 'https://en.wikipedia.org/wiki/Wikipedia' -o 'output.txt'
See wikiextract(1) after installing for more information.
Anyone can help make wikiextract better. Send patches on the mailing
list and report bugs on the issue tracker.
You must sign off your work using git commit --signoff. See the
Linux kernel Developer's Certificate of Origin for more details.
All contributions are made under the GPL-2.0 license.
The following resources are available:
- Support and general discussions.
- Patches and development related questions.
- Instructions on how to prepare patches.
- Feature requests and bug reports.
Released under the GPL-2.0 license.