**You need Python 3.10 or later to run this script.**
This script uses the Save Page Now 2 Public API.
To use it:
- Clone or download and unzip this repository.
- Install the required Python libraries. Assuming you cloned or unzipped this repository to the directory `path/to/capture-urls/`:

  ```
  cd path/to/capture-urls/
  make
  ```
- Go to https://archive.org/account/s3.php and get your S3-like API keys.
- In `path/to/capture-urls/`, create a file called `secret.py` with the following contents:

  ```
  ACCESS_KEY = 'your access key'
  SECRET_KEY = 'your secret key'
  ```

  (Use the actual values of your access key and secret key, not `your access key` and `your secret key`.)
- Optionally edit `config.py` to your liking.
- Archive your URLs:

  ```
  cat urls.txt | ./capture-urls.py > archived-urls.txt
  ```

  `urls.txt` should contain a list of URLs to be archived, one on each line (see the example after this list).
- Archiving URLs can take a long time. You can interrupt the process with `Ctrl-C`. This will create a file called `progress.json` with the state of the archiving process so far. If you start the process again, it will pick up where it left off. You can add new URLs to `urls.txt` before you restart the process. (A sketch of this resume pattern follows the list.)
- When it finishes running, you should have a list of the archived URLs in `archived-urls.txt`.
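
For reference, `urls.txt` is just a plain-text file with one URL per line; the URLs below are only placeholders:

```
https://example.com/
https://example.com/some-page.html
https://example.org/another-page
```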
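
The interrupt-and-resume behavior described above follows a common pattern: catch `KeyboardInterrupt`, write the current state to disk, and skip already-archived URLs on the next run. The sketch below only illustrates that general pattern; it is not the actual implementation in `capture-urls.py`, and the `progress.json` layout and the `archive_one` helper shown here are hypothetical.

```
# Illustrative pattern only -- not the actual code of capture-urls.py.
# The progress.json layout used here is hypothetical.
import json
import sys

PROGRESS_FILE = "progress.json"


def archive_one(url: str) -> None:
    """Hypothetical stand-in for the real capture call."""
    print(f"archiving {url} ...")


def load_progress() -> dict:
    """Return previously saved state, or an empty state on the first run."""
    try:
        with open(PROGRESS_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"done": []}


def archive_all(urls: list[str]) -> None:
    progress = load_progress()
    try:
        for url in urls:
            if url in progress["done"]:
                continue  # archived in a previous run, skip it
            archive_one(url)
            progress["done"].append(url)
    except KeyboardInterrupt:
        # Ctrl-C: persist state so the next run picks up where this one stopped.
        with open(PROGRESS_FILE, "w") as f:
            json.dump(progress, f, indent=2)
        sys.exit(1)
```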
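
Under the hood, a Save Page Now 2 capture boils down to an authenticated POST followed by polling a status endpoint. The sketch below is a minimal, independent illustration of that API (it assumes the third-party `requests` library and the `secret.py` file from the steps above); it is not the code of `capture-urls.py`:

```
# Minimal illustration of the Save Page Now 2 Public API, not capture-urls.py.
import requests

from secret import ACCESS_KEY, SECRET_KEY  # keys from https://archive.org/account/s3.php

HEADERS = {
    "Accept": "application/json",
    "Authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
}


def capture(url: str) -> str:
    """Ask SPN2 to archive a URL and return the job id to poll for its status."""
    response = requests.post(
        "https://web.archive.org/save",
        headers=HEADERS,
        data={"url": url},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["job_id"]


def job_status(job_id: str) -> dict:
    """Fetch the status ('pending', 'success', or 'error') of a capture job."""
    response = requests.get(
        f"https://web.archive.org/save/status/{job_id}",
        headers=HEADERS,
        timeout=60,
    )
    response.raise_for_status()
    return response.json()
```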