TMVis is a web service to provide visualizations of how individual webpages have changed over time. We offer four visualizations: Image Grid, Image Slider, Timeline, and Animated GIF.
See https://www.cs.odu.edu/~mweigle/tmvis-embed/ for examples of embedding the Image Grid, Image Slider, and Animated GIF in a webpage.
For more details, including a system walk-through and demo video, see Visualizing Webpage Changes Over Time With TMVis from May 2020 on the ODU WS-DL Blog.
This work has been supported by an NEH/IMLS Digital Humanities Advancement Grant (HAA-256368-17). We are grateful for the support of NEH and IMLS, and for the input from our partners, Deborah Kempe and Sumitra Duncan from the Frick Art Reference Library and New York Art Resources Consortium and Pamela Graham and Alex Thurman from Columbia University Libraries. This project is an extension of AlSum and Nelson’s ECIR 2014 work, “Thumbnail Summarization Techniques for Web Archives”, and Mat Kelly's (@machawk1) ArchiveThumbnails project, funded by an incentive grant from Columbia University Libraries and the Andrew W. Mellon Foundation.
Please cite this project as indicated in the Citing Project section.
Running the server in a Docker container can make dependency management easier. This document assumes that you already have Docker set up; if not, follow the official guide.
A Docker image from the current version of the code is built and published on Docker Hub at oduwsdl/tmvis. Pull the image and run the service in a container as shown below:
$ docker pull oduwsdl/tmvis
$ docker run --shm-size=1G -it --rm -p 3000:3000 oduwsdl/tmvis
Alternatively, a custom Docker image can be built from the source. To do this, clone the repository, change the working directory, then build and run the image as follows:
$ git clone https://github.com/oduwsdl/tmvis.git
$ cd tmvis
$ docker image build -t timemapvis .
$ docker run --shm-size=1G -it --rm -p 3000:3000 timemapvis
With the above command, the container is running and can be accessed from outside on port 3000 at http://localhost:3000/. If you want to run the service on a different port, say 80, then change -p 3000:3000 to -p 80:3000.
In order to persist generated thumbnails, mount a host directory as a volume inside the container by adding the -v /SOME/HOST/DIRECTORY:/app/assets/screenshots flag when running the container.
The container is completely transparent from the outside, and the service is accessed as if it were running on the host machine itself.
If you want to make changes to the tmvis code itself, you might want to run it in development mode by mounting the code from the host machine inside the container so that changes are reflected immediately, without requiring an image rebuild. Use the following command to mount the code from the host:
$ docker run --shm-size=1G -it --rm -v "$PWD":/app -v /app/node_modules -p 3000:3000 timemapvis
Node.js is required to run the service. To run this program locally, clone the repository, change the working directory, then install dependencies and run the server file as follows:
$ git clone https://github.com/oduwsdl/tmvis.git
$ cd tmvis
$ npm install -g
$ node tmvis.js
Running this service returns an array of JSON objects as the response (web service model), which then has to be visualized with the UI tool deployed at http://tmvis.cs.odu.edu/, for which the code is available at https://github.com/oduwsdl/tmvis/ under the public folder.
Supported server arguments are as follows:
- host: is used to specify the server's hostname, default localhost
- port: is used to specify the server's port, default 3000
- proxy: is used to specify the proxy, generated using host and port if not provided
- debug: is used to add logs to the server's console, useful for development
- ssd: is used to specify the amount of time, in seconds, to wait before taking a screenshot of a memento, default 2
- oes: overrides the usage of cache files, default false
- os: computes both simhash and hamming distance, default true
- maxMementos: is used to specify the maximum number of mementos to be analyzed, default 1000
To query the running server instance using your browser, visit http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/[histogram | stats | summary]/[from]/[to]/http://4genderjustice.org/, whose path has the attributes primesource/ci/hammingdistance/role/from/to/URI-R. Substitute the URI-R to request a different site's summarization.
- primesource: takes the value 'archiveit', 'arquivopt', or 'internetarchive' to let the service know which is the primary source.
- ci: is used to specify the collection identifier; if not specified, the argument 'all' is used.
- hammingdistance: is used to specify the Hamming distance value (4 in the examples below).
- role: takes the value 'histogram', 'stats', or 'summary'.
- histogram: to get the dates and times of a timemap in the specified date range.
- stats: to get the number of unique mementos.
- summary: to get the unique mementos along with the captured screenshots.
- from and to: are used to specify the date range of the timemap to be loaded (/0/0/ for the full timemap or /YYYY-MM-DD/YYYY-MM-DD for a specific date range).
http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/histogram/0/0/http://4genderjustice.org/
http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/stats/0/0/http://4genderjustice.org/
http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/summary/0/0/http://4genderjustice.org/
http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/stats/2016-08-01/2017-07-23/http://4genderjustice.org/
http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/summary/2016-08-01/2017-07-23/http://4genderjustice.org/
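The example URLs above all follow the same path layout, so a client can assemble them programmatically. The following is a minimal sketch, not part of TMVis: buildTmvisUrl and its option names are hypothetical, and the default base URL assumes the server is running locally on port 3000.

```javascript
// Hypothetical helper (not part of TMVis): assembles a request URL from the
// path attributes primesource/ci/hammingdistance/role/from/to/URI-R.
const ROLES = ["histogram", "stats", "summary"];
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

function buildTmvisUrl({
  base = "http://localhost:3000", // assumption: local server on the default port
  primesource,
  ci = "all", // 'all' is used when no collection identifier is given
  hammingDistance,
  role,
  from = "0", // "0" requests the full timemap
  to = "0",
  uriR,
}) {
  if (!ROLES.includes(role)) {
    throw new Error(`role must be one of: ${ROLES.join(", ")}`);
  }
  for (const date of [from, to]) {
    if (date !== "0" && !DATE_RE.test(date)) {
      throw new Error(`date must be "0" or YYYY-MM-DD, got: ${date}`);
    }
  }
  // The URI-R is appended as-is, matching the example URLs above.
  return [
    base, "alsummarizedtimemap", primesource, ci,
    hammingDistance, role, from, to, uriR,
  ].join("/");
}

console.log(buildTmvisUrl({
  primesource: "internetarchive",
  hammingDistance: 4,
  role: "stats",
  from: "2016-08-01",
  to: "2017-07-23",
  uriR: "http://4genderjustice.org/",
}));
```

The sketch only checks the role and the date format; the server remains the authority on which values it accepts.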
curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/histogram/0/0/http://4genderjustice.org/
The attributes of the URI map to the following values:
primesource -> archiveIt
collection Identifier -> 1068
hammingdistance -> 4
role -> histogram
from date -> 0
to date -> 0
URI-R under request -> http://4genderjustice.org/
[
"Jul 01, 2015 21:56:41",
"Jul 01, 2015 22:32:40",
"Oct 01, 2015 21:17:52",
....
"Oct 24, 2019 00:52:02",
"Jan 23, 2020 23:10:05",
"Jan 23, 2020 23:10:25"
]
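A client will usually want to turn the histogram strings into real dates before plotting them. The sketch below is one way to do that; parseHistogramDate and countByYear are hypothetical helpers, and treating the times as UTC is an assumption (it is consistent with the GMT dates in the stats output, but the histogram response itself carries no time zone).

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Parses a string such as "Jul 01, 2015 21:56:41" into a Date.
// Assumption: the time is UTC.
function parseHistogramDate(text) {
  const m = /^([A-Z][a-z]{2}) (\d{2}), (\d{4}) (\d{2}):(\d{2}):(\d{2})$/.exec(text);
  if (!m || !MONTHS.includes(m[1])) {
    throw new Error(`unrecognized date: ${text}`);
  }
  const [, mon, day, year, hh, mm, ss] = m;
  return new Date(Date.UTC(+year, MONTHS.indexOf(mon), +day, +hh, +mm, +ss));
}

// Counts how many mementos fall in each calendar year.
function countByYear(dates) {
  const counts = {};
  for (const text of dates) {
    const year = parseHistogramDate(text).getUTCFullYear();
    counts[year] = (counts[year] || 0) + 1;
  }
  return counts;
}

// Only the entries shown in the truncated sample response above.
const sample = [
  "Jul 01, 2015 21:56:41",
  "Jul 01, 2015 22:32:40",
  "Oct 01, 2015 21:17:52",
  "Oct 24, 2019 00:52:02",
  "Jan 23, 2020 23:10:05",
  "Jan 23, 2020 23:10:25",
];
console.log(countByYear(sample)); // 2015: 3, 2019: 1, 2020: 2
```

Parsing by hand avoids relying on how a particular JavaScript engine interprets non-ISO date strings.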
curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/stats/0/0/http://4genderjustice.org/
The attributes of the URI map to the following values:
primesource -> archiveIt
collection Identifier -> 1068
hammingdistance -> 4
role -> stats
from date -> 0
to date -> 0
URI-R under request -> http://4genderjustice.org/
[
{
"threshold":2,
"totalmementos":42,
"unique":9,
"timetowait":5,
"fromdate":"Wed, 01 Jul 2015 21:56:41 GMT",
"todate":"Wed, 24 Jul 2019 02:42:08 GMT"
},
....
{
"threshold":12,
"totalmementos":42,
"unique":2,
"timetowait":2,
"fromdate":"Wed, 01 Jul 2015 21:56:41 GMT",
"todate":"Wed, 24 Jul 2019 02:42:08 GMT"
}
]
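The stats response can help a client decide which value to request before asking for a summary. The sketch below assumes that "threshold" is a candidate Hamming distance and that "unique" is the number of unique mementos found at that threshold; pickThreshold is a hypothetical helper, and the sample data contains only the two entries shown above.

```javascript
// Returns the entry with the smallest threshold whose "unique" count is at
// most maxUnique, or null if no entry qualifies.
function pickThreshold(stats, maxUnique) {
  const candidates = stats
    .filter((entry) => entry.unique <= maxUnique)
    .sort((a, b) => a.threshold - b.threshold); // filter() copied the array
  return candidates.length > 0 ? candidates[0] : null;
}

// The two entries visible in the truncated sample response.
const stats = [
  { threshold: 2, totalmementos: 42, unique: 9, timetowait: 5 },
  { threshold: 12, totalmementos: 42, unique: 2, timetowait: 2 },
];

const choice = pickThreshold(stats, 5);
console.log(choice ? choice.threshold : "no threshold fits"); // 12
```

Preferring the smallest qualifying threshold keeps as many distinct thumbnails as the limit allows.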
curl -il http://localhost:3000/alsummarizedtimemap/archiveIt/1068/4/summary/0/0/http://4genderjustice.org/
The attributes of the URI map to the following values:
primesource -> archiveIt
collection Identifier -> 1068
hammingdistance -> 4
role -> summary
from date -> 0
to date -> 0
URI-R under request -> http://4genderjustice.org/
[
{
"timestamp": 1435787801,
"event_series": "Thumbnails",
"event_html": "http://localhost:3000/static/timemapSum_httpwaybackarchiveitorg106820150701215641http4genderjusticeorg.png",
"event_date": "Aug. 01, 2015",
"event_display_date": "2015-07-01, 21:56:41",
"event_description": "",
"event_link": "http://wayback.archive-it.org/1068/20150701215641/http://4genderjustice.org/"
},
{
"timestamp": 1435789960,
"event_series": "Non-Thumbnail Mementos",
"event_html": "http://localhost:3000/static/notcaptured.png",
"event_html_similarto": "http://localhost:3000/static/timemapSum_httpwaybackarchiveitorg106820150701215641http4genderjusticeorg.png",
"event_date": "Aug. 01, 2015",
"event_display_date": "2015-07-01, 22:32:40",
"event_description": "",
"event_link": "http://wayback.archive-it.org/1068/20150701223240/http://4genderjustice.org/"
},....
]
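A consumer of the summary response typically separates the mementos that received a thumbnail from those that did not. The sketch below does this with the event_series field and also converts each timestamp to an ISO 8601 string; splitBySeries is a hypothetical helper, and it assumes that "timestamp" is Unix time in seconds, which agrees with the event_display_date values in the sample.

```javascript
// Splits summary events into thumbnails and non-thumbnail mementos, adding an
// ISO 8601 "iso" field derived from the timestamp (assumed to be in seconds).
function splitBySeries(events) {
  const thumbnails = [];
  const others = [];
  for (const event of events) {
    const withIso = {
      ...event,
      iso: new Date(event.timestamp * 1000).toISOString(),
    };
    (event.event_series === "Thumbnails" ? thumbnails : others).push(withIso);
  }
  return { thumbnails, others };
}

// Trimmed copies of the two events shown in the sample response.
const events = [
  { timestamp: 1435787801, event_series: "Thumbnails" },
  { timestamp: 1435789960, event_series: "Non-Thumbnail Mementos" },
];

const { thumbnails, others } = splitBySeries(events);
console.log(thumbnails[0].iso); // 2015-07-01T21:56:41.000Z
console.log(others[0].iso);     // 2015-07-01T22:32:40.000Z
```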
curl -il http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/stats/2016-08-01/2017-07-23/http://4genderjustice.org/
The attributes of the URI map to the following values:
primesource -> internetarchive
collection Identifier -> all
hammingdistance -> 4
role -> stats
from date -> 2016-08-01
to date -> 2017-07-23
URI-R under request -> http://4genderjustice.org/
[
{
"threshold":2,
"totalmementos":91,
"unique":6,
"timetowait":4,
"fromdate":"Tue, 02 Aug 2016 16:39:55 GMT",
"todate":"Sat, 22 Jul 2017 06:49:56 GMT"
},
....
{
"threshold":7,
"totalmementos":91,
"unique":1,
"timetowait":1,
"fromdate":"Tue, 02 Aug 2016 16:39:55 GMT",
"todate":"Sat, 22 Jul 2017 06:49:56 GMT"
}
]
curl -il http://localhost:3000/alsummarizedtimemap/internetarchive/all/4/summary/2016-08-01/2017-07-23/http://4genderjustice.org/
The attributes of the URI map to the following values:
primesource -> internetarchive
collection Identifier -> all
hammingdistance -> 4
role -> summary
from date -> 2016-08-01
to date -> 2017-07-23
URI-R under request -> http://4genderjustice.org/
[
{
"timestamp":1470155995,
"event_series":"Thumbnails",
"event_html":"http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20160802163955http4genderjusticeorg.png",
"event_date":"Aug. 02, 2016",
"event_display_date":"2016-08-02, 16:39:55",
"event_description":"",
"event_link":"http://web.archive.org/web/20160802163955/http://4genderjustice.org/"
},
....
{
"timestamp":1500706196,
"event_series":"Non-Thumbnail Mementos",
"event_html":"notcaptured",
"event_html_similarto":"http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20170714114554http4genderjusticeorg.png",
"event_date":"Jul. 22, 2017",
"event_display_date":"2017-07-22, 06:49:56",
"event_description":"",
"event_link":"http://web.archive.org/web/20170722064956/http://4genderjustice.org/"
}
]
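Non-thumbnail mementos carry an event_html_similarto field, which a client can use to attach them to a thumbnail. The sketch below groups memento links by that field; groupBySimilarTo is a hypothetical helper, and it assumes that event_html_similarto holds the image URL of the thumbnail that stands in for the memento.

```javascript
// Maps each thumbnail image URL to the links of the non-thumbnail mementos
// that point to it through event_html_similarto.
function groupBySimilarTo(events) {
  const groups = new Map();
  for (const event of events) {
    if (event.event_series === "Thumbnails") {
      // Make sure every thumbnail appears, even with no similar mementos.
      if (!groups.has(event.event_html)) groups.set(event.event_html, []);
    } else if (event.event_html_similarto) {
      const links = groups.get(event.event_html_similarto) || [];
      links.push(event.event_link);
      groups.set(event.event_html_similarto, links);
    }
  }
  return groups;
}

// Trimmed copies of the two events shown in the sample response.
const summary = [
  {
    event_series: "Thumbnails",
    event_html: "http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20160802163955http4genderjusticeorg.png",
    event_link: "http://web.archive.org/web/20160802163955/http://4genderjustice.org/",
  },
  {
    event_series: "Non-Thumbnail Mementos",
    event_html: "notcaptured",
    event_html_similarto: "http://localhost:3000/static/timemapSum_httpwebarchiveorgweb20170714114554http4genderjusticeorg.png",
    event_link: "http://web.archive.org/web/20170722064956/http://4genderjustice.org/",
  },
];

for (const [thumbnail, links] of groupBySimilarTo(summary)) {
  console.log(thumbnail, links.length);
}
```

In the truncated sample, the second memento refers to a thumbnail from an elided entry, so it forms its own group here.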
A tech report related to this project is available on arXiv.org (pdf). Please cite it as below:
Abigail Mabe, Dhruv Patel, Maheedhar Gunnam, Surbhi Shankar, Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle. Visualizing Webpage Changes Over Time. Technical report arXiv:2006.02487, June 2020.
@techreport{tmvis-arxiv-2020,
author = {Abigail Mabe and Dhruv Patel and Maheedhar Gunnam and Surbhi Shankar and Mat Kelly and Sawood Alam and Michael L. Nelson and Michele C. Weigle},
title = {Visualizing Webpage Changes Over Time},
year = {2020},
month = jun,
number = {arXiv:2006.02487},
url = {https://arxiv.org/abs/2006.02487}
}
Although the base of this repository (https://github.com/machawk1/ArchiveThumbnails) was GPL licensed, this repository uses the MIT license; the license was changed with the permission of the original author, @machawk1.