Object detection
Use case: object detection on heritage images
Automatic annotation of objects in heritage images has uses in the fields of information retrieval and digital humanities. Depending on the scenario, this may involve obtaining a new source of textual metadata ("this image contains a cat, a child and a sofa") or locating every object class of interest within the image ("in this image, there is a car at position x,y,w,h").
These goals can be met with "out-of-the-box" services or customized solutions.
Vogue magazine, French edition, 1922
YOLO performs object detection with a model trained on 80 classes. YOLO is well known for being both fast and accurate.
This Python 3 script uses a YOLO v4 model that can easily be downloaded from the web. The images of a Gallica document are first loaded via the IIIF protocol. Detection is then performed, and annotated images are generated along with the CSV data.
Display the Jupyter notebook with nbviewer
Launch the notebook with Binder:
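The first step of the script above, loading a Gallica document's pages over IIIF, amounts to building one IIIF Image API URL per page. The following minimal sketch shows this URL construction; the ARK identifier used is a hypothetical placeholder, and the pattern follows the IIIF Image API template `{server}/{prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`:

```python
def gallica_iiif_url(ark: str, page: int, size: str = "full") -> str:
    """Build the IIIF Image API URL for one page (f{page}) of a Gallica document.

    region='full', rotation=0 and quality='native' request the whole,
    unrotated page; 'size' may be reduced (e.g. 'pct:50') to speed up downloads.
    """
    return (f"https://gallica.bnf.fr/iiif/ark:/12148/{ark}"
            f"/f{page}/full/{size}/0/native.jpg")

# One URL per page of a (hypothetical) 3-page document:
urls = [gallica_iiif_url("bpt6k65414055", p) for p in range(1, 4)]
```

Each URL can then be fetched with any HTTP client before being passed to the detector.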
These APIs may be used to perform object detection. They are trained on huge datasets of thousands of object classes (like ImageNet) and may be useful for 20th-century heritage content. These datasets are primarily aimed at photographs, but the generalization ability of artificial neural networks means they can produce acceptable results on drawings and prints.
The Perl script described here calls the Google or IBM APIs.
> perl toolbox.pl -CC datafile -google
Note: IBM Watson Visual Recognition is discontinued. Existing instances are supported until 1 December 2021.
The API endpoint is simply called with a curl command sending the request to the API as a JSON fragment including the image data and the features expected to be returned:
> curl --insecure -v -s -H "Content-Type: application/json" https://vision.googleapis.com/v1/images:annotate?key=yourKey --data-binary @/tmp/request.jso
...
"features": [
    { "type": "LABEL_DETECTION" },
    { "type": "CROP_HINTS" },
    { "type": "IMAGE_PROPERTIES" }
], ...
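Rather than writing the request file by hand, the JSON body can be generated programmatically. The sketch below builds a Google Vision `images:annotate` request body with base64-encoded image data and the same three features as above (the image bytes here are a fake placeholder; optional fields such as `maxResults` are omitted):

```python
import base64
import json

def build_vision_request(image_bytes: bytes,
                         features=("LABEL_DETECTION", "CROP_HINTS",
                                   "IMAGE_PROPERTIES")) -> str:
    """Return a JSON body for the Google Vision images:annotate endpoint.

    The API expects the raw image bytes base64-encoded in requests[].image.content,
    and one {"type": ...} entry per requested feature.
    """
    body = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f} for f in features],
        }]
    }
    return json.dumps(body, indent=2)

# Placeholder bytes standing in for a real JPEG/PNG file:
payload = build_vision_request(b"\x89PNG fake image bytes")
```

The resulting string can be saved to a file and sent with the curl command shown above.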
See also this recipe, which makes use of the IBM Watson API to call a model previously trained with Watson Studio.
Cost, difficulties: Analyzing an image with such APIs costs a fraction of a cent per image. Processing can be done entirely on the web platform or with minimal coding effort.
Out-of-the-box solutions use pretrained models. Transfer learning consists of cutting off the last classification layer of these models and transferring the "model's knowledge" to a local problem, i.e. the set of images and objects one needs to work with.
Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting. (Deep Learning, Ian Goodfellow et al., 2016)
Google Cloud Vision and other commercial frameworks can be used to train a specific object detector on custom data. Training can be done on the web platform (e.g. AutoML Vision) or using APIs. The trained models can then be deployed in the cloud or locally.
The same is true for YOLO, using a commercial web app such as Roboflow or local code.
Open source AI platforms all offer APIs for transfer learning. This Google Colab Jupyter script from the MODOAP project uses tf.keras (the high-level API of TensorFlow) to train a classification model. Training images must be stored on a Google Drive.
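The cut-off-and-retrain idea can be sketched in a few lines of tf.keras: a pretrained backbone is loaded without its classification layer, frozen, and a new head sized for the local classes is stacked on top. This is a minimal sketch, not the MODOAP script itself; the class count is hypothetical, and `weights=None` is used here only so the sketch runs offline (in practice one would use `weights="imagenet"` to actually transfer the pretrained knowledge):

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical: the number of local object classes

# Pretrained backbone with its last classification layer cut off (include_top=False).
# weights=None keeps the sketch offline; use weights="imagenet" for real transfer.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False  # freeze the "model's knowledge"

# New classification head trained on the local images only.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Since only the small head is trainable, training converges with far fewer annotated images and far less compute than training the whole network from scratch.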
Cost, difficulties: Training requires having annotated images available, which implies some preliminary work, as well as some computing power to train the model. Depending on the context and the expected performance, tens or hundreds of annotated images may be required. For commercial products, pricing is higher when using a custom-trained model.
There is almost no reason to start completely from scratch, as pretrained models tend to generalize well to other tasks and will reduce overfitting when starting from a small dataset of images.
- Information Retrieval: the labels of the object classes are used as metadata and generally feed the library search engine.
- GallicaPix web app;
- Digitens project: indexing wallpaper and textile design patterns from The National Archives and the BnF
- Stanford University Library: Clustering and Classification on all public images
- Artificial intelligence @ the National Library of Norway
- Digital Humanities: in this context, labels and bounding boxes are used for retrieval or data mining scenarios.
- Helsinki Digital Humanities Hackathon 2019: data analysis of illustrated newspaper ads regarding means of transport
- Numapress project: data analysis and information retrieval on the newspapers' movie sections (1900-1945)
- Telecom Paris-Tech, Nicolas Gonthier: Weakly Supervised Object Detection in Artworks
- Object Detection in a Nutshell
- Object annotation tools: