In Search of Art
Elliot J. Crowley and Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford
The Goal
- An on-the-fly system for searching paintings visually
- A user can type in the name of any category...
- Then hundreds of paintings containing that category will be retrieved in a matter of seconds
dog
Benefits
- In many instances, the retrieved paintings will not have been known to contain the category
- This means they are new discoveries for the Art History community
dog
Why is this good?
- Art historians can discover when something first appeared in paintings
- They can also observe how things have changed over time
How is this achieved?
- Natural images annotated with object categories are everywhere
- These can be used to learn object classifiers
Google images of dog
Dataset of Paintings
- We use 'Your Paintings' as the dataset
- 'Your Paintings' consists of over 210,000 paintings from UK galleries: http://www.bbc.co.uk/arts/yourpaintings/
- The method is independent of the dataset, however
- Other datasets can be used, e.g. Rijksmuseum or PrintART
Outline
- Methodology
- Quantitative Evaluation
- Aligning retrieved objects
What do we do?
- We crawl Google Images for a given category and learn a CNN-based classifier
- This classifier is applied to a dataset of paintings, retrieving paintings containing the category
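The two-step recipe above can be sketched end-to-end on synthetic data. Everything below is a stand-in: random vectors play the role of CNN features, and a least-squares linear model replaces the SVM described later, purely to keep the sketch dependency-free.

```python
import numpy as np

# Synthetic stand-ins (all data is random, purely for illustration):
# CNN features of the top Google Image hits (positives), pre-computed
# negatives, and pre-computed features for every painting.
rng = np.random.default_rng(0)
positives = rng.normal(+1.0, 1.0, size=(200, 128))
negatives = rng.normal(-1.0, 1.0, size=(1000, 128))
paintings = rng.normal(0.0, 1.5, size=(5000, 128))

# Step 1: learn a classifier on positives vs. negatives. The paper uses
# an SVM; a least-squares linear model is used here as a simple stand-in.
X = np.vstack([positives, negatives])
y = np.concatenate([np.ones(len(positives)), -np.ones(len(negatives))])
Xb = np.c_[X, np.ones(len(X))]              # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Step 2: score every painting and keep the top-ranked ones.
scores = np.c_[paintings, np.ones(len(paintings))] @ w
top_100 = np.argsort(-scores)[:100]
```

In the real system the painting features are pre-computed, so step 2 reduces to one matrix-vector product followed by a sort.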
The Architecture
How do we do this quickly?
- The bulk of the data has been pre-processed offline (negative training data, dataset of paintings)
- Online processing of Google Images is done in parallel across multiple cores
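A minimal sketch of the parallel online stage. The `cnn_feature` function below is a toy stand-in (the real system runs a CNN forward pass per downloaded image), and a thread pool replaces the paper's per-core parallelism so the sketch stays self-contained.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)

def cnn_feature(image):
    # Stand-in for the real CNN forward pass: the mean colour of the
    # image, purely so the sketch runs without a deep-learning library.
    return image.mean(axis=(0, 1))

# 200 downloaded "images" (random arrays standing in for Google hits).
images = [rng.random((32, 32, 3)) for _ in range(200)]

# The online stage computes features in parallel across workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    features = np.stack(list(pool.map(cnn_feature, images)))
```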
In more detail…
- For a given query, the top 200 Google Image hits are downloaded
- For each of these, a CNN feature is computed online
- This is the positive training data
Negative Training Data
- Offline, images are downloaded for Google searches of 'things' and 'photos'
- The features for these are pre-computed
Classification
- A Support Vector Machine is used to learn a classifier that discriminates the positive training data from the negative data
beard vs. not beard
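A minimal sketch of this classification step, using a Pegasos-style linear SVM trained by SGD on synthetic clusters. This is our own toy solver, not the authors' implementation (they presumably use an off-the-shelf SVM package).

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-2, epochs=20, seed=0):
    """Minimal Pegasos-style linear SVM (hinge loss, SGD).

    X: (n, d) features; y: labels in {-1, +1}. A sketch only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            violated = y[i] * (X[i] @ w + b) < 1
            w *= (1.0 - eta * lam)           # regularisation shrinkage
            if violated:                     # hinge-loss subgradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Positives: features of the Google Image hits for the query; negatives:
# pre-computed 'things'/'photos' features (both synthetic here).
rng = np.random.default_rng(1)
pos = rng.normal(+1.0, 1.0, size=(200, 64))
neg = rng.normal(-1.0, 1.0, size=(1000, 64))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), -np.ones(1000)])
w, b = train_linear_svm(X, y)
```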
Retrieval
- The classifier is applied to the pre-processed features of 'Your Paintings'
- Each painting is given a score by the classifier
- The paintings are displayed in order of score
beard
The Architecture - Timings
(per-stage timings: 0.5s, 4.5s, <0.5s, <0.5s, 2s)
Example Queries
- bridge
- carriage
- flower
- house
Outline
- Methodology
- Quantitative Evaluation
- Aligning retrieved objects
Quantitative Evaluation
- Evaluating the domain transfer problem of learning classifiers on natural images and applying these to paintings
Test Set
- For this, an annotated dataset of paintings is required
- 10,000 paintings in 'Your Paintings' have been tagged by the public
- These tags + painting titles are used to form the 'Paintings Dataset', with annotations corresponding to classes of PASCAL VOC
The Paintings Dataset
Class         Paintings with Class
Aeroplane      200
Bird           805
Boat          2143
Chair         1202
Cow            625
Dining-table  1201
Dog           1145
Horse         1493
Sheep          751
Train          329
- Assume complete annotation in the PASCAL sense
- Assess by calculating Average Precision (AP) per class
(example classes: train, dog, horse)
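The AP calculation on a ranked retrieval list can be sketched as follows. This is the standard non-interpolated AP; the exact PASCAL variant may use interpolation, so treat this as illustrative.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at each positive in the list
    ranked by descending classifier score."""
    order = np.argsort(-scores)
    hits = np.asarray(labels, dtype=bool)[order]
    ranks = np.flatnonzero(hits) + 1               # 1-based ranks of positives
    precision_at_hit = np.cumsum(hits)[hits] / ranks
    return float(precision_at_hit.mean())

# Toy ranked list: positives retrieved at ranks 1 and 3.
ap = average_precision(np.array([0.9, 0.8, 0.7, 0.6]),
                       np.array([1, 0, 1, 0]))
# ap = (1/1 + 2/3) / 2 = 5/6
```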
Training Datasets
- 4 Datasets of natural images are used for training
- VOC12, VOC12+, Net Noisy, Net Curated
Experiments
Features compared:
- Shallow features: Fisher Vectors
vs.
- Deep features: Convolutional Neural Networks (CNNs)
Experiments - Features
- Fisher Vectors vs. CNN features
- CNNs outperform Fisher Vectors
- Added advantage of being lower-dimensional
Augmentation
- No augmentation
- C+F augmentation
(224×224 windows taken from a 256×256 image)
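Assuming 'C+F' denotes the common crop-and-flip recipe (four corner crops plus a centre crop of a 256×256 image, each with its horizontal mirror, giving 10 windows), the augmentation can be sketched as below. The slide does not spell out the exact windows used, so this is an assumption.

```python
import numpy as np

def crop_flip_windows(img, crop=224):
    """Crop+flip ('C+F') augmentation sketch: 4 corner crops + centre
    crop, each with its horizontal mirror (10 windows total)."""
    h, w = img.shape[:2]
    c = crop
    corners = [(0, 0), (0, w - c), (h - c, 0), (h - c, w - c),
               ((h - c) // 2, (w - c) // 2)]       # last entry = centre
    crops = [img[t:t + c, l:l + c] for t, l in corners]
    flips = [win[:, ::-1] for win in crops]        # horizontal mirrors
    return crops + flips

img = np.arange(256 * 256 * 3).reshape(256, 256, 3)
windows = crop_flip_windows(img)
```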
Experiments - Augmentation
- Sum Pool: classifier applied to the mean of the augmented windows
- Max Pool: classifier applied to each augmented window and the maximum score recorded
- Best performance is aug + sum pool, but almost as good with no aug + sum pool
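For a linear classifier, sum pooling (classify the mean feature) is equivalent to averaging the per-window scores, while max pooling keeps the best-scoring window. A numpy sketch with random stand-in features and a hypothetical classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.standard_normal(16), 0.1        # a hypothetical linear classifier
feats = rng.standard_normal((10, 16))      # features of 10 augmented windows

# Sum pool: average the window features, then classify once.
sum_pool_score = feats.mean(axis=0) @ w + b

# Max pool: classify every window and keep the maximum score.
window_scores = feats @ w + b
max_pool_score = window_scores.max()
```

The sum-pool equivalence is why a single pooled feature per painting can be pre-computed offline and scored with one dot product at query time.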
Experiments - Dimensionality
- 1K performs best
- Not that different from the others, however
Experiment Conclusions
- For the on-the-fly system, 1K CNN features are used as these performed the best
- Sum-pooled features are used for 'Your Paintings', as time is not a factor in computing these offline
- No augmentation is used on the images downloaded from Google (0.3s per image per core vs. 2.4s)
Outline
- Methodology
- Quantitative Evaluation
- Aligning retrieved objects
Alignment
- Some objects are automatically aligned…
moustache
The Pencil Moustache
Anonymous Trendsetter, 1565
Copycats, Now
Alignment
- Other objects require some work…
train
Solution
Learn a DPM [1] on either:
- 1. annotated bounding boxes (e.g. PASCAL VOC), or
- 2. the downloaded Google Images
[1] P. Felzenszwalb, R. Girshick, D. McAllester and D. Ramanan. Object Detection with Discriminatively Trained Part-Based Models. IEEE PAMI, 2010
Auto-alignment
train
Auto-alignment
horse
Conclusion
- We provide a system that can find objects in paintings with high precision in very little time
- The objects found can be further curated using a DPM
Links
- VISOR: Visual Search of BBC News [1]
http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/
- CNN code [2]
http://www.robots.ox.ac.uk/~vgg/research/deep_eval/
- Our system
COMING SHORTLY!
[1] K. Chatfield and A. Zisserman. VISOR: Towards On-the-Fly Large-Scale Object Category Retrieval. ACCV, 2012
[2] K. Chatfield, K. Simonyan, A. Vedaldi and A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. BMVC, 2014
Thank you
- Any questions?
- Or email elliot@robots.ox.ac.uk