This library implements an unsupervised, segmentation-free method for word spotting in document images. Documents are represented by a grid of HOG descriptors, and a sliding-window approach is used to locate the document regions that are most similar to the query. We use the exemplar SVM framework to learn, in an unsupervised way, a more discriminative representation of the query. Finally, the document descriptors are precomputed and compressed with Product Quantization. This offers two advantages: first, a large number of documents can be kept in RAM at the same time; second, the sliding window becomes significantly faster, since distances between quantized HOG descriptors can be precomputed.
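The combination of Product Quantization and a sliding window with precomputed distances can be illustrated with a small C++ sketch. This is not the library's code. The names `ProductQuantizer` and `slidingWindow`, the toy codebooks, and the squared-L2 distance are hypothetical choices made for illustration. The exemplar SVM step is left out, so the query here is simply another grid of PQ codes. Each HOG cell is split into M sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and the distance between two encoded cells reduces to M lookups in tables computed once in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical toy Product Quantizer (illustration only, not the EWS implementation).
// A D-dim descriptor is split into M sub-vectors of length sub = D / M, and each
// sub-vector is quantized against its own codebook of K centroids.
struct ProductQuantizer {
    std::size_t M, K, sub;
    // codebooks[m][k] is centroid k (length `sub`) of sub-quantizer m
    std::vector<std::vector<std::vector<float>>> codebooks;
    // table[m][i * K + j] = squared L2 distance between centroids i and j of sub-quantizer m
    std::vector<std::vector<float>> table;

    explicit ProductQuantizer(std::vector<std::vector<std::vector<float>>> cb)
        : M(cb.size()), K(cb[0].size()), sub(cb[0][0].size()), codebooks(std::move(cb)) {
        assert(K <= 256);  // codes are stored in one byte per sub-quantizer
        table.assign(M, std::vector<float>(K * K, 0.0f));
        // Precompute every centroid-to-centroid distance once.
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t i = 0; i < K; ++i)
                for (std::size_t j = 0; j < K; ++j)
                    table[m][i * K + j] = sqdist(codebooks[m][i].data(), codebooks[m][j].data());
    }

    float sqdist(const float* a, const float* b) const {
        float s = 0.0f;
        for (std::size_t d = 0; d < sub; ++d) { float t = a[d] - b[d]; s += t * t; }
        return s;
    }

    // Encode a full descriptor as M nearest-centroid indices (M bytes instead of D floats).
    std::vector<std::uint8_t> encode(const std::vector<float>& x) const {
        assert(x.size() == M * sub);
        std::vector<std::uint8_t> code(M, 0);
        for (std::size_t m = 0; m < M; ++m) {
            const float* xs = x.data() + m * sub;
            float best = std::numeric_limits<float>::max();
            for (std::size_t k = 0; k < K; ++k) {
                float d = sqdist(xs, codebooks[m][k].data());
                if (d < best) { best = d; code[m] = static_cast<std::uint8_t>(k); }
            }
        }
        return code;
    }

    // Distance between two encoded cells: M table lookups, independent of D.
    float distance(const std::uint8_t* a, const std::uint8_t* b) const {
        float s = 0.0f;
        for (std::size_t m = 0; m < M; ++m) s += table[m][a[m] * K + b[m]];
        return s;
    }
};

// Score every window position of a document grid (rows x cols cells, each an
// M-byte code, stored row-major) against a query grid (qrows x qcols cells).
// Returns one score per position, row-major; lower means more similar.
std::vector<float> slidingWindow(const ProductQuantizer& pq,
                                 const std::vector<std::uint8_t>& doc,
                                 std::size_t rows, std::size_t cols,
                                 const std::vector<std::uint8_t>& query,
                                 std::size_t qrows, std::size_t qcols) {
    std::vector<float> scores;
    for (std::size_t y = 0; y + qrows <= rows; ++y)
        for (std::size_t x = 0; x + qcols <= cols; ++x) {
            float s = 0.0f;
            for (std::size_t i = 0; i < qrows; ++i)
                for (std::size_t j = 0; j < qcols; ++j)
                    s += pq.distance(&doc[((y + i) * cols + (x + j)) * pq.M],
                                     &query[(i * qcols + j) * pq.M]);
            scores.push_back(s);
        }
    return scores;
}
```

In this sketch, comparing two cells costs M lookups in a K×K table, whatever the HOG dimensionality. Each stored cell also takes only M bytes. These two properties are the intuition behind the memory and speed advantages described above.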
- Jon Almazán
- Albert Gordo
- Alicia Fornés
- Ernest Valveny
BMVC 2012 Paper Citation
Jon Almazán, Albert Gordo, Alicia Fornés, Ernest Valveny. Efficient Exemplar Word Spotting. In BMVC, 2012.
To get started, install MATLAB and download the code from GitHub. The code has been tested on Linux, and pre-compiled MEX files are included.
Download the Exemplar Word Spotting library source code (MATLAB and C++) and compile it
$ cd ~/your_projects/
$ git clone https://github.com/almazan/ews.git
$ cd ews/util
$ ./compileAll.sh
Download and uncompress the datasets
$ cd ews/datasets
$ wget http://www.cvc.uab.es/~almazan/data/Datasets.tar.gz
$ tar -xzf Datasets.tar.gz
Run the parameter validation script
$ matlab
>> validation_script
Run Exemplar Word Spotting
$ matlab
>> main_eews
Note: to modify the parameters or select a dataset, edit the get_initparams.m file manually.
This work was partially supported by the Spanish projects TIN2011-24631, TIN2009-14633-C03-03 and CSD2007-00018, by the EU project ERC-2010-AdG-20100407-269796, and by two research grants from the UAB (471-01-8/09).
Support or Contact
For any comments or suggestions, please contact Jon Almazán.