Exemplar Word Spotting

An efficient word spotting library

Introduction

This library implements an unsupervised, segmentation-free method for word spotting in document images. Documents are represented as a grid of HOG descriptors, and a sliding-window approach locates the document regions that are most similar to the query. We use the exemplar SVM framework to produce a better representation of the query in an unsupervised way. Finally, the document descriptors are precomputed and compressed with Product Quantization. This offers two advantages. First, a large number of documents can be kept in RAM at the same time. Second, the sliding window becomes significantly faster, since distances between quantized HOG descriptors can be precomputed.
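
The effect of Product Quantization on the sliding window can be illustrated with a small sketch. It is not taken from the library (which is written in MATLAB and C++): it is plain Python, and the function names, the toy codebooks, and the descriptors are all hypothetical. It also makes simplifying assumptions: the document is a single row of cells rather than a 2-D HOG grid, the query is a raw template rather than an exemplar SVM model, and similarity is a plain dot product. The point it shows is that once each cell is stored as a few centroid indices, scoring a window reduces to summing entries of a table that is computed once per query.

```python
# Illustrative sketch only: toy data and hypothetical names,
# not the library's MATLAB/C++ implementation.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(vec, n_sub):
    """Split a descriptor into n_sub equal-length subvectors."""
    step = len(vec) // n_sub
    return [vec[i * step:(i + 1) * step] for i in range(n_sub)]

def pq_encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    codes = []
    for sub, centroids in zip(split(vec, len(codebooks)), codebooks):
        dists = [sum((s - c) ** 2 for s, c in zip(sub, cen)) for cen in centroids]
        codes.append(dists.index(min(dists)))
    return codes

def build_tables(query_cells, codebooks):
    """tables[c][s][k] = dot(subvector s of query cell c, centroid k of codebook s)."""
    return [
        [[dot(sub, cen) for cen in centroids]
         for sub, centroids in zip(split(cell, len(codebooks)), codebooks)]
        for cell in query_cells
    ]

def sliding_scores(doc_codes, tables):
    """Score every window position using table lookups only."""
    n_query = len(tables)
    scores = []
    for start in range(len(doc_codes) - n_query + 1):
        score = 0
        for c in range(n_query):
            for s, k in enumerate(doc_codes[start + c]):
                score += tables[c][s][k]
        scores.append(score)
    return scores

# Toy codebooks (assumed given; in practice they would be learned, e.g. with k-means).
CODEBOOKS = [
    [[0, 0], [4, 0], [0, 4]],  # centroids for the first half of a descriptor
    [[0, 0], [2, 2], [4, 4]],  # centroids for the second half
]
# A "document" of five cells, each with a 4-dimensional descriptor.
DOC_CELLS = [
    [0, 1, 0, 0],
    [4, 0, 2, 2],
    [1, 4, 4, 3],
    [3, 1, 1, 0],
    [0, 0, 4, 4],
]
# A query template spanning two cells.
QUERY_CELLS = [[1, 0, 1, 1], [0, 1, 1, 1]]

DOC_CODES = [pq_encode(cell, CODEBOOKS) for cell in DOC_CELLS]  # done once, offline
TABLES = build_tables(QUERY_CELLS, CODEBOOKS)                   # done once per query
SCORES = sliding_scores(DOC_CODES, TABLES)
BEST = SCORES.index(max(SCORES))
print(SCORES, BEST)
```

In this toy setting each cell is stored as two small integers instead of four values, and the inner loop of the sliding window performs no multiplications. The scores are those of the quantized (approximated) descriptors, so they can differ slightly from scores computed on the original descriptors.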

People

Jon Almazán
Albert Gordo
Alicia Fornés
Ernest Valveny

BMVC 2012 Paper Citation

Jon Almazán, Albert Gordo, Alicia Fornés, Ernest Valveny. Efficient Exemplar Word Spotting. In BMVC, 2012.

Matlab Code

To get started, install MATLAB and download the code from GitHub. The code has been tested on Linux, and pre-compiled MEX files are included.

Download the Exemplar Word Spotting Library source code (MATLAB and C++) and compile it

$ cd ~/your_projects/
$ git clone https://github.com/almazan/ews.git
$ cd ews/util
$ ./compileAll.sh

Download and uncompress the datasets

$ cd ews/datasets
$ wget http://www.cvc.uab.es/~almazan/data/Datasets.tar.gz
$ tar -xzf Datasets.tar.gz

Run the parameter validation script

$ matlab
>> validation_script

Run Exemplar Word Spotting

$ matlab
>> main_eews

Note: To modify the parameters or select a dataset, edit the get_initparams.m file manually.

Funding

This work has been partially supported by the Spanish projects TIN2011-24631, TIN2009-14633-C03-03 and CSD2007-00018, by the EU project ERC-2010-AdG-20100407-269796 and by two research grants of the UAB (471-01-8/09).

Support or Contact

For any comments or suggestions, please contact Jon Almazán.