We investigate to what extent ‘bag of visual words’ models can be used to distinguish categories that have significant
visual similarity. To this end we develop and optimize a nearest neighbour classifier architecture, which is evaluated
on a very challenging database of flower images. The flower categories are chosen so that they cannot be distinguished
on colour alone, for example, and they vary considerably in shape, scale, and viewpoint.
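To make the "bag of visual words plus nearest neighbour" pipeline concrete, here is a minimal sketch. It assumes a visual vocabulary is already available (in practice it would typically be learnt by clustering local descriptors from training images, e.g. with k-means). Each image's local descriptors are assigned to their nearest visual word, and the word counts are normalised into a histogram. A query is then labelled by its nearest training histogram. The χ² histogram distance, the helper names (`bag_of_words_histogram`, `chi2_distance`, `nearest_neighbour_label`), and the toy 2-D descriptors and labels are all illustrative assumptions, not the exact choices of this work.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """Assign each local descriptor to its nearest visual word and
    return an L1-normalised histogram of visual-word counts."""
    counts = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda k: squared_distance(d, vocabulary[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def chi2_distance(h1: Vector, h2: Vector) -> float:
    """Chi-squared distance between histograms (an assumed choice of metric);
    bins that are empty in both histograms are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def nearest_neighbour_label(query: Vector, train_hists: List[Vector], train_labels: List[str]) -> str:
    """Return the label of the training histogram closest to the query."""
    best = min(range(len(train_hists)), key=lambda i: chi2_distance(query, train_hists[i]))
    return train_labels[best]


# Toy usage with a hypothetical two-word vocabulary and 2-D "descriptors".
vocab = [(0.0, 0.0), (10.0, 10.0)]
train_hists = [
    bag_of_words_histogram([(0, 1), (1, 0), (9, 9)], vocab),   # mostly word 0
    bag_of_words_histogram([(10, 9), (9, 10), (1, 1)], vocab), # mostly word 1
]
labels = ["daisy", "tulip"]  # hypothetical class names
query = bag_of_words_histogram([(0.5, 0.5), (0.2, 0.1)], vocab)
print(nearest_neighbour_label(query, train_hists, labels))
```

Here the query contains only descriptors near visual word 0, so its histogram is closest (under χ²) to the first training image and it receives that image's label. A nearest neighbour rule keeps the classifier simple, so performance depends mainly on how discriminative the vocabulary and histogram distance are.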