For most large-scale image retrieval systems, performance depends on accurate metadata. Although content-based image retrieval has progressed in recent years, image contributors typically must still provide appropriate keywords or tags that describe each image. Tagging, however, is a difficult and time-consuming task, especially for contributors who are not native English speakers.
Eliot Brenner (Data Scientist, Shutterstock) talks about automatic tag recommendation and the machine learning infrastructure behind it, developed by Shutterstock’s Search and Algorithm Teams.
Tag co-occurrence forms the basis of the recommendation algorithm, as it does for some earlier tag recommendation systems deployed on popular photo sharing services such as Flickr. Tag recommendation for online stock photography, however, differs in several respects from tag recommendation for photo sharing sites. In online stock photography, contributors are highly motivated to provide high-quality tags, because good tags make images easier to find and consequently earn contributors more revenue. In building the system, we explored several recommendation strategies and found that significant improvements are possible over a recommender that uses tag co-occurrence alone.
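To make the co-occurrence baseline concrete, here is a minimal sketch in Python of a recommender that uses tag co-occurrence only. It is an illustration under stated assumptions, not Shutterstock's implementation: the function names `build_cooccurrence` and `recommend`, the scoring rule (summing raw co-occurrence counts over the seed tags), and the toy image data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(tagged_images):
    """Count how often each pair of distinct tags appears on the same image."""
    cooc = defaultdict(Counter)
    for tags in tagged_images:
        # Deduplicate tags per image so a repeated tag is counted once.
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, seed_tags, k=3):
    """Rank candidate tags by their summed co-occurrence with the seed tags.

    Assumption: raw counts are summed; a real system might normalize
    by tag frequency or combine other signals.
    """
    seeds = set(seed_tags)
    scores = Counter()
    for seed in seeds:
        for tag, count in cooc.get(seed, {}).items():
            if tag not in seeds:
                scores[tag] += count
    # Sort by score (descending), then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


# Toy data (hypothetical): each inner list holds the tags of one image.
images = [
    ["beach", "sea", "sunset"],
    ["beach", "sea", "sand"],
    ["beach", "sunset", "palm"],
    ["city", "night", "skyline"],
]
cooc = build_cooccurrence(images)
print(recommend(cooc, ["beach"]))  # ['sea', 'sunset', 'palm']
```

In this sketch, a contributor who has typed "beach" is offered the tags that most often accompany it in the existing collection. Summing raw counts favors very common tags, which is one reason a recommender built on co-occurrence alone leaves room for improvement.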