IEEE Signal Processing Society 1997 Workshop on Multimedia Signal
Processing
June 23 --- 25, 1997, Princeton, New Jersey, USA
Electronic Proceedings
- John R. Smith
- IBM T. J. Watson Research Center
- 30 Sawmill River Road
- Hawthorne, NY 10532 USA
- (914) 784-7320
- jrsmith@watson.ibm.com
- http://www.ctr.columbia.edu/~jrsmith
- Shih-Fu Chang
- Dept. of Electrical Engineering
- Columbia University
- New York, N.Y. 10027 USA
- (212) 854-6894
- sfchang@ctr.columbia.edu
- http://www.ctr.columbia.edu/~sfchang
Abstract
The recent diverse environments and applications for image searching (e.g., Web image search engines) provide an enormous resource of information beyond the image pixels which can be used to improve the image search process. We explore several directions of enhancements that integrate the visual information with other information related to the images in the analysis and query processes. We demonstrate that these methods improve image search functionalities over non-integrated content-based methods.
We demonstrate several enhanced image search methods that integrate the analysis of the visual features of images (e.g., extracted regions and colors) with other attributes of the images (e.g., text and spatial information). The objective is to improve the usability and functionality of image search engines. We describe the enhancements developed for two systems: the WebSEEk image and video search engine for the World-Wide Web [1] and the SaFe integrated spatial and feature image query system [2].
Both of these systems have extended the content-based image query paradigm beyond the domain of visual features to improve the image search process. We demonstrate these recently deployed systems on the Web and provide an initial characterization of how the users are searching for images based on over 800,000 query and browse operations.
The World-Wide Web is an enormous warehouse of visual information. It is distinct from a traditional visual information archive in that it is highly distributed, schema-less and minimally indexed. However, the Web also provides a wealth of non-visual information that can be used to help analyze, index and search for the visual information. For example, the WebSEEk system uses the textual information that is related to each image on the Web to help determine its content [1]. In another example, the patterns of usage of an image, which may be replicated throughout the Web and/or have multiple links to it, provide additional characterizations of the image.
Recent image search systems (Virage [3], QBIC [4] and Photobook [5]) do not provide a framework for integrating meta-information into the analysis and query processes. As a result, content-based searching based solely on visual features (color, texture, shape) has not found convincing applications. In order to improve image search systems, new methods must be developed that
- better utilize the meta-information related to the visual information, and
- provide the user with higher-level tools for searching.
We explore two such examples of enhancements that improve search capabilities. Their objectives are summarized in Figure 1. The information that resides in a diverse media environment such as the Web is typically in the form of text and images. However, since it is difficult to index and search for information at this low level, we develop higher-level feature and semantic descriptors by analyzing the text and images together.
Figure 1: Integrating content in image search systems.
In the first enhancement, we develop a semi-automated image classification procedure for the WebSEEk image search engine that utilizes both text and visual features. The text that is related to the images is first parsed into a set of terms. These are used to map the images into a taxonomy of semantic categories [1]. Then, the images that are mapped into the categories are in-turn analyzed in terms of prevalent visual feature characteristics in order to map the remaining images into the categories. In this bootstrapping process, the analyses of the text and images are integrated to better index the images at a higher semantic level.
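The bootstrapping process described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the WebSEEk implementation: the term dictionary, the category `nature/sunsets`, the use of mean color histograms as the "prevalent visual feature characteristics", and the L1 distance are all hypothetical choices made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical key-term dictionary mapping terms to subject categories.
TERM_TO_CATEGORY = {
    "ferrari": "transportation/cars/ferraris",
    "sunset": "nature/sunsets",
}


def extract_terms(text):
    """Split the text related to an image (URL, caption, ...) into lowercase terms."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def classify_by_text(text):
    """Return the category supported by the most terms, or None if no term matches."""
    votes = Counter(
        TERM_TO_CATEGORY[t] for t in extract_terms(text) if t in TERM_TO_CATEGORY
    )
    return votes.most_common(1)[0][0] if votes else None


def mean_histogram(histograms):
    """Average a non-empty list of equal-length histograms bin by bin."""
    n = len(histograms)
    return [sum(h[i] for h in histograms) / n for i in range(len(histograms[0]))]


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def bootstrap_classify(images):
    """images: dict name -> (text, color_histogram). Returns dict name -> category."""
    labels = {}
    by_category = defaultdict(list)
    # Pass 1: assign categories from the text terms alone.
    for name, (text, hist) in images.items():
        category = classify_by_text(text)
        if category is not None:
            labels[name] = category
            by_category[category].append(hist)
    # Pass 2: summarize each category visually, then place the remaining
    # images into the visually nearest category.
    centroids = {c: mean_histogram(hs) for c, hs in by_category.items()}
    for name, (text, hist) in images.items():
        if name not in labels and centroids:
            labels[name] = min(centroids, key=lambda c: l1_distance(hist, centroids[c]))
    return labels
```

In this sketch an image whose text yields no known term is still placed in a category, provided that at least one category was populated from text in the first pass.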
In the second enhancement, we develop an analysis procedure that extracts color regions from images. The SaFe system integrates spatial and feature querying of images. The system provides the user with higher-level query tools with which to diagram a query as a spatial arrangement of color regions.
In the initial deployment of the image search engines on the Web, the users have made over 800,000 query and browse operations. More than 5.5 million image icons have been displayed back to the users. The system provides several types of querying and browsing options at the levels depicted in Figure 1, which include content-based querying, text-based querying, spatial querying, subject-based querying, and visual browsing through icons.
The initial usage results, summarized in Table 1, indicate that subject-based querying is the dominant query method for images (accounting for 21.2% of all operations and 53.5% of the image queries). Content-based querying accounts for only 3.7% of the image queries. On the other hand, 22.9% of the users' operations consist of visually browsing through pages of image icons. Clearly, these results indicate that the users are finding only a small role for content-based querying in the WebSEEk image search engine. We explore some of the other methods by which the users are searching for images in WebSEEk.
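The two percentages quoted for subject-based querying imply what share of all operations are image queries, and from that the share of all operations that are content-based queries. The short calculation below derives both. The derived figures are arithmetic consequences of the quoted percentages, not numbers reported in the text, and the variable names are ours.

```python
def derive_shares(subject_of_all=0.212, subject_of_image_queries=0.535,
                  content_of_image_queries=0.037):
    """Derive shares of all operations from the percentages quoted in the text."""
    # subject_of_all = subject_of_image_queries * image_queries_of_all
    image_queries_of_all = subject_of_all / subject_of_image_queries
    content_of_all = content_of_image_queries * image_queries_of_all
    return image_queries_of_all, content_of_all


image_share, content_share = derive_shares()
print(f"image queries:         {image_share:.1%} of all operations")   # 39.6%
print(f"content-based queries: {content_share:.1%} of all operations")  # 1.5%
```

Under this reading, content-based queries make up roughly 1.5% of all operations, compared with 22.9% for visual browsing.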
WebSEEk IMAGE SEARCH ENGINE
WebSEEk collects images and videos from the Web, catalogs them, and provides tools for searching and browsing through the collection. In the cataloging process, the images are automatically assigned to the semantic classes. By integrating the search at the semantic class and text levels with content-based search techniques, WebSEEk improves the image search process.
To illustrate, we issue a text-based query using the term ``ferrari.'' We see in Figure 2 that many images of Ferrari sports cars are retrieved, in addition to images of a person named ``Ferrari.'' This illustrates that the available text alone does not provide a sufficient filter for the images. To further disambiguate the images, the user searches the ``ferrari'' class using an image of a Ferrari sports car. This readily retrieves the images from the class that are most visually similar to it, as illustrated in Figure 3, which satisfies the user's query for Ferrari sports cars.
Figure 2: Text query using ambiguous term = ``ferrari.''
Figure 3: Subject class ``transportation/cars/ferraris'' ordered by highest similarity to an example Ferrari sports car. The integration of the high-level semantic query and content-based query improves the search results.
A second example further illustrates the synergy provided by integrating text and visual information. In the first query (see Figure 4), the user provides an example image of a Kandinsky painting. In the first 14 images retrieved by the system using content-based techniques, the system retrieves only two additional Kandinskys, one of which is a larger-sized duplicate of the query image.
Oddly enough, the system retrieves several other images of artwork, including one Nolde, one Beckman and two Renoirs. This indicates an important insufficiency of content-based searching: the user merely provides an example image to initiate the search, so the system has little means to infer which aspect of its content should be used to search and compare the images. However, the role of content-based searching is enhanced by restricting the domain of images or by providing a context for the search. For example, in a better query the user first navigates to the category ``/art/paintings/kandinksy/.'' The user then searches by content or visually browses within this subject domain (see Figure 5) to more easily find the desired images.
Figure 4: Unconstrained content-based query using an example Kandinsky painting in the upper-left as the query image.
Figure 5: Navigation to subject class ``/art/paintings/kandinksy/'' followed by a content-based search within the class.
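Both examples follow the same pattern: first restrict the candidates to a subject class, then rank only those candidates by visual similarity to the example image. A minimal sketch of that pattern follows. It assumes each catalog entry stores a subject path and a normalized color histogram, and it uses histogram intersection as the similarity measure. The catalog layout, the function names, and the choice of measure are illustrative assumptions, not details taken from WebSEEk.

```python
def histogram_intersection(a, b):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def search_within_class(catalog, subject_prefix, query_hist, k=5):
    """Rank the images of one subject class by similarity to the query histogram.

    catalog: dict name -> (subject_path, color_histogram)
    """
    # Step 1: the semantic filter keeps only images in the chosen subject class.
    candidates = [
        (name, hist)
        for name, (subject, hist) in catalog.items()
        if subject.startswith(subject_prefix)
    ]
    # Step 2: the content-based ranking is applied inside the class only.
    ranked = sorted(
        candidates,
        key=lambda item: histogram_intersection(query_hist, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

Because the ranking never sees images outside the class, a visually similar but semantically unrelated image (for instance a portrait with the same dominant colors as the example car) cannot enter the result list.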
SaFe SPATIAL AND FEATURE QUERY
SaFe provides a system for the spatial and feature querying of images. The system automatically extracts color regions from the images. The system indexes them by color, location and size [6]. The images are compared by comparing the regions obtained from the query and target images. The process considers the similarity in the spatial locations, sizes, colors and spatial arrangements of the regions. To formulate a SaFe image query, the user selects regions, positions them on the query grid and assigns them properties of color, size and absolute location, as illustrated in Figure 6. The user also assigns boundaries for location and size.

Figure 6: SaFe spatial and color querying of synthetic color images.
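To make the region-based comparison concrete, the sketch below scores a target image against a query made of color regions. It is a simplified illustration under our own assumptions: regions carry an RGB color, a normalized center and an area fraction; the distance is a weighted sum of color, location and size differences; and each query region is matched to its closest target region. It covers absolute location only and does not model the relative spatial arrangement of regions that SaFe also considers [2]. The names `Region`, `region_distance`, `image_distance` and `rank_images` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    color: tuple   # (r, g, b), each component in [0, 1]
    center: tuple  # (x, y), normalized to [0, 1]
    size: float    # fraction of the image area, in [0, 1]


def region_distance(q, t, w_color=1.0, w_loc=1.0, w_size=1.0):
    """Weighted sum of color, location and size differences between two regions."""
    d_color = math.dist(q.color, t.color)
    d_loc = math.dist(q.center, t.center)
    d_size = abs(q.size - t.size)
    return w_color * d_color + w_loc * d_loc + w_size * d_size


def image_distance(query_regions, target_regions):
    """Sum, over the query regions, of the distance to the best-matching target region."""
    if not target_regions:
        return math.inf
    return sum(
        min(region_distance(q, t) for t in target_regions) for q in query_regions
    )


def rank_images(query_regions, database):
    """database: dict name -> list of Region. Returns names, best match first."""
    return sorted(database, key=lambda name: image_distance(query_regions, database[name]))
```

The weights play the role of the user-assigned emphasis on color, location and size; raising `w_loc`, for example, penalizes regions that have the right color but sit in the wrong part of the image.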
We demonstrate the power of the SaFe method in searching for color photographic images. We design this query to retrieve images of sunsets. Prior to the trials, 3,100 images were inspected, and each was subjectively assigned a relevance to the sunset query. The query results are shown in Figure 7 and Figure 8. The results show that SaFe improves retrieval effectiveness over non-spatial content-based query methods which use color histograms. A more detailed evaluation of SaFe is provided in [2].
Figure 7: Sunset image queries. Q = query images. Best four matches are listed from left to right for each method: SaFe query, color histograms, and color sets.
Figure 8: Sunset image queries retrieval effectiveness.
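Given per-image relevance judgments such as those assigned to the 3,100 images, retrieval effectiveness is commonly summarized by precision and recall over the top-k results. The text here does not name the measure plotted in Figure 8, so the following is only a sketch of that standard computation; `precision_recall_at_k` is a hypothetical helper.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the first k entries of a ranked result list.

    ranked:   list of image names, best match first
    relevant: set of image names judged relevant to the query
    """
    hits = sum(1 for name in ranked[:k] if name in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Evaluating this at increasing values of k for each method (for example, region-based queries versus color histograms) yields curves that can be compared directly.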
The usability and functionality of image search engines are improved by integrating the information available in the application domain (such as text on the Web, and spatial arrangements of color regions) into the image analysis and search processes. We presented demos of the system prototypes (WebSEEk and SaFe). These systems enhance the image search process by offering the user higher-level methods by which to search and browse for images and methods for integrating multiple analysis and search techniques.
The initial usage of these systems indicates that the higher-level query and browse methods are used significantly more often to search for images than purely content-based methods. We conclude from these results that multiple image features, related non-visual information, and content-based techniques should be integrated in order to derive higher-level descriptors and classifiers of the images. The recent diverse visual information environments are providing just such opportunities.
References
- [1] J. R. Smith and S.-F. Chang. WebSEEk: a content-based image and video search engine for the world-wide web. IEEE Multimedia, Summer 1997.
- [2] J. R. Smith and S.-F. Chang. SaFe: A general framework for integrated spatial and feature image search. In Workshop on Multimedia Signal Processing, Princeton, NJ, June 1997. IEEE.
- [3] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. C. Jain, and C. Shu. Virage image search engine: an open framework for image management. In Symposium on Electronic Imaging: Science and Technology - Storage & Retrieval for Image and Video Databases IV, volume 2670, pages 76-87. IS&T/SPIE, 1996.
- [4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28(9):23-32, September 1995.
- [5] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Tools for content-based manipulation of image databases. In Proceedings of the SPIE Storage and Retrieval Image and Video Databases II, February 1994.
- [6] J. R. Smith and S.-F. Chang. VisualSEEk: a fully automated content-based image query system. In Proc. ACM Intern. Conf. Multimedia, pages 87-98, Boston, MA, November 1996. ACM.
We provide links to the SaFe integrated spatial and feature image query system and to other systems. The SaFe system implements all of the functions described in this manuscript. The SaFe provides several image test-collections (symbolic images, synthetic images and color photographic images) to illustrate the types of querying provided. The WebSEEk system extends the image search paradigm to produce an image and video search engine for the WWW. VisualSEEk provides additional content-based image query functions.
- SaFe integrated spatial and feature image query system
- WebSEEk content-based image and video search engine
- VisualSEEk content-based image search system
SaFe Java code
The Java source code for the SaFe query interface is provided by Columbia's content-based visual query group directed by Prof. Shih-Fu Chang. It was developed by John R. Smith.
Document and demos prepared by John R. Smith, April, 1997