Fully Automatic Video Region Segmentation Fusing Multiple Visual Features
Contacts: Di Zhong and Shih-Fu Chang
In this project, we developed an automatic region segmentation and tracking system for general video sources. It uses feature fusion and motion projection to track salient video regions over long video sequences. Specifically, we propose a new method that combines color, edge, and optical flow directly in one tracking process. This approach is robust to image noise and achieves accurate region boundaries. Experiments show that our system can track salient regions reliably through long video shots.
We define an image region as a contiguous area of pixels with consistent features (e.g., color) in an image frame. It may correspond to part of a physical object, like a car, a person, or a house. A video region is a sequence of instances of the tracked image region in consecutive frames. The region segmentation and tracking process is applied within a video shot to obtain video regions.
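The two definitions above can be made concrete with a small data structure. The following is only an illustrative sketch: the class names `ImageRegion` and `VideoRegion`, the pixel-set representation, and the use of 4-connectivity to test contiguity are assumptions made for this example, not a description of the system's internal data structures.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


@dataclass
class ImageRegion:
    """A contiguous area of pixels in one frame (hypothetical structure)."""
    frame: int
    pixels: Set[Pixel]

    def is_contiguous(self) -> bool:
        """Flood-fill from one pixel and check that every pixel is reached.

        4-connectivity is an assumption of this sketch.
        """
        if not self.pixels:
            return False
        start = next(iter(self.pixels))
        seen = {start}
        stack = [start]
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in self.pixels and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return len(seen) == len(self.pixels)


@dataclass
class VideoRegion:
    """Instances of one tracked image region in consecutive frames."""
    instances: List[ImageRegion] = field(default_factory=list)

    def append(self, region: ImageRegion) -> None:
        # Enforce the "consecutive frames" part of the definition.
        if self.instances and region.frame != self.instances[-1].frame + 1:
            raise ValueError("instances must lie in consecutive frames")
        self.instances.append(region)
```

In this sketch a video region is simply an ordered list of per-frame instances, so a tracker would append one `ImageRegion` per frame until the shot ends or the region is lost.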

More detailed information is given in the following sections:
Software Description (free binary code available)
Segmentation of moving objects using the motion field or optical flow has been the focus of much research. Because motion fields are usually noisy for real-world scenes, segmenting them directly is error-prone and unstable. One main problem with many existing approaches is that segmentation results are sensitive to noise and to slight variations in features, especially near segmentation boundaries. In region tracking, this can produce different segmentations in successive frames. When a video sequence is short, boundary errors usually do not seriously hurt overall tracking performance. However, when a region must be tracked over a long period, accumulated boundary errors are likely to break the tracking process completely.
Fusing multiple visual features in the segmentation process is essential for increasing the stability of region segmentation. For example, edge-based methods may produce accurate object boundaries but are sensitive to noise. In contrast, color-based region growing methods are robust to noise but usually produce over-segmented regions and may not generate accurate boundaries (e.g., due to color blur). An efficient method for combining these two features is therefore highly desirable for achieving more consistent segmentations.
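To make the color/edge trade-off concrete, the sketch below shows one possible way a merge criterion could combine the two features. It is not the algorithm used in this system, whose merge criterion is not specified here. The cost function, the names `merge_cost` and `greedy_merge`, the `edge_weight` and `threshold` parameters, and the representation of a shared boundary as a list of edge-strength samples are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Color = Tuple[float, float, float]
Pair = Tuple[int, int]


def merge_cost(mean_a: Color, mean_b: Color,
               edge_samples: List[float], edge_weight: float) -> float:
    """Color distance plus a penalty for strong edges on the shared boundary."""
    color_dist = math.dist(mean_a, mean_b)
    edge_term = sum(edge_samples) / len(edge_samples) if edge_samples else 0.0
    return color_dist + edge_weight * edge_term


def greedy_merge(means: Dict[int, Color], sizes: Dict[int, int],
                 boundaries: Dict[Pair, List[float]],
                 edge_weight: float = 1.0,
                 threshold: float = 30.0) -> Dict[int, int]:
    """Repeatedly merge the cheapest adjacent pair until the cost reaches threshold.

    Returns a map from each original label to the label it ends up in.
    """
    means, sizes = dict(means), dict(sizes)
    boundaries = {tuple(sorted(k)): list(v) for k, v in boundaries.items()}
    owner = {label: label for label in means}

    def cost(item):
        (a, b), samples = item
        return merge_cost(means[a], means[b], samples, edge_weight)

    while boundaries:
        best = min(boundaries.items(), key=cost)
        if cost(best) >= threshold:
            break
        (a, b), _ = best
        total = sizes[a] + sizes[b]
        # The merged mean color is the size-weighted average of both means.
        means[a] = tuple((means[a][i] * sizes[a] + means[b][i] * sizes[b]) / total
                         for i in range(3))
        sizes[a] = total
        del means[b], sizes[b], boundaries[(a, b)]
        # Boundaries that touched b now touch a.
        for p, q in list(boundaries):
            if b in (p, q):
                other = q if p == b else p
                key = tuple(sorted((a, other)))
                boundaries.setdefault(key, []).extend(boundaries.pop((p, q)))
        for label, current in owner.items():
            if current == b:
                owner[label] = a
    return owner
```

With three regions of nearly identical mean color, a purely color-based criterion (`edge_weight=0.0`) merges all of them, whereas a nonzero `edge_weight` keeps two regions apart when a strong edge runs along their shared boundary. This is the kind of case where blurred colors alone would lose an object boundary.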
Another problem is that the mapping between regions in successive frames is unreliable when the regions are segmented independently. Because similar regions often exist even within small local windows, minor segmentation differences and/or motion estimation errors can cause region mismatches. To address this problem, an inter-frame segmentation process is needed that partitions an intermediate frame consistently with the segmentation results of its preceding frame. This approach avoids the unreliable after-the-fact mapping between successive frames.
In this project, we developed an automatic video region segmentation and tracking method based on the fusion of color, edge, motion, and temporal features. This method can track video regions stably over a long period, and is especially useful for building a visual index of a large video collection.
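The idea of segmenting a frame consistently with its predecessor can be illustrated by projecting the previous frame's region labels forward with each region's estimated motion, so that labels carry over instead of being re-matched afterwards. The sketch below is a simplified illustration under stated assumptions: the function name `project_labels`, the six-parameter displacement layout, and the policy of leaving uncovered or contested pixels unlabeled for a later segmentation step are choices made for this example, not the system's documented procedure.

```python
from typing import Dict, List, Set, Tuple

# Assumed parameter layout: u = a0 + a1*x + a2*y, v = b0 + b1*x + b2*y
Affine = Tuple[float, float, float, float, float, float]
UNLABELED = -1
STATIC: Affine = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)


def project_labels(labels: List[List[int]],
                   motion: Dict[int, Affine]) -> List[List[int]]:
    """Move every labeled pixel by its region's motion into the next frame.

    Pixels that no region lands on, or that two different regions land on,
    stay UNLABELED so that a later segmentation step can classify them.
    """
    height, width = len(labels), len(labels[0])
    projected = [[UNLABELED] * width for _ in range(height)]
    contested: Set[Tuple[int, int]] = set()
    for y in range(height):
        for x in range(width):
            label = labels[y][x]
            if label == UNLABELED:
                continue
            a0, a1, a2, b0, b1, b2 = motion.get(label, STATIC)
            nx = round(x + a0 + a1 * x + a2 * y)
            ny = round(y + b0 + b1 * x + b2 * y)
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # the pixel moved out of the frame
            if (ny, nx) in contested:
                continue
            current = projected[ny][nx]
            if current == UNLABELED:
                projected[ny][nx] = label
            elif current != label:
                # Two regions claim this pixel: leave it undecided.
                projected[ny][nx] = UNLABELED
                contested.add((ny, nx))
    return projected
```

For example, if both regions in the one-row label map `[[1, 1, 2, 2]]` shift one pixel to the right, the projection is `[[-1, 1, 1, 2]]`: the region identities are preserved, and only the newly exposed pixel at the left border remains to be classified.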
The segmentation and tracking of feature regions is based on the fusion of color, edge, and optical flow. Color is chosen as the major segmentation feature because of its consistency under varying conditions, such as changes in orientation, shifts of view, partial occlusion, or changes of shape. Compared with other features such as edge, shape, and motion, colors (or more precisely, mean colors) are more stable. Edge features are complementary to color information: color captures the low-frequency information (means) of an image, while edge captures its high-frequency details (edges). Fusing the two therefore greatly improves segmentation results, especially region boundaries. Unlike earlier merge-and-split methods, in which edge information is applied after color-based region merging, we propose a new method that fuses edge information directly into the color merging process. An affine motion model is estimated for each region from the computed optical flow and is used to track color regions through a video shot.
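As an illustration of the last step, an affine motion model can be fitted to the optical flow vectors inside a region by ordinary least squares. The sketch below assumes the common six-parameter form u = a0 + a1*x + a2*y, v = b0 + b1*x + b2*y and a plain (non-robust) fit. The estimator actually used by the system is not specified here, and the function names `fit_affine` and `solve3` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Flow = Tuple[float, float]


def solve3(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    rows = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (e.g., collinear pixels)")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def fit_affine(points: Sequence[Point], flows: Sequence[Flow]) -> Tuple[float, ...]:
    """Least-squares fit of (a0, a1, a2, b0, b1, b2) to per-pixel flow vectors."""
    if len(points) != len(flows) or len(points) < 3:
        raise ValueError("need at least 3 points with one flow vector each")
    n = float(len(points))
    sx = sy = sxx = sxy = syy = 0.0
    su = sxu = syu = sv = sxv = syv = 0.0
    for (x, y), (u, v) in zip(points, flows):
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
        syy += y * y
        su += u
        sxu += x * u
        syu += y * u
        sv += v
        sxv += x * v
        syv += y * v
    # Normal equations: both flow components share the same 3x3 matrix.
    normal = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    a = solve3(normal, [su, sxu, syu])
    b = solve3(normal, [sv, sxv, syv])
    return (*a, *b)
```

Because real optical flow is noisy, a practical implementation would likely add outlier rejection or a robust estimator on top of this plain fit. The fitted parameters predict where each pixel of the region moves in the next frame.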

Video Regions
Figure 1. The diagram of region segmentation and tracking

Figure 2. The motion projection and segmentation module
Some segmentation results for sports videos are shown in Figure 3. People are the main objects in these videos. These images give a general idea of which regions are automatically extracted. The results show that our algorithm can correctly identify salient regions such as the body and face, while ignoring detailed features like the eyes. The region boundaries are accurate, which allows us to define shape features.






Figure 3. More region segmentation results shown in random colors
The system has been developed and tested on Sun Solaris, HP-UX, and SGI IRIX systems. It requires the GNU C++ compiler. The binary code of this system is freely available; please contact the authors for the software.
The system requires two control files: a scene cut file specifying the frame numbers of the cuts, and a parameter file defining the tracking options.
The system accepts either MPEG (MPEG-1 or MPEG-2) video or raw frames as input. For each frame, the system generates two output files:
- SEG file: segmented regions drawn in PPM format with mean colors
- OIF file: segmented regions with their basic features
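To show what "segmented regions drawn in PPM format with mean colors" can look like in practice, the sketch below paints every pixel with the mean color of its region and serializes the result as a binary PPM (P6) image. Only the PPM container is standard. The function name `render_mean_color_ppm`, the input representation, and the choice of the binary P6 variant are assumptions. The layouts of the actual SEG and OIF files are not described here, so the OIF format is not sketched.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def render_mean_color_ppm(frame: List[List[RGB]], labels: List[List[int]]) -> bytes:
    """Return a binary PPM (P6) image with each region filled by its mean color.

    `frame` holds 8-bit RGB pixels and `labels` holds one region label per pixel.
    """
    height, width = len(labels), len(labels[0])
    # label -> [sum_r, sum_g, sum_b, pixel_count]
    sums: Dict[int, List[int]] = {}
    for y in range(height):
        for x in range(width):
            acc = sums.setdefault(labels[y][x], [0, 0, 0, 0])
            r, g, b = frame[y][x]
            acc[0] += r
            acc[1] += g
            acc[2] += b
            acc[3] += 1
    means = {label: tuple(round(acc[i] / acc[3]) for i in range(3))
             for label, acc in sums.items()}
    # PPM header: magic number, width and height, maximum channel value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytearray()
    for y in range(height):
        for x in range(width):
            body.extend(means[labels[y][x]])
    return header + bytes(body)
```

Writing the returned bytes to a file opened in binary mode yields an image that standard PPM viewers can display.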
"Description of MPEG-4", ISO/IEC JTC1/SC29/ WG11 N1410, MPEG document N1410 Oct. 1996.
D. Zhong and S.-F. Chang, "Video Object Model and Segmentation for Content-Based Video Indexing", ISCAS'97, Hong Kong, June 9-12, 1997.
D. Zhong and S.-F. Chang, "Spatio-Temporal Video Search Using the Object Based Video Representation", ICIP'97, Santa Barbara, CA, October 26-29, 1997.