IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 36, NO. 6, DECEMBER 2006 1373

Qualitative Visual Environment Retrieval

Rajashekhara, Amit B. Prabhudesai, and Subhasis Chaudhuri

Abstract—A system for retrieving a description of an unstructured environment under static and dynamic scenarios is proposed. Cylindrical mosaics or omnidirectional images are used to provide a rich description of the surrounding environment spanning 360°. The environment description is based on defining the attributes of the nodes of a graph derived from angular partitions of the captured images. Content-based image retrieval is performed for each of these partitions on an exemplar image database to annotate the nodes of the graph. The complete environment description is recovered by collating the retrieval results over all the partitions with a simple voting scheme. This offers a qualitative description of the location in a totally natural and unstructured surrounding. The experiments yield quite promising results.

Index Terms—Concentric mosaic, environment retrieval, node annotation, omnicam, view partition.

I. INTRODUCTION

THE PROBLEM of robot localization is well documented and well researched, and several approaches have been proposed to solve it. In the class of problems belonging to the simultaneous localization and mapping (SLAM) category [1], the robot starts at an unknown location with no prior knowledge of the landmark positions. From landmark observations, it simultaneously estimates its own location and those of the landmarks, and it then builds up a complete map of landmarks for localization. Early approaches used laser range data or sonar for localization, whereas recent approaches have focused on vision-based mobile robot localization. An illustrative example is the MINERVA tour guide robot [2] used in the Smithsonian’s National Museum of American History.
Vision-based localization approaches have also been reported by Kosecka and Li [3], Davison [4], Davison and Murray [5], Kröse et al. [6], Goedeme et al. [7], and Se et al. [8]. These works are not exhaustive but are representative of the various approaches proposed. Among other approaches, the global positioning system (GPS) [9] and location estimation using the global system for mobile communications (GSM) [10] are becoming increasingly popular. The robot localization problem attempts to answer the question “Where am I?”: it consists of estimating the state of the robot at time instant $t_k$, given all the measurements up to $t_k$. Typically, a three-dimensional (3-D) state vector $\mathbf{x} = [x, y, \theta]^T$ is used, i.e., the position and orientation of the robot. Even the GPS and the GSM systems provide only a metric coordinate of the position in terms of latitude and longitude. None of these systems provides a qualitative description of the surrounding environment. In the proposed system, we attempt to provide a rich description of the surrounding environment. Such a system would be of significant interest to the wearable computing community and could be used to assist visually impaired persons.

Manuscript received July 8, 2005; revised January 9, 2006 and April 3, 2006. This work was supported in part by the Indian Department of Science and Technology, Ministry of Science and Technology, under a Swarnajayanti project. This paper was recommended by Associate Editor S. Sarkar. Rajashekhara is with GE Healthcare Technologies, Bangalore 560 066, India. A. B. Prabhudesai is with Siemens Corporate Technology, Bangalore 560 001, India. S. Chaudhuri is with the Department of Electrical Engineering, Indian Institute of Technology, Bombay, Mumbai 400076, India (e-mail: sc@ee.iitb.ac.in). Digital Object Identifier 10.1109/TSMCB.2006.877797
To the best of our knowledge, there has been no prior work directed toward generating a qualitative description of the environment in a “humanlike” fashion that involves topological relationships among various entities in a scene. In this paper, we present a novel approach for environment retrieval using cylindrical panoramic mosaics or omnidirectional images as input. The use of panoramic or omnidirectional vision sensors on mobile robots has previously been reported by Thompson and Zelinsky [11], Menegatti et al. [12], Matsumoto et al. [13], [14], and Dellaert et al. [2], although Dellaert et al. use an image mosaic built up from individual images. However, the focus of all these approaches is positioning or navigation, not the generation of a description of the surrounding environment. Furthermore, all these approaches, as well as those reported in [2]–[5], [7], and [8], deal with indoor environments, where they rely on artificial or natural landmarks. In an unstructured environment, artificial landmarks cannot be set up, and natural landmarks cannot be segmented accurately. The outdoor environment is totally unstructured, and this motivates the use of global appearance-based features in our environment retrieval system. Zhou et al. have proposed the use of global appearance-based features [15], but again, the emphasis is on robot localization. We show how, given a 360° or a hemispherical view of a totally unstructured and natural surrounding, descriptions such as “buildings to our left,” “road in the front,” and “a lawn to our right” can be obtained. We provide a description of the environment in terms of a graph whose nodes correspond to one of the annotated image classes. It may be noted that Ulrich and Nourbakhsh [16] have also suggested the use of content-based image retrieval (CBIR) [17] for localization. However, their method mainly focuses on a restricted and pretrained environment such as halls, rooms, and corridors.
Given the region adjacency map of the environment and the color histogram of each of the entities, they determine which node is currently being traversed by the robot. Recently, Wolf et al. [18] have also proposed a system combining an image retrieval system with Monte Carlo localization (MCL). They represent an image by a histogram of local features. However, the primary use of their image retrieval system is to update the weights of the samples in the subsequent MCL method. Again, their system was tested in an indoor environment.

1083-4419/$20.00 © 2006 IEEE

Our method handles scenes from a totally unconstrained, natural, static or dynamic environment using a simple feature, namely color, for image retrieval, and it can handle the variability of the outdoor world. Our method assumes no a priori information about the environment. In the proposed method, we are able to generate a fairly rich description of the environment using a limited number of image classes. We represent the topological relationships among the entities in the environment using a graph structure whose nodes are annotated by an identifier associated with a particular image class. We also show that the graph can be updated in real time as the observer roams around.

The rest of this paper is organized as follows. The next section presents a formal problem definition. Section III describes how one can obtain an environment representation from a 360° view of a scene. Section IV explains the environment retrieval mechanism. Section V presents the experimental results of environment retrieval. Finally, Section VI concludes this paper with a look at a possible direction for future work.

II. PROBLEM DEFINITION

Concentric panoramic mosaics and images obtained from a catadioptric imaging system such as an omnicam [19], [20] provide a 360° view of the environment. Given these panoramic/omnidirectional images, we investigate the problem of generating a rich description of a static or dynamic environment.

A. Static Environment Retrieval

We propose to describe the environment using a graph that indicates the topological relationships among the entities in the environment. Let $G$ denote a graph of the environment and $g_k$ denote the $k$th node of this graph. The nodes are annotated using an identifier associated with a class $C_i$. Mathematically, this may be represented as

$$G \leftarrow \{\, g_k : g_k \in \{C_i\}_{i=1}^{M} \,\} \tag{1}$$

where $M$ is the number of annotation classes into which the image database is divided. Let $V$ be a 360° view of the environment. The maximum-likelihood solution to the problem of static environment retrieval can now be written as

$$\hat{G} = \arg\max_{\{g_k\}} \; p\left(V \mid G, \{C_i\}_{i=1}^{M}\right). \tag{2}$$

The granularity of the graph $G$ in terms of the number of nodes is discussed in Section III.

B. Dynamic Environment Retrieval

In a manner similar to the static case, we build a graph for each frame of the video sequence. Thus, the complete representation is given by a temporally evolving graph $\mathcal{G} = \{G_n\}_{n=0}^{N-1}$ corresponding to the $N$ frames of the video sequence. Let $\mathcal{V} = \{V_1, V_2, \ldots, V_N\}$ be the omnidirectional video sequence as the robot moves in the environment. The maximum-likelihood solution to the dynamic environment retrieval problem is given by

$$\hat{\mathcal{G}} = \arg\max_{\{G_n\}} \; p\left(\mathcal{V} \mid \mathcal{G}, \{C_i\}_{i=1}^{M}\right). \tag{3}$$

C. Change Detection in the Environment

The graph $G$ of a temporally evolving environment can change for two reasons: 1) a change in the entities of the environment, as the robot moves past a building, a lawn, or any one of the other classes; or 2) a change in the topological relationships of the entities vis-à-vis the observer, as he/she turns left or right, or reverses his/her direction.
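The two causes of change listed above can be separated mechanically by comparing consecutive graphs node by node, which is what conditions (4) and (5) formalize. The Python sketch below is illustrative only and is not the authors' implementation: it assumes the six peripheral node labels are stored as a tuple in the cyclic order FR, LT, LB, XX, RB, RT used in Section III (the base BS is left out, as in (5)), and it models the rotation operator $R$ as a cyclic shift by whole 60° sectors. The function names are hypothetical, and which shift direction corresponds to a left or right turn is left open.

```python
from typing import Optional, Sequence, Tuple

# Peripheral labels are assumed to be stored in the cyclic order
# FR, LT, LB, XX, RB, RT; the base node BS is excluded, as in (5).


def rotate(labels: Sequence[str], shift: int) -> Tuple[str, ...]:
    """Rotation operator R: cyclically shift the ring by `shift` 60-degree sectors."""
    n = len(labels)
    return tuple(labels[(i + shift) % n] for i in range(n))


def changed_nodes(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Count the nodes k with g_k^(n) != g_k^(n-1), i.e. a node-wise XOR of labels."""
    return sum(a != b for a, b in zip(prev, curr))


def classify_change(prev: Sequence[str], curr: Sequence[str]) -> Optional[str]:
    """Return None (no change), "turn" (condition (5)) or "content" (condition (4) only)."""
    count = changed_nodes(prev, curr)
    if count == 0:
        return None
    if count >= 2:
        # A turn relabels several nodes at once; test every non-trivial cyclic shift.
        for shift in range(1, len(prev)):
            if tuple(curr) == rotate(prev, shift):
                return "turn"
    return "content"
```

A content change that happens to coincide with a cyclic shift of the old labels is reported as a turn, which mirrors the limitation for simultaneous changes noted in Section IV-E.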
We also address the problem of detecting such changes in the environment. For the first case above, we seek the frame $n$ for which $G_n \neq G_{n-1}$ ($G_n$ denotes the graph of the current frame and $G_{n-1}$ that of the previous frame). The detected frame is given by

$$n : g_k^{(n)} \neq g_k^{(n-1)} \ \text{for at least one}\ k. \tag{4}$$

In other words, we detect that the attribute of at least one node of the graph has changed by comparing the graphs of two successive frames. In the second case, we obtain a completely different annotation of the graph when the observer takes a turn along his/her path, even though the environment otherwise remains the same. We may then relate the graphs in the $n$th and the $(n-1)$th frames as $\tilde{G}_n = R\,\tilde{G}_{n-1}$, where $R$ represents a rotation operator and $\tilde{G}$ represents the subgraph of $G$ excluding the base (explained in the next section). Thus, the corresponding change detection problem has the following solution:

$$n : \#\{g_k^{(n)} \neq g_k^{(n-1)}\} \geq 2 \;\wedge\; \tilde{G}_n = R\,\tilde{G}_{n-1} \tag{5}$$

where $\#\{g_k^{(n)} \neq g_k^{(n-1)}\}$ denotes the number of nodes whose annotations have changed between the previous [$(n-1)$th] and the current ($n$th) frames.

III. ENVIRONMENT REPRESENTATION

The proposed method uses either cylindrical panoramic mosaics or omnidirectional images as the input to the system for building a description of the environment. We use the following six classes (categories) for annotation: lawns L, woods W, buildings B, water bodies H, roads R, and traffic T. Most natural environments can be quite reasonably described using images belonging to these classes. Our database consists of about 200 images divided nearly equally among these six classes. In order to capture the variability in outdoor scenes, the images within a class were chosen to have a moderately large intraclass variance (in the feature space). However, there is a tradeoff between handling the variability in outdoor scenes and the discriminative power of the classifier, as the two are inversely related.
Furthermore, the use of more sample images, as will be discussed later in the text, slows the retrieval process.

To represent the topological relationships among the entities in an environment, we use a graph whose nodes are annotated by an identifier associated with a particular class, as illustrated in Fig. 1. As shown, an observer passes through an arbitrary environment along a set of points $\{P_1, P_2, P_3, \ldots\}$ at time instants $\{t_1, t_2, t_3, \ldots\}$. The observer sees parts of woods W, lawns L, buildings B, and water bodies H around him/her. We now face the question of how to represent the environment topologically.

Fig. 1. Illustration of an environment in terms of attributes such as woods and lawns (FR, front; RT, right top; RB, right bottom; LB, left bottom; LT, left top).

In order to simplify the description, we select a topology that is fixed with respect to the observer rather than one that is environment specific, since an outdoor scene is totally unstructured and no prior information is available. As a simple way of indicating relationships such as “to the left of” or “in front of,” we construct a graph with six nodes. We divide the 360° view into six viewing cones of 60° each: front FR, left top LT, left bottom LB, back XX, right bottom RB, and right top RT. The nodes of the graph $G$ are denoted by two characters, as illustrated in Fig. 2. The attributes of the nodes are represented by a single character, such as “L” and “W,” each of which corresponds to the appropriate class. For an omnidirectional camera, one can also see the reflection of the base (on which the observer is standing) along the periphery of the image. One would also like to find out whether we are walking on a road or, say, on grass. Hence, the base BS forms the seventh node of the graph $G$. For the cylindrical mosaic, $G$ has six nodes only.

Fig. 2. (a) View partitioning for an omnidirectional image. (b) Graphical representation of the environment for the aforementioned partitioning.

It may be noted that one can select a different granularity for environment description. For example, if we are interested only in a left/right description, a 90° split of the view is enough, and $G$ will have just four nodes for the mosaic. If required, one can increase the number of nodes by reducing the view angle suitably. However, the retrieval of the attributes of the nodes may be less accurate when the view granularity is much finer.

IV. ENVIRONMENT RETRIEVAL

The environment retrieval method involves three main processing steps, namely: 1) view partitioning; 2) feature computation; and 3) node annotation.

A. View Partitioning

One is now required to find the attribute of each node to recover the environment. We adopt two separate methods for partitioning the concentric cylindrical mosaics and the omnidirectional images to match the nodes of the graph $G$.

Concentric cylindrical mosaics: The partitioning of a concentric mosaic involves extracting six equal, nonoverlapping windows that span the entire image. The vertical span of each of these windows is the complete vertical span of the given mosaic, so each subimage covers a 60° field of view.

Omnidirectional images: In the case of omnidirectional images, we adopt a similar approach, with one difference. The part near the center of the image corresponds to the region directly overhead the observer, which in an outdoor scene is mostly sky and carries little information for navigation or description purposes. This sky component is therefore not considered while extracting the features.
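The two partitioning schemes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the mosaic is split into six equal column ranges, and an omnidirectional pixel is mapped to a node from its polar coordinates about the mirror center, including the peripheral base ring and the sectors treated in the next paragraph. The base-ring width of 20% of the radius is the value reported in Section IV-C; the central sky radius (`sky_frac`), the image direction taken as front (`front_deg`), and the convention that angles increase toward the observer's left are hypothetical choices that depend on the actual mirror and camera setup.

```python
import math
from typing import List, Optional, Tuple

# Peripheral nodes in cyclic order, starting at the front (Section III).
SECTORS = ("FR", "LT", "LB", "XX", "RB", "RT")


def mosaic_windows(width: int, parts: int = 6) -> List[Tuple[int, int]]:
    """Split a cylindrical mosaic of `width` columns into `parts` equal,
    non-overlapping column ranges [start, end) that together span the image.
    Each window keeps the full vertical extent of the mosaic."""
    bounds = [round(i * width / parts) for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def omni_node(x: float, y: float, cx: float, cy: float, radius: float,
              base_frac: float = 0.2, sky_frac: float = 0.3,
              front_deg: float = 90.0) -> Optional[str]:
    """Map pixel (x, y) of an omnidirectional image to a graph node.

    Returns "BS" for the peripheral base ring, one of SECTORS for the annulus,
    or None for pixels outside the mirror or inside the central sky disc.
    base_frac=0.2 follows the text; sky_frac and front_deg are hypothetical.
    """
    dx, dy = x - cx, cy - y            # flip y so that angles grow counter-clockwise
    r = math.hypot(dx, dy)
    if r > radius or r < sky_frac * radius:
        return None                    # outside the mirror, or overhead (sky) region
    if r >= (1.0 - base_frac) * radius:
        return "BS"                    # ring near the periphery: surface underfoot
    # Angle from the assumed front direction; counter-clockwise is assumed to be left.
    theta = (math.degrees(math.atan2(dy, dx)) - front_deg) % 360.0
    # Each 60-degree cone is centred on its direction, so FR covers [-30, 30).
    return SECTORS[int(((theta + 30.0) % 360.0) // 60.0)]
```

For omnidirectional images, pixels mapped to "XX" would simply be discarded before feature computation, since that sector is occluded by the trolley or the person carrying the camera.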
In addition, the part of the omnicam image near the periphery corresponds to the surface on which the observer is standing while capturing the images. This ring-shaped part, extending inward from the periphery to some reasonable extent, is treated as a single separate partition. The remaining annular part of the image is split into six sectoral views, as shown in Fig. 2(a). One of these sectors corresponds to the direction opposite the direction of motion (marked XX in Fig. 2); it is not considered for feature computation, as it is always occluded by the mobile trolley or the person carrying the camera.

B. Feature Extraction

As the environment is fully unstructured and no training on it is available in advance, we do not attempt to recognize objects in the scene: they appear at different scales, perspectives, and locations, and under varying amounts of occlusion. Hence, we prefer a CBIR method, in which an image is represented by certain features and images are compared in this feature space. For CBIR, we desire a feature that is invariant to scaling, viewpoint, illumination changes, and the geometric warping introduced by omnicam images. In the literature, many researchers have proposed the use of textural features for image retrieval. One of the popular representations of image texture is the co-occurrence matrix proposed by Haralick et al. [21]. Recently, Jhanwar et al. [22] have proposed a translation- and illumination-invariant retrieval scheme using the motif co-occurrence matrix (MCM). Unlike the co-occurrence matrix, MCM captures third-order statistical features for texture description.

Fig. 3. Cylindrical concentric mosaic of a university campus used as a test image.
Wavelet-based methods [23], [24] and Gabor filters [25] have also been proposed for textural description. However, the geometric distortion introduced in catadioptric images does not preserve texture; thus, texture-based methods cannot be used in our system. Rajashekhara et al. [26] propose a CBIR method based on projective invariance; however, it works only for entities having linear structures. Furthermore, in omnicam images, straight lines are not mapped into straight lines; therefore, we cannot use this method for retrieval purposes either. We are not aware of any existing CBIR scheme that can handle all such variations. In [14], Matsumoto et al. propose transforming a raw omnidirectional image into a low-resolution image in cylindrical projection. However, such a transformation is very much viewpoint specific, and the textural properties of a scene change quite substantially when viewed from a different point. Hence, we do not perform the transformation suggested in [14]. Instead, we use color [27], [28] as the feature for image retrieval on the scaled omnidirectional images, and our results provide ample evidence of the efficacy of this simple scheme. Because the color histogram depends only on the properties of individual pixels and not on their neighborhoods, and because most outdoor objects, with the possible exception of glass buildings, are close to being matte surfaces, it provides a convenient way of performing CBIR on cylindrical or spherical images. The choice of color space may affect the accuracy of similarity matching using color histograms. We experimented with both the red–green–blue (RGB) and hue–saturation–value (HSV) color spaces. As expected, the HSV color space yields better results, as it is relatively more robust to changes in illumination.

Fig. 4. Graph of the environment shown in Fig. 3. Here, the characters refer to textual annotation, and the thumbnails provide a visual annotation.

C. Node Annotation

We partition the input panoramic/omnidirectional image as discussed in Section IV-A and compute the color histogram for each of the partitions. We experimented with three different distance metrics for the similarity measure, namely: 1) Euclidean distance; 2) Kullback–Leibler distance; and 3) Jeffrey divergence. The Euclidean distance metric yielded the best results and was used to compile the final results. Assuming the components of the color histogram to be Gaussian distributed about their nominal values, it yields a maximum-likelihood solution to (2). The top 20 retrievals are considered while deciding the annotation for a particular image. To make the retrieval robust against illumination changes and variations within a class, we use a simple voting scheme to decide the image annotation instead of relying on the retrieval rank. We prepare a frequency count for each class over the top 20 retrievals, and the class having the maximum representation is used to annotate the given query image. Let $m_j$ denote the number of retrieved images belonging to class $C_j$. Then, the majority class $C^*$ is defined as

$$C^* = C_k, \quad k = \arg\max_{i} m_i \tag{6}$$

where $i = 1, \ldots, M$ with $M = 6$, corresponding to the six image classes in the database.

Fig. 5. Retrieved visual description of the environment shown in Fig. 3.

Fig. 6. Cylindrical concentric mosaic of a lawn used as another test image.

The majority class is found for each of the partitions, and this information is used to build the graph $G$. The vertices of the graph are indexed by the identifier corresponding to the majority class $C^*$.
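The annotation step can be summarized by a short Python sketch: a normalized HSV histogram per partition, Euclidean distances to precomputed database histograms, and a majority vote over the top 20 retrievals as in (6). The bin counts (8 × 4 × 4), the function names, the tie-breaking rule, and the in-memory database format are assumptions made for illustration; the paper does not specify its quantization. If each histogram bin is modeled as an independent Gaussian with a common variance around the exemplar histogram, the log-likelihood equals a negative multiple of the squared Euclidean distance plus a constant, so ranking by Euclidean distance is ranking by likelihood, which is the sense in which the text relates it to (2).

```python
import colorsys
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit R, G, B


def hsv_histogram(pixels: Iterable[Pixel],
                  bins: Tuple[int, int, int] = (8, 4, 4)) -> List[float]:
    """Normalized joint H-S-V histogram of a partition (bin counts are assumed)."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min(...) keeps the boundary value 1.0 inside the last bin.
        i = min(int(h * hb), hb - 1)
        j = min(int(s * sb), sb - 1)
        k = min(int(v * vb), vb - 1)
        hist[(i * sb + j) * vb + k] += 1.0
        count += 1
    return [c / count for c in hist] if count else hist


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def annotate(query: Sequence[float],
             database: Sequence[Tuple[Sequence[float], str]],
             top: int = 20) -> str:
    """Majority class C* among the `top` database histograms nearest to `query`."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[0]))
    top_labels = [label for _, label in ranked[:top]]
    votes = Counter(top_labels)
    best = max(votes.values())
    # Tie-break (an assumption): the tied class that appears earliest in the ranking.
    return next(label for label in top_labels if votes[label] == best)
```

Since the database histograms never change, they would be computed once off-line, so each query costs one histogram per partition plus one distance per database image.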
For the omnidirectional images acquired with the spherical-mirror camera system, the part of the image extending about 50 pixels (20% of the radius) inward from the periphery is treated as a separate partition. We use this partition as the base BS in the environment and obtain the retrievals for it in the same way as for the other partitions. One may split the BS node into two halves or four quadrants for a finer granularity of environment description, if required.

D. Dynamic Node Annotation

A scene in which one moves with an omnidirectional camera corresponds to a graph $G_n$ that changes dynamically with time $t_n$. We perform the retrieval operations discussed above on each frame of the video sequence. Currently, each frame is processed independently to obtain the environment description $G_n$. The complete temporal evolution of the environment as one navigates through it is given by $\mathcal{G}$, obtained by concatenating the graphs $G_n$, i.e., $\mathcal{G} = \{G_1, G_2, \ldots, G_n, \ldots\}$.

E. Change Detection

The dynamic node annotation discussed in the previous subsection helps us detect a change in the environment, given the omnivideo sequence as input. By comparing the annotations of corresponding nodes in the graphs generated for consecutive frames through a simple XOR operation, we can detect a change in the scene. Such a change may arise for either of the two reasons mentioned in Section II-C. Once a change is detected at more than one node in the subgraph $\tilde{G}$ (i.e., $G$ excluding the base BS), we try to match $\tilde{G}_{n+1}$ with $\tilde{G}_n$ by cyclically shifting the nodes to the left or right. If a match is found, we declare that the observer has changed direction. Note that if the observer direction and the scene content (mostly due to occlusion or disocclusion) change simultaneously, this cannot be recovered.

F. Real-Time Operation

The color histogram provides a very compact feature vector.
In addition, the color histograms of all the database images are computed off-line and stored. Given an input image, we only have to compute the histograms for the six partitions of the image and the similarity metric for each partition. Histogram comparison has a time complexity that is linear, $O(r)$, in the number of histogram bins $r$. The database comprises only around 30 images per class, so very little computation is required. We performed experiments on a Pentium IV processor clocked at 2 GHz. The image resolution for all the omnicam test images was 512 × 512 pixels. It took approximately 100 ms to process a single omnidirectional image without any code optimization. Hence, the environment can be updated at a rate of approximately 5–10 frames/s. However, an outdoor environment typically does not change very rapidly; an operation even at the rate of 1 frame/s should suffice, so building a real-time system poses no difficulty.

Fig. 7. Retrieved visual environment description for the cylindrical concentric mosaic image shown in Fig. 6.

V. RESULTS

We conducted extensive experiments on three categories of images, namely: 1) cylindrical panoramic mosaics; 2) still omnicam images; and 3) images obtained from an omnicam video. Because real-time generation of cylindrical mosaic video is not possible, this case was not considered in this paper. The panoramic image mosaics were collected randomly from the Internet. The omnidirectional images were generated using a hemispherical-mirror camera system developed at the Indian Institute of Technology (IIT), Bombay. The camera was mounted on top of the mirror with its optical axis coinciding with the axis of the hemisphere. All images used for experimentation were collected on the IIT campus and in the adjoining urban localities. For CBIR purposes, we initially created an image database by manually annotating an appropriate set of training images into several classes.
The training images used were collected partly from the web and partly from images provided by the University of Texas, Austin. The exemplar images are all conventional planar images, because such data are much easier to collect and serve the purpose equally well.

Fig. 8. (a) Frame from the omnicam video. (b) Description of the retrieved environment.

Fig. 9. (a) Another frame from the omnicam video sequence. (b) Description of the retrieved environment.

We demonstrate the performance of the proposed method starting with a cylindrical mosaic of the environment. Fig. 3 shows one such cylindrical concentric mosaic image used in the experiment. The graph generated by collating the retrieval results for all the partitions of this mosaic is shown in Fig. 4. The majority class for the LT partition is the buildings B class, as indicated by the image placed at the LT node of the graph. As shown in Fig. 3, the rest of the image is dominated by lawns, and this is correctly indicated in the graph. Fig. 5 shows another way of describing the retrieved environment: we place a characteristic thumbnail image of the retrieved class at each node of the graph. To further illustrate environment retrieval, we present the results of analyzing another concentric mosaic. Fig. 6 shows a concentric mosaic of a lawn. The thumbnail representation of the retrieved environment is shown in Fig. 7. Such a representation is useful in conveying a feel for the environment one is surrounded by. As an example of processing a static omnicam scene, we present one of the frames from our omnicam video sequence [Fig. 8(a)]. Notice the observer blocking the XX partition.
The complete environment recovered for this scene is shown in Fig. 8(b). Another example of a static omnicam scene is provided in Fig. 9(a), and the recovered environment for this image is shown in Fig. 9(b). The annotations of all the nodes are, indeed, correct. To test the performance of the proposed technique, we collected data at many locations and at different times of the day, under changing ambient illumination. Because there is considerable temporal redundancy in the omniview video, we used a temporally downsampled video for our experiments; about 50 frames of the omniview sequence were used to create the animated video for demonstration purposes. We compiled our results over all the data sets. The results over this extensive data set were quite positive, with an accuracy of about 80% given the use of a simple feature such as color. This may appear somewhat low, but several points deserve explanation. A third of the labeling errors occur between the buildings B and the traffic T classes when both classes receive comparable votes. This is not surprising, since in an urban environment both coexist in a scene. In addition, the system is confused in cases where a building is partially occluded by trees. The class having the second highest representation among the top 20 retrievals is often the correct label in such cases; however, we have not considered the multiclass labeling problem in this paper. The discriminative power of the system can be further improved by a simple yet effective modification: we use a simple clustering technique to segment out the region in each partition having the largest area and use the corresponding local color histogram for retrieval purposes.
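The paper does not name its clustering technique, so the sketch below substitutes plain k-means on pixel colors as a stand-in. It is a hypothetical illustration rather than the authors' method: `k`, the iteration count, and the evenly spaced deterministic initialization are assumptions, and clustering colors ignores spatial connectivity, so the largest color cluster only approximates the largest region. The returned pixels would then be passed to the color-histogram computation used for retrieval.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def largest_color_cluster(pixels: Sequence[Color], k: int = 3,
                          iters: int = 10) -> List[Color]:
    """Pixels of the most populous cluster found by plain k-means in color space.

    Stand-in for the unspecified clustering step: k, iters and the
    evenly spaced initialization are assumptions of this sketch."""
    if not pixels:
        return []
    k = min(k, len(pixels))
    step = len(pixels) / k
    centres = [tuple(pixels[int(i * step)]) for i in range(k)]
    assign = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: index of the nearest centre for every pixel.
        assign = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in pixels]
        # Update step: move each centre to the mean color of its members.
        for c in range(k):
            members = [p for p, a in zip(pixels, assign) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    largest = max(range(k), key=assign.count)
    return [p for p, a in zip(pixels, assign) if a == largest]
```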
With the aforementioned modification, we achieved an accuracy close to 90%.

VI. CONCLUSION

We have presented a CBIR-based approach to build a reasonably rich description of the environment using cylindrical concentric mosaics or omnidirectional images. We describe the visual environment using a graph whose nodes are annotated with the identifiers of classes belonging to an annotated database. We tested our method extensively on static scenes as well as on omnivideo sequences. For the latter, we provide a temporally evolving graph as well as an animated representation that tracks the change in the environment over time. This representation also provides a mechanism to detect changes in the scene. Our experiments have yielded quite promising results in terms of the accuracy of the description vis-à-vis the computational complexity involved. For practical use, we have included a Braille board display in the developed system to display the environment annotations for the benefit of visually impaired persons. A complete description of the developed portable system can be found in a patent document [29]. In the current implementation, each graph at a given instant is generated independently of the previous graph. In the future, we intend to introduce memory into the system to predict changes in the environment based on past observations. Furthermore, it should be noted that the proposed method of sensing the visual environment is quite different from the way humans do it, because humans extract detailed features only from the foveated part of the scene. However, peripheral vision provides enough information for us to figure out what surroundings we are in, and that alone may suffice for the proposed task of environment retrieval, albeit not over the entire 360° view. This has been another motivation for using a weak cue like color for CBIR purposes.
It would be interesting to compare the performance of the proposed system to that of the human visual system.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their constructive comments.

REFERENCES

[1] J. J. Leonard and H. F. Durrant-Whyte, “Simultaneous map building and localization for an autonomous mobile robot,” in Proc. IEEE/RSJ IROS, Osaka, Japan, 1991, pp. 1442–1447.
[2] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, “Using the CONDENSATION algorithm for robust, vision-based mobile robot localization,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. and Pattern Recog., Fort Collins, CO, Jun. 1999, pp. 588–594.
[3] J. Kosecka and F. Li, “Vision based topological Markov localization,” in Proc. IEEE Int. Conf. Robot. and Autom., New Orleans, LA, Apr. 2004, pp. 1481–1486.
[4] A. J. Davison, “Real-time simultaneous localization and mapping with a single camera,” in Proc. 9th IEEE Int. Conf. Comput. Vis., Nice, France, 2003, pp. 1403–1410.
[5] A. J. Davison and D. W. Murray, “Simultaneous localization and map building using active vision,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 865–880, Jul. 2002.
[6] B. J. A. Kröse, N. Vlassis, R. Bunschoten, and Y. Motomura, “A probabilistic model for appearance-based robot localization,” Image Vis. Comput., vol. 19, no. 6, pp. 381–391, Apr. 2001.
[7] T. Goedeme, M. Nuttin, T. Tuytelaars, and L. V. Gool, “Markerless computer vision based localization using automatically generated topological maps,” in Proc. Eur. Navigat. Conf., Rotterdam, The Netherlands, 2004.
[8] S. Se, D. Lowe, and J. Little, “Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks,” Int. J. Robot. Res., vol. 21, no. 8, pp. 735–758, 2002.
[9] D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello, “Bayesian filtering for location estimation,” IEEE Pervasive Comput., vol. 2, no. 3, pp. 24–33, Jul.–Sep. 2003.
[10] T. Rappaport, J. Reed, and B. Woerner, “Position location using wireless communications on highways of the future,” IEEE Commun. Mag., vol. 34, no. 10, pp. 33–41, Oct. 1996.
[11] S. Thompson and A. Zelinsky, “Accurate local positioning using visual landmarks from a panoramic sensor,” in Proc. IEEE Int. Conf. Robot. and Autom., Washington, DC, May 2002, pp. 2656–2661.
[12] E. Menegatti, M. Zoccarato, E. Pagello, and H. Ishiguro, “Image-based Monte Carlo localization with omnidirectional images,” Robot. Auton. Syst., vol. 48, no. 1, pp. 17–30, Aug. 2004.
[13] Y. Matsumoto, M. Inaba, and H. Inoue, “Memory-based navigation using omni-view sequence,” in Proc. Int. Conf. Field and Service Robot., 1997, pp. 184–191.
[14] ——, “Visual navigation using view-sequenced route representation,” in Proc. Int. Conf. Robot. and Autom., 1996, pp. 83–88.
[15] C. Zhou, Y. Wei, and T. Tan, “Mobile robot self-localization using global visual appearance based features,” in Proc. IEEE Int. Conf. Robot. and Autom., Sep. 2003, pp. 1271–1276.
[16] I. Ulrich and I. Nourbakhsh, “Appearance-based place recognition for topological localization,” in Proc. IEEE Int. Conf. Robot. and Autom., San Francisco, CA, Apr. 2000, pp. 1023–1029.
[17] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content based image retrieval at the end of the early years,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
[18] J. Wolf, W. Burgard, and H. Burkhardt, “Robust vision based localization by combining an image-retrieval system with Monte Carlo localization,” IEEE Trans. Robot., vol. 21, no. 2, pp. 208–216, Apr. 2005.
[19] S. K. Nayar, “Catadioptric omnidirectional camera,” in Proc. IEEE Int. Conf. Comput. Vis. and Pattern Recog., 1997, pp. 482–488.
[20] ——, “Omnidirectional vision systems,” in Proc. DARPA Image Understanding Workshop, Monterey, CA, Nov. 1998, pp. 93–99.
[21] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[22] N. Jhanwar, S. Chaudhuri, G. Seetharaman, and B. Zavidovique, “Content based image retrieval using motif cooccurrence matrix,” Image Vis. Comput., vol. 22, no. 14, pp. 1211–1220, 2004.
[23] M. N. Do and M. Vetterli, “Wavelet based texture retrieval using generalized Gaussian density and Kullback–Leibler distance,” IEEE Trans. Image Process., vol. 11, no. 2, pp. 146–158, Feb. 2002.
[24] M. Kokare, P. K. Biswas, and B. N. Chatterji, “Rotated complex wavelet based texture features for content based image retrieval,” in Proc. 17th Int. Conf. Pattern Recog., Cambridge, U.K., Aug. 2004, vol. 1, pp. 652–655.
[25] B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 8, pp. 837–842, Aug. 1996.
[26] Rajashekhara, S. Chaudhuri, and V. P. Namboodiri, “Image retrieval based on projective invariance,” in Proc. IEEE Int. Conf. Image Process., Singapore, Oct. 2004, pp. 405–408.
[27] M. J. Swain and D. H. Ballard, “Color indexing,” Int. J. Comput. Vis., vol. 7, no. 1, pp. 11–32, Sep. 1991.
[28] D. Balthasar, “Color matching by using tuple matching,” in Proc. Int. Conf. Image Anal. and Process., Sep. 2003, vol. 1, no. 12, pp. 402–407.
[29] S. Chaudhuri, Rajashekhara, and A. Prabhudesai, “Head mounted device for semantic representation of the user surroundings,” Indian Patent Filed, no. 133/MUM/2006, 2006.

Rajashekhara received the B.E. degree in electronic and communication engineering from Kuvempu University, Karnataka, India, in 1994, the M.Tech. degree from Mysore University, Mysore, India, in 1997, and the Ph.D. degree from the Indian Institute of Technology (IIT), Bombay, India, in 2006. He is currently part of the imaging team at GE Healthcare Technologies, Bangalore, India. His research interests include signal and image processing, pattern recognition, and computer vision.
Amit B. Prabhudesai received the B.E. degree in electronics engineering from Bombay University, Bombay, India, in 2004. He is currently working toward the M.Tech. degree at the Indian Institute of Technology (IIT), Bombay, and is with Siemens Corporate Technology, Bangalore, India. His research interests include signal and image processing and computer vision.

Subhasis Chaudhuri was born in Bahutali, India. He received the B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of Technology (IIT), Kharagpur, in 1985, the M.S. degree from the University of Calgary, Calgary, AB, Canada, and the Ph.D. degree from the University of California, San Diego, both in electrical engineering. He joined IIT Bombay as an Assistant Professor in 1990, where he is currently a Professor and the Head of the Department of Electrical Engineering. He was a Visiting Professor with the University of Erlangen-Nuremberg, Germany, and the University of Paris XI, France. He is a coauthor of the books Depth From Defocus: A Real Aperture Imaging Approach (Springer, 1999) and Motion-Free Super-Resolution (Springer, 2005) and the Editor of the book Super-Resolution Imaging (Kluwer Academic, 2001). His research interests include image processing, computer vision, and multimedia. He is an Associate Editor for the International Journal of Computer Vision. Dr. Chaudhuri was a recipient of the Dr. Vikram Sarabhai Research Award in 2001, the Prof. SVC Aiya Memorial Award in 2003, the Swarnajayanti Fellowship in 2003, and the S. S. Bhatnagar Prize in engineering sciences in 2004.
He is a Fellow of the Alexander von Humboldt Foundation, Germany, the Indian National Academy of Engineering, and the National Academy of Sciences, India. He is also an Associate Editor for the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE.