Serge Belongie (University of California, San Diego)
The Visipedia Field Guide to North American Birds
Abstract: We present an interactive, hybrid human-computer method for object classification. The method applies to classes of problems that are difficult for most people, but are recognizable by people with the appropriate expertise (e.g., animal species or airplane model recognition). The classification method can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. Incorporating user input drives up recognition accuracy to levels that are good enough for practical applications; at the same time, computer vision reduces the amount of human interaction required. The resulting hybrid system is able to handle difficult, large multi-class problems with tightly-related categories. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate the accuracy and computational properties of different computer vision algorithms and the effects of noisy user responses on a dataset of 200 bird species and on the Animals With Attributes dataset. Our results demonstrate the effectiveness and practicality of the hybrid human-computer classification paradigm.
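To make the interaction loop concrete, here is a minimal sketch of the question-selection step, assuming a class posterior that fuses vision-model scores with user answers; the function names and probability tables are illustrative, not the authors' actual system.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (in bits)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def pick_next_question(p_class, p_answer_given_class):
    """Choose the attribute question with highest expected information gain.

    p_class: (C,) current posterior over classes, already incorporating
        the vision model's scores and earlier answers.
    p_answer_given_class: (Q, A, C) probability of each answer to each
        question given the true class; learned from annotators, so it
        models noisy user responses.  (Illustrative names/shapes.)
    """
    h_now = entropy(p_class)
    best_q, best_gain = None, -np.inf
    for q in range(p_answer_given_class.shape[0]):
        gain = h_now
        for a in range(p_answer_given_class.shape[1]):
            # Posterior predictive probability of seeing answer a.
            p_a = np.dot(p_answer_given_class[q, a], p_class)
            if p_a <= 0:
                continue
            # Bayes update of the class posterior if answer a were given.
            post = p_answer_given_class[q, a] * p_class / p_a
            gain -= p_a * entropy(post)
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q
```

Each iteration asks the question whose answer is expected to shrink the class-posterior entropy the most, which is what keeps the number of questions small.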
Mark Cohen (University of California, Los Angeles)
Abstract: The toolkit created within the field of computer vision contains methods with considerable power for the detection of tangible objects in the context of natural images. Many images of interest, however, do not contain instances of readily labeled classes (e.g., cats), yet they do contain objects that must be detected and measured. A few examples include high-dimensional neural data (image time series as used in functional MRI, or transient electrical topographies in EEG) or patterns of connectivity. We consider here the question of how computer vision might gain traction in more abstract image contexts.
James Duncan (Yale University)
Model-Based Strategies for Biomedical Image Analysis
Abstract: The development of methods to accurately and reproducibly recover useful quantitative information from biomedical images is often hampered by uncertainties in handling the data related to: image acquisition parameters, the variability of normal biological, anatomical and physiological structure and function, the presence of disease or other abnormal conditions, and a variety of other factors. This talk will review image analysis strategies that make use of models based on geometrical and physical/biomechanical information to help constrain the range of possible solutions in the presence of such uncertainty. The discussion will be focused by looking primarily at several problem areas in the realms of neuroanatomical structure analysis, cardiac function analysis, and work in cellular image analysis, with an emphasis on image segmentation and motion/deformation tracking. The presentation will include a description of the problem areas and visual examples of the image datasets being used, an overview of the mathematical techniques involved and a presentation of results obtained when analyzing actual patient image data using these methods. Emphasis will be placed on how image-derived information and appropriate modeling can be used together to address the image analysis and processing problems noted above.
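One generic way to write such a model-constrained estimation problem (an illustrative sketch, not the speaker's specific formulation) is as an energy minimization that trades off image evidence against a geometric or biomechanical prior:

```latex
% Illustrative model-based estimation energy: the solution S^* (a segmentation
% or a motion/deformation field) must both fit the image data I and remain
% plausible under the geometric/biomechanical model; \lambda sets the trade-off.
\[
  S^{*} \;=\; \arg\min_{S}\; E_{\text{data}}(S;\, I) \;+\; \lambda\, E_{\text{model}}(S)
\]
```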
Rob Fergus (New York University)
Generative Models for the Direct Imaging of Exoplanets
Abstract: Exoplanet detection and characterization is currently a hot topic within astronomy. Within the last 17 years the number of known exoplanets around nearby stars has gone from zero to nearly 800. These have been discovered with a variety of detection methods, of which the most scientifically informative is direct imaging, since the planet's spectrum can be measured. However, detecting the planet against the glare of the star requires a formidable contrast ratio of ~10^9. To meet this challenge, astronomers have built coronagraphs which block out much of the starlight but produce severe diffraction artifacts that overwhelm the weak planet signal. We have recently developed computer vision techniques that can detect planets whose brightness is 1-2% of the diffraction artifacts, roughly an order of magnitude better than the current state-of-the-art algorithms used by astronomers. These methods can also be used to precisely estimate the spectrum of the planet, thus revealing its elemental composition.
Joint work with David Hogg (NYU Physics), Ben Oppenheimer & Doug Brenner (American Museum of Natural History) and the rest of the P1640 team.
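As one concrete possibility (a hedged sketch in the spirit of PCA-based speckle suppression used in direct imaging; the talk's actual generative-model approach may differ), diffraction speckles are correlated across exposures while a faint planet is not, so projecting out the dominant principal components suppresses the glare:

```python
import numpy as np

# Sketch: PCA-based speckle suppression (an assumed stand-in, not
# necessarily the method described in the talk). Each row of `frames`
# is a flattened coronagraph exposure. Diffraction speckles repeat
# across frames, so they concentrate in the top principal components;
# a faint planet signal does not, so it survives in the residuals.
def speckle_subtract(frames, k=10):
    """frames: (N, P) array of N flattened exposures; returns residuals."""
    X = frames - frames.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]                       # (k, P) speckle subspace
    return X - X @ basis.T @ basis       # remove projection onto speckles
```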
Jitendra Malik (University of California, Berkeley)
Teaching computer vision online
Hartmut Neven (Google)
Algorithmic Frontiers in Computer Vision
Abstract: This talk will focus on two frontiers being explored by Google's visual search team.
i) Approaches based on extracting interest points and matching local descriptors using approximate nearest-neighbor search have been successfully scaled to indices comprising billions of images (a minimal sketch of this matching pipeline appears below). These methods fail, however, for objects that yield few interest points and whose local features, such as simple edges, are not discriminative. Fine-grained classification of such weakly textured objects is a challenge, yet many objects fall into this category, such as cars and furniture. A promising approach we have been investigating integrates object detection with the ability to localize parts, which are subsequently described by grouping local features into sufficiently global discriminative descriptors.
ii) A major challenge AI needs to overcome is to enable a system to learn from noisy datasets not curated by humans. It is a common experience that the most time-consuming task in constructing a new vision capability is often the collection of quality training data, since this step typically involves human tagging. The better and more detailed the annotation of a training set, the easier the learning task; this is largely independent of the learning framework used. We will present our latest results in learning from training sets that can contain significant amounts of mislabeled examples, such as training a car detector using a set of images returned by Google Image Search for the query "car". We will describe a method that can learn a Bayes-optimal classifier from noisy data by mapping training to a quadratic program that can be solved by quantum hardware (see the second sketch below).
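First, a minimal sketch of the descriptor-matching baseline referred to in part i. Descriptor extraction (e.g., SIFT) is assumed to have happened upstream, and an exact KD-tree stands in for the approximate nearest-neighbor index that would be used at billion-image scale:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of local-descriptor matching with a nearest-neighbor index and
# Lowe's ratio test. At production scale the KD-tree would be replaced
# by an approximate nearest-neighbor structure; this is illustrative.
def match_descriptors(query_desc, index_desc, ratio=0.8):
    """query_desc: (M, D), index_desc: (N, D); returns (query_i, index_j) pairs."""
    tree = cKDTree(index_desc)
    dist, idx = tree.query(query_desc, k=2)   # nearest and second nearest
    matches = []
    for i, (d, j) in enumerate(zip(dist, idx)):
        if d[0] < ratio * d[1]:               # keep only unambiguous matches
            matches.append((i, int(j[0])))
    return matches
```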
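Second, a hedged sketch of how training can be mapped to a quadratic program over binary variables, in the spirit of boosting formulations proposed for quantum optimizers (QBoost-style; the variable names and penalty are illustrative, not Google's exact method):

```python
import numpy as np

# Sketch: select a binary subset w in {0,1}^K of weak learners whose vote
# fits possibly-noisy labels, with a sparsity penalty lam that discourages
# overfitting to mislabeled examples. Expanding ||H w / K - y||^2 over
# binary w yields a QUBO: minimize w^T Q w + q^T w.
def build_qubo(H, y, lam=0.1):
    """H: (N, K) weak-learner outputs in {-1,+1}; y: (N,) labels in {-1,+1}."""
    N, K = H.shape
    G = (H.T @ H) / (K * K)                       # pairwise learner correlations
    q = np.diag(G) - 2.0 * (H.T @ y) / K + lam    # w_k^2 = w_k folds into q
    Q = G.copy()
    np.fill_diagonal(Q, 0.0)                      # off-diagonal couplings only
    return Q, q   # hand (Q, q) to a QUBO solver or quantum annealer
```

For small K, enumerating all 2^K subsets is a handy sanity check on the QUBO before handing it to specialized hardware.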
Anand Rangarajan (University of Florida)
Richard Szeliski (Microsoft Research)
Computer vision on Wikipedia
Speakers: Richard Szeliski and Serge Belongie
Abstract: As we try to disseminate our shared technical knowledge about computer vision and raise general awareness of this topic, we need to look beyond traditional media such as conference and journal papers. One of the most widely read and used resources in the world is Wikipedia. So, what is the state of computer vision articles on Wikipedia? A small group of us has been meeting regularly (since the August Frontiers workshop) to take stock of Wikipedia vision articles and lay the groundwork for improving the content. We have catalogued existing and needed articles, and we are reaching out to faculty and students to start writing lead articles. In this talk, we will report on our status and solicit the audience for more help.
Sinisa Todorovic (Oregon State University)
The Tree of Life -- Computer Vision for Next Generation Phenomics
Abstract: Phenomic characters represent a rich source of data on biodiversity. They enable scientists to reconstruct the evolutionary history of taxa and build a comprehensive Tree of Life (ToL) without genetic data. Phylophenomicists working across the ToL are challenged by the difficulty of discovering and scoring phenomic characters, generating images that describe characters, and annotating and extracting phylogenetically informative data from legacy taxonomic and natural history literature. There is an opportunity for computer vision to advance the entire field of evolutionary inference by addressing these challenges. This talk will present a collaborative effort of vision researchers and phylophenomicists toward providing the systematics community with new and refined methods for discovering and scoring phenomic characters, and generating phenomic data sets. In particular, I will talk about our approach to fine-grained object recognition, which identifies arthropods (to the genus level) in high-resolution images more accurately than expert biologists. This approach represents the first vision-based discovery of a novel phenotype that differentiates taxa.
Alan Yuille (University of California, Los Angeles)