Spring 2009 Data and Vision Weekly Seminar
Organizers: Qi Tian, Kay Robbins, Weining Zhang, Tom Bylander, Carola Wenk, Yufei Huang (EE),
Jianhua Ruan.
Time: 10:00-11:00 am, Every Friday
Place: SB 4.01.20, CS Conference Room
Schedule for Spring 2009
01/30 Angela Dean (Ruan lab)Cancelled
02/06 Jie Xiao (Tian lab)
02/13 Dragana Veljkovic (Robbins lab)
02/20 Prof Todd Troyer
02/27 Hongwei Tian (Zhang lab)
03/06 Atlas Cook (Wenk lab)
03/13 Sprint Break
03/20 Jian Cui (Huang lab)
03/27 Xia Li (Tian lab)
04/03 Chengwei Lei (Ruan lab)
04/10 Mark Doderer (Robbins lab)
04/17 Angela Dean (Ruan lab)
04/24 Jia Meng (Huang lab)
05/01 Lijie Zhang (Zhang lab)
05/08 Isha Gupta (Wenk lab)
Seminar information
-
02/06/09 Refining the image retrieval using one-class classification
Speaker: Jie Xiao
Presentation slides in [ ppt ]Abstract :
Current image search engines like Google cannot provide satisfying search results, which contain a large amount of unrelated data. Based on this observation, we propose a novel model to re-rank the Google image search results by exploring the latent characteristic of the massive unrelated images as a clue to filter them in the re-ranking. Inspired by the characteristic of the intrinsic diversity and the unwanted availability of the unrelated images, in our model, we adopt one-class classification to build a hyper-sphere for the target objects, unrelated images, and construct a robust boundary to distinguish them from the related images effectively. Then the Google results can be easily re-ranked by filtering the unrelated images with the built-up model. Extensive experiments on the comparison with Google image search engine's result show that our approach makes a significant improvement.
References :
- F. Schroff, A. Criminisi, and A. Zisserman, "Harvesting Image Databases from the Web," IEEE ICCV , pp. 1-8, 2007.
- C. Dance , J. Willamowski , L.X. Fan , C. Bray , G. Csurka , "Visual categorization with bags of keypoints," ECCV Workshop on Statistical Learning in Computer Vision , pp. 1-22, 2004.
- Tax, D., Juszczak, P., "Kernel whitening for one-class classification," Int'l Journal of Pattern Recognition and Artificial Intelligence , vol. 17, no. 3, pp. 333-347, 2003.
02/013/09 Kernel-based Information Fusion
Speaker: Dragana Veljkovic
Presentation slides in [ ppt ]Abstract :
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in a symmetric and positive semidefinite matrix called the kernel matrix. Kernel methods provide a principled framework in which to represent many types of data, including vectors, strings, trees and graphs. As such, these methods are useful for drawing inferences about many types of problems in science.
In this talk we present a method proposed by Lanckrriet et al. for combining multiple kernel representations in an optimal fashion, by formulating the problem using semidefinite programming techniques. The method is applied on kernels extracted from a single dataset as well as on kernels extracted from several heterogeneous but related dataset. The method is applied to several problems, including classification and prediction of yeast protein functional classifications using a support vector machine (SVM) trained on five types of data.
References :
- G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," J. Mach. Learn. Res. , vol. 5, pp. 27-72, 2004.
- G. R. G. Lanckriet, M. Deng, N. Cristianini, M. I. Jordan, and W. S. Noble, "Kernel-based data fusion and its application to protein function prediction in yeast," Proc. of Pacific Symposium on Biocomputing , pp. 300-311, 2004.
- G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble, "A statistical framework for genomic data fusion," Bioinformatics , vol. 20, pp. 2626-2635, 2004.
02/20/09 Data mining a complex behavior: syllables, sequence and timing in songbirds
Speaker: Prof. Todd Troyer (Biology, UTSA)
Presentation slides in [ ppt ]Abstract :
Much like humans, birds learn to sing early in lift by copying their vocal patterns from adults. The combination of a learned behavior and a specialized set of brain areas dedicated to song learning has songbirds one of the premier model animals for understanding how the brain learns and produces complex sequential behaviors. Adult songs in the zebra finch consist of repeated strings of well-defined vocal gestures call syllables. But relatively little is understood about the details of how birds progress from an early babbling phase to arrive at this adult song. Recent gains in the ability to store and process huge amounts of data have opened the possibility of using data-mining approaches to understand natural behavior. In this talk, I will present the challenges faced in attempting this research approach, and outline some recent advances in measuring the rich temporal structure of song production.
References :
- Christopher M Glaze and Todd W Troyer, Temporal Structure in Zebra Finch Song: Implications for Motor Coding, J. Neurosci., 2006, 26 (3): 991-1005.
- Christopher M Glaze and Todd W Troyer, Behavioral Measurements of a Temporally Precise Motor Code for Birdsong, J. Neurosci., 2007, 27(29): 7631–7639
- How sleep affects the developmental learning of bird song.
Derégnaucourt S, Mitra PP, Fehér O, Pytte C, Tchernichovski O.
Nature. 2005 Feb 17;433(7027):710-6
http://www.nature.com/nature/journal/v433/n7027/abs/nature03275.html - Dynamics of the vocal imitation process: how a zebra finch learns its song.
Tchernichovski O, Mitra PP, Lints T, Nottebohm F.
Science. 2001 Mar 30;291(5513):2564-9. Epub 2001 Mar 15.
http://www.sciencemag.org/cgi/content/full/291/5513/2564
02/27/09 Providing k-anonymity in data mining
Speaker: Hongwei Tian
Presentation slides in [ ppt ]Abstract :
The privacy problem is not only happening in data, but also in knowledge models. An extended definition of k-anonymity is presented and used to prove that a given data mining model does not violate the k-anonymity of the individuals represented in the learning examples. A tool is also provided to measure the amount of anonymity retained during data mining. Some examples show that this knowledge model can be applied to various data mining problems, such as classification, association rule mining and clustering. Two data mining algorithms are introduced, which exploit the extension to guarantee they will generate only k-anonymous output, and provide experimental results for one of them. Finally, this method contributes new and efficient ways to anonymize data and preserve patterns during anonymization.
References :
-
1. Arik Friedman, Ran Wolff, and Assaf Schuster. Providing k-anonymity in data mining . VLDB Journal, 2009.
-
Kristen LeFevre, David J. DeWitt, and Raghu Ramakrishnan. Incognito: Efficient Full-Domain K-Anonymity . SIGMOD, 2005
03/06/09 Shortest Path Problems on a Polyhedral Surface
Speaker: Atlas F. Cook IV
Presentation slides in [ ppt ]Abstract :
In general, the computation of shortest paths in three dimensions is NP-Hard by [3]. However, if shortest paths are required to travel along a polyhedral surface, then the problem becomes much easier. Our main result computes all shortest path edge sequences on a convex polyhedral surface a linear factor faster than [1].
References :
-
P. K. Agarwal, B. Aronov, J. O'Rourke, and C. Schevon. Star Unfolding of a Polytope with Applications. SIAM Journal on Computing, Society for Industrial and Applied Mathematics, 1997 , 26 , 1689-1713 [ PDF ]
-
G. Albers, J. S. B. Mitchell, L. J. Guibas, and T. Roos. Voronoi diagrams of moving points. International Journal of Computational Geometry and Applications , 1998 , 8:365-380 [ PDF ]
-
J. Canny and J.H. Reif. New Lower Bound Techniques for Robot Motion Planning Problems . Proc. 28th IEEE Annual Symp. Foundations of Computer Science , 1987, 49-60 [ PDF ]
-
V. Chandru, R. Hariharan, and N. M. Krishnakumar. Short-cuts on star, source and planar unfoldings. Foundations of Software Technology and Theoretical Computer Science (FSTTCS) , 2004, 174-185 [ PDF ]
-
J. Chen and Y. Han. Shortest paths on a polyhedron. International Journal of Computational Geometry and Applications, 1996 , 6 , 127-144 [ PDF ]
-
A. Maheshwari and J. Yi. On computing Fréchet distance of two paths on a convex polyhedron. 21st European Workshop on Computational Geometry (EuroCG) , 2005 [ PDF ]
-
D. M. Mount. The number of shortest paths on the surface of a polyhedron. SIAM Journal on Computing , 1990 , 19(4):593-611 [ PDF ]
-
03/20/09 The Comparison of Peak-detection Algorithms for LC/MS
Speaker: Jian Cui
Presentation slides in [ ppt ]Abstract :
Label free Liquid-Chromatography-Mass-Spectrometry (LC/MS) has become an increasingly important tool for proteomic research. Currently, there exits several software packages such as msInspect, mzMine developed for the processing of LC/MS datasets. These software packages utilize different processing methods for LC/MS peak detection. However, a systematic performance comparison of these peak detection algorithms does not exist, and hence there lacks operational guidance on the selection of various peak picking methods. In this presentation, we investigated the performance of various LC/MS peak detection methods based on the LC/MS dataset of a Universal Protein Standard (UPS1) from SIGMA-ALDRICH TM which is a mixture of 48 proteins. The performance is evaluated based on various criteria including: peak intensity, SNR, isotope pattern matching based on KL distance, isotope pattern matching based on mean square error, LC peak shape, and LC peak length. The ROC curve is obtained for each method. Our research shows that, surprisingly, the performance of peak detection based on peak intensity is the best among all criteria in this dataset with little contamination. Peak detection based on isotope pattern matching deteriorates greatly with lower peak intensity at less abundant charge states. Different isotope pattern matching metric, KL distance or mean square error, makes no significant difference in performance. The combination of several criteria such as intensity and isotope pattern does not bring performance improvement.
03/27/09 Adaptive Vocabulary Forests for Dynamic Indexing and Category Learning
Speaker: Xia Li
Presentation slides in [ ppt ]Abstract :
In computer vision, visual category recognition is an area of considerable interest, and recognition methods based on local feature models have have been demonstrated to have success across a range of tasks. To speed up comparison over large databases, histogram pyramid representations computed from a vocabulary tree of visual words are introduced, and have have been proven valuable in indexing and recognition tasks. However, they have only used a single, fixed partition of feature space. In this paper, a new efficient algorithm to incrementally compute set-of-trees (forest) vocabulary representations is proposed, and recognition and indexing performance is shown to be improved.
References :
- T. Yeh, J. Lee, and T. Darrell. Adaptive vocabulary forests for dynamic indexing and category learning . ICCV, 2007
- D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree . CVPR, 2006
- D. Grauman and T. Darrell. Approximate correspondences in high dimensions . NIPS, 2006
04/03/09 Identification of transcription factor binding sites based on genetic algorithms
Speaker: Chengwei Lei
Presentation slides in [ ppt ]Abstract :
Identification of transcription factor binding sites (TFBSs) plays an important role in deciphering the mechanisms of gene regulation. Recently, GAME, a Genetic Algorithm (GA)-based approach with iterative post-processing, has shown superior performance in TFBS identification. However, the basic GA in GAME is not elaborately designed, and may be trapped in local optima in real problems. The feature operators are only applied in the post-processing, but the final performance heavily depends on the GA output.
This paper describes a novel framework, GALF-P, to achieve both effectiveness and efficiency for TFBS identification by introducing more advanced representations and novel operators in the GA, as well as designing the post-processing in an adaptive way.References :
-
Chan T.M., Leung K.S. and Lee K.H. TFBS identification based on genetic algorithm with combined representations and adaptive post-processing . Bioinformatics (2008), VOL 24; 341-349
-
Wei,Z. and Jensen,S.T. GAME: detecting cis-regulatory elements using a genetic algorithm . Bioinformatics(2006), VOL 22, 1577-1584.
04/10/09 Model-Shared Subspace Boosting for Multi-label Classification
Speaker: Mark Doderer
Presentation slides in [ ppt ]Abstract :
Typical approaches to the multi-label classification problem require learning an independent classifier for every label from all the examples and features. This can become a computational bottleneck for sizeable datasets with a large label space. In this paper, we propose an efficient and effective multi-label learning algorithm called model-shared subspace boosting (MSSBoost) as an attempt to reduce the information redundancy in the learning process. This algorithm automatically finds, shares and combines a number of base models across multiple labels, where each model is learned from random feature subspace and boots trap data samples. The decision functions for each label are jointly estimated and thus a small number of shared subspace models can support the entire label space. Our experimental results on both synthetic data and real multimedia collections have demonstrated that the proposed algorithm can achieve better classification performance than the non-ensemble baselineclassifiers with a significant speedup in the learning and prediction processes. It can also use a smaller number of base models to achieve the same classification performance as its non-model-shared counterpart.
References :
- Yan, R., Tesic, J., and Smith, J. R. 2007. Model-shared subspace boosting for multi-label classification. In Proceedings of the 13th ACM SIGKDD international Conference on Knowledge Discovery and Data Mining (San Jose, California, USA, August 12 - 15, 2007). KDD '07. ACM, New York, NY, 834-843.
- G. Tsoumakas, I. Katakis, "Multi-Label Classification: An Overview", International Journal of Data Warehousing and Mining, 3(3):1-13, 2007
- G. Tsoumakas, I. Katakis, I. Vlahavas, "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Ed.), Springer, 2nd edition, 2009.
04/17/09 Evolutionary stability on graphs
Speaker: Angela Dean
Presentation slides in [ ppt ]Abstract :
Evolutionary stability is a fundamental concept in evolutionary game theory. A strategy is called an evolutionarily stable strategy (ESS), if its monomorphic population rejects the invasion of any other mutant strategy. Recent studies have revealed that population structure can considerably affect evolutionary dynamics. Here we derive the conditions of evolutionary stability for games on graphs. We obtain analytical conditions for regular graphs of degree k42. Those theoretical predictions are compared with computer simulations for random regular graphs and for lattices. We study three different update rules: birth-death (BD), death-birth (DB), and imitation (IM) updating. Evolutionary stability on sparse graphs does not imply evolutionary stability in a well-mixed population, nor vice versa. We provide a geometrical interpretation of the ESS condition on graphs.
References :
- Hisashi Ohtsuki, Martin A. Nowak, Evolutionary stability on graphs, J Theor Biol. ( 2008); 251(4):698-707
- Hisashi Ohtsuki, Christoph Hauert, Erez Lieberman, and Martin A. Nowak, A simple rule for the evolution of cooperation on graphs and social networks, Nature (2006); 441:502-505
- Ohtsuki, H., Nowak, M.A., The replicator equation on graphs. J. Theor. Biol. (2006); 243:86-97.
04/24/09 Dynamic Gene Regulatory Networks Revealed by Integrating Gene Expression Data and Existing Knowledge
Speaker: Jia Meng
Presentation slides in [ ppt ]Abstract :
Response of cells to changing environmental conditions is governed by the dynamics of intricate bimolecular interactions. Revealing the dynamics of gene regulations is ensential for understanding the functional mechanisms that lead to the response. A novel approach for inferring the dynamics of gene regulatory networks is presented here, whichintegrates gene expression data and existing knowledge. When applied to a time series microarray dataset, the constructed network partially reveals the signal transduction and transcriptional regulation in the human endothelial cells during the invasion of Kaposi's sarcoma-associated herpesvirus (KSHV).
Questions and Comments?
Please send emails to jruan@cs.utsa.edu, or seminar co-organizers: Kay Robbins, Weining Zhang, Yufei Huang, Carola Wenk, and Qi Tian.