Department of Computer Science
University of Texas at San Antonio
Schedule:
Abstract:
MicroRNAs (miRNAs) are small non-coding RNAs of ~22 nt that regulate gene expression by base pairing with target mRNAs, leading to mRNA cleavage or translational repression. However, one major problem facing miRNA research is the lack of computational tools for accurate target prediction. GenMiR++ (Generative model for miRNA regulation), a novel Bayesian model and learning algorithm, was proposed recently [1]. Its authors demonstrate that paired expression profiles of miRNAs and mRNAs can be used to identify functional miRNA-target relationships with high precision. Using GenMiR++, a network of 1,597 high-confidence target predictions for 104 human miRNAs was identified, supported by RNA expression data across 88 tissues and cell types, sequence complementarity, and comparative genomics data [2].
References:
Abstract:
Concept-based multimedia search has become increasingly popular in
Multimedia Information Retrieval (MIR). However, which semantic
concepts should be used for data collection and model construction is
still an open question. Currently, there is very little research
on automatically choosing multimedia concepts with small semantic
gaps. In this paper, we propose a novel framework to develop a lexicon
of high-level concepts with small semantic gaps (LCSS) from a
large-scale web image dataset. By defining a confidence map and a
content-context similarity matrix, images with small semantic gaps are
selected and clustered. The final concept lexicon is mined from the
surrounding descriptions (titles, categories and comments) of these
images. This lexicon offers a set of high-level concepts with small
semantic gaps, which helps people decide where to focus in data
collection, annotation and modeling. It also shows promising
potential for image annotation refinement and rejection.
The experimental results demonstrate the validity of the developed
concept lexicon.
References:
Abstract:
The comparison of geometric shapes is essential in various
applications including computer vision, computer-aided
design, robotics, medical imaging, and drug design. The
Fréchet distance is a similarity metric for continuous
shapes such as curves or surfaces that is defined using
reparametrizations of the shapes.
We present the first algorithm to compute the geodesic
Fréchet distance between two polygonal curves A and B inside
a simple bounding polygon P. We use a randomized approach
based on red-blue intersections to solve this problem almost
as quickly as the standard non-geodesic Fréchet distance.
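
As a rough illustration of the kind of quantity being computed, the sketch below evaluates the discrete Fréchet distance between two polylines with a standard dynamic program over vertex pairs. It is not the geodesic algorithm from the talk: it uses plain Euclidean distances, ignores any bounding polygon, and only considers the curves' vertices. The function name `discrete_frechet` is chosen here for illustration.

```python
from math import dist


def discrete_frechet(a, b):
    """Discrete Fréchet distance between two polylines given as vertex lists.

    Simplified stand-in: Euclidean distances, vertices only, no bounding polygon.
    """
    n, m = len(a), len(b)
    # ca[i][j] = discrete Fréchet distance between the prefixes a[:i+1] and b[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on either curve or on both, keeping the smallest bottleneck.
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel horizontal segments one unit apart.
curve_a = [(0, 0), (1, 0), (2, 0)]
curve_b = [(0, 1), (1, 1), (2, 1)]
distance = discrete_frechet(curve_a, curve_b)
```

In the geodesic setting, the Euclidean `dist` would be replaced by shortest-path distances inside the polygon P, which is where the difficulty addressed in the talk lies.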
References:
Abstract:
An algorithm for the discovery of time-varying modules
using genome-wide expression data is presented here.
When applied to large-scale time series data, our method
is designed to discover not only the transcription modules
but also their timing information, which is rarely
annotated by existing approaches. Rather than assuming
the commonly defined time-constant transcription modules, a
module is depicted as a set of genes that are co-regulated
during a specific period of time, i.e., a time-dependent
transcription module (TDTM). A rigorous mathematical definition
of TDTM is provided, which serves as an objective function
for retrieving modules. Based on the definition, an effective
signature algorithm is proposed that iteratively searches for
transcription modules in the time series data. The proposed
method was tested on simulated systems and applied to
human time series microarray data derived from Kaposi's
sarcoma-associated herpesvirus (KSHV) infection of
human endothelial cells. The results have been verified with the
Expression Analysis Systematic Explorer.
References:
Abstract:
Mapping the pathways that give rise to metastasis is one of the key challenges of breast cancer
research. Recently, several large-scale studies have shed light on this problem through analysis of
gene expression profiles to identify markers correlated with metastasis. Here, we apply a
protein-network-based approach that identifies markers not as individual genes but as subnetworks
extracted from protein interaction databases. The resulting subnetworks provide novel hypotheses
for pathways involved in tumor progression. Although genes with known breast cancer mutations
are typically not detected through analysis of differential expression, they play a central role in the
protein network by interconnecting many differentially expressed genes. We find that the
subnetwork markers are more reproducible than individual marker genes selected without
network information, and that they achieve higher accuracy in the classification of metastatic
versus non-metastatic tumors.
References:
Abstract:
Billions of images are available online, constituting a dense sampling of the visual world. In contrast, existing image datasets range
from 10^2 to 10^4 images spread over a few different classes. In response, the authors collect 79,302,017 images from seven independent image search
engines, loosely labeling each image with one of 75,062 non-abstract English nouns listed in the WordNet lexical database. Because object recognition,
scene recognition and segmentation tolerate low resolution well, the images are stored at a resolution of 32 × 32.
Combining semantic information from WordNet with nearest-neighbor methods, they propose a WordNet voting scheme to address the semantic gap between
images and semantic meaning. The scheme performs well in object recognition and outperforms some prevalent algorithms.
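
A minimal sketch of how hierarchy-based voting over nearest neighbors might look, under assumptions made here for illustration: each retrieved neighbor votes for its own noisy label and for every ancestor of that label in the hierarchy. The tiny `PARENT` table is a hypothetical stand-in for WordNet, and the neighbor labels are made up.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent); the real WordNet tree is far larger.
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def ancestors(label):
    """Return the label followed by its ancestors up to the root."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hierarchy_votes(neighbor_labels):
    """Each neighbor votes for its own label and for every ancestor of it."""
    votes = Counter()
    for label in neighbor_labels:
        votes.update(ancestors(label))
    return votes


# Labels of the nearest neighbors of some query image (made-up example).
votes = hierarchy_votes(["dog", "cat", "dog", "car"])
```

The neighbors disagree at the leaf level, but their votes accumulate at "animal", which is how a hierarchy can absorb some of the label noise in loosely labeled data.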
References:
Abstract:
This talk describes novel fully automated techniques for analyzing large
amounts of cardiovascular data. In contrast to traditional medical expert
systems, the presented techniques incorporate no a priori knowledge about
disease states. This facilitates the discovery of unexpected events. The
algorithm starts by transforming continuous waveform signals into symbolic
strings derived directly from the data. Morphological features are used to
partition heart beats into clusters by maximizing the dynamic time-warped
sequence-aligned separation of clusters. Each cluster is assigned a
symbol, and the original signal is replaced by the corresponding sequence
of symbols. The symbolization process allows us to shift from the analysis
of raw signals to the analysis of sequences of symbols. This discrete
representation reduces the amount of data by several orders of magnitude,
making the search space for discovering interesting activity more
manageable. The authors describe techniques that operate in this symbolic
domain to discover rhythms, transient patterns, abnormal changes in
entropy, and clinically significant relationships among multiple streams
of physiological data. The techniques are tested on cardiologist-annotated
ECG data from forty-eight patients. The process for labeling heart beats
produced results that were consistent with the cardiologist-supplied
labels 98.6% of the time, and often provided relevant finer-grained
distinctions. The higher-level analysis techniques proved effective at
identifying clinically relevant activity not only from symbolized ECG
streams, but also from multimodal data obtained by symbolizing ECG and
other physiological data streams. Using no prior knowledge, the presented
techniques uncovered examples of ventricular bigeminy and trigeminy,
ectopic atrial rhythms with aberrant ventricular conduction, paroxysmal
atrial tachyarrhythmias, atrial fibrillation, and pulsus paradoxus.
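
The symbolization step can be sketched as follows, under simplifying assumptions: beats are short 1-D sample lists, dynamic time warping (DTW) with an absolute-difference cost measures dissimilarity, and each beat is mapped to the symbol of its nearest prototype. The fixed `prototypes` table is hypothetical; in the talk the clusters are derived from the data rather than supplied in advance.

```python
def dtw_distance(x, y):
    """Dynamic time warping cost between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = minimal warping cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def symbolize(beats, prototypes):
    """Replace each beat by the symbol of its nearest prototype under DTW."""
    return "".join(
        min(prototypes, key=lambda s: dtw_distance(beat, prototypes[s]))
        for beat in beats
    )


# Hypothetical prototypes and beats, chosen only to illustrate the mapping.
prototypes = {"N": [0, 1, 0], "V": [0, 3, 3, 0]}
beats = [[0, 1, 1, 0], [0, 3, 0], [0, 1, 0]]
symbols = symbolize(beats, prototypes)
```

Once the signal is a string such as `symbols`, rhythm and pattern discovery can operate on characters instead of raw samples.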
References:
Abstract:
This talk describes an approach to object and scene retrieval which
searches for and localizes all the occurrences of a user-outlined object
in a video. The object is represented by a set of viewpoint-invariant
region descriptors so that recognition can proceed successfully despite
changes in viewpoint, illumination and partial occlusion. The temporal
continuity of the video within a shot is used to track the regions in
order to reject unstable regions and reduce the effects of noise in the
descriptors.
The analogy with text retrieval lies in the implementation, where matches on
descriptors are pre-computed (using vector quantization), and inverted
file systems and document rankings are used. The result is that retrieval
is immediate, returning a ranked list of key frames/shots in the manner of
Google.
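
The text-retrieval analogy can be sketched as follows, with assumptions made here for illustration: descriptors are 2-D points, a small fixed `CODEBOOK` plays the role of the vector quantizer, and frames are ranked by the number of distinct visual words they share with the query. A real system would use high-dimensional descriptors, a learned codebook, and a weighted ranking.

```python
from collections import Counter, defaultdict
from math import dist

# Hypothetical 2-D codebook; each centroid index acts as a "visual word".
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def quantize(descriptor):
    """Map a descriptor to the index of its nearest codebook centroid."""
    return min(range(len(CODEBOOK)), key=lambda k: dist(descriptor, CODEBOOK[k]))


def build_index(frames):
    """Build an inverted file: visual word -> set of frame ids containing it."""
    index = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for d in descriptors:
            index[quantize(d)].add(frame_id)
    return index


def search(index, query_descriptors):
    """Rank frames by how many distinct query words they contain."""
    scores = Counter()
    for word in {quantize(d) for d in query_descriptors}:
        for frame_id in index.get(word, ()):
            scores[frame_id] += 1
    return [frame_id for frame_id, _ in scores.most_common()]


# Made-up key frames with their region descriptors.
frames = {
    "f1": [(0.1, 0.0), (0.9, 0.1)],
    "f2": [(0.0, 0.9)],
    "f3": [(1.1, 0.0), (0.1, 1.0)],
}
index = build_index(frames)
ranking = search(index, [(1.0, 0.1), (0.0, 1.1)])
```

Because quantization and indexing happen ahead of time, a query only touches the posting lists of its own words, which is what makes retrieval feel immediate.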
References:
Abstract:
Motivation: MicroRNAs (miRNAs) are involved in many diverse biological processes and they may potentially regulate
the functions of thousands of genes. However, one major issue in miRNA studies is the lack of bioinformatics programs
to accurately predict miRNA targets. Animal miRNAs have limited sequence complementarity to their gene targets, which
makes it challenging to build target prediction models with high specificity.
Results: Here we present a new miRNA target prediction program based on support vector machines (SVMs) and a large
microarray training dataset. By systematically analyzing public microarray data, we have identified statistically
significant features that are important to target downregulation. Heterogeneous prediction features have
been non-linearly integrated in an SVM machine learning framework for the training of our target prediction model,
MirTarget2. About half of the predicted miRNA target sites in human are not conserved in other organisms. Our
prediction algorithm has been validated with independent experimental data for its improved performance on predicting
a large number of miRNA-downregulated gene targets.
References:
Please send email to qitian@cs.utsa.edu or to the seminar co-organizers: Kay Robbins, Weining Zhang, Yufei Huang, Carola Wenk, Jianhua Ruan, and Qi Tian.