Multimedia content analysis

We have experience in learning applications dealing with text, audio, and video data. In particular, together with leading companies in the sector, we have applied our designs to music genre recognition, semantic video analysis, and event detection in sport videos. We have also built video recommender systems based on collaborative filtering techniques.

Music Genre Recognition
Systems for automatically identifying the genre of songs have received a lot of attention in recent years. In this line, and in cooperation with the Cognitive Systems group (DTU Compute, Denmark), we have applied kernel multivariate analysis methods to learn from datasets with tens of thousands of songs. In practice, this is a big data problem, since the extraction of Mel coefficients and the subsequent postprocessing map each song into hundreds of training vectors. We have also developed methods to exploit the time structure of the signal, where important information for the music genre recognition problem lies.
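As a rough illustration of how one song turns into many training vectors, the sketch below groups consecutive frame-level feature vectors into overlapping windows and summarises each window by its per-dimension mean and variance. This is only one simple way of capturing short-term time structure; it is not necessarily the postprocessing used in our systems, and the function name `integrate_frames`, the window length, and the hop size are assumptions made for the example.

```python
from statistics import fmean, pvariance


def integrate_frames(frames, window, hop):
    """Summarise overlapping windows of frame-level features.

    frames: list of equal-length lists of floats (one list per audio frame,
            e.g. Mel coefficients; the actual features are an assumption here).
    window: number of consecutive frames summarised by one training vector.
    hop:    number of frames between the starts of consecutive windows.

    Returns one vector per window: per-dimension means followed by
    per-dimension (population) variances.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    vectors = []
    for start in range(0, len(frames) - window + 1, hop):
        block = frames[start:start + window]
        columns = list(zip(*block))  # regroup values by feature dimension
        means = [fmean(column) for column in columns]
        variances = [pvariance(column) for column in columns]
        vectors.append(means + variances)
    return vectors


# Toy input: 10 frames with 2 features each (made-up values).
toy_frames = [[float(i), 2.0 * i] for i in range(10)]
toy_vectors = integrate_frames(toy_frames, window=4, hop=2)
print(len(toy_vectors), "training vectors of dimension", len(toy_vectors[0]))
```

With realistic settings (thousands of frames per song and a hop of a few dozen frames), the same procedure yields hundreds of vectors per song, which is why collections of tens of thousands of songs quickly become large training sets.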

Semantic Video Analysis
Our research team has participated for several years in the Semantic Indexing task of the TRECVID initiative. This task consists of detecting pre-defined concepts in streams of video data. The classes we try to identify can be considered "high-level": e.g., outdoor, skyline, telephone, or two persons. For this, we need to feed our classifiers with low-level data capturing the color and texture structure of the video frames, as well as local features extracted at key points of the image (SIFT features). Our designs have been ranked in the first quartile of over 100 systems submitted for evaluation by groups worldwide.
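To give an idea of what a low-level color descriptor can look like, the sketch below computes a normalised, coarsely quantised RGB histogram for a single frame. It is a generic textbook descriptor used here for illustration only, not a description of the exact features in our submissions; the function name `color_histogram`, the frame representation (rows of `(r, g, b)` tuples with values in 0..255), and the number of bins are assumptions.

```python
def color_histogram(frame, bins_per_channel=4):
    """Return a normalised joint RGB histogram of one video frame.

    frame: list of rows, each row a list of (r, g, b) tuples in 0..255
           (a hypothetical in-memory representation chosen for the example).
    The result has bins_per_channel ** 3 entries that sum to 1 for a
    non-empty frame.
    """
    if bins_per_channel <= 0:
        raise ValueError("bins_per_channel must be positive")
    histogram = [0.0] * bins_per_channel ** 3
    pixel_count = 0
    for row in frame:
        for r, g, b in row:
            # Map each 0..255 value to a bin index in 0..bins_per_channel-1.
            r_bin = r * bins_per_channel // 256
            g_bin = g * bins_per_channel // 256
            b_bin = b * bins_per_channel // 256
            index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
            histogram[index] += 1.0
            pixel_count += 1
    if pixel_count == 0:
        return histogram
    return [count / pixel_count for count in histogram]


# Toy 2x2 frame: two red pixels, one blue pixel, one black pixel.
toy_frame = [[(255, 0, 0), (255, 0, 0)], [(0, 0, 255), (0, 0, 0)]]
print(color_histogram(toy_frame, bins_per_channel=2))
```

A descriptor of this kind says nothing about where colors appear in the frame, which is one reason it is combined with texture information and local key-point features before classification.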

Sport Events Detection
Automatic detection of events in sport videos is a task with many potential applications. Apart from automatic indexing, accurate systems could be crucial for automatic advertisement insertion or summary generation. Successful systems need to exploit good medium-level features extracted from the audio and video streams, and be aware of the structure of the sport discipline. Our research in this line has been carried out in the framework of the i3Media project, and has focused on the detection of events such as goal, free kick, corner, offside, etc., in football videos. For this, we have developed new learning methods based on a hierarchy of Hidden Markov Models, so that the different levels of the hierarchy can deal with events at conceptually different levels.
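The basic building block of such models can be sketched with a single, flat Hidden Markov Model decoded with the standard Viterbi algorithm. The example below is deliberately much simpler than a hierarchy of HMMs: it has two hidden states and one discretised audio feature, and all state names, observation symbols, and probabilities are made-up values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for a discrete HMM."""

    def log(p):
        # Work in the log domain to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    scores = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            best_prev = max(states, key=lambda p: scores[p] + log(trans_p[p][s]))
            new_scores[s] = (scores[best_prev] + log(trans_p[best_prev][s])
                             + log(emit_p[s][obs]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: crowd noise level as the only observed feature.
states = ["play", "goal"]
start_p = {"play": 0.9, "goal": 0.1}
trans_p = {"play": {"play": 0.9, "goal": 0.1},
           "goal": {"play": 0.3, "goal": 0.7}}
emit_p = {"play": {"quiet": 0.8, "loud": 0.2},
          "goal": {"quiet": 0.1, "loud": 0.9}}

audio = ["quiet", "quiet", "loud", "loud", "loud", "quiet"]
print(viterbi(audio, states, start_p, trans_p, emit_p))
```

In a hierarchical arrangement, the state sequences decoded at one level (for example, short shot-level patterns) would act as observations or sub-models for a higher level that reasons about whole events; the flat model above only illustrates the decoding step shared by each level.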