Purpose: The video proceedings with search functionality on our MMM webpages [MLMI 04, IDIAP 15th anniversary, ACM UIST 2006 symposium, TechnoArk Event 2007, IM2 2007 Winter Institute, ACM CHI 2007] are the output of an automatic presentation acquisition system developed at IDIAP and Klewel. The system captures audio, video, and slides from the projector (no electronic copy of the slides is needed) and does not require the speakers to change their behavior. The system runs on a PC that must be physically located in the room where the presentations take place and must be connected to the different capture devices (cameras, microphones, projector). The different streams of information are automatically synchronized, indexed, and distributed. The process requires no manual intervention and starts at the end of the acquisition. The MMM server on which the media files are stored supports browsing, playing, and retrieving the recorded multimodal data files.

(1) The capture step
We have an application that records all the media described below on a single PC. Its interface is as simple as a start/stop button. Everything is directly synchronised:
(2) The postprocessing
Once recorded with our acquisition system, everything is already structured into directories, and we automatically keep the start time and date of each presentation: see http://mmm.idiap.ch/private/uist06/. A Makefile-like batch script runs the following processes for all the presentations:
(3) Encoding
The audio and the videos are originally encoded on the fly into separate, synchronised MPEG-4 DivX files. At the processing stage, the videos are converted into RealMedia (which allows playback with SMIL) at different bitrates, while the audio is converted into WAV and MP3. The images are captured on the fly as high-resolution BMP files and then converted into JPEG. Given that all the media files are stored in high-quality standard formats (MPEG-4, WAV, BMP), all kinds of encodings are feasible: Flash, QuickTime MOV, Windows Media formats, etc.

We have expertise in capturing meetings for research in multimodal information management (e.g. the AMI European research project). We do not use any DV tapes: everything is encoded on the fly to disk using video capture cards, so no manual process is needed. MPEG-4 is a standard format that can be re-encoded into any desired format afterwards. RealMedia and SMIL are also multi-platform. For slide capture, we originally encode high-resolution BMP images (for accurate OCR) and convert them into standard JPEG images at various resolutions.

(4) Distribution
The main manual action in the overall acquisition / processing / distribution process is simply to edit the database of author names and paper titles, taken from the UIST program page. The core of the current webpages was dynamically generated and simply integrated into a Plone CMS. It is currently implemented in Python; we also have the same version running in JSP and Perl CGI.
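As an illustration of how a hand-edited author/title database can drive dynamically generated pages, here is a minimal Python sketch. It is not the actual MMM code: the CSV layout, the column names (`talk_id`, `title`, `authors`), the placeholder rows, and the `talks/<id>.html` URL scheme are all assumptions made for the example.

```python
import csv
import html
import io

# Hypothetical talk database: one row per talk, edited by hand from the
# conference program. The rows below are placeholders, not real talks.
TALKS_CSV = """talk_id,title,authors
talk01,An Example Talk Title,A. Author; B. Author
talk02,Another <Example> Title,C. Author
"""

def load_talks(text):
    """Parse the CSV text into a list of dicts, splitting the author field on ';'."""
    talks = []
    for row in csv.DictReader(io.StringIO(text)):
        authors = [a.strip() for a in row["authors"].split(";") if a.strip()]
        talks.append({"id": row["talk_id"], "title": row["title"], "authors": authors})
    return talks

def render_index(talks, base_url="talks"):
    """Render an HTML list with one link per talk page (assumed URL scheme)."""
    items = []
    for talk in talks:
        link = f"{base_url}/{html.escape(talk['id'], quote=True)}.html"
        title = html.escape(talk["title"])
        authors = html.escape(", ".join(talk["authors"]))
        items.append(f'  <li><a href="{link}">{title}</a> ({authors})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

if __name__ == "__main__":
    print(render_index(load_talks(TALKS_CSV)))
```

With this kind of layout, editing the database is the only manual step: the index and the per-talk links are regenerated from it on every request or build.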
For each talk (e.g. 1 UIST talk), a dynamically generated page presents all the details: camera views, media download links, podcast, slide viewer with timestamped slides, and a personalized SMIL video player.
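To make the idea of a SMIL player with timestamped slides concrete, the following Python sketch builds a small SMIL document in which a video plays in parallel with slide images that appear at their recorded offsets. This is an assumed structure, not the player actually used on the MMM pages: the function name `build_smil`, the region names, the layout dimensions, and the file names are all hypothetical.

```python
import xml.etree.ElementTree as ET

def build_smil(video_src, slides):
    """Return a SMIL document (as a string) for one video plus timed slides.

    slides: list of (start_seconds, image_src) pairs, sorted by start time.
    """
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    # Assumed side-by-side layout: video on the left, current slide on the right.
    ET.SubElement(layout, "root-layout", width="1024", height="384")
    ET.SubElement(layout, "region", id="video", left="0", top="0",
                  width="512", height="384")
    ET.SubElement(layout, "region", id="slide", left="512", top="0",
                  width="512", height="384")

    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # children of <par> play in parallel
    ET.SubElement(par, "video", src=video_src, region="video")

    for i, (start, src) in enumerate(slides):
        attrs = {"src": src, "region": "slide", "begin": f"{start}s"}
        if i + 1 < len(slides):
            # Show each slide until the next one starts.
            attrs["dur"] = f"{slides[i + 1][0] - start}s"
        else:
            # Keep the last slide on screen until the presentation ends.
            attrs["fill"] = "freeze"
        ET.SubElement(par, "img", attrs)

    return ET.tostring(smil, encoding="unicode")

if __name__ == "__main__":
    # Placeholder file names and timestamps.
    print(build_smil("talk.rm", [(0, "slide1.jpg"), (30, "slide2.jpg"), (95, "slide3.jpg")]))
```

Because each slide carries its own `begin` offset, the same timestamp list could also serve the slide viewer, for example to seek the video to the moment a given slide was shown.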
Alessandro Vinciarelli and Jean-Marc Odobez, Application of Information Retrieval Technologies to Presentation Slides, IEEE Transactions on Multimedia, Vol. 8, no. 5, pp. 981-995, October 2006.
Datong Chen, Jean-Marc Odobez, and Herve Bourlard, Text Detection and Recognition in Images and Videos, Pattern Recognition, Vol. 37, no. 3, pp. 595-609, March 2004.
Note about the current streaming performance: we use an academic streaming licence from RealNetworks, which does not allow more than 10-20 simultaneous connections. Again, other encoding formats such as Flash, QuickTime (.mov), .mp4, or Windows Media Video (.wmv) could easily be added. A remaining issue is the streaming quality of academic streaming servers compared with professional ones and with solutions such as large-scale mirroring.