Klewel: Presentation Acquisition System

Purpose:

The video proceedings with search functionality on our MMM webpages [MLMI 04, IDIAP 15th anniversary, ACM UIST 2006 symposium, TechnoArk Event 2007, IM2 2007 Winter Institute, ACM CHI 2007] are the output of an automatic presentation acquisition system developed at IDIAP and Klewel. The system captures audio, video, and slides. The slides are taken from the projector signal, so no electronic copy of the slides is needed and speakers do not have to change their behavior. The system runs on a PC that must be physically located in the room where the presentations take place and must be connected to the capture devices (cameras, microphones, projector). The different streams of information are automatically synchronized, indexed, and distributed. Processing starts at the end of the acquisition and requires no manual intervention.

The MMM server that hosts the media files supports browsing, playing, and retrieving the recorded multimodal data files.

(1) The capture step

We have an application that records all the media described below on a single PC. Its interface is as simple as a start-stop button. Everything is directly synchronised:
  • video: 2 high-quality Pan-Tilt-Zoom video cameras + 1 video output from the slide VGA signal via a converter. The speaker camera was remotely controlled by hand.
  • audio: output mix from the local audio system. For better quality, the signal can instead come from a dedicated audio recording system with a mixing desk and high-quality microphones.
  • slides: high-resolution VGA image capture, including online real-time slide change detection with timings.
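The slide change detection mentioned in the last bullet can be illustrated with a minimal sketch. It assumes frames arrive as small grayscale pixel grids with a capture timestamp, and it uses a plain mean-absolute-difference threshold. The function names and the threshold value are hypothetical, and the detector actually used in the system is not described here.

```python
from typing import List, Sequence, Tuple

# A grayscale frame: rows of pixel values in the range 0-255 (assumed layout).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def detect_slide_changes(
    frames: Sequence[Tuple[float, Frame]], threshold: float = 10.0
) -> List[float]:
    """Return the timestamps (seconds) at which a new slide appears.

    The first frame always starts a slide. Each later frame is compared
    with the last accepted slide, so slow drift does not accumulate.
    The threshold is an illustrative value, not a tuned one.
    """
    changes: List[float] = []
    current = None
    for timestamp, frame in frames:
        if current is None or mean_abs_diff(current, frame) > threshold:
            changes.append(timestamp)
            current = frame
    return changes


if __name__ == "__main__":
    blank = [[0] * 4 for _ in range(4)]
    text = [[255] * 4 for _ in range(4)]
    capture = [(0.0, blank), (1.0, blank), (2.0, text), (3.0, text), (4.0, blank)]
    print(detect_slide_changes(capture))
```

The resulting list of timestamps is the kind of "slide timings" that the later processing steps rely on.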

Acquisition system

(2) The postprocessing

Once recorded with our acquisition system, everything is already structured into directories, and the start time and date of each presentation are kept automatically: see http://mmm.idiap.ch/private/uist06/. A Makefile-like batch script then runs the following processes for all the presentations:

Optical character recognition

media processing

  • slide OCR based on state-of-the-art research, for each captured slide image
  • slide text database / dictionary generator for our ITR tool
  • image thumbnails and HTML slide viewer/browser generator
  • audio noise removal -- Wiener filtering
  • audio/video format conversions (see section (3))
  • MP3 segmenter driven by the slide timings, RSS podcast file generator
  • multiple video [video+slide] + audio encoding
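As an illustration of how slide timings can drive the MP3 segmenter listed above, the sketch below turns a list of slide-change times into per-slide audio segments. It only computes the cut points. The function names and the example durations are assumptions, and the audio cutting itself is left to whatever encoder is in use.

```python
from typing import List, Sequence, Tuple


def slide_segments(
    change_times: Sequence[float], total_duration: float
) -> List[Tuple[int, float, float]]:
    """Turn sorted slide-change times into (slide_number, start, end) tuples.

    Each slide lasts until the next change. The last slide lasts until the
    end of the recording. Zero-length segments are dropped.
    """
    segments = []
    for index, start in enumerate(change_times):
        if index + 1 < len(change_times):
            end = change_times[index + 1]
        else:
            end = total_duration
        if end > start:
            segments.append((index + 1, start, end))
    return segments


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    for number, start, end in slide_segments([0.0, 65.0, 190.5], 300.0):
        print(number, format_timestamp(start), format_timestamp(end))
```

Each tuple could then be handed to an audio tool to cut one MP3 per slide, or used as timestamps in the HTML slide viewer.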


(3) Encoding

The audio and the videos are originally encoded on-the-fly into separate, synchronised MPEG-4 DivX files. At the processing stage, the videos are converted into RealMedia at different bitrates, which allows playback with SMIL, while the audio is converted into WAV and MP3. The images are captured on-the-fly as high-resolution BMP and then converted into JPEG. Given that all the media files are stored in high-quality standard formats (MPEG-4, WAV, BMP), all kinds of encodings are feasible: Flash, QuickTime MOV, Windows Media formats, etc.

We have expertise in capturing meetings for research in multimodal information management (e.g. the AMI European research project). We do not use any DV tapes: everything is encoded on-the-fly to disk using video capture cards, so no manual processing is needed. Because the originals are stored as standard MPEG-4, they can be re-encoded into any desired format afterwards. RealMedia and SMIL are also multi-platform. For slide capture, we originally encode high-resolution BMP images (for accurate OCR) and convert them into standard JPG images at various resolutions.
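To make the role of SMIL more concrete, here is a minimal sketch that generates a SMIL document playing one video in parallel with a timed sequence of slide images. The file names are hypothetical, and the layout regions a real player would need in the `<head>` are omitted. The SMIL files actually served by the system are not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple


def build_smil(video_src: str, slides: Sequence[Tuple[str, float]]) -> str:
    """Build a bare-bones SMIL document as a string.

    video_src: path or URL of the video stream (hypothetical name).
    slides: (image_src, duration_in_seconds) pairs, in display order.
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    # <par> plays its children at the same time: the video and the slide track.
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "video", {"src": video_src})
    # <seq> shows its children one after another, each for its own duration.
    seq = ET.SubElement(par, "seq")
    for image_src, duration in slides:
        ET.SubElement(seq, "img", {"src": image_src, "dur": f"{duration:g}s"})
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    print(build_smil("talk.rm", [("slide001.jpg", 65.0), ("slide002.jpg", 125.5)]))
```

The slide durations would come from the slide timings recorded at capture time, which is what keeps the slide track aligned with the video.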

(4) Distribution

The main manual action in the overall [acquisition / processing / distribution] process is simply editing the author name / paper title database from the UIST program page. The core of the current webpages was dynamically generated and simply integrated into a Plone CMS. It is currently implemented in Python; we also have the same version running in JSP and Perl CGI.

Information text retrieval


For each talk -- e.g. 1 UIST talk -- a dynamically generated page describes all the details: camera views, media download links, podcast, slide viewer with timestamped slides, personalized SMIL video player.


Browsing, streaming

example of slide search

The slide search engine lets you search for slides by the text they contain, much like a web search engine such as Google. The OCR and search engine were both developed at IDIAP and are state-of-the-art:

Alessandro Vinciarelli and Jean-Marc Odobez, Application of Information Retrieval Technologies to Presentation Slides, IEEE Transactions on Multimedia, Vol. 8, no. 5, pp. 981-995, October 2006.

Datong Chen, Jean-Marc Odobez, and Herve Bourlard, Text Detection and Recognition in Images and Videos, Pattern Recognition, Vol. 37, no. 3, pp. 595-609, March 2004.
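The idea of searching slides by their text can be illustrated with a very small inverted index over OCR output. This is a plain keyword-matching sketch and not the retrieval method of the papers cited above, which is designed to cope with OCR errors. The slide identifiers and texts are made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(slides: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of slide ids whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for slide_id, text in slides.items():
        for word in tokenize(text):
            index[word].add(slide_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Rank slides by how many distinct query words they contain.

    Ties are broken by slide id so the order is deterministic.
    """
    scores: Dict[str, int] = defaultdict(int)
    for word in set(tokenize(query)):
        for slide_id in index.get(word, ()):
            scores[slide_id] += 1
    return sorted(scores, key=lambda slide_id: (-scores[slide_id], slide_id))


if __name__ == "__main__":
    ocr_text = {
        "talk1/slide03": "Multimodal meeting browser",
        "talk1/slide07": "Evaluation of the meeting browser",
        "talk2/slide01": "Pen input techniques",
    }
    print(search(build_index(ocr_text), "meeting browser"))
```

Because every slide id is tied to a capture timestamp, a hit in such an index can link straight to the matching moment in the video.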


JFerret


Note about the current streaming performance: we use an academic streaming licence from RealNetworks, which does not allow more than 10-20 simultaneous connections. Again, other encoding formats such as Flash, QuickTime .mov, .mp4, or Windows Media Video .wmv could easily be added. A remaining issue would be the streaming quality of academic streaming servers compared with professional ones, and solutions such as large-scale mirroring.

People:

  • Maël Guillemot, IDIAP, Switzerland.
  • Alessandro Vinciarelli, IDIAP, Switzerland.
  • Jean-Marc Odobez, IDIAP, Switzerland.
  • Acknowledgement: Darren Moore, Olivier Bornet, Olivier Masson, David Grangier, Mickaela Keller, Andrei Popescu-Belis, Frank Crittin (IDIAP, Switzerland), Denis Lalanne, Didier Von Rotz (Univ of Fribourg, Switzerland).

Sponsors:

  • Augmented Multi-Party Interaction -- AMI European project.
  • Interactive Multimodal Information Management -- (IM)2, part of the Swiss National Centre of Competence in Research (NCCR).