Real-time communication between groups of people is ad hoc by nature. We believe that the way it is mediated through video can be improved by making intelligent decisions about what each participant sees and hears from all of the available sources. We call this decision-making process Orchestration.
Orchestration processes cues generated by real-time content analysis of the audio and video signals to build an understanding of what is happening in the conversation or scene, and then applies aesthetic principles to choose the most appropriate viewpoint.
The process that selects the “most appropriate viewpoint” builds on an aesthetic from television, but in the Vconect system shot selection is more complex than in television for three reasons: firstly, there is no director to select the best shot; secondly, what constitutes the best shot will be different for each end point; and thirdly, each end point should be considered an active contributor to the communication and not (as in television) a passive observer.
We believe that the context of communication is a key determinant of well-orchestrated communication. For this reason, much effort has been expended on the analysis of conversation in order to identify useful concepts such as turn, crosstalk, short turn-taking, silence and simultaneous start. Metrics associated with these concepts, e.g. the number of turns, the number of crosstalk events and the duration of silence, can be used to identify contexts, for example a monologue, a lecture situation or an animated social chat. We believe that the context of the communication will have an impact on what each participant should see and hear. The research involves knowledge elicitation from experts in communication and cinematography in order to generate rules, as well as detailed analysis of people’s responses to being involved in orchestrated video-mediated group communications.
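As a rough illustration of how such metrics might map to a context, the following minimal Python sketch counts turns, crosstalk events and silence from a list of timed speech segments and then applies a few hand-picked rules. It is not the Vconect implementation: the `Segment` type, the function names, the extra "dominance" measure (the share of talk time held by the most active speaker) and every threshold are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of speech by one participant (hypothetical input format)."""
    speaker: str
    start: float  # seconds from the start of the analysed window
    end: float


def conversation_metrics(segments, total_duration):
    """Count turns, crosstalk events and silence over one analysis window."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    ordered = sorted(segments, key=lambda s: s.start)
    turns = 0
    crosstalk = 0
    silence = 0.0
    talk_time = {}
    prev_speaker = None
    covered_until = 0.0  # latest time up to which someone has been speaking
    for seg in ordered:
        # A turn starts whenever the speaker differs from the previous segment.
        if seg.speaker != prev_speaker:
            turns += 1
        if seg.start < covered_until:
            # The segment starts while an earlier one is still running.
            crosstalk += 1
        else:
            # Nobody was speaking between covered_until and seg.start.
            silence += seg.start - covered_until
        covered_until = max(covered_until, seg.end)
        talk_time[seg.speaker] = talk_time.get(seg.speaker, 0.0) + (seg.end - seg.start)
        prev_speaker = seg.speaker
    silence += max(0.0, total_duration - covered_until)
    total_talk = sum(talk_time.values())
    # Assumed extra metric: share of talk time held by the most active speaker.
    dominance = max(talk_time.values()) / total_talk if total_talk else 0.0
    return {
        "turns": turns,
        "crosstalk": crosstalk,
        "silence": silence,
        "dominance": dominance,
        "speakers": len(talk_time),
        "turns_per_minute": turns / (total_duration / 60.0),
    }


def classify_context(metrics):
    """Map metrics to a context label. All thresholds are illustrative guesses."""
    if metrics["speakers"] <= 1:
        return "monologue"
    if metrics["dominance"] >= 0.8 and metrics["crosstalk"] == 0:
        return "lecture"
    if metrics["turns_per_minute"] >= 6 or metrics["crosstalk"] >= 3:
        return "animated social chat"
    return "unclassified"


lecture = [Segment("A", 0, 50), Segment("B", 52, 55), Segment("A", 56, 60)]
chat = [
    Segment("A", 0, 2), Segment("B", 1.5, 4), Segment("C", 3.5, 6),
    Segment("A", 5.5, 8), Segment("B", 7.5, 10),
]
print(classify_context(conversation_metrics(lecture, 60)))
print(classify_context(conversation_metrics(chat, 10)))
```

In a real system the rules would presumably come from the expert knowledge elicitation described above rather than from fixed thresholds, and the segments would be produced by live audio analysis rather than supplied by hand.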
Besides ad-hoc social communication, one specific focus of Orchestration is to explore how new methods of automatic audio-visual mixing can enable a new paradigm in theatrical performance. Video streaming of theatre is already part of the mainstream, but the Vconect team is exploring how multiple locations can be part of a unified production experience available to audience members in multiple theatres and many homes. Such multi-platform delivery generates new requirements for the selection and presentation of audio-visual streams. We believe that the insights gained from Orchestration based on scene understanding and communication contexts can form the foundation for systems that enable new multi-node, community-based theatrical experiences.