DFKI Video Call Alternative Appearance Technology Integrated into XRisis

The XRisis platform now implements avatar-based video conferencing, enabling privacy-aware communication whilst maintaining presence and non-verbal expression through MediaPipe face tracking and Ready Player Me avatar rendering.

Published by Nuwa Team
Funded by the European Union

This project has received funding from the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Grant agreement number: 101070192

Video Call Alternative Appearance Implementation

Nuwa has integrated avatar-based video conferencing into the XRisis platform by implementing Video Call Alternative Appearance (VCAA) technology concepts developed by DFKI, the German Research Centre for Artificial Intelligence. The integration enables video call participants to appear as avatars rather than exposing direct camera feeds, preserving presence and non-verbal communication whilst protecting privacy and enabling appearance customisation, which some users find psychologically liberating compared to constant video exposure during extended training sessions. The implementation takes a web-based approach, using the MediaPipe machine learning framework for real-time facial landmark detection and Ready Player Me for avatar rendering and customisation, delivering functional privacy-aware communication today whilst the integration matures toward deeper incorporation into Rainbow CPaaS in future development phases.
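
As an illustrative sketch of this web-based approach (not the XRisis production code), the following TypeScript wires a webcam feed into MediaPipe's Face Mesh solution to obtain the 468 landmarks per frame that drive avatar animation. The package names and callback shape follow MediaPipe's published JavaScript API; the element ID, confidence thresholds, and the animateAvatar hand-off are assumptions for the example.

```typescript
// Sketch only: webcam frames -> MediaPipe Face Mesh -> 468 facial landmarks.
// Assumes @mediapipe/face_mesh and @mediapipe/camera_utils are installed and
// a <video id="webcam"> element exists on the page.
import { FaceMesh, Results } from "@mediapipe/face_mesh";
import { Camera } from "@mediapipe/camera_utils";

const video = document.getElementById("webcam") as HTMLVideoElement;

const faceMesh = new FaceMesh({
  // Fetch model and WASM assets from MediaPipe's public CDN.
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
});
faceMesh.setOptions({
  maxNumFaces: 1,          // one participant per webcam
  refineLandmarks: false,  // plain 468-point mesh, as described above
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
});

faceMesh.onResults((results: Results) => {
  const landmarks = results.multiFaceLandmarks?.[0];
  if (!landmarks) return; // no face detected this frame
  // landmarks holds 468 normalised {x, y, z} points; hand them to the
  // avatar renderer (a hypothetical function, sketched in the next section).
  animateAvatar(landmarks);
});

// Pump webcam frames into the tracker at roughly 30 fps.
const camera = new Camera(video, {
  onFrame: async () => { await faceMesh.send({ image: video }); },
  width: 640,
  height: 480,
});
camera.start();

declare function animateAvatar(points: { x: number; y: number; z: number }[]): void;
```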

Technical Architecture and User Experience

Participants configure their avatar appearance through Ready Player Me customisation interfaces, selecting from diverse character options spanning different ages, genders, ethnicities, and clothing styles, so visual representation can match cultural contexts and personal preferences. MediaPipe face tracking processes the webcam feed, identifying 468 facial landmarks at 30 frames per second and tracking head rotation, facial expressions, and mouth movements with sufficient accuracy to animate an avatar that conveys general emotional tone (smiling, frowning, concern, surprise) and synchronises with speech. Users share their avatar video through screen sharing in Rainbow calls, or employ virtual camera software (such as OBS Studio) to inject the avatar stream as a camera feed replacement, creating seamless privacy-aware presence that maintains a professional appearance regardless of physical environment or appearance preparation. The system proved particularly valuable for facilitators and role-players conducting extended training sessions: freed from constant camera self-monitoring, they reported reduced fatigue and greater focus on scenario delivery compared to conventional video exposing their actual faces throughout multi-hour workshops.
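
Continuing the sketch, landmark measurements can be mapped onto an avatar's morph targets and the rendered canvas exposed as a video stream. Ready Player Me avatars ship as glTF with ARKit-style blendshapes such as jawOpen; the avatar URL, landmark indices, gain factor, and camera framing below are illustrative assumptions, and the captureStream call simply shows one way a browser can emit the avatar view that users then route through screen sharing or a virtual camera as described above.

```typescript
// Sketch only: drive a Ready Player Me avatar's "jawOpen" blendshape from
// MediaPipe landmarks, then expose the render canvas as a MediaStream.
import * as THREE from "three";
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(640, 480);
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
scene.add(new THREE.AmbientLight(0xffffff, 1));
const viewCamera = new THREE.PerspectiveCamera(35, 640 / 480, 0.1, 100);
viewCamera.position.set(0, 1.6, 0.8); // roughly head height; framing is an assumption

// Ready Player Me serves avatars as glTF; the URL here is a placeholder.
const gltf = await new GLTFLoader().loadAsync("https://models.readyplayer.me/AVATAR_ID.glb");
scene.add(gltf.scene);

// Collect every mesh exposing morph targets (head, teeth, and so on).
const morphMeshes: THREE.Mesh[] = [];
gltf.scene.traverse((obj) => {
  const mesh = obj as THREE.Mesh;
  if (mesh.isMesh && mesh.morphTargetDictionary) morphMeshes.push(mesh);
});

// Map a simple mouth-openness heuristic onto "jawOpen". Landmarks 13 and 14
// are the inner upper and lower lip in MediaPipe's 468-point mesh; the x12
// gain is a tuning assumption.
function animateAvatar(points: { x: number; y: number; z: number }[]): void {
  const openness = Math.min(1, Math.abs(points[13].y - points[14].y) * 12);
  for (const mesh of morphMeshes) {
    const idx = mesh.morphTargetDictionary!["jawOpen"];
    if (idx !== undefined && mesh.morphTargetInfluences) {
      mesh.morphTargetInfluences[idx] = openness;
    }
  }
}

renderer.setAnimationLoop(() => renderer.render(scene, viewCamera));

// The canvas itself can be captured as a 30 fps stream for downstream use,
// e.g. as the source a virtual camera or screen share picks up.
const avatarStream: MediaStream = renderer.domElement.captureStream(30);
```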

Validation Outcomes and User Acceptance

Early testing with Action Contre la Faim team members revealed mixed adoption patterns: some participants enthusiastically embraced avatar communication, whilst others found that the additional setup complexity created barriers discouraging use compared to simply enabling conventional video. Facilitators consistently valued the capability to maintain a professional presence without appearance concerns. Participant engagement varied, with some appreciating the privacy protection and customisation flexibility, whereas others preferred direct video and the familiar communication modalities they had employed throughout COVID-19 pandemic remote work transitions. The avatar system's racial and gender diversity options received positive feedback for enabling representation across diverse cultural contexts, allowing humanitarian organisations operating globally to present training characters reflecting local community demographics rather than defaulting to Western-centric visual representations. Technical challenges emerged around animation quality: subtle microexpressions and gaze direction, which convey important social signals in face-to-face communication, proved difficult to capture and transmit reliably, occasionally creating uncanny valley effects that undermined rather than enhanced communication naturalness.

Development Priorities and Future Enhancement

Based on early implementation feedback, development priorities include reducing computational and bandwidth requirements so the system runs reliably on the modest hardware and limited internet connections prevalent in humanitarian field locations; improving avatar animation fidelity to cross the uncanny valley thresholds at which current implementations sometimes trigger viewer discomfort through mismatches between expression and body language; and providing graceful degradation so that participants with resource constraints can access core functionality even if advanced avatar features remain unavailable. The future roadmap pursues deeper integration with the Rainbow infrastructure, enabling automatic avatar transformation without the separate web application that currently introduces setup friction; expanded avatar customisation supporting culturally appropriate character representation across diverse deployment contexts; and incorporation of avatar presence into platform analytics, tracking non-verbal communication patterns alongside dialogue content for comprehensive interaction assessment. The VCAA integration demonstrates Nuwa's commitment to privacy-aware technology design whilst acknowledging that balancing privacy protection, communication effectiveness, and deployment simplicity requires continued refinement informed by actual user behaviour patterns rather than designer assumptions about preferred interaction modalities.
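
As a closing sketch of the graceful degradation priority (hypothetical logic, not shipped XRisis behaviour), client code could measure the tracking throughput actually achieved during a short warm-up and fall back from the avatar pipeline to the plain webcam stream on devices that cannot keep up. All names and thresholds below are assumptions for illustration.

```typescript
// Hypothetical graceful-degradation check: sample the avatar pipeline's
// achieved frame rate, then decide whether to publish the avatar stream
// or fall back to the raw webcam feed.
const MIN_ACCEPTABLE_FPS = 15; // assumed floor below which animation looks broken
const WARMUP_MS = 3000;

async function chooseOutgoingStream(
  webcam: MediaStream,
  startAvatarPipeline: () => Promise<{ stream: MediaStream; framesProcessed: () => number }>,
): Promise<MediaStream> {
  try {
    const avatar = await startAvatarPipeline();
    const before = avatar.framesProcessed();
    await new Promise((resolve) => setTimeout(resolve, WARMUP_MS));
    const fps = (avatar.framesProcessed() - before) / (WARMUP_MS / 1000);
    if (fps >= MIN_ACCEPTABLE_FPS) return avatar.stream; // device keeps up: use the avatar
    console.warn(`Avatar tracking at ${fps.toFixed(1)} fps; falling back to direct video`);
  } catch (err) {
    console.warn("Avatar pipeline unavailable; falling back to direct video", err);
  }
  return webcam; // core functionality remains available regardless
}
```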