
Integrating Real-Time Cooperation Technologies: CORTEX2 Platform in Practice

Technical lessons from integrating Rainbow CPaaS, DFKI VCAA, CEA CoVA, and Linagora summarisation technologies into a production humanitarian training platform, addressing architecture challenges and discovering unexpected integration opportunities.

Published by Anastasiia P.
Funded by the European Union

This project has received funding from the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Grant agreement number: 101070192

Rainbow CPaaS: Communication Backbone Integration

Integrating Alcatel Lucent Enterprise's Rainbow Communication Platform as a Service as the foundational communication layer for XRisis required substantial architectural adaptation beyond straightforward SDK implementation. Rainbow's C# SDK provided native integration points for Unity-based applications, enabling voice and video communication within immersive environments, but required careful consideration of how communication channels would map to training scenario requirements where participants needed to move fluidly between individual work, small group conversations, and full team coordination whilst maintaining appropriate facilitator oversight. The platform architecture established Rainbow as the authoritative communication service across all participant endpoints regardless of device type (XR head-mounted displays, desktop applications, web browsers), ensuring consistent voice quality, connection reliability, and feature availability rather than implementing separate communication systems for different device categories that would have introduced complexity and potential inconsistency. Unity SDK integration required configuring audio spatialisation so that voice positioned in 3D space when participants communicated within shared virtual environments (creating natural directional audio cues about who was speaking) but switched to conventional stereo presentation during one-to-one calls or facilitator briefings where spatial positioning would serve no purpose and might create confusion. 
The system needed to handle dynamic communication topology changes as scenarios progressed: individual participants beginning in solo environments would join team coordination calls, break into smaller working groups for focused discussions, receive incoming calls from facilitators or role-players, and potentially split into parallel conversations before reconvening for collective decision-making, all whilst maintaining stable connections without requiring manual call setup or termination that would break scenario immersion. Rainbow's enterprise-grade infrastructure provided reliability essential for professional training delivery: connection failures or audio quality degradation during conventional videoconferences prove annoying but tolerable, whereas communication breakdowns during immersive simulation exercises destroy scenario realism and undermine learning objectives, requiring substantially higher performance standards than typical consumer applications. The integration architecture deliberately separated communication infrastructure (Rainbow) from scenario logic (Unity application) and state synchronisation (WebSocket server), enabling independent scaling, troubleshooting, and replacement of components without cascading system-wide impacts that tightly coupled architectures would have introduced. Nuwa's team discovered unexpected capabilities during integration: Rainbow's call recording features, originally planned only for compliance and quality assurance purposes, proved pedagogically valuable when combined with Linagora's summarisation agent, transforming raw recordings into structured post-exercise reflection materials that enhanced learning consolidation. 
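The topology-management problem described above reduces to a reconciliation step: compare each participant's current call memberships with the memberships the next scenario phase requires, and emit only the joins and leaves needed to close the gap. The sketch below is an illustrative Python model of that logic (the actual implementation sits in the Unity application, and the Rainbow SDK calls that would execute each action are not shown):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: map scenario phases to the set of conference
# rooms each participant should belong to, then diff against current
# state so a transition issues only the joins/leaves actually needed.

@dataclass
class CallTopology:
    # participant id -> set of room ids the participant is currently in
    memberships: dict = field(default_factory=dict)

    def apply_phase(self, desired: dict) -> list:
        """Return the ordered join/leave actions needed to reach `desired`."""
        actions = []
        for pid, rooms in desired.items():
            current = self.memberships.get(pid, set())
            # Join new rooms before leaving old ones so the participant
            # is never left without any audio channel mid-transition.
            for room in sorted(rooms - current):
                actions.append(("join", pid, room))
            for room in sorted(current - rooms):
                actions.append(("leave", pid, room))
            self.memberships[pid] = set(rooms)
        return actions

topology = CallTopology()
topology.apply_phase({"alice": {"solo-1"}, "bob": {"solo-2"}})
# Phase transition: both participants move from solo rooms to a team call
actions = topology.apply_phase({"alice": {"team"}, "bob": {"team"}})
```

Because only deltas are applied, reconvening or splitting groups never requires manual call setup or teardown, which is what preserves scenario immersion across transitions.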
The architecture decisions validated an important principle: rather than building all communication capabilities from scratch using WebRTC primitives, leveraging enterprise-grade CPaaS platforms provides reliability, scalability, and feature richness that dramatically accelerate development whilst reducing ongoing maintenance burden, even though platform fees add recurring costs compared to purely self-hosted alternatives. The successful integration demonstrated that research-stage platforms developed within CORTEX2 consortium could transition to production deployment supporting real operational requirements rather than merely demonstrating technical feasibility through controlled laboratory conditions, validating the maturity of CORTEX2's enabling technologies and the effectiveness of consortium collaboration in advancing capabilities toward practical applicability.

DFKI Video Call Alternative Appearance: Privacy-Aware Presence

Implementing DFKI's Video Call Alternative Appearance technology revealed both compelling value propositions and practical deployment challenges that shaped platform evolution strategy. The VCAA system enables video call participants to appear as avatars rather than exposing direct camera feeds, preserving presence and non-verbal communication whilst protecting privacy and enabling appearance customisation that some users find psychologically liberating compared to constant video surveillance. XRisis initially planned tight VCAA integration with Rainbow's video infrastructure, enabling automatic avatar transformation within the communication platform itself, but timeline constraints and API maturity limitations led to implementing a web-based emulator using MediaPipe for real-time face tracking and Ready Player Me for avatar rendering and customisation. Participants could share screens showing their avatar representations or use virtual camera software (such as OBS) to inject avatar video into Rainbow calls, creating functional privacy-aware communication whilst acknowledging the technical workaround introduced friction compared to seamless native integration. The emulator approach provided valuable insights about user acceptance of avatar-based communication: some participants enthusiastically embraced the ability to present professional avatar appearances regardless of their physical environment or appearance preparation, whilst others found the additional setup steps and processing overhead created barriers that discouraged use compared to simply enabling conventional video. 
Facilitators and role-players particularly valued VCAA for maintaining professional presence during extended training sessions without constant awareness of how they appeared on camera, reporting reduced fatigue and greater focus on scenario delivery rather than self-monitoring, a finding that suggested value extends beyond privacy protection to include cognitive load reduction during intensive facilitation work. The avatar system's racial and gender diversity options received positive feedback for enabling representation across diverse cultural contexts, allowing humanitarian organisations operating globally to present training characters that reflect local community demographics rather than defaulting to Western-centric visual representations that might feel alien to staff from other cultural backgrounds. Technical challenges emerged around avatar animation quality: while facial expression mapping worked adequately for general emotional tone (smiling, frowning, concern), subtle microexpressions and gaze direction that convey important social signals in face-to-face communication proved difficult to capture and transmit reliably, occasionally creating uncanny valley effects where participants noticed expression-body language mismatches that undermined rather than enhanced communication realism. The platform discovered that avatar presence value varied substantially based on scenario requirements: during formal briefings where social dynamics mattered less than information transfer, participants showed indifference to whether presenters appeared as avatars or conventional video, but during negotiation scenarios requiring rapport building and emotional reading, avatar quality significantly impacted perceived realism and learning value. 
Network bandwidth requirements for real-time face tracking and avatar rendering exceeded what participants in low-connectivity environments could reliably support, creating deployment equity issues where field staff in resource-constrained locations experienced degraded functionality compared to headquarters personnel with high-speed internet access, precisely opposite the democratisation objectives that motivated XR training platform development. The VCAA implementation experience validated the technology's conceptual value whilst highlighting the gap between research demonstrations and production deployment: capabilities that work reliably in controlled laboratory settings with high-end hardware and optimal network conditions require substantial additional engineering to achieve acceptable performance across the variable real-world contexts where operational systems must function. Future development priorities will focus on reducing computational and bandwidth requirements, improving avatar animation fidelity to cross the uncanny valley threshold, and providing graceful degradation so that participants with limited resources can still access core functionality even if advanced avatar features remain unavailable, ensuring technology enables rather than constrains access.

CEA Conversational Virtual Agent: AI Dialogue Integration

Integration of CEA's Conversational Virtual Agent platform to create AI-powered dialogue characters represented both a technical accomplishment and a pedagogical revelation about conversational AI's readiness for training applications. The CoVA system combined large language model capabilities with knowledge grounding in organisational documentation, enabling AI avatars to engage in contextual conversations about emergency management procedures, scenario-specific information, and operational decision-making whilst maintaining character personalities and communication styles appropriate to their roles as emergency coordinators, local officials, community leaders, or logistics partners. Mentor Maud, the AI avatar introducing participants to emergency management concepts during Pilot 1, demonstrated successful integration of CoVA with Rainbow communication infrastructure, appearing as a call participant that participants could interrupt, question, and engage through natural dialogue rather than predetermined branching conversation trees that characterise conventional training chatbots. The system employed vector embeddings of Action Contre la Faim's Standard Operating Procedures, Emergency Management documentation, and training materials to ground responses in organisational knowledge rather than relying solely on general language model training, ensuring answers reflected actual institutional practices rather than generic emergency management principles that might contradict specific organisational approaches. Implementation challenges centred on response generation latency: the several-second delay between participant questions and AI responses felt acceptable during informational briefings but created awkward pauses during fast-paced negotiation scenarios where realistic conversation flow required rapid turn-taking, exposing the gap between current generation language model inference speeds and human conversation dynamics. 
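The knowledge-grounding step described above follows the familiar retrieval pattern: embed document chunks, embed the participant's question, and place the closest chunks in the language model's prompt. The sketch below illustrates that flow with a toy bag-of-words vector standing in for the learned embeddings a production system such as CoVA would use; the example chunks are invented:

```python
import math
from collections import Counter

# Toy sketch of retrieval-based grounding: a bag-of-words vector
# stands in for real learned embeddings.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical SOP fragments, not actual ACF documentation
sops = [
    "Rapid needs assessment must start within 72 hours of onset",
    "Logistics requests are routed through the country office",
]
context = retrieve("when does the needs assessment start", sops)
```

Grounding answers in retrieved organisational text, rather than the model's general training alone, is what keeps responses aligned with institutional practice instead of generic emergency management principles.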
Speech recognition accuracy proved the most significant limitation: the AI system struggled with non-native English speakers, particularly when participants exhibited strong regional accents, used domain-specific jargon without clearly enunciating technical terms, or spoke rapidly under scenario pressure, resulting in misrecognitions that generated nonsensical responses and broke scenario immersion, requiring facilitator intervention to reset and continue. The platform discovered that AI dialogue value varied dramatically based on scenario type: during implementation simulations where participants negotiated with stakeholders, AI capabilities proved transformative because they enabled realistic unpredictable responses that adapted to participant approaches rather than following scripted paths, creating authentic interpersonal challenge; conversely, during structured planning activities where participants primarily needed access to reference information, a well-designed search interface would have served needs more effectively than conversation. CoVA integration required substantial prompt engineering to establish appropriate character personalities, conversational boundaries (the AI should refuse requests outside its character's knowledge or authority), emotional ranges (frustration when negotiations stall, enthusiasm when finding common ground, caution when facing institutional risks), and cultural communication styles reflecting diverse stakeholder backgrounds. The system demonstrated impressive capability to maintain consistent character perspective throughout extended conversations, remembering previous dialogue exchanges and incorporating them into subsequent responses, creating coherent interaction arcs rather than treating each utterance independently.
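The prompt-engineering work described above amounts to compiling a character card into a system prompt that fixes personality, knowledge boundaries, and emotional range before any dialogue begins. A minimal sketch, with all names and wording hypothetical rather than taken from the CoVA configuration:

```python
# Hypothetical sketch of a character card compiled into a system
# prompt; the field names and phrasing are illustrative only.

def build_system_prompt(name, role, knows, emotional_range):
    boundaries = ", ".join(knows)
    emotions = "; ".join(f"{trigger}: {reaction}"
                         for trigger, reaction in emotional_range.items())
    return (
        f"You are {name}, {role}. "
        f"You only answer questions about: {boundaries}. "
        f"If asked about anything outside your role's knowledge or authority, "
        f"politely refuse in character. "
        f"Emotional responses: {emotions}."
    )

prompt = build_system_prompt(
    name="Mentor Maud",
    role="an emergency management mentor",
    knows=["emergency coordination", "needs assessment", "standard operating procedures"],
    emotional_range={"negotiation stalls": "show measured frustration",
                     "common ground found": "show enthusiasm"},
)
```

Encoding refusal boundaries explicitly is what keeps the agent from answering outside its character's authority, one of the behaviours the integration work above had to establish through iteration.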
Unexpected benefits emerged from integration with Linagora's summarisation agent: transcripts of participant-AI conversations provided rich data about communication strategies, negotiation approaches, and decision-making patterns that facilitators could reference during debriefs, effectively creating detailed observational records impossible to capture through facilitator notes alone. The CoVA experience validated that conversational AI has crossed the threshold into practical applicability for training simulations, particularly for scenarios where conversation is the primary activity and perfect realism is less important than providing authentic interpersonal challenge, whilst also clarifying that continued capability development in speech recognition, response latency, and multilingual support remains essential before the technology can deploy reliably in international humanitarian contexts characterised by linguistic diversity and variable communication infrastructure.

Linagora Meeting Summarisation: Automated Documentation Value

The integration of Linagora's automatic meeting summarisation agent into the XRisis platform workflow addressed a persistent challenge in simulation exercise delivery: capturing rich interaction data for post-exercise analysis without requiring facilitators to manually document all participant communications. The summarisation system processed Rainbow call recordings through speech-to-text transcription and large language model summarisation, generating PDF reports that captured key discussion points, decisions made, disagreements encountered, and action items agreed upon, providing structured reference materials supporting evidence-based debrief conversations. Facilitators reported substantial value from having concrete documentation about what actually occurred during exercise phases rather than relying on memory or incomplete notes taken whilst simultaneously managing scenario progression and participant support, enabling more specific and actionable feedback about team dynamics, communication effectiveness, and decision-making quality. The system proved particularly valuable for complex multi-participant scenarios where multiple conversations occurred simultaneously or in rapid sequence, creating informational density that exceeded human facilitators' ability to observe and retain, with automated transcription ensuring nothing got lost even when facilitators' attention focused elsewhere during critical moments. Integration challenges emerged around processing latency: the summarisation pipeline required several minutes to generate output after call completion, creating gaps in fast-paced exercise sequences where facilitators wanted immediate debrief conversations before participants lost detailed memory of their decisions and reasoning, requiring careful scenario design to incorporate natural break points where processing delays felt appropriate rather than disruptive. 
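The pipeline described above can be sketched as a simple two-stage flow: transcription, then structured summarisation. In the sketch below both stages are stubs standing in for the actual speech-to-text and language-model services, and the report fields are illustrative rather than Linagora's actual output schema:

```python
# Minimal sketch of the post-call pipeline; the transcribe/summarise
# steps are stubs standing in for the real services.

def transcribe(recording: bytes) -> str:
    # Stand-in for the speech-to-text service
    return recording.decode("utf-8")

def summarise(transcript: str) -> dict:
    # Stand-in for LLM summarisation; a real system extracts key
    # points, decisions, disagreements, and action items.
    lines = [l.strip() for l in transcript.splitlines() if l.strip()]
    return {
        "decisions": [l for l in lines if l.lower().startswith("decision:")],
        "actions": [l for l in lines if l.lower().startswith("action:")],
    }

def process_call(recording: bytes) -> dict:
    """Run a completed call recording through the full pipeline."""
    transcript = transcribe(recording)
    report = summarise(transcript)
    report["transcript_length"] = len(transcript)
    return report

recording = b"Decision: relocate the warehouse\nAction: brief the logistics team"
report = process_call(recording)
```

Because both stages run asynchronously after the call ends, the minutes-long latency noted above is inherent to the design; scheduling debriefs around natural break points is the workaround rather than a pipeline fix.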
Transcription accuracy varied based on audio quality, speaker accents, and domain terminology, with occasional misrecognitions creating misleading summary content that facilitators needed to validate against their own observations before using in debrief discussions, introducing quality assurance overhead that reduced the efficiency gains from automation. Privacy and data protection considerations required careful configuration: storing recordings and transcripts of simulation exercises containing potentially sensitive discussions about organisational capacity, operational approaches, and individual performance created data governance obligations under GDPR and organisational policies, requiring explicit participant consent, secure storage with appropriate access controls, and clear retention and deletion schedules balancing learning value against privacy minimisation principles. The platform architecture stored summarisation outputs separately from exercise platforms, enabling granular access control where participants could review their own performance data without accessing other participants' materials and organisational administrators could aggregate anonymised learning analytics without compromising individual privacy. Unexpected benefits emerged from participant access to summarisation outputs: reviewing transcripts of their AI stakeholder conversations helped participants recognise communication patterns they had not consciously noticed during scenario engagement, including tendencies to over-explain, avoid direct questions, make assumptions without verification, or miss opportunities to establish rapport, creating powerful learning moments where evidence-based self-reflection generated insights that facilitator feedback alone would struggle to deliver. 
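The access rule described above, where participants read only their own records while administrators receive de-identified aggregates, can be sketched as a single filtering function. The role names and record shape here are hypothetical:

```python
# Sketch of the granular access rule (names hypothetical):
# participants may read only their own records; administrators
# receive records with identities stripped.

def visible_records(requester: str, role: str, records: list) -> list:
    if role == "participant":
        return [r for r in records if r["participant"] == requester]
    if role == "admin":
        # Aggregate view: drop identities before returning
        return [{k: v for k, v in r.items() if k != "participant"}
                for r in records]
    return []  # deny by default

records = [
    {"participant": "alice", "score": 0.8},
    {"participant": "bob", "score": 0.6},
]
mine = visible_records("alice", "participant", records)
```

Denying by default for unknown roles keeps the policy aligned with the privacy-minimisation principle the paragraph above describes.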
The integration experience validated automatic summarisation as a valuable capability for training platforms that generate rich verbal interaction data, particularly when combined with facilitator interpretation and participant reflection rather than attempting fully automated assessment that current language models cannot reliably provide, finding the appropriate balance between human expertise and algorithmic support.

Integration Challenges and Technical Solutions

The process of assembling four distinct CORTEX2 enabling technologies within a unified platform exposed numerous integration challenges that required creative technical solutions and architectural adjustments. API maturity varied substantially across components: Alcatel Lucent Enterprise's Rainbow SDK provided production-grade interfaces with comprehensive documentation and responsive support, whilst some CORTEX2 research components remained earlier in the development lifecycle with evolving interfaces and occasional breaking changes that required adapter layers to insulate XRisis application code from upstream modifications. Cross-component state synchronisation proved complex when multiple systems needed to coordinate: for example, when a facilitator triggered a scenario phase transition, the Unity application needed to update participant environments, the WebSocket server needed to distribute new inject content, Rainbow needed to initiate or terminate specific communication channels, and AI agents needed to adjust their behaviour to match scenario progression, requiring careful orchestration to ensure changes occurred in appropriate sequences without race conditions or partially inconsistent state. Performance profiling revealed that running multiple AI dialogue agents simultaneously whilst maintaining real-time 3D environment rendering and network synchronisation pushed hardware requirements beyond what participants with modest computers could support, requiring strategic decisions about where to allocate computational budget and whether certain capabilities should run server-side rather than client-side to reduce endpoint resource demands.
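One way to avoid the partially inconsistent state described above is to run a phase transition as an ordered sequence of steps with compensating rollback: if any step fails, the steps already completed are undone in reverse order. This is an illustrative strategy, not necessarily the exact mechanism XRisis uses:

```python
# Hypothetical sketch: run transition steps in a fixed order and roll
# back completed steps in reverse if a later one fails, so no
# component is left in a partially transitioned state.

def run_transition(steps):
    """steps: list of (name, do, undo) callables. Returns completed names."""
    done = []
    for name, do, undo in steps:
        try:
            do()
            done.append((name, undo))
        except Exception:
            # Compensate: undo completed steps in reverse order
            for _, completed_undo in reversed(done):
                completed_undo()
            raise
    return [name for name, _ in done]

log = []
steps = [
    ("update_environments", lambda: log.append("env+"), lambda: log.append("env-")),
    ("distribute_injects", lambda: log.append("inject+"), lambda: log.append("inject-")),
    ("reconfigure_calls", lambda: log.append("calls+"), lambda: log.append("calls-")),
]
completed = run_transition(steps)
```

Ordering the steps explicitly, Unity environments first, then inject distribution, then call reconfiguration, removes the race conditions that arise when components react to a transition independently.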
Network connectivity variability presented persistent challenges: participants joining from locations with unstable internet connections experienced intermittent disconnections that existing recovery mechanisms handled poorly, creating frustration and compromising training effectiveness, driving development of enhanced reconnection logic and state recovery capabilities that would gracefully resume participant progress rather than requiring full scenario restarts. Security and authentication coordination across multiple systems required implementing federated identity approaches where participants authenticated once against the XRisis platform but received automatic access to Rainbow communication services, AI dialogue agents, and scenario content without repeated login prompts that would disrupt training flow and create security vulnerabilities from credential proliferation. The technical team discovered that component integration testing required substantially more effort than anticipated: testing individual capabilities in isolation provided confidence they functioned correctly, but validating that they worked together reliably under realistic loads with diverse participant behaviours required elaborate test scenarios that approximated actual workshop dynamics including concurrent users, varied network conditions, and exploratory interactions that might trigger edge cases. Documentation gaps in CORTEX2 component specifications required extensive experimentation to discover correct usage patterns, with the Nuwa team serving as early adopters who identified issues that component providers could address to ease integration for future projects.
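The enhanced reconnection logic mentioned above typically combines two ingredients: capped exponential backoff between retry attempts, and resuming from the last acknowledged state rather than restarting the scenario. A minimal sketch, with the event shape and sequence numbering assumed for illustration:

```python
import random

# Sketch of reconnection with capped exponential backoff plus resume
# from the last acknowledged scenario event (shapes hypothetical).

def backoff_delays(attempts: int, base: float = 1.0,
                   cap: float = 30.0, jitter: float = 0.0) -> list:
    """Delay in seconds before each successive reconnection attempt."""
    delays = []
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        delays.append(delay + random.uniform(0, jitter))
    return delays

def resume_state(server_events: list, last_acked_seq: int) -> list:
    """Replay only the events the client has not yet acknowledged."""
    return [e for e in server_events if e["seq"] > last_acked_seq]

delays = backoff_delays(6)
events = [{"seq": 1, "inject": "briefing"},
          {"seq": 2, "inject": "flood alert"}]
pending = resume_state(events, last_acked_seq=1)
```

Replaying only unacknowledged injects is what lets a participant's progress resume gracefully after a dropout instead of forcing a full scenario restart; adding jitter spreads simultaneous reconnections when many field clients lose the same link.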
Regular mentoring sessions with CORTEX2 consortium members provided essential troubleshooting support when the team encountered inexplicable behaviours or performance limitations, with component experts offering insights about undocumented configuration options, known issues with particular deployment contexts, or workaround approaches that formal documentation had not captured. The integration experience yielded technical architecture lessons applicable beyond XRisis: maintain clear component boundaries with well-defined interfaces enabling independent evolution; implement comprehensive logging and monitoring so that issues can be traced to specific components rather than mysterious system-level failures; design graceful degradation strategies allowing core functionality to continue when advanced features encounter problems; validate integration under realistic conditions including imperfect networks, resource-constrained devices, and diverse user behaviours rather than assuming optimal operational environments. The successful integration demonstrated that assembling complex capabilities from multiple providers remains challenging even with well-designed components and consortium support, requiring skilled integration engineering, tolerance for unexpected issues, and willingness to adapt architectural plans when initial approaches prove impractical, yet delivers substantial value by enabling platforms to incorporate best-of-breed capabilities rather than implementing all functionality in-house with likely lower quality results.

Lessons for Multi-Partner Technology Integration

The XRisis experience assembling capabilities from DFKI, Alcatel Lucent Enterprise, CEA, and Linagora whilst integrating with Mozilla Hubs as the base framework provides actionable guidance for future projects pursuing similar multi-partner technology integration approaches. Establish clear technical ownership and integration responsibility from project outset: XRisis positioned Nuwa as the integration lead responsible for overall system architecture whilst component providers supported through consultation and troubleshooting rather than attempting shared responsibility that would have created coordination overhead and unclear accountability for integration issues. Implement abstraction layers between external components and core application logic so that upstream changes, deprecations, or replacements impact localised adapter code rather than propagating throughout the system, reducing technical debt and enabling component substitution when alternatives prove superior or original providers discontinue support. Prioritise early technical validation of integration points rather than deferring component connection until late development phases: XRisis conducted integration feasibility testing during Phase 1 design specifically to identify potential blockers before committing to architectural approaches that components might not support, discovering several cases where initial integration strategies required adjustment based on actual API capabilities versus assumed functionality. Negotiate clear support agreements with component providers specifying response times, escalation procedures, and knowledge transfer mechanisms so that integration teams can access expertise when encountering issues beyond their independent troubleshooting capability, recognising that component documentation inevitably contains gaps and integration contexts may expose behaviours that providers did not anticipate during development. 
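The abstraction-layer recommendation above is the classic ports-and-adapters pattern: application code depends on a stable internal interface, and a thin adapter translates to whatever the upstream component currently exposes. The class and method names in this sketch are hypothetical, not taken from any CORTEX2 component API:

```python
# Sketch of the adapter-layer principle: the application depends on a
# stable port, and only the adapter changes when an upstream component
# changes. All names here are hypothetical.

class SummariserPort:
    """Stable interface the application code depends on."""
    def summarise(self, transcript: str) -> str:
        raise NotImplementedError

class UpstreamV2Adapter(SummariserPort):
    """Wraps a provider client whose interface may change upstream."""
    def __init__(self, client):
        self._client = client

    def summarise(self, transcript: str) -> str:
        # If the provider renames calls or reshapes payloads, only
        # this adapter needs updating, not the application code.
        return self._client.generate_summary({"text": transcript})["summary"]

class FakeV2Client:
    """Test double standing in for the real provider client."""
    def generate_summary(self, payload):
        return {"summary": payload["text"][:20]}

port: SummariserPort = UpstreamV2Adapter(FakeV2Client())
result = port.summarise("Team agreed to relocate supplies north")
```

A test double like `FakeV2Client` also lets integration teams validate application logic before a research component stabilises, which directly addresses the early-validation recommendation above.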
Plan for technical diversity across component maturity levels, development practices, and quality standards rather than assuming uniform engineering sophistication: some CORTEX2 components operated at production quality whilst others represented research prototypes, requiring different integration strategies and risk management approaches reflecting actual capability rather than aspirational specifications. Invest in comprehensive integration testing that validates not just that components connect successfully but that they deliver acceptable performance under realistic operational loads with diverse failure modes and recovery scenarios, discovering issues before production deployment where failures compromise organisational reputation and user confidence. Maintain regular communication channels with component providers throughout integration lifecycle rather than limiting interaction to formal project reviews, enabling early flagging of emerging issues, sharing of lessons learned across integration projects, and collaborative problem-solving when unexpected challenges arise that no single party can resolve independently. The multi-partner integration experience demonstrated that assembling complex platforms from distributed components remains substantially more challenging than integrated development under single ownership, requiring sophisticated engineering practices, strong project management, clear architectural vision, and collaborative relationships built on mutual commitment to project success rather than narrow optimisation of individual component showcase objectives, yet delivers unique value by combining specialised capabilities that no single organisation could develop independently within reasonable timelines and budgets.