Methodological Framework Design
The XRisis validation methodology addressed dual requirements: assessing the effectiveness of CORTEX2 enabling technology integration and evaluating humanitarian training value for Action Contre la Faim operational deployment decisions. This dual purpose necessitated an evaluation framework combining standardised instruments enabling cross-study comparison (System Usability Scale) with domain-specific assessments ensuring operational relevance (emergency management competency development). The approach employed a mixed-methods design integrating quantitative metrics, which provided measurable outcomes, with qualitative feedback explaining patterns and revealing contextual nuances that numbers alone obscure. Validation timing positioned the evaluation after the platform had achieved sufficient stability for reliable operation yet early enough to incorporate findings into commercial platform evolution, balancing formative assessment (informing improvement) against summative evaluation (determining whether minimum viable quality had been achieved).
Participant Selection and Workshop Structure
Eight participants drawn from Action Contre la Faim's internal emergency roster were organised as two four-person teams representing typical country office emergency response composition (programme leads, logistics specialists, finance managers). Selection criteria ensured participants possessed relevant emergency deployment experience (five of eight had completed field deployments) and simulation exercise familiarity (six of eight had participated in three or more previous exercises), enabling informed assessment comparing XR capabilities against conventional training modalities. Demographic diversity across age ranges (late twenties through late fifties), gender, and prior VR experience levels enabled evaluation of accessibility and usability across varied user profiles rather than optimisation for a narrow demographic segment.
The validation workshop on 14 May 2025 in Paris sequenced three pilot experiences (arrival briefing with AI mentor, collaborative alert and response strategy, implementation simulation with AI stakeholders), with debrief intervals enabling reflection between phases. Total exercise duration was approximately 90 minutes, excluding the separate induction session (15-20 minutes) and the structured post-exercise debrief (30 minutes). A rehearsal workshop on 25 March 2025 with Action Contre la Faim project team members provided critical pre-validation testing, identifying issues requiring resolution before engaging emergency roster personnel.
Quantitative Assessment Instruments
System Usability Scale: Ten Likert-scale statements rated from strongly disagree (1) to strongly agree (5), with alternating positive and negative framing reducing response bias. The scoring formula converts responses into a 0-100 score, enabling comparison against benchmark data from thousands of previous studies across diverse application domains. The instrument captures multiple usability dimensions including learnability, efficiency, memorability, error frequency, and satisfaction through a validated psychometric structure.
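For reference, the standard SUS scoring rule can be expressed in a few lines of Python; the sketch below is illustrative only and not code from the evaluation itself, and the example response pattern is hypothetical.

```python
def sus_score(responses):
    """Convert ten SUS item responses (1-5) into the 0-100 SUS score.

    Odd-numbered items are positively worded: contribution = response - 1.
    Even-numbered items are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are multiplied by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("Expected ten responses, each between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # index 0 = item 1 (positively worded)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Hypothetical response pattern for illustration
print(sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]))  # 80.0
```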
Component Added Value Assessment: Participants rated each of five platform components on a five-point Likert scale (1 = no added value, 5 = substantial added value), specifically regarding contribution to emergency management competency development rather than general satisfaction or enjoyment. Components assessed separately included the informational briefing from an AI avatar, the interactive response strategy tool, team collaboration in the VR coordination office, soft skills practice with AI avatars, and the facilitator debrief in the VR environment, enabling granular identification of differential effectiveness.
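For illustration only, with component names taken from the instrument and rating values purely hypothetical, component-level ratings of this kind might be recorded and compared as follows:

```python
from statistics import mean

# Hypothetical added-value ratings (1-5) from eight participants per component;
# component names follow the instrument, the values are illustrative only.
component_ratings = {
    "Informational briefing from AI avatar": [2, 3, 2, 3, 2, 3, 3, 2],
    "Interactive response strategy tool": [4, 3, 4, 3, 4, 4, 3, 4],
    "Team collaboration in VR coordination office": [4, 4, 3, 4, 4, 3, 4, 4],
    "Soft skills practice with AI avatars": [5, 4, 5, 5, 4, 5, 5, 4],
    "Facilitator debrief in VR environment": [3, 4, 3, 3, 4, 3, 3, 4],
}

# Rank components by mean rating to surface differential effectiveness
for component, ratings in sorted(component_ratings.items(),
                                 key=lambda kv: mean(kv[1]), reverse=True):
    print(f"{component}: mean added value = {mean(ratings):.2f}")
```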
User Satisfaction Measurement: Overall assessment of the VR simulation exercise concept used a five-point scale (1 = very dissatisfied, 5 = very satisfied), supplemented by open-ended questions inviting improvement suggestions and identifying strengths.
Qualitative Data Collection Methods
Post-Exercise Surveys: Written questionnaires combined numerical ratings with open-ended narrative questions inviting detailed feedback on specific strengths, limitations, comparison to conventional training, and recommendations for future development. Surveys were administered immediately following exercise completion, whilst experiences remained fresh in participant memory and before extended debrief discussions that might influence recall or assessment.
Structured Verbal Debriefs: Facilitated group discussions following exercise completion employed open-ended questioning ("What aspects added most value to your learning?" "What frustrated you or created unnecessary difficulty?") rather than yes-no questions or rating scales, inviting narrative responses that participants could elaborate with examples and comparisons. Facilitators documented verbatim responses, preserving participant language and enabling subsequent thematic analysis to identify recurring patterns.
Facilitator and Project Team Debriefs: Separate sessions with facilitators and the wider project team captured perspectives on delivery effectiveness, technical reliability, pedagogical appropriateness, and operational deployment feasibility that participant-focused sessions would not address. These sessions enabled candid discussion of implementation challenges, scenario design decisions, and potential improvements without participants present.
Data Analysis Approach
Quantitative analysis calculated descriptive statistics (means, ranges, distributions) for System Usability Scale scores, added value ratings, and satisfaction measurements, identifying overall trends and variance patterns that revealed the diversity of user experiences. Component-specific ratings enabled direct comparison identifying the highest-value capabilities (soft skills simulation) versus the lowest-value components (theoretical briefing), informing resource allocation priorities for continued development.
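A minimal sketch of this descriptive analysis, using only Python's standard library and hypothetical scores rather than the actual workshop data, might look like this:

```python
from statistics import mean, stdev

def describe(label, scores):
    """Print mean, range, and standard deviation for a set of scores."""
    print(f"{label}: mean = {mean(scores):.1f}, "
          f"range = {min(scores)}-{max(scores)}, sd = {stdev(scores):.1f}")

# Hypothetical values for illustration; not the actual workshop results
describe("SUS (0-100)", [72.5, 65.0, 80.0, 70.0, 77.5, 62.5, 85.0, 75.0])
describe("Overall satisfaction (1-5)", [4, 4, 3, 5, 4, 3, 4, 5])
```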
Qualitative analysis employed thematic coding: researchers independently reviewed narrative responses and debrief transcripts, identified recurring themes and patterns, compared coding frameworks to ensure inter-rater reliability, and synthesised findings into recommendation frameworks. Themes that emerged across multiple data sources (written surveys, participant debriefs, facilitator debriefs) provided confidence that identified patterns represented genuine phenomena rather than isolated observations. Integration of quantitative metrics with qualitative explanations created comprehensive understanding: numbers grounded impressions in measurable outcomes, whilst narratives explained why patterns emerged and what contextual factors shaped user experiences.
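The report does not specify which agreement statistic was used when coding frameworks were compared; Cohen's kappa is one common choice for checking inter-rater reliability between two coders, and the sketch below (with hypothetical theme codes) illustrates the calculation.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' categorical labels on the same excerpts.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each coder's label frequencies.
    """
    assert len(codes_a) == len(codes_b), "Both coders must label the same excerpts"
    n = len(codes_a)
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(codes_a) | set(codes_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical theme codes assigned by two researchers to the same six excerpts
coder_1 = ["usability", "realism", "soft_skills", "usability", "technical", "soft_skills"]
coder_2 = ["usability", "realism", "soft_skills", "technical", "technical", "soft_skills"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")  # 0.78
```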
Methodological Limitations and Mitigations
The sample size (eight participants plus two facilitators) enabled depth of engagement and rich qualitative feedback but limited statistical power for detecting subtle effects or generalising findings to the full population with high confidence. Mitigation involved prioritising clear effect sizes (substantial performance differences across components) over marginal distinctions, supplementing statistical analysis with qualitative triangulation, and framing conclusions to acknowledge sample constraints. Single-site validation (Paris) prevented assessment of cultural variation or deployment context differences that a multi-site evaluation would reveal. Mitigation included deliberate participant diversity within the site and explicit acknowledgement that findings may not generalise to substantially different operational contexts without additional validation.
Self-reported data are susceptible to social desirability bias, where participants provide responses they perceive evaluators prefer rather than authentic assessments. Mitigation employed anonymous surveys reducing attribution concerns, psychologically safe debrief environments that explicitly framed critical feedback as most valuable, and triangulation of self-reports against facilitator observations and system analytics capturing actual behaviour rather than reported perceptions.
Validation Contributions and Transferable Lessons
The methodology demonstrates a rigorous approach to evaluating immersive training technology, combining standardised instruments enabling benchmark comparison with domain-specific assessment ensuring operational relevance. The framework is transferable to other specialised training contexts (healthcare, industrial safety, emergency services) requiring evidence-based assessment beyond technical feasibility demonstrations. Key methodological contributions include component-disaggregated assessment revealing differential value rather than aggregate impressions that obscure variation, integration of quantitative metrics with qualitative explanation providing comprehensive understanding, and dual assessment of usability and training effectiveness, recognising that unusable systems deliver no learning value regardless of pedagogical potential, whilst highly usable systems prove worthless if learning outcomes remain unaffected.
Conclusion
The validation methodology enabled rigorous assessment of the XRisis platform, providing an evidence base for informed deployment decisions rather than speculative assumptions about XR training value. Results demonstrated the methodology's effectiveness: clear identification of high-value applications (implementation simulation), moderate-value components (collaborative planning), and low-value capabilities (theoretical briefing), combined with actionable improvement priorities addressing usability barriers and technical limitations. The approach provides a template for humanitarian sector technology evaluation that balances scientific rigour with operational constraints and timeline practicalities.
Reference
Complete validation methodology documentation is available through the XRisis Deliverable D5 Evaluation Report and the ACF VR SIMEX Final Report, May-June 2025.