
AR Translation Agent for Multilingual Cultural Heritage Tour Accessibility

How XR Ireland developed real-time AR translation using VOXReality ASR and Neural Machine Translation for live museum tours, achieving strong technical performance whilst revealing critical hardware limitations and minority language quality challenges through validation with 37 museum visitors.

Published by Anastasiia P.
Funded by the European Union

This project has received funding from the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Grant agreement number: 101070521

Multilingual Accessibility Challenge in Specialist Heritage Interpretation

Open-air archaeological museums that deliver specialised cultural heritage interpretation through live tour guides and traditional craft demonstrations face persistent language accessibility barriers that exclude international visitors and linguistic minority populations from fully engaging with educational programming. At Āraiši Ezerpils Archaeological Park, expert guides and craft demonstrators possess deep practical knowledge of 9th-10th century Latgalian building techniques, textile production, metalworking, and daily cultural practices, transmitted through hands-on demonstration and oral explanation. Their language capabilities, however, remain limited primarily to Latvian with some English or German proficiency, creating communication barriers for visitors who speak other European languages or who need translation support to follow technical explanations laden with specialised archaeological and craft terminology. The emphasis on practical demonstration and materiality at open-air heritage sites makes conventional screen-based mobile translation applications unsuitable in this context: visitors who focus on a smartphone screen to read translated text miss crucial non-verbal communication, including gesture demonstrations, tool manipulation techniques, material handling processes, and the spatial relationships between craftsperson, materials, and environment that convey information inaccessible through purely linguistic translation.

Operational constraints compound the problem. International tourist groups booking guided tours frequently require language-specific guide assignment, which limits scheduling flexibility and reduces utilisation efficiency when demand for a particular language peaks whilst guide availability constrains capacity, or forces visitors to choose between their preferred tour time with a language mismatch and their preferred language at an inconvenient time. Academic and technical subject-matter experts often lack fluency in the major foreign languages despite possessing authoritative domain knowledge, so institutions cannot leverage their expertise for international visitor engagement without hiring professional interpreters, adding cost and complexity to operational logistics and straining limited heritage institution budgets.

The VAARHeT sub-project Pilot 3 scenario addressed multilingual accessibility through an AR translation agent combining the VOXReality Automatic Speech Recognition (ASR) and Neural Machine Translation (NMT) components, enabling real-time transcription and translation of tour guide speech displayed as text subtitles on mobile devices or AR wearable glasses. This allows visitors to maintain visual attention on physical demonstrations whilst reading translated content, without screen-focused distraction compromising observational learning. XR Ireland developed a mobile Android application with ActiveLook micro-OLED AR wearable integration, deployed on Samsung Galaxy Note10+ 5G smartphones paired via Bluetooth to lightweight glasses that display translated text in the wearer's peripheral vision. The application was validated with 37 participants across five language pair combinations (German-English, English-Latvian, German-Latvian, Latvian-English, Latvian-German) between 14 and 16 July 2025, generating quantitative performance metrics and qualitative user feedback on translation quality, hardware ergonomics, and the application's appropriateness for museum tour enhancement.

Technical Implementation and Neural Machine Translation Integration

The AR translation agent architecture integrated the VOXReality ASR and Neural Machine Translation components through hybrid mobile-cloud processing, distributing the computational workload between edge devices and server infrastructure to optimise latency whilst maintaining translation quality. Visitors activated the mobile application on provided Samsung Galaxy Note10+ devices, optionally paired ActiveLook AR wearable glasses over Bluetooth for hands-free display of translated text, and configured source and target languages through simple dropdown selectors offering German, English, and Latvian, covering the primary visitor demographics at Latvian archaeological sites serving Central European tourism markets. A prominent on-screen push-to-talk button initiated microphone capture during tour guide speech segments, with the VOXReality ASR component processing the audio locally on the mobile device and converting spoken language to text in real time, so that continuous translation did not accumulate delay over extended tour durations. The transcribed text was transmitted to XR Ireland's cloud infrastructure, hosted on NVIDIA A100 GPU servers, where the VOXReality Neural Machine Translation component generated the target language translation through sequence-to-sequence neural models trained on parallel corpora spanning general domain text and specialised cultural heritage terminology, and the translated text was returned to the mobile application for display.

The mobile UI presented a camera passthrough view occupying most of the screen, enabling visitors to observe tour guide demonstrations through the device whilst reading translated subtitle text overlaid at the bottom of the screen, deliberately mimicking video subtitle conventions to establish a familiar interaction pattern and reduce the learning curve. The ActiveLook AR glasses displayed translated text in a compact format optimised for micro-OLED resolution constraints, positioning text in the upper peripheral vision so that the physical environment and the subtitles could be attended to simultaneously without head movement or gaze changes; however, hardware limitations including the small display area, limited brightness in outdoor lighting, and ergonomic compromises for users wearing prescription eyeglasses created substantial usability friction, explored in the validation assessment. The system implemented parcelled speech processing: each push-to-talk press initiated a fixed-duration recording segment transmitted for translation as a discrete chunk rather than as a continuous stream. This interaction pattern required visitors to time their captures so that a complete guide sentence or explanation segment was recorded, since premature button release truncated content and excessive duration captured multiple sentences whose subtitles overwhelmed the display capacity. It represents an inherent trade-off between processing reliability and interaction naturalness that continuous streaming might improve, at the cost of increased technical complexity and latency accumulation risks.
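To make the processing flow above concrete, the following Kotlin sketch expresses the parcelled push-to-talk pipeline as plain interfaces and a session class. The interface names, method signatures, and stub implementations are illustrative assumptions for exposition only, not the actual VOXReality or XR Ireland APIs.

```kotlin
// Illustrative sketch only: names and signatures below are assumptions for exposition,
// not the actual VOXReality or XR Ireland APIs.

/** On-device ASR: converts one recorded audio segment to source-language text. */
fun interface SpeechRecognizer {
    fun transcribe(audio: ByteArray, sourceLang: String): String
}

/** Cloud NMT: sends transcribed text to the translation service and returns target-language text. */
fun interface TranslationClient {
    fun translate(text: String, sourceLang: String, targetLang: String): String
}

/** A display sink: the phone subtitle overlay or the paired AR glasses. */
fun interface SubtitleDisplay {
    fun show(text: String)
}

/**
 * Parcelled push-to-talk flow: each button press yields one fixed-duration audio segment,
 * which is transcribed locally, translated in the cloud, and rendered on every connected display.
 */
class TranslationSession(
    private val asr: SpeechRecognizer,
    private val nmt: TranslationClient,
    private val displays: List<SubtitleDisplay>,
    private val sourceLang: String,
    private val targetLang: String,
) {
    fun onPushToTalkSegment(audio: ByteArray) {
        val transcript = asr.transcribe(audio, sourceLang)                 // local, on-device
        if (transcript.isBlank()) return                                   // nothing captured
        val translated = nmt.translate(transcript, sourceLang, targetLang) // cloud round trip
        displays.forEach { it.show(translated) }                           // phone overlay and/or glasses
    }
}

fun main() {
    // Stub implementations so the sketch runs end to end without real services.
    val asr = SpeechRecognizer { _, _ -> "Dies ist ein Blockhaus aus dem 9. Jahrhundert." }
    val nmt = TranslationClient { _, _, _ -> "This is a log house from the 9th century." }
    val phone = SubtitleDisplay { text -> println("PHONE:   $text") }
    val glasses = SubtitleDisplay { text -> println("GLASSES: $text") }

    val session = TranslationSession(asr, nmt, listOf(phone, glasses), "de", "en")
    session.onPushToTalkSegment(ByteArray(0))   // a real segment would carry the recorded audio bytes
}
```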
Performance monitoring tracked end-to-end latency from push-to-talk activation through ASR transcription, cloud transmission, NMT processing, return transmission, and text rendering on the mobile and wearable displays. The system achieved a median of 2076 milliseconds, an average of 2318 milliseconds, a mean of 2256 milliseconds, a standard deviation of 557 milliseconds, a mean absolute deviation of 444 milliseconds, and a 95th-percentile latency of 3203 milliseconds. This was the highest latency among the three VAARHeT pilots, yet the system still met the project's sub-2500 millisecond KPI threshold in more than 90% of cases, and 91.9% of participants rated the speed as "acceptable" or "very acceptable", demonstrating user tolerance for processing delay when translation utility justified the wait.
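As a worked illustration of how summary figures of this kind and the sub-2500 millisecond KPI share could be derived from per-request logs, the Kotlin sketch below computes the same statistics over an invented sample; the sample values are illustrative only and are not the pilot's raw measurements.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

/** Summary statistics over per-request end-to-end latencies (milliseconds). */
data class LatencySummary(
    val median: Double,
    val mean: Double,
    val stdDev: Double,
    val meanAbsDeviation: Double,
    val p95: Double,
    val shareUnderKpi: Double,
)

fun summarise(latenciesMs: List<Double>, kpiMs: Double = 2500.0): LatencySummary {
    val sorted = latenciesMs.sorted()
    val mean = sorted.average()
    val median = if (sorted.size % 2 == 1) sorted[sorted.size / 2]
                 else (sorted[sorted.size / 2 - 1] + sorted[sorted.size / 2]) / 2.0
    val stdDev = sqrt(sorted.map { (it - mean) * (it - mean) }.average())
    val mad = sorted.map { abs(it - mean) }.average()              // mean absolute deviation about the mean
    val p95 = sorted[((sorted.size - 1) * 0.95).toInt()]           // simple nearest-rank estimate
    val underKpi = sorted.count { it <= kpiMs }.toDouble() / sorted.size
    return LatencySummary(median, mean, stdDev, mad, p95, underKpi)
}

fun main() {
    // Illustrative sample only; the pilot's raw latency log is not published here.
    val sample = listOf(1820.0, 1950.0, 2076.0, 2110.0, 2240.0, 2390.0, 2510.0, 2880.0, 3203.0)
    println(summarise(sample))
}
```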

Validation Methodology and Multilingual Testing Protocol

Validation engaged 37 participants selected using the same inclusion criteria as the other VAARHeT pilots, with particular emphasis on bilingual or multilingual capability so that participants could make an informed assessment of translation quality and accuracy rather than merely evaluating interface usability without the linguistic comprehension to verify output correctness. The language pair testing protocol deliberately expanded beyond the primary German-to-English specification to assess additional combinations opportunistically whenever participants expressed interest in extended testing or required alternative languages: 19 tests evaluated German-to-English translation, the primary use case for Central European tourists visiting Latvian heritage sites; 16 tests assessed English-to-Latvian translation, serving English-speaking visitors attending Latvian-language tours; 3 tests examined German-to-Latvian translation; a single test evaluated Latvian-to-English; and a single test assessed Latvian-to-German. This created an evidence base about Neural Machine Translation performance across European language diversity rather than limiting validation to a single predetermined language pair, and is tallied in the sketch after this paragraph.

The technical failure definition for Pilot 3 extended beyond software malfunction to encompass ergonomic limitations, including inability to wear the AR glasses due to prescription eyewear conflicts, text illegibility caused by display brightness or resolution constraints, and physical discomfort preventing sustained wearable use, recognising that hardware usability proved as critical as software functionality for determining deployment viability in operational museum contexts. Cordula Hansen designed test scenarios measuring task completion across wearable donning and comfort, mobile application startup and Bluetooth pairing, language selection configuration, and translated text reading comprehension on both the AR glasses and the mobile phone display, with success criteria requiring not merely technical operation but comfortable extended use over the 15-20 minute duration typical of guided museum tours. Observation protocols captured assistance requirements, user abandonment, technical failures, and spontaneous user comments revealing expectations, frustrations, and satisfaction drivers that structured questionnaires might not elicit. Post-test surveys assessed first impressions, appropriateness to the museum context, Net Promoter Score, hardware comfort, perceived translation accuracy and reliability, text legibility on both display modalities, subjective response speed, and technostress or cybersickness symptoms, a particular concern given that the wearable hardware could introduce eye strain, focal distance discomfort, or vestibular disruption from peripheral-vision text movement. Participant wellbeing was monitored throughout validation sessions in line with responsible research ethics.
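The test distribution sums to more tests than participants because some visitors tried more than one language pair; the small Kotlin tally below simply restates the reported counts to make that explicit.

```kotlin
fun main() {
    // Test counts per language pair as reported for the validation sessions.
    val tests = linkedMapOf(
        "German → English" to 19,
        "English → Latvian" to 16,
        "German → Latvian" to 3,
        "Latvian → English" to 1,
        "Latvian → German" to 1,
    )
    val participants = 37
    tests.forEach { (pair, n) -> println("$pair: $n tests") }
    println("Total: ${tests.values.sum()} tests across $participants participants " +
            "(some participants tested more than one pair)")
}
```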

Task Completion Results and Hardware Usability Assessment

Usability testing revealed stark contrast between mobile application performance and AR wearable hardware limitations, fundamentally shaping deployment recommendations. Wearable donning and comfort achieved 89.2% successful completion without help, 8.1% requiring assistance with fit adjustment or positioning, zero user abandonment, and 2.7% technical failure from ergonomic incompatibility where participants wearing prescription eyeglasses could not comfortably accommodate ActiveLook glasses simultaneously, immediately highlighting hardware adoption barrier for substantial visitor demographic requiring vision correction. Mobile application startup and Bluetooth pairing to glasses reached only 64.9% completion without help, 32.4% requiring tester assistance with pairing protocols or connection troubleshooting, zero abandonment (participants persisted despite difficulty), and 2.7% technical failure, demonstrating connection complexity as major friction point where Bluetooth pairing conventions proved non-intuitive for approximately one-third of participants requiring explicit guidance completing multi-step configuration process. Language selection interface showed 75.7% successful source language configuration without help, 24.3% requiring assistance interpreting dropdown options or understanding source-target distinction, zero technical failure, whilst target language selection improved to 89.2% completion without help and 10.8% assistance required, suggesting asymmetric difficulty where initial language selector interaction proved more challenging than subsequent repetition of similar UI pattern. Translated text reading and comprehension on AR wearable achieved 83.8% success without help, 10.8% with assistance positioning glasses or adjusting head angle for optimal text visibility, zero abandonment, and 5.4% technical failure from illegibility preventing comprehension, whilst mobile device text reading reached substantially higher 94.6% success without help, zero assistance required, 2.7% user abandonment (preference for wearable despite difficulty), and 2.7% technical failure, clearly demonstrating mobile phone display superiority over AR glasses across usability, reliability, and user preference dimensions. Technical failure analysis revealed AR wearable hardware limitations as dominant constraint: users with prescription eyeglasses reported difficulty wearing ActiveLook glasses comfortably over corrective lenses, text display micro-OLED resolution and brightness proved insufficient for legibility particularly in outdoor lighting conditions, focal distance mismatch between real-world tour guide observation and near-eye text display created eye strain requiring continuous refocusing, and compact display area limited subtitle length forcing awkward text truncation or scrolling. Multiple participants explicitly stated preference for mobile phone display despite initially expecting AR glasses would prove superior, with qualitative feedback including "glasses are awkward, but the application is good", "good idea but not working as expected", and strong sentiment that whilst wearable concept appealed aesthetically, European-manufactured AR hardware available for pilot deployment proved commercially non-viable for sustained text reading applications requiring clarity and comfort over extended tour durations.

Translation Quality Assessment and Language Pair Performance Variance

Translation quality evaluation revealed dramatic performance variance across language pairs fundamentally dependent on training corpus availability and linguistic resource investment for specific languages within VOXReality Neural Machine Translation component. German-to-English translation, representing high-resource language pair with extensive parallel training corpora availability and decades of machine translation research investment, achieved generally acceptable accuracy with participants rating translation as reliable and comprehensible for following tour content, enabling effective communication of archaeological explanations, construction techniques, and historical context despite occasional terminology awkwardness or phrasing that native speakers identified as non-idiomatic but semantically correct. English-to-Latvian and particularly Latvian-involving language pairs demonstrated severe quality degradation with participants reporting "very poor" or "comical" translation output including non-existent word inventions combining morphemes incorrectly, repetitive phrase generation ("it is very important that it is very important" appearing multiple times), grammatical construction errors violating Latvian linguistic rules, and semantic translation failures where target language output conveyed meaning unrelated or opposite to source speech content, rendering output worse than useless by providing misleading information that could confuse visitors more than no translation would. Structured assessment revealed only 8.1% strongly agreed translation felt accurate and reliable, 40.5% agreed, 18.9% neutral, 16.2% disagreed, and 16.2% strongly disagreed, with approximately one-third of participants expressing substantial doubt about translation trustworthiness despite technical system operation appearing functional through successful text generation and display. The Latvian language quality problem proved particularly significant given museum location in Latvia serving primarily domestic visitor population for whom Latvian-language tour translation to English or German would enable international visitors to attend regular scheduled tours rather than requiring special English-language guide booking, whilst poor quality prevented any practical deployment consideration despite representing highest-value use case for Āraiši Ezerpils operational needs. Participant free-text feedback emphasised translation quality as deployment-blocking limitation ("Latvian translation was very bad", "No feedback on save button, Latvian translation was very bad"), whilst praising application concept ("easy to use", "good concept") and mobile interface implementation indicating technical architecture soundness and interaction design appropriateness when translation component performed adequately. This quality variance demonstrates critical European AI development priority: commercial platforms optimised for high-resource languages (English, German, French, Spanish, Chinese) prove insufficient for European heritage sector serving linguistically diverse populations including minority and regional languages requiring dedicated investment in parallel corpus development, domain-specific terminology training, and continuous quality improvement that general commercial translation services do not prioritise sufficiently for specialised cultural heritage vocabulary and grammatical patterns.

Net Promoter Score Analysis and User Satisfaction Patterns

Net Promoter Score calculation yielded negative 14 (12 promoters rating 9-10 out of 10, 8 passives rating 7-8, 17 detractors rating 0-6), representing only VAARHeT pilot receiving net negative recommendation likelihood and indicating substantial user dissatisfaction driven primarily by AR wearable hardware limitations and Latvian translation quality failures rather than fundamental application concept rejection. The detractor population of 17 from 37 total participants (46%) substantially exceeded avatar pilot's 26% detractor rate and VR augmentation's minimal 3% detractor proportion, positioning translation agent in "needs significant improvement" category requiring major refinement before acceptable deployment consideration. Deeper analysis revealed detractor concentration among participants experiencing Latvian language pair translations where quality proved unacceptable, and among users attempting sustained AR glasses use where eye strain and legibility limitations created negative experience overwhelming translation utility benefits. Participants testing German-English translation on mobile device display demonstrated markedly more positive reception, with several rating experience 8-9 out of 10 and providing constructive feedback about functionality improvements rather than fundamental rejection, suggesting successful translation quality combined with usable display hardware could achieve acceptable satisfaction levels comparable to VR augmentation outcomes. Appropriateness to museum context assessment showed 29.7% strongly agreed, 35.1% agreed, 16.2% neutral, 13.5% disagreed, and 5.4% strongly disagreed, achieving 64.8% positive perception substantially lower than VR augmentation's 97.4% but notably higher than avatar's 55.3%, indicating translation application concept resonated with heritage context requirements despite implementation quality limitations. Positive addition to museum experience perception reached 29.7% strong agreement, 35.1% agreement, 16.2% neutral, 13.5% disagreement, 5.4% strong disagreement, with combined 64.8% positive reception again demonstrating majority acceptance tempered by significant minority experiencing substantial dissatisfaction from quality or usability failures. First impression themes divided into hardware assessment ("glasses were interesting but uncomfortable to use", "novel, a bit like in the movies"), translation quality judgement ("translations were good except Latvian which was very poor or comical", "good idea but not working as expected"), and concept validation ("confusing" representing learning curve or expectation mismatch), revealing that approximately half of participants appreciated innovation potential whilst half encountered friction sufficient to diminish enthusiasm despite conceptual merit. Liked most aspects concentrated on ease of use and concept strength, whilst liked least feedback emphasised "glasses were not useful", "no feedback on save button" (UI clarity gap), and "Latvian translation was very bad", providing clear prioritisation for improvement investment focusing translation quality and mobile-only deployment rather than pursuing AR wearable enhancement given commercially available hardware inadequacy for text-intensive applications.
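For readers unfamiliar with the Net Promoter Score arithmetic behind the -14 figure, the Kotlin sketch below applies the standard formula (percentage of promoters minus percentage of detractors) to the counts reported above.

```kotlin
import kotlin.math.roundToInt

fun main() {
    // Promoter/passive/detractor counts as reported for the translation agent pilot.
    val promoters = 12   // rated 9-10
    val passives = 8     // rated 7-8
    val detractors = 17  // rated 0-6
    val total = promoters + passives + detractors                 // 37 participants

    // Standard NPS: % promoters minus % detractors, reported as an integer.
    val nps = ((promoters - detractors) * 100.0 / total).roundToInt()
    println("NPS = $nps")                                          // prints NPS = -14

    val detractorShare = (detractors * 100.0 / total).roundToInt()
    println("Detractor share = $detractorShare%")                  // roughly 46%
}
```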

Voice Interaction Assessment and Translation Speed Performance

Voice interaction evaluation for translation application diverged from welcome avatar and VR augmentation patterns, with participants not noting voice capture as novel feature but rather accepting speech input as expected and appropriate interaction modality for translation applications where audio recording inherently requires microphone activation regardless of interaction paradigm. The translation application assessment table documented that voice interaction represented expected modality for this application category unlike conversation or command contexts where voice proved optional alternative to text input, with participants implicitly accepting ASR transcription as baseline requirement rather than evaluating whether voice added value beyond manual text entry. Positive aspects included very fast input processing and translation generation with median 2076 milliseconds end-to-end latency perceived as acceptably responsive given translation complexity, with 24.3% rating speed as "very acceptable", 67.6% as "acceptable", 5.4% uncertain, 2.7% "not acceptable", and zero "not acceptable at all", achieving 91.9% acceptable-or-better perception exceeding project KPI thresholds and demonstrating VOXReality component performance adequacy. Negative dimensions highlighted processing parcelling limitation where push-to-talk mechanics required manual initiation for each speech segment preventing continuous translation of ongoing guide narration, introducing interaction burden that some participants noted as awkward compared to commercial real-time translation applications offering automatic continuous transcription without user intervention, though others acknowledged parcelling enabled clear segment boundary definition and processing reliability versus streaming approaches introducing synchronisation complexity and error propagation risks. Local minority language support absence proved critical limitation with Latvian native speakers expressing strong preference for native language interface and translation rather than requiring English or German intermediary languages, whilst acknowledging current Latvian translation quality rendered feature unusable despite demand. Participants familiar with commercial translation technologies including Microsoft Translator mobile applications and Meta wearable glasses with integrated translation capabilities noted competitive alternatives already addressed use case with superior hardware ergonomics, larger established user bases, broader language support, and mature quality assurance, questioning differentiation value proposition for VAARHeT translation agent beyond proving VOXReality component integration feasibility within research project context versus establishing defensible commercial positioning.

Hardware Ergonomics and Display Modality Comparison

Hardware assessment revealed ActiveLook AR wearable glasses as principal deployment limitation preventing commercial viability recommendation despite mobile application demonstrating functional adequacy. Comfort evaluation showed only 27% strongly agreed glasses were comfortable to wear, 21.6% agreed, 21.6% neutral, 13.5% disagreed, and 16.2% strongly disagreed, with combined negative perception of 29.7% and neutral-or-negative of 51.3% indicating majority of participants experienced ergonomic issues ranging from mild discomfort to complete rejection. Free-text feedback specified problems including glasses weight causing pressure on nose bridge during extended wear, incompatibility with prescription eyeglasses requiring users to choose between vision correction and translation access, insufficient padding or adjustment mechanisms preventing secure comfortable fit across diverse head shapes and sizes, and general awkwardness of wearing unfamiliar device creating self-consciousness about appearance and concerns about equipment stability ("I thought the glasses could fall off easily"). Text legibility on wearable display achieved only 10.8% strong agreement, 29.7% agreement, 13.5% neutral, 32.4% disagreement, and 13.5% strong disagreement, with combined 45.9% negative perception and merely 40.5% positive assessment indicating text reading proved problematic for majority of users due to micro-OLED resolution limitations, brightness inadequacy particularly in outdoor ambient lighting, focal distance requiring eye refocusing from real-world tour guide observation to near-eye display reading creating continuous accommodation adjustment fatigue, and compact display area constraining subtitle length forcing abbreviation or scrolling that disrupted reading flow. Technostress symptoms specifically attributable to AR glasses included eye strain reported by multiple participants from focal distance switching and sustained near-eye display viewing, mild anxiety about equipment damage or loss during active tour participation, and general awkwardness using unfamiliar wearable technology in semi-public museum setting creating social self-consciousness. Contrasting mobile device text legibility showed 43.2% strong agreement, 45.9% agreement, 8.1% neutral, 2.7% disagreement, and zero strong disagreement, achieving 89.1% positive assessment and clearly establishing mobile phone displays as vastly superior modality for text reading despite sacrificing hands-free benefit that AR glasses theoretically provided. Participant behaviour revealed overwhelming preference for mobile display with most defaulting to phone screen reading even when glasses successfully paired and operational, with testers noting visitors held phones viewing guide through camera whilst occasionally glancing at text rather than relying primarily on wearable display as intended interaction pattern, demonstrating actual usage diverged from designed experience when usability friction made prescribed interaction uncomfortable or impractical. 
This hardware assessment generated unambiguous recommendation: AR wearable glasses for text display applications represent immature technology category in European market lacking commercially viable products suitable for sustained reading tasks, with ActiveLook and comparable devices optimised for brief notification display rather than continuous subtitle consumption, whilst mobile phone screens provide proven reliable comfortable display modality that visitors already carry and understand requiring no additional hardware procurement or adoption burden.

Translation Quality Impact on Deployment Viability and Commercial Positioning

The dramatic translation quality variance across language pairs fundamentally shaped the commercial viability assessment and the strategic positioning of heritage translation applications. That German-English translation proved generally acceptable whilst Latvian translation failed completely demonstrated that Neural Machine Translation deployment viability depends critically on language-specific model training and parallel corpus availability, which vary dramatically across the European linguistic landscape: high-resource languages benefit from decades of computational linguistics research investment, whilst minority languages receive insufficient attention from commercial providers optimising for the largest addressable markets. Heritage institutions serving regional populations require local language support as a non-negotiable baseline rather than an optional enhancement; without adequate Latvian support, domestic visitors are excluded, and the observed quality was too poor even for experimental deployment that might have tolerated occasional errors had the overall utility remained positive. Participant feedback revealed translation accuracy as a deployment-blocking criterion: once users experienced even a single obviously wrong translation generating semantically nonsensical output, the erosion of trust prevented continued use and recommendation to others regardless of subsequent quality, echoing the Pilot 1 AI dialogue findings that heritage contexts demand near-perfect correctness rather than tolerating the occasional errors acceptable in general commercial applications.

Strategic implications suggested that, rather than attempting to develop proprietary translation capabilities competing with established commercial providers, the Culturama Platform should evaluate integration partnerships with translation API providers offering superior quality and broader language support, whilst Nuwa concentrates development investment on unique heritage-specific capabilities including archaeological terminology training, curator-validated translation memory for technical vocabulary, and cultural sensitivity review ensuring translated content maintains appropriate tone and historical accuracy beyond purely linguistic correctness. Competitive positioning analysis comparing the VAARHeT translation agent against Microsoft Translator, Google Translate mobile applications, and Meta smart glasses with integrated translation revealed substantial capability overlap, with commercial alternatives offering broader language support, mature quality assurance, established user bases familiar with their interaction patterns, superior hardware ergonomics from major consumer electronics manufacturers, and zero marginal cost to heritage institutions, versus the custom application development and maintenance a proprietary solution would require. This honest competitive assessment exemplified evidence-driven strategic analysis: validation proving technical feasibility is insufficient justification for commercial development when equivalent or superior alternatives already serve the market need, suggesting the translation application represents a research demonstration of VOXReality component integration rather than a viable commercial differentiation opportunity for Culturama Platform development priorities.

Minority Language Support Requirements for European Heritage Applications

The Latvian translation quality failure generated one of VAARHeT's most strategically significant findings: European cultural heritage XR and AI applications must support minority and regional languages as baseline requirement rather than optional feature, with English-only or major-language-only deployment contradicting cultural missions to serve local communities as primary constituencies whilst inadvertently creating digital exclusion reinforcing linguistic inequalities favouring international visitors over domestic populations. Āraiši Ezerpils museum stakeholders emphasised that technology supposedly enhancing accessibility yet excluding Latvian speakers (majority visitor demographic) proved worse than no technology deployment, raising legitimate concerns about digital innovation serving tourist convenience whilst neglecting local cultural communities that heritage institutions fundamentally exist to serve. This principle extends across European heritage sector characterised by exceptional linguistic diversity including 24 official EU languages, numerous regional and minority languages protected under European Charter frameworks, and countless local dialects and linguistic variations reflecting cultural heritage that institutions preserve and celebrate, requiring technology platforms to support authentic multilingual representation rather than forcing linguistic homogenisation toward dominant languages contradicting cultural diversity preservation missions. Technical implications require substantial AI development investment in parallel corpus creation for minority languages, domain-specific training incorporating archaeological and heritage terminology across linguistic contexts, continuous quality monitoring and improvement cycles, and curator validation workflows ensuring translations maintain cultural appropriateness and historical accuracy beyond purely technical linguistic correctness. Economic challenges emerge from limited commercial incentive for minority language AI development given smaller addressable markets and higher per-language development costs when spreading investment across dozens of languages rather than concentrating on handful of high-resource languages serving majority of potential users, requiring public funding, cultural preservation programme support, or cross-subsidisation models where profitable major language deployments finance minority language development serving cultural equity objectives rather than pure market optimisation. Culturama Platform roadmap incorporating this lesson specifies multilingual foundation architecture supporting English, French, German, Spanish, Italian, Latvian, and Lithuanian as baseline language set with extensibility framework enabling addition of regional languages serving specific heritage institution contexts, whilst acknowledging translation quality assurance and terminology validation require collaboration with linguistic experts and cultural heritage professionals ensuring appropriate representation rather than treating language support as purely technical feature deployable through general-purpose translation APIs without domain customisation.
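One way to read the "multilingual foundation with extensibility" requirement is as a plugin-style language registry. The Kotlin sketch below is a hypothetical illustration of that idea, not the actual Culturama Platform design; the language codes, the curator-validation flag, and the Latgalian example are assumptions added for exposition.

```kotlin
// Hypothetical sketch of an extensible language registry; one way to express the
// baseline-plus-plugins idea, not the actual Culturama Platform architecture.

data class LanguageSupport(
    val code: String,                 // ISO 639 code
    val name: String,
    val curatorValidated: Boolean,    // has heritage terminology been reviewed by curators/linguists?
)

class LanguageRegistry {
    private val languages = mutableMapOf<String, LanguageSupport>()

    fun register(lang: LanguageSupport) { languages[lang.code] = lang }

    fun supported(): List<LanguageSupport> = languages.values.sortedBy { it.code }

    fun requiresValidation(): List<LanguageSupport> = languages.values.filterNot { it.curatorValidated }
}

fun main() {
    val registry = LanguageRegistry()
    // Baseline set named in the roadmap; the validation status shown here is purely illustrative.
    listOf(
        LanguageSupport("en", "English", curatorValidated = true),
        LanguageSupport("fr", "French", curatorValidated = true),
        LanguageSupport("de", "German", curatorValidated = true),
        LanguageSupport("es", "Spanish", curatorValidated = true),
        LanguageSupport("it", "Italian", curatorValidated = true),
        LanguageSupport("lv", "Latvian", curatorValidated = false),
        LanguageSupport("lt", "Lithuanian", curatorValidated = false),
    ).forEach(registry::register)

    // Regional languages can be added later without changing the core platform.
    registry.register(LanguageSupport("ltg", "Latgalian", curatorValidated = false))

    println("Supported: " + registry.supported().joinToString { it.name })
    println("Pending curator validation: " + registry.requiresValidation().joinToString { it.name })
}
```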

User Experience Insights and Interaction Pattern Preferences

Participant feedback revealed a nuanced distinction between appreciation of the translation application concept and criticism of its implementation quality: approximately two-thirds perceived the value proposition, whilst a substantial minority experienced quality poor enough to withhold recommendation despite the conceptual merit. First impressions divided into three themes: hardware assessment expressing interest in the AR glasses whilst acknowledging discomfort ("glasses are awkward but the application is good"), translation quality judgement noting German-English adequacy versus Latvian failure ("Latvian translation is bad, otherwise works well"), and overall functionality perception recognising potential whilst experiencing execution gaps ("good idea but not working as expected"). Smoothness of the translator's reaction to voice input showed 16.2% strong agreement, 56.8% agreement, 13.5% neutral, 10.8% disagreement, and 2.7% strong disagreement, a 73% positive assessment validating ASR responsiveness despite the language coverage limitations. Perceived translation accuracy and reliability showed 8.1% strong agreement, 40.5% agreement, 18.9% neutral, 16.2% disagreement, and 16.2% strong disagreement, with the combined 32.4% negative assessment driven primarily by the Latvian quality failures affecting a substantial portion of the test cohort, whilst German-English testers contributed most of the positive perception.

Participant commentary revealed an expectation that translation applications should provide continuous real-time processing without manual activation: the push-to-talk mechanic was perceived as a limitation for following extended guide narration compared with automatic continuous transcription and translation that permits passive listening without interaction burden. Several participants also wanted a translation history or replay capability enabling review of previously translated segments when comprehension gaps emerged or attention was diverted from the text display at a critical moment; purely real-time display proved insufficient for complex technical content requiring repeated engagement for full understanding. Technostress reporting identified eye strain as the primary discomfort factor, with focal-distance switching between real-world guide observation and near-eye text reading creating continuous accommodation fatigue, whilst the duration of holding the mobile phone was a secondary concern for some users, though generally tolerated for the 10-15 minute tour segments typical of guided museum experiences. These insights informed refined interaction design recommendations: translation applications should default to the mobile display modality with optional AR wearable support for users who specifically request hands-free operation, implement automatic continuous translation initiated once at session start rather than requiring repeated push-to-talk activation, provide a translation history enabling text review and comprehension verification, and offer adjustable display parameters including text size, contrast, background opacity, and positioning to accommodate diverse vision capabilities and preferences.
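These recommendations can be summarised as a default configuration favouring mobile display and continuous capture. The Kotlin data class below is a hypothetical sketch of such settings; the field names, enums, and default values are illustrative assumptions, not the pilot application's actual configuration.

```kotlin
// Hypothetical settings sketch reflecting the interaction recommendations above;
// names and defaults are illustrative, not a specification of the pilot application.

enum class DisplayMode { MOBILE, AR_WEARABLE }
enum class CaptureMode { PUSH_TO_TALK, CONTINUOUS }

data class TranslationSettings(
    val displayMode: DisplayMode = DisplayMode.MOBILE,     // default to phone display; glasses are opt-in
    val captureMode: CaptureMode = CaptureMode.CONTINUOUS, // start once, then translate continuously
    val keepHistory: Boolean = true,                       // allow reviewing earlier segments
    val textScale: Float = 1.0f,                           // adjustable text size
    val backgroundOpacity: Float = 0.6f,                   // subtitle backing opacity
)

fun main() {
    val defaults = TranslationSettings()
    // A visitor who prefers hands-free reading on the glasses can opt in explicitly.
    val wearableUser = defaults.copy(displayMode = DisplayMode.AR_WEARABLE, textScale = 1.3f)
    println(defaults)
    println(wearableUser)
}
```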

Strategic Recommendations and Development Pathway Assessment

Validation evidence generated clear strategic recommendations that diverged substantially from the initial VAARHeT proposal assumptions. The primary recommendation advised against deploying AR wearable glasses for cultural heritage translation applications, given that commercially available European hardware proved inadequate for sustained text reading across ergonomic comfort, display legibility, and user acceptance, whilst mobile phone displays provided a superior experience without procuring additional hardware that would become a sunk cost given low likelihood of adoption. The secondary recommendation identified minority language translation quality as a deployment-blocking requirement, ruling out any Latvian heritage site deployment until the Neural Machine Translation component reaches acceptable accuracy thresholds through additional training investment, domain-specific corpus development, and heritage terminology validation ensuring archaeological and craft vocabulary translates correctly and preserves educational content integrity. The tertiary recommendation questioned commercial differentiation viability, given that established translation applications from Microsoft, Google, and other providers already address museum multilingual accessibility with mature products, extensive language support, proven reliability, and zero marginal deployment cost to heritage institutions, against the development, maintenance, and user adoption burden that a proprietary solution would introduce.

These honest assessments exemplified evidence-driven strategic analysis: validation investments that reveal substantial limitations or competitive disadvantages are as valuable as strongly positive results when the findings prevent wasted commercial development in markets where differentiation is unsustainable or deployment barriers cannot be overcome within reasonable resource investment. For Culturama Platform development, translation capability emerged as a potential integration partnership opportunity rather than a proprietary development priority: leverage established translation API providers for linguistic processing whilst focusing Nuwa development on heritage-specific value, including curator-validated terminology databases, archaeological vocabulary training corpora, integration with museum content management workflows, and multimodal presentation supporting heritage interpretation requirements that general commercial translators do not address. An alternative strategic reading is that the translation agent validation primarily contributed methodological learning and partnership-model evidence applicable to other VAARHeT scenarios rather than establishing a viable standalone product, with the insights about voice interaction expectations, hardware limitations, and minority language requirements informing broader Culturama Platform development beyond translation-specific functionality. The experience demonstrated the value of comprehensive pilot validation that includes scenarios ultimately not recommended for commercial development: negative or qualified results that prevent resource waste on low-viability products are as valuable as positive validation confirming development priorities, and honest acknowledgement of limitations with evidence-based strategic adjustment represents research project success rather than failure when the findings inform better-grounded commercial decisions.

Implications for Culturama Platform Multilingual Architecture

Despite qualified commercial viability assessment for standalone translation agent deployment, the validation generated critical architectural and strategic insights fundamentally shaping Culturama Platform multilingual support requirements. European heritage institutions operating across linguistically diverse regions require authentic multilingual capability as baseline platform feature rather than English-centric system with translation afterthought, with content authoring, curator workflows, visitor experiences, and administrative interfaces all requiring native language support enabling heritage professionals to operate in their working languages without forcing English intermediation that introduces barriers and delays. Translation requirements extend beyond visitor-facing tour guide interpretation to encompass multilingual content management where curators author exhibition descriptions, archaeological explanations, and historical narratives in source language with professional translation validation ensuring accuracy and cultural appropriateness rather than relying on automated translation potentially introducing errors or inappropriate phrasing. Metadata and semantic annotation systems must support multilingual vocabulary and ontology mappings enabling cross-linguistic content discovery and interoperability with European heritage aggregators including Europeana requiring metadata in multiple languages for accessibility and discoverability. Collaborative curation workflows involving distributed experts from different European countries require interface localisation and real-time translation enabling participation regardless of linguistic background, supporting inclusive knowledge creation and heritage interpretation development that reflects European cultural diversity. Voice interaction applications within Culturama including conversational AI guides, voice-activated content navigation, or audio description for accessibility require ASR and NMT component quality substantially exceeding general commercial thresholds given heritage sector accuracy requirements and minority language support obligations. These multilingual architecture requirements position language support as foundational platform capability requiring early investment in extensible language infrastructure supporting plugin-based language addition, curator validation workflows for translation quality assurance, terminology database management enabling heritage-specific vocabulary training, and continuous quality monitoring ensuring translation accuracy maintenance as content libraries expand and language models evolve. The finding that mobile displays prove superior to current-generation AR wearables for text-intensive translation applications informs interface design priorities: Culturama should optimise for smartphone and tablet compatibility as primary deployment targets whilst maintaining awareness of emerging AR hardware evolution that might eventually deliver comfortable extended-duration text reading when next-generation devices address current ergonomic and display limitations, avoiding premature commitment to immature wearable platforms whilst preserving architectural flexibility for future hardware integration when market readiness improves.