Xaia's Scientific Validation

We believe that developers of AI systems should fully describe how their systems work and how data are managed. Here, we provide information about Xaia's data pipeline. For further information, please refer to our published paper in Nature Digital Medicine.

Xaia Data Pipeline

You do not need to provide the Xaia app with your personal information or health information. The app requests only a first name or pseudonym so Xaia knows how to address you. Xaia has been trained by psychotherapists in several different forms of therapy. Using content analysis, Xaia determines which form of therapy is best suited to the topic and applies the corresponding model. This process is dynamic: if the topic changes or the user expresses low interest in a particular form of therapy, Xaia can switch paradigms within a session. Conversations are processed through a HIPAA-compliant server: audio is recorded, transcribed by a speech-to-text AI, and responded to by a large language model (LLM). The audio is then immediately and irrecoverably deleted, and the text is encrypted.
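
To illustrate this per-turn flow, the sketch below shows how a single utterance might move through transcription, therapy-modality selection, and an LLM call. It is a minimal, hypothetical sketch: the helper names (transcribe_audio, classify_topic, call_llm), the modality prompts, and the stubbed return values are placeholders, not Xaia's actual implementation.

```python
# Hypothetical sketch of one conversation turn in a pipeline like Xaia's.
# All helpers below are illustrative stubs, not Xaia's production components.

THERAPY_PROMPTS = {
    "cbt": "Respond as a cognitive-behavioral therapist: gently examine and reframe distortions.",
    "act": "Respond using acceptance and commitment therapy: emphasize values and acceptance.",
    "supportive": "Respond with supportive, person-centered counseling.",
}

def transcribe_audio(audio: bytes) -> str:
    """Placeholder for the speech-to-text step."""
    return "I've been feeling overwhelmed at work lately."

def classify_topic(text: str, history: list) -> str:
    """Placeholder content analysis that picks a therapy modality each turn."""
    return "cbt" if "thought" in text.lower() else "supportive"

def call_llm(messages: list) -> str:
    """Placeholder for the large language model call."""
    return "That sounds exhausting. What part of work feels heaviest right now?"

def process_turn(audio: bytes, history: list) -> str:
    text = transcribe_audio(audio)             # 1. speech-to-text
    del audio                                  # 2. audio is discarded immediately
    modality = classify_topic(text, history)   # 3. content analysis selects the paradigm
    messages = (
        [{"role": "system", "content": THERAPY_PROMPTS[modality]}]
        + history
        + [{"role": "user", "content": text}]
    )
    reply = call_llm(messages)                 # 4. LLM generates the therapeutic response
    history += [{"role": "user", "content": text},
                {"role": "assistant", "content": reply}]
    return reply
```

Because the modality is re-selected on every turn, switching paradigms mid-session falls out naturally from this structure.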

Next, the LLM output is sent through an “Appropriateness Classifier,” a stand-alone AI that detects potentially inappropriate or unhelpful responses. If the classifier is triggered, the LLM is queried again and its new response is re-analyzed. Otherwise, the output is released to a text-to-speech AI and played to the user, with sentiment-analysis metadata controlling Xaia’s expressions. The system’s development involved iterative testing with therapists role-playing clinical scenarios, leading to continuous refinement of its psychotherapeutic communication. Finally, Xaia selects from a library of audiovisual effects a generated reality (GR) environment that best suits the topic of discussion.
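
The gating loop described above can be sketched as follows. This is a hedged illustration only: is_appropriate, analyze_sentiment, synthesize_speech, and the retry limit are assumptions introduced for clarity, not Xaia's actual components or parameters.

```python
# Illustrative sketch of the "Appropriateness Classifier" gating loop.
# Every helper here is a stand-in, not the real Xaia classifier or TTS.

MAX_RETRIES = 3  # assumed retry bound, not a published parameter

def is_appropriate(reply: str) -> bool:
    """Placeholder stand-alone classifier flagging unhelpful or unsafe replies."""
    return "medical diagnosis" not in reply.lower()

def call_llm(messages: list) -> str:
    """Placeholder LLM call."""
    return "It makes sense that you feel that way. Can you tell me more?"

def analyze_sentiment(reply: str) -> dict:
    """Placeholder sentiment analysis driving the avatar's expressions."""
    return {"valence": "warm", "arousal": "calm"}

def synthesize_speech(reply: str, expression: dict) -> bytes:
    """Placeholder text-to-speech step, parameterized by expression metadata."""
    return reply.encode("utf-8")

def safe_reply(messages: list) -> bytes:
    reply = call_llm(messages)
    for _ in range(MAX_RETRIES):
        if is_appropriate(reply):
            break
        reply = call_llm(messages)   # re-query the LLM when a reply is flagged
    expression = analyze_sentiment(reply)
    return synthesize_speech(reply, expression)
```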

If at any time the user expresses suicidal ideation, they are directed to seek crisis intervention and immediate support and are provided with information for emergency services. If the user raises medical issues outside the scope of talk therapy, Xaia is programmed to advise the user to seek care from a medical healthcare professional. For more information, please refer to our privacy policy and terms of use.
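
As a purely illustrative example of this kind of routing (the trigger terms, messages, and structure below are placeholders and do not represent Xaia's actual safety logic), such checks might run before normal processing of each turn:

```python
# Hypothetical safety-routing sketch; keyword lists and wording are
# illustrative only, not Xaia's implemented crisis or scope handling.

CRISIS_TERMS = ("suicide", "kill myself", "end my life")
MEDICAL_TERMS = ("chest pain", "medication dose", "prescription")

def safety_route(user_text: str) -> str | None:
    text = user_text.lower()
    if any(term in text for term in CRISIS_TERMS):
        # Direct the user to immediate crisis support and emergency services.
        return ("If you are thinking about harming yourself, please contact "
                "emergency services or a crisis line right now.")
    if any(term in text for term in MEDICAL_TERMS):
        # Medical questions are outside the scope of talk therapy.
        return "Please discuss this concern with a medical healthcare professional."
    return None  # no safety trigger; continue with the normal therapy flow
```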

Training and Testing of Xaia

To train Xaia, we initially collected transcriptions of simulated patient-therapist interactions performed by an expert psychotherapist to improve the program’s adherence to the style and cadence of an experienced human therapist. From these, we identified recurring exchanges and encoded the patterns into system prompts for the large language model (LLM). For example, we added system prompts to reframe cognitive distortions, identify automatic negative thoughts, approach conversations without judgment, avoid technical jargon or patronizing language, and manage high-risk scenarios such as suicidal ideation or domestic violence, among more than seventy other psychotherapy best practices.
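
The exact prompt text has not been published; the snippet below is only a sketch of how such best-practice directives might be assembled into an LLM system prompt, with wording invented for illustration.

```python
# Hypothetical example of encoding psychotherapy best practices as LLM system
# prompts; the directive wording is illustrative, not Xaia's actual prompts.

BEST_PRACTICE_DIRECTIVES = [
    "Gently help the user reframe cognitive distortions rather than correcting them bluntly.",
    "Identify automatic negative thoughts and reflect them back for examination.",
    "Approach every topic without judgment.",
    "Avoid technical jargon and patronizing language.",
    "If the user mentions suicidal ideation or domestic violence, switch to the crisis protocol.",
]

SYSTEM_PROMPT = (
    "You are a compassionate, non-judgmental talk therapist.\n"
    + "\n".join(f"- {d}" for d in BEST_PRACTICE_DIRECTIVES)
)
```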

Working with expert psychotherapists and experienced psychiatrists, we iteratively updated and refined the LLM prompts to approximate the idealized responses of a compassionate, non-judgmental, and helpful therapist. The system was then systematically evaluated by licensed mental health professionals assuming the roles of patients across a wide range of clinical scenarios (e.g., anxiety, depression, work-life balance, relationship issues, trust issues, post-traumatic stress, grief, self-compassion, emotional regulation, and social isolation, among other reasons people seek talk therapy). Their detailed feedback allowed further refinement and expansion of the system prompts and model responses.

Subsequently, we broadened our user base, enabling interactions with Xaia in a supervised environment. Each transcript was reviewed by a mental health expert to pinpoint areas for improvement. This iterative cycle of prompt adjustment, evaluation, and expert review was repeated more than one hundred times, and we continued until feedback consistently indicated substantial improvement.

We then tested the performance of Xaia with several hundred additional simulated dialogues with digital standardized patients (DSPs), each one an AI simulation of a human client with anxiety or depression. We examined all transcripts for evidence of inappropriate advice and measured tone with the Linguistic Inquiry and Word Count (LIWC-22) tool. We further examined whether LIWC-22 scores varied by race, ethnicity, gender, or income. We did not find evidence of variation when systematically varying sociodemographic features of the DSPs (data in submission).
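
One way such a check could be run is sketched below. This is not the published analysis: it assumes LIWC-22 scores have been exported to a CSV, and the file name ("liwc_scores.csv") and column names ("group", "tone") are placeholders chosen for illustration.

```python
# Hypothetical fairness check: compare a LIWC-22 tone score across DSP
# sociodemographic groups using a one-way ANOVA. File and column names
# are placeholders; LIWC-22 scoring itself is done outside this script.

import pandas as pd
from scipy.stats import f_oneway

scores = pd.read_csv("liwc_scores.csv")  # one row per simulated dialogue
groups = [g["tone"].values for _, g in scores.groupby("group")]

# Does mean tone differ across groups? A small p-value would suggest variation.
stat, p_value = f_oneway(*groups)
print(f"F = {stat:.2f}, p = {p_value:.3f}")
```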

After confirmatory testing, we invited consenting participants with anxiety or depression to engage with Xaia in a single therapy session through an IRB-approved study at Cedars-Sinai. Participants rated Xaia as acceptable, helpful, and safe. Full results are published in Nature Digital Medicine, with additional studies ongoing.