CRYSTAL-RTTH

Fall School

2025

The only intensive European school on speech analytics for conversational systems

Gain insights

at Fall School

Join us to connect, learn, and be inspired by the forefront of conversational systems.

This four-day school brings together the brightest minds in speech and conversational technologies to explore the latest trends, share ideas, and network.

Join us to gain insights from keynote speakers and discover cutting-edge advancements shaping the future of technology.

Keynote Speakers:

Christian Dugast

Apptek, Germany

Dayana Ribas

SmartVoice, BTS, Spain

Juan Rafael Orozco

Universidad de Antioquia, Colombia

Antonio Miguel Peinado

Universidad de Granada, Spain

Programme

Day 1: Opening ceremony, Keynote speeches, Panel discussions, Networking sessions

Day 2: Keynote speeches, Panel discussions, Networking sessions, Coffee breaks

Day 3: Keynote speeches, Panel discussions, Networking sessions, Coffee breaks

Day 4: Keynote speeches, Panel discussions, Networking sessions, Closing ceremony

Keynotes & workshops

Participate in theoretical keynotes and practical workshops, and discover cutting-edge advancements shaping the future of speech analytics in conversational systems.

Machine Learning

AI

Health

Speech Analytics & Processing

Conversational Systems

Spoken dialogue

Research & Innovations

Unleashing Innovation:

International School of Conversational Systems

Partners:

Venue

Bizkaia Aretoa

Abandoibarra Etorb., 3, Abando, 48009 Bilbao, Bizkaia, Spain

Join us this year at CRYSTAL-RTTH Fall School 2025

Secure your spot today! Early bird discounts are available until October 15, 2025.

Programme

Keynote Speakers

11 NOV · 09:30-10:00 · Opening. Start of the Fall School.
11 NOV · 10:00-11:30 · Keynote: Juan Ignacio Godino
11 NOV · 11:30-12:00 · Coffee Break. Coffee and networking.
11 NOV · 12:00-13:00 · Workshop: Juan Ignacio Godino
11 NOV · 13:30-15:30 · Lunch at Torre Iberdrola
11 NOV · 15:30-17:00 · Keynote: Stephan Schlögl
12 NOV · 09:30-11:00 · Keynote: Dayana Ribas
12 NOV · 11:00-11:30 · Coffee Break. Break with coffee.
12 NOV · 11:30-13:30 · Workshop: Dayana Ribas
12 NOV · 13:30-15:30 · Lunch at Torre Iberdrola
12 NOV · 15:30-17:00 · Keynote: David Thulke
13 NOV · 09:00-10:30 · Keynote: Juan Rafael Orozco
13 NOV · 10:30-11:00 · Coffee Break. Coffee and networking.
13 NOV · 11:00-12:30 · Keynote: Christian Dugast
13 NOV · 12:30-14:00 · Lunch at Torre Iberdrola
13 NOV · 14:00-18:00 · Cultural Trip. Visit Mutriku.
14 NOV · 09:30-11:00 · Keynote: Antonio Peinado
14 NOV · 11:00-11:30 · Coffee Break. Coffee and networking.
14 NOV · 11:30-13:30 · Workshop: Antonio Peinado
14 NOV · 13:30-14:00 · Closing. Farewell and acknowledgments.
Dayana Ribas

Lead Scientist – SmartVoice, BTS

She leads the scientific and innovation strategy of SmartVoice, the conversational intelligence unit of BTS – Business Telecommunications Services, where she drives research, development, and innovation initiatives in conversational analytics using advanced artificial intelligence and audio processing technologies.

She holds a Ph.D. in Sciences (2016) and completed multiple postdoctoral appointments through international research initiatives. Author of numerous scientific publications and several patents, her expertise includes machine learning applied to speech and audio signals, robust speech enhancement, paralinguistic information analysis (e.g., emotion and pathological speech recognition), biometrics, and voice quality assessment.

In academia, she has taught undergraduate and graduate courses in Audiovisual Communications and Speech Technologies within Telecommunications Engineering programs. She continues her academic involvement as an associate researcher with the VIVOLAB group at the University of Zaragoza, where she contributes to research and technology transfer projects and supervises students. In 2023, she transitioned to the industry, taking on the leadership of R&D+i at BTS, where she combines solid scientific rigor with a clear orientation toward real-world impact, ensuring that voice—both as a channel and as data—becomes a strategic source of value for businesses and the digital ecosystem.

David Thulke

Scientist – AppTek

He works on large language models (LLMs) with a particular interest in speech. He is currently contributing to AppTek’s research on voice-enabled LLMs, collaborating with colleagues in speech recognition and text-to-speech. His broader research centers on advancing foundation models and applying them to diverse natural language understanding and generation tasks, with a strong emphasis on improving factuality in the context of retrieval-augmented generation.

He was the first author of ClimateGPT, a family of domain-specific LLMs for climate-related information retrieval and factual text generation. He is also completing his PhD at RWTH Aachen University in the Machine Learning and Human Language Technology Group under Prof. Hermann Ney, focusing on natural language understanding, pre-training of LLMs and retrieval-augmented generation, with earlier research experience in speech recognition from his master’s studies.

Christian Dugast

Scientist – AppTek

Christian received his Ph.D. degree in Computer Science from the University of Toulouse (France) in 1987. Having started his career as a research scientist in Automatic Speech Recognition (ASR), he was the first to propose a hybrid approach for ASR combining Neural Networks and Hidden Markov Models (IEEE paper, 1994).

Christian had the honour to present the first commercially available continuous speech dictation product worldwide (Eurospeech ’93 in Berlin).

In 1995, he left research for the business world. Building up the European subsidiary of Nuance Communications, he gained extensive experience in bringing complex technologies to market. In 2012, he returned to research, working in Karlsruhe with Prof. Alex Waibel on Speech Translation and at the German Research Center for Artificial Intelligence with Prof. Josef van Genabith on Machine Translation, before rejoining Prof. Hermann Ney (with whom he had worked in the 1990s) in 2018 to work on Natural Language Understanding as Lead Architect for AppTek.

Antonio Miguel Peinado

PhD – Universidad de Granada

Antonio M. Peinado (M’95–SM’05) received the M.S. and Ph.D. degrees in Physics (electronics specialty) from the University of Granada, Granada, Spain, in 1987 and 1994, respectively.

In 1988, he worked with Inisel as a Quality Control Engineer. Since 1988, he has been with the University of Granada, where he has led and participated in multiple research projects related to signal processing and transmission.

In 1989, he was a Consultant with the Speech Research Department, AT&T Bell Labs, Murray Hill, NJ, USA, and, in 2018, a Visiting Scholar with the Language Technologies Institute of CMU, Pittsburgh, PA, USA. He has held the positions of Associate Professor from 1996 to 2010 and Full Professor since 2010 with the Department of Signal Theory, Networking and Communications, University of Granada, where he is currently Head of the research group on Signal Processing, Multimedia Transmission and Speech/Audio Technologies. He has authored numerous publications in international journals and conferences, and has co-authored the book Speech Recognition Over Digital Channels (Wiley, 2006).

His current research interests are focused on several speech technologies such as antispoofing for automatic speaker verification, deepfake detection, voice watermarking and speech enhancement. Prof. Peinado has been a reviewer for a number of international journals and conferences, an evaluator for project and grant proposals, and a Member of the technical program committee of several international conferences.

Juan Rafael Orozco Arroyave

Professor – Universidad de Antioquia

Prof. Juan Rafael Orozco-Arroyave received the B.S. degree in Electronics Engineering from Universidad de Antioquia (UdeA) in 2004, after which he received a postgraduate degree in Marketing from EAFIT University.

In 2011 he finished his M.Sc. in Telecommunications at UdeA, and in 2015 he received his Dr.-Ing. degree in Computer Science from Friedrich-Alexander-Universität Erlangen-Nürnberg (Erlangen, Germany) in a cotutelle program with the Faculty of Engineering at UdeA.

Currently he is a Full Professor at Universidad de Antioquia, where he leads GITA Lab, and an Adjunct Researcher at the Pattern Recognition Lab of Friedrich-Alexander-Universität Erlangen-Nürnberg. His research interests include Speech Processing, Pattern Recognition, Multimodal Analysis, Digital Signal Processing, and Signals Theory.

Juan Ignacio Godino

Scientist – Universidad Politécnica de Madrid

Juan I. Godino-Llorente was born in Madrid, Spain, in 1969. He received the B.Sc. and M.Sc. degrees in Telecommunications Engineering and the Ph.D. degree in Computer Science in 1992, 1996 and 2002, respectively, all from Universidad Politécnica de Madrid (UPM), Spain. From 1996 to 2003 he was with UPM as an Assistant Professor at the Circuits and Systems Engineering Dept. From 2003 to 2005 he joined the Signal Theory and Communications Dept. at the University of Alcalá. In 2005, he rejoined UPM, serving as Head of the Circuits and Systems Engineering Dept. from 2006 until 2010. Since 2011 he has been a Full Professor in the field of Signal Theory and Communications. In 2006, he won the associate professor position after a national qualifying competitive call with 130 candidates, in which he was ranked 1st.

During the academic term 2003-2004, he was a Visiting Professor at Salford University, Manchester, UK; and in 2016, he was a Visiting Researcher at the Massachusetts Institute of Technology, USA, funded by a Fulbright grant. He has served as editor for the IEEE Journal of Selected Topics in Signal Processing, the IEEE Trans. on Audio, Speech and Language Processing, the Speech Communication Journal, and the EURASIP Journal of Advances in Signal Processing. He has participated as an invited speaker in several international advanced schools, and has delivered more than 20 invited speeches at different universities and events, including Harvard University, Johns Hopkins University, Tampere University, and the National University of Colombia.

He has chaired the 3rd Advanced Voice Function Assessment Workshop and the 1st and 2nd Automatic Assessment of Parkinsonian Speech Workshops. Likewise, since 2004 he has been part of different panels of experts of the European Commission, and has been national coordinator of the COST 2103 and COST CA24128 Actions. He is also an expert evaluator of research proposals for the Spanish, Portuguese, Latvian, Polish, Israeli, Czech, Icelandic, Romanian, Belgian, and Norwegian research agencies.

He has published more than 80 papers in international journals included in the Science Citation Index and more than 50 in top-ranked conferences. The international impact of his research is reflected in more than 6000 citations (h-index = 40). He has also led 10 competitive projects and 12 research projects financed by companies and public institutions. This work has been recognized with: the BSc Thesis Extraordinary Award 1992; the UPM Extraordinary PhD Award 2001/2002; the 2004 Award for Research or Technological Development for Professors of the UPM; the 2002 "SIDAR-Universal Access" Award; finalist of the 2009 Best Paper Award of the IEEE Engineering in Medicine and Biology Conference; the 2010 and 2018 best research paper awards of the Spanish Excellence Network on Speech Technology; finalist of the 2012 Best Demo Award of the same network; and the 2015 Entrepreneur Award of IEEE Spain with the startup IngeVox. Moreover, he has been appointed a Fulbright Scholar, Senior Member of the IEEE, ELLIS member, and honorary professor at the National University of Colombia.

Stephan Schlögl

Prof. Dr. – MCI

Prof. Dr. Stephan Schlögl holds an MSc in Human-Computer Interaction from University College London and a PhD in Computer Science from Trinity College Dublin. His main research focuses on natural language based human-computer interaction.

In his doctoral research, he investigated Wizard of Oz prototyping as a design and research instrument. This work was continued through a post-doctoral position at Institute Mines Télécom, Télécom ParisTech. There, Dr. Schlögl was involved in the development of an adaptive communication interface to support simple and efficient interaction with elderly users. His research usually includes the investigation of people interacting with different types of natural language user interfaces as well as the early-stage prototyping of these interfaces.

In November 2013, Dr. Schlögl joined the MCI as a faculty member. Alongside his research, he teaches courses in Human-Computer Interaction, Software Engineering, Business Analytics, Artificial Intelligence, and Research Methods. He has also co-authored more than 100 international research papers and has been a member of various regional, national, and European research projects such as AI4VET4AI (https://www.ai4vet4ai.eu/), EMPATHIC (https://cordis.europa.eu/project/id/769872), and CRYSTAL (https://project-crystal.eu/).

"

Keynotes & Sessions

Verifying call center agent script compliance while correcting ASR errors

Call center providers need to verify whether their agents comply with certain external or internal rules, such as disclaimers, or checks like "has the product name been mentioned?", "is the announced price correct?", or "does the summary of the call contain all necessary information?".

We describe a two-step production system that automatically i) identifies calls that are potentially non-compliant (partially or not at all) and ii) presents these potentially non-compliant calls to human annotators for verification.

For each call, the automatic system receives as input its Automatic Speech Recognition (ASR) transcript (best word sequence), the related confusion matrix (alternative words), and a script that describes what needs to be compliant (word sequences).

The script is pre-processed and split into non-interruptible phrases (called snippets, typically noun phrases) and passed to the compliance system, which processes the call in four passes:

i) Positioning and scoring snippet sequences in the call (ASR verbatim based)

ii) Rewording the set of best snippets (using the confusion matrix and the script to generate a corrected verbatim)

iii) Making use of an NLU component to transform the corrected verbatim into a semantic representation

iv) Positioning and rescoring snippet sequences based on semantic representation

At the end of the four passes, a compliance report is produced that can be analysed by a human reviewer.
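The four passes above can be sketched as a minimal pipeline. The sketch below is illustrative only, not AppTek's implementation: snippet positioning is approximated by word-overlap scoring, rewording keeps only alternative words that appear in the script, pass iii (the NLU step) is omitted, and every name and data structure is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Call:
    verbatim: list       # best ASR word sequence
    alternatives: dict   # word position -> alternative words from the ASR

def position_snippets(verbatim, snippets):
    """Passes i/iv: locate each snippet in the verbatim, scoring by word overlap."""
    scores = {}
    for snippet in snippets:
        words = snippet.split()
        best = 0.0
        for start in range(len(verbatim) - len(words) + 1):
            window = verbatim[start:start + len(words)]
            best = max(best, sum(w == v for w, v in zip(words, window)) / len(words))
        scores[snippet] = best
    return scores

def reword(call, snippets):
    """Pass ii: replace words with ASR alternatives that appear in the script."""
    script_vocab = {w for s in snippets for w in s.split()}
    corrected = list(call.verbatim)
    for pos, alts in call.alternatives.items():
        for alt in alts:
            if alt in script_vocab:
                corrected[pos] = alt  # correct a likely ASR error
                break
    return corrected

def compliance_report(call, snippets, threshold=0.8):
    """Flag snippets still unmatched after rewording, for human review."""
    scores = position_snippets(reword(call, snippets), snippets)
    return {s: ("ok" if sc >= threshold else "review") for s, sc in scores.items()}

# Example: the ASR misheard "may" as "way"; the alternatives recover it,
# so the disclaimer snippet matches after rewording.
call = Call("this call way be recorded for quality".split(), {2: ["may", "way"]})
print(compliance_report(call, ["call may be recorded"]))
```

The point of the two-stage scoring is visible here: a call that fails the verbatim match can still pass once plausible ASR errors are corrected against the script, keeping false alarms down without hallucinating words the recognizer never hypothesized.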

This production system runs under the following constraints:

i) Being able to correct ASR errors while keeping objectivity (no hallucination)

ii) Having very high recall (near-zero false negatives)

iii) Having high precision (fewer than one third false positives)

iv) Keeping computing costs reasonable (a real-time factor of 1/10)

Day 1

9:00-12:00

Christian Dugast

Secure voice processing and communication

As digitalization becomes increasingly prevalent in our society, the generation of spoofed and fake multimedia content may pose a serious threat to various aspects and activities.

Thus, false content may be employed for impersonation, disinformation, or fraudulent access to automated services, and can be forged in many different ways, from very simple signal manipulation to the application of recent machine learning techniques, which have considerably increased the plausibility and damaging capability of the generated content. This talk will focus on the detection of spoofed and fake speech signals.

Two different approaches have been proposed to detect these attacks. On one hand, passive solutions try to directly determine whether a given speech utterance is genuine or spoofed/fake by a deep analysis of the signal itself. This is the most common approach and has been extensively studied over the last decade, boosted by a number of challenges dealing with different types of spoofed or fake speech signals.

On the other hand, proactive solutions require the collaboration of the digital content provider, who must watermark the speech signal in order to allow the detection of its synthetic origin, thus avoiding any malicious use of the generated speech.

Also, we must not forget that the generation of fake contents follows the dynamics of a "cat-and-mouse game," so it is very important to understand how attacks may be forged in order to be ready to combat them. In this talk, we will review how all these topics have evolved over the last years and study the most recent, state-of-the-art techniques. We will also go through a hands-on exercise to put some of the talk's concepts into practice.
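To make the proactive approach concrete, here is a toy sketch of the contract it relies on: the content provider embeds a shared secret pattern into the generated speech, and a detector later checks for that pattern to flag the signal as synthetic. Real schemes use robust, perceptually shaped watermarks that survive compression and resampling; hiding the pattern in sample LSBs, as below, only illustrates the embed/detect interface, and all names here are hypothetical.

```python
KEY = [1, 0, 1, 1, 0, 0, 1, 0]  # shared secret bit pattern (illustrative)

def embed(samples, key=KEY):
    """Return a copy of the integer PCM samples with `key` repeated in the LSBs."""
    return [(s & ~1) | key[i % len(key)] for i, s in enumerate(samples)]

def detect(samples, key=KEY, threshold=0.95):
    """Declare 'synthetic' if the LSB stream matches the key pattern closely enough."""
    if not samples:
        return False
    hits = sum((s & 1) == key[i % len(key)] for i, s in enumerate(samples))
    return hits / len(samples) >= threshold

# Example: watermark a stand-in PCM buffer and detect it; an unmarked
# buffer matches the key only by chance and stays below the threshold.
pcm = list(range(-50, 50))
assert detect(embed(pcm)) and not detect(pcm)
```

The threshold is what gives the detector tolerance to minor signal changes while keeping the chance-match rate of unmarked audio well below the decision boundary.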

Day 1

9:00-12:00

Antonio Peinado

Multimodal analysis of Parkinson’s disease symptoms

Movement disorders (MD) arise from the impaired functioning of certain areas of the brain. Several million people suffer from MDs such as Parkinson's, Huntington's, and others; however, diagnosis and monitoring are still highly subjective, time-consuming, and expensive.

PD is the second most prevalent MD in the world, and the medical evaluations used to assess the neurological state of patients cover different aspects, including activities of daily living, motor tasks, speech production, and mood. Its multi-symptomatic nature makes the diagnosis and monitoring of PD a very challenging task: several bio-signals need to be modeled, and the impact of the disease differs among patients, which adds another level of complexity, especially when aiming at clinically acceptable and practical results.

This talk presents different approaches that consider several bio-signals (e.g., speech, language, gait, and handwriting) and methods of Pattern Recognition with the aim of finding suitable models for PD diagnosis and monitoring. Results with classical feature extraction and classification methods will be presented together with experiments with CNN, LSTM, and attention-based architectures.

Day 1

9:00-12:00

Rafael Orozco-Arroyave

Towards Natural Realtime Voice Interaction with LLMs

Large language models (LLMs) excel at text-based reasoning and generation, but natural realtime voice interaction requires them to listen and speak. This talk surveys approaches to equipping LLMs with speech capabilities, covering both cascaded pipelines and integrated architectures.

I will discuss the requirements for realtime, robust, and natural voice interaction including accurate ASR, expressive TTS, low-latency processing, and support for full-duplex operation, where systems listen and respond simultaneously. On the dialogue side, I will explore how LLMs can support applications ranging from open-ended chit-chat to more constrained task-oriented scenarios that demand explicit state tracking and system-led interaction.

I will also touch on evaluation, highlighting how architecture and application context define the metrics we need. The talk combines an overview of recent literature with AppTek’s ongoing research, reflecting both opportunities and challenges on the path towards natural realtime voice interaction with LLMs.

Day 1

9:00-12:00

David Thulke

Speech Analytics: Transforming Voice Data into Business Action

This lecture explores the journey of developing a conversational analytics technology that leverages speech and audio analytics to transform customer interactions into actionable insights.

We begin by examining how language, voice, and conversation shape perceptions, decisions, and customer satisfaction, drawing on evidence from cognitive science and behavioral research to highlight the powerful role of words and the way they are delivered.

We then present the technology behind conversational analysis, traditionally rooted in automatic speech recognition (ASR) and natural language processing (NLP), and extend it by incorporating information directly from the audio signal—capturing paralinguistic cues such as tone, pauses, overlaps, and emotional markers that reflect the affective dimension of conversations.

Building on this foundation, we share the real-world experience of implementing this methodology within a company setting, demonstrating its impact in contact centers, where speech analytics enables the identification of successful behavioral patterns, the detection of systemic service barriers, and automated quality monitoring—linking conversational dynamics directly to operational performance.

The lecture concludes with insights into the industrialization of a methodology originally conceived in academia, showing how to bridge the gap between research and business application, and culminating in the process of securing a patent to protect and scale this innovation.

Day 1

9:00-12:00

Dayana Ribas

Assessment of voice and health disorders from speech

In recent years, a research domain has emerged that seeks to advance and adapt speech technology for the analysis and evaluation of disordered speech. Within this context, acoustic analysis has established itself as a non-invasive and effective methodology for the objective assessment of vocal function.

Moreover, it constitutes a complementary approach to evaluation techniques based on direct visualization of the vocal folds through video endoscopy. The integration of these methods provides the foundation for the early detection and monitoring of not only voice disorders, but also neurological and respiratory conditions such as Parkinson’s disease, Alzheimer’s disease, and Obstructive Sleep Apnea, all of which also manifest in alterations of the phonatory process.

The application of acoustic and speech-based methodologies extends beyond clinical practice. Their relevance has also been demonstrated in forensic acoustics, in the assessment and preservation of vocal quality among professional voice users, and in the evaluation of extralinguistic factors such as stress and fatigue.

The purpose of this module is to present an overview of the state of the art in this field, with particular emphasis on those interpretable methodologies currently employed for the diagnosis and characterization of voice pathologies.

The presentation will review key contributions developed over the past decade at the Universidad Politécnica de Madrid, Spain, and will conclude with perspectives on emerging research directions.

Day 1

9:00-12:00

Juan Godino

Registration

Students

200€ (until 02/11)

250€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

CRYSTAL / RTTH

Members

0€ (until 02/11)

50€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

Senior / Industry

250€ (until 02/11)

300€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

Frequently asked questions

Got questions? Find answers to common questions about the event below.

What exactly is the CRYSTAL-RTTH Fall School 2025?

It is a four-day, in-person event all about conversational systems in the heart of Bilbao. It is THE event for PhD students, young researchers, industry experts, and anybody who wants to learn and be a pioneer in this sector.

Where can I stay in Bilbao?

Bilbao offers a wide range of accommodation options to suit every preference and budget. From boutique hotels to modern apartments, you'll find plenty of choices throughout the city. The Bilbao tourism official website lists various options to help you find the perfect stay.

If you're looking to stay close to the event venue, consider the Guggenheim area, the Centro Azkuna area, or the San Mamés area, all of which offer excellent hotel choices. However, Bilbao is a compact city, and most accommodations are within a 15 to 30-minute walk from the venue, ensuring convenient access no matter where you stay.

What do I need to prepare before joining?

Just bring an open mind and your desire to learn. You don’t need any experience, a product, or fancy equipment.

What if I have other questions?

We’re here to help! Feel free to reach out to our support team or message us on social and someone from our team will get back to you as soon as possible.

Email: [email protected]

CRYSTAL-RTTH

Fall School

2025

Follow us on

2025 © All rights reserved