CRYSTAL-RTTH

Fall School

2025

The only intensive European school on speech analytics for conversational systems

Gain insights

at Fall School

Join us to connect, learn, and be inspired by the forefront of conversational systems.

This four-day school brings together the brightest minds in speech and conversational technologies to explore the latest trends, share ideas, and network.

Join us to gain insights from keynote speakers and discover cutting-edge advancements shaping the future of technology.

Keynote Speakers:

Christian Dugast

Apptek, Germany

Dayana Ribas

SmartVoice, BTS, Spain

Juan Rafael Orozco

Universidad de Antioquia, Colombia

Antonio Miguel Peinado

Universidad de Granada, Spain

Programme

Day 1: Opening ceremony, Keynote speeches, Panel discussions, Networking sessions

Day 2: Keynote speeches, Panel discussions, Networking sessions, Coffee breaks

Day 3: Keynote speeches, Panel discussions, Networking sessions, Coffee breaks

Day 4: Keynote speeches, Panel discussions, Networking sessions, Closing ceremony

Keynotes & workshops

Participate in theoretical keynotes and practical workshops, and discover cutting-edge advancements shaping the future of speech analytics in conversational systems.

Machine Learning

AI

Health

Speech Analytics & Processing

Conversational Systems

Spoken dialogue

Research & Innovations

Unleashing Innovation:

International School of Conversational Systems

Partners:

Venue

Bizkaia Aretoa

Abandoibarra Etorb., 3, Abando, 48009 Bilbao, Bizkaia, Spain

Join us this year at CRYSTAL-RTTH Fall School 2025

Secure your spot today! Early bird discounts are available until October 15, 2025.

Programme

Keynote Speakers

11 NOV · 09:30-10:00 · Opening. Start of the Fall School.
11 NOV · 10:00-11:30 · Keynote: Juan Ignacio Godino
11 NOV · 11:30-12:00 · Coffee Break. Coffee and networking.
11 NOV · 12:00-13:00 · Workshop: Juan Ignacio Godino
11 NOV · 13:30-15:30 · Lunch at Torre Iberdrola
11 NOV · 15:30-17:00 · Keynote: Stephan Schlögl
12 NOV · 09:30-11:00 · Keynote: Dayana Ribas
12 NOV · 11:00-11:30 · Coffee Break. Break with coffee.
12 NOV · 11:30-13:30 · Workshop: Dayana Ribas
12 NOV · 13:30-15:30 · Lunch at Torre Iberdrola
12 NOV · 15:30-17:00 · Keynote: David Thulke
13 NOV · 09:00-10:30 · Keynote: Juan Rafael Orozco
13 NOV · 10:30-11:00 · Coffee Break. Coffee and networking.
13 NOV · 11:00-12:30 · Keynote: Christian Dugast
13 NOV · 12:30-14:00 · Lunch at Torre Iberdrola
13 NOV · 14:00-18:00 · Cultural Trip. Visit Mutriku.
14 NOV · 09:30-11:00 · Keynote: Antonio Peinado
14 NOV · 11:00-11:30 · Coffee Break. Coffee and networking.
14 NOV · 11:30-13:30 · Workshop: Antonio Peinado
14 NOV · 13:30-14:00 · Closing. Farewell and acknowledgments.
Dayana Ribas

Lead Scientist – SmartVoice, BTS

She leads the scientific and innovation strategy of SmartVoice, the conversational intelligence unit of BTS – Business Telecommunications Services, where she drives research, development, and innovation initiatives in conversational analytics using advanced artificial intelligence and audio processing technologies.

She holds a Ph.D. in Sciences (2016) and completed multiple postdoctoral appointments through international research initiatives. Author of numerous scientific publications and several patents, her expertise includes machine learning applied to speech and audio signals, robust speech enhancement, paralinguistic information analysis (e.g., emotion and pathological speech recognition), biometrics, and voice quality assessment.

In academia, she has taught undergraduate and graduate courses in Audiovisual Communications and Speech Technologies within Telecommunications Engineering programs. She continues her academic involvement as an associate researcher with the VIVOLAB group at the University of Zaragoza, where she contributes to research and technology transfer projects and supervises students. In 2023, she transitioned to the industry, taking on the leadership of R&D+i at BTS, where she combines solid scientific rigor with a clear orientation toward real-world impact, ensuring that voice—both as a channel and as data—becomes a strategic source of value for businesses and the digital ecosystem.

David Thulke

Scientist – AppTek

He works on large language models (LLMs) with a particular interest in speech. He is currently contributing to AppTek’s research on voice-enabled LLMs, collaborating with colleagues in speech recognition and text-to-speech. His broader research centers on advancing foundation models and applying them to diverse natural language understanding and generation tasks, with a strong emphasis on improving factuality in the context of retrieval-augmented generation.

He was the first author of ClimateGPT, a family of domain-specific LLMs for climate-related information retrieval and factual text generation. He is also completing his PhD at RWTH Aachen University in the Machine Learning and Human Language Technology Group under Prof. Hermann Ney, focusing on natural language understanding, pre-training of LLMs and retrieval-augmented generation, with earlier research experience in speech recognition from his master’s studies.

Christian Dugast

Scientist – AppTek

Christian received his Ph.D. degree in Computer Science from the University of Toulouse (France) in 1987. Having started his career as a research scientist in Automatic Speech Recognition (ASR), he was the first to propose a hybrid approach for ASR combining Neural Networks and Hidden Markov Models (IEEE paper, 1994).

Christian had the honour to present the first commercially available continuous speech dictation product worldwide (Eurospeech ’93 in Berlin).

In 1995, he left research for the business world. Building up the European subsidiary of Nuance Communications, he gained extensive experience in bringing complex technologies to market. In 2012, he returned to research, working in Karlsruhe with Prof. Alex Waibel on Speech Translation and at the German Research Center for Artificial Intelligence with Prof. Josef van Genabith on Machine Translation, before rejoining Prof. Hermann Ney (with whom he had worked in the 1990s) in 2018 to work on Natural Language Understanding as Lead Architect for AppTek.

Antonio Miguel Peinado

PhD – Universidad de Granada

Antonio M. Peinado (M’95–SM’05) received the M.S. and Ph.D. degrees in Physics (electronics specialty) from the University of Granada, Granada, Spain, in 1987 and 1994, respectively.

In 1988, he worked with Inisel as a Quality Control Engineer. Since 1988, he has been with the University of Granada, where he has led and participated in multiple research projects related to signal processing and transmission.

In 1989, he was a Consultant with the Speech Research Department, AT&T Bell Labs, Murray Hill, NJ, USA, and, in 2018, a Visiting Scholar with the Language Technologies Institute of CMU, Pittsburgh, PA, USA. He has held the positions of Associate Professor from 1996 to 2010 and Full Professor since 2010 with the Department of Signal Theory, Networking and Communications, University of Granada, where he is currently Head of the research group on Signal Processing, Multimedia Transmission and Speech/Audio Technologies. He has authored numerous publications in international journals and conferences, and has co-authored the book Speech Recognition Over Digital Channels (Wiley, 2006).

His current research interests are focused on several speech technologies such as antispoofing for automatic speaker verification, deepfake detection, voice watermarking and speech enhancement. Prof. Peinado has been a reviewer for a number of international journals and conferences, an evaluator for project and grant proposals, and a Member of the technical program committee of several international conferences.

Juan Rafael Orozco Arroyave

Professor – Universidad de Antioquia

Prof. Juan Rafael Orozco-Arroyave received the B.S. degree in Electronics Engineering from Universidad de Antioquia (UdeA) in 2004, after which he received a postgraduate degree in Marketing from EAFIT University.

In 2011 he finished his M.Sc. in Telecommunications at UdeA, and in 2015 he received his Dr.-Ing. degree in Computer Science from Friedrich-Alexander-Universität Erlangen-Nürnberg (Erlangen, Germany) in a cotutelle program with the Faculty of Engineering at UdeA.

Currently he is a Full Professor at Universidad de Antioquia, where he leads GITA Lab, and an Adjunct Researcher at the Pattern Recognition Lab of Friedrich-Alexander-Universität Erlangen-Nürnberg. His research interests include Speech Processing, Pattern Recognition, Multimodal Analysis, Digital Signal Processing, and Signals Theory.

Juan Ignacio Godino

Scientist – Universidad Politécnica de Madrid

Juan I. Godino-Llorente was born in Madrid, Spain, in 1969. He received the B.Sc. and M.Sc. degrees in Telecommunications Engineering and the Ph.D. degree in Computer Science in 1992, 1996 and 2002, respectively, all from Universidad Politécnica de Madrid (UPM), Spain. From 1996 to 2003 he was with UPM as an Assistant Professor at the Circuits and Systems Engineering Dept. From 2003 to 2005 he joined the Signal Theory and Communications Dept. at the University of Alcalá. In 2005, he rejoined UPM, serving as Head of the Circuits and Systems Engineering Dept. from 2006 until 2010. Since 2011 he has been a Full Professor in the field of Signal Theory and Communications. In 2006, he won the associate professor position after a national qualifying competitive call with 130 candidates, in which he was ranked 1st.

During the academic term 2003-2004, he was a Visiting Professor at Salford University, Manchester, UK; and in 2016, he was a Visiting Researcher at the Massachusetts Institute of Technology, USA, funded by a Fulbright grant. He has served as editor for the IEEE Journal of Selected Topics in Signal Processing, the IEEE Trans. on Audio, Speech and Language Processing, the Speech Communication Journal, and the EURASIP Journal of Advances in Signal Processing. He has participated as an invited speaker in several international advanced schools, and has delivered more than 20 invited speeches at different universities and events, including Harvard University, Johns Hopkins University, Tampere University, and the National University of Colombia.

He has chaired the 3rd Advanced Voice Function Assessment Workshop and the 1st and 2nd Automatic Assessment of Parkinsonian Speech Workshops. Likewise, since 2004 he has been part of different panels of experts of the European Commission, and has been national coordinator of the COST 2103 and COST CA24128 Actions. He is also an expert evaluator of research proposals for the Spanish, Portuguese, Latvian, Polish, Israeli, Czech, Icelandic, Romanian, Belgian, and Norwegian research agencies.

He has published more than 80 papers in international journals included in the Science Citation Index and more than 50 in top-ranked conferences. The international impact of his research is reflected in more than 6000 citations (h-index = 40). He has also led 10 competitive projects and 12 research projects financed by companies and public institutions. This work has been recognized with: the BSc Thesis Extraordinary Award 1992; the UPM Extraordinary PhD Award 2001/2002; the 2004 Award for Research or Technological Development for Professors of the UPM; the 2002 "SIDAR-Universal Access" Award; finalist of the 2009 Best Paper Award of the IEEE Engineering in Medicine and Biology Conference; the 2010 and 2018 best research paper awards of the Spanish Excellence Network on Speech Technology; finalist of the 2012 Best Demo Award of the same network; and the 2015 Entrepreneur Award of IEEE Spain with the startup IngeVox. Moreover, he has been appointed a Fulbright Scholar, Senior Member of the IEEE, ELLIS member, and honorary professor at the National University of Colombia.

Stephan Schlögl

Prof. Dr. – MCI

Prof. Dr. Stephan Schlögl holds an MSc in Human-Computer Interaction from University College London and a PhD in Computer Science from Trinity College Dublin. His main research focuses on natural language based human-computer interaction.

In his doctoral research, he investigated Wizard of Oz prototyping as a design and research instrument. This work was continued through a post-doctoral position at Institute Mines Télécom, Télécom ParisTech. There, Dr. Schlögl was involved in the development of an adaptive communication interface to support simple and efficient interaction with elderly users. His research usually includes the investigation of people interacting with different types of natural language user interfaces as well as the early-stage prototyping of these interfaces.

In November 2013, Dr. Schlögl joined the MCI as a faculty member. Alongside his research, he teaches courses in Human-Computer Interaction, Software Engineering, Business Analytics, Artificial Intelligence, and Research Methods. He has also co-authored more than 100 international research papers and has been a member of various regional, national, and European research projects such as AI4VET4AI (https://www.ai4vet4ai.eu/), EMPATHIC (https://cordis.europa.eu/project/id/769872), and CRYSTAL (https://project-crystal.eu/).

"

Keynotes & Sessions

Verifying call center agent script compliance while correcting ASR errors

Call center providers need to verify whether their agents comply with certain external or internal rules, such as disclaimers, or checks like "has the product name been mentioned?", "is the announced price correct?", or "does the summary of the call contain all necessary information?".

We describe a two-step production system that automatically i) identifies calls that are potentially non-compliant (partially or not at all) and ii) presents these potentially non-compliant calls to human annotators for verification.

For each call, the automatic system receives as input its Automatic Speech Recognition (ASR) transcript (best word sequence), the related confusion matrix (alternative words), and a script that describes what needs to be compliant (word sequences).

The script is pre-processed and split into non-interruptible phrases (called snippets, typically noun phrases) and passed to the compliance system, which processes the call in four passes:

i) Positioning and scoring snippet sequences in the call (ASR verbatim based)

ii) Rewording the set of best snippets (using the confusion matrix and the script to generate a corrected verbatim)

iii) Making use of an NLU component to transform the corrected verbatim into a semantic representation

iv) Positioning and rescoring snippet sequences based on semantic representation

At the end of the four passes, a compliance report is produced that can be analysed by a human reviewer.
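The four passes above can be sketched as a minimal pipeline. The sketch below is illustrative only, not AppTek's implementation: snippet positioning is approximated by word-overlap scoring, rewording keeps only alternative words that appear in the script, pass iii (the NLU step) is omitted, and every name and data structure is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Call:
    verbatim: list       # best ASR word sequence
    alternatives: dict   # word position -> alternative words from the ASR

def position_snippets(verbatim, snippets):
    """Passes i/iv: locate each snippet in the verbatim, scoring by word overlap."""
    scores = {}
    for snippet in snippets:
        words = snippet.split()
        best = 0.0
        for start in range(len(verbatim) - len(words) + 1):
            window = verbatim[start:start + len(words)]
            best = max(best, sum(w == v for w, v in zip(words, window)) / len(words))
        scores[snippet] = best
    return scores

def reword(call, snippets):
    """Pass ii: replace words with ASR alternatives that appear in the script."""
    script_vocab = {w for s in snippets for w in s.split()}
    corrected = list(call.verbatim)
    for pos, alts in call.alternatives.items():
        for alt in alts:
            if alt in script_vocab:
                corrected[pos] = alt  # correct a likely ASR error
                break
    return corrected

def compliance_report(call, snippets, threshold=0.8):
    """Flag snippets still unmatched after rewording, for human review."""
    scores = position_snippets(reword(call, snippets), snippets)
    return {s: ("ok" if sc >= threshold else "review") for s, sc in scores.items()}

# Example: the ASR misheard "may" as "way"; the alternatives recover it,
# so the disclaimer snippet matches after rewording.
call = Call("this call way be recorded for quality".split(), {2: ["may", "way"]})
print(compliance_report(call, ["call may be recorded"]))
```

The point of the two-stage scoring is visible here: a call that fails the verbatim match can still pass once plausible ASR errors are corrected against the script, keeping false alarms down without hallucinating words the recognizer never hypothesized.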

This production system runs under the following constraints:

i) Being able to correct ASR errors while keeping objectivity (no hallucination)

ii) Having very high recall (near-zero false negatives)

iii) Having high precision (fewer than one third false positives)

iv) Keeping computing costs reasonable (a real-time factor of 1/10)

Day 1

9:00-12:00

Christian Dugast

Secure voice processing and communication

As digitalization becomes increasingly prevalent in our society, the generation of spoofed and fake multimedia content may pose a serious threat to various aspects and activities.

Thus, false content may be employed for impersonation, disinformation, or fraudulent access to automated services, and can be forged in many different ways, from very simple signal manipulation to the application of recent machine learning techniques, which have considerably increased the plausibility and damaging capability of the generated content. This talk will focus on the detection of spoofed and fake speech signals.

Two different approaches have been proposed to detect these attacks. On one hand, passive solutions try to directly determine whether a given speech utterance is genuine or spoofed/fake by a deep analysis of the signal itself. This is the most common approach and has been extensively studied over the last decade, boosted by a number of challenges dealing with different types of spoofed or fake speech signals.

On the other hand, proactive solutions require the collaboration of the digital content provider, who must watermark the speech signal in order to allow the detection of its synthetic origin, thus avoiding any malicious use of the generated speech.

Also, we must not forget that the generation of fake contents follows the dynamics of a "cat-and-mouse game," so it is very important to understand how attacks may be forged in order to be ready to combat them. In this talk, we will review how all these topics have evolved over the last years and study the most recent, state-of-the-art techniques. We will also go through a hands-on exercise to put some of the talk's concepts into practice.
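To make the proactive approach concrete, here is a toy sketch of the contract it relies on: the content provider embeds a shared secret pattern into the generated speech, and a detector later checks for that pattern to flag the signal as synthetic. Real schemes use robust, perceptually shaped watermarks that survive compression and resampling; hiding the pattern in sample LSBs, as below, only illustrates the embed/detect interface, and all names here are hypothetical.

```python
KEY = [1, 0, 1, 1, 0, 0, 1, 0]  # shared secret bit pattern (illustrative)

def embed(samples, key=KEY):
    """Return a copy of the integer PCM samples with `key` repeated in the LSBs."""
    return [(s & ~1) | key[i % len(key)] for i, s in enumerate(samples)]

def detect(samples, key=KEY, threshold=0.95):
    """Declare 'synthetic' if the LSB stream matches the key pattern closely enough."""
    if not samples:
        return False
    hits = sum((s & 1) == key[i % len(key)] for i, s in enumerate(samples))
    return hits / len(samples) >= threshold

# Example: watermark a stand-in PCM buffer and detect it; an unmarked
# buffer matches the key only by chance and stays below the threshold.
pcm = list(range(-50, 50))
assert detect(embed(pcm)) and not detect(pcm)
```

The threshold is what gives the detector tolerance to minor signal changes while keeping the chance-match rate of unmarked audio well below the decision boundary.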

Day 1

9:00-12:00

Antonio Peinado

Multimodal analysis of Parkinson’s disease symptoms

Movement disorders (MD) arise from the impaired functioning of certain areas of the brain. Several million people suffer from MDs such as Parkinson's, Huntington's, and others; however, diagnosis and monitoring are still highly subjective, time-consuming, and expensive.

PD is the second most prevalent MD in the world, and the medical evaluations used to assess the neurological state of patients cover different aspects, including activities of daily living, motor tasks, speech production, and mood. Its multi-symptomatic nature makes the diagnosis and monitoring of PD a very challenging task: several bio-signals need to be modeled, and the impact of the disease differs among patients, which adds another level of complexity, especially when aiming at clinically acceptable and practical results.

This talk presents different approaches that consider several bio-signals (e.g., speech, language, gait, and handwriting) and methods of Pattern Recognition with the aim of finding suitable models for PD diagnosis and monitoring. Results with classical feature extraction and classification methods will be presented together with experiments with CNN, LSTM, and attention-based architectures.

Day 1

9:00-12:00

Rafael Orozco-Arroyave

Towards Natural Realtime Voice Interaction with LLMs

Large language models (LLMs) excel at text-based reasoning and generation, but natural realtime voice interaction requires them to listen and speak. This talk surveys approaches to equipping LLMs with speech capabilities, covering both cascaded pipelines and integrated architectures.

I will discuss the requirements for realtime, robust, and natural voice interaction including accurate ASR, expressive TTS, low-latency processing, and support for full-duplex operation, where systems listen and respond simultaneously. On the dialogue side, I will explore how LLMs can support applications ranging from open-ended chit-chat to more constrained task-oriented scenarios that demand explicit state tracking and system-led interaction.

I will also touch on evaluation, highlighting how architecture and application context define the metrics we need. The talk combines an overview of recent literature with AppTek’s ongoing research, reflecting both opportunities and challenges on the path towards natural realtime voice interaction with LLMs.

Day 1

9:00-12:00

David Thulke

Speech Analytics: Transforming Voice Data into Business Action

This lecture explores the journey of developing a conversational analytics technology that leverages speech and audio analytics to transform customer interactions into actionable insights.

We begin by examining how language, voice, and conversation shape perceptions, decisions, and customer satisfaction, drawing on evidence from cognitive science and behavioral research to highlight the powerful role of words and the way they are delivered.

We then present the technology behind conversational analysis, traditionally rooted in automatic speech recognition (ASR) and natural language processing (NLP), and extend it by incorporating information directly from the audio signal—capturing paralinguistic cues such as tone, pauses, overlaps, and emotional markers that reflect the affective dimension of conversations.

Building on this foundation, we share the real-world experience of implementing this methodology within a company setting, demonstrating its impact in contact centers, where speech analytics enables the identification of successful behavioral patterns, the detection of systemic service barriers, and automated quality monitoring—linking conversational dynamics directly to operational performance.

The lecture concludes with insights into the industrialization of a methodology originally conceived in academia, showing how to bridge the gap between research and business application, and culminating in the process of securing a patent to protect and scale this innovation.

Day 1

9:00-12:00

Dayana Ribas

Assessment of voice and health disorders from speech

In recent years, a research domain has emerged that seeks to advance and adapt speech technology for the analysis and evaluation of disordered speech. Within this context, acoustic analysis has established itself as a non-invasive and effective methodology for the objective assessment of vocal function.

Moreover, it constitutes a complementary approach to evaluation techniques based on direct visualization of the vocal folds through video endoscopy. The integration of these methods provides the foundation for the early detection and monitoring of not only voice disorders, but also neurological and respiratory conditions such as Parkinson’s disease, Alzheimer’s disease, and Obstructive Sleep Apnea, all of which also manifest in alterations of the phonatory process.

The application of acoustic and speech-based methodologies extends beyond clinical practice. Their relevance has also been demonstrated in forensic acoustics, in the assessment and preservation of vocal quality among professional voice users, and in the evaluation of extralinguistic factors such as stress and fatigue.

The purpose of this module is to present an overview of the state of the art in this field, with particular emphasis on those interpretable methodologies currently employed for the diagnosis and characterization of voice pathologies.

The presentation will review key contributions developed over the past decade at the Universidad Politécnica de Madrid, Spain, and will conclude with perspectives on emerging research directions.

Day 1

9:00-12:00

Juan Godino

Registration

Students

200€ (until 02/11)

250€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

CRYSTAL / RTTH

Members

0€ (until 02/11)

50€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

Senior / Industry

250€ (until 02/11)

300€ (from 02/11)

PRICE INCREASES ON 02/11!

  • Welcome pack

  • In-person access to the event

  • Access to keynotes & workshops

  • Coffee breaks and lunches

  • Social events

Frequently asked questions

Got questions? Find answers to common questions about the event below.

What exactly is the CRYSTAL-RTTH Fall School 2025?

It is a four-day, in-person event all about conversational systems in the heart of Bilbao. It is THE event for PhD students, young researchers, industry experts, and anybody who wants to learn and be a pioneer in this sector.

Where can I stay in Bilbao?

Bilbao offers a wide range of accommodation options to suit every preference and budget. From boutique hotels to modern apartments, you'll find plenty of choices throughout the city. The Bilbao tourism official website lists various options to help you find the perfect stay.

If you're looking to stay close to the event venue, consider the Guggenheim area, the Centro Azkuna area, or the San Mamés area, all of which offer excellent hotel choices. However, Bilbao is a compact city, and most accommodations are within a 15 to 30-minute walk from the venue, ensuring convenient access no matter where you stay.

What do I need to prepare before joining?

Just bring an open mind and your desire to learn. You don’t need any experience, a product, or fancy equipment.

What if I have other questions?

We’re here to help! Feel free to reach out to our support team or message us on social and someone from our team will get back to you as soon as possible.

Email: [email protected]

CRYSTAL-RTTH

Fall School

2025

Follow us on

2025 © All rights reserved