
Join us to connect, learn, and be inspired by the forefront of conversational systems and Speech Analytics
Join us to gain insights from keynote speakers and discover cutting-edge advancements shaping the future of technology.

Christian Dugast
AppTek, Germany

Dayana Ribas
SmartVoice, BTS, Spain

David Thulke
AppTek, Germany

Jorge Andrés Gómez
Agilbiomech, Spain

Antonio Miguel Peinado
Universidad de Granada, Spain

Stephan Schlögl
Management Center Innsbruck, Austria

Anna Esposito
Università della Campania Luigi Vanvitelli, Italy

Javier Hernando
Universitat Politècnica de Catalunya - Barcelona Supercomputing Center
Participate in theoretical keynotes and practical workshops, and discover cutting-edge advancements shaping the future of speech analytics in conversational systems.
If you are a Master's or PhD student interested in participating in the 3 Minute Thesis Presentations, please submit a title and abstract of your research project by November 2nd, 2025
Machine Learning
AI
Speech Analytics & Processing
Conversational Systems
Spoken dialogue
Practical Applications for Companies
Leads the scientific and innovation strategy of SmartVoice, the conversational intelligence unit of BTS – Business Telecommunications Services, where she drives research, development, and innovation initiatives in conversational analytics using advanced artificial intelligence and audio processing technologies.
She holds a Ph.D. in Sciences (2016) and completed multiple postdoctoral appointments through international research initiatives. Author of numerous scientific publications and several patents, her expertise includes machine learning applied to speech and audio signals, robust speech enhancement, paralinguistic information analysis (e.g., emotion and pathological speech recognition), biometrics, and voice quality assessment.
In academia, she has taught undergraduate and graduate courses in Audiovisual Communications and Speech Technologies within Telecommunications Engineering programs. She continues her academic involvement as an associate researcher with the VIVOLAB group at the University of Zaragoza, where she contributes to research and technology transfer projects and supervises students. In 2023, she transitioned to industry, taking on the leadership of R&D and innovation (R&D+i) at BTS, where she combines solid scientific rigor with a clear orientation toward real-world impact, ensuring that voice—both as a channel and as data—becomes a strategic source of value for businesses and the digital ecosystem.
He works on large language models (LLMs) with a particular interest in speech. He is currently contributing to AppTek’s research on voice-enabled LLMs, collaborating with colleagues in speech recognition and text-to-speech. His broader research centers on advancing foundation models and applying them to diverse natural language understanding and generation tasks, with a strong emphasis on improving factuality in the context of retrieval-augmented generation.
He was the first author of ClimateGPT, a family of domain-specific LLMs for climate-related information retrieval and factual text generation. He is also completing his PhD at RWTH Aachen University in the Machine Learning and Human Language Technology Group under Prof. Hermann Ney, focusing on natural language understanding, pre-training of LLMs and retrieval-augmented generation, with earlier research experience in speech recognition from his master’s studies.
Christian received his Ph.D. degree in Computer Science from the University of Toulouse (France) in 1987. Having started his career as a research scientist in Automatic Speech Recognition (ASR), he was the first to propose a hybrid approach for ASR combining Neural Networks and Hidden Markov Models (IEEE paper, 1994).
Christian had the honour of presenting the first commercially available continuous speech dictation product worldwide (Eurospeech ’93 in Berlin).
In 1995, he left research to enter the business world. Building up the European subsidiary of Nuance Communications, he gained extensive experience in introducing complex technologies to the market. In 2012, he returned to research, working in Karlsruhe with Prof. Alex Waibel on Speech Translation and at the German Research Center for Artificial Intelligence with Prof. Josef van Genabith on Machine Translation, before rejoining Prof. Hermann Ney (with whom he had worked in the 1990s) in 2018 to work on Natural Language Understanding as Lead Architect for AppTek.
Antonio M. Peinado (M’95–SM’05) received the M.S. and Ph.D. degrees in Physics (electronics specialty) from the University of Granada, Granada, Spain, in 1987 and 1994, respectively.
In 1988, he worked with Inisel as a Quality Control Engineer. Since 1988, he has been with the University of Granada, where he has led and participated in multiple research projects related to signal processing and transmission.
In 1989, he was a Consultant with the Speech Research Department, AT&T Bell Labs, Murray Hill, NJ, USA, and, in 2018, a Visiting Scholar with the Language Technologies Institute of CMU, Pittsburgh, PA, USA. He was an Associate Professor from 1996 to 2010 and has been a Full Professor since 2010 with the Department of Signal Theory, Networking and Communications, University of Granada, where he is currently Head of the research group on Signal Processing, Multimedia Transmission and Speech/Audio Technologies. He has authored numerous publications in international journals and conferences and has co-authored the book Speech Recognition Over Digital Channels (Wiley, 2006).
His current research interests are focused on several speech technologies, such as antispoofing for automatic speaker verification, deepfake detection, voice watermarking, and speech enhancement. Prof. Peinado has been a reviewer for a number of international journals and conferences, an evaluator for project and grant proposals, and a member of the technical program committees of several international conferences.
Anna Esposito received her “Laurea” degree summa cum laude from the University of Salerno with a thesis on neural networks (published in Complex Systems, 6(6), 507-517, 1992). She received her PhD in Applied Mathematics and Computer Science from the University of Naples “Federico II”, with a PhD thesis developed at the MIT RLE Lab (Boston, USA) on mathematical models of speech production (published in Phonetica, 59(4), 197-231, 2002). Anna was a postdoc at the International Institute for Advanced Scientific Studies and an Assistant Professor at the Department of Physics, University of Salerno, where she taught Laboratory of Cybernetics, Neural Networks, and Speech Processing (1996-2000). From November 2000 to November 2002 she held a research professor position at Wright State University, Department of Computer Science and Engineering, OH, USA. Anna was an associate professor (from 2003 to 2019) and is currently (from 2020 to date) a full professor in Artificial Intelligence, Cybernetics and Multimodal Communication at Università della Campania “L. Vanvitelli”. Her current teaching duties are in Cognitive and Algorithmic Issues of Multimodal Communication, Cognitive Economy, and Fundamentals of Artificial Intelligence and Neural Networks. She is the head of the Behaving Cognitive Systems laboratory (BeCogSys, https://becogsys.com/becogsy). The lab has participated in several H2020 projects (among them: a) Empathic, www.empathic-project.eu/, and b) Menhir, menhir-project.eu/), Italian projects (among them: c) SIROBOTICS, d) ANDROIDS, and SALICE), and Erasmus-funded projects (among them G-Guidance: https://g-guidance.eu/language/en/). The currently running projects are the Italian PRIN PNRR projects IRRESPECTIVE and AI PATTERNS and the H2020 project CRYSTAL (https://project-crystal.eu/).
Anna is a member of the European Science Foundation (ESF) and the EU network EUCognition, Chair of the IAPR Conferences & Meetings Committee (https://iapr.org/committees/conferences-and-meetings-committee), and President of SIREN (https://www.siren-neural-net.it/). She has authored 350+ peer-reviewed publications and edited/co-edited 35+ international books.
Jorge Andrés Gómez received his PhD from the Technical University of Madrid (UPM), Spain, in 2018. He is currently a Ramón y Cajal Fellow at the Center for Automation and Robotics (CAR), a joint research center of the Spanish National Research Council (CSIC) and UPM.
He leads the research line on Artificial Intelligence for Neurodegenerative Disorders within the BioRobotics Group, where his work focuses on the development of computational methods to detect and quantify neurological pathologies. His research integrates diverse data modalities, including inertial sensors (IMU), electromyography (EMG), photogrammetry, and speech, to characterize motor and vocal impairments. He has also contributed to the design of diagnostic methodologies based on voice and speech analysis for conditions such as Parkinson’s disease, vocal fold polyps, and neurogenic disorders.
Dr. Gómez maintains strong collaborations with industry. He currently works with Agilbiomech, a CSIC spin-off, applying biomechanics, signal processing, and AI to improve post-stroke assessment. His broader research interests lie in AI-driven, multimodal, and longitudinal evaluation of motor symptoms in neurodegenerative diseases, with the goal of advancing clinical monitoring, early diagnosis, and personalized therapeutic strategies.
Prof. Dr. Stephan Schlögl holds an MSc in Human-Computer Interaction from University College London and a PhD in Computer Science from Trinity College Dublin. His main research focuses on natural language based human-computer interaction.
In his doctoral research, he investigated Wizard of Oz prototyping as a design and research instrument. This work was continued through a post-doctoral position at Institut Mines-Télécom, Télécom ParisTech. There, Dr. Schlögl was involved in the development of an adaptive communication interface to support simple and efficient interaction with elderly users. His research usually includes the investigation of people interacting with different types of natural language user interfaces as well as the early-stage prototyping of these interfaces.
In November 2013, Dr. Schlögl joined the MCI as a faculty member. Alongside his research, he now teaches courses in Human-Computer Interaction, Software Engineering, Business Analytics, Artificial Intelligence, and Research Methods. He has also co-authored more than 100 international research papers and has been a member of various regional, national, and European research projects such as AI4VET4AI (https://www.ai4vet4ai.eu/), EMPATHIC (https://cordis.europa.eu/project/id/769872) and CRYSTAL (https://project-crystal.eu/).
Javier Hernando received the M.S. and Ph.D. degrees in telecommunication engineering from the Technical University of Catalonia (UPC), Barcelona, Spain, in 1988 and 1993, respectively. Since 1988, he has been with the Department of Signal Theory and Communications, UPC, where he is currently a Full Professor and the Director of the Research Center for Language and Speech. During the academic year 2002–2003, he was a Visiting Researcher at the Panasonic Speech Technology Laboratory, Santa Barbara, CA, USA. He is currently Head of Research on Speech Technologies at the Barcelona Supercomputing Center (BSC). He has led the UPC team in several European, Spanish, and Catalan projects. His research interests include robust speech analysis, speech recognition and translation, speaker verification and localization, and multimodal large language models. He is the author or coauthor of about 200 publications in book chapters, review articles, and conference papers on these topics. He received the 1993 Extraordinary Ph.D. Award of UPC.
In recent years, a research domain has emerged that seeks to advance and adapt speech technology for the analysis and evaluation of disordered speech. Within this context, acoustic analysis has established itself as a non-invasive and effective methodology for the objective assessment of vocal function.
Moreover, it constitutes a complementary approach to evaluation techniques based on direct visualization of the vocal folds through video endoscopy. The integration of these methods provides the foundation for the early detection and monitoring of not only voice disorders, but also neurological and respiratory conditions such as Parkinson’s disease, Alzheimer’s disease, and Obstructive Sleep Apnea, all of which also manifest in alterations of the phonatory process.
The application of acoustic and speech-based methodologies extends beyond clinical practice. Their relevance has also been demonstrated in forensic acoustics, in the assessment and preservation of vocal quality among professional voice users, and in the evaluation of extralinguistic factors such as stress and fatigue.
The purpose of this module is to present an overview of the state of the art in this field, with particular emphasis on those interpretable methodologies currently employed for the diagnosis and characterization of voice pathologies.
The presentation will review key contributions developed over the past decade at the Universidad Politécnica de Madrid, Spain, and will conclude with perspectives on emerging research directions.
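As a deliberately simplified illustration of the kind of interpretable acoustic measures involved, the sketch below estimates fundamental-frequency statistics and a crude jitter-like value from a recording. It assumes the open-source librosa library and a hypothetical audio file; it is not the UPM methodology reviewed in the presentation, and clinical jitter is normally computed cycle by cycle with dedicated tools.

```python
# Illustrative sketch only: basic interpretable measures used as a starting
# point in pathological-voice analysis (f0 statistics and a rough jitter proxy).
import numpy as np
import librosa

def basic_voice_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=None, mono=True)        # hypothetical input file
    # Frame-wise f0 estimation over a typical speech range (65-500 Hz)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=500, sr=sr)
    f0_voiced = f0[voiced_flag & ~np.isnan(f0)]
    if f0_voiced.size < 3:
        return {"error": "not enough voiced frames"}
    periods = 1.0 / f0_voiced                                  # pitch periods in seconds
    # Jitter proxy: mean absolute difference of consecutive periods,
    # relative to the mean period (a rough frame-level approximation).
    jitter_local = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    return {
        "f0_mean_hz": float(np.mean(f0_voiced)),
        "f0_std_hz": float(np.std(f0_voiced)),
        "jitter_local": float(jitter_local),
    }

# Example (hypothetical file name):
# print(basic_voice_features("sustained_vowel_a.wav"))
```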
Day 1
10:00 - 11:30
Jorge Andrés Gómez (CAR)
Large language models (LLMs) excel at text-based reasoning and generation, but natural real-time voice interaction requires them to listen and speak. This talk surveys approaches to equipping LLMs with speech capabilities, covering both cascaded pipelines and integrated architectures.
I will discuss the requirements for real-time, robust, and natural voice interaction, including accurate ASR, expressive TTS, low-latency processing, and support for full-duplex operation, where systems listen and respond simultaneously. On the dialogue side, I will explore how LLMs can support applications ranging from open-ended chit-chat to more constrained task-oriented scenarios that demand explicit state tracking and system-led interaction.
I will also touch on evaluation, highlighting how architecture and application context define the metrics we need. The talk combines an overview of recent literature with AppTek’s ongoing research, reflecting both opportunities and challenges on the path towards natural real-time voice interaction with LLMs.
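As a point of reference for the cascaded family of architectures mentioned in the abstract, the sketch below shows the basic control flow of a single half-duplex turn (ASR, then LLM, then TTS). The three stage functions are hypothetical placeholders rather than AppTek components; an integrated architecture would instead feed audio representations directly into the model, and full-duplex operation would additionally require the listening and speaking stages to run concurrently.

```python
# Minimal structural sketch of a cascaded voice pipeline (ASR -> LLM -> TTS).
# The stage functions below are placeholders, not any vendor's actual components.

def transcribe(audio_chunk: bytes) -> str:
    """Placeholder ASR stage: audio -> text (e.g. a streaming recognizer)."""
    raise NotImplementedError

def generate_reply(history: list[dict], user_text: str) -> str:
    """Placeholder LLM stage: dialogue history + user turn -> response text."""
    raise NotImplementedError

def synthesize(text: str) -> bytes:
    """Placeholder TTS stage: response text -> audio."""
    raise NotImplementedError

def dialogue_turn(history: list[dict], audio_chunk: bytes) -> tuple[list[dict], bytes]:
    """One half-duplex turn; each stage adds latency, which is one reason
    integrated and full-duplex architectures are attractive."""
    user_text = transcribe(audio_chunk)
    reply_text = generate_reply(history, user_text)
    history = history + [{"role": "user", "content": user_text},
                         {"role": "assistant", "content": reply_text}]
    return history, synthesize(reply_text)
```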
Day 1
15:30 - 17:00
David Thulke (AppTek)
This lecture explores the journey of developing a conversational analytics technology that leverages speech and audio analytics to transform customer interactions into actionable insights.
We begin by examining how language, voice, and conversation shape perceptions, decisions, and customer satisfaction, drawing on evidence from cognitive science and behavioral research to highlight the powerful role of words and the way they are delivered.
We then present the technology behind conversational analysis, traditionally rooted in automatic speech recognition (ASR) and natural language processing (NLP), and extend it by incorporating information directly from the audio signal—capturing paralinguistic cues such as tone, pauses, overlaps, and emotional markers that reflect the affective dimension of conversations.
Building on this foundation, we share the real-world experience of implementing this methodology within a company setting, demonstrating its impact in contact centers, where speech analytics enables the identification of successful behavioral patterns, the detection of systemic service barriers, and automated quality monitoring—linking conversational dynamics directly to operational performance.
The lecture concludes with insights into the industrialization of a methodology originally conceived in academia, showing how to bridge the gap between research and business application, and culminating in the process of securing a patent to protect and scale this innovation.
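As a small illustration of the paralinguistic cues mentioned above, the sketch below derives pause and overlap statistics from word-level timestamps of a diarized ASR output. The field names and thresholds are hypothetical, and a production conversational-analytics system such as the one discussed in this lecture combines many more acoustic and lexical signals.

```python
# Toy example: pause and overlap statistics from diarized, time-stamped words.
# `words` is assumed to be a list of {"speaker", "start", "end"} dicts sorted by start time.

def pause_and_overlap_stats(words: list[dict]) -> dict:
    pauses, overlap = [], 0.0
    for prev, cur in zip(words, words[1:]):
        gap = cur["start"] - prev["end"]
        if gap > 0:
            pauses.append(gap)                    # silence between consecutive words
        elif prev["speaker"] != cur["speaker"]:
            overlap += -gap                       # negative gap across speakers = simultaneous speech
    total = words[-1]["end"] - words[0]["start"] if words else 0.0
    return {
        "num_long_pauses": sum(p > 1.0 for p in pauses),          # pauses longer than one second
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "overlap_ratio": overlap / total if total else 0.0,
    }
```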
Day 2
9:30 - 11:00
Dayana Ribas (SmartVoice, BTS)
Depression is a leading contributor to global mental health burden, and addressing it calls for intelligent systems capable of detecting behavioral signals that precede or accompany its onset. This contribution explores how Artificial Intelligence (AI) can support the development of autonomous, socially-aware systems by analyzing behavioral patterns that are indicative of mental illness.
Our work focuses on how individuals with depression differ from neurotypical populations in their ability to express, perceive, and interpret emotional cues, both in language and social interactions. These differences are treated as behavioral biomarkers which, when processed through AI models, can contribute to more accurate diagnoses, timely interventions, and deeper understanding of mental health dynamics.
Rather than relying on broad physiological signals, we prioritize the modeling of human behavior: how people speak, write, respond to emotionally charged content, and decode others’ emotions. These elements are critical to designing ICT systems that are not only intelligent but also emotionally credible and capable of interacting meaningfully with users.
The ultimate goal is to build AI-enhanced tools that assist in clinical decision-making, provide on-demand support, and promote social inclusion and psychological wellbeing.
Day 2
11:30 - 13:00
Anna Esposito (UVA)
Recent advancements in Conversational User Interfaces (CUIs) have moved the technology from initially being considered a utilitarian task enabler to becoming a companion-like artifact rooted in anthropomorphic behavior.
Consequently, CUI design imperatives are increasingly moving beyond traditional performance metrics such as speed, accuracy, and task success, so as to embrace more socio-technical dimensions governing successful human-AI coexistence and acceptance.
In this lecture I will thus discuss multi-disciplinary research across several independent studies, establishing an empirical foundation for designing socially competent AI. We will look into the findings of an expert-led investigation that identified a number of distinct characteristics of social intelligence required for next-generation agents.
Furthermore, we will explore the concept of authenticity, and talk about features such as transparent purpose, learning from experience, and conversational coherence as vital requirements for fostering genuine user interactions and mitigating the perception of CUIs as mere work simplifiers. Importantly, we will also examine the user’s role and societal impact, where empirical results demonstrate that individual personality traits - specifically the general propensity to trust and affinity for technology - significantly influence trust perceptions of CUIs and respective intelligent agents.
We will also address the ethical design challenge of gender by looking at a study that showed how the inclusion of a nonbinary voice can successfully disrupt negative gender stereotypes, even as it presents new challenges regarding user likability. Finally, we will explore human-chatbot interaction in high-disclosure settings, revealing counter-intuitive data on embodiment, where certain human-like avatars surprisingly facilitated, rather than inhibited, the breadth and depth of user self-disclosure.
Collectively, I hope these inputs will trigger some discussions on CUI development that prioritizes ethical design, trustworthiness, and socially intelligent interaction as potential core metrics for next-generation AI systems.
Day 2
14:30 - 16:00
Stephan Schlögl (MCI)
The Language Technologies Laboratory at BSC aims to advance the field of natural language processing and Artificial Intelligence through cutting-edge research and development and the use of high-performance computing.
The Lab has extensive experience in several NLP areas, such as massive language model building, machine translation, speech technologies, and unsupervised learning for under-resourced languages and domains. The Language Technologies Laboratory has developed a number of relevant open-source resources that can be found in reference software and data repositories. In particular, the speech technologies team is responsible for developing machine learning models for tasks such as speech recognition, speech synthesis, and speaker detection, as well as any other task that supports speech-related applications.
Day 3
9:30 - 11:00
Javier Hernando (UPC-BSC)
Call center providers need to verify whether their agents comply with certain external or internal rules, such as disclaimers, or with questions like “has the product name been mentioned”, “is the announced price correct”, or “does the summary of the call contain all necessary information”.
We describe a two-step production system that automatically i) identifies calls that are potentially non-compliant (partially or entirely) and ii) presents these potentially non-compliant calls to human annotators for verification.
For each call, the automatic system receives as input its Automatic Speech Recognition (ASR) transcript (best word sequences), the related confusion matrix (alternative words), and a script that describes what needs to be compliant (word sequences).
The script is pre-processed and split into non-interruptible phrases (called snippets, which are typically noun phrases) and passed to the compliance system, which processes the call in four passes:
i) Positioning and scoring snippet sequences in the call (based on the ASR verbatim)
ii) Rewording the set of best snippets (using the confusion matrix and the script to generate a corrected verbatim)
iii) Making use of an NLU component to transform the corrected verbatim into a semantic representation
iv) Positioning and rescoring snippet sequences based on semantic representation
At the end of the four passes, a compliance report is produced that can be analysed by a human reviewer.
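To make the control flow of the four passes easier to follow, here is a rough skeleton in Python. The data structures and function names are hypothetical simplifications introduced for illustration, not AppTek’s production code; the real system operates on ASR confusion data and a pre-processed script as described above.

```python
# Hypothetical skeleton of the four-pass compliance check described above.
from dataclasses import dataclass

@dataclass
class SnippetMatch:
    snippet: str    # non-interruptible phrase taken from the compliance script
    position: int   # approximate position of the match in the call
    score: float    # match confidence

def pass1_locate(transcript: str, snippets: list[str]) -> list[SnippetMatch]:
    """Pass i: position and score snippet sequences on the ASR verbatim."""
    ...

def pass2_reword(matches: list[SnippetMatch], confusions) -> str:
    """Pass ii: use word alternatives and the script to generate a corrected verbatim."""
    ...

def pass3_semantics(corrected_verbatim: str):
    """Pass iii: NLU step mapping the corrected verbatim to a semantic representation."""
    ...

def pass4_rescore(matches: list[SnippetMatch], semantics) -> list[SnippetMatch]:
    """Pass iv: re-position and re-score snippet sequences on the semantic representation."""
    ...

def compliance_check(transcript: str, confusions, snippets: list[str]) -> list[SnippetMatch]:
    """Chain the four passes; the result feeds the report reviewed by a human."""
    matches = pass1_locate(transcript, snippets)
    corrected = pass2_reword(matches, confusions)
    semantics = pass3_semantics(corrected)
    return pass4_rescore(matches, semantics)
```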
This production system runs under the following constraints:
i) Being able to correct ASR errors while keeping objectivity (no hallucination)
ii) Having very high recall (near-zero false negatives, so that non-compliant calls are not missed)
iii) Maintaining reasonable precision (fewer than 1/3 false positives)
iv) Keeping computing costs reasonable (1/10 real time factor)
Day 3
11:30 - 13:00
Christian Dugast
As digitalization becomes increasingly prevalent in our society, the generation of spoofed and fake multimedia content may pose a serious threat to a wide range of activities and services.
Such false content may be employed for impersonation, disinformation, or fraudulent access to automated services, and can be forged in many different ways, from very simple signal manipulation to the application of recent machine learning techniques, which have considerably increased the plausibility and damaging capability of the generated content. This talk will focus on the detection of spoofed and fake speech signals.
Two different approaches have been proposed to detect these attacks. On one hand, passive solutions try to directly determine whether a given speech utterance is genuine or spoofed/fake through a deep analysis of the signal itself. This is the most common approach and has been extensively studied over the last decade, boosted by a number of challenges dealing with different types of spoofed or fake speech signals. On the other hand, proactive solutions require the collaboration of the digital content provider, who must watermark the speech signal in order to allow the detection of its synthetic origin, thus avoiding any malicious use of the generated speech. We must also not forget that the generation of fake content follows the dynamics of a “cat-and-mouse game,” so it is very important to understand how attacks may be forged in order to be ready to combat them. In this talk, we will review how all these topics have evolved in recent years and study the most recent, state-of-the-art techniques. We will also go through a hands-on exercise to put some of the talk’s concepts into practice.
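As a taste of the hands-on exercise, the sketch below implements a deliberately simple passive-detection baseline: utterance-level MFCC statistics fed to a linear classifier that separates bona fide from spoofed speech. The file lists are hypothetical, and the approach is far from the state of the art covered in the talk, where much stronger front-ends and neural back-ends are used.

```python
# Toy passive anti-spoofing baseline (illustrative only, not a challenge-grade system).
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def utterance_embedding(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)          # shape (20, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_detector(bonafide_files: list[str], spoof_files: list[str]) -> LogisticRegression:
    X = np.stack([utterance_embedding(f) for f in bonafide_files + spoof_files])
    y = np.array([0] * len(bonafide_files) + [1] * len(spoof_files))   # 1 = spoofed
    return LogisticRegression(max_iter=1000).fit(X, y)

# Usage (hypothetical file lists):
# clf = train_detector(["real_01.wav", "real_02.wav"], ["tts_01.wav", "tts_02.wav"])
# p_spoof = clf.predict_proba(utterance_embedding("test.wav")[None, :])[0, 1]
```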
Day 4
9:30 - 11:00
Antonio Peinado
Students
250€
Welcome pack
In-person access to the event
Access to keynotes & workshops
Coffee breaks and lunches
Social events
CRYSTAL / RTTH
Members
50€
Welcome pack
In-person access to the event
Access to keynotes & workshops
Coffee breaks and lunches
Social events
Senior / Industry
300€
Welcome pack
In-person access to the event
Access to keynotes & workshops
Coffee breaks and lunches
Social events
Master's and PhD students participating in the Fall School are invited to demonstrate their ability to communicate complex ideas, breakthroughs, and research findings effectively within a limited timeframe, just like an elevator pitch in the professional world.
Each participant will have 3 minutes to captivate the audience while maintaining clarity and engagement.
If you're a Master's or PhD student interested in participating in the Three Minute Thesis Presentations, please submit a title and abstract of your research project by November 2nd, 2025 to:
Answers to Common Questions About Our Event. Got questions?
It is a 4-day, in-person event all about conversational systems in the heart of Bilbao. It is THE event for PhD students, young researchers, industry experts, and anybody who wants to learn and be a pioneer in this sector.
Bilbao offers a wide range of accommodation options to suit every preference and budget. From boutique hotels to modern apartments, you'll find plenty of choices throughout the city. The Bilbao tourism official website lists various options to help you find the perfect stay.
If you're looking to stay close to the event venue, consider the Guggenheim area, the Centro Azkuna area or the San Mamés area, all of which offer excellent hotel choices. However, Bilbao is a compact city, and most accommodations are within a 15 to 30-minute walk from the venue, ensuring convenient access no matter where you stay.
Just bring an open mind and your desire to learn. You don’t need any experience, a product, or fancy equipment.
We’re here to help! Feel free to reach out to our support team or message us on social media, and someone from our team will get back to you as soon as possible.
Email: [email protected]
