Moving AI in Epilepsy Beyond the Buzzwords

Signals. Two weeks ago, I participated in this year’s AI in Epilepsy Conference, which brought together a field that is moving quickly, though not always at the same pace. I presented our computational approaches for real-world data and natural history studies. However, in many areas of epilepsy, AI is already more advanced than the tools we currently use in genetics and rare disease research, so I was mainly there to learn. Here are three aspects of AI in medicine that I have kept thinking about since returning from the conference.

Figure 1. Visual impression from the 4th International Conference on Artificial Intelligence in Epilepsy and Neurological Disorders (San Juan, Puerto Rico, March 16–19, 2026). The meeting brought together clinicians, neuroscientists, and computational researchers to discuss advances in AI across epilepsy care, spanning video-based seizure detection, real-world data analysis, and clinical implementation. The conference also featured a video-based seizure detection challenge focused on identifying infantile spasms from short recordings using pose estimation data, highlighting both the promise and current limitations of AI approaches in epilepsy.

Figure 1. Visual impression from the 4th International Conference on Artificial Intelligence in Epilepsy and Neurological Disorders (San Juan, Puerto Rico, March 16–19, 2026). The meeting brought together clinicians, neuroscientists, and computational researchers to discuss advances in AI across epilepsy care, spanning video-based seizure detection, real-world data analysis, and clinical implementation. The conference also featured a video-based seizure detection challenge focused on identifying infantile spasms from short recordings using pose estimation data, highlighting both the promise and current limitations of AI approaches in epilepsy.

1 – Reinforcement Learning and the Bitter Lesson. The tone was set early in the meeting by the keynote from Peter Stone, a computer scientist known for his work on reinforcement learning, robotics, and autonomous agents. While his presentation focused on how interaction with humans can help models learn faster, he also referred to what Richard S. Sutton called the “Bitter Lesson.” In a widely read 2019 essay, Sutton argued that many of the biggest advances in artificial intelligence have come not from systems built on human expertise and hand-designed rules, but from approaches that learn from large amounts of data and computation. Methods that scale often outperform methods built around what we think should matter. In AI, this pattern is visible in the transition from feature-based approaches to deep learning, and now to foundation models. The implication is somewhat uncomfortable: our expert intuitions about what matters may not be the key factor in developing AI tools for rare diseases. Instead, the ability to aggregate data, train at scale, and generalize becomes decisive.

This also reframes how we think about expertise in rare disease. If progress comes from scale rather than structure, the role of domain knowledge changes. Here, domain knowledge refers to the clinical understanding we have built over decades. With more complex models, this knowledge does not disappear but moves “upstream”: expert knowledge helps frame questions and define what counts as signal rather than noise. At the same time, however, models trained on large datasets become less interpretable. This tension was present throughout the meeting.

2 – From Models to Foundation Models. This year’s meeting included a competition focused on analyzing videos of infantile spasms, following up on a 2025 paper in npj Digital Medicine. I want to highlight this because video analysis is a useful example of how model complexity has evolved. Early methods relied on skeleton-based representations, extracting joint positions and reducing the problem to a relatively simple structure. These were followed by spatiotemporal models that incorporate motion and context over time. More recently, foundation model approaches learn directly from large-scale video data without relying on predefined features. These models are less constrained by our initial assumptions, but harder to interpret. With such powerful models available, the question is no longer only whether a model can detect a seizure, but how it arrives at that conclusion, and whether this reasoning can be trusted. For example, if a model differentiates between infantile spasms and non-epileptic movements, the features driving this distinction may be unrelated to the infant’s movements altogether.

The progression from simple to more complex models also changes what limits performance. Earlier approaches were limited by how well we could define relevant features. Newer models, in contrast, are limited by data availability, computational requirements, and their ability to generalize across settings. In epilepsy, and particularly in rare genetic epilepsies where data are fragmented and heterogeneous, this becomes the central challenge.

3 – The Implementation Gap. Several speakers at this year’s conference returned to the same point: AI in medicine is no longer primarily a technology problem, but a translation problem. Building models is the visible part, but what follows is often more important: clinical validation in the real world, alignment with regulatory expectations, and integration into clinical workflows. This also includes the “human factor,” since even strong models achieve little if clinicians do not adopt them. Without these steps, even the most elegant model remains a demonstration. Cardiology provides a useful example. In an influential 2019 study on AI analysis of electrocardiograms, a deep learning model identified patients with otherwise unrecognized cardiac dysfunction. What made this work stand out was not only its performance, but its path toward prospective validation and clinical use. This is the difference between an algorithm that performs and one that matters.

At the same time, several analyses have highlighted a clear gap in implementation: we are producing more models than we can integrate into clinical workflows. In a 2019 review of AI in medicine, Eric Topol noted that many diagnostic studies reported performance exceeding that of human experts, but only a small fraction had been tested prospectively or evaluated in real clinical settings. Another widely read perspective argued that the central challenge is no longer developing new algorithms, but integration, trust, and workflow redesign. Taken together, AI development in medicine has outpaced its translation into practice. We are not waiting for better models; we are waiting for the ones we already have to cross into clinical care. In rare genetic epilepsies, we often focus on models that predict diagnoses or outcomes, such as our 2024 work on EMR-based prediction of genetic etiologies. The experience from cardiology suggests that thinking about implementation early is critical.

What you need to know. The state of AI in epilepsy mirrors a broader trend in medicine: the field is not held back by a lack of algorithms, but by the gap between development and use. We have moved from handcrafted models to systems that learn directly from data at scale. Progress now depends on validation, regulatory alignment, and clinical integration. The central challenge is no longer whether AI can work, but how it can be implemented in clinical practice.

In rare disease, and particularly in rare genetic epilepsies, we are still at an earlier stage. In many cases, we do not yet have an implementation problem, because we are still developing the models and assembling the data needed to make them useful. However, perspectives from other areas of medicine are helpful. They show what comes next, and where the real challenges will emerge once these approaches begin to mature.

Ingo Helbig is a child neurologist and epilepsy genetics researcher working at the Children’s Hospital of Philadelphia (CHOP), USA.