Large scale. Novel epilepsy genes are typically discovered through collaborative studies that combine information across various centers and research groups. However, there are also large-scale sequencing initiatives on a national level that include individuals with epilepsy. In a recent study published in Nature, a wide range of clinical phenotypes were assessed in an initial cohort of 34,000 individuals in the UK 100,000 Genomes Project. Let me dive into the associations related to epilepsy in this publication.

Figure 1. Burden analysis of rare variants in 34,000 individuals of the 100,000 Genomes Project before applying additional prioritization filters. Well-established epilepsy genes, including SCN1A, KCNQ2, and DEPDC5, show strong associations, in some cases across several analysis methods (duplicate dots for SCN1A). This QQ plot was generated from the Open Access Supplemental Data by Cipriani and collaborators. The inset compares data from the 100K Genomes Project to raw Epi25 data from the Epi25 browser.
100KGP. Despite the widespread availability of exome and genome sequencing, up to 80% of patients with suspected Mendelian conditions still do not receive a diagnosis, especially for conditions that are not typically assessed broadly in a diagnostic setting. Identifying candidate genes for rare disorders in large sequencing projects may therefore offer an alternative route to discovering the genes that cause rare disease. In a recent publication in Nature, Cipriani and colleagues pursued this approach and performed a gene burden analysis on the massive dataset of the 100,000 Genomes Project (100 KGP), analyzing 34,851 families across 226 rare diseases. The authors also analyzed a wide range of variant classes, including predicted loss-of-function (LoF) variants, pathogenic missense variants, variants in constrained coding regions, and de novo variants.
Filtering down. With such a massive search space and more than 13,000 significant associations, the authors applied very strict criteria to narrow down gene–disease relationships. They only included diseases with at least five probands and required that each disease–gene–test combination have at least four individuals carrying relevant rare variants, including a minimum of two probands. Only burden signals in which rare variants were more frequent in cases than in controls, indicating potentially disease-causing variants, were considered further. Together with a false discovery rate (FDR) threshold of 0.5%, these filters reduced the large number of significant findings to 306 unique disease–gene signals.
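To make the filtering cascade concrete, here is a minimal Python sketch of the four criteria described above applied to a table of burden results. The record layout (`BurdenSignal` and its field names), the function names, and the choice of a strict `<` comparison against the 0.5% FDR threshold are hypothetical illustrations, not the data format or code used by Cipriani and colleagues.

```python
from dataclasses import dataclass


@dataclass
class BurdenSignal:
    """One disease-gene-test combination (hypothetical schema for illustration)."""
    disease: str
    gene: str
    test: str               # variant class tested, e.g. "LoF" or "missense"
    disease_probands: int   # probands recruited for this disease
    carriers: int           # individuals carrying qualifying rare variants
    proband_carriers: int   # probands among those carriers
    case_freq: float        # carrier frequency in cases
    control_freq: float     # carrier frequency in controls
    fdr: float              # FDR-adjusted p-value (q-value)


def passes_filters(s: BurdenSignal, fdr_threshold: float = 0.005) -> bool:
    """Apply the filters described in the text (thresholds as stated there)."""
    return (
        s.disease_probands >= 5          # disease has at least five probands
        and s.carriers >= 4              # at least four variant carriers...
        and s.proband_carriers >= 2      # ...including at least two probands
        and s.case_freq > s.control_freq # enriched in cases, not depleted
        and s.fdr < fdr_threshold        # 0.5% FDR (strict < is an assumption)
    )


def unique_disease_gene_signals(signals: list[BurdenSignal]) -> list[tuple[str, str]]:
    """Collapse passing tests to sorted, unique (disease, gene) pairs."""
    return sorted({(s.disease, s.gene) for s in signals if passes_filters(s)})
```

Because several variant classes are tested per gene, more than one passing test can point to the same disease–gene pair; collapsing them to unique pairs is, in spirit, why the study reports unique disease–gene signals rather than individual tests.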
From gene signals to novel genes. In addition to limiting the analysis to novel genes, the authors applied exclusion criteria that removed (1) non-protein-coding genes, (2) genes with a gnomAD loss-of-function observed/expected (oe_lof) ratio ≥ 0.5 for autosomal dominant conditions, and (3) genes for which at least one case had already been solved by a different gene. This left a small group of 25 genes, including 5 genes that provided ancillary information after manual review. For the epilepsies, a single gene remained: RBFOX3. Given the scale of the analysis, the small number of variants supporting RBFOX3 as a candidate gene for genetic generalized epilepsy (GGE), the specific phenotype in question, may be surprising: RBFOX3 variants were present in 2 of 123 individuals with GGE but in only 2 of 20,805 controls. These two case variants alone drove the association signal for the only novel epilepsy gene in the entire analysis.
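To see why two carriers can produce a signal at all, the counts above can be plugged into a simple test. The sketch below uses a one-sided Fisher's exact test, computed directly from the hypergeometric distribution with Python's `math.comb`. This is a stand-in chosen for illustration, not necessarily the statistical test used in the 100 KGP burden analysis, and the function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Under the null hypothesis, carriers are distributed at random between
    cases and controls, so the number of case carriers X follows a
    hypergeometric distribution. Returns P(X >= case_carriers).
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    # Upper tail: k carriers among cases, the remaining carriers among controls.
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    # Exact integer arithmetic until this final division to a float.
    return tail / comb(total, n_cases)


# RBFOX3 counts as reported in the text: 2/123 GGE cases vs 2/20,805 controls.
p = fisher_one_sided(2, 123, 2, 20805)
freq_ratio = (2 / 123) / (2 / 20805)
print(f"one-sided p = {p:.2e}, carrier frequency ratio = {freq_ratio:.0f}")
```

With these counts, the one-sided p-value comes out at roughly 2 × 10⁻⁴, and carriers are about 170 times more frequent in cases than in controls. The value is not corrected for the thousands of disease–gene–test combinations in the study, and moving a single carrier from cases to controls would raise it by about two orders of magnitude, which is why a signal resting on two individuals remains fragile.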
Further evidence. Let’s take one step back and look at the epilepsy associations in the 100 KGP dataset more generally (Figure 1). Several epilepsy phenotypes were included, and the rare variant association largely recovers the genes we would expect: SCN1A, DEPDC5, and several genes associated with the developmental and epileptic encephalopathies (DEEs); KCNQ2, for example, was significant for missense variants in constrained coding regions. RBFOX3 was not formally included in the most recent analysis in the Epi25 browser, but variant data are available there. These data show a small excess of rare variants in individuals with epilepsy, with a p-value slightly below 0.05. On its own, this is not convincing evidence, but it may become more meaningful when combined with additional datasets.
RBFOX3. The RBFOX3 gene has hovered at the edge of epilepsy relevance for quite some time, ever since it was initially proposed as a candidate gene for Rolandic epilepsies alongside its paralog, RBFOX1. RBFOX1 has since been validated as a likely disease gene for complex neurodevelopmental disorders, with de novo variants identified in several individuals. In contrast, de novo variants in RBFOX3 have never been identified. Both genes encode splicing regulators that specifically target neuronal genes. Accordingly, mild changes in RBFOX3 function may affect a large number of downstream targets.
Mechanism. It is important to point out that RBFOX3 is not a highly constrained gene. There are many protein-truncating variants in the general population, so haploinsufficiency cannot act as a fully penetrant, monogenic disease mechanism. RBFOX3 haploinsufficiency might still represent a risk factor for generalized epilepsies, but additional data would be required to make this case convincingly. For now, we are left with an intriguing association signal driven by a very small number of individuals, as is often the case when large-scale genome sequencing studies break their findings down to rare variants in rare disorders.
What you need to know. Large-scale sequencing efforts such as the 100,000 Genomes Project in the UK (100 KGP) contribute substantially to rare disease research by approaching the identification of candidate genes from a different angle than traditional cohort-based studies. These frameworks hold great promise for identifying gene–disease associations in conditions that are commonly overlooked. In the first rare disease analysis of the 100 KGP, RBFOX3 was identified as a novel candidate gene for familial genetic generalized epilepsy (GGE), reviving interest in neuronal splicing regulators that were first proposed as candidate genes more than a decade ago.