
"The whole of a work is to be understood from the individual words and their connections with each other, and yet the full comprehension of the individual parts presupposes comprehension of the whole." - Wilhelm Dilthey
Research
We study how 'Dilthey's dilemma' plays out in biological evolution. A central principle of complex systems is that the parts are required to comprehend the whole, yet the whole is needed to make sense of the parts. In this sense, the whole is greater than the sum of its parts. We study how the complexity of interconnected biological systems evolves through the gain or loss of individual components over the course of evolution while the core function is maintained. To address these questions, we take an interdisciplinary approach, integrating concepts and computational tools from comparative genomics, structural biology, and metabolism.
The incredible diversity of life on Earth originates from a corresponding diversity of its various interconnected biomolecular systems.
Biological Systems studied in our Lab

A protein complex is a group of two or more proteins that physically associate to perform a specific biological function. The building blocks are individual proteins. They are interconnected through direct physical associations. These complexes function as integrated molecular machines.


We study the origin and evolution of transcriptional units across the Tree of Life.
Most eukaryotes exhibit monocistronic transcription: a single mRNA molecule is transcribed from a single gene and encodes a single protein. This allows for the precise and independent regulation of individual genes.
In bacteria, the transcriptional units are operons: clusters of functionally related genes that are transcribed together as a single polycistronic mRNA molecule under the control of a single promoter. Hence, the polycistronic structure exists at both the DNA level (the operon) and the RNA level (the mature mRNA).
Finally, Euglenozoa protists, nematode worms, some tunicates, and certain algae exhibit an unusual polycistronic transcription system that is intermediate between the two described above. Multiple co-linear genes are transcribed under a single promoter into a single pre-mRNA molecule, which is then processed into individual, monocistronic mRNAs through co-transcriptional trans-splicing and polyadenylation. Hence, the polycistronic structure exists only at the DNA level, not at the level of the mature mRNA.

A metabolic network is the complex web of all chemical reactions that occur within a cell, converting food into biomass and energy and synthesizing the small-molecule metabolites essential for life. The building blocks are individual enzymes. They are interconnected through consecutive chemical reactions.
Current Research Projects
Accretionary Evolution of Protein Complexes

Protein complexes account for over half of all proteins encoded in any genome and exhibit a wide diversity of shape and size. Each complex is composed of a distinct set of subunits. A central unsolved problem in evolutionary biology is when and how these complexes arose and increased in complexity. Specifically, in which historical order did the genes of different subunits originate, and what were the molecular principles underlying their recruitment to an emerging complex?
In this project, we investigate how present-day multisubunit complexes originated from simpler ancestral assemblies by incrementally integrating new subunits. We refer to this as the accretionary model of protein complex evolution. Our overarching goal is to characterize the chronology of subunit origins for all known heteromeric complexes throughout eukaryotic evolution. We also aim to test whether that chronology is imprinted in the present-day structural organization of the complex.
To this end, we aim to compile a dataset of thousands of known human heteromeric protein complexes with annotated subunit compositions. For those with resolved 3D structures, we aim to quantify each subunit’s core-versus-periphery placement and the pairwise stabilization between subunits, thereby estimating each subunit’s contribution to the complex’s structural integrity.
Next, we will implement a systematic phylogenetic approach to reconstruct the subunit origin chronologies for each complex. We will characterize the genetic origin mechanism (de novo emergence, gene duplication, fission, fusion, or horizontal transfer) and emergence timing of each subunit's gene across eukaryotic evolution.
Mapping these reconstructed histories onto the complexes' structural organization will test a key hypothesis: core subunits essential for the structural integrity of the complex are evolutionarily older and strongly conserved across species, whereas peripheral, partner-stabilized subunits arose more recently and are frequently lost between species, as their absence leaves the rest of the complex intact.
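One simple way to probe this hypothesis for a single complex is a rank correlation between subunit age and a structural "coreness" score. The sketch below is illustrative only: the function names, the age encoding (larger means older) and the coreness values are hypothetical, and Spearman correlation is just one reasonable choice of statistic, not necessarily the one used in the project.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical subunits of one complex (values invented for illustration).
subunit_age = [4, 4, 3, 2, 1]                  # larger = older origin
coreness = [0.62, 0.55, 0.40, 0.31, 0.12]      # e.g. fraction of buried surface

rho = spearman(subunit_age, coreness)
print(f"Spearman rho between age and coreness: {rho:.2f}")
```

A strongly positive value would be consistent with older subunits sitting in the core; judging significance across thousands of complexes would require an appropriate statistical test on top of this.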
Attritionary Evolution of Protein Complexes

Intracellular endosymbiosis, in which a unicellular symbiont lives within host cells, has profound implications for eukaryotic evolution (including the origins of mitochondria and chloroplasts) and continues to shape biodiversity, ecosystem stability, and organismal evolution. These associations are typically accompanied by reductive evolution of the endosymbiont’s genome. Within the host cell, functions once essential for autonomous life become dispensable, leading to widespread gene loss, erosion of noncoding DNA, and miniaturization of retained proteins. Despite extensive studies across multiple intracellular lineages, most accounts of reductive evolution focus on the loss of individual genes and thus lack a mechanistic understanding of how cells’ functional units, multimeric protein complexes, are simplified. Specifically, it remains unknown which complexes are simplified, which subunits are lost or miniaturized and in what chronological order, and which biological principles underlie these changes.
We propose the attritionary model of protein complex evolution, which posits that the organization of different subunits in 3D space, how they stabilize each other, their functional roles, and the costs associated with their biosynthesis govern the simplification of multisubunit protein complexes.
To test this proposed model, we combine phylogenetics, multi-omics, and large-scale structural quantification across multiple independently evolved intracellular lineages. Specifically, we use sequence similarity to characterize multimeric complexes and their subunit compositions in the closest free-living relatives of endosymbionts. A phylogenetic approach then reconstructs per-complex simplification trajectories by mapping subunit gene loss and miniaturization events as intracellular endosymbionts diverged from their free-living relatives. We test whether complexes are simplified by preferentially pruning non-catalytic and/or non-scaffolding subunits, while catalytic or scaffolding subunits with high biosynthetic costs are retained, yet miniaturized by removing disordered regions and interdomain linkers rather than catalytic sites or interface patches. The existence of specific structural and functional constraints underlying complex simplification naturally predicts convergent simplification trajectories across independently evolved lineages, which we systematically examine for orthologous complexes.
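As a rough illustration of how convergence could be scored for one orthologous complex, the sketch below compares the sets of subunits lost in different lineages using the Jaccard index. The lineage and subunit names are invented, the function names are hypothetical, and this set-based measure deliberately ignores the order of losses, so it is only a first approximation to comparing full trajectories.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two lineages with no losses are trivially identical
    return len(a & b) / len(a | b)


def pairwise_convergence(lost_by_lineage):
    """Score every pair of lineages by the overlap of their lost subunits."""
    return {
        (p, q): jaccard(lost_by_lineage[p], lost_by_lineage[q])
        for p, q in combinations(sorted(lost_by_lineage), 2)
    }


# Hypothetical losses from one orthologous complex in three lineages.
lost = {
    "lineage_A": {"sub3", "sub5", "sub6"},
    "lineage_B": {"sub5", "sub6"},
    "lineage_C": {"sub1"},
}

for pair, score in pairwise_convergence(lost).items():
    print(pair, round(score, 2))
```

High overlap between independently evolved lineages would point toward shared constraints; comparing against a null model of random subunit loss would be needed before drawing that conclusion.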
Evolutionary Origins of Polycistronic Transcription

Many eukaryotes exhibit polycistronic transcription units (PTUs), in which multiple co-linear genes are co-transcribed into a single pre-mRNA molecule, which is then processed into individual, monocistronic mRNAs through co-transcriptional trans-splicing and polyadenylation.
Remarkably, polycistronic transcription has arisen multiple times during eukaryotic evolution, from ancestors with conventional monocistronic transcription. We call the process by which individual genes lose their promoters and are progressively consolidated into PTUs genome tramlining. We ask three fundamental questions:
(i) Which genetic mechanisms facilitate genome tramlining?
(ii) What are the molecular principles that govern the order of genes within a PTU?
(iii) What are the molecular principles that govern the organization of PTUs within chromosomes?
To address these questions, we reconstruct ancestral gene orientations and examine how genes are gradually organized to form polycistronic transcription units. This enables (i) the reconstruction of the intermediate states of genome tramlining, and (ii) the identification of the underlying genetic mechanisms, e.g., chromosome fission or fusion, and the relocation (with or without inversion), inversion, loss, gain, or duplication of genes. To discover the principles underlying gene organization within PTUs, we examine how genes encoding subunits of the same protein complex or enzymes of the same metabolic pathway are organized among PTUs.
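Because the genes of a PTU are co-linear, a crude first pass at delineating candidate PTUs is to group consecutive genes that share the same strand. The sketch below does only that. The function name, the `min_genes` threshold and the toy gene list are hypothetical, and shared orientation is a necessary but not sufficient criterion: confirming real PTU boundaries would require transcriptional evidence.

```python
from itertools import groupby


def candidate_ptus(genes, min_genes=2):
    """Group consecutive same-strand genes into candidate PTUs.

    genes: list of (gene_id, strand) tuples in chromosomal order,
           with strand given as '+' or '-'.
    Returns a list of (strand, [gene_ids]) for runs of at least min_genes.
    """
    ptus = []
    for strand, run in groupby(genes, key=lambda gene: gene[1]):
        gene_ids = [gene_id for gene_id, _ in run]
        if len(gene_ids) >= min_genes:
            ptus.append((strand, gene_ids))
    return ptus


# Hypothetical chromosome: g3 interrupts the '+' run and stands alone.
chromosome = [
    ("g1", "+"), ("g2", "+"),
    ("g3", "-"),
    ("g4", "+"), ("g5", "+"), ("g6", "+"),
]

for strand, gene_ids in candidate_ptus(chromosome):
    print(strand, gene_ids)
```

Applying the same grouping to reconstructed ancestral gene orders would show how runs of co-oriented genes lengthen or merge along a lineage.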
Origin of Parasitism through Metabolic Remodeling

Symbiotic relationships have shaped the ecology and evolution of countless species. They range from mutualism to commensalism to parasitism, and from facultative to obligate. Intracellular endosymbiosis is an extreme case in which a symbiont lives inside a host cell. These relationships have profoundly contributed to the origin of complex life by giving rise to mitochondria and chloroplasts, and almost always result in an extensive reduction of the endosymbiont’s genome accompanying its lifestyle shift. Within the host cell, many cellular functions once essential for autonomous life become dispensable, facilitating loss of the corresponding genomic regions in a process known as reductive evolution.
The origin of host association is often manifested in the endosymbionts' reduced metabolic capabilities. As genomes are miniaturized in reductive evolution, metabolic pathways involved in synthesizing essential amino acids, nucleotides, lipids, vitamins, and cofactors are often lost. As a result, the endosymbiont depends on its host for those essential nutrients.
Our goal is to identify the evolutionary trajectories of metabolic reduction. Specifically, do different endosymbiotic lineages become host-dependent through convergent loss of metabolic pathways, or does each lineage follow its own unique route toward parasitism? To address this question, we integrate phylogenomics with Flux Balance Analysis (FBA).
Phylogenomics. Phylogenomics uses whole-genome data to reconstruct how species and their genes have evolved. By examining thousands of genes across related organisms, we can infer patterns of gene gain, loss, and duplication. This allows us to trace when parasites lost key metabolic genes as they diverged from their free-living ancestors and to map changes in their metabolic networks over evolutionary time.
Flux Balance Analysis (FBA). FBA is a computational method for simulating cellular metabolism. Using all known metabolic reactions of an organism, FBA predicts how metabolites flow through the network and whether the cell can produce all the components needed for growth. This allows us to estimate whether a metabolic network is capable of sustaining independent life.
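In its standard textbook form, FBA is a linear program. The notation here is introduced for illustration and does not appear elsewhere on this page: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, $l_j$ and $u_j$ are lower and upper bounds on each flux, and $c$ is a weight vector that typically selects the biomass reaction.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The constraint $S\,v = 0$ expresses steady state: every internal metabolite is produced as fast as it is consumed. Under this formulation, a network is usually considered capable of growth when the optimal biomass flux is positive; any stricter cutoff is a modeling choice.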
We combine these methods in the following way:
(i) Reconstruct ancestral metabolic networks using genomic annotations of free-living relatives.
(ii) Use FBA to test whether these ancestral networks were self-sustaining, meaning they could produce biomass and all essential metabolites.
(iii) Introduce gene-loss events in the phylogenetic order in which they occurred.
(iv) Re-run FBA after each loss to determine whether the reduced network can still support autonomous growth.
(v) Identify the failure point where the network can no longer sustain independent life, marking the origin of obligate host dependence.
These failure points will illuminate the metabolic basis of obligate host dependency and enable us to pinpoint the exact sequence of events by which it originates. By comparing these sequences of events across lineages, we will test for convergent versus unique routes to endosymbiosis.
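The gene-loss loop described above can be sketched in a few lines. To keep the example free of external dependencies, it swaps in a much simpler viability test than FBA: a reachability check (network expansion) that asks whether all biomass components can be produced from the available nutrients. All names, the toy network, and the one-gene-per-reaction mapping are hypothetical; a real analysis would replace `is_autonomous` with an FBA solve and would handle isozymes and enzyme complexes.

```python
def producible(reactions, nutrients):
    """Network expansion: metabolites reachable from the nutrients.

    reactions maps a reaction id to (substrates, products), both sets.
    A reaction can fire once all of its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def is_autonomous(reactions, nutrients, biomass):
    """Stand-in for FBA: can every biomass component be produced?"""
    return biomass <= producible(reactions, nutrients)


def first_failure(reactions, gene_to_reactions, loss_order, nutrients, biomass):
    """Apply gene losses in order; return the first gene whose loss
    breaks autonomy, or None if the network survives every loss."""
    current = dict(reactions)
    if not is_autonomous(current, nutrients, biomass):
        raise ValueError("ancestral network is not self-sustaining")
    for gene in loss_order:
        # Assumption: losing a gene removes every reaction mapped to it.
        for reaction_id in gene_to_reactions.get(gene, ()):
            current.pop(reaction_id, None)
        if not is_autonomous(current, nutrients, biomass):
            return gene
    return None


# Hypothetical toy network.
toy_reactions = {
    "R1": ({"glucose"}, {"pyruvate"}),
    "R2": ({"pyruvate"}, {"amino_acid"}),
    "R3": ({"glucose"}, {"nucleotide"}),
    "R4": ({"pyruvate"}, {"storage_lipid"}),
}
toy_genes = {"geneA": ["R1"], "geneB": ["R2"], "geneC": ["R3"], "geneD": ["R4"]}

failure = first_failure(
    toy_reactions, toy_genes,
    loss_order=["geneD", "geneB", "geneC"],
    nutrients={"glucose"},
    biomass={"amino_acid", "nucleotide"},
)
print("First loss that breaks autonomy:", failure)
```

In this toy case, losing `geneD` is tolerated because its product is not a biomass component, whereas losing `geneB` removes the only route to `amino_acid`.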
