
ESM3: A Powerful Protein Language Model Simulating 500 Million Years of Evolution

Simulating 500 Million Years of Evolution

Researchers at EvolutionaryScale have developed ESM3, a protein language model that they describe as simulating hundreds of millions of years of protein evolution. The model can generate functional proteins whose sequences differ substantially from any known natural protein while retaining the intended structure and function.

ESM3 represents a major advance in protein language modeling. Unlike earlier models that were trained on protein sequences alone, ESM3 reasons jointly over three modalities: sequence, structure, and function. This multimodal approach gives the model a more complete picture of a protein and lets it reason about proteins in ways that more closely mimic natural evolutionary processes.

One of the most striking demonstrations of ESM3's capabilities is its generation of a novel green fluorescent protein (GFP) named esmGFP. This protein is functional and bright, yet shares only 58% sequence identity with the closest known fluorescent protein. The researchers estimate that this degree of divergence is equivalent to over 500 million years of natural evolution. Generating such a distant yet functional protein shows ESM3's potential to explore large, previously unexplored regions of protein design space.
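To make the 58% figure concrete: percent sequence identity is usually computed over a pairwise alignment, as the fraction of aligned positions at which both proteins carry the same residue. The following is a minimal sketch, assuming the two sequences are already aligned and use `-` as the gap character. The function name and the toy sequences are illustrative and are not taken from the ESM3 work, and tools differ in how they choose the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, and the denominator is the number of
    columns where both sequences have a residue (one common convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (made-up sequences, not real GFP variants):
# 9 ungapped columns, 7 of them identical.
identity = percent_identity("MSKGEELFTG", "MSKAEE-FSG")
print(f"{identity:.1f}% identity")
```

Under this convention, a protein at 58% identity to its nearest neighbor differs from it at roughly four of every ten aligned positions.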

The researchers also demonstrated ESM3's versatility in protein design tasks. The model can follow complex prompts that combine constraints on different protein properties, which gives designers fine-grained control over the result. For example, ESM3 was able to generate proteins with specific structural motifs placed into entirely new structural contexts, and to redesign existing proteins with new properties while preserving critical functional elements.
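A prompt of this kind can be pictured as a partially specified protein: a few positions are fixed and the rest are left blank for the model to fill in. The sketch below is a hypothetical illustration of that idea, not the ESM3 interface. The `DesignPrompt` class, the `scaffold_motif` helper, the `_` mask symbol, and the example motif and keyword are all assumptions made for this example, and only the sequence track is shown.

```python
from dataclasses import dataclass

MASK = "_"  # hypothetical symbol for positions the model should fill in


@dataclass(frozen=True)
class DesignPrompt:
    """Hypothetical container for a partially specified protein."""
    sequence: str                  # residues, with MASK where unspecified
    function_keywords: tuple = ()  # optional free-form function hints

    @property
    def masked_fraction(self) -> float:
        """Fraction of positions left for the model to design."""
        return self.sequence.count(MASK) / len(self.sequence)


def scaffold_motif(length: int, motif: str, start: int,
                   keywords: tuple = ()) -> DesignPrompt:
    """Place a fixed motif at `start` inside an otherwise masked sequence."""
    if start < 0 or start + len(motif) > length:
        raise ValueError("motif does not fit inside the requested length")
    tail = length - start - len(motif)
    sequence = MASK * start + motif + MASK * tail
    return DesignPrompt(sequence=sequence, function_keywords=keywords)


# Toy example: keep a 3-residue motif, leave the other 9 positions open.
prompt = scaffold_motif(12, "HGD", start=4, keywords=("hydrolase",))
print(prompt.sequence)         # ____HGD_____
print(prompt.masked_fraction)  # 0.75
```

A generative model would then be asked to complete the masked positions while keeping the fixed ones unchanged. The same pattern extends to other tracks, for example fixing structure at some positions instead of residues.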

ESM3's architecture builds upon previous protein language models but incorporates several key innovations. It converts 3D protein structures into discrete tokens so that structure can be modeled alongside sequence, and it adds a novel "geometric attention" mechanism that allows the model to reason effectively about 3D geometry. The model also takes function annotations as input, which lets it link sequence and structure to protein function more directly than previous models.
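The article does not detail how structures are tokenized. One common way to turn continuous geometry into discrete tokens is vector quantization: each residue's local-geometry feature vector is replaced by the index of its nearest entry in a learned codebook. The sketch below illustrates that general idea only and should not be read as ESM3's actual tokenizer. The 2D feature vectors and the three-entry codebook are toy assumptions, and a real system would learn both the features and the codebook.

```python
import math


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Distances are Euclidean; ties resolve to the lowest index.
    """
    tokens = []
    for vec in features:
        distances = [math.dist(vec, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Toy codebook with three entries (a real one would be learned and far larger).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# One made-up feature vector per residue.
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]

structure_tokens = quantize(features, codebook)
print(structure_tokens)  # [0, 1, 2, 1]
```

Once structure is expressed as integer tokens like these, it can be fed to a language model in the same way as amino-acid tokens.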

The implications of this research are far-reaching. ESM3 opens up new possibilities for protein engineering and drug design by allowing researchers to explore protein designs that are radically different from anything found in nature. This could lead to the development of new enzymes, therapeutics, and biomaterials with properties that were previously unattainable.

However, the researchers also acknowledge the potential risks associated with such powerful protein design capabilities. They have implemented safeguards in the publicly released version of the model, ESM3-open, to mitigate potential misuse while still allowing for broad scientific applications.

ESM3 represents a significant leap forward in our ability to model and design proteins. By simulating evolutionary processes at an unprecedented scale and fidelity, it promises to accelerate scientific discovery and open up new frontiers in biotechnology and medicine.




