
Artificial intelligence predicts almost the entire "protein universe".


By Alee Whitaker · Published 2 years ago · 4 min read

DeepMind's collaboration with EMBL's European Bioinformatics Institute (EMBL-EBI) has produced a major leap forward in biology. The partners used the artificial intelligence (AI) system AlphaFold to predict 214 million protein structures from more than 1 million species, covering nearly all known proteins on Earth. The breakthrough is expected to accelerate the development of new drugs and advance basic science.

The AlphaFold tool has predicted the structures of about 200 million proteins from nearly every known organism on Earth. The study was published in Nature.

From now on, determining the 3D shape of almost every protein known to science will be as simple as using a search engine.

Researchers used AlphaFold, an artificial intelligence (AI) network, to predict the structures of about 200 million proteins from 1 million species, covering nearly all known proteins on Earth. In essence, the release covers the entire protein universe.

The 3D shape, or structure, of a protein determines its function in the cell. Most drugs are designed using structural information, and an accurate map is often the first step in discovering how a protein works.

Christine Orengo, a computational biologist at University College London who uses the AlphaFold database to identify new protein families, commented that researchers are gearing up for the release of this huge treasure trove.

AlphaFold's release last year caused a stir in the life science community, which has been scrambling to take advantage of the tool. The network makes highly accurate predictions about the 3D shape, or structure, of proteins. It also reports how confident it is in each prediction, so researchers know which results they can rely on. Traditionally, scientists have used time-consuming and expensive experimental methods such as X-ray crystallography and cryo-electron microscopy to determine protein structures.
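The confidence information AlphaFold reports is a per-residue score called pLDDT, on a scale from 0 to 100, and in the PDB-format files AlphaFold produces it is stored in the B-factor column. The following is a minimal sketch of how a reader might pull those scores out of such a file and label them with the confidence bands shown on the AlphaFold database pages. The function names, the `example_atom_line` helper and the sample values are illustrative assumptions, not part of any AlphaFold tooling, and the handling of scores that fall exactly on a band boundary is a choice made here.

```python
# Confidence bands as displayed on the AlphaFold database pages (pLDDT, 0-100).
# Treating a score exactly on a threshold as the higher band is an assumption.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low")]


def plddt_per_residue(pdb_text: str) -> list[float]:
    """Return one pLDDT value per residue, read from C-alpha ATOM records."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB layout: atom name in columns 13-16,
        # B-factor (here: pLDDT) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def band(score: float) -> str:
    """Map a pLDDT score to its confidence band."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return "very low"


def example_atom_line(serial: int, atom: str, residue: str,
                      res_seq: int, plddt: float) -> str:
    """Hypothetical helper: build a PDB-style ATOM record for demo input.

    Coordinates are dummy zeros; only the column layout matters here.
    """
    return (
        f"ATOM  {serial:>5d} {atom:<4s} {residue:>3s} A{res_seq:>4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


if __name__ == "__main__":
    # Made-up sample data, not taken from a real database entry.
    sample = "\n".join([
        example_atom_line(1, "N", "MET", 1, 95.5),
        example_atom_line(2, "CA", "MET", 1, 95.5),
        example_atom_line(3, "CA", "GLY", 2, 72.0),
        example_atom_line(4, "CA", "SER", 3, 41.25),
    ])
    for score in plddt_per_residue(sample):
        print(score, band(score))
```

Reading only the C-alpha atom gives one score per residue, which is enough for a quick look at which stretches of a predicted structure are trustworthy.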

According to EMBL-EBI, about 35 percent of the more than 214 million predictions are considered highly accurate, meaning they are about as good as experimentally determined structures. Another 45 percent are considered confident enough to support many applications.
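To put those shares in absolute terms, the short calculation below converts the percentages into approximate counts. It assumes a round total of 214 million (the article says "more than 214 million"), so the results are rough estimates rather than official figures, and the label for the remaining 20 percent is a description chosen here.

```python
TOTAL_PREDICTIONS = 214_000_000  # assumed round figure from the article

# Shares quoted from EMBL-EBI, as whole percentages.
SHARES_PERCENT = {"highly accurate": 35, "confident": 45}


def approximate_counts(total: int, shares_percent: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares into approximate prediction counts."""
    counts = {label: total * pct // 100 for label, pct in shares_percent.items()}
    # Whatever is left over falls below the "confident" category.
    counts["lower confidence"] = total - sum(counts.values())
    return counts


if __name__ == "__main__":
    for label, count in approximate_counts(TOTAL_PREDICTIONS, SHARES_PERCENT).items():
        print(f"{label}: about {count / 1e6:.1f} million")
```

Under that assumption, roughly 75 million predictions are highly accurate, about 96 million are confident, and about 43 million fall into neither category.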

Many AlphaFold structures are good enough to replace experimental structures for some applications. In other cases, researchers use AlphaFold predictions to validate and interpret experimental data. Some predictions, however, are unreliable. In part this reflects intrinsic disorder in the protein itself, which means that it has no definite shape, at least not unless other molecules are present.

The 200 million predictions released today are based on sequences from another database, UniProt. Scientists may already have an idea of the shape of some of these proteins, either because they are covered by databases of experimental structures or because they resemble other proteins in those repositories. But the experimental databases tend to be biased toward human, mouse, and other mammalian proteins, so the AlphaFold release may add important knowledge because it draws on far more diverse organisms.

Because the AlphaFold software has been available for a year, researchers can already predict the structure of any protein they want. Even so, many say that having the predictions in a single database will save researchers time, money and trouble.

Having almost all known proteins in the database will also enable new types of research. Orengo's team has already used the AlphaFold database to identify novel protein families, which they will now do on a much larger scale. Her lab will also use the expanded database to understand the evolution of proteins with notable properties, such as the ability to consume plastic or to drive cancer. Identifying distant relatives of these proteins in the database can help pinpoint the basis of those properties.

Still, AlphaFold has room for improvement. How to develop models that predict how a protein folds, rather than just its final structure, is the next question the team will tackle, the UCL researcher suggested.

A year ago, the team made AlphaFold's source code and database freely available to researchers. Since then, more than 500,000 researchers from 190 countries and regions have accessed the database. The data are already being used in malaria vaccine development and in the fight against antibiotic resistance and plastic pollution, and are helping researchers accelerate the development of new drugs.

Now, the team is once again making its latest database freely available, with all of the more than 200 million protein structures available for download. This unprecedented wealth of data will help researchers explore open questions in the life sciences and will greatly aid research in biology and medicine.
