**PiFold: Toward effective and efficient protein inverse folding (Spotlight)**
+[Zhangyang Gao](https://westlake-drug-discovery.github.io/zhangyang_gao.html), [Cheng Tan](https://chengtan9907.github.io/), [Stan Z. Li](https://scholar.google.com/citations?user=Y-nyLGIAAAAJ&hl). In [ICLR](https://openreview.net/forum?id=oMsN9TYwJ0j), 2023.
+
+Note that we renamed **ProDesign** to **PiFold** to avoid naming conflicts with previous work.
+
 
-The pre-print paper is available at [this link](https://github.com/A4Bio/ProDesign/blob/main/assets/ProDesign.pdf).
 
 ## 1. Introduction
-How to design protein sequences folding into the desired structures effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose ProDesign, which contains a novel residue featurizer and ProGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that ProDesign could achieve 51.66\% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, ProDesign achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement.
+How can we design protein sequences folding into the desired structures effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66\% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement.
 
 <p align="center">
-<img src='./assets/acc_speed.png' width="600">
+<img src='./assets/acc_speed2.png' width="600">
 </p>
 
 ## 2. Framework
-We show the overall ProDesign framework. The inputs are protein structures, and outputs are protein sequences expected to fold into the input structures. We propose a novel residue featurizer and ProGNN layer to learn expressive residue representations. Specifically, the residue featurizer constructs comprehensive residue features and creates learnable virtual atoms to capture information complementary to real atoms. The ProGNN
-considers multi-scale residue interactions in node, edge, and global context levels. ProDesign could generate protein sequences in a one-shot manner with a higher recovery than previous autoregressive or iterative models.
+We show the overall PiFold framework. The inputs are protein structures, and outputs are protein sequences expected to fold into the input structures. We propose a novel residue featurizer and PiGNN layer to learn expressive residue representations. Specifically, the residue featurizer constructs comprehensive residue features and creates learnable virtual atoms to capture information complementary to real atoms. The PiGNN
+considers multi-scale residue interactions in node, edge, and global context levels. PiFold could generate protein sequences in a one-shot manner with a higher recovery than previous autoregressive or iterative models.
 
 <p align="center">
 <img src='./assets/framework.png' width="600">
@@ -21,14 +28,14 @@ considers multi-scale residue interactions in node, edge, and global context lev
 We comprehensively evaluate different results on CATH, TS50 and TS500.
 
 <p align="center">
-<img src='./assets/results_CATH.png' width="600">
+<img src='./assets/results_CATH2.png' width="600">
 </p>
 
 <p align="center">
 <img src='./assets/results_TS.png' width="600">
 </p>
 
-You can reproduce results of ProDesign on colab:
+You can reproduce results of PiFold on colab:
 
 <a href="https://colab.research.google.com/drive/1HgXQCbsoK09mcVZmPgIWlCczY64l0iIX?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 
@@ -43,6 +50,27 @@ For a given protein backbone design a new sequence that folds into that conforma
 <a href="https://colab.research.google.com/drive/1z6vpKA5L1iAmBLfREbmy8VNOtDYlkY4Q?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>