Commit e9cd8ee

Merge pull request #448 from macrocosm-os/mainframe
mainframe
2 parents 6113633 + 5a42d8a commit e9cd8ee

8 files changed, +29 -186 lines changed

README.md

+23-183
@@ -14,203 +14,43 @@

 <div align="center">

-# **Protein Folding Subnet 25** <!-- omit in toc -->
-[![Discord Chat](https://img.shields.io/discord/308323056592486420.svg)](https://discord.gg/bittensor)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-
----
-
-## The Incentivized Internet <!-- omit in toc -->
-
-[Discord](https://discord.gg/bittensor)[Network](https://taostats.io/)[Research](https://bittensor.com/whitepaper)
-</div>
-
-This repository is the official codebase for Bittensor Subnet Folding (SN25), which was registered on May 20th, 2024. To learn more about the Bittensor project and the underlying mechanics, [read here.](https://docs.bittensor.com/)
-
-**IMPORTANT**: This repo has a functional **testnet 141** as of May 13th. You should be testing your miners here before launching on main.
-
----
-
-<div align="center">
-<img src="./assets/protein_tao.png" alt="Alt generative-folding-tao">
-</div>
-
-# Introduction
-The protein folding subnet is Bittensors’ first venture into academic use cases, built and maintained by [Macrocosmos AI](https://www.macrocosmos.ai). While the current subnet landscape consists of mainly AI and web-scraping protocols, we believe that it is important to highlight to the world how Bittensor is flexible enough to solve almost any problem.
-
-This subnet is designed to produce valuable academic research in Bittensor. Researchers and universities can use this subnet to simulate almost any protein, on demand, for free. It is our hope that this subnet will empower researchers to conduct world-class research and publish in top journals while demonstrating that decentralized systems are an economic and efficient alternative to traditional approaches.
-
-# Installation
-As a validator, you are **required** to have Weights and Biases (Wandb) active on your machine. We open-source our logging to the community, so this is a necessary component. The repo will not work without Wandb.
-
-As a miner, this is an optional include. As such, we do not have logic for logging natively in the base miner, but can be easily added.
-
-This repository requires python3.8 or higher. To install it, simply clone this repository and run the [install.sh](./install.sh) script. Below are all the steps needed to ensure that your machine is running properly:
-
-Firstly, you must install conda:
-```bash
-mkdir -p ~/miniconda3
-wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
-bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
-rm ~/miniconda3/miniconda.sh
-
-source ~/miniconda3/bin/activate
-conda init --all
-```
-
-Install wandb:
-```bash
-pip install wandb
-wandb login
-```
-
-We use a combination of `conda` and `poetry` to manage our environments. It is very important to create the environment with python 3.11 as this is necesssary for `bittensor` and `openmm`
-```bash
-git clone https://github.com/macrocosm-os/folding.git
-cd folding
-
-conda create --name folding python=3.11
-bash install.sh
-```
-
-Bash install will use poetry to build the environment correctly.
-
-
-# What is Protein Folding?
-
-Proteins are the biological molecules that "do" things, they are the molecular machines of biochemistry. Enzymes that break down food, hemoglobin that carries oxygen in blood, and actin filaments that make muscles contract are all proteins. They are made from long chains of amino acids, and the sequence of these chains is the information that is stored in DNA. However, its a large step to go from a 2D chain of amino acids to a 3D structure capable of working.
-
-The process of this 2D structure folding on itself into a stable, 3D shape is called **protein folding**. For the most part, this process happens naturally and the end structure is in a much lower free energy state than the string. Like a bag of legos though, it is not enough to just know the building blocks being used, its the way they're supposed to be put together that matters. *"Form defines function"* is a common phrase in biochemsitry, and it is the quest to determine form, and thus function of proteins, that makes this process so important to understand and simulate.
-
-# Why is Folding a Good Subnet Idea?
-An ideal incentive mechanism defines an asymmetric workload between the validators and miners. The necessary proof of work (PoW) for the miners must require substantial effort and should be impossible to circumvent. On the other hand, the validation and rewarding process should benefit from some kind of privileged position or vantage point so that an objective score can be assigned without excess work. Put simply, **rewarding should be objective and adversarially robust**.
-
-Protein folding is also a research topic that is of incredibly high value. Research groups all over the world dedicate their time to solving particular niches within this space. Providing a solution to attack this problem at scale is what Bittensor is meant to provide to the global community.
-
-# Simulation Backend and Reproducibility
-Moleccular dynamics (MD) simulations require a physics-based engine to run them, and SN25 utilizes the open-source project [OpenMM](https://openmm.org). As their tagline suggests, they are a "high performance, customizable molecular simulation" package.
-
-One of the key advantages of using OpenMM for MD-simulations is the built-in capabilities for *reproducability*. This is a key component in the reward stack and all miners should be intimately familiar with this. For more information, please read this [document](./documentation/reproducibility.md).
-
-# Reward Mechanism
-Protein folding is a textbook example of this kind of asymmetry; the molecular dynamics simulation involves long and arduous calculations which apply the laws of physics to the system over and over again until an optimized configuration is obtained. There are no reasonable shortcuts.
-
-While the process of simulation is exceedingly compute-intensive, the evaluation process is actually straightforward. **The reward given to the miners is based on the ‘energy’ of their protein configuration (or shape)**. The energy value compactly represents the overall quality of their result, and this value is precisely what is decreased over the course of a molecular dynamics simulation. The energy directly corresponds to the configuration of the structure, and can be computed in closed-form. The gif below illustrates the energy minimization over a short simulation procedure.
-
-<div align="center">
-<img src="./assets/8emf_pdb_loss.gif" alt="Alt Folded-protein" width="500" height="350">
 </div>

-When the simulations finally converge (ΔE/t < threshold), they produce the form of the proteins as they are observed in real physical contexts, and this form gives rise to their biological function. Thus, the miners provide utility by preparing ready-for-study proteins on demand. An example of such a protein is shown below.

 <div align="center">
-<img src="./assets/8emf_pdb_protein.gif" alt="Alt Folded-protein" width="600" height="500">
+<img src="./assets/mainframe_official.png" alt="mainframe-official">
 </div>

-# Running the Subnet
-## Requirements
-Protein folding utilizes an open-source package called [OpenMM](https://openmm.org). To run, you will need:
-1. A Linux-based machine
-2. At least 1 CUDA-compatible GPU. We recommend an RTX 4090.
-3. Conda Distribution (we recommend [Miniconda](https://docs.anaconda.com/miniconda/)). Using conda is an [OpenMM requirement](http://docs.openmm.org/latest/userguide/application/01_getting_started.html#installing-openmm).
-
-For more information regarding recommended hardware specifications, look at [min_compute.yml](./min_compute.yml)
-
-## Registering on Mainnet
-```
-btcli subnet register --netuid 25 --wallet.name <YOUR_COLDKEY> --wallet.hotkey <YOUR_HOTKEY>
-```
-
-## Registering on Testnet
-Netuids that are larger than 99 must be set explicity when registering your hotkey. Use the following command:
-```
-btcli subnet register --netuid 141 --wallet.name <YOUR_COLDKEY> --wallet.hotkey <YOUR_HOTKEY>
-```
-
-## Launch Commands
-### Validator
-There are many parameters that one can configure for a simulation. The base command-line args that are needed to run the validator are below.
-```bash
-python neurons/validator.py
---netuid <25/141>
---subtensor.network <finney/test>
---wallet.name <your wallet> # Must be created using the bittensor-cli
---wallet.hotkey <your hotkey> # Must be created using the bittensor-cli
---axon.port <your axon port> #VERY IMPORTANT: set the port to be one of the open TCP ports on your machine
-```
-
-As a validator, you should change these base parameters in `scripts/run_validator.py`.
-
-For additional configuration, the following params are useful:
-```bash
-python neurons/validator.py
---netuid <25/141>
---subtensor.network <finney/test>
---wallet.name <your wallet> # Must be created using the bittensor-cli
---wallet.hotkey <your hotkey> # Must be created using the bittensor-cli
---neuron.queue_size <number of pdb_ids to submit>
---neuron.sample_size <number of miners per pdb_id>
---protein.max_steps <number of steps for the simulation>
---protein.input_source <database of proteins to choose from>
---logging.debug # Run in debug mode, alternatively --logging.trace for trace mode
---axon.port <your axon port> #VERY IMPORTANT: set the port to be one of the open TCP ports on your machine
-```
-
-Validators are heavily recommended to run the autoprocess script to ensure that they are always up to date with the most recent version of folding. We have version tagging that will disable validators from setting weights if they are not on the correct version.
-```bash
-bash run_autoprocess.sh
-```
-
-The following environment variables need to be set in your system or application environment (`.env` file):
-- `S3_REGION = "nyc3"`: The AWS region or S3-compatible region where the bucket is located.
-- `S3_ENDPOINT = "https://nyc3.digitaloceanspaces.com"`: The endpoint URL for your S3-compatible service.
-- `S3_KEY`: Your S3 access key ID.
-- `S3_SECRET`: Your S3 secret access key.
-see `.env.example` for an example of how to set these variables.
-
-### Miner
-There are many parameters that one can configure for a simulation. The base command-line args that are needed to run the miner are below.
-```bash
-python neurons/miner.py
---netuid <25/141>
---subtensor.network <finney/test>
---wallet.name <your wallet> # Must be created using the bittensor-cli
---wallet.hotkey <your hotkey> # Must be created using the bittensor-cli
---neuron.max_workers <number of processes to run on your machine>
---axon.port <your axon port> #VERY IMPORTANT: set the port to be one of the open TCP ports on your machine
-```
-
-Optionally, pm2 can be run for both the validator and the miner using our utility scripts found in pm2_configs.
-```bash
-pm2 start pm2_configs/miner.config.js
-```
-or
-```bash
-pm2 start pm2_configs/validator.config.js
-```
-Keep in mind that you will need to change the default parameters for either the [miner](./scripts/run_miner.sh) or the [validator](./scripts/run_validator.sh).
-
-Miners now have the opportunity to interact with the global job pool (GJP) locally. By creating a read-only node via `start_read_node.sh`, miners sync with the GJP on their local machine in the `db` directory. We have provided a script `scripts/query_rqlite.py` that returns jobs based on their priority in the GJP, or returns a specific job specified by `pdb_id`. With this information, miners can experiment with customizing their job queue. This script can also be helpful for downloading and analyzing checkpoint files from other miners. Please see the updated environment variables in `.env.example` and specify your public IP address in the following fields: `RQLITE_HTTP_ADV_ADDR`,`RQLITE_RAFT_ADV_ADDR`.
-## How does the Subnet Work?
+# The Mission
+Mainframe asks a simple question: *could bittensor be used for generalized scientific compute?* We at Macrocosmos believe so.

-In this subnet, validators create protein folding challenges for miners, who in turn run simulations using OpenMM to obtain stable protein configurations. At a high level, each role can be broken down into parts:
+Rational simulation-guided design of atomic systems has been a dream of researchers across the chemical sciences for decades. Enabling rapid and performant experimentation to experts would unlock massive potential to accelerate chemical science

-### Validation
+For decentralized science to succeed, we need:
+1. Industry/Research collaboration
+2. Well defined metrics of success

-1. Validator creates a `neuron.queue_size` number of proteins to fold.
-2. These proteins get distributed to a `neuron.sample_size` number of miners (ie: 1 PDB --> sample_size batch of miners).
-3. Validator is responsible for keeping track of `sample_size * queue_size` number of individual tasks it has distributed out.
-4. Validator queries and logs results for all jobs based on a timer, `neuron.update_interval`.
+## Our Principles
+- *fast*
+- *accurate*
+- *reliable* simulation

-For more detailed information, look at [validation.md](./documentation/validation.md)
+## Our Focus
+- *molecular dynamics*
+- *material propery prediction*
+- *neural network potentials (NNPs)*
+- *model training*

-### Mining
-Miners are expected to run many parallel processes, each executing an energy minimization routine for a particular `pdb_id`. The number of protein jobs a miner can handle is determined via the `config.neuron.max_workers` parameter.
+## Our Goals
+- academic papers with research groups
+- revenue generating pipelines

-For detailed information, read [mining.md](./documentation/mining.md).
+## Our Accomplishments (last updated April 30th, 2025)
+1. the first desci subnet on bittensor ✅
+1. professional and revenue generatting collaboration with [Rowan Scientific](https://rowansci.com) to build DTF pipelines on sn25 ✅


-## License
+# License

 This repository is licensed under the MIT License.
 ```text

assets/mainframe_official.png

3.45 MB

documentation/mining.md renamed to documentation/molecular_dynamics/mining.md

+3
@@ -25,6 +25,9 @@ Once the connection to rqlite is made through the read node, a local snapshot of

 Currently, the base miner uses the read node API to fetch job information from the global job pool, rather than using the local snapshot. This is to ensure you have the most up-to-date information.

+## UPDATE:
+Miners now have the opportunity to interact with the global job pool (GJP) locally. By creating a read-only node via `start_read_node.sh`, miners sync with the GJP on their local machine in the `db` directory. We have provided a script `scripts/query_rqlite.py` that returns jobs based on their priority in the GJP, or returns a specific job specified by `pdb_id`. With this information, miners can experiment with customizing their job queue. This script can also be helpful for downloading and analyzing checkpoint files from other miners. Please see the updated environment variables in `.env.example` and specify your public IP address in the following fields: `RQLITE_HTTP_ADV_ADDR`,`RQLITE_RAFT_ADV_ADDR`.
+
 ## Running the base miner `FoldingMiner`

 The base miner is launched in a `pm2` process with the following command: `pm2 start pm2_configs/miner.config.js`.

folding/__init__.py

+1-1
@@ -1,4 +1,4 @@
-__version__ = "2.6.0"
+__version__ = "3.0.0"
 version_split = __version__.split(".")
 __spec_version__ = (
 (10000 * int(version_split[0]))

pyproject.toml

+2-2
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "folding"
-version = "2.6.0"
-description = "Macrocosmos Subnet 25: Folding"
+version = "3.0.0"
+description = "Macrocosmos Subnet 25: Mainframe"
 authors = ["Brian McCrindle <[email protected]>", "Sergio Champoux <[email protected]>", "Szymon Fonau <[email protected]>"]

 [tool.poetry.dependencies]
