timezone |
---|
Europe/Berlin |
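- Self-introduction
- Chloe, ETHPanda core team, previously EIP Fun project lead
- I joined the first EPF study group last year and got hooked on Ethereum core protocol R&D. My notes from the 10-week study group are available for reference: https://hackmd.io/@chloezhux/epfsg_notes . I am currently most interested in the protocol networking layer and light clients
- The core protocol covers a huge amount of material and the frontier keeps moving, so it takes repeated rounds of study. So here I am~
- My Twitter and Telegram
- Do you think you will complete this intensive study program?
- Definitely!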
What's a P2P network
- Definition
- A decentralized communication model where nodes in the network communicate directly with each other without a central server
- Unlike the traditional client/server model, where a central authority manages all connections & data transfer, a p2p network distributes workload and data among participants
- Key features of p2p network
- decentralized: no central server, nodes share data directly
- scalable: network grows as more nodes join
- fault tolerance: no single point of failure
- resource sharing: peers can share computing power, storage, or bandwidth
- Type of p2p network
- unstructured p2p
- nodes randomly connect (eg. Gnutella, Kazaa)
- structured p2p
- use an algorithm to route data (eg. the DHT in BitTorrent, Kademlia)
- hybrid p2p
- mix of decentralized peers and some centralized components
What type of p2p is Ethereum and Bitcoin
- Bitcoin: mostly unstructured p2p with a gossip protocol for tx & block propagation
- Network structure
- bitcoin nodes randomly connect to other nodes
- tx and blocks are relayed to neighbours, which propagate them further
- nodes discover & maintain peer lists dynamically
- Data propagation
- uses flooding (gossip protocol), where each node forwards data to its connected peers
- Peer discovery
- use DNS seed nodes, hardcoded bootstrap nodes, and peer exchanges
- Ethereum: structured p2p discovery with a Kademlia-based DHT; the data itself still propagates by gossip
- Network structure
- uses a modified Kademlia DHT to structure peer discovery & routing
- nodes are identified by unique IDs and organized in a tree-like structure for efficient lookup
- allows efficient peer lookup; the DHT is used to find peers, not to retrieve blocks or tx
- Data propagation
- also use gossip protocol
- runs a separate networking stack per layer (devp2p for the EL, libp2p for the CL), carrying different types of data, eg. state sync, block propagation, tx relaying
- Peer discovery
- use a Kademlia DHT for peer lookup
- nodes maintain a routing table that organizes peers based on proximity in the DHT

| | Bitcoin | Ethereum |
|---|---|---|
| Network type | unstructured p2p | structured p2p (Kademlia DHT) |
| Node discovery | random peer selection, DNS seeds | Kademlia DHT for structured peer lookup |
| Data propagation | gossip-based (flooding) | gossip-based + DHT routing |
| Efficiency | redundant message forwarding | more efficient lookup |

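
To make the "redundant message forwarding" of flooding concrete, here is a minimal sketch. The 5-node topology and the `flood` helper are hypothetical and only illustrate the idea; real clients announce and deduplicate data in more refined ways.

```python
from collections import deque

def flood(adjacency, origin):
    """Flood a message from origin.

    Returns (hops, forwards): hops maps each node to the hop count at which
    it first received the message, forwards counts every send, including
    sends to peers that already had the message.
    """
    hops = {origin: 0}
    queue = deque([origin])
    forwards = 0
    while queue:
        node = queue.popleft()
        for peer in adjacency[node]:
            forwards += 1  # every neighbour gets a copy, seen or not
            if peer not in hops:
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops, forwards

# Hypothetical topology, chosen only for illustration.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
hops, forwards = flood(adjacency, "A")
print(hops, forwards)
```

Four deliveries would be enough to reach the other four nodes, yet this toy run performs one send per edge direction, which is the redundancy the table refers to.
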
- What's DHT and Kademlia DHT
- DHT (Distributed Hash Table)
- a decentralized system for storing & retrieving key-value pairs in a distributed network
- How it works
- Each node in the network stores a portion of the key-value pairs
- Keys are hashed to produce a unique identifier -> this determines which node is responsible for storing the corresponding value
- When a node wants to retrieve a value, it uses the DHT to locate the node responsible for that key
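
The steps above can be sketched as a toy, single-process DHT. Everything here is an assumption for illustration: the `ToyDHT` class, SHA-1 as the hash, and "numerically closest ID" as the responsibility rule. A real DHT has no global view of all nodes and must route to the responsible node over the network.

```python
import hashlib

def hash_id(data: str) -> int:
    """Hash a string to a 160-bit integer ID (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

class ToyDHT:
    def __init__(self, node_names):
        # node_id -> that node's local key-value store
        self.nodes = {hash_id(name): {} for name in node_names}

    def responsible_node(self, key: str) -> int:
        """Pick the node whose ID is numerically closest to the key's ID."""
        key_id = hash_id(key)
        return min(self.nodes, key=lambda node_id: abs(node_id - key_id))

    def put(self, key: str, value) -> None:
        self.nodes[self.responsible_node(key)][key] = value

    def get(self, key: str):
        return self.nodes[self.responsible_node(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("block:1", "0xabc")
print(dht.get("block:1"))
```

Because both `put` and `get` derive the responsible node from the key's hash, any participant can find the value without asking a central index.
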
- Kademlia DHT
- a specific implementation of a DHT that is widely used in P2P networks, including Ethereum, BitTorrent, and IPFS. It was introduced in 2002 by Petar Maymounkov and David Mazières and is known for its efficiency, simplicity, and robustness
- Key features
- use a binary tree-based routing algo to locate nodes & data in O(logN) steps
- use XOR (exclusive OR) to measure the distance between nodes and keys
- send queries to multiple nodes simultaneously
- each node & key is assigned a unique 160-bit ID
- each node maintains a routing table (made of k-buckets) that stores info about other nodes in the network
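
A minimal sketch of the XOR metric and of how it maps a peer to a k-bucket. The helper names are hypothetical, and 4-bit IDs are used instead of 160-bit ones to keep the numbers readable.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the k-bucket that holds other_id.

    It is the position of the highest bit in which the two IDs differ,
    so bucket i covers distances in [2**i, 2**(i+1)).
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not store itself in a bucket")
    return distance.bit_length() - 1

own = 0b0101
for peer in (0b0100, 0b0011, 0b1101):
    print(f"{peer:04b}", xor_distance(own, peer), bucket_index(own, peer))
```

Peers that share a long ID prefix with the local node land in low-index buckets, peers that differ in the first bit land in the highest one.
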
- How it works
- Node ID assignment: each node is assigned a unique 160-bit ID, usually generated by hashing its IP address or public key
- Key-value storage: keys are also hashed to a 160-bit ID; each key-value pair is stored on the node whose ID is closest to the key ID (based on XOR)
- Lookup process: a node sends a lookup request to the nodes in its routing table that are closest to the key's ID; these nodes respond with information about even closer nodes, and the process repeats until the closest node (responsible for the key) is found
- Routing table maintenance: nodes periodically update their routing tables by querying other nodes and exchanging information about peers
- Application
- Used in Ethereum, BitTorrent, IPFS
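
The lookup process described above can be sketched as follows. This is a simplified simulation under stated assumptions: the `network` dict stands in for each node's routing table, IDs are 4-bit integers, and the "parallel" queries are issued one after another. The names `closest`, `iterative_lookup`, `k` and `alpha` follow common Kademlia terminology but the code is not taken from any client.

```python
def closest(ids, target, k):
    """Return the k IDs closest to target under the XOR metric."""
    return sorted(ids, key=lambda node_id: node_id ^ target)[:k]

def iterative_lookup(network, start, target, k=2, alpha=2):
    """Find the k nodes closest to target, starting from node `start`.

    network maps node_id -> set of peer IDs that node knows about.
    """
    shortlist = closest(network[start], target, k)
    queried = {start}
    while True:
        to_query = [n for n in shortlist if n not in queried][:alpha]
        if not to_query:
            return shortlist  # every candidate was asked, nothing closer exists
        for node in to_query:
            queried.add(node)
            # The queried node answers with the k closest peers it knows.
            learned = closest(network[node], target, k)
            candidates = (set(shortlist) | set(learned)) - {start}
            shortlist = closest(candidates, target, k)

# Hypothetical routing tables for eight nodes.
network = {
    1: {4, 8},
    4: {1, 6, 12},
    8: {1, 12},
    6: {4, 7},
    7: {6},
    12: {8, 4, 14},
    14: {12, 15},
    15: {14},
}
print(iterative_lookup(network, start=1, target=15))
```

Node 1 initially knows only nodes 4 and 8, yet each round of answers moves the shortlist closer to the target until node 15 itself is found.
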
- Other types of DHT
- Chord (consistent hashing-based)
- use ring structure: node ID and keys are arranged in a circular space
- each node maintains a finger table pointing to nodes at exponentially increasing distance in the ring
- require more maintenance when nodes join/ leave, whereas Kademlia’s XOR-based buckets provide better resilience
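
A minimal sketch of a Chord finger table, assuming an identifier ring of size 2^m and the usual rule that entry i points to the first node at or after (n + 2^i) mod 2^m. The helper names and the 3-node ring are hypothetical.

```python
def successor(node_ids, point):
    """First node at or after `point` on the ring, wrapping around to the start."""
    ring = sorted(node_ids)
    for node_id in ring:
        if node_id >= point:
            return node_id
    return ring[0]

def finger_table(node_ids, n, m):
    """Finger i of node n points to successor((n + 2**i) mod 2**m)."""
    return [successor(node_ids, (n + 2**i) % 2**m) for i in range(m)]

nodes = {0, 1, 3}  # ring with m = 3, i.e. IDs 0..7
for n in sorted(nodes):
    print(n, finger_table(nodes, n, m=3))
```

Each finger covers twice the distance of the previous one, which is what lets a lookup halve the remaining distance per hop.
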
- Pastry (prefix-based routing)
- use prefix-matching for routing. Nodes and keys have numerical IDs, and nodes forward requests to peers whose ID shares the longest prefix with the target
- each node keeps a leaf set (close nodes) and a routing table for long-range hops
- need more state per node (bigger routing table) than Kademlia
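
Prefix-based forwarding can be illustrated with a short sketch. It is deliberately simplified: real Pastry also consults the leaf set and falls back to numerically closer nodes, which is omitted here. The hex IDs and helper names are made up for the example.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length

def next_hop(known_peers, target):
    """Forward to the known peer sharing the longest prefix with the target."""
    return max(known_peers, key=lambda peer: shared_prefix_len(peer, target))

peers = ["a1f3", "d13b", "d4c2", "d46a"]
print(next_hop(peers, "d46f"))
```
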
- CAN (content addressable network)
- use a d-dimensional coordinate space, where each node owns a zone
- keys are mapped to coordinate points in this space, and nodes forward queries toward the target zone
- lookup complexity is O(d N^(1/d)), scalable with more dimensions
- less efficient than Kademlia for large networks: for a fixed number of dimensions d, the hop count grows polynomially with N rather than logarithmically
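
To get a feel for the two growth rates, the sketch below plugs numbers into the asymptotic expressions from the notes, d * N^(1/d) for CAN and log2(N) for Kademlia. Constant factors are ignored, so the results are only order-of-magnitude indications, not measured hop counts.

```python
import math

def can_hops(n: int, d: int) -> float:
    """Order-of-magnitude hop count for CAN with d dimensions."""
    return d * n ** (1 / d)

def kademlia_hops(n: int) -> float:
    """Order-of-magnitude hop count for Kademlia."""
    return math.log2(n)

n = 10**6
for d in (2, 4, 6):
    print(f"CAN d={d}: ~{can_hops(n, d):.0f} hops")
print(f"Kademlia: ~{kademlia_hops(n):.0f} hops")
```

Raising d narrows the gap, but each extra dimension also adds neighbours that every node has to track.
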
- Why Kademlia is a better choice
- XOR-based distance, enables parallel lookups
- better fault tolerance: nodes cache more peer info, so they are more resilient to churn
- efficient lookups: O(logN) hops with min maintenance overhead
- What's the Ethereum protocol design at a high level
- Design philosophy
- Simplicity, Universality, Modularity, Non-discrimination, Agility
- Main component
- EL: execution engine
- handle user tx and all state (addr, contract data)
- CL: implements the PoS mechanism
- ensure security and fault tolerance
- Implementation & development
- Client: an implementation of the EL or CL
- Node: a computer running this client & connecting to the network; a node is a pair of EL and CL clients actively participating in the network
- Client diversity strategy
- Testing & security
- Different testing tools for state transition testing, fuzzing, shadow forks, RPC tests, client unit tests and CI/CD, etc.
- Coordination
- Protocol architecture
- Graph: https://epf.wiki/#/wiki/protocol/architecture
- What are user APIs and beacon APIs
- User API (aka JSON-RPC API)
- primary interface for interacting with the EL
- used by wallet, dapps etc.
- Key features
- JSON-RPC protocol: a lightweight remote procedure call (RPC) protocol that lets clients send requests & receive responses in JSON format
- Common use: send tx, query blockchain data (eg. balance, contract state), deploy & interact with smart contracts, listen for events (logs emitted by smart contracts)
- Endpoints: expose endpoints eg. eth_sendTransaction, eth_getBalance, eth_call, eth_getLogs
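
As an illustration of the request format, here is a sketch that builds a JSON-RPC 2.0 payload for `eth_getBalance`. The `rpc_request` and `send` helpers, the zero address, and the `http://localhost:8545` URL mentioned in the comment are assumptions for the example; `send` is defined but not called, since it needs a reachable node.

```python
import json
import urllib.request
from itertools import count

_ids = count(1)  # each request carries an id so responses can be matched

def rpc_request(method: str, params: list) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

def send(url: str, payload: dict) -> dict:
    """POST the payload to a node, e.g. url='http://localhost:8545' (assumed)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Placeholder address; "latest" asks for the balance at the newest block.
payload = rpc_request(
    "eth_getBalance",
    ["0x0000000000000000000000000000000000000000", "latest"],
)
body = json.dumps(payload)
print(body)
```

Every method goes to the same URL via POST and is distinguished only by the `method` field, which is the main structural difference from a REST-style API.
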
- Beacon API
- interface to interact with the beacon chain, which coordinates validators and achieves consensus
- Key features
- RESTful interface
- Common use: query info about the beacon chain (block headers, validator status), submit attestations & block proposals from validators, monitor the status of the beacon network
- Endpoints: eg. /eth/v2/beacon/blocks/{block_id} (retrieve a beacon chain block), /eth/v1/beacon/pool/attestations (submit attestations from a validator)
- Staking pools & monitor tools use the api to track validator performance and network health
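
For contrast with JSON-RPC, a REST-style Beacon API call addresses each resource by its own path. The sketch below only builds the URL for the block header endpoint. The base URL `http://localhost:5052` is an assumption (ports differ between consensus clients), and `fetch_json` is defined but not called because it needs a running beacon node.

```python
import json
import urllib.request

def header_url(base_url: str, block_id: str = "head") -> str:
    """URL of the block header resource; block_id can be e.g. 'head' or 'finalized'."""
    return f"{base_url.rstrip('/')}/eth/v1/beacon/headers/{block_id}"

def fetch_json(url: str) -> dict:
    """Plain HTTP GET, returning the decoded JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

url = header_url("http://localhost:5052")  # assumed local beacon node
print(url)
```
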
- Issue with JSON RPC api
- Centralization
- reliance on centralized infra providers (eg. Infura, Alchemy, QuickNode) to access Ethereum nodes via JSON-RPC. These services act as intermediaries, reducing the need for developers to run their own nodes
- barrier to running full nodes: it requires significant resources (storage, bandwidth, computation power) to run a full node, so many devs opt for a centralized service instead
- Scalability
- high load on nodes: can lead to performance bottlenecks and increased cost for node operators
- inefficient data retrieval: not optimized for querying large amounts of data, can result in slow response time and high latency
- Security
- json-rpc endpoints can expose sensitive info if not properly secured (eg. account balance, tx history)
- public json-rpc endpoints are often targeted by DDoS attacks
- by default json-rpc doesn't require authentication, making it easy for unauthorized users to access node data
- Lack of modern features
- No RESTful design
- limited tool: lack support for features like filtering, sorting etc.
- verbose and complex
- Potential Alternative/ Solution
- Decentralized node infra: eg. the Graph, EPNS
- Light clients and stateless: reduce the resource required for running nodes
- RESTful api: eg. Besu
- Improved json-rpc: add support for batch request, better error handling etc.
- Blockchain level protocol
- reference link: https://epf.wiki/#/wiki/protocol/design-rationale
- Accounts over UTXOs
- UTXO (unspent tx output)
- Account
- What are the pros/cons of accounts vs UTXOs? Why did Ethereum choose the account-based model?
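
The difference between the two bookkeeping models can be sketched in a few lines. This is a toy under heavy assumptions: no signatures, fees, or nonces, and all names (`account_transfer`, `utxo_transfer`, the `(tx_id, output_index)` keys) are invented for the illustration.

```python
def account_transfer(balances, sender, receiver, amount):
    """Account model: the state is one balance per account, updated in place."""
    if balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] = balances.get(receiver, 0) + amount

def utxo_transfer(utxos, tx_id, inputs, sender, receiver, amount):
    """UTXO model: spend whole outputs, create new ones, return change."""
    total = 0
    for outpoint in inputs:
        owner, value = utxos[outpoint]
        if owner != sender:
            raise ValueError("input not owned by sender")
        total += value
    if total < amount:
        raise ValueError("insufficient inputs")
    for outpoint in inputs:
        del utxos[outpoint]  # spent outputs disappear from the set
    utxos[(tx_id, 0)] = (receiver, amount)
    if total > amount:
        utxos[(tx_id, 1)] = (sender, total - amount)  # change output

balances = {"alice": 10, "bob": 0}
account_transfer(balances, "alice", "bob", 3)

utxos = {("tx0", 0): ("alice", 10)}
utxo_transfer(utxos, "tx1", [("tx0", 0)], "alice", "bob", 3)
print(balances, utxos)
```

In the account version a balance is a single lookup, while in the UTXO version alice's balance has to be derived by summing the outputs she owns.
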
- Merkle patricia trie
- Ethereum state is stored in a modified Merkle Patricia Trie (MPT)
- deterministic and cryptographically verifiable
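
To show what "cryptographically verifiable" means in practice, here is a sketch of an inclusion proof. Note the substitution: it uses a plain binary Merkle tree with SHA-256, not Ethereum's Merkle Patricia Trie or Keccak-256, because the verification idea is the same and the code stays short. The function names are hypothetical and the leaf list is assumed to be non-empty.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    proof is a list of (sibling_hash, sibling_is_left) pairs, bottom-up.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(leaf, proof, root):
    """Recompute the path from the leaf to the root using only the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [b"acct-a", b"acct-b", b"acct-c"]
root, proof = merkle_root_and_proof(leaves, index=2)
print(verify(b"acct-c", proof, root), verify(b"acct-x", proof, root))
```

The verifier needs only the root and a logarithmic number of sibling hashes, not the full data set.
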
- Verkle tree
- vector commitments allow for much smaller proofs (aka witness)
- RLP (recursive length prefix)
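
A minimal sketch of RLP encoding for byte strings and nested lists, following the published length-prefix rules (single bytes below 0x80 encode as themselves, short items get a one-byte prefix, longer items get a prefix followed by the length). Decoding and integer handling are left out, and the function names are my own.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Prefix for a payload of `length` bytes; offset is 0x80 (string) or 0xC0 (list)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes

def rlp_encode(item) -> bytes:
    """Encode bytes or an arbitrarily nested list of bytes."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte is its own encoding
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(element) for element in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("RLP encodes only bytes and lists")

print(rlp_encode(b"dog").hex())
print(rlp_encode([b"cat", b"dog"]).hex())
```

The recursion in the list branch is where the "recursive" in the name comes from: a list's payload is just the concatenation of its encoded elements.
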
- SSZ (simple serialize)
- Hunt for finality: Casper FFG + LMD GHOST
- Discv5: the discovery protocol
- a Kademlia-based DHT used to store and look up ENRs
- ENR (Ethereum Node Record) contains the routing info needed to establish connections between peers