
金融决策思维链数据集构建框架 / Financial Decision-Making Chain Dataset Construction Framework


项目概述 / Project Overview

本项目设计并实现了一个创新的金融决策思维链数据集构建框架,融合了多Agent系统、强化学习(RL)和大型语言模型(LLM),实现了无监督的高效思维链数据集生成。该框架旨在提升金融决策过程的多样性、深度和结构化水平,为金融分析和决策支持提供高质量的数据基础。

This project designs and implements an innovative Financial Decision-Making Chain Dataset Construction Framework, integrating multi-agent systems, Reinforcement Learning (RL), and Large Language Models (LLM) to achieve efficient, unsupervised generation of decision-making chain datasets. The framework aims to enhance the diversity, depth, and structuring of financial decision-making processes, providing high-quality data foundations for financial analysis and decision support.

主要功能 / Key Features

A. 多样化Agent生态系统 / A. Diverse Agent Ecosystem

  • 设计与训练多样化Agents:构建并训练了超过400种具有不同风格和目标的智能Agent,每个Agent具备独特的决策逻辑和行为模式。

    Design and Train Diverse Agents: Developed and trained over 400 intelligent Agents with varying styles and objectives, each possessing unique decision-making logic and behavior patterns.

  • 生成多角度决策日志:通过多Agent协作,生成涵盖不同市场情景和决策路径的多层次决策日志,确保数据的全面性和多样性。

    Generate Multi-Angle Decision Logs: Through multi-Agent collaboration, generate multi-level decision logs covering various market scenarios and decision paths, ensuring data comprehensiveness and diversity.

  • 定制化Agent行为:根据金融决策的具体需求,定制Agent的行为策略超参数,实现对不同金融产品和市场条件的适应。

    Customized Agent Behavior: Tailor each Agent's behavior-strategy hyperparameters to specific financial decision-making requirements, adapting them to different financial products and market conditions, as sketched below.
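
The README does not spell out the agent interface, but the idea of a pool of agents parameterized by a style and a few strategy hyperparameters can be illustrated with a minimal sketch. All names below (AgentConfig, sample_agent_pool, the style labels and hyperparameter ranges) are hypothetical placeholders, not the project's actual API:

    from dataclasses import dataclass
    from typing import List
    import random

    @dataclass
    class AgentConfig:
        """Illustrative configuration for one decision-making agent."""
        name: str
        style: str            # e.g. "momentum", "mean_reversion", "value"
        risk_aversion: float   # higher values -> more conservative behavior
        horizon_days: int      # lookback / holding horizon
        max_position: float    # fraction of portfolio per single decision

    def sample_agent_pool(n: int = 400, seed: int = 0) -> List[AgentConfig]:
        """Sample a diverse pool of agent configurations (hypothetical)."""
        rng = random.Random(seed)
        styles = ["momentum", "mean_reversion", "value", "macro", "arbitrage"]
        return [
            AgentConfig(
                name=f"agent_{i:03d}",
                style=rng.choice(styles),
                risk_aversion=rng.uniform(0.1, 2.0),
                horizon_days=rng.choice([1, 5, 20, 60]),
                max_position=rng.uniform(0.01, 0.2),
            )
            for i in range(n)
        ]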

B. 强化学习驱动的Agent筛选和组合机制 / B. RL-Driven Agent Selection and Combination Mechanism

  • 多目标RL系统开发:开发了基于多目标强化学习的Agent筛选和组合系统,优化多个目标如决策有效性、多样性和长期收益。

    Development of Multi-Objective RL System: Developed an Agent selection and combination system based on multi-objective Reinforcement Learning, optimizing multiple goals such as decision effectiveness, diversity, and long-term rewards.

  • 奖励函数设计:设计复杂的奖励函数,平衡短期和长期奖励,确保Agent的筛选过程既能实现高效决策又保持多样性。

    Reward Function Design: Designed complex reward functions to balance short-term and long-term rewards, ensuring the Agent selection process achieves efficient decision-making while maintaining diversity.

  • 优化算法应用:采用PPO强化学习算法,提高Agent筛选的准确性和效率。

    Application of Optimization Algorithms: Adopted the Proximal Policy Optimization (PPO) reinforcement learning algorithm to improve the accuracy and efficiency of Agent selection; a minimal training sketch follows this list.
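
Since PPO and Stable-Baselines3 appear in the tech stack below, the selection mechanism can be sketched as a small Gymnasium environment whose reward is a weighted sum of a short-term signal, a long-term signal, and a diversity bonus. The environment, observation encoding, and weights are assumptions made for illustration only (and assume Stable-Baselines3 ≥ 2.0 with Gymnasium), not the project's actual implementation:

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO

    class AgentSelectionEnv(gym.Env):
        """Toy environment: at each step, select one of `n_agents` candidates.

        Reward = weighted sum of a short-term signal, a long-term signal, and a
        diversity bonus for rarely used agents (all simulated with noise here).
        """

        def __init__(self, n_agents: int = 400):
            super().__init__()
            self.n_agents = n_agents
            self.action_space = spaces.Discrete(n_agents)
            self.observation_space = spaces.Box(0.0, 1.0, shape=(n_agents,), dtype=np.float32)
            self.usage = np.zeros(n_agents, dtype=np.float32)

        def _obs(self):
            return (self.usage / (self.usage.sum() + 1.0)).astype(np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.usage[:] = 0.0
            return self._obs(), {}

        def step(self, action):
            short_term = float(self.np_random.normal(0.0, 1.0))  # placeholder short-horizon signal
            long_term = float(self.np_random.normal(0.0, 0.5))   # placeholder long-horizon signal
            diversity = 1.0 / (1.0 + float(self.usage[action]))  # bonus for under-used agents
            reward = 0.5 * short_term + 0.3 * long_term + 0.2 * diversity
            self.usage[action] += 1.0
            terminated = bool(self.usage.sum() >= 100)            # fixed-length episode
            return self._obs(), reward, terminated, False, {}

    model = PPO("MlpPolicy", AgentSelectionEnv(), verbose=0)
    model.learn(total_timesteps=10_000)

The 0.5/0.3/0.2 weights are placeholders; the multi-objective balance described above would tune or learn such weights rather than fix them.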

C. LLM与RL融合的思维链生成器 / C. LLM and RL Integrated Decision-Making Chain Generator

  • 集成大型语言模型:将GPT-4、o1等先进LLM集成到思维链生成器中,提升自然语言理解和生成能力。

    Integration of Large Language Models: Integrated advanced LLMs such as GPT-4 and o1 into the decision-making chain generator to enhance natural language understanding and generation capabilities.

  • 数据模板设计:创建结构化数据模板,确保生成的思维链具有一致的格式和高质量的内容。

    Data Template Design: Created structured data templates so that generated decision-making chains share a consistent format and high-quality content (an illustrative record sketch appears at the end of this section).

  • RL优化路径:使用强化学习优化Agent推荐路径,确保生成的思维链既符合逻辑又具备创新性。

    RL Path Optimization: Employed Reinforcement Learning to optimize Agent recommendation paths, ensuring the generated decision-making chains are both logical and innovative.

  • 协同生成机制:实现LLM与RL模块的无缝协作,通过中间层接口协调两者的工作流程,提升思维链生成的效率和质量。

    Collaborative Generation Mechanism: Achieved seamless collaboration between LLM and RL modules by coordinating their workflows through intermediate layer interfaces, enhancing the efficiency and quality of decision-making chain generation.
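
The concrete data template is not included in this README; the record below is a hypothetical sketch of what one structured decision-making chain entry could look like. All field names and values are illustrative placeholders, not the project's actual schema:

    import json

    # Hypothetical schema for one decision-making chain record; field names and
    # values are illustrative, not the project's actual template.
    chain_record = {
        "scenario": "rate-hike cycle, large-cap equities",
        "agents_involved": ["agent_017", "agent_241"],
        "chain": [
            {"step": 1, "role": "market_analysis", "thought": "Rising yields pressure growth valuations."},
            {"step": 2, "role": "risk_assessment", "thought": "Drawdown risk elevated; cap exposure at 10%."},
            {"step": 3, "role": "decision", "thought": "Rotate part of the allocation into short-duration bonds."},
        ],
        "final_action": {"asset": "short_duration_bonds", "weight": 0.10},
        "labels": {"logical_consistency": 0.92, "novelty": 0.41},
    }

    print(json.dumps(chain_record, indent=2, ensure_ascii=False))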

D. 双轨模型优化策略 / D. Dual-Track Model Optimization Strategy

  • 监督微调(SFT):利用标注数据对LLM进行监督微调,提升模型在金融决策推理中的准确性和可靠性。

    Supervised Fine-Tuning (SFT): Utilized labeled data to perform supervised fine-tuning on LLMs, enhancing the accuracy and reliability of the model in financial decision-making reasoning.

  • 近端策略优化(PPO):应用PPO算法优化Agent的决策策略,提高模型在复杂决策场景下的表现。

    Proximal Policy Optimization (PPO): Applied PPO algorithms to optimize Agent decision strategies, improving model performance in complex decision-making scenarios.

  • 协同训练流程:设计了阶段性训练流程,使SFT和PPO策略互补,共同提升LLM的思维链推理能力。

    Collaborative Training Process: Designed a phased training process in which SFT and PPO complement each other, jointly enhancing the LLM's decision-making chain reasoning capabilities (a sketch of the phased flow appears at the end of this section).

  • 性能提升评估:通过定量指标(如决策准确率、思维链一致性)评估优化策略的效果,验证模型性能的显著提升。

    Performance Improvement Evaluation: Assessed the effectiveness of optimization strategies using quantitative metrics (e.g., decision accuracy, chain consistency) to verify significant enhancements in model performance.
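
The training code itself is not part of this README, so the phased SFT-then-PPO flow is sketched below with stub helper functions; run_sft, run_ppo, and chain_reward are placeholders standing in for the actual implementation, not real APIs:

    # Phased dual-track optimization: supervised fine-tuning (SFT) first, then PPO
    # on top of the SFT checkpoint. The helpers below are stubs / assumptions.

    from typing import Callable, List

    def run_sft(base_model: str, labeled_chains: List[dict]) -> str:
        """Fine-tune `base_model` on labeled decision chains; return a checkpoint id."""
        # ... supervised fine-tuning (e.g. token-level cross-entropy on chains) ...
        return f"{base_model}-sft"

    def run_ppo(checkpoint: str, reward_fn: Callable[[str], float], steps: int) -> str:
        """Optimize `checkpoint` with PPO against `reward_fn`; return a new checkpoint id."""
        # ... policy-gradient updates driven by the reward signal ...
        return f"{checkpoint}-ppo"

    def chain_reward(chain_text: str) -> float:
        """Toy reward: favour chains that state an explicit final decision."""
        return 1.0 if "final decision" in chain_text.lower() else 0.0

    labeled_chains: List[dict] = []                   # loaded from the generated dataset
    ckpt = run_sft("base-llm", labeled_chains)        # phase 1: SFT
    ckpt = run_ppo(ckpt, chain_reward, steps=10_000)  # phase 2: PPO
    print("final checkpoint:", ckpt)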

E. 无监督的快速扩充高质量思维链数据集 / E. Unsupervised Rapid Expansion of High-Quality Decision-Making Chain Dataset

  • 自动化数据生成流程:构建了全自动化的数据生成流程,实现高效的思维链数据集扩充,无需人工干预。

    Automated Data Generation Process: Established a fully automated data generation process to efficiently expand the decision-making chain dataset without the need for manual intervention.

  • 质量控制机制:引入多层次过滤和验证机制,确保生成的思维链数据具备高质量和相关性,排除低质量或无关数据。

    Quality Control Mechanism: Introduced multi-level filtering and validation to ensure the generated decision-making chain data remains high-quality and relevant, excluding low-quality or unrelated records (a filtering sketch appears at the end of this section).

  • 数据集规模与覆盖面:目前已构建超过1万条思维链数据,涵盖多种金融决策场景如风险评估和市场分析。

    Dataset Scale and Coverage: Currently constructed over 10,000 decision-making chain data entries, covering various financial decision-making scenarios such as risk assessment and market analysis.
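
The exact filtering rules are not documented here; the sketch below shows one way to express a multi-level quality gate as a stack of predicates over the hypothetical record format sketched earlier. The criteria and thresholds are assumptions for illustration only:

    from typing import Callable, Dict, List

    Filter = Callable[[Dict], bool]

    def long_enough(record: Dict) -> bool:
        """Level 1: require a minimum number of reasoning steps."""
        return len(record.get("chain", [])) >= 3

    def has_final_action(record: Dict) -> bool:
        """Level 2: require an explicit final decision."""
        return "final_action" in record

    def consistent_enough(record: Dict, threshold: float = 0.8) -> bool:
        """Level 3: require a logical-consistency score above a threshold."""
        return record.get("labels", {}).get("logical_consistency", 0.0) >= threshold

    FILTERS: List[Filter] = [long_enough, has_final_action, consistent_enough]

    def filter_dataset(records: List[Dict]) -> List[Dict]:
        """Keep only records that pass every filter level."""
        return [r for r in records if all(f(r) for f in FILTERS)]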

F. 未来计划 / F. Future Plans

  • 强化学习作为LLM Agent的记忆与规划模块:计划将强化学习技术应用于LLM Agent的记忆和规划功能,实现智能的营销策略和文案生成。

    Reinforcement Learning as Memory and Planning Modules for LLM Agents: Plan to apply Reinforcement Learning techniques to the memory and planning functions of LLM Agents, enabling intelligent generation of marketing strategies and copywriting.

  • 自动化营销策略生成:通过组合多源素材,自动生成针对不同市场需求的定制化营销策略和文案,提高营销效率和效果。

    Automated Marketing Strategy Generation: Automatically generate customized marketing strategies and copywriting tailored to different market demands by combining multi-source materials, enhancing marketing efficiency and effectiveness.

技术栈与工具 / Tech Stack and Tools

  • 编程语言 / Programming Languages: Python, Java
  • 框架与库 / Frameworks and Libraries: PyTorch, OpenAI GPT-4, Stable-Baselines3
  • 工具与平台 / Tools and Platforms: GitHub, Docker, Azure (Cloud Computing)
  • 方法与技术 / Methods and Technologies: 多Agent系统设计, 强化学习(PPO), 监督微调 (SFT), 数据模板设计, 无监督学习
    Multi-agent system design, Reinforcement Learning (PPO), Supervised Fine-Tuning (SFT), Data Template Design, Unsupervised Learning

安装与使用 / Installation and Usage

前提条件 / Prerequisites

  • Python 3.8+
  • Docker(可选,用于容器化部署) / Docker (optional, for containerized deployment)
  • 云计算资源(如AWS或GCP) / Cloud computing resources (e.g., AWS or GCP)

安装步骤 / Installation Steps

  1. 克隆仓库 / Clone the repository:

    git clone https://github.com/wlvh/RBT.git
    cd RBT/RL
  2. 创建并激活虚拟环境 / Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # Linux/macOS
    venv\Scripts\activate  # Windows
  3. 安装依赖 / Install dependencies:

    pip install -r requirements.txt
  4. 运行项目 / Run the project:

    python main.py

使用示例 / Usage Examples

详细的使用说明和示例代码请参考 Wiki 或示例教程。

For detailed usage instructions and example code, please refer to the Wiki or Example Tutorials.


成就与成果 / Achievements and Outcomes

  • 成功生成并扩充了高质量的金融决策思维链数据集,显著提升了金融决策支持系统的性能。

    Successfully generated and expanded a high-quality financial decision-making chain dataset, significantly enhancing the performance of financial decision support systems.

  • 通过双轨模型优化策略,提升LLM在复杂金融决策推理任务中的准确性和一致性。

    Enhanced the accuracy and consistency of LLMs in complex financial decision-making reasoning tasks through dual-track model optimization strategies.

  • 实现了无监督的数据扩充方法,大幅减少了数据准备的时间和人力成本。

    Implemented an unsupervised data expansion method, greatly reducing the time and labor costs associated with data preparation.

贡献 / Contributions

欢迎贡献!请阅读 贡献指南 了解如何参与。

Contributions are welcome! Please read the Contributing Guide to learn how to participate.

许可证 / License

本项目采用 MIT 许可证

This project is licensed under the MIT License.

联系方式 / Contact

如有任何问题或建议,请通过 Issues 联系我们。

For any questions or suggestions, please contact us through Issues.

