a) Release test environment management and configuration. b) Release test case management and configuration. c) Release test story management and configuration. d) Release the open-source test case generation tool, which uses hyperparameter enumeration so that a single filled-in configuration file generates multiple test cases.
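As a rough illustration of the hyperparameter enumeration idea in item d), the sketch below expands one configuration into several test cases by taking the Cartesian product of the listed values. The function name `enumerate_test_cases` and the configuration keys are hypothetical and are not taken from the Ianvs code base; the actual tool may read and expand its configuration differently.

```python
from itertools import product


def enumerate_test_cases(hyperparameters):
    """Expand {name: [candidate values]} into one dict per combination."""
    names = list(hyperparameters)
    value_lists = [hyperparameters[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical hyperparameter section of a single configuration file.
config = {
    "learning_rate": [0.1, 0.01],
    "momentum": [0.5, 0.9],
    "batch_size": [32],
}

test_cases = enumerate_test_cases(config)
for index, case in enumerate(test_cases):
    print(index, case)
```

Each hyperparameter with a single value is held fixed, while every multi-valued one multiplies the number of generated cases.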
Release the PCB-AoI public dataset, its corresponding preprocessing, and baseline algorithm projects. Ianvs is the first open-source site for that dataset.
a) Test environments and test cases that support the single-task learning paradigm. b) Test environments and test cases that support the incremental learning paradigm.
a) Release PCB-AoI benchmark cases based on single-task learning, including leaderboards and test reports. b) Release PCB-AoI benchmark cases based on incremental learning, including leaderboards and test reports.
This version of Ianvs supports the following functions of unstructured lifelong learning:
1. Support lifelong learning throughout the entire lifecycle, covering modules such as task definition, task assignment, unknown task recognition, and unknown task handling, with each module decoupled from the others.
- Support unknown task recognition and provide a corresponding usage example, based on semantic segmentation tasks, in this example.
- Support multi-task joint inference and provide a corresponding usage example, based on object detection tasks, in this example.
- Support lifelong learning system metrics such as BWT and FWT.
- Support visualization of lifelong learning results.
3. Provide real-world datasets and rich examples for lifelong learning testing, to better evaluate the effectiveness of lifelong learning algorithms in real environments.
- Provide cloud-robotics datasets on this website.
- Provide cloud-robotics semantic segmentation examples in this example.
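For readers unfamiliar with the BWT and FWT metrics mentioned above, the following minimal sketch computes them from an accuracy matrix, using the definitions commonly found in the continual-learning literature. It assumes `acc[i][j]` is the accuracy on task `j` after training through task `i`, and `baseline[j]` is the accuracy of an untrained model on task `j`. The function names and the numbers are illustrative only; the metric implementations shipped with Ianvs may differ in detail.

```python
def backward_transfer(acc):
    """BWT: average change on earlier tasks after all tasks have been learned."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    diffs = [acc[-1][i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(diffs) / len(diffs)


def forward_transfer(acc, baseline):
    """FWT: average gain on a task before it is trained, relative to a baseline."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("FWT needs at least two tasks")
    diffs = [acc[i - 1][i] - baseline[i] for i in range(1, num_tasks)]
    return sum(diffs) / len(diffs)


# Made-up accuracies for three sequential tasks (rows: after training task i).
acc = [
    [0.90, 0.30, 0.20],
    [0.80, 0.85, 0.40],
    [0.70, 0.80, 0.95],
]
baseline = [0.10, 0.10, 0.10]

print("BWT:", round(backward_transfer(acc), 3))
print("FWT:", round(forward_transfer(acc, baseline), 3))
```

A negative BWT indicates forgetting of earlier tasks, while a positive FWT indicates that earlier training helped on tasks not yet seen.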
Ianvs v0.3.0 brings powerful new LLM-related features, including comprehensive (1) LLM testing and benchmarking tools, (2) advanced cloud-edge collaborative inference paradigms, and (3) innovative algorithms tailored for large model optimization.
Ianvs now supports robust testing for both locally deployed LLMs and public LLM APIs (e.g., OpenAI). This release introduces three specialized benchmarks for evaluating LLM capabilities in diverse scenarios:
- Government-Specific Large Model Benchmark: Designed to assess LLM accuracy and reasoning in government-specific scenarios, using objective (multiple-choice) and subjective (Q&A) tests. Explore the benchmark dataset and try the example.
- Smart Coding Benchmark: This benchmark evaluates the debugging capabilities of LLMs using real-world coding issues from GitHub repositories. Learn more through the example and read the background documentation.
- Large Language Model Edge Benchmark: Focused on testing LLM performance in edge environments, this benchmark evaluates resource efficiency and deployment performance. Access datasets and examples here and check out the detailed documentation.
This release introduces new paradigms and algorithms for collaborative inference to optimize cloud-edge cooperation and improve performance:
- Cloud-Edge Collaborative Inference Paradigm: A new architecture enables efficient cloud-edge collaboration for LLM inference, featuring a baseline algorithm that delivers up to 50% token cost savings without compromising accuracy. Try the example.
- Speculative Decoding Algorithm (EAGLE, ICML'24): Integrated within the collaborative inference framework, this algorithm speeds up inference by 20% or more. Try the example and explore the detailed documentation.
- Joint Inference Paradigm for Pedestrian Tracking: A multi-edge inference paradigm for pedestrian tracking that uses the pretrained ByteTrack model (ECCV'22). See the pedestrian tracking example or refer to the background documentation.
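To give a feel for why speculative decoding can speed up inference, here is a toy sketch of the generic greedy draft-and-verify loop. It is not EAGLE itself and not the Ianvs implementation: `target_next` and `draft_next` are hypothetical stand-ins for a large model and a small draft model, and the verification loop stands in for what a real system would do in one batched forward pass of the large model.

```python
def target_next(tokens):
    """Stand-in for the large model: deterministic greedy next token."""
    return (3 * tokens[-1] + len(tokens)) % 17


def draft_next(tokens):
    """Stand-in for the small draft model: usually agrees with the target."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 17


def greedy_decode(prompt, num_new_tokens):
    """Reference decoding: one target-model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, num_new_tokens, draft_len=4):
    """Draft several tokens cheaply, then let the target verify them."""
    tokens = list(prompt)
    limit = len(prompt) + num_new_tokens
    target_passes = 0
    while len(tokens) < limit:
        # 1) The draft model proposes draft_len tokens autoregressively.
        draft = []
        for _ in range(draft_len):
            draft.append(draft_next(tokens + draft))
        # 2) The target checks the drafted positions (counted as one pass).
        target_passes += 1
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # always keep the target's own token
            if proposed != expected:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every drafted token matched, so the pass yields a bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:limit], target_passes


prompt = [1, 2, 3]
output, passes = speculative_decode(prompt, num_new_tokens=20)
print("same as greedy:", output == greedy_decode(prompt, 20))
print("target passes:", passes, "for 20 new tokens")
```

Because the target's own token is kept at every position, the output matches plain greedy decoding; the saving comes from needing fewer target passes whenever the draft is accepted.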
Ianvs includes new algorithms to improve LLM performance and usability in various scenarios:
- Personalized LLM Agent Algorithm: This algorithm supports single-task learning using the pretrained Bloom model, enabling personalized LLM operations. Explore the example and review the documentation.
- Multimodal Large Model Joint Learning Algorithm: A joint learning algorithm for multimodal understanding with the pretrained RFNet model. Try the example here and learn more in the documentation.
- Unseen Task Processing Algorithm: Supports lifelong learning with pretrained models to handle unseen tasks effectively. Access the example and gain insights from the background documentation.