Agent Benchmark for Government Scenarios Based on KubeEdge-Ianvs

**Description:**

With the rapid advancement of large model technology, its application potential in government scenarios is becoming increasingly prominent. The intelligent upgrade of government services involves three core scenarios: internal government collaboration, public services, and enterprise services. All three need large model technology to improve efficiency and service quality. However, government scenarios demand high levels of professionalism, standardization, and security, and existing evaluation systems for large models lack standardized assessment methods tailored to the government affairs domain. As a result, it is hard to verify accuracy, compliance, and scenario adaptability before a model is deployed.  
Therefore, this project aims to leverage the KubeEdge-Ianvs distributed collaborative framework to construct a large model evaluation pipeline and benchmark specifically for government scenarios. This will provide quantifiable and reusable capability assessment tools for the intelligent transformation of government services, promoting the secure and efficient application of large model technology in typical scenarios such as official document drafting, smart transportation, and government Q&A.

**Expected Outcomes:**

1. Introduce datasets from the e-government domain, then categorize and reorganize them according to three standardized task categories: Government Services (e.g., administrative services, hotline support, business facilitation), Government Office Operations (e.g., government knowledge Q&A, document information extraction, document generation), and Urban Governance (e.g., urban data analysis, event perception, event dispatching, event analysis).
2. For at least one of the above scenarios, provide a standardized test suite in KubeEdge-Ianvs, including datasets, a test environment, and evaluation metrics, with the datasets organized into a unified data format.
3. Implement baseline algorithms for e-government agents in KubeEdge-Ianvs based on the standardized test suite.
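
To make "unified data format" and "evaluation metrics" more concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not the Ianvs API and not a required schema: the `Sample` fields, the category labels, the JSON Lines layout, and the `exact_match_accuracy` metric are all hypothetical choices. The actual format and metrics are left for the proposal to define.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical labels for the three task categories listed above.
CATEGORIES = {
    "government_services",
    "government_office_operations",
    "urban_governance",
}


@dataclass
class Sample:
    """One benchmark record in an assumed unified format."""
    category: str   # one of CATEGORIES
    task: str       # e.g. "hotline_support" (hypothetical task name)
    query: str      # input given to the agent or model
    reference: str  # expected answer used for scoring

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def to_jsonl(samples):
    """Serialize samples to JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in samples)


def from_jsonl(text):
    """Parse JSON Lines back into Sample objects, skipping blank lines."""
    return [Sample(**json.loads(line)) for line in text.splitlines() if line.strip()]


def exact_match_accuracy(y_true, y_pred):
    """Fraction of predictions equal to the reference after trimming whitespace."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    hits = sum(t.strip() == p.strip() for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Exact match is used here only because it is the simplest metric to show. Generation tasks such as document drafting would likely need softer metrics, which this sketch does not cover.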

**Recommended Skills:**

LLM Agent, LLM Benchmark, VQA

**Useful Links:**

- https://ianvs.readthedocs.io/en/latest/
- https://github.com/kubeedge/ianvs/tree/main/examples/llm_simple_qa
- https://github.com/kubeedge/ianvs/tree/main/examples/llm-edge-benchmark-suite
- https://github.com/kubeedge/ianvs/tree/main/examples/llm-agent
- https://github.com/kubeedge/ianvs/tree/main/examples/government/singletask_learning_bench
- https://github.com/kubeedge/ianvs/blob/main/docs/proposals/scenarios/llm-benchmarks/llm-benchmarks.md
- https://github.com/kubeedge/ianvs/blob/main/docs/proposals/scenarios/llm-benchmark-suite/llm-edge-benchmark-suite.md
- https://github.com/kubeedge/ianvs/blob/main/docs/proposals/scenarios/Smart_Coding/Smart%20Coding%20benchmark%20suite%20Proposal.md
- https://github.com/kubeedge/ianvs/blob/main/docs/proposals/algorithms/joint-inference/cloud-edge-collaboration-inference-for-llm.md
- https://github.com/kubeedge/ianvs/blob/main/docs/proposals/algorithms/joint-inference/cloud-edge-speculative-decoding-for-llm.md

Agent Benchmark for Government Scenarios Based on KubeEdge-Ianvs #199

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Agent Benchmark for Government Scenarios Based on KubeEdge-Ianvs #199

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions