Agent Benchmark for Government Scenarios Based on KubeEdge-Ianvs #199

@IcyFeather233

Description:

As large model technology advances rapidly, its potential in government scenarios is becoming increasingly clear. The intelligent upgrade of government services covers three core scenarios: internal government collaboration, public services, and enterprise services, each of which needs large models to improve efficiency and service quality. Government scenarios, however, demand high levels of professionalism, standardization, and security. Existing large model evaluation systems lack standardized assessment methods for the government affairs vertical, which makes it hard to verify accuracy, compliance, and scenario adaptability when the technology is deployed.
This project therefore aims to use the KubeEdge-Ianvs distributed collaborative framework to build a large model evaluation pipeline and benchmark specifically for government scenarios. The result will provide quantifiable, reusable capability assessment tools for the intelligent transformation of government services, and promote the secure and efficient application of large models in typical scenarios such as official document drafting, smart transportation, and government Q&A.

Expected Outcomes:

  1. Introduce datasets from the e-government domain, and categorize and reorganize existing datasets into three standardized task categories: Government Services (e.g., administrative services, hotline support, business facilitation), Government Office Operations (e.g., government knowledge Q&A, document information extraction, document generation), and Urban Governance (e.g., urban data analysis, event perception, event dispatching, event analysis).
  2. For at least one of the above scenarios, provide a standardized test suite in KubeEdge-Ianvs, including datasets, test environments, and evaluation metrics, with the datasets organized into a unified data format.
  3. Implement baseline algorithms for e-government agents in KubeEdge-Ianvs based on the standardized test suite.
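
To make the "unified data format" and "evaluation metrics" in item 2 more concrete, here is a minimal sketch of what a shared record layout and a simple metric could look like. Everything in it is an assumption for illustration: the `GovSample` fields, the category identifiers, the JSON Lines layout, and the exact-match metric are hypothetical and are not taken from KubeEdge-Ianvs or from any existing government dataset.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical identifiers for the three task categories named in item 1.
CATEGORIES = {
    "government_services",
    "government_office_operations",
    "urban_governance",
}


@dataclass
class GovSample:
    """One benchmark record; all field names are illustrative assumptions."""
    category: str   # one of CATEGORIES
    task: str       # e.g. "hotline_support", "document_generation"
    query: str      # input given to the model or agent
    reference: str  # expected answer used for scoring

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def to_jsonl(samples):
    """Serialize samples to JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in samples)


def from_jsonl(text):
    """Parse JSON Lines back into GovSample records, skipping blank lines."""
    return [GovSample(**json.loads(line)) for line in text.splitlines() if line.strip()]


def exact_match_accuracy(predictions, samples):
    """Fraction of predictions equal to the reference, ignoring case and extra whitespace."""
    if len(predictions) != len(samples):
        raise ValueError("predictions and samples must have the same length")
    if not samples:
        return 0.0

    def normalize(s):
        return " ".join(s.lower().split())

    hits = sum(normalize(p) == normalize(s.reference) for p, s in zip(predictions, samples))
    return hits / len(samples)
```

Exact match is only a placeholder here. Open-ended tasks such as document generation would likely need a different metric, and the choice of metric per task is part of what the test suite is meant to standardize.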

Recommended Skills:

LLM Agent, LLM Benchmark, VQA

Useful Links:

Labels

kind/feature
