Merged
79 commits
0ee0b1a
create pipeline for evals testing for apiview copilot
kristapratico May 2, 2025
e9c4067
switch to tests.yml
kristapratico May 2, 2025
bc7d0bb
actually link the env vars
kristapratico May 2, 2025
69e2a3e
move
kristapratico May 2, 2025
406bc9b
try env
kristapratico May 2, 2025
d00caae
try again
kristapratico May 2, 2025
476778e
Merge branch 'main' into apiview-copilot/evals-in-ci
kristapratico May 2, 2025
8ba0ede
fix
kristapratico May 2, 2025
97bd64b
changing model judge + other fixes to run.py
kristapratico May 3, 2025
2693ae2
updates in run in pipeline and record results in foundry
kristapratico May 3, 2025
b3c9737
fixes
kristapratico May 3, 2025
d12c308
fixes
kristapratico May 3, 2025
9e3c119
debug errors
kristapratico May 3, 2025
a8fdf48
debug errors
kristapratico May 3, 2025
235f90c
revert
kristapratico May 3, 2025
ec182a3
try
kristapratico May 3, 2025
4903019
try
kristapratico May 3, 2025
2100481
try
kristapratico May 3, 2025
608b480
fix
kristapratico May 3, 2025
9b393de
try
kristapratico May 3, 2025
e7d3004
fix
kristapratico May 3, 2025
ad92df2
try
kristapratico May 3, 2025
471081f
try
kristapratico May 3, 2025
58ceca0
try
kristapratico May 3, 2025
a2f41bd
try
kristapratico May 3, 2025
334f0bd
revert
kristapratico May 3, 2025
89deb41
refactor tests/results into one file
kristapratico May 4, 2025
4a1e2fd
try adding auth task
kristapratico May 5, 2025
6dbaeeb
update sub
kristapratico May 5, 2025
f5ff279
try different value
kristapratico May 5, 2025
7900a71
no venv
kristapratico May 5, 2025
755ff28
fix path
kristapratico May 5, 2025
8a8df47
debug
kristapratico May 5, 2025
aa8a4ea
set log level
scbedd May 6, 2025
93e8240
remove the override settings for the login. allow the azureSubscripti…
scbedd May 6, 2025
4f93be5
just try api key
kristapratico May 6, 2025
7357cfb
try again
kristapratico May 6, 2025
1bf7feb
print endpoint
kristapratico May 6, 2025
e4d5c6a
oops
kristapratico May 6, 2025
81b9523
missing one
kristapratico May 6, 2025
0a69285
try again
kristapratico May 6, 2025
34f5297
try again
kristapratico May 6, 2025
5999d95
try az login
kristapratico May 6, 2025
61d7ffb
try az login
kristapratico May 6, 2025
baf3f27
try w/o key
kristapratico May 7, 2025
1106ae6
try
kristapratico May 7, 2025
a88b8b9
revert
kristapratico May 7, 2025
736eebc
debug more
kristapratico May 7, 2025
e38452c
try
kristapratico May 7, 2025
b0178b6
Merge branch 'main' into apiview-copilot/evals-in-ci
kristapratico May 7, 2025
35e47be
remove file
kristapratico May 7, 2025
8052c79
try sp
kristapratico May 8, 2025
f3b4523
Merge branch 'main' into apiview-copilot/evals-in-ci
kristapratico May 13, 2025
7218214
revert
kristapratico May 13, 2025
e7a3869
try according to docs
kristapratico May 19, 2025
aa1b937
try
kristapratico May 19, 2025
f04f735
revert
kristapratico May 19, 2025
9a5a6a4
try ai solution
kristapratico May 19, 2025
d1942bb
try
kristapratico May 19, 2025
d01a1a2
try
kristapratico May 19, 2025
6b5a899
idk anymore
kristapratico May 20, 2025
3273f19
try AzurePipelineCredential
kristapratico May 20, 2025
78e403b
typo
kristapratico May 20, 2025
c87a78f
try env vars
kristapratico May 20, 2025
6448d09
missing env var
kristapratico May 20, 2025
39c3703
try setting tenant
kristapratico May 21, 2025
377727d
does api key work
kristapratico May 21, 2025
462379f
oops
kristapratico May 21, 2025
eaab048
add data
kristapratico May 21, 2025
f5c4881
missed config in run
kristapratico May 21, 2025
57e2b4f
try a full run
kristapratico May 21, 2025
b3a5ca4
try nano
kristapratico May 21, 2025
a7adc8f
remove dups
kristapratico May 21, 2025
abeea6e
revert debug
kristapratico May 22, 2025
40dea99
Merge branch 'main' into apiview-copilot/evals-in-ci
kristapratico May 22, 2025
86074a5
update docs and reqs
kristapratico May 22, 2025
2540eb5
add rag env vars
kristapratico May 22, 2025
dab4a24
remove publish package step
kristapratico May 27, 2025
e35a9d3
add missing param
kristapratico May 27, 2025
6 changes: 6 additions & 0 deletions packages/python-packages/apiview-copilot/evals/README.md
@@ -19,6 +19,12 @@ This directory contains the evaluation testing for APIView Copilot.

 ## Running Evaluations

+### In DevOps pipeline
+
+Evals runs can be triggered by the [tools - apiview-copilot - tests](https://dev.azure.com/azure-sdk/internal/_build?definitionId=7662&_a=summary) pipeline. Results of the run can be found on the Evaluation tab in the Azure AI Foundry portal for the `apiview-ai` project.
+
+### Locally
+
 Running evaluations will run evals on test files for the language given and give the choice to record the baseline (aka write the results to `evals/results/language`).

 The main evaluation script is `run.py`. Here are the common ways to use it:
@@ -2,3 +2,4 @@ azure-ai-evaluation==1.5.0
 python-dotenv==1.0.1
 tabulate==0.9.0
 openai==1.67.0
+azure-identity==1.21.0
37 changes: 30 additions & 7 deletions packages/python-packages/apiview-copilot/evals/run.py
@@ -15,12 +15,13 @@
 import dotenv
 from tabulate import tabulate
 from azure.ai.evaluation import evaluate, SimilarityEvaluator, GroundednessEvaluator
+from azure.identity import AzurePipelinesCredential

 dotenv.load_dotenv()

 NUM_RUNS: int = 3
 # for best results, this should always be a different model from the one we are evaluating
-MODEL_JUDGE = "gpt-4.1"
+MODEL_JUDGE = "gpt-4.1-nano"

 model_config: dict[str, str] = {
     "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
@@ -38,6 +39,9 @@
 }


+def in_ci():
+    return os.getenv("TF_BUILD", False)
+

 class CustomAPIViewEvaluator:
     """Evaluator for comparing expected and actual APIView comments."""
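
Note on the `in_ci()` helper added above: it returns either the raw `TF_BUILD` string or `False`, so the later `in_ci() is False` check only works because the default object is returned unchanged when the variable is unset. A minimal sketch of a stricter variant follows, assuming the only signal needed is the `TF_BUILD` variable that Azure Pipelines sets to `True` on its agents. The name `in_ci_strict` and the injectable `env` parameter are hypothetical and not part of this PR.

```python
import os
from typing import Mapping, Optional


def in_ci_strict(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when TF_BUILD is set to a true-like value.

    Hypothetical alternative to the PR's in_ci(); always returns a bool,
    so callers can write `if not in_ci_strict():` instead of `is False`.
    """
    env = os.environ if env is None else env
    # Azure Pipelines sets TF_BUILD=True for scripts run by a build task.
    return env.get("TF_BUILD", "").strip().lower() == "true"


if __name__ == "__main__":
    print("running in CI:", in_ci_strict())
```

Taking the environment as a parameter keeps the check easy to exercise in a unit test without mutating `os.environ`.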
@@ -142,6 +146,7 @@ def _evaluate_generic_comments(self, query: str, language: str, generic_comments
                     "exceptions": exceptions,
                     "language": language,
                 },
+                configuration={"api_key": os.getenv("AZURE_OPENAI_API_KEY")}
             )
             comment["valid"] = "true" in response.lower()

@@ -387,12 +392,14 @@ def calculate_coverage(args: argparse.Namespace, rule_ids: set[str]) -> None:
 def establish_baseline(args: argparse.Namespace, all_results: dict[str, Any]) -> None:
     """Establish the current results as the new baseline."""

-    establish_baseline = input("\nDo you want to establish this as the new baseline? (y/n): ")
-    if establish_baseline.lower() == "y":
-        for name, result in all_results.items():
-            output_path = pathlib.Path(__file__).parent / "results" / args.language / name[:-1]
-            with open(str(output_path), "w") as f:
-                json.dump(result, indent=4, fp=f)
+    # only ask if we're not in CI
+    if in_ci() is False:
+        establish_baseline = input("\nDo you want to establish this as the new baseline? (y/n): ")
+        if establish_baseline.lower() == "y":
+            for name, result in all_results.items():
+                output_path = pathlib.Path(__file__).parent / "results" / args.language / name[:-1]
+                with open(str(output_path), "w") as f:
+                    json.dump(result, indent=4, fp=f)

     # whether or not we establish a baseline, we want to write results to a temp dir
     log_path = pathlib.Path(__file__).parent / "results" / args.language / ".log"
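
The change above skips the interactive baseline prompt when running in a pipeline, where nobody can answer it. A rough sketch of the same gating in isolation is shown below, assuming results are plain JSON-serializable dicts keyed by output file name. `maybe_write_baseline`, its `ci` flag, and the injectable `ask` callable are hypothetical names for illustration. Unlike `run.py`, the sketch does not trim the file name with `name[:-1]`.

```python
import json
import pathlib
from typing import Any, Callable


def maybe_write_baseline(
    results: dict[str, Any],
    out_dir: pathlib.Path,
    *,
    ci: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Write each result to out_dir if the user confirms; never prompt in CI."""
    if ci:
        # A pipeline run has no one to answer the prompt, so skip it entirely.
        return False
    answer = ask("\nDo you want to establish this as the new baseline? (y/n): ")
    if answer.strip().lower() != "y":
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, result in results.items():
        with open(out_dir / name, "w") as f:
            json.dump(result, f, indent=4)
    return True
```

Passing `ask` as a parameter lets a test substitute a canned answer for `input`, so the prompt path can be covered without a terminal.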
@@ -484,6 +491,21 @@ def record_run_result(result: dict[str, Any], rule_ids: Set[str]) -> list[dict[s
         "resource_group_name": os.environ["AZURE_FOUNDRY_RESOURCE_GROUP"],
         "project_name": os.environ["AZURE_FOUNDRY_PROJECT_NAME"],
     }
+    if in_ci():
+        service_connection_id = os.environ["AZURESUBSCRIPTION_SERVICE_CONNECTION_ID"]
+        client_id = os.environ["AZURESUBSCRIPTION_CLIENT_ID"]
+        tenant_id = os.environ["AZURESUBSCRIPTION_TENANT_ID"]
+        system_access_token = os.environ["SYSTEM_ACCESSTOKEN"]
+        kwargs = {
+            "credential": AzurePipelinesCredential(
+                service_connection_id=service_connection_id,
+                client_id=client_id,
+                tenant_id=tenant_id,
+                system_access_token=system_access_token,
+            )
+        }
+    else:
+        kwargs = {}

     run_results = []
     for run in range(args.num_runs):
@@ -508,6 +530,7 @@ def record_run_result(result: dict[str, Any], rule_ids: Set[str]) -> list[dict[s
                 target=review_apiview,
                 fail_on_evaluator_errors=True,
                 azure_ai_project=azure_ai_project,
+                **kwargs
             )

             run_result = record_run_result(result, rule_ids)
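
The two hunks above build a `credential` keyword argument only in CI and pass it through to `evaluate(...)`. A minimal sketch of that selection logic follows, with a check that reports every missing pipeline variable at once instead of failing on the first `KeyError`. `build_evaluate_kwargs`, `REQUIRED_CI_VARS`, and the injected `credential_factory` are hypothetical names. In `run.py` the factory would presumably be `AzurePipelinesCredential` and `env` would be `os.environ`.

```python
from typing import Any, Callable, Mapping

# Constructor keyword -> environment variable, as read in the diff above.
REQUIRED_CI_VARS = {
    "service_connection_id": "AZURESUBSCRIPTION_SERVICE_CONNECTION_ID",
    "client_id": "AZURESUBSCRIPTION_CLIENT_ID",
    "tenant_id": "AZURESUBSCRIPTION_TENANT_ID",
    "system_access_token": "SYSTEM_ACCESSTOKEN",
}


def build_evaluate_kwargs(
    env: Mapping[str, str],
    ci: bool,
    credential_factory: Callable[..., Any],
) -> dict[str, Any]:
    """Return extra keyword arguments for evaluate(); empty outside CI."""
    if not ci:
        # Outside CI no credential is passed, matching `kwargs = {}` in the diff.
        return {}
    missing = [var for var in REQUIRED_CI_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing pipeline variables: " + ", ".join(missing))
    credential = credential_factory(
        **{param: env[var] for param, var in REQUIRED_CI_VARS.items()}
    )
    return {"credential": credential}
```

Injecting the factory keeps the sketch free of an `azure-identity` import, so the variable handling can be checked with a stub.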
Large diffs are not rendered by default.

@@ -69,6 +69,10 @@ class ApiViewContextMode:
 DEFAULT_CONTEXT_MODE = ApiViewContextMode.RAG


+def in_ci():
+    return os.getenv("TF_BUILD", False)
+
+
 # create enum for the ReviewMode
 class ApiViewReviewMode:
     FULL = "full"
@@ -486,7 +490,12 @@ def _run_prompt(self, prompt_path: str, inputs: dict, max_retries: int = 5) -> s
         """

         def execute_prompt() -> str:
-            return prompty.execute(prompt_path, inputs=inputs)
+            if in_ci():
+                configuration={"api_key": os.getenv("AZURE_OPENAI_API_KEY")}
+            else:
+                configuration = {}
+
+            return prompty.execute(prompt_path, inputs=inputs, configuration=configuration)

         def on_retry(exception, attempt, max_attempts):
             logger.warning(
66 changes: 66 additions & 0 deletions packages/python-packages/apiview-copilot/tests.yml
@@ -0,0 +1,66 @@
+parameters:
+  - name: PythonVersion
+    type: string
+    default: '3.10'
+
+trigger: none
+extends:
+  template: /eng/pipelines/templates/stages/1es-redirect.yml
+  parameters:
+    stages:
+      - stage: 'Build'
+        variables:
+          - template: /eng/pipelines/templates/variables/globals.yml
+          - template: /eng/pipelines/templates/variables/image.yml
+        jobs:
+          - job: 'Build'
+
+            pool:
+              name: $(LINUXNEXTPOOL)
+              image: $(LINUXNEXTVMIMAGE)
+              os: linux
+
+            steps:
+              - template: /eng/pipelines/templates/steps/use-python-version.yml
+                parameters:
+                  versionSpec: '${{ parameters.PythonVersion }}'
+
+              - script: |
+                  python --version
+                  python -m pip install virtualenv aiohttp chardet trio setuptools wheel packaging
+                displayName: 'Setup Python Environment'
+
+              - script: |
+                  python -m pip install -r dev_requirements.txt
+                  python -m pip install -e .
+                displayName: 'Install Test Requirements'
+                workingDirectory: $(Build.SourcesDirectory)/packages/python-packages/apiview-copilot
+
+              - task: AzureCLI@2
+                displayName: Run Evals (AzureCLI@2)
+                inputs:
+                  azureSubscription: azure-sdk-tests-playground
+                  scriptType: bash
+                  scriptLocation: inlineScript
+                  inlineScript: |
+                    # Login using service principal (handled automatically by Azure DevOps)
+                    az account set --subscription "faa080af-c1d8-40ad-9cce-e1a450ca5b57"
+
+                    # Verify the context
+                    az account show --query '{subscription:name,tenant:tenantId}'
+
+                    python packages/python-packages/apiview-copilot/evals/run.py
+
+                    exit $?
+                env:
+                  AZURE_OPENAI_ENDPOINT: $(python-openai-endpoint)
+                  AZURE_OPENAI_API_KEY: $(python-openai-key)
+                  AZURE_SUBSCRIPTION_ID: faa080af-c1d8-40ad-9cce-e1a450ca5b57
+                  AZURE_TENANT_ID: 2f4a9838-26b7-47ee-be60-ccc1fdec5953
+                  AZURE_FOUNDRY_RESOURCE_GROUP: openai-shared
+                  AZURE_FOUNDRY_PROJECT_NAME: apiview-ai
+                  OPENAI_API_VERSION: 2025-03-01-preview
+                  SYSTEM_ACCESSTOKEN: $(System.AccessToken)
+                  AZURE_SEARCH_NAME: archagent-search
+                  AZURE_COSMOS_ACC_NAME: archagent-cosmos
+                  AZURE_COSMOS_DB_NAME: archagent-db