C parser still indexes call sites as functions and misses nginx-style definitions #331

@justrach

Problem

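After deploying codedb release/0.2.579 to api.wiki.codes and reindexing a C validation set, C symbol coverage is no longer zero, but the parser still has precision and recall problems: it indexes call sites as functions (false positives) and misses nginx-style function definitions (false negatives).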

Prod evidence

Validation reindex on 2026-04-26:

curl-curl        files=4279 symbols=14949
nginx-nginx      files=526  symbols=4583
openssl-openssl  files=3997 symbols=61892
sqlite-sqlite    files=2178 symbols=18339

Good hits:

/api/curl-curl/symbol?name=Curl_close
  -> lib/url.c, kind=function, detail="CURLcode Curl_close(struct Curl_easy **datap)"

/api/curl-curl/symbol?name=curl_easy_init
  -> lib/easy.c, kind=function, detail="CURL *curl_easy_init(void)"

/api/openssl-openssl/symbol?name=SSL_new
  -> ssl/ssl_lib.c, kind=function, detail="SSL *SSL_new(SSL_CTX *ctx)"

Bad false positives:

/api/curl-curl/symbol?name=curl_easy_perform
  -> 89 results, mostly docs/examples call sites and strings such as:
     fprintf(stderr, "curl_easy_perform() failed: %s\\n",

Nginx recall gap:

/api/nginx-nginx/symbol?name=ngx_http_init_connection      -> 0
/api/nginx-nginx/symbol?name=ngx_http_create_request       -> 0
/api/nginx-nginx/symbol?name=ngx_http_process_request      -> 0
/api/nginx-nginx/symbol?name=ngx_http_finalize_request     -> 0
/api/nginx-nginx/outline?path=src/http/ngx_http_request.c  -> 12 symbols, mostly includes/calls, not the main function definitions

Example nginx false positives from outline:

SSL_CTX_get_verify_mode, kind=function, line 991
SSL_get_options, kind=function, line 998
ngx_strcasestrn, kind=function, line 1928

These are call expressions inside function bodies, not definitions.

Expected behavior

The C parser should index high-confidence declarations/definitions and skip call expressions inside bodies/strings/log messages.

As a follow-up to issue #319, useful acceptance criteria would be:

  • A symbol lookup for curl_easy_perform should return only its definition, or no hit if the definition is macro/alias-backed and cannot be parsed with confidence.
  • The nginx request functions listed above should be detected in src/http/ngx_http_request.c.
  • Function calls such as SSL_get_options(...) inside a statement should not become kind=function symbols.
  • Strings containing foo() should not become function symbols.
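
To make the criteria concrete, here is a rough sketch of one heuristic that would satisfy them. It is not codedb's actual parser, and every name in it (find_definitions, _strip_noise, _definition_name, SAMPLE) is hypothetical. The idea: blank out comments, string/char literals and preprocessor lines, then accept a name only when it is followed by a parameter list and an opening brace at brace depth 0. Because the "header" is everything since the last top-level `;` or `}`, a return type on its own line (the layout assumed here to be what "nginx-style" means) is handled the same as a one-line signature.

```python
import re

# Comments and string/char literals are blanked so that text such as
# "curl_easy_perform() failed" can never look like code.
_NOISE = re.compile(
    r"//[^\n]*"                 # line comment
    r"|/\*.*?\*/"               # block comment
    r'|"(?:\\.|[^"\\\n])*"'     # string literal
    r"|'(?:\\.|[^'\\\n])*'",    # char literal
    re.S,
)
_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "else"}


def _strip_noise(src):
    """Blank comments/literals and drop preprocessor lines, keeping newlines."""
    text = _NOISE.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), src)
    out, in_directive = [], False
    for line in text.split("\n"):
        if in_directive or line.lstrip().startswith("#"):
            in_directive = line.rstrip().endswith("\\")  # continuation line
            line = ""
        out.append(line)
    return "\n".join(out)


def _definition_name(header):
    """Return (name, offset) if header looks like `<type> name(args)`."""
    h = header.rstrip()
    if not h.endswith(")"):
        return None  # e.g. `struct foo`, `enum`, or `= ` before an initializer
    depth = 0
    for i in range(len(h) - 1, -1, -1):  # walk back to the matching "("
        if h[i] == ")":
            depth += 1
        elif h[i] == "(":
            depth -= 1
            if depth == 0:
                break
    else:
        return None
    before = h[:i].rstrip()
    m = re.search(r"[A-Za-z_]\w*$", before)
    if not m or m.group(0) in _KEYWORDS:
        return None
    prefix = before[:m.start()].strip()
    # Require a plain return type; anything odd is skipped as low confidence.
    if not prefix or not re.fullmatch(r"[\w\s*]+", prefix):
        return None
    return m.group(0), m.start()


def find_definitions(src):
    """Return [(name, line)] for function definitions at brace depth 0."""
    text = _strip_noise(src)
    defs, depth, parens, start = [], 0, 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            parens += 1
        elif ch == ")":
            parens = max(parens - 1, 0)
        elif ch == "{":
            if depth == 0 and parens == 0:
                hit = _definition_name(text[start:i])
                if hit:
                    name, off = hit
                    defs.append((name, text.count("\n", 0, start + off) + 1))
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
            if depth == 0:
                start = i + 1  # next top-level header starts here
        elif ch == ";" and depth == 0 and parens == 0:
            start = i + 1
    return defs


# Made-up input mixing the cases from this issue (not real nginx/curl source).
SAMPLE = r'''#include <openssl/ssl.h>

/* ngx_fake_comment(void) { } */
static void
ngx_http_process_request(ngx_http_request_t *r)
{
    if (SSL_get_options(c->ssl) & 1) {
        fprintf(stderr, "curl_easy_perform() failed: %s\n", msg);
    }
}

CURL *curl_easy_init(void)
{
    return NULL;
}
'''

print(find_definitions(SAMPLE))
# -> [('ngx_http_process_request', 5), ('curl_easy_init', 12)]
```

Known gaps in this sketch: it deliberately skips anything it cannot classify (implicit-int definitions, attribute-decorated or macro-generated signatures), which matches the "no hit if not confidently parsed" criterion. It also does not evaluate #if/#else branches, so unbalanced braces across conditional blocks could throw off the depth count, and functions nested inside an extern "C" { ... } wrapper would be missed.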

Cloud validation context

The cloud reindex itself worked; the remaining problem is parser quality, not deployment. api.wiki.codes has the validation data live for curl/nginx/openssl/sqlite.
