[experiment] Auto-generate documentation for jaeger-v2 configuration structs via AST #6628

Open
@yurishkuro

Description

We are still blocked on the main issue #6186 because the schema-first efforts in the OTEL Collector are not progressing. I wonder if we could instead use Go's AST library to navigate the hierarchy of known config structs and extract the comments and other metadata needed for the docs and/or config examples.

There are various blog posts showing examples of using AST.

The tool could have just a hardcoded list of starting configuration structs, both from Jaeger and from OTEL code base, e.g. cmd/jaeger/internal/extension/jaegerquery/config.go.
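To make the idea concrete, here is a minimal sketch of comment extraction using only the standard go/parser and go/ast packages. The Config struct in src and the fieldDocs helper are hypothetical and not taken from the Jaeger or OTEL code base; a real tool would parse the actual config files instead of an inline string.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical config struct used only for illustration.
const src = `package demo

// Config holds hypothetical query settings.
type Config struct {
	// Endpoint is the host:port to listen on.
	Endpoint string
	MaxTraces int // upper bound on returned traces
}
`

// fieldDocs parses Go source and returns "Struct.Field" -> comment text
// for every named field of every struct type declared in it.
func fieldDocs(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	// parser.ParseComments is required, otherwise comments are dropped.
	file, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	docs := map[string]string{}
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, f := range st.Fields.List {
			// Prefer the doc comment above the field,
			// fall back to the trailing line comment.
			text := f.Doc.Text()
			if text == "" {
				text = f.Comment.Text()
			}
			// Embedded fields have no names and are skipped here.
			for _, name := range f.Names {
				docs[ts.Name.Name+"."+name.Name] = strings.TrimSpace(text)
			}
		}
		return true
	})
	return docs, nil
}

func main() {
	docs, err := fieldDocs(src)
	if err != nil {
		panic(err)
	}
	for name, doc := range docs {
		fmt.Printf("%s: %s\n", name, doc)
	}
}
```

This sketch ignores embedded fields and types defined in other packages, both of which the real config hierarchy would need.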

The prototype is available in draft PR #7064.

Rough outline of the milestones:

This is another outline of the task from Gemini:

Feature: Generate JSON Schema with Comments and Defaults

Goal: Implement a tool or function that generates JSON schema for a collection of Go objects, incorporating comments as descriptions and using the current field values as defaults.

Implementation Outline:

I. Initialization and Package Loading:

  1. Input:
    • A slice or map of Go objects to generate schemas for.
    • The package paths where the types of these objects are defined.
  2. Load Packages:
    • Utilize the "golang.org/x/tools/go/packages" library to load the specified Go packages.
    • Set the Mode field of packages.Config to include the information needed for parsing comments and type structures (e.g., NeedTypes, NeedSyntax, NeedName, NeedImports, NeedDeps, NeedFiles, NeedCompiledGoFiles, NeedExportFile, NeedModule).
  3. Type Information:
    • For each input Go object, obtain its reflect.Type using the reflect package for runtime inspection.

II. Reflecting and Parsing Types:

  1. Iterate Through Objects: Loop through each Go object in the input collection.
  2. Get reflect.Type and reflect.Value:
    • Obtain the reflect.Type to analyze the structure.
    • Obtain the reflect.Value to access the current field values for defaults.
  3. Find Corresponding ast.TypeSpec:
    • For the reflect.Type, locate the corresponding ast.TypeSpec within the parsed packages (pkg.Syntax).
    • This will involve traversing the syntax trees and matching the ast.TypeSpec.Name.Name with the Go type's name.
    • Handle potential complexities like embedded types and type aliases.
  4. Extract Field Information: For each field of the reflect.Type:
    • Get the field name (field.Name).
    • Get the field type (field.Type).
    • Extract struct tags (field.Tag), specifically looking for the json tag to determine the JSON property name and omitempty.
    • Get the current value of the field from the reflect.Value (Value.Field(i)).
  5. Extract Comment Information:
    • Locate the corresponding ast.Field in the ast.TypeSpec.
    • Extract the associated comment from ast.Field.Doc or ast.Field.Comment.
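The reflection half of step II.4 can be sketched as follows, using only the standard library. The demoConfig struct, the fieldInfo type, and the collectFields helper are hypothetical. The sketch follows the outline in reading the json tag; a tool aimed at collector configs would likely need to read mapstructure tags as well, which is an assumption not covered here.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldInfo is a hypothetical, reduced version of the FieldInfo idea:
// Go name, JSON property name, omitempty flag, and current value.
type fieldInfo struct {
	Name      string
	JSONName  string
	OmitEmpty bool
	Default   any
}

// demoConfig is a hypothetical struct used only for illustration.
type demoConfig struct {
	Endpoint  string `json:"endpoint"`
	MaxTraces int    `json:"max_traces,omitempty"`
	Internal  string `json:"-"`
	Untagged  bool
}

// collectFields walks the exported fields of a struct (or pointer to a
// struct) and records the tag-derived name and the current field value.
// It assumes obj is a struct; other kinds would panic in NumField.
func collectFields(obj any) []fieldInfo {
	v := reflect.Indirect(reflect.ValueOf(obj))
	t := v.Type()
	var out []fieldInfo
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		tag := f.Tag.Get("json")
		if tag == "-" {
			continue // field is explicitly excluded from JSON
		}
		name, opts, _ := strings.Cut(tag, ",")
		if name == "" {
			name = f.Name // no tag name: fall back to the Go field name
		}
		out = append(out, fieldInfo{
			Name:      f.Name,
			JSONName:  name,
			OmitEmpty: strings.Contains(","+opts+",", ",omitempty,"),
			Default:   v.Field(i).Interface(),
		})
	}
	return out
}

func main() {
	cfg := &demoConfig{Endpoint: ":16686", MaxTraces: 100}
	for _, f := range collectFields(cfg) {
		fmt.Printf("%+v\n", f)
	}
}
```

Passing an instance that already holds the default configuration is what lets the current field values double as schema defaults.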

III. Building the JSON Schema:

  1. Schema Structure:
    • Define a structure for the generated JSON schema, likely using the "definitions" section for type schemas and a top-level schema referencing these definitions.
  2. Type Mapping:
    • Create a mapping between Go types (from reflect.Type) and their corresponding JSON schema types (e.g., string, integer, boolean, array, object).
    • Handle basic types, slices, maps, and nested structs.
  3. Schema Properties: For each Go field, create a property in the JSON schema:
    • type: Mapped from the Go field type.
    • description: The extracted Go field comment.
    • default: The current value of the Go field (serialized appropriately for JSON schema).
    • Potentially include other keywords like format, nullable, and constraints based on struct tags.
  4. Handling Nested Objects:
    • If a field is another Go object, recursively process its type and add a $ref to its definition in the "definitions" section.
  5. Handling Slices and Maps:
    • For slice and map types, define the items or additionalProperties schema, referencing the schema of the element/value type.
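The type mapping in III.2 through III.5 could look roughly like the sketch below. The schemaFor and schemaOf helpers and the tlsSettings struct are hypothetical, and the mapping is deliberately incomplete: it is one plausible reading of the outline, not a settled design.

```go
package main

import (
	"fmt"
	"reflect"
)

// tlsSettings is a hypothetical nested config struct.
type tlsSettings struct {
	CertFile string
}

// schemaFor maps a Go type to a JSON schema fragment.
// Named structs become a $ref into "definitions"; building the
// definition itself is out of scope for this sketch.
func schemaFor(t reflect.Type) map[string]any {
	switch t.Kind() {
	case reflect.Pointer:
		return schemaFor(t.Elem())
	case reflect.String:
		return map[string]any{"type": "string"}
	case reflect.Bool:
		return map[string]any{"type": "boolean"}
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return map[string]any{"type": "integer"}
	case reflect.Float32, reflect.Float64:
		return map[string]any{"type": "number"}
	case reflect.Slice, reflect.Array:
		return map[string]any{"type": "array", "items": schemaFor(t.Elem())}
	case reflect.Map:
		return map[string]any{
			"type":                 "object",
			"additionalProperties": schemaFor(t.Elem()),
		}
	case reflect.Struct:
		return map[string]any{"$ref": "#/definitions/" + t.Name()}
	default:
		// Interfaces, funcs, channels: an empty schema accepts any value.
		return map[string]any{}
	}
}

// schemaOf is a convenience wrapper that takes a value instead of a type.
func schemaOf(v any) map[string]any {
	return schemaFor(reflect.TypeOf(v))
}

func main() {
	fmt.Println(schemaOf([]string{}))
	fmt.Println(schemaOf(map[string]int{}))
	fmt.Println(schemaOf(tlsSettings{}))
}
```

A purely kind-based mapping misclassifies some custom types. For example, time.Duration has kind int64 and maps to integer here, while configuration files often express durations as strings, so such types would need explicit overrides.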

IV. Data Structures:

  • TypeCache (Map: reflect.Type -> *ast.TypeSpec): Caches the mapping between reflect.Type and its ast.TypeSpec to avoid redundant lookups.
  • SchemaDefinitions (Map: string -> map[string]interface{}): Stores the generated JSON schema definitions for each Go type, keyed by the type name.
  • ProcessedTypes (Set: reflect.Type): Tracks already processed Go types to prevent infinite recursion with nested or circular dependencies.
  • FieldInfo (Struct): Holds intermediate information about each field:
    type FieldInfo struct {
        Name        string
        JSONName    string
        Type        reflect.Type
        Value       reflect.Value
        Comment     string
        Tags        reflect.StructTag
    }
  • PackageInfo (Struct): Stores information about a loaded Go package, including a mapping of type names to their ast.TypeSpec:
    type PackageInfo struct {
        Package *packages.Package
        TypeSpecs map[string]*ast.TypeSpec
    }

V. Output:

  1. Root Schema: Construct the final JSON schema object, including the $schema and the "definitions" section. The root schema might also define properties for the top-level object(s).
  2. Serialization: Serialize the JSON schema structure into a JSON string using encoding/json.
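Sections IV and V can be tied together in a small sketch: a definitions map, a processed-types set that stops recursion on circular types, and serialization with encoding/json. The generator type, the node struct, and the buildSchema and generate helpers are hypothetical. Descriptions and defaults are omitted to keep the example short, and placing $ref next to "definitions" at the root is a common layout assumed here rather than something the outline prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// node is a hypothetical self-referencing type that exercises the
// recursion guard.
type node struct {
	Name     string  `json:"name"`
	Children []*node `json:"children,omitempty"`
}

type generator struct {
	definitions map[string]any        // SchemaDefinitions: type name -> schema
	processed   map[reflect.Type]bool // ProcessedTypes: recursion guard
}

// ref returns the schema fragment for t, registering struct definitions
// as a side effect. Only a few kinds are handled in this sketch.
func (g *generator) ref(t reflect.Type) map[string]any {
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	switch t.Kind() {
	case reflect.String:
		return map[string]any{"type": "string"}
	case reflect.Bool:
		return map[string]any{"type": "boolean"}
	case reflect.Int, reflect.Int64:
		return map[string]any{"type": "integer"}
	case reflect.Slice:
		return map[string]any{"type": "array", "items": g.ref(t.Elem())}
	case reflect.Struct:
		// Assumes named struct types with unique names across packages.
		g.define(t)
		return map[string]any{"$ref": "#/definitions/" + t.Name()}
	}
	return map[string]any{}
}

// define builds the object schema for a struct type exactly once.
func (g *generator) define(t reflect.Type) {
	if g.processed[t] {
		return
	}
	g.processed[t] = true // mark before recursing so cycles terminate
	props := map[string]any{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		props[name] = g.ref(f.Type)
	}
	g.definitions[t.Name()] = map[string]any{"type": "object", "properties": props}
}

// buildSchema assembles the root schema for obj.
func buildSchema(obj any) map[string]any {
	g := &generator{definitions: map[string]any{}, processed: map[reflect.Type]bool{}}
	root := g.ref(reflect.TypeOf(obj))
	root["$schema"] = "http://json-schema.org/draft-07/schema#"
	root["definitions"] = g.definitions
	return root
}

// generate serializes the root schema to indented JSON.
func generate(obj any) ([]byte, error) {
	return json.MarshalIndent(buildSchema(obj), "", "  ")
}

func main() {
	out, err := generate(&node{})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Marking a type as processed before walking its fields is what keeps the self-reference in node from recursing forever; the second visit simply emits a $ref.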

Key Considerations and Challenges:

  • Handling embedded types correctly.
  • Managing type aliases.
  • Detecting and handling circular dependencies between types.
  • Deciding how to handle unexported fields.
  • Mapping custom Go types to appropriate JSON schema types.
  • Implementing robust error handling.
  • Optimizing performance for large and complex type structures.

Metadata

    Labels

    documentation; good first issue (Good for beginners); help wanted (Features that maintainers are willing to accept but do not have cycles to implement); v2
