
Support for parsing, preserving, and emitting comments in Verible (CST) #2476

@gmCAD

Description


Summary
Verible currently does not expose comments as first-class tokens or nodes in the parse tree (CST) or the formatter output. This makes it hard to build tooling that needs to analyze, transform, or preserve comments (e.g., documentation extraction, lint exceptions, structured metadata, or formatter-stable annotations). Please add comprehensive comment support to the CST APIs.

Motivation
Comments in SystemVerilog are routinely used for:

  • Documentation and metadata: module/port documentation, code-generation hints, requirement IDs.
  • Formatter stability: inline explanations, TODOs, and region markers that should remain attached to specific syntax nodes.
  • Code review hygiene: preserving reviewer notes or rationale near the code they describe.

Requested Behavior

  1. Parsing / CST:
    • Option A (preferred): Attach comments to nearby CST nodes via well-defined association rules (e.g., leading/trailing/inner comments with stable heuristics).
    • Option B: Store comments in a side channel with accurate positional mapping, plus utility APIs to query “comments near node X”.
    • Support comments in otherwise empty regions (file headers, between declarations, and end-of-file).
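For Option B, the side channel could be as simple as a list of comment records carrying byte offsets plus line/column positions, queried by a node's offset range. The following is a minimal Python sketch under stated assumptions, not Verible code: `Comment`, `scan_comments`, and `comments_near` are hypothetical names, the node range is assumed to come from the CST, and the scanner only understands `//`, `/* */`, and double-quoted string literals (a real lexer would also need to handle macros, attributes, and other lexer states).

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Comment:
    kind: str   # "line" for // ..., "block" for /* ... */
    text: str   # comment text including its delimiters
    start: int  # offset of the first character
    end: int    # offset one past the last character
    line: int   # 1-based line of the first character
    col: int    # 1-based column of the first character


def scan_comments(src: str) -> List[Comment]:
    """Return every comment in src, in source order (hypothetical helper)."""
    # Offsets where each line begins, for offset -> (line, col) mapping.
    line_starts = [0] + [k + 1 for k, ch in enumerate(src) if ch == "\n"]

    def position(offset: int):
        idx = bisect.bisect_right(line_starts, offset) - 1
        return idx + 1, offset - line_starts[idx] + 1

    comments: List[Comment] = []
    i, n = 0, len(src)
    while i < n:
        if src[i] == '"':
            # Skip a string literal so "//" or "/*" inside it is ignored.
            i += 1
            while i < n and src[i] != '"':
                i += 2 if src[i] == "\\" else 1  # step over escapes like \"
            i += 1  # closing quote
            continue
        if src.startswith("//", i):
            kind, end = "line", src.find("\n", i)
            end = n if end == -1 else end  # line comment at end of file
        elif src.startswith("/*", i):
            kind, end = "block", src.find("*/", i + 2)
            end = n if end == -1 else end + 2  # unterminated: run to EOF
        else:
            i += 1
            continue
        line, col = position(i)
        comments.append(Comment(kind, src[i:end], i, end, line, col))
        i = end
    return comments


def comments_near(comments: List[Comment], start: int, end: int,
                  slack: int = 0) -> List[Comment]:
    """Comments overlapping the node range [start, end), widened by slack."""
    lo, hi = start - slack, end + slack
    return [c for c in comments if c.start < hi and c.end > lo]
```

Because the scan is independent of the grammar, file-header and end-of-file comments show up in the list even when no node owns them, which covers the "otherwise empty regions" case.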

Use Cases

  • Documentation pipelines: Extracting module/port descriptions from comments for auto-generated specifications.
  • Compliance/traceability: Linking code to requirement IDs embedded in comments.
  • Review tooling: Ensuring comments survive formatting and refactors without being displaced.

Examples

Input (SystemVerilog):

// File header: Project X, Rev 2.1
module fifo #(
  parameter int DEPTH = 16 // default depth
) (
  input  logic clk,  // clock domain A
  input  logic rst_n, // active-low reset
  input  logic wr_en, /* write enable */
  input  logic rd_en,
  output logic full, // NOLINT: intentional behavior
  output logic empty
);
// verilog_lint: waive rule=no-async-reset -- legacy block

/* Multi-line
 * description for implementation details */
always_ff @(posedge clk or negedge rst_n) begin
  if (!rst_n) begin
    // initialize pointers
  end else begin
    /* normal operation */
  end
end

endmodule // fifo

Desired outcomes:

  • Comments are preserved in their relative positions after formatting.
  • APIs allow querying comments attached to module fifo, to port clk, and to the always_ff block.
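A line-based version of the leading/trailing rule from Option A could look like the sketch below: a comment that follows code on the same line is trailing and belongs to that line; any other comment is leading and belongs to the next line containing code; comments with no following code attach to end-of-file (`None`). This is a hypothetical Python illustration (`associate` is not a Verible API), and for brevity its regex does not skip string literals. A real implementation would map lines to CST nodes such as the port declaration or the `always_ff` construct instead of reporting line numbers.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Simplification: this regex does not skip string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.S)


def associate(src: str) -> Dict[Optional[int], Dict[str, List[str]]]:
    """Map 1-based code line (None = end of file) to leading/trailing comments."""
    # Blank out comments but keep newlines, leaving only code on each line.
    code = COMMENT_RE.sub(lambda m: re.sub(r"[^\n]", " ", m.group()), src)
    code_lines = code.split("\n")
    has_code = [bool(line.strip()) for line in code_lines]

    result: Dict[Optional[int], Dict[str, List[str]]] = defaultdict(
        lambda: {"leading": [], "trailing": []})
    for m in COMMENT_RE.finditer(src):
        row = src.count("\n", 0, m.start())                    # 0-based line
        col = m.start() - (src.rfind("\n", 0, m.start()) + 1)  # 0-based column
        if code_lines[row][:col].strip():
            # Code precedes the comment on its line: trailing comment.
            result[row + 1]["trailing"].append(m.group())
        else:
            # Leading comment: attach to the next line that has code.
            nxt = next((k for k in range(row, len(has_code)) if has_code[k]), None)
            result[None if nxt is None else nxt + 1]["leading"].append(m.group())
    return dict(result)
```

Applied to the fifo example, these rules would attach `// clock domain A` to the `clk` port line and both the waiver comment and the multi-line description to the `always_ff` line. They would also attach `// initialize pointers` to the following `end else begin` line, which illustrates why inner comments need their own rule.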

Acceptance Criteria

  • Running the formatter on files with mixed line/block comments preserves comment text and stable placement across multiple runs.
  • New API surface (or documented existing APIs) to:
    • Iterate all comment tokens with positional metadata.
    • Associate comments with CST nodes using deterministic rules.
  • Documentation updated to describe comment handling, association rules, and formatter flags.
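The first acceptance criterion can be checked mechanically. The sketch below is hypothetical test-harness code, not part of Verible: `fmt` stands for any function that formats a source string (for instance, a wrapper that runs verible-verilog-format), and comment text is compared after collapsing whitespace runs, on the assumption that re-indenting the continuation lines of a block comment does not count as changing it. Comparing ordered lists rather than sets also catches reordered comments.

```python
import re
from typing import Callable, List

COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.S)  # ignores string literals


def comment_texts(src: str) -> List[str]:
    """Comments in source order, with whitespace runs collapsed."""
    return [" ".join(m.group().split()) for m in COMMENT_RE.finditer(src)]


def check_formatter(src: str, fmt: Callable[[str], str]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    once = fmt(src)
    if comment_texts(once) != comment_texts(src):
        problems.append("comments changed or reordered")
    if fmt(once) != once:  # a second run must not change the output again
        problems.append("formatter is not idempotent")
    return problems
```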

Alternatives

  • Many language formatters (e.g., clang-format) implement stable heuristics for leading/trailing comment association and idempotent preservation.
  • Some parsers expose comments as trivia/hidden tokens with position data; others attach them to CST/AST nodes via heuristics.
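The trivia approach can be illustrated with a common split rule (Roslyn's C# syntax trees use essentially this convention): a comment that starts on the same line as the preceding token becomes that token's trailing trivia, and every other comment becomes leading trivia of the next token. The sketch below is a hypothetical Python toy; `Token` and `attach_trivia` are illustrative names, and its tokenizer only splits identifiers and single punctuation characters.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Comments first so "/" is not split off as punctuation; then whitespace;
# then identifiers/numbers or any single non-space character.
TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)|(?P<ws>\s+)|(?P<tok>\w+|\S)", re.S)


@dataclass
class Token:
    text: str
    leading: List[str] = field(default_factory=list)   # comments before it
    trailing: List[str] = field(default_factory=list)  # comments after it, same line


def attach_trivia(src: str) -> Tuple[List[Token], List[str]]:
    """Split src into tokens with comment trivia; also return EOF comments."""
    tokens: List[Token] = []
    pending: List[str] = []  # comments waiting for the next token
    same_line = False        # still on the line of the last token?
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "tok":
            tokens.append(Token(text, leading=pending))
            pending, same_line = [], True
        elif kind == "ws":
            if "\n" in text:
                same_line = False
        else:  # comment
            if same_line and tokens:
                tokens[-1].trailing.append(text)
            else:
                pending.append(text)
            if "\n" in text:  # a multi-line block comment ends the line
                same_line = False
    return tokens, pending
```

With this rule, `// clock domain A` in the example lands on the `,` token rather than on `clk`; a CST layer would then lift token trivia to the enclosing node, such as the port declaration.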

Metadata

Assignees

No one assigned

    Labels

    rejects-valid syntax: If the parser wrongly rejects syntactically valid code (according to SV-2017).

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
