Skip to content

Clarify PGMIndex::search return semantics for keys not present  #63

@cylindrical2002

Description

@cylindrical2002

In the current code and comment:

/**
 * Returns the approximate position and the range where @p key can be found.
 */
ApproxPos search(const K &key) const {
    auto k   = std::max(first_key, key);
    auto it  = segment_for_key(k);
    auto pos = std::min<size_t>((*it)(k), std::next(it)->intercept);
    auto lo  = PGM_SUB_EPS(pos, Epsilon);
    auto hi  = PGM_ADD_EPS(pos, Epsilon, n);
    return {pos, lo, hi};
}

Questions:

  1. Does search guarantee that lower_bound(key) is always inside [lo, hi) even when key is
    not present in the data (i.e., it's a predecessor query)?

  2. What are the guarantees for out-of-domain queries:

    • key < first_key (seems clamped by k = max(first_key, key))
    • key > last_key (seems to rely on a sentinel segment with next(it)->intercept == n)
      Are these invariants part of the API (i.e., sentinel is always present)?
  3. The docstring currently says “range where key can be found”, which might be interpreted as
    “key must exist”. Would you consider rephrasing to something like:

    • “returns an approximate position and a window that contains lower_bound(key)”
    • or “where the key would be located if present”

Thanks for confirming the intended guarantees. This clarification helps downstream users who implement
membership or predecessor queries based on search.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions