cre2_new silently truncates pattern if length is passed incorrectly #32

@Tonyyang0606

Steps to reproduce

#include <stdio.h>
#include <string.h>
#include <cre2.h>

int main(void) {
    const char *pattern = "^admin:[0-9]{4}$";
    // Intended: match only "admin:" followed by exactly four digits, e.g. "admin:1234"

    int wrong_len = 7;  // ❌ Intentionally too short: covers only "^admin:"

    cre2_options_t *opt = cre2_opt_new();
    cre2_regexp_t *re = cre2_new(pattern, wrong_len, opt);

    if (cre2_error_code(re) != 0) {
        printf("Compilation failed: %s\n", cre2_error_string(re));
    } else {
        // Actually compiled as "^admin:" (truncated)
        printf("Compiled successfully, but pattern was truncated!\n");

        const char *test = "admin:xxx2312x";
        cre2_string_t match;
        int matched = cre2_match(re, test, (int)strlen(test),
                                 0, (int)strlen(test),
                                 CRE2_UNANCHORED, &match, 1);
        printf("Match result: %d\n", matched);
    }

    cre2_delete(re);
    cre2_opt_delete(opt);
    return 0;
}

Actual behaviour

  • The regex compiles successfully but only uses the first len characters.
  • No error is reported if len < strlen(pattern).
  • This can lead to serious logic bugs: the caller believes the full pattern was compiled, but only a prefix was. In the example above the effective pattern is "^admin:", so the input "admin:xxx2312x" matches.
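
To make the truncation concrete, here is a minimal sketch that needs no regex library. `pattern_prefix` is a hypothetical helper, not part of cre2; it only copies out the bytes that a length-based API would be handed for a given len.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a cre2 function): copy the first `len` bytes of
 * `pattern` into `out`, i.e. the text a length-based API would receive.
 * Returns 0 on success, -1 if `out` cannot hold the prefix plus a NUL. */
static int pattern_prefix(const char *pattern, size_t len,
                          char *out, size_t out_size)
{
    size_t full = strlen(pattern);

    if (len > full)
        len = full;              /* never read past the terminator */
    if (out_size < len + 1)
        return -1;               /* not enough room for prefix + NUL */

    memcpy(out, pattern, len);
    out[len] = '\0';
    return 0;
}
```

Calling `pattern_prefix("^admin:[0-9]{4}$", 7, buf, sizeof buf)` leaves "^admin:" in `buf`, which is the pattern the reproduction above ends up compiling.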

Example output

Compiled successfully, but pattern was truncated!
Match result: 1

Expected behaviour

  1. If len < strlen(pattern), the library should:
    • Either fail with an error code, or
    • Emit a clear warning (especially in debug builds).
  2. Provide a safe API for null-terminated strings, e.g.:
    cre2_regexp_t *cre2_new_cstr(const char *pattern, cre2_options_t *opt);

This would automatically use strlen(pattern) and avoid common mistakes.
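
Until something like cre2_new_cstr exists, the length step it would perform can be sketched on the caller side. `cstr_len_as_int` below is a hypothetical helper, not part of cre2; it assumes the pattern is NUL-terminated and guards the size_t to int conversion, since cre2_new takes the length as an int.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper (not a cre2 function): length of a NUL-terminated
 * string as an int, suitable for an API that takes an int length.
 * Returns -1 if `s` is NULL or the length does not fit in an int. */
static int cstr_len_as_int(const char *s)
{
    size_t n;

    if (s == NULL)
        return -1;

    n = strlen(s);
    if (n > (size_t)INT_MAX)
        return -1;               /* would overflow the int parameter */

    return (int)n;
}
```

A caller would check the result for -1 and otherwise pass it as the second argument, as in `cre2_new(pattern, len, opt)`, so the length is always derived from the string rather than typed by hand.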

Suggestion

  • Add cre2_new_cstr (safe wrapper).
  • Improve error handling when the provided len does not match the actual string length.
  • At minimum, document this pitfall more explicitly, since it can cause subtle and dangerous bugs in security-sensitive regex use cases.
