cloc ignores nested .tar.gz inside .src.rpm #974

@Tenount

Describe the bug

When cloc is run directly on a source RPM file (.src.rpm), it reports no code at all. The .src.rpm contains a nested .tar.gz archive with C++ sources, but cloc does not recursively extract the inner archive: it only looks at the top-level files (cppzmq.spec, cppzmq-4.10.0.tar.gz), leaving the C++ code inside the tarball uncounted.

After manual extraction (rpm2cpio | cpio | tar), the same files are correctly counted by cloc.

cloc; OS; OS version

  • cloc version: 2.00
  • OS: Alpine Linux
  • OS version: 3.20 (aarch64)
  • Reproducible via podman or docker

To Reproduce

  1. Start an Alpine container:

    podman run --rm -it alpine:3.20 /bin/sh
  2. Install dependencies:

    apk add --no-cache cloc gzip rpm2cpio cpio tar wget
  3. Download a Fedora source RPM (cppzmq — a pure C++ library):

    wget -q \
      "https://dl.fedoraproject.org/pub/fedora/linux/releases/44/Everything/source/tree/Packages/c/cppzmq-4.10.0-12.fc44.src.rpm" \
      -O cppzmq.src.rpm
  4. Run cloc directly on the .src.rpm file:

    cloc cppzmq.src.rpm
  5. Observe the result: cloc finds 1 text file but counts nothing, and the C++ inside the nested .tar.gz is ignored:

    104 blocks
            1 text file.
            0 unique files.
            2 files ignored.
    

    The .src.rpm contains cppzmq.spec (the 1 text file) and cppzmq-4.10.0.tar.gz (ignored — cloc does not recursively extract it).

  6. Now extract manually and run cloc again:

    mkdir ext && rpm2cpio cppzmq.src.rpm | cpio -idm -D ext 2>/dev/null
    tar xzf ext/cppzmq-4.10.0.tar.gz -C ext/
    cloc ext/cppzmq-4.10.0/
  7. Observe the correct result:

    github.com/AlDanial/cloc v 2.00  T=0.02 s (1625.4 files/s, 403155.3 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C/C++ Header                     3            428            258           2883
    C++                             17            421             96           2832
    CMake                            7             59             27            220
    YAML                             2             17             12            219
    Markdown                         1             34              0            162
    Bourne Shell                     1              1              4             16
    -------------------------------------------------------------------------------
    SUM:                            31            960            397           6332
    -------------------------------------------------------------------------------
    

Expected result

When running cloc cppzmq.src.rpm, the language table should match the extracted content — showing C++ (~2832 lines) and C/C++ Header (~2883 lines). Instead, cloc silently returns success with no languages counted.

Actual result

$ cloc cppzmq.src.rpm
104 blocks
       1 text file.
       0 unique files.
       2 files ignored.

Only cppzmq.spec is examined, and it is classified as unknown, so nothing is counted; cppzmq-4.10.0.tar.gz (containing all the C++ source) is silently skipped.
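Because cloc exits with status 0 here, a script cannot detect the empty result from the exit code alone. As a rough sketch, the output could be checked for the `Language` header row that appears in the successful run above. The function name `cloc_has_language_table` is hypothetical and is not part of cloc; it assumes the default plain-text output format.

```shell
# Hypothetical helper (not part of cloc): reads cloc's plain-text output on
# stdin and succeeds only if it contains the "Language ..." table header.
cloc_has_language_table() {
    # Tolerate leading indentation in case the output was pasted or wrapped.
    grep -Eq '^[[:space:]]*Language[[:space:]]'
}

# Possible usage (commented out, since it needs cloc and the .src.rpm):
# if ! cloc cppzmq.src.rpm | cloc_has_language_table; then
#     echo "cloc produced no language table" >&2
# fi
```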

Minimal reproducible Dockerfile

FROM alpine:3.20

RUN apk add --no-cache cloc gzip rpm2cpio cpio tar wget

RUN wget -q \
  "https://dl.fedoraproject.org/pub/fedora/linux/releases/44/Everything/source/tree/Packages/c/cppzmq-4.10.0-12.fc44.src.rpm" \
  -O /cppzmq.src.rpm

RUN mkdir -p /extracted && \
  rpm2cpio /cppzmq.src.rpm | cpio -idm -D /extracted 2>/dev/null && \
  tar xzf /extracted/cppzmq-4.10.0.tar.gz -C /extracted/

RUN echo "=== TEST 1: cloc on .src.rpm directly ===" && \
  cloc /cppzmq.src.rpm

RUN echo "=== TEST 2: cloc after extraction ===" && \
  cloc /extracted/cppzmq-4.10.0/

Additional context

  • The .src.rpm contains: cppzmq-4.10.0.tar.gz + cppzmq.spec
  • cloc correctly identifies and extracts the .src.rpm (exit code 0, rpm2cpio runs)
  • The extracted tarball (cppzmq-4.10.0.tar.gz) inside the RPM is not recursively processed
  • Related: #161, "cloc should try figuring out more than one level-deep archives" (same root cause: cloc does not recursively extract nested archives, e.g. .tar.gz inside .deb; closed in 2017 without a fix)
-v 3 verbose trace for cloc cppzmq.src.rpm
-> load_from_config_file(/root/.config/cloc/options.txt)
<- load_from_config_file() (no such file: /root/.config/cloc/options.txt)
-> replace_git_hash_with_tarfile(cppzmq.src.rpm)
-> no_autogen(0)
<- no_autogen()
-> uncompress_archive_cmd(/cppzmq.src.rpm)
<- uncompress_archive_cmd
mkdir /tmp/CEQfe4j5be
cd    /tmp/CEQfe4j5be
rpm2cpio '/cppzmq.src.rpm' | cpio -i
104 blocks
-> make_file_list(/tmp/CEQfe4j5be)
Using temp file list [/tmp/01ryDCKuFb]
-> find_preprocessor(/tmp/CEQfe4j5be)
<- find_preprocessor(cppzmq-4.10.0.tar.gz cppzmq.spec)
       1 text file.
classifying /tmp/CEQfe4j5be/cppzmq.spec
-> classify_file(/tmp/CEQfe4j5be/cppzmq.spec)
/tmp/CEQfe4j5be/cppzmq.spec extension=[spec]
-> peek_at_first_line(/tmp/CEQfe4j5be/cppzmq.spec)
-> first_line(/tmp/CEQfe4j5be/cppzmq.spec, 1)
<- first_line(/tmp/CEQfe4j5be/cppzmq.spec, 1, '## START: Set by rpmautospec')
<- peek_at_first_line(/tmp/CEQfe4j5be/cppzmq.spec)
<- classify_file(/tmp/CEQfe4j5be/cppzmq.spec)=(unknown)
<- make_file_list()
-> remove_duplicate_files
<- remove_duplicate_files
       0 unique files.
-> count_files()
<- count_files()
       2 files ignored.
-> write_null_results
<- write_null_results

Key observation: find_preprocessor() returns cppzmq-4.10.0.tar.gz and cppzmq.spec, but cloc never recurses into the .tar.gz — it only classifies cppzmq.spec (as unknown) and writes null results.
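Until cloc handles this itself, a possible workaround is to expand the inner tarballs after the rpm2cpio/cpio step and then run cloc on the directory. The following is a minimal sketch, not cloc's behavior: `expand_nested_tarballs` and its `MAX_PASSES` argument are hypothetical names, only gzip-compressed tarballs are handled, and paths are assumed to contain no newlines.

```shell
# Hypothetical helper: extract every .tar.gz/.tgz found under DIR next to
# itself, then delete the archive. Repeats up to MAX_PASSES times so that
# tarballs nested inside tarballs are also expanded.
# The body runs in a subshell so its variables do not leak to the caller.
expand_nested_tarballs() (
    dir=$1
    max_passes=${2:-3}
    pass=0
    while [ "$pass" -lt "$max_passes" ]; do
        # List the tarballs still present under DIR.
        list=$(find "$dir" -type f \( -name '*.tar.gz' -o -name '*.tgz' \))
        [ -z "$list" ] && break
        printf '%s\n' "$list" | while IFS= read -r archive; do
            # Remove the archive only if extraction succeeded.
            tar xzf "$archive" -C "$(dirname "$archive")" && rm -f "$archive"
        done
        pass=$((pass + 1))
    done
)

# Possible usage with the reproduction steps above (commented out, since it
# needs rpm2cpio, cpio, and cloc):
# mkdir ext && rpm2cpio cppzmq.src.rpm | cpio -idm -D ext 2>/dev/null
# expand_nested_tarballs ext
# cloc ext
```

The pass limit is a guard against an archive that repeatedly fails to extract; it bounds the loop rather than reflecting any limit in cloc.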
