#!/usr/bin/env python3
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "anthropic",
#   "ollama",
#   "beautifulsoup4",
#   "certifi",
#   "httpx",
#   "openai",
#   "python-dotenv",
#   "requests",
# ]
# ///
# See https://docs.astral.sh/uv/guides/scripts/#using-a-shebang-to-create-an-executable-file
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: MPL-2.0

#### SECTION 01: Define

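"""claude-vulscan.py: scan Python code for vulnerabilities using LLMs.

This Python program calls Anthropic Claude APIs to obtain rate-limit status and to
scan Python code for vulnerabilities.
Additional LLM models can be called through a local Ollama server.
See https://bomonike.github.io/claude-vulscan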

RISK ACCEPTED: **Potential sensitive data exposure in output** (line 178): The scanned file contents are sent to an external API and findings are printed to stdout, which could leak secrets if scanning files containing credentials.

Vulnerabilities Anthropic Claude is told to check for:
* **Injection** : SQL injection, command injection, LDAP injection
* **Secrets** : Hardcoded passwords, API keys, tokens
* **Crypto** : Weak hashing (MD5/SHA1), insecure random
* **Auth** : Broken auth, missing rate limiting
* **Input validation** : Missing sanitization, path traversal
* **Dependencies** : Outdated/vulnerable imports
* **Deserialization** : Unsafe `pickle`, `yaml.load()`
* **SSRF / XSS** : In web frameworks like Flask/Django
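
For illustration only, here are a few of these categories paired with the safer
standard-library call a reviewer would usually suggest. This is a sketch. It is not
the prompt this program sends to Claude, and the sample values are made up.

```python
import hashlib
import json
import secrets

# Crypto: SHA-256 instead of MD5/SHA1 for integrity hashes.
digest = hashlib.sha256(b"example").hexdigest()

# Crypto: the secrets module instead of random for tokens and passwords.
token = secrets.token_urlsafe(16)

# Deserialization: json.loads() on untrusted input instead of pickle.loads().
data = json.loads('{"user": "alice"}')

# Injection: pass subprocess an argument list (no shell=True) so a hostile
# file name such as "a.py; echo pwned" stays one argument instead of two commands.
argv = ["ruff", "check", "a.py; echo pwned"]
```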

BEFORE RUNNING, in an internet browser:
   At https://platform.claude.com/settings/organization click "Set up organization".
   Answer questions about country, usage, etc. Submit to "Allow creating new API keys in default workspace".
   At https://platform.claude.com/settings/admin-keys click "Create admin key", with a name such as "admin261231".
   Click "Copy key" and paste it into your secrets manager or into file ~/.secrets.env (listed in .gitignore).
   The code retrieves key values defined in that file like these:
      ANTHROPIC_ADMIN_KEY="sk-ant-admin01-..."   # created in the Console by org admins
      ANTHROPIC_API_KEY="sk-ant-api03-..."
   # POLICY: On the CLI Terminal, do not export environment variables containing sensitive values, so they are not stored in shell history or CLI logs.

BEFORE RUNNING, on Terminal:
   # POLICY: Create (and cd into) a folder to hold cloned git repositories.
   git clone https://github.com/wilsonmar/python-samples.git --depth 1
   cd python-samples
   # uv init was run to set pyproject.toml & .python-version
   python3 -m pip install uv
   uv venv .venv                   # or: python3 -m venv .venv  (creates bin, include, lib, pyvenv.cfg)
   source .venv/bin/activate       # on macOS & Linux
        # .venv/Scripts/Activate.ps1   # Windows PowerShell only
        # .venv/Scripts/activate.bat   # Windows CMD only
   # POLICY: Add vulnerability scanning utilities to pyproject.toml without re-locking (--frozen leaves uv.lock unchanged).
   uv add bandit safety semgrep dynaconf --frozen  # instead of pip install of utilities
   # POLICY: In production, run "uv sync --frozen --no-build" to install project dependencies exactly as pinned in uv.lock without updating it (--frozen), using only pre-built wheel (.whl) distributions rather than building from source (--no-build).

   ruff check claude-vulscan.py    # Linter
   bandit -r .                     # Security linter (recurses the current folder)
   safety scan                     # Check project dependencies (pyproject.toml, uv.lock) for known CVEs
   semgrep --config=auto .         # Pattern-based analysis

   chmod +x claude-vulscan.py
   uv run claude-vulscan.py -v -vv -b -m "haiku" -f "claude-vulscan.py"
      # -v verbose, -vv trace, -b billing stats, -sl --sizelimit maximum code size such as "1gb"
      # OPTIONAL: -pt --prompt, -r --recursive,
      # -f --target file to scan, relative to the current working directory (such as /Users/johndoe/github-wilsonmar/python-samples/)
           # Not specifying -f would result in this program processing all .py files in the current folder
      # -m for --model ID recognized by Claude ("claude-opus-4-7" or "claude-sonnet-4-20250514")
      # --nometric to not write csv file of results for each call.
      # Avg run time: Terminal should not freeze.
   # Press control+C to cancel/interrupt run.
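
The following sketch shows how argparse maps the sample command line above onto
attribute names. It assumes a parser reduced to the options that command uses;
the full set is defined in parse_args() below.

```python
import argparse

parser = argparse.ArgumentParser(description="Claude vulnerability scanner")
parser.add_argument("--verbose", "-v", action="store_true")
parser.add_argument("--trace", "-vv", action="store_true")
parser.add_argument("--bill", "-b", action="store_true")
parser.add_argument("--model", "-m", type=str)
parser.add_argument("--target", "-f", type=str)

# "-vv" is registered as its own option string, so argparse matches it exactly
# and sets trace, rather than treating it as "-v" given twice.
args = parser.parse_args(["-v", "-vv", "-b", "-m", "haiku", "-f", "claude-vulscan.py"])
print(args.verbose, args.trace, args.bill, args.model, args.target)
```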

AFTER RUN:
    deactivate  # exit the virtual environment
    rm -rf .venv .pytest_cache __pycache__

"""
#### SECTION 02: Dunder variables for git command gxp to git add, commit, push

# POLICY: Dunder (double-underscore) variables readable from CLI outside Python
__commit_date__ = "2026-04-21"
__commit_msg__ = "26-04-21 v024 model select @claude-vulscan.py"
__repository__ = "https://github.com/bomonike/google/blob/main/claude-vulscan.py"
# __repository__ = "https://github.com/wilsonmar/python-samples/blob/main/claude-vulscan.py"
__status__ = "WORKING: ruff check claude-vulscan.py => All checks passed!"
# STATUS: Python 3.13.3 working on macOS Sequoia 15.3.1

# based on https://github.com/trkonduri/vulscan/blob/master/claude-vulscan.py

# TODO: Display menu of CLI parameters.
# TODO: Get default model_id from .env file.
# TODO: Add external enterprise robust logging
# TODO: import myutils  # in folder python-samples
# TODO: Track externally history of requests & responses metrics for trending
# batch https://platform.claude.com/docs/en/api/sdks/python#getting-results-from-a-batch

import argparse
import base64
import csv

# import json
import os
import re
import ssl
import subprocess
import sys
import time
from calendar import monthrange
from datetime import datetime, timezone  # , timedelta
from pathlib import Path

# POLICY: Use of 3rd-party packages is limited to minimize potential supply-chain attacks.
import anthropic  # Anthropic Client SDK - from anthropic import Anthropic
from bs4 import BeautifulSoup
import certifi
import httpx
import ollama
import requests
from dotenv import load_dotenv  # install python-dotenv
from openai import OpenAI


# Defaults overridden by command-line parameters:
def parse_args():
    """Read parameters from command CLI."""
    parser = argparse.ArgumentParser(description="Claude vulnerability scanner")

    parser.add_argument("--verbose", "-v", action="store_true", help="Enable verbose output")
    parser.add_argument("--trace", "-vv", action="store_true", help="Enable detailed trace output")
    parser.add_argument("--bill", "-b", action="store_true", help="Enable billing output")
    parser.add_argument(
        "--target", "-f", type=str, required=False, help="-f = --target file to process within current folder"
    )
    parser.add_argument("--recursive", "-r", action="store_true", help="-r = --recursive process sub-folders too.")
    parser.add_argument(
        "--sizelimit", "-sl", type=str, required=False, help="-sl = --sizelimit of code other than default 1gb"
    )
    parser.add_argument("--prompt", "-pt", type=str, required=False, help="-pt = --prompt text for AI to process")
    parser.add_argument("--nometric", action="store_true", help="--nometric = do not write csv file of metrics")
    parser.add_argument("--model", "-m", type=str, help="-m --model = family or specific model_id to load")
    # https://www.aimadetools.com/blog/best-ollama-models-coding-2026/
    # ollama pull devstral-small:24b    # winner for pure coding tasks. It was specifically trained for agentic coding workflows — multi-file edits, terminal automation, and code repair. On a Mac with 16GB+, it runs smoothly.
    # ollama pull qwen3.5:27b           # Best all-rounder
    parser.add_argument(
        "--output", "-o", type=str, default="results.json", help="Output file path (default: results.json)"
    )
    # POLICY: No processing occurs if neither -r nor -f is specified.
    return parser.parse_args()


#### SECTION TODO: Move these functions to myutils.py and call the module.


def elapsedsecs_timestamp():
    """Capture timestamp for elapsed time."""
    # POLICY: Use a common function to capture elapsed timestamps to ensure method is consistent.
    # POLICY: Capture start time for measuring standard python library load time.
    # NOTE: time.time() is avoided because it follows the system clock, which can be adjusted;
    # time.monotonic() never goes backward. For nanosecond resolution: from time import perf_counter_ns
    return time.monotonic()


def add_commas_in_int_string(number_string):
    """Add commas for thousands in a number within a string."""
    return f"{int(number_string):,}"  # int() drops any decimal places


def infer_from_utc(utc_timestamp) -> str:
    """Infer from system the local timestamp for UTC timestamp like 2026-04-16T21:58:31Z."""
    # from datetime import datetime
    utc_time = datetime.fromisoformat(utc_timestamp.replace("Z", "+00:00"))
    local_time = utc_time.astimezone()  # uses system timezone
    return local_time.strftime("%Y-%m-%d %I:%M:%S %p %Z %z")
    # See https://www.geeksforgeeks.org/python/python-strftime-function/
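

# Sketch (hypothetical helper, not called elsewhere in this program): the same conversion
# as infer_from_utc(), but to a fixed UTC offset instead of the system timezone, so the
# result is reproducible regardless of where the script runs. The offset is an assumption
# supplied by the caller, e.g. -6 for MDT:
#   _example_utc_to_offset("2026-04-16T21:58:31Z", -6) returns "2026-04-16 15:58:31 -0600"
def _example_utc_to_offset(utc_timestamp: str, offset_hours: int) -> str:
    """Convert a UTC timestamp like 2026-04-16T21:58:31Z to a fixed UTC offset."""
    from datetime import datetime, timedelta, timezone

    # Replace the trailing "Z" so fromisoformat() sees an explicit +00:00 offset.
    utc_time = datetime.fromisoformat(utc_timestamp.replace("Z", "+00:00"))
    fixed_zone = timezone(timedelta(hours=offset_hours))
    return utc_time.astimezone(fixed_zone).strftime("%Y-%m-%d %H:%M:%S %z")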


def parse_bytes(size_str: str) -> int:
    """Convert human-readable byte size string to number of bytes."""
    units = {
        "b": 1,
        "kb": 1024,
        "mb": 1024**2,
        "gb": 1024**3,
        "tb": 1024**4,
        "pb": 1024**5,
    }

    size_str = size_str.strip().lower()

    # Split number and unit:
    i = 0
    while i < len(size_str) and (size_str[i].isdigit() or size_str[i] == "."):
        i += 1

    number = float(size_str[:i])
    unit = size_str[i:].strip()

    if unit not in units:
        raise ValueError(f"Unknown unit: '{unit}'. Valid units: {list(units.keys())}")

    return int(number * units[unit])


def format_bytes(num_bytes: int, precision: int = 2) -> str:
    """Convert number of bytes to human-readable string."""
    units = ["b", "kb", "mb", "gb", "tb", "pb"]

    value = float(num_bytes)
    for unit in units:
        if abs(value) < 1024 or unit == units[-1]:
            if unit == "b":
                return f"{int(value)}b"
            formatted = f"{value:.{precision}f}".rstrip("0").rstrip(".")
            return f"{formatted}{unit}"
        value /= 1024
    return


def get_user_local_timestamp(format_str: str = "yymmddhhmm") -> str:
    """Return a string formatted with datetime stamp in local timezone.

    Not used in logs which should be in UTC.
    Example: "2504210717" for format_str "yymmddhhmm"; "250421" for "yymmdd".
    """
    current_time = time.localtime()  # localtime([secs])
    year = str(current_time.tm_year)[-2:]  # Last 2 digits of year
    month = str(current_time.tm_mon).zfill(2)  # .zfill(2) = zero fill
    day = str(current_time.tm_mday).zfill(2)  # Day with leading zero
    hour = str(current_time.tm_hour).zfill(2)  # Hour with leading zero
    minute = str(current_time.tm_min).zfill(2)  # Minute with leading zero
    if format_str == "yymmdd":
        return f"{year}{month}{day}"
    if format_str == "yymmddhhmm":
        return f"{year}{month}{day}{hour}{minute}"
    raise ValueError(f"Unknown format_str: {format_str!r}")
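

# Sketch (hypothetical helper, not used elsewhere): time.strftime() produces the same
# zero-padded local stamps as get_user_local_timestamp() in a single call, assuming only
# the "yymmdd" and "yymmddhhmm" formats are needed.
def _example_local_stamp(format_str: str = "yymmddhhmm") -> str:
    """Return a local timestamp such as 2604211517 using time.strftime()."""
    import time

    # %y = 2-digit year; %m, %d, %H, %M are all zero-padded to 2 digits.
    patterns = {"yymmdd": "%y%m%d", "yymmddhhmm": "%y%m%d%H%M"}
    if format_str not in patterns:
        raise ValueError(f"Unknown format_str: {format_str!r}")
    return time.strftime(patterns[format_str])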


def format_elapsed_time(time_str: str) -> str:
    """Format elapsed time."""
    # Remove leading "00:" groups
    # import re
    result = re.sub(r"^(00:)+", "", str(time_str))
    return result


def elapsed_time2format(seconds) -> str:
    """Format elapsed monotonic floating number to human-readable."""
    # seconds = time.monotonic()
    # import time
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    readable = f"{int(hours):02}:{int(minutes):02}:{secs:06.3f}"

    # POLICY: Regex ^(00:)+ matches one or more leading "00:" groups so they are all removed at once:
    # import re  # regular expression
    truncated = re.sub(r"^(00:)+", "", str(readable))  # 00:00:45.123 to 45.123
    return truncated


def program_greeting(pgm_name: str, args, elapsedsecs):
    """Print start-of-program greeting."""
    if args.verbose:
        print(f"STARTING: {pgm_name} from uptime: {elapsed_time2format(elapsedsecs)} ({elapsedsecs}).")
    if args.trace:
        print(f"TRACE: __commit_msg__={__commit_msg__}")


def format_bytes_test():
    """Test parse_bytes and format_bytes functions."""
    print("parse_bytes tests:")
    # For use by
    test_cases = [
        ("1kb", 1024),
        ("1mb", 1048576),
        ("1gb", 1073741824),
        ("1.5gb", 1610612736),
        ("512b", 512),
        ("2tb", 2199023255552),
    ]
    for s, expected in test_cases:
        result = parse_bytes(s)
        status = "✓" if result == expected else "✗"
        print(f"  {status} parse_bytes({s!r}) = {result} (expected {expected})")

    print("\nformat_bytes tests:")
    roundtrip = ["1kb", "1mb", "1gb", "2tb"]
    for s in roundtrip:
        b = parse_bytes(s)
        back = format_bytes(b)
        print(f"  {s!r} → {b} bytes → {back!r}")

    print("\nformat_bytes edge cases:")
    for b in [0, 500, 1023, 1536, 1048576 * 2.5]:
        print(f"  format_bytes({int(b)}) = {format_bytes(int(b))!r}")


def print_table(headers, rows, col_width=25):
    """Print table with lines."""
    separator = "+" + "+".join(["-" * (col_width + 2)] * len(headers)) + "+"

    def format_row(cells):
        return "|" + "|".join(f" {str(c):<{col_width}} " for c in cells) + "|"

    print(separator)
    print(format_row(headers))
    print(separator)
    for row in rows:
        print(format_row(row))
    print(separator)


#### SECTION files and folder handling utilities


#### SECTION 03 - .env file


def open_env_file(global_env_path: str) -> None:
    """Load global variables from .env file based on hard-coded default location.

    Args: global_env_path = path to the .env file to load.
    See https://wilsonmar.github.io/python-samples/#envLoad
    See https://stackoverflow.com/questions/40216311/reading-in-environment-variables-from-an-environment-file
    """
    # from pathlib import Path
    # PROTIP: Check if .env file on global_env_path is readable:
    if not os.path.isfile(global_env_path):
        # Print the path before exiting so the user sees which file is missing:
        print(f'FATAL: {sys._getframe().f_code.co_name}(): global_env_path: not at "{global_env_path}" ')
        sys.exit(1)

    # from dotenv import load_dotenv
    # See https://www.python-engineer.com/posts/dotenv-python/
    # See https://pypi.org/project/python-dotenv/
    load_dotenv(global_env_path)  # using load_dotenv
    # Wait until variables for print_trace are retrieved:
    print(f'VERBOSE: {sys._getframe().f_code.co_name}(): global_env_path="{global_env_path}" ')
    return


def get_str_from_env_file(key_in: str) -> str | None:
    """Return a value of string data type from OS environment or .env file."""
    # load the .env file:
    # load_dotenv(Path.home() / "python-samples.env")

    # retrieve a variable like key_in = "API_KEY":
    env_value = os.getenv(key_in)

    # POLICY: Display only the first 5 characters of a potentially secret long string.
    # if env_value and len(env_value) > 5:
    #     print(f'TRACE: {key_in}="{env_value[:5]}" (remainder removed)')
    # else:
    #     print(f'TRACE: {key_in}="{env_value}" from .env')

    return env_value
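

# Sketch (hypothetical helper, not called elsewhere): one way to apply the POLICY above of
# displaying only the first few characters of a potentially secret value. The 5-character
# default mirrors the commented-out code. Values no longer than `keep` are fully masked so
# a short secret is never printed whole.
def _example_mask_secret(value: str | None, keep: int = 5) -> str:
    """Return value with all but its first `keep` characters hidden."""
    if not value:
        return "(not set)"
    if len(value) <= keep:
        return "*" * len(value)
    return f"{value[:keep]}... ({len(value) - keep} characters hidden)"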


def safe_path(base: Path, target: str) -> Path:
    """Return the resolved file path if it stays within base; raise ValueError on path traversal."""
    # Resolve base too, so a relative base such as Path(".") is compared as an absolute path:
    base_resolved = base.resolve()
    resolved = (base_resolved / target).resolve()
    if not resolved.is_relative_to(base_resolved):
        raise ValueError(f"Path traversal detected: '{target}' escapes the base directory.")
    # TODO: Apply scan on file
    return resolved
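

# Sketch (hypothetical helper): the same containment test as safe_path(), but returning a
# bool instead of raising. Both base and target are resolved, so ".." segments and symlinks
# are collapsed before the comparison. The paths do not need to exist.
#   _example_is_within("/tmp/vulscan_demo", "src/a.py") returns True
#   _example_is_within("/tmp/vulscan_demo", "../../etc/passwd") returns False
def _example_is_within(base: str, target: str) -> bool:
    """Return True if base/target resolves to a location inside base."""
    from pathlib import Path

    base_resolved = Path(base).resolve()
    return (base_resolved / target).resolve().is_relative_to(base_resolved)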


def read_github_repo(owner, repo, branch="main", token=None):
    """
    Read all files within a public GitHub repo via the GitHub API.

    files = read_github_repo("owner", "repo-name")
    for path, content in files.items():
        print(f"--- {path} ---")
        print(content[:500])  # Print first 500 chars of each file

    Args:
        owner: GitHub username or org (e.g. "torvalds")
        repo: Repository name (e.g. "linux")
        branch: Branch name (default "main")
        token: Optional GitHub personal access token (for private repos / higher rate limits)
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # POLICY: Validate token to prevent HTTP header injection.
        # GitHub tokens (classic PATs, fine-grained, OAuth) are alphanumeric + underscores/hyphens only.
        if not re.match(r"^[\w\-]+$", token):
            raise ValueError("GitHub token contains invalid characters.")
        headers["Authorization"] = f"Bearer {token}"

    base_url = f"https://api.github.com/repos/{owner}/{repo}"

    def get_files(path=""):
        url = f"{base_url}/contents/{path}?ref={branch}"
        # POLICY: Always set a timeout so a stalled request cannot hang the run.
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        files = {}
        for item in response.json():
            if item["type"] == "file":
                # Fetch and decode file content
                file_response = requests.get(item["url"], headers=headers, timeout=30)
                file_response.raise_for_status()
                content = base64.b64decode(file_response.json()["content"]).decode("utf-8", errors="replace")
                files[item["path"]] = content
            elif item["type"] == "dir":
                # Recurse into subdirectory
                files.update(get_files(item["path"]))

        return files

    return get_files()
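

# Sketch (hypothetical helper): how the "content" field of a GitHub contents-API file
# response is decoded in read_github_repo(). GitHub returns base64 text wrapped with
# newlines; base64.b64decode() discards such non-alphabet characters by default
# (validate=False), so no manual unwrapping is needed.
def _example_decode_github_content(item_json: dict) -> str:
    """Decode the base64 "content" of a GitHub contents API file response."""
    import base64

    return base64.b64decode(item_json["content"]).decode("utf-8", errors="replace")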


def target_within_sizelimit(code_size, args) -> bool:
    """Return True if code_size is within args.sizelimit (default "1gb"), printing a message either way."""
    # Convert code_size to human-readable format like "2gb"
    code_size_formatted = format_bytes(code_size)
    if args.verbose:
        print(
            f"INFO: code file {args.target} contains {code_size_formatted} ({add_commas_in_int_string(code_size)} characters)"
        )

    if args.sizelimit:  # -sl specified in command parameter:
        code_size_limit = parse_bytes(args.sizelimit)
    else:  # TODO: Adjust sizelimit scientifically rather than a random default of "1gb".
        code_size_limit = parse_bytes("1gb")

    if code_size > code_size_limit:
        print(
            f"ERROR: code file {args.target} is larger than the {add_commas_in_int_string(code_size_limit)} character limit."
        )
        return False
    else:
        print(
            f"GREAT: code file {args.target} is within the {add_commas_in_int_string(code_size_limit)} character limit."
        )
        return True


#### SECTION


def _make_anthropic_client(api_key: str) -> anthropic.Anthropic:
    """Create an Anthropic client with strict SSL verification via certifi CA bundle.

    POLICY: Always pass an explicit httpx.Client so SSL hardening is active for every
    Anthropic API call. Using anthropic.Anthropic() without http_client relies on the SDK's
    default transport settings rather than this program's explicit CA-bundle and hostname-verification policy.
    """
    ssl_ctx = ssl.create_default_context()
    ssl_ctx.load_verify_locations(certifi.where())
    ssl_ctx.verify_mode = ssl.CERT_REQUIRED
    ssl_ctx.check_hostname = True
    http_client = httpx.Client(verify=ssl_ctx, timeout=30.0)
    return anthropic.Anthropic(api_key=api_key, http_client=http_client)
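

# Sketch (hypothetical helper, standard library only): ssl.create_default_context() already
# sets verify_mode=CERT_REQUIRED and check_hostname=True. The helper passes a CA bundle via
# cafile= (for example certifi.where()), so the context trusts only that bundle.
# _make_anthropic_client() instead adds certifi on top of the system's default CAs.
def _example_strict_ssl_context(cafile: str | None = None):
    """Return a client-side SSL context that verifies certificates and hostnames."""
    import ssl

    # With cafile=None the system default CA certificates are loaded instead.
    return ssl.create_default_context(cafile=cafile)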


def _extract_anthropic_error_message(err: Exception) -> str:
    """Return the human-readable `error.message` from an Anthropic APIStatusError, or ''.

    The SDK wraps HTTP errors with a `.response` httpx.Response; its JSON body looks like
    {"type": "error", "error": {"type": "invalid_request_error", "message": "..."}}.
    We defensively try multiple access paths so a malformed body never masks the original
    exception from the caller's except-block.
    """
    # Preferred: parse the response body.
    response = getattr(err, "response", None)
    if response is not None:
        try:
            body = response.json()
            if isinstance(body, dict):
                error_obj = body.get("error") or {}
                if isinstance(error_obj, dict):
                    msg = error_obj.get("message")
                    if isinstance(msg, str) and msg:
                        return msg
        except (ValueError, AttributeError):
            pass
    # Fallback: SDK-populated attribute, then str(err).
    msg = getattr(err, "message", None)
    if isinstance(msg, str) and msg:
        return msg
    return str(err) if err else ""


def openai_vulscan_code(filepath: str, code: str, prompt_text: str, model_id: str) -> str | None:
    """Send prompt_text to a local Ollama server through its OpenAI-compatible API and return the reply text."""
    try:
        # POLICY: To access Ollama's local AI server, use the OpenAI API interface standard it follows.
        # from openai import OpenAI
        # call_api_key = get_str_from_env_file("OPENAI_API_KEY")
        client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
        # Pass plain strings: {model_id} or {prompt_text} would create Python sets, not strings.
        response = client.chat.completions.create(
            model=model_id, messages=[{"role": "user", "content": prompt_text}]
        )
        return response.choices[0].message.content
    # print("FATAL: Run cannot continue without OpenAI client!")
    # POLICY: Even on failure, do not exit program until billing info for run is displayed.
    except FileNotFoundError:
        print(f"Error: Target file '{filepath}' not found.")
    except PermissionError:
        print(f"Error: Permission denied to access '{filepath}'.")
    except KeyError as e:
        print(f"Error: Expected key missing in scan results: {e}")
    except TypeError as e:
        print(f"Error: Unexpected return type from openai_vulscan_code(): {e}")
    except Exception as e:
        print(f"Unexpected error while scanning '{filepath}': {e}")
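

# Sketch (hypothetical helper): the Chat Completions `messages` list expects each "content"
# to be a plain string. Appending the scanned code to the prompt like this is an assumption;
# openai_vulscan_code() sends prompt_text as the whole message content.
def _example_build_scan_messages(prompt_text: str, filepath: str, code: str) -> list[dict]:
    """Return a messages list with the prompt and the code to scan in one user turn."""
    return [
        {
            "role": "user",
            "content": f"{prompt_text}\n\nFile: {filepath}\n\n{code}",
        }
    ]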


def load_llm_into_ollama(model_name: str) -> dict:
    """
    Load (pull) an LLM model into Ollama.

    Args:
        model_name: Name of the model to load (e.g., 'llama3.2', 'mistral', 'gemma2')

    Returns
    -------
        dict with 'success' bool and 'message' string
    """
    try:
        # import subprocess
        result = subprocess.run(
            ["ollama", "pull", model_name],
            capture_output=True,
            text=True,
            timeout=600,  # 10 min timeout for large models
        )

        if result.returncode == 0:
            return {"success": True, "message": f"Model '{model_name}' loaded successfully."}
        else:
            return {"success": False, "message": result.stderr.strip()}

    except FileNotFoundError:
        return {"success": False, "message": "Ollama is not installed or not in PATH."}
    except subprocess.TimeoutExpired:
        return {"success": False, "message": "Timed out. The model may still be downloading."}

    # Using CLI
    # print(load_llm_into_ollama("llama3.2"))


def load_llm_via_api(model_name: str, ollama_url: str = "http://localhost:11434") -> dict:
    """
    Load (pull) an LLM model into Ollama via its REST API.

    Args:
        model_name: Name of the model to load (e.g., 'llama3.2', 'mistral')
        ollama_url: Base URL of the Ollama server
    Returns:
        dict with 'success' bool and 'message' string
    """
    try:
        response = requests.post(f"{ollama_url}/api/pull", json={"name": model_name, "stream": False}, timeout=600)
        response.raise_for_status()
        data = response.json()

        if data.get("status") == "success":
            return {"success": True, "message": f"Model '{model_name}' loaded successfully."}
        else:
            return {"success": False, "message": data.get("status", "Unknown response.")}

    except requests.exceptions.ConnectionError:
        return {"success": False, "message": f"Could not connect to Ollama at {ollama_url}. Is it running?"}
    except requests.exceptions.HTTPError as e:
        return {"success": False, "message": f"HTTP error: {e}"}
    except requests.exceptions.Timeout:
        return {"success": False, "message": "Request timed out. The model may still be downloading."}

    # Using REST API
    # print(load_llm_via_api("mistral"))


def print_model_info(model: dict, indent: int = 4) -> None:
    """Print model info from json."""
    pad = " " * indent
    for key, value in model.items():
        if isinstance(value, dict):
            print(f"{pad}{key}:")
            for k, v in value.items():
                print(f"{pad}  {k}: {v}")
        else:
            print(f"{pad}{key}: {value}")


def run_is_within_budget(model_id: str, code_from_file_bytes: bytes) -> int | None:
    """
    Issue an Anthropic API call to print the organization's rate limits.

    This is used instead of the Console at https://platform.claude.com/usage
    Because Anthropic currently has no 'get subscription plan' endpoint,
    tier info is inferred from the rate limit headers returned on every API call.
    Limits are set at the organization level on the Limits page in the Claude Console.
    Rate limits differ by tier: Free (Tier 1) vs paid tiers (Tier 2, 3, 4).
    See: https://platform.claude.com/docs/en/api/rate-limits#spend-limits
    Rate limits are enforced with a token bucket algorithm (https://en.wikipedia.org/wiki/Token_bucket).
    Tiers increase automatically as usage reaches certain thresholds.
    Maximum input and output tokens per minute vary by model version.
    See https://platform.claude.com/settings/limits
    """
    # POLICY: Use _make_anthropic_client() to enforce SSL hardening on all Anthropic API calls.
    client_api_key = get_str_from_env_file("ANTHROPIC_API_KEY")
    # POLICY: Fail fast with a clean message (no traceback, no secret) if the key is missing.
    if not client_api_key:
        print("FATAL: ANTHROPIC_API_KEY is not set. Export it or add it to ~/.claude-vulscan.env before running.")
        return None
    client = _make_anthropic_client(client_api_key)
    client_api_key = ""
    # Make a minimal API call to capture response headers:
    # POLICY: This probe itself consumes credits, so billing failures must be surfaced here
    # with an actionable message instead of letting a raw 400 traceback escape to the user.
    try:
        response = client.messages.with_raw_response.create(
            model=model_id, max_tokens=10, messages=[{"role": "user", "content": "Hi"}]
        )
    except anthropic.BadRequestError as e:  # 400 — includes "credit balance too low"
        api_message = _extract_anthropic_error_message(e)
        lowered = api_message.lower()
        if "credit balance" in lowered or "insufficient" in lowered or "billing" in lowered or "upgrade" in lowered:
            print("FATAL: Anthropic credit balance is too low to access the API.")
            print("   Add credits or upgrade your plan at:")
            print("   https://console.anthropic.com/settings/billing")
            if api_message:
                print(f"   API said: {api_message}")
        else:
            print(f"FATAL: Anthropic API rejected the budget probe (400): {api_message or e}")
        return None
    except anthropic.AuthenticationError:  # 401
        print("FATAL: Invalid or missing Anthropic API key (ANTHROPIC_API_KEY).")
        return None
    except anthropic.PermissionDeniedError as e:  # 403
        print(f"FATAL: Anthropic API permission denied for model '{model_id}': {e}")
        return None
    except anthropic.NotFoundError:  # 404
        print(f"FATAL: Anthropic model '{model_id}' not found or not available to this key.")
        return None
    except anthropic.RateLimitError:  # 429
        print("ERROR: Rate limit exceeded during budget probe — back off and retry shortly.")
        return None
    except anthropic.APIConnectionError as e:
        print(f"FATAL: Could not connect to Anthropic API: {e}")
        return None
    except anthropic.APIStatusError as e:  # catch-all for other non-2xx responses
        api_message = _extract_anthropic_error_message(e)
        print(f"FATAL: Anthropic API error {e.status_code}: {api_message or getattr(e, 'message', e)}")
        return None

    print("=== Anthropic Claude Organization Limits: Rate Limits on API capacity ===")
    # Also shown on GUI Console at https://platform.claude.com/settings/limits
    headers = response.headers
    # POLICY: Keep timestamps using GMT/UTC but convert to local time zone for printing out to user.
    token_reset_local_time = infer_from_utc(headers.get("anthropic-ratelimit-tokens-reset"))
    requests_local_time = infer_from_utc(headers.get("anthropic-ratelimit-requests-reset"))
    print(f"requests_reset on : {requests_local_time} ({headers.get('anthropic-ratelimit-requests-reset')}) UTC")
    print(f"tokens_reset on   : {token_reset_local_time} ({headers.get('anthropic-ratelimit-tokens-reset')}) UTC")
    # Infer approximate tier from requests-per-minute limit:

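    # NOTE: Rate limits protect the vendor's systems from being overwhelmed by sudden surges of requests.
    rpm_header = headers.get("anthropic-ratelimit-requests-limit")
    # Guard against a missing header: int(None) would raise TypeError before the tier check below.
    rpm = int(rpm_header) if rpm_header else None
    if not rpm: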
        print("No rpm to identify tier!")
        tier = "???"
    else:
        if rpm <= 50:
            tier = "Tier 1 (Build - likely free or new account)"
        elif rpm <= 1000:
            tier = "Tier 2 (Build)"
        elif rpm <= 2000:
            tier = "Tier 3 (Scale)"
        else:
            tier = "Tier 4 (Scale) or higher"
        # print(f"\nInferred tier: {tier}")
    # headers.get() returns None for absent headers; keep None (not the string "None") so they are skipped below.
    limits = {
        "requests_limit": (headers.get("anthropic-ratelimit-requests-limit"), f"per minute = {tier}"),
        "input_tokens_limit": (headers.get("anthropic-ratelimit-input-tokens-limit"), "per minute"),
        "output_tokens_limit": (headers.get("anthropic-ratelimit-output-tokens-limit"), "per minute"),
        "requests_remaining": (headers.get("anthropic-ratelimit-requests-remaining"), "-"),
        "tokens_limit": (headers.get("anthropic-ratelimit-tokens-limit"), "-"),
        "tokens_remaining": (headers.get("anthropic-ratelimit-tokens-remaining"), "-"),
    }
    for key, (value, extra) in limits.items():
        if value:
            extra_col = extra if extra else "N/A"
            print(f"  {key:<20}: {value:<8}  {extra_col:<9}")

    # NOTE: Claude.ai plans (Free/Pro/Team/Enterprise) are for the chat interface
    # NOTE: Claude API accounts use a tiered system (Tier 1–4) based on usage history and spending, reflected in rate limits.
    # See https://docs.anthropic.com/en/api/rate-limits

    # print(f"ERROR: Not enough tokens to use {tokens_expected} tokens for this run.")
    #   return False

    # TODO: POLICY: Return a fixed placeholder budget until there is an empirical way to estimate the tokens needed.
    tokens_expected = 2048

    return tokens_expected
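

# A minimal sketch (hypothetical helper, not part of the original flow) of turning a
# rate-limit reset header into a countdown before retrying. ASSUMPTION: the
# anthropic-ratelimit-*-reset headers printed above carry RFC 3339 UTC timestamps
# such as "2026-04-02T12:00:30Z". Returns None when the header is missing or unparseable.
from datetime import datetime, timezone


def _seconds_until_reset(reset_value: str | None, now: datetime | None = None) -> float | None:
    """Return seconds until an RFC 3339 reset timestamp, or 0.0 if it has already passed."""
    if not reset_value:
        return None
    try:
        # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it to "+00:00".
        reset_at = datetime.fromisoformat(reset_value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if reset_at.tzinfo is None:
        # Treat naive timestamps as UTC, in line with the POLICY of keeping timestamps in UTC.
        reset_at = reset_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


# Usage (hypothetical): time.sleep(_seconds_until_reset(headers.get("anthropic-ratelimit-requests-reset")) or 0)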


def get_token_usage(response) -> dict:
    """Extract token usage from an Anthropic API response."""
    usage = response.usage
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": usage.input_tokens + usage.output_tokens,
    }
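

# A minimal sketch (not from the original script) of totaling the per-call dicts
# returned by get_token_usage() across a multi-file scan run. The UsageTotals class
# and its add() method are hypothetical; the only assumption is that each call yields
# a dict with integer "input_tokens" and "output_tokens" values.
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals for one scan run."""

    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: dict) -> None:
        """Fold one get_token_usage()-style dict into the running totals."""
        self.calls += 1
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)

    @property
    def total_tokens(self) -> int:
        """Input plus output tokens, computed the same way as get_token_usage()."""
        return self.input_tokens + self.output_tokens


# Usage (hypothetical): totals = UsageTotals(); totals.add(get_token_usage(response)) after each scan call.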


def get_billing_period(admin_api_key: str) -> dict:
    """
    Return the current billing period (calendar month).

    Also fetches usage cost from Anthropic's Cost Report API.
    Requires an Admin API key (sk-ant-admin...) from the Claude Console.
    """
    now = datetime.now(timezone.utc)

    # Billing resets on day 1 (the start) of each calendar month:
    period_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    last_day = monthrange(now.year, now.month)[1]
    period_end = now.replace(day=last_day, hour=23, minute=59, second=59, microsecond=0)
    days_remaining = (period_end - now).days + 1

    # Query Anthropic Cost Report API for this billing period
    url = "https://api.anthropic.com/v1/organizations/cost_report"
    headers = {
        "x-api-key": admin_api_key,
        "anthropic-version": "2023-06-01",
    }
    params = {
        "starting_at": period_start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "ending_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "bucket_width": "1d",
    }
    try:
        # POLICY: Use hardened SSL context (certifi CA bundle, CERT_REQUIRED) for all outbound
        # HTTPS calls carrying sensitive credentials — including the admin API key.
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.load_verify_locations(certifi.where())
        ssl_ctx.verify_mode = ssl.CERT_REQUIRED
        ssl_ctx.check_hostname = True
        with httpx.Client(verify=ssl_ctx, timeout=10.0) as http_client:
            response = http_client.get(url, headers=headers, params=params)
        response.raise_for_status()
        cost_data = response.json()
    except httpx.HTTPStatusError as e:
        cost_data = {"error": str(e)}
    except httpx.RequestError as e:
        cost_data = {"error": f"Request failed: {e}"}

    return {
        "billing_period_start": period_start.isoformat(),
        "billing_period_end": period_end.isoformat(),
        "days_elapsed": now.day,
        "days_remaining": days_remaining,
        "queried_at": now.isoformat(),
        "cost_report": cost_data,
    }
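

# A minimal sketch of following Cost Report pagination. The sample response in
# print_cost_report() ends with 'has_more': True and a 'next_page' token, so one GET
# may return only part of the month. ASSUMPTIONS: the token is sent back as a "page"
# query parameter (check the Admin API reference), and fetch_page is a hypothetical
# callable wrapping the httpx GET above, injected so this sketch makes no network calls.
from typing import Callable, Optional


def _collect_cost_buckets(fetch_page: Callable[[Optional[str]], dict], max_pages: int = 50) -> list[dict]:
    """Return all 'data' buckets by calling fetch_page(page_token) until has_more is false."""
    buckets: list[dict] = []
    page_token: Optional[str] = None
    for _ in range(max_pages):  # Cap the loop so a misbehaving API cannot make it spin forever.
        page = fetch_page(page_token)
        buckets.extend(page.get("data", []))
        if not page.get("has_more"):
            break
        page_token = page.get("next_page")
        if not page_token:
            break
    return buckets


# Usage (hypothetical):
#   fetch = lambda tok: http_client.get(url, headers=headers, params={**params, "page": tok} if tok else params).json()
#   all_buckets = _collect_cost_buckets(fetch)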


def obtain_code_from_file(args, target: str, filepath: str) -> str | None:
    """Obtain code of individual file targeted."""
    # TODO: try @retries
    try:
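        # Read as UTF-8 explicitly so results do not depend on the platform's default encoding.
        with open(filepath, "r", encoding="utf-8") as f: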
            code = f.read()
    # POLICY: Catch specific exceptions for better debugging.
    except FileNotFoundError:
        print(f"Error: File '{filepath}' not found.")
        return None  # or raise, or sys.exit(1)
    except PermissionError:
        print(f"Error: Permission denied to read '{filepath}'.")
        return None
    except IsADirectoryError:
        print(f"Error: '{filepath}' is a directory, not a file.")
        return None
    except UnicodeDecodeError as e:
        print(f"Error: Unable to decode file '{filepath}': {e}")
        return None
    except OSError as e:
        print(f"Error: OS error while reading '{filepath}': {e}")
        return None

    # POLICY: Before proceeding, use Guard check to ensure code to be processed did not encounter exception.
    if code is None:
        return None
    else:
        file_size = len(code)
        if args.verbose:
            print(f"TRACE: code_from_file contains {file_size} characters.")
        if target_within_sizelimit(file_size, args):
            # POLICY: Limit **unrestricted file reads**: any file readable by the process can be scanned and its contents exfiltrated to the external API.
            if args.verbose:
                print(f"TRACE: obtain_code_from_file() returning code with file_size {file_size}.")
            return code
        else:
            # ERROR message is issued by the called function so it can be customized using args settings.
            print(f"ERROR: obtain_code_from_file() returning None with file_size {file_size}.")
            return None
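

# A minimal sketch of a stricter read path, assuming the intent of the size-limit POLICY
# above is to avoid pulling large files into memory before rejecting them. It checks
# os.path.getsize() (bytes) before reading and decodes as UTF-8 explicitly. The helper
# _read_text_within_limit and its max_bytes parameter are hypothetical; the script's
# actual limit check remains target_within_sizelimit().
import os


def _read_text_within_limit(filepath: str, max_bytes: int) -> str | None:
    """Return the file's text, or None if it is missing, too large, unreadable, or not UTF-8."""
    try:
        size = os.path.getsize(filepath)
    except OSError:
        return None
    if size > max_bytes:
        # Reject before reading, so oversized files never enter memory.
        return None
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return f.read()
    except (OSError, UnicodeDecodeError):
        return None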


def ant_vulscan_code(
    args, filepath: str, code: str, prompt_text: str, model_id: str, api_max_tokens=2048
) -> dict | None:
    """Scan file using Anthropic API call.

    CAUTION: **Sensitive data sent to external API** File contents are sent to Anthropic's API without sanitization. If scanned files contain secrets/credentials, they are exfiltrated.
    """
    # POLICY: Give api_max_tokens a hard-coded default so a value is always set.
    # TODO: POLICY: Set api_max_tokens empirically, based on code size, tokens consumed, etc.
    # if not api_max_tokens:
    #    api_max_tokens = 2048 # or 1024

    # TODO: POLICY: To keep secrets off logs, obtain api_keys by lookup from a secrets manager rather than from CLI parameters.
    # POLICY: It's better to take a bit longer than to expose the key while running code that doesn't require the secret.
    client_api_key = get_str_from_env_file("ANTHROPIC_API_KEY")
    # POLICY: Never log the API key (or even its length) to stdout; attackers can use such details to fingerprint keys.
    # POLICY: Exit the run immediately if the API KEY is unavailable.
    if not client_api_key:
        raise EnvironmentError("ANTHROPIC_API_KEY is not set. Please export it before running this script.")
    try:
        client = _make_anthropic_client(client_api_key)
        client_api_key = ""
        # POLICY: Set a generous per-request timeout for large file scans to prevent indefinite hangs.
        response = client.messages.create(
            model=model_id,
            max_tokens=api_max_tokens,
            timeout=120.0,
            messages=[{"role": "user", "content": f"{prompt_text}\n\n{code}"}],
        )
        # See https://platform.claude.com/docs/en/api/sdks/python#token-counting
        # print(f"DEBUGGING: {response.input_tokens}")
        # Usage(input_tokens=25, output_tokens=13)
        # QUESTION: Still specify filepath here?
        return {"file": filepath, "findings": response.content[0].text}

    # print("FATAL: Run cannot continue without Anthropic client!")
    # POLICY: Even on failure, do not exit program until billing info for run is displayed.
    except anthropic.AuthenticationError as e:
        # POLICY: Auth failures are security events — re-raise to halt execution.
        print(f"Error: Authentication failed — check ANTHROPIC_API_KEY: {e}")
        raise
    except anthropic.RateLimitError:
        print(f"Error: Rate limit exceeded while scanning '{filepath}'.")
    except anthropic.APIConnectionError as e:
        print(f"Error: Connection to Anthropic API failed: {e}")
    except anthropic.APIStatusError as e:
        print(f"Error: Anthropic API error {e.status_code} while scanning '{filepath}': {e.message}")
    except FileNotFoundError:
        print(f"Error: Target file '{filepath}' not found.")
    except PermissionError:
        print(f"Error: Permission denied to access '{filepath}'.")
    except KeyError as e:
        print(f"Error: Expected key missing in scan results: {e}")
    except TypeError as e:
        print(f"Error: Unexpected response structure in ant_vulscan_code(): {e}")
    except Exception as e:
        print(f"Unexpected error while scanning '{filepath}': {e}")
    # Non-fatal errors above fall through to an explicit None so the caller can skip this file.
    return None
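

# A minimal sketch of masking likely secrets before file contents are sent to the
# external API, addressing the CAUTION in ant_vulscan_code(). The patterns are
# illustrative assumptions, not an exhaustive secret detector: "sk-ant-" prefixed keys
# (the prefix of the Anthropic keys used in this script), AWS-style "AKIA" access key
# IDs, and quoted password/secret/api_key assignments. _redact_secrets is hypothetical.
import re

_TOKEN_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]
_ASSIGNMENT_PATTERN = re.compile(r"""(?i)((?:password|passwd|secret|api_key)\s*[:=]\s*)(["'])[^"']+\2""")


def _redact_secrets(code: str, mask: str = "[REDACTED]") -> tuple[str, int]:
    """Return (redacted_code, number_of_replacements)."""
    total = 0
    for pattern in _TOKEN_PATTERNS:
        code, n = pattern.subn(mask, code)
        total += n
    # For assignments, keep the variable name and quotes so the scanned code stays readable.
    code, n = _ASSIGNMENT_PATTERN.subn(lambda m: f"{m.group(1)}{m.group(2)}{mask}{m.group(2)}", code)
    total += n
    return code, total


# Usage (hypothetical): code, n_redacted = _redact_secrets(code) before building the messages=[...] payload.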


def expose_global_args(args) -> str | None:
    """Expose specific args to become global."""
    return args.prompt


def write_call_to_csv(
    args,
    target_file,
    call_seq,
    call_start_utc: str,
    elapsed_seconds: float,
    bytes_processed: int,
    model_id: str,
    lines_out: str,
    metrics_filepath: str,
) -> None:
    """Write line to call metadata csv."""
    # POLICY: Use a --nometric parameter to optionally not write call metrics to a .csv file.
    if args.nometric:
        print("METRIC: Not recorded because --nometric was specified on the command line.")
        return None

    bytes_processed_fmt = format_bytes(bytes_processed)
    elapsed_seconds_fmt = format_elapsed_time(elapsed_seconds)
    # POLICY: Do not put sensitive text within unencrypted csv files.
    print(
        f"\nMETRIC: At {call_start_utc}, {bytes_processed_fmt} bytes {target_file} took {elapsed_seconds_fmt} secs for {lines_out} findings thru {model_id}."
    )

    # import csv
    row = {
        "call_seq": call_seq,
        "target_file": target_file,
        "start_utc": call_start_utc,
        "elapsed_seconds": elapsed_seconds,
        "bytes_processed": bytes_processed,
        "model_id": model_id,
        "lines_out": lines_out,
    }

    if args.verbose:
        print(f"VERBOSE: {sys._getframe().f_code.co_name}(): metrics_filepath = {metrics_filepath}")
    if not metrics_filepath:
        return None

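    # Write the header when the file is new OR exists but is empty (os.path.exists() alone misses the empty case).
    write_header = not os.path.exists(metrics_filepath) or os.path.getsize(metrics_filepath) == 0
    with open(metrics_filepath, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if write_header:
            writer.writeheader()
        writer.writerow(row)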


def print_cost_report(cost_report):
    """Print Anthropic cost report line.

    cost_report: {'data': [{'starting_at': '2026-04-01T00:00:00Z', 'ending_at': '2026-04-02T00:00:00Z', 'results': []}, {'starting_at': '2026-04-02T00:00:00Z', 'ending_at': '2026-04-03T00:00:00Z', 'results': [{'currency': 'USD', 'amount': '59.225', 'workspace_id': None, 'description': None, 'cost_type': None, 'context_window': None, 'model': None, 'service_tier': None, 'token_type': None, 'inference_geo': None}]}, {'starting_at': '2026-04-03T00:00:00Z', 'ending_at': '2026-04-04T00:00:00Z', 'results': []}, {'starting_at': '2026-04-04T00:00:00Z', 'ending_at': '2026-04-05T00:00:00Z', 'results': []}, {'starting_at': '2026-04-05T00:00:00Z', 'ending_at': '2026-04-06T00:00:00Z', 'results': []}, {'starting_at': '2026-04-06T00:00:00Z', 'ending_at': '2026-04-07T00:00:00Z', 'results': []}, {'starting_at': '2026-04-07T00:00:00Z', 'ending_at': '2026-04-08T00:00:00Z', 'results': []}], 'has_more': True, 'next_page': 'page_?='}
    """
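    # POLICY: If get_billing_period() could not reach the Cost Report API, report that instead of crashing.
    if not cost_report or "error" in cost_report:
        print(f"  cost_report: unavailable ({(cost_report or {}).get('error', 'no data')})")
        return
    # Each item in "data" is one daily bucket, so sum amounts per currency for a month-to-date total.
    # NOTE: Verify the unit of "amount" in the Cost Report API reference; it may be in the lowest currency unit (e.g. cents).
    totals: dict[str, float] = {}
    for item in cost_report.get("data", []):
        for r in item.get("results", []):
            totals[r["currency"]] = totals.get(r["currency"], 0.0) + float(r["amount"])
    # TODO: Look up currency symbols for all countries' currencies.
    currency_symbols = {"USD": "$", "EUR": "€", "GBP": "£"}
    for currency, amount in totals.items():
        # Default to a blank currency_symbol for currencies not in the lookup table.
        currency_symbol = currency_symbols.get(currency, "")
        print(f"  cost_report: {currency}: {currency_symbol}{round(amount, 6)} MTD (Month-To-Date)")
        #   cost_report: USD: $59.225 MTD
    if cost_report.get("has_more"):
        # Only the buckets on this page were summed; later days of the month are on the next page.
        print("  cost_report: WARNING: more pages available (has_more); the total may be incomplete.")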


def ant_billing(model_id) -> float | None:
    """Make a minimal API call for token-usage headers, then print the billing period and cost report."""
    # Billing runs on a calendar month cycle — invoices are issued at the end of every calendar month via Stripe.
    # The ANTHROPIC_ADMIN_KEY (sk-ant-admin...) required to get the Cost Report is different from a standard API key.
    admin_api_key = get_str_from_env_file("ANTHROPIC_ADMIN_KEY")
    if not admin_api_key:
        print("ERROR: ANTHROPIC_ADMIN_KEY retrieval from .env failed!")
        return None

    # POLICY: Admin keys (sk-ant-admin01-...) are only valid for Admin API endpoints, not the Messages API.
    # Use ANTHROPIC_API_KEY for messages.create() and ANTHROPIC_ADMIN_KEY only for billing endpoints.
    client_api_key = get_str_from_env_file("ANTHROPIC_API_KEY")
    if not client_api_key:
        print("ERROR: ANTHROPIC_API_KEY retrieval from .env failed!")
        return None
    client = _make_anthropic_client(client_api_key)
    # POLICY: Delete each secret value after every use rather than let secret keys linger (exposed to theft).
    client_api_key = ""

    # POLICY: Use an appropriate number of max_tokens, identified by experimentation, when calling the API for response headers.
    # POLICY: Catch Anthropic API errors (auth, rate limit, connection, billing/400 BadRequest, etc.) so that
    # a billing display failure does not abort the run or mask prior findings output.
    response = None
    try:
        response = client.messages.create(
            model=model_id, max_tokens=10, messages=[{"role": "user", "content": "Hi"}]
        )
    except anthropic.AuthenticationError as e:
        print(f"ERROR: ant_billing() authentication failed — check ANTHROPIC_API_KEY: {e}")
    except anthropic.RateLimitError as e:
        print(f"ERROR: ant_billing() rate limit exceeded: {e}")
    except anthropic.APIConnectionError as e:
        print(f"ERROR: ant_billing() connection to Anthropic API failed: {e}")
    except anthropic.BadRequestError as e:
        # e.g., "Your credit balance is too low to access the Anthropic API."
        print(f"ERROR: ant_billing() Anthropic API 400 bad request: {e}")
    except anthropic.APIStatusError as e:
        print(f"ERROR: ant_billing() Anthropic API error {e.status_code}: {e.message}")
    except Exception as e:
        print(f"ERROR: ant_billing() unexpected error obtaining token usage headers: {e}")

    try:
        result = get_billing_period(admin_api_key)  # make the API call
    except Exception as e:
        print(f"ERROR: ant_billing() failed to retrieve billing period: {e}")
        result = None
    # POLICY: Delete the ANTHROPIC_ADMIN_KEY value after every use rather than let secret keys linger (exposed to theft).
    admin_api_key = ""

    if result:
        print(f'\nFor model: "{model_id}" ')
        print(f"Billing period month: {result['billing_period_start']} → {result['billing_period_end']}")
        # 2026-04-01T00:00:00+00:00 → 2026-04-30T23:59:59+00:00
        print(f"  Days elapsed   : {result['days_elapsed']}")
        print(f"  Days remaining : {result['days_remaining']}")
        print_cost_report(result["cost_report"])

    # POLICY: Only attempt to extract token usage if the messages.create() call succeeded.
    tokens = get_token_usage(response) if response is not None else None
    if tokens: