
Commit 5454978

Merge pull request #297 from ansible/TamiTakamiya/AAP-67022/remove-llama-index-dependencies
Copy required files from Lightspeed Core to minimize external dependencies
2 parents 1322807 + 19ed122 commit 5454978

20 files changed

Lines changed: 2470 additions & 101 deletions

Containerfile-aap

Lines changed: 16 additions & 2 deletions
```diff
@@ -1,14 +1,28 @@
-FROM quay.io/lightspeed-core/rag-content-cpu as builder
+FROM registry.access.redhat.com/ubi9/python-312 as builder
 ARG EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
 ARG AAP_VERSION=2.6
 
 WORKDIR /rag-content
+USER 0
 
 COPY aap-product-docs-plaintext ./aap-product-docs-plaintext
 COPY additional_docs ./additional_docs
 
 COPY scripts/custom_processor-aap.py .
-RUN uv run python custom_processor-aap.py \
+COPY requirements.txt ./requirements.txt
+COPY embeddings_model ./embeddings_model
+COPY src/aap_rag_content ./aap_rag_content
+
+RUN python3.12 -m venv .venv
+RUN pip3.12 install --upgrade pip
+RUN pip3.12 install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt
+
+RUN cd embeddings_model; if [ ! -f model.safetensors ]; then \
+curl -L -O https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/9a3225965996d404b775526de6dbfe85d3368642/model.safetensors; \
+fi
+
+ENV PYTHONPATH=/rag-content
+RUN python custom_processor-aap.py \
 -o ./llama_stack_vector_db \
 -f /rag-content \
 -md embeddings_model/ \
```
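The new `RUN cd embeddings_model; if [ ! -f model.safetensors ]; then ... fi` step downloads the embedding weights only when they are not already in the build context. The Python sketch below does the same idempotent check, for example in a local setup script. The repository ID and pinned revision come from the curl URL. `resolve_url`, `ensure_model`, and the injected `fetch` callable are hypothetical helpers and are not part of the commit.

```python
from pathlib import Path
from typing import Callable

# Copied from the curl URL in Containerfile-aap; the revision pins an exact commit.
REPO_ID = "sentence-transformers/all-mpnet-base-v2"
REVISION = "9a3225965996d404b775526de6dbfe85d3368642"
FILENAME = "model.safetensors"


def resolve_url(repo_id: str, revision: str, filename: str) -> str:
    """Build the Hugging Face 'resolve' URL that the curl command fetches."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def ensure_model(model_dir: Path, fetch: Callable[[str, Path], object]) -> bool:
    """Fetch the weights only when they are missing.

    Returns True if a download was triggered and False if the file already
    existed (mirroring `if [ ! -f model.safetensors ]` in the Containerfile).
    """
    target = model_dir / FILENAME
    if target.is_file():
        return False
    model_dir.mkdir(parents=True, exist_ok=True)
    fetch(resolve_url(REPO_ID, REVISION, FILENAME), target)
    return True
```

For a real download, you could pass `urllib.request.urlretrieve` as `fetch`, since it accepts `(url, filename)`. Injecting the fetcher keeps the existence check testable without network access.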

Makefile

Lines changed: 3 additions & 0 deletions
```diff
@@ -53,6 +53,9 @@ verify: ## Verify the code using various linters
 black --check scripts
 ruff check scripts --per-file-ignores=scripts/*:S101
 
+unit-test: ## Run unit tests
+PYTHONPATH=src:$$PYTHONPATH .venv/bin/pytest tests/ -v -c pyproject.toml
+
 update-docs: ## Update the plaintext OCP docs in ocp-product-docs-plaintext/
 @set -e && for OCP_VERSION in $$(ls -1 ocp-product-docs-plaintext); do \
 scripts/get_ocp_plaintext_docs.sh $$OCP_VERSION; \
```
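The new `unit-test` target prepends `src` to `PYTHONPATH`, so pytest can import `aap_rag_content` without installing it. The sketch below builds the same argument list and environment from Python, for example for a CI helper. `unit_test_invocation` and `run_unit_tests` are hypothetical names; the pytest arguments are copied from the recipe. There is one deliberate difference. When `PYTHONPATH` is unset, the Makefile recipe expands to `src:` with a trailing separator, while this sketch omits the empty entry.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping, Tuple


def unit_test_invocation(env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) equivalent to the Makefile's `unit-test` recipe.

    Paths are relative, so the command must run with the repository root as cwd.
    """
    new_env = dict(env)  # copy; never mutate the caller's environment
    existing = env.get("PYTHONPATH", "")
    # Prepend "src" like `PYTHONPATH=src:$$PYTHONPATH`, skipping an empty tail.
    new_env["PYTHONPATH"] = os.pathsep.join(p for p in ("src", existing) if p)
    argv = [".venv/bin/pytest", "tests/", "-v", "-c", "pyproject.toml"]
    return argv, new_env


def run_unit_tests(repo_root: Path) -> int:
    """Run the test suite from repo_root and return pytest's exit code."""
    argv, env = unit_test_invocation(os.environ)
    # On POSIX, a relative argv[0] is resolved against cwd.
    return subprocess.run(argv, env=env, cwd=repo_root).returncode
```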

README.md

Lines changed: 19 additions & 5 deletions
```diff
@@ -13,11 +13,10 @@ Knowledge Portal (OKP), which provides off-line access to
 Red Hat product documentation.
 Copies of markdown files are stored in this repository.
 
-For generating vector DB, we are using scripts provided by
-the [lightspeed-core/rag-content](https://github.com/lightspeed-core/rag-content/)
-repository with our own [custom_processor-aap.py script](./scripts/custom_processor-aap.py).
-See [the README.md file of lightspeed-core/rag-content](https://github.com/lightspeed-core/rag-content/blob/main/README.md)
-for technical details.
+For generating vector DB, we use the `aap_rag_content` Python package
+located in the `src/` directory, which provides document processing and
+vector database generation capabilities. The [custom_processor-aap.py script](./scripts/custom_processor-aap.py)
+uses this package to process AAP documentation and build the vector database.
 
 
 ## Input files for Vector DBs
@@ -36,6 +35,21 @@ extension.
 used as the input for vector DB. Its `.metadata` subdirectory contains
 metadata JSON files.
 
+## Running Unit Tests
+
+To run the unit tests for the `aap_rag_content` package:
+
+```commandline
+make unit-test
+```
+
+This will execute all tests in the `tests/` directory using pytest with verbose output.
+The test suite includes tests for:
+- Metadata processing
+- Document processing
+- Vector database generation
+- Utility functions
+
 ## Build a container image
 
 Following command builds a container image, which includes the generated vector DB.
```
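The README relies on `make unit-test` to put `src/` on the import path. Another common pattern, which is not part of this commit, is a `tests/conftest.py` that prepends `src/` to `sys.path`. With that file in place, a bare `pytest` run can also import `aap_rag_content`. The sketch below assumes `conftest.py` sits in `tests/`, one level below the repository root. `add_src_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root: Path) -> str:
    """Prepend <repo_root>/src to sys.path once, like PYTHONPATH=src would."""
    src = str((repo_root / "src").resolve())
    if src not in sys.path:  # keep repeated calls idempotent
        sys.path.insert(0, src)
    return src


# In tests/conftest.py this would be called as:
# add_src_to_path(Path(__file__).resolve().parent.parent)
```

Recent pytest releases (7.0 and later) also accept a `pythonpath` setting under `[tool.pytest.ini_options]`. That setting achieves the same result from configuration instead of code.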

pyproject.toml

Lines changed: 3 additions & 2 deletions
```diff
@@ -15,6 +15,7 @@ line-length = 100
 [tool.mypy]
 disable_error_code = ["union-attr", "return-value", "arg-type", "import-untyped"]
 ignore_missing_imports = true
+mypy_path = ["src"]
 
 # https://docs.astral.sh/uv/guides/integration/pytorch/
 [tool.uv.sources]
@@ -41,8 +42,8 @@ dependencies = [
 "cryptography==46.0.5", # Transient dep pinned to handle CVE
 "faiss-cpu==1.12",
 "filelock==3.20.3",
-"llama-stack-client==0.2.22",
-"llama-stack==0.2.22",
+"llama-stack-client==0.4.3",
+"llama-stack==0.4.3",
 "pillow==12.1.1", # Transient dep pinned to handle CVE
 "pyasn1==0.6.3",
 "python-multipart==0.0.22", # Transient dep pinned to handle CVE
```
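The `pyproject.toml` change bumps `llama-stack` and `llama-stack-client` from 0.2.22 to 0.4.3. According to `requirements.txt`, `llama-stack-api` 0.4.4 now comes in transitively. To confirm that an existing environment matches the new pins, you can compare them against `importlib.metadata.version`. The sketch below uses exact string comparison. That is adequate for `==` pins but is not full PEP 440 version matching. `pin_mismatches` and `EXPECTED_PINS` are hypothetical names.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Exact pins from the updated dependencies list in pyproject.toml.
EXPECTED_PINS = {"llama-stack": "0.4.3", "llama-stack-client": "0.4.3"}


def pin_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Map each mismatched distribution to its installed version (None if absent)."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = get_version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

`pin_mismatches(EXPECTED_PINS)` returns an empty dict when both pins are satisfied. A leftover 0.2.22 install shows up as `{"llama-stack": "0.2.22", ...}`.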

requirements.txt

Lines changed: 63 additions & 35 deletions
```diff
@@ -169,17 +169,13 @@ cryptography==46.0.5 \
 --hash=sha256:f145bba11b878005c496e93e257c1e88f154d278d2638e6450d17e0f31e558d2
 # via
 # aap-rag-content
-# python-jose
+# pyjwt
 distro==1.9.0 \
 --hash=sha256:2fa77c6fd8940f116ee1d6b94a2f90b13b5ea8d019b98bc8bafdcabcdd9bdbed \
 --hash=sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2
 # via
 # llama-stack-client
 # openai
-ecdsa==0.19.1 \
---hash=sha256:30638e27cf77b7e15c4c4cc1973720149e1033827cfd00661ca5c8cc0cdb24c3 \
---hash=sha256:478cba7b62555866fcb3bb3fe985e06decbdb68ef55713c4e5ab98c57d508e61
-# via python-jose
 faiss-cpu==1.12.0 \
 --hash=sha256:016e391f49933875b8d60d47f282f2e93d8ea9f9ffbda82467aa771b11a237db \
 --hash=sha256:2f87cbcd603f3ed464ebceb857971fdebc318de938566c9ae2b82beda8e953c0 \
@@ -194,7 +190,9 @@ faiss-cpu==1.12.0 \
 fastapi==0.128.0 \
 --hash=sha256:1cc179e1cef10a6be60ffe429f79b829dce99d8de32d7acb7e6c8dfdf7f2645a \
 --hash=sha256:aebd93f9716ee3b4f4fcfe13ffb7cf308d99c9f3ab5622d8877441072561582d
-# via llama-stack
+# via
+# llama-stack
+# llama-stack-api
 filelock==3.20.3 \
 --hash=sha256:18c57ee915c7ec61cff0ecf7f0f869936c7c30191bb0cf406f1341778d0834e1 \
 --hash=sha256:4b0dda527ee31078689fc205ec4f1c1bf7d56cf88b6dc9426c4f230e46c2dce1
@@ -284,7 +282,6 @@ huggingface-hub==0.36.0 \
 --hash=sha256:47b3f0e2539c39bf5cde015d63b72ec49baff67b6931c3d97f3f84532e2b8d25 \
 --hash=sha256:7bcc9ad17d5b3f07b57c78e79d527102d08313caa278a641993acddcb894548d
 # via
-# llama-stack
 # sentence-transformers
 # tokenizers
 # transformers
@@ -333,7 +330,9 @@ joblib==1.5.3 \
 jsonschema==4.26.0 \
 --hash=sha256:0c26707e2efad8aa1bfc5b7ce170f3fccc2e4918ff85989ba9ffa9facb2be326 \
 --hash=sha256:d489f15263b8d200f8387e64b4c3a75f06629559fb73deb8fdfb525f2dab50ce
-# via llama-stack
+# via
+# llama-stack
+# llama-stack-api
 jsonschema-specifications==2025.9.1 \
 --hash=sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe \
 --hash=sha256:b540987f239e745613c7a9176f3edb72b832a4ac465cf02712288397832b5e8d
@@ -352,16 +351,18 @@ librt==0.7.8 ; platform_python_implementation != 'PyPy' \
 --hash=sha256:ad64a14b1e56e702e19b24aae108f18ad1bf7777f3af5fcd39f87d0c5a814449 \
 --hash=sha256:bb7a7807523a31f03061288cc4ffc065d684c39db7644c676b47d89553c0d714
 # via mypy
-llama-stack==0.2.22 \
---hash=sha256:576752dedc9e9f0fb9da69f373d677d8b4f2ae4203428f676fa039b6813d8450 \
---hash=sha256:c6bbda6b5a4417b9a73ed36b9d581fd7ec689090ceefd084d9a078e7acbdc670
+llama-stack==0.4.3 \
+--hash=sha256:423207eae2b640894992a9075ff9dd6300ff904ab06a49fe38cfe0bb809d4669 \
+--hash=sha256:70d379ae9dbb5b1d0693f14054d9817aba183ffcd805133f0a4442baee132c6d
+# via aap-rag-content
+llama-stack-api==0.4.4 \
+--hash=sha256:3973ca3bacf86916e04e521f77e7909533eec7364d32c3eabc35dc2976dbfe7d \
+--hash=sha256:7bbc63330ed186502dcd48f65cae014dbeb788ba5690be738c98693cfcd2f599
+# via llama-stack
+llama-stack-client==0.4.3 \
+--hash=sha256:97b8cc5032bad4f0cdd1b0ae992cf44f5554679d315b7c40f46deb358c041f50 \
+--hash=sha256:cb807be258206e8fedeb5e5ceba7be7108d3badb31d74199406808c3d1679c35
 # via aap-rag-content
-llama-stack-client==0.2.22 \
---hash=sha256:9a0bc756b91ebd539858eeaf1f231c5e5c6900e1ea4fcced726c6717f3d27ca7 \
---hash=sha256:b260d73aec56fcfd8fa601b3b34c2f83c4fbcfb7261a246b02bbdf6c2da184fe
-# via
-# aap-rag-content
-# llama-stack
 markdown-it-py==4.0.0 \
 --hash=sha256:87327c59b172c5011896038353a81343b6754500a08cd7a4973bb48c6d578147 \
 --hash=sha256:cb0a2b4aa34f932c007117b194e945bd74e0ec24133ceb5bac59009cda1cb9f3
@@ -454,7 +455,9 @@ numpy==2.4.1 \
 openai==2.15.0 \
 --hash=sha256:42eb8cbb407d84770633f31bf727d4ffb4138711c670565a41663d9439174fba \
 --hash=sha256:6ae23b932cd7230f7244e52954daa6602716d6b9bf235401a107af731baea6c3
-# via llama-stack
+# via
+# llama-stack
+# llama-stack-api
 opentelemetry-api==1.39.1 \
 --hash=sha256:2edd8463432a7f8443edce90972169b195e7d6a05500cd29e6d13898187c9950 \
 --hash=sha256:fbde8c80e1b937a2c61f20347e91c0c18a1940cecf012d62e65a7caf08967c9c
@@ -469,7 +472,9 @@ opentelemetry-exporter-otlp-proto-common==1.39.1 \
 opentelemetry-exporter-otlp-proto-http==1.39.1 \
 --hash=sha256:31bdab9745c709ce90a49a0624c2bd445d31a28ba34275951a6a362d16a0b9cb \
 --hash=sha256:d9f5207183dd752a412c4cd564ca8875ececba13be6e9c6c370ffb752fd59985
-# via llama-stack
+# via
+# llama-stack
+# llama-stack-api
 opentelemetry-proto==1.39.1 \
 --hash=sha256:22cdc78efd3b3765d09e68bfbd010d4fc254c9818afd0b6b423387d9dee46007 \
 --hash=sha256:6c8e05144fc0d3ed4d22c2289c6b126e03bcd0e6a7da0f16cedd2e1c2772e2c8
@@ -481,6 +486,7 @@ opentelemetry-sdk==1.39.1 \
 --hash=sha256:cf4d4563caf7bff906c9f7967e2be22d0d6b349b908be0d90fb21c8e9c995cc6
 # via
 # llama-stack
+# llama-stack-api
 # opentelemetry-exporter-otlp-proto-http
 opentelemetry-semantic-conventions==0.60b1 \
 --hash=sha256:87c228b5a0669b748c76d76df6c364c369c28f1c465e50f661e39737e84bc953 \
@@ -569,17 +575,28 @@ protobuf==6.33.4 \
 # via
 # googleapis-common-protos
 # opentelemetry-proto
+psycopg2-binary==2.9.11 \
+--hash=sha256:31b32c457a6025e74d233957cc9736742ac5a6cb196c6b68499f6bb51390bd6a \
+--hash=sha256:62b6d93d7c0b61a1dd6197d208ab613eb7dcfdcca0a49c42ceb082257991de9d \
+--hash=sha256:a1cf393f1cdaf6a9b57c0a719a1068ba1069f022a59b8b1fe44b006745b59757 \
+--hash=sha256:ab8905b5dcb05bf3fb22e0cf90e10f469563486ffb6a96569e51f897c750a76a \
+--hash=sha256:b33fabeb1fde21180479b2d4667e994de7bbf0eec22832ba5d9b5e4cf65b6c6d \
+--hash=sha256:b6aed9e096bf63f9e75edf2581aa9a7e7186d97ab5c177aa6c87797cd591236c \
+--hash=sha256:be9b840ac0525a283a96b556616f5b4820e0526addb8dcf6525a0fa162730be4 \
+--hash=sha256:bf940cd7e7fec19181fdbc29d76911741153d51cab52e5c21165f3262125685e \
+--hash=sha256:edcb3aeb11cb4bf13a2af3c53a15b3d612edeb6409047ea0b5d6a21a9d744b34 \
+--hash=sha256:ef7a6beb4beaa62f88592ccc65df20328029d721db309cb3250b0aae0fa146c3 \
+--hash=sha256:f090b7ddd13ca842ebfe301cd587a76a4cf0913b1e429eb92c1be5dbeb1a19bc \
+--hash=sha256:fa0f693d3c68ae925966f0b14b8edda71696608039f4ed61b1fe9ffa468d16db
+# via llama-stack
 pyaml==25.7.0 \
 --hash=sha256:ce5d7867cc2b455efdb9b0448324ff7b9f74d99f64650f12ca570102db6b985f \
 --hash=sha256:e113a64ec16881bf2b092e2beb84b7dcf1bd98096ad17f5f14e8fb782a75d99b
 # via llama-stack-client
 pyasn1==0.6.3 \
 --hash=sha256:697a8ecd6d98891189184ca1fa05d1bb00e2f84b5977c481452050549c8a72cf \
 --hash=sha256:a80184d120f0864a52a073acc6fc642847d0be408e7c7252f31390c0f4eadcde
-# via
-# aap-rag-content
-# python-jose
-# rsa
+# via aap-rag-content
 pycparser==2.23 ; implementation_name != 'PyPy' and platform_python_implementation != 'PyPy' \
 --hash=sha256:78816d4f24add8f10a06d6f05b4d424ad9e96cfebf68a4ddc99c65c0720d00c2 \
 --hash=sha256:e5c6e8d3fbad53479cab09ac03729e0a9faf2bee3db8208a550daf5af81a5934
@@ -590,6 +607,7 @@ pydantic==2.12.5 \
 # via
 # fastapi
 # llama-stack
+# llama-stack-api
 # llama-stack-client
 # openai
 pydantic-core==2.41.5 \
@@ -617,6 +635,10 @@ pygments==2.19.2 \
 --hash=sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887 \
 --hash=sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b
 # via rich
+pyjwt==2.12.1 \
+--hash=sha256:28ca37c070cad8ba8cd9790cd940535d40274d22f80ab87f3ac6a713e6e8454c \
+--hash=sha256:c74a7a2adf861c04d002db713dd85f84beb242228e671280bf709d765b03672b
+# via llama-stack
 python-dateutil==2.9.0.post0 \
 --hash=sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3 \
 --hash=sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427
@@ -625,10 +647,6 @@ python-dotenv==1.2.1 \
 --hash=sha256:42667e897e16ab0d66954af0e60a9caa94f0fd4ecf3aaf6d2d260eec1aa36ad6 \
 --hash=sha256:b81ee9561e9ca4004139c6cbba3a238c32b03e4894671e181b671e8cb8425d61
 # via llama-stack
-python-jose==3.5.0 \
---hash=sha256:abd1202f23d34dfad2c3d28cb8617b90acf34132c7afd60abd0b0b7d3cb55771 \
---hash=sha256:fb4eaa44dbeb1c26dcc69e4bd7ec54a1cb8dd64d3b4d81ef08d90ff453f2b01b
-# via llama-stack
 python-multipart==0.0.22 \
 --hash=sha256:2b2cd894c83d21bf49d702499531c7bafd057d730c201782048f7945d82de155 \
 --hash=sha256:7340bef99a7e0032613f56dc36027b959fd3b30a787ed62d310e951f7c3a3a58
@@ -657,6 +675,7 @@ pyyaml==6.0.3 \
 --hash=sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0
 # via
 # huggingface-hub
+# llama-stack
 # pyaml
 # transformers
 referencing==0.37.0 \
@@ -722,10 +741,6 @@ rpds-py==0.30.0 \
 # via
 # jsonschema
 # referencing
-rsa==4.9.1 \
---hash=sha256:68635866661c6836b8d39430f97a996acbd61bfa49406748ea243539fe239762 \
---hash=sha256:e7bdbfdb5497da4c07dfd35530e1a902659db6ff241e39d9953cad06ebd0ae75
-# via python-jose
 ruff==0.14.13 \
 --hash=sha256:4acdf009f32b46f6e8864af19cbf6841eaaed8638e65c8dac845aea0d703c841 \
 --hash=sha256:591a7f68860ea4e003917d19b5c4f5ac39ff558f162dc753a2c5de897fd5502c \
@@ -798,9 +813,7 @@ setuptools==80.9.0 \
 six==1.17.0 \
 --hash=sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274 \
 --hash=sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81
-# via
-# ecdsa
-# python-dateutil
+# via python-dateutil
 sniffio==1.3.1 \
 --hash=sha256:2f6da418d1f1e0fddd844478f41680e794e6051915791a034ff65e5f100525a2 \
 --hash=sha256:f4324edc670a0f49750a81b895f35c3adb843cca46f0530f79fc1babb23789dc
@@ -817,7 +830,9 @@ sqlalchemy==2.0.46 \
 --hash=sha256:cf36851ee7219c170bb0793dbc3da3e80c582e04a5437bc601bfe8c85c9216d7 \
 --hash=sha256:ea3cd46b6713a10216323cda3333514944e510aa691c945334713fca6b5279ff \
 --hash=sha256:f9c11766e7e7c0a2767dda5acb006a118640c9fc0a4104214b96269bfb78399e
-# via aap-rag-content
+# via
+# aap-rag-content
+# llama-stack
 starlette==0.50.0 \
 --hash=sha256:9e5391843ec9b6e472eed1365a78c8098cfceb7a74bfd4d6b1c0c0095efb3bca \
 --hash=sha256:a2a17b22203254bcbc2e1f926d2d55f3f9497f769416b3190768befe598fa3ca
@@ -881,6 +896,18 @@ torch==2.9.1+cpu ; sys_platform != 'darwin' \
 # via
 # aap-rag-content
 # sentence-transformers
+tornado==6.5.5 \
+--hash=sha256:192b8f3ea91bd7f1f50c06955416ed76c6b72f96779b962f07f911b91e8d30e9 \
+--hash=sha256:2c9a876e094109333f888539ddb2de4361743e5d21eece20688e3e351e4990a6 \
+--hash=sha256:36abed1754faeb80fbd6e64db2758091e1320f6bba74a4cf8c09cd18ccce8aca \
+--hash=sha256:3f54aa540bdbfee7b9eb268ead60e7d199de5021facd276819c193c0fb28ea4e \
+--hash=sha256:435319e9e340276428bbdb4e7fa732c2d399386d1de5686cb331ec8eee754f07 \
+--hash=sha256:487dc9cc380e29f58c7ab88f9e27cdeef04b2140862e5076a66fb6bb68bb1bfa \
+--hash=sha256:6443a794ba961a9f619b1ae926a2e900ac20c34483eea67be4ed8f1e58d3ef7b \
+--hash=sha256:65a7f1d46d4bb41df1ac99f5fcb685fb25c7e61613742d5108b010975a9a6521 \
+--hash=sha256:dd3eafaaeec1c7f2f8fdcd5f964e8907ad788fe8a5a32c4426fbbdda621223b7 \
+--hash=sha256:e74c92e8e65086b338fd56333fb9a68b9f6f2fe7ad532645a290a464bcf46be5
+# via llama-stack
 tqdm==4.67.1 \
 --hash=sha256:26445eca388f82e72884e0d580d5464cd801a3ea01e63e5601bdff9ba6a48de2 \
 --hash=sha256:f8aef9c52c08c13a65f30ea34f4e5aac3fd1a34959879d7e59e63027286627f2
@@ -933,6 +960,7 @@ urllib3==2.6.3 \
 --hash=sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4
 # via
 # aap-rag-content
+# llama-stack
 # requests
 # types-requests
 uvicorn==0.40.0 \
```
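Most of the `requirements.txt` churn is in the `# via` annotations, which record which package pulled in each dependency. Three patterns stand out:

- `python-jose` and its `ecdsa`/`rsa` chain disappear.
- `pyjwt` appears via `llama-stack`.
- Many packages gain `llama-stack-api` as a second parent.

To inspect that graph without reading the diff by hand, you could use a small parser for this hash-pinned format. The sketch below works whether or not the `--hash` and comment lines are indented, and it ignores environment markers after `;`. `parse_via` is a hypothetical name and is not part of uv or pip.

```python
from typing import Dict, List, Optional, Tuple


def parse_via(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map each pinned package to (version, parents named in its `# via` comments)."""
    result: Dict[str, Tuple[str, List[str]]] = {}
    current: Optional[str] = None  # package whose block we are inside
    in_via = False  # True after a "# via" line for the current package
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--hash"):
            continue
        if line.startswith("# via"):
            in_via = True
            parent = line[len("# via"):].strip()
            if parent and current:
                result[current][1].append(parent)  # single-parent form: "# via pkg"
        elif line.startswith("#"):
            if in_via and current:
                result[current][1].append(line.lstrip("#").strip())  # multi-line form
        elif "==" in line:
            # e.g. "librt==0.7.8 ; platform_python_implementation != 'PyPy' \"
            spec = line.split(";", 1)[0].rstrip("\\").strip()
            name, ver = (part.strip() for part in spec.split("==", 1))
            current = name.lower()
            result[current] = (ver, [])
            in_via = False
    return result
```

For example, `[pkg for pkg, (_, parents) in parse_via(text).items() if "llama-stack-api" in parents]` lists the packages that `llama-stack-api` now pulls in.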

scripts/custom_processor-aap.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -2,9 +2,9 @@
 import json
 from pathlib import Path
 
-from lightspeed_rag_content.metadata_processor import MetadataProcessor
-from lightspeed_rag_content.document_processor import DocumentProcessor
-from lightspeed_rag_content import utils
+from aap_rag_content.metadata_processor import MetadataProcessor
+from aap_rag_content.document_processor import DocumentProcessor
+from aap_rag_content import utils
 
 # Folders where AAP product documentation markdown (.md) files are stored.
 AAP_PRODUCT_DOCS = [
```
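After this change the script imports from `aap_rag_content` instead of `lightspeed_rag_content`. It therefore works only when the copied package is importable. The Containerfile handles this by copying `src/aap_rag_content` to `./aap_rag_content` under `/rag-content` and setting `PYTHONPATH=/rag-content`. The sketch below is a hypothetical pre-flight check that uses `importlib.util.find_spec` to report which of those modules cannot be found. It assumes `utils` is a submodule, as the package's `__all__` suggests. Note that `find_spec` imports the parent package of a dotted name, so the package's `__init__.py` runs.

```python
from __future__ import annotations

import importlib.util

# Modules imported by scripts/custom_processor-aap.py after this commit.
REQUIRED_MODULES = [
    "aap_rag_content.metadata_processor",
    "aap_rag_content.document_processor",
    "aap_rag_content.utils",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located on the current sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first,
            # so a missing parent raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing
```

A caller could exit early with a clear message, for example `sys.exit(f"missing: {missing}")`, instead of failing on the first `import` statement.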

src/aap_rag_content/__init__.py

Lines changed: 9 additions & 0 deletions
```diff
@@ -0,0 +1,9 @@
+"""AAP RAG Content package for document processing and vector database management."""
+
+__version__ = "0.1.0"
+
+__all__ = [
+"document_processor",
+"metadata_processor",
+"utils",
+]
```
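The new `__init__.py` lists the three submodules in `__all__` but does not import them. As a result, `import aap_rag_content` alone does not load `utils`. By contrast, `from aap_rag_content import *` imports every submodule named in `__all__`. Explicit imports such as `from aap_rag_content import utils`, which the script uses, work either way. The self-contained demonstration below shows both behaviors. It builds a throwaway package with the same `__all__`; `demo_rag_pkg` and `build_demo_package` are made-up names.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Same submodule names as src/aap_rag_content/__init__.py.
SUBMODULES = ["document_processor", "metadata_processor", "utils"]


def build_demo_package(root: Path, name: str) -> None:
    """Write a package whose __init__.py declares __all__ without importing anything."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(f"__all__ = {SUBMODULES!r}\n")
    for sub in SUBMODULES:
        (pkg / f"{sub}.py").write_text(f"NAME = {sub!r}\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp), "demo_rag_pkg")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        pkg = importlib.import_module("demo_rag_pkg")
        # A plain import runs __init__.py only, so no submodule is loaded yet.
        loaded_before = [s for s in SUBMODULES if hasattr(pkg, s)]
        namespace: dict = {}
        exec("from demo_rag_pkg import *", namespace)  # imports everything in __all__
        loaded_after = [s for s in SUBMODULES if isinstance(namespace.get(s), types.ModuleType)]
    finally:
        sys.path.remove(tmp)

print(loaded_before, loaded_after)
# prints: [] ['document_processor', 'metadata_processor', 'utils']
```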
