Minio library functionalities by JuanMartinP · Pull Request #110 · telefonicasc/etl-framework

JuanMartinP · 2025-10-13T11:41:28Z

…age multiple file operations

fgalan · 2025-10-16T11:13:23Z

Please also include the changes in https://github.com/telefonicasc/etl-framework/blob/master/python-lib/tc_etl_lib/README.md

Specifically:

Document the new classes/methods in https://github.com/telefonicasc/etl-framework/blob/master/python-lib/tc_etl_lib/README.md#funciones-disponibles-en-la-librer%C3%ADa
Add a brief example in the same style as those in https://github.com/telefonicasc/etl-framework/blob/master/python-lib/tc_etl_lib/README.md#uso-de-la-librer%C3%ADa
Add a Changelog entry in https://github.com/telefonicasc/etl-framework/blob/master/python-lib/tc_etl_lib/README.md#changelog

fgalan · 2025-10-16T11:10:35Z

-    'numpy==1.24.4'
+    'numpy==1.24.4',
+    'minio==7.2.7',
+    'pytest-minio-mock==0.4.19'


If it is not a production dependency (that is, the library package can be built and works without it), it is better not to include it here.

Include it in the GitHub Action instead, around here: https://github.com/telefonicasc/etl-framework/blob/master/.github/workflows/unit-testing.yml#L32

I moved it in commit 882c491

fgalan · 2025-10-16T11:12:04Z

+minio_client = minio_manager.initClient()
+
+# Upload test-file.txt to python-test-bucket/output/example.txt
+# Important: the bucket must already exist, so 


"so" ...

Was the sentence perhaps left incomplete?

Yes. The sentence said that the bucket has to be created before uploading the file, but writing it made me think the library should handle that, so I added it to the upload method and apparently left the sentence behind. I will remove it.

fgalan · 2025-10-17T07:39:23Z

+                               processing_method=print)
+
+# You can define your own custom processing method and use it in the processing_method argument of the getProcessedFile method
+def customProcessingMethod(file_chunk):
+    # code to apply to the chunk of the file or to locally save the file
+


I think it would be more illustrative to use a custom method directly, one that does something simple (in fact, it can just be print). Something along these lines:

minio_manager.getProcessedFile(minio_client, bucket_name='python-test-bucket', destination_file='/output/example.txt', chunk_size=3, processing_method=process_chunk)

with process_chunk() defined like this:

def process_chunk(file_chunk): print(file_chunk)
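
For illustration only, the chunk-and-callback idea suggested above can be sketched without a MinIO server. `process_stream` is a hypothetical helper, not the library's method, and `io.BytesIO` stands in for whatever readable object the client returns; only `process_chunk` and the `chunk_size=3` value come from the comment.

```python
import io


def process_stream(stream, processing_method, chunk_size=3):
    """Read `stream` in pieces of at most `chunk_size` bytes and hand each piece to `processing_method`."""
    while True:
        file_chunk = stream.read(chunk_size)
        if not file_chunk:  # an empty bytes object marks the end of the stream
            break
        processing_method(file_chunk)


def process_chunk(file_chunk):
    print(file_chunk)


# io.BytesIO is an assumption standing in for the object stored in MinIO.
process_stream(io.BytesIO(b"a,b\n1,2\n"), process_chunk, chunk_size=3)
```

Note that chunk boundaries are byte offsets, so a chunk can end in the middle of a CSV row; a processing method that parses rows would need to buffer the incomplete tail itself.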

fgalan · 2025-10-17T07:40:00Z

+    - :param required `secret_key`: password needed to log in to MinIO
+  - `initClient`: initializes a MinIO client
+    - :return: authenticated MinIO client.
+  - `createBucket`: checks whether the bucket exists and creates it if it does not.


And if it already exists, does it do nothing or raise an error?

fgalan · 2025-10-17T07:40:11Z

+  - `createBucket`: checks whether the bucket exists and creates it if it does not.
+    - :param required `client`: MinIO client.
+    - :param required `bucket_name`: name of the bucket to create.
+  - `removeBucket`: checks whether the bucket exists and deletes it if it does.


And if it does not exist, does it do nothing or raise an error?
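
One possible answer to both questions is a "do nothing" (idempotent) behavior. The sketch below is an assumption, not the code merged in this PR: the functions take any client exposing `bucket_exists`, `make_bucket` and `remove_bucket` (names that mirror the MinIO Python client), and the boolean return values and `InMemoryClient` are hypothetical.

```python
def create_bucket(client, bucket_name):
    """Create the bucket only when it is missing; return True if it was created."""
    if client.bucket_exists(bucket_name):
        return False  # already there: nothing to do, no error
    client.make_bucket(bucket_name)
    return True


def remove_bucket(client, bucket_name):
    """Remove the bucket only when it exists; return True if it was removed."""
    if not client.bucket_exists(bucket_name):
        return False  # nothing to remove, no error
    client.remove_bucket(bucket_name)
    return True


class InMemoryClient:
    """Hypothetical stand-in exposing the three bucket calls used above."""

    def __init__(self):
        self.buckets = set()

    def bucket_exists(self, bucket_name):
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name):
        self.buckets.add(bucket_name)

    def remove_bucket(self, bucket_name):
        self.buckets.remove(bucket_name)
```

Whichever behavior is chosen, stating it in the README entry removes the ambiguity raised in the comments.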

fgalan · 2025-10-17T07:41:34Z


 ## Changelog

+0.17.0 (October 16th, 2025)


Suggested change

0.17.0 (October 16th, 2025)

Do not mark the version as closed for now (although we will very likely close a version with this feature, more things may get in first).

fgalan · 2025-10-17T07:42:18Z

 ## Changelog

+0.17.0 (October 16th, 2025)
+- Add: new class `minioManager` to manage MinIO connection and file processing


When there is an associated issue, we include its reference. See how it is done in other entries of the existing Changelog.

fgalan · 2025-10-17T07:42:47Z

+  - `removeBucket`: checks whether the bucket exists and deletes it if it does.
+    - :param required `client`: MinIO client.
+    - :param required `bucket_name`: name of the bucket to delete.
+  - `uploadFile`: uploads a file to MinIO. If the target bucket does not exist, it is created first.


If the file already exists, does it do nothing or raise an error?
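
The thread does not record the answer. As a rough sketch under stated assumptions: `upload_file` below is a hypothetical helper that creates the bucket when missing and then calls `fput_object` (a name that mirrors the MinIO Python client). `InMemoryClient` is a stand-in that replaces an existing object, which matches how S3-compatible stores normally treat a write to an existing key in an unversioned bucket, but it is not confirmed for this library.

```python
import os
import tempfile


def upload_file(client, bucket_name, destination_file, source_file):
    """Upload `source_file`, creating the bucket first when it is missing."""
    if not client.bucket_exists(bucket_name):
        client.make_bucket(bucket_name)
    client.fput_object(bucket_name, destination_file, source_file)


class InMemoryClient:
    """Hypothetical stand-in; replacing an existing object is an assumption."""

    def __init__(self):
        self.buckets = {}

    def bucket_exists(self, bucket_name):
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name):
        self.buckets[bucket_name] = {}

    def fput_object(self, bucket_name, object_name, file_path):
        with open(file_path, "rb") as source:
            self.buckets[bucket_name][object_name] = source.read()


def demo():
    """Upload the same object name twice and return what ends up stored."""
    client = InMemoryClient()
    with tempfile.TemporaryDirectory() as tmp_dir:
        source_file = os.path.join(tmp_dir, "test-file.txt")
        for content in (b"first", b"second"):
            with open(source_file, "wb") as handle:
                handle.write(content)
            upload_file(client, "python-test-bucket", "output/example.txt", source_file)
    return client.buckets["python-test-bucket"]["output/example.txt"]
```

If silent replacement is not desired, the upload method would have to check for the object first and raise or return a status, and the README entry should say which.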

fgalan · 2025-10-17T07:43:06Z

+    - :param required `client`: MinIO client.
+    - :param required `bucket_name`: name of the bucket where the file will be looked up.
+    - :param required `destination_file`: name of the file in MinIO (may include the path, WITHOUT the bucket name at the start).
+    - :param required `chunk_size`: size in bytes of each fragment of the file to retrieve.


Maybe it should be optional, with a reasonable default chunk_size.

I imagine MinIO will mostly be used for large files. Shall I set the default to, for example, 500000 (500 kB / 0.5 MB)?

I have no strong opinion, but 500 kB sounds good.

@arcosa @AlvaroVega what do you think?

500 kB seems reasonable for all kinds of files without hurting performance much.
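
To put the proposed default in perspective, a small sketch of the arithmetic: `DEFAULT_CHUNK_SIZE` and `chunk_count` are hypothetical names, and only the 500000-byte value comes from the thread. The function counts how many times the processing method would be called for a file of a given size.

```python
import math

DEFAULT_CHUNK_SIZE = 500_000  # bytes; the default value discussed in the thread


def chunk_count(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of chunks (callback invocations) needed to cover `file_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    return math.ceil(file_size / chunk_size)


# A 10 MB file is processed in 20 chunks with the default size.
print(chunk_count(10_000_000))
```

A smaller chunk size lowers peak memory per call but increases the number of callback invocations, which is the trade-off behind picking a middle value.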

fgalan · 2025-10-17T07:43:52Z

+  - `getProcessedFile`: processes a MinIO file in fragments and applies the provided function to each fragment.
+    - :param required `client`: MinIO client.
+    - :param required `bucket_name`: name of the bucket where the file will be looked up.
+    - :param required `destination_file`: name of the file in MinIO (may include the path, WITHOUT the bucket name at the start).


Here the file is really a source of data to process, not a destination.

I would keep it simple and just call it "file".

fgalan · 2025-10-17T07:46:01Z

+    - :param required `destination_file`: name of the file in MinIO (may include the path, WITHOUT the bucket name at the start).
+    - :param required `chunk_size`: size in bytes of each fragment of the file to retrieve.
+    - :param required `processing_method`: method to apply to each fragment of the file.



I see that every method takes client as a parameter. Is that really necessary? My understanding is that the client belongs to the minioManager itself. The other managers (cbManager, iotaManager) do not take a client as a parameter in their calls.

Also, we could make life simpler for the user by not requiring a call to initClient. It could be done when the manager is constructed, or when the first method that needs it is invoked.
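
The second option (create the client on first use) can be sketched as a lazily initialized property. Everything here is hypothetical: `LazyClientManager`, `client_factory` and `FakeClient` are illustrative names, not the PR's classes. With the real MinIO Python client, the factory would be whatever builds the authenticated client from the endpoint and credentials.

```python
class LazyClientManager:
    """Hypothetical manager that owns its client and builds it on first use."""

    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._client = None  # not created until a method needs it

    @property
    def client(self):
        if self._client is None:
            self._client = self._client_factory()
        return self._client

    def create_bucket(self, bucket_name):
        # No `client` parameter: the manager uses its own client.
        if not self.client.bucket_exists(bucket_name):
            self.client.make_bucket(bucket_name)


class FakeClient:
    """Stand-in exposing only the calls used above."""

    def __init__(self):
        self.buckets = set()

    def bucket_exists(self, bucket_name):
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name):
        self.buckets.add(bucket_name)


created_clients = []


def fake_client_factory():
    client = FakeClient()
    created_clients.append(client)
    return client


manager = LazyClientManager(fake_client_factory)
manager.create_bucket("python-test-bucket")
manager.create_bucket("python-test-bucket")  # reuses the same client
```

Creating the client in the constructor instead is simpler, at the cost of failing early when the manager is built rather than when it is first used.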

fgalan · 2025-10-17T07:46:58Z

+
+# Custom method that writes the file chunks to a CSV (it receives and writes bytes)
+def customCSVProcessingMethod(file_chunk):
+    fichero_procesado = open("salida.csv", "ab")


fichero_procesado

Better to use English for all names.

(I do not know whether there are more cases like this one; please check.)
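
As an illustration of the naming suggestion, here is a sketch of a chunk writer with English names. The diff above is cut off after the `open(...)` call, so the body is an assumption; `append_chunk_to_file` and `output.csv` are hypothetical names, and the `with` block is added so the file is closed after every chunk.

```python
from functools import partial


def append_chunk_to_file(output_path, file_chunk):
    """Append one chunk of bytes to `output_path`, creating the file if needed."""
    with open(output_path, "ab") as processed_file:
        processed_file.write(file_chunk)


# Bind the output path so the result takes a single chunk argument,
# which is the shape expected of a processing method.
custom_csv_processing_method = partial(append_chunk_to_file, "output.csv")
```

Because the file is opened in append mode, running the same job twice appends the data twice; a caller would need to delete or truncate the output file between runs.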

fgalan · 2025-10-20T12:49:48Z

+    - :param required `bucket_name`: name of the bucket where the file will be looked up.
+    - :param required `file`: name of the file in MinIO (may include the path, WITHOUT the bucket name at the start).
+    - :param required `processing_method`: method to apply to each fragment of the file.
+    - :param optional `chunk_size`: size in bytes of each fragment of the file to retrieve.


Specify the default value (similar to the previous comment).

Changed in commit fe9d4ff

Co-authored-by: Fermín Galán Márquez <fgalan@users.noreply.github.com>

arcosa · 2025-10-20T16:14:20Z

To follow the same naming as the other modules that make up the library (auth/cb/iot/minio), I would rename minioManager.py to minio.py and adjust __init__.py accordingly. minio.py would then contain the manager and any other MinIO-related classes it might need.

JuanMartinP · 2025-10-21T10:33:02Z

To follow the same naming as the other modules that make up the library (auth/cb/iot/minio), I would rename minioManager.py to minio.py and adjust __init__.py accordingly. minio.py would then contain the manager and any other MinIO-related classes it might need.

Changed in commit d985e5a

arcosa

LGTM, and I hand over to @fgalan

fgalan · 2025-10-21T14:22:28Z

    - :param required: `data`: Data to send. It can be a list of dictionaries or a DataFrame.
    - :raises SendBatchError: Raised when an exception occurs inside `send_http`. It catches the original exception, and the index where the error occurred is stored and printed.

+- Class `minioManager`: this class contains the functions related to the MinIO object storage solution.


A style matter.

Looking at the style used in this library, I would say that manager names are in camelCase (so minioManager is fine) but methods are in snake_case.

In this case, change to

create_bucket

remove_bucket

etc.

In the documentation, the tests, the examples, and so on.

(It should be fairly easy with a "Replace All" in the IDE)

Changed in commit 1c8ec6c
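
For reference, the camelCase to snake_case mapping requested above can be expressed as a small helper. This is only a sketch for checking names, not a tool used in the PR; `camel_to_snake` is a hypothetical name.

```python
import re


def camel_to_snake(name):
    """Insert an underscore before every uppercase letter except the first character, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for method_name in ("createBucket", "removeBucket", "uploadFile", "getProcessedFile"):
    print(method_name, "->", camel_to_snake(method_name))
```

The rule splits before every capital letter, so names containing acronyms (for example `customCSVProcessingMethod`) come out with one underscore per letter and need a manual fix.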

Co-authored-by: Fermín Galán Márquez <fgalan@users.noreply.github.com>

fgalan

LGTM

JuanMartinP added 14 commits October 13, 2025 13:30

add minio to dependencies

df619d1

add minio to index and library

aec8272

minio test etl

d47b943

changed class attributes to method parameters to allow library to man…

a68c5f4

…age multiple file operations

added getProcessedFile logic and removeBucket

ab43498

Use case of the library

da6f926

reorder methods

6cb3a92

add client as getProcessedFile parameter instead of local initialization

db95c56

change file name minio to minioManager to avoid import conflicts

fc1bff1

pytest-minio dependency and tests

727e44a

example string credentials

0daa4ec

uploadFile now returns status, control cases in getProcessedFile

f48f109

change import in etl

99dc27c

remove http flag

5cdb282

JuanMartinP requested a review from fgalan October 16, 2025 11:03

fgalan requested changes Oct 16, 2025

JuanMartinP added 3 commits October 16, 2025 17:21

move test dependecy from setup.py to unit-testing

882c491

remove leftover comment

29d7322

documentation, change me and examples

ea3133b

JuanMartinP requested a review from fgalan October 16, 2025 16:50

fgalan reviewed Oct 17, 2025

arcosa reviewed Oct 17, 2025

Comment thread python-lib/tc_etl_lib/tc_etl_lib/minioManager.py Outdated

Comment thread python-lib/tc_etl_lib/tc_etl_lib/minioManager.py Outdated

JuanMartinP added 5 commits October 17, 2025 12:18

added client as class property to auto initialize it

f880365

names, doc and optional argument for getProcessedFile method

dfc7ce7

changed prints for logger.debug and exceptions

34787c1

change name: getProcessedFile to processFile

c4f4e65

fix tests

71ac702

JuanMartinP requested a review from fgalan October 20, 2025 10:11

JuanMartinP added 2 commits October 20, 2025 12:33

add optional value to method documentation

d209992

add optional secure flag to client initialitation

24be586

fgalan reviewed Oct 20, 2025

Comment thread python-lib/tc_etl_lib/README.md Outdated

fgalan reviewed Oct 20, 2025

JuanMartinP and others added 2 commits October 20, 2025 15:06

Update python-lib/tc_etl_lib/README.md

479c410

Co-authored-by: Fermín Galán Márquez <fgalan@users.noreply.github.com>

doc: chunk_size optional value

fe9d4ff

change file name from minioManager to Minio

d985e5a

arcosa reviewed Oct 21, 2025

fgalan requested changes Oct 21, 2025

JuanMartinP and others added 3 commits October 21, 2025 16:37

Apply suggestions from code review

93b9da9

Co-authored-by: Fermín Galán Márquez <fgalan@users.noreply.github.com>

snake case for methods

1c8ec6c

remnant from name change

1bc91ec

JuanMartinP requested a review from fgalan October 22, 2025 07:00

fgalan reviewed Oct 22, 2025

Comment thread python-lib/tc_etl_lib/README.md Outdated

Update python-lib/tc_etl_lib/README.md

14819e3

fgalan approved these changes Oct 22, 2025

fgalan merged commit add632b into master Oct 22, 2025
5 checks passed

fgalan deleted the minio-library-functionalities branch October 22, 2025 07:49

fgalan mentioned this pull request Oct 22, 2025

Procesamiento de CSV almacenados en Minio #109

Closed

4 participants
