Conversation
…-ai/refinery-gateway into cognition-integration-provider
|
Checked further on folder permissions for an external user (jens.wittmeyer@kern.ai). Items in folder without permission: not sure how easy it would be to collect an email from that, but it would potentially be needed to double-check the user's access. I did the same with my private email, with the same result: the same UUID (so not every user gets a new UUID), so I'm assuming there are some steps involved in the process to get the emails. Also, I never got an actual invite link via email, but that is a SharePoint issue, not ours.
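If the UUID SharePoint reports does turn out to be an Azure AD object id, Microsoft Graph could resolve it to an email address. A hypothetical sketch of that lookup (the helper name, the token handling, and the assumption about what the UUID is are all mine, not confirmed from the thread):

```python
# Hypothetical helper: IF the UUID SharePoint reports is actually an Azure AD
# object id, Microsoft Graph can resolve it to the user's email address.
GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def user_lookup_request(user_id_or_upn, token):
    """Build the GET request that resolves an AAD object id (or UPN) to a user;
    the response's 'mail' property holds the primary email address."""
    url = f"{GRAPH_BASE}/users/{user_id_or_upn}?$select=mail,userPrincipalName"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

url, headers = user_lookup_request("00000000-0000-0000-0000-000000000000", "<token>")
```

Whether the permission UUID maps onto an AAD object id at all would need to be verified first; if it is a SharePoint-local id, a different lookup would be required.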
|
…ode-kern-ai/refinery-gateway into cognition-integration-provider
|
Something is problematic when I start unexcluded. I'm not sure what, but in the docker compose setup I always get some kind of database error during execution: Error 1, Error 2. The PGRES_TUPLES_OK error seems to be related to the process pool working wrongly with the session object (at least according to ChatGPT :D).
GPT answer:
The root of what you're seeing is actually two different but related problems:
1. "number of values in row (20) differ from number of column processors (9)"
This comes straight out of SQLAlchemy's row-mapping machinery: the ORM class maps only 9 columns, while the rows coming back from the table carry 20.
How to verify and fix
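One way to verify the model ←→ table alignment is SQLAlchemy's runtime inspector; a minimal sketch (the `Document` model here is a hypothetical stand-in for the real extraction model, and SQLite stands in for Postgres):

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Hypothetical stand-in for the real extraction model; names are illustrative.
class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)
    name = Column(String)

def column_diff(engine, model):
    """Return (columns only in the DB table, columns only in the model)."""
    db_cols = {c["name"] for c in inspect(engine).get_columns(model.__tablename__)}
    model_cols = set(model.__table__.columns.keys())
    return db_cols - model_cols, model_cols - db_cols

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
# Both sets empty => model and table agree; a non-empty first set is exactly
# the "values in row differ from column processors" situation.
print(column_diff(engine, Document))
```

Running this against the real engine and model would show which of the 20 table columns the 9-column ORM class is missing.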
After you’ve aligned model ←→ table, that 20-vs-9 error will go away.
2. "psycopg2.DatabaseError: error with status PGRES_TUPLES_OK and no message from the libpq" in child processes
This one is a classic side-effect of forked workers (e.g. via multiprocessing) inheriting the parent's open connection pool. The cure is: make sure each worker process creates its own engine/session, and never inherits an open pool from the parent. Two straightforward strategies:
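A minimal sketch of the per-worker-engine strategy, using a `Pool` initializer so that each child builds its own engine after the fork (SQLite stands in for Postgres here, and the function names are illustrative):

```python
import multiprocessing as mp

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

_Session = None  # populated per worker process, never inherited from the parent

def init_worker(db_url):
    # Runs once inside each child: the engine (and, for Postgres, its libpq
    # sockets) is created after the fork, so nothing is shared with the parent.
    global _Session
    engine = create_engine(db_url, pool_pre_ping=True)
    _Session = sessionmaker(bind=engine)

def run_query(value):
    with _Session() as session:
        return session.execute(text("SELECT :v"), {"v": value}).scalar()

if __name__ == "__main__":
    with mp.Pool(2, initializer=init_worker, initargs=("sqlite://",)) as pool:
        print(pool.map(run_query, [1, 2, 3]))  # prints [1, 2, 3]
```

The alternative strategy is to keep a parent engine but call `engine.dispose()` in each child immediately after the fork, so inherited connections are discarded before first use.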
Either of those approaches will eliminate the "PGRES_TUPLES_OK" commit-without-message error.
Putting it all together
Once those are both addressed, your SharePoint extraction should proceed cleanly:
Let me know if you hit any snags while reconciling your schema or refactoring your multiprocessing setup!
I'll work on this (jwittmeyer).
|
|
I think we can currently remove language from the calculated attributes, since it's calculated from the summary, which we force to be in the language of the tokenizer.
|
|
Filter merging with access management doesn't work (e.g. filtering by extension + ANNOTATOR user). I'll look into it.
|




This is the main PR.
Related PRs
Notes
New repository
Important
Retrieve:
Tenant and Client IDs: Azure Portal (@kern.ai) => Key Vaults => dev-krn-accompio => Objects => Secrets => O365-TENANT-ID and O365-UNATTENDED-CLIENT-ID
Certificate .pfx file: Azure Portal (@kern.ai) => Key Vaults => dev-krn-admin => Objects => Certificates => dev-accompio-certificate => Download in PFX/PEM format. The passphrase is an empty string; use " " (a single space) as the environment variable value (reference).
Document Library ID: b!zzhsLojhuEaDy3fQIjUZxLC67xk1l9lFpzlKVAQ1-uDYfIQ6DMGYQrkemTRS4V0Q
Use dev-setup@cognition-integration-provider to run cognition-integration-provider (bash start -a -b cognition-integration-provider).
Tests
Tests were not developed for this container due to long-running extraction and transformation tasks.
Affected areas
dev-setup, deployment-cognition, deployment-managed-cognition, refinery-submodule-model, cognition-task-master, admin-dashboard, cognition-ui, refinery-ui, refinery-gateway, cognition-gateway, cognition-integration-provider
Performance
MP - multiprocessing (# workers)

SP - single processing