Teradata Source and Destination Connector#606

Merged
potter-potter merged 15 commits into main from potter/teradata
Jan 7, 2026
Conversation

potter-potter (Contributor) commented Nov 14, 2025

Note

Introduces Teradata support across the ingest framework.

  • New teradata connector: TeradataConnectionConfig, TeradataIndexer, TeradataDownloader, TeradataUploader, TeradataUploadStager in processes/connectors/sql/teradata.py
  • Handles Teradata specifics: quoted identifiers for reserved words, TOP 1 instead of LIMIT, qmark (?) parameter style, and JSON-serializing list/dict columns in the stager
  • Registers entries in processes/connectors/sql/__init__.py for both source and destination
  • Adds extras dependency teradata in pyproject.toml pointing to requirements/connectors/teradata.txt (includes pandas, teradatasql)
  • Comprehensive unit tests in test/unit/connectors/sql/test_teradata.py
  • Updates CHANGELOG.md; bumps __version__ to 1.2.32
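
The Teradata-specific handling listed above can be illustrated with a minimal sketch. This is not the code in teradata.py: the helper names (quote_identifier, build_insert, build_probe, serialize_value) and the "elements" table are hypothetical, and the sketch quotes every identifier unconditionally rather than checking a reserved-word list.

```python
import json


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_insert(table: str, columns: list[str]) -> str:
    """Build an INSERT statement using qmark (?) placeholders."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_identifier(table)} ({cols}) VALUES ({placeholders})"


def build_probe(table: str) -> str:
    """Fetch a single row with TOP 1, since Teradata has no LIMIT clause."""
    return f"SELECT TOP 1 * FROM {quote_identifier(table)}"


def serialize_value(value):
    """JSON-serialize list and dict values; pass everything else through."""
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value


# Hypothetical table and columns, for illustration only.
stmt = build_insert("elements", ["id", "type", "languages"])
row = [serialize_value(v) for v in ("abc", "Title", ["eng"])]
```

Quoting every identifier is a simplification: it avoids maintaining a reserved-word list, at the cost of noisier SQL.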

Written by Cursor Bugbot for commit a994f21.

potter-potter (Contributor, Author) commented Nov 18, 2025

I hit an encoding error when trying to add the winter-sports.epub sample doc. It looks like we can add charset: "UTF8" to the conn_params, but I'm not sure how we usually address this with other connectors.

From cursor:

I checked our SQL connectors; none explicitly set client encoding. Postgres “just works” because the stack is Unicode by default.
Postgres: connection has no charset setting; psycopg negotiates client_encoding to the DB’s encoding (commonly UTF8), and text/varchar store full Unicode.
postgres.py, lines 58-66:
    connection = connect(
        user=self.username,
        password=access_config.password,
        dbname=self.database,
        host=self.host,
        port=self.port,
    )
SQLite: Python’s sqlite3 passes Python str as Unicode; SQLite stores TEXT as UTF‑8/UTF‑16.
Snowflake: NVARCHAR/VARIANT are Unicode; connector handles UTF‑8 by default.
SingleStore: modern drivers default to utf8mb4 unless overridden.
Teradata is the outlier: if the session is LATIN or columns are CHAR/VARCHAR … CHARACTER SET LATIN, Unicode characters (curly quotes, emoji) trigger error 6706. So we need to (a) set the Teradata session charset to UTF‑8 and (b) ensure target columns are UNICODE.
2025-11-18 16:14:48,937 MainProcess ERROR    Exception raised while running upload
Traceback (most recent call last):
  File "/Users/austin/repos/unstructured-ingest/unstructured_ingest/pipeline/interfaces.py", line 171, in run_async
    return await self._run_async(fn=fn, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/austin/repos/unstructured-ingest/unstructured_ingest/pipeline/steps/upload.py", line 53, in _run_async
    fn(**fn_kwargs)
  File "/Users/austin/repos/unstructured-ingest/unstructured_ingest/processes/connectors/sql/sql.py", line 456, in run
    self.upload_dataframe(df=df, file_data=file_data)
  File "/Users/austin/repos/unstructured-ingest/unstructured_ingest/processes/connectors/sql/teradata.py", line 226, in upload_dataframe
    cursor.executemany(stmt, values)
  File "/Users/austin/repos/unstructured-ingest/.venv/lib/python3.12/site-packages/teradatasql/__init__.py", line 1054, in executemany
    raise OperationalError (sErr)
teradatasql.OperationalError: [Version 20.0.0.47] [Session 1269] [Teradata Database] [Error 6706] The string contains an untranslatable character.

Oh, good find. Teradata is a real stickler for this kind of thing. This seems like something to warn on. I believe UNICODE would need to be set on the table at creation, so that is user-side setup that we would call out in the documentation.
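
A rough sketch of what such a warning and the user-side table setup might look like. Everything here is an assumption for illustration: the function names, the "elements" table, and the column sizes are hypothetical, and Python's latin-1 codec is used only as an approximation of Teradata's LATIN character set, which is not identical to ISO 8859-1.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical DDL a user might run so that text columns accept Unicode.
EXAMPLE_DDL = """
CREATE TABLE "elements" (
    "id" VARCHAR(64) CHARACTER SET UNICODE,
    "text" VARCHAR(32000) CHARACTER SET UNICODE
)
"""


def find_non_latin_columns(row: dict) -> list[str]:
    """Return the columns whose string value cannot be encoded as latin-1.

    Approximation only: Teradata LATIN differs from ISO 8859-1 in places.
    """
    flagged = []
    for column, value in row.items():
        if not isinstance(value, str):
            continue
        try:
            value.encode("latin-1")
        except UnicodeEncodeError:
            flagged.append(column)
    return flagged


def warn_on_non_latin(rows: list[dict]) -> int:
    """Log a warning for each row with non-Latin text; return the row count."""
    count = 0
    for index, row in enumerate(rows):
        columns = find_non_latin_columns(row)
        if columns:
            count += 1
            logger.warning(
                "Row %d has characters outside LATIN in columns %s; "
                "the target columns may need CHARACTER SET UNICODE.",
                index,
                columns,
            )
    return count


rows = [
    {"id": "1", "text": "plain ascii"},
    {"id": "2", "text": "\u201ccurly quotes\u201d"},
]
flagged_rows = warn_on_non_latin(rows)
```

A check like this only warns; it does not change what Teradata accepts, so the columns still have to be created as UNICODE on the user's side.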
