@sammingasainath

Pull Request Type

  • ✨ feat

Relevant Issues

resolves #xxx (Please replace with actual issue number if exists)

What is in this change?

This PR adds Google Docs integration and synchronization functionality to AnythingLLM. Key features include:

  1. Google Docs Connection & Authentication

    • Support for connecting Google Docs as a document source
    • Secure handling of Google Doc authentication
  2. Document Synchronization

    • Automatic syncing of Google Docs content
    • Real-time updates when documents change
    • Proper handling of document IDs and metadata
    • Vector store integration for synchronized content
  3. Workspace Integration

    • Documents are properly associated with workspaces
    • Namespace management for vector operations
    • Queue system for periodic document updates
  4. Error Handling & Logging

    • Robust error handling for sync operations
    • Detailed logging for debugging and monitoring
    • Graceful fallback mechanisms
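The queue for periodic document updates could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `QueuedDoc` shape, its field names, and the `dueForSync` and `markSynced` helpers are all assumptions made for the example.

```typescript
// Hypothetical shape of a queued document; all field names are assumptions.
interface QueuedDoc {
  docId: string;
  workspaceId: string;
  lastSyncedAt: number; // epoch milliseconds of the last successful sync
  syncIntervalMs: number; // configurable interval for this document
}

// Return the documents whose sync interval has elapsed, oldest sync first.
function dueForSync(queue: QueuedDoc[], now: number): QueuedDoc[] {
  return queue
    .filter((doc) => now - doc.lastSyncedAt >= doc.syncIntervalMs)
    .sort((a, b) => a.lastSyncedAt - b.lastSyncedAt);
}

// Return a copy marked as synced so it is skipped until its next interval.
function markSynced(doc: QueuedDoc, now: number): QueuedDoc {
  return { ...doc, lastSyncedAt: now };
}
```

A background worker would call `dueForSync` on a timer, re-fetch each returned document, and store the result of `markSynced` back in the queue.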

Additional Information

The implementation includes:

  • Support for various Google Doc ID formats
  • Automatic vector store namespace initialization
  • Document metadata preservation during syncs
  • Background sync queue management
  • Configurable sync intervals
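As an illustration of handling more than one Google Doc ID format, the sketch below accepts either a full document URL or a bare ID. The function name `extractGoogleDocId` and the 20-character minimum for bare IDs are assumptions for this example, not details taken from the PR.

```typescript
// Extract a Google Doc ID from a full URL or accept a bare ID.
// Returns null when the input matches neither form.
function extractGoogleDocId(input: string): string | null {
  const trimmed = input.trim();

  // URL form: https://docs.google.com/document/d/<ID>/edit...
  const urlMatch = trimmed.match(/\/document\/d\/([A-Za-z0-9_-]+)/);
  if (urlMatch) return urlMatch[1] ?? null;

  // Bare ID form: assumed to be a long run of URL-safe characters.
  // The minimum length of 20 is a guess used only for this sketch.
  if (/^[A-Za-z0-9_-]{20,}$/.test(trimmed)) return trimmed;

  return null;
}
```

Normalizing to a single ID early means the sync and vector-store code only ever has to deal with one identifier format.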

Developer Validations

  • I ran yarn lint from the root of the repo & committed changes
  • [ ✔] Relevant documentation has been updated
  • [ ✔] I have tested my code functionality
  • Docker build succeeds locally

@timothycarambat
Member

Needs to have head pulled up so it is up to date.

@sammingasainath
Author

Thank you, I will do that and update here. 😊🧑‍💻

@dev3py

dev3py commented Mar 27, 2025

@sammingasainath Any updates on when this will be live?

@timothycarambat
Member

So there are a lot of things blocking this PR for now:

  • The formatting of the code doesn't really fit into our layout right now
    -- What are doc.pdf and docs, and why are these endpoints outside of the traditional server folder?
    -- There are several dev.js files that seem unrelated to the core functionality here
    -- Lots of frontend changes just to render GDoc content

There are also 58 files in this PR, which is a lot in and of itself, but with the layout being spread all through the app it is even harder to reason about. The overall footprint of this PR is really large, with a lot of conditional logic specifically for Google Docs. That would be fine if all we did was Google Docs, but as you can imagine we do lots of docs and will do more, so building a lot of conditional logic for GDocs is not a good long-term solution. Ultimately, I have to maintain this, since once your PR is merged you can just walk away.

It would make more sense to coerce GDoc responses into a standard format so it looks like other docs, as opposed to modifying the code across the server/collector/frontend to make a GDoc work.

The docs for setting this up are nice to have. I know sending people to the cloud console is really not ideal because of OAuth and all the tricky tradeoffs that come with that when it comes to your average person trying to accomplish this. It might be the only way to do this though.
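The idea of coercing GDoc responses into a standard format might look roughly like the sketch below. Both interfaces are hypothetical: `StandardDocument` is not the project's actual document schema, and `GoogleDocPayload` is a simplified stand-in for whatever the Google Docs fetch returns.

```typescript
// Hypothetical common shape shared by every document source.
interface StandardDocument {
  id: string;
  title: string;
  url: string;
  source: string;
  pageContent: string;
  wordCount: number;
}

// Hypothetical, simplified view of a fetched Google Doc.
interface GoogleDocPayload {
  documentId: string;
  title: string;
  text: string;
}

// Convert a Google Doc into the common shape so downstream code
// (embedding, storage, rendering) needs no GDoc-specific branches.
function fromGoogleDoc(doc: GoogleDocPayload): StandardDocument {
  const pageContent = doc.text.trim();
  return {
    id: `gdoc-${doc.documentId}`,
    title: doc.title,
    url: `https://docs.google.com/document/d/${doc.documentId}`,
    source: "google-docs",
    pageContent,
    wordCount: pageContent === "" ? 0 : pageContent.split(/\s+/).length,
  };
}
```

With an adapter like this at the boundary, the Google-specific logic stays in one place and the rest of the pipeline treats the result like any other document.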

@sammingasainath
Author

sammingasainath commented May 7, 2025

Thank you @timothycarambat, I am grateful for the feedback. I tried hard to maintain the same structure, but things were not going as planned, so I had to take this approach. I am an undergrad. If you feel that I have done something worthwhile (even though I failed to contribute to the repo), please refer me to a good internship where I can polish my skills. Thank you. My email ID: xxx

And yes, I will definitely try my best to rewrite the code for Google Docs and do better this time.

@timothycarambat
Member

@sammingasainath you have my absolute gratitude for even chipping away at this. Without question the work you laid out here will be used in some way, shape or form, and we will put you in the PR or have some way to give credit.

In the past, I actually thought about making a custom Google Apps Script, since it can handle the OAuth and stuff easily and act as an API to also get content from all of GDrive (Docs, Sheets + Drive). The OAuth way you did it is probably the best and most stable, but believe me when I say the regular person who has a Drive will be lost at step 2, making an app :(

Either we can do this and make some really awesome step-by-step docs, or find something even easier! None of the work you did was wrong, it would just be harder to maintain and had a bunch of conflicts. If it worked as it was, that was like 90% of the hard part.

@sammingasainath
Author

Thank you so much @timothycarambat — I really appreciate your kind words. I'm glad to know the work could still contribute in some way.

Even though it didn’t fully land this time, I learned a lot while working through it — especially about OAuth, architecture tradeoffs, and how important maintainability is in real-world projects.

I’d love to stay involved and continue contributing where I can. Your support and thoughtful feedback genuinely mean a lot, especially at a time when I’ve been questioning whether I’m really a good problem solver. Even when we are confident in ourselves, it becomes difficult over time if no one recognizes our efforts or talent.

Thank you once again for taking the time to review and respond so generously — it truly means a lot.
