Skip to content

Cumulative translation#148

Merged
yihong0618 merged 37 commits into
yihong0618:mainfrom
sunyuechi:cumulative-translation
Mar 16, 2023
Merged

Cumulative translation#148
yihong0618 merged 37 commits into
yihong0618:mainfrom
sunyuechi:cumulative-translation

Conversation

@sunyuechi

@sunyuechi sunyuechi commented Mar 11, 2023

Copy link
Copy Markdown
Collaborator

Wait for how many tokens have been accumulated before starting the translation

--accumulated_num num

The strange part: I don't know how to judge how the cumulative translation results correspond to the original paragraph, so use sep = "\n\n\n\n\n", if sep="\n\n", it is often translated into the same line

A possible bug is that even if \n\n\n\n\n, it may still be translated to the same line, and then there will be problems where the translation appears, this will only affect the last num characters, after which it will return to normal

A good effect is that it can understand the context more, and the prompt is more effective

I have used various prompts before to ask it not to translate people's names, but to no avail, with context it works(It will most likely work, and in a few cases it will still translate, but it can be translated to the same name)

name2

left:

time OPENAI_API_SYS_MSG="You are an assistant who translates computer technology books but don't translate people's names, there is a blank line after each of your translation results" python3 "make_book.py" --book_name "$filepath" --openai_key "${openai_apikey_book}" --language "zh-hans" --accumulated_num 1500

right:

time python3 "make_book.py" --book_name "$filepath" --openai_key "${openai_apikey_book}" --language "zh-hans"

Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/loader/epub_loader.py Outdated
Comment thread book_maker/translator/chatgptapi_translator.py Outdated
@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 8144fdd to 6ef21e7 Compare March 11, 2023 14:14
Comment thread book_maker/loader/epub_loader.py Outdated
@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 6ef21e7 to 2baf359 Compare March 11, 2023 14:21
@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 4eecaa6 to 3523469 Compare March 11, 2023 16:08
@yihong0618

Copy link
Copy Markdown
Owner

Will take a look later for this PR.

@yihong0618

Copy link
Copy Markdown
Owner

the function part let me do it is OK, when I got time

@sunyuechi

Copy link
Copy Markdown
Collaborator Author

@yihong0618 Ok, I'm trying to get a better prompt so that it can translate the same number of paragraphs as the original paragraphs, and still often have problems

@yihong0618

Copy link
Copy Markdown
Owner

@hleft FYI, I want to keep the original prompt as default because of that it costs the least tokens cc @ConanChou, most users do not care how good of all, they just want it to help them read, so users can DIY their prompt but we do not change the default one unless we found a better one better for translate and for token cost.

@sunyuechi

Copy link
Copy Markdown
Collaborator Author

@yihong0618 I mean only modify the default prompt when --accumulated_num is enabled, because if don't modify it, you will get an error and not just a bad translation (the translated result appears in the wrong place).

@yihong0618

Copy link
Copy Markdown
Owner

yep thanks~

@sunyuechi

sunyuechi commented Mar 12, 2023

Copy link
Copy Markdown
Collaborator Author

Still in testing, the current main progress is that I understand that it is not enough to just modify the prompt and retry, and some special cases must be manually handled, such as <sup>, Source: xxx link

@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 8e669d6 to 8badb41 Compare March 12, 2023 11:30
@sunyuechi sunyuechi force-pushed the cumulative-translation branch 4 times, most recently from 22f3b2e to 7773778 Compare March 15, 2023 12:26

@yihong0618 yihong0618 left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM will test and merge.
amazing work!

Comment thread book_maker/translator/chatgptapi_translator.py
@zstone12

Copy link
Copy Markdown
Collaborator

LGTM+1.
The fantastic work!

@zstone12

zstone12 commented Mar 15, 2023

Copy link
Copy Markdown
Collaborator

Please add the relevant content of the --accumulated_num parameter to README:D

refactor: _process_paragraph

refactor: translate_paragraphs_acc
@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 5aa35c4 to cc85d6b Compare March 15, 2023 13:14
@sunyuechi sunyuechi force-pushed the cumulative-translation branch from 780147b to 173b756 Compare March 15, 2023 15:30
@yihong0618

Copy link
Copy Markdown
Owner

can not test...
image

@yihong0618

Copy link
Copy Markdown
Owner

@hleft can this merge now?

@sunyuechi

Copy link
Copy Markdown
Collaborator Author

@yihong0618 yes ready

@yihong0618 yihong0618 merged commit e38a236 into yihong0618:main Mar 16, 2023
@yihong0618

Copy link
Copy Markdown
Owner

please check new CI if it failed we need to fix it quick

@danparizher danparizher mentioned this pull request Mar 16, 2023
wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024
wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants