fix(content): 13.2/13.3 cost math executes as printed, token equivalences
- 13.2.13: prose, code, and printed output now agree (Opus 4.6 $5/$25,
avg 175K history -> traditional $878.75, cached $106.00, batch
$439.38, breakeven 571; previous printed block did not match what the
code computes - verified by running it); summary attribution corrected
to Opus 4.6 and $106.00
- 1M-cache scenario: cache reads bill the 800K cached portion, not 1M
(prose $1.50 -> $1.40, code fixed in both spots)
- chars-per-token unified: 1 hanzi ~ 1-1.5 tokens across the estimator,
the equivalence table (was '3-4 token/char' -> 300-350K chars/1M) and
ch13 summary
- 200K context shipped Nov 2023 (Claude 2.1), not mid-2024
- mixed-language system-prompt strings cleaned; savings range 80-94% ->
80-90% (90% is the cache-read ceiling); 13.3 dangling sentence
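
The 1M-cache correction in the message above can be checked with a few lines of arithmetic. This is a minimal sketch, not the code from the changed file. It assumes the $5 per million input tokens quoted for Opus 4.6, a cache-read price of 10% of the base input price (the 90% ceiling mentioned above), and a 1M-token request in which 800K tokens are read from cache and the remaining 200K are billed as ordinary input. The 800K/200K split billing, the way the earlier $1.50 figure is reproduced, and the name `request_input_cost` are illustrative assumptions.

```python
# Assumed prices: $5 per million input tokens, cache reads at 10% of that.
INPUT_PRICE_PER_MTOK = 5.00
CACHE_READ_MULTIPLIER = 0.10


def request_input_cost(cached_tokens: int, uncached_tokens: int) -> float:
    """Input-side cost in USD for one request (hypothetical helper)."""
    cached = cached_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK * CACHE_READ_MULTIPLIER
    uncached = uncached_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    return round(cached + uncached, 2)


# Corrected: only the 800K cached portion is billed at the cache-read rate.
corrected = request_input_cost(cached_tokens=800_000, uncached_tokens=200_000)

# One way the earlier figure could arise: billing the full 1M as cache reads
# on top of the 200K uncached portion.
previous = request_input_cost(cached_tokens=1_000_000, uncached_tokens=200_000)

print(f"corrected: ${corrected:.2f}, previous: ${previous:.2f}")
```

Under these assumptions the two results match the $1.40 and $1.50 figures named in the commit message.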
13_advanced/13.2_infinite_chats.md (+30 -23: 30 additions & 23 deletions)
@@ -149,10 +149,10 @@ class LongConversationManager:
     def estimate_tokens(self, text: str) -> int:
         """Roughly estimate the token count of a text (simplified)."""
         # English: about 1 token per 4 characters
-        # Chinese: about 1 token per 1.3 characters
+        # Chinese: 1 hanzi is roughly 1-1.5 tokens (varies by tokenizer; 1.2 is used here)
         english_chars = sum(1 for c in text if ord(c) < 128)
         chinese_chars = len(text) - english_chars
-        return int(english_chars / 4 + chinese_chars / 1.3)
+        return int(english_chars / 4 + chinese_chars * 1.2)
 
     def should_summarize(self) -> bool:
         """Check whether a summary should be generated."""
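
To make the effect of the estimator change in the hunk above concrete, here is a standalone sketch that lifts both formulas out of the class and compares them on a few strings. The free functions `estimate_tokens_old` and `estimate_tokens_new` are hypothetical names for illustration. As in the original, every non-ASCII character is counted as a Chinese character, which is a simplification.

```python
def _split_counts(text: str) -> tuple[int, int]:
    """Return (ASCII character count, non-ASCII character count)."""
    english_chars = sum(1 for c in text if ord(c) < 128)
    return english_chars, len(text) - english_chars


def estimate_tokens_old(text: str) -> int:
    """Previous formula: about 1 token per 1.3 hanzi."""
    english_chars, chinese_chars = _split_counts(text)
    return int(english_chars / 4 + chinese_chars / 1.3)


def estimate_tokens_new(text: str) -> int:
    """Updated formula: about 1.2 tokens per hanzi."""
    english_chars, chinese_chars = _split_counts(text)
    return int(english_chars / 4 + chinese_chars * 1.2)


for sample in ["hello world!", "你好世界", "hello 你好世界"]:
    print(repr(sample), estimate_tokens_old(sample), estimate_tokens_new(sample))
```

Pure-ASCII input is unaffected. For Chinese text the estimate rises by a factor of about 1.56 (1.2 × 1.3), which makes any summarization threshold built on it trigger earlier.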
@@ -250,7 +250,7 @@ class LongConversationManager:
         response = self.client.messages.create(
             model=self.model,
             max_tokens=2000,
-            system=system_prompt or "You are a helpful assistant engaged in a 长对话.",
+            system=system_prompt or "You are a helpful assistant engaged in a long-running conversation.",
             messages=context_messages
         )
 
@@ -525,7 +525,7 @@ class RobustLongConversationManager:
         response = self.client.messages.create(
             model=self.model,
             max_tokens=2000,
-            system=system_prompt or "You are a helpful assistant engaged in a 长对话.",
+            system=system_prompt or "You are a helpful assistant engaged in a long-running conversation.",
             messages=context_messages,
             timeout=60.0  # set a request timeout
         )
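@@ -939,15 +939,15 @@ graph TD
 
 ### 13.2.10 The Reality of the 1M-Token Window
 
-Since mid-2024, Claude's context window has been raised to 200K tokens. With Claude Opus 4.6/4.7/4.8 and Sonnet 4.6, the Claude API's long-context capability has been extended to a 1M-token tier, billed at standard API token prices. However, platforms such as Microsoft Foundry may still impose their own limits, and account, platform, and regional availability should still be verified against the official model and pricing pages. The focus of context management has likewise shifted gradually from "can it fit" to "how to make high-quality use of a very long context".
+Claude raised its context window to 200K tokens as early as November 2023 (Claude 2.1) and kept it across the entire Claude 3 family. With Claude Opus 4.6/4.7/4.8 and Sonnet 4.6, the Claude API's long-context capability has been extended to a 1M-token tier, billed at standard API token prices. However, platforms such as Microsoft Foundry may still impose their own limits, and account, platform, and regional availability should still be verified against the official model and pricing pages. The focus of context management has likewise shifted gradually from "can it fit" to "how to make high-quality use of a very long context".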