Status: Active | Last Updated: 2026-03-13
Traditional cursor-based pagination protocols keyed on `update_time` suit strict timeline-replay systems such as chat history. This system, however, is fed by search and recommendation engines, whose core requirements are to "ensure content freshness, balance relevance, and filter duplicate content already seen by users."
Because results are ranked by a score that shifts between requests, "scoring systems cannot provide stable time cursors." To resolve this architectural conflict, the system abandons traditional cursor protocols and adopts a "stateless client + strong server-side cache + global rolling deduplication" design paradigm.
- Comprehensive Scoring Recall: The search engine ranks by base relevance score × a Gaussian time-decay function, so recalled results match user interests while favoring recently published content.
- Global Rolling Bloom Filter: Instead of independent per-user records, the system keeps global bloom filters that roll daily or weekly. Inactive users consume zero memory, and historical records are forgotten automatically as old filters expire.
- Read-Before-Cache: After recalling a large batch from the engine, the server immediately deduplicates it, truncates one page of clean data for delivery, and stores the remainder in a user-specific Redis List as a pagination cache, which greatly speeds up subsequent scroll-up loads.
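
The scoring rule in the first bullet can be sketched as follows. This is a minimal illustration, not the engine's actual formula: the `scale_seconds` and `decay` parameters (the age at which the factor falls to `decay`) and the 3-day default are assumptions chosen for the example.

```python
import math


def gaussian_decay(age_seconds: float, scale_seconds: float, decay: float = 0.5) -> float:
    """Gaussian decay factor in (0, 1]: 1.0 at age 0, `decay` at age == scale_seconds."""
    # Pick sigma^2 so that the factor equals `decay` exactly when age == scale.
    sigma_sq = -(scale_seconds ** 2) / (2.0 * math.log(decay))
    return math.exp(-(age_seconds ** 2) / (2.0 * sigma_sq))


def final_score(relevance: float, age_seconds: float,
                scale_seconds: float = 3 * 86400) -> float:
    """Base relevance multiplied by the time-decay factor (assumed 3-day scale)."""
    return relevance * gaussian_decay(max(age_seconds, 0.0), scale_seconds)
```

With equal relevance, an item published one day ago outranks one published five days ago, which is the "favor recent content" behavior described above.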
The global bloom filter records all users' impression history within specific time periods.
- Key Design: `bf:global:{YYYYMMDD}` (daily rolling, e.g., `bf:global:20260306`)
- Value Format: `{agent_id}:{group_id}` (e.g., `u_10086:doc_9527`)
- Lifecycle: Keep the last 7 days of keys; a key expires automatically on its 8th day (this implements the "only deduplicate what was seen in the last 7 days" business logic)
- Operations: Relies on the RedisBloom module's `BF.MADD` and `BF.MEXISTS`
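
The key and member layout above can be captured in a few helpers. This is a sketch only; the function names are hypothetical, and the formats are taken from the examples in the list.

```python
from datetime import date, timedelta
from typing import List

WINDOW_DAYS = 7  # "only deduplicate the last 7 days"


def bloom_key(day: date) -> str:
    """Daily rolling key, e.g. bf:global:20260306."""
    return f"bf:global:{day:%Y%m%d}"


def window_keys(today: date, days: int = WINDOW_DAYS) -> List[str]:
    """Keys to query for deduplication: today first, then the previous days."""
    return [bloom_key(today - timedelta(days=i)) for i in range(days)]


def member(agent_id: str, group_id: str) -> str:
    """Bloom filter member, e.g. u_10086:doc_9527."""
    return f"{agent_id}:{group_id}"
```

For example, `window_keys(date(2026, 3, 6))` starts at `bf:global:20260306` and ends at `bf:global:20260228`.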
The feed cache stores the "clean candidate set" for the user's current session after engine recall and deduplication.
- Key Design: `feed:cache:{agent_id}`
- Data Structure: Redis LIST
- Value Format: `group_id`
- Lifecycle: Short, e.g., 30-minute expiration (a re-recall is forced after the session expires)
Clients no longer maintain complex cursors or timestamps; they pass only the pull action and the required item count to the server.

Request:

    {
      // Pull action
      // "refresh": pull-to-refresh, timed fetch, or cold start (requires latest data)
      // "load_more": scroll up for more (continue from current cache)
      "action": "refresh",
      // Expected items per request
      "limit": 20
    }

Response:

    {
      "code": 0,
      "msg": "success",
      "data": {
        "items": [
          {
            "id": "doc_9527",
            "title": "Latest High-Quality Content",
            "update_time": 1709658000
            // ... other business fields
          }
        ],
        // Indicates if server's delivery buffer has remaining data
        // Client can decide whether to show "no more" footer based on this field
        "has_more": true
      }
    }

The `refresh` action (Scenario 1) is the heaviest logic: it fetches candidates from the search engine and cleans them.
- Clear Old Cache: The server actively deletes the user's current cache queue: `DEL feed:cache:{agent_id}`
- Engine Recall: Query the search engine with time-decay parameters and fetch the Top N candidate `group_id` list (e.g., N=500)
- Batch Deduplication Check:
  - Concatenate these 500 `group_id` values into `{agent_id}:{group_id}` format
  - Use a Redis pipeline to run `BF.MEXISTS` against each of the 7 bloom filters covering the last 7 days in a single round trip
  - If any day's bloom filter reports a match, remove that `group_id` from the candidate set
- Truncate and Cache:
  - Assume 300 "clean" items remain after deduplication
  - Take the first `limit` items (e.g., 20) to return to the client
  - Write the remaining 280 items via `RPUSH` to `feed:cache:{agent_id}` and set a 30-minute expiration
- Record Impression: Asynchronously write the 20 items being delivered via `BF.MADD` to today's global bloom filter `bf:global:{Today}`
- Assemble Return: Fill in content details for the first `limit` `group_id` values and return them to the client with `has_more = true` (the cache still holds 280 items in this example)
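
The refresh steps above can be sketched end to end. To keep the example runnable without Redis, plain Python sets stand in for the daily bloom filters and a dict stands in for the Redis LISTs; `recall` and `record` are hypothetical hooks for the engine client and the asynchronous `BF.MADD` write. Returning `has_more = False` when nothing is left to cache is an assumption; the text only covers the case where items remain.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def handle_refresh(
    agent_id: str,
    limit: int,
    recall: Callable[[int], List[str]],    # hypothetical engine client: Top N group_ids
    seen_filters: Iterable[Set[str]],      # stand-ins for the 7 daily bloom filters
    cache: Dict[str, List[str]],           # stand-in for the Redis LISTs
    record: Callable[[List[str]], None],   # stand-in for the async BF.MADD write
    top_n: int = 500,
) -> Tuple[List[str], bool]:
    cache_key = f"feed:cache:{agent_id}"
    cache.pop(cache_key, None)             # DEL feed:cache:{agent_id}

    candidates = recall(top_n)
    filters = list(seen_filters)
    # Drop a candidate if any day's filter reports it (BF.MEXISTS per daily key).
    clean = [g for g in candidates
             if not any(f"{agent_id}:{g}" in f for f in filters)]

    page, rest = clean[:limit], clean[limit:]
    if rest:
        cache[cache_key] = rest            # RPUSH + 30-minute expiration in Redis

    record([f"{agent_id}:{g}" for g in page])
    return page, bool(rest)
```

Only the delivered page is recorded as impressions; cached items are recorded later, when a subsequent request actually delivers them.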
The `load_more` action is the lightest logic: a pure in-memory operation with an extremely fast response.
- Direct Cache Read: The server fetches the next batch directly from the Redis cache: `LPOP feed:cache:{agent_id} {limit}` (the count argument requires Redis 6.2 or later)
- Cache Empty Fallback:
  - If `LPOP` returns 0 items (the cache key has expired or is empty), the current session cache is exhausted; the system automatically downgrades to `refresh` and silently executes the full "Scenario 1" engine recall and filtering logic
  - If `LPOP` returns more than 0 but fewer than `limit` items (the cache tail is too short for a full page), return the popped data as usual with `has_more = false`. Do not downgrade, to avoid discarding data that has already been popped
- Record Impression: Asynchronously write the items popped by `LPOP` to today's global bloom filter `bf:global:{Today}`
- Assemble Return: Fill in content details and deliver
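
A minimal sketch of the load-more branches, again using a dict in place of the Redis LISTs. The `refresh` and `record` callbacks are hypothetical hooks for the full refresh flow and the asynchronous bloom filter write. Reporting `has_more = True` whenever a full page was popped is an assumption; the text only specifies the short-page case.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[List[str], bool]  # (group_ids, has_more)


def handle_load_more(
    agent_id: str,
    limit: int,
    cache: Dict[str, List[str]],           # stand-in for the Redis LISTs
    refresh: Callable[[], Page],           # hypothetical hook running the refresh flow
    record: Callable[[List[str]], None],   # stand-in for the async BF.MADD write
) -> Page:
    cache_key = f"feed:cache:{agent_id}"
    queue = cache.get(cache_key, [])
    page, remaining = queue[:limit], queue[limit:]   # LPOP feed:cache:{agent_id} {limit}
    if remaining:
        cache[cache_key] = remaining
    else:
        cache.pop(cache_key, None)   # Redis removes a list once it becomes empty

    if not page:
        # Cache expired or exhausted: silently downgrade to a full refresh.
        return refresh()

    record([f"{agent_id}:{g}" for g in page])
    # A short page means the cache tail was reached, so has_more is False.
    return page, len(page) == limit
```

Under this assumption, a full page that happens to drain the cache still reports `has_more = True`, and the next `load_more` falls through to the refresh fallback.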
- False Positive Tolerance: Bloom filters have a small, configurable false-positive rate (e.g., 1% or lower, depending on initialization parameters). Because a candidate is dropped when any of the 7 daily filters reports a match, the effective rate is roughly 1 − (1 − p)^7, or about 6.8% for p = 1%. In recommendation feed scenarios a false positive only means "a small probability of missing one unread item," which has no substantial impact on overall user experience and is a reasonable engineering tradeoff.
- Asynchronous Impression Write: Writing records to the bloom filter (`BF.MADD`) must be asynchronous (e.g., a Go goroutine or a message queue) and must never block the critical path of delivering data to the client.
- Client Timed Fetch Strategy: For the "fetch updates in the background after some time" requirement, the client only needs to silently issue an `action: "refresh"` request in the background. If the server returns a non-empty `items` list, the client can show a red dot or "New" badge in the UI to remind the user; if the list is empty, it silently ignores the response.
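
The asynchronous impression write can be sketched with a bounded queue and a single background worker. This is one possible shape, not the system's implementation: the `ImpressionWriter` class and its `write_batch` callback (which would wrap `BF.MADD`) are hypothetical, and dropping a batch when the queue is full is an assumed policy whose only cost is that an item may be shown again.

```python
import queue
import threading
from typing import Callable, List


class ImpressionWriter:
    """Request handlers enqueue batches; a worker thread performs the writes."""

    def __init__(self, write_batch: Callable[[str, List[str]], None],
                 maxsize: int = 10000):
        self._write_batch = write_batch   # hypothetical: BF.MADD key member [member ...]
        self._queue = queue.Queue(maxsize)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, key: str, members: List[str]) -> bool:
        """Never blocks the request path; returns False if the batch was dropped."""
        try:
            self._queue.put_nowait((key, list(members)))
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            try:
                if item is None:          # shutdown sentinel
                    return
                self._write_batch(*item)
            except Exception:
                pass                      # a real service would log and maybe retry
            finally:
                self._queue.task_done()

    def close(self) -> None:
        """Flush pending batches, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

A handler would call `writer.submit("bf:global:20260306", ["u_10086:doc_9527"])` after assembling the response and return immediately.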