In the latest round of architectural changes to Advisors in #1422, there are now two types of advisors:
CallAroundAdvisor
StreamAroundAdvisor
With the non-streaming advisor, it is easy to act on the entire response. The streaming advisor, however, operates on the whole stream, and the next advisor in the chain can manipulate that stream as well. If a stream advisor in the middle of the chain acts on each chunk of the response, all is fine. If it is only interested in the complete aggregation, though, it has to modify the stream so that everything is aggregated in a side channel, e.g. using the org.springframework.ai.chat.model.MessageAggregator class. If multiple advisors perform the same kind of aggregation, the work is repeated, which is inefficient in both time and memory.
Given that, I propose a new interface, StreamAggregationAdvisor. Instances of this type would be fed an aggregation of the original stream of chunks coming back from the model on its way into the application, before any other advisor has a chance to manipulate the stream. The aggregation would then be performed only once and would work on the unaltered view of the exchange. This behaviour could be implemented by utilizing the innermost StreamAroundAdvisor that is created in the DefaultChatClient.
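To make the idea concrete, here is a rough sketch of the "aggregate once at the innermost position" behaviour. It deliberately does not use Spring AI or Reactor: a plain `Iterator<String>` stands in for the stream of chunks. The `onAggregate` method, the `aggregateOnce` helper and the class name are all hypothetical names invented for illustration. Only the name `StreamAggregationAdvisor` comes from the proposal, and its shape here is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StreamAggregationSketch {

    /** Hypothetical shape of the proposed interface: receives the full aggregate once. */
    interface StreamAggregationAdvisor {
        void onAggregate(String aggregatedResponse);
    }

    /**
     * Hypothetical helper playing the role of the innermost stream advisor.
     * Chunks pass through unchanged; a single buffer collects them, and every
     * registered advisor is notified once when the source is exhausted.
     */
    static Iterator<String> aggregateOnce(Iterator<String> modelChunks,
                                          List<StreamAggregationAdvisor> advisors) {
        StringBuilder buffer = new StringBuilder();
        return new Iterator<String>() {
            private boolean notified = false;

            @Override
            public boolean hasNext() {
                boolean more = modelChunks.hasNext();
                if (!more && !notified) {
                    notified = true; // guard against repeated hasNext() calls
                    String aggregate = buffer.toString();
                    advisors.forEach(a -> a.onAggregate(aggregate));
                }
                return more;
            }

            @Override
            public String next() {
                String chunk = modelChunks.next();
                buffer.append(chunk); // the one and only aggregation
                return chunk;
            }
        };
    }

    public static void main(String[] args) {
        List<String> seenByA = new ArrayList<>();
        List<String> seenByB = new ArrayList<>();
        StreamAggregationAdvisor a = seenByA::add;
        StreamAggregationAdvisor b = seenByB::add;

        Iterator<String> raw = List.of("Hel", "lo ", "world").iterator();
        Iterator<String> tapped = aggregateOnce(raw, List.of(a, b));

        // An outer advisor alters each chunk on its way to the application.
        StringBuilder app = new StringBuilder();
        while (tapped.hasNext()) {
            app.append(tapped.next().toUpperCase());
        }

        System.out.println(app);     // HELLO WORLD
        System.out.println(seenByA); // [Hello world]
        System.out.println(seenByB); // [Hello world]
    }
}
```

Both aggregation advisors share one buffer and see the unaltered text, even though an outer advisor upper-cases the chunks. A real implementation would presumably work on the reactive stream and the chat response type rather than strings.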