
Streaming¤

Some LLM providers support streaming of LLM responses. This is very useful when you want to consume the results as soon as they are generated, instead of waiting for the entire response to be completed.

Streaming example¤

import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")


@gpt_35.task(streaming=True)  # (1)!
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """

    return declarai.magic(movie)


res = say_something_about_movie(movie="Avengers")  # (2)!

for chunk in res:
    print(chunk.response)

"Av"
"Avengers"
"Avengers is"
"Avengers is an"
"Avengers is an action"
"Avengers is an action-packed"
"Avengers is an action-packed superhero"
"Avengers is an action-packed superhero extrav"
"Avengers is an action-packed superhero extravagan"
"Avengers is an action-packed superhero extravaganza"
"Avengers is an action-packed superhero extravaganza that"
"Avengers is an action-packed superhero extravaganza that brings"
"Avengers is an action-packed superhero extravaganza that brings together"
"Avengers is an action-packed superhero extravaganza that brings together Earth"
"Avengers is an action-packed superhero extravaganza that brings together Earth's"
"Avengers is an action-packed superhero extravaganza that brings together Earth's might"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to save"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to save the"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to save the world"
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to save the world."
"Avengers is an action-packed superhero extravaganza that brings together Earth's mightiest heroes to save the world."
  1. Set the streaming flag to True when defining the task
  2. res is a generator. You can iterate over the generator to get the results.

Currently only OpenAI & Azure OpenAI support streaming.

Turn on streaming¤

To enable streaming, set the streaming flag to True when defining the task (or chat).

import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")

@gpt_35.task(streaming=True)  # (1)!
def my_task():
    ...


@gpt_35.experimental.chat(streaming=True)  # (2)!
class MyChat:
    ...
  1. Set the streaming flag to True when defining the task
  2. Set the streaming flag to True when defining the chat class
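
A minimal sketch of consuming a streamed chat reply. It assumes the experimental chat interface exposes a send method and that, with streaming enabled, send returns a generator of LLMResponse chunks, mirroring the streamed task behaviour above:

```py
import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")


@gpt_35.experimental.chat(streaming=True)
class MovieBot:
    """
    You are a movie critic. Answer questions about movies in one short sentence.
    """


movie_bot = MovieBot()

# Assumption: with streaming on, send yields LLMResponse chunks like a streamed task.
for chunk in movie_bot.send("What do you think about Avengers?"):
    print(chunk.response)
```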

You can also enable streaming globally by setting stream=True when initializing the declarai object.

import declarai

gpt_35_with_streaming = declarai.openai(
    model="gpt-3.5-turbo",
    stream=True
)

azure_gpt_35_with_streaming = declarai.azure_openai(
    model="gpt-3.5-turbo",
    stream=True
)
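
With streaming enabled at the declarai object level, tasks defined on that object return generators without setting streaming=True on the decorator (the delta example further below relies on this). A minimal sketch:

```py
import declarai

gpt_35_with_streaming = declarai.openai(
    model="gpt-3.5-turbo",
    stream=True,
)


@gpt_35_with_streaming.task  # streaming is inherited from the declarai object
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """
    return declarai.magic(movie)


for chunk in say_something_about_movie(movie="Avengers"):
    print(chunk.response)
```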

Accessing the results¤

The results are returned as a generator. Iterate over it to consume the streamed LLMResponse chunks.

import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")


@gpt_35.task(streaming=True)
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """

    return declarai.magic(movie)


res_stream = say_something_about_movie(movie="Avengers")

type(res_stream)  # <class 'generator'>

for chunk in res_stream:
    type(chunk)  # <class 'declarai.operators.llm.LLMResponse'>

The responses are also saved on the task object.

import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")


@gpt_35.task(streaming=True)
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """

    return declarai.magic(movie)


res_stream = say_something_about_movie(movie="Avengers")

say_something_about_movie.llm_response  # Empty unless you call next on the generator

say_something_about_movie.llm_stream_response  # <generator object BaseTask.stream_handler at ...> 
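
A minimal sketch of reading the saved response after the stream has been fully consumed. It assumes llm_response is updated as chunks are consumed and, once the generator is exhausted, holds the final chunk:

```py
import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo")


@gpt_35.task(streaming=True)
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """
    return declarai.magic(movie)


res_stream = say_something_about_movie(movie="Avengers")

# Drain the generator; llm_response is updated as each chunk arrives.
for _ in res_stream:
    pass

# Assumption: after exhaustion, llm_response holds the final LLMResponse chunk.
print(say_something_about_movie.llm_response.response)
```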

Access the delta of the response¤

You can access the delta of the response through the raw_response attribute of the LLMResponse object.

The delta is the difference between the current response and the previous response.

This is particularly useful when streaming the response to a chatbot, since there is no need to send the entire accumulated response again with every chunk.

```py
import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo", stream=True)

@gpt_35.task
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """

    return declarai.magic(movie)


stream_res = say_something_about_movie("Avengers")

for chunk in stream_res:
    print(chunk.raw_response["choices"][0]["delta"])

# Output
{'role': 'assistant', 'content': ''}
{'content': '"'}
{'content': 'Av'}
{'content': 'engers'}
{'content': ' is'}
{'content': ' an'}
{'content': ' action'}
{'content': '-packed'}
{'content': ' superhero'}
{'content': ' extrav'}
{'content': 'agan'}
{'content': 'za'}
{'content': ' that'}
{'content': ' brings'}
{'content': ' together'}
{'content': ' Earth'}
{'content': "'s"}
{'content': ' might'}
{'content': 'iest'}
{'content': ' heroes'}
{'content': ' to'}
{'content': ' save'}
{'content': ' the'}
{'content': ' world'}
{'content': '."'}
{}
```
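
When forwarding the stream to a client, you usually send only each delta and append it on the receiving side. A minimal sketch of rebuilding the full text from the deltas, using the raw_response shape shown above (the .get fallback handles the role-only first chunk and the empty final chunk):

```py
import declarai

gpt_35 = declarai.openai(model="gpt-3.5-turbo", stream=True)


@gpt_35.task
def say_something_about_movie(movie: str) -> str:
    """
    Say something short about the following movie
    :param movie: The movie name
    """
    return declarai.magic(movie)


full_text = ""
for chunk in say_something_about_movie("Avengers"):
    delta = chunk.raw_response["choices"][0]["delta"]
    # The first chunk carries only the role and the last one is empty,
    # so fall back to an empty string when "content" is missing.
    full_text += delta.get("content", "")

print(full_text)
```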