# Streaming Responses
Stream responses in real-time for better user experience.
Streaming allows you to receive responses progressively as they are generated, rather than waiting for the complete response. This creates a more responsive user experience, especially for longer outputs.
- Faster perceived response time - Users see output immediately
- Better UX for long responses - Progress is visible
- Lower memory usage - Process chunks as they arrive
## Basic Streaming

### Enable Streaming

Set `stream: true` in your request:
```javascript
import ModelPilot from 'modelpilot';

const client = new ModelPilot({
  apiKey: process.env.MODELPILOT_API_KEY,
  routerId: process.env.MODELPILOT_ROUTER_ID,
});

async function streamResponse() {
  const stream = await client.chat.completions.create({
    messages: [
      { role: 'user', content: 'Write a short story about a robot' }
    ],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }

  console.log('\n\nStream complete!');
}

streamResponse();
```

## Stream Chunk Format
### Chunk Structure

Each chunk follows this format:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677858242,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"  // Incremental content
      },
      "finish_reason": null
    }
  ]
}

// The last chunk includes a finish_reason
{
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```

## Token Usage Tracking
### Get Token Counts in Streaming

Request usage data with `stream_options` (OpenAI-compatible).

When streaming, you can request token usage data by setting `stream_options: { include_usage: true }`. This includes a final chunk with usage information before the stream completes.
```javascript
const stream = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
  stream_options: {
    include_usage: true, // Request token usage
  },
});

let totalContent = '';

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  totalContent += content;

  // Check for usage data in the final chunk
  if (chunk.usage) {
    console.log('Token usage:', {
      prompt_tokens: chunk.usage.prompt_tokens,
      completion_tokens: chunk.usage.completion_tokens,
      total_tokens: chunk.usage.total_tokens,
    });
  }
}
```

### Usage Chunk Format
The final chunk contains usage data with an empty `choices` array:
```json
// Regular chunks (content)
{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "Hello" },
      "finish_reason": null
    }
  ]
}

// Final usage chunk (appears after finish_reason: "stop")
{
  "id": "chatcmpl-abc123",
  "choices": [],  // Empty choices array
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 150,
    "total_tokens": 160
  }
}
```

**Note:** The usage chunk has an empty `choices` array. Check for `chunk.usage` to detect it.
## React Integration

### React Component

Display streaming responses in React:
```javascript
import { useState } from 'react';
import ModelPilot from 'modelpilot';

const client = new ModelPilot({
  apiKey: process.env.MODELPILOT_API_KEY,
  routerId: process.env.MODELPILOT_ROUTER_ID,
});

export default function ChatComponent() {
  const [response, setResponse] = useState('');
  const [isLoading, setIsLoading] = useState(false);

  async function handleSubmit(userMessage) {
    setIsLoading(true);
    setResponse('');

    try {
      const stream = await client.chat.completions.create({
        messages: [{ role: 'user', content: userMessage }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        setResponse(prev => prev + content);
      }
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setIsLoading(false);
    }
  }

  return (
    <div>
      <div className="response">
        {response}
        {isLoading && <span className="cursor">|</span>}
      </div>
    </div>
  );
}
```

## Server-Sent Events (SSE)
### Direct API Usage

If you're not using the SDK, handle the SSE stream manually:
```javascript
const response = await fetch('https://modelpilot.co/api/router/{routerId}/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // SSE events can be split across network chunks, so buffer partial lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the last (possibly incomplete) line

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') continue;

      const parsed = JSON.parse(data);
      const content = parsed.choices[0]?.delta?.content || '';
      console.log(content);
    }
  }
}
```

## Error Handling
### Handling Stream Errors
```javascript
async function streamWithErrorHandling() {
  try {
    const stream = await client.chat.completions.create({
      messages: [{ role: 'user', content: 'Hello!' }],
      stream: true,
    });

    for await (const chunk of stream) {
      try {
        const content = chunk.choices[0]?.delta?.content || '';
        process.stdout.write(content);
      } catch (chunkError) {
        console.error('Error processing chunk:', chunkError);
        // Continue processing other chunks
      }
    }
  } catch (error) {
    if (error.status === 429) {
      console.error('Rate limit exceeded');
    } else if (error.status === 503) {
      console.error('Service temporarily unavailable');
    } else {
      console.error('Streaming error:', error.message);
    }
  }
}
```
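The example above reports rate-limit and availability errors but does not retry. As a minimal sketch, a streaming call can be wrapped in retries with exponential backoff; it assumes the SDK exposes an HTTP `status` on thrown errors (as in the example above), and `streamWithRetry`, `maxRetries`, and the backoff delays are illustrative names and values:

```javascript
// Sketch: retry a streaming request on transient errors (429 / 503)
// with exponential backoff. maxRetries and the delays are example values.
async function streamWithRetry(messages, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const stream = await client.chat.completions.create({ messages, stream: true });
      for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || '');
      }
      return; // success
    } catch (error) {
      const retryable = error.status === 429 || error.status === 503;
      if (!retryable || attempt === maxRetries) throw error;

      const delayMs = 500 * 2 ** attempt; // 500ms, 1s, 2s, ...
      console.warn(`Attempt ${attempt + 1} failed (${error.status}), retrying in ${delayMs}ms`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}
```

Retrying restarts generation from the beginning, so clear any partial content already shown before the next attempt.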
## Best Practices

### When to Use Streaming
- ✓ Long-form content generation
- ✓ Interactive chat applications
- ✓ Real-time code generation
- ✓ Stories, articles, or creative writing
### When NOT to Use Streaming
- ✗ JSON/structured output parsing
- ✗ Batch processing
- ✗ Function calling responses
- ✗ Short responses (overhead not worth it)
### Performance Tips
- Buffer chunks for UI updates (e.g., every 50ms) to avoid excessive re-renders (a sketch follows this list)
- Implement timeout handling for long-running streams (see Cancelling Streams below)
- Handle connection drops gracefully with retry logic (see the retry sketch under Error Handling)
- Use abort controllers to cancel streams when needed
- Monitor memory usage when accumulating large responses
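The buffering tip is worth showing concretely. The sketch below reuses the ModelPilot client from the earlier examples; `streamBuffered`, `render`, and `flushMs` are hypothetical names introduced for illustration (for example, `render` could be a React state setter):

```javascript
// Sketch: accumulate streamed deltas and flush them to the UI on a fixed
// interval instead of updating on every chunk. `render` is a hypothetical
// callback that receives the full text so far (e.g. a React state setter).
async function streamBuffered(client, userMessage, render, flushMs = 50) {
  let full = '';
  let dirty = false;

  const flusher = setInterval(() => {
    if (dirty) {
      render(full); // at most one UI update per interval
      dirty = false;
    }
  }, flushMs);

  try {
    const stream = await client.chat.completions.create({
      messages: [{ role: 'user', content: userMessage }],
      stream: true,
    });

    for await (const chunk of stream) {
      full += chunk.choices[0]?.delta?.content || '';
      dirty = true;
    }
  } finally {
    clearInterval(flusher);
    render(full); // final flush
  }

  return full;
}
```

Flushing on an interval keeps the UI responsive while capping updates at roughly one per `flushMs`.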
## Cancelling Streams

### Using AbortController
```javascript
const controller = new AbortController();

async function cancellableStream() {
  try {
    const stream = await client.chat.completions.create({
      messages: [{ role: 'user', content: 'Long response...' }],
      stream: true,
    }, {
      signal: controller.signal,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || '';
      console.log(content);
    }
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  }
}

// Cancel the stream
controller.abort();
```
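The same `AbortController` can be wired to a deadline so a stalled stream cancels itself, which covers the timeout tip from Best Practices. This is a minimal sketch; the 30-second deadline is an arbitrary example value:

```javascript
// Sketch: abort the stream automatically if it exceeds a deadline.
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 30_000); // example: 30s deadline

try {
  const stream = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'Long response...' }],
    stream: true,
  }, {
    signal: controller.signal,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Stream timed out and was cancelled');
  }
} finally {
  clearTimeout(timeout); // stop the timer once the stream finishes or aborts
}
```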