REST API
All public Messages API v3 endpoints are under:
https://msg.hidoba.com
Authentication
Send a quota API key with each generation request. Bearer auth is preferred.
curl https://msg.hidoba.com/v3/chat/completions \
-H "Authorization: Bearer YOUR_QUOTA_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"google/gemini-2.5-flash","messages":[{"role":"user","content":"Say hello"}]}'
X-API-Key is also accepted:
X-API-Key: YOUR_QUOTA_API_KEY
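The two header styles above could be produced by a small client-side helper. This is a minimal sketch, not part of the Hidoba API: the function name `auth_headers` and the `use_bearer` flag are hypothetical.

```python
def auth_headers(api_key: str, use_bearer: bool = True) -> dict[str, str]:
    """Return request headers carrying the quota API key.

    Hypothetical helper: Bearer auth is the default because the docs
    describe it as preferred; X-API-Key is the accepted alternative.
    """
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["X-API-Key"] = api_key
    return headers
```

Only one of the two headers is set per request, so a call never carries the key twice.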
Health
GET /health
GET /health/deep
Health endpoints are intended for uptime checks and deployment verification.
Chat Completions
POST /v3/chat/completions
The request body follows the OpenAI Chat Completions shape. Messages API v3 also accepts a small number of Hidoba-specific additions.
| Field | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Requested model name, for example google/gemini-2.5-flash. |
| messages | Array | Yes | OpenAI-compatible chat messages. Use the assistant role for prior chatbot messages and the user role for user messages. |
| stream | Boolean | No | Set to true for streaming responses. |
| max_completion_tokens | Number | No | Preferred output-token limit for compatible models. |
| max_tokens | Number | No | Compatibility output-token limit. |
| reasoning | Object | No | Optional thinking/reasoning control for supported models. See Reasoning. |
| fallback_model | String | No | Optional fallback model to try when supported. This is a Hidoba routing option, not part of the model conversation. |
| metadata.hidoba | Object | No | Hidoba character metadata. See Hidoba Metadata. |
Example
curl https://msg.hidoba.com/v3/chat/completions \
-H "Authorization: Bearer $HIDOBA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [
{ "role": "user", "content": "Hi, can you help me write concise replies?" },
{ "role": "assistant", "content": "Yes. I will keep replies brief and clear." },
{ "role": "user", "content": "Reply with exactly: hello" }
],
"max_completion_tokens": 32
}'
The response is an OpenAI-compatible chat completion response.
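For callers not using curl, the same request might be built with Python's standard library. This is a sketch under assumptions: `build_chat_request` and `first_reply_text` are hypothetical names, and the response is read using the standard OpenAI Chat Completions layout (`choices[0].message.content`), which the text above says the API follows.

```python
import json
import urllib.request

BASE_URL = "https://msg.hidoba.com"


def build_chat_request(api_key, model, messages, max_completion_tokens=None):
    """Build (but do not send) a POST to /v3/chat/completions."""
    body = {"model": model, "messages": messages}
    if max_completion_tokens is not None:
        body["max_completion_tokens"] = max_completion_tokens
    return urllib.request.Request(
        f"{BASE_URL}/v3/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply_text(completion: dict) -> str:
    """Read the assistant text, assuming the OpenAI Chat Completions shape."""
    return completion["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`.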
Responses API
POST /v3/responses
The request body follows the OpenAI Responses API shape.
| Field | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Requested model name. |
| input | String or Array | Yes | Responses API input. |
| instructions | String | No | Additional system instructions. Character prompts and knowledge context may be added when configured. |
| stream | Boolean | No | Set to true for streaming responses when supported. |
| max_output_tokens | Number | No | Output-token limit for Responses API requests. |
| reasoning | Object | No | Optional thinking/reasoning control for supported models. See Reasoning. |
| fallback_model | String | No | Optional fallback model to try when supported. |
| metadata.hidoba | Object | No | Hidoba character metadata. See Hidoba Metadata. |
Example
curl https://msg.hidoba.com/v3/responses \
-H "Authorization: Bearer $HIDOBA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"input": "Write one concise sentence about Mars.",
"max_output_tokens": 80
}'
The response is an OpenAI-compatible Responses API response.
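A client will usually want the answer text out of that response. The sketch below assumes the standard OpenAI Responses layout, where answer text lives in `output[]` items of type `message` whose `content[]` parts have type `output_text`. The helper name `output_text` is hypothetical.

```python
def output_text(response: dict) -> str:
    """Concatenate answer text from a Responses-shaped dict.

    Assumes the OpenAI Responses layout: message items holding
    output_text content parts. Other item types are skipped.
    """
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```

Items of any other type, such as reasoning items, are ignored rather than treated as answer text.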
Responses API Visibility
When a /v3/responses request uses a character with knowledge context enabled, Hidoba adds OpenAI-shaped visibility items to the Responses output. These items are intended for user interfaces that want to show what is happening before the final answer arrives.
This visibility is added only for /v3/responses. Chat Completions responses keep the normal Chat Completions shape.
Output Items
When knowledge context runs, the Responses output array includes Hidoba-added items before the model answer:
- A reasoning item saying that Hidoba is looking through the knowledge base.
- A file_search_call item with the searches and retrieved source previews.
- A reasoning item saying how many relevant documents were found.
- The normal model output items.
Example:
{
  "output": [
    {
      "id": "rs_hidoba_rag_loading_123",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "Looking through the knowledge base..."
        }
      ]
    },
    {
      "id": "fs_hidoba_123",
      "type": "file_search_call",
      "status": "completed",
      "queries": ["have you ever had a horse?"],
      "results": [
        {
          "file_id": "file_hidoba_07d6ebeddf6a421c",
          "filename": "Stephen Wolfram's Personal History with Animals and the Concept of Video Games for Pets",
          "score": 0.111,
          "text": "Q&A about Business, Innovation, and Managing Life with Stephen Wolfram...",
          "attributes": {
            "chunk_index": 0,
            "retrievers": "dense,splade"
          }
        }
      ]
    },
    {
      "id": "rs_hidoba_rag_found_123",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "Found 1 relevant documents, thinking..."
        }
      ]
    }
  ]
}
If knowledge context runs and finds no sources, Hidoba still returns a completed file_search_call with results: [] and a status message such as Found 0 relevant documents, thinking....
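A user interface might separate the Hidoba-added items from the model's own output before rendering. The following is a minimal sketch: `split_hidoba_items` is a hypothetical helper, and it assumes the Hidoba-added items can be recognized by the `rs_hidoba_` and `fs_hidoba_` ID prefixes seen in the example output.

```python
def split_hidoba_items(output: list[dict]) -> dict:
    """Split a Responses output array into status text, sources, and model items.

    Assumption: Hidoba-added items are identified by their ID prefixes.
    """
    status, sources, model_items = [], [], []
    for item in output:
        item_id = str(item.get("id", ""))
        if item_id.startswith("rs_hidoba_"):
            # Hidoba status messages live in reasoning summary text.
            status.extend(
                s.get("text", "")
                for s in item.get("summary", [])
                if s.get("type") == "summary_text"
            )
        elif item_id.startswith("fs_hidoba_"):
            # An empty results list is valid: the search ran and found nothing.
            sources.extend(item.get("results", []))
        else:
            model_items.append(item)
    return {"status": status, "sources": sources, "model_items": model_items}
```

Because a search with no hits still yields a completed file_search_call, an empty `sources` list alone does not tell the caller whether knowledge context ran; the presence of status lines does.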
Source Result Fields
Each file_search_call.results[] item may include:
| Field | Type | Description |
|---|---|---|
| file_id | String | Stable public source identifier for the returned result. |
| filename | String | Human-readable source or document title. |
| score | Number | Relevance score when available. |
| text | String | Short source preview. Full knowledge chunks are not returned in public Responses output. |
| attributes | Object | Safe source metadata, such as chunk_index and retrievers, when available. |
The text field is a compact preview only. Hidoba trims surrounding whitespace, collapses repeated whitespace, and returns a short snippet of the retrieved chunk, currently up to 200 characters.
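To get a feel for what such a preview looks like, the described normalization can be approximated on the client side. This is an illustrative sketch only, not Hidoba's implementation: the function name is hypothetical, and whether the server cuts on a word boundary or appends an ellipsis is not stated, so a plain character cut is assumed.

```python
import re

PREVIEW_LIMIT = 200  # current limit stated in the docs; may change


def compact_preview(chunk: str, limit: int = PREVIEW_LIMIT) -> str:
    """Approximate the preview rules: trim, collapse whitespace, truncate."""
    collapsed = re.sub(r"\s+", " ", chunk.strip())
    # Assumption: a plain character cut at the limit.
    return collapsed[:limit]
```

Since the limit is described as current rather than fixed, clients should not rely on previews being exactly 200 characters long.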
Streaming Events
For streaming /v3/responses calls, the same information is emitted as Responses stream events before the model answer.
Typical event order:
response.created
response.in_progress
response.output_item.added # reasoning: looking through knowledge base
response.output_item.done
response.file_search_call.in_progress
response.file_search_call.searching
response.file_search_call.completed
response.output_item.added # reasoning: found N documents
response.output_item.done
...model output events...
response.completed
Render response.file_search_call.completed.item.results as the source list. Render Hidoba reasoning.summary[].text as status text. Hidoba-added item IDs use identifiable prefixes such as rs_hidoba_... and fs_hidoba_....
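The rendering rules above can be sketched as a small event handler. It assumes each stream event has already been decoded into a dict with a `type` field equal to the event name and an `item` field holding the output item, as in OpenAI Responses streaming. The class name `KnowledgeStatusView` is hypothetical.

```python
class KnowledgeStatusView:
    """Collect Hidoba status text and sources from decoded stream events."""

    def __init__(self):
        self._status_by_id = {}  # item id -> status text, in arrival order
        self.sources = []

    @property
    def status_lines(self):
        return list(self._status_by_id.values())

    def handle(self, event: dict) -> None:
        kind = event.get("type", "")
        item = event.get("item") or {}
        if kind in ("response.output_item.added", "response.output_item.done"):
            is_hidoba = str(item.get("id", "")).startswith("rs_hidoba_")
            if item.get("type") == "reasoning" and is_hidoba:
                texts = [
                    s.get("text", "")
                    for s in item.get("summary", [])
                    if s.get("type") == "summary_text"
                ]
                if texts:
                    # Keyed by id so added + done for one item is not shown twice.
                    self._status_by_id[item["id"]] = " ".join(texts)
        elif kind == "response.file_search_call.completed":
            self.sources = list(item.get("results", []))
```

Keying status text by item ID is a design choice: the source does not say whether the summary arrives on the added event, the done event, or both, so the handler accepts either.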
Model Thinking Events
Some models also emit their own reasoning or thinking output when you use the reasoning request field. This is separate from Hidoba's knowledge-status messages.
In Responses streams, model thinking may appear as:
- response.reasoning_text.delta
- response.reasoning_text.done
- response.content_part.done with part.type: "reasoning_text"
- response.output_item.done for an item with type: "reasoning" and content[].type: "reasoning_text"
User interfaces should treat these as reasoning/thinking text, not as final answer text. Preserve whitespace when displaying reasoning text so headings and paragraphs remain readable.
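One way to keep thinking text apart from answer text is to accumulate the two kinds of delta separately. This sketch makes two assumptions: each decoded event carries its text fragment in a `delta` field, and answer text arrives as `response.output_text.delta` events, as in OpenAI Responses streaming. The function name is hypothetical.

```python
def collect_stream_text(events) -> tuple[str, str]:
    """Return (reasoning_text, answer_text) from decoded stream events.

    Fragments are joined as-is, without stripping, so whitespace in
    reasoning text (headings, paragraph breaks) is preserved.
    """
    reasoning, answer = [], []
    for event in events:
        kind = event.get("type")
        if kind == "response.reasoning_text.delta":
            reasoning.append(event.get("delta", ""))
        elif kind == "response.output_text.delta":
            # Assumed OpenAI event name for answer text; not listed above.
            answer.append(event.get("delta", ""))
    return "".join(reasoning), "".join(answer)
```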
Reasoning
Use reasoning when the requested model supports explicit thinking controls. Messages API v3 accepts the object and applies it to the model request; it is not used for Hidoba quota logic, character rendering, or knowledge retrieval.
Reasoning is separate from the answer output limit:
- Chat Completions output limit: max_tokens or max_completion_tokens
- Responses API output limit: max_output_tokens
- Reasoning budget: reasoning.max_tokens
If reasoning is omitted, no explicit reasoning instruction is sent and the model default applies.
Turn reasoning off:
{
"reasoning": { "effort": "none" }
}
Use this shape to turn reasoning off.
Use an effort level:
{
"reasoning": { "effort": "low" }
}
The commonly supported effort values are none, low, medium, and high. To let the model choose its default reasoning behavior, omit the reasoning field.
Use a custom thinking-token budget:
{
"reasoning": { "max_tokens": 1024 }
}
Use this shape when you want to set an explicit reasoning-token budget.
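The three shapes above (off, effort level, token budget) could be produced by one client-side helper. This is a sketch, with assumptions: the name `reasoning_field` is hypothetical, rejecting effort values outside the commonly supported set is a client-side choice, and rejecting `effort` together with `max_tokens` is a conservative guess, since the source does not say whether they can be combined.

```python
EFFORTS = ("none", "low", "medium", "high")  # commonly supported values


def reasoning_field(effort=None, max_tokens=None) -> dict:
    """Return a request-body fragment for the reasoning field.

    Returns {} when nothing is set, so the model default applies.
    """
    if effort is not None and max_tokens is not None:
        # Assumption: combining both is not documented, so refuse it here.
        raise ValueError("set either effort or max_tokens, not both")
    if effort is not None:
        if effort not in EFFORTS:
            raise ValueError(f"unsupported effort: {effort!r}")
        return {"reasoning": {"effort": effort}}
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        return {"reasoning": {"max_tokens": max_tokens}}
    return {}
```

The fragment can be merged into a request body, for example `body.update(reasoning_field(effort="low"))`.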
Chat Completions example with low reasoning effort:
curl https://msg.hidoba.com/v3/chat/completions \
-H "Authorization: Bearer $HIDOBA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [
{ "role": "user", "content": "I am comparing retrieval methods." },
{ "role": "assistant", "content": "Got it. I can compare them clearly and briefly." },
{ "role": "user", "content": "Explain semantic search in one concise paragraph." }
],
"max_tokens": 120,
"reasoning": { "effort": "low" }
}'
Responses API example with a custom reasoning budget:
curl https://msg.hidoba.com/v3/responses \
-H "Authorization: Bearer $HIDOBA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemini-2.5-flash",
"input": "Compare semantic search and keyword search in three bullets.",
"max_output_tokens": 180,
"reasoning": { "max_tokens": 1024 }
}'
Streaming uses the same request fields:
{
"model": "google/gemini-2.5-flash",
"messages": [
{ "role": "user", "content": "I want a short answer." },
{ "role": "assistant", "content": "Understood. I will be concise." },
{ "role": "user", "content": "What does a sunrise usually symbolize?" }
],
"stream": true,
"max_tokens": 120,
"reasoning": { "effort": "medium" }
}
Reasoning support depends on the selected model. Unsupported combinations may ignore the field or return a model error.
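For a streaming Chat Completions request like the one above, the response body can be read line by line. The sketch below assumes the usual OpenAI streaming format: server-sent event lines of the form `data: {json}`, text fragments in `choices[].delta.content`, and a final `data: [DONE]` line. The function name is hypothetical.

```python
import json


def iter_chat_deltas(lines):
    """Yield answer text fragments from Chat Completions stream lines.

    Assumes OpenAI-style SSE: "data: {...}" chunks ending with "data: [DONE]".
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments in order reconstructs the full answer text.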
Errors
Generation requests may fail before reaching the model if the API key, quota, character, or metadata is invalid.
| Status | Meaning |
|---|---|
| 400 | Invalid request body, invalid metadata.hidoba, or character validation failure. |
| 401 | Missing or invalid quota API key. |
| 403 | Quota type or access is not allowed for this endpoint. |
| 429 | Quota is exhausted. |
| 5xx | Model or service failure. |
Example error:
{
  "detail": {
    "code": "invalid_api_key",
    "message": "Invalid API key",
    "context": {
      "request_id": "7ed3d6f1-1a9f-4f3b-90d3-7c681a9d9fb8",
      "endpoint": "/v3/chat/completions"
    }
  }
}
Model errors are returned in an OpenAI-compatible response shape when possible.
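Since two error shapes are possible, a client might normalize them before logging or display. This is a minimal sketch: `describe_error` is a hypothetical helper, the `detail` branch follows the example above, and the `error` branch assumes the standard OpenAI error object (`{"error": {"message": ..., "code": ...}}`).

```python
def describe_error(status: int, body: dict) -> dict:
    """Normalize a Hidoba or OpenAI-shaped error body into one dict."""
    result = {"status": status, "code": None, "message": None, "request_id": None}
    detail = body.get("detail")
    if isinstance(detail, dict):
        # Hidoba shape, as in the example error above.
        result["code"] = detail.get("code")
        result["message"] = detail.get("message")
        result["request_id"] = (detail.get("context") or {}).get("request_id")
        return result
    error = body.get("error")
    if isinstance(error, dict):
        # Assumed OpenAI-compatible shape for model errors.
        result["code"] = error.get("code")
        result["message"] = error.get("message")
    return result
```

Keeping `request_id` in the normalized result makes it easier to quote the failing request when reporting a problem.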