REST API

All public Messages API v3 endpoints are under:

https://msg.hidoba.com

Authentication

Send a quota API key with each generation request. Bearer auth is preferred.

curl https://msg.hidoba.com/v3/chat/completions \
  -H "Authorization: Bearer YOUR_QUOTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"google/gemini-2.5-flash","messages":[{"role":"user","content":"Say hello"}]}'

X-API-Key is also accepted:

X-API-Key: YOUR_QUOTA_API_KEY
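
The two header styles above can be produced by one small helper. This is a minimal Python sketch, not an official client; the function name `auth_headers` and the `use_bearer` flag are hypothetical.

```python
def auth_headers(api_key: str, use_bearer: bool = True) -> dict[str, str]:
    """Return request headers carrying the quota API key.

    Bearer auth is the documented preference; X-API-Key is the alternative.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["X-API-Key"] = api_key
    return headers
```

Only one of the two headers is set per request in this sketch, which keeps the key from being sent twice.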

Health

GET /health
GET /health/deep

Health endpoints are intended for uptime checks and deployment verification.
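
For an uptime check, a script might build the two requests as follows. This is a sketch under assumptions: the health response body is not described here, so the check relies only on the HTTP status, and treating any 2xx status as healthy is this example's own choice. The names `health_request` and `is_healthy` are hypothetical.

```python
import urllib.request

BASE_URL = "https://msg.hidoba.com"


def health_request(deep: bool = False) -> urllib.request.Request:
    """Build a GET request for /health or /health/deep (nothing is sent here)."""
    path = "/health/deep" if deep else "/health"
    return urllib.request.Request(BASE_URL + path, method="GET")


def is_healthy(status_code: int) -> bool:
    # Assumption: any 2xx status means the service is up.
    return 200 <= status_code < 300
```

To run the check, pass the request to `urllib.request.urlopen` with a timeout and feed the resulting status into `is_healthy`.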

Chat Completions

POST /v3/chat/completions

The request body follows the OpenAI Chat Completions shape. Messages API v3 also accepts a small number of Hidoba-specific additions.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Requested model name, for example google/gemini-2.5-flash. |
| messages | Array | Yes | OpenAI-compatible chat messages. Use assistant for prior chatbot messages and user for user messages. |
| stream | Boolean | No | Set to true for streaming responses. |
| max_completion_tokens | Number | No | Preferred output-token limit for compatible models. |
| max_tokens | Number | No | Compatibility output-token limit. |
| reasoning | Object | No | Optional thinking/reasoning control for supported models. See Reasoning. |
| fallback_model | String | No | Optional fallback model to try when supported. This is a Hidoba routing option, not part of the model conversation. |
| metadata.hidoba | Object | No | Hidoba character metadata. See Hidoba Metadata. |

Example

curl https://msg.hidoba.com/v3/chat/completions \
  -H "Authorization: Bearer $HIDOBA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "Hi, can you help me write concise replies?" },
      { "role": "assistant", "content": "Yes. I will keep replies brief and clear." },
      { "role": "user", "content": "Reply with exactly: hello" }
    ],
    "max_completion_tokens": 32
  }'

The response is an OpenAI-compatible chat completion response.
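
The same call can be prepared from Python with the standard library. The sketch below only builds the request and reads a parsed response; it assumes the usual OpenAI chat completion layout (`choices[0].message.content`), which this page implies but does not spell out. The helper names are hypothetical.

```python
import json
import urllib.request

CHAT_URL = "https://msg.hidoba.com/v3/chat/completions"


def build_chat_request(api_key, model, messages, max_completion_tokens=None):
    """Build a POST request for /v3/chat/completions (nothing is sent here)."""
    payload = {"model": model, "messages": messages}
    if max_completion_tokens is not None:
        payload["max_completion_tokens"] = max_completion_tokens
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply_text(response_body: dict) -> str:
    # Assumes the standard OpenAI layout: choices[0].message.content.
    return response_body["choices"][0]["message"]["content"]
```

To send it, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `first_reply_text`.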

Responses API

POST /v3/responses

The request body follows the OpenAI Responses API shape.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Requested model name. |
| input | String or Array | Yes | Responses API input. |
| instructions | String | No | Additional system instructions. Character prompts and knowledge context may be added when configured. |
| stream | Boolean | No | Set to true for streaming responses when supported. |
| max_output_tokens | Number | No | Output-token limit for Responses API requests. |
| reasoning | Object | No | Optional thinking/reasoning control for supported models. See Reasoning. |
| fallback_model | String | No | Optional fallback model to try when supported. |
| metadata.hidoba | Object | No | Hidoba character metadata. See Hidoba Metadata. |

Example

curl https://msg.hidoba.com/v3/responses \
  -H "Authorization: Bearer $HIDOBA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "input": "Write one concise sentence about Mars.",
    "max_output_tokens": 80
  }'

The response is an OpenAI-compatible Responses API response.
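
A client usually wants the final answer text out of that response. The sketch below builds the request body and collects text from `message` items whose content parts have type `output_text`. That item layout is taken from the OpenAI Responses shape and is an assumption here; the helper names are hypothetical.

```python
def build_responses_payload(model, input_value, instructions=None, max_output_tokens=None):
    """Build the JSON body for POST /v3/responses, omitting unset optional fields."""
    payload = {"model": model, "input": input_value}
    if instructions is not None:
        payload["instructions"] = instructions
    if max_output_tokens is not None:
        payload["max_output_tokens"] = max_output_tokens
    return payload


def collect_output_text(response_body: dict) -> str:
    """Join the answer text, skipping reasoning and file_search_call items."""
    parts = []
    for item in response_body.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```

Filtering by item type matters because the output array can contain items other than the answer itself.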

Responses API Visibility

When a /v3/responses request uses a character with knowledge context enabled, Hidoba adds OpenAI-shaped visibility items to the Responses output. These items are intended for user interfaces that want to show what is happening before the final answer arrives.

This visibility is added only for /v3/responses. Chat Completions responses keep the normal Chat Completions shape.

Output Items

When knowledge context runs, the Responses output array includes Hidoba-added items before the model answer:

  1. A reasoning item saying that Hidoba is looking through the knowledge base.
  2. A file_search_call item with the searches and retrieved source previews.
  3. A reasoning item saying how many relevant documents were found.
  4. The normal model output items.

Example:

{
  "output": [
    {
      "id": "rs_hidoba_rag_loading_123",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "Looking through the knowledge base..."
        }
      ]
    },
    {
      "id": "fs_hidoba_123",
      "type": "file_search_call",
      "status": "completed",
      "queries": ["have you ever had a horse?"],
      "results": [
        {
          "file_id": "file_hidoba_07d6ebeddf6a421c",
          "filename": "Stephen Wolfram's Personal History with Animals and the Concept of Video Games for Pets",
          "score": 0.111,
          "text": "Q&A about Business, Innovation, and Managing Life with Stephen Wolfram...",
          "attributes": {
            "chunk_index": 0,
            "retrievers": "dense,splade"
          }
        }
      ]
    },
    {
      "id": "rs_hidoba_rag_found_123",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "Found 1 relevant documents, thinking..."
        }
      ]
    }
  ]
}

If knowledge context runs and finds no sources, Hidoba still returns a completed file_search_call with results: [] and a status message such as Found 0 relevant documents, thinking....
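
A user interface can separate the Hidoba-added items from the model's own output by item type and ID prefix. The following is a minimal sketch that relies on the `rs_hidoba_` and `fs_hidoba_` prefixes seen in the example; the function name `split_visibility` is hypothetical.

```python
def split_visibility(output):
    """Split a Responses output array into (status texts, sources, model items)."""
    status, sources, model_items = [], [], []
    for item in output:
        item_id = str(item.get("id", ""))
        item_type = item.get("type")
        if item_type == "reasoning" and item_id.startswith("rs_hidoba_"):
            # Hidoba status messages live in summary[].text.
            for entry in item.get("summary", []):
                if entry.get("type") == "summary_text":
                    status.append(entry.get("text", ""))
        elif item_type == "file_search_call" and item_id.startswith("fs_hidoba_"):
            # results may be an empty list when no sources were found.
            sources.extend(item.get("results") or [])
        else:
            model_items.append(item)
    return status, sources, model_items
```

Reasoning items without the Hidoba prefix fall through to `model_items`, so model thinking is not mistaken for a knowledge-status message.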

Source Result Fields

Each file_search_call.results[] item may include:

| Field | Type | Description |
| --- | --- | --- |
| file_id | String | Stable public source identifier for the returned result. |
| filename | String | Human-readable source or document title. |
| score | Number | Relevance score when available. |
| text | String | Short source preview. Full knowledge chunks are not returned in public Responses output. |
| attributes | Object | Safe source metadata, such as chunk_index and retrievers, when available. |

The text field is a compact preview only. Hidoba trims surrounding whitespace, collapses repeated whitespace, and returns a short snippet of the retrieved chunk, currently up to 200 characters.
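
The preview rule described above can be approximated in a few lines, which is handy when a test fixture needs to mimic the text field. This is an illustration of the stated behavior, not Hidoba's implementation; whether the service cuts mid-word or appends an ellipsis is not specified, so this sketch simply slices. The name `make_preview` is hypothetical.

```python
import re


def make_preview(chunk: str, limit: int = 200) -> str:
    """Trim, collapse runs of whitespace to one space, and cap the length."""
    collapsed = re.sub(r"\s+", " ", chunk.strip())
    return collapsed[:limit]
```

The 200-character default mirrors the current limit stated above and may need updating if that limit changes.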

Streaming Events

For streaming /v3/responses calls, the same information is emitted as Responses stream events before the model answer.

Typical event order:

response.created
response.in_progress
response.output_item.added # reasoning: looking through knowledge base
response.output_item.done
response.file_search_call.in_progress
response.file_search_call.searching
response.file_search_call.completed
response.output_item.added # reasoning: found N documents
response.output_item.done
...model output events...
response.completed

Render response.file_search_call.completed.item.results as the source list. Render Hidoba reasoning.summary[].text as status text. Hidoba-added item IDs use identifiable prefixes such as rs_hidoba_... and fs_hidoba_....
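
The rendering rules above can be sketched as a small reducer that folds stream events into view state. It assumes each event has already been parsed into a dict with a `type` key and, where relevant, an `item` key, following the OpenAI Responses stream layout; the function and state key names are hypothetical.

```python
def new_view_state():
    return {"status": "", "sources": [], "done": False}


def apply_event(state, event):
    """Update the view state in place from one parsed stream event."""
    etype = event.get("type", "")
    item = event.get("item") or {}
    if etype in ("response.output_item.added", "response.output_item.done"):
        is_hidoba_status = (
            item.get("type") == "reasoning"
            and str(item.get("id", "")).startswith("rs_hidoba_")
        )
        if is_hidoba_status:
            texts = [s.get("text", "") for s in item.get("summary", [])]
            if any(texts):
                # Show the latest Hidoba status message.
                state["status"] = " ".join(t for t in texts if t)
    elif etype == "response.file_search_call.completed":
        # The completed event carries the source list in item.results.
        state["sources"] = list(item.get("results") or [])
    elif etype == "response.completed":
        state["done"] = True
    return state
```

Both `added` and `done` are handled for status items because this page does not say which of the two carries the populated summary.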

Model Thinking Events

Some models also emit their own reasoning or thinking output when you use the reasoning request field. This is separate from Hidoba's knowledge-status messages.

In Responses streams, model thinking may appear as:

  • response.reasoning_text.delta
  • response.reasoning_text.done
  • response.content_part.done with part.type: "reasoning_text"
  • response.output_item.done for an item with type: "reasoning" and content[].type: "reasoning_text"

User interfaces should treat these as reasoning/thinking text, not as final answer text. Preserve whitespace when displaying reasoning text so headings and paragraphs remain readable.
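
One way to keep thinking text apart from answer text is to accumulate the two kinds of delta separately. The sketch below assumes each delta event carries its text in a `delta` field and that answer text arrives as `response.output_text.delta`; both come from the OpenAI Responses stream layout and are not confirmed by this page. The function name is hypothetical.

```python
def split_stream_text(events):
    """Return (thinking_text, answer_text) from parsed stream events."""
    thinking, answer = [], []
    for event in events:
        etype = event.get("type")
        if etype == "response.reasoning_text.delta":
            thinking.append(event.get("delta", ""))
        elif etype == "response.output_text.delta":  # assumed event name
            answer.append(event.get("delta", ""))
    # Deltas are joined unmodified so whitespace in reasoning text survives.
    return "".join(thinking), "".join(answer)
```

Joining the deltas without stripping keeps line breaks intact, so headings and paragraphs in the reasoning text stay readable.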

Reasoning

Use reasoning when the requested model supports explicit thinking controls. Messages API v3 accepts the object and applies it to the model request; it is not used for Hidoba quota logic, character rendering, or knowledge retrieval.

Reasoning is separate from the answer output limit:

  • Chat Completions output limit: max_tokens or max_completion_tokens
  • Responses API output limit: max_output_tokens
  • Reasoning budget: reasoning.max_tokens

If reasoning is omitted, no explicit reasoning instruction is sent and the model default applies.

Turn reasoning off:

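{
  "reasoning": { "effort": "none" }
}

With effort set to none, an explicit instruction to disable reasoning is sent. Omitting the field instead leaves the model default in place.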

Use an effort level:

{
  "reasoning": { "effort": "low" }
}

The commonly supported effort values are none, low, medium, and high. To let the model choose its default reasoning behavior, omit the reasoning field.

Use a custom thinking-token budget:

{
  "reasoning": { "max_tokens": 1024 }
}

reasoning.max_tokens sets the reasoning-token budget only. The answer length is still limited by max_tokens, max_completion_tokens, or max_output_tokens.
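
The three shapes above can be produced by one helper that adds a reasoning object only when something was requested. This is a sketch: the client-side check against none, low, medium, and high is this example's own choice (individual models may accept other values), and whether effort and max_tokens may be combined in one request is not stated here, so the helper passes through whatever is given. The name `with_reasoning` is hypothetical.

```python
EFFORT_VALUES = ("none", "low", "medium", "high")


def with_reasoning(payload, effort=None, max_tokens=None):
    """Return a copy of payload with a reasoning object added when requested."""
    result = dict(payload)
    reasoning = {}
    if effort is not None:
        if effort not in EFFORT_VALUES:
            raise ValueError(f"effort must be one of {EFFORT_VALUES}")
        reasoning["effort"] = effort
    if max_tokens is not None:
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        reasoning["max_tokens"] = max_tokens
    if reasoning:
        result["reasoning"] = reasoning
    # With neither argument set, the field is omitted and the model default applies.
    return result
```

For example, `with_reasoning({"model": "google/gemini-2.5-flash"}, effort="low")` yields a body with `"reasoning": {"effort": "low"}` and leaves the original dict untouched.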

Chat Completions example with low reasoning effort:

curl https://msg.hidoba.com/v3/chat/completions \
  -H "Authorization: Bearer $HIDOBA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "I am comparing retrieval methods." },
      { "role": "assistant", "content": "Got it. I can compare them clearly and briefly." },
      { "role": "user", "content": "Explain semantic search in one concise paragraph." }
    ],
    "max_tokens": 120,
    "reasoning": { "effort": "low" }
  }'

Responses API example with a custom reasoning budget:

curl https://msg.hidoba.com/v3/responses \
  -H "Authorization: Bearer $HIDOBA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "input": "Compare semantic search and keyword search in three bullets.",
    "max_output_tokens": 180,
    "reasoning": { "max_tokens": 1024 }
  }'

Streaming uses the same request fields:

{
  "model": "google/gemini-2.5-flash",
  "messages": [
    { "role": "user", "content": "I want a short answer." },
    { "role": "assistant", "content": "Understood. I will be concise." },
    { "role": "user", "content": "What does a sunrise usually symbolize?" }
  ],
  "stream": true,
  "max_tokens": 120,
  "reasoning": { "effort": "medium" }
}
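
A client sending the streaming body above then has to read the answer incrementally. The sketch below assumes the stream follows the usual OpenAI Chat Completions wire format: server-sent event lines of the form `data: {json}`, answer text in `choices[].delta.content`, and a final `data: [DONE]` line. This page does not describe the wire format, so treat all three as assumptions; the function name is hypothetical.

```python
import json


def iter_chat_deltas(lines):
    """Yield answer text fragments from decoded SSE lines of a chat stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Any iterable of decoded text lines works as input, for example the lines of an HTTP response body read one at a time.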

Reasoning support depends on the selected model. Unsupported combinations may ignore the field or return a model error.

Errors

Generation requests may fail before reaching the model if the API key, quota, character, or metadata is invalid.

| Status | Meaning |
| --- | --- |
| 400 | Invalid request body, invalid metadata.hidoba, or character validation failure. |
| 401 | Missing or invalid quota API key. |
| 403 | Quota type or access is not allowed for this endpoint. |
| 429 | Quota is exhausted. |
| 5xx | Model or service failure. |

Example error:

{
  "detail": {
    "code": "invalid_api_key",
    "message": "Invalid API key",
    "context": {
      "request_id": "7ed3d6f1-1a9f-4f3b-90d3-7c681a9d9fb8",
      "endpoint": "/v3/chat/completions"
    }
  }
}

Model errors are returned in an OpenAI-compatible response shape when possible.
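
Client code can normalize both error shapes into one record. This sketch reads the `detail` object shown in the example and falls back to a top-level `error` object, which is the usual OpenAI error layout and an assumption here. Treating only 5xx as retryable is this example's own policy, not a documented rule. The names `ApiError` and `parse_error` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    code: Optional[str]
    message: str
    request_id: Optional[str]
    retryable: bool


def parse_error(status: int, body) -> ApiError:
    """Normalize a Hidoba detail error or an OpenAI-style error body."""
    body = body if isinstance(body, dict) else {}
    detail = body.get("detail")
    if not isinstance(detail, dict):
        # Assumed fallback: OpenAI-style {"error": {"message": ..., "code": ...}}.
        detail = body.get("error") if isinstance(body.get("error"), dict) else {}
    context = detail.get("context") or {}
    return ApiError(
        status=status,
        code=detail.get("code"),
        message=detail.get("message") or "Unknown error",
        request_id=context.get("request_id"),
        # Assumption: only model or service failures (5xx) are worth retrying.
        retryable=status >= 500,
    )
```

Logging `request_id` alongside the status makes it easier to match a failed call with a support request.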