Introduction
Everything you need to integrate NeXeonAI into your application
NeXeonAI provides a unified API to access GPT-5, Claude, Gemini, DeepSeek, and 50+ models from a single endpoint. Drop-in compatible with the OpenAI and Anthropic SDKs: switch your base URL and you're done.
Base URL
https://api.nexeonai.com/v1
Authentication
Bearer token or x-api-key header
OpenAI
GPT-5, GPT-4o, o1, o3
Anthropic
Claude Opus 4.5, Sonnet 4.5, Haiku 4.5
Google
Gemini 2.5, Gemini 2.0
Quickstart
Get up and running in under 2 minutes
Get your API key
Create an account and generate an API key from your dashboard.
Install the SDK
Use the official OpenAI or Anthropic SDK; no custom packages needed.
```bash
pip install openai
```
Make your first request
Point the SDK to NeXeonAI and make a request.
```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
Authentication
Secure your API requests with API keys
All API requests require authentication via your API key. Include it in the request headers.
Bearer Token (OpenAI-compatible)
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..."
```
x-api-key Header (Anthropic-compatible)
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..."
```
Keep your API keys secure. Never expose them in client-side code or public repositories.
Models
List available models and their capabilities
`/v1/models`
Returns a list of all available models across all providers, including pricing and context window information.
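If you use the OpenAI SDK, the same listing is available on the client; a minimal sketch, assuming the client setup from the Quickstart:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

# Iterate over every model the gateway exposes.
for model in client.models.list():
    print(model.id, model.owned_by)
```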
Request
```bash
curl https://api.nexeonai.com/v1/models \
  -H "Authorization: Bearer nex-..."
```
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5.2-chat-latest",
      "object": "model",
      "created": 1737921600,
      "owned_by": "openai"
    },
    {
      "id": "claude-sonnet-4-5-20250929",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    },
    {
      "id": "claude-opus-4-5-20251101",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    }
  ]
}
```
Chat Completions
Generate conversational responses from any model
`/v1/chat/completions`
The primary endpoint for generating AI responses. Compatible with OpenAI's chat completions format and works with all supported models.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (e.g., "gpt-5.2-chat-latest", "claude-sonnet-4-5-20250929") |
| messages (required) | array | Array of message objects with role and content |
| max_tokens | integer | Maximum tokens to generate |
| temperature | number | Sampling temperature (0-2) |
| stream | boolean | Enable streaming responses |
| top_p | number | Nucleus sampling parameter |
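Through the OpenAI SDK, a request with these parameters looks like the sketch below (client setup as in the Quickstart); since the endpoint works with all supported models, any model ID from `/v1/models` can be used:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",  # OpenAI format, non-OpenAI model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```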
Request
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-chat-latest",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing briefly."}
    ],
    "max_tokens": 256
  }'
```
Response
```json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "gpt-5.2-chat-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits...",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152
  }
}
```
Responses API
OpenAI's new API for GPT-5+ reasoning models
`/v1/responses`
The Responses API provides 3% better reasoning performance and up to 80% improved cache utilization compared to Chat Completions.
Optimized for GPT-5 and newer reasoning models with built-in tools, multi-turn support, and better performance.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (e.g., "gpt-5", "gpt-5.2-chat-latest") |
| input (required) | string or array | Input text or array of messages |
| instructions | string | System-level instructions |
| max_output_tokens | integer | Maximum tokens to generate |
| tools | array | Built-in tools (web_search, file_search, code_interpreter) |
| stream | boolean | Enable streaming responses |
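With a recent openai Python package (one that ships the Responses API), the same call is available on the client; a sketch assuming the Quickstart client setup:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.responses.create(
    model="gpt-5",
    instructions="You are a helpful assistant.",
    input="What is the weather in Tokyo?",
    tools=[{"type": "web_search"}]
)
# output_text concatenates every output_text block in the response.
print(response.output_text)
```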
Request
```bash
curl https://api.nexeonai.com/v1/responses \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "instructions": "You are a helpful assistant.",
    "input": "What is the weather in Tokyo?",
    "tools": [{"type": "web_search"}]
  }'
```
Response
```json
{
  "id": "resp_nxn-abc123def456789",
  "object": "response",
  "created_at": 1737921600,
  "model": "gpt-5",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_nxn-xyz789",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Based on current data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 48,
    "total_tokens": 63
  }
}
```
Messages API
Anthropic-compatible endpoint for Claude and all models
`/v1/messages`
Use the Anthropic SDK with any model. Native Claude support plus automatic format bridging for OpenAI, Gemini, and DeepSeek models.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (works with any provider) |
| messages (required) | array | Messages in Anthropic format |
| max_tokens (required) | integer | Maximum tokens to generate |
| system | string | System prompt (separate from messages) |
| temperature | number | Sampling temperature |
| stream | boolean | Enable streaming responses |
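The same request through the Anthropic SDK; a minimal sketch, with the base URL written without the /v1 suffix because the SDK appends it (this matches the Python SDK example in the Extended Thinking section):

```python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com"  # the SDK appends /v1/messages itself
)

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)
```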
Request
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
Response
```json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5-20250929",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}
```
Reasoning Models
Advanced models with chain-of-thought reasoning capabilities
Reasoning models like OpenAI's o1, o3, o4-mini and Claude's extended thinking provide superior problem-solving capabilities by "thinking" through complex tasks step-by-step before responding.
Reasoning models may take longer to respond but provide significantly better results for complex tasks like coding, math, logic puzzles, and multi-step analysis.
Available Reasoning Models
| Model | Provider | Best For |
|---|---|---|
| o1 | OpenAI | Complex reasoning, PhD-level tasks |
| o1-mini | OpenAI | Fast reasoning, coding tasks |
| o1-pro | OpenAI | Maximum reasoning depth |
| o3 | OpenAI | Next-gen reasoning |
| o3-mini | OpenAI | Fast next-gen reasoning |
| o4-mini | OpenAI | Latest compact reasoner |
| claude-opus-4-5-20251101 | Anthropic | Most capable, extended thinking |
| claude-sonnet-4-5-20250929 | Anthropic | Balanced reasoning + speed |
Key Differences from Standard Models
| Parameter | Type | Description |
|---|---|---|
| max_completion_tokens (required) | integer | Use this instead of max_tokens for OpenAI reasoning models (o1, o3, o4) |
| reasoning_effort | string | Control reasoning depth: "low", "medium", "high" (o1, o3 models) |
| temperature | number | Fixed at 1 for reasoning models; the parameter is ignored |
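In the OpenAI SDK these parameters pass straight through; a sketch assuming the Quickstart client setup and an SDK version recent enough to accept reasoning_effort:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

# No system message and no temperature: reasoning models expect all
# instructions in the user turn (see the note after the response below).
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_completion_tokens=4096,  # use this instead of max_tokens for o1/o3/o4
    reasoning_effort="high"      # "low" | "medium" | "high"
)
print(response.choices[0].message.content)
```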
o1/o3 Request
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o1",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120km in 2 hours, then stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey including the stop?"}
    ],
    "max_completion_tokens": 4096,
    "reasoning_effort": "high"
  }'
```
Response
```json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "o1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Let me solve this step by step...\n\nTotal distance: 120 + 90 = 210 km\nTotal time: 2 + 0.5 + 1.5 = 4 hours\nAverage speed: 210 / 4 = 52.5 km/h",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 256,
    "total_tokens": 304
  }
}
```
Reasoning models do not support system messages or temperature parameters. Place all instructions in the user message.
Streaming
Real-time token-by-token responses
Enable streaming for real-time responses. The API sends Server-Sent Events (SSE) as tokens are generated, reducing time-to-first-token significantly.
Non-streaming timeout: requests without `stream: true` have a 2-minute timeout. For reasoning models or long responses, always use streaming.
Streaming is recommended for user-facing applications. It provides a much better UX as users see responses as they're generated.
SSE Keep-Alive
During streaming, the API sends periodic keep-alive comments (`: NX`) every 300ms to prevent CDN timeouts. These are standard SSE comments and are automatically filtered by compliant clients.
```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Extended Thinking
Claude's chain-of-thought reasoning with visible thinking process
Extended thinking allows Claude models to work through complex problems step-by-step, showing their reasoning process. This is especially powerful for coding, math, analysis, and complex multi-step tasks.
Extended thinking is available on Claude Opus 4.5 and Sonnet 4.5 models via the Messages API. The thinking process is returned in a separate content block.
How Extended Thinking Works
Enable thinking
Set thinking.type to "enabled" with a budget_tokens value
Claude thinks
Model reasons through the problem internally
Get response
Receive both thinking and final answer
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| thinking.type (required) | string | Set to "enabled" to activate extended thinking |
| thinking.budget_tokens (required) | integer | Maximum tokens for thinking (1024-32768). Higher = deeper reasoning |
| max_tokens (required) | integer | Maximum tokens for the final response (separate from thinking budget) |
Request with Extended Thinking
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 8192,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to find the longest palindromic substring in O(n) time using Manacher algorithm. Explain your approach."
      }
    ]
  }'
```
Response with Thinking
```json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me think through Manacher's algorithm...\n\nThe key insight is that palindromes have mirror properties..."
    },
    {
      "type": "text",
      "text": "Here's an implementation of Manacher's algorithm...\n\n```python\ndef longest_palindrome(s: str) -> str:\n    t = '#' + '#'.join(s) + '#'\n    ..."
    }
  ],
  "model": "claude-opus-4-5-20251101",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "output_tokens": 1847
  }
}
```
Python SDK Example
```python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com"
)

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Allow up to 8k tokens for reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze this code for security vulnerabilities and suggest fixes."
        }
    ]
)

# Access thinking and response separately
for block in response.content:
    if block.type == "thinking":
        print("=== Claude's Reasoning ===")
        print(block.thinking)
    elif block.type == "text":
        print("\n=== Final Answer ===")
        print(block.text)
```
For complex tasks, use a higher `budget_tokens` (8000-16000). For simpler tasks where you just want better accuracy, 2000-4000 tokens is usually sufficient.
Extended thinking tokens are billed at the same rate as output tokens. Monitor your thinking budget to control costs.
Error Handling
Understand and handle API errors gracefully
The API uses standard HTTP status codes. Errors include a JSON body with details about what went wrong.
HTTP Status Codes
| Code | Status | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits |
| 403 | Forbidden | API key lacks permission |
| 404 | Not Found | Model or resource not found |
| 408 | Request Timeout | Non-streaming request exceeded the 2-minute limit |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Server Error | Internal server error |
| 502 | Bad Gateway | Upstream provider error |
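Since the endpoints are OpenAI-compatible, the OpenAI SDK raises its usual typed exceptions for these status codes; a sketch assuming the Quickstart client setup:

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

try:
    response = client.chat.completions.create(
        model="gpt-5.2-chat-latest",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except openai.AuthenticationError:
    print("401: invalid or missing API key")
except openai.RateLimitError:
    print("429: rate limited, slow down and retry")
except openai.APIStatusError as e:
    # Any other non-2xx status; the body carries the error object shown below.
    print(e.status_code, e.response.text)
```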
Error Response Format
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Model 'unknown-model' is not available",
    "param": "model",
    "code": "model_not_found"
  }
}
```
Non-Streaming Timeout
Non-streaming requests have a 2-minute timeout. For long-running requests (reasoning models, complex tasks), we recommend using streaming mode for better reliability.
```json
{
  "detail": {
    "type": "timeout_error",
    "message": "Request timed out after 120 seconds. For long-running requests, we recommend using streaming mode (stream: true) for better reliability."
  }
}
```
Always use `stream: true` for reasoning models (o1, o3, o4) and extended thinking requests to avoid timeouts.
SSE Keep-Alive Comments
During streaming responses, the API sends periodic keep-alive comments (`: NX`) to prevent connection timeouts. These are standard SSE comments and should be ignored by your client.
```
: NX
data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}
: NX
data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}
data: [DONE]
```
Keep-alive comments are sent every 300ms during periods of inactivity to prevent CDN/proxy timeouts (e.g., Cloudflare 524 errors). SSE-compatible clients automatically filter these out.
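If you read the stream without an SSE library, skip comment lines (anything starting with a colon) yourself; a minimal sketch using httpx, which is an assumption here rather than a dependency the platform requires:

```python
import json
import httpx

with httpx.stream(
    "POST",
    "https://api.nexeonai.com/v1/chat/completions",
    headers={"Authorization": "Bearer nex-..."},
    json={
        "model": "gpt-5.2-chat-latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    timeout=None
) as response:
    for line in response.iter_lines():
        # Blank lines are event separators; ": NX" lines are keep-alives.
        if not line or line.startswith(":"):
            continue
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices and choices[0]["delta"].get("content"):
            print(choices[0]["delta"]["content"], end="", flush=True)
```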
Rate Limits
Understand API usage limits
Rate limits are applied per account to ensure fair usage and platform stability.
| Limit Type | Default |
|---|---|
| Requests per minute | 1,000 RPM |
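If you only occasionally hit 429s, client-side exponential backoff is usually enough while you wait for a limit increase; a sketch assuming the OpenAI SDK client from the Quickstart:

```python
import time
import openai
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

def with_backoff(call, max_retries=5):
    """Retry a callable when the API returns 429 Too Many Requests."""
    for attempt in range(max_retries):
        try:
            return call()
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    raise RuntimeError("still rate limited after retries")

response = with_backoff(lambda: client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}]
))
```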
If you hit rate limits or need higher throughput for your application, reach out to our team and we can adjust your limits.
Need higher rate limits? Join our Discord and open a support ticket.