Introduction
Everything you need to integrate NeXeonAI into your application
NeXeonAI provides a unified API to access GPT-5, Claude, Gemini, DeepSeek, and 50+ models from a single endpoint. Drop-in compatible with the OpenAI and Anthropic SDKs: switch your base URL and you're done.
Base URL
https://api.nexeonai.com/v1
Authentication
Bearer token or x-api-key header
OpenAI
GPT-5, GPT-4o, o1, o3
Anthropic
Claude Opus 4.5, Sonnet 4.5, Haiku 4.5
Google
Gemini 2.5, Gemini 2.0
Quickstart
Get up and running in under 2 minutes
Get your API key
Create an account and generate an API key from your dashboard.
Install the SDK
Use the official OpenAI or Anthropic SDK; no custom packages needed.
```bash
pip install openai
```
Make your first request
Point the SDK to NeXeonAI and make a request.
```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
Authentication
Secure your API requests with API keys
All API requests require authentication via your API key. Include it in the request headers.
Bearer Token (OpenAI-compatible)
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..."
```
x-api-key Header (Anthropic-compatible)
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..."
```
Keep your API keys secure. Never expose them in client-side code or public repositories.
Models
List available models and their capabilities
`/v1/models`
Returns a list of all available models across all providers, including pricing and context window information.
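If you use the OpenAI SDK, the same listing is available on the client; a minimal sketch, assuming the client setup from the Quickstart:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

# Iterate over every model the gateway exposes.
for model in client.models.list():
    print(model.id, model.owned_by)
```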
Request
```bash
curl https://api.nexeonai.com/v1/models \
  -H "Authorization: Bearer nex-..."
```
Response
```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-5.2-chat-latest",
      "object": "model",
      "created": 1737921600,
      "owned_by": "openai"
    },
    {
      "id": "claude-sonnet-4-5-20250929",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    },
    {
      "id": "claude-opus-4-5-20251101",
      "object": "model",
      "created": 1735689600,
      "owned_by": "anthropic"
    }
  ]
}
```
Chat Completions
Generate conversational responses from any model
`/v1/chat/completions`
The primary endpoint for generating AI responses. Compatible with OpenAI's chat completions format and works with all supported models.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (e.g., "gpt-5.2-chat-latest", "claude-sonnet-4-5-20250929") |
| messages (required) | array | Array of message objects with role and content |
| max_tokens | integer | Maximum tokens to generate |
| temperature | number | Sampling temperature (0-2) |
| stream | boolean | Enable streaming responses |
| top_p | number | Nucleus sampling parameter |
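Through the OpenAI SDK, a request with these parameters looks like the sketch below (client setup as in the Quickstart); since the endpoint works with all supported models, any model ID from `/v1/models` can be used:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5-20250929",  # OpenAI format, non-OpenAI model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."}
    ],
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].message.content)
```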
Request
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-chat-latest",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing briefly."}
    ],
    "max_tokens": 256
  }'
```
Response
```json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "gpt-5.2-chat-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits...",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152
  }
}
```
Responses API
OpenAI's new API for GPT-5+ reasoning models
`/v1/responses`
The Responses API provides 3% better reasoning performance and up to 80% improved cache utilization compared to Chat Completions.
Optimized for GPT-5 and newer reasoning models with built-in tools, multi-turn support, and better performance.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (e.g., "gpt-5", "gpt-5.2-chat-latest") |
| input (required) | string or array | Input text or array of messages |
| instructions | string | System-level instructions |
| max_output_tokens | integer | Maximum tokens to generate |
| tools | array | Built-in tools (web_search, file_search, code_interpreter) |
| stream | boolean | Enable streaming responses |
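With a recent openai Python package (one that ships the Responses API), the same call is available on the client; a sketch assuming the Quickstart client setup:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

response = client.responses.create(
    model="gpt-5",
    instructions="You are a helpful assistant.",
    input="What is the weather in Tokyo?",
    tools=[{"type": "web_search"}]
)
# output_text concatenates every output_text block in the response.
print(response.output_text)
```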
Request
```bash
curl https://api.nexeonai.com/v1/responses \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "instructions": "You are a helpful assistant.",
    "input": "What is the weather in Tokyo?",
    "tools": [{"type": "web_search"}]
  }'
```
Response
```json
{
  "id": "resp_nxn-abc123def456789",
  "object": "response",
  "created_at": 1737921600,
  "model": "gpt-5",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_nxn-xyz789",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Based on current data..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 48,
    "total_tokens": 63
  }
}
```
Messages API
Anthropic-compatible endpoint for Claude and all models
`/v1/messages`
Use the Anthropic SDK with any model. Native Claude support plus automatic format bridging for OpenAI, Gemini, and DeepSeek models.
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model ID (works with any provider) |
| messages (required) | array | Messages in Anthropic format |
| max_tokens (required) | integer | Maximum tokens to generate |
| system | string | System prompt (separate from messages) |
| temperature | number | Sampling temperature |
| stream | boolean | Enable streaming responses |
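The same request through the Anthropic SDK; a minimal sketch, with the base URL written without the /v1 suffix because the SDK appends it (this matches the Python SDK example in the Extended Thinking section):

```python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com"  # the SDK appends /v1/messages itself
)

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.content[0].text)
```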
Request
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
Response
```json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-sonnet-4-5-20250929",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 10
  }
}
```
Reasoning Models
Advanced models with chain-of-thought reasoning capabilities
Reasoning models like OpenAI's o1, o3, o4-mini and Claude's extended thinking provide superior problem-solving capabilities by "thinking" through complex tasks step-by-step before responding.
Reasoning models may take longer to respond but provide significantly better results for complex tasks like coding, math, logic puzzles, and multi-step analysis.
Available Reasoning Models
| Model | Provider | Best For |
|---|---|---|
| o1 | OpenAI | Complex reasoning, PhD-level tasks |
| o1-mini | OpenAI | Fast reasoning, coding tasks |
| o1-pro | OpenAI | Maximum reasoning depth |
| o3 | OpenAI | Next-gen reasoning |
| o3-mini | OpenAI | Fast next-gen reasoning |
| o4-mini | OpenAI | Latest compact reasoner |
| claude-opus-4-5-20251101 | Anthropic | Most capable, extended thinking |
| claude-sonnet-4-5-20250929 | Anthropic | Balanced reasoning + speed |
Key Differences from Standard Models
| Parameter | Type | Description |
|---|---|---|
| max_completion_tokens (required) | integer | Use this instead of max_tokens for OpenAI reasoning models (o1, o3, o4) |
| reasoning_effort | string | Control reasoning depth: "low", "medium", "high" (o1, o3 models) |
| temperature | number | Fixed at 1 for reasoning models; the parameter is ignored |
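In the OpenAI SDK these parameters pass straight through; a sketch assuming the Quickstart client setup and an SDK version recent enough to accept reasoning_effort:

```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

# No system message and no temperature: reasoning models expect all
# instructions in the user turn (see the note after the response below).
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    max_completion_tokens=4096,  # use this instead of max_tokens for o1/o3/o4
    reasoning_effort="high"      # "low" | "medium" | "high"
)
print(response.choices[0].message.content)
```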
o1/o3 Request
```bash
curl https://api.nexeonai.com/v1/chat/completions \
  -H "Authorization: Bearer nex-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o1",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120km in 2 hours, then stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey including the stop?"}
    ],
    "max_completion_tokens": 4096,
    "reasoning_effort": "high"
  }'
```
Response
```json
{
  "id": "chatcmpl-nxn-abc123def456789",
  "object": "chat.completion",
  "created": 1737921600,
  "model": "o1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Let me solve this step by step...\n\nTotal distance: 120 + 90 = 210 km\nTotal time: 2 + 0.5 + 1.5 = 4 hours\nAverage speed: 210 / 4 = 52.5 km/h",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 256,
    "total_tokens": 304
  }
}
```
Reasoning models do not support system messages or temperature parameters. Place all instructions in the user message.
Streaming
Real-time token-by-token responses
Enable streaming for real-time responses. The API sends Server-Sent Events (SSE) as tokens are generated, reducing time-to-first-token significantly.
Non-streaming timeout: requests without `stream: true` have a 2-minute timeout. For reasoning models or long responses, always use streaming.
Streaming is recommended for user-facing applications. It provides a much better UX as users see responses as they're generated.
SSE Keep-Alive
During streaming, the API sends periodic keep-alive comments (`: NX`) every 300ms to prevent CDN timeouts. These are standard SSE comments and are automatically filtered by compliant clients.
```python
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Extended Thinking
Claude's chain-of-thought reasoning with visible thinking process
Extended thinking allows Claude models to work through complex problems step-by-step, showing their reasoning process. This is especially powerful for coding, math, analysis, and complex multi-step tasks.
Extended thinking is available on Claude Opus 4.5 and Sonnet 4.5 models via the Messages API. The thinking process is returned in a separate content block.
How Extended Thinking Works
Enable thinking
Set thinking.type to "enabled" with a budget_tokens value
Claude thinks
Model reasons through the problem internally
Get response
Receive both thinking and final answer
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| thinking.type (required) | string | Set to "enabled" to activate extended thinking |
| thinking.budget_tokens (required) | integer | Maximum tokens for thinking (1024-32768). Higher = deeper reasoning |
| max_tokens (required) | integer | Maximum tokens for the final response (separate from thinking budget) |
Request with Extended Thinking
```bash
curl https://api.nexeonai.com/v1/messages \
  -H "x-api-key: nex-..." \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 8192,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 8000
    },
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function to find the longest palindromic substring in O(n) time using Manacher algorithm. Explain your approach."
      }
    ]
  }'
```
Response with Thinking
```json
{
  "id": "msg_nxn-abc123def456789012345678",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me think through Manacher's algorithm...\n\nThe key insight is that palindromes have mirror properties..."
    },
    {
      "type": "text",
      "text": "Here's an implementation of Manacher's algorithm...\n\n```python\ndef longest_palindrome(s: str) -> str:\n    t = '#' + '#'.join(s) + '#'\n    ..."
    }
  ],
  "model": "claude-opus-4-5-20251101",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "output_tokens": 1847
  }
}
```
Python SDK Example
```python
from anthropic import Anthropic

client = Anthropic(
    api_key="nex-...",
    base_url="https://api.nexeonai.com"
)

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000  # Allow up to 8k tokens for reasoning
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze this code for security vulnerabilities and suggest fixes."
        }
    ]
)

# Access thinking and response separately
for block in response.content:
    if block.type == "thinking":
        print("=== Claude's Reasoning ===")
        print(block.thinking)
    elif block.type == "text":
        print("\n=== Final Answer ===")
        print(block.text)
```
For complex tasks, use a higher `budget_tokens` (8000-16000). For simpler tasks where you just want better accuracy, 2000-4000 tokens is usually sufficient.
Extended thinking tokens are billed at the same rate as output tokens. Monitor your thinking budget to control costs.
Error Handling
Understand and handle API errors gracefully
The API uses standard HTTP status codes. Errors include a JSON body with details about what went wrong.
HTTP Status Codes
| Code | Status | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits |
| 403 | Forbidden | API key lacks permission |
| 404 | Not Found | Model or resource not found |
| 408 | Request Timeout | Non-streaming request exceeded the 2-minute limit |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Server Error | Internal server error |
| 502 | Bad Gateway | Upstream provider error |
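Since the endpoints are OpenAI-compatible, the OpenAI SDK raises its usual typed exceptions for these status codes; a sketch assuming the Quickstart client setup:

```python
import openai
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

try:
    response = client.chat.completions.create(
        model="gpt-5.2-chat-latest",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except openai.AuthenticationError:
    print("401: invalid or missing API key")
except openai.RateLimitError:
    print("429: rate limited, slow down and retry")
except openai.APIStatusError as e:
    # Any other non-2xx status; the body carries the error object shown below.
    print(e.status_code, e.response.text)
```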
Error Response Format
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Model 'unknown-model' is not available",
    "param": "model",
    "code": "model_not_found"
  }
}
```
Non-Streaming Timeout
Non-streaming requests have a 2-minute timeout. For long-running requests (reasoning models, complex tasks), we recommend using streaming mode for better reliability.
```json
{
  "detail": {
    "type": "timeout_error",
    "message": "Request timed out after 120 seconds. For long-running requests, we recommend using streaming mode (stream: true) for better reliability."
  }
}
```
Always use `stream: true` for reasoning models (o1, o3, o4) and extended thinking requests to avoid timeouts.
SSE Keep-Alive Comments
During streaming responses, the API sends periodic keep-alive comments (`: NX`) to prevent connection timeouts. These are standard SSE comments and should be ignored by your client.
```
: NX
data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}
: NX
data: {"id":"chatcmpl-nxn-abc123","object":"chat.completion.chunk",...}
data: [DONE]
```
Keep-alive comments are sent every 300ms during periods of inactivity to prevent CDN/proxy timeouts (e.g., Cloudflare 524 errors). SSE-compatible clients automatically filter these out.
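If you read the stream without an SSE library, skip comment lines (anything starting with a colon) yourself; a minimal sketch using httpx, which is an assumption here rather than a dependency the platform requires:

```python
import json
import httpx

with httpx.stream(
    "POST",
    "https://api.nexeonai.com/v1/chat/completions",
    headers={"Authorization": "Bearer nex-..."},
    json={
        "model": "gpt-5.2-chat-latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    timeout=None
) as response:
    for line in response.iter_lines():
        # Blank lines are event separators; ": NX" lines are keep-alives.
        if not line or line.startswith(":"):
            continue
        payload = line.removeprefix("data: ")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices and choices[0]["delta"].get("content"):
            print(choices[0]["delta"]["content"], end="", flush=True)
```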
Rate Limits
Understand API usage limits
Rate limits are applied per account to ensure fair usage and platform stability.
| Limit Type | Default |
|---|---|
| Requests per minute | 1,000 RPM |
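If you only occasionally hit 429s, client-side exponential backoff is usually enough while you wait for a limit increase; a sketch assuming the OpenAI SDK client from the Quickstart:

```python
import time
import openai
from openai import OpenAI

client = OpenAI(
    api_key="nex-...",
    base_url="https://api.nexeonai.com/v1"
)

def with_backoff(call, max_retries=5):
    """Retry a callable when the API returns 429 Too Many Requests."""
    for attempt in range(max_retries):
        try:
            return call()
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... between attempts
    raise RuntimeError("still rate limited after retries")

response = with_backoff(lambda: client.chat.completions.create(
    model="gpt-5.2-chat-latest",
    messages=[{"role": "user", "content": "Hello!"}]
))
```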
If you hit rate limits or need higher throughput for your application, reach out to our team and we can adjust your limits.
Need higher rate limits? Join our Discord and open a support ticket.