> Full Neon documentation index: https://neon.com/docs/llms.txt

> Summary: The Anthropic Messages endpoint lets you use the Anthropic SDK with Neon AI Gateway by changing only the base URL. Supports streaming, prompt caching, and extended thinking on Claude models.

# Anthropic Messages API

Use the Anthropic SDK with Neon AI Gateway

**Coming Soon: Private Preview**

This feature is in private preview: it's not ready for production use, and it may be briefly unavailable as we deploy updates. To get access, [sign up here](https://neon.com/blog/were-building-backends#access).

The Anthropic Messages endpoint exposes the [Anthropic Messages API](https://docs.anthropic.com/en/api/messages) through Neon AI Gateway. Use it when you need extended thinking or prompt caching, which require the native Anthropic SDK. For standard completions, the [chat completions](https://neon.com/docs/ai-gateway/chat-completions) endpoint works with all Anthropic models and doesn't require the Anthropic SDK.

**Base URL:** `https://<branch-host>/ai-gateway/anthropic`

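**Note:** The Anthropic SDK appends `/v1/messages` to the base URL automatically, so the base URL you pass to the SDK must end at `/ai-gateway/anthropic`, without `/v1`. Raw HTTP requests use the full path, `/ai-gateway/anthropic/v1/messages`.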
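A trailing `/v1` or a stray slash in the base URL is an easy mistake to make. The following is a minimal sketch of a normalizer, assuming the gateway host is supplied as a plain string; `anthropic_base_url` is a hypothetical helper for illustration and not part of any SDK.

```python
# Hypothetical helper: build the SDK base URL from the gateway host.
ANTHROPIC_SUFFIX = "/ai-gateway/anthropic"


def anthropic_base_url(gateway_base: str) -> str:
    """Return a base URL that ends at /ai-gateway/anthropic, without /v1."""
    base = gateway_base.rstrip("/")
    # The SDK appends /v1/messages itself, so drop a /v1 that was pasted in.
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    # Add the endpoint suffix only if it is not already present.
    if not base.endswith(ANTHROPIC_SUFFIX):
        base += ANTHROPIC_SUFFIX
    return base
```

The result can be passed as `base_url` when constructing the client, in place of the f-string used in the examples on this page.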

## Supported models

This endpoint accepts Anthropic models only. The [AI Gateway catalog](https://neon.com/docs/ai-gateway/models) has the full model list. Models supported on this endpoint:

- `claude-opus-4-8`, `claude-opus-4-7`, `claude-opus-4-6`, `claude-opus-4-5`
- `claude-sonnet-4-6`
- `claude-haiku-4-5`

Sending a non-Anthropic model ID returns `400 model is not available on this endpoint`. Use the [chat completions endpoint](https://neon.com/docs/ai-gateway/chat-completions) if you need to call multiple providers from the same code.
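If your code accepts model IDs from configuration, a client-side check can catch the mismatch before the request is sent. This is a sketch only: the set below is copied from the list on this page and may lag behind the catalog, and `is_supported_model` and `pick_endpoint` are hypothetical names.

```python
# Model IDs copied from the list above; treat the catalog as the source of truth.
ANTHROPIC_ENDPOINT_MODELS = frozenset({
    "claude-opus-4-8",
    "claude-opus-4-7",
    "claude-opus-4-6",
    "claude-opus-4-5",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
})


def is_supported_model(model: str) -> bool:
    """Return True if the model ID is listed for the Anthropic Messages endpoint."""
    return model in ANTHROPIC_ENDPOINT_MODELS


def pick_endpoint(model: str) -> str:
    """Return a label for the endpoint to use (labels are illustrative, not URL paths)."""
    return "anthropic-messages" if is_supported_model(model) else "chat-completions"
```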

## Basic request

**TypeScript (Anthropic SDK)**

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.NEON_AI_GATEWAY_TOKEN,
  baseURL: `${process.env.NEON_AI_GATEWAY_BASE_URL}/ai-gateway/anthropic`,
});

const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'What is Neon?' }],
});

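// `content` is a union of block types, so narrow to a text block before reading `.text`.
const first = message.content[0];
if (first.type === 'text') {
  console.log(first.text);
}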
```

**Python (Anthropic SDK)**

```python
import anthropic
import os

client = anthropic.Anthropic(
    api_key=os.environ['NEON_AI_GATEWAY_TOKEN'],
    base_url=f"{os.environ['NEON_AI_GATEWAY_BASE_URL']}/ai-gateway/anthropic",
)

message = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'What is Neon?'}],
)

print(message.content[0].text)
```

**cURL**

```bash
curl -X POST "$NEON_AI_GATEWAY_BASE_URL/ai-gateway/anthropic/v1/messages" \
  -H "Authorization: Bearer $NEON_AI_GATEWAY_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is Neon?"}]
  }'
```

## Streaming

Streaming works the same as when you call Anthropic directly. Use `client.messages.stream()` in the SDK, or pass `"stream": true` in the request body of a cURL request. The only change from standard usage is the base URL (`baseURL` in TypeScript, `base_url` in Python).
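When you stream over raw HTTP rather than through the SDK, the response arrives as a sequence of Anthropic stream events. The sketch below shows one way to pull the text out of those events, assuming each event's JSON payload has already been parsed into a dict. `iter_text` is a hypothetical helper, and the sample events are hand-written for illustration rather than captured from the gateway.

```python
from typing import Iterable, Iterator


def iter_text(events: Iterable[dict]) -> Iterator[str]:
    """Yield text fragments from already-parsed Messages stream events."""
    for event in events:
        # Only content_block_delta events carry incremental output.
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta") or {}
        # Skip non-text deltas, such as thinking deltas.
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


# Hand-written sample events for illustration only.
sample_events = [
    {"type": "message_start"},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "ping"},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "message_stop"},
]

print("".join(iter_text(sample_events)))
```

With the SDK, `client.messages.stream()` does this accumulation for you, so a helper like this is only relevant for hand-rolled HTTP clients.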

## Prompt caching

The gateway forwards the `cache_control` field to Anthropic unchanged. Prompt caching works exactly as described in the [Anthropic prompt caching docs](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching).

**TypeScript (Anthropic SDK)**

```typescript
const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: [
    { type: 'text', text: 'You are a helpful assistant.' },
    {
      type: 'text',
      text: longDocumentContent,
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'Summarize the key points.' }],
});

console.log(message.usage);
// { input_tokens: 50, output_tokens: 200,
//   cache_creation_input_tokens: 10000, cache_read_input_tokens: 0 }
```

**Python (Anthropic SDK)**

```python
message = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=1024,
    system=[
        {'type': 'text', 'text': 'You are a helpful assistant.'},
        {
            'type': 'text',
            'text': long_document_content,
            'cache_control': {'type': 'ephemeral'},
        },
    ],
    messages=[{'role': 'user', 'content': 'Summarize the key points.'}],
)

print(message.usage)
# input_tokens=50, output_tokens=200,
# cache_creation_input_tokens=10000, cache_read_input_tokens=0
```
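To tell a cache write from a cache hit, compare the cache fields in `usage`. The sketch below assumes, per the Anthropic prompt caching docs, that `input_tokens` counts only tokens that were neither written to nor read from the cache. `cache_summary` is a hypothetical helper.

```python
from typing import Optional


def cache_summary(
    input_tokens: int,
    cache_creation_input_tokens: Optional[int] = None,
    cache_read_input_tokens: Optional[int] = None,
) -> dict:
    """Summarize prompt-cache behavior from the usage counters of one response."""
    # The cache counters may be missing or None; treat that as zero.
    created = cache_creation_input_tokens or 0
    read = cache_read_input_tokens or 0
    total = input_tokens + created + read
    return {
        "cache_written": created > 0,  # this request populated the cache
        "cache_hit": read > 0,         # this request reused a cached prefix
        "total_input_tokens": total,
        "cached_fraction": read / total if total else 0.0,
    }
```

For example, `cache_summary(message.usage.input_tokens, message.usage.cache_creation_input_tokens, message.usage.cache_read_input_tokens)` reports `cache_written` on the first call and, if the cached prefix is still live, `cache_hit` on a repeat call with the same prefix.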

## Extended thinking

The gateway forwards the `thinking` parameter to Anthropic unchanged. Set `budget_tokens` to cap the number of tokens Claude can spend on thinking. `max_tokens` must be greater than `budget_tokens`, because thinking tokens count toward `max_tokens`.

**TypeScript (Anthropic SDK)**

```typescript
const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 10000,
  },
  messages: [{ role: 'user', content: 'Design a database schema for a multi-tenant SaaS app.' }],
});

for (const block of message.content) {
  if (block.type === 'thinking') {
    console.log('Thinking:', block.thinking);
  } else if (block.type === 'text') {
    console.log(block.text);
  }
}
```

**Python (Anthropic SDK)**

```python
message = client.messages.create(
    model='claude-sonnet-4-6',
    max_tokens=16000,
    thinking={
        'type': 'enabled',
        'budget_tokens': 10000,
    },
    messages=[{'role': 'user', 'content': 'Design a database schema for a multi-tenant SaaS app.'}],
)

for block in message.content:
    if block.type == 'thinking':
        print('Thinking:', block.thinking)
    elif block.type == 'text':
        print(block.text)
```
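Because `max_tokens` must be greater than `budget_tokens`, it can help to check the pair before sending the request. This is a minimal sketch; `thinking_params` is a hypothetical helper that only builds the keyword arguments and does not call the API.

```python
def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Build max_tokens and thinking arguments, rejecting an invalid pair early."""
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"max_tokens ({max_tokens}) must be greater than "
            f"budget_tokens ({budget_tokens})"
        )
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }
```

The result can be unpacked into the call, for example `client.messages.create(model=..., messages=..., **thinking_params(16000, 10000))`.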

## Forwarded headers

The gateway forwards these request headers to the upstream provider:
`Accept`, `Anthropic-Beta`, `Anthropic-Version`, `Content-Type`, `User-Agent`.

The gateway strips all other headers and replaces the `Authorization` header with the workspace credential before forwarding, so your `NEON_AI_GATEWAY_TOKEN` is never sent to Anthropic.
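To preview which of your request headers would reach Anthropic, you can mirror the allowlist locally. This sketch is not the gateway's implementation: `forwarded_headers` is a hypothetical helper, and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive.

```python
# Allowlist copied from the list above, lowercased for comparison.
FORWARDED_HEADERS = frozenset({
    "accept",
    "anthropic-beta",
    "anthropic-version",
    "content-type",
    "user-agent",
})


def forwarded_headers(request_headers: dict) -> dict:
    """Return only the headers that the list above says are forwarded upstream."""
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() in FORWARDED_HEADERS
    }
```

`Authorization` is deliberately absent from the result, since the gateway replaces it rather than forwarding it.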

## Error handling

| Status            | Message                                   | Cause                                     |
| ----------------- | ----------------------------------------- | ----------------------------------------- |
| `400 Bad Request` | `unknown model`                           | Model ID not in the catalog               |
| `400 Bad Request` | `model is not available on this endpoint` | Non-Anthropic model sent to this endpoint |

For authentication, quota, and upstream errors, see [Troubleshooting](https://neon.com/docs/ai-gateway/troubleshooting).
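The two `400` messages in the table can be mapped to a suggested fix in client code. The sketch below assumes the message text appears somewhere in the error string you receive; the exact response body format is not specified on this page, and `explain_bad_request` is a hypothetical helper.

```python
from typing import Optional

# Message fragments from the table above, mapped to a suggested fix.
KNOWN_400_CAUSES = {
    "unknown model": "Model ID is not in the catalog; check it against the model list.",
    "model is not available on this endpoint": (
        "Non-Anthropic model sent to this endpoint; "
        "use the chat completions endpoint instead."
    ),
}


def explain_bad_request(status: int, message: str) -> Optional[str]:
    """Return a hint for the 400 errors listed above, or None if unrecognized."""
    if status != 400:
        return None
    lowered = message.lower()
    for fragment, hint in KNOWN_400_CAUSES.items():
        if fragment in lowered:
            return hint
    return None
```

With the Python SDK, a `400` response is raised as `anthropic.BadRequestError`, so a caller could pass the exception's status code and string form to this helper.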

## Next steps

- [Models](https://neon.com/docs/ai-gateway/models): full model catalog
- [Chat completions](https://neon.com/docs/ai-gateway/chat-completions): use any model, including Anthropic models, through the unified endpoint
- [Authentication](https://neon.com/docs/ai-gateway/authentication): credential scopes and branch binding

---

## Related docs (APIs)

- [Chat completions](https://neon.com/docs/ai-gateway/chat-completions)
- [OpenAI Responses API](https://neon.com/docs/ai-gateway/openai-responses)
- [Gemini API](https://neon.com/docs/ai-gateway/gemini)
