LlaDash API Documentation
The LlaDash API is an LLM routing API that requires no authentication key. For each request it dynamically combines several models, served through Groq's inference engine, to generate an answer.
[Access Restriction] Requests are blocked with 403 Forbidden if the User-Agent is shorter than 10 characters or identifies a common scripting client such as curl, python, or node. Set a custom User-Agent on every request.
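The restriction above can be approximated on the client before a request is sent. The following is a minimal sketch, not the server's actual rule set: the function name `looksAcceptableUserAgent` is hypothetical, and it assumes the block is a case-insensitive substring match on the three client names listed above.

```javascript
// Hypothetical pre-flight check; the real server-side rules may differ.
// Client names taken from the access restriction note.
const BLOCKED_CLIENT_HINTS = ['curl', 'python', 'node'];

function looksAcceptableUserAgent(userAgent) {
  // A missing User-Agent or one under 10 characters is rejected.
  if (typeof userAgent !== 'string' || userAgent.length < 10) {
    return false;
  }
  const lowered = userAgent.toLowerCase();
  // Assumption: blocking is a case-insensitive substring match.
  return !BLOCKED_CLIENT_HINTS.some((hint) => lowered.includes(hint));
}
```

A value such as `MyAwesomeApp/1.0` passes this check, while `curl/8.4.0` does not. Passing the check does not guarantee the server will accept the request.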
Architecture Overview
Up to 4 models work together per request:
1. Safety Guard (Llama-Prompt-Guard-2): Verifies prompt safety
2. Router (Llama-3.1-8b): Judges complexity & generates titles
3. Thinker (Qwen3-32b): Generates internal reasoning
4. Responder (Llama-4-Scout-17b): Generates final answer
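The four stages listed above can be pictured as a sequential pipeline. The sketch below is conceptual only and is not the service's code: `runPipeline` and the `stages` object are hypothetical, and it assumes that an unsafe prompt stops the pipeline after the guard and that the Thinker runs only when the Router marks a prompt as complex (one reading of "up to 4 models").

```javascript
// Conceptual sketch: every stage function is a caller-supplied stand-in.
async function runPipeline(prompt, stages, options = {}) {
  // 1. Safety Guard: stop early if the prompt is judged unsafe (assumed behavior).
  const safe = await stages.guard(prompt);
  if (!safe) {
    return { status: 'blocked', modelsUsed: 1 };
  }
  let modelsUsed = 1;

  // 2. Router: judge complexity and, if requested, produce a title.
  const route = await stages.router(prompt, { wantTitle: Boolean(options.title) });
  modelsUsed += 1;

  // 3. Thinker: assumed to run only for prompts the router marks as complex.
  let thinking = null;
  if (route.complex) {
    thinking = await stages.thinker(prompt);
    modelsUsed += 1;
  }

  // 4. Responder: produces the final answer, optionally using the reasoning.
  const answer = await stages.responder(prompt, thinking);
  modelsUsed += 1;

  return { status: 'success', title: route.title ?? null, thinking, answer, modelsUsed };
}
```

Under these assumptions a simple prompt touches three models and a complex one touches all four.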
Text Generation Endpoint
Send a prompt to receive a generated answer. The endpoint accepts both GET and POST requests.
Query Parameters
| Name | Type | Description |
|---|---|---|
| prompt | String | Max length is 500 chars. Exceeding this returns 400 Bad Request. |
| json | Boolean | Returns results in JSON, including thinking process and title if generated. |
| xml | Boolean | Returns results in XML format. |
| stream | Boolean | Streams the response in real-time using Server-Sent Events (SSE). |
| title | Boolean | Generates a short, localized title alongside the answer. |
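When `stream` is enabled, the response arrives as Server-Sent Events. The sketch below shows one way to read such a stream with `fetch`. It relies only on the standard SSE framing (events separated by a blank line, payload in `data:` lines); the shape of each payload is not specified here, so it is passed through as a raw string. The names `parseSseBuffer` and `streamPrompt` are hypothetical.

```javascript
// Splits buffered SSE text into complete events and returns their data payloads.
// A trailing partial event is returned in `rest` so it can be prepended to the next chunk.
function parseSseBuffer(buffer) {
  const normalized = buffer.replace(/\r\n/g, '\n'); // handle CRLF and LF line endings
  const blocks = normalized.split('\n\n');
  const rest = blocks.pop(); // last piece is incomplete (or empty)
  const events = [];
  for (const block of blocks) {
    const dataLines = [];
    for (const line of block.split('\n')) {
      if (line.startsWith('data:')) {
        // Per the SSE format, one optional space after the colon is dropped.
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
    }
    if (dataLines.length > 0) {
      events.push(dataLines.join('\n'));
    }
  }
  return { events, rest };
}

// Reads a streaming response and calls onData once per received event payload.
async function streamPrompt(url, onData, userAgent = 'MyAwesomeApp/1.0') {
  const response = await fetch(url, { headers: { 'User-Agent': userAgent } });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest;
    parsed.events.forEach(onData);
  }
}
```

Keeping the parser separate from the network loop makes it easy to test against captured stream text before pointing it at the live endpoint.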
Example Request (JSON + Title)
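// Percent-encode the prompt so that spaces and "?" survive inside the URL.
const prompt = encodeURIComponent('What is AI?');
const url = `https://api.ndnx.workers.dev/ai/${prompt}?json=true&title=true`;
const response = await fetch(url, {
  // A custom User-Agent is required (see Access Restriction).
  headers: { 'User-Agent': 'MyAwesomeApp/1.0' }
});
const data = await response.json();
console.log(data);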
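The example request can be generalized into a small URL builder that also enforces the documented 500-character prompt limit before anything is sent. This is a sketch under assumptions: `buildRequestUrl` is a hypothetical helper, the prompt is placed in the path as in the example, and length is measured with JavaScript's `String.length`, which may not match how the server counts characters.

```javascript
const BASE_URL = 'https://api.ndnx.workers.dev/ai/'; // taken from the example request
const MAX_PROMPT_LENGTH = 500; // documented limit; longer prompts return 400

// Hypothetical helper: builds a request URL from a prompt and boolean flags.
function buildRequestUrl(prompt, flags = {}) {
  if (typeof prompt !== 'string' || prompt.length === 0) {
    throw new TypeError('prompt must be a non-empty string');
  }
  if (prompt.length > MAX_PROMPT_LENGTH) {
    throw new RangeError(`prompt exceeds ${MAX_PROMPT_LENGTH} characters`);
  }
  const query = new URLSearchParams();
  // Only the documented boolean parameters are forwarded.
  for (const name of ['json', 'xml', 'stream', 'title']) {
    if (flags[name]) query.set(name, 'true');
  }
  const queryString = query.toString();
  return BASE_URL + encodeURIComponent(prompt) + (queryString ? `?${queryString}` : '');
}
```

For instance, `buildRequestUrl('What is AI?', { json: true, title: true })` yields `https://api.ndnx.workers.dev/ai/What%20is%20AI%3F?json=true&title=true`.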
Example Response
{
  "status": "success",
  "model": "meta-llama/llama-4-scout-17b-16e-instruct",
  "thinking": "The user is asking for a definition of AI...",
  "title": "AIの定義",
  "answer": "人工知能(AI)とは..."
}
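Because `thinking` and `title` appear only when they were generated, client code should treat them as optional. A minimal sketch of reading the response above follows; `summarizeResponse` is a hypothetical helper, and since only the `"success"` status is shown in this document, any other status value is treated as an error here.

```javascript
// Hypothetical helper: normalizes a JSON response into a predictable shape.
function summarizeResponse(data) {
  // Assumption: anything other than "success" indicates a failed request.
  if (!data || data.status !== 'success') {
    throw new Error(`Unexpected status: ${data && data.status}`);
  }
  return {
    answer: data.answer,
    model: data.model,
    title: data.title ?? null,       // present only when a title was generated
    thinking: data.thinking ?? null, // present only when reasoning was generated
  };
}
```

Callers can then check `title` and `thinking` against `null` instead of probing for missing keys.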
Rate Limit Information
Retrieve global queue status and underlying LLM rate limit states.
Rate Limits
- Global Queue Limit: Max 20 requests per minute
- Llama-Prompt-Guard: 30 RPM / 15,000 TPM
- Router (Llama-3.1-8b): 30 RPM / 131,072 TPM
- Thinker (Qwen3-32b): 30 RPM / 6,000 TPM
- Responder (Llama-4-Scout): 30 RPM / 131,072 TPM
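A client can reduce its chance of hitting the 20 requests per minute limit by pacing itself. The sketch below is a simple sliding-window limiter; `createWindowLimiter` is a hypothetical helper, and whether the server uses a sliding or fixed window is not stated here. Since the queue limit is global, other clients' traffic can still cause a 429 even when this limiter allows a request.

```javascript
// Returns a function that reports how many milliseconds to wait before sending.
// A return value of 0 means "send now" and records the request.
function createWindowLimiter(maxRequests = 20, windowMs = 60000, now = Date.now) {
  const timestamps = []; // send times still inside the window, oldest first
  return function msUntilAllowed() {
    const t = now();
    // Drop send times that have aged out of the window.
    while (timestamps.length > 0 && t - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length < maxRequests) {
      timestamps.push(t);
      return 0;
    }
    // Window is full: wait until the oldest request leaves it, then ask again.
    return windowMs - (t - timestamps[0]);
  };
}
```

The `now` parameter exists so the limiter can be driven by a fake clock in tests; in normal use the defaults match the documented global limit.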
Error Responses
- 400 Bad Request: Prompt too long or invalid request.
- 403 Forbidden: Bot detected, missing User-Agent, or security block.
- 429 Too Many Requests: Global queue limit (20/min) exceeded.
- 500 Internal Server Error: Backend API error or timeout.
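The status codes above suggest different client reactions. The following sketch maps each code to a retry decision and a hint; both helpers are hypothetical, and treating 500 as retryable is an assumption rather than documented behavior. The 3-second base delay is derived from the global limit (60 seconds divided by 20 requests).

```javascript
// Hypothetical mapping from documented status codes to a client reaction.
function classifyStatus(status) {
  switch (status) {
    case 400:
      return { retry: false, hint: 'Shorten the prompt to 500 characters or fix the request.' };
    case 403:
      return { retry: false, hint: 'Set a custom User-Agent of at least 10 characters.' };
    case 429:
      return { retry: true, hint: 'Global queue limit reached; wait before retrying.' };
    case 500:
      // Assumption: backend errors and timeouts are transient.
      return { retry: true, hint: 'Backend error or timeout; retry after a delay.' };
    default:
      return { retry: false, hint: `Unhandled status ${status}.` };
  }
}

// Exponential backoff: 3 s, 6 s, 12 s, ... capped at 60 s.
function backoffDelayMs(attempt, baseMs = 3000, capMs = 60000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

A retry loop would call `classifyStatus(response.status)` after each failed request and, when `retry` is true, sleep for `backoffDelayMs(attempt)` before the next attempt.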