LlaDash API Documentation

The LlaDash API is an LLM routing API that requires no authentication key. For each request it combines several models, served through Groq's inference engine, to generate an answer.

[Access Restriction] Requests are rejected with 403 Forbidden if the User-Agent is shorter than 10 characters or identifies a common scripting client such as curl, python, or node. Set a custom User-Agent on every request.
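A client can check its own User-Agent against these rules before sending anything. The sketch below is only a client-side approximation: the name `isLikelyAllowedUserAgent` is hypothetical, and the case-insensitive substring matching is an assumption, since the server's exact matching logic is not described here.

```javascript
// Client-side approximation of the documented User-Agent rules.
// Assumption: the server matches blocked client names case-insensitively
// as substrings; the real check may differ.
const BLOCKED_CLIENT_NAMES = ['curl', 'python', 'node'];

function isLikelyAllowedUserAgent(userAgent) {
  if (typeof userAgent !== 'string' || userAgent.length < 10) {
    return false; // missing, or shorter than 10 characters
  }
  const lowered = userAgent.toLowerCase();
  return !BLOCKED_CLIENT_NAMES.some((name) => lowered.includes(name));
}

console.log(isLikelyAllowedUserAgent('MyAwesomeApp/1.0')); // true
console.log(isLikelyAllowedUserAgent('curl/8.4.0'));       // false
```

A `true` result does not guarantee acceptance, because the 403 response also covers other bot detection and security blocks.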

Architecture Overview

Up to 4 models work together per request:
1. Safety Guard (Llama-Prompt-Guard-2): Verifies prompt safety
2. Router (Llama-3.1-8b): Judges complexity & generates titles
3. Thinker (Qwen3-32b): Generates internal reasoning
4. Responder (Llama-4-Scout-17b): Generates final answer
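The stage order above can be sketched as a small pipeline. This is an illustration, not the service's implementation: `runPipeline`, the `stages` object, and the `blocked` status are hypothetical, and the idea that the Thinker runs only when the Router judges a prompt complex is an assumption drawn from the phrase "up to 4 models".

```javascript
// Illustrative pipeline; every stage function is a hypothetical placeholder
// supplied by the caller.
// Assumption: the Thinker stage is skipped when the Router judges a prompt simple.
async function runPipeline(prompt, stages) {
  const used = ['guard'];
  const safe = await stages.guard(prompt); // 1. Safety Guard
  if (!safe) {
    return { status: 'blocked', used };
  }
  used.push('router');
  const route = await stages.router(prompt); // 2. Router returns { complex, title }
  let thinking = null;
  if (route.complex) {
    used.push('thinker');
    thinking = await stages.thinker(prompt); // 3. Thinker (optional)
  }
  used.push('responder');
  const answer = await stages.responder(prompt, thinking); // 4. Responder
  return { status: 'success', used, title: route.title, thinking, answer };
}
```

Passing the stages in as functions keeps the sketch independent of any particular model client.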

Text Generation Endpoint

Send a prompt to receive a generated answer. Both GET and POST requests are supported.

GET | POST /ai/{prompt}

Parameters

Name	Type	Description
`prompt`	String	The prompt text, supplied as the `{prompt}` path segment. Maximum length is 500 characters; longer prompts return 400 Bad Request.
`json`	Boolean	Returns results in JSON, including thinking process and title if generated.
`xml`	Boolean	Returns results in XML format.
`stream`	Boolean	Streams the response in real-time using Server-Sent Events (SSE).
`title`	Boolean	Generates a short, localized title alongside the answer.
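The parameters above can be assembled into a request URL with a small helper. This is a minimal sketch: `buildRequestUrl` is a hypothetical name, the base URL is taken from the example request in this document, and it assumes the 500-character limit is counted in JavaScript string length, which may not match how the server counts.

```javascript
// Builds a request URL from the documented parameters.
// Assumptions: the base URL comes from this document's example request, and
// the prompt is percent-encoded into the path segment.
const BASE_URL = 'https://api.ndnx.workers.dev/ai/';
const MAX_PROMPT_LENGTH = 500;

function buildRequestUrl(prompt, flags = {}) {
  if (prompt.length > MAX_PROMPT_LENGTH) {
    throw new RangeError(`Prompt exceeds ${MAX_PROMPT_LENGTH} characters`);
  }
  const url = new URL(BASE_URL + encodeURIComponent(prompt));
  for (const name of ['json', 'xml', 'stream', 'title']) {
    if (flags[name]) {
      url.searchParams.set(name, 'true');
    }
  }
  return url.toString();
}

console.log(buildRequestUrl('What is AI?', { json: true, title: true }));
// https://api.ndnx.workers.dev/ai/What%20is%20AI%3F?json=true&title=true
```

Rejecting an over-long prompt locally avoids spending one of the shared queue slots on a request that would return 400.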

Example Request (JSON + Title)

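// Percent-encode the prompt so spaces and "?" stay inside the path segment.
const prompt = encodeURIComponent('What is AI?');
const url = `https://api.ndnx.workers.dev/ai/${prompt}?json=true&title=true`;
const response = await fetch(url, {
  // Some browsers do not let scripts override User-Agent; server-side runtimes do.
  headers: { 'User-Agent': 'MyAwesomeApp/1.0' }
});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);
const data = await response.json();
console.log(data);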
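When `stream=true` is set, the response arrives as Server-Sent Events rather than a single JSON body. The sketch below splits incoming text into complete events. It assumes standard SSE framing with LF line endings; `parseSseEvents` is a hypothetical helper, and the content of each `data:` payload is not documented here, so it is returned as raw text.

```javascript
// Splits a buffer of Server-Sent Events text into the payloads of complete events.
// Assumption: standard SSE framing ("data:" lines, blank line between events)
// with LF line endings; the payload format itself is not documented here.
function parseSseEvents(buffer) {
  const blocks = buffer.split('\n\n');
  const rest = blocks.pop(); // trailing partial event, kept for the next chunk
  const events = [];
  for (const block of blocks) {
    const dataLines = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).replace(/^ /, '')); // drop one optional space
    if (dataLines.length > 0) {
      events.push(dataLines.join('\n'));
    }
  }
  return { events, rest };
}

const { events, rest } = parseSseEvents('data: Hello\n\ndata: wor');
console.log(events); // [ 'Hello' ]
console.log(rest);   // data: wor
```

In practice the buffer would be fed from `response.body.getReader()` with a `TextDecoder`, prepending `rest` to each newly decoded chunk.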

Example Response

{
  "status": "success",
  "model": "meta-llama/llama-4-scout-17b-16e-instruct",
  "thinking": "The user is asking for a definition of AI...",
  "title": "Definition of AI",
  "answer": "Artificial intelligence (AI) is..."
}
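A client will usually want to treat `thinking` and `title` as optional, since the parameter table says they are included only if generated. The following is a minimal sketch under that assumption; `formatAnswer` is a hypothetical helper, and status values other than `"success"` are not documented here, so they are treated as errors.

```javascript
// Formats a parsed JSON response for display.
// Assumptions: field names follow the example response, "thinking" and "title"
// may be absent, and any status other than "success" is an error.
function formatAnswer(data) {
  if (data.status !== 'success') {
    throw new Error(`Unexpected status: ${data.status}`);
  }
  const lines = [];
  if (data.title) {
    lines.push(`# ${data.title}`);
  }
  lines.push(data.answer);
  if (data.thinking) {
    lines.push(`(reasoning: ${data.thinking})`);
  }
  return lines.join('\n');
}

console.log(formatAnswer({ status: 'success', title: 'Definition of AI', answer: 'AI is...' }));
// # Definition of AI
// AI is...
```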

Rate Limit Information

Returns the current global queue status and the rate-limit state of each underlying model.

GET /ai/rate

Rate Limits

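Global Queue Limit: Max 20 requests per minute
Per-model limits are given in requests per minute (RPM) and tokens per minute (TPM):
Safety Guard (Llama-Prompt-Guard-2): 30 RPM / 15,000 TPM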
Router (Llama-3.1-8b): 30 RPM / 131,072 TPM
Thinker (Qwen3-32b): 30 RPM / 6,000 TPM
Responder (Llama-4-Scout): 30 RPM / 131,072 TPM
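A client can pace its own requests to stay under the 20 requests per minute queue limit. The sketch below is a sliding-window limiter; `RequestWindow` is a hypothetical class, and it assumes the server counts requests over a rolling 60-second window, which is not confirmed here.

```javascript
// Sliding-window limiter that keeps one client under the documented queue limit.
// Assumption: requests are counted over a rolling 60-second window; the actual
// accounting on the server is not documented.
class RequestWindow {
  constructor(maxRequests = 20, windowMs = 60_000, now = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, useful for testing
    this.timestamps = [];
  }

  // Returns 0 if a request may be sent now (and records it),
  // otherwise the number of milliseconds to wait before trying again.
  tryAcquire() {
    const current = this.now();
    while (this.timestamps.length > 0 && current - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift(); // drop requests that have left the window
    }
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(current);
      return 0;
    }
    return this.windowMs - (current - this.timestamps[0]);
  }
}
```

Because the queue limit is described as global, other clients may consume part of it, so a 429 response is still possible even when the local limiter allows a request.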

Error Responses

400 Bad Request: Prompt too long or invalid request.
403 Forbidden: Bot detected, missing User-Agent, or security block.
429 Too Many Requests: Global queue limit (20/min) exceeded.
500 Internal Server Error: Backend API error or timeout.
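The status codes above suggest a simple retry policy on the client side. This is a sketch under stated assumptions, not guidance from the API: `retryDelayMs` is a hypothetical helper, it assumes 429 and 500 are worth retrying while 400 and 403 require a changed request, and the delay values are illustrative.

```javascript
// Maps the documented status codes to a retry delay.
// Assumptions: 429 and 500 are retried with exponential backoff; 400 and 403
// are not retried. The 3000 ms base for 429 reflects 20 requests per minute.
function retryDelayMs(status, attempt) {
  const retryable = status === 429 || status === 500;
  if (!retryable) {
    return null; // 400, 403, and anything else: do not retry unchanged
  }
  const baseMs = status === 429 ? 3000 : 1000;
  return Math.min(baseMs * 2 ** attempt, 60_000); // capped at 60 seconds
}

console.log(retryDelayMs(429, 0)); // 3000
console.log(retryDelayMs(500, 2)); // 4000
console.log(retryDelayMs(403, 0)); // null
```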