Just a few lines of code to start testing and optimizing your AI models
// Configure API providers
// And custom endpoints
const config: AIConfigResult = configureModels({
  openai: {
    apiKey: process.env.OPENAI_API_KEY
    // Adapted temperature can go here
    // Company ID can go here
  },
  anthropic: {
    apiKey: process.env.ANTHROPIC_KEY
  }
  // More models available here...
  // We support custom endpoints too!
});
const colorTest: AiTestDefinition = defineAiTest("color-question", {
  cases: {
    prompt: "What color is Big Bird?",
    match: "yellow"
  },
  maxPrice: 0.01,       // Max cost per one million tokens
  minAccuracy: 0.90,    // Minimum of 90% accuracy
  maxLatency: 500,      // Max response time (ms)
  priority: "accuracy", // Prioritize accuracy over time & cost
  retries: 3            // Model sample size
});
// Using API keys
// For data persistence
// Running tests against all models
await runAiTests([colorTest, mathTests], {
  apiKey: process.env.RADAR_API_KEY
});
// Test reliability ensured:
// Multi-model testing (25+ providers)
// Auto-retries with configurable count
// String/numeric validation
// Specialized NLP comparison functions
// CI/CD integration via GitHub Actions
// On app initialization:
// Get all best models at once
const models = initializeModels({
  apiKey: process.env.AI_TEST_API_KEY,
  tests: ["color-question", "math"]
});
// In production - no await needed
const colorModel = models.get("color-question");
// { primary: "gpt-4o-mini",
//   secondary: "claude-3.5-haiku" }
// Use the optimal model directly
const response = getAIResponse("What color is Big Bird?", colorModel.primary);
// yellow
Purpose-built for developers navigating expensive AI models and complex prompt engineering in a rapidly evolving landscape
Test against 25+ models simultaneously, including GPT-4, Claude, Gemini, Llama and more.
Pay for the most cost-efficient model that meets the standards you need, not just the one you already know.
Quality checks detect model drift and automatically switch to better models when needed.
Deploy in minutes like any other unit testing framework.
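As a rough illustration of the drift check described above, the sketch below tracks pass/fail outcomes over a rolling window and falls back to a secondary model once accuracy drops under a threshold. The `DriftMonitor` class, `pickModel` helper, and window size are hypothetical names chosen for this example and are not part of the product's API.

```typescript
// Hypothetical sketch: rolling-window drift detection with fallback.
// None of these names come from the product's actual API.
interface ModelPair {
  primary: string;
  secondary: string;
}

class DriftMonitor {
  private outcomes: boolean[] = [];
  private windowSize: number;
  private minAccuracy: number;

  constructor(windowSize: number, minAccuracy: number) {
    this.windowSize = windowSize;
    this.minAccuracy = minAccuracy;
  }

  // Record whether the latest quality check passed, keeping only the newest results.
  record(passed: boolean): void {
    this.outcomes.push(passed);
    if (this.outcomes.length > this.windowSize) {
      this.outcomes.shift();
    }
  }

  // Fraction of passing checks in the current window (1 when nothing is recorded yet).
  accuracy(): number {
    if (this.outcomes.length === 0) return 1;
    const passes = this.outcomes.filter((ok) => ok).length;
    return passes / this.outcomes.length;
  }

  // Drift is only reported once the window is full, to avoid reacting to one early failure.
  hasDrifted(): boolean {
    return (
      this.outcomes.length === this.windowSize &&
      this.accuracy() < this.minAccuracy
    );
  }
}

// Use the primary model unless the monitor reports drift.
function pickModel(models: ModelPair, monitor: DriftMonitor): string {
  return monitor.hasDrifted() ? models.secondary : models.primary;
}
```

Waiting for a full window before reporting drift is one design choice among several; a production check might also weight recent results more heavily.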
One API to rule them all. Connect to OpenAI, Anthropic, Google, Together, and even custom/self-hosted endpoints.
Define expected outputs with exact match, regex, embeddings, or custom validation functions.
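To make the validation styles above concrete, here is a minimal sketch of how exact-match, regex, embedding-similarity, and custom checks could each be expressed as a function over the model's output. The `Validator` type and the helper names are assumptions made for illustration; the product's real matcher API may look different, and the embedding vectors are assumed to come from whichever embedding model you already use.

```typescript
// Hypothetical validator helpers; not the product's actual API.
type Validator = (output: string) => boolean;

// Exact match, ignoring surrounding whitespace and letter case.
const exactMatch = (expected: string): Validator =>
  (output) => output.trim().toLowerCase() === expected.trim().toLowerCase();

// Regex match. Avoid the `g` flag here: RegExp.test is stateful with it.
const regexMatch = (pattern: RegExp): Validator =>
  (output) => pattern.test(output);

// Custom validation: accept numeric answers within a tolerance.
const numericWithin = (target: number, tolerance: number): Validator =>
  (output) => {
    const value = Number.parseFloat(output);
    return Number.isFinite(value) && Math.abs(value - target) <= tolerance;
  };

// Cosine similarity between two embedding vectors of equal length.
// Returns 0 for a zero vector or mismatched lengths instead of dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage: each validator returns true when the output is acceptable.
const isYellow = exactMatch("yellow");
const mentionsYellow = regexMatch(/\byellow\b/i);
console.log(isYellow(" Yellow "), mentionsYellow("Big Bird is yellow."));
```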
Set thresholds for cost, accuracy, and latency, then optimize for what matters most.
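One way to picture threshold-based selection: drop every model that violates a threshold, then sort the survivors by the chosen priority. The sketch below is an assumption about how such ranking could work, not the product's implementation; `rankModels`, the field names, and the sample metrics are all hypothetical.

```typescript
// Hypothetical sketch of threshold filtering plus priority ranking.
interface ModelMetrics {
  model: string;
  pricePerMTok: number; // cost per one million tokens
  accuracy: number;     // fraction of test cases passed, 0..1
  latencyMs: number;    // response time in milliseconds
}

interface Thresholds {
  maxPrice: number;
  minAccuracy: number;
  maxLatency: number;
}

type Priority = "accuracy" | "cost" | "latency";

function rankModels(
  results: ModelMetrics[],
  limits: Thresholds,
  priority: Priority
): ModelMetrics[] {
  // Keep only models that satisfy every threshold.
  const passing = results.filter(
    (r) =>
      r.pricePerMTok <= limits.maxPrice &&
      r.accuracy >= limits.minAccuracy &&
      r.latencyMs <= limits.maxLatency
  );
  // Higher accuracy is better; lower cost and latency are better.
  const comparators: Record<Priority, (a: ModelMetrics, b: ModelMetrics) => number> = {
    accuracy: (a, b) => b.accuracy - a.accuracy,
    cost: (a, b) => a.pricePerMTok - b.pricePerMTok,
    latency: (a, b) => a.latencyMs - b.latencyMs,
  };
  return [...passing].sort(comparators[priority]);
}

// Made-up metrics for three placeholder models.
const sample: ModelMetrics[] = [
  { model: "model-a", pricePerMTok: 0.008, accuracy: 0.95, latencyMs: 400 },
  { model: "model-b", pricePerMTok: 0.005, accuracy: 0.92, latencyMs: 300 },
  { model: "model-c", pricePerMTok: 0.02, accuracy: 0.99, latencyMs: 200 },
];
const limits: Thresholds = { maxPrice: 0.01, minAccuracy: 0.9, maxLatency: 500 };

// model-c exceeds maxPrice, so only model-a and model-b are ranked.
const ranked = rankModels(sample, limits, "accuracy");
console.log(ranked.map((r) => r.model)); // [ 'model-a', 'model-b' ]
```

The first and second entries of the ranked list would correspond to a primary and a secondary recommendation.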
SET YOUR THRESHOLDS
PRIORITIZE
RECOMMENDED MODELS
METRICS AVAILABLE: