FIND THE OPTIMAL AI MODEL FOR EVERY TASK

Unit test your prompts across all AI models and automatically use the best one in production. Save money, save time, improve quality.

Updates when models release
Flexible storage options
Lightweight, 0 dependencies
Open source, fully transparent

SIMPLE IMPLEMENTATION

Just a few lines of code to start testing and optimizing your AI models

1. CREATE INSTANCE
// Configure API providers
// and custom endpoints
const config: AIConfigResult = configureModels({
  openai: {
    apiKey: process.env.OPENAI_API_KEY
    // A custom temperature can go here
    // An organization ID can go here
  },
  anthropic: {
    apiKey: process.env.ANTHROPIC_KEY
  }
  // More providers available here...
  // We support custom endpoints too!
});
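If you mix hosted providers with a self-hosted model, the same call can carry additional entries. A minimal sketch, assuming the config accepts extra provider keys and a custom-endpoint block (the google, custom, baseUrl, and model fields below are illustrative assumptions, not the exact API):

// Hypothetical extended config: the extra provider key and the
// custom-endpoint fields are assumptions, not the exact API
const extendedConfig: AIConfigResult = configureModels({
  openai: { apiKey: process.env.OPENAI_API_KEY },
  anthropic: { apiKey: process.env.ANTHROPIC_KEY },
  google: { apiKey: process.env.GOOGLE_API_KEY },
  custom: {
    baseUrl: "https://llm.internal.example.com/v1", // Self-hosted endpoint
    apiKey: process.env.INTERNAL_LLM_KEY,
    model: "llama-3-70b"
  }
});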
2. BUILD TESTS
const colorTest: AiTestDefinition = defineAiTest("color-question", {
  cases: {
    prompt: "What color is Big Bird?",
    match: "yellow"
  },
  maxPrice: 0.01,
  // Max cost per one million tokens
  minAccuracy: 0.90,
  // Minimum of 90% accuracy
  maxLatency: 500,
  // Max response time (ms)
  priority: "accuracy",
  // Prioritize accuracy over latency & cost
  retries: 3 // Model sample size
});
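The mathTests suite referenced in the next step can be defined the same way. A sketch, assuming a numeric match value is accepted for numeric comparison; that usage is an assumption, not the documented API:

// Hypothetical companion suite used in step 3.
// The numeric match value is an assumed usage.
const mathTests: AiTestDefinition = defineAiTest("math", {
  cases: {
    prompt: "What is 17 * 24?",
    match: 408 // Numeric comparison instead of string match
  },
  maxPrice: 0.01,
  minAccuracy: 0.95,
  maxLatency: 800,
  priority: "cost",
  retries: 3
});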
3. RUN TESTS
// Run tests against all models,
// using an API key for data persistence
await runAiTests(
  [colorTest, mathTests],
  { apiKey: process.env.RADAR_API_KEY }
);
// Test reliability ensured:
// Multi-model testing (25+ providers)
// Auto-retries with configurable count
// String/numeric validation
// Specialized NLP comparison functions
// CI/CD integration via GitHub Actions
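In CI (for example, a GitHub Actions step running a Node script), you might fail the build when a test's thresholds aren't met. A sketch, assuming runAiTests resolves with per-test results exposing name and passed fields; that return shape is an assumption:

// Hypothetical CI gate: assumes runAiTests resolves with
// per-test results carrying name and passed fields
const results = await runAiTests(
  [colorTest, mathTests],
  { apiKey: process.env.RADAR_API_KEY }
);
const failing = results.filter((r) => !r.passed);
if (failing.length > 0) {
  console.error("Prompt regressions:", failing.map((r) => r.name));
  process.exit(1); // Fail the CI job
}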
4. GET TOP MODEL IN PROD
// On app initialization:
// get all best models at once
const models = initializeModels({
  apiKey: process.env.AI_TEST_API_KEY,
  tests: ["color-question", "math"]
});

// In production - no await needed
const colorModel = models.get("color-question");
// { primary: "gpt-4o-mini",
//   secondary: "claude-3.5-haiku" }

// Use the optimal model directly
const response = getAIResponse(
  "What color is Big Bird?",
  colorModel.primary
);
// "yellow"
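Because each entry carries a primary and a secondary model, a small wrapper can fall back to the backup when the primary provider errors. A sketch, assuming getAIResponse returns a promise and throws on provider failures (both assumptions):

// Hypothetical fallback wrapper: retries the backup model
// when the primary call throws
async function askWithFallback(prompt: string) {
  const model = models.get("color-question");
  try {
    return await getAIResponse(prompt, model.primary);
  } catch (err) {
    console.warn("Primary model failed, using backup:", err);
    return await getAIResponse(prompt, model.secondary);
  }
}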

KEY CAPABILITIES

Purpose-built for developers navigating expensive AI models and complex prompt engineering in a rapidly evolving landscape

MULTI-MODEL TESTING

Test against 25+ models simultaneously, including GPT-4, Claude, Gemini, Llama, and more.

COST SAVINGS

Pay for the model that most efficiently meets the standards you need, not just the one you already know.

QUALITY MONITORING

Quality checks detect model drift and automatically switch to better models when needed.

SIMPLE INTEGRATION

Deploy in minutes like any other unit testing framework.

ALL MAJOR PROVIDERS

One API to rule them all. Connect to OpenAI, Anthropic, Google, Together, and even custom/self-hosted endpoints.

ADVANCED TESTING

Define expected outputs with exact match, regex, embeddings, or custom validation functions.
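For outputs that an exact string can't capture, a case might use a regex or a custom validation function. A sketch in which the matchRegex and validate fields are assumed shapes for illustration, not the documented API:

// Hypothetical matchers: regex and custom-function validation,
// shown as assumed fields on the test definition
const dateTest = defineAiTest("date-format", {
  cases: {
    prompt: "Give today's date in ISO 8601 format only.",
    matchRegex: /^\d{4}-\d{2}-\d{2}$/ // Regex check
  },
  minAccuracy: 0.90
});

const jsonTest = defineAiTest("valid-json", {
  cases: {
    prompt: "Return a JSON object with keys name and age.",
    validate: (output: string) => { // Custom validation function
      try {
        const parsed = JSON.parse(output);
        return "name" in parsed && "age" in parsed;
      } catch {
        return false;
      }
    }
  },
  minAccuracy: 0.90
});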

MODEL SELECTION MATRIX

Set thresholds for cost, accuracy, and latency - then optimize for what matters most

OPTIMIZATION DASHBOARD

SET YOUR THRESHOLDS

ACCURACY MINIMUM: 90%
MAX COST (PER 1K TOKENS): $0.005
MAX LATENCY: 500ms

PRIORITIZE

COST · ACCURACY · LATENCY

RECOMMENDED MODELS

CLAUDE-3-HAIKU (PRIMARY)
  ACCURACY: 93.6%
  COST: $0.00025
  LATENCY: 230ms

GEMINI-1.5-PRO (BACKUP)
  ACCURACY: 92.1%
  COST: $0.0007
  LATENCY: 320ms

METRICS AVAILABLE:

String Match · Numeric Comparison · % Difference · Semantic Similarity · + More NLP Metrics
AUTO-SWITCHING ENABLED
PROJECTED YEARLY SAVINGS: $X
Ready to use the all-in-one prompt testing suite?

START OPTIMIZING TODAY