FIND THE OPTIMAL AI MODEL FOR EVERY TASK

Unit test your prompts across all AI models and automatically use the best one in production. Save money, save time, improve quality.

Updates when models release
Flexible storage options
Lightweight, 0 dependencies
Open source, fully transparent

SIMPLE IMPLEMENTATION

Just a few lines of code to start testing and optimizing your AI models

1. CREATE INSTANCE
// Configure API providers
// and custom endpoints
const config: AIConfigResult = configureModels({
  openai: {
    apiKey: process.env.OPENAI_API_KEY
    // A custom temperature can go here
    // An organization ID can go here
  },
  anthropic: {
    apiKey: process.env.ANTHROPIC_KEY
  }
  // More providers available here...
  // We support custom endpoints too!
});
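If you mix hosted providers with a self-hosted model, the same call can carry additional entries. A minimal sketch, assuming the config accepts extra provider keys and a custom-endpoint block (the google, custom, baseUrl, and model fields below are illustrative assumptions, not the exact API):

// Hypothetical extended config: the extra provider key and the
// custom-endpoint fields are assumptions, not the exact API
const extendedConfig: AIConfigResult = configureModels({
  openai: { apiKey: process.env.OPENAI_API_KEY },
  anthropic: { apiKey: process.env.ANTHROPIC_KEY },
  google: { apiKey: process.env.GOOGLE_API_KEY },
  custom: {
    baseUrl: "https://llm.internal.example.com/v1", // Self-hosted endpoint
    apiKey: process.env.INTERNAL_LLM_KEY,
    model: "llama-3-70b"
  }
});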
2. BUILD TESTS
const colorTest: AiTestDefinition = defineAiTest("color-question", {
  cases: {
    prompt: "What color is Big Bird?",
    match: "yellow"
  },
  maxPrice: 0.01,
  // Max cost per one million tokens
  minAccuracy: 0.90,
  // Minimum of 90% accuracy
  maxLatency: 500,
  // Max response time (ms)
  priority: "accuracy",
  // Prioritize accuracy over latency & cost
  retries: 3 // Model sample size
});
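The mathTests suite referenced in the next step can be defined the same way. A sketch, assuming a numeric match value is accepted for numeric comparison; that usage is an assumption, not the documented API:

// Hypothetical companion suite used in step 3.
// The numeric match value is an assumed usage.
const mathTests: AiTestDefinition = defineAiTest("math", {
  cases: {
    prompt: "What is 17 * 24?",
    match: 408 // Numeric comparison instead of string match
  },
  maxPrice: 0.01,
  minAccuracy: 0.95,
  maxLatency: 800,
  priority: "cost",
  retries: 3
});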
3. RUN TESTS
// Run tests against all models,
// using an API key for data persistence
await runAiTests(
  [colorTest, mathTests],
  { apiKey: process.env.RADAR_API_KEY }
);
// Test reliability ensured:
// Multi-model testing (25+ providers)
// Auto-retries with configurable count
// String/numeric validation
// Specialized NLP comparison functions
// CI/CD integration via GitHub Actions
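In CI (for example, a GitHub Actions step running a Node script), you might fail the build when a test's thresholds aren't met. A sketch, assuming runAiTests resolves with per-test results exposing name and passed fields; that return shape is an assumption:

// Hypothetical CI gate: assumes runAiTests resolves with
// per-test results carrying name and passed fields
const results = await runAiTests(
  [colorTest, mathTests],
  { apiKey: process.env.RADAR_API_KEY }
);
const failing = results.filter((r) => !r.passed);
if (failing.length > 0) {
  console.error("Prompt regressions:", failing.map((r) => r.name));
  process.exit(1); // Fail the CI job
}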
4. GET TOP MODEL IN PROD
// On app initialization:
// get all best models at once
const models = initializeModels({
  apiKey: process.env.AI_TEST_API_KEY,
  tests: ["color-question", "math"]
});

// In production - no await needed
const colorModel = models.get("color-question");
// { primary: "gpt-4o-mini",
//   secondary: "claude-3.5-haiku" }

// Use the optimal model directly
const response = getAIResponse(
  "What color is Big Bird?",
  colorModel.primary
);
// "yellow"
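Because each entry carries a primary and a secondary model, a small wrapper can fall back to the backup when the primary provider errors. A sketch, assuming getAIResponse returns a promise and throws on provider failures (both assumptions):

// Hypothetical fallback wrapper: retries the backup model
// when the primary call throws
async function askWithFallback(prompt: string) {
  const model = models.get("color-question");
  try {
    return await getAIResponse(prompt, model.primary);
  } catch (err) {
    console.warn("Primary model failed, using backup:", err);
    return await getAIResponse(prompt, model.secondary);
  }
}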

KEY CAPABILITIES

Purpose-built for developers navigating expensive AI models and complex prompt engineering in a rapidly evolving landscape

MULTI-MODEL TESTING

Test against 25+ models simultaneously, including GPT-4, Claude, Gemini, Llama, and more.

COST SAVINGS

Pay for the model that most efficiently meets the standards you need, not just the one you already know.

QUALITY MONITORING

Quality checks detect model drift and automatically switch to better models when needed.

SIMPLE INTEGRATION

Deploy in minutes like any other unit testing framework.

ALL MAJOR PROVIDERS

One API to rule them all. Connect to OpenAI, Anthropic, Google, Together, and even custom/self-hosted endpoints.

ADVANCED TESTING

Define expected outputs with exact match, regex, embeddings, or custom validation functions.
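For outputs that an exact string can't capture, a case might use a regex or a custom validation function. A sketch in which the matchRegex and validate fields are assumed shapes for illustration, not the documented API:

// Hypothetical matchers: regex and custom-function validation,
// shown as assumed fields on the test definition
const dateTest = defineAiTest("date-format", {
  cases: {
    prompt: "Give today's date in ISO 8601 format only.",
    matchRegex: /^\d{4}-\d{2}-\d{2}$/ // Regex check
  },
  minAccuracy: 0.90
});

const jsonTest = defineAiTest("valid-json", {
  cases: {
    prompt: "Return a JSON object with keys name and age.",
    validate: (output: string) => { // Custom validation function
      try {
        const parsed = JSON.parse(output);
        return "name" in parsed && "age" in parsed;
      } catch {
        return false;
      }
    }
  },
  minAccuracy: 0.90
});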

MODEL SELECTION MATRIX

Set thresholds for cost, accuracy, and latency - then optimize for what matters most

OPTIMIZATION DASHBOARD

SET YOUR THRESHOLDS

ACCURACY MINIMUM: 90%
MAX COST (PER 1K TOKENS): $0.005
MAX LATENCY: 500ms

PRIORITIZE

COST · ACCURACY · LATENCY

RECOMMENDED MODELS

CLAUDE-3-HAIKU (PRIMARY)
  ACCURACY: 93.6%
  COST: $0.00025
  LATENCY: 230ms

GEMINI-1.5-PRO (BACKUP)
  ACCURACY: 92.1%
  COST: $0.0007
  LATENCY: 320ms

METRICS AVAILABLE:

String Match · Numeric Comparison · % Difference · Semantic Similarity · + More NLP Metrics
AUTO-SWITCHING ENABLED
PROJECTED YEARLY SAVINGS: $X
Ready to use the all-in-one prompt testing suite?

START OPTIMIZING TODAY