Testing

Like Spring, Embabel facilitates testing of user applications. The framework provides comprehensive testing support for both unit and integration testing scenarios.

Testing is critical to delivering quality software and must be considered from the outset.

Unit Testing

Unit testing in Embabel enables testing individual agent actions without involving real LLM calls.

Embabel’s design means that agents are usually POJOs that can be instantiated with fake or mock objects. Actions are methods that can be called directly with test fixtures. In addition to your domain objects, you will pass a test fixture for the Embabel OperationContext, enabling you to intercept and verify LLM calls.

The framework provides FakePromptRunner and FakeOperationContext to mock LLM interactions while allowing you to verify prompts, hyperparameters, and business logic. Alternatively you can use mock objects. Mockito is the default choice for Java; mockk for Kotlin.
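
To make the fake-object approach concrete, here is a minimal, dependency-free sketch of the general pattern: a fake that hands out queued responses in order and records each prompt and temperature so a test can assert on them afterwards. The names RecordingFakeRunner, Invocation, and run are hypothetical and are not Embabel's API; the real FakePromptRunner may be implemented quite differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical illustration of a recording fake; not Embabel's FakePromptRunner. */
final class RecordingFakeRunner {

    /** One captured call: the prompt text and the temperature it was made with. */
    record Invocation(String prompt, double temperature) {}

    private final Deque<Object> responses = new ArrayDeque<>();
    private final List<Invocation> invocations = new ArrayList<>();

    /** Queue a canned response; responses are handed out in FIFO order. */
    void expectResponse(Object response) {
        responses.addLast(response);
    }

    /** Record the call, then return the next canned response cast to the requested type. */
    <T> T run(String prompt, double temperature, Class<T> type) {
        invocations.add(new Invocation(prompt, temperature));
        if (responses.isEmpty()) {
            throw new IllegalStateException("No response queued for prompt: " + prompt);
        }
        return type.cast(responses.removeFirst());
    }

    /** Read-only snapshot of everything the code under test asked for. */
    List<Invocation> getInvocations() {
        return List.copyOf(invocations);
    }
}
```

Because no real model is involved, a test built on this pattern is deterministic: it checks what the code asked for (prompt text, hyperparameters), not what a model happened to answer.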

Testing Prompts and Hyperparameters

Here are unit tests from the Java Agent Template and Kotlin Agent Template repositories, using Embabel fake objects:

class WriteAndReviewAgentTest {

    @Test
    void testWriteAndReviewAgent() {
        var context = FakeOperationContext.create();
        var promptRunner = (FakePromptRunner) context.promptRunner();
        context.expectResponse(new Story("Once upon a time Sir Galahad . . "));

        var agent = new WriteAndReviewAgent(200, 400);
        agent.craftStory(new UserInput("Tell me a story about a brave knight", Instant.now()), context);

        String prompt = promptRunner.getLlmInvocations().getFirst().getPrompt();
        assertTrue(prompt.contains("knight"), "Expected prompt to contain 'knight'");

        var temp = promptRunner.getLlmInvocations().getFirst().getInteraction().getLlm().getTemperature();
        assertEquals(0.9, temp, 0.01,
                "Expected temperature to be 0.9: Higher for more creative output");
    }

    @Test
    void testReview() {
        var agent = new WriteAndReviewAgent(200, 400);
        var userInput = new UserInput("Tell me a story about a brave knight", Instant.now());
        var story = new Story("Once upon a time, Sir Galahad...");
        var context = FakeOperationContext.create();
        context.expectResponse("A thrilling tale of bravery and adventure!");
        agent.reviewStory(userInput, story, context);

        var promptRunner = (FakePromptRunner) context.promptRunner();
        String prompt = promptRunner.getLlmInvocations().getFirst().getPrompt();
        assertTrue(prompt.contains("knight"), "Expected review prompt to contain 'knight'");
        assertTrue(prompt.contains("review"), "Expected review prompt to contain 'review'");
    }
}

Testing the Fluent API: withId() and creating()

The FakePromptRunner fully supports the fluent API patterns used in production code, enabling comprehensive unit testing of agents that use withId() for interaction tracing and creating() for structured object creation with examples.

Testing withId() for Interaction Tracing:

The withId() method sets an interaction ID for better log tracing. In tests, you can verify the interaction ID was correctly set:

@Test
void shouldSetInteractionIdCorrectly() {
    var context = FakeOperationContext.create();
    var expectedIntent = new UserIntent("command", "Change channel names");
    context.expectResponse(expectedIntent);

    var result = context.ai()
            .withId("classify-intent")  // Set interaction ID for tracing
            .creating(UserIntent.class)
            .fromPrompt("Classify the user's intent");

    assertEquals(expectedIntent, result);

    // Verify the interaction ID was set correctly
    var interaction = context.getLlmInvocations().getFirst().getInteraction();
    assertEquals("classify-intent", interaction.getId().getValue());
}

Testing creating() with withExample():

The creating() API allows you to provide strongly-typed examples to improve LLM output quality. In tests, you can verify examples were included:

@Test
void shouldIncludeExamplesInPrompt() {
    var context = FakeOperationContext.create();
    var expectedPlan = new ChannelEditPlan(1, "Lead Vox");
    context.expectResponse(expectedPlan);

    var result = context.ai()
            .withLlm(llmSelectionService.selectOptimalLlm())
            .withId("analyze-edit-request")
            .creating(ChannelEditPlan.class)
            .withExample("Rename channel 1", new ChannelEditPlan(1, "Bass"))
            .withExample("Rename channel 2", new ChannelEditPlan(2, "Drums"))
            .fromPrompt("Analyze the edit request");

    assertEquals(expectedPlan, result);

    // Verify examples were added as prompt contributors
    var promptContributors = context.getLlmInvocations().getFirst()
            .getInteraction().getPromptContributors();
    assertTrue(promptContributors.size() >= 2, "Examples should be added as prompt contributors");
}

Using CreationExample for Reusable Examples:

For cleaner code and reusability, you can use the CreationExample data class to define examples that can be shared across tests or passed as collections:

@Test
void shouldUseCreationExampleDataClass() {
    var context = FakeOperationContext.create();
    var expectedPlan = new ChannelEditPlan(1, "Lead Vox");
    context.expectResponse(expectedPlan);

    // Create a reusable example using CreationExample
    var example = new CreationExample<>(
        "Rename channel example",
        new ChannelEditPlan(2, "Rhythm")
    );

    var result = context.ai()
            .withDefaultLlm()
            .creating(ChannelEditPlan.class)
            .withExample(example)  // Pass the CreationExample directly
            .fromPrompt("Analyze the edit request");

    assertEquals(expectedPlan, result);
}

Adding Multiple Examples with withExamples():

When you have many examples to add, use withExamples() to pass them as a list or vararg. This is especially useful when examples are loaded from a file or database:

@Test
void shouldAddMultipleExamplesFromList() {
    var context = FakeOperationContext.create();
    var expectedPlan = new ChannelEditPlan(1, "Lead Vox");
    context.expectResponse(expectedPlan);

    // Create a list of examples (could be loaded from configuration)
    var examples = List.of(
        new CreationExample<>("Rename to Bass", new ChannelEditPlan(1, "Bass")),
        new CreationExample<>("Rename to Drums", new ChannelEditPlan(2, "Drums")),
        new CreationExample<>("Rename to Keys", new ChannelEditPlan(3, "Keys")),
        new CreationExample<>("Rename to Vocals", new ChannelEditPlan(4, "Vocals"))
    );

    var result = context.ai()
            .withDefaultLlm()
            .creating(ChannelEditPlan.class)
            .withExamples(examples)  // Pass all examples at once
            .fromPrompt("Analyze the request");

    assertEquals(expectedPlan, result);

    // Verify all examples were added
    var promptContributors = context.getLlmInvocations().getFirst()
            .getInteraction().getPromptContributors();
    assertTrue(promptContributors.size() >= 4);
}

You can also use vararg syntax for inline example lists:

var result = context.ai()
        .withDefaultLlm()
        .creating(ChannelEditPlan.class)
        .withExamples(
            new CreationExample<>("Example 1", new ChannelEditPlan(1, "Bass")),
            new CreationExample<>("Example 2", new ChannelEditPlan(2, "Drums")),
            new CreationExample<>("Example 3", new ChannelEditPlan(3, "Keys"))
        )
        .fromPrompt("Analyze the request");
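
When examples come from a file, as mentioned for withExamples(), they first need to be parsed into a list. The sketch below shows one way to do that with plain Java. The pipe-separated line format, the ExampleLoader class, and its Row record are all assumptions made for illustration; none of them are part of Embabel.

```java
import java.util.List;

/** Hypothetical loader; the line format and all names here are assumptions. */
final class ExampleLoader {

    /** One parsed row: a description plus the fields of the example object. */
    record Row(String description, int channel, String name) {}

    private ExampleLoader() {}

    /** Parses lines of the form "description|channel|name", skipping blank lines. */
    static List<Row> parse(List<String> lines) {
        return lines.stream()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .map(line -> {
                    String[] parts = line.split("\\|");
                    if (parts.length != 3) {
                        throw new IllegalArgumentException("Expected 3 fields: " + line);
                    }
                    return new Row(parts[0].strip(),
                            Integer.parseInt(parts[1].strip()),
                            parts[2].strip());
                })
                .toList();
    }
}
```

Each Row could then be mapped to new CreationExample<>(row.description(), new ChannelEditPlan(row.channel(), row.name())) and the resulting list passed to withExamples(). The input lines might come from Files.readAllLines(path).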

Full Fluent API Chain Example:

Here’s a complete example showing how to test an action that uses all the fluent API features:

@Test
void shouldTestCompleteFluentApiChain() {
    var context = FakeOperationContext.create();
    var expectedOutput = new ComplexOutput("analysis complete", 42);
    context.expectResponse(expectedOutput);

    // Production code pattern with full fluent API chain
    var result = context.ai()
            .withLlm(LlmOptions.withModel("gpt-4"))
            .withId("complex-analysis")
            .withSystemPrompt("You are an expert analyst")
            .creating(ComplexOutput.class)
            .withExample("Simple case", new ComplexOutput("basic", 1))
            .withExample("Complex case", new ComplexOutput("advanced", 100))
            .fromPrompt("Analyze the input data");

    assertEquals(expectedOutput, result);

    // Comprehensive verification
    var invocation = context.getLlmInvocations().getFirst();
    assertEquals("gpt-4", invocation.getInteraction().getLlm().getModel());
    assertEquals("complex-analysis", invocation.getInteraction().getId().getValue());
    assertTrue(invocation.getInteraction().getPromptContributors().size() >= 3); // system + 2 examples
}

Key Testing Patterns Demonstrated

Testing Prompt Content:

  • Use context.getLlmInvocations().getFirst().getPrompt() to get the actual prompt sent to the LLM
  • Verify that key domain data is properly included in the prompt using assertTrue(prompt.contains(...))

Testing Tools:

  • Access tools via getInteraction().getTools() to verify tools added via withToolObject() or withTool()
  • Access tool group requirements via getInteraction().getToolGroups() to verify named tool group requirements added via withToolGroup(ToolGroupRequirement)

Testing with Spring Dependencies:

  • Mock Spring-injected services like HoroscopeService using standard mocking frameworks
  • Pass mocked dependencies to the agent constructor for isolated unit testing
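
The constructor-injection point can be sketched without any framework. The example below hand-rolls a stub instead of using Mockito so that it stays dependency-free. HoroscopeService is named in the text above, but its dailyHoroscope method, the StarNewsAgent class, and buildPrompt are assumptions made for illustration.

```java
/** Assumed shape of the Spring-injected service; the real interface may differ. */
interface HoroscopeService {
    String dailyHoroscope(String sign);
}

/** Hypothetical agent-like POJO that receives its dependency via the constructor. */
final class StarNewsAgent {

    private final HoroscopeService horoscopeService;

    StarNewsAgent(HoroscopeService horoscopeService) {
        this.horoscopeService = horoscopeService;
    }

    /** Builds the prompt text that would be sent to the LLM. */
    String buildPrompt(String sign) {
        return "Write a news story for " + sign + " based on this horoscope: "
                + horoscopeService.dailyHoroscope(sign);
    }
}
```

In a test, the dependency can be replaced with a lambda, for example new StarNewsAgent(sign -> "A good day for " + sign), and the resulting prompt asserted on directly. No Spring context is needed because the agent is an ordinary object.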

Testing Multiple LLM Interactions

@Test
void shouldHandleMultipleLlmInteractions() {
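    // Arrange: `context` (a FakeOperationContext) and `agent` are assumed to be
    // fields initialized in test setup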
    var input = new UserInput("Write about space exploration");
    var story = new Story("The astronaut gazed at Earth...");
    ReviewedStory review = new ReviewedStory("Compelling narrative with vivid imagery.");

    // Set up expected responses in order
    context.expectResponse(story);
    context.expectResponse(review);

    // Act
    var writtenStory = agent.writeStory(input, context);
    ReviewedStory reviewedStory = agent.reviewStory(writtenStory, context);

    // Assert
    assertEquals(story, writtenStory);
    assertEquals(review, reviewedStory);

    // Verify both LLM calls were made
    List<LlmInvocation> invocations = context.getLlmInvocations();
    assertEquals(2, invocations.size());

    // Verify first call (writer)
    var writerCall = invocations.get(0);
    assertEquals(0.8, writerCall.getInteraction().getLlm().getTemperature(), 0.01);

    // Verify second call (reviewer)
    var reviewerCall = invocations.get(1);
    assertEquals(0.2, reviewerCall.getInteraction().getLlm().getTemperature(), 0.01);
}

You can also use Mockito or mockk directly. Consider this component, using direct injection of Ai:

@Component
public record InjectedComponent(Ai ai) {

    public record Joke(String leadup, String punchline) {
    }

    public String tellJokeAbout(String topic) {
        return ai
                .withDefaultLlm()
                .generateText("Tell me a joke about " + topic);
    }
}

A unit test using Mockito to stub Ai and PromptRunner and verify the prompt (the same approach works with mockk in Kotlin):

class InjectedComponentTest {

    @Test
    void testTellJokeAbout() {
        var mockAi = Mockito.mock(Ai.class);
        var mockPromptRunner = Mockito.mock(PromptRunner.class);

        var prompt = "Tell me a joke about frogs";
        // Yep, an LLM came up with this joke.
        var terribleJoke = """
                Why don't frogs ever pay for drinks?
                Because they always have a tadpole in their wallet!
                """;
        Mockito.when(mockAi.withDefaultLlm()).thenReturn(mockPromptRunner);
        Mockito.when(mockPromptRunner.generateText(prompt)).thenReturn(terribleJoke);

        var injectedComponent = new InjectedComponent(mockAi);
        var joke = injectedComponent.tellJokeAbout("frogs");

        assertEquals(terribleJoke, joke);
        Mockito.verify(mockAi).withDefaultLlm();
        Mockito.verify(mockPromptRunner).generateText(prompt);
    }

}

Integration Testing

Integration testing exercises complete agent workflows with real or mock external services while still avoiding actual LLM calls for predictability and speed.

This can ensure:

  • Agents are picked up by the agent platform
  • Data flow is correct within agents
  • Failure scenarios are handled gracefully
  • Agents interact correctly with each other and external systems
  • The overall workflow behaves as expected
  • LLM prompts and hyperparameters are correctly configured

Embabel integration testing is built on top of Spring’s excellent integration testing support, thus allowing you to work with real databases if you wish. Spring’s integration with Testcontainers is particularly useful.

Using EmbabelMockitoIntegrationTest

Embabel provides EmbabelMockitoIntegrationTest as a base class that simplifies integration testing with convenient helper methods:

/**
 * Use framework superclass to test the complete workflow of writing and reviewing a story.
 * This will run under Spring Boot against an AgentPlatform instance
 * that has loaded all our agents.
 */
class StoryWriterIntegrationTest extends EmbabelMockitoIntegrationTest {

    @Test
    void shouldExecuteCompleteWorkflow() {
        var input = new UserInput("Write about artificial intelligence");

        var story = new Story("AI will transform our world...");
        var reviewedStory = new ReviewedStory(story, "Excellent exploration of AI themes.", Personas.REVIEWER);

        whenCreateObject(contains("Craft a short story"), Story.class)
                .thenReturn(story);

        // The second call uses generateText
        whenGenerateText(contains("You will be given a short story to review"))
                .thenReturn(reviewedStory.review());

        var invocation = AgentInvocation.create(agentPlatform, ReviewedStory.class);
        var reviewedStoryResult = invocation.invoke(input);

        assertNotNull(reviewedStoryResult);
        assertTrue(reviewedStoryResult.getContent().contains(story.text()),
                "Expected story content to be present: " + reviewedStoryResult.getContent());
        assertEquals(reviewedStory, reviewedStoryResult,
                "Expected review to match: " + reviewedStoryResult);

        verifyCreateObjectMatching(prompt -> prompt.contains("Craft a short story"), Story.class,
                llm -> llm.getLlm().getTemperature() == 0.9 && llm.getToolGroups().isEmpty());
        verifyGenerateTextMatching(prompt -> prompt.contains("You will be given a short story to review"));
        verifyNoMoreInteractions();
    }
}

Key Integration Testing Features

Base Class Benefits:

  • EmbabelMockitoIntegrationTest handles Spring Boot setup and LLM mocking automatically
  • Provides agentPlatform and llmOperations pre-configured
  • Includes helper methods for common testing patterns

Convenient Stubbing Methods:

  • whenCreateObject(prompt, outputClass): Mock object creation calls
  • whenGenerateText(prompt): Mock text generation calls
  • Support for both exact prompts and contains() matching
  • Support for streaming calls, enabled by calling supportsStreaming(true) in test setup

Advanced Verification:

  • verifyCreateObjectMatching(): Verify prompts with custom matchers
  • verifyGenerateTextMatching(): Verify text generation calls
  • verifyNoMoreInteractions(): Ensure no unexpected LLM calls

LLM Configuration Testing:

  • Verify temperature settings: llm.getLlm().getTemperature() == 0.9
  • Check tool groups: llm.getToolGroups().isEmpty()
  • Validate persona and other LLM options
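
The temperature check above compares floating-point values with ==. That works when both sides originate from the same literal, but a tolerance-based, null-safe comparison is more robust. The helper below is a hypothetical test utility, not part of Embabel, and it assumes the temperature may be exposed as a nullable Double.

```java
import java.util.function.Predicate;

/** Hypothetical test helper, not part of Embabel. */
final class TemperatureMatchers {

    private TemperatureMatchers() {}

    /** True when the value is non-null and within the given tolerance of the expected value. */
    static Predicate<Double> closeTo(double expected, double tolerance) {
        return actual -> actual != null && Math.abs(actual - expected) <= tolerance;
    }
}
```

Inside a verification lambda this could be used as llm -> TemperatureMatchers.closeTo(0.9, 0.01).test(llm.getLlm().getTemperature()), which mirrors the delta argument that assertEquals takes in the unit tests.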
