Embabel

Working with LLM Reasoning / Thinking

Motivation

Sometimes you need to validate an LLM’s reasoning process in addition to obtaining a structured result.

Consider this scenario: a user wants to plan a vacation and specifies that their preferred destinations are Greece and Italy, with available travel dates in June, August, or September. They ask the LLM to find flight options with affordable tickets for a one-week stay. The LLM returns a structured Flight object containing departure and return dates, destinations, and prices. Even if the output adheres to the expected schema, you may want to verify that:

The flight dates fall within the requested months
The destinations are actually in Greece or Italy rather than somewhere else
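The two checks above can be written as ordinary post-processing on the structured result. The following is a minimal sketch, not part of Embabel: the `Flight` record, its field names, and the `FlightCriteriaCheck` helper are hypothetical stand-ins for whatever type your application asks the LLM to produce.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the structured Flight object in the scenario.
record Flight(String destinationCountry, LocalDate departure, LocalDate returnDate, double price) {}

final class FlightCriteriaCheck {
    // The user's criteria from the scenario: Greece or Italy, in June, August, or September.
    static final Set<String> COUNTRIES = Set.of("Greece", "Italy");
    static final Set<Month> MONTHS = Set.of(Month.JUNE, Month.AUGUST, Month.SEPTEMBER);

    /** Returns human-readable violations; an empty list means the flight matches the criteria. */
    static List<String> violations(Flight f) {
        List<String> problems = new ArrayList<>();
        if (!COUNTRIES.contains(f.destinationCountry())) {
            problems.add("destination not in Greece or Italy: " + f.destinationCountry());
        }
        if (!MONTHS.contains(f.departure().getMonth())) {
            problems.add("departure outside requested months: " + f.departure());
        }
        if (!MONTHS.contains(f.returnDate().getMonth())) {
            problems.add("return outside requested months: " + f.returnDate());
        }
        return problems;
    }
}
```

A non-empty list tells you that the result is off, but not why. That is the gap the reasoning text is meant to fill.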

If the flight details fall outside the user’s criteria, access to the LLM’s reasoning process helps you understand why it made those choices.

An even more important use case arises when the LLM cannot fulfill a request, for example because the user’s criteria are ambiguous or contradictory and the requested object cannot be created. In this case, the thinking blocks explain what went wrong, even though no result was produced.

Concepts

ThinkingBlock — An abstraction that carries details about LLM reasoning, including the tag type, tag value, and reasoning text content.
ThinkingTagType — An enum defining the types of reasoning markers: TAG (XML-style tags like <think>), PREFIX (line prefixes like //THINKING:), and NO_PREFIX (untagged reasoning text before JSON output).
ThinkingResponse<T> — A response wrapper that holds both the result object and a list of ThinkingBlock instances.
ThinkingException — An exception that preserves thinking blocks when object instantiation fails, enabling debugging even in error scenarios.
thinking() — The core PromptRunner API method that enables thinking extraction.
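To make the three marker types concrete, the sketch below shows one way raw model output could be split into blocks. It is an illustration only and does not reproduce Embabel's extraction logic: `TagType`, `Block`, and `ThinkingSketch` are local stand-ins, and treating everything before the first `{` as possible reasoning is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-ins mirroring the concepts above; these are NOT Embabel's classes.
enum TagType { TAG, PREFIX, NO_PREFIX }

record Block(TagType tagType, String tagValue, String content) {}

final class ThinkingSketch {
    // XML-style tags such as <think>...</think>
    private static final Pattern XML_TAG = Pattern.compile("<(\\w+)>(.*?)</\\1>", Pattern.DOTALL);
    // Line prefixes such as //THINKING: ...
    private static final Pattern LINE_PREFIX = Pattern.compile("(?m)^//(\\w+):[ \\t]*(.*)$");

    static List<Block> extract(String raw) {
        List<Block> blocks = new ArrayList<>();
        // Assumption: the JSON payload starts at the first '{'; only text before it is inspected.
        int jsonStart = raw.indexOf('{');
        String preamble = jsonStart >= 0 ? raw.substring(0, jsonStart) : raw;

        Matcher tag = XML_TAG.matcher(preamble);
        while (tag.find()) {
            blocks.add(new Block(TagType.TAG, tag.group(1), tag.group(2).strip()));
        }
        String rest = XML_TAG.matcher(preamble).replaceAll("");

        Matcher prefix = LINE_PREFIX.matcher(rest);
        while (prefix.find()) {
            blocks.add(new Block(TagType.PREFIX, prefix.group(1), prefix.group(2).strip()));
        }
        rest = LINE_PREFIX.matcher(rest).replaceAll("").strip();

        // Whatever untagged text remains before the JSON is treated as NO_PREFIX reasoning.
        if (!rest.isEmpty()) {
            blocks.add(new Block(TagType.NO_PREFIX, "", rest));
        }
        return blocks;
    }
}
```

The sketch groups blocks by marker type rather than by position in the text, and it would misread reasoning that itself contains a `{`. A production extractor has to handle both cases.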

Example: Handling Objects and Thinking Blocks

// Configure the PromptRunner with an LLM and optional tools.
// Tooling and MonthItem are application classes not shown in this section.
PromptRunner runner = ai.withDefaultLlm()  // Default LLM; this example uses claude-sonnet-4-5
                        .withToolObject(Tooling.class);

String prompt = """
    What is the hottest month in Florida and its average high temperature?
    Please provide a detailed analysis of your reasoning.
    """;

// Use thinking() to enable thinking block extraction
ThinkingResponse<MonthItem> response = runner
        .thinking()
        .createObject(prompt, MonthItem.class);

// Access the structured result
MonthItem result = response.getResult();

// Access the LLM's reasoning process
List<ThinkingBlock> thinkingBlocks = response.getThinkingBlocks();

// Inspect individual thinking blocks
for (ThinkingBlock block : thinkingBlocks) {
    System.out.println("Type: " + block.getTagType());   // TAG, PREFIX, or NO_PREFIX
    System.out.println("Tag: " + block.getTagValue());   // e.g., "think", "analysis"
    System.out.println("Content: " + block.getContent());
}

Example: Handling Failures Gracefully

Use createObjectIfPossible when the LLM might not be able to produce a valid result:

ThinkingResponse<MonthItem> response = runner
        .thinking()
        .createObjectIfPossible(prompt, MonthItem.class);

MonthItem result = response.getResult();
if (result != null) {
    // Process the result
} else {
    // Object creation failed; examine the reasoning to understand why
    for (ThinkingBlock block : response.getThinkingBlocks()) {
        logger.info("LLM reasoning: {}", block.getContent());
    }
}
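The Concepts list also mentions ThinkingException, which preserves reasoning when instantiation fails with an exception instead of a null result. This section does not show its accessors, so the sketch below illustrates only the general pattern, using a hypothetical `ReasoningPreservingException` and plain strings in place of ThinkingBlock. Check the Embabel API for the real method names.

```java
import java.util.List;

// Hypothetical stand-in; Embabel's ThinkingException may expose its blocks differently.
final class ReasoningPreservingException extends RuntimeException {
    private final List<String> reasoning;

    ReasoningPreservingException(String message, List<String> reasoning) {
        super(message);
        this.reasoning = List.copyOf(reasoning);  // defensive, immutable copy
    }

    List<String> reasoning() {
        return reasoning;
    }
}

final class FailureSketch {
    // Simulates object creation that fails after the model has already produced reasoning.
    static String createMonth(String rawJson, List<String> reasoning) {
        if (rawJson == null || rawJson.isBlank()) {
            throw new ReasoningPreservingException("no object in LLM output", reasoning);
        }
        return rawJson;
    }

    // Returns the reasoning worth logging when creation fails, or an empty list on success.
    static List<String> diagnose(String rawJson, List<String> reasoning) {
        try {
            createMonth(rawJson, reasoning);
            return List.of();
        } catch (ReasoningPreservingException e) {
            return e.reasoning();
        }
    }
}
```

Attaching the reasoning to the exception means that a caller catching the failure far from the LLM call can still log why the model could not comply.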

Provider Notes

Embabel exposes thinking through a provider-neutral API:

PromptRunner.thinking()
LlmOptions.thinking

This remains the primary public API for enabling reasoning/thinking mode.

Under the hood, provider integrations may translate Thinking differently to match provider-specific capabilities. For example, Google GenAI maps Embabel thinking options to the corresponding Spring AI Google GenAI chat options such as includeThoughts and thinkingBudget.

Callers do not need a provider-specific thinking API. Applications should generally keep using Embabel’s generic thinking API rather than provider-specific configuration.

Some providers may also define model-level defaults in model YAML, but explicit runtime thinking requests still flow through LlmOptions.thinking.

Provider behavior may differ slightly depending on how Spring AI surfaces reasoning data:

Some providers expose reasoning on the assistant message itself
Others expose it through generation metadata

As a result, extracted thinking blocks may be absent, or may differ in shape, depending on the provider and the Spring AI integration version.