Using LLMs for automated test generation

June 15, 2026

Disclaimer
This content was generated by Meta Llama 3.3 70B and may contain mistakes.

Introduction to Automated Test Generation

When developing software, ensuring that every component functions as expected is vital, but manually creating comprehensive tests can be time-consuming and tedious. Large language models (LLMs) offer a promising solution by automatically generating unit tests, integration tests, and edge cases from code context. This approach can significantly reduce the testing burden, but it requires careful consideration of prompt patterns, validation strategies, and the role of human oversight.

Crafting Effective Prompts

To leverage LLMs for test generation, it’s essential to craft well-structured prompts that provide sufficient context for the model to understand the code’s functionality and generate relevant tests. A good prompt should include:

A clear description of the function or module under test
Relevant code snippets or documentation
Specific testing objectives, such as validation of edge cases or error handling

For example, using the @langchain/core library, you can create a prompt template like this:

// PromptTemplate is exported from the "prompts" entry point of @langchain/core.
import { PromptTemplate } from "@langchain/core/prompts";

const testPrompt = PromptTemplate.fromTemplate(
  "Write a unit test for the {functionName} function, which {functionDescription}. The function is defined as: {codeSnippet}.",
);

To further improve the prompt, consider adding additional context, such as:

Information about the function’s parameters and return types
Details about the expected behavior or output
Any relevant constraints or assumptions
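As a minimal sketch of how this extra context could be gathered and rendered, the helper below builds a prompt string from a structured description of the function. It uses plain TypeScript rather than a prompt library so it runs on its own. The `FunctionSpec` shape, its field names, and the `clamp` example are assumptions made for illustration, not part of any library.

```typescript
// Hypothetical description of the function under test; field names are illustrative.
interface FunctionSpec {
  name: string;
  description: string;
  params: { name: string; type: string }[];
  returnType: string;
  expectedBehavior: string[];
  constraints: string[];
  code: string;
}

// Render a list as indented bullet lines, or a placeholder when the list is empty.
function bulletList(items: string[]): string {
  if (items.length === 0) return "  - (none given)";
  return items.map((item) => `  - ${item}`).join("\n");
}

// Assemble a plain-text prompt with signature, behavior, constraints, and source.
function buildTestPrompt(spec: FunctionSpec): string {
  const paramList = spec.params.map((p) => `${p.name}: ${p.type}`).join(", ");
  return [
    `Write unit tests for the function ${spec.name}, which ${spec.description}.`,
    `Signature: ${spec.name}(${paramList}): ${spec.returnType}`,
    "Expected behavior:",
    bulletList(spec.expectedBehavior),
    "Constraints and assumptions:",
    bulletList(spec.constraints),
    "Source:",
    spec.code,
  ].join("\n");
}

// Example input (hypothetical function used only to exercise the helper).
const clampSpec: FunctionSpec = {
  name: "clamp",
  description: "restricts a number to an inclusive range",
  params: [
    { name: "value", type: "number" },
    { name: "min", type: "number" },
    { name: "max", type: "number" },
  ],
  returnType: "number",
  expectedBehavior: ["returns min when value < min", "returns max when value > max"],
  constraints: ["min <= max"],
  code: "function clamp(value, min, max) { return Math.min(Math.max(value, min), max); }",
};

console.log(buildTestPrompt(clampSpec));
```

Keeping the context in a typed object rather than a free-form string makes it easier to spot which pieces of information are missing before the prompt is sent.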

Validation and Human Oversight

While LLMs can generate a wide range of tests, it’s crucial to validate their correctness and relevance. This is where human oversight becomes essential. Developers should review generated tests to ensure they:

Correctly cover the desired functionality
Don’t introduce unnecessary complexity or redundancy
Align with the project’s testing strategy and standards

Automated validation tools can also help verify the correctness of generated tests. For instance, you can use Jest or Mocha to run the generated tests and report any failures or errors. Additionally, consider using code analysis tools, such as linters or code formatters, to ensure the generated tests adhere to the project’s coding standards.
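Before handing generated tests to a runner, a cheap static pre-screen can reject output that is obviously unusable. The sketch below is one possible heuristic, assuming Jest-style syntax (`test`, `it`, `expect`). The function name `prescreenGeneratedTest` and the specific checks are assumptions for illustration. Regex checks like these can miss cases and do not replace actually running the tests.

```typescript
interface ScreenResult {
  ok: boolean;
  problems: string[];
}

// Heuristic, regex-based checks on generated test source (assumes Jest-style syntax).
function prescreenGeneratedTest(source: string): ScreenResult {
  const problems: string[] = [];

  // A usable test file should declare at least one test block.
  if (!/\b(test|it)(\.\w+)*\s*\(/.test(source)) {
    problems.push("no test() or it() block found");
  }
  // A test without any assertion passes trivially and proves nothing.
  if (!/\bexpect\s*\(|\bassert\b/.test(source)) {
    problems.push("no assertion found");
  }
  // Focused or skipped tests silently change what the suite runs.
  if (/\b(test|it|describe)\.(only|skip)\s*\(/.test(source)) {
    problems.push("contains .only or .skip");
  }
  return { ok: problems.length === 0, problems };
}

const goodCandidate = 'test("adds", () => { expect(add(1, 2)).toBe(3); });';
const badCandidate = 'test.only("adds", () => { add(1, 2); });';

console.log(prescreenGeneratedTest(goodCandidate));
console.log(prescreenGeneratedTest(badCandidate).problems);
```

Candidates that pass the pre-screen would still go through the test runner, the linter, and human review.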

Edge Cases and Integration Tests

LLMs can also be used to generate integration tests and edge cases by providing additional context, such as:

API endpoints or dependencies
Specific input scenarios or corner cases
Expected output or behavior
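One way to supply specific input scenarios is to keep a small catalogue of boundary values per parameter type and append the matching entries to the prompt. The sketch below assumes such a catalogue. The `EDGE_CASE_HINTS` entries and the `edgeCaseSection` helper are hypothetical and would need tailoring to a real codebase.

```typescript
// Hypothetical catalogue of boundary inputs, keyed by a simple type label.
const EDGE_CASE_HINTS = new Map<string, string[]>([
  ["number", ["0", "-1", "Number.MAX_SAFE_INTEGER", "NaN", "Infinity"]],
  ["string", ["an empty string", "a whitespace-only string", "a very long string"]],
  ["array", ["an empty array", "a single-element array", "an array with duplicates"]],
]);

// Build a prompt section listing edge-case inputs for each parameter with a known type.
function edgeCaseSection(params: { name: string; type: string }[]): string {
  const lines: string[] = [];
  for (const param of params) {
    const hints = EDGE_CASE_HINTS.get(param.type) || [];
    for (const hint of hints) {
      lines.push(`- ${param.name} = ${hint}`);
    }
  }
  // Return an empty string when there is nothing to add, so callers can skip the section.
  if (lines.length === 0) return "";
  return "Also cover these inputs:\n" + lines.join("\n");
}

console.log(
  edgeCaseSection([
    { name: "items", type: "array" },
    { name: "limit", type: "number" },
  ]),
);
```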

By combining these strategies, developers can build a robust test suite that spans unit tests, integration tests, and edge cases. For example, an integration-test prompt can name the dependency explicitly:

// PromptTemplate is exported from the "prompts" entry point of @langchain/core.
import { PromptTemplate } from "@langchain/core/prompts";

const integrationTestPrompt = PromptTemplate.fromTemplate(
  "Write an integration test for the {functionName} function, which interacts with the {dependency} API. The function is defined as: {codeSnippet}.",
);

When generating integration tests, consider using techniques such as mocking or stubbing to isolate dependencies and ensure the tests are reliable and efficient.
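To illustrate stubbing, the sketch below isolates a function from its API dependency by passing the dependency in through an interface and replacing it with a hand-written stub. The `PriceApi` interface, `orderTotal`, and `StubPriceApi` are hypothetical names invented for this example. The API is kept synchronous for brevity, whereas a real client would usually be asynchronous. In a Jest project, mock functions could play the same role as the stub class.

```typescript
// Hypothetical dependency: a pricing API that the function under test calls.
interface PriceApi {
  getPrice(sku: string): number;
}

// Function under test: it sees the API only through the interface, so it can be stubbed.
function orderTotal(api: PriceApi, skus: string[]): number {
  let total = 0;
  for (const sku of skus) {
    total += api.getPrice(sku);
  }
  return total;
}

// Hand-written stub: returns canned prices and records which SKUs were requested.
class StubPriceApi implements PriceApi {
  calls: string[] = [];
  private prices: Map<string, number>;

  constructor(prices: Map<string, number>) {
    this.prices = prices;
  }

  getPrice(sku: string): number {
    this.calls.push(sku);
    const price = this.prices.get(sku);
    if (price === undefined) {
      throw new Error(`unknown SKU: ${sku}`);
    }
    return price;
  }
}

const stubApi = new StubPriceApi(new Map([["apple", 2], ["pear", 3]]));
const computedTotal = orderTotal(stubApi, ["apple", "pear", "apple"]);

console.log(computedTotal, stubApi.calls);
```

Because the stub records its calls, a test can check both the result and how the dependency was used, without any network access.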

Limitations and Future Directions

LLM-generated tests can save real effort, but they will not cover every scenario, and a model is limited by its training data: it may, for instance, be unaware of recently released APIs. The generation process therefore needs ongoing monitoring and refinement. This may involve:

Regularly reviewing and updating the prompt templates to ensure they remain relevant and effective
Using multiple LLMs or testing strategies to cover a wider range of scenarios
Incorporating human feedback and oversight to improve the accuracy and reliability of the generated tests
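When several models or strategies produce candidate tests, some candidates will likely be near-duplicates. The sketch below shows one simple way to merge them before review, assuming each candidate is plain source text. The `Candidate` shape and the model labels are hypothetical. Whitespace normalization only catches formatting differences, not tests that are semantically equivalent but worded differently.

```typescript
interface Candidate {
  model: string; // hypothetical label for the generator that produced the test
  source: string; // generated test code
}

// Collapse whitespace so formatting differences do not count as distinct tests.
function normalizeSource(source: string): string {
  return source.replace(/\s+/g, " ").trim();
}

// Keep the first occurrence of each distinct test, preserving input order.
function dedupeCandidates(candidates: Candidate[]): Candidate[] {
  const seen = new Set<string>();
  const unique: Candidate[] = [];
  for (const candidate of candidates) {
    const key = normalizeSource(candidate.source);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(candidate);
    }
  }
  return unique;
}

const mergedCandidates = dedupeCandidates([
  { model: "model-a", source: 'test("adds", () => { expect(add(1, 2)).toBe(3); });' },
  { model: "model-b", source: 'test("adds", () => {\n  expect(add(1, 2)).toBe(3);\n});' },
  { model: "model-b", source: 'test("adds zero", () => { expect(add(0, 0)).toBe(0); });' },
]);

console.log(mergedCandidates.map((c) => c.model));
```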

In conclusion, LLMs offer a practical way to streamline test generation, but the results depend on well-designed prompts, validation, and human oversight. By starting with simple prompts and composing them into richer patterns, developers can build a robust test suite that supports the reliability and quality of their software.