How to Write Natural Language Tests: A Tutorial
April 26, 2026

Most QA engineers spend more time fixing broken selectors than actually testing. A UI redesign ships, XPath locators break, and the test suite goes red before the feature ever gets a real look. Natural language testing solves this by replacing brittle selector logic with plain English descriptions of what a user does.
This is a practical tutorial on writing natural language tests. You will learn what a well-formed natural language test looks like, how to structure one from scratch, what makes tests fail even in plain English, and how tools like Autosana execute those descriptions against real mobile apps and websites without any code.
The barrier to entry is genuinely low. If you can describe a user flow to a colleague, you can write a natural language test. The skill is in writing descriptions that are specific enough to be unambiguous and structured enough to be repeatable.
#01 What a natural language test actually is
A natural language test is not a comment block above a Selenium script. It is not a Gherkin feature file that still requires step definitions underneath. It is a plain English description of a user flow that an AI agent reads, interprets, and executes directly against a running application.
The distinction matters. Gherkin-based BDD looks readable on the surface, but every "Given / When / Then" line maps to a coded step definition. Change the UI, the step definition breaks. You are still maintaining code, just indirectly.
True natural language test automation, as platforms like Autosana practice it, means the AI agent receives a sentence like "Log in with test@example.com and verify the home screen loads" and executes that instruction by understanding what "log in" means in context. A transformer model plans the action sequence. Computer vision identifies the relevant UI elements. A feedback loop retries or adapts if the first attempt hits an unexpected state.
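To make that loop concrete, here is a minimal sketch of the plan-locate-act-retry pattern in Python. This is not Autosana's implementation; the `planner`, `vision`, and `app` objects are hypothetical stand-ins for whatever models and device drivers a real platform wires together.

```python
def run_natural_language_test(instruction, app, planner, vision, max_replans=3):
    """Minimal plan-act-retry loop. All collaborators are hypothetical interfaces."""
    steps = planner.plan(instruction)   # planning model: sentence -> ordered action list
    replans = 0
    i = 0
    while i < len(steps):
        screenshot = app.capture_screen()
        target = vision.locate(steps[i], screenshot)   # computer vision picks the element
        if target is None:
            if replans >= max_replans:
                return False            # retry budget exhausted: report a failure
            # Feedback loop: re-reason from the current screen rather than a stale locator.
            steps, i = planner.replan(instruction, screenshot), 0
            replans += 1
            continue
        app.perform(steps[i], target)   # execute one interaction
        i += 1
    return True
```

Notice that nothing in the loop references a selector: if the UI changes, the agent re-plans from a fresh screenshot, which is the self-healing behavior described above.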
The result is a test that does not reference a single CSS selector, XPath expression, or element ID. When the UI changes, the test agent re-reasons from the description rather than breaking on a stale locator. That is the mechanism behind self-healing tests that teams keep hearing about.
Natural language tests are not automatically good tests, though. Vague descriptions produce vague results. "Check that checkout works" is a natural language test. It is also nearly useless. The craft in this tutorial is writing descriptions that are specific without being fragile.
#02 The anatomy of a good natural language test
Every strong natural language test has three parts: a setup condition, a sequence of actions, and a verifiable outcome.
Setup condition: what state should the app be in before the test begins? Is the user logged in or logged out? Is there existing data in the cart? Is a feature flag enabled? Skipping setup is the most common mistake beginners make. The AI agent cannot guess that your checkout test assumes a non-empty cart.
Action sequence: what does the user actually do? Write this as a series of discrete steps. Each step should describe one interaction. "Open the app, navigate to the product page, add the item to the cart, proceed to checkout, enter the test credit card details, and submit the order" is a sequence. "Use the checkout flow" is not.
Verifiable outcome: what should be true at the end? Name the specific screen, message, or state. "Verify the order confirmation screen displays with order number" is a valid assertion. "Verify it worked" is not.
Here is a concrete before and after example:
Weak test: "Test that the login works."
Strong test: "Navigate to the login screen. Enter 'qa@example.com' in the email field and 'TestPass123' in the password field. Tap the Sign In button. Verify the home dashboard loads with the user's first name visible in the top-right corner."
The strong version gives the AI agent a clear starting point, specific inputs, and a named assertion. It will produce consistent results across runs. The weak version will produce inconsistent interpretations depending on what the agent decides to check.
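If it helps to see the three parts isolated, here is that same strong test decomposed into setup, actions, and assertion as a plain data structure. The format is illustrative only; Autosana takes plain English, not dicts.

```python
# The strong login test above, decomposed into the three parts.
# The dict format is illustrative, not a required tool syntax.
login_test = {
    "setup": "App installed, user logged out, account qa@example.com exists",
    "actions": [
        "Navigate to the login screen",
        "Enter 'qa@example.com' in the email field",
        "Enter 'TestPass123' in the password field",
        "Tap the Sign In button",
    ],
    "assertion": (
        "Verify the home dashboard loads with the user's "
        "first name visible in the top-right corner"
    ),
}

# Quick sanity check: a test missing any of the three parts is a weak test.
assert all(login_test.get(part) for part in ("setup", "actions", "assertion"))
```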
For mobile apps, also specify the platform context where it matters. "On the Android build, tap the floating action button" is more reliable than hoping the agent picks the right element on the right platform.
#03 Step-by-step: write your first natural language test
Follow this sequence the first time you write natural language tests for a real application.
Step 1: Pick one critical user flow. Do not start with an edge case. Start with the flow your users hit most often. For an e-commerce app, that is product discovery to purchase. For a SaaS tool, that is onboarding to first meaningful action. One flow, end to end.
Step 2: Write the user journey in prose first. Forget test syntax entirely. Write a paragraph describing what a new user does. "She opens the app, sees the login screen, enters her email and password, taps sign in, and lands on the dashboard." This gives you the raw material.
Step 3: Break the prose into discrete steps. Each sentence in your paragraph becomes one test step. "Open the app" is step one. "Tap the email field and enter 'user@test.com'" is step two. Keep each step to one action.
Step 4: Add your assertions explicitly. After each meaningful action, add a verification step. Do not assume the agent will verify things automatically. "Verify the dashboard screen is visible" is a step, not an afterthought.
Step 5: Add setup and teardown instructions. Tell the system what state the app should be in before the test runs. If you are using a tool like Autosana, you can configure this via hooks: a cURL request to create a test user, a script to reset the database, or an app launch configuration for mobile. Setup failures cause test failures that have nothing to do with the feature being tested.
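For illustration, the "create a test user" hook might look like this in Python, assuming your backend exposes a test-user endpoint. The URL, payload, and token name below are placeholders for whatever your own infrastructure provides.

```python
# Hypothetical suite-level setup hook: create a fresh test user before the run.
# The endpoint, payload, and env var are placeholders for your own backend.
import os
import requests

def create_test_user() -> str:
    resp = requests.post(
        "https://staging.example.com/api/test-users",   # placeholder URL
        json={"email": "qa@example.com", "password": "TestPass123"},
        headers={"Authorization": f"Bearer {os.environ['STAGING_API_TOKEN']}"},
        timeout=10,
    )
    # Fail loudly: a broken setup should never masquerade as a failed feature test.
    resp.raise_for_status()
    return resp.json()["id"]
```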
Step 6: Run it once and review the screenshots. Autosana provides screenshots at every step and a full session replay of each execution. Look at what the agent actually did. If step three produced an unexpected result, your description was ambiguous. Refine it.
This cycle takes about 20 minutes for a first test. After writing ten tests, you will write them in five.
For a closer look at how this applies to mobile specifically, see Natural Language iOS Testing: A Practical Guide.
#04 Mistakes that break natural language tests
Natural language tests break for different reasons than selector-based tests, but they do break. Here are the failure patterns to avoid.
Ambiguous element references. "Tap the button" fails when there are four buttons on screen. "Tap the blue 'Continue' button in the bottom navigation bar" does not. Give the agent enough context to locate the right element without guessing.
Implicit state assumptions. If your test says "add the item to the cart" but does not specify which item or which product page to start on, the agent will make a choice. That choice may not match your intent. Be explicit about starting state.
Missing assertions. A test that only describes actions and never verifies outcomes will pass even when the feature is broken. Every test needs at least one assertion that would fail if the feature stopped working.
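One cheap safeguard is to lint your test descriptions for verification language before they ever run. A toy heuristic, assuming your tests are stored as lists of plain-English step strings:

```python
# Crude lint: flag any test whose steps never verify an outcome.
# Assumes tests are stored as lists of plain-English step strings.
ASSERTION_WORDS = ("verify", "confirm", "check that", "should see", "expect")

def has_assertion(steps: list[str]) -> bool:
    return any(word in step.lower() for step in steps for word in ASSERTION_WORDS)

flows = {
    "checkout": [
        "Add the blue hoodie to the cart",
        "Proceed to checkout and submit the order",   # no verification step!
    ],
}
for name, steps in flows.items():
    if not has_assertion(steps):
        print(f"WARNING: test '{name}' has no assertion and can never fail meaningfully")
```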
Over-specifying visual details. Natural language tests are not immune to brittleness. If you write "tap the red 500px-wide button with the text 'Submit' at coordinates 320, 780", you have recreated selector fragility in sentence form. Describe behavior and intent, not pixel positions.
Testing too much in one test. A test that covers signup, onboarding, product discovery, cart, checkout, and order confirmation in a single flow is hard to debug and slow to run. Split it into focused tests. One test per meaningful user goal.
Research on flaky test prevention is clear on this point: most test instability comes from environment setup problems and over-coupled test flows, not from the testing tool itself. Natural language tests have the same vulnerability.
Not reviewing results visually. The biggest advantage of a tool like Autosana is the screenshot-per-step and session replay output. Skipping that review means you miss the cases where the agent completed the test but took a wrong path to get there.
#05 Who should write natural language tests (and who else can)
The traditional answer is QA engineers. The better answer is anyone who understands the user flow.
Product managers know what the acceptance criteria are. They can write natural language tests directly from user stories without a translation layer. A PM who writes "When a user with a free plan tries to access the premium report, verify that the upgrade prompt modal appears" has written a valid test. No QA intermediary required.
Manual testers who have been writing test cases in spreadsheets for years are already writing natural language tests. They just have not been running them automatically. The mental model is the same. The output is now executable.
Developers on fast-moving teams can write tests as part of a feature PR rather than waiting for QA to catch up. QA automation for startups with small teams especially benefits from this model: one engineer can cover a feature end-to-end without blocking a separate QA cycle.
One caveat: whoever writes the test needs to understand the expected behavior of the feature. Natural language tests are not a substitute for product knowledge. A vague test description from someone who does not understand the flow produces a vague test.
Autosana's positioning here is direct. The platform is built for mobile app development teams and QA engineers, but the natural language interface means non-developers can contribute tests without learning XPath or writing code. That expands test coverage without expanding QA headcount.
For teams weighing selector-based against intent-based testing, the productivity difference is measurable: teams report writing tests ten times faster after switching from code-based to natural language approaches (e2eAgent.io, 2026).
#06 Integrating natural language tests into your CI/CD pipeline
Writing tests once and running them manually defeats the purpose. The value of natural language test automation is that you write it once and it runs on every build, every merge, every deployment.
Autosana supports CI/CD integration with GitHub Actions, Fastlane, and Expo EAS. The setup pattern is the same regardless of the pipeline: trigger the test suite on push or pull request, pass the build artifact (an .apk for Android or an .app simulator build for iOS), and receive results via Slack or email when the run completes.
For web testing, you do not need a build file at all. Enter the URL and the test suite runs against it directly.
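The shape of the CI step is the same everywhere: upload the artifact, trigger the suite, poll for the result, and fail the build on a red run. Here is that shape sketched in Python with entirely hypothetical endpoints and field names; Autosana's real API is not shown here.

```python
# Shape of a CI test step: upload build, trigger suite, poll, set exit code.
# Every URL, field, and token name is hypothetical, not Autosana's actual API.
import os
import sys
import time
import requests

API = "https://api.example-test-platform.com"        # placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['TEST_PLATFORM_TOKEN']}"}

with open(sys.argv[1], "rb") as build:               # path to the .apk / .app artifact
    run = requests.post(f"{API}/runs", headers=HEADERS,
                        files={"build": build},
                        data={"suite": "smoke"}, timeout=120).json()

while True:                                          # poll until the run finishes
    status = requests.get(f"{API}/runs/{run['id']}", headers=HEADERS, timeout=30).json()
    if status["state"] in ("passed", "failed"):
        break
    time.sleep(30)

sys.exit(0 if status["state"] == "passed" else 1)    # non-zero exit fails the CI job
```

A GitHub Actions, Fastlane, or EAS workflow would invoke a script like this after the build step, with the token stored as a pipeline secret.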
A practical integration pattern for mobile teams:
- On pull request open: run the smoke test suite against the feature branch build. Five to ten tests covering the most critical user flows.
- On merge to main: run the full regression suite against the staging environment.
- On release candidate tag: run the full suite against the production build before deployment.
Scheduled runs add a second layer. Autosana supports automated runs at configured intervals with results delivered via Slack notifications. Running your critical path tests every six hours on staging catches environment drift before it becomes a production incident.
The hooks system handles setup and teardown at each layer. A Python script creates a test user before the suite runs. A cURL request resets the cart state between tests. A Bash script flips the feature flag for the test environment. This is where the difference between a toy testing setup and a production-grade one lives.
Teams that connect their CI/CD pipeline with natural language tests stop treating QA as a gate at the end of the release cycle. It becomes continuous verification instead.
Natural language tests are not a shortcut that sacrifices quality. Written with a clear setup, specific action steps, and explicit assertions, they produce the same coverage as code-based tests with a fraction of the maintenance cost. The tools that execute them have matured enough in 2026 that the gap between "what I wrote" and "what the agent did" is narrow enough for production use.
If you are ready to move past this tutorial and run a real test, book a demo with Autosana. Upload your iOS or Android build, write your first test in plain English exactly as this tutorial described, and review the screenshots from each step to see what the agent actually executed. That first run will tell you more about your app's test coverage than any amount of reading about natural language automation.