In this post, we will show how to specify and verify imperative programs in Lean 4 using Velvet—an embedded verifier, which relies on a combination of automated symbolic and AI-assisted theorem proving techniques.

Disclaimer: At the time of writing, Velvet is distributed as part of the Loom framework. Velvet is currently under active development by VERSE Lab, as we are working to improve its expressivity and performance. It’s likely that its codebase will soon be moved to a separate repository. Nevertheless, the code linked in this post will remain accessible.

Formal program verification is about telling what a program should do without telling how it should do it and then mathematically proving that the program indeed does exactly that. The what is given by a program specification—a logical statement that describes the assumptions on the program’s input and the properties that should hold true about its outcomes.1

Getting Started: Specifying and Verifying Functional Programs

The Lean 4 theorem prover allows one to write a program and also to state its specification in the form of a mathematical theorem. For instance, the following code fragment shows a function append that concatenates two lists of integers and a theorem append_assoc that states and proves the function’s associativity (meaning that the result of concatenating three lists does not depend on the order in which we perform concatenation—only on the position of the arguments).

def append (xs ys : List Int) : List Int :=
  match xs with
  | []      => ys
  | x :: xs => x :: append xs ys

theorem append_assoc (xs ys zs : List Int) : 
    append (append xs ys) zs = append xs (append ys zs) := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [append, ih]  

The proof of append_assoc is done by induction on the shape of the first argument of the append function (i.e., the list “on the left side” of the concatenation), and its details are not that important for now: it suffices to say that it closely mimics the paper-and-pencil argument that argues for the validity of the theorem’s statement by considering two different ways lists can be constructed.

The program append is written in a functional style, in which the result of a program is determined solely by its parameters, and all functions always terminate. Thanks to these restrictions, functional programs are known to be particularly well-suited for formal verification, with simple theorems featuring relatively natural proofs—just as we’ve seen above. Unfortunately, such mathematically “pure” functional programming makes it non-trivial to express so-called imperative features of common programming languages (and even pseudocode languages used in common textbooks for presenting algorithms): potentially non-terminating loops, exceptions, mutable variables, and randomness.

Velvet addresses this gap by providing support for imperative programming within Lean’s verification ecosystem. It is not the only existing verifier for imperative programs—many great tools exist to do exactly that, including Dafny, Verus, and Prusti, to mention just a few. None of those tools, however, allow one to use Lean as a way to orchestrate their verification, which is a unique feature of Velvet—by virtue of it being embedded in Lean. To put it differently, Velvet is a Lean library rather than a standalone tool.

Imperative Programming in Velvet

The code accompanying the rest of the post can be found in this file.

Let’s start our tour of Velvet by implementing in it a simple program that checks whether a given natural number is not prime:

method IsNonPrime (n: Nat) return (result: Bool)
  do
    if n  1 then
      return true
    let mut i: Nat := 2
    let mut ret: Bool := false
    while i * i  n
    invariant true
    do
      if n % i = 0 then
        ret := true
      i := i + 1
    return ret

Most of the code of IsNonPrime should be self-explanatory, so let us highlight only unusual parts. First, its result is explicitly named in its signature (as result, but one can use any arbitrary name different from parameters) for the reasons that will soon become clear. Similarly to Python, the scoping of the code is determined by its offset, and the do keyword starts a new code block. All local variables (introduced using let) and function parameters are immutable by default unless explicitly marked as mut. In the while-loop, right after the condition we can see the invariant true annotation. For now, it serves no particular purpose except for making the parser happy, so let’s not worry about it and just think of it as a piece of unavoidable boilerplate.

We can immediately test our function by running it and checking its result in VSCode’s Lean InfoView. For instance, executing

#eval (IsNonPrime 42).extract

results in true (42 is indeed non-prime), while running

#eval (IsNonPrime 239).extract

produces the result false, just as expected.

Specifying a Program in Velvet

As the next step, we move from tests, which can only show that a program behaves as expected on specific inputs, to formal specifications stating that a program always does what it’s supposed to do. Despite its tiny size and simplicity, IsNonPrime is surprisingly tricky to specify formally, as it relies on the definition of primality of natural numbers. Thanks to the rich vocabulary of Lean specifications and programming mechanisms, one can define primality in multiple different ways. We are going to do it by first writing a mathematical function that returns the number of divisors of a natural number n:

def countDivisors (n: Nat) : Nat :=
  (List.range (n + 1)).filter (fun d => d > 0  n % d = 0) |>.length

In plain words, countDivisors first constructs a list of numbers from 0 to n, then keeps only those that are strictly positive and are divisors of n; finally, it returns the size of the resulting list. Indeed, for any prime number countDivisors should return 2: counting 1 and the number itself, i.e., only the trivial divisors. We are going to adopt this as a definition of a prime number, defined in Lean as the following predicate:

def isPrime (n: Nat) : Prop :=
  n > 1  countDivisors n = 2

We are now ready to ascribe a formal specification to IsNonPrime, which we can do by adding a logical statement that starts with ensures right after the function’s signature:

method IsNonPrime (n: Nat) return (result: Bool)
  ensures result  ¬isPrime n
  do
    if n  1 then
      return true
    let mut i: Nat := 2
    let mut ret: Bool := false
    while i * i  n
    invariant true
    do
      if n % i = 0 then
        ret := true
      i := i + 1
    return ret

The added statement is commonly called a postcondition: it postulates what should hold true about the program’s result at the end of its execution (and that’s why we had to give an explicit name to the result!).2 In our example, the postcondition asserts that the function should return true if and only if (denoted as in Lean) its result is not prime (¬isPrime).

Let us now go ahead and try to verify that the desired property does indeed hold for any input n passed to IsNonPrime. This can be done by adding the following command to the file after the function definition:3

prove_correct IsNonPrime by
  loom_solve!

Sadly, this does not immediately verify our program. Running loom_solve! proof tactic, however, provides us with a very helpful piece of information: a Lean proof context shown in VSCode InfoView, which states what exactly could not be proven about the program:

n i : ℕ
result : Bool
if_neg : 1 < n
done_1 : n < i * i
⊢ result = true ↔ ¬isPrime n

In short, this is what remains to be proven: when the program’s input n is strictly larger than 1 (indicated by the hypothesis if_neg), the returned result matches our specification.

What about the case when n is 0 or 1? In fact, this case, corresponding to taking the then-branch in the program’s body, was indeed proven by loom_solve!, and this happened automatically! To see this, let us step back and ask Velvet for all facts that need to be proven to establish the desired postcondition of IsNonPrime. This can be done by adding the following line right above prove_correct ...:

set_option loom.solver "custom"

Doing so makes prove_correct produce the following collection of Lean statements:

-- Goal 1
n : ℕ
if_pos : n ≤ 1
⊢ ¬isPrime n

-- Goal 2
n : ℕ
if_neg : ¬n ≤ 1
i : ℕ
ret : Bool
if_neg_1 : ¬i * i ≤ n
⊢ ¬i * i ≤ n

-- Goal 3
n i : ℕ
ret : Bool
if_neg : 1 < n
done_1 : n < i * i
⊢ ret = true ↔ ¬isPrime n

We have already seen the third statement, so where did the first two come from? Looking closely, one can notice that the first one corresponds to the fact that needs to be proven to justify IsNonPrime returning true for n ≤ 1. The correctness of this statement follows immediately from the definition of isPrime, which we have defined above. The second goal comes from the requirement, imposed by Velvet, to show that the loop condition (in this case, i * i ≤ n) is indeed negated at the end of the loop execution, and it holds trivially. Both goals are proven by loom_solve! automatically using a combination of SMT solvers (such as Z3 and cvc5) and Lean’s own proof automation (most prominently, the grind and aesop tactics).

Multi-Modal Verification

The demonstrated ability to automatically prove some of the facts required to verify a program, while leaving some others open for an interactive proof is one of the key advantages of Velvet compared to state-of-the-art program verifiers, which typically provide only an automated or only an interactive verification mode. The former ones suffer from the lack of debugging information when a proof fails, while the latter make most of the proofs (even of trivial facts) extremely laborious. Velvet combines the strengths of both modes, thus, providing a multi-modal verification experience.

Dealing with Unbounded Executions: Loop Invariants

To prove the only interesting statement about IsNonPrime, we will have to do a bit more work and provide so-called loop invariants—assertions such that (1) they hold right before the start of the loop’s execution, (2) if they hold true at the start of a loop iteration, they also hold true at the end of it, and (3) when combined, they allow deriving the program’s postcondition. From the mathematician’s perspective, loop invariants are very similar to an induction hypothesis, which also needs to hold in the base case and must be re-established for the induction step. In this case, (1) is the equivalent of proving the base case, (2) is the induction step, and (3) is using the statement proven by induction (i.e., the invariant) to prove the desired fact. So, if you are familiar with proofs by induction, you can simply think of a combination of inductive loop invariants as an induction hypothesis necessary to prove that certain facts are true about our program’s state no matter how many iterations the loop makes.

Sadly, there is no algorithm to reliably infer loop invariants for any program and its provable postcondition—a fact that directly follows from Rice’s Theorem, which states that any non-trivial semantic property of programs (e.g., the existence of their loop invariants for a given postcondition) is undecidable. However, invariants can frequently be conjectured by using our understanding of what does the loop do (yes, you can also try to do it using your favourite AI system) and then verified using formal proof techniques to satisfy the requirements (1)-(3) listed above. This is what we are going to do for now. Let us change invariant true in the program, so it will look as follows:

method IsNonPrime (n: Nat) return (result: Bool)
  ensures result  ¬isPrime n
  do
    if n  1 then
      return true
    let mut i: Nat := 2
    let mut ret: Bool := false
    while i * i  n
    invariant 2  i
    invariant (ret = false   d, 2  d  d < i  n % d  0)
    invariant (i - 1) * (i - 1)  n
    do
      if n % i = 0 then
        ret := true
      i := i + 1
    return ret

The first invariant states that at the start and at the end of each loop iteration, the variable i is at least 2. The second one is the most interesting: it states that the variable ret is false if and only if there are no divisors of n between 2 and i. Finally, the third invariant simply states that the square of i - 1 is not larger than n. Because of the requirement (2), adding these invariants made the job of our verifier quite a bit harder: now we need to prove that each of these invariants is preserved by a loop iteration. As a result, if we once again add

set_option loom.solver "custom"

in front of our proof script (starting with prove_correct IsNonPrime by), we will see that loom_solve! leaves a whopping 15 facts to prove! The good news is that most of them are no match for Lean’s proof automation, and can be solved without any involvement required from our side. So let us comment out the option set_option loom.solver "custom" and run this script instead:

prove_correct IsNonPrime by
  loom_solve; try simp_all

Now it leaves us with just a single fact to prove, which looks strikingly similar to the one we have struggled with before:

n i : 
ret : Bool
i_1 : 
ret_1 : Bool
if_neg : 1 < n
invariant_1 : 2  i_1
invariant_2 : ret_1 = false   (d : ), 2  d  d < i_1  ¬n % d = 0
invariant_3 : (i_1 - 1) * (i_1 - 1)  n
done_1 : n < i_1 * i_1
i_2 : i = i_1  ret = ret_1
 ret_1 = true  ¬isPrime n

This time, however, we are in a much better situation, as, thanks to our (or rather Velvet’s and Lean’s) hard work, we have the three invariants (e.g., invariant_1, etc) available to us as assumptions, meaning we can use them to verify the much desired fact about the function’s result.

Unleashing AI-Powered Proof Automation

For now, the only thing that stands between us and proving correctness of IsNonPrime with regard to its specification is the statement above that, roughly, states that a number n is prime if and only if the number of its divisors between 2 and its discrete square root (i_1) is exactly zero. While somewhat obvious from our understanding of mathematics of division, this fact is by far non-trivial—mostly because we talk about enumerating all potential divisors not between 2 and $n$ but between 2 and $\sqrt{n}$. Since this requires number-theoretic reasoning beyond what SMT solvers or Lean’s grind tactic handle well, it’s time to bring in AI-powered proof automation.

First, let us “hoist” the statement we are willing to prove as a separate theorem called remaining_goal (any name will do), whose proof is omitted for now via Lean’s sorry keyword:

theorem remaining_goal
(n : )
(i : )
(ret : Bool)
(i_1 : )
(ret_1 : Bool)
(if_neg : 1 < n)
(invariant_1 : 2  i_1)
(invariant_2 : ret_1 = false   (d : ), 2  d  d < i_1  ¬n % d = 0)
(invariant_3 : (i_1 - 1) * (i_1 - 1)  n)
(done_1 : n < i_1 * i_1)
(i_2 : i = i_1  ret = ret_1)
: ret_1 = true  ¬isPrime n :=
  by sorry

With this theorem, the verification of IsNonPrime can be accomplished simply via the following script, which makes use of remaining_goal:

prove_correct IsNonPrime by
  loom_solve; try simp_all
  apply (remaining_goal n i ret i_1 ret_1 if_neg invariant_1 invariant_2 invariant_3 done_1 i_2)

Thanks to the multi-modal nature of Velvet, we could prove remaining_goal manually by writing a Lean proof script, as its statement is, in fact, true. Here, however, we will take advantage of existing AI-based proof automation systems.

In this experiment, which took place in early January 2026, I’ve tried to use Claude Code and Harmonic’s Aristotle. In case of Claude Code, I simply asked to complete all the proofs in the file, making sure there are no sorrys and that the file compiles. For Aristotle, I constructed a file that only contained the theorem and the definitions required by it, doing a quick clean-up from Velvet’s specific libraries, as those are not necessary for the proof at this point and might prevent the system from accepting the file. Each of the two systems was able to accomplish the proof in approximately 20 minutes.

The resulting proofs of this theorem are highly non-idiomatic and I won’t be posting them here to spare the reader’s eyes. In case you’re still curious, the Lean development accompanying this post, including the proof produced by Aristotle, can be found by this link. The repository also contains a number of other curious examples in Velvet, including a highly non-trivial correctness proof of a memory allocator, which was offered as a problem in the recent Theorem Proving Competition held in Wuhan in November 2025.

Concluding Remarks

In this post, we’ve had a brief introduction to the principles of specification and verification of imperative programs in Velvet—a multi-modal program verifier implemented as a library on top of Lean. Velvet combines automated and interactive reasoning modes, attempting to discharge the majority of proof obligations needed for program correctness using existing symbolic automation techniques (such as SMT solvers and Lean’s grind tactic), leaving “complex” facts to be proven by other means, such as writing a proof script manually. In the latter case, present-day AI systems, such as Claude Code and Aristotle, prove to be quite capable of completing the proofs involving mathematical statements with close to no human intervention. This combination of symbolic automation, interactive theorem proving, and AI assistance hints at a future where rigorous program verification becomes accessible to a much wider audience of non-experts.

Further Reading

To learn more about Velvet’s capabilities, check out this short paper, which provides a tutorial-style introduction to its features. If you are curious about semantic foundations for engineering multi-modal verifiers on top of Lean and are comfortable with programming language meta-theory, you might be interested in checking out our recently published POPL’26 paper.

  1. There are, indeed, multiple possible styles to describe a desired program’s behaviour. For instance, a specification can be a standalone mathematical theorem that describes certain properties of a program, for instance, associativity of the String.concat operation. To keep things simple, in this post, we will be following one of the most widely accepted ones: by means of Floyd-Hoare Logic, specifying programs using pre- and postconditions. 

  2. The postcondition on its own does not require a program to always terminate. In fact, in a commonly-accepted way to reason about programs with loops, known as partial correctness, a never-terminating program would satisfy any postcondition, which might be a bit counter-intuitive. A stronger notion of total correctness guarantees that a program always terminates on a suitably constrained class of input, at the expense of proving more facts about the program. In this post, we will stick with partial correctness, leaving the total one for future discussions. 

  3. Many of the tactics used for writing proofs in Velvet have the loom_ prefix. This is because Velvet is, in fact, built on top of a more general Lean library called Loom. You can read more about Loom in this paper