Liveness Proofs in Veil, Part I: The First Step

Safety property means “nothing bad happens during the run of a program”; liveness property means “the program eventually does something good”. In this post, we walk through a simple proof of a liveness property in Veil, using a basic consensus protocol as an example.

This will be the first post in this series where we will explore a collection of techniques for proving liveness properties of distributed protocols, with a focus on how these proofs can be carried out in Veil, a verifier for distributed protocols in Lean. If you want to know more about Veil, start by reading this blog post.

The purpose of this post is to show that liveness is within comfortable reach of a Lean-based verifier, not just safety. Many mechanised protocol developments prove safety in full and then leave liveness as future work, to be proven, fittingly, eventually. This series sets out to do that future work. We begin here with the smallest example we could find—deliberately so, to keep the protocol out of the way and put the liveness argument in full view—and follow it end to end, from an informal argument to a machine-checked proof. The argument is what carries forward; later posts keep it and grow the protocol.

Introduction

Safety properties are a central part of how we specify and reason about distributed protocols. A consensus protocol, for example, should not allow two different values to be chosen. A termination detector should not announce termination when not all processes have terminated. A mutual exclusion protocol should not let two processes enter the critical section at the same time. These are all safety properties: they say that something bad never happens.

But many correctness properties ask for more: they require the system to make progress. This brings us to liveness properties that require answering the question:

Does something good eventually happen?¹

For example, does a consensus protocol eventually choose a value? Does a termination detector eventually report termination once all processes have terminated? Does a waiting process eventually enter the critical section?

In this post, we will make this question concrete using a minimal example from the TLA+ examples repository. The example is small enough that the liveness argument fits on one page, but it already contains the essential ingredients: an action that can make progress, a fairness assumption that prevents the system from ignoring that action forever, and a temporal property saying that “something good will eventually happen.” We will use this example to see how liveness can be formalized using the notations from TLA, the Temporal Logic of Actions.² Then we will prove the property using a standard proof rule for reasoning from weak fairness to progress.

Running Example: One-Step Consensus

TLA+ has been the de-facto language and framework of distributed protocol specification for more than three decades, and it remains the natural baseline against which any new tool for verifying such protocols should be measured. We will therefore use a small example from the TLA+ examples repository throughout this post. The example is an idealised abstraction of consensus, not a real protocol: it has no processes, messages, quorums, or failures, only the effect a consensus protocol is supposed to provide—eventually some value is chosen, and at most one value is ever chosen.³

CONSTANT Value

VARIABLE chosen

Init == chosen = {}

Next == /\ chosen = {}
        /\ \E v \in Value : chosen' = {v}

TypeOK == /\ chosen \subseteq Value
          /\ IsFiniteSet(chosen) 

Inv ==  /\ TypeOK
        /\ Cardinality(chosen) \leq 1

ASSUME ValueNonempty == Value # {}

Success == <>(chosen # {})

Spec == Init /\ [][Next]_chosen

LiveSpec == Spec /\ WF_chosen(Next)

The state of the model is a single set chosen ⊆ Value. Initially chosen = {}. The only transition Next is enabled while chosen is empty, and replaces it by {v} for some v ∈ Value (here, \E is existential quantification, and chosen' is the value of chosen in the next state). In other words, the protocol can take exactly one meaningful step (i.e., the first step), choosing a value, after which nothing further happens; this single step is precisely the “something good” whose inevitability we are going to prove. The safety invariant Inv says, modulo the type information TypeOK, that Cardinality(chosen) ≤ 1—the agreement property of this tiny model. We need the assumption that Value is nonempty (ASSUME ValueNonempty) for liveness. The remaining lines—Success, Spec, and LiveSpec—live at the temporal level, which we unpack next.

Encoding Temporal Properties in TLA+

To make sense of Success, Spec, and LiveSpec, we look at infinite execution traces:⁴

\[e = s_0, s_1, s_2, \dots\]

and at the two temporal operators TLA+ inherits from TLA, the underlying logic:

$\square P$ (“always”, ASCII []P) means that a property $P$ holds at every position of the trace;
$\diamond P$ (“eventually”, ASCII <>P) means that $P$ holds at some position of the trace.

These operators apply both to state predicates (so “holds at position $i$” means “is true in state $s_i$”) and to actions, i.e., two-state relations like Next (in which case “holds at position $i$” means “the pair $(s_i, s_{i+1})$ satisfies the relation”).

With this, the liveness property

Success == <>(chosen # {})

reads exactly as expected: eventually, the set chosen is nonempty.

The specification of all traces of a state-transition system in TLA+ is written as

Spec == Init /\ [][Next]_chosen

In other words, a trace satisfies Spec iff $s_0$ satisfies Init and every adjacent pair satisfies [Next]_chosen. The bracketed action [Next]_chosen expands to Next \/ chosen' = chosen—either a real Next step or a stuttering step that leaves chosen unchanged. Stuttering matters because it lets an abstract spec ignore the lower-level steps of any refined protocol that do not touch the abstract variable.

But stuttering also opens a gap between safety and liveness. The all-stuttering trace

chosen = {}, chosen = {}, chosen = {}, ...

satisfies Spec (every step is a stuttering step), yet never satisfies Success. To rule such traces out, the live specification adds a weak fairness assumption:

LiveSpec == Spec /\ WF_chosen(Next)

Operationally, WF_chosen(Next) says:⁵

If a non-stuttering Next step remains enabled from some point on, then the trace must eventually take such a step.

A non-stuttering Next step is <<Next>>_chosen ≡ Next /\ chosen' # chosen—the dual of the bracketed action. Enabled in TLA+ means there exists some next state that makes the action formula true. In our model, Next is enabled whenever chosen = {} (and Value is nonempty), so under LiveSpec the system cannot stay forever in a state with chosen = {}. The property we want to prove is therefore

LiveSpec => Success

The desired temporal property above follows from an intuitive argument: a continuously enabled fair action must eventually fire, and firing it makes the goal true. This argument is not special to our example. TLA packages it as a general proof rule, WF1, which derives a progress property from a weak fairness assumption by discharging three obligations about a single step of the system. WF1 is the engine of the liveness proof we are after, and TLA+ already lets us run it: its proof system, TLAPS, can mechanise exactly the argument above. So the question is not whether such proof is mechanisable at all—it clearly is.

The real question is whether it can be carried out in a setting that offers a modern proof mode: interactive tactics, definitional unfolding, SMT-backed automation, and a rich ambient library of mathematics. TLAPS is a respectable tool but predates these conveniences and was not designed to grow into a full-blown theorem prover. The rest of this post replays the same liveness argument—WF1 and all—in Veil, which sits on top of Lean and inherits all of those amenities for free.

Encoding One-Step Consensus in Veil

(The Veil file for the rest of the post can be found here.)

We start by encoding the same transition system in Veil, translating it as directly as the syntax allows: the constant Value becomes an abstract type, Init becomes an after_init block, the action Next becomes an action choose, and the agreement invariant becomes a safety property. Only two choices are not syntactically identical to the TLA+ version: the set chosen is modelled as a unary relation chosen : value → Bool, and the two state predicates the liveness proof will need, chosen = {} and chosen # {}, are named as ghost relations noneChosen and someChosen. With these conventions, the model is:

veil module Consensus

type value

relation chosen : value → Bool

#gen_state

after_init {
  chosen V := false
}

ghost relation noneChosen := ∀ v, ¬ chosen v
ghost relation someChosen := ∃ v, chosen v

action choose {
  require noneChosen
  let v ← pick value
  chosen v := true
}

-- TLA stuttering step: `chosen` does not change.
action stutter {
  pure ()
}

safety [agreement] chosen V1 ∧ chosen V2 → V1 = V2

The TLA+ assumption ValueNonempty has no explicit counterpart here, but it is not lost: declaring type value in Veil automatically provides an Inhabited value instance—a designated element witnessing that the type is nonempty. This is the Veil analogue of Value # {}, and we will need it later for the WF1 obligation that choose is enabled whenever noneChosen holds, since choosing a value presupposes that one exists.

As shown in a previous post, we can ask Veil to verify the declared safety properties by running:

#check_invariants

For this tiny model, Veil proves agreement inductive automatically.

The new ingredient is the temporal property. Veil’s temporal reasoning is built on Lentil, a small Lean library formalizing TLA (itself inspired by the Rocq library coq-tla). Lentil supplies the temporal operators and proof infrastructure to state liveness properties in a notation close to the TLA used above, and to prove them in a mode that feels like proving ordinary Lean propositions. In it, our liveness property is:

temporal [success] 𝒲ℱ choose → ◇ ⌜ someChosen ⌝

This is almost a transliteration of the TLA formula LiveSpec => Success. The body ◇ ⌜ someChosen ⌝ says that someChosen eventually holds; the brackets ⌜ ... ⌝ lift a state predicate (a σ → Prop, with σ the state type) into a temporal formula, which is needed because ◇ expects a temporal formula while someChosen talks about a single state. The premise 𝒲ℱ choose is the weak fairness assumption on choose, the counterpart of WF_chosen(Next); for now we model fairness by stating it directly as a premise of the temporal property.

The Paper Proof of Liveness

Before seeing how the proof is carried out in Veil, we first give the proof on paper, in plain prose, so the intuition is clear before we turn to the Lean details. The proof has two parts.

Part 1: Reduce “eventually” to “leads-to”. Instead of trying to prove ◇ someChosen directly (omitting the lifting brackets ⌜ ... ⌝ for brevity), we first prove the leads-to property:

noneChosen ↝ someChosen

where $P \leadsto Q$ (typed as P ↝ Q in Lentil) abbreviates $\square(P \to \diamond Q)$ and reads “from any point at which $P$ holds, $Q$ eventually holds”.

This leads-to property is enough to derive ◇ someChosen. Unfolding it, noneChosen → ◇ someChosen holds at every position, and in particular at the initial one. Since noneChosen holds there (the after_init block leaves every value unchosen), ◇ someChosen follows at the start of the trace, which is exactly our goal.

The whole liveness statement therefore reduces to this one leads-to.

Part 2: Prove the leads-to with WF1. A leads-to of this shape is precisely what WF1 handles—Lamport’s standard rule for turning weak fairness into progress. Informally, WF1 says that if a fairly-scheduled action reliably drives the system toward a goal, then the goal is eventually reached. “Reliably drives toward the goal” is made precise by three obligations, each about a single step of the system. With choose as the fair action, they are:

The noneChosen property is preserved until someChosen. From a state where no value is chosen, every step of the system either leaves no value chosen or else chooses one—a stutter step keeps noneChosen, a choose step reaches someChosen, so one of the two always holds.
The choose action step achieves someChosen. Taken from such a state, choose makes some value chosen: it picks a value and records it.
The choose action is enabled while noneChosen holds. Whenever no value is chosen, a choose step is possible. This is where we use that value is inhabited—there is always some value to pick.

Together with weak fairness of choose, these three imply the leads-to in Part 1. We argue by contradiction: suppose that from some point where noneChosen holds, someChosen is never reached. By (1) the system then stays in noneChosen forever; by (3) choose stays enabled the whole time; so weak fairness must eventually force a choose step; and by (2) that step makes someChosen true—contradicting the assumption. Hence someChosen is eventually reached, which is exactly noneChosen ↝ someChosen.

Proving Liveness in Veil

Now we mechanise this proof in Veil. The proof is set up with

prove_temporal_by [success]

which starts the proof of the following theorem:

(⌜ veil_term% Init ⌝ ∧ □⟨ veil_term% NextStep ⟩ ∧ □⌜ veil_term% Invariants ⌝)
|-tla- (veil_term% success)

(In the displayed goals, veil_term% and veil_tr% are decorators for hiding Veil’s internal arguments; read veil_term% Init as Init, veil_tr% choose as choose, and so on.)

Here |-tla- denotes a sequent between temporal formulas: we are proving that the left formula implies the right. The left-hand side is the protocol specification that Veil generates from the model, and it matches the similar TLA+ Spec: ⌜ Init ⌝ is the initial-state predicate from the after_init block; the temporal statement □⟨ NextStep ⟩ says every step satisfies the generated next-step relation, where NextStep is the disjunction of the actions’ transition predicates (here choose ∨ stutter), so this conjunct corresponds to [][Next]_chosen; and □⌜ Invariants ⌝ carries the declared invariants as (such as agreement) always-true assumptions.

Here, the action brackets ⟨ ... ⟩ lift a transition predicate into a temporal formula, the analogue of ⌜ ... ⌝ for state predicates.⁶ Here Invariants is just agreement; we will not need it for success, but Veil keeps invariants around because liveness proofs for larger protocols usually rely on them.

We now retrace the two parts of the paper proof in Veil. Provided by Lentil, the tactics below are temporal counterparts of the ordinary Lean ones, prefixed with t: tsuffices, tapply, tclear, and so on mirror suffices, apply, clear, but act on temporal sequents rather than ordinary goals.

Part 1 (i.e., from “eventually” to “leads-to”) corresponds to a single tsuffices tactic:

tsuffices hleadsto :
  ⌜ fun st => (veil_term% noneChosen) st ⌝ ↝ 
    ⌜ fun st => (veil_term% someChosen) st ⌝ by
  ...

The by block discharges the original ◇ someChosen given the leads-to, following the same reasoning as in Part 1 of the paper proof.

Now the leads-to itself becomes the main goal and we carry out Part 2. To that end, we use the following version of the WF1 that is already proven sound and provided as theorem by Lentil:⁷

theorem wf1_original p q next a :
  ((p ∧ ⟨next⟩ ⇒ ◯ (p ∨ q)) ∧
   ((p ∧ ⟨next⟩ ∧ ⟨a⟩ ⇒ ◯ q)) ∧
   ((p ⇒ Enabled a))) |-tla-
    (□ ⟨next⟩ ∧ 𝒲ℱ a → p ↝ q)

Two notations appear here for the first time. ◯ p is the next operator: p holds at the next position (for a state predicate, in the next state). And p ⇒ q abbreviates □(p → q), meaning that that implication between p and q is holding at every position of the system trace.

We apply wf1_original with p = noneChosen, q = someChosen, next = NextStep, and a = choose, which replaces the leads-to with exactly the three obligations (1)–(3) from the paper proof. Each constrains a single step, so once its outer □ is stripped with the tmonotone tactic, the obligation is no longer temporal; rather, it is just a first-order verification condition about one transition. At that point we leave the temporal proof mode: the veil_solve_temporal tactic reduces the goal to an ordinary Lean goal and discharges it with Veil’s automation. Combining the two tactics, one line closes all three:

all_goals tmonotone ; veil_solve_temporal

The Complete Proof Script

Putting all the pieces of the proof together, here is the whole proof script in Lean, with comments helping to align with the proof in plain English explained above:

prove_temporal_by [success]
  -- Name each of the three conjuncts: ⌜ Init ⌝, □⟨ NextStep ⟩ and □⌜ Invariants ⌝
  tstart hInit hNext hInv
  tclear hInv             -- the agreement invariant is not needed for this proof
  tdsimp only [success]   -- unfold `success`
  tintro hwf              -- introduce hwf : weak fairness of `choose` to the 
                          -- temporal proof context

  -- Part 1: reduce the goal to `noneChosen ↝ someChosen`
  tsuffices hleadsto :
   ⌜fun st => (veil_term% noneChosen) st⌝ ↝ ⌜fun st => (veil_term% someChosen) st⌝ by
    tunfold leads_to at hleadsto           -- unfold leads-to to expose the outer □
    thave himp := always_weaken _ hleadsto -- fix it at the initial position
    tapply himp                            -- prove with `noneChosen` as the state
    tclear *- hInit                        -- clear unrelated temporal hypotheses
    veil_solve_temporal                    -- reduce the goal to FO VC and prove it

  -- Part 2: prove the leads-to via WF1:
  -- `p = noneChosen`, `q = someChosen`, `next = NextStep`, and `a = choose`
  tclear hInit                             -- ⌜ Init ⌝ is not used here
  trevert hNext hwf ; trewrite [← TLA.and_imp]
                                           -- massage the goal to match it with 
                                           --   the conclusion of `wf1_original`
  tapply wf1_original                      -- leaves the three obligations
  tsplit_ands
  --  (1) noneChosen is stable until someChosen
  --  (2) a `choose` step achieves someChosen
  --  (3) `choose` is enabled while noneChosen holds
  --  Each is about one transition: strip the outer □, 
  --                                reduce to first-order VC and discharge it
  all_goals tmonotone ; veil_solve_temporal

This file contains this development in full. We recommend the interested reader to explore the proof interactively in Lean proof mode, so we encourage you to open it and step through the script to see how the statement of the goal changes.

Future Steps

So far, we have spent a lot of words on a deliberately minimal example, but, in exchange, we have seen an entire liveness argument end to end: from the informal “something good eventually happens,” to its TLA formalisation with weak fairness, to a Lean-checked proof in Veil.

Stepping back, the point is not the example itself but what carrying it out in Veil buys us. We have shown a dedicated proof mode for liveness, sitting on top of Lean, in which temporal goals are first-class and the WF1 rule is just another lemma to apply. Inside it, proofs can be written by hand (i.e., by stepping through the changing goal interactively, as we did above) or handed off to an AI assistant, exactly as one would for any other Lean development. Either way, the resulting script stays close to the informal argument: split the eventuality into a leads-to, apply WF1, and discharge three small one-step obligations. This tight correspondence between the paper argument and the machine proof is what makes the approach pleasant to use by hand, and it is also what should make it amenable to auto-formalisation, when the shape of the proof mirrors the shape of the reasoning, there is far less of a gap for a human or a model to bridge.

The example here was chosen so that we can focus on talking about liveness proofs without being bogged down by the protocol details. In the next posts in this series, we will keep this machinery fixed and grow the protocol, applying the same proof mode to more realistic distributed systems. Stay tuned!

The terms safety and liveness were initially introduced by Leslie Lamport in Proving the Correctness of Multiprocess Programs. A formal definition was later given by Bowen Alpern and Fred B. Schneider in Defining Liveness. ↩
TLA and TLA+ are not the same thing. TLA is the underlying logic, while TLA+ is the specification language built on top of TLA (Lamport, Specifying Systems); it adds a version of Zermelo–Fraenkel set theory together with the data structures and module system used to write specifications. The specification below is written in TLA+, whereas the temporal operators and the proof rule WF1 we will use belong to TLA. ↩
More precisely, a real consensus protocol should refine this abstract specification: its concrete executions should implement the abstract behavior described here, under a suitable refinement mapping. Refinement is an important topic in TLA+, but it is orthogonal to what we focus on in this post, so we will not discuss it further here. ↩
TLA usually calls such an infinite trace a behaviour. ↩
We only give an operational reading here. One of several equivalent ways to define weak fairness formally is $\mathrm{WF}_v(A) \equiv \square(\square\,\mathrm{ENABLED}\,\langle A\rangle_v \rightarrow \diamond\langle A\rangle_v)$, where $\langle A\rangle_v$ is the non-stuttering action $A \land v’ \neq v$. ↩
Don’t confuse this ⟨ ... ⟩ with the angle-bracket action <<Next>>_chosen from the TLA+ section. There, the subscript made it the non-stuttering version of Next; here, ⟨ ... ⟩ is just the lifting of definitions to temporal formulas, with no subscript and no non-stuttering condition. ↩
Lentil’s wf1 has a slightly different shape. The version shown here is written to resemble Lamport’s original WF1 rule more closely, so it is called wf1_original. ↩

Proofs and Intuitions