# A Verified DRUP Proof Checker
As the title suggests, a verified implementation of a checker for propositional unsatisfiability proofs in the [DRUP format](https://satcompetition.github.io/2022/certificates.html) that is produced by many solvers.
The core of the checker is written in [Why3](https://why3.lri.fr/), which is extracted to OCaml, compiled natively, and exported as a C library with Python bindings.
* The checker also supports RAT clauses, so DRAT proofs are accepted.
* The current implementation is not optimized, and will be considerably slower than [DRAT-trim](https://github.com/marijnheule/drat-trim) on large proofs (see [performance](#performance) below).
* Accordingly, the frontend does not accept proofs in binary format.
## Installation
If you use a recent Linux distribution on x86_64, you should be able to install the compiled wheel from PyPI:
```bash
$ pip install drup
```
Otherwise, you need to have OCaml (>= 4.12), Ctypes (>=0.20), Why3 (>= 1.5.1), and Dune (>=2.9.3) installed.
The most straightforward way to install these is to use [opam](https://opam.ocaml.org/doc/Install.html), which is available in most package systems, and then install Why3 and Dune (a sufficiently recent version of OCaml should already be installed with Opam):
```bash
$ opam install why3 dune
```
If you do not intend to check the verification of the library or develop it further, then you do not need to install Why3's IDE or any of the solvers that it supports.
Once OCaml and Why3 are installed, make sure that Python `build` is installed:
```bash
$ pip install build
```
Then, clone this repository, build, and install the package:
```bash
$ git clone https://github.com/cmu-transparency/verified_rup.git
$ cd verified_rup
$ python -m build
$ pip install dist/*.whl
```
## Usage
### Command line interface
The package provides a command line interface for checking proofs stored in files:
```bash
$ drup --help
usage: drup [-h] [-d] [-v] dimacs drup
Checks DRUP & DRAT proofs against DIMACS source. Returns 0 if the proof is valid, -1 if not, or a negative error code if the input is invalid.
positional arguments:
dimacs Path to a DIMACS CNF formula
drup Path to a DRUP/DRAT proof
optional arguments:
-h, --help show this help message and exit
-d, --derivation Check each step, ignore absence of empty clause
-v, --verbose Print detailed information about failed checks
For more information visit https://github.com/cmu-transparency/verified_rup
```
### As a Python module
See the [documentation](https://fairlyaccountable.org/verified_rup/drup.html) for details of the API.
The primary function is `drup.check_proof`, or alternatively, `drup.check_derivation` to check each step of the proof, ignoring the absence of an empty clause). There are corresponding convenience functions `check_proof_from_strings` and `check_proof_from_files`, similarly for `check_derivation`.
The following example uses [CNFgen](https://massimolauria.net/cnfgen/) to generate a PHP instance,
and [PySAT](https://pysathq.github.io/) to solve it and generate a DRUP proof.
To illustrate the verbose output given for failed checks, only the first ten clauses of the proof are checked against the proof.
```python
import drup
import cnfgen
from pysat.formula import CNF
from pysat.solvers import Solver
dimacs = cnfgen.BinaryPigeonholePrinciple(4, 3).dimacs()
cnf = CNF(from_string=dimacs).clauses
g4 = Solver(name='g4', with_proof=True, bootstrap_with=cnf)
g4.solve()
proof = [[int(l) for l in c.split(' ')[:-1]] for c in g4.get_proof()]
drup.check_proof(cnf[:10], proof, verbose=True)
```
This gives a `CheckerResult` object with the following information:
```python
CheckerResult(
Outcome.INVALID,
[],
RupInfo([4, 6, 7, 8], [-4, -6, -7, -8, 3, 5]),
RatInfo(
[4, 6, 7, 8],
[-4, -3],
RupInfo([6, 7, 8, -3], [-6, -7, -8, 5, 3, -4])
)
)
```
The `RupInfo` component relates that RUP verification failed on the first clause of the proof, `4 6 7 8 0`, after propagating the literals `-4, -6, -7, -8, 3, 5`, and failing to find more opposites to propagate.
Likewise, the `RatInfo` component relates that RAT verification on this clause failed when checking the pivot on clause `-4 -3 0`.
The resolvent of these clauses, `6 7 8 -3 0`, failed RUP verification after propagating `-6 -7 -8 5 3 -4`.
## Performance
At present, the implementation of RUP checking is not optimized, and drop lines are ignored.
Unit propagation does not take advantage of watched literals, and does not use mutable data structures.
Nonetheless, the performance compares well to that of [DRAT-trim](https://github.com/marijnheule/drat-trim) on small proofs (<200 variables, a few hundred clauses).
We measure this on random unsatisfiable instances generated by the procedure described in [[1]](#references).
To evaluate the performance of DRAT-trim without the overhead of creating and tearing down a new process for each instance, we compiled it into a library with the same `check_from_strings` interface as the C library, and called it using [ctypes](https://docs.python.org/3/library/ctypes.html).
In the table below, each configuration is run on 10,000 instances, with proofs generated by [Glucose 4](https://www.labri.fr/perso/lsimon/research/glucose/).
| # vars | # clauses (avg) | pf len (avg) | `drup (sec, avg)` | `drat-trim (sec, avg)` |
| ------ | --------------- | ------------ | ----------------- | ---------------------- |
| 25 | 147.7 | 7.3 | 0.001 | 0.085 |
| 50 | 280.5 | 14.2 | 0.006 | 0.179 |
| 75 | 413.5 | 26.3 | 0.022 | 0.217 |
| 100 | 548.2 | 40.6 | 0.068 | 0.172 |
| 150 | 811.8 | 102.7 | 0.407 | 0.326 |
| 200 | 1079.5 | 227.9 | 1.916 | 0.292 |
### References
[[1]](https://openreview.net/forum?id=HJMC_iA5tm) Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, David L. Dill. *Learning a SAT Solver from Single-Bit Supervision*. International Conference on Learning Representations (ICLR), 2019.
## Verification
The verification can be examined by running `src/librupchecker/rup_pure.mlw` in Why3, or by checking the Why3 session in `src/librupchecker/rup_pure/why3session.xml`.
The proof was developed using Why3 1.5.1, Alt-Ergo 2.4.0, Z3 4.8.6, and CVC4 1.8.
Verification has not been attempted with earlier versions of Why3 or the provers.
The primary contract on the proof checker is as follows:
```ocaml
requires { no_redundant_clauses cnf /\ no_trivial_clauses cnf }
requires { no_redundant_clauses pf /\ no_trivial_clauses pf }
ensures { match result with
| Valid -> valid_proof cnf pf
| _ -> proof_failure orig result
end }
```
The bindings used by this library take care of removing redundant and trivial clauses.
The `valid_proof` predicate is a straightforward translation of DRAT certification requirements.
A `proof_failure` result provides additional assurances about the verbose output of the checker.
```ocaml
predicate proof_failure (cnf : cnf) (result : result) =
match result with
| Valid -> false
| InvalidEmpty steps rup_info -> rup_failure (steps ++ cnf) rup_info
| InvalidStep steps rup_info rat_info ->
rup_failure (steps ++ cnf) rup_info /\ rat_failure (steps ++ cnf) rat_info
end
```
An `InvalidEmpty` result indicates all of the listed `steps` are valid, but the empty clause was not derived, as witnessed by the provided `rup_info`.
This is only returned when all of the steps in the proof are valid except an empty clause at the end.
The `rup_info` component ensures that the empty clause is not RUP, and that the unit chain used to conclude this is exhaustive.
```ocaml
predicate rup_failure (cnf : cnf) (info : rup_info) =
not (rup cnf info.rup_clause) /\
match info.rup_clause with
| Nil ->
let cnf' = bcp cnf info.chain in
is_unit_chain cnf info.chain /\
forall chain' . is_unit_chain cnf' chain' ->
forall c . mem c (bcp cnf' chain') <-> mem c cnf'
| Cons _ _ ->
let cnf' = bcp (cnf ++ (negate_clause info.rup_clause)) info.chain in
is_unit_chain (cnf ++ (negate_clause info.rup_clause)) info.chain /\
forall chain' . is_unit_chain cnf' chain' ->
forall c . mem c (bcp cnf' chain') <-> mem c cnf'
end
```
An `InvalidStep` result indicates that the `steps` are valid up to some non-empty step.
The next step in the certificate following `steps` is invalid, as witnessed by the provided `rup_info` and `rat_info`.
In addition to the assurances on `rup_info` described above, `rat_info` provides that the identified pivot clause is not RUP.
```ocaml
predicate rat_failure (cnf : cnf) (info : rat_info) =
not (rat cnf info.rat_clause) /\
match info.rat_clause with
| Nil -> false
| Cons l _ ->
mem info.pivot_clause (pivot_clauses cnf l) /\
info.pivot_info.rup_clause = resolve info.rat_clause info.pivot_clause l /\
rup_failure cnf info.pivot_info
end
```
The derivation checker provides a similar contract, but rather than ensuring `valid_proof` on success, it provides `valid_derivation`.
```ocaml
predicate valid_derivation (cnf : cnf) (pf : proof) =
match pf with
| Nil -> true
| Cons c cs -> (rup cnf c \/ rat cnf c) /\ valid_derivation (Cons c cnf) cs
end
```
This is the same condition as `valid_proof`, but in the `Nil` case, the checker does not require that the empty clause is not RUP.
## Acknowledgements
Many thanks to [Frank Pfenning](http://www.cs.cmu.edu/~fp/), [Joseph Reeves](https://www.cs.cmu.edu/~jereeves/), and [Marijn Huele](https://www.cs.cmu.edu/~mheule/) for the ongoing insightful discussions that led to this project.