Run-length encoding verified in Agda

This post is intended to be a guided exercise (with a proposed solution) for Agda beginner's who are familiar with the basics and want to work on a slightly more involved exercise than proving basic properties about natural numbers. By verified algorithm we mean that termination is guaranteed and that it produces the right output. In our case we want the compression and decompression functions to satisfy $decompress ∘ compress = id$. Termination is automatically checked by Agda so it will not require any explicit proof.

The full code is here (spoilers!). Agda 2.6.0.1 was used.

The algorithm

The run-length encoding algorithm (RLE for short) is the most basic lossless compression algorithm one could imagine. The algorithm goes through the string and groups consecutive characters by saving just one character and the length of the group. For instance, the string aaabbcccc becomes (3,a)(2,b)(4,c). Notice that parentheses and commas are used to denote pairs and not literal characters of the string. Finally, by looking at the compressed string the decompression algorithm is immediately apparent.

Implementation

The proposed implementation uses some basic functions from the standard library. Here is a full list of the needed types and function imports for the implementation. If you are not familiar with them, you should refer to the standard library.

open import Agda.Builtin.Nat using (Nat; zero; suc)
open import Data.List using (List; []; __; replicate; _++_; foldl)
open import Data.Maybe using (Maybe; just; nothing)
open import Data.Nat.Properties using (__)
open import Data.Product using (_×_; _,_; proj; proj)
open import Relation.Nullary using (yes; no)

In this post we will not delve in low level details such as how to encode pairs or using machine bytes, instead, we will use naturals as bytes and lists of naturals as byte strings.

Byte : Set
Byte = Nat

ByteString : Set
ByteString = List Byte

The exercises:

Compression algorithm

Implement the compression algorithm:

compress-rle : ByteString  List (Nat × Byte)
compress-rle = ?

Implementing this algorithm in a language such as Haskell is trivial, however, in Agda it can be a little bit trickier due the termination checker.

Proposed solution
rle : Maybe (Nat × Byte)  ByteString  List (Nat × Byte)  List (Nat × Byte)
rle nothing [] ac = ac
rle (just (k , b)) [] ac = (k , b)  ac
rle nothing (x  tail) ac = rle (just (1 , x)) tail ac
rle (just (k , b)) (x  tail) ac with b  x
... | yes p = rle ((just (suc k , b))) tail ac
... | no p = rle (just (1 , x)) tail ((k , b)  ac)

compress-rle : ByteString  List (Nat × Byte)
compress-rle l = rle nothing l []

Decompression algorithm

Implement the decompression algorithm:

decompress-rle : List (Nat × Byte)  ByteString
decompress-rle = ?
Proposed solution
dec-step : ByteString  (Nat × Byte)  ByteString
dec-step = λ ac y  replicate (proj y) (proj y) ++ ac

decompress-rle : List (Nat × Byte)  ByteString
decompress-rle = foldl dec-step []

Verification

The goal of this section is to provide a proof (term) of the following theorem (type):

inverse-rle :  b  decompress-rle (compress-rle b)  b

The proof will indeed depend on the implementation, so if you want to follow along you should use the proposed code above.

The proof is not straightforward, so in order to guide the reader we suggest three lemmas that will lead to the proof. Of course, these lemmas are not required and you may want to take another road to prove the theorem.

We also suggest that you add the following imports:

open import Data.List.Properties using (++-identityʳ; ++-assoc)
open import Relation.Binary.PropositionalEquality using (__; refl; sym; trans)

Lemma 1

Prove the following:

decompress-head :  (l k x)  decompress-rle ((k , x)  l)  decompress-rle l ++ replicate k x
decompress-head = ?
Proposed solution
decompress-head :  (l k x)  decompress-rle ((k , x)  l)  decompress-rle l ++ replicate k x
decompress-head l k x = trans s1 (s2 l [] k x)
where
s1 : decompress-rle ((k , x)  l)  foldl dec-step (replicate k x) l
s1 rewrite ++-identityʳ (replicate k x) = refl
s2 :  l ac k x  foldl dec-step (ac ++ replicate k x) l  foldl dec-step ac l ++ replicate k x
s2 [] ac k x = refl
s2 ((k1 , x1)  l) ac k x rewrite
    ++-identityʳ (replicate k1 x1)
    | sym (s2 l (replicate k1 x1 ++ ac) k x)
    | ++-assoc (replicate k1 x1) ac (replicate k x)
  = refl

Lemma 2

Prove the following:

decompress-just :  (k x bs l)  decompress-rle (rle (just (k , x)) bs l)  decompress-rle l ++ (replicate k x ++ bs)
decompress-just = ?
Proposed solution
lemma-replicate :  (bs k) (x : Byte)  x  replicate k x ++ bs  replicate k x ++ x  bs
lemma-replicate bs zero x = refl
lemma-replicate bs (suc k) x rewrite lemma-replicate bs k x = refl

decompress-just :  (k x bs l)  decompress-rle (rle (just (k , x)) bs l)  decompress-rle l ++ (replicate k x ++ bs)
decompress-just k x [] l rewrite
 ++-identityʳ (replicate k x)
 | sym (decompress-head l k x)
 | ++-identityʳ (replicate k x)
 = refl
decompress-just k x (y  bs) ac with x  y
... | yes refl rewrite
decompress-just (suc k) x bs ac
| lemma-replicate bs k x
= refl
... | no _ rewrite
decompress-just 1 y bs ((k , x)  ac)
| ++-identityʳ (replicate k x)
| sym (++-assoc (foldl dec-step [] ac) (replicate k x) (y  bs))
| sym (decompress-head ac k x)
| ++-identityʳ (replicate k x)
= refl

Lemma 3

Prove the following:

decompress-nothing :  (b ac)  decompress-rle (rle nothing b ac)  decompress-rle ac ++ b
decompress-nothing = ?
Proposed solution
decompress-nothing :  (b ac)  decompress-rle (rle nothing b ac)  decompress-rle ac ++ b
decompress-nothing [] [] = refl
decompress-nothing [] (x  ac) =  sym (++-identityʳ (decompress-rle (x  ac)))
decompress-nothing (x  b) ac = decompress-just 1 x b ac

Main theorem

With the lemma 3 in hand, it should be trivial to prove the main theorem.

Main theorem

Prove the following:

inverse-rle :  b  decompress-rle (compress-rle b)  b
inverse-rle = ?
Proposed solution
inverse-rle :  b  decompress-rle (compress-rle b)  b
inverse-rle b = decompress-nothing b []

Conclusion

We have implemented and verified the run-length encoding algorithm in Agda and we have seen that even a simple algorithm is not straightforward to verify.

I am sure that the proposed proof is not the most concise and I think that it would be a good exercise to try to improve it.

Suggestions are welcome!

Related