edit-distance-vector-1.0.0.4/0000755000000000000000000000000012720747401014061 5ustar0000000000000000edit-distance-vector-1.0.0.4/CHANGELOG.md0000644000000000000000000000071212720747401015672 0ustar0000000000000000edit-distance-vector 1.0.0.4 * Relax version bounds to support GHC 8.0 edit-distance-vector 1.0.0.3 * Relax version bounds to support GHC 7.6.3 and 7.4.2 edit-distance-vector 1.0.0.2 * Relax version bounds to support GHC 7.6.3 and 7.4.2 edit-distance-vector 1.0.0.1 * Relax version bounds to support GHC 7.10.1. edit-distance-vector 1.0 * Initial release extracted from aeson-diff package and rewritten to use Data.Vector. edit-distance-vector-1.0.0.4/edit-distance-vector.cabal0000644000000000000000000000322712720747401021066 0ustar0000000000000000name: edit-distance-vector version: 1.0.0.4 synopsis: Calculate edit distances and edit scripts between vectors. description: An implementation of the Wagner–Fischer dynamic programming algorithm to find the optimal edit script and cost between two sequences. . The implementation in this package is specialised to sequences represented with "Data.Vector" but is otherwise agnostic to: . * The type of values in the vectors; . * The type representing edit operations; and . * The type representing the cost of operations. . homepage: https://github.com/thsutton/edit-distance-vector bug-reports: https://github.com/thsutton/edit-distance-vector/issues license: BSD3 license-file: LICENSE author: Thomas Sutton maintainer: me@thomas-sutton.id.au copyright: (c) 2015 Thomas Sutton and others. category: Data, Data Structures, Algorithms build-type: Simple extra-source-files: README.md, CHANGELOG.md, test/sample.hs cabal-version: >=1.10 source-repository HEAD type: git location: https://github.com/thsutton/edit-distance-vector library default-language: Haskell2010 hs-source-dirs: lib exposed-modules: Data.Vector.Distance build-depends: base >=4.5 && <5.0 , vector >= 0.8 test-suite properties default-language: Haskell2010 type: exitcode-stdio-1.0 hs-source-dirs: test main-is: properties.hs build-depends: base , QuickCheck , edit-distance-vector , quickcheck-instances , vector edit-distance-vector-1.0.0.4/LICENSE0000644000000000000000000000276412720747401015077 0ustar0000000000000000Copyright (c) 2015, Thomas Sutton All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Thomas Sutton nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. edit-distance-vector-1.0.0.4/README.md0000644000000000000000000000370612720747401015346 0ustar0000000000000000Edit Distance: Vector ===================== [![Build Status][badge]][status] This is a small library for calculating the edit distance and edit script between two vectors. It is generic enough that you should be able to use it with vectors containing *values* of any type you like, with *changes* described by any type you like, and with *costs* represented by any type you like (with a few restrictions). Installing ---------- The `edit-distance-vector` package is a normal Haskell library and can be installed using the Cabal package management tool. ````{bash} cabal update cabal install edit-distance-vector ```` `edit-distance-vector` is [automatically tested][status] on GHC versions 7.4.2, 7.6.3, 7.8.3, 7.10.1, 8.0.2 using the Travis CI service. Usage ----- The interface to `edit-distance-vector` is very small; just import `Data.Vector.Distance`, create a `Params` value with the correct operations to deal with your types, and pass this to `leastChanges` along with your `Vector`s. ````{haskell} import Data.Monoid import qualified Data.Vector as V import Data.Vector.Distance -- | Editing vectors of 'Char' values, with '(String, Int, Char)' describing -- changes, and the additive monoid of 'Int' describing costs. str :: Params Char (String, Int, Char) (Sum Int) str = Params { equivalent = (==) , delete = \i c -> ("delete", i, c) , insert = \i c -> ("insert", i, c) , substitute = \i c c' -> ("replace", i, c') , cost = const (Sum 1) , positionOffset = \ (op, _, _) -> if op == "delete" then 0 else 1 } main :: IO () main = do print $ leastChanges str (V.fromList "I am thomas") (V.fromList "My name is Thomas") ```` (See `test/sample.hs` for a version of this code that is compiled by the automated test suite.) [badge]: https://travis-ci.org/thsutton/edit-distance-vector.svg?branch=master [status]: https://travis-ci.org/thsutton/edit-distance-vector edit-distance-vector-1.0.0.4/Setup.hs0000644000000000000000000000005612720747401015516 0ustar0000000000000000import Distribution.Simple main = defaultMain edit-distance-vector-1.0.0.4/lib/0000755000000000000000000000000012720747401014627 5ustar0000000000000000edit-distance-vector-1.0.0.4/lib/Data/0000755000000000000000000000000012720747401015500 5ustar0000000000000000edit-distance-vector-1.0.0.4/lib/Data/Vector/0000755000000000000000000000000012720747401016742 5ustar0000000000000000edit-distance-vector-1.0.0.4/lib/Data/Vector/Distance.hs0000644000000000000000000001460012720747401021031 0ustar0000000000000000{-# LANGUAGE RecordWildCards #-} {-# LANGUAGE TupleSections #-} {-# LANGUAGE ViewPatterns #-} -- | Description: Calculate differences between vectors. -- -- This module implements a variation on the -- -- algorithm to find the shortest sequences of operations which transforms -- one vector of values into another. module Data.Vector.Distance ( -- * Types Params(..), ChangeMatrix(..), -- * Operations leastChanges, allChanges, -- * Example strParams, ) where import Control.Applicative import Control.Arrow ((***)) import Data.Function import Data.List hiding (delete, insert) import Data.Maybe import Data.Monoid import Data.Vector (Vector) import qualified Data.Vector as V -- | Operations invoked by the Wagner-Fischer algorithm. -- -- The parameters to this type are as follows: -- -- * 'v' is the type of values being compared, -- * 'o' is the type representing operations, -- * 'c' is the type representing costs. -- -- The chief restrictions on these type parameters is that the cost type 'c' -- must have instances of 'Monoid' and 'Ord'. A good default choice might be -- the type @('Sum' 'Int')@. data Params v o c = Params { equivalent :: v -> v -> Bool -- ^ Are two values equivalent? , delete :: Int -> v -> o -- ^ Delete the element at an index. , insert :: Int -> v -> o -- ^ Insert an element at an index. , substitute :: Int -> v -> v -> o -- ^ Substitute an element at an index. , cost :: o -> c -- ^ Cost of a change. , positionOffset :: o -> Int -- ^ Positions to advance after a change. E.g. @0@ for a deletion. } -- | Matrix of optimal edit scripts and costs for all prefixes of two vectors. -- -- This is a representation of the @n * m@ dynamic programming matrix -- constructed by the algorithm. The matrix is stored in a 'Vector' in -- row-major format with an additional row and column corresponding to the -- empty prefix of the source and destination 'Vectors'. type ChangeMatrix o c = Vector (c, [o]) -- | /O(nm)./ Find the cost and optimal edit script to transform one 'Vector' -- into another. leastChanges :: (Monoid c, Ord c) => Params v o c -> Vector v -- ^ \"Source\" vector. -> Vector v -- ^ \"Destination" vector. -> (c, [o]) leastChanges p ss tt = fmap (catMaybes . reverse) . V.last $ rawChanges p ss tt -- | /O(nm)./ Calculate the complete matrix of edit scripts and costs between -- two vectors. allChanges :: (Monoid c, Ord c) => Params v o c -> Vector v -- ^ \"Source\" vector. -> Vector v -- ^ \"Destination" vector. -> ChangeMatrix o c allChanges p src dst = V.map (fmap (catMaybes . reverse)) $ rawChanges p src dst -- | /O(nm)./ Calculate the complete matrix of edit scripts and costs between -- two vectors. -- -- This is a fairly direct implementation of Wagner-Fischer algorithm using -- the 'Vector' data-type. The 'ChangeMatrix' is constructed in a single-pass. -- -- Note: The change matrix is \"raw\" in that the edit script in each cell is -- in reverse order and uses 'Maybe' to allow for steps at which no change is -- necessary. rawChanges :: (Monoid c, Ord c) => Params v o c -> Vector v -- ^ \"Source\" vector. -> Vector v -- ^ \"Destination" vector. -> Vector (c, [Maybe o]) rawChanges p@Params{..} src dst = let len_x = 1 + V.length dst len_y = 1 + V.length src len_n = len_x * len_y ix x y = (x * len_y) + y -- Get a cell from the 'ChangeMatrix'. It is an error to get a cell -- which hasn't been calculated yet! get :: Vector (c, [Maybe o]) -> Int -> Int -> (c, [Maybe o]) get m x y = fromMaybe (error $ "Unable to get " <> show (x,y) <> " from change matrix") (m V.!? (ix x y)) -- Calculate the position to be updated by the next edit in a script. position = sum . fmap (maybe 1 positionOffset) -- Given a partially complete 'ChangeMatrix', compute the next cell. ctr v = case V.length v `quotRem` len_y of -- Do nothing for "" ~> "" ( 0, 0) -> (mempty, mempty) -- Delete everything in src for "..." ~> "" ( 0, pred -> y) -> let o = delete 0 (src V.! y) (pc, po) = get v 0 y in (cost o <> pc, Just o : po) -- Insert everything in dst for "" ~> "..." (pred -> x, 0) -> let o = insert x (fromMaybe (error "NAH") $ dst V.!? x) (pc, po) = get v x 0 in (cost o <> pc, Just o : po) -- Compare options between src and dst for "..." ~> "..." (pred -> x, pred -> y) -> let s = src V.! y d = dst V.! x tl = get v (x) (y) top = get v (x+1) (y) left = get v (x) (y+1) in if s `equivalent` d then (Nothing:) <$> get v x y else minimumBy (compare `on` fst) -- Option 1: perform a deletion. [ let c = delete (position . snd $ top) s in (cost c <>) *** (Just c :) $ top -- Option 2: perform an insertion. , let c = insert (position . snd $ left) d in (cost c <>) *** (Just c :) $ left -- Option 3: perform a substitution. , let c = substitute (position . snd $ tl) s d in (cost c <>) *** (Just c :) $ tl ] in V.constructN len_n ctr -- | Example 'Params' to compare @('Vector' 'Char')@ values. -- -- The algorithm will produce edit distances in terms of @('Sum' 'Int')@ and -- edit scripts containing @(String, Int, Char)@ values. -- -- The first component of each operation is either @"delete"@, @"insert"@, or -- @"replace"@. strParams :: Params Char (String, Int, Char) (Sum Int) strParams = Params{..} where equivalent = (==) delete i c = ("delete", i, c) insert i c = ("insert", i, c) substitute i c c' = ("replace", i, c') cost _ = Sum 1 positionOffset ("delete", _, _) = 0 positionOffset _ = 1 edit-distance-vector-1.0.0.4/test/0000755000000000000000000000000012720747401015040 5ustar0000000000000000edit-distance-vector-1.0.0.4/test/properties.hs0000644000000000000000000000530212720747401017570 0ustar0000000000000000{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE RecordWildCards #-} {-# LANGUAGE TemplateHaskell #-} {-# OPTIONS_GHC -fno-warn-orphans #-} module Main where import Control.Monad import Data.Monoid import qualified Data.Vector as V import System.Exit import Test.QuickCheck import Test.QuickCheck.Instances () import Data.Vector.Distance -- | Changes to a 'String' (or other sequence, really). data C a = I Int a | D Int a | S Int a a deriving (Show, Eq) -- | Apply a list of changes. runC :: [C v] -> [v] -> [v] runC [] l = l runC (D i _ : r) l = let (h,t) = splitAt i l l' = h <> tail t in runC r l' runC (I i a : r) l = let (h,t) = splitAt i l l' = h <> [a] <> t in runC r l' runC (S i a a' : r) l = let (h,t) = splitAt i l l' = h <> [a'] <> tail t in runC r l' -- | Edit parameters for 'String'. str :: Params Char (C Char) (Sum Int) str = Params{..} where equivalent = (==) delete = D insert = I substitute = S cost op = Sum 1 positionOffset op = case op of D{} -> 0 _ -> 1 -- | Patch extracted from identical documents should be mempty. prop_distance_id :: String -> Bool prop_distance_id s = let s' = V.fromList s in leastChanges str s' s' == (Sum 0, []) -- | Delete everything! prop_distance_delete :: NonEmptyList Char -> Bool prop_distance_delete (NonEmpty s) = leastChanges str (V.fromList s) V.empty == (Sum $ length s, [ D 0 c | c <- s ]) -- | Insert everything! prop_distance_insert :: NonEmptyList Char -> Bool prop_distance_insert (NonEmpty s) = leastChanges str V.empty (V.fromList s) == (Sum $ length s, [ I i c | (i,c) <- zip [0..] s ]) -- | The examples from wikipedia. prop_distance_canned :: Bool prop_distance_canned = let sitting = V.fromList ("sitting" :: String) kitten = V.fromList ("kitten" :: String) saturday = V.fromList ("Saturday" :: String) sunday = V.fromList ("Sunday" :: String) in leastChanges str sitting kitten == (Sum 3, [S 0 's' 'k',S 4 'i' 'e',D 6 'g']) && leastChanges str kitten sitting == (Sum 3, [S 0 'k' 's',S 4 'e' 'i',I 6 'g']) && leastChanges str saturday sunday == (Sum 3, [D 1 'a',D 1 't',S 2 'r' 'n']) && leastChanges str sunday saturday == (Sum 3, [I 1 'a',I 2 't',S 4 'n' 'r']) -- | Apply the found changes works. -- -- @apply . leastChanges === id@ prop_distance_apply :: String -> String -> Bool prop_distance_apply ss tt = tt == runC (snd $ leastChanges str (V.fromList ss) (V.fromList tt)) ss -- -- Use Template Haskell to automatically run all of the properties above. -- return [] runTests :: IO Bool runTests = $quickCheckAll main :: IO () main = do result <- runTests unless result exitFailure edit-distance-vector-1.0.0.4/test/sample.hs0000644000000000000000000000133212720747401016654 0ustar0000000000000000import Data.Monoid import qualified Data.Vector as V import Data.Vector.Distance -- | Editing vectors of 'Char' values, with '(String, Int, Char)' describing -- changes, and the additive monoid of 'Int' describing costs. str :: Params Char (String, Int, Char) (Sum Int) str = Params { equivalent = (==) , delete = \i c -> ("delete", i, c) , insert = \i c -> ("insert", i, c) , substitute = \i c c' -> ("replace", i, c') , cost = const (Sum 1) , positionOffset = \ (op, _, _) -> if op == "delete" then 0 else 1 } main :: IO () main = do print $ leastChanges str (V.fromList "I am thomas") (V.fromList "My name is Thomas")