pax_global_header comment=668eaa46715a8d1ca7bf5bdeb81112a8280e5361

clucy-0.4.0/

clucy-0.4.0/.gitignore

/classes
/lib
/target
clucy*.jar
pom.xml
pom.xml.asc
.lein-failures

clucy-0.4.0/.travis.yml

language: clojure
lein: lein2
jdk:
  - openjdk6
  - openjdk7
  - oraclejdk7

clucy-0.4.0/ChangeLog

0.3.0 / 2012-04-10

* Support for Lucene 3.5.0

0.2.2 / 2011-07-03

* Support versions of Lucene other than 3.0.

0.2.1 / 2011-05-31

* Expose score, total hits, and max-score metadata.
* Allow for result highlighting.

0.2.0 / 2011-05-04

* Add deletion functionality.
* Accept per-field store/index settings.

0.1.0 / 2010-03-26

* Initial release.

clucy-0.4.0/LICENSE.html

Eclipse Public License - Version 1.0

Eclipse Public License - v 1.0

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.

1. DEFINITIONS

"Contribution" means:

a) in the case of the initial Contributor, the initial code and documentation distributed under this Agreement, and

b) in the case of each subsequent Contributor:

i) changes to the Program, and

ii) additions to the Program;

where such changes and/or additions to the Program originate from and are distributed by that particular Contributor. A Contribution 'originates' from a Contributor if it was added to the Program by such Contributor itself or anyone acting on such Contributor's behalf. Contributions do not include additions to the Program which: (i) are separate modules of software distributed in conjunction with the Program under their own license agreement, and (ii) are not derivative works of the Program.

"Contributor" means any person or entity that distributes the Program.

"Licensed Patents" mean patent claims licensable by a Contributor which are necessarily infringed by the use or sale of its Contribution alone or when combined with the Program.

"Program" means the Contributions distributed in accordance with this Agreement.

"Recipient" means anyone who receives the Program under this Agreement, including all Contributors.

2. GRANT OF RIGHTS

a) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form.

b) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide, royalty-free patent license under Licensed Patents to make, use, sell, offer to sell, import and otherwise transfer the Contribution of such Contributor, if any, in source code and object code form. This patent license shall apply to the combination of the Contribution and the Program if, at the time the Contribution is added by the Contributor, such addition of the Contribution causes such combination to be covered by the Licensed Patents. The patent license shall not apply to any other combinations which include the Contribution. No hardware per se is licensed hereunder.

c) Recipient understands that although each Contributor grants the licenses to its Contributions set forth herein, no assurances are provided by any Contributor that the Program does not infringe the patent or other intellectual property rights of any other entity. Each Contributor disclaims any liability to Recipient for claims brought by any other entity based on infringement of intellectual property rights or otherwise. As a condition to exercising the rights and licenses granted hereunder, each Recipient hereby assumes sole responsibility to secure any other intellectual property rights needed, if any. For example, if a third party patent license is required to allow Recipient to distribute the Program, it is Recipient's responsibility to acquire that license before distributing the Program.

d) Each Contributor represents that to its knowledge it has sufficient copyright rights in its Contribution, if any, to grant the copyright license set forth in this Agreement.

3. REQUIREMENTS

A Contributor may choose to distribute the Program in object code form under its own license agreement, provided that:

a) it complies with the terms and conditions of this Agreement; and

b) its license agreement:

i) effectively disclaims on behalf of all Contributors all warranties and conditions, express and implied, including warranties or conditions of title and non-infringement, and implied warranties or conditions of merchantability and fitness for a particular purpose;

ii) effectively excludes on behalf of all Contributors all liability for damages, including direct, indirect, special, incidental and consequential damages, such as lost profits;

iii) states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other party; and

iv) states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a reasonable manner on or through a medium customarily used for software exchange.

When the Program is made available in source code form:

a) it must be made available under this Agreement; and

b) a copy of this Agreement must be included with each copy of the Program.

Contributors may not remove or alter any copyright notices contained within the Program.

Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows subsequent Recipients to identify the originator of the Contribution.

4. COMMERCIAL DISTRIBUTION

Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the Program in a commercial product offering should do so in a manner which does not create potential liability for other Contributors. Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor ("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor") against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to qualify, an Indemnified Contributor must: a) promptly notify the Commercial Contributor in writing of such claim, and b) allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense.

For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone. Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the Commercial Contributor must pay those damages.

5. NO WARRANTY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise of rights under this Agreement , including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations.

6. DISCLAIMER OF LIABILITY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. GENERAL

If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

If Recipient institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted under Section 2(b) shall terminate as of the date such litigation is filed.

All Recipient's rights under this Agreement shall terminate if it fails to comply with any of the material terms or conditions of this Agreement and does not cure such failure in a reasonable period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient agrees to cease use and distribution of the Program as soon as reasonably practicable. However, Recipient's obligations under this Agreement and any licenses granted by Recipient relating to the Program shall continue and survive.

Everyone is permitted to copy and distribute copies of this Agreement, but in order to avoid inconsistency the Agreement is copyrighted and may only be modified in the following manner. The Agreement Steward reserves the right to publish new versions (including revisions) of this Agreement from time to time. No one other than the Agreement Steward has the right to modify this Agreement. The Eclipse Foundation is the initial Agreement Steward. The Eclipse Foundation may assign the responsibility to serve as the Agreement Steward to a suitable separate entity. Each new version of the Agreement will be given a distinguishing version number. The Program (including Contributions) may always be distributed subject to the version of the Agreement under which it was received. In addition, after a new version of the Agreement is published, Contributor may elect to distribute the Program (including its Contributions) under the new version. Except as expressly stated in Sections 2(a) and 2(b) above, Recipient receives no rights or licenses to the intellectual property of any Contributor under this Agreement, whether expressly, by implication, estoppel or otherwise. All rights in the Program not expressly granted under this Agreement are reserved.

This Agreement is governed by the laws of the State of New York and the intellectual property laws of the United States of America. No party to this Agreement will bring a legal action under this Agreement more than one year after the cause of action arose. Each party waives its rights to a jury trial in any resulting litigation.

clucy-0.4.0/README.md

Clucy
=====

[![Build Status](https://secure.travis-ci.org/weavejester/clucy.png?branch=master)](http://travis-ci.org/weavejester/clucy)

Clucy is a Clojure interface to [Lucene](http://lucene.apache.org/).

Installation
------------

To install Clucy, add the following dependency to your `project.clj` file:

    [clucy "0.4.0"]

Usage
-----

To use Clucy, first require it:

    (ns example
      (:require [clucy.core :as clucy]))

Then create an index. You can use `(memory-index)`, which stores the search index in RAM, or `(disk-index "/path/to/a-folder")`, which stores the index in a folder on disk.

    (def index (clucy/memory-index))

Next, add Clojure maps to the index:

    (clucy/add index
       {:name "Bob", :job "Builder"}
       {:name "Donald", :job "Computer Scientist"})

You can remove maps just as easily:

    (clucy/delete index
       {:name "Bob", :job "Builder"})

Once maps have been added, the index can be searched:

    user=> (clucy/search index "bob" 10)
    ({:name "Bob", :job "Builder"})

    user=> (clucy/search index "scientist" 10)
    ({:name "Donald", :job "Computer Scientist"})

You can search and remove all in one step. To remove all of the scientists...

    (clucy/search-and-delete index "job:scientist")

Storing Fields
--------------

By default all fields in a map are stored and indexed. If you would like more fine-grained control over which fields are stored and indexed, add this to the metadata for your map.

    (with-meta {:name "Stever", :job "Writer", :phone "555-212-0202"}
      {:phone {:stored false}})

When the map above is saved to the index, the phone field will be available for searching but will not be part of the map in the search results. This example is pretty contrived; it makes more sense when you are indexing something large (like the full text of a long article) and you don't want to pay the price of storing the entire text in the index.

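
The effect of the `{:stored false}` metadata can be sketched in plain Clojure, without touching Lucene. This is only an illustration under stated assumptions: `stored-fields` and `article` are hypothetical names made up for this example and are not part of Clucy's public API. The sketch shows which entries of a map would come back in search results, given the per-field metadata described above.

```clojure
;; Illustration only: `article` and `stored-fields` are hypothetical names,
;; not functions or vars provided by Clucy.

;; A map whose large :body field is marked as "do not store".
(def article
  (with-meta {:title "A long article"
              :body  "Full text of the article"}
    {:body {:stored false}}))

(defn stored-fields
  "Return the entries of m that are not marked {:stored false}
  in m's metadata."
  [m]
  (into {}
        (remove (fn [[k _]]
                  (false? (get-in (meta m) [k :stored])))
                m)))

(stored-fields article)
;; => {:title "A long article"}
```

A map without any metadata passes through unchanged, since a missing `:stored` entry is treated as "store this field".
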

Default Search Field
--------------------

A field called "\_content" that contains all of the map's values is stored in the index for each map (excluding fields with {:stored false} in the map's metadata). This provides a default field to run all searches against. Any time you call the search function without providing a default search field, "\_content" is used. This behavior can be disabled by binding `*content*` to false; you must then specify the default search field with every search invocation.

clucy-0.4.0/project.clj

(defproject clucy "0.4.0"
  :description "A Clojure interface to the Lucene search engine"
  :url "http://github.com/weavejester/clucy"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [org.apache.lucene/lucene-core "4.2.0"]
                 [org.apache.lucene/lucene-queryparser "4.2.0"]
                 [org.apache.lucene/lucene-analyzers-common "4.2.0"]
                 [org.apache.lucene/lucene-highlighter "4.2.0"]]
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :profiles {:1.4 {:dependencies [[org.clojure/clojure "1.4.0"]]}
             :1.5 {:dependencies [[org.clojure/clojure "1.5.0"]]}
             :1.6 {:dependencies [[org.clojure/clojure "1.6.0-master-SNAPSHOT"]]}}
  :codox {:src-dir-uri "http://github.com/weavejester/clucy/blob/master"
          :src-linenum-anchor-prefix "L"})

clucy-0.4.0/src/

clucy-0.4.0/src/clucy/

clucy-0.4.0/src/clucy/core.clj

(ns clucy.core
  (:import (java.io StringReader File)
           (org.apache.lucene.analysis Analyzer TokenStream)
           (org.apache.lucene.analysis.standard StandardAnalyzer)
           (org.apache.lucene.document Document Field Field$Index Field$Store)
           (org.apache.lucene.index IndexWriter IndexReader Term
                                    IndexWriterConfig DirectoryReader FieldInfo)
           (org.apache.lucene.queryparser.classic QueryParser)
           (org.apache.lucene.search BooleanClause BooleanClause$Occur
                                     BooleanQuery IndexSearcher Query ScoreDoc
                                     Scorer TermQuery)
           (org.apache.lucene.search.highlight Highlighter QueryScorer
                                               SimpleHTMLFormatter)
           (org.apache.lucene.util Version AttributeSource)
           (org.apache.lucene.store NIOFSDirectory RAMDirectory Directory)))

(def ^{:dynamic true} *version* Version/LUCENE_CURRENT)
(def ^{:dynamic true} *analyzer* (StandardAnalyzer. *version*))

;; To avoid a dependency on either contrib or 1.2+
(defn as-str ^String [x]
  (if (keyword? x)
    (name x)
    (str x)))

;; flag to indicate a default "_content" field should be maintained
(def ^{:dynamic true} *content* true)

(defn memory-index
  "Create a new index in RAM."
  []
  (RAMDirectory.))

(defn disk-index
  "Create a new index in a directory on disk."
  [^String dir-path]
  (NIOFSDirectory. (File. dir-path)))

(defn- index-writer
  "Create an IndexWriter."
  ^IndexWriter
  [index]
  (IndexWriter. index
                (IndexWriterConfig. *version* *analyzer*)))

(defn- index-reader
  "Create an IndexReader."
  ^IndexReader
  [index]
  (DirectoryReader/open ^Directory index))

(defn- add-field
  "Add a Field to a Document.
  Following options are allowed for meta-map:
  :stored - when false, then do not store the field value in the index.
  :indexed - when false, then do not index the field.
  :analyzed - when :indexed is enabled use this option to disable/enable
              Analyzer for current field.
  :norms - when :indexed is enabled use this option to disable/enable
           the storing of norms."
  ([document key value]
     (add-field document key value {}))

  ([document key value meta-map]
     (.add ^Document document
           (Field. (as-str key) (as-str value)
                   (if (false? (:stored meta-map))
                     Field$Store/NO
                     Field$Store/YES)
                   (if (false? (:indexed meta-map))
                     Field$Index/NO
                     (case [(false? (:analyzed meta-map)) (false?
                                                           (:norms meta-map))]
                       [false false] Field$Index/ANALYZED
                       [true false] Field$Index/NOT_ANALYZED
                       [false true] Field$Index/ANALYZED_NO_NORMS
                       [true true] Field$Index/NOT_ANALYZED_NO_NORMS))))))

(defn- map-stored
  "Returns a hash-map containing all of the values in the map that
  will be stored in the search index."
  [map-in]
  (merge {}
         (filter (complement nil?)
                 (map (fn [item]
                        (if (or (= nil (meta map-in))
                                (not= false
                                      (:stored ((first item) (meta map-in)))))
                          item))
                      map-in))))

(defn- concat-values
  "Concatenate all the map's values being stored into a single string."
  [map-in]
  (apply str (interpose " " (vals (map-stored map-in)))))

(defn- map->document
  "Create a Document from a map."
  [map]
  (let [document (Document.)]
    (doseq [[key value] map]
      (add-field document key value (key (meta map))))
    (if *content*
      (add-field document :_content (concat-values map)))
    document))

(defn add
  "Add hash-maps to the search index."
  [index & maps]
  (with-open [writer (index-writer index)]
    (doseq [m maps]
      (.addDocument writer (map->document m)))))

(defn delete
  "Deletes hash-maps from the search index."
  [index & maps]
  (with-open [writer (index-writer index)]
    (doseq [m maps]
      (let [query (BooleanQuery.)]
        (doseq [[key value] m]
          (.add query
                (BooleanClause.
                 (TermQuery. (Term. (.toLowerCase (as-str key))
                                    (.toLowerCase (as-str value))))
                 BooleanClause$Occur/MUST)))
        (.deleteDocuments writer query)))))

(defn- document->map
  "Turn a Document object into a map."
  ([^Document document score]
     (document->map document score (constantly nil)))
  ([^Document document score highlighter]
     (let [m (into {} (for [^Field f (.getFields document)]
                        [(keyword (.name f)) (.stringValue f)]))
           fragments (highlighter m) ; so that we can highlight :_content
           m (dissoc m :_content)]
       (with-meta
         m
         (-> (into {}
                   (for [^Field f (.getFields document)
                         :let [field-type (.fieldType f)]]
                     [(keyword (.name f)) {:indexed (.indexed field-type)
                                           :stored (.stored field-type)
                                           :tokenized (.tokenized field-type)}]))
             (assoc :_fragments fragments :_score score)
             (dissoc :_content))))))

(defn- make-highlighter
  "Create a highlighter function which will take a map and return highlighted
  fragments."
  [^Query query ^IndexSearcher searcher config]
  (if config
    (let [indexReader (.getIndexReader searcher)
          scorer (QueryScorer. (.rewrite query indexReader))
          config (merge {:field :_content
                         :max-fragments 5
                         :separator "..."
                         :pre ""
                         :post ""}
                        config)
          {:keys [field max-fragments separator fragments-key pre post]} config
          highlighter (Highlighter. (SimpleHTMLFormatter. pre post) scorer)]
      (fn [m]
        (let [str (field m)
              token-stream (.tokenStream ^Analyzer *analyzer*
                                         (name field)
                                         (StringReader. str))]
          (.getBestFragments ^Highlighter highlighter
                             ^TokenStream token-stream
                             ^String str
                             (int max-fragments)
                             ^String separator))))
    (constantly nil)))

(defn search
  "Search the supplied index with a query string."
  [index query max-results
   & {:keys [highlight default-field default-operator page results-per-page]
      :or {page 0 results-per-page max-results}}]
  (if (every? false? [default-field *content*])
    (throw (Exception. "No default search field specified"))
    (with-open [reader (index-reader index)]
      (let [default-field (or default-field :_content)
            searcher (IndexSearcher. reader)
            parser (doto (QueryParser.
                          *version* (as-str default-field) *analyzer*)
                     (.setDefaultOperator
                      (case (or default-operator :or)
                        :and QueryParser/AND_OPERATOR
                        :or QueryParser/OR_OPERATOR)))
            query (.parse parser query)
            hits (.search searcher query (int max-results))
            highlighter (make-highlighter query searcher highlight)
            start (* page results-per-page)
            ;; Clamp to the number of docs actually fetched (at most
            ;; max-results); totalHits can be larger than the scoreDocs
            ;; array, which would make aget below run out of bounds.
            end (min (+ start results-per-page) (count (.scoreDocs hits)))]
        (doall
         (with-meta (for [hit (map (partial aget (.scoreDocs hits))
                                   (range start end))]
                      (document->map
                       (.doc ^IndexSearcher searcher (.doc ^ScoreDoc hit))
                       (.score ^ScoreDoc hit)
                       highlighter))
           {:_total-hits (.totalHits hits)
            :_max-score (.getMaxScore hits)}))))))

(defn search-and-delete
  "Search the supplied index with a query string and then delete all
  of the results."
  ([index query]
     (if *content*
       (search-and-delete index query :_content)
       (throw (Exception. "No default search field specified"))))
  ([index query default-field]
     (with-open [writer (index-writer index)]
       (let [parser (QueryParser. *version* (as-str default-field) *analyzer*)
             query (.parse parser query)]
         (.deleteDocuments writer query)))))

clucy-0.4.0/test/

clucy-0.4.0/test/clucy/

clucy-0.4.0/test/clucy/test/

clucy-0.4.0/test/clucy/test/core.clj

(ns clucy.test.core
  (:use clucy.core
        clojure.test
        [clojure.set :only [intersection]]))

(def people [{:name "Miles" :age 36}
             {:name "Emily" :age 0.3}
             {:name "Joanna" :age 34}
             {:name "Melinda" :age 34}
             {:name "Mary" :age 48}
             {:name "Mary Lou" :age 39}])

(deftest core

  (testing "memory-index fn"
    (let [index (memory-index)]
      (is (not (nil? index)))))

  (testing "disk-index fn"
    (let [index (disk-index "/tmp/test-index")]
      (is (not (nil?
                    index)))))

  (testing "add fn"
    (let [index (memory-index)]
      (doseq [person people] (add index person))
      (is (== 1 (count (search index "name:miles" 10))))))

  (testing "delete fn"
    (let [index (memory-index)]
      (doseq [person people] (add index person))
      (delete index (first people))
      (is (== 0 (count (search index "name:miles" 10))))))

  (testing "search fn"
    (let [index (memory-index)]
      (doseq [person people] (add index person))
      (is (== 1 (count (search index "name:miles" 10))))
      (is (== 1 (count (search index "name:miles age:100" 10))))
      (is (== 0 (count (search index "name:miles AND age:100" 10))))
      (is (== 0 (count (search index "name:miles age:100" 10 :default-operator :and))))))

  (testing "search-and-delete fn"
    (let [index (memory-index)]
      (doseq [person people] (add index person))
      (search-and-delete index "name:mary")
      (is (== 0 (count (search index "name:mary" 10))))))

  (testing "search fn with highlighting"
    (let [index (memory-index)
          config {:field :name}]
      (doseq [person people] (add index person))
      (is (= (map #(-> % meta :_fragments)
                  (search index "name:mary" 10 :highlight config))
             ["Mary" "Mary Lou"]))))

  (testing "search fn returns scores in metadata"
    (let [index (memory-index)
          _ (doseq [person people] (add index person))
          results (search index "name:mary" 10)]
      (is (true? (every? pos? (map (comp :_score meta) results))))
      (is (= 2 (:_total-hits (meta results))))
      (is (pos? (:_max-score (meta results))))
      (is (= (count people) (:_total-hits (meta (search index "*:*" 2)))))))

  (testing "pagination"
    (let [index (memory-index)]
      (doseq [person people] (add index person))
      (is (== 3 (count (search index "m*" 10 :page 0 :results-per-page 3))))
      (is (== 1 (count (search index "m*" 10 :page 1 :results-per-page 3))))
      (is (empty? (intersection
                   (set (search index "m*" 10 :page 0 :results-per-page 3))
                   (set (search index "m*" 10 :page 1 :results-per-page 3))))))))