Lucene++ - a full-featured, c++ search engine
API Documentation


Lucene::FieldCacheTermsFilter Class Reference

A Filter that only accepts documents whose single term value in the specified field is contained in the provided set of allowed terms.

#include <FieldCacheTermsFilter.h>

Inherits Lucene::Filter.

Public Member Functions

 FieldCacheTermsFilter (const String &field, Collection< String > terms)
 
virtual ~FieldCacheTermsFilter ()
 
virtual String getClassName ()
 
boost::shared_ptr< FieldCacheTermsFilter > shared_from_this ()
 
FieldCachePtr getFieldCache ()
 
virtual DocIdSetPtr getDocIdSet (const IndexReaderPtr &reader)
 Creates a DocIdSet enumerating the documents that should be permitted in search results.
 
- Public Member Functions inherited from Lucene::Filter
virtual ~Filter ()
 
boost::shared_ptr< Filter > shared_from_this ()
 
- Public Member Functions inherited from Lucene::LuceneObject
virtual ~LuceneObject ()
 
virtual void initialize ()
 Called directly after instantiation to create objects that depend on this object being fully constructed.
 
virtual LuceneObjectPtr clone (const LuceneObjectPtr &other=LuceneObjectPtr())
 Return clone of this object.
 
virtual int32_t hashCode ()
 Return hash code for this object.
 
virtual bool equals (const LuceneObjectPtr &other)
 Return whether two objects are equal.
 
virtual int32_t compareTo (const LuceneObjectPtr &other)
 Compare two objects.
 
virtual String toString ()
 Returns a string representation of the object.
 
- Public Member Functions inherited from Lucene::LuceneSync
virtual ~LuceneSync ()
 
virtual SynchronizePtr getSync ()
 Return this object synchronize lock.
 
virtual LuceneSignalPtr getSignal ()
 Return this object signal.
 
virtual void lock (int32_t timeout=0)
 Lock this object using an optional timeout.
 
virtual void unlock ()
 Unlock this object.
 
virtual bool holdsLock ()
 Returns true if this object is currently locked by current thread.
 
virtual void wait (int32_t timeout=0)
 Wait for signal using an optional timeout.
 
virtual void notifyAll ()
 Notify all threads waiting for signal.
 

Static Public Member Functions

static String _getClassName ()
 
- Static Public Member Functions inherited from Lucene::Filter
static String _getClassName ()
 

Protected Attributes

String field
 
Collection< String > terms
 
- Protected Attributes inherited from Lucene::LuceneSync
SynchronizePtr objectLock
 
LuceneSignalPtr objectSignal
 

Additional Inherited Members

- Protected Member Functions inherited from Lucene::LuceneObject
 LuceneObject ()
 

Detailed Description

A Filter that only accepts documents whose single term value in the specified field is contained in the provided set of allowed terms.

This is the same functionality as TermsFilter (from contrib/queries), except this filter requires that the field contains only a single term for all documents. Because of drastically different implementations, they also have different performance characteristics, as described below.

The first invocation of this filter on a given field will be slower, since a StringIndex must be created. Subsequent invocations using the same field will re-use this cache. However, as with all functionality based on FieldCache, persistent RAM is consumed to hold the cache, and is not freed until the IndexReader is closed. In contrast, TermsFilter has no persistent RAM consumption.
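The caching behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the Lucene++ API: `ReaderSketch`, `StringIndexSketch`, and `FieldCacheSketch` are hypothetical stand-ins, and the sketch assumes every document has exactly one value in the field. It shows only the general idea that the index is built once per reader and field, re-used afterwards, and held until the reader's entry is purged.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a StringIndex: one term ordinal per document.
struct StringIndexSketch {
    std::vector<std::string> lookup; // ordinal -> term, sorted and unique
    std::vector<int32_t> order;      // docID -> ordinal into lookup
};

// Hypothetical stand-in for a segment reader: field -> single value per docID.
struct ReaderSketch {
    std::map<std::string, std::vector<std::string>> fieldValues;
};

// Hypothetical cache keyed by reader and field (not the Lucene++ FieldCache).
class FieldCacheSketch {
public:
    const StringIndexSketch& getStringIndex(const ReaderSketch& reader,
                                            const std::string& field) {
        std::map<std::string, StringIndexSketch>& perField = cache_[&reader];
        auto hit = perField.find(field);
        if (hit != perField.end()) {
            return hit->second; // later invocations re-use the cached index
        }
        ++builds_; // first invocation for this reader and field: slow path
        const std::vector<std::string>& values = reader.fieldValues.at(field);
        StringIndexSketch index;
        index.lookup = values;
        std::sort(index.lookup.begin(), index.lookup.end());
        index.lookup.erase(std::unique(index.lookup.begin(), index.lookup.end()),
                           index.lookup.end());
        index.order.reserve(values.size());
        for (const std::string& value : values) {
            auto pos = std::lower_bound(index.lookup.begin(), index.lookup.end(), value);
            index.order.push_back(static_cast<int32_t>(pos - index.lookup.begin()));
        }
        return perField.emplace(field, std::move(index)).first->second;
    }

    // Models the cache entry being released when the reader is closed.
    void purge(const ReaderSketch& reader) { cache_.erase(&reader); }

    // Number of times an index had to be built from scratch.
    int builds() const { return builds_; }

private:
    std::map<const ReaderSketch*, std::map<std::string, StringIndexSketch>> cache_;
    int builds_ = 0;
};
```

In this sketch the memory held by `cache_` grows with the number of documents and distinct terms per field, which mirrors the persistent RAM cost noted above.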

With each search, this filter translates the specified set of Terms into a private OpenBitSet keyed by term number per unique IndexReader (normally one reader per segment). Then, during matching, the term number for each docID is retrieved from the cache and then checked for inclusion using the OpenBitSet. Since all testing is done using RAM resident data structures, performance should be very fast, most likely fast enough to not require further caching of the DocIdSet for each possible combination of terms. However, because docIDs are simply scanned linearly, an index with a great many small documents may find this linear scan too costly.
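As a rough illustration of the matching step described above, the following sketch uses plain standard-library containers rather than Lucene++ types. The function names and the `lookup`/`order` arrays are assumptions made for the example: `lookup` maps a term number to its term (sorted), and `order` maps a docID to its term number. A `std::vector<bool>` stands in for the OpenBitSet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Translate the allowed terms into a bit set keyed by term number.
// Terms that do not occur in this reader's lookup table are ignored.
std::vector<bool> buildAllowedOrdinals(const std::vector<std::string>& lookup,
                                       const std::vector<std::string>& allowedTerms) {
    std::vector<bool> bits(lookup.size(), false);
    for (const std::string& term : allowedTerms) {
        auto pos = std::lower_bound(lookup.begin(), lookup.end(), term);
        if (pos != lookup.end() && *pos == term) {
            bits[static_cast<std::size_t>(pos - lookup.begin())] = true;
        }
    }
    return bits;
}

// Linear scan over all docIDs: fetch each document's term number from the
// cached array and test it against the bit set.
std::vector<int32_t> matchDocs(const std::vector<int32_t>& order,
                               const std::vector<bool>& allowedOrdinals) {
    std::vector<int32_t> hits;
    for (std::size_t doc = 0; doc < order.size(); ++doc) {
        int32_t ord = order[doc];
        assert(ord >= 0 && static_cast<std::size_t>(ord) < allowedOrdinals.size());
        if (allowedOrdinals[static_cast<std::size_t>(ord)]) {
            hits.push_back(static_cast<int32_t>(doc));
        }
    }
    return hits;
}
```

The bit set is sized by the number of distinct terms, not by the number of documents, so rebuilding it for a new set of allowed terms is cheap. The scan over `order`, however, always visits every document in the reader.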

In contrast, TermsFilter builds up an OpenBitSet, keyed by docID, every time it's created, by enumerating through all matching docs using TermDocs to seek and scan through each term's docID list. While there is no linear scan of all docIDs, besides the allocation of the underlying array in the OpenBitSet, this approach requires a number of "disk seeks" in proportion to the number of terms, which can be exceptionally costly when there are cache misses in the OS's IO cache.
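For comparison, the TermsFilter approach described above might be sketched as follows. This is not the contrib/queries implementation: `Postings` is a hypothetical in-memory map standing in for the on-disk term dictionary and per-term docID lists, and `buildDocBits` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the inverted index: term -> docIDs containing it.
using Postings = std::map<std::string, std::vector<int32_t>>;

// Build a bit set keyed by docID by visiting each requested term's docID list.
std::vector<bool> buildDocBits(const Postings& postings,
                               const std::vector<std::string>& terms,
                               int32_t maxDoc) {
    assert(maxDoc >= 0);
    // The only cost proportional to the total number of documents is this allocation.
    std::vector<bool> bits(static_cast<std::size_t>(maxDoc), false);
    for (const std::string& term : terms) {
        auto it = postings.find(term); // one "seek" per term
        if (it == postings.end()) {
            continue; // term does not occur in the index
        }
        for (int32_t doc : it->second) { // scan that term's docID list
            assert(doc >= 0 && doc < maxDoc);
            bits[static_cast<std::size_t>(doc)] = true;
        }
    }
    return bits;
}
```

Here the work grows with the number of terms and the length of their docID lists, not with the total document count, which is why this style can win when few terms match few documents.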

Generally, this filter will be slower on the first invocation for a given field, but subsequent invocations, even if you change the allowed set of Terms, should be faster than TermsFilter, especially as the number of Terms being matched increases. If you are matching only a very small number of terms, and those terms in turn match a very small number of documents, TermsFilter may perform faster.

Which filter performs best depends heavily on the application.

Constructor & Destructor Documentation

◆ FieldCacheTermsFilter()

Lucene::FieldCacheTermsFilter::FieldCacheTermsFilter ( const String &  field,
Collection< String >  terms 
)

◆ ~FieldCacheTermsFilter()

virtual Lucene::FieldCacheTermsFilter::~FieldCacheTermsFilter ( )
virtual

Member Function Documentation

◆ _getClassName()

static String Lucene::FieldCacheTermsFilter::_getClassName ( )
inline static

◆ getClassName()

virtual String Lucene::FieldCacheTermsFilter::getClassName ( )
inline virtual

Reimplemented from Lucene::Filter.

◆ getDocIdSet()

virtual DocIdSetPtr Lucene::FieldCacheTermsFilter::getDocIdSet ( const IndexReaderPtr & reader )
virtual

Creates a DocIdSet enumerating the documents that should be permitted in search results.

Note: null can be returned if no documents are accepted by this Filter.

Note: This method will be called once per segment in the index during searching. The returned DocIdSet must refer to document IDs for that segment, not for the top-level reader.

Parameters
reader  an IndexReader instance opened on the index currently being searched. Note that the provided reader likely does not represent the whole underlying index; i.e., if the index has more than one segment, the given reader represents only a single segment.
Returns
a DocIdSet that provides the documents which should be permitted or prohibited in search results. NOTE: null can be returned if no documents will be accepted by this Filter.
See also
DocIdBitSet

Implements Lucene::Filter.
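The two notes above (a null result means no documents are accepted, and returned IDs are relative to the segment) can be illustrated with a small stand-alone sketch. None of these names come from Lucene++: `SegmentSketch`, `SegmentFilter`, `collectGlobalDocIds`, and `firstDocOnly` are hypothetical, and a plain vector of docIDs stands in for a DocIdSet. The sketch assumes a caller that converts segment-local IDs to top-level IDs by adding the segment's starting offset.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for one segment of an index.
struct SegmentSketch {
    int32_t docBase; // top-level docID of the segment's first document
    int32_t maxDoc;  // number of documents in the segment
};

// A null pointer plays the role of a null DocIdSet.
using DocIdListPtr = std::shared_ptr<std::vector<int32_t>>;
using SegmentFilter = std::function<DocIdListPtr(const SegmentSketch&)>;

// Call the filter once per segment and translate segment-local docIDs
// into top-level docIDs on the caller's side.
std::vector<int32_t> collectGlobalDocIds(const std::vector<SegmentSketch>& segments,
                                         const SegmentFilter& filter) {
    std::vector<int32_t> global;
    for (const SegmentSketch& segment : segments) {
        DocIdListPtr local = filter(segment);
        if (!local) {
            continue; // null: no documents accepted in this segment
        }
        for (int32_t doc : *local) {
            global.push_back(segment.docBase + doc);
        }
    }
    return global;
}

// Example filter: accepts only the first document of each non-empty segment,
// always expressed as a segment-local ID.
DocIdListPtr firstDocOnly(const SegmentSketch& segment) {
    if (segment.maxDoc == 0) {
        return DocIdListPtr(); // nothing to accept
    }
    DocIdListPtr ids = std::make_shared<std::vector<int32_t>>();
    ids->push_back(0);
    return ids;
}
```

If the filter returned top-level IDs instead, the caller's offset would be applied twice and documents in every segment after the first would be misidentified.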

◆ getFieldCache()

FieldCachePtr Lucene::FieldCacheTermsFilter::getFieldCache ( )

◆ shared_from_this()

boost::shared_ptr< FieldCacheTermsFilter > Lucene::FieldCacheTermsFilter::shared_from_this ( )
inline

Field Documentation

◆ field

String Lucene::FieldCacheTermsFilter::field
protected

◆ terms

Collection<String> Lucene::FieldCacheTermsFilter::terms
protected

The documentation for this class was generated from the following file:

clucene.sourceforge.net