Lucene++ - a full-featured, C++ search engine
API Documentation


Lucene::FuzzyTermEnum Class Reference

Subclass of FilteredTermEnum for enumerating all terms that are similar to the specified filter term.

#include <FuzzyTermEnum.h>

Public Member Functions

 FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength)
 Constructs an enumeration of all terms from the specified reader that share a prefix of length prefixLength with term and have a fuzzy similarity > minSimilarity.
 
 FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity)
 
 FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term)
 
virtual ~FuzzyTermEnum ()
 
virtual String getClassName ()
 
boost::shared_ptr< FuzzyTermEnum > shared_from_this ()
 
virtual double difference ()
 Equality measure on the term.
 
virtual bool endEnum ()
 Indicates the end of the enumeration has been reached.
 
virtual void close ()
 Closes the enumeration to further activity, freeing resources.
 
- Public Member Functions inherited from Lucene::FilteredTermEnum
virtual ~FilteredTermEnum ()
 
boost::shared_ptr< FilteredTermEnum > shared_from_this ()
 
virtual int32_t docFreq ()
 Returns the docFreq of the current Term in the enumeration. Returns -1 if no Term matches or all terms have been enumerated.
 
virtual bool next ()
 Advances the enumeration to the next element. Returns true if one exists.
 
virtual TermPtr term ()
 Returns the current Term in the enumeration. Returns null if no Term matches or all terms have been enumerated.
 
- Public Member Functions inherited from Lucene::TermEnum
virtual ~TermEnum ()
 
boost::shared_ptr< TermEnum > shared_from_this ()
 
- Public Member Functions inherited from Lucene::LuceneObject
virtual ~LuceneObject ()
 
virtual void initialize ()
 Called directly after instantiation to create objects that depend on this object being fully constructed.
 
virtual LuceneObjectPtr clone (const LuceneObjectPtr &other=LuceneObjectPtr())
 Return clone of this object.
 
virtual int32_t hashCode ()
 Return hash code for this object.
 
virtual bool equals (const LuceneObjectPtr &other)
 Return whether two objects are equal.
 
virtual int32_t compareTo (const LuceneObjectPtr &other)
 Compare two objects.
 
virtual String toString ()
 Returns a string representation of the object.
 
- Public Member Functions inherited from Lucene::LuceneSync
virtual ~LuceneSync ()
 
virtual SynchronizePtr getSync ()
 Returns this object's synchronization lock.
 
virtual LuceneSignalPtr getSignal ()
 Returns this object's signal.
 
virtual void lock (int32_t timeout=0)
 Lock this object using an optional timeout.
 
virtual void unlock ()
 Unlock this object.
 
virtual bool holdsLock ()
 Returns true if this object is currently locked by the current thread.
 
virtual void wait (int32_t timeout=0)
 Wait for signal using an optional timeout.
 
virtual void notifyAll ()
 Notify all threads waiting for signal.
 

Static Public Member Functions

static String _getClassName ()
 
- Static Public Member Functions inherited from Lucene::FilteredTermEnum
static String _getClassName ()
 
- Static Public Member Functions inherited from Lucene::TermEnum
static String _getClassName ()
 

Protected Member Functions

void ConstructTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength)
 
virtual bool termCompare (const TermPtr &term)
 The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term.
 
double similarity (const String &target)
 Computes a similarity score based on Levenshtein distance.
 
int32_t calculateMaxDistance (int32_t m)
 The max distance is the maximum Levenshtein distance for the text compared to some other value that results in a score better than the minimum similarity.
 
- Protected Member Functions inherited from Lucene::FilteredTermEnum
virtual void setEnum (const TermEnumPtr &actualEnum)
 Use this method to set the actual TermEnum (e.g. in the constructor); it will be automatically positioned on the first matching term.
 
- Protected Member Functions inherited from Lucene::LuceneObject
 LuceneObject ()
 

Protected Attributes

Collection< int32_t > p
 Saves the time required to create a new array every time similarity is called.
 
Collection< int32_t > d
 
double _similarity
 
bool _endEnum
 
TermPtr searchTerm
 
String field
 
String text
 
String prefix
 
double minimumSimilarity
 
double scale_factor
 
- Protected Attributes inherited from Lucene::FilteredTermEnum
TermPtr currentTerm
 The current term.
 
TermEnumPtr actualEnum
 The delegate enum - to set this member use setEnum.
 
- Protected Attributes inherited from Lucene::LuceneSync
SynchronizePtr objectLock
 
LuceneSignalPtr objectSignal
 

Detailed Description

Subclass of FilteredTermEnum for enumerating all terms that are similar to the specified filter term.

Term enumerations are always ordered by Term.compareTo(). Each term in the enumeration is greater than all that precede it.

Constructor & Destructor Documentation

◆ FuzzyTermEnum() [1/3]

Lucene::FuzzyTermEnum::FuzzyTermEnum ( const IndexReaderPtr &  reader,
const TermPtr &  term,
double  minSimilarity,
int32_t  prefixLength 
)

Constructs an enumeration of all terms from the specified reader that share a prefix of length prefixLength with term and have a fuzzy similarity > minSimilarity.

After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.

Parameters
reader: Delivers terms.
term: Pattern term.
minSimilarity: Minimum required similarity for terms from the reader. Default value is 0.5.
prefixLength: Length of required common prefix. Default value is 0.

◆ FuzzyTermEnum() [2/3]

Lucene::FuzzyTermEnum::FuzzyTermEnum ( const IndexReaderPtr &  reader,
const TermPtr &  term,
double  minSimilarity 
)

◆ FuzzyTermEnum() [3/3]

Lucene::FuzzyTermEnum::FuzzyTermEnum ( const IndexReaderPtr &  reader,
const TermPtr &  term 
)

◆ ~FuzzyTermEnum()

virtual Lucene::FuzzyTermEnum::~FuzzyTermEnum ( )
virtual

Member Function Documentation

◆ _getClassName()

static String Lucene::FuzzyTermEnum::_getClassName ( )
inline static

◆ calculateMaxDistance()

int32_t Lucene::FuzzyTermEnum::calculateMaxDistance ( int32_t  m)
protected

The max distance is the maximum Levenshtein distance for the text compared to some other value that results in a score better than the minimum similarity.

Parameters
m: The length of the "other value"
Returns
The maximum Levenshtein distance that we care about
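
The calculation described above can be sketched as a free function. This is an illustrative sketch built from the threshold formula given under similarity(), not the Lucene++ source; the function name and the textLen and prefixLen parameters are hypothetical stand-ins for the lengths of the text and prefix members.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical free-function version of the threshold calculation.
// textLen and prefixLen stand in for the lengths of the class's `text`
// and `prefix` members; they are parameters here only for illustration.
int32_t maxDistanceSketch(int32_t textLen, int32_t prefixLen,
                          double minimumSimilarity, int32_t m) {
    // Length used for scaling: the shorter of the two compared strings
    // plus the shared prefix that is excluded from the comparison.
    const int32_t length = std::min(textLen, m) + prefixLen;
    // Truncating toward zero yields an integer upper bound on the
    // distances that are still worth computing.
    return static_cast<int32_t>((1.0 - minimumSimilarity) * length);
}
```

For example, with a 5-character text, no prefix, a minimum similarity of 0.5 and m = 7, the scaled length is 5 and the result is 2.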

◆ close()

virtual void Lucene::FuzzyTermEnum::close ( )
virtual

Closes the enumeration to further activity, freeing resources.

Reimplemented from Lucene::FilteredTermEnum.

◆ ConstructTermEnum()

void Lucene::FuzzyTermEnum::ConstructTermEnum ( const IndexReaderPtr &  reader,
const TermPtr &  term,
double  minSimilarity,
int32_t  prefixLength 
)
protected

◆ difference()

virtual double Lucene::FuzzyTermEnum::difference ( )
virtual

Equality measure on the term.

Implements Lucene::FilteredTermEnum.

◆ endEnum()

virtual bool Lucene::FuzzyTermEnum::endEnum ( )
virtual

Indicates the end of the enumeration has been reached.

Implements Lucene::FilteredTermEnum.

◆ getClassName()

virtual String Lucene::FuzzyTermEnum::getClassName ( )
inline virtual

Reimplemented from Lucene::FilteredTermEnum.

◆ shared_from_this()

boost::shared_ptr< FuzzyTermEnum > Lucene::FuzzyTermEnum::shared_from_this ( )
inline

◆ similarity()

double Lucene::FuzzyTermEnum::similarity ( const String &  target)
protected

Computes a similarity score based on Levenshtein distance.

similarity returns a number that is 1.0 or less (including negative numbers) based on how similar the term is to a target term. It returns exactly 0.0 when

editDistance > maximumEditDistance

Otherwise it returns:

1 - (editDistance / length)

where length is the length of the shorter term (text or target), including the identical prefix they share, and editDistance is the Levenshtein distance between the two words.

Embedded within this algorithm is a fail-fast Levenshtein distance algorithm. The fail-fast algorithm differs from the standard Levenshtein distance algorithm in that it is aborted if it is discovered that the minimum distance between the words is greater than some threshold.

To calculate the maximum distance threshold we use the following formula:

(1 - minimumSimilarity) * length

where length is the length of the shorter term, including any prefix that is not part of the similarity comparison. This formula was derived by solving for the distance at which the following statements begin to return false:

similarity = 1 - ((double)distance / (double)(prefixLength + std::min(textlen, targetlen)));
return (similarity > minimumSimilarity);

where distance is the Levenshtein distance for the two words.
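
The derivation can be written out in a few steps. The symbols are introduced here for brevity and are not part of the API: d is distance, s is minimumSimilarity, and L is prefixLength + min(textlen, targetlen), assumed to be positive.

```latex
\begin{align*}
\text{similarity} &= 1 - \frac{d}{L} \\
\text{similarity} > s
  &\iff 1 - \frac{d}{L} > s \\
  &\iff \frac{d}{L} < 1 - s \\
  &\iff d < (1 - s)\,L
\end{align*}
```

The statements therefore return false whenever d is at least (1 - s) * L. With s = 0.5 and L = 6, for instance, any distance of 3 or more fails the test.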

Levenshtein distance (also known as edit distance) is a measure of similarity between two strings where the distance is measured as the number of character deletions, insertions or substitutions required to transform one string to the other string.

Parameters
target: The target word or phrase.
Returns
The similarity. A value of 0.0 or less indicates that the match falls below the required threshold; 1.0 indicates that the text and target are identical.
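
A fail-fast computation of this kind can be sketched as a stand-alone function. This is a minimal illustration of the description above, not the Lucene++ implementation: the function name, the use of std::wstring and std::vector, and the handling of the case where the scaled length is zero are all assumptions, and minimumSimilarity is assumed to lie in [0, 1).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch only. `text` and `target` are the two strings with
// the common prefix already removed; prefixLen is that prefix's length.
double similaritySketch(const std::wstring& text, const std::wstring& target,
                        std::size_t prefixLen, double minimumSimilarity) {
    const std::size_t n = text.size();
    const std::size_t m = target.size();
    const std::size_t length = prefixLen + std::min(n, m);

    // Assumption: with nothing to scale by, treat equal strings as identical.
    if (length == 0) return n == m ? 1.0 : 0.0;

    // Threshold from the formula (1 - minimumSimilarity) * length.
    const std::size_t maxDistance = static_cast<std::size_t>(
        (1.0 - minimumSimilarity) * static_cast<double>(length));

    // The length difference is a lower bound on the edit distance.
    const std::size_t diff = n > m ? n - m : m - n;
    if (diff > maxDistance) return 0.0;

    // Two rows of the dynamic-programming table, reused for every row.
    std::vector<std::size_t> prev(n + 1), curr(n + 1);
    for (std::size_t i = 0; i <= n; ++i) prev[i] = i;

    for (std::size_t j = 1; j <= m; ++j) {
        curr[0] = j;
        std::size_t best = curr[0];
        for (std::size_t i = 1; i <= n; ++i) {
            const std::size_t cost = text[i - 1] == target[j - 1] ? 0 : 1;
            curr[i] = std::min({prev[i] + 1,          // deletion
                                curr[i - 1] + 1,      // insertion
                                prev[i - 1] + cost}); // substitution
            best = std::min(best, curr[i]);
        }
        // Row minima never decrease, so once the best cell of a row
        // exceeds the threshold the final distance must exceed it too.
        if (best > maxDistance) return 0.0;
        std::swap(prev, curr);
    }

    const std::size_t distance = prev[n];
    if (distance > maxDistance) return 0.0;
    return 1.0 - static_cast<double>(distance) / static_cast<double>(length);
}
```

For "kitten" against "sitting" with no prefix and a minimum similarity of 0.5, the distance is 3 and the scaled length is 6, so the sketch returns 0.5. Since a match requires similarity > minimumSimilarity, that pair would not be accepted at a threshold of 0.5.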

◆ termCompare()

virtual bool Lucene::FuzzyTermEnum::termCompare ( const TermPtr &  term)
protected virtual

The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term.

Implements Lucene::FilteredTermEnum.
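
The filtering step can be illustrated with a small stand-alone sketch. Everything here is hypothetical: SketchTerm and FuzzyFilterSketch are stand-ins rather than Lucene++ types, the similarity function is injected so the example stays short, and flagging the end of the enumeration on a field or prefix mismatch is an assumption that relies on terms being enumerated in sorted order.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for a term: a field name plus its text.
struct SketchTerm {
    std::wstring field;
    std::wstring text;
};

// Hypothetical stand-in for the filtering state of a fuzzy enumeration.
struct FuzzyFilterSketch {
    std::wstring field;
    std::wstring prefix;
    double minimumSimilarity = 0.5;
    // Injected scoring function applied to the text after the prefix.
    std::function<double(const std::wstring&)> similarity;
    double lastSimilarity = 0.0;
    bool endEnum = false;

    bool termCompare(const SketchTerm& term) {
        // Only terms in the same field that start with the prefix are scored.
        if (term.field == field &&
            term.text.compare(0, prefix.size(), prefix) == 0) {
            lastSimilarity = similarity(term.text.substr(prefix.size()));
            return lastSimilarity > minimumSimilarity;
        }
        // Assumption: in a sorted enumeration, a field or prefix mismatch
        // means no later term can match, so the enumeration is finished.
        endEnum = true;
        return false;
    }
};
```

A term that passes the field and prefix check but scores at or below minimumSimilarity is rejected without ending the enumeration; only a field or prefix mismatch sets endEnum in this sketch.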

Field Documentation

◆ _endEnum

bool Lucene::FuzzyTermEnum::_endEnum
protected

◆ _similarity

double Lucene::FuzzyTermEnum::_similarity
protected

◆ d

Collection<int32_t> Lucene::FuzzyTermEnum::d
protected

◆ field

String Lucene::FuzzyTermEnum::field
protected

◆ minimumSimilarity

double Lucene::FuzzyTermEnum::minimumSimilarity
protected

◆ p

Collection<int32_t> Lucene::FuzzyTermEnum::p
protected

Saves the time required to create a new array every time similarity is called.

◆ prefix

String Lucene::FuzzyTermEnum::prefix
protected

◆ scale_factor

double Lucene::FuzzyTermEnum::scale_factor
protected

◆ searchTerm

TermPtr Lucene::FuzzyTermEnum::searchTerm
protected

◆ text

String Lucene::FuzzyTermEnum::text
protected

The documentation for this class was generated from the following file:

clucene.sourceforge.net