Lucene++ - a full-featured, c++ search engine
API Documentation
Subclass of FilteredTermEnum for enumerating all terms that are similar to the specified filter term.
#include <FuzzyTermEnum.h>
Public Member Functions | |
FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength) | |
Constructor for enumeration of all terms from specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity. | |
FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity) | |
FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term) | |
virtual | ~FuzzyTermEnum () |
virtual String | getClassName () |
boost::shared_ptr< FuzzyTermEnum > | shared_from_this () |
virtual double | difference () |
Equality measure on the term. | |
virtual bool | endEnum () |
Indicates the end of the enumeration has been reached. | |
virtual void | close () |
Closes the enumeration to further activity, freeing resources. | |
virtual | ~FilteredTermEnum () |
boost::shared_ptr< FilteredTermEnum > | shared_from_this () |
virtual int32_t | docFreq () |
Returns the docFreq of the current Term in the enumeration. Returns -1 if no Term matches or all terms have been enumerated. | |
virtual bool | next () |
Increments the enumeration to the next element. Returns true if one exists. | |
virtual TermPtr | term () |
Returns the current Term in the enumeration. Returns null if no Term matches or all terms have been enumerated. | |
virtual | ~TermEnum () |
boost::shared_ptr< TermEnum > | shared_from_this () |
virtual | ~LuceneObject () |
virtual void | initialize () |
Called directly after instantiation to create objects that depend on this object being fully constructed. | |
virtual LuceneObjectPtr | clone (const LuceneObjectPtr &other=LuceneObjectPtr()) |
Return clone of this object. | |
virtual int32_t | hashCode () |
Return hash code for this object. | |
virtual bool | equals (const LuceneObjectPtr &other) |
Return whether two objects are equal. | |
virtual int32_t | compareTo (const LuceneObjectPtr &other) |
Compare two objects. | |
virtual String | toString () |
Returns a string representation of the object. | |
virtual | ~LuceneSync () |
virtual SynchronizePtr | getSync () |
Returns the synchronization lock of this object. | |
virtual LuceneSignalPtr | getSignal () |
Returns the signal of this object. | |
virtual void | lock (int32_t timeout=0) |
Lock this object using an optional timeout. | |
virtual void | unlock () |
Unlock this object. | |
virtual bool | holdsLock () |
Returns true if this object is currently locked by the current thread. | |
virtual void | wait (int32_t timeout=0) |
Wait for signal using an optional timeout. | |
virtual void | notifyAll () |
Notify all threads waiting for signal. | |
Static Public Member Functions | |
static String | _getClassName () |
Protected Member Functions | |
void | ConstructTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength) |
virtual bool | termCompare (const TermPtr &term) |
The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term. | |
double | similarity (const String &target) |
Computes the similarity to the target term, based on Levenshtein distance. | |
int32_t | calculateMaxDistance (int32_t m) |
The maximum distance is the largest Levenshtein distance between the text and some other value that still results in a score better than the minimum similarity. | |
virtual void | setEnum (const TermEnumPtr &actualEnum) |
Use this method to set the actual TermEnum (e.g. in the constructor). It will be automatically positioned on the first matching term. | |
LuceneObject () | |
Protected Attributes | |
Collection< int32_t > | p |
Saves the time required to create a new array every time similarity is called. | |
Collection< int32_t > | d |
double | _similarity |
bool | _endEnum |
TermPtr | searchTerm |
String | field |
String | text |
String | prefix |
double | minimumSimilarity |
double | scale_factor |
TermPtr | currentTerm |
The current term. | |
TermEnumPtr | actualEnum |
The delegate enum. To set this member, use setEnum. | |
SynchronizePtr | objectLock |
LuceneSignalPtr | objectSignal |
Subclass of FilteredTermEnum for enumerating all terms that are similar to the specified filter term.
Term enumerations are always ordered by Term::compareTo(). Each term in the enumeration is greater than all that precede it.
Lucene::FuzzyTermEnum::FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength)
Constructor for enumeration of all terms from specified reader which share a prefix of length prefixLength with term and which have a fuzzy similarity > minSimilarity.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
reader | Delivers terms. |
term | Pattern term. |
minSimilarity | Minimum required similarity for terms from the reader. Default value is 0.5. |
prefixLength | Length of required common prefix. Default value is 0. |
Lucene::FuzzyTermEnum::FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity)
Lucene::FuzzyTermEnum::FuzzyTermEnum (const IndexReaderPtr &reader, const TermPtr &term)
virtual Lucene::FuzzyTermEnum::~FuzzyTermEnum () [virtual]
static String Lucene::FuzzyTermEnum::_getClassName () [inline, static]
int32_t Lucene::FuzzyTermEnum::calculateMaxDistance (int32_t m) [protected]
The maximum distance is the largest Levenshtein distance between the text and some other value that still results in a score better than the minimum similarity.
m | The length of the "other value" |
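A minimal sketch of how such a threshold could be computed, assuming it follows the formula (1 - minimumSimilarity) * length given in the similarity documentation on this page. The free function `maxDistanceSketch` and its `textLen` and `prefixLen` parameters are hypothetical stand-ins for the lengths of the `text` and `prefix` members; this is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of the threshold calculation.
// textLen and prefixLen stand in for the lengths of the text and prefix
// members; m is the length of the "other value".
int32_t maxDistanceSketch(double minimumSimilarity, int32_t textLen,
                          int32_t prefixLen, int32_t m) {
    assert(textLen >= 0 && prefixLen >= 0 && m >= 0);
    // (1 - minimumSimilarity) * length, truncated toward zero, where
    // length is the prefix length plus the length of the shorter term.
    return static_cast<int32_t>((1.0 - minimumSimilarity) *
                                (std::min(textLen, m) + prefixLen));
}
```

With a minimum similarity of 0.5, a text of length 6, no prefix and an other value of length 4, the sketch yields a threshold of 2.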
virtual void Lucene::FuzzyTermEnum::close () [virtual]
Closes the enumeration to further activity, freeing resources.
Reimplemented from Lucene::FilteredTermEnum.
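void Lucene::FuzzyTermEnum::ConstructTermEnum (const IndexReaderPtr &reader, const TermPtr &term, double minSimilarity, int32_t prefixLength) [protected]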
virtual double Lucene::FuzzyTermEnum::difference () [virtual]
Equality measure on the term.
Implements Lucene::FilteredTermEnum.
virtual bool Lucene::FuzzyTermEnum::endEnum () [virtual]
Indicates the end of the enumeration has been reached.
Implements Lucene::FilteredTermEnum.
virtual String Lucene::FuzzyTermEnum::getClassName () [inline, virtual]
Reimplemented from Lucene::FilteredTermEnum.
boost::shared_ptr< FuzzyTermEnum > Lucene::FuzzyTermEnum::shared_from_this () [inline]
double Lucene::FuzzyTermEnum::similarity (const String &target) [protected]
Computes the similarity to the target term, based on Levenshtein distance.
Similarity returns a number that is 1.0 or less (including negative numbers) based on how similar the Term is to a target term. It returns exactly 0.0 when
editDistance > maximumEditDistance
Otherwise it returns:
1 - (editDistance / length)
where length is the length of the shorter term (text or target), including the identical prefix, and editDistance is the Levenshtein distance between the two words.
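As a worked illustration of the formula above, the following sketch applies 1 - (editDistance / length) to an edit distance that has already been computed. The helper name `similarityFromDistance` and its parameters are hypothetical, and the sketch leaves out the special case in which 0.0 is returned because the edit distance exceeds the maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: applies 1 - (editDistance / length), where length is
// the shared prefix length plus the length of the shorter remaining term.
double similarityFromDistance(int editDistance, std::size_t prefixLen,
                              std::size_t textLen, std::size_t targetLen) {
    const std::size_t length = prefixLen + std::min(textLen, targetLen);
    assert(length > 0);  // this sketch does not define the result for two empty terms
    return 1.0 - static_cast<double>(editDistance) / static_cast<double>(length);
}
```

For "kitten" and "sitting" (edit distance 3, shorter length 6, no prefix) this gives 0.5. A distance of 5 against a shorter length of 2 gives -1.5, which shows how the value can become negative.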
Embedded within this algorithm is a fail-fast Levenshtein distance algorithm. The fail-fast algorithm differs from the standard Levenshtein distance algorithm in that it is aborted if it is discovered that the minimum distance between the words is greater than some threshold.
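The fail-fast idea can be sketched as follows. This is an illustrative two-row Levenshtein computation that gives up once every cell in the current row exceeds the threshold; it is not the Lucene++ code. The function name `boundedLevenshtein` and the -1 sentinel are choices made for this sketch, and it allocates its rows on each call rather than reusing member arrays such as p and d.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the Levenshtein distance between a and b, or -1 as soon as it is
// certain that the distance exceeds maxDistance.
int boundedLevenshtein(const std::string& a, const std::string& b, int maxDistance) {
    assert(maxDistance >= 0);
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    // The distance is never smaller than the difference in lengths.
    if (std::abs(n - m) > maxDistance) return -1;

    std::vector<int> prev(n + 1), curr(n + 1);
    for (int i = 0; i <= n; ++i) prev[i] = i;  // distance to the empty prefix of b

    for (int j = 1; j <= m; ++j) {
        curr[0] = j;
        int best = curr[0];  // smallest value seen in this row
        for (int i = 1; i <= n; ++i) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[i] = std::min({prev[i] + 1,           // insertion
                                curr[i - 1] + 1,       // deletion
                                prev[i - 1] + cost});  // substitution or match
            best = std::min(best, curr[i]);
        }
        // Row minima never decrease, so the final distance is at least best.
        if (best > maxDistance) return -1;
        std::swap(prev, curr);
    }
    return prev[n] <= maxDistance ? prev[n] : -1;
}
```

The early exit is what separates this from the standard algorithm: a clearly dissimilar pair is rejected after a few rows instead of after the full table has been filled in.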
To calculate the maximum distance threshold we use the following formula:
(1 - minimumSimilarity) * length
where length is the length of the shorter term, including any prefix that is not part of the similarity comparison. This formula was derived by solving for what maximum value of distance returns false for the following statements:
similarity = 1 - ((double)distance / (double)(prefixLength + std::min(textlen, targetlen)));
return (similarity > minimumSimilarity);
where distance is the Levenshtein distance for the two words.
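The rearrangement behind the threshold can be written out step by step. The symbols here are shorthand introduced for this note: d is the distance, s is minimumSimilarity, and L = prefixLength + min(textlen, targetlen), assumed to be positive.

```latex
\begin{align*}
1 - \frac{d}{L} &> s \\
\frac{d}{L} &< 1 - s \\
d &< (1 - s)\,L
\end{align*}
```

A term can therefore pass the check only while its distance stays below (1 - s) L. For example, with s = 0.5 and L = 6 the bound is 3: distances 0 to 2 pass, while a distance of 3 gives a similarity of exactly 0.5, which is not greater than the minimum.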
Levenshtein distance (also known as edit distance) is a measure of similarity between two strings where the distance is measured as the number of character deletions, insertions or substitutions required to transform one string to the other string.
target | The target word or phrase. |
virtual bool Lucene::FuzzyTermEnum::termCompare (const TermPtr &term) [protected, virtual]
The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term.
Implements Lucene::FilteredTermEnum.
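The sketch below shows one way a comparison of this kind could work, assuming that a candidate term must be in the same field and start with the shared prefix, and that only the remainder after the prefix is compared by Levenshtein distance. `FuzzyFilterSketch` is a hypothetical stand-in and not the Lucene++ class. Setting `endEnum` once the prefix stops matching is also an assumption; it relies on terms being enumerated in sorted order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain (unbounded) two-row Levenshtein distance, kept simple for the sketch.
int levenshtein(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical stand-in for the enumerator state. Member names mirror the
// documented attributes, but the logic is an assumption of this sketch.
struct FuzzyFilterSketch {
    std::string field, prefix, text;  // text is the search term without its prefix
    double minimumSimilarity;
    bool endEnum;

    FuzzyFilterSketch(std::string f, std::string p, std::string t, double minSim)
        : field(f), prefix(p), text(t), minimumSimilarity(minSim), endEnum(false) {
        assert(minSim >= 0.0 && minSim < 1.0);  // precondition chosen for this sketch
    }

    bool termCompare(const std::string& termField, const std::string& termText) {
        if (termField == field && termText.compare(0, prefix.size(), prefix) == 0) {
            const std::string target = termText.substr(prefix.size());
            const std::size_t length = prefix.size() + std::min(text.size(), target.size());
            if (length == 0) return false;  // edge case handled arbitrarily here
            const double sim = 1.0 - static_cast<double>(levenshtein(text, target)) /
                                         static_cast<double>(length);
            return sim > minimumSimilarity;
        }
        endEnum = true;  // assumed: in sorted order, no later term can match
        return false;
    }
};
```

Searching for "lucene" with a prefix length of 2 and a minimum similarity of 0.5, the sketch accepts "lucent", rejects "lumber" without ending the enumeration, and ends the enumeration at "maple".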
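bool Lucene::FuzzyTermEnum::_endEnum [protected]
double Lucene::FuzzyTermEnum::_similarity [protected]
Collection< int32_t > Lucene::FuzzyTermEnum::d [protected]
String Lucene::FuzzyTermEnum::field [protected]
double Lucene::FuzzyTermEnum::minimumSimilarity [protected]
Collection< int32_t > Lucene::FuzzyTermEnum::p [protected]
Saves the time required to create a new array every time similarity is called.
String Lucene::FuzzyTermEnum::prefix [protected]
double Lucene::FuzzyTermEnum::scale_factor [protected]
TermPtr Lucene::FuzzyTermEnum::searchTerm [protected]
String Lucene::FuzzyTermEnum::text [protected]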