Lucene++ - a full-featured, c++ search engine
API Documentation
BooleanScorer uses a ~16k array to score windows of docs.
#include <BooleanScorer.h>
Public Member Functions
BooleanScorer (const SimilarityPtr &similarity, int32_t minNrShouldMatch, Collection< ScorerPtr > optionalScorers, Collection< ScorerPtr > prohibitedScorers)
virtual ~BooleanScorer ()
virtual String getClassName ()
boost::shared_ptr< BooleanScorer > shared_from_this ()
virtual int32_t advance (int32_t target)
Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number, or NO_MORE_DOCS if there are no more docs in the set.
virtual int32_t docID ()
Returns the ID of the current document, or NO_MORE_DOCS before iteration has started or after the iterator is exhausted.
virtual int32_t nextDoc ()
Advances to the next document in the set and returns its document number, or NO_MORE_DOCS if there are no more docs in the set.
virtual double score ()
Returns the score of the current document matching the query. The value is valid only after nextDoc() or advance(int32_t) has been called for the first time, or when score() is called from within Collector::collect().
virtual void score (const CollectorPtr &collector)
Scores and collects all matching documents.
virtual String toString ()
Returns a string representation of the object.
Public Member Functions inherited from Lucene::Scorer
Scorer (const SimilarityPtr &similarity)
Constructs a Scorer.
Scorer (const WeightPtr &weight)
virtual ~Scorer ()
boost::shared_ptr< Scorer > shared_from_this ()
SimilarityPtr getSimilarity ()
Returns the Similarity implementation used by this scorer.
void visitSubScorers (QueryPtr parent, BooleanClause::Occur relationship, ScorerVisitor *visitor)
void visitScorers (ScorerVisitor *visitor)
virtual float termFreq ()
Public Member Functions inherited from Lucene::DocIdSetIterator
virtual ~DocIdSetIterator ()
boost::shared_ptr< DocIdSetIterator > shared_from_this ()
Public Member Functions inherited from Lucene::LuceneObject
virtual ~LuceneObject ()
virtual void initialize ()
Called directly after instantiation to create objects that depend on this object being fully constructed.
virtual LuceneObjectPtr clone (const LuceneObjectPtr &other=LuceneObjectPtr())
Return a clone of this object.
virtual int32_t hashCode ()
Return the hash code for this object.
virtual bool equals (const LuceneObjectPtr &other)
Return whether two objects are equal.
virtual int32_t compareTo (const LuceneObjectPtr &other)
Compare two objects.
Public Member Functions inherited from Lucene::LuceneSync
virtual ~LuceneSync ()
virtual SynchronizePtr getSync ()
Return this object's synchronize lock.
virtual LuceneSignalPtr getSignal ()
Return this object's signal.
virtual void lock (int32_t timeout=0)
Lock this object using an optional timeout.
virtual void unlock ()
Unlock this object.
virtual bool holdsLock ()
Returns true if this object is currently locked by the current thread.
virtual void wait (int32_t timeout=0)
Wait for signal using an optional timeout.
virtual void notifyAll ()
Notify all threads waiting for signal.
Static Public Member Functions
static String _getClassName ()
Static Public Member Functions inherited from Lucene::Scorer
static String _getClassName ()
Static Public Member Functions inherited from Lucene::DocIdSetIterator
static String _getClassName ()
Protected Member Functions
virtual bool score (const CollectorPtr &collector, int32_t max, int32_t firstDocID)
Collects matching documents in a range. Hook for optimization. Note: firstDocID is passed to ensure that nextDoc() was called before this method.
Protected Member Functions inherited from Lucene::LuceneObject
LuceneObject ()
Protected Attributes
SubScorerPtr scorers
BucketTablePtr bucketTable
int32_t maxCoord
Collection< double > coordFactors
int32_t requiredMask
int32_t prohibitedMask
int32_t nextMask
int32_t minNrShouldMatch
int32_t end
BucketPtr current
Bucket * __current = nullptr
int32_t doc
Protected Attributes inherited from Lucene::Scorer
SimilarityPtr similarity
Protected Attributes inherited from Lucene::LuceneSync
SynchronizePtr objectLock
LuceneSignalPtr objectSignal
Additional Inherited Members
Public Attributes inherited from Lucene::Scorer
WeightPtr weight
Static Public Attributes inherited from Lucene::DocIdSetIterator
static const int32_t NO_MORE_DOCS
When returned by nextDoc(), advance(int32_t) and docID(), it means there are no more docs in the iterator.
BooleanScorer uses a ~16k array to score windows of docs: it scores docs 0-16k first, then docs 16k-32k, and so on. For each window it iterates through all query terms and accumulates a score in table[doc % 16k]. It also stores in the table a bitmask recording which terms contributed to the score. Non-zero scores are chained in a linked list. At the end of each window it walks the linked list and collects a hit for every entry whose bitmask satisfies the boolean constraints. For boolean queries with many frequent terms this can be much faster, because instead of updating a priority queue for each posting it performs constant-time operations per posting. The only downside is that hits are delivered out of order within each window, so it cannot be nested within other scorers, which consume documents in increasing order. It does, however, work well as a top-level scorer.
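A minimal C++ sketch of the window-at-a-time idea described above, assuming each clause is given as a sorted list of (doc, partial score) postings and that the boolean constraints are expressed as required/prohibited bitmasks. Clause, Hit and scoreByWindows are hypothetical names, the window size is a parameter rather than the ~16k table, and this is an illustration of the technique, not the Lucene++ BucketTable code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one sub-scorer: (doc, partial score) postings sorted by doc.
struct Clause {
    std::vector<std::pair<int32_t, double>> postings;
    bool required = false;
    bool prohibited = false;
};

struct Hit {
    int32_t doc;
    double score;
};

// Window-at-a-time scoring with a bucket table indexed by doc % windowSize.
std::vector<Hit> scoreByWindows(const std::vector<Clause>& clauses, int32_t windowSize) {
    assert(windowSize > 0 && clauses.size() <= 32); // one mask bit per clause
    struct Bucket {
        int32_t doc = -1;  // doc that currently owns this slot
        double score = 0;  // accumulated score
        uint32_t bits = 0; // which clauses matched the doc
        int32_t next = -1; // next touched slot in this window (linked list)
    };
    uint32_t requiredMask = 0, prohibitedMask = 0;
    int32_t maxDoc = -1;
    for (size_t c = 0; c < clauses.size(); ++c) {
        if (clauses[c].required) requiredMask |= 1u << c;
        if (clauses[c].prohibited) prohibitedMask |= 1u << c;
        if (!clauses[c].postings.empty())
            maxDoc = std::max(maxDoc, clauses[c].postings.back().first);
    }
    std::vector<Bucket> table(static_cast<size_t>(windowSize));
    std::vector<size_t> cursor(clauses.size(), 0);
    std::vector<Hit> hits;
    for (int32_t base = 0; base <= maxDoc; base += windowSize) {
        const int32_t end = base + windowSize;
        int32_t head = -1; // head of the list of slots touched in this window
        for (size_t c = 0; c < clauses.size(); ++c) {
            const auto& p = clauses[c].postings;
            while (cursor[c] < p.size() && p[cursor[c]].first < end) {
                const auto [doc, s] = p[cursor[c]++];
                const int32_t slot = doc % windowSize;
                Bucket& b = table[static_cast<size_t>(slot)];
                if (b.doc != doc) { // first posting for this doc in the window: claim the slot
                    b = Bucket{doc, 0.0, 0u, head};
                    head = slot;
                }
                b.score += s; // constant work per posting, no priority queue
                b.bits |= 1u << c;
            }
        }
        // Walk the linked list and keep docs whose bitmask satisfies the constraints.
        for (int32_t slot = head; slot != -1; slot = table[static_cast<size_t>(slot)].next) {
            const Bucket& b = table[static_cast<size_t>(slot)];
            if ((b.bits & requiredMask) == requiredMask && (b.bits & prohibitedMask) == 0)
                hits.push_back({b.doc, b.score}); // list order, not doc order
        }
    }
    return hits;
}
```

Because each window is emitted by walking its linked list, hits inside a window come out in reverse order of first touch in this sketch, which is the out-of-order delivery that keeps such a scorer at the top level.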
The new BooleanScorer2 implementation instead works by merging priority queues of postings, albeit with some clever tricks. For example, a pure conjunction (all terms required) does not need a priority queue. Instead it sorts the posting streams once at the start, then repeatedly advances the first stream (the one on the lowest document) to the document of the last stream (the one on the highest). If the first ever equals the last, every stream is on the same document and there is a hit. When some terms are required and some are optional, the conjunction can be evaluated first, and then the optional terms can all skip to each match and add to its score. The conjunction thus reduces the number of priority queue updates for the optional terms.
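The pure-conjunction trick can be sketched as follows, assuming each posting stream is a sorted vector positioned on its first document, with binary search standing in for real skip data. Postings, conjunction and kNoMoreDocs are hypothetical names; this illustrates the technique rather than reproducing the BooleanScorer2 source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr int32_t kNoMoreDocs = INT32_MAX; // stand-in for NO_MORE_DOCS

// Hypothetical posting stream over a sorted doc list, positioned on its first doc.
class Postings {
public:
    explicit Postings(std::vector<int32_t> docs) : docs_(std::move(docs)) {}
    int32_t docID() const { return pos_ < docs_.size() ? docs_[pos_] : kNoMoreDocs; }
    // Move to the first doc >= target; never moves backwards.
    int32_t advance(int32_t target) {
        auto it = std::lower_bound(docs_.begin() + static_cast<std::ptrdiff_t>(pos_), docs_.end(), target);
        pos_ = static_cast<size_t>(it - docs_.begin());
        return docID();
    }
private:
    std::vector<int32_t> docs_;
    size_t pos_ = 0;
};

// Pure conjunction: sort once, then keep skipping the first stream to the last one's doc.
std::vector<int32_t> conjunction(std::vector<Postings> streams) {
    std::vector<int32_t> hits;
    if (streams.empty()) return hits;
    std::sort(streams.begin(), streams.end(),
              [](const Postings& a, const Postings& b) { return a.docID() < b.docID(); });
    const size_t n = streams.size();
    size_t first = 0;                         // streams stay sorted in rotating order from here
    int32_t lastDoc = streams[n - 1].docID(); // doc of the current "last" stream
    while (lastDoc != kNoMoreDocs) {
        if (streams[first].docID() == lastDoc) {
            hits.push_back(lastDoc); // first == last, so all streams agree: a hit
            lastDoc = streams[first].advance(lastDoc + 1);
        } else {
            lastDoc = streams[first].advance(lastDoc); // skip first to last
        }
        first = (first + 1) % n; // the stream just advanced is now the last one
    }
    return hits;
}
```

Only one stream moves per step, and after moving it is at least as far along as every other stream, so rotating the index keeps the order sorted without re-sorting. For example, streams {1, 3, 5, 7, 9}, {3, 4, 5, 9} and {0, 3, 9, 10} yield the hits 3 and 9.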
Lucene::BooleanScorer::BooleanScorer (const SimilarityPtr &similarity, int32_t minNrShouldMatch, Collection< ScorerPtr > optionalScorers, Collection< ScorerPtr > prohibitedScorers)
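virtual Lucene::BooleanScorer::~BooleanScorer ()
static String Lucene::BooleanScorer::_getClassName () [inline]
virtual int32_t Lucene::BooleanScorer::advance (int32_t target)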
Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number, or NO_MORE_DOCS if there are no more docs in the set.
Behaves as if written:
int32_t advance(int32_t target) { int32_t doc; while ((doc = nextDoc()) < target) { } return doc; }
Some implementations are considerably more efficient than that.
NOTE: certain implementations may return a different value (each time) if called several times in a row with the same target.
NOTE: this method may be called with NO_MORE_DOCS for efficiency by some Scorers. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method.
NOTE: after the iterator has been exhausted you should not call this method, as it may result in unpredictable behaviour.
Implements Lucene::DocIdSetIterator.
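To make the "behaves as if written" contract concrete, here is a sketch with a hypothetical DocIterator base whose default advance() is the linear loop quoted above, plus a subclass that overrides it with a binary search over a sorted doc list. Both start searching just past the current document, so they return the same results; DocIterator, SortedArrayIterator and SkippingArrayIterator are illustrative assumptions, not Lucene++ types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical mini-interface mirroring the documented contract (not the Lucene++ header).
class DocIterator {
public:
    static constexpr int32_t NO_MORE_DOCS = INT32_MAX;
    virtual ~DocIterator() = default;
    virtual int32_t nextDoc() = 0;
    // Reference semantics: keep calling nextDoc() until reaching a doc >= target.
    virtual int32_t advance(int32_t target) {
        int32_t doc;
        while ((doc = nextDoc()) < target) {
        }
        return doc;
    }
};

// Steps through a sorted doc list and inherits the linear advance().
class SortedArrayIterator : public DocIterator {
public:
    explicit SortedArrayIterator(std::vector<int32_t> docs) : docs_(std::move(docs)) {}
    int32_t nextDoc() override { return next_ < docs_.size() ? docs_[next_++] : NO_MORE_DOCS; }

protected:
    std::vector<int32_t> docs_;
    size_t next_ = 0; // index of the doc the next nextDoc() call returns
};

// Same iterator, but advance() binary-searches from just past the current doc.
class SkippingArrayIterator : public SortedArrayIterator {
public:
    using SortedArrayIterator::SortedArrayIterator;
    int32_t advance(int32_t target) override {
        auto it = std::lower_bound(docs_.begin() + static_cast<std::ptrdiff_t>(next_), docs_.end(), target);
        next_ = static_cast<size_t>(it - docs_.begin());
        return nextDoc(); // first doc >= target, or NO_MORE_DOCS
    }
};
```

Note that, following the reference loop, calling advance(8) while already positioned on 8 moves on to the next document (16 for the list {2, 4, 8, 16, 32}), because nextDoc() is always called at least once.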
virtual int32_t Lucene::BooleanScorer::docID ()
Returns the following:
- NO_MORE_DOCS if nextDoc() or advance(int32_t) has not been called yet.
- NO_MORE_DOCS if the iterator has been exhausted.
- Otherwise, the ID of the document the iterator is currently on.
Implements Lucene::DocIdSetIterator.
virtual String Lucene::BooleanScorer::getClassName () [inline]
Reimplemented from Lucene::Scorer.
virtual int32_t Lucene::BooleanScorer::nextDoc ()
Advances to the next document in the set and returns its document number, or NO_MORE_DOCS if there are no more docs in the set.
NOTE: after the iterator has been exhausted you should not call this method, as it may result in unpredictable behaviour.
Implements Lucene::DocIdSetIterator.
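A disjunction is a natural place to see nextDoc() and docID() working together. The sketch below is a hypothetical UnionIterator over sorted doc lists that keeps one min-heap entry per list, i.e. the per-posting priority-queue update that the bucket-table approach in the class description avoids. Following the docID() notes above, it reports NO_MORE_DOCS before the first nextDoc() call and after exhaustion; the names are assumptions, not Lucene++ classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

constexpr int32_t kNoMoreDocs = INT32_MAX; // stand-in for NO_MORE_DOCS

// Hypothetical union (OR) iterator; each nextDoc() pops and re-pushes heap entries.
class UnionIterator {
public:
    explicit UnionIterator(std::vector<std::vector<int32_t>> lists) : lists_(std::move(lists)) {
        for (size_t i = 0; i < lists_.size(); ++i)
            if (!lists_[i].empty()) heap_.push({lists_[i][0], i, 0});
    }
    int32_t docID() const { return doc_; }
    int32_t nextDoc() {
        if (heap_.empty()) return doc_ = kNoMoreDocs;
        const int32_t doc = heap_.top().doc;
        // Pop every list positioned on this doc and push its next posting.
        while (!heap_.empty() && heap_.top().doc == doc) {
            const Entry e = heap_.top();
            heap_.pop();
            if (e.pos + 1 < lists_[e.list].size())
                heap_.push({lists_[e.list][e.pos + 1], e.list, e.pos + 1});
        }
        return doc_ = doc;
    }

private:
    struct Entry {
        int32_t doc; // current doc of this list
        size_t list; // which list
        size_t pos;  // position within that list
        bool operator>(const Entry& o) const { return doc > o.doc; }
    };
    std::vector<std::vector<int32_t>> lists_;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_; // min-heap on doc
    int32_t doc_ = kNoMoreDocs; // no current doc before the first nextDoc()
};
```

The usual consumption pattern is a loop of the form `for (int32_t d = it.nextDoc(); d != kNoMoreDocs; d = it.nextDoc())`, which visits lists {1, 4, 9}, {2, 4} and {9, 12} as 1, 2, 4, 9, 12 and never calls nextDoc() again after exhaustion.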
virtual double Lucene::BooleanScorer::score ()
Returns the score of the current document matching the query. The value is valid only after nextDoc() or advance(int32_t) has been called for the first time, or when score() is called from within Collector::collect().
Implements Lucene::Scorer.
virtual void Lucene::BooleanScorer::score (const CollectorPtr &collector)
Scores and collects all matching documents.
Parameters:
collector: The collector to which all matching documents are passed.
Reimplemented from Lucene::Scorer.
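A minimal sketch of the "score and collect everything" contract, assuming a Lucene-style setScorer()/collect() protocol. MiniScorer, MiniCollector, TopKCollector and VectorScorer are hypothetical classes, not the Lucene++ Scorer and Collector: the default loop hands each document to the collector, and the collector reads score() inside collect(), where the score() notes above say it is valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

constexpr int32_t kNoMoreDocs = INT32_MAX; // stand-in for NO_MORE_DOCS

class MiniScorer;

// Hypothetical collector interface.
class MiniCollector {
public:
    virtual ~MiniCollector() = default;
    virtual void setScorer(MiniScorer* scorer) = 0;
    virtual void collect(int32_t doc) = 0; // scorer->score() is valid during this call
};

class MiniScorer {
public:
    virtual ~MiniScorer() = default;
    virtual int32_t nextDoc() = 0;
    virtual double score() = 0;
    // Default loop: pass every matching doc to the collector in turn.
    virtual void score(MiniCollector& collector) {
        collector.setScorer(this);
        for (int32_t doc = nextDoc(); doc != kNoMoreDocs; doc = nextDoc())
            collector.collect(doc);
    }
};

// Keeps the k best-scoring docs using a min-heap on score.
class TopKCollector : public MiniCollector {
public:
    explicit TopKCollector(size_t k) : k_(k) { assert(k > 0); }
    void setScorer(MiniScorer* scorer) override { scorer_ = scorer; }
    void collect(int32_t doc) override {
        const double s = scorer_->score();
        if (heap_.size() < k_) {
            heap_.emplace(s, doc);
        } else if (s > heap_.top().first) {
            heap_.pop(); // evict the weakest of the current top k
            heap_.emplace(s, doc);
        }
    }
    std::vector<std::pair<double, int32_t>> results() const { // best first
        auto h = heap_;
        std::vector<std::pair<double, int32_t>> out;
        for (; !h.empty(); h.pop()) out.push_back(h.top());
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    using Entry = std::pair<double, int32_t>;
    size_t k_;
    MiniScorer* scorer_ = nullptr;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
};

// Scorer over precomputed (doc, score) pairs sorted by doc.
class VectorScorer : public MiniScorer {
public:
    explicit VectorScorer(std::vector<std::pair<int32_t, double>> hits) : hits_(std::move(hits)) {}
    using MiniScorer::score; // keep score(MiniCollector&) visible beside the override below
    int32_t nextDoc() override {
        if (next_ >= hits_.size()) return kNoMoreDocs;
        cur_ = next_++;
        return hits_[cur_].first;
    }
    double score() override { return hits_[cur_].second; }

private:
    std::vector<std::pair<int32_t, double>> hits_;
    size_t next_ = 0, cur_ = 0;
};
```

With docs 1, 4, 7 and 9 scoring 0.5, 2.0, 1.25 and 0.75, a TopKCollector(2) ends up holding doc 4 and then doc 7.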
virtual bool Lucene::BooleanScorer::score (const CollectorPtr &collector, int32_t max, int32_t firstDocID) [protected]
Collects matching documents in a range. Hook for optimization. Note: firstDocID is passed to ensure that nextDoc() was called before this method.
Parameters:
collector: The collector to which all matching documents are passed.
max: Do not score documents past this.
firstDocID: The first document ID (ensures that nextDoc() was called before this method).
Reimplemented from Lucene::Scorer.
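The range hook can be sketched as below. This assumes the range is [firstDocID, max) and that the bool result reports whether documents remain at or beyond max; the documentation above does not state the return value, so treat that as an assumption modelled on Java Lucene 3.x. RangeScorer is a hypothetical class, and a std::function stands in for the collector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

constexpr int32_t kNoMoreDocs = INT32_MAX; // stand-in for NO_MORE_DOCS

// Hypothetical scorer over a sorted doc list with a range-scoring hook.
class RangeScorer {
public:
    explicit RangeScorer(std::vector<int32_t> docs) : docs_(std::move(docs)) {}

    int32_t nextDoc() {
        doc_ = next_ < docs_.size() ? docs_[next_++] : kNoMoreDocs;
        return doc_;
    }

    // Collects docs from firstDocID up to (but not including) max.
    // firstDocID must be the doc the scorer already sits on, which proves that
    // nextDoc() was called first. Returns true if docs remain (assumed semantics).
    bool score(const std::function<void(int32_t)>& collect, int32_t max, int32_t firstDocID) {
        assert(firstDocID == doc_);
        int32_t doc = firstDocID;
        while (doc < max) {
            collect(doc);
            doc = nextDoc();
        }
        return doc != kNoMoreDocs;
    }

private:
    std::vector<int32_t> docs_;
    size_t next_ = 0;
    int32_t doc_ = -1; // no current doc until nextDoc() is called
};
```

For example, with docs {3, 8, 12, 15, 21}, calling nextDoc() and then score(collect, 10, 3) collects 3 and 8 and leaves the scorer on 12, so the next range can start with score(collect, 20, 12).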
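boost::shared_ptr< BooleanScorer > Lucene::BooleanScorer::shared_from_this () [inline]
virtual String Lucene::BooleanScorer::toString ()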
Returns a string representation of the object.
Reimplemented from Lucene::LuceneObject.