Lucene++ - a full-featured, c++ search engine
API Documentation


Lucene::BooleanScorer Class Reference

BooleanScorer uses a ~16k array to score windows of docs: it scores docs 0-16k first, then docs 16k-32k, and so on.

#include <BooleanScorer.h>

Public Member Functions

 BooleanScorer (const SimilarityPtr &similarity, int32_t minNrShouldMatch, Collection< ScorerPtr > optionalScorers, Collection< ScorerPtr > prohibitedScorers)
 
virtual ~BooleanScorer ()
 
virtual String getClassName ()
 
boost::shared_ptr< BooleanScorer > shared_from_this ()
 
virtual int32_t advance (int32_t target)
 Advances to the first document beyond the current one whose document number is greater than or equal to target. Returns the current document number, or NO_MORE_DOCS if there are no more docs in the set.
 
virtual int32_t docID ()
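 Returns -1 or NO_MORE_DOCS before iteration starts, NO_MORE_DOCS once the iterator is exhausted, and otherwise the current document ID.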
 
virtual int32_t nextDoc ()
 Advances to the next document in the set and returns the doc it is currently on, or NO_MORE_DOCS if there are no more docs in the set.
 
virtual double score ()
 Returns the score of the current document matching the query. The result is invalid until nextDoc() or advance(int32_t) has been called for the first time, unless the method is called from within Collector::collect.
 
virtual void score (const CollectorPtr &collector)
 Scores and collects all matching documents.
 
virtual String toString ()
 Returns a string representation of the object.
 
- Public Member Functions inherited from Lucene::Scorer
 Scorer (const SimilarityPtr &similarity)
 Constructs a Scorer.
 
 Scorer (const WeightPtr &weight)
 
virtual ~Scorer ()
 
boost::shared_ptr< Scorer > shared_from_this ()
 
SimilarityPtr getSimilarity ()
 Returns the Similarity implementation used by this scorer.
 
void visitSubScorers (QueryPtr parent, BooleanClause::Occur relationship, ScorerVisitor *visitor)
 
void visitScorers (ScorerVisitor *visitor)
 
virtual float termFreq ()
 
- Public Member Functions inherited from Lucene::DocIdSetIterator
virtual ~DocIdSetIterator ()
 
boost::shared_ptr< DocIdSetIterator > shared_from_this ()
 
- Public Member Functions inherited from Lucene::LuceneObject
virtual ~LuceneObject ()
 
virtual void initialize ()
 Called directly after instantiation to create objects that depend on this object being fully constructed.
 
virtual LuceneObjectPtr clone (const LuceneObjectPtr &other=LuceneObjectPtr())
 Return clone of this object.
 
virtual int32_t hashCode ()
 Return hash code for this object.
 
virtual bool equals (const LuceneObjectPtr &other)
 Return whether two objects are equal.
 
virtual int32_t compareTo (const LuceneObjectPtr &other)
 Compare two objects.
 
- Public Member Functions inherited from Lucene::LuceneSync
virtual ~LuceneSync ()
 
virtual SynchronizePtr getSync ()
 Return this object synchronize lock.
 
virtual LuceneSignalPtr getSignal ()
 Return this object signal.
 
virtual void lock (int32_t timeout=0)
 Lock this object using an optional timeout.
 
virtual void unlock ()
 Unlock this object.
 
virtual bool holdsLock ()
 Returns true if this object is currently locked by current thread.
 
virtual void wait (int32_t timeout=0)
 Wait for signal using an optional timeout.
 
virtual void notifyAll ()
 Notify all threads waiting for signal.
 

Static Public Member Functions

static String _getClassName ()
 
- Static Public Member Functions inherited from Lucene::Scorer
static String _getClassName ()
 
- Static Public Member Functions inherited from Lucene::DocIdSetIterator
static String _getClassName ()
 

Protected Member Functions

virtual bool score (const CollectorPtr &collector, int32_t max, int32_t firstDocID)
 Collects matching documents in a range. Hook for optimization. Note, firstDocID is added to ensure that nextDoc() was called before this method.
 
- Protected Member Functions inherited from Lucene::LuceneObject
 LuceneObject ()
 

Protected Attributes

SubScorerPtr scorers
 
BucketTablePtr bucketTable
 
int32_t maxCoord
 
Collection< double > coordFactors
 
int32_t requiredMask
 
int32_t prohibitedMask
 
int32_t nextMask
 
int32_t minNrShouldMatch
 
int32_t end
 
BucketPtr current
 
Bucket * __current = nullptr
 
int32_t doc
 
- Protected Attributes inherited from Lucene::Scorer
SimilarityPtr similarity
 
- Protected Attributes inherited from Lucene::LuceneSync
SynchronizePtr objectLock
 
LuceneSignalPtr objectSignal
 

Additional Inherited Members

- Data Fields inherited from Lucene::Scorer
WeightPtr weight
 
- Static Public Attributes inherited from Lucene::DocIdSetIterator
static const int32_t NO_MORE_DOCS
 When returned by nextDoc(), advance(int32_t) or docID(), it means there are no more docs in the iterator.
 

Detailed Description

BooleanScorer uses a ~16k array to score windows of docs. It scores docs 0-16k first, then docs 16k-32k, and so on. For each window it iterates through all query terms and accumulates a score in table[doc % 16k]. It also stores in the table a bitmask representing which terms contributed to the score. Buckets with non-zero scores are chained in a linked list. At the end of scoring each window it iterates through the linked list and, if the bitmask matches the boolean constraints, collects a hit. For boolean queries with many frequent terms this can be much faster, since it performs constant-time operations per posting instead of updating a priority queue for each posting. The only downside is that hits are delivered out of order within the window, which means it cannot be nested within other scorers. It works well as a top-level scorer, however.
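
The windowed scheme described above can be illustrated with a small, self-contained sketch. This is not the Lucene++ implementation: Bucket, Clause, Posting, WINDOW and scoreWindows are hypothetical names, the window is shrunk to 16 docs for readability, and only a prohibited-clause mask is checked at the end of each window.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::pair<int32_t, double> Posting;   // (doc, score); hypothetical type

// Hypothetical stand-in for one slot of the bucket table.
struct Bucket {
    int32_t doc = -1;    // doc currently held by this slot
    double score = 0.0;  // score accumulated across clauses
    uint32_t bits = 0;   // bitmask of the clauses that matched this doc
    int32_t next = -1;   // slot index of the next valid bucket, -1 at the end
};

// Hypothetical clause: postings sorted by ascending doc.
struct Clause {
    std::vector<Posting> postings;
    bool prohibited;
};

const int32_t WINDOW = 16;  // tiny window for illustration only

std::vector<Posting> scoreWindows(const std::vector<Clause>& clauses, int32_t maxDoc) {
    assert(clauses.size() <= 32);  // one bit per clause in a 32-bit mask
    std::vector<Posting> hits;
    std::vector<Bucket> table(WINDOW);
    std::vector<std::size_t> pos(clauses.size(), 0);

    uint32_t prohibitedMask = 0;
    for (std::size_t c = 0; c < clauses.size(); ++c) {
        if (clauses[c].prohibited) prohibitedMask |= (1u << c);
    }

    for (int32_t base = 0; base < maxDoc; base += WINDOW) {
        const int32_t end = base + WINDOW;
        int32_t first = -1;  // head of the linked list of touched buckets

        // Constant-time work per posting: index, accumulate, set a bit.
        for (std::size_t c = 0; c < clauses.size(); ++c) {
            const std::vector<Posting>& p = clauses[c].postings;
            while (pos[c] < p.size() && p[pos[c]].first < end) {
                const int32_t doc = p[pos[c]].first;
                const int32_t slot = doc % WINDOW;
                Bucket& b = table[slot];
                if (b.doc != doc) {  // first touch of this slot in this window
                    b.doc = doc;
                    b.score = 0.0;
                    b.bits = 0;
                    b.next = first;  // prepend, so list order is not doc order
                    first = slot;
                }
                if (!clauses[c].prohibited) b.score += p[pos[c]].second;
                b.bits |= (1u << c);
                ++pos[c];
            }
        }

        // End of window: walk the list and keep buckets that pass the mask.
        for (int32_t s = first; s != -1; s = table[s].next) {
            if ((table[s].bits & prohibitedMask) == 0) {
                hits.push_back(Posting(table[s].doc, table[s].score));
            }
        }
    }
    return hits;
}
```

Because touched buckets are prepended to the list, hits within one window come back in an order unrelated to doc order, which is the out-of-order delivery the description mentions.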

The new BooleanScorer2 implementation instead works by merging priority queues of postings, albeit with some clever tricks. For example, a pure conjunction (all terms required) does not require a priority queue. Instead it sorts the posting streams at the start, then repeatedly skips the first to the last. If the first ever equals the last, then there is a hit. When some terms are required and some terms are optional, the conjunction can be evaluated first, then the optional terms can all skip to the match and be added to the score. Thus the conjunction can reduce the number of priority queue updates for the optional terms.
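
The pure-conjunction trick can be sketched as follows. This is an illustration under assumptions, not the BooleanScorer2 code: PostingStream, NO_MORE and intersect are hypothetical names, and skipTo is a simple linear scan where a real posting list would use skip data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

const int32_t NO_MORE = std::numeric_limits<int32_t>::max();  // hypothetical sentinel

// Hypothetical posting stream over a sorted list of doc IDs.
struct PostingStream {
    std::vector<int32_t> docs;
    std::size_t pos;

    explicit PostingStream(std::vector<int32_t> d) : docs(std::move(d)), pos(0) {}

    int32_t doc() const { return pos < docs.size() ? docs[pos] : NO_MORE; }
    int32_t next() { ++pos; return doc(); }
    int32_t skipTo(int32_t target) {
        while (doc() < target) ++pos;  // linear scan stands in for real skipping
        return doc();
    }
};

// Intersects all streams without a priority queue.
std::vector<int32_t> intersect(std::vector<PostingStream> streams) {
    assert(!streams.empty());
    std::vector<int32_t> hits;
    const std::size_t n = streams.size();

    // Sort once by current doc; afterwards the streams are treated as a ring
    // in which 'first' holds the smallest doc and 'last' the largest.
    std::sort(streams.begin(), streams.end(),
              [](const PostingStream& a, const PostingStream& b) { return a.doc() < b.doc(); });

    std::size_t first = 0;
    std::size_t last = n - 1;
    int32_t doc = streams[last].doc();

    while (doc != NO_MORE) {
        // Skip the first stream to the last one's doc; it then becomes the last.
        while (streams[first].doc() < doc) {
            doc = streams[first].skipTo(doc);
            last = first;
            first = (first + 1) % n;
        }
        if (doc == NO_MORE) break;
        hits.push_back(doc);         // first equals last, so every stream is on doc
        doc = streams[last].next();  // move on and look for the next match
    }
    return hits;
}
```

For example, intersecting the streams {1, 3, 5, 7, 9}, {3, 4, 5, 9, 10} and {0, 3, 9, 11} yields the docs 3 and 9.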

Constructor & Destructor Documentation

◆ BooleanScorer()

Lucene::BooleanScorer::BooleanScorer ( const SimilarityPtr & similarity,
int32_t  minNrShouldMatch,
Collection< ScorerPtr > optionalScorers,
Collection< ScorerPtr > prohibitedScorers 
)

◆ ~BooleanScorer()

virtual Lucene::BooleanScorer::~BooleanScorer ( )
virtual

Member Function Documentation

◆ _getClassName()

static String Lucene::BooleanScorer::_getClassName ( )
inline static

◆ advance()

virtual int32_t Lucene::BooleanScorer::advance ( int32_t  target)
virtual

Advances to the first document beyond the current one whose document number is greater than or equal to target. Returns the current document number, or NO_MORE_DOCS if there are no more docs in the set.

Behaves as if written:

int32_t advance(int32_t target)
{
    int32_t doc;
    while ((doc = nextDoc()) < target)
    { }
    return doc;
}

Some implementations are considerably more efficient than that.

NOTE: certain implementations may return a different value each time if called several times in a row with the same target.

NOTE: some Scorers may call this method with NO_MORE_DOCS for efficiency. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method.

NOTE: do not call this method after the iterator is exhausted, as doing so may result in unpredictable behaviour.

Implements Lucene::DocIdSetIterator.

◆ docID()

virtual int32_t Lucene::BooleanScorer::docID ( )
virtual

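Returns the following:

-1 or NO_MORE_DOCS if nextDoc() or advance(int32_t) have not been called yet.
NO_MORE_DOCS if the iterator is exhausted.
Otherwise, the ID of the document the iterator is currently on.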

Implements Lucene::DocIdSetIterator.

◆ getClassName()

virtual String Lucene::BooleanScorer::getClassName ( )
inline virtual

Reimplemented from Lucene::Scorer.

◆ nextDoc()

virtual int32_t Lucene::BooleanScorer::nextDoc ( )
virtual

Advances to the next document in the set and returns the doc it is currently on, or NO_MORE_DOCS if there are no more docs in the set.

NOTE: do not call this method after the iterator is exhausted, as doing so may result in unpredictable behaviour.

Implements Lucene::DocIdSetIterator.

◆ score() [1/3]

virtual double Lucene::BooleanScorer::score ( )
virtual

Returns the score of the current document matching the query. The result is invalid until nextDoc() or advance(int32_t) has been called for the first time, unless the method is called from within Collector::collect.

Implements Lucene::Scorer.

◆ score() [2/3]

virtual void Lucene::BooleanScorer::score ( const CollectorPtr & collector)
virtual

Scores and collects all matching documents.

Parameters
collector  The collector to which all matching documents are passed.

Reimplemented from Lucene::Scorer.

◆ score() [3/3]

virtual bool Lucene::BooleanScorer::score ( const CollectorPtr & collector,
int32_t  max,
int32_t  firstDocID 
)
protected virtual

Collects matching documents in a range. Hook for optimization. Note, firstDocID is added to ensure that nextDoc() was called before this method.

Parameters
collector  The collector to which all matching documents are passed.
max  Do not score documents past this.
firstDocID  The first document ID (ensures nextDoc() is called before this method).
Returns
true if more matching documents may remain.

Reimplemented from Lucene::Scorer.
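
The contract of this range-scoring hook can be sketched with a generic iterator and collector. This shows only the documented parameter and return semantics, not BooleanScorer's bucket-based override; ArrayDocs, ListCollector and scoreRange are hypothetical names, and the value of NO_MORE_DOCS is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Assumed sentinel value, defined locally for this sketch.
const int32_t NO_MORE_DOCS = std::numeric_limits<int32_t>::max();

// Hypothetical doc iterator over a sorted array.
struct ArrayDocs {
    std::vector<int32_t> docs;
    std::size_t next;

    explicit ArrayDocs(std::vector<int32_t> d) : docs(std::move(d)), next(0) {}

    int32_t nextDoc() { return next < docs.size() ? docs[next++] : NO_MORE_DOCS; }
};

// Hypothetical collector that records every doc it receives.
struct ListCollector {
    std::vector<int32_t> collected;
    void collect(int32_t doc) { collected.push_back(doc); }
};

// Collects docs in [firstDocID, max). The caller obtains firstDocID from
// nextDoc(), which guarantees the iterator is positioned before this runs.
bool scoreRange(ArrayDocs& it, ListCollector& collector, int32_t max, int32_t firstDocID) {
    int32_t doc = firstDocID;
    while (doc < max) {
        collector.collect(doc);
        doc = it.nextDoc();
    }
    return doc != NO_MORE_DOCS;  // true if more matching documents may remain
}
```

A caller can score in slices by passing the doc that ended the previous slice as the next firstDocID, stopping once the function returns false.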

◆ shared_from_this()

boost::shared_ptr< BooleanScorer > Lucene::BooleanScorer::shared_from_this ( )
inline

◆ toString()

virtual String Lucene::BooleanScorer::toString ( )
virtual

Returns a string representation of the object.

Reimplemented from Lucene::LuceneObject.

Field Documentation

◆ __current

Bucket* Lucene::BooleanScorer::__current = nullptr
protected

◆ bucketTable

BucketTablePtr Lucene::BooleanScorer::bucketTable
protected

◆ coordFactors

Collection<double> Lucene::BooleanScorer::coordFactors
protected

◆ current

BucketPtr Lucene::BooleanScorer::current
protected

◆ doc

int32_t Lucene::BooleanScorer::doc
protected

◆ end

int32_t Lucene::BooleanScorer::end
protected

◆ maxCoord

int32_t Lucene::BooleanScorer::maxCoord
protected

◆ minNrShouldMatch

int32_t Lucene::BooleanScorer::minNrShouldMatch
protected

◆ nextMask

int32_t Lucene::BooleanScorer::nextMask
protected

◆ prohibitedMask

int32_t Lucene::BooleanScorer::prohibitedMask
protected

◆ requiredMask

int32_t Lucene::BooleanScorer::requiredMask
protected

◆ scorers

SubScorerPtr Lucene::BooleanScorer::scorers
protected

The documentation for this class was generated from the following file:

clucene.sourceforge.net