Lucene++ - a full-featured, c++ search engine
API Documentation


Lucene::NumericRangeQuery Class Reference

A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.

#include <NumericRangeQuery.h>


Public Member Functions

 NumericRangeQuery (const String &field, int32_t precisionStep, int32_t valSize, NumericValue min, NumericValue max, bool minInclusive, bool maxInclusive)
 
virtual ~NumericRangeQuery ()
 
virtual String getClassName ()
 
boost::shared_ptr< NumericRangeQuery > shared_from_this ()
 
String getField ()
 Returns the field name for this query.
 
bool includesMin ()
 Returns true if the lower endpoint is inclusive.
 
bool includesMax ()
 Returns true if the upper endpoint is inclusive.
 
NumericValue getMin ()
 Returns the lower value of this range query.
 
NumericValue getMax ()
 Returns the upper value of this range query.
 
virtual LuceneObjectPtr clone (const LuceneObjectPtr &other=LuceneObjectPtr())
 Returns a clone of this query.
 
virtual String toString (const String &field)
 Prints a query to a string, with field assumed to be the default field and omitted.
 
virtual bool equals (const LuceneObjectPtr &other)
 Return whether two objects are equal.
 
virtual int32_t hashCode ()
 Return hash code for this object.
 
- Public Member Functions inherited from Lucene::MultiTermQuery
 MultiTermQuery ()
 
virtual ~MultiTermQuery ()
 
boost::shared_ptr< MultiTermQuery > shared_from_this ()
 
int32_t getTotalNumberOfTerms ()
 Return the number of unique terms visited during execution of the query. If there are many of them, you may consider using another query type or optimizing the total term count in the index.
 
void clearTotalNumberOfTerms ()
 Resets the counting of unique terms. Do this before executing the query/filter.
 
virtual QueryPtr rewrite (const IndexReaderPtr &reader)
 Called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
 
virtual RewriteMethodPtr getRewriteMethod ()
 
virtual void setRewriteMethod (const RewriteMethodPtr &method)
 Sets the rewrite method to be used when executing the query. You can use one of the four core methods, or implement your own subclass of RewriteMethod.
 
- Public Member Functions inherited from Lucene::Query
 Query ()
 
virtual ~Query ()
 
boost::shared_ptr< Query > shared_from_this ()
 
virtual void setBoost (double b)
 Sets the boost for this query clause to b. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by b.
 
virtual double getBoost ()
 Gets the boost for this clause. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by this boost. The boost is 1.0 by default.
 
virtual String toString ()
 Prints a query to a string.
 
virtual WeightPtr createWeight (const SearcherPtr &searcher)
 Constructs an appropriate Weight implementation for this query. Only implemented by primitive queries, which re-write to themselves.
 
virtual WeightPtr weight (const SearcherPtr &searcher)
 Constructs and initializes a Weight for a top-level query.
 
virtual QueryPtr combine (Collection< QueryPtr > queries)
 Called when re-writing queries under MultiSearcher.
 
virtual void extractTerms (SetTerm terms)
 Adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten form.
 
virtual SimilarityPtr getSimilarity (const SearcherPtr &searcher)
 Returns the Similarity implementation to be used for this query. Subclasses may override this method to specify their own Similarity implementation, perhaps one that delegates through that of the Searcher. By default the Searcher's Similarity implementation is returned.
 
String boostString ()
 Return given boost value as a string.
 
- Public Member Functions inherited from Lucene::LuceneObject
virtual ~LuceneObject ()
 
virtual void initialize ()
 Called directly after instantiation to create objects that depend on this object being fully constructed.
 
virtual int32_t compareTo (const LuceneObjectPtr &other)
 Compare two objects.
 
- Public Member Functions inherited from Lucene::LuceneSync
virtual ~LuceneSync ()
 
virtual SynchronizePtr getSync ()
 Return this object's synchronization lock.
 
virtual LuceneSignalPtr getSignal ()
 Return this object's signal.
 
virtual void lock (int32_t timeout=0)
 Lock this object using an optional timeout.
 
virtual void unlock ()
 Unlock this object.
 
virtual bool holdsLock ()
 Returns true if this object is currently locked by the current thread.
 
virtual void wait (int32_t timeout=0)
 Wait for signal using an optional timeout.
 
virtual void notifyAll ()
 Notify all threads waiting for signal.
 

Static Public Member Functions

static String _getClassName ()
 
static NumericRangeQueryPtr newLongRange (const String &field, int32_t precisionStep, int64_t min, int64_t max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries a long range using the given precisionStep.
 
static NumericRangeQueryPtr newLongRange (const String &field, int64_t min, int64_t max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries a long range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).
 
static NumericRangeQueryPtr newIntRange (const String &field, int32_t precisionStep, int32_t min, int32_t max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries an int range using the given precisionStep.
 
static NumericRangeQueryPtr newIntRange (const String &field, int32_t min, int32_t max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries an int range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).
 
static NumericRangeQueryPtr newDoubleRange (const String &field, int32_t precisionStep, double min, double max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries a double range using the given precisionStep.
 
static NumericRangeQueryPtr newDoubleRange (const String &field, double min, double max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries a double range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).
 
static NumericRangeQueryPtr newNumericRange (const String &field, int32_t precisionStep, NumericValue min, NumericValue max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries an int, long or double range using the given precisionStep. You can create half-open ranges (which are in fact <= or >= queries) by setting the min or max value to VariantUtils::null(). With the inclusive flags set to false, the query matches all documents strictly between the bounds; with them set to true, documents on the boundaries are hits, too.
 
static NumericRangeQueryPtr newNumericRange (const String &field, NumericValue min, NumericValue max, bool minInclusive, bool maxInclusive)
 Factory that creates a NumericRangeQuery that queries an int, long or double range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4). You can create half-open ranges (which are in fact <= or >= queries) by setting the min or max value to VariantUtils::null(). With the inclusive flags set to false, the query matches all documents strictly between the bounds; with them set to true, documents on the boundaries are hits, too.
 
- Static Public Member Functions inherited from Lucene::MultiTermQuery
static String _getClassName ()
 
static RewriteMethodPtr CONSTANT_SCORE_FILTER_REWRITE ()
 A rewrite method that first creates a private Filter, by visiting each term in sequence and marking all docs for that term. Matching documents are assigned a constant score equal to the query's boost.
 
static RewriteMethodPtr SCORING_BOOLEAN_QUERY_REWRITE ()
 A rewrite method that first translates each term into a BooleanClause.Occur#SHOULD clause in a BooleanQuery, and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it's almost always better to use CONSTANT_SCORE_AUTO_REWRITE_DEFAULT instead.
 
static RewriteMethodPtr CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE ()
 Like SCORING_BOOLEAN_QUERY_REWRITE except scores are not computed. Instead, each matching document receives a constant score equal to the query's boost.
 
static RewriteMethodPtr CONSTANT_SCORE_AUTO_REWRITE_DEFAULT ()
 Read-only default instance of ConstantScoreAutoRewrite, with ConstantScoreAutoRewrite#setTermCountCutoff set to ConstantScoreAutoRewrite#DEFAULT_TERM_COUNT_CUTOFF and ConstantScoreAutoRewrite#setDocCountPercent set to ConstantScoreAutoRewrite#DEFAULT_DOC_COUNT_PERCENT. Note that you cannot alter the configuration of this instance; you'll need to create a private instance instead.
 
- Static Public Member Functions inherited from Lucene::Query
static String _getClassName ()
 
static QueryPtr mergeBooleanQueries (Collection< BooleanQueryPtr > queries)
 Merges the clauses of a set of BooleanQuery objects into a single BooleanQuery.
 

Data Fields

String field
 
int32_t precisionStep
 
int32_t valSize
 
NumericValue min
 
NumericValue max
 
bool minInclusive
 
bool maxInclusive
 

Protected Member Functions

virtual FilteredTermEnumPtr getEnum (const IndexReaderPtr &reader)
 Construct the enumeration to be used, expanding the pattern term.
 
- Protected Member Functions inherited from Lucene::MultiTermQuery
void incTotalNumberOfTerms (int32_t inc)
 
- Protected Member Functions inherited from Lucene::LuceneObject
 LuceneObject ()
 

Additional Inherited Members

- Protected Attributes inherited from Lucene::MultiTermQuery
RewriteMethodPtr rewriteMethod
 
int32_t numberOfTerms
 
- Protected Attributes inherited from Lucene::Query
double boost
 
- Protected Attributes inherited from Lucene::LuceneSync
SynchronizePtr objectLock
 
LuceneSignalPtr objectSignal
 

Detailed Description

A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.

You create a new NumericRangeQuery with the static factory methods, eg:

QueryPtr q = NumericRangeQuery::newDoubleRange(L"weight", 0.03, 0.10, true, true);

matches all documents whose double-valued "weight" field ranges from 0.03 to 0.10, inclusive.

The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.

You can optionally specify a precisionStep when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details.
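To make the disk-space side of this trade-off concrete, the following sketch counts how many trie terms a single indexed value produces, assuming one term per precision level (shifts 0, precisionStep, 2*precisionStep, ... below the value width). The helper name termsPerValue is hypothetical and not part of the Lucene++ API.

```cpp
// Hypothetical helper (not part of Lucene++): number of trie terms indexed per value,
// assuming one term for each shift 0, precisionStep, 2*precisionStep, ... < bitsPerValue.
// Precondition: precisionStep >= 1.
int termsPerValue(int bitsPerValue, int precisionStep) {
    int levels = 0;
    for (int shift = 0; shift < bitsPerValue; shift += precisionStep) {
        ++levels;  // one term per precision level
    }
    return levels;
}
```

Under this assumption a 64-bit value indexed with the default step of 4 yields 16 terms, while a step of 8 yields 8, which is why lower steps make the index larger.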

This query defaults to MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT for 32 bit integer ranges with precisionStep <=8 and 64 bit (long/double) ranges with precisionStep <=6. Otherwise it uses MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE as the number of terms is likely to be high. With precision steps of <=4, this query can be run with one of the BooleanQuery rewrite methods without changing BooleanQuery's default max clause count.
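The default rewrite selection described above can be restated as a small decision function. This is a hypothetical sketch that only encodes the thresholds given in the text; the real class makes this choice internally, and the strings returned here merely name the MultiTermQuery factory methods.

```cpp
#include <string>

// Hypothetical restatement of the default rewrite rule (not the Lucene++ source).
// valSize is the value width in bits: 32 for int, 64 for long/double.
std::string defaultRewriteFor(int valSize, int precisionStep) {
    const bool fewTermsExpected =
        (valSize == 32 && precisionStep <= 8) ||
        (valSize == 64 && precisionStep <= 6);
    // Few terms expected: let the auto rewrite choose between boolean and filter rewriting.
    // Many terms expected: go straight to the constant-score filter.
    return fewTermsExpected ? "CONSTANT_SCORE_AUTO_REWRITE_DEFAULT"
                            : "CONSTANT_SCORE_FILTER_REWRITE";
}
```

An explicit choice can still be made with setRewriteMethod(), for example by passing MultiTermQuery::CONSTANT_SCORE_FILTER_REWRITE().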

How it works

See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):

Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023

A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (eg., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like doubles, longs, and ints are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see NumericUtils). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
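The quote mentions converting doubles into a sortable representation. A common way to do this, and the approach Java Lucene's NumericUtils takes as far as we know, is to reinterpret the IEEE-754 bits as a signed 64-bit integer and flip all non-sign bits of negative values; flipping the sign bit afterwards makes unsigned (byte-wise) order agree as well. The sketch below is a standalone illustration assuming a 64-bit IEEE-754 double, not the Lucene++ implementation; the function names are hypothetical and NaN is not handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: maps a double to an int64_t whose signed order matches the
// numeric order of the doubles (NaN is not handled; -0.0 sorts just below +0.0).
int64_t doubleToSortable(double value) {
    int64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // raw IEEE-754 bit pattern
    if (bits < 0) {
        // Negative doubles: flip every bit except the sign bit, so a larger
        // magnitude becomes a smaller (more negative) integer.
        bits ^= INT64_C(0x7fffffffffffffff);
    }
    return bits;
}

// Flipping the sign bit turns signed order into unsigned order, which is what a
// byte-wise lexicographic comparison of a big-endian encoding would see.
uint64_t toUnsignedSortable(int64_t sortable) {
    return static_cast<uint64_t>(sortable) ^ (uint64_t{1} << 63);
}
```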

For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).
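The recursive splitting described in the quote can be sketched as follows. The function works on unsigned values that are already in sortable form and, at each precision level, peels off the partial blocks at the lower and upper ends before moving the remaining middle part to the next, coarser level. Its structure follows the split logic of Lucene's NumericUtils as we understand it, but SubRange, splitRange, termCount and coversExactly are hypothetical names, and this is an illustrative sketch rather than the library code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One emitted piece: every value in [lo, hi | lowMask(shift)] is matched by the
// prefix terms lo >> shift ... hi >> shift at precision level `shift`.
struct SubRange {
    uint64_t lo;
    uint64_t hi;
    int shift;
};

// Bits below the given shift (0 for shift == 0). Requires shift < 64.
inline uint64_t lowMask(int shift) { return (uint64_t{1} << shift) - 1; }

// Hypothetical sketch of trie range splitting over unsigned `bits`-wide values
// (1 <= precisionStep, bits <= 64, minBound <= maxBound < 2^bits).
std::vector<SubRange> splitRange(uint64_t minBound, uint64_t maxBound,
                                 int precisionStep, int bits) {
    std::vector<SubRange> out;
    if (minBound > maxBound) return out;
    for (int shift = 0;; shift += precisionStep) {
        if (shift + precisionStep >= bits) {
            // Coarsest level reached: emit whatever is left as one prefix range.
            out.push_back({minBound, maxBound, shift});
            break;
        }
        const uint64_t mask = ((uint64_t{1} << precisionStep) - 1) << shift;  // bits of this level
        const uint64_t diff = uint64_t{1} << (shift + precisionStep);         // one next-level block
        const bool hasLower = (minBound & mask) != 0;     // min not aligned to a next-level block
        const bool hasUpper = (maxBound & mask) != mask;  // max does not end a next-level block
        const uint64_t nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
        const uint64_t nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
        const bool lowerWrapped = nextMin < minBound;
        const bool upperWrapped = nextMax > maxBound;
        if (nextMin > nextMax || lowerWrapped || upperWrapped) {
            // No complete next-level block in between: finish at this level.
            out.push_back({minBound, maxBound, shift});
            break;
        }
        if (hasLower) out.push_back({minBound, minBound | mask, shift});   // partial lower block
        if (hasUpper) out.push_back({maxBound & ~mask, maxBound, shift});  // partial upper block
        minBound = nextMin;  // the middle part continues at the coarser level
        maxBound = nextMax;
    }
    return out;
}

// Number of distinct prefix terms the pieces correspond to.
uint64_t termCount(const std::vector<SubRange>& parts) {
    uint64_t n = 0;
    for (const SubRange& p : parts) n += (p.hi >> p.shift) - (p.lo >> p.shift) + 1;
    return n;
}

// Sanity check: the pieces tile [minBound, maxBound] exactly, without gaps or overlaps.
bool coversExactly(std::vector<SubRange> parts, uint64_t minBound, uint64_t maxBound) {
    if (parts.empty()) return false;
    std::sort(parts.begin(), parts.end(),
              [](const SubRange& a, const SubRange& b) { return a.lo < b.lo; });
    if (parts.front().lo != minBound) return false;
    uint64_t top = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0 && parts[i].lo != top + 1) return false;
        top = parts[i].hi | lowMask(parts[i].shift);
    }
    return top == maxBound;
}
```

For example, splitting [3, 200] of an 8-bit value with a step of 4 yields the full-precision pieces [3, 15] and [192, 200] plus the coarse prefixes 1 to 11 (covering 16 to 191): 33 terms instead of 198.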

Precision Step: You can choose any precisionStep when encoding values. Lower step values mean more precisions and therefore more terms in the index (so the index gets larger). On the other hand, the maximum number of terms to match decreases, which improves query speed. The formula to calculate the maximum term count is:

n = [ (bitsPerValue/precisionStep - 1) * (2 ^ precisionStep - 1 ) * 2 ] + (2 ^ precisionStep - 1 )

(This formula is only correct when bitsPerValue/precisionStep is an integer; otherwise, the quotient must be rounded up and the last summand must use the remainder of the division as its precision step.) For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision step of 2, n = 31*3*2 + 3 = 189. However, the gain in search speed is partly offset by more seeking in the term enumeration of the index, so the ideal precisionStep value can only be found by testing. Important: you can index with a lower precision step value and test search speed using a multiple of the original step value.
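A direct transcription of the formula, restricted to the case where bitsPerValue is a multiple of precisionStep, reproduces the figures above as well as the 3825 terms quoted for 8-bit precision steps on 64-bit values. The function name maxRangeTerms is hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: maximum number of terms a range query may visit,
//   n = (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1),
// valid only when bitsPerValue is a multiple of precisionStep.
int64_t maxRangeTerms(int bitsPerValue, int precisionStep) {
    if (precisionStep < 1 || precisionStep > 62 || bitsPerValue % precisionStep != 0) {
        throw std::invalid_argument("bitsPerValue must be a multiple of precisionStep (1..62)");
    }
    const int64_t perLevel = (int64_t{1} << precisionStep) - 1;  // 2^precisionStep - 1
    const int64_t levels = bitsPerValue / precisionStep;         // number of precisions
    return (levels - 1) * perLevel * 2 + perLevel;
}
```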

Good values for precisionStep depend on usage and data type:

Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that TermRangeQuery in boolean rewrite mode (with a raised BooleanQuery clause count) took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs, and executing this class took <100ms (on an Opteron64 machine, 8 bit precision step). This query type was developed for a geographic portal, where performance for queries such as bounding boxes or exact date/time stamps is important.

Constructor & Destructor Documentation

◆ NumericRangeQuery()

Lucene::NumericRangeQuery::NumericRangeQuery ( const String &  field,
int32_t  precisionStep,
int32_t  valSize,
NumericValue  min,
NumericValue  max,
bool  minInclusive,
bool  maxInclusive 
)

◆ ~NumericRangeQuery()

virtual Lucene::NumericRangeQuery::~NumericRangeQuery ( )
virtual

Member Function Documentation

◆ _getClassName()

static String Lucene::NumericRangeQuery::_getClassName ( )
inline static

◆ clone()

virtual LuceneObjectPtr Lucene::NumericRangeQuery::clone ( const LuceneObjectPtr &  other = LuceneObjectPtr())
virtual

Returns a clone of this query.

Reimplemented from Lucene::MultiTermQuery.

◆ equals()

virtual bool Lucene::NumericRangeQuery::equals ( const LuceneObjectPtr &  other)
virtual

Return whether two objects are equal.

Reimplemented from Lucene::MultiTermQuery.

◆ getClassName()

virtual String Lucene::NumericRangeQuery::getClassName ( )
inline virtual

Reimplemented from Lucene::MultiTermQuery.

◆ getEnum()

virtual FilteredTermEnumPtr Lucene::NumericRangeQuery::getEnum ( const IndexReaderPtr &  reader)
protected virtual

Construct the enumeration to be used, expanding the pattern term.

Implements Lucene::MultiTermQuery.

◆ getField()

String Lucene::NumericRangeQuery::getField ( )

Returns the field name for this query.

◆ getMax()

NumericValue Lucene::NumericRangeQuery::getMax ( )

Returns the upper value of this range query.

◆ getMin()

NumericValue Lucene::NumericRangeQuery::getMin ( )

Returns the lower value of this range query.

◆ hashCode()

virtual int32_t Lucene::NumericRangeQuery::hashCode ( )
virtual

Return hash code for this object.

Reimplemented from Lucene::MultiTermQuery.

◆ includesMax()

bool Lucene::NumericRangeQuery::includesMax ( )

Returns true if the upper endpoint is inclusive.

◆ includesMin()

bool Lucene::NumericRangeQuery::includesMin ( )

Returns true if the lower endpoint is inclusive.

◆ newDoubleRange() [1/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newDoubleRange ( const String &  field,
double  min,
double  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries a double range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).

◆ newDoubleRange() [2/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newDoubleRange ( const String &  field,
int32_t  precisionStep,
double  min,
double  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries a double range using the given precisionStep.

◆ newIntRange() [1/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newIntRange ( const String &  field,
int32_t  min,
int32_t  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries an int range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).

◆ newIntRange() [2/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newIntRange ( const String &  field,
int32_t  precisionStep,
int32_t  min,
int32_t  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries an int range using the given precisionStep.

◆ newLongRange() [1/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newLongRange ( const String &  field,
int32_t  precisionStep,
int64_t  min,
int64_t  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries a long range using the given precisionStep.

◆ newLongRange() [2/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newLongRange ( const String &  field,
int64_t  min,
int64_t  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries a long range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4).

◆ newNumericRange() [1/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newNumericRange ( const String &  field,
int32_t  precisionStep,
NumericValue  min,
NumericValue  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries an int, long or double range using the given precisionStep. You can create half-open ranges (which are in fact <= or >= queries) by setting the min or max value to VariantUtils::null(). With the inclusive flags set to false, the query matches all documents strictly between the bounds; with them set to true, documents on the boundaries are hits, too.

◆ newNumericRange() [2/2]

static NumericRangeQueryPtr Lucene::NumericRangeQuery::newNumericRange ( const String &  field,
NumericValue  min,
NumericValue  max,
bool  minInclusive,
bool  maxInclusive 
)
static

Factory that creates a NumericRangeQuery that queries an int, long or double range using the default precisionStep NumericUtils#PRECISION_STEP_DEFAULT (4). You can create half-open ranges (which are in fact <= or >= queries) by setting the min or max value to VariantUtils::null(). With the inclusive flags set to false, the query matches all documents strictly between the bounds; with them set to true, documents on the boundaries are hits, too.
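The following sketch models these bound rules in isolation (C++17). An empty std::optional stands in for a VariantUtils::null() bound, and in this model the inclusive flag of an open side is simply ignored; the two flags decide whether values equal to a bound match. RangeSpec and matches are hypothetical names used only for illustration and do not touch the index.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of the bound rules; an empty optional plays the role of a
// VariantUtils::null() bound, and the inclusive flag of an open side is ignored.
struct RangeSpec {
    std::optional<int64_t> min;
    std::optional<int64_t> max;
    bool minInclusive;
    bool maxInclusive;
};

bool matches(const RangeSpec& r, int64_t value) {
    if (r.min) {
        const bool belowMin = r.minInclusive ? value < *r.min : value <= *r.min;
        if (belowMin) return false;
    }
    if (r.max) {
        const bool aboveMax = r.maxInclusive ? value > *r.max : value >= *r.max;
        if (aboveMax) return false;
    }
    return true;  // open sides accept everything
}
```

For instance, RangeSpec{std::nullopt, 10, true, true} behaves like a "<= 10" query, while RangeSpec{5, 10, false, false} excludes both 5 and 10.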

◆ shared_from_this()

boost::shared_ptr< NumericRangeQuery > Lucene::NumericRangeQuery::shared_from_this ( )
inline

◆ toString()

virtual String Lucene::NumericRangeQuery::toString ( const String &  field)
virtual

Prints a query to a string, with field assumed to be the default field and omitted.

The representation used is one that is supposed to be readable by QueryParser. However, there are the following limitations:

If the query was created by the parser, the printed representation may not be exactly what was parsed. For example, characters that need to be escaped will be represented without the required backslash.

Some of the more complicated queries (e.g. span queries) don't have a representation that can be parsed by QueryParser.

Reimplemented from Lucene::Query.
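As a rough illustration of the field-omission rule, the sketch below formats a range in the bracket notation used by Java Lucene's NumericRangeQuery.toString ('[' and ']' for inclusive bounds, '{' and '}' for exclusive ones, '*' for an open bound). Whether Lucene++ produces exactly this text, including any boost suffix (omitted here), is an assumption; rangeToString and its parameters are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical formatter modelled on Java Lucene's convention; not the Lucene++ source.
// The field prefix is printed only when it differs from the default field.
std::string rangeToString(const std::string& queryField, const std::string& defaultField,
                          std::optional<int64_t> min, std::optional<int64_t> max,
                          bool minInclusive, bool maxInclusive) {
    std::string s;
    if (queryField != defaultField) s += queryField + ":";
    s += minInclusive ? "[" : "{";
    s += min ? std::to_string(*min) : "*";  // '*' marks an open lower bound
    s += " TO ";
    s += max ? std::to_string(*max) : "*";  // '*' marks an open upper bound
    s += maxInclusive ? "]" : "}";
    return s;
}
```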

Field Documentation

◆ field

String Lucene::NumericRangeQuery::field

◆ max

NumericValue Lucene::NumericRangeQuery::max

◆ maxInclusive

bool Lucene::NumericRangeQuery::maxInclusive

◆ min

NumericValue Lucene::NumericRangeQuery::min

◆ minInclusive

bool Lucene::NumericRangeQuery::minInclusive

◆ precisionStep

int32_t Lucene::NumericRangeQuery::precisionStep

◆ valSize

int32_t Lucene::NumericRangeQuery::valSize

The documentation for this class was generated from the following file:

clucene.sourceforge.net