Lucene++ - a full-featured, c++ search engine
API Documentation
IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.
#include <IndexReader.h>
Public Types | |
enum | FieldOption { FIELD_OPTION_ALL , FIELD_OPTION_INDEXED , FIELD_OPTION_STORES_PAYLOADS , FIELD_OPTION_OMIT_TERM_FREQ_AND_POSITIONS , FIELD_OPTION_UNINDEXED , FIELD_OPTION_INDEXED_WITH_TERMVECTOR , FIELD_OPTION_INDEXED_NO_TERMVECTOR , FIELD_OPTION_TERMVECTOR , FIELD_OPTION_TERMVECTOR_WITH_POSITION , FIELD_OPTION_TERMVECTOR_WITH_OFFSET , FIELD_OPTION_TERMVECTOR_WITH_POSITION_OFFSET } |
Constants describing field properties, used for example by IndexReader#getFieldNames(FieldOption). | |
Public Member Functions | |
IndexReader () | |
virtual | ~IndexReader () |
virtual String | getClassName () |
boost::shared_ptr< IndexReader > | shared_from_this () |
int32_t | getRefCount () |
Returns the current refCount for this reader. | |
void | incRef () |
Increments the refCount of this IndexReader instance. RefCounts are used to determine when a reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef , in a finally clause; otherwise the reader may never be closed. Note that close simply calls decRef(), which means that the IndexReader will not really be closed until decRef has been called for all outstanding references. | |
void | decRef () |
Decreases the refCount of this IndexReader instance. If the refCount drops to 0, then pending changes (if any) are committed to the index and this reader is closed. | |
virtual IndexReaderPtr | reopen () |
Refreshes an IndexReader if the index has changed since this instance was (re)opened. | |
virtual IndexReaderPtr | reopen (bool openReadOnly) |
Just like reopen() , except you can change the readOnly of the original reader. If the index is unchanged but readOnly is different then a new reader will be returned. | |
virtual IndexReaderPtr | reopen (const IndexCommitPtr &commit) |
Reopen this reader on a specific commit point. This always returns a readOnly reader. If the specified commit point matches what this reader is already on, and this reader is already readOnly, then this same instance is returned; if it is not already readOnly, a readOnly clone is returned. | |
virtual LuceneObjectPtr | clone (const LuceneObjectPtr &other=LuceneObjectPtr()) |
Efficiently clones the IndexReader (sharing most internal state). | |
virtual LuceneObjectPtr | clone (bool openReadOnly, const LuceneObjectPtr &other=LuceneObjectPtr()) |
Clones the IndexReader and optionally changes readOnly. A readOnly reader cannot open a writable reader. | |
virtual DirectoryPtr | directory () |
Returns the directory associated with this index. The default implementation returns the directory specified by subclasses when delegating to the IndexReader(Directory) constructor, or throws an UnsupportedOperation exception if one was not specified. | |
virtual int64_t | getVersion () |
Version number when this IndexReader was opened. Not implemented in the IndexReader base class. | |
virtual MapStringString | getCommitUserData () |
Retrieve the String userData optionally passed to IndexWriter::commit. This will return null if IndexWriter#commit(MapStringString) has never been called for this index. | |
virtual bool | isCurrent () |
Check whether any new changes have occurred to the index since this reader was opened. | |
virtual bool | isOptimized () |
Checks if the index is optimized (i.e., it has a single segment and no deletions). Not implemented in the IndexReader base class. | |
virtual Collection< TermFreqVectorPtr > | getTermFreqVectors (int32_t docNumber)=0 |
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionVector if positions or offsets have been stored. | |
virtual TermFreqVectorPtr | getTermFreqVector (int32_t docNumber, const String &field)=0 |
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If termvectors had been stored with positions or offsets, a TermPositionVector is returned. | |
virtual void | getTermFreqVector (int32_t docNumber, const String &field, const TermVectorMapperPtr &mapper)=0 |
Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of the TermFreqVector . | |
virtual void | getTermFreqVector (int32_t docNumber, const TermVectorMapperPtr &mapper)=0 |
Map all the term vectors for all fields in a Document. | |
virtual int32_t | numDocs ()=0 |
Returns the number of documents in this index. | |
virtual int32_t | maxDoc ()=0 |
Returns one greater than the largest possible document number. This may be used, e.g., to determine how big an array to allocate so that it has an element for every document number in an index. | |
int32_t | numDeletedDocs () |
Returns the number of deleted documents. | |
virtual DocumentPtr | document (int32_t n) |
Returns the stored fields of the n'th Document in this index. | |
virtual DocumentPtr | document (int32_t n, const FieldSelectorPtr &fieldSelector)=0 |
Get the Document at the n'th position. The FieldSelector may be used to determine what Field s to load and how they should be loaded. NOTE: If this Reader (more specifically, the underlying FieldsReader) is closed before the lazy Field is loaded an exception may be thrown. If you want the value of a lazy Field to be available after closing you must explicitly load it or fetch the Document again with a new loader. | |
virtual bool | isDeleted (int32_t n)=0 |
Returns true if document n has been deleted. | |
virtual bool | hasDeletions ()=0 |
Returns true if any documents have been deleted. | |
virtual bool | hasChanges () |
Used for testing. | |
virtual bool | hasNorms (const String &field) |
Returns true if there are norms stored for this field. | |
virtual ByteArray | norms (const String &field)=0 |
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents. | |
virtual void | norms (const String &field, ByteArray norms, int32_t offset)=0 |
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents. | |
virtual void | setNorm (int32_t doc, const String &field, uint8_t value) |
Resets the normalization factor for the named field of the named document. The norm represents the product of the field's boost and its length normalization . Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old. | |
virtual void | setNorm (int32_t doc, const String &field, double value) |
Resets the normalization factor for the named field of the named document. | |
virtual TermEnumPtr | terms ()=0 |
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term::compareTo(). Each term is greater than all that precede it in the enumeration. Note that after calling terms(), TermEnum#next() must be called on the resulting enumeration before calling other methods such as TermEnum#term() . | |
virtual TermEnumPtr | terms (const TermPtr &t)=0 |
Returns an enumeration of all terms starting at a given term. If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term. The enumeration is ordered by Term::compareTo(). Each term is greater than all that precede it in the enumeration. | |
virtual int32_t | docFreq (const TermPtr &t)=0 |
Returns the number of documents containing the term t. | |
virtual TermDocsPtr | termDocs (const TermPtr &term) |
Returns an enumeration of all the documents which contain term. For each document, the document number and the frequency of the term in that document are provided, for use in search scoring. If term is null, then all non-deleted docs are returned with freq=1. The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration. | |
virtual TermDocsPtr | termDocs ()=0 |
Returns an unpositioned TermDocs enumerator. | |
virtual TermPositionsPtr | termPositions (const TermPtr &term) |
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. This positional information facilitates phrase and proximity searching. The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration. | |
virtual TermPositionsPtr | termPositions ()=0 |
Returns an unpositioned TermPositions enumerator. | |
virtual void | deleteDocument (int32_t docNum) |
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the document method will result in an error. The presence of this document may still be reflected in the docFreq statistic, though this will be corrected eventually as the index is further modified. | |
virtual int32_t | deleteDocuments (const TermPtr &term) |
Deletes all documents that have a given term indexed. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See deleteDocument(int) for information about when this deletion will become effective. | |
virtual void | undeleteAll () |
Undeletes all documents currently marked as deleted in this index. | |
void | flush () |
void | flush (MapStringString commitUserData) |
void | commit (MapStringString commitUserData) |
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics). | |
void | close () |
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called. | |
virtual HashSet< String > | getFieldNames (FieldOption fieldOption)=0 |
Get a list of unique field names that exist in this index and have the specified field option information. | |
virtual IndexCommitPtr | getIndexCommit () |
Return the IndexCommit that this reader has opened. This method is only implemented by those readers that correspond to a Directory with its own segments_N file. | |
virtual Collection< IndexReaderPtr > | getSequentialSubReaders () |
Returns the sequential sub readers that this reader is logically composed of. For example, IndexSearcher uses this API to drive searching by one sub reader at a time. If this reader is not composed of sequential child readers, it should return null. If this method returns an empty array, that means this reader is a null reader (for example a MultiReader that has no sub readers). | |
virtual LuceneObjectPtr | getFieldCacheKey () |
virtual LuceneObjectPtr | getDeletesCacheKey () |
This returns null if the reader has no deletions. | |
virtual int64_t | getUniqueTermCount () |
Returns the number of unique terms (across all fields) in this reader. | |
virtual int32_t | getTermInfosIndexDivisor () |
For IndexReader implementations that use TermInfosReader to read terms, this returns the current indexDivisor as specified when the reader was opened. | |
Public Member Functions inherited from Lucene::LuceneObject | |
virtual | ~LuceneObject () |
virtual void | initialize () |
Called directly after instantiation to create objects that depend on this object being fully constructed. | |
virtual int32_t | hashCode () |
Return hash code for this object. | |
virtual bool | equals (const LuceneObjectPtr &other) |
Return whether two objects are equal. | |
virtual int32_t | compareTo (const LuceneObjectPtr &other) |
Compare two objects. | |
virtual String | toString () |
Returns a string representation of the object. | |
Public Member Functions inherited from Lucene::LuceneSync | |
virtual | ~LuceneSync () |
virtual SynchronizePtr | getSync () |
Return this object synchronize lock. | |
virtual LuceneSignalPtr | getSignal () |
Return this object signal. | |
virtual void | lock (int32_t timeout=0) |
Lock this object using an optional timeout. | |
virtual void | unlock () |
Unlock this object. | |
virtual bool | holdsLock () |
Returns true if this object is currently locked by current thread. | |
virtual void | wait (int32_t timeout=0) |
Wait for signal using an optional timeout. | |
virtual void | notifyAll () |
Notify all threads waiting for signal. | |
Static Public Member Functions | |
static String | _getClassName () |
static IndexReaderPtr | open (const DirectoryPtr &directory) |
Returns an IndexReader reading the index in the given Directory, with readOnly = true. | |
static IndexReaderPtr | open (const DirectoryPtr &directory, bool readOnly) |
Returns an IndexReader reading the index in the given Directory. You should pass readOnly = true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static IndexReaderPtr | open (const IndexCommitPtr &commit, bool readOnly) |
Returns an IndexReader reading the index in the given IndexCommit . You should pass readOnly = true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static IndexReaderPtr | open (const DirectoryPtr &directory, const IndexDeletionPolicyPtr &deletionPolicy, bool readOnly) |
Returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy . You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static IndexReaderPtr | open (const DirectoryPtr &directory, const IndexDeletionPolicyPtr &deletionPolicy, bool readOnly, int32_t termInfosIndexDivisor) |
Returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy . You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static IndexReaderPtr | open (const IndexCommitPtr &commit, const IndexDeletionPolicyPtr &deletionPolicy, bool readOnly) |
Returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy . You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static IndexReaderPtr | open (const IndexCommitPtr &commit, const IndexDeletionPolicyPtr &deletionPolicy, bool readOnly, int32_t termInfosIndexDivisor) |
Returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy . You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader. | |
static int64_t | lastModified (const DirectoryPtr &directory2) |
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date, use isCurrent() instead. | |
static int64_t | getCurrentVersion (const DirectoryPtr &directory) |
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index. | |
static MapStringString | getCommitUserData (const DirectoryPtr &directory) |
Reads commitUserData, previously passed to IndexWriter#commit(MapStringString) , from current index segments file. This will return null if IndexWriter#commit(MapStringString) has never been called for this index. | |
static bool | indexExists (const DirectoryPtr &directory) |
Returns true if an index exists at the specified directory. Returns false if the directory does not exist or if there is no index in it. | |
static void | main (Collection< String > args) |
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored. | |
static Collection< IndexCommitPtr > | listCommits (const DirectoryPtr &dir) |
Returns all commit points that exist in the Directory. Normally, because the default is KeepOnlyLastCommitDeletionPolicy, there would be only one commit point. But if you're using a custom IndexDeletionPolicy then there could be many commits. Once you have a given commit, you can open a reader on it by calling IndexReader#open(IndexCommit,bool). There must be at least one commit in the Directory, else this method throws an exception. Note that if a commit is in progress while this method is running, that commit may or may not be included in the returned array. | |
Static Public Attributes | |
static const int32_t | DEFAULT_TERMS_INDEX_DIVISOR |
Protected Member Functions | |
void | ensureOpen () |
virtual void | doSetNorm (int32_t doc, const String &field, uint8_t value)=0 |
Implements setNorm in subclass. | |
virtual void | doDelete (int32_t docNum)=0 |
Implements deletion of the document numbered docNum. Applications should call deleteDocument(int) or deleteDocuments(Term) . | |
virtual void | doUndeleteAll ()=0 |
Implements actual undeleteAll() in subclass. | |
virtual void | acquireWriteLock () |
Does nothing by default. Subclasses that require a write lock for index modifications must implement this method. | |
void | commit () |
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics). | |
virtual void | doCommit (MapStringString commitUserData)=0 |
Implements commit. | |
virtual void | doClose ()=0 |
Implements close. | |
Protected Member Functions inherited from Lucene::LuceneObject | |
LuceneObject () | |
Static Protected Member Functions | |
static IndexReaderPtr | open (const DirectoryPtr &directory, const IndexDeletionPolicyPtr &deletionPolicy, const IndexCommitPtr &commit, bool readOnly, int32_t termInfosIndexDivisor) |
Protected Attributes | |
bool | closed |
bool | _hasChanges |
int32_t | refCount |
Protected Attributes inherited from Lucene::LuceneSync | |
SynchronizePtr | objectLock |
LuceneSignalPtr | objectSignal |
IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.
Concrete subclasses of IndexReader are usually constructed with a call to one of the static open methods, e.g. open(DirectoryPtr, bool).
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral: they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
An IndexReader can be opened on a directory for which an IndexWriter is opened already, but it cannot be used to delete documents from the index then.
NOTE: for backwards API compatibility, several methods are not listed as abstract, but have no useful implementations in this base class and instead always throw an UnsupportedOperation exception. Subclasses are strongly encouraged to override these methods, but in many cases may not need to.
NOTE: as of 2.4, it's possible to open a read-only IndexReader using the static open methods that accept the bool readOnly parameter. Such a reader has better concurrency as it's not necessary to synchronize on the isDeleted method. You must specify false if you want to make changes with the resulting IndexReader.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
Constants describing field properties, used for example by IndexReader#getFieldNames(FieldOption).
Lucene::IndexReader::IndexReader | ( | ) |
virtual Lucene::IndexReader::~IndexReader | ( | ) |
virtual |
static String Lucene::IndexReader::_getClassName | ( | ) |
inline static
virtual void Lucene::IndexReader::acquireWriteLock | ( | ) |
protected virtual
Does nothing by default. Subclasses that require a write lock for index modifications must implement this method.
Reimplemented in Lucene::DirectoryReader, Lucene::ReadOnlyDirectoryReader, and Lucene::ReadOnlySegmentReader.
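virtual LuceneObjectPtr Lucene::IndexReader::clone | ( | bool openReadOnly, const LuceneObjectPtr &other=LuceneObjectPtr() | ) |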
virtual |
Clones the IndexReader and optionally changes readOnly. A readOnly reader cannot open a writable reader.
Reimplemented in Lucene::DirectoryReader, and Lucene::SegmentReader.
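virtual LuceneObjectPtr Lucene::IndexReader::clone | ( | const LuceneObjectPtr &other=LuceneObjectPtr() | ) |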
virtual |
Efficiently clones the IndexReader (sharing most internal state).
On cloning a reader with pending changes (deletions, norms), the original reader transfers its write lock to the cloned reader. This means only the cloned reader may make further changes to the index, and commit the changes to the index on close, but the old reader still reflects all changes made up until it was cloned.
Like reopen(), it's safe to make changes to either the original or the cloned reader: all shared mutable state obeys "copy on write" semantics to ensure the changes are not seen by other readers.
Reimplemented from Lucene::LuceneObject.
Reimplemented in Lucene::DirectoryReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
void Lucene::IndexReader::close | ( | ) |
Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called.
void Lucene::IndexReader::commit | ( | ) |
protected |
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).
void Lucene::IndexReader::commit | ( | MapStringString | commitUserData | ) |
Commit changes resulting from delete, undeleteAll, or setNorm operations. If an exception is hit, then either no changes or all changes will have been committed to the index (transactional semantics).
void Lucene::IndexReader::decRef | ( | ) |
Decreases the refCount of this IndexReader instance. If the refCount drops to 0, then pending changes (if any) are committed to the index and this reader is closed.
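The incRef/decRef pairing is described elsewhere on this page in terms of a "finally clause", which C++ does not have; a scope guard is the usual substitute. The following is a minimal sketch under stated assumptions, not part of Lucene++: `ReaderRefGuard` and `FakeReader` are hypothetical names, and the stub only mimics the counting behavior described here (including an assumed starting count of one) so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical helper, not part of Lucene++: calls incRef() on construction
// and the matching decRef() when the guard leaves scope.
template <typename ReaderPtr>
class ReaderRefGuard {
public:
    explicit ReaderRefGuard(const ReaderPtr& reader) : reader_(reader) {
        reader_->incRef();
    }
    ~ReaderRefGuard() {
        reader_->decRef();
    }
    ReaderRefGuard(const ReaderRefGuard&) = delete;
    ReaderRefGuard& operator=(const ReaderRefGuard&) = delete;

private:
    ReaderPtr reader_;
};

// Stand-in reader that only models the reference counting described above.
struct FakeReader {
    int32_t refCount = 1;  // assumption: an open reader starts with one reference
    bool closed = false;

    void incRef() {
        assert(refCount > 0);
        ++refCount;
    }
    void decRef() {
        assert(refCount > 0);
        if (--refCount == 0) {
            closed = true;  // a real reader would commit pending changes here
        }
    }
    int32_t getRefCount() const { return refCount; }
};
```

Because decRef may commit pending changes when the count reaches zero, and a commit can fail, a destructor-based guard like this is probably best suited to readers without pending changes.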
virtual void Lucene::IndexReader::deleteDocument | ( | int32_t | docNum | ) |
virtual
Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the document method will result in an error. The presence of this document may still be reflected in the docFreq statistic, though this will be corrected eventually as the index is further modified.
virtual int32_t Lucene::IndexReader::deleteDocuments | ( | const TermPtr & | term | ) |
virtual
Deletes all documents that have a given term indexed. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See deleteDocument(int) for information about when this deletion will become effective.
virtual DirectoryPtr Lucene::IndexReader::directory | ( | ) |
virtual |
Returns the directory associated with this index. The default implementation returns the directory specified by subclasses when delegating to the IndexReader(Directory) constructor, or throws an UnsupportedOperation exception if one was not specified.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, and Lucene::SegmentReader.
virtual int32_t Lucene::IndexReader::docFreq | ( | const TermPtr & | t | ) |
pure virtual |
Returns the number of documents containing the term t.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual void Lucene::IndexReader::doClose | ( | ) |
protected pure virtual
Implements close.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual void Lucene::IndexReader::doCommit | ( | MapStringString | commitUserData | ) |
protected pure virtual
Implements commit.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual DocumentPtr Lucene::IndexReader::document | ( | int32_t | n | ) |
virtual
Returns the stored fields of the n'th Document in this index.
NOTE: for performance reasons, this method does not check whether the requested document is deleted, so asking for a deleted document may yield unspecified results. Such a check is usually not required; however, you can call isDeleted(int) with the requested document ID to verify that the document is not deleted.
Reimplemented in Lucene::SegmentReader.
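Since document() does not check for deletions, a full scan typically walks the document numbers from 0 up to maxDoc() and skips the deleted ones itself. The sketch below illustrates that loop; `forEachLiveDocument` and `StubReader` are hypothetical names that are not part of Lucene++. The function is a template over any type offering maxDoc(), isDeleted() and document(), so it is assumed (not verified here) that a dereferenced IndexReaderPtr could be passed as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: visits every non-deleted document number in
// [0, maxDoc()) and returns how many documents were visited.
template <typename Reader, typename Visitor>
int32_t forEachLiveDocument(Reader& reader, Visitor visit) {
    int32_t visited = 0;
    const int32_t limit = reader.maxDoc();
    for (int32_t n = 0; n < limit; ++n) {
        if (reader.isDeleted(n)) {
            continue;  // document() itself would not detect this
        }
        visit(n, reader.document(n));
        ++visited;
    }
    return visited;
}

// Stand-in reader used only to make the sketch self-contained.
struct StubReader {
    std::vector<std::string> docs;
    std::vector<bool> deleted;

    int32_t maxDoc() const { return static_cast<int32_t>(docs.size()); }
    bool isDeleted(int32_t n) const {
        assert(n >= 0 && static_cast<std::size_t>(n) < deleted.size());
        return deleted[n];
    }
    const std::string& document(int32_t n) const { return docs[n]; }
};
```

The returned count should match numDocs() for a consistent reader, which can serve as a cheap sanity check.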
virtual DocumentPtr Lucene::IndexReader::document | ( | int32_t n, const FieldSelectorPtr &fieldSelector | ) |
pure virtual
Get the Document at the n'th position. The FieldSelector may be used to determine what Fields to load and how they should be loaded. NOTE: If this Reader (more specifically, the underlying FieldsReader) is closed before the lazy Field is loaded, an exception may be thrown. If you want the value of a lazy Field to be available after closing, you must explicitly load it or fetch the Document again with a new loader.
NOTE: for performance reasons, this method does not check whether the requested document is deleted, so asking for a deleted document may yield unspecified results. Such a check is usually not required; however, you can call isDeleted(int32_t) with the requested document ID to verify that the document is not deleted.
n | Get the document at the n'th position |
fieldSelector | The FieldSelector to use to determine what Fields should be loaded on the Document. May be null, in which case all Fields will be loaded. |
Returns the Document at the n'th position.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual void Lucene::IndexReader::doDelete | ( | int32_t | docNum | ) |
protected pure virtual
Implements deletion of the document numbered docNum. Applications should call deleteDocument(int) or deleteDocuments(Term).
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
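virtual void Lucene::IndexReader::doSetNorm | ( | int32_t doc, const String &field, uint8_t value | ) |
protected pure virtual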
Implements setNorm in subclass.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual void Lucene::IndexReader::doUndeleteAll | ( | ) |
protected pure virtual
Implements actual undeleteAll() in subclass.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
void Lucene::IndexReader::ensureOpen | ( | ) |
protected |
void Lucene::IndexReader::flush | ( | ) |
void Lucene::IndexReader::flush | ( | MapStringString | commitUserData | ) |
commitUserData | Opaque Map (String -> String) that's recorded into the segments file in the index, and retrievable by IndexReader#getCommitUserData . |
virtual String Lucene::IndexReader::getClassName | ( | ) |
inline virtual
virtual MapStringString Lucene::IndexReader::getCommitUserData | ( | ) |
virtual
Retrieve the String userData optionally passed to IndexWriter::commit. This will return null if IndexWriter#commit(MapStringString) has never been called for this index.
Reimplemented in Lucene::DirectoryReader.
static MapStringString Lucene::IndexReader::getCommitUserData | ( | const DirectoryPtr & | directory | ) |
static
Reads commitUserData, previously passed to IndexWriter#commit(MapStringString), from current index segments file. This will return null if IndexWriter#commit(MapStringString) has never been called for this index.
static int64_t Lucene::IndexReader::getCurrentVersion | ( | const DirectoryPtr & | directory | ) |
static |
Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index.
directory | where the index resides. |
virtual LuceneObjectPtr Lucene::IndexReader::getDeletesCacheKey | ( | ) |
virtual |
This returns null if the reader has no deletions.
Reimplemented in Lucene::FilterIndexReader, and Lucene::SegmentReader.
virtual LuceneObjectPtr Lucene::IndexReader::getFieldCacheKey | ( | ) |
virtual |
Reimplemented in Lucene::FilterIndexReader, and Lucene::SegmentReader.
virtual HashSet< String > Lucene::IndexReader::getFieldNames | ( | FieldOption | fieldOption | ) |
pure virtual |
Get a list of unique field names that exist in this index and have the specified field option information.
fieldOption | specifies which field option should be available for the returned fields |
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual IndexCommitPtr Lucene::IndexReader::getIndexCommit | ( | ) |
virtual |
Return the IndexCommit that this reader has opened. This method is only implemented by those readers that correspond to a Directory with its own segments_N file.
Reimplemented in Lucene::DirectoryReader.
int32_t Lucene::IndexReader::getRefCount | ( | ) |
Returns the current refCount for this reader.
virtual Collection< IndexReaderPtr > Lucene::IndexReader::getSequentialSubReaders | ( | ) |
virtual
Returns the sequential sub readers that this reader is logically composed of. For example, IndexSearcher uses this API to drive searching by one sub reader at a time. If this reader is not composed of sequential child readers, it should return null. If this method returns an empty array, that means this reader is a null reader (for example a MultiReader that has no sub readers).
NOTE: You should not try using sub-readers returned by this method to make any changes (setNorm, deleteDocument, etc.). While this might succeed for one composite reader (like MultiReader), it will most likely lead to index corruption for other readers (like a DirectoryReader obtained through open). Use the parent reader directly.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, and Lucene::MultiReader.
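The null versus empty distinction above matters when flattening a composite reader into its leaves: a null result marks a leaf, while an empty array contributes nothing. The sketch below shows one way to express that recursion. `StubReader`, `SubReaders` and `gatherLeaves` are hypothetical names, and a null `std::shared_ptr` stands in for a null collection; none of this is Lucene++ code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct StubReader;
typedef std::shared_ptr<StubReader> StubReaderPtr;
// A null pointer models "returns null"; an empty vector models an empty array.
typedef std::shared_ptr<std::vector<StubReaderPtr> > SubReaders;

// Stand-in reader that only exposes its sub readers.
struct StubReader {
    std::string name;
    SubReaders subs;
    SubReaders getSequentialSubReaders() const { return subs; }
};

// Hypothetical helper: appends every leaf reachable from `reader`, in order.
void gatherLeaves(const StubReaderPtr& reader, std::vector<StubReaderPtr>& leaves) {
    assert(reader);
    SubReaders subs = reader->getSequentialSubReaders();
    if (!subs) {
        leaves.push_back(reader);  // not composed of sub readers: a leaf
        return;
    }
    for (const StubReaderPtr& sub : *subs) {
        gatherLeaves(sub, leaves);  // an empty array simply adds nothing
    }
}
```

In line with the NOTE above, leaves collected this way are meant for reading only; changes should go through the parent reader.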
virtual TermFreqVectorPtr Lucene::IndexReader::getTermFreqVector | ( | int32_t docNumber, const String &field | ) |
pure virtual
Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If termvectors had been stored with positions or offsets, a TermPositionVector is returned.
docNumber | document for which the term frequency vector is returned. |
field | field for which the term frequency vector is returned. |
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
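virtual void Lucene::IndexReader::getTermFreqVector | ( | int32_t docNumber, const String &field, const TermVectorMapperPtr &mapper | ) |
pure virtual
Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of the TermFreqVector.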
docNumber | The number of the document to load the vector for |
field | The name of the field to load |
mapper | The TermVectorMapper to process the vector. Must not be null. |
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
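The "parallel arrays" mentioned above are a term array and a frequency array that share indices, so entry i of one belongs to entry i of the other. A user-defined structure usually folds the two into a single lookup. The sketch below shows that folding with plain standard containers; `toFrequencyMap` is a hypothetical helper, and the two input vectors are only assumed to mirror the term and frequency arrays of a TermFreqVector. A real TermVectorMapper would receive the same information term by term.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: folds parallel term/frequency arrays into one map.
// terms[i] and freqs[i] are assumed to describe the same term.
std::map<std::wstring, int32_t> toFrequencyMap(const std::vector<std::wstring>& terms,
                                               const std::vector<int32_t>& freqs) {
    assert(terms.size() == freqs.size());
    std::map<std::wstring, int32_t> result;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        result[terms[i]] += freqs[i];  // sums up if a term were listed twice
    }
    return result;
}
```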
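virtual void Lucene::IndexReader::getTermFreqVector | ( | int32_t docNumber, const TermVectorMapperPtr &mapper | ) |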
pure virtual |
Map all the term vectors for all fields in a Document.
docNumber | The number of the document to load the vector for |
mapper | The TermVectorMapper to process the vector. Must not be null. |
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual Collection< TermFreqVectorPtr > Lucene::IndexReader::getTermFreqVectors | ( | int32_t | docNumber | ) |
pure virtual
Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionVector if positions or offsets have been stored.
docNumber | document for which term frequency vectors are returned |
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
virtual int32_t Lucene::IndexReader::getTermInfosIndexDivisor | ( | ) |
virtual |
For IndexReader implementations that use TermInfosReader to read terms, this returns the current indexDivisor as specified when the reader was opened.
Reimplemented in Lucene::DirectoryReader, and Lucene::SegmentReader.
|
virtual |
Returns the number of unique terms (across all fields) in this reader.
This method returns int64_t, even though internally Lucene cannot handle more than 2^31 unique terms, for a possible future when this limitation is removed.
Reimplemented in Lucene::SegmentReader.
|
virtual |
Version number when this IndexReader was opened. Not implemented in the IndexReader base class.
If this reader is based on a Directory (i.e., was created by calling open, or reopen on a reader based on a Directory), then this method returns the version recorded in the commit that the reader opened. This version is advanced every time IndexWriter#commit is called.
If instead this reader is a near real-time reader (i.e., obtained by a call to IndexWriter#getReader, or by calling reopen on a near real-time reader), then this method returns the version of the last commit done by the writer. Note that even as further changes are made with the writer, the version will not change until a commit is completed. Thus, you should not rely on this method to determine when a near real-time reader should be opened. Use isCurrent instead.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, and Lucene::ParallelReader.
|
virtual |
Used for testing.
|
pure virtual |
Returns true if any documents have been deleted.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
|
virtual |
Returns true if there are norms stored for this field.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
void Lucene::IndexReader::incRef | ( | ) |
Increments the refCount of this IndexReader instance. RefCounts are used to determine when a reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef on every exit path, including when an exception is thrown; otherwise the reader may never be closed. Note that close simply calls decRef(), which means that the IndexReader will not really be closed until decRef has been called for all outstanding references.
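One way to guarantee the matching decRef in C++ is a scope guard whose destructor performs the decrement. The following is a minimal sketch under stated assumptions: RefGuard is a hypothetical helper, and FakeReader is a stand-in that only mimics the incRef / decRef / getRefCount members listed on this page. With a real reader held through a smart pointer you would pass the dereferenced object, and you would need to decide how to handle a decRef that fails during stack unwinding.

```cpp
#include <cstdint>

// Hypothetical scope guard (not part of Lucene++): calls incRef() when it is
// constructed and decRef() when it is destroyed, so the decrement also runs
// if an exception unwinds the enclosing scope.
template <class Reader>
class RefGuard {
public:
    explicit RefGuard(Reader& reader) : reader_(reader) { reader_.incRef(); }
    ~RefGuard() { reader_.decRef(); }
    RefGuard(const RefGuard&) = delete;
    RefGuard& operator=(const RefGuard&) = delete;

private:
    Reader& reader_;
};

// Stand-in for illustration only; it is not the Lucene++ IndexReader.
struct FakeReader {
    int32_t refCount = 1;
    void incRef() { ++refCount; }
    void decRef() { --refCount; }
    int32_t getRefCount() const { return refCount; }
};

// Returns the reference count observed while the guard is alive.
int32_t countWhileGuarded(FakeReader& reader) {
    RefGuard<FakeReader> guard(reader);
    return reader.getRefCount();
}
```

The count is one higher inside countWhileGuarded and returns to its previous value as soon as the guard goes out of scope.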
|
static |
Returns true if an index exists at the specified directory. Returns false if the directory does not exist or if there is no index in it.
directory | the directory to check for an index |
|
virtual |
Check whether any new changes have occurred to the index since this reader was opened.
If this reader is based on a Directory (i.e., was created by calling open, or reopen on a reader based on a Directory), then this method checks if any further commits (see IndexWriter#commit) have occurred in that directory.
If instead this reader is a near real-time reader (i.e., obtained by a call to IndexWriter#getReader, or by calling reopen on a near real-time reader), then this method checks if either a new commit has occurred, or any new uncommitted changes have taken place via the writer. Note that even if the writer has only performed merging, this method will still return false.
In any event, if this returns false, you should call reopen to get a new reader that sees the changes.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, and Lucene::ParallelReader.
|
pure virtual |
Returns true if document n has been deleted.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, Lucene::ReadOnlySegmentReader, and Lucene::SegmentReader.
|
virtual |
Checks if the index is optimized (if it has a single segment and no deletions). Not implemented in the IndexReader base class.
Reimplemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, and Lucene::ParallelReader.
|
static |
Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date; use isCurrent() instead.
|
static |
Returns all commit points that exist in the Directory. Normally, because the default is KeepOnlyLastCommitDeletionPolicy, there would be only one commit point. But if you're using a custom IndexDeletionPolicy then there could be many commits. Once you have a given commit, you can open a reader on it by calling IndexReader#open(IndexCommit,bool). There must be at least one commit in the Directory, else this method throws an exception. Note that if a commit is in progress while this method is running, that commit may or may not be included in the returned array.
|
static |
Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored.
args | Usage: IndexReader [-extract] <cfsfile> |
|
pure virtual |
Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
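A minimal sketch of the per-document array described above: it sizes a vector from maxDoc() and marks which document numbers are still live using isDeleted(n). The template only assumes a pointer-like object exposing those two members; liveDocMask and FakeDocReader are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds one flag per document number in [0, maxDoc).
// Works with any pointer-like object exposing maxDoc() and isDeleted(n).
template <class ReaderPtr>
std::vector<bool> liveDocMask(const ReaderPtr& reader) {
    const int32_t maxDoc = reader->maxDoc();
    std::vector<bool> live(static_cast<std::size_t>(maxDoc), false);
    for (int32_t doc = 0; doc < maxDoc; ++doc) {
        live[static_cast<std::size_t>(doc)] = !reader->isDeleted(doc);
    }
    return live;
}

// Stand-in for illustration only; it is not the Lucene++ IndexReader.
struct FakeDocReader {
    std::vector<bool> deleted;
    int32_t maxDoc() const { return static_cast<int32_t>(deleted.size()); }
    bool isDeleted(int32_t n) const { return deleted[static_cast<std::size_t>(n)]; }
};
```

Because deleted documents keep their numbers, the array is sized by maxDoc() rather than by the count of live documents.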
|
pure virtual |
Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
|
pure virtual |
Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
int32_t Lucene::IndexReader::numDeletedDocs | ( | ) |
Returns the number of deleted documents.
|
pure virtual |
Returns the number of documents in this index.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
|
static |
Returns an IndexReader reading the index in the given Directory, with readOnly = true.
directory | the index directory |
|
static |
Returns an IndexReader reading the index in the given Directory. You should pass readOnly = true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
directory | the index directory |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
|
static |
Returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy. You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
directory | the index directory |
deletionPolicy | a custom deletion policy (only used if you use this reader to perform deletes or to set norms); see IndexWriter for details. |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
|
static |
Returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy. You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
directory | the index directory |
deletionPolicy | a custom deletion policy (only used if you use this reader to perform deletes or to set norms); see IndexWriter for details. |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
termInfosIndexDivisor | Subsamples which indexed terms are loaded into RAM. This has the same effect as IndexWriter#setTermIndexInterval except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. |
|
staticprotected |
|
static |
Returns an IndexReader reading the index in the given IndexCommit. You should pass readOnly = true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
commit | the commit point to open |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
|
static |
Returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy. You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
commit | the specific IndexCommit to open; see IndexReader#listCommits to list all commits in a directory |
deletionPolicy | a custom deletion policy (only used if you use this reader to perform deletes or to set norms); see IndexWriter for details. |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
|
static |
Returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy. You should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.
commit | the specific IndexCommit to open; see IndexReader#listCommits to list all commits in a directory |
deletionPolicy | a custom deletion policy (only used if you use this reader to perform deletes or to set norms); see IndexWriter for details. |
readOnly | true if no changes (deletions, norms) will be made with this IndexReader |
termInfosIndexDivisor | Subsamples which indexed terms are loaded into RAM. This has the same effect as IndexWriter#setTermIndexInterval except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N * termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. |
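The "one in every N * termIndexInterval terms" rule can be worked through numerically. The sketch below is only an estimate under stated assumptions: approxLoadedIndexTerms is a hypothetical helper, the count is rounded up, and the interval value passed in is whatever was used at indexing time (128 in the usage note is just an example value).

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: estimates how many terms-index entries are held in
// RAM when one in every (divisor * termIndexInterval) terms is loaded.
// A divisor of -1 means the terms index is not loaded at all.
int64_t approxLoadedIndexTerms(int64_t totalTerms,
                               int32_t termIndexInterval,
                               int32_t divisor) {
    if (divisor == -1) {
        return 0;
    }
    if (divisor < 1 || termIndexInterval < 1 || totalTerms < 0) {
        throw std::invalid_argument("invalid divisor, interval, or term count");
    }
    const int64_t step = static_cast<int64_t>(divisor) * termIndexInterval;
    return (totalTerms + step - 1) / step;  // ceiling division
}
```

For 1,000,000 terms and an interval of 128, a divisor of 1 gives about 7,813 loaded entries, while a divisor of 4 gives about 1,954, which is the memory-for-latency trade described above.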
|
virtual |
Refreshes an IndexReader if the index has changed since this instance was (re)opened.
Opening an IndexReader is an expensive operation. This method can be used to refresh an existing IndexReader to reduce these costs. This method tries to only load segments that have changed or were created after the IndexReader was (re)opened.
If the index has not changed since this instance was (re)opened, then this call is a NOOP and returns this instance. Otherwise, a new instance is returned. The old instance is not closed and remains usable.
If the reader is reopened, even though they share resources internally, it's safe to make changes (deletions, norms) with the new reader. All shared mutable state obeys "copy on write" semantics to ensure the changes are not seen by other readers.
You can determine whether a reader was actually reopened by comparing the old instance with the instance returned by this method:
    IndexReaderPtr reader = ...
    ...
    IndexReaderPtr newReader = reader->reopen();
    if (newReader != reader) {
        ... // reader was reopened
        reader->close();
    }
    reader = newReader;
    ...
Be sure to synchronize that code so that other threads, if present, can never use reader after it has been closed and before it's switched to newReader.
If this reader is a near real-time reader (obtained from IndexWriter#getReader()), reopen() will simply call writer.getReader() again for you, though this may change in the future.
Reimplemented in Lucene::DirectoryReader, Lucene::MultiReader, and Lucene::ParallelReader.
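The synchronization requirement for the reopen-and-swap pattern can be sketched with a small holder that performs both the swap and every use of the reader under one mutex. This is a minimal sketch, not the library's own mechanism: ReaderHolder and FakeReopenReader are hypothetical names, the template only assumes a shared-pointer-like type whose target has reopen() and close(), and holding the lock during searches is the simplest (not the fastest) way to meet the requirement.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical holder (not part of Lucene++): serializes use of the reader
// with the reopen/close/swap sequence so no caller sees a closed reader.
template <class ReaderPtr>
class ReaderHolder {
public:
    explicit ReaderHolder(ReaderPtr reader) : reader_(std::move(reader)) {}

    // Runs fn with the current reader while holding the lock.
    template <class Fn>
    void withReader(Fn fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        fn(reader_);
    }

    // Reopens the reader; returns true if a new instance replaced the old one.
    bool refresh() {
        std::lock_guard<std::mutex> lock(mutex_);
        ReaderPtr newReader = reader_->reopen();
        if (newReader == reader_) {
            return false;  // index unchanged, same instance returned
        }
        reader_->close();  // old instance is no longer reachable via the holder
        reader_ = newReader;
        return true;
    }

private:
    std::mutex mutex_;
    ReaderPtr reader_;
};

// Stand-in for illustration only; it is not the Lucene++ IndexReader.
struct FakeReopenReader : std::enable_shared_from_this<FakeReopenReader> {
    bool indexChanged = false;
    bool closed = false;
    std::shared_ptr<FakeReopenReader> reopen() {
        if (!indexChanged) {
            return shared_from_this();
        }
        return std::make_shared<FakeReopenReader>();
    }
    void close() { closed = true; }
};
```

A finer-grained design could hand out references counted with incRef and decRef instead of holding the lock for the duration of each use.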
|
virtual |
Just like reopen(), except you can change the readOnly setting of the original reader. If the index is unchanged but readOnly is different then a new reader will be returned.
Reimplemented in Lucene::DirectoryReader.
|
virtual |
Reopen this reader on a specific commit point. This always returns a readOnly reader. If the specified commit point matches what this reader is already on, and this reader is already readOnly, then this same instance is returned; if it is not already readOnly, a readOnly clone is returned.
Reimplemented in Lucene::DirectoryReader.
|
virtual |
Resets the normalization factor for the named field of the named document.
|
virtual |
Resets the normalization factor for the named field of the named document. The norm represents the product of the field's boost and its length normalization. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old.
NOTE: If this field does not store norms, then this method call will silently do nothing.
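Since the norm is the product of the boost and the length normalization, "basing the new value upon the old" amounts to dividing out the old boost and multiplying in the new one. The sketch below shows only that arithmetic on decoded floating-point values; rescaleNorm is a hypothetical helper, and it assumes you know the boost that was applied originally. Because norms are stored byte-encoded, the stored result will be quantized.

```cpp
#include <stdexcept>

// Hypothetical helper: oldNorm is assumed to equal oldBoost * lengthNorm.
// Returns newBoost * lengthNorm, preserving the length normalization.
float rescaleNorm(float oldNorm, float oldBoost, float newBoost) {
    if (oldBoost == 0.0f) {
        throw std::invalid_argument("oldBoost must be non-zero");
    }
    const float lengthNorm = oldNorm / oldBoost;
    return lengthNorm * newBoost;
}
```

For example, a field with an old norm of 0.5 and an old boost of 1.0 gets a norm of 1.0 when the boost is raised to 2.0.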
|
inline |
|
pure virtual |
Returns an unpositioned TermDocs enumerator.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
|
virtual |
Returns an enumeration of all the documents which contain term. For each document, the document number and the frequency of the term in that document are provided, for use in search scoring. If term is null, then all non-deleted docs are returned with freq=1. The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
Reimplemented in Lucene::FilterIndexReader, Lucene::ParallelReader, and Lucene::SegmentReader.
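Consuming such an enumeration is typically a next() loop that reads the document number and frequency at each step. The sketch below assumes an enumerator exposing next(), doc() and freq(); collectPostings and FakeTermDocs are hypothetical names, and the fake only imitates an enumeration that starts unpositioned and yields increasing document numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: drains an enumerator into (docNumber, frequency) pairs.
// Works with any pointer-like object exposing next(), doc() and freq().
template <class TermDocsPtr>
std::vector<std::pair<int32_t, int32_t>> collectPostings(const TermDocsPtr& termDocs) {
    std::vector<std::pair<int32_t, int32_t>> postings;
    while (termDocs->next()) {
        postings.emplace_back(termDocs->doc(), termDocs->freq());
    }
    return postings;
}

// Stand-in for illustration only; it is not the Lucene++ TermDocs class.
struct FakeTermDocs {
    std::vector<std::pair<int32_t, int32_t>> data;  // sorted by document number
    std::size_t pos = 0;
    bool started = false;

    bool next() {
        if (started) {
            ++pos;
        } else {
            started = true;
        }
        return pos < data.size();
    }
    int32_t doc() const { return data[pos].first; }
    int32_t freq() const { return data[pos].second; }
};
```

Because the enumeration is ordered by document number, the collected pairs are already sorted and can be merged with other postings lists without an extra sort.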
|
pure virtual |
Returns an unpositioned TermPositions enumerator.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
|
virtual |
Returns an enumeration of all the documents which contain term. For each document, in addition to the document number and frequency of the term in that document, a list of all of the ordinal positions of the term in the document is available. This positional information facilitates phrase and proximity searching. The enumeration is ordered by document number. Each document number is greater than all that precede it in the enumeration.
Reimplemented in Lucene::ParallelReader, and Lucene::SegmentReader.
|
pure virtual |
Returns an enumeration of all the terms in the index. The enumeration is ordered by Term::compareTo(). Each term is greater than all that precede it in the enumeration. Note that after calling terms(), TermEnum#next() must be called on the resulting enumeration before calling other methods such as TermEnum#term().
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
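The "call next() first" rule leads naturally to a while loop that advances before it reads. The sketch below assumes an enumerator exposing next() and term(); visitAllTerms and FakeTermEnum are hypothetical names, and plain strings stand in for the library's term objects.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: visits every term of an initially unpositioned
// enumeration and returns how many terms were seen.
template <class TermEnumPtr, class Visitor>
int64_t visitAllTerms(const TermEnumPtr& termEnum, Visitor visit) {
    int64_t count = 0;
    while (termEnum->next()) {  // next() runs before the first term() call
        visit(termEnum->term());
        ++count;
    }
    return count;
}

// Stand-in for illustration only; it is not the Lucene++ TermEnum class.
struct FakeTermEnum {
    std::vector<std::string> sortedTerms;
    std::size_t pos = 0;
    bool started = false;

    bool next() {
        if (started) {
            ++pos;
        } else {
            started = true;
        }
        return pos < sortedTerms.size();
    }
    const std::string& term() const { return sortedTerms[pos]; }
};
```

Reading term() before the first next() is the mistake this loop shape avoids.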
|
pure virtual |
Returns an enumeration of all terms starting at a given term. If the given term does not exist, the enumeration is positioned at the first term greater than the supplied term. The enumeration is ordered by Term::compareTo(). Each term is greater than all that precede it in the enumeration.
Implemented in Lucene::DirectoryReader, Lucene::FilterIndexReader, Lucene::MultiReader, Lucene::ParallelReader, and Lucene::SegmentReader.
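The positioning rule (start at the given term if it exists, otherwise at the first greater term) is the same as a lower-bound search over a sorted sequence. The sketch below models only that rule with standard containers; seekPosition is a hypothetical helper, and plain strings are an assumed stand-in for terms ordered by Term::compareTo().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first entry that is not less
// than target, or sortedTerms.size() if every entry is smaller.
std::size_t seekPosition(const std::vector<std::string>& sortedTerms,
                         const std::string& target) {
    return static_cast<std::size_t>(
        std::lower_bound(sortedTerms.begin(), sortedTerms.end(), target) -
        sortedTerms.begin());
}
```

With the terms apple, cherry and pear, seeking "cherry" lands on cherry itself, seeking "banana" also lands on cherry as the first greater term, and seeking "zebra" runs off the end of the sequence.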
|
virtual |
Undeletes all documents currently marked as deleted in this index.
|
protected |
|
protected |
|
static |
|
protected |