ir package
mika.ir.search
- class mika.ir.search(column_to_search, data, retrieval_model, use_reranker=False, reranker_model=None)
Class to perform information retrieval using semantic search and an optional reranker.
- Variables:
data (Data() object) – Data object loaded using mika.utils.Data
column_to_search (str) – Name of the column to search. Use “Combined Text” to search all text columns; this column must first be defined in the Data object.
retrieval_model (SentenceTransformer model) – a sentence transformer model defined according to: https://www.sbert.net/examples/applications/computing-embeddings/README.html?highlight=sentencetransformer#sentence_transformers.SentenceTransformer
use_reranker (bool [optional]) – Whether to rerank the semantic search results with reranker_model (default: False)
reranker_model (CrossEncoder model [optional]) – a cross encoder model defined according to: https://www.sbert.net/docs/package_reference/cross_encoder.html?highlight=crossencoder#sentence_transformers.cross_encoder.CrossEncoder
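As a rough illustration of how the variables above fit together, the following sketch builds a search object from an already-loaded Data object. It rests on several assumptions: the import path `from mika.ir import search` is inferred from the class name shown above, the helper name `build_searcher` is hypothetical, and the two model names are ordinary sentence-transformers checkpoints chosen only as examples, not models this package prescribes.

```python
def build_searcher(data, use_reranker=False):
    """Construct a search object over the "Combined Text" column.

    `data` is assumed to be an already-loaded mika.utils.Data object in which
    "Combined Text" has been defined. This helper is hypothetical.
    """
    # Imports live inside the function so the sketch can be defined without
    # MIKA or sentence-transformers installed; they run only when called.
    from sentence_transformers import CrossEncoder, SentenceTransformer
    from mika.ir import search  # assumed import path

    # Example checkpoints only (assumption); any SentenceTransformer or
    # CrossEncoder model could be substituted.
    retrieval_model = SentenceTransformer("all-MiniLM-L6-v2")
    reranker_model = (
        CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2") if use_reranker else None
    )

    return search(
        column_to_search="Combined Text",
        data=data,
        retrieval_model=retrieval_model,
        use_reranker=use_reranker,
        reranker_model=reranker_model,
    )
```

Passing `use_reranker=False` leaves `reranker_model` as None, which matches the defaults in the class signature.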
- get_sentence_embeddings(savepath)
Compute sentence embeddings for the corpus.
- Parameters:
savepath (str) – Path to save sentence embeddings
- Return type:
None
- load_sentence_embeddings(filepath)
Load previously computed sentence embeddings.
- Parameters:
filepath (str) – Path to previously saved sentence embeddings
- Return type:
None
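Because both methods return None and work through a file path, one plausible pattern is to compute the embeddings once and reuse the saved file on later runs. The helper below is a minimal sketch of that pattern, not part of the package: the name `ensure_embeddings` is hypothetical, and it assumes that get_sentence_embeddings writes its output to exactly the path it is given.

```python
import os


def ensure_embeddings(searcher, path):
    """Compute corpus embeddings only if `path` does not exist, then load them.

    `searcher` is any object exposing get_sentence_embeddings(savepath) and
    load_sentence_embeddings(filepath), such as the class documented here.
    Returns True if embeddings were computed on this call, False if a
    previously saved file was reused.
    """
    computed = False
    if not os.path.exists(path):
        # Assumption: the embeddings file is written at exactly `path`.
        searcher.get_sentence_embeddings(path)
        computed = True
    # Load explicitly in both cases, since run_search expects loaded embeddings.
    searcher.load_sentence_embeddings(path)
    return computed
```

Loading after computing is redundant if get_sentence_embeddings already keeps the embeddings in memory, but it is harmless and avoids relying on that behavior.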
- run_search(query, rank_k=1, return_k=1, use_passages=False)
Run the search using a query and loaded corpus embeddings.
- Parameters:
query (str) – Query to search
rank_k (int) – Number of results to rank in semantic search
return_k (int) – Number of results to return (must be less than or equal to rank_k)
- Returns:
ranked_hits – Doc ID, score, and text of the top return_k hits, structured in a DataFrame
- Return type:
DataFrame
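The relationship between rank_k and return_k is easiest to see in a generic retrieve-then-rerank loop. The sketch below is not this package's implementation: it uses toy word-overlap scorers in place of the SentenceTransformer and CrossEncoder models, returns plain tuples rather than a DataFrame, and every function name in it is hypothetical. It only illustrates why return_k cannot exceed rank_k: the second stage can return only candidates that the first stage kept.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    rank_k: int = 1,
    return_k: int = 1,
) -> List[Tuple[int, float, str]]:
    """Return (doc_id, score, text) for the top return_k of rank_k candidates."""
    if return_k > rank_k:
        raise ValueError("return_k must be less than or equal to rank_k")

    # Stage 1: score every document cheaply and keep the best rank_k.
    scored = [(first_stage_score(query, doc), idx) for idx, doc in enumerate(corpus)]
    candidates = sorted(scored, key=lambda pair: pair[0], reverse=True)[:rank_k]

    # Stage 2: rescore only the candidates, then keep the best return_k.
    reranked = [(idx, rerank_score(query, corpus[idx]), corpus[idx]) for _, idx in candidates]
    reranked.sort(key=lambda hit: hit[1], reverse=True)
    return reranked[:return_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: Jaccard overlap of lowercase word sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def phrase_bonus(query: str, doc: str) -> float:
    """Toy reranker: word overlap plus a bonus when the exact phrase appears."""
    return token_overlap(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)


corpus = [
    "fire engine arrived late",
    "engine fire during takeoff",
    "bird strike on approach",
]
hits = retrieve_then_rerank(
    "engine fire", corpus, token_overlap, phrase_bonus, rank_k=2, return_k=1
)
```

In this toy corpus the first two documents tie in the first stage, and the second stage promotes the one containing the exact phrase, so `hits` holds the single entry for document 1.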