Gilda modules reference

API

gilda.api.ground(text, context=None, organisms=None)
Return a list of scored matches for a text to ground.
Returns: A list of ScoredMatch objects representing the groundings.
Return type: list[gilda.grounder.ScoredMatch]
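
A minimal usage sketch for ground, assuming gilda is installed and its default resources are available. The helper names ground_text and best_match are hypothetical, and the score attribute is assumed from the ScoredMatch constructor signature. The gilda import is deferred into the function so the rest of the block runs without the library.

```python
from types import SimpleNamespace


def ground_text(text, context=None):
    """Call gilda.api.ground; gilda is only imported when this is called."""
    import gilda.api
    return gilda.api.ground(text, context=context)


def best_match(scored_matches):
    """Return the highest-scoring match, or None if the list is empty."""
    if not scored_matches:
        return None
    return max(scored_matches, key=lambda m: m.score)


# Stand-in objects with a `score` attribute, used only to exercise best_match
# without requiring gilda. Real results would come from ground_text(...).
demo = [SimpleNamespace(score=0.52), SimpleNamespace(score=0.78)]
top = best_match(demo)
```

Using max over the score avoids relying on any particular ordering of the returned list, which this reference does not specify.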

gilda.api.get_models()
Return a list of entity texts for which disambiguation models exist.
Returns: The list of entity texts for which a disambiguation model is available.
Return type: list[str]

gilda.api.get_names(db, id, status=None, source=None)
Return a list of entity texts corresponding to a given database ID.
Parameters:
- db (str) – The database in which the ID is an entry, e.g., HGNC.
- id (str) – The ID of an entry in the database.
- status (Optional[str]) – If given, only entity texts with the given status, e.g., “synonym”, are returned.
- source (Optional[str]) – If given, only entity texts from the given source, e.g., “uniprot”, are returned.

Grounder

class gilda.grounder.Grounder(terms=None)
Bases: object
Class to look up and ground query texts in a terms file.
Parameters: terms (str or dict or None) – Specifies the grounding terms that should be loaded in the Grounder. If None, the default grounding terms are loaded from the versioned resource folder. If str, it is interpreted as a path to a grounding terms TSV file, which is then loaded. If dict, it is assumed to be a grounding terms dict with normalized entity strings as keys and Term objects as values. Default: None

Grounder.get_models()
Return a list of entity texts for which disambiguation models exist.
Returns: The list of entity texts for which a disambiguation model is available.
Return type: list[str]

Grounder.get_names(db, id, status=None, source=None)
Return a list of entity texts corresponding to a given database ID.
Parameters:
- db (str) – The database in which the ID is an entry, e.g., HGNC.
- id (str) – The ID of an entry in the database.
- status (Optional[str]) – If given, only entity texts with the given status, e.g., “synonym”, are returned.
- source (Optional[str]) – If given, only entity texts from the given source, e.g., “uniprot”, are returned.
Returns: names – A list of entity texts corresponding to the given database/ID.
Return type: list[str]

class gilda.grounder.ScoredMatch(term, score, match, disambiguation=None)
Bases: object
Class representing a scored match to a grounding term.

ScoredMatch.term
The Term that the scored match is for.
Type: gilda.grounder.Term

ScoredMatch.match
The Match object characterizing the match to the Term.
Type: gilda.scorer.Match

gilda.grounder.load_terms_file(terms_file)
Load a TSV file containing terms into a lookup dictionary.
Parameters: terms_file (str) – Path to a TSV terms file with columns corresponding to the serialized elements of a Term.
Returns: A lookup dictionary whose keys are normalized entity texts and whose values are lists of Terms with that normalized entity text.
Return type: dict
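
The shape of the returned lookup dictionary can be illustrated with a small sketch. This is not gilda's implementation: build_lookup is a hypothetical helper, the column layout is an assumption (first column taken as the normalized entity text), and plain tuples stand in for Term objects.

```python
import csv
import io
from collections import defaultdict


def build_lookup(tsv_handle):
    """Group TSV rows by their first column (assumed: normalized entity text)."""
    lookup = defaultdict(list)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        norm_text = row[0]
        # A plain tuple stands in for a Term object here.
        lookup[norm_text].append(tuple(row))
    return dict(lookup)


# Hypothetical rows; the column layout is illustrative, not gilda's actual format.
sample = io.StringIO(
    "braf\tBRAF\tHGNC\t1097\n"
    "braf\tB-Raf\tHGNC\t1097\n"
    "kras\tKRAS\tHGNC\t6407\n"
)
lookup = build_lookup(sample)
```

Each key maps to a list because several entity texts can share one normalized form, as the two "braf" rows do here.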

Scorer

class gilda.scorer.Match(query, ref, exact=None, space_mismatch=None, dash_mismatches=None, cap_combos=None)
Bases: object
Class representing a match between a query and a reference string.

gilda.scorer.generate_match(query, ref, beginning_of_sentence=False)
Return a match data structure based on comparing a query to a reference string.
Returns: A Match object characterizing the match between the two strings.
Return type: gilda.scorer.Match

gilda.scorer.score_namespace(term)
Note: this is currently not included as an explicit score term. It is only used to rank identically scored entries.
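
One way to read "rank identically scored entries" is as a secondary sort key. The sketch below illustrates that idea only. The priority table, the namespace labels, and the rank helper are all hypothetical and do not reflect gilda's actual ordering.

```python
# Hypothetical priorities; higher is preferred. Not gilda's actual ordering.
NAMESPACE_PRIORITY = {"HGNC": 2, "UP": 1}


def rank(entries):
    """Sort (namespace, score) pairs by score, breaking ties by namespace priority."""
    return sorted(
        entries,
        key=lambda e: (e[1], NAMESPACE_PRIORITY.get(e[0], 0)),
        reverse=True,
    )


ranked = rank([("UP", 0.8), ("HGNC", 0.8), ("MESH", 0.9)])
```

Because the namespace value is only the second element of the sort key, it never outranks a higher score and matters only when two scores are equal.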

gilda.scorer.score_string_match(match)
Return a score between 0 and 1 for the goodness of a match.
This score is based purely on the relationship between the two strings and does not take the status of the reference into account.
Parameters: match (gilda.scorer.Match) – The Match object characterizing the relationship between the query and reference strings.
Returns: A match score between 0 and 1.
Return type: float

Term

Process
Module containing various string processing functions used for grounding.

gilda.process.depluralize(word)
Return the depluralized version of the word, along with a status flag.
Parameters: word (str) – The word to be depluralized.
Returns:
- str – The original word, if it is detected to be non-plural, or the depluralized version of the word.
- str – A status flag representing the detected pluralization status of the word: one of non_plural (e.g., BRAF), plural_oes (e.g., mosquitoes), plural_ies (e.g., antibodies), plural_es (e.g., switches), plural_cap_s (e.g., MAPKs), or plural_s (e.g., receptors).
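
The status flags above can be illustrated with a simple suffix-based sketch. This is an assumption about how such rules might look, not gilda's implementation. depluralize_sketch is a hypothetical name, and the rules are deliberately naive (for example, "series" would be mishandled).

```python
def depluralize_sketch(word):
    """Return (singular_form, status_flag) using simple suffix rules."""
    # Only a trailing lowercase "s" is treated as a plural marker,
    # so all-caps names such as "RAS" are left untouched.
    if len(word) < 3 or not word.endswith("s"):
        return word, "non_plural"
    # Endings that usually belong to singular words.
    if word.endswith(("ss", "us", "is")):
        return word, "non_plural"
    if word.endswith("oes"):
        return word[:-2], "plural_oes"        # mosquitoes -> mosquito
    if word.endswith("ies"):
        return word[:-3] + "y", "plural_ies"  # antibodies -> antibody
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2], "plural_es"         # switches -> switch
    if word[-2].isupper():
        return word[:-1], "plural_cap_s"      # MAPKs -> MAPK
    return word[:-1], "plural_s"              # receptors -> receptor
```

Returning the flag alongside the word lets a caller decide how much to trust the depluralized form, for instance by treating plural_cap_s differently from plural_s.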

gilda.process.get_capitalization_pattern(word, beginning_of_sentence=False)
Return the type of capitalization for the string.
Returns: The capitalization pattern of the given word, one of the following: sentence_initial_cap, single_cap_letter, all_caps, all_lower, initial_cap, mixed.
Return type: str
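
The six pattern names can be illustrated with a small classifier. The exact definition of each category is an assumption inferred from its name, and capitalization_pattern_sketch is a hypothetical stand-in rather than gilda's code.

```python
def capitalization_pattern_sketch(word, beginning_of_sentence=False):
    """Classify the capitalization of a word into one of six pattern names."""
    first, rest = word[:1], word[1:]
    rest_is_lower = rest == "" or (rest.isalpha() and rest.islower())
    # An initial capital at the start of a sentence carries little information,
    # so it gets its own category (assumed interpretation).
    if beginning_of_sentence and first.isupper() and rest_is_lower:
        return "sentence_initial_cap"
    if len(word) == 1 and word.isupper():
        return "single_cap_letter"
    if word.isalpha() and word.isupper():
        return "all_caps"
    if word.isalpha() and word.islower():
        return "all_lower"
    if first.isupper() and rest.isalpha() and rest.islower():
        return "initial_cap"
    return "mixed"
```

The order of the checks matters: a single capital letter would also satisfy the all_caps test, so the more specific case is checked first.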

gilda.process.normalize(s)
Normalize white spaces, dashes and case of a given string.
Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str
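
A minimal sketch of this kind of normalization, assuming that whitespace runs collapse to one space, dash characters are dropped, and the result is lowercased. Whether gilda removes or unifies dashes is not stated here, so that choice, the DASHES subset, and the name normalize_sketch are all assumptions.

```python
import re

# A few common dash-like characters; an illustrative subset, not an exhaustive list.
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2212"


def normalize_sketch(s):
    """Collapse whitespace, drop dash characters, and lowercase the string."""
    s = re.sub(r"\s+", " ", s.strip())
    s = s.translate({ord(d): None for d in DASHES})
    return s.lower()
```

With these rules, spelling variants such as "B-Raf" and "BRAF" map to the same lookup key.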

gilda.process.remove_dashes(s)
Remove all types of dashes in the given string.
Parameters: s (str) – The string from which all types of dashes should be removed.
Returns: The string from which dashes have been removed.
Return type: str

gilda.process.replace_dashes(s, rep='-')
Replace all types of dashes in a given string with a given replacement.
Returns: The string in which dashes have been replaced.
Return type: str
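
Replacing and removing dashes are closely related, since removal is replacement with an empty string. The sketch below shows that relationship under the assumption that "all types of dashes" means a set of Unicode dash variants. The character set and function names here are illustrative, not taken from gilda.

```python
import re

# Character class covering several Unicode dash variants (an assumed subset).
DASH_PATTERN = re.compile("[-\u2010\u2011\u2012\u2013\u2014\u2212]")


def replace_dashes_sketch(s, rep="-"):
    """Replace every dash variant in s with rep."""
    # A function replacement keeps rep literal, so backslashes in rep
    # are not interpreted as regex escapes.
    return DASH_PATTERN.sub(lambda m: rep, s)


def remove_dashes_sketch(s):
    """Remove every dash variant by replacing it with the empty string."""
    return replace_dashes_sketch(s, rep="")
```

With the default rep, the function unifies typographic dashes to a plain hyphen, which is useful before comparing strings that differ only in dash style.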

gilda.process.replace_greek_latin(s)
Replace spelled-out Greek letters with their Latin character.

gilda.process.replace_greek_spelled_out(s)
Replace Greek Unicode characters with their spelled-out Latin names.

gilda.process.replace_greek_uni(s)
Replace spelled-out Greek letters with their Unicode character.
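
The two directions between spelled-out names and Unicode characters can be sketched with a single mapping table. The table below is a small illustrative subset, the function names are hypothetical, and the naive substring replacement is an assumption (it would also rewrite the "alpha" in "alphabet").

```python
# Small illustrative subset of Greek letters; a full table would cover the alphabet.
GREEK = {"alpha": "\u03b1", "beta": "\u03b2", "gamma": "\u03b3", "kappa": "\u03ba"}


def spelled_out_to_unicode(s):
    """Replace spelled-out Greek letter names with their Unicode characters."""
    for name, char in GREEK.items():
        s = s.replace(name, char)
    return s


def unicode_to_spelled_out(s):
    """Replace Greek Unicode characters with their spelled-out names."""
    for name, char in GREEK.items():
        s = s.replace(char, name)
    return s
```

Applying one direction and then the other returns the original string for inputs that contain only one of the two forms.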