Gilda modules reference

API

gilda.api.annotate(text, sent_split_fun=None, organisms=None, namespaces=None, context_text=None)[source]

Annotate a given text with Gilda (i.e., do named entity recognition).

Parameters

text (str) – The text to be annotated.
sent_split_fun (Callable[str, Iterable[Tuple[int, int]]], optional) – A function that splits the text into sentences. The default is nltk.tokenize.PunktSentenceTokenizer.span_tokenize(). The function should take a string as input and return an iterable of coordinate pairs corresponding to the start and end coordinates for each sentence in the input text.
organisms (list[str], optional) – A list of organism identifiers (taxonomy IDs, e.g., '9606' for human) to pass to the grounder. If not provided, human is used.
namespaces (list[str], optional) – A list of namespaces to pass to the grounder to restrict the matches to. By default, no restriction is applied.
context_text (Optional[str]) – A longer span of text that serves as additional context for the text being annotated for disambiguation purposes.

Returns

A list of matches where each match is an Annotation object which contains as attributes the text span that was matched, the list of ScoredMatches, and the start and end character offsets of the text span.

Return type

list[Annotation]
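
As a rough illustration of the sent_split_fun contract described above, the following sketch splits on line breaks instead of using a trained sentence tokenizer. The function name split_on_newlines is hypothetical; only the input/output shape (a string in, an iterable of (start, end) offset pairs out) comes from the parameter description.

```python
import re
from typing import Iterator, Tuple


def split_on_newlines(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) character offsets for each non-empty line."""
    for m in re.finditer(r"[^\n]+", text):
        yield m.start(), m.end()


text = "MEK phosphorylates ERK.\nERK is activated."
spans = list(split_on_newlines(text))
# Each span indexes back into the original text
sentences = [text[start:end] for start, end in spans]
```

A function of this shape could then be passed as the sent_split_fun argument, e.g., gilda.annotate(text, sent_split_fun=split_on_newlines).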

gilda.api.get_grounder()[source]

Initialize and return the default Grounder instance.

Return type: Grounder
Returns: A Grounder instance whose attributes and methods can be used directly.

gilda.api.get_models()[source]

Return a list of entity texts for which disambiguation models exist.

Returns: The list of entity texts for which a disambiguation model is available.
Return type: list[str]

gilda.api.get_names(db, id, status=None, source=None)[source]

Return a list of entity texts corresponding to a given database ID.

Parameters

db (str) – The database in which the ID is an entry, e.g., HGNC.
id (str) – The ID of an entry in the database.
status (Optional[str]) – If given, only entity texts with the given status e.g., “synonym” are returned.
source (Optional[str]) – If given, only entity texts from the given source e.g., “uniprot” are returned.

gilda.api.ground(text, context=None, organisms=None, namespaces=None)[source]

Return a list of scored matches for a text to ground.

Parameters

text (str) – The entity text to be grounded.
context (Optional[str]) – Any additional text that serves as context for disambiguating the given entity text, used if a model exists for disambiguating the given text.
organisms (Optional[List[str]]) – A list of taxonomy identifiers to use as a priority list when surfacing matches for proteins/genes from multiple organisms.
namespaces (Optional[List[str]]) – A list of namespaces to restrict the matches to. By default, no restriction is applied.

Returns

A list of ScoredMatch objects representing the groundings.

Return type

list[gilda.grounder.ScoredMatch]

Examples

Ground a string corresponding to an entity name, label, or synonym

>>> import gilda
>>> scored_matches = gilda.ground('mapt')

The matches are sorted in descending order by score, and in the event of a tie, by the namespace of the primary grounding. Each scored match has a gilda.term.Term object that contains information about the primary grounding.

>>> scored_matches[0].term.db
'hgnc'
>>> scored_matches[0].term.id
'6893'
>>> scored_matches[0].term.get_curie()
'hgnc:6893'

The score for each match can be accessed directly:

>>> scored_matches[0].score
0.7623

The rationale for each match is contained in the match attribute whose fields are described in gilda.scorer.Match:

>>> match_object = scored_matches[0].match

Give optional context to be used by Gilda’s disambiguation models, if available

>>> scored_matches = gilda.ground('ER', context='Calcium is released from the ER.')

Only return results from a certain namespace, such as when a family and gene have the same name

>>> scored_matches = gilda.ground('ESR', namespaces=["hgnc"])

gilda.api.make_grounder(terms)[source]

Create a custom grounder from a list of Terms.

Parameters: terms (Union[str, List[Term], Mapping[str, List[Term]]]) – Specifies the grounding terms that should be loaded in the Grounder. If str, it is interpreted as a path to a grounding terms gzipped TSV file which is then loaded. If list, it is assumed to be a flat list of Terms. If dict, it is assumed to be a grounding terms dict with normalized entity strings as keys and lists of Term objects as values. Default: None
Return type: Grounder
Returns: A Grounder instance, initialized with either the default terms loaded from the resource file or a custom set of terms if the terms argument was specified.

Examples

The following example shows how to get an ontology with obonet and load custom terms:

import gilda
import obonet
from gilda.process import normalize
from tqdm import tqdm

prefix = "UBERON"
url = "http://purl.obolibrary.org/obo/uberon/basic.obo"
g = obonet.read_obo(url)
custom_obo_terms = []
it = tqdm(g.nodes(data=True), unit_scale=True, unit="node")
for node, data in it:
    # Skip entries imported from other ontologies
    if not node.startswith(f"{prefix}:"):
        continue

    identifier = node.removeprefix(f"{prefix}:")

    name = data["name"]
    custom_obo_terms.append(gilda.Term(
        norm_text=normalize(name),
        text=name,
        db=prefix,
        id=identifier,
        entry_name=name,
        status="name",
        source=prefix,
    ))

    # Add terms for all synonyms
    for synonym_raw in data.get("synonym", []):
        try:
            # Try to parse out of the quoted OBO Field
            synonym = synonym_raw.split('"')[1].strip()
        except IndexError:
            continue  # the synonym was malformed

        custom_obo_terms.append(gilda.Term(
            norm_text=normalize(synonym),
            text=synonym,
            db=prefix,
            id=identifier,
            entry_name=name,
            status="synonym",
            source=prefix,
        ))

custom_grounder = gilda.make_grounder(custom_obo_terms)
scored_matches = custom_grounder.ground("head")

Additional examples for loading custom content from OBO Graph JSON, pyobo, and more can be found in the Jupyter notebooks in the Gilda repository on GitHub.

Grounder

class gilda.grounder.Annotation(text, matches, start, end)[source]

Bases: object

A class representing a named entity annotation in a given text.

text

The text span that was annotated.

Type: str

matches

The list of scored matches for the text span.

Type: list[ScoredMatch]

start

The start character offset of the text span.

Type: int

end

The end character offset of the text span.

Type: int

to_json()[source]: Convert the Annotation object to JSON.

class gilda.grounder.Grounder(terms=None, *, namespace_priority=None)[source]

Bases: object

Class to look up and ground query texts in a terms file.

Parameters

terms (Union[str, Path, Iterable[Term], Mapping[str, List[Term]], None]) –
Specifies the grounding terms that should be loaded in the Grounder.
- If None, the default grounding terms are loaded from the versioned resource folder.
- If str or pathlib.Path, it is interpreted as a path to a grounding terms gzipped TSV file which is then loaded. If it is a str that looks like a URL, the file is downloaded from the internet.
- If dict, it is assumed to be a grounding terms dict with normalized entity strings as keys and lists of gilda.term.Term instances as values.
- If list, set, tuple, or any other iterable, it is assumed to be a flat list of gilda.term.Term instances.
namespace_priority (Optional[List[str]]) – Specifies a term namespace priority order. If multiple terms are matched with the same score, this list is used to break the tie: the term whose namespace appears closer to the front of the list is ranked first. By default, DEFAULT_NAMESPACE_PRIORITY is used, which, for example, prioritizes famplex entities over HGNC ones.
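
The tie-breaking role of namespace_priority can be pictured with a small standalone sketch. ToyMatch, rank, and the identifiers below are hypothetical placeholders, and this is not Gilda's actual sorting code; it only illustrates ordering by descending score and then by position in a priority list.

```python
from typing import List, NamedTuple


class ToyMatch(NamedTuple):
    """Hypothetical stand-in for a scored match (not a Gilda class)."""
    db: str
    id: str
    score: float


def rank(matches: List[ToyMatch], namespace_priority: List[str]) -> List[ToyMatch]:
    """Sort by descending score, then by position in the priority list."""
    def priority(db: str) -> int:
        # Namespaces missing from the list are ranked after all listed ones
        try:
            return namespace_priority.index(db)
        except ValueError:
            return len(namespace_priority)

    return sorted(matches, key=lambda m: (-m.score, priority(m.db)))


# Made-up identifiers; two matches tie on score
matches = [
    ToyMatch("HGNC", "0000", 0.75),
    ToyMatch("FPLX", "EXAMPLE", 0.75),
    ToyMatch("MESH", "D000000", 0.5),
]
ranked = rank(matches, namespace_priority=["FPLX", "HGNC"])
```

With the priority list ["FPLX", "HGNC"], the FPLX match is placed ahead of the HGNC match that has the same score.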

get_ambiguities(skip_names=True, skip_curated=True, skip_name_matches=True, skip_species_ambigs=True)[source]

Return a list of ambiguous term groups in the grounder.

Parameters

skip_names (bool) – If True, groups of terms where one has the “name” status are skipped. This makes sense usually since these are prioritized over synonyms anyway.
skip_curated (bool) – If True, groups of terms where one has the “curated” status are skipped. This makes sense usually since these are prioritized over synonyms anyway.
skip_name_matches (bool) – If True, groups of terms that all share the same standard name are skipped. This is effective at eliminating spurious ambiguities due to unresolved cross-references between equivalent terms in different namespaces.
skip_species_ambigs (bool) – If True, groups of terms that are all genes or proteins, and are all from different species (one term from each species) are skipped. This is effective at eliminating ambiguities between orthologous genes in different species that are usually resolved using the organism priority list.

Return type

List[List[Term]]

get_models()[source]

Return a list of entity texts for which disambiguation models exist.

Returns: The list of entity texts for which a disambiguation model is available.
Return type: list[str]

get_names(db, id, status=None, source=None)[source]

Return a list of entity texts corresponding to a given database ID.

Parameters

db (str) – The database in which the ID is an entry, e.g., HGNC.
id (str) – The ID of an entry in the database.
status (Optional[str]) – If given, only entity texts with the given status e.g., “synonym” are returned.
source (Optional[str]) – If given, only entity texts from the given source e.g., “uniprot” are returned.

Returns

names – A list of entity texts corresponding to the given database/ID

Return type

list[str]

ground(raw_str, context=None, organisms=None, namespaces=None)[source]

Return scored groundings for a given raw string.

Parameters

raw_str (str) – A string to be grounded with respect to the set of Terms that the Grounder contains.
context (Optional[str]) – Any additional text that serves as context for disambiguating the given entity text, used if a model exists for disambiguating the given text.
organisms (Optional[List[str]]) – An optional list of organism identifiers defining a priority ranking among organisms, if genes/proteins from multiple organisms match the input. If not provided, the default [‘9606’] i.e., human is used.
namespaces (Optional[List[str]]) – A list of namespaces to restrict matches to. This will apply to both the primary namespace of a matched term, to any subsumed matches, and to the source namespaces of terms if they were created using cross-reference mappings. By default, no restriction is applied.

Returns

A list of ScoredMatch objects representing the groundings sorted by decreasing score.

Return type

list[gilda.grounder.ScoredMatch]

ground_best(raw_str, context=None, organisms=None, namespaces=None)[source]

Return the best scored grounding for a given raw string.

Parameters

raw_str (str) – A string to be grounded with respect to the set of Terms that the Grounder contains.
context (Optional[str]) – Any additional text that serves as context for disambiguating the given entity text, used if a model exists for disambiguating the given text.
organisms (Optional[List[str]]) – An optional list of organism identifiers defining a priority ranking among organisms, if genes/proteins from multiple organisms match the input. If not provided, the default [‘9606’] i.e., human is used.
namespaces (Optional[List[str]]) – A list of namespaces to restrict matches to. This will apply to both the primary namespace of a matched term, to any subsumed matches, and to the source namespaces of terms if they were created using cross-reference mappings. By default, no restriction is applied.

Returns

The best ScoredMatch returned by ground() if any are returned, otherwise None.

Return type

Optional[gilda.grounder.ScoredMatch]
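
The documented return behavior of ground_best() amounts to taking the first element of the ground() result when one exists. The sketch below shows that pattern against a hypothetical StubGrounder, so that it runs without Gilda's resources; it is not the library's implementation.

```python
from typing import List, Optional


class StubGrounder:
    """Hypothetical stand-in exposing a ground() method like Grounder's."""

    def __init__(self, results: List[str]):
        self._results = results

    def ground(self, raw_str, context=None, organisms=None, namespaces=None):
        # A real grounder would return ScoredMatch objects sorted by score
        return list(self._results)


def best_or_none(grounder, raw_str: str, **kwargs) -> Optional[object]:
    """Return the first (highest scoring) match, or None if there are none."""
    matches = grounder.ground(raw_str, **kwargs)
    return matches[0] if matches else None


hit = best_or_none(StubGrounder(["best", "second"]), "mapt")
miss = best_or_none(StubGrounder([]), "not an entity")
```

Callers should therefore check for None before accessing attributes such as term or score on the result.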

lookup(raw_str)[source]

Return matching Terms for a given raw string.

Parameters: raw_str (str) – A string to be looked up in the set of Terms that the Grounder contains.
Return type: List[Term]
Returns: A list of Terms that are potential matches for the given string.

print_summary(**kwargs)[source]

Print the summary of this grounder.

Return type: None

summary_str()[source]

Summarize the contents of the grounder.

Return type: str

class gilda.grounder.ScoredMatch(term, score, match, disambiguation=None, subsumed_terms=None)[source]

Bases: object

Class representing a scored match to a grounding term.

term

The Term that the scored match is for.

Type: gilda.term.Term

score

The score associated with the match.

Type: float

match

The Match object characterizing the match to the Term.

Type: gilda.scorer.Match

disambiguation

Meta-information about disambiguation, when available.

Type: Optional[dict]

subsumed_terms

A list of additional Term objects that also matched, have the same db/id value as the term associated with the match, but were further down the score ranking. In some cases examining the subsumed terms associated with a match can provide additional metadata in downstream applications.

Type: Optional[list[gilda.term.Term]]

get_grounding_dict()[source]

Get the groundings as CURIEs and URLs.

Return type: Mapping[str, str]

get_groundings()[source]

Return all groundings for this match including from mapped and subsumed terms.

Return type: Set[Tuple[str, str]]
Returns: A set of tuples representing groundings for this match including the grounding for the primary term as well as any subsumed terms, and groundings that come from having mapped an original source grounding during grounding resource construction.

get_namespaces()[source]

Return all namespaces for this match including from mapped and subsumed terms.

Return type: Set[str]
Returns: A set of strings representing namespaces for terms involved in this match, including the namespace for the primary term as well as any subsumed terms, and groundings that come from having mapped an original source grounding during grounding resource construction.
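
To make the relationship between primary, mapped, and subsumed groundings concrete, here is a minimal sketch using hypothetical ToyTerm and ToyScoredMatch classes with placeholder values. It follows the descriptions above (the primary grounding, plus any source grounding, plus groundings of subsumed terms) but is an assumption about the logic rather than a copy of Gilda's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyTerm:
    """Hypothetical stand-in carrying only the fields needed here."""
    db: str
    id: str
    source_db: Optional[str] = None
    source_id: Optional[str] = None

    def get_groundings(self) -> Set[Tuple[str, str]]:
        groundings = {(self.db, self.id)}
        # Include the original grounding if this term was mapped from one
        if self.source_db is not None:
            groundings.add((self.source_db, self.source_id))
        return groundings


@dataclass
class ToyScoredMatch:
    """Hypothetical stand-in for a scored match with subsumed terms."""
    term: ToyTerm
    subsumed_terms: List[ToyTerm] = field(default_factory=list)

    def get_groundings(self) -> Set[Tuple[str, str]]:
        groundings = self.term.get_groundings()
        for subsumed in self.subsumed_terms:
            groundings |= subsumed.get_groundings()
        return groundings

    def get_namespaces(self) -> Set[str]:
        return {db for db, _ in self.get_groundings()}


primary = ToyTerm("DB_A", "1")
mapped = ToyTerm("DB_A", "1", source_db="DB_B", source_id="x")
match = ToyScoredMatch(primary, subsumed_terms=[mapped])
```

Here the subsumed term contributes the extra grounding ("DB_B", "x"), so both namespaces are reported for the match.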

gilda.grounder.load_entries_from_terms_file(terms_file)[source]

Yield Terms from a compressed terms TSV file path.

Parameters: terms_file (Union[str, Path]) – Path to a compressed TSV terms file with columns corresponding to the serialized elements of a Term.
Return type: Iterator[Term]
Returns: Terms loaded from the file yielded by a generator.

gilda.grounder.load_terms_file(terms_file)[source]

Load a TSV file containing terms into a lookup dictionary.

Parameters: terms_file (Union[str, Path]) – Path to a compressed TSV terms file with columns corresponding to the serialized elements of a Term.
Return type: Mapping[str, List[Term]]
Returns: A lookup dictionary whose keys are normalized entity texts, and values are lists of Terms with that normalized entity text.
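
The shape of the lookup dictionary can be illustrated with a standalone sketch that groups rows of a gzipped TSV by their first column. The assumptions here are that the first column holds the normalized text and that there is no header row; the real terms file may differ, and load_lookup is a hypothetical helper rather than part of Gilda.

```python
import csv
import gzip
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def load_lookup(path: str) -> Dict[str, List[List[str]]]:
    """Group the rows of a gzipped TSV by their first column."""
    lookup = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # Assumption: the first column holds the normalized text
            lookup[row[0]].append(row)
    return dict(lookup)


# Write a tiny two-row file with made-up placeholder values, then load it
rows = [
    ["abc", "ABC", "DB_A", "1", "ABC", "name", "src"],
    ["abc", "Abc", "DB_B", "2", "Abc", "synonym", "src"],
]
path = os.path.join(tempfile.mkdtemp(), "toy_terms.tsv.gz")
with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
    csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(rows)

lookup = load_lookup(path)
```

Both rows share the normalized text "abc", so they end up in the same list, which mirrors how several Terms can share one normalized entity text.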

Scorer

class gilda.scorer.Match(query, ref, exact=None, space_mismatch=None, dash_mismatches=None, cap_combos=None)[source]

Bases: object

Class representing a match between a query and a reference string

gilda.scorer.generate_match(query, ref, beginning_of_sentence=False)[source]

Return a match data structure based on comparing a query to a ref str.

Parameters

query (str) – The string to be compared against a reference string.
ref (str) – The reference string against which the incoming query string is compared.
beginning_of_sentence (bool) – True if the query appears at the beginning of a sentence, relevant for how capitalization is evaluated.

Returns

A Match object characterizing the match between the two strings.

Return type

Match

gilda.scorer.score_string_match(match)[source]

Return a score between 0 and 1 for the goodness of a match.

This score is purely based on the relationship of the two strings and does not take the status of the reference into account.

Parameters: match (gilda.scorer.Match) – The Match object characterizing the relationship of the query and reference strings.
Returns: A match score between 0 and 1.
Return type: float

Term

class gilda.term.Term(norm_text, text, db, id, entry_name, status, source, organism=None, source_db=None, source_id=None)[source]

Bases: object

Represents a text entry corresponding to a grounded term.

norm_text

The normalized text corresponding to the text entry, used for lookups.

Type: str

text

The text entry itself.

Type: str

db

The database / name space corresponding to the grounded term.

Type: str

id

The identifier of the grounded term within the database / name space.

Type: str

entry_name

The standardized name corresponding to the grounded term.

Type: str

status

The relationship of the text entry to the grounded term, e.g., synonym.

Type: str

source

The source from which the term was obtained.

Type: str

organism

When the term represents a protein, this attribute provides the taxonomy code of the species for the protein. For non-proteins, not provided. Default: None

Type: Optional[str]

source_db

If the term’s db/id was mapped from a different, original db/id from a given source, this attribute provides the original db value before mapping.

Type: Optional[str]

source_id

If the term’s db/id was mapped from a different, original db/id from a given source, this attribute provides the original ID value before mapping.

Type: Optional[str]

get_bioregistry_url()[source]

Return a URL for this term that the Bioregistry can resolve.

Returns: A Bioregistry URL string for this term, or None if it cannot be created.

get_curie(style='bioregistry')[source]

Return the compact URI for this term.

Parameters: style (str, optional) – The style of CURIE to return. One of ‘bioregistry’ (default) or ‘identifiers’. The ‘bioregistry’ style corresponds to bioregistry.io CURIEs, and the ‘identifiers’ style corresponds to identifiers.org CURIEs.
Return type: str
Returns: A normalized CURIE string for this term, or None if it cannot be normalized.

get_groundings()[source]

Return all groundings for this term, including from a mapped source.

Return type: Set[Tuple[str, str]]
Returns: A set of tuples representing the main grounding for this term, as well as any source grounding from which the main grounding was mapped.

get_idenfiers_url()[source]

Return a URL for this term that Identifiers.org can resolve.

Returns: An Identifiers.org URL string for this term, or None if it cannot be created.

get_identifiers_url()[source]: Get the full identifiers.org URL for this term.

get_namespaces()[source]

Return all namespaces for this term, including from a mapped source.

Return type: Set[str]
Returns: A set of strings including the main namespace for this term, as well as any source namespace from which the main grounding was mapped.

to_json()[source]: Return the term serialized into a JSON dict.

to_list()[source]: Return the term serialized into a list of strings.

gilda.term.dump_terms(terms, fname)[source]

Dump a list of terms to a tsv.gz file.

Return type: None

gilda.term.get_bioregistry_url(db, id)[source]

Return a URL that the Bioregistry can resolve.

Parameters

db (str) – The database / namespace of the identifier assuming the default Gilda configuration.
id (str) – The identifier, assuming the default Gilda configuration.

Return type

Optional[str]

Returns

A Bioregistry URL string, or None if it cannot be created.

gilda.term.get_curie(db, id, style='bioregistry')[source]

Return a normalized CURIE for the given database and identifier.

The default Gilda configuration uses INDRA’s style of databases and identifiers. This function is a simple way to normalize these into CURIEs that follow the native style of bioregistry.io or identifiers.org.

Parameters

db (str) – The database / namespace of the identifier assuming the default Gilda configuration.
id (str) – The identifier, assuming the default Gilda configuration.
style (str, optional) – The style of CURIE to return. One of ‘bioregistry’ (default) or ‘identifiers’. The ‘bioregistry’ style corresponds to bioregistry.io CURIEs, and the ‘identifiers’ style corresponds to identifiers.org CURIEs.

Return type

Optional[str]

Returns

A normalized CURIE string, or None if it cannot be normalized.

gilda.term.get_identifiers_curie(db, id)[source]

Get the full identifiers.org curie for a term.

Return type: Optional[str]

gilda.term.get_identifiers_url(db, id)[source]

Return a URL for this term that Identifiers.org can resolve.

Parameters

db (str) – The database / namespace of the identifier assuming the default Gilda configuration.
id (str) – The identifier, assuming the default Gilda configuration.

Return type

Optional[str]

Returns

An Identifiers.org URL string for this term, or None if it cannot be created.
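
As a hedged sketch of how a CURIE relates to a resolvable URL, the helper below simply appends a CURIE to a resolver base URL. The base URLs and the curie_to_url name are assumptions made for illustration; Gilda's own functions additionally normalize the database and identifier, which is not reproduced here.

```python
from typing import Optional

# Assumed resolver base URLs; not taken from the Gilda source
BASE_URLS = {
    "bioregistry": "https://bioregistry.io/",
    "identifiers": "https://identifiers.org/",
}


def curie_to_url(curie: Optional[str], style: str = "bioregistry") -> Optional[str]:
    """Append a CURIE to a resolver base URL, passing None through."""
    if curie is None:
        return None
    return BASE_URLS[style] + curie


url = curie_to_url("hgnc:6893")
```

Passing None through mirrors the documented behavior of returning None when a CURIE or URL cannot be created.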

Process

Module containing various string processing functions used for grounding.

gilda.process.dashes = ['−', '-', '‐', '‑', '‒', '–', '—', '―']: A list of all kinds of dashes

gilda.process.depluralize(word)[source]

Return the depluralized version of the word, along with a status flag.

Parameters: word (str) – The word which is to be depluralized.
Returns: The original word, if it is detected to be non-plural, or the depluralized version of the word, and a status flag representing the detected pluralization status of the word, with non_plural (e.g., BRAF), plural_oes (e.g., mosquitoes), plural_ies (e.g., antibodies), plural_es (e.g., switches), plural_cap_s (e.g., MAPKs), and plural_s (e.g., receptors).
Return type: list of str pairs

gilda.process.get_capitalization_pattern(word, beginning_of_sentence=False)[source]

Return the type of capitalization for the string.

Parameters

word (str) – The word whose capitalization is determined.
beginning_of_sentence (Optional[bool]) – True if the word appears at the beginning of a sentence. Default: False

Returns

The capitalization pattern of the given word. Returns one of the following: sentence_initial_cap, single_cap_letter, all_caps, all_lower, initial_cap, mixed.

Return type

str

gilda.process.normalize(s)[source]

Normalize white spaces, dashes and case of a given string.

Parameters: s (str) – The string to be normalized.
Returns: The normalized string.
Return type: str

gilda.process.remove_dashes(s)[source]

Remove all types of dashes in the given string.

Parameters: s (str) – The string in which all types of dashes should be replaced.
Returns: The string from which dashes have been removed.
Return type: str

gilda.process.replace_dashes(s, rep='-')[source]

Replace all types of dashes in a given string with a given replacement.

Parameters

s (str) – The string in which all types of dashes should be replaced.
rep (Optional[str]) – The string with which dashes should be replaced. By default, the plain ASCII dash (-) is used.

Returns

The string in which dashes have been replaced.

Return type

str

gilda.process.replace_greek_latin(s)[source]: Replace Greek spelled out letters with their latin character.

gilda.process.replace_greek_spelled_out(s)[source]: Replace Greek unicode characters with their names spelled out in latin letters.

gilda.process.replace_greek_uni(s)[source]: Replace Greek spelled out letters with their unicode character.

gilda.process.replace_unicode(s)[source]

Replace unicode with ASCII equivalent, except Greek letters.

Greek letters are handled separately and aren’t replaced in this context.

gilda.process.replace_whitespace(s, rep=' ')[source]

Replace any length white spaces in the given string with a replacement.

Parameters

s (str) – The string in which any length whitespaces should be replaced.
rep (Optional[str]) – The string with which all whitespace should be replaced. By default, the plain ASCII space ( ) is used.

Returns

The string in which whitespaces have been replaced.

Return type

str
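
The dash and whitespace replacement functions described above can be approximated in a few lines. The sketch below is written from the documented behavior only: the DASHES list is intended to mirror gilda.process.dashes using escape sequences, and the function names with the _sketch suffix are hypothetical.

```python
import re

# Intended to mirror gilda.process.dashes, written as escape sequences
DASHES = ["\u2212", "-", "\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015"]
DASH_PATTERN = re.compile("[" + "".join(re.escape(d) for d in DASHES) + "]")


def replace_dashes_sketch(s: str, rep: str = "-") -> str:
    """Replace every dash variant in s with rep."""
    # A function replacement avoids interpreting backslashes in rep
    return DASH_PATTERN.sub(lambda _: rep, s)


def replace_whitespace_sketch(s: str, rep: str = " ") -> str:
    """Replace each run of whitespace in s with a single rep."""
    return re.sub(r"\s+", lambda _: rep, s)


# An en dash and a run of mixed whitespace are both normalized
cleaned = replace_whitespace_sketch(replace_dashes_sketch("IL\u20136  \t receptor"))
```

Passing rep="" to the dash helper removes dashes altogether, which is the behavior described for remove_dashes().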

gilda.process.split_preserve_tokens(s)[source]

Return split words of a string including the non-word tokens.

Parameters: s (str) – The string to be split.
Returns: The list of words in the string including the separator tokens, typically spaces and dashes.
Return type: list of str
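
One way to split a string while keeping the separators is a regular expression split with a capturing group. The sketch below shows that technique under the assumption that any non-word character counts as a separator; split_keep_separators is a hypothetical name and may not match the library's exact rules.

```python
import re


def split_keep_separators(s: str) -> list:
    """Split on non-word characters, keeping each separator as a token."""
    # The capturing group makes re.split include the separators
    return re.split(r"(\W)", s)


tokens = split_keep_separators("MEK-ERK pathway")
# Joining the tokens restores the original string
restored = "".join(tokens)
```

Because the separators are preserved, the original string can be rebuilt from the tokens, which is useful when character offsets need to stay aligned with the input.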

Named Entity Recognition

Gilda implements a simple dictionary-based named entity recognition (NER) algorithm. It can be used as follows:

>>> from gilda.ner import annotate
>>> text = "MEK phosphorylates ERK"
>>> results = annotate(text)

The results are a list of Annotation objects each of which contains:

the text string matched
a list of gilda.grounder.ScoredMatch instances containing a sorted list of matches for the given text span (first one is the best match)
the start position in the text string where the entity starts
the end position in the text string where the entity ends

In this example, the two concepts are grounded to FamPlex entries.

>>> results[0].text, results[0].matches[0].term.get_curie(), results[0].start, results[0].end
('MEK', 'fplx:MEK', 0, 3)
>>> results[1].text, results[1].matches[0].term.get_curie(), results[1].start, results[1].end
('ERK', 'fplx:ERK', 19, 22)

Looking directly at the first entry in an annotation's list of matches gives a full description of the match itself:

>>> results[0].matches[0]
ScoredMatch(Term(mek,MEK,FPLX,MEK,MEK,curated,famplex,None,None,None),0.9288806431663574,Match(query=mek,ref=MEK,exact=False,space_mismatch=False,dash_mismatches=set(),cap_combos=[('all_lower', 'all_caps')]))
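
The general idea of dictionary-based NER can be sketched without Gilda at all. The toy example below looks up single words in a two-entry lexicon and reports character offsets; LEXICON and toy_annotate are hypothetical, and the sketch leaves out multi-word spans, scoring, and disambiguation, so it should not be read as Gilda's algorithm.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon mapping lowercased entity texts to CURIEs (two entries only)
LEXICON: Dict[str, str] = {"mek": "fplx:MEK", "erk": "fplx:ERK"}


def toy_annotate(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (span text, CURIE, start, end) for each word found in LEXICON."""
    results = []
    for m in re.finditer(r"\w+", text):
        curie = LEXICON.get(m.group().lower())
        if curie is not None:
            results.append((m.group(), curie, m.start(), m.end()))
    return results


annotations = toy_annotate("MEK phosphorylates ERK")
```

The offsets follow the same convention as in the example above: start is inclusive and end is exclusive, so text[start:end] recovers the matched span.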

BRAT

Gilda implements a way to output annotations in a format appropriate for the BRAT Rapid Annotation Tool (BRAT).

>>> from gilda.ner import get_brat
>>> from pathlib import Path
>>> brat_string = get_brat(results)
>>> Path("results.ann").write_text(brat_string)
>>> Path("results.txt").write_text(text)

For brat to work, you need to store the text in a file with the extension .txt and the annotations in a file with the same name but extension .ann.

gilda.ner.annotate(text, *, grounder=None, sent_split_fun=None, organisms=None, namespaces=None, context_text=None)[source]

Annotate a given text with Gilda.

Parameters

text (str) – The text to be annotated.
grounder (gilda.grounder.Grounder, optional) – The Gilda grounder to use for grounding.
sent_split_fun (Callable[str, Iterable[Tuple[int, int]]], optional) – A function that splits the text into sentences. The default is nltk.tokenize.PunktSentenceTokenizer.span_tokenize(). The function should take a string as input and return an iterable of coordinate pairs corresponding to the start and end coordinates for each sentence in the input text.
organisms (list[str], optional) – A list of organism identifiers (taxonomy IDs, e.g., '9606' for human) to pass to the grounder. If not provided, human is used.
namespaces (List[str], optional) – A list of namespaces to pass to the grounder to restrict the matches to. By default, no restriction is applied.
context_text (Optional[str]) – A longer span of text that serves as additional context for the text being annotated for disambiguation purposes.

Returns

A list of Annotations where each contains as attributes the text span that was matched, the list of ScoredMatches, and the start and end character offsets of the text span.

Return type

List[Annotation]

gilda.ner.get_brat(annotations, entity_type='Entity', ix_offset=1, include_text=True)[source]

Return brat-formatted annotation strings for the given entities.

Parameters

annotations (list[Annotation]) – A list of named entity annotations in the text.
entity_type (str, optional) – The brat entity type to use for the annotations. The default is ‘Entity’. This is useful for differentiating between annotations in the same text extracted from different reading systems.
ix_offset (int, optional) – The index offset to use for the brat annotations. The default is 1.
include_text (bool, optional) – Whether to include the text of the entity in the brat annotations. The default is True. If not provided, the text that matches the span will be written to the annotation file.

Returns

A string containing the brat-formatted annotations.

Return type

str
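
For orientation, the sketch below formats spans as text-bound annotation lines in brat's standoff format (an ID, then the type with start and end offsets, then optionally the covered text, separated by tabs). The exact output of get_brat() is not specified above, so to_brat_lines is a hypothetical helper that only illustrates the general shape and how the entity_type, ix_offset, and include_text parameters could come into play.

```python
from typing import List, Tuple


def to_brat_lines(
    spans: List[Tuple[str, int, int]],
    entity_type: str = "Entity",
    ix_offset: int = 1,
    include_text: bool = True,
) -> str:
    """Format (text, start, end) spans as brat text-bound annotation lines."""
    lines = []
    for ix, (span_text, start, end) in enumerate(spans, start=ix_offset):
        line = f"T{ix}\t{entity_type} {start} {end}"
        if include_text:
            line += f"\t{span_text}"
        lines.append(line)
    return "\n".join(lines)


brat = to_brat_lines([("MEK", 0, 3), ("ERK", 19, 22)])
```

Using a different ix_offset keeps annotation IDs from colliding when annotations from several sources are written to the same .ann file.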

Pandas Utilities

Utilities for Pandas.

gilda.pandas_utils.ground_df(df, source_column, *, target_column=None, grounder=None, **kwargs)[source]

Ground the elements of a column in a Pandas dataframe as CURIEs, in-place.

Parameters

df (DataFrame) – A pandas dataframe
source_column (Union[str, int]) – The column to ground. This column contains text corresponding to named entities’ labels or synonyms
target_column (Union[None, str, int]) – The column where to put the groundings (either a CURIE string, or None). It’s possible to create a new column when passing a string for this argument. If not given, will create a new column name like <source column>_grounded.
grounder (Optional[Grounder]) – A custom grounder. If none given, uses the built-in grounder.
kwargs – Keyword arguments passed to Grounder.ground(), could include context, organisms, or namespaces.

Return type

None

Examples

The following example shows how to use this function.

import pandas as pd
import gilda

url = "https://raw.githubusercontent.com/OBOAcademy/obook/master/docs/tutorial/linking_data/data.csv"
df = pd.read_csv(url)
gilda.ground_df(df, source_column="disease", target_column="disease_curie")

gilda.pandas_utils.ground_df_map(df, source_column, *, grounder=None, **kwargs)[source]

Ground the elements of a column in a Pandas dataframe as CURIEs.

Parameters

df (DataFrame) – A pandas dataframe
source_column (Union[str, int]) – The column to ground. This column contains text corresponding to named entities’ labels or synonyms
grounder (Optional[Grounder]) – A custom grounder. If none given, uses the built-in grounder.
kwargs – Keyword arguments passed to Grounder.ground(), could include context, organisms, or namespaces.

Returns

A pandas series representing the grounded CURIE strings. Contains NaNs if grounding was not successful or if there was a NaN in the cell before.

Return type

series