class modules.gemtractor.gemtractor.GEMtractor(sbml_file)[source]

Bases: object

Main class for the GEMtractor

It reads an SBML file, trims entities, and extracts the encoded network.

Parameters:sbml_file (str) – path to the SBML file

Look for genes associated with a reaction.

Checks whether there are annotations using the FBC package; otherwise it evaluates the reaction’s notes. If there are still no valid gene associations, the function assumes there is an undocumented catalyst and falls back to a gene named after the reaction’s identifier (prefixed with ‘reaction_’).

Parameters:reaction (libsbml:Reaction) – the SBML S-Base reaction
Returns:the gene associations
Return type:str
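The fallback order described above can be sketched roughly as follows. This is only an illustration, not the GEMtractor implementation: the function `pick_gene_association` and its arguments are hypothetical, and the FBC and notes lookups are assumed to have been reduced to plain strings (or `None`) beforehand.

```python
from typing import Optional


def pick_gene_association(fbc_association: Optional[str],
                          notes_association: Optional[str],
                          reaction_id: str) -> str:
    """Hypothetical helper mirroring the documented fallback order."""
    # 1. Prefer an association annotated via the FBC package.
    if fbc_association:
        return fbc_association
    # 2. Otherwise use whatever was found in the reaction's notes.
    if notes_association:
        return notes_association
    # 3. No valid association: assume an undocumented catalyst and
    #    fall back to a gene named after the reaction.
    return "reaction_" + reaction_id


print(pick_gene_association(None, "a and (b or c)", "R1"))
print(pick_gene_association(None, None, "R1"))
```

The third branch is what later allows such placeholder enzymes to be recognised and discarded again.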

get a parser to parse gene-association strings of reactions

Returns:the expression parser
Return type:pyparsing:ParserElement
_extract_genes_from_sbml_notes(annotation, default)[source]

extract the genes from a free-text sbml note

expects the gene associations as

<p>GENE_ASSOCIATION: a and (b or c)</p>
Parameters:
  • annotation (str) – the annotation string
  • default (str) – the default to return if no gene-associations were found

Returns:the gene-associations

Return type:
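A minimal sketch of how such a note could be read with a regular expression is shown here. The function name `extract_genes_from_notes` and the exact pattern are assumptions made for illustration; the actual method may tolerate other spellings or markup.

```python
import re

# Assumed pattern: a <p> element holding the label, a colon, and the
# association text up to the closing tag.
GENE_ASSOCIATION = re.compile(r"<p>\s*GENE_ASSOCIATION:\s*([^<]*?)\s*</p>")


def extract_genes_from_notes(annotation: str, default: str) -> str:
    """Return the gene association found in the note, or the default."""
    match = GENE_ASSOCIATION.search(annotation)
    if match and match.group(1):
        return match.group(1)
    # Nothing found (or an empty association): use the default.
    return default


notes = "<notes><p>GENE_ASSOCIATION: a and (b or c)</p></notes>"
print(extract_genes_from_notes(notes, "reaction_R1"))
```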



get the genes associated with a reaction

Caches the gene associations. If there is nothing in the cache, it will run _GEMtractor__find_genes(), _parse_expression(), and _unfold_complex_expression() to find them.

Parameters:reaction – the SBML S-Base reaction
Returns:list of GeneComplexes catalyzing the reaction
Return type:list of network.genecomplex.GeneComplex
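The caching behaviour can be pictured with the following sketch. The class `GeneCache`, its `resolve` callback, and the use of reaction identifiers as cache keys are all hypothetical; the callback stands in for the find, parse, and unfold steps named above.

```python
from typing import Callable, Dict, List


class GeneCache:
    """Hypothetical cache keyed by reaction identifier."""

    def __init__(self, resolve: Callable[[str], List[str]]) -> None:
        self._resolve = resolve  # expensive lookup (find, parse, unfold)
        self._cache: Dict[str, List[str]] = {}
        self.misses = 0          # counts how often the lookup really ran

    def get_genes(self, reaction_id: str) -> List[str]:
        if reaction_id not in self._cache:
            self.misses += 1
            self._cache[reaction_id] = self._resolve(reaction_id)
        return self._cache[reaction_id]


cache = GeneCache(lambda reaction_id: ["gene_of_" + reaction_id])
cache.get_genes("R1")
cache.get_genes("R1")  # served from the cache, the lookup is not repeated
print(cache.misses)
```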

implode a list of genes into a proper logical expression

basically joins the list with or, making sure every item is enclosed in brackets

Parameters:genes (list of network.genecomplex.GeneComplex) – the list of optional genes
Returns:the logical expression (genes joined using ‘or’)
Return type:str
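As a rough sketch of the joining step, assuming each gene complex has already been rendered as a string (the real method receives GeneComplex objects, and `implode_genes` here is a hypothetical stand-in):

```python
from typing import Iterable


def implode_genes(genes: Iterable[str]) -> str:
    """Join alternatives with 'or', bracketing every item."""
    # The brackets keep an inner 'and' grouped once the items are joined.
    return " or ".join("(" + gene + ")" for gene in genes)


print(implode_genes(["a and b", "c"]))
```

Bracketing every item, even single genes, avoids having to decide which items actually need it.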
_overwrite_genes_in_sbml_notes(new_genes, reaction)[source]

set the gene-associations for a note in an sbml reaction

overwrites it, if it was set already

Parameters:
  • new_genes (str) – the new gene-associations
  • reaction – the sbml reaction
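The overwrite-or-set behaviour might look like the sketch below. It works on a plain string holding the inner paragraphs of a note rather than on a libsbml reaction, and both the function `overwrite_genes_in_notes` and the choice to append a missing entry at the end are assumptions.

```python
import re

# Assumed form of an existing entry inside the note.
GENE_PARAGRAPH = re.compile(r"<p>\s*GENE_ASSOCIATION:[^<]*</p>")


def overwrite_genes_in_notes(notes: str, new_genes: str) -> str:
    """Replace the GENE_ASSOCIATION paragraph, or add one if absent."""
    paragraph = "<p>GENE_ASSOCIATION: " + new_genes + "</p>"
    if GENE_PARAGRAPH.search(notes):
        # A function replacement keeps backslashes in new_genes literal.
        return GENE_PARAGRAPH.sub(lambda _match: paragraph, notes, count=1)
    # No entry yet: append one (assumed placement).
    return notes + paragraph


print(overwrite_genes_in_notes("<p>GENE_ASSOCIATION: a</p><p>x</p>", "b or c"))
```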

parse a gene-association expression

uses the expression parser from _GEMtractor__get_expression_parser()

Parameters:expression (str) – the gene-association expression
Returns:the parse result
Return type:pyparsing:ParseResults
_set_genes_in_sbml(genes, reaction)[source]

set the genes of a reaction in an sbml model

tries to set the genes using the FBC package, if supported; otherwise sets the genes in the reaction’s notes. If genes already exist, they will be overwritten…


unfold a gene-association parse result

takes a parse result and unfolds it to a list of alternative gene complexes, which can catalyze a certain reaction

Parameters:parseresult – the result of the expression parser
Returns:the list of gene-complexes catalyzing the reaction
Return type:list of network.genecomplex.GeneComplex
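Unfolding amounts to expanding the logical expression into a list of alternatives, each of which is a set of genes that must act together. The sketch below illustrates the idea on nested tuples; this representation and the function `unfold` are hypothetical stand-ins for the pyparsing result and GeneComplex objects used by the real method.

```python
from itertools import product
from typing import FrozenSet, List, Tuple, Union

# A gene name, or ("and" | "or", operand, operand, ...).
Expression = Union[str, Tuple]


def unfold(expression: Expression) -> List[FrozenSet[str]]:
    """Expand an expression into alternative gene complexes."""
    if isinstance(expression, str):
        return [frozenset([expression])]
    operator, *operands = expression
    unfolded = [unfold(operand) for operand in operands]
    if operator == "or":
        # Every alternative of every operand is an alternative overall.
        result = [cplx for alternatives in unfolded for cplx in alternatives]
    elif operator == "and":
        # Pick one alternative per operand and merge them into one complex.
        result = [frozenset().union(*combo) for combo in product(*unfolded)]
    else:
        raise ValueError("unknown operator: " + repr(operator))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(result))


# "a and (b or c)" unfolds into the complexes {a, b} and {a, c}.
for gene_complex in unfold(("and", "a", ("or", "b", "c"))):
    print(sorted(gene_complex))
```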

Extract the Network from the SBML model

Will go through the SBML file and convert the (remaining) entities into our own network structure.

Returns:the network (after optional trimming)

get the annotations of a gene product

if the document is annotated using the FBC package, there is a good chance we’ll find more information about a gene in the gene-product’s annotations

Parameters:gene (str) – the label of the gene product of interest
Returns:the annotations of the gene-product labeled gene or None if there are no such annotations
Return type:xml str

get the annotations of an sbml reaction

Parameters:reactionid (str) – the identifier of the reaction in the sbml document
Returns:the annotations of that reaction
Return type:xml str
get_sbml(filter_species=[], filter_reactions=[], filter_genes=[], filter_gene_complexes=[], remove_reaction_enzymes_removed=True, remove_ghost_species=False, discard_fake_enzymes=False, remove_reaction_missing_species=False, removing_enzyme_removes_complex=True)[source]

Get a filtered SBML document from the model file

this parses the SBML file, applies the trimming according to the arguments, and returns the trimmed model

Parameters:
  • filter_species (list of str) – species identifiers to get rid of
  • filter_reactions (list of str) – reaction identifiers to get rid of
  • filter_genes (list of str) – enzyme identifiers to get rid of
  • filter_gene_complexes (list of str) – enzyme-complex identifiers to get rid of; every list item should be of the format ‘A + B + gene42’
  • remove_reaction_enzymes_removed (bool) – should we remove a reaction if all its genes were removed?
  • remove_ghost_species (bool) – should species that no longer participate in any reaction be removed, even though they might be required by other entities?
  • discard_fake_enzymes (bool) – should fake enzymes (enzymes implicitly assumed when none are annotated to a reaction) be removed?
  • remove_reaction_missing_species (bool) – remove a reaction if one of the participating species was removed?
  • removing_enzyme_removes_complex (bool) – if an enzyme is removed, should all enzyme complexes in which it participates be removed as well?

Returns:the SBML document

Return type:
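To illustrate how some of these flags interact, here is a small sketch that trims a plain dictionary instead of an SBML document. The function `trim_reactions` is hypothetical and covers only the gene-related options. It also assumes that a complex is kept unchanged when one of its members is filtered and removing_enzyme_removes_complex is False.

```python
from typing import Dict, FrozenSet, Iterable, List


def members_of(catalyst: str) -> FrozenSet[str]:
    """Split an identifier such as 'A + B + gene42' into its genes."""
    return frozenset(part.strip() for part in catalyst.split("+"))


def trim_reactions(reactions: Dict[str, List[str]],
                   filter_genes: Iterable[str] = (),
                   filter_gene_complexes: Iterable[str] = (),
                   remove_reaction_enzymes_removed: bool = True,
                   removing_enzyme_removes_complex: bool = True
                   ) -> Dict[str, List[str]]:
    """Map each surviving reaction id to its surviving catalysts."""
    unwanted_genes = set(filter_genes)
    unwanted_complexes = {members_of(c) for c in filter_gene_complexes}
    trimmed: Dict[str, List[str]] = {}
    for reaction_id, catalysts in reactions.items():
        kept = []
        for catalyst in catalysts:
            members = members_of(catalyst)
            if members in unwanted_complexes:
                continue  # complex filtered explicitly
            if len(members) == 1 and members & unwanted_genes:
                continue  # single enzyme filtered
            if (len(members) > 1 and removing_enzyme_removes_complex
                    and members & unwanted_genes):
                continue  # complex loses one of its enzymes
            kept.append(catalyst)
        if catalysts and not kept and remove_reaction_enzymes_removed:
            continue      # all catalysts gone: drop the reaction
        trimmed[reaction_id] = kept
    return trimmed


model = {"R1": ["a", "a + b"], "R2": ["c", "a + c"], "R3": ["d + e"]}
print(trim_reactions(model, filter_genes=["a"]))
```

In this example, filtering gene a removes both catalysts of R1, so R1 is dropped as well, while R2 survives with c alone.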