metquest package

Submodules

metquest.construct_graph module

create_graph(path_name_with_models, no_of_orgs)[source]

This function creates bipartite graph of the organisms based on the path provided and the number of organsisms. For instance, if a folder has 3 model files, and the number of organisms is 2, 3 (3C2) different bipartite graphs are created. The graph objects and the dictionary are saved as gpickle and pickle files respectively.

Parameters:
  • path_name_with_models (str) – Absolute path name of the folder containing the models.
  • no_of_orgs (int) – Number of organisms to be used for creating the DiGraph.
Returns:

  • H (NetworkX DiGraph Object) – Bipartite graph consisting of internal and exchange reactions in organisms
  • full_name_map (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model

metquest.execute_metquest module

execute_all_codes()[source]

This function executes all the codes including constructing graphs and executing metquest.

Parameters:None
Returns:
Return type:None
find_important_reactions(all_reactions_involved, currenttarmet, seed_metabolites, namemap, G)[source]

This function determines the important reactions based on the pathways generated for the target metabolite.

Parameters:
  • all_reactions_involved (list) – list of all reactions found in all the pathways from source to target
  • currenttarmet (str) – Current target metabolite
  • seed_metabolites (set) – Set of seed metabolites including the source
  • namemap (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
Returns:

Return type:

None

Notes

We define important reactions as those reactions which occur in almost all the pathways producing the target metabolite (apart from the reactions that are involved in the production of target metabolite and the uptake of seed metabolite)

find_jaccard_between_paths(only_source_to_target)[source]

This function determines the jaccard values between the pathways generated from the source to the target.

Parameters:only_source_to_target (list) – list of lists consisting of all pathways producing the target metabolite from the source
Returns:
  • jaccard_values (list) – list of all jaccard values (float) for all the pathway combinations
  • path_combinations (list) – list of all pathway combinations corresponding to the jaccard values

Notes

Jaccard value J = (set(A).intersection(set(B)))/(set(A).union(set(B))) J = 1 indicates two sets are the same J = 0 indicates two sets are different

find_pathways_involving_exchange_mets(number_of_xml, pathway_table, currenttarmet, seed_metabolites, namemap, G)[source]

This function identifies the pathways producing the target metabolites, which involve exchange metabolites. This function prints output only when a community of organisms is considered, i.e., when more than one metabolic network is used.

Parameters:
  • number_of_xml (int) – Number of xml files in the folder
  • pathway_table (dict) – Dictionary of dictionary containing the pathways of different sizes identified for every metabolite. This will have only the acyclic/ branched pathways.
  • currenttarmet (str) – Current target metabolite
  • seed_metabolites (set) – Set of seed metabolites including the source
  • namemap (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
Returns:

exchange_candidates_inverted_dict – Dictionary containing the number of times an exchange reaction is repeated

Return type:

dict

find_pathways_starting_from_source(source_metabolites, pathway_table, currenttarmet, cutoff, G)[source]

This function finds all pathways starting from the source metabolites

Parameters:
  • source_metabolites (list) – List of source metabolites
  • pathway_table (dict) – Dictionary of dictionary containing the pathways of different sizes identified for every metabolite. This will have only the acyclic/ branched pathways.
  • currenttarmet (str) – Current target metabolite
  • cutoff (int) – Maximum pathway length cutoff
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
Returns:

  • most_different_paths (dict) – For the given source metabolite, a combination of two most different pathways based on minimum Jaccard value is returned.
  • only_source_to_target (list) – list of list containing all pathways starting from source metabolite

print_summary(scope, currenttarmet, pathway_table, cutoff, cyclic_pathways, namemap, source_metabolites, seed_metabolites, number_of_xml, G)[source]

This function prints the results summary obtained from the pathways, i.e., 1. Number of metabolites in scope 2. Target metabolite 3. Pathway size cutoff 4. Number of all branched pathways found from seed 5. Number of all branched pathways from seed whose size <= Pathway size cutoff 6. Minimum number of steps to produce target metabolite 7. Number of branched pathways from source whose size <= Pathway size cutoff 8. Target metabolite can be produced using cyclic pathway 9. Number of cyclic pathways whose size <= Pathway size cutoff 10. One of the combination of most different pathways producing target metabolite 11. Important reactions based on the frequency of occurrences

Parameters:
  • scope (set) – Set of metabolites that can be produced from the given set of seed metabolites
  • currenttarmet (str) – Current target metabolite
  • pathway_table (dict) – Dictionary of dictionary containing the pathways of different sizes identified for every metabolite. This will have only the acyclic/ branched pathways.
  • cutoff (int) – Maximum pathway length cutoff
  • cyclic_pathways (dict) – Dictionary of dictionary containing cyclic pathways of different sizes identified for every metabolite.
  • namemap (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model
  • source_metabolites (list) – List of source metabolites
  • seed_metabolites (set) – Set of seed metabolites including the source
  • number_of_xml (int) – Number of xml files in the folder
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
Returns:

Return type:

None

write_output_to_file(pathway_table, currenttarmet, cutoff, cyclic_pathways, folder_to_create, namemap, source_metabolites, G)[source]

This function writes the pathways of sizes less than or equal to the cutoff from source to the target and seed metabolites to target. This function also writes cyclic pathways of sizes less than or equal to cutoff from the source to target.

Parameters:
  • pathway_table (dict) – Dictionary of dictionary containing the pathways of different sizes identified for every metabolite. This will have only the acyclic/ branched pathways.
  • currenttarmet (str) – Current target metabolite
  • cutoff (int) – Maximum pathway length cutoff
  • cyclic_pathways (dict) – Dictionary of dictionary containing cyclic pathways of different sizes identified for every metabolite.
  • folder_to_create (str) – Name of the folder where results have to be written
  • namemap (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model
  • source_metabolites (list) – List of source metabolites
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
Returns:

Return type:

None

metquest.fetch_reactions module

segregate_reactions_from_models(path_name)[source]

This function gets the data pertaining to the reactions and the metabolites from the models of multiple organisms. This requires as input the pathname where the ‘.xml’ files are located. From this path, this function reads all the files using the functions in the COBRA toolbox and generates the stoichiometric model for these SBML models.

Parameters:path_name (str) – full path name where the model files are
Returns:
  • all_organisms_info (dict) – Dictionary of all model data (reaction information about all the organisms)
  • namemap (dict) – Dictionary mapping the adhoc reaction names to reaction names in the model

metquest.generate_partitions module

generate_partitions(maximumvalue, lbnumlist, columnvalue)[source]

This code takes as input the columnvalue (j), values of the shortest path of each of the metabolites (given as a list) and the sum that has to be obtained using these combination of numbers.

Parameters:
  • maximumvalue (int) – Maximum values which the numbers can take
  • lbnumlist (list) – a list of values pertaining to the length of shortest paths of every metabolite
  • columnvalue (int) – Desired sum to be obtained All the partitions of numbers which will generate the desired sum whose values are between the values for shortest paths and the maximum values.
Returns:

all_partitions

Return type:

List of tuples

Notes

For instance, if the column value is 7, the number of imputs is 2, and the shortest path of the metabolites is 4,3 respectively, and the maximum sum that has to be obtained is 8, then

>>> generate_partitions(7,[4,3],8)
[(4, 4), (5, 3)]
>>> generate_partitions(4, [2,1,1], 5)
[(2, 1, 2), (2, 2, 1), (3, 1, 1)]

metquest.get_reaction_types module

find_different_reaction_types(stoi_matrix, model, current_model_name)[source]

This function finds the exchange, irreversible and the reversible reactions from the model.

Parameters:
  • stoi_matrix (numpy array) – full path name where the model files are
  • model (COBRA model object) – COBRA model object created from SBML models
  • current_model_name (str) – Name which is to be prefixed against every reaction/metabolite (to differentiate the entries in multiple organisms, when a community model is built)
Returns:

  • exchange_met_ids (list) – Metabolite identifiers of exchange metabolites
  • irrev_lhs_nodes (list) – Metabolite identifiers of reactants of irreversible reactions
  • irrev_rhs_nodes (list) – Metabolite identifiers of products of irreversible reactions
  • rev_lhs_nodes (list) – Metabolite identifiers of reactants of reversible reactions
  • rev_rhs_nodes (list) – Metabolite identifiers of products of reversible reactions
  • exchange_rxn_ids (list) – Reaction identifers of exchange reactions
  • irrev_rxn_ids (list) – Reaction identifiers of irreversible reactions
  • rev_rxn_ids (list) – Reaction identifiers of reversible reactions

metquest.guided_bfs module

forward_pass(graph_object, seedmets)[source]

This function carries out the Guided Breadth First Search on a directed bipartite graph starting from the entries in seed metabolite set.

Parameters:
  • graph_object (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
  • seedmets (set) – Set of seed metabolites including the source
Returns:

  • lower_bound_metabolite (defaultdict) – Minimum number of steps required to reach a metabolite
  • status_dict (defaultdict) – Dictionary pertaining to the status of every reaction - whether it has been visited or not
  • scope (set) – Set of metabolites that can be produced from the given set of seed metabolites

Notes

Starting with the set of seed metabolites S, the algorithm first finds all the reactions from the set R, whose precursor metabolites are in S. Such reactions are marked visited and added to the visited reaction set. Metabolites produced by these reactions are checked. The reactions where these metabolites participate are then checked for the presence of all its predecessors and are added to the queue. This traversal continues in a breadth-first manner and stops when there are no further reactions to be visited.

metquest.package_data module

metquest.pathway_assembler module

find_pathways(G, seed_mets_input, path_len_cutoff, *args)[source]

This function tries to identify pathways between a set of seed and target metabolites of a given size cut-off.

Parameters:
  • G (NetworkX DiGraph Object) – Bipartite graph of the metabolic network
  • seed_mets_input (set) – Set of seed metabolites including the source
  • path_len_cutoff (int) – Maximum size of the pathways
  • *args – Used to decide if a particular combination has to be evaluated or not. i.e., if the number of pathways produced for two different metabolites are higher, for instance, if in the reaction A + B -> C, A has 2000 pathways and B has 5 pathways, then C will have 10000 pathways at the maximum. If this pathway cutoff (maxnumpath) is 1000, this combination will not be evaluated, provided C has been already found. By default, it is set to 1000
Returns:

  • pathway_table (dict) – Dictionary of dictionary containing the pathways of different sizes identified for every metabolite. This will have only the acyclic/ branched pathways.
  • cyclic_pathways (dict) – Dictionary of dictionary containing cyclic pathways of different sizes identified for every metabolite.
  • scope (set) – Set of metabolites which can be synthesised

Module contents