tripal.analysis package

Module contents

class tripal.analysis.AnalysisClient(tripalinstance, **requestArgs)

Bases: tripal.client.Client

Manage Tripal analyses

add_analysis(name, program, programversion, sourcename, algorithm=None, sourceversion=None, sourceuri=None, description=None, date_executed=None)

Create an analysis

Parameters:
- name (str) – analysis name
- program (str) – program used for the analysis
- programversion (str) – version of the program
- sourcename (str) – name of the source data
- algorithm (str) – algorithm used
- sourceversion (str) – version of the source data
- sourceuri (str) – URI of the source data
- description (str) – analysis description
- date_executed (str) – date the analysis was executed (yyyy-mm-dd)
Return type:	dict
Returns:	Analysis information
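Because date_executed is passed as a plain string, a malformed date is easy to send by mistake. The sketch below is a hypothetical helper (not part of python-tripal) that normalizes and validates the documented yyyy-mm-dd format and drops optional fields left as None before the keyword arguments are handed to add_analysis.

```python
import re
from datetime import date, datetime
from typing import Any, Dict, Optional, Union

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # documented yyyy-mm-dd layout


def build_analysis_kwargs(name: str, program: str, programversion: str,
                          sourcename: str,
                          date_executed: Optional[Union[str, date]] = None,
                          **optional: Any) -> Dict[str, Any]:
    """Assemble keyword arguments for add_analysis() (hypothetical helper).

    Converts date objects to yyyy-mm-dd, rejects malformed or impossible
    dates, and removes optional fields whose value is None.
    """
    if isinstance(date_executed, date):
        date_executed = date_executed.strftime("%Y-%m-%d")
    if date_executed is not None:
        if not _DATE_RE.fullmatch(date_executed):
            raise ValueError(f"date_executed must be yyyy-mm-dd, got {date_executed!r}")
        # Raises ValueError for calendar-impossible dates such as 2021-02-30
        datetime.strptime(date_executed, "%Y-%m-%d")
    kwargs = dict(name=name, program=program, programversion=programversion,
                  sourcename=sourcename, date_executed=date_executed, **optional)
    return {key: value for key, value in kwargs.items() if value is not None}
```

A call would then look like `client.add_analysis(**build_analysis_kwargs("run1", "blastx", "2.9.0", "SwissProt", date_executed="2020-01-31"))`, where `client` stands for an AnalysisClient obtained from your Tripal instance and the argument values are placeholders.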

delete_orphans(job_name=None, no_wait=None)

Delete orphaned Drupal analysis nodes

Parameters:
- job_name (str) – Name of the job
- no_wait (bool) – Return immediately without waiting for job completion
Return type:	str
Returns:	status

get_analyses(analysis_id=None, name=None, program=None, programversion=None, algorithm=None, sourcename=None, sourceversion=None, sourceuri=None, date_executed=None)

Get analyses, optionally filtered by any of the parameters below

Parameters:
- analysis_id (str) – analysis ID
- name (str) – analysis name
- program (str) – program used for the analysis
- programversion (str) – version of the program
- algorithm (str) – algorithm used
- sourcename (str) – name of the source data
- sourceversion (str) – version of the source data
- sourceuri (str) – URI of the source data
- date_executed (str) – date the analysis was executed (yyyy-mm-dd)
Return type:	list of dict
Returns:	Analysis information
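get_analyses returns a list even when a single analysis is expected, so scripts often need to check that exactly one record matched before using its ID. The sketch below is a hypothetical helper that does this check on the returned list of dicts; the key names it is used with (for example `analysis_id` or `name`) are assumptions to verify against what your Tripal instance actually returns.

```python
from typing import Any, Dict, List


def pick_one_analysis(analyses: List[Dict[str, Any]], **criteria: Any) -> Dict[str, Any]:
    """Return the single analysis dict whose fields equal every criterion.

    Values are compared as strings so that an ID given as "1" matches 1.
    Raises LookupError when zero or several analyses match.
    """
    matches = [a for a in analyses
               if all(k in a and str(a[k]) == str(v) for k, v in criteria.items())]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one analysis matching {criteria}, found {len(matches)}")
    return matches[0]
```

Comparing as strings is a deliberate choice here, because analysis_id is documented as a str parameter while the returned records may hold it as an integer.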

get_analyses_tripal(analysis_id=None)

Get analysis entities

Parameters:
- analysis_id (int) – An analysis entity/node ID
Return type:	list of dict
Returns:	Analysis entity/node information

load_blast(name, program, programversion, sourcename, blast_output, blast_ext=None, blastdb=None, blastdb_id=None, blast_parameters=None, query_re=None, query_type=None, query_uniquename=False, is_concat=False, search_keywords=False, no_parsed=u'all', no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=u'', date_executed=None)

Create a BLAST analysis and load BLAST results into it

Parameters:
- name (str) – analysis name
- program (str) – program used for the analysis
- programversion (str) – version of the program
- sourcename (str) – name of the source data
- blast_output (str) – Path to the BLAST file to load (a single XML file, or a directory containing multiple XML files)
- blast_ext (str) – When loading from a directory, the extension of the BLAST result files
- blastdb (str) – Name of the database blasted against (must be in the Chado db table)
- blastdb_id (str) – ID of the database blasted against (must be in the Chado db table)
- blast_parameters (str) – BLAST parameters used to produce these results
- query_re (str) – Regular expression that uniquely identifies the query name. This parameter is required if the feature name is not the first word of the BLAST query name.
- query_type (str) – Feature type of the query (e.g. ‘gene’, ‘mRNA’, ‘contig’). It must be a valid Sequence Ontology term.
- query_uniquename (bool) – Set this if the query_re regular expression matches unique names instead of names in the database.
- is_concat (bool) – Set this if the BLAST result file is simply a concatenation of several BLAST results.
- search_keywords (bool) – Extract keywords for Tripal search
- no_parsed (str) – Maximum number of hits to parse per feature (default: all)
- no_wait (bool) – Do not wait for the job to complete
- algorithm (str) – algorithm used
- sourceversion (str) – version of the source data
- sourceuri (str) – URI of the source data
- description (str) – analysis description
- date_executed (str) – date the analysis was executed (yyyy-mm-dd)
Return type:	str
Returns:	Loading information
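A wrong query_re usually only shows up after a long loading job, so it can pay to try the pattern on a few query names first. The sketch below assumes, without confirmation from this page, that the loader keeps the first capture group when the pattern has one (otherwise the whole match); note also that Python's `re` is close to, but not identical to, the PCRE engine used by Tripal's PHP code.

```python
import re
from typing import Dict, Iterable


def preview_query_re(query_re: str, query_names: Iterable[str]) -> Dict[str, str]:
    """Map each BLAST query name to the feature name query_re would extract.

    Assumption: capture group 1 is used when present, else the whole match.
    Raises ValueError if a name does not match or two queries collapse to
    the same extracted name (the regex must identify queries uniquely).
    """
    pattern = re.compile(query_re)
    extracted: Dict[str, str] = {}
    for query in query_names:
        match = pattern.search(query)
        if match is None:
            raise ValueError(f"query_re {query_re!r} does not match {query!r}")
        extracted[query] = match.group(1) if pattern.groups else match.group(0)
    if len(set(extracted.values())) != len(extracted):
        raise ValueError("query_re maps different queries to the same name")
    return extracted
```

For query names such as `lcl|contig_001 len=523`, the feature name is not the first word, so a pattern like `^lcl\|(\S+)` would be needed, and this preview returns `contig_001` for it.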

load_fasta(fasta, organism=None, organism_id=None, analysis=None, analysis_id=None, sequence_type=u'contig', re_name=None, re_uniquename=None, db_ext_id=None, re_accession=None, rel_type=None, rel_subject_re=None, rel_subject_type=None, method=u'insup', match_type=u'uniquename', job_name=None, no_wait=False)

Load FASTA sequences

Parameters:
- fasta (str) – Path to the FASTA file to load
- organism (str) – Organism common name or abbreviation
- organism_id (int) – Organism ID
- analysis (str) – Analysis name
- analysis_id (int) – Analysis ID
- sequence_type (str) – Sequence type (default: contig)
- re_name (str) – Regular expression for the name
- re_uniquename (str) – Regular expression for the unique name
- db_ext_id (str) – External DB ID
- re_accession (str) – Regular expression for the accession from the external DB
- rel_type (str) – Relation type (part_of or derives_from)
- rel_subject_re (str) – Regular expression for the relation subject (used to extract the ID of the related entity)
- rel_subject_type (str) – Relation subject type (must match already loaded data, e.g. mRNA)
- method (str) – Insertion method: insert, update or insup (Insert and Update); default: insup
- match_type (str) – Match type for already loaded features: name or uniquename (default: uniquename); used by the “Update only” and “Insert and update” methods
- job_name (str) – Name of the job
- no_wait (bool) – Do not wait for the job to complete
Return type:	str
Returns:	Loading information
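re_name and re_uniquename are easiest to get right by checking them against the actual definition lines of the FASTA file. The sketch below previews the (name, uniquename) pair each record would receive, under two assumptions that this page does not confirm: each regular expression keeps its first capture group, and when a regex is omitted the first word of the definition line is used.

```python
import re
from typing import List, Optional, Tuple


def _extract(regex: Optional[str], defline: str) -> str:
    """Apply one naming regex to a definition line (assumed semantics)."""
    if regex is None:
        return defline.split()[0]  # assumed default: first word of the defline
    match = re.search(regex, defline)
    if match is None:
        raise ValueError(f"{regex!r} does not match defline {defline!r}")
    return match.group(1) if match.re.groups else match.group(0)


def preview_fasta_names(fasta_text: str, re_name: Optional[str] = None,
                        re_uniquename: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the (name, uniquename) pair for each record in FASTA text."""
    pairs = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines carry no names
        defline = line[1:].strip()
        if not defline:
            raise ValueError("empty FASTA definition line")
        pairs.append((_extract(re_name, defline), _extract(re_uniquename, defline)))
    return pairs
```

With definition lines such as `>mRNA.1 gene=GENE1 scaffold_1`, passing `re_name=r"gene=(\S+)"` and no re_uniquename yields `("GENE1", "mRNA.1")` in this preview.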

load_gff3(gff, organism=None, organism_id=None, analysis=None, analysis_id=None, import_mode=u'update', target_organism=None, target_organism_id=None, target_type=None, target_create=False, start_line=None, landmark_type=None, alt_id_attr=None, create_organism=False, re_mrna=None, re_protein=None, job_name=None, no_wait=False)

Load a GFF3 file

Parameters:
- gff (str) – Path to the GFF file to load
- organism (str) – Organism common name or abbreviation
- organism_id (int) – Organism ID
- analysis (str) – Analysis name
- analysis_id (int) – Analysis ID
- import_mode (str) – Import mode: add_only (existing features won’t be touched) or update (existing features will be updated and obsolete attributes kept); default: update
- target_organism (str) – When the GFF3 file contains Target attributes, the abbreviation or common name of the organism to which the target sequences belong. Use this only if the target sequences belong to a different organism than the one given by organism or organism_id, and only if all of the target sequences belong to the same species. If the targets belong to several species, the organism must instead be specified with the ‘target_organism=genus:species’ attribute in the GFF file.
- target_organism_id (int) – Same as target_organism, but given as an organism ID.
- target_type (str) – When the GFF3 file contains Target attributes and a target’s unique name is not unique (e.g. a protein and an mRNA have the same name), the type to use for all targets in the GFF file. If the targets are of different types, the type must instead be specified with the ‘target_type=type’ attribute in the GFF file. This must be a valid Sequence Ontology (SO) term.
- target_create (bool) – When the GFF3 file contains Target attributes and a target feature cannot be found, create it using the organism and type given by target_organism/target_organism_id and target_type, or by the ‘target_organism’ and ‘target_type’ attributes in the GFF file. Values from the GFF file take precedence over these parameters.
- start_line (int) – Line of the GFF file at which importing should start
- landmark_type (str) – Sequence Ontology type of the landmark sequences in the GFF file (e.g. ‘chromosome’)
- alt_id_attr (str) – When the ID attribute is absent, the attribute that can uniquely identify the feature
- create_organism (bool) – Create organisms when an organism attribute is encountered (such lines are skipped otherwise)
- re_mrna (str) – Regular expression for the mRNA name
- re_protein (str) – Replacement string for the protein name
- job_name (str) – Name of the job
- no_wait (bool) – Do not wait for the job to complete
Return type:	str
Returns:	Loading information
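re_mrna and re_protein work as a pair: a pattern matched against the mRNA name and a replacement string that produces the protein name. The sketch below previews that mapping, assuming (not confirmed by this page) that Tripal applies them like PHP's `preg_replace`, where backreferences are written `$1`, `${1}` or `\1`; the helper translates `$n` references into Python's `\g<n>` before calling `re.sub`.

```python
import re


def preview_protein_name(mrna_name: str, re_mrna: str, re_protein: str) -> str:
    """Preview the protein name derived from an mRNA name.

    Assumes preg_replace-style replacement strings; $n and ${n} are rewritten
    to Python's \\g<n>, while \\n backreferences are already valid in Python.
    """
    if re.search(re_mrna, mrna_name) is None:
        raise ValueError(f"re_mrna {re_mrna!r} does not match {mrna_name!r}")
    python_repl = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", re_protein)
    return re.sub(re_mrna, python_repl, mrna_name)
```

For example, with mRNA names ending in `-RA`, `re_mrna=r"^(.*?)-RA$"` and `re_protein="$1-PA"` turn `GENE1-RA` into `GENE1-PA` in this preview.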

load_go(name, program, programversion, sourcename, gaf_output, organism=None, organism_id=None, gaf_ext=None, query_type=None, query_matching=u'uniquename', method=u'add', name_column=2, re_name=None, no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=None, date_executed=None)

Create a GO analysis and load GO annotations from GAF files

Parameters:
- name (str) – analysis name
- program (str) – program used for the analysis
- programversion (str) – version of the program
- sourcename (str) – name of the source data
- gaf_output (str) – Path to the GAF file to load (a single file, or a directory containing multiple GAF files)
- organism (str) – Organism common name or abbreviation
- organism_id (int) – Organism ID
- gaf_ext (str) – When loading from a directory, the extension of the GAF files
- query_type (str) – Feature type of the query (e.g. ‘gene’, ‘mRNA’, ‘contig’). It must be a valid Sequence Ontology term.
- query_matching (str) – Method used to match identifiers to features in the database: ‘name’, ‘uniquename’ or ‘dbxref’ (default: ‘uniquename’)
- method (str) – Import method: ‘add’ or ‘remove’ (default: ‘add’)
- name_column (int) – Column containing the feature identifiers: 2, 3, 10 or 11 (default: 2)
- re_name (str) – Regular expression used to extract the feature name from the GAF file
- no_wait (bool) – Do not wait for the job to complete
- algorithm (str) – algorithm used
- sourceversion (str) – version of the source data
- sourceuri (str) – URI of the source data
- description (str) – analysis description
- date_executed (str) – date the analysis was executed (yyyy-mm-dd)
Return type:	str
Returns:	Loading information
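The values accepted by name_column are 1-based GAF column numbers: in the GAF 2.x format, column 2 is the DB Object ID, 3 the DB Object Symbol, 10 the DB Object Name and 11 the DB Object Synonym, while the GO ID sits in column 5. The sketch below reads tab-separated GAF text and yields the (identifier, GO ID) pairs a given name_column would pick up; how the loader applies re_name (here, capture group 1 when present) is an assumption.

```python
import re
from typing import Iterator, Optional, Tuple

# Values documented for name_column, as 1-based GAF column numbers
ALLOWED_NAME_COLUMNS = {2, 3, 10, 11}
GO_ID_COLUMN = 5  # GAF column holding the GO term ID


def gaf_pairs(gaf_text: str, name_column: int = 2,
              re_name: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield (feature identifier, GO ID) pairs from tab-separated GAF text."""
    if name_column not in ALLOWED_NAME_COLUMNS:
        raise ValueError(f"name_column must be one of {sorted(ALLOWED_NAME_COLUMNS)}")
    for line in gaf_text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # blank line or GAF header/comment line
        cols = line.split("\t")
        name = cols[name_column - 1]
        if re_name is not None:
            match = re.search(re_name, name)
            if match is None:
                raise ValueError(f"re_name {re_name!r} does not match {name!r}")
            # Assumption: the loader keeps capture group 1 when there is one
            name = match.group(1) if match.re.groups else match.group(0)
        yield name, cols[GO_ID_COLUMN - 1]
```

Since this is a generator, an invalid name_column is only reported once iteration starts, for example when the result is passed to `list()`.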

load_interpro(name, program, programversion, sourcename, interpro_output, interpro_parameters=None, query_re=None, query_type=None, query_uniquename=False, parse_go=False, no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=u'', date_executed=None)

Create an InterPro analysis and load InterProScan results into it

Parameters:
- name (str) – analysis name
- program (str) – program used for the analysis
- programversion (str) – version of the program
- sourcename (str) – name of the source data
- interpro_output (str) – Path to the InterProScan file to load (a single XML file, or a directory containing multiple XML files)
- interpro_parameters (str) – InterProScan parameters used to produce these results
- query_re (str) – Regular expression that uniquely identifies the query name. This parameter is required if the feature name is not the first word of the query name.
- query_type (str) – Feature type of the query (e.g. ‘gene’, ‘mRNA’, ‘contig’). It must be a valid Sequence Ontology term.
- query_uniquename (bool) – Set this if the query_re regular expression matches unique names instead of names in the database.
- parse_go (bool) – Load GO annotations into the database
- no_wait (bool) – Do not wait for the job to complete
- algorithm (str) – algorithm used
- sourceversion (str) – version of the source data
- sourceuri (str) – URI of the source data
- description (str) – analysis description
- date_executed (str) – date the analysis was executed (yyyy-mm-dd)
Return type:	str
Returns:	Loading information
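interpro_output may name either a single XML file or a directory of XML files. The sketch below is a hypothetical pre-flight check that lists which files such a path refers to; treating only names ending in `.xml` (any case) as results is an assumption of this sketch. Because loading runs as a Tripal job, the path presumably has to be visible to the Tripal server, so a check like this is only meaningful where that path is valid.

```python
from pathlib import Path
from typing import List


def interpro_xml_files(interpro_output: str) -> List[Path]:
    """List the InterProScan XML files that interpro_output refers to.

    A single file is returned as-is; for a directory, files whose suffix is
    .xml (case-insensitive) are returned in sorted order.
    """
    path = Path(interpro_output)
    if path.is_file():
        return [path]
    if path.is_dir():
        files = sorted(p for p in path.iterdir()
                       if p.is_file() and p.suffix.lower() == ".xml")
        if not files:
            raise FileNotFoundError(f"no .xml files in {path}")
        return files
    raise FileNotFoundError(f"{path} does not exist")
```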

sync(analysis=None, analysis_id=None, job_name=None, no_wait=None)

Synchronize an analysis

Parameters:
- analysis (str) – Analysis name
- analysis_id (str) – ID of the analysis to sync
- job_name (str) – Name of the job
- no_wait (bool) – Return immediately without waiting for job completion
Return type:	str
Returns:	status