tripal.analysis package

Module contents

class tripal.analysis.AnalysisClient(tripalinstance, **requestArgs)

Bases: tripal.client.Client

Manage Tripal analyses

add_analysis(name, program, programversion, sourcename, algorithm=None, sourceversion=None, sourceuri=None, description=None, date_executed=None)

Create an analysis

Parameters:
  • name (str) – analysis name
  • program (str) – analysis program
  • programversion (str) – analysis programversion
  • algorithm (str) – analysis algorithm
  • sourcename (str) – analysis sourcename
  • sourceversion (str) – analysis sourceversion
  • sourceuri (str) – analysis sourceuri
  • description (str) – analysis description
  • date_executed (str) – analysis date_executed (yyyy-mm-dd)
Return type:

dict

Returns:

Analysis information
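
As an illustration of the parameters above, the following sketch assembles the keyword arguments for add_analysis and checks date_executed locally before anything is sent. The helper name build_analysis_kwargs is hypothetical and not part of the tripal package. It assumes that omitting an optional argument is equivalent to passing None.

```python
from datetime import datetime


def build_analysis_kwargs(name, program, programversion, sourcename,
                          date_executed=None, **optional):
    """Assemble keyword arguments for add_analysis (hypothetical helper)."""
    kwargs = {
        "name": name,
        "program": program,
        "programversion": programversion,
        "sourcename": sourcename,
    }
    if date_executed is not None:
        # Raises ValueError if the string is not a valid year-month-day date.
        datetime.strptime(date_executed, "%Y-%m-%d")
        kwargs["date_executed"] = date_executed
    # Forward only the optional fields that were actually set.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs
```

With an AnalysisClient instance named client (an assumed variable), the result could be passed as client.add_analysis(**build_analysis_kwargs(...)).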

delete_orphans(job_name=None, no_wait=None)

Delete orphan Drupal analysis nodes

Parameters:
  • job_name (str) – Name of the job
  • no_wait (bool) – Return immediately without waiting for job completion
Return type:

str

Returns:

status

get_analyses(analysis_id=None, name=None, program=None, programversion=None, algorithm=None, sourcename=None, sourceversion=None, sourceuri=None, date_executed=None)

Get analyses

Parameters:
  • analysis_id (str) – An analysis ID
  • name (str) – analysis name
  • program (str) – analysis program
  • programversion (str) – analysis programversion
  • algorithm (str) – analysis algorithm
  • sourcename (str) – analysis sourcename
  • sourceversion (str) – analysis sourceversion
  • sourceuri (str) – analysis sourceuri
  • date_executed (str) – analysis date_executed (yyyy-mm-dd)
Return type:

list of dict

Returns:

Analysis information
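
Because get_analyses returns a list of dict and add_analysis returns a dict, the two can be combined into a "find or create" step. The sketch below is one possible way to do that. The function get_or_create_analysis is hypothetical, and it assumes that get_analyses returns an empty list when nothing matches.

```python
def get_or_create_analysis(client, name, program, programversion, sourcename):
    """Return an existing analysis, or create it (hypothetical helper).

    `client` is assumed to behave like tripal.analysis.AnalysisClient.
    """
    matches = client.get_analyses(
        name=name,
        program=program,
        programversion=programversion,
        sourcename=sourcename,
    )
    if matches:
        # Assumption: the first match is the one we want.
        return matches[0]
    return client.add_analysis(
        name=name,
        program=program,
        programversion=programversion,
        sourcename=sourcename,
    )
```

Filtering on all four required fields keeps the lookup as narrow as the arguments that add_analysis would use to create the record.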

get_analyses_tripal(analysis_id=None)

Get analysis entities

Parameters:
  • analysis_id (int) – An analysis entity/node ID
Return type:

list of dict

Returns:

Analysis entity/node information

load_blast(name, program, programversion, sourcename, blast_output, blast_ext=None, blastdb=None, blastdb_id=None, blast_parameters=None, query_re=None, query_type=None, query_uniquename=False, is_concat=False, search_keywords=False, no_parsed=u'all', no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=u'', date_executed=None)

Create a Blast analysis

Parameters:
  • name (str) – analysis name
  • program (str) – analysis program
  • programversion (str) – analysis programversion
  • sourcename (str) – analysis sourcename
  • blast_output (str) – Path to the Blast file to load (single XML file, or directory containing multiple XML files)
  • blast_ext (str) – If looking for files in a directory, extension of the blast result files
  • blastdb (str) – Name of the database blasted against (must be in the Chado db table)
  • blastdb_id (str) – ID of the database blasted against (must be in the Chado db table)
  • blast_parameters (str) – Blast parameters used to produce these results
  • query_re (str) – The regular expression that can uniquely identify the query name. This parameter is required if the feature name is not the first word in the blast query name.
  • query_type (str) – The feature type (e.g. ‘gene’, ‘mRNA’, ‘contig’) of the query. It must be a valid Sequence Ontology term.
  • query_uniquename (bool) – Use this if the query_re regular expression matches unique names instead of names in the database.
  • is_concat (bool) – If the blast result file is simply a list of concatenated blast results.
  • search_keywords (bool) – Extract keywords for Tripal search
  • no_parsed (str) – Maximum number of hits to parse per feature. Default=all
  • no_wait (bool) – Do not wait for job to complete
  • algorithm (str) – analysis algorithm
  • sourceversion (str) – analysis sourceversion
  • sourceuri (str) – analysis sourceuri
  • description (str) – analysis description
  • date_executed (str) – analysis date_executed (yyyy-mm-dd)
Return type:

str

Returns:

Loading information
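
The blast_output parameter accepts either a single XML file or a directory, and blast_ext only applies in the directory case. A minimal sketch of that distinction follows. The helper blast_input_kwargs and its default extension "xml" are assumptions made for illustration. It checks the path on the machine running the script, which may not be the machine that eventually reads the file.

```python
import os


def blast_input_kwargs(blast_output, blast_ext="xml"):
    """Return the path-related keyword arguments for load_blast (hypothetical helper)."""
    if os.path.isdir(blast_output):
        # A directory is paired with an extension to select the result files.
        return {"blast_output": blast_output, "blast_ext": blast_ext}
    if os.path.isfile(blast_output):
        # A single file needs no extension filter.
        return {"blast_output": blast_output}
    raise FileNotFoundError(blast_output)
```

The returned dict can be merged with the analysis fields, for example client.load_blast(name, program, programversion, sourcename, **blast_input_kwargs(path)), where client is an assumed AnalysisClient instance.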

load_fasta(fasta, organism=None, organism_id=None, analysis=None, analysis_id=None, sequence_type=u'contig', re_name=None, re_uniquename=None, db_ext_id=None, re_accession=None, rel_type=None, rel_subject_re=None, rel_subject_type=None, method=u'insup', match_type=u'uniquename', job_name=None, no_wait=False)

Load fasta sequences

Parameters:
  • fasta (str) – Path to the Fasta file to load
  • organism (str) – Organism common name or abbreviation
  • organism_id (int) – Organism ID
  • analysis (str) – Analysis name
  • analysis_id (int) – Analysis ID
  • sequence_type (str) – Sequence type
  • re_name (str) – Regular expression for the name
  • re_uniquename (str) – Regular expression for the unique name
  • db_ext_id (str) – External DB ID
  • re_accession (str) – Regular expression for the accession from external DB
  • rel_type (str) – Relation type (part_of or derives_from)
  • rel_subject_re (str) – Relation subject regular expression (used to extract id of related entity)
  • rel_subject_type (str) – Relation subject type (must match already loaded data, e.g. mRNA)
  • method (str) – Insertion method (insert, update or insup, default=insup (Insert and Update))
  • match_type (str) – Match type for already loaded features (name or uniquename; default=uniquename; used for “Update only” or “Insert and update” methods)
  • job_name (str) – Name of the job
  • no_wait (bool) – Do not wait for job to complete
Return type:

str

Returns:

Loading information
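
Before passing a pattern as re_name or re_uniquename, it can help to preview what it extracts from the definition lines. The sketch below does this with Python's re module. It assumes that the first capture group holds the extracted value and that the leading ">" is not part of the matched text. The server-side regular expression engine may differ from Python's, so treat the result as a sanity check only. The helper preview_names is hypothetical.

```python
import re


def preview_names(fasta_lines, re_name):
    """Apply a candidate re_name pattern to FASTA definition lines (hypothetical helper)."""
    pattern = re.compile(re_name)
    names = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a definition line
        match = pattern.search(line[1:].strip())
        # Assumption: the first capture group is the extracted name.
        names.append(match.group(1) if match else None)
    return names
```

The pattern must contain at least one capture group. A None entry in the result marks a definition line that the pattern did not match.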

load_gff3(gff, organism=None, organism_id=None, analysis=None, analysis_id=None, import_mode=u'update', target_organism=None, target_organism_id=None, target_type=None, target_create=False, start_line=None, landmark_type=None, alt_id_attr=None, create_organism=False, re_mrna=None, re_protein=None, job_name=None, no_wait=False)

Load GFF3 file

Parameters:
  • gff (str) – Path to the GFF file to load
  • organism (str) – Organism common name or abbreviation
  • organism_id (int) – Organism ID
  • analysis (str) – Analysis name
  • analysis_id (int) – Analysis ID
  • import_mode (str) – Import mode (add_only=existing features won’t be touched, update=existing features will be updated and obsolete attributes kept)
  • target_organism (str) – In case of Target attribute in the GFF3, choose the organism abbreviation or common name to which target sequences belong. Select this only if target sequences belong to a different organism than the one specified with organism_id. And only choose an organism here if all of the target sequences belong to the same species. If the targets in the GFF file belong to multiple different species then the organism must be specified using the ‘target_organism=genus:species’ attribute in the GFF file.
  • target_organism_id (int) – In case of Target attribute in the GFF3, choose the organism ID to which target sequences belong. Select this only if target sequences belong to a different organism than the one specified with organism_id. And only choose an organism here if all of the target sequences belong to the same species. If the targets in the GFF file belong to multiple different species then the organism must be specified using the ‘target_organism=genus:species’ attribute in the GFF file.
  • target_type (str) – In case of Target attribute in the GFF3, if the unique name for a target sequence is not unique (e.g. a protein and an mRNA have the same name) then you must specify the type for all targets in the GFF file. If the targets are of different types then the type must be specified using the ‘target_type=type’ attribute in the GFF file. This must be a valid Sequence Ontology (SO) term.
  • target_create (bool) – In case of Target attribute in the GFF3, if the target feature cannot be found, create one using the organism and type specified above, or using the ‘target_organism’ and ‘target_type’ fields specified in the GFF file. Values specified in the GFF file take precedence over those specified above.
  • start_line (int) – The line in the GFF file where importing should start
  • landmark_type (str) – A Sequence Ontology type for the landmark sequences in the GFF file (e.g. ‘chromosome’).
  • alt_id_attr (str) – When ID attribute is absent, specify which other attribute can uniquely identify the feature.
  • create_organism (bool) – Create organisms when encountering organism attribute (these lines will be skipped otherwise)
  • re_mrna (str) – Regular expression for the mRNA name
  • re_protein (str) – Replacement string for the protein name
  • job_name (str) – Name of the job
  • no_wait (bool) – Do not wait for job to complete
Return type:

str

Returns:

Loading information
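
When several GFF3 files need to be loaded, no_wait and job_name can be combined to queue one job per file without blocking on each. The following is a sketch of that pattern, not a prescribed workflow. The function submit_gff3_batch and the job name format are hypothetical, and client is assumed to behave like AnalysisClient.

```python
import os


def submit_gff3_batch(client, gff_paths, organism_id, analysis_id):
    """Queue one load_gff3 job per file without waiting (hypothetical helper)."""
    results = {}
    for path in gff_paths:
        results[path] = client.load_gff3(
            path,
            organism_id=organism_id,
            analysis_id=analysis_id,
            # A per-file job name makes the jobs easier to tell apart.
            job_name="Load GFF3 " + os.path.basename(path),
            no_wait=True,  # do not wait for the job to complete
        )
    return results
```

The returned dict maps each path to the string that load_gff3 returned for it.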

load_go(name, program, programversion, sourcename, gaf_output, organism=None, organism_id=None, gaf_ext=None, query_type=None, query_matching=u'uniquename', method=u'add', name_column=2, re_name=None, no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=None, date_executed=None)

Create a GO analysis

Parameters:
  • organism (str) – Organism common name or abbreviation
  • organism_id (int) – Organism ID
  • name (str) – analysis name
  • program (str) – analysis program
  • programversion (str) – analysis programversion
  • sourcename (str) – analysis sourcename
  • gaf_output (str) – Path to the GAF file to load (single file, or directory containing multiple GAF files)
  • gaf_ext (str) – If looking for files in a directory, extension of the GAF files
  • query_type (str) – The feature type (e.g. ‘gene’, ‘mRNA’, ‘contig’) of the query. It must be a valid Sequence Ontology term.
  • query_matching (str) – Method to match identifiers to features in the database. (‘name’, ‘uniquename’ or ‘dbxref’)
  • method (str) – Import method (‘add’ or ‘remove’)
  • name_column (int) – Column containing the feature identifiers (2, 3, 10 or 11; default=2).
  • re_name (str) – Regular expression to extract the feature name from GAF file.
  • no_wait (bool) – Do not wait for job to complete
  • algorithm (str) – analysis algorithm
  • sourceversion (str) – analysis sourceversion
  • sourceuri (str) – analysis sourceuri
  • description (str) – analysis description
  • date_executed (str) – analysis date_executed (yyyy-mm-dd)
Return type:

str

Returns:

Loading information
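
Three load_go parameters accept only a fixed set of values, as listed above. A small local check, sketched below, can catch a typo before a job is submitted. The names GO_CHOICES and check_go_options are hypothetical, and the allowed values are copied from the parameter descriptions.

```python
# Allowed values, taken from the load_go parameter descriptions.
GO_CHOICES = {
    "query_matching": ("name", "uniquename", "dbxref"),
    "method": ("add", "remove"),
    "name_column": (2, 3, 10, 11),
}


def check_go_options(query_matching="uniquename", method="add", name_column=2):
    """Validate load_go choice parameters locally (hypothetical helper)."""
    options = {
        "query_matching": query_matching,
        "method": method,
        "name_column": name_column,
    }
    for key, value in options.items():
        if value not in GO_CHOICES[key]:
            raise ValueError(
                "%s must be one of %r, got %r" % (key, GO_CHOICES[key], value)
            )
    return options
```

The validated dict can be unpacked into the call, for example client.load_go(..., **check_go_options(name_column=3)), with client being an assumed AnalysisClient instance.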

load_interpro(name, program, programversion, sourcename, interpro_output, interpro_parameters=None, query_re=None, query_type=None, query_uniquename=False, parse_go=False, no_wait=False, algorithm=None, sourceversion=None, sourceuri=None, description=u'', date_executed=None)

Create an Interpro analysis

Parameters:
  • name (str) – analysis name
  • program (str) – analysis program
  • programversion (str) – analysis programversion
  • sourcename (str) – analysis sourcename
  • interpro_output (str) – Path to the InterProScan file to load (single XML file, or directory containing multiple XML files)
  • interpro_parameters (str) – InterProScan parameters used to produce these results
  • query_re (str) – The regular expression that can uniquely identify the query name. This parameter is required if the feature name is not the first word in the query name.
  • query_type (str) – The feature type (e.g. ‘gene’, ‘mRNA’, ‘contig’) of the query. It must be a valid Sequence Ontology term.
  • query_uniquename (bool) – Use this if the query_re regular expression matches unique names instead of names in the database.
  • parse_go (bool) – Load GO annotation to the database
  • no_wait (bool) – Do not wait for job to complete
  • algorithm (str) – analysis algorithm
  • sourceversion (str) – analysis sourceversion
  • sourceuri (str) – analysis sourceuri
  • description (str) – analysis description
  • date_executed (str) – analysis date_executed (yyyy-mm-dd)
Return type:

str

Returns:

Loading information

sync(analysis=None, analysis_id=None, job_name=None, no_wait=None)

Synchronize an analysis

Parameters:
  • analysis (str) – Analysis name
  • analysis_id (str) – ID of the analysis to sync
  • job_name (str) – Name of the job
  • no_wait (bool) – Return immediately without waiting for job completion
Return type:

str

Returns:

status
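
As a usage illustration, sync and delete_orphans can be chained into a small housekeeping routine. The sketch below synchronizes a list of analyses by name and then removes orphan nodes. The function resync_analyses is hypothetical, the ordering of the two steps is a choice made for this example, and client is assumed to behave like AnalysisClient.

```python
def resync_analyses(client, names):
    """Sync each named analysis, then delete orphan nodes (hypothetical helper)."""
    # One sync call per analysis name, keeping the returned status string.
    statuses = {name: client.sync(analysis=name) for name in names}
    # Clean up nodes that no longer have a matching analysis.
    orphan_status = client.delete_orphans()
    return statuses, orphan_status
```

Both calls use the default no_wait value here, so the routine relies on whatever waiting behavior the client applies by default.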