crops.iomod.taggers module

Functions and objects for generating tags, names, headers, etc. are defined here.

infix_gen(inpath, terms=False)

Return information about the interval source for file names.

Parameters:
  • inpath (str) – Path to interval database used.
  • terms (bool, optional) – If True, only discard terminal segments, defaults to False.
Raises:

TypeError – If terms is not boolean.

Returns:

Filename’s infix tag.

Return type:

str
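
A minimal sketch of how an infix tag might be produced and used when naming output files. sketch_infix is a hypothetical stand-in, not the CROPS implementation: it assumes the tag is the stem of the interval database's file name, followed by a ".terms" marker when terms is True. Only the TypeError on a non-boolean terms follows the documentation above.

```python
import os


def sketch_infix(inpath: str, terms: bool = False) -> str:
    """Hypothetical infix generator; the tag layout is an assumption."""
    # Documented behaviour: reject non-boolean values of terms.
    if not isinstance(terms, bool):
        raise TypeError("Argument 'terms' should be True or False.")
    # Assumption: name the tag after the database file, without directories or extension.
    stem = os.path.splitext(os.path.basename(inpath))[0]
    # Assumption: mark runs that only discard terminal segments.
    suffix = ".terms" if terms else ""
    return "." + stem + suffix


# The tag sits between the structure ID and the file extension.
outname = "1abc" + sketch_infix("/data/intervals.csv", terms=True) + ".fasta"
print(outname)  # 1abc.intervals.terms.fasta
```

Keeping the tag inside the file name means output files generated from different interval sources, or with and without terms, do not overwrite each other.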

makeheader(mainid=None, seqid=None, chains=None, source=None, extrainfo=None, short=False)

Return a fasta header of the format “>crops|MainID_seqID|Chains chain list|extrainfo”.

Parameters:
  • mainid (str) – PDB ID, UniProt ID, etc.
  • seqid (str, optional) – Sequence identifier, usually a natural number given as a string (“1”, “2”, etc.), defaults to None.
  • chains (set [str], optional) – A set containing the chain IDs of monomers sharing the same sequence, defaults to None.
  • source (str, optional) – The source of the sequence (‘RCSB PDB’, ‘UniProtKB/SwissProt’, ‘PDBe’, etc.), defaults to None.
  • extrainfo (str, optional) – Additional information to be included in the header, defaults to None.
  • short (bool, optional) – If True, a short version of the header (‘>MainID_seqID|Chains chain list’) is returned, defaults to False.
Raises:

ValueError – If any of mainid, seqid, extrainfo or elements of chains are not strings, or chains is not a set.

Returns:

A fasta header.

Return type:

str
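
The documented header layout can be illustrated with a short sketch. sketch_header is a hypothetical stand-in, not the CROPS function: it requires mainid and seqid, assumes chain IDs are sorted and joined with ", " after the word "Chains", and leaves out source because the format string above does not show where that field goes.

```python
def sketch_header(mainid, seqid, chains, extrainfo=None, short=False):
    """Build '>crops|MainID_seqID|Chains ...|extrainfo' (hypothetical sketch)."""
    # Mirror the documented checks: string IDs, and a set of string chain IDs.
    if not isinstance(mainid, str) or not isinstance(seqid, str):
        raise ValueError("mainid and seqid must be strings.")
    if not isinstance(chains, set) or not all(isinstance(c, str) for c in chains):
        raise ValueError("chains must be a set of strings.")
    if extrainfo is not None and not isinstance(extrainfo, str):
        raise ValueError("extrainfo must be a string.")

    # Assumption: chain IDs are listed in sorted order, comma-separated.
    chain_field = "Chains " + ", ".join(sorted(chains))
    core = mainid + "_" + seqid + "|" + chain_field
    if short:
        # Short form: '>MainID_seqID|Chains chain list'.
        return ">" + core
    header = ">crops|" + core
    if extrainfo:
        header += "|" + extrainfo
    return header


print(sketch_header("1ABC", "1", {"B", "A"}))
# >crops|1ABC_1|Chains A, B
print(sketch_header("1ABC", "1", {"A"}, short=True))
# >1ABC_1|Chains A
```

Sorting the chain IDs makes the header deterministic, since Python sets have no guaranteed iteration order.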

retrieve_id(seqheader)

Extract sequence IDs and additional comments from a standard .fasta header.

Parameters:
  • seqheader (str) – Standard .fasta header, starting with “>”.
Raises:

ValueError – If seqheader is not a string.

Returns:

A dictionary with the sequence identifiers (‘mainid’, ‘chains’, ‘seqid’, ‘source’, ‘comments’).

Return type:

dict [str, str or set]
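
As an illustration of the kind of parsing involved, the sketch below splits an RCSB PDB-style header such as >101M_1|Chain A|MYOGLOBIN|Physeter catodon (9755) into the dictionary keys listed above. sketch_retrieve_id is hypothetical: it assumes the first field is MainID_seqID, the second lists chains after "Chain" or "Chains", a leading "crops" token is skipped, and all remaining fields are joined into comments. source is left as None because a header of this shape does not name it.

```python
def sketch_retrieve_id(seqheader):
    """Split a '>MainID_seqID|Chains ...|...' header into parts (hypothetical sketch)."""
    if not isinstance(seqheader, str):
        raise ValueError("seqheader must be a string.")

    fields = seqheader.strip().lstrip(">").split("|")
    # Assumption: a leading 'crops' token is a tag, not an identifier.
    if fields[0] == "crops":
        fields = fields[1:]

    # Assumption: the first field is 'MainID_seqID'; split on the last underscore.
    mainid, sep, seqid = fields[0].rpartition("_")
    if not sep:  # no underscore: treat the whole field as the main ID
        mainid, seqid = fields[0], None

    # Assumption: the second field looks like 'Chain A' or 'Chains A, B'.
    chains = set()
    if len(fields) > 1:
        words = fields[1].split(maxsplit=1)
        if len(words) == 2 and words[0] in ("Chain", "Chains"):
            chains = {c.strip() for c in words[1].split(",")}

    return {
        "mainid": mainid,
        "seqid": seqid,
        "chains": chains,
        "source": None,  # not recoverable from this header layout
        "comments": "|".join(fields[2:]),
    }


ids = sketch_retrieve_id(">101M_1|Chain A|MYOGLOBIN|Physeter catodon (9755)")
print(ids["mainid"], ids["seqid"], ids["chains"])  # 101M 1 {'A'}
```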

target_format(inpath, terms=False, th=0, notfound=False)

Return information about the interval source for .fasta headers.

Parameters:
  • inpath (str) – Path to interval database used.
  • terms (bool, optional) – If True, only discard terminal segments, defaults to False.
  • th (int or float, optional) – UniProt threshold (percentage of the original UniProt sequence below which a segment is removed), defaults to 0.
  • notfound (bool, optional) – If True, the sequence was not found in the interval source, defaults to False.
Raises:
  • TypeError – If th is not a numeric (int, float) value.
  • TypeError – If terms is not boolean.
Returns:

Extra information for .fasta headers.

Return type:

str
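
A sketch of the documented argument checks, together with one possible way of turning these options into header text. sketch_target_format and the wording it produces are hypothetical; only the two TypeError conditions come from the documentation above.

```python
import os


def sketch_target_format(inpath, terms=False, th=0, notfound=False):
    """Describe the interval source for a .fasta header (hypothetical wording)."""
    # Documented checks: th must be numeric and terms must be boolean.
    if not isinstance(th, (int, float)):
        raise TypeError("Argument 'th' should be an int or float.")
    if not isinstance(terms, bool):
        raise TypeError("Argument 'terms' should be True or False.")

    source = os.path.basename(inpath)
    if notfound:
        return "Not found in " + source
    text = "Intervals from " + source
    if terms:
        text += ", terminal segments only"
    if th > 0:
        text += ", threshold " + str(th) + "%"
    return text


print(sketch_target_format("/data/intervals.csv", terms=True, th=90))
# Intervals from intervals.csv, terminal segments only, threshold 90%
```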