crops.elements.sequences module
Sequence and multi-sequence objects are defined here.
-
guess_type(inseq)
Return the biological type of the sequence as guessed from residue types.
Parameters: inseq (str) – Sequence to be evaluated.
Returns: Sequence type (‘Protein’, ‘DNA’, ‘RNA’ or ‘Unknown’).
Return type: str
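To make the idea of guessing a type from residue letters concrete, here is a minimal sketch. It is not the CROPS implementation: the name guess_type_sketch, the alphabets and the tie-breaking order (a sequence using only A, C and G is reported as DNA) are assumptions made for this example.

```python
# Hypothetical stand-in for guess_type; alphabets and ordering are assumptions.
DNA_LETTERS = set("ACGT")
RNA_LETTERS = set("ACGU")
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def guess_type_sketch(inseq: str) -> str:
    """Return 'DNA', 'RNA', 'Protein' or 'Unknown' from the letters present."""
    # Keep letters only, so gap ('-') and crop ('+', '*') markers are ignored.
    residues = {ch for ch in inseq.upper() if ch.isalpha()}
    if not residues:
        return "Unknown"
    if residues <= DNA_LETTERS:
        return "DNA"
    if residues <= RNA_LETTERS:
        return "RNA"
    if residues <= PROTEIN_LETTERS:
        return "Protein"
    return "Unknown"
```

Short sequences are inherently ambiguous under any scheme of this kind, which is presumably why the documented return values include ‘Unknown’.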
-
class oligoseq(oligomer_id=None, imer=None)
Bases: object
An object grouping several crops.elements.sequences.sequence objects pertaining to a common oligomer.
Parameters:
- oligomer_id (str) – Oligomer identifier (e.g. PDB id), defaults to None.
- imer (dict [str, crops.elements.sequences.sequence], optional) – Container of several crops.elements.sequences.sequence objects making up the oligomer, defaults to empty dict.
Raises: TypeError – If the input formats are wrong.
Example:
>>> from crops.elements import sequences as ces
>>> my_oligoseq = ces.oligoseq(oligomer_id='exampleID')
>>> my_oligoseq.add_monomer('header_example','GATTACA',nid='mychain')
>>> my_oligoseq.add_monomer('another_header','TACATACA')
>>> my_oligoseq.nchains()
2
>>> my_oligoseq.length('mychain')
7
>>> my_oligoseq.write('/path/to/output/dir/')
>>> print(my_oligoseq)
Protein/polynucleotide sequence object: (id='example_id', # chains = 2)
>>> my_oligoseq.purge()
>>> my_oligoseq.nchains()
0
-
add_sequence(newseq)
Add a new crops.elements.sequences.sequence to the object.
Parameters: newseq (crops.elements.sequences.sequence) – Sequence object.
Raises:
- TypeError – If newseq is not a crops.elements.sequences.sequence object.
- Exception – If sequence content is incompatible with that in oligoseq (oligomer id, other sequences, etc).
-
chainlist()
Return a set with all the chain names in the object.
Returns: Chain names in crops.elements.sequences.oligoseq.
Return type: set [str]
-
del_sequence(seqid)
Remove the selected crops.elements.sequences.sequence from the object.
Parameters: seqid (str) – Doomed sequence’s identifier.
Raises: TypeError – If seqid is not a string.
-
id
-
imer
-
length(seqid)
Return the length of a certain sequence.
Parameters: seqid (str) – ID of crops.elements.sequences.sequence.
Raises:
- TypeError – When ‘seqid’ is not a string.
- KeyError – Specific sequence not found in crops.elements.sequences.oligoseq.
Returns: Length of crops.elements.sequences.sequence.
Return type: int
-
nchains()
Return number of chains in object, counting all sequence objects contained.
Returns: Number of chains in object, counting all crops.elements.sequences.sequence objects contained.
Return type: int
-
nseqs()
Return number of sequence objects in object.
Returns: Number of crops.elements.sequences.sequence objects in object.
Return type: int
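The difference between nseqs, nchains and chainlist is easiest to see when several chains share one sequence. The sketch below models that with a plain dictionary rather than real CROPS objects; the variable imer_chains and the three helper functions are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the oligomer contents:
# sequence ID -> names of the chains that share that sequence.
imer_chains = {"1": {"A", "B"}, "2": {"C"}}


def nseqs_sketch(imer):
    """Count distinct sequence entries."""
    return len(imer)


def nchains_sketch(imer):
    """Count chains across all sequence entries."""
    return sum(len(chains) for chains in imer.values())


def chainlist_sketch(imer):
    """Collect every chain name into a single set."""
    return set().union(*imer.values())
```

With the data above there are two sequences but three chains, because chains A and B are copies of the same sequence.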
-
set_cropmaps(mapdict, cropmain=False)
Set the parsed cropmaps from crops.iomod.parsers.parsemapfile.
Raises: TypeError – When mapdict does not have the appropriate format.
-
whatseq(chain)
Return the sequence number corresponding to a given chain.
Parameters: chain (str) – The chain ID.
Returns: The crops.elements.sequences.sequence of that chain.
Return type: str
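A chain-to-sequence lookup of this kind can be sketched as a reverse search over a mapping from sequence IDs to chain names. The function name whatseq_sketch and the choice to return None for an unknown chain are assumptions; the documentation above does not state how a missing chain is handled.

```python
def whatseq_sketch(imer, chain):
    """Return the ID of the sequence whose chain set contains `chain`.

    `imer` is a hypothetical dict: sequence ID -> set of chain names.
    Returning None for an unknown chain is an assumption of this sketch.
    """
    for seqid, chains in imer.items():
        if chain in chains:
            return seqid
    return None
```

For example, with {"1": {"A", "B"}, "2": {"C"}} the chain "B" resolves to sequence "1".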
-
write(outdir, infix='', split=False, oneline=False)
Write all crops.elements.sequences.sequence objects to .fasta file or string.
Parameters:
- outdir (str) – Output directory or ‘string’.
- infix (str, optional) – Filename tag to distinguish from original input file, defaults to “”.
- split (bool, optional) – If True, identical sequences are dumped for each chain, defaults to False.
- oneline (bool, optional) – If True, sequences are not split in 80 residue-lines, defaults to False.
Raises: FileNotFoundError – Output directory not found.
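The effect of the split and oneline options can be illustrated with a small stand-alone formatter. This is only a sketch under stated assumptions: the function fasta_text and its header layout are hypothetical and are not taken from the CROPS source. Only the 80-residue line width and the meaning of the two flags follow the description above.

```python
def fasta_text(seqid, chains, seq, split=False, oneline=False, width=80):
    """Format one sequence as FASTA text (hypothetical helper).

    split=True   -> one record per chain, each repeating the same sequence.
    oneline=True -> the sequence is written on a single line.
    """
    ordered = sorted(chains)
    groups = [[c] for c in ordered] if split else [ordered]
    if oneline:
        lines = [seq]
    else:
        # Slice manually: textwrap would also break at '-' gap characters.
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    records = []
    for group in groups:
        header = ">" + seqid + "|Chains " + ",".join(group)  # assumed layout
        records.append("\n".join([header] + lines) + "\n")
    return "".join(records)
```

A 170-residue sequence shared by chains A and B yields one record with lines of 80, 80 and 10 residues by default, and two such records when split=True.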
-
class sequence(seqid=None, oligomer=None, seq=None, chains=None, source=None, header=None, biotype=None, extrainfo=None)
Bases: object
A crops.elements.sequences.sequence object representing a single chain sequence.
The crops.elements.sequences.sequence class represents a data structure to hold all sequence versions and other useful information characterising it. It contains functions to store, manipulate and organise sequence versions.
Parameters:
- seqid (str) – Sequence identifier. Can be used alone or together with oligomer ID, defaults to None.
- oligomer (str, optional) – Oligomer identifier. Sometimes as important as seqid, defaults to None.
- seq (str, optional) – Sequence string, defaults to None.
- chains (set [str], optional) – The names of chains having this sequence, defaults to None.
- source (str, optional) – Source of the sequence, defaults to None.
- header (str, optional) – Standard .fasta header, starting with “>”, defaults to None.
- biotype (str, optional) – Type of molecule (‘Protein’, ‘DNA’, ‘RNA’…), defaults to None.
- extrainfo (str, optional) – Other useful information about the sequence, defaults to None.
Variables:
- name (str) – Sequence identifier.
- oligomer_id (str) – Oligomer identifier.
- chains (set [str]) – The names of chains having this sequence.
- seqs (dict [str, str]) – The set of sequences, including default “mainseq”.
- source (str) – Source of the sequence.
- source_headers (list [str]) – A list of headers from input files.
- crops_header (str) – A new header containing the information from the object that will be used when printing sequence and cropmap.
- biotype (str) – Type of molecule (‘Protein’, ‘DNA’, ‘RNA’…).
- infostring (str) – Other useful information about the sequence.
- cropmap (dict [int, int]) – A dictionary mapping residue numbers from original sequence to cropped sequence.
- cropbackmap (dict [int, int]) – A dictionary mapping residue numbers from cropped sequence to original sequence.
- msa (Any) – A free variable not used by CROPS itself.
- cropmsa (Any) – A free variable not used by CROPS itself.
- intervals (crops.elements.intervals.intinterval) – The integer interval object containing the cropping information.
Raises: TypeError – For wrong input formats.
Example:
>>> from crops.elements import sequences as ces
>>> myseq = ces.sequence(seqid='1', oligomer = 'exampleID')
>>> myseq.mainseq('GATTACA')
>>> myseq.mainseq()
'GATTACA'
>>> myseq.chains = {'A', 'B'}
>>> myseq.addseq('gapseq','GAT--C-')
>>> myseq.addseq('cobra','TACATACA')
>>> myseq.length()
7
>>> myseq.ngaps('gapseq')
3
>>> myseq.guess_biotype()
'DNA'
>>> print(myseq)
Sequence object >EXAMPLEID_1|Chains A,B (seq=GATTACA, type=DNA, length=7)
>>> myseq.source = 'Example'
>>> myseq.addseq('cropseq', '+A+T++')
>>> myseq.addseq('cropgapseq', '+A+-++')
>>> myseq.full_length()
7
>>> myseq.mainseq('AT')
'AT'
>>> myseq.ncrops()
4
>>> myseq.update_cropsheader()
>>> myseq.cropinfo()
'#Residues cropped: 4 (1 not from terminals) ; % cropped: 66.67 (16.67 not from terminal segments)'
>>> myseq.dump(out='string')
'>crops|exampleID_1|Chains A,B|Source: Example|#Residues cropped: 4 (1 not from terminal segments) ; % cropped: 66.67 (16.67 not from terminal segments)\nAT\n'
Example:
>>> from crops.elements import sequences as ces
>>> from crops.iomod import parsers as cip
>>> myseq = cip.parseseqfile('7M6C.fasta')
>>> myseq
Sequence object: (>7M6C_1|Chain A, seq=MRTLWIMAVL[...]KPLCKKADPC, type=Undefined, length=138)
>>> myseq.guess_biotype()
'Protein'
>>> myseq
Sequence object: (>7M6C_1|Chain A, seq=MRTLWIMAVL[...]KPLCKKADPC, type=Protein, length=138)
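The cropmap and cropbackmap variables listed for this class are inverse lookups between original and cropped residue numbering. The sketch below shows one way such a pair could be built from a set of retained residue numbers. The helper build_cropmaps, the 1-based numbering and the use of None for removed residues are assumptions made for illustration, not behaviour confirmed by the documentation.

```python
def build_cropmaps(length, kept):
    """Build forward and backward residue-number maps (hypothetical helper).

    length: number of residues in the original sequence (numbered from 1).
    kept:   set of original residue numbers that survive cropping.
    """
    cropmap = {}      # original number -> cropped number (None if removed)
    cropbackmap = {}  # cropped number  -> original number
    new_number = 0
    for original in range(1, length + 1):
        if original in kept:
            new_number += 1
            cropmap[original] = new_number
            cropbackmap[new_number] = original
        else:
            cropmap[original] = None  # assumption: removed residues map to None
    return cropmap, cropbackmap
```

Keeping residues 2 and 4 of a six-residue sequence gives a backward map of {1: 2, 2: 4}, so the first residue of the cropped sequence is residue 2 of the original.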
-
biotype
-
chains
-
cropbackmap
-
cropinfo()
Return a string containing statistics about the cropped residues.
Returns: Statistics on number of crops.
Return type: str
-
cropmap
-
cropmsa
-
crops_header
-
delseq(delid=None, wipeall=False)
Delete sequence(s) from the seqs dictionary.
Raises: TypeError – If delid is not a string or wipeall is not a boolean.
-
dump(out, split=False, oneline=False)
Write header and main sequence to a file. If the file exists, output is appended.
Returns: A string containing the output if and only if out == ‘string’.
Return type: str
-
dumpmap(out, split=False)
Write header and cropmap to a file. If the file exists, output is appended.
Raises:
- TypeError – If out is neither a string nor an open file.
- ValueError – If one or both of cropmap and cropbackmap are empty.
- KeyError – If object contains no chains.
-
full_length()
Return the length of the full sequence. If no full sequence is found, the main sequence is treated as the full sequence and saved as such.
Returns: Length of the full sequence.
Return type: int
-
guess_biotype()
Save the guessed biotype and return it.
Returns: Guessed biotype.
Return type: str
-
infostring
-
intervals
-
length()
Return the length of the main sequence.
Returns: Length of the main sequence.
Return type: int
-
mainseq(add=None)
Return or modify the main sequence.
Parameters: add (str, optional) – If given, the main sequence is replaced by ‘add’, defaults to None.
Raises: TypeError – If ‘add’ is given and is not a string.
Returns: The (new) main sequence.
Return type: str
-
msa
-
name
-
ncrops(seqid='cropseq', offterminals=False, offmidseq=False)
Return the number of cropped elements (‘+’, ‘*’) in a sequence.
Parameters:
- seqid (str, optional) – The ID of the sequence containing the cropped elements, defaults to ‘cropseq’.
- offterminals (bool, optional) – Count only those removed from terminal segments, defaults to False.
- offmidseq (bool, optional) – Count only those removed from outside the terminal segments, defaults to False.
Raises: TypeError – If seqid is not a string, or if offterminals or offmidseq is not a boolean.
Returns: Number of cropped elements in seqid according to the interval chosen. If seqid is not found, 0 is returned.
Return type: int
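The distinction between terminal and mid-sequence crops can be sketched on a plain string. The function below is a hypothetical stand-in, not the CROPS method. It assumes that "terminal segments" means the unbroken runs of crop markers at the start and end of the string, and that the total is returned when both flags are set or neither is.

```python
CROP_MARKERS = "+*"


def ncrops_sketch(cropseq, offterminals=False, offmidseq=False):
    """Count crop markers in `cropseq` (hypothetical helper)."""
    total = sum(ch in CROP_MARKERS for ch in cropseq)
    # Lengths of the leading and trailing runs of crop markers.
    head = len(cropseq) - len(cropseq.lstrip(CROP_MARKERS))
    tail = len(cropseq) - len(cropseq.rstrip(CROP_MARKERS))
    # If the whole string is markers, head and tail overlap; cap at total.
    terminal = min(head + tail, total)
    if offterminals and not offmidseq:
        return terminal
    if offmidseq and not offterminals:
        return total - terminal
    return total  # assumption: both flags or neither gives the full count
```

Applied to the string '+A+T++' from the class example, this gives 4 crops in total, of which 1 lies outside the terminal runs, in line with the figures quoted in that example's cropinfo output.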
-
ngaps(seqid='gapseq')
Return the number of gaps (‘-’) in a sequence.
Parameters: seqid (str, optional) – The ID of the sequence containing the gaps, defaults to ‘gapseq’.
Raises: TypeError – If seqid is not a string.
Returns: Number of gaps in seqid. If ‘gapseq’ is a list of several models, a list is returned. If seqid not found, 0 is returned.
Return type: int or list [int]
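The int-or-list return value described above can be illustrated with a few lines operating directly on strings. The helper ngaps_sketch is hypothetical and takes the gapped sequence itself, or a list of them, instead of a sequence ID.

```python
def ngaps_sketch(gapseq):
    """Count '-' characters in one string, or in each string of a list."""
    if isinstance(gapseq, list):
        # Several models: report one count per model.
        return [model.count("-") for model in gapseq]
    return gapseq.count("-")
```

For the 'GAT--C-' string used in the class example the count is 3, matching the value shown there.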
-
oligomer_id
-
seqs
-
source
-
source_headers