crops.elements.sequences module¶

Sequence and multi-sequence objects are defined here.

guess_type(inseq)[source]¶

Return the biological type of the sequence as guessed from residue types.

Parameters:	inseq (str) – Sequence to be evaluated.
Returns:	Sequence type (‘Protein’, ‘DNA’, ‘RNA’ or ‘Unknown’).
Return type:	str
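
As an illustration of guessing a type from residue letters, here is a minimal sketch. It is not the CROPS implementation: the function name guess_type_sketch and the alphabets used are assumptions. Sequences made only of letters shared by several alphabets (e.g. A, C, G) are inherently ambiguous and are reported here as DNA.

```python
def guess_type_sketch(inseq: str) -> str:
    """Guess a biological type from residue letters (illustrative heuristic only)."""
    # Keep letters only, so gap or crop markers do not affect the guess.
    residues = {c for c in inseq.upper() if c.isalpha()}
    if not residues:
        return "Unknown"
    dna = set("ACGT")
    rna = set("ACGU")
    protein = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
    # Test the smallest alphabets first; a pure-ACGT string is called DNA.
    if residues <= dna:
        return "DNA"
    if residues <= rna:
        return "RNA"
    if residues <= protein:
        return "Protein"
    return "Unknown"


print(guess_type_sketch("GATTACA"))     # DNA
print(guess_type_sketch("GAUUACA"))     # RNA
print(guess_type_sketch("MRTLWIMAVL"))  # Protein
```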

class oligoseq(oligomer_id=None, imer=None)[source]¶

Bases: object

An object grouping several crops.elements.sequences.sequence objects pertaining to a common oligomer.

Parameters:	oligomer_id (str) – Oligomer identifier (e.g. PDB id), defaults to None.
	imer (dict [str, `crops.elements.sequences.sequence`], optional) – Container of the `crops.elements.sequences.sequence` objects making up the oligomer, defaults to empty dict.
Variables:	id (str) – Oligomer sequence identifier (e.g. PDB id).
	imer (dict [str, `crops.elements.sequences.sequence`]) – Container of the `crops.elements.sequences.sequence` objects making up the oligomer.
Raises:	TypeError – If the input formats are wrong.
Example:

>>> from crops.elements import sequences as ces
>>> my_oligoseq = ces.oligoseq(oligomer_id='exampleID')
>>> seq_a = ces.sequence(seqid='mychain', oligomer='exampleID', seq='GATTACA', chains={'A'})
>>> seq_b = ces.sequence(seqid='other', oligomer='exampleID', seq='TACATACA', chains={'B'})
>>> my_oligoseq.add_sequence(seq_a)
>>> my_oligoseq.add_sequence(seq_b)
>>> my_oligoseq.nchains()
2
>>> my_oligoseq.length('mychain')
7
>>> my_oligoseq.write('/path/to/output/dir/')
>>> print(my_oligoseq)
Protein/polynucleotide sequence object: (id='exampleID', # chains = 2)
>>> my_oligoseq.purge()
>>> my_oligoseq.nchains()
0

add_sequence(newseq)[source]¶

Add a new crops.elements.sequences.sequence to the object.

Parameters:	newseq (`crops.elements.sequences.sequence`) – Sequence object.
Raises:	TypeError – If newseq is not a `crops.elements.sequences.sequence` object.
	Exception – If sequence content is incompatible with that in oligoseq (oligomer id, other sequences, etc).

chainlist()[source]¶

Return a set with all the chain names in the object.

Returns:	Chain names in `crops.elements.sequences.oligoseq`.
Return type:	set [str]

copy()[source]¶

deepcopy()[source]¶

del_sequence(seqid)[source]¶

Remove the selected crops.elements.sequences.sequence from the object.

Parameters:	seqid (str) – Identifier of the sequence to be removed.
Raises:	TypeError – If seqid is not a string.

id¶

imer¶

length(seqid)[source]¶

Return the length of a certain sequence.

Parameters:	seqid (str) – ID of `crops.elements.sequences.sequence`.
Raises:	TypeError – When ‘seqid’ is not a string.
	KeyError – Specific sequence not found in `crops.elements.sequences.oligoseq`.
Returns:	Length of `crops.elements.sequences.sequence`.
Return type:	int

nchains()[source]¶

Return number of chains in object, counting all sequence objects contained.

Returns:	Number of chains in object, counting all `crops.elements.sequences.sequence` objects contained.
Return type:	int

nseqs()[source]¶

Return number of sequence objects in object.

Returns:	Number of `crops.elements.sequences.sequence` objects in object.
Return type:	int
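
The difference between nchains(), nseqs() and the set returned by chainlist() can be sketched with plain dictionaries. The mapping below is a hypothetical stand-in for imer, reduced to sequence ID and chain names. The helper names are illustrative and not part of CROPS.

```python
# Hypothetical stand-in for oligoseq.imer: sequence ID -> names of its chains.
imer_chains = {
    "1": {"A", "B"},  # chains A and B share the same sequence
    "2": {"C"},
}


def nseqs_sketch(imer):
    """Number of distinct sequence objects."""
    return len(imer)


def nchains_sketch(imer):
    """Number of chains, summed over all sequence objects."""
    return sum(len(chains) for chains in imer.values())


def chainlist_sketch(imer):
    """Set with every chain name found in the container."""
    return set().union(*imer.values())


print(nseqs_sketch(imer_chains))              # 2
print(nchains_sketch(imer_chains))            # 3
print(sorted(chainlist_sketch(imer_chains)))  # ['A', 'B', 'C']
```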

purge()[source]¶

Clear the object’s content without deleting the object itself.

set_cropmaps(mapdict, cropmain=False)[source]¶

Set the cropmaps parsed by crops.iomod.parsers.parsemapfile.

Parameters:	mapdict (dict [str, dict [str, dict [int, int]]]) – Parsed maps for this specific object.
	cropmain (bool, optional) – If True, it will crop ‘mainseq’ and generate ‘fullseq’ and ‘cropseq’. If ‘mainseq’ has been edited before this operation, the results will be wrong. Defaults to False.
Raises:	TypeError – When mapdict does not have the appropriate format.
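
The nested type of mapdict, dict [str, dict [str, dict [int, int]]], can be checked with a short validator. This is a sketch only: the function name check_mapdict and the key names in the example ('1', 'cropmap', 'cropbackmap') are assumptions, and the checks CROPS actually performs may differ.

```python
def check_mapdict(mapdict):
    """Raise TypeError unless mapdict is dict[str, dict[str, dict[int, int]]]."""
    if not isinstance(mapdict, dict):
        raise TypeError("mapdict must be a dictionary")
    for outer_key, maps in mapdict.items():
        if not isinstance(outer_key, str) or not isinstance(maps, dict):
            raise TypeError("first level must map str -> dict")
        for map_name, resmap in maps.items():
            if not isinstance(map_name, str) or not isinstance(resmap, dict):
                raise TypeError("second level must map str -> dict")
            for old, new in resmap.items():
                if not isinstance(old, int) or not isinstance(new, int):
                    raise TypeError("innermost level must map int -> int")
    return True


# Hypothetical content: the key names are illustrative only.
example = {"1": {"cropmap": {2: 1, 4: 2}, "cropbackmap": {1: 2, 2: 4}}}
print(check_mapdict(example))  # True
```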

whatseq(chain)[source]¶

Return the identifier of the sequence corresponding to a given chain.

Parameters:	chain (str) – The chain ID.
Returns:	The identifier of the `crops.elements.sequences.sequence` containing that chain.
Return type:	str
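
A chain-to-sequence lookup of this kind can be sketched as a search over the chain sets. The dictionary is a hypothetical stand-in for the container, and returning None for an unknown chain is an assumption rather than documented behaviour.

```python
def whatseq_sketch(imer_chains, chain):
    """Return the ID of the sequence whose chain set contains `chain`."""
    for seqid, chains in imer_chains.items():
        if chain in chains:
            return seqid
    return None  # assumption: the real behaviour for unknown chains may differ


lookup = {"1": {"A", "B"}, "2": {"C"}}
print(whatseq_sketch(lookup, "B"))  # 1
print(whatseq_sketch(lookup, "C"))  # 2
```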

write(outdir, infix='', split=False, oneline=False)[source]¶

Write all crops.elements.sequences.sequence objects to .fasta file or string.

Parameters:	outdir (str) – Output directory or ‘string’.
	infix (str, optional) – Filename tag to distinguish from original input file, defaults to “”.
	split (bool, optional) – If True, identical sequences are dumped for each chain, defaults to False.
	oneline (bool, optional) – If True, sequences are not split into 80-residue lines, defaults to False.
Raises:	FileNotFoundError – Output directory not found.
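
The effect of split can be sketched as follows: with split=False a sequence shared by several chains is written once, and with split=True it is repeated for each chain. The header layout is modelled loosely on the printed examples in this page and is an assumption, as is the helper name fasta_records.

```python
def fasta_records(header_id, chains, seq, split=False):
    """Build FASTA records for one sequence shared by one or more chains."""
    ordered = sorted(chains)
    if split:
        # One record per chain, each carrying the same sequence.
        return [f">{header_id}|Chain {c}\n{seq}\n" for c in ordered]
    label = "Chain" if len(ordered) == 1 else "Chains"
    return [f">{header_id}|{label} {','.join(ordered)}\n{seq}\n"]


print("".join(fasta_records("exampleID_1", {"A", "B"}, "GATTACA")), end="")
print("".join(fasta_records("exampleID_1", {"A", "B"}, "GATTACA", split=True)), end="")
```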

class sequence(seqid=None, oligomer=None, seq=None, chains=None, source=None, header=None, biotype=None, extrainfo=None)[source]¶

Bases: object

A crops.elements.sequences.sequence object representing a single chain sequence.

The crops.elements.sequences.sequence class represents a data structure to hold all sequence versions and other useful information characterising it. It contains functions to store, manipulate and organise sequence versions.

Parameters:	seqid (str) – Sequence identifier. Can be used alone or together with oligomer ID, defaults to None.
	oligomer (str, optional) – Oligomer identifier. Sometimes as important as seqid, defaults to None.
	seq (str, optional) – Sequence string, defaults to None.
	chains (set [str], optional) – The names of chains having this sequence, defaults to None.
	source (str, optional) – Source of the sequence, defaults to None.
	header (str, optional) – Standard .fasta header, starting with “>”, defaults to None.
	biotype (str, optional) – Type of molecule (‘Protein’, ‘DNA’, ‘RNA’…), defaults to None.
	extrainfo (str, optional) – Other useful information about the sequence, defaults to None.
Variables:	name (str) – Sequence identifier.
	oligomer_id (str) – Oligomer identifier.
	chains (set [str]) – The names of chains having this sequence.
	seqs (dict [str, str]) – The set of sequences, including default “mainseq”.
	source (str) – Source of the sequence.
	source_headers (list [str]) – A list of headers from input files.
	crops_header (str) – A new header containing the information from the object that will be used when printing sequence and cropmap.
	biotype (str) – Type of molecule (‘Protein’, ‘DNA’, ‘RNA’…).
	infostring (str) – Other useful information about the sequence.
	cropmap (dict [int, int]) – A dictionary mapping residue numbers from original sequence to cropped sequence.
	cropbackmap (dict [int, int]) – A dictionary mapping residue numbers from cropped sequence to original sequence.
	msa (Any) – A free variable not used by CROPS itself.
	cropmsa (Any) – A free variable not used by CROPS itself.
	intervals (`crops.elements.intervals.intinterval`) – The integer interval object containing the cropping information.
Raises:	TypeError – For wrong input formats.
Example:

>>> from crops.elements import sequences as ces
>>> myseq = ces.sequence(seqid='1', oligomer='exampleID')
>>> myseq.mainseq('GATTACA')
'GATTACA'
>>> myseq.mainseq()
'GATTACA'
>>> myseq.chains = {'A', 'B'}
>>> myseq.addseq('gapseq','GAT--C-')
>>> myseq.addseq('cobra','TACATACA')
>>> myseq.length()
7
>>> myseq.ngaps('gapseq')
3
>>> myseq.guess_biotype()
'DNA'
>>> print(myseq)
Sequence object >EXAMPLEID_1|Chains A,B (seq=GATTACA, type=DNA, length=7)
>>> myseq.source = 'Example'
>>> myseq.addseq('cropseq', '+A+T++')
>>> myseq.addseq('cropgapseq', '+A+-++')
>>> myseq.full_length()
7
>>> myseq.mainseq('AT')
'AT'
>>> myseq.ncrops()
4
>>> myseq.update_cropsheader()
>>> myseq.cropinfo()
'#Residues cropped: 4 (1 not from terminal segments) ; % cropped: 66.67 (16.67 not from terminal segments)'
>>> myseq.dump(out='string')
'>crops|exampleID_1|Chains A,B|Source: Example|#Residues cropped: 4 (1 not from terminal segments) ; % cropped: 66.67 (16.67 not from terminal segments)\nAT\n'

Example:

>>> from crops.elements import sequences as ces
>>> from crops.iomod import parsers as cip
>>> myseq = cip.parseseqfile('7M6C.fasta')
>>> myseq
Sequence object: (>7M6C_1|Chain A, seq=MRTLWIMAVL[...]KPLCKKADPC, type=Undefined, length=138)
>>> myseq.guess_biotype()
'Protein'
>>> myseq
Sequence object: (>7M6C_1|Chain A, seq=MRTLWIMAVL[...]KPLCKKADPC, type=Protein, length=138)

addseq(newid, newseq)[source]¶

Add sequence to seqs dictionary.

Parameters:	newid (str) – New sequence’s identifier.
	newseq (str) – New sequence.
Raises:	TypeError – If newid is not a string.
	KeyError – If newseq is not a string.

biotype¶

chains¶

copy()[source]¶

cropbackmap¶

cropinfo()[source]¶

Return a string containing statistics about the cropped residues.

Returns:	Statistics on number of crops.
Return type:	str
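
The statistics string shown in the class example can be reproduced from three numbers. In that example the percentages correspond to 4/6 and 1/6, so the sketch below divides by a length of 6. The function name, its arguments and the choice of denominator are assumptions, not the CROPS implementation.

```python
def cropinfo_sketch(ncrops, nmid, length):
    """Format crop statistics; `nmid` counts crops outside terminal segments."""
    # Guard against empty sequences to avoid dividing by zero.
    pct = 100 * ncrops / length if length else 0.0
    pct_mid = 100 * nmid / length if length else 0.0
    return (
        f"#Residues cropped: {ncrops} ({nmid} not from terminal segments) ; "
        f"% cropped: {pct:.2f} ({pct_mid:.2f} not from terminal segments)"
    )


print(cropinfo_sketch(4, 1, 6))
```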

cropmap¶

cropmsa¶

crops_header¶

deepcopy()[source]¶

delseq(delid=None, wipeall=False)[source]¶

Delete sequence(s) from the seqs dictionary.

Parameters:	delid (str, optional) – ID of sequence to be deleted, defaults to None.
	wipeall (bool, optional) – If True, all the sequences are deleted, defaults to False.
Raises:	TypeError – If delid is not a string or wipeall is not a boolean.

dump(out, split=False, oneline=False)[source]¶

Write header and main sequence to a file. If the file exists, output is appended.

Parameters:	out (str, file) – An output filepath (str), ‘string’, or an open file.
	split (bool, optional) – If True, identical sequences are dumped for every chain, defaults to False.
	oneline (bool, optional) – If True, sequences are not split into 80-residue lines, defaults to False.
Raises:	TypeError – If out is neither a string nor an open file.
	KeyError – If object contains no chains.
Returns:	A string containing the output if and only if out==’string’.
Return type:	str
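
The oneline option controls whether the sequence body is wrapped at 80 residues per line. A minimal sketch of that formatting, using a hypothetical helper format_fasta rather than the CROPS code:

```python
def format_fasta(header, seq, oneline=False, width=80):
    """Return a FASTA record, wrapping the sequence unless oneline is True."""
    if oneline:
        body = [seq]
    else:
        # Slice the sequence into consecutive chunks of `width` residues.
        body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


record = format_fasta(">example", "A" * 170)
print([len(line) for line in record.splitlines()])  # [8, 80, 80, 10]
```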

dumpmap(out, split=False)[source]¶

Write header and cropmap to a file. If file exists, output is appended.

Parameters:	out (str, file) – An output filepath (str) or an open file.
	backmap (bool, optional) – If True, the output will be self.cropbackmap, defaults to False.
	split (bool, optional) – If True, identical maps are dumped for every chain, defaults to False.
Raises:	TypeError – If out is neither a string nor an open file.
	ValueError – If one or both of cropmap and cropbackmap are empty.
	KeyError – If object contains no chains.

full_length()[source]¶

Return the length of the full sequence. If no full sequence is found, the main sequence is taken to be the full sequence and is saved as such.

Returns:	Length of the full sequence.
Return type:	int
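
The fallback described above can be sketched with a plain dictionary standing in for seqs. The keys ‘mainseq’ and ‘fullseq’ are the names used elsewhere in this page; the helper itself is illustrative only.

```python
def full_length_sketch(seqs):
    """Return len(fullseq), storing mainseq as fullseq if none exists yet."""
    if "fullseq" not in seqs:
        seqs["fullseq"] = seqs["mainseq"]
    return len(seqs["fullseq"])


seqs = {"mainseq": "GATTACA"}
print(full_length_sketch(seqs))  # 7
seqs["mainseq"] = "AT"           # later edits to mainseq leave fullseq untouched
print(full_length_sketch(seqs))  # 7
```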

guess_biotype()[source]¶

Save the guessed biotype and return it.

Returns:	Guessed biotype.
Return type:	str

infostring¶

intervals¶

length()[source]¶

Return the length of the main sequence.

Returns:	Length of the main sequence.
Return type:	int

mainseq(add=None)[source]¶

Return or modify the main sequence.

Parameters:	add (str, optional) – If given, the main sequence is replaced by ‘add’, defaults to None.
Raises:	TypeError – If ‘add’ is given and is not a string.
Returns:	The (new) main sequence.
Return type:	str

msa¶

name¶

ncrops(seqid='cropseq', offterminals=False, offmidseq=False)[source]¶

Return the number of cropped elements (‘+’, ‘*’) in a sequence.

Parameters:	seqid (str, optional) – The ID of the sequence containing the cropped elements, defaults to ‘cropseq’.
	offterminals (bool, optional) – If True, count only the elements removed from terminal segments, defaults to False.
	offmidseq (bool, optional) – If True, count only the elements removed from outside the terminal segments, defaults to False.
Raises:	TypeError – If seqid is not a string, or offterminals, offmidseq are not boolean.
Returns:	Number of cropped elements in seqid according to interval chosen. If seqid not found, 0 is returned.
Return type:	int
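
Counting cropped elements by region can be sketched on a marked string. Here a "terminal segment" is taken to be an uninterrupted run of markers at either end of the string, and passing both flags together returns the total. Both are simplifying assumptions, and CROPS may define these differently.

```python
CROP_MARKS = "+*"


def ncrops_sketch(seq, offterminals=False, offmidseq=False):
    """Count crop markers in total, at the termini only, or mid-sequence only."""
    total = sum(1 for c in seq if c in CROP_MARKS)
    leading = len(seq) - len(seq.lstrip(CROP_MARKS))
    trailing = len(seq) - len(seq.rstrip(CROP_MARKS))
    # min() avoids double counting when the whole string consists of markers.
    terminal = min(leading + trailing, total)
    if offterminals and not offmidseq:
        return terminal
    if offmidseq and not offterminals:
        return total - terminal
    return total


print(ncrops_sketch("+A+T++"))                     # 4
print(ncrops_sketch("+A+T++", offterminals=True))  # 3
print(ncrops_sketch("+A+T++", offmidseq=True))     # 1
```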

ngaps(seqid='gapseq')[source]¶

Return the number of gaps (‘-’) in a sequence.

Parameters:	seqid (str, optional) – The ID of the sequence containing the gaps, defaults to ‘gapseq’.
Raises:	TypeError – If seqid is not a string.
Returns:	Number of gaps in seqid. If ‘gapseq’ is a list of several models, a list is returned. If seqid not found, 0 is returned.
Return type:	int or list [int]
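
The two return shapes (a single int, or a list when several models are stored) can be sketched as below. The helper operates directly on the stored value rather than on a sequence object, and its name is an assumption.

```python
def ngaps_sketch(gapseq):
    """Count '-' in one sequence, or in each sequence of a list of models."""
    if isinstance(gapseq, list):
        return [model.count("-") for model in gapseq]
    return gapseq.count("-")


print(ngaps_sketch("GAT--C-"))               # 3
print(ngaps_sketch(["GAT--C-", "GATTAC-"]))  # [3, 1]
```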

oligomer_id¶

seqs¶

source¶

source_headers¶

update_cropsheader()[source]¶

Update crops_header. Useful after updating any information from the sequence.