Sequence file splitting

To split the sequences of a FASTA file into several files grouped by protein ID, run:

crops-splitseqs PDBall.fasta --output mydir/

This command writes one file per protein ID, named mydir/PDBID.fasta, each containing all the sequences that belong to that protein ID.
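The grouping described above can be pictured with a short Python sketch. It is not the implementation used by crops-splitseqs; it only illustrates the idea under the assumption that every FASTA header starts with the PDB ID followed by an underscore and a sequence number (for example ">7m6c_1 ..."). The function names parse_fasta and group_by_pdb_id, and the example records, are hypothetical.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_pdb_id(text):
    """Group records by PDB ID, in order of first appearance."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        # Assumption: the header begins with "<PDBID>_<number>".
        pdb_id = header.split("_", 1)[0].lower()
        groups[pdb_id].append((header, seq))
    return dict(groups)


# Made-up records, for illustration only.
example = """>7m6c_1 mol:protein length:5
MKVLA
>7m6c_2 mol:protein length:4
GSHM
>4n5b_1 mol:protein length:6
AAKKLL
"""

groups = group_by_pdb_id(example)
for pdb_id, records in groups.items():
    print(f"{pdb_id}.fasta: {len(records)} sequence(s)")
```

Each key of the resulting dictionary corresponds to one output file in the PDBID.fasta scheme.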

The output directory argument, --output or -o, is optional. If it is omitted, the results are saved in the directory of the input sequence file.

When only a few sequences are needed from a large FASTA file, the option --preselect or -p restricts the output to the given molecule IDs; any number of IDs can be listed:

crops-splitseqs PDBall.fasta --output mydir/ --preselect 7m6c 4n5b 1o98

This command creates new files only for the three PDB IDs specified, regardless of how many sequences the input .fasta file contains.

The sequences can also be written to separate files, one per unique sequence, with the option --individual or -i:

crops-splitseqs PDBall.fasta --output mydir/ --individual

This command produces new sequence files named mydir/PDBID_X.fasta, each containing a single sequence with protein ID PDBID and (numerical) sequence ID X.

Options --preselect and --individual can be combined to produce individual sequence files only from the selected molecules.
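To make the effect of combining the two options concrete, here is a minimal Python sketch that lists the file names such a run would be expected to produce. It is an illustration and not the code of crops-splitseqs. The function plan_output_files, the example headers, the header layout ">PDBID_X ...", and the case-insensitive matching of IDs are all assumptions made for this sketch.

```python
from pathlib import PurePosixPath


def plan_output_files(headers, outdir, preselect=None, individual=False):
    """Return the expected output paths, in order of first appearance."""
    wanted = {p.lower() for p in preselect} if preselect else None
    paths = []
    for header in headers:
        # Assumption: headers look like ">7m6c_1 ..." (PDB ID, underscore, number).
        token = header.lstrip(">").split()[0]
        pdb_id, _, seq_no = token.partition("_")
        pdb_id = pdb_id.lower()
        # Preselection: skip every molecule that was not asked for.
        if wanted is not None and pdb_id not in wanted:
            continue
        # Individual mode: one file per sequence instead of one per PDB ID.
        name = f"{pdb_id}_{seq_no}.fasta" if individual else f"{pdb_id}.fasta"
        path = str(PurePosixPath(outdir) / name)
        if path not in paths:
            paths.append(path)
    return paths


# Made-up headers, for illustration only.
headers = [
    ">7m6c_1 mol:protein",
    ">7m6c_2 mol:protein",
    ">4n5b_1 mol:protein",
    ">1o98_1 mol:protein",
    ">2xyz_1 mol:protein",
]

grouped = plan_output_files(headers, "mydir/", preselect=["7m6c", "1o98"])
single = plan_output_files(headers, "mydir/", preselect=["7m6c", "1o98"], individual=True)
print(grouped)
print(single)
```

With both options active, only the preselected molecules contribute files, and each of their sequences gets its own PDBID_X.fasta file.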