.. _cl_crops_splitseqs:

Sequence file splitting
----------------------

For the separation of fasta sequences in several files grouped per protein ID just type:

.. code-block:: shell-session

   crops-splitseqs PDBall.fasta --output mydir/

This command produces new sequence files of the format ``mydir/PDBID.fasta`` containing the sequences grouped per protein ID.

The output directory argument ``--output`` or ``-o`` is optional. If not provided, the results will be saved in the sequence file's directory by default.

--------------------------------------------------------------

From a large fasta file where only a few sequences are required, the option ``--preselect`` or ``-p`` allows to preselect as many molecule ids as needed:

.. code-block:: shell-session

   crops-splitseqs PDBall.fasta --output mydir/ --preselect 7m6c 4n5b 1o98

This command will create new files only for the three pdb ids inserted, regardless of the number of sequences contained within the input *.fasta* file.

--------------------------------------------------------------

Additionally, the option to separate the sequence files by unique sequence is also available by typing ``--individual`` or ``-i``:

.. code-block:: shell-session

   crops-splitseqs PDBall.fasta --output mydir/ --individual

This command produces new sequence files of the format ``mydir/PDBID_X.fasta`` containing just a single sequence of Protein ID PDBID and (numerical) sequence id X.

Options ``--preselect`` and ``--individual`` can be combined to produce individual sequence files only from the selected molecules.