crops.iomod.parsers module

Parsing functions are defined here.

import_db(inpath, pdb_in=None)[source]

Import intervals database from a .csv file.

If the imported file is not ‘pdb_chain_uniprot.csv’ from the SIFTS database, the .csv file must be formatted as follows: four columns containing molecule ID, chain ID, lower element of subset, and higher element of subset, in this order.

Parameters:
  • inpath (str) – Path to the interval database to be imported.
  • pdb_in (str or set or dict, optional) – Molecule IDs to return. If None, all are returned. Defaults to None.
Returns:

Parsed interval database.

Return type:

dict [str, crops.elements.intervals.intinterval]

parse_db(instream, pdbset=None)[source]

Import intervals database from csv-formatted string.

If the imported file is not ‘pdb_chain_uniprot.csv’ from the SIFTS database, the columns must contain, in this order: molecule ID, chain ID, lower element of subset, and higher element of subset. Multiple rows with the same molecule and chain IDs indicate a discontinuous interval with more than one subset.

Parameters:
  • instream (str) – Interval database, csv-formatted string.
  • pdbset (str or set or dict, optional) – Molecule IDs to return. If None, all are returned. Defaults to None.
Raises:

TypeError – When pdbset is given and is not a string, set, or dictionary. Also raised when the database is neither a SIFTS file nor a minimal file (4 elements per line).

Returns:

A dictionary containing crops.elements.intervals.intinterval objects.

Return type:

dict [str, crops.elements.intervals.intinterval]
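The minimal four-column layout and the handling of repeated molecule/chain rows can be illustrated with a small standalone sketch. It does not call crops and does not build intinterval objects: the helper name group_intervals, its wanted argument, and the nested dict of (lower, upper) tuples are assumptions made only for illustration.

```python
import csv
import io


def group_intervals(csv_text, wanted=None):
    """Group (lower, upper) subsets by molecule ID and chain ID.

    Hypothetical helper: it mimics the documented input format only,
    not the crops implementation or its return type.
    """
    # Normalise the filter: a single ID, a set of IDs, or a dict keyed by ID.
    if wanted is None:
        keep = None
    elif isinstance(wanted, str):
        keep = {wanted}
    elif isinstance(wanted, (set, dict)):
        keep = set(wanted)
    else:
        raise TypeError("wanted must be a str, set or dict")

    grouped = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise TypeError("expected 4 elements per line")
        mol_id, chain_id, lower, upper = (item.strip() for item in row)
        if keep is not None and mol_id not in keep:
            continue
        # Repeated (molecule, chain) rows accumulate into one discontinuous interval.
        subsets = grouped.setdefault(mol_id, {}).setdefault(chain_id, [])
        subsets.append((int(lower), int(upper)))
    return grouped


# Two rows for chain A of the same molecule describe a discontinuous interval.
minimal_db = "1abc,A,1,50\n1abc,A,80,120\n2xyz,B,5,60\n"
result = group_intervals(minimal_db, wanted={"1abc"})
print(result)
```

Here the two rows for molecule 1abc, chain A, end up as two subsets of the same entry, while 2xyz is filtered out.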

parsemap(instream)[source]

Parse cropmap from string.

Parameters: instream (str) – Imported-to-string cropmap file content.
Returns: Mapping and backmapping coordinates.
Return type: dict [str, dict [str, dict [str, dict [int, int]]]]

parsemapfile(input_map)[source]

Cropmap file parser.

Parameters: input_map (str) – Cropmap file path.
Returns: Mapping and backmapping coordinates.
Return type: dict [str, dict [str, dict [str, dict [int, int]]]]
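
The nested return type shared by parsemap and parsemapfile can be pictured with a hand-built example. This is only a sketch of the shape: the meaning of the three string key levels (molecule ID, chain ID, and the "map"/"backmap" labels) is an assumption, as is the helper invert.

```python
def invert(mapping):
    """Build a backmapping (cropped -> original) from a forward mapping."""
    return {new: old for old, new in mapping.items()}


# Forward mapping: original position -> position after cropping.
forward = {1: 1, 2: 2, 5: 3, 6: 4}

# All key names below are hypothetical; only the nesting depth and the
# innermost dict [int, int] follow the documented return type.
cropmaps = {
    "1abc": {
        "A": {
            "map": forward,
            "backmap": invert(forward),
        }
    }
}

print(cropmaps["1abc"]["A"]["backmap"][3])
```

Looking up cropped position 3 in the backmapping recovers original position 5.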

parseseq(instream, inset=None)[source]

Parse sequence(s).

Parameters:
  • instream (str) – Imported-to-string sequence file content (fasta format).
  • inset (set or dict or str, optional) – Sequence IDs to return. If None, all are returned. Defaults to None.
Raises:

TypeError – When inset is not of an accepted type (set [str], dict, or str), or when instream is not a string.

Returns:

Parsed sequences.

Return type:

dict [str, crops.elements.sequences.oligoseq]
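As an illustration of the input side only, the sketch below reads a FASTA-formatted string and applies an ID filter in the spirit of inset. It is not the crops parser and returns plain strings rather than oligoseq objects; the helper name read_fasta, the wanted argument, and the choice of the first header token as the sequence ID are assumptions.

```python
def read_fasta(text, wanted=None):
    """Return {sequence ID: sequence} from FASTA text (hypothetical helper)."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if wanted is not None and not isinstance(wanted, (str, set, dict)):
        raise TypeError("wanted must be a str, set or dict")
    if wanted is None:
        keep = None
    elif isinstance(wanted, str):
        keep = {wanted}
    else:
        keep = set(wanted)

    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited header token.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            if keep is None or current in keep:
                records[current] = []
            else:
                current = None  # skip the lines of an unwanted record
        elif current is not None:
            records[current].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


sample = ">seq1 first example\nMKT\nAYI\n>seq2\nGGSG\n"
print(read_fasta(sample))
print(read_fasta(sample, wanted="seq2"))
```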

parseseqfile(seq_input='server-only', inset=None, use_UPserver=False)[source]

Parse sequence file containing one or more sequences.

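If ‘server-only’ is given instead of a local file name, all requested sequences are looked up on the UniProt server, which requires use_UPserver to be True and inset to be provided.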

Parameters:
  • seq_input (str, optional) – Sequence file path, defaults to ‘server-only’.
  • inset (set or dict or str, optional) – Sequence IDs to return. If None, all are returned. Defaults to None.
  • use_UPserver (bool, optional) – Use the UniProt server as a backup for those IDs not found in seq_input (all of them if seq_input == ‘server-only’), defaults to False.
Raises:
  • TypeError – If inset is not a str, set [str], or dict [str, str].
  • ValueError – If seq_input == ‘server-only’ but use_UPserver is False or inset is None.
Returns:

Parsed sequences.

Return type:

dict [str, crops.elements.sequences.oligoseq]
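The documented error conditions can be summarised as a small argument check. This is a sketch of the rules as stated above, not the crops source: check_seq_args is a hypothetical helper, the file name and the accession P12345 are placeholders, and no file or server is accessed.

```python
def check_seq_args(seq_input="server-only", inset=None, use_UPserver=False):
    """Validate an argument combination following the documented rules."""
    if inset is not None and not isinstance(inset, (str, set, dict)):
        raise TypeError("inset must be a str, set or dict")
    # In server-only mode there is no local file, so the server must be
    # enabled and the IDs to fetch must be given explicitly.
    if seq_input == "server-only" and (not use_UPserver or inset is None):
        raise ValueError("'server-only' requires use_UPserver=True and an inset")
    return True


# A local file needs neither the server nor an ID filter.
print(check_seq_args("my_sequences.fasta"))
# Server-only mode with the server enabled and IDs supplied.
print(check_seq_args(inset={"P12345"}, use_UPserver=True))
```

Calling check_seq_args() with all defaults raises ValueError, which mirrors the fact that the default seq_input cannot be used on its own.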

parsestr(instream)[source]

Parse structure file from a string.

Parameters: instream (str) – Imported-to-string structure file.
Returns: Parsed structure.
Return type: gemmi.Structure

parsestrfile(str_input, intype='path')[source]

Parse structure file(s).

Parameters:
  • str_input (str) – Either a directory path, a file path, or a structure in string format.
  • intype (str, optional) – One of ‘path’ or ‘string’, defaults to ‘path’.
Raises:
  • KeyError – If more than one structure file contains the same identifier.
  • ValueError – If the argument ‘intype’ has an invalid value.
Returns:
  • strdict – A dictionary containing parsed structures.
  • filedict – A dictionary containing the structure file name(s).
Return type:
  • strdict – dict [str, gemmi.Structure]
  • filedict – dict [str, str]
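
The duplicate-identifier rule behind the KeyError can be sketched without parsing any structure. The helper index_structure_files is hypothetical, and taking the file name without its extension as the identifier is an assumption; the intype argument and the gemmi parsing step are not covered.

```python
import os
import tempfile


def index_structure_files(path):
    """Map identifier -> file name for a file or a directory (hypothetical)."""
    if os.path.isdir(path):
        names = sorted(os.listdir(path))
    else:
        names = [os.path.basename(path)]
    filedict = {}
    for name in names:
        # Assumption: the identifier is the file name without its extension.
        identifier = os.path.splitext(name)[0]
        if identifier in filedict:
            raise KeyError(f"identifier {identifier!r} found in more than one file")
        filedict[identifier] = name
    return filedict


duplicate_error = None
with tempfile.TemporaryDirectory() as folder:
    # Empty placeholder files are enough to show the bookkeeping.
    for name in ("1abc.pdb", "2xyz.cif"):
        open(os.path.join(folder, name), "w").close()
    filedict = index_structure_files(folder)
    print(filedict)

    # A second file with the same identifier triggers the KeyError.
    open(os.path.join(folder, "1abc.cif"), "w").close()
    try:
        index_structure_files(folder)
    except KeyError as error:
        duplicate_error = error
```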