NCBI C++ ToolKit
CAgpConverter Class Reference

#include <objtools/readers/agp_converter.hpp>

Classes

class  CErrorHandler
 Subclass this to override how errors are handled (example: to stop early on some kinds of errors).
class  IFileWrittenCallback
 This gets called after each file is written, so the caller can do useful things like run asnval on every file that's output.
class  IIdTransformer

Public Types

enum  EOutputFlags {
  fOutputFlags_Fuzz100 = (1<<0) , fOutputFlags_FastaId = (1<<1) , fOutputFlags_SetGapInfo = (1<<2) , fOutputFlags_AGPLenMustMatchOrig = (1<<3) ,
}
enum  EError {
  eError_OutputDirNotFoundOrNotADir , eError_ComponentNotFound , eError_ComponentTooShort , eError_ChromosomeMapIgnoredBecauseChromosomeSubsourceAlreadyInTemplate ,
  eError_ChromosomeFileBadFormat , eError_ChromosomeIsInconsistent , eError_WrongNumberOfSourceDescs , eError_SubmitBlockIgnoredWhenOneBigBioseqSet ,
  eError_EntrySkippedDueToFailedComponentValidation , eError_EntrySkipped , eError_SuggestUsingFastaIdOption , eError_AGPMessage ,
  eError_AGPErrorCode , eError_AGPLengthMismatchWithTemplateLength , eError_END
}
 The different kinds of errors that could occur while processing.
enum  EOutputBioseqsFlags { fOutputBioseqsFlags_OneObjectPerBioseq = (1 << 0) , fOutputBioseqsFlags_WrapInSeqEntry = (1 << 1) , fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets = (1 << 2) , fOutputBioseqsFlags_LAST_PLUS_ONE }
typedef int TOutputFlags
 Bitwise-OR of EOutputFlags.
typedef map< string, string > TChromosomeMap
 Map id to chromosome name.
typedef int TOutputBioseqsFlags

Public Member Functions

 CAgpConverter (CConstRef< objects::CBioseq > pTemplateBioseq, const objects::CSubmit_block *pSubmitBlock=nullptr, TOutputFlags fOutputFlags=0, CRef< CErrorHandler > pErrorHandler=CRef< CErrorHandler >())
 Constructor.
void SetComponentsBioseqSet (CConstRef< objects::CBioseq_set > pComponentsBioseqSet)
 Give a Bioseq-set containing all the component pieces, for verification.
void SetChromosomesInfo (const TChromosomeMap &mapChromosomeNames)
 Give the chromosomes to this object.
void LoadChromosomeMap (CNcbiIstream &chromosomes_istr)
 Input has 2 tab-delimited columns: id, then chromosome name.
void SetIdTransformer (IIdTransformer *pIdTransformer)
 When this reads an id, it will use the supplied transformer (if any) to change the CSeq_id.
void OutputBioseqs (CNcbiOstream &ostrm, const std::vector< std::string > &vecAgpFileNames, TOutputBioseqsFlags fFlags=0, size_t uMaxBioseqsToWrite=std::numeric_limits< size_t >::max()) const
 Outputs the result from the AGP file names as ASN.1.
void OutputOneFileForEach (const string &sDirName, const std::vector< std::string > &vecAgpFileNames, const string &sSuffix=kEmptyStr, IFileWrittenCallback *pFileWrittenCallback=nullptr) const
 Outputs the results of each Seq-entry (or Seq-submit if Submit-block was given) into its own file in the given directory.

Static Public Member Functions

static TOutputFlags OutputFlagStringToEnum (const string &sEnumAsString)
 Convert string to flag.
static EError ErrorStringToEnum (const string &sEnumAsString)
 Convert string to EError enum.

Private Types

typedef map< string, TSeqPos > TCompLengthMap

Private Member Functions

 CAgpConverter (const CAgpConverter &)
CAgpConverter & operator= (const CAgpConverter &)
void x_ReadAgpEntries (const string &sAgpFileName, CAgpToSeqEntry::TSeqEntryRefVec &out_agp_entries) const
CRef< objects::CSeq_entry > x_InitializeAndCheckCopyOfTemplate (const objects::CBioseq &agp_bioseq, string &out_id_str) const
CRef< objects::CSeq_entry > x_InitializeCopyOfTemplate (const objects::CBioseq &agp_seq, string &out_unparsed_id_str, string &out_id_str) const
bool x_VerifyComponents (CConstRef< objects::CSeq_entry > new_entry, const string &id_str) const
void x_SetChromosomeNameInSourceSubtype (CRef< objects::CSeq_entry > new_entry, const string &unparsed_id_str) const
void x_SetCreateAndUpdateDatesToToday (CRef< objects::CSeq_entry > new_entry) const
void x_SetUpObjectOpeningAndClosingStrings (string &out_sObjectOpeningString, string &out_sObjectClosingString, TOutputBioseqsFlags fOutputBioseqsFlags, bool bOnlyOneBioseqInAllAGPFiles) const
 Each Bioseq written out will have the out_sObjectOpeningString before it and out_sObjectClosingString after it.

Private Attributes

CConstRef< objects::CBioseq > m_pTemplateBioseq
CConstRef< objects::CSubmit_block > m_pSubmitBlock
TOutputFlags m_fOutputFlags
CRef< CErrorHandler > m_pErrorHandler
CRef< IIdTransformer > m_pIdTransformer
TCompLengthMap m_mapComponentLength
TChromosomeMap m_mapChromosomeNames

Detailed Description

Definition at line 49 of file agp_converter.hpp.

Member Typedef Documentation

◆ TChromosomeMap

Map id to chromosome name.

Definition at line 120 of file agp_converter.hpp.

◆ TCompLengthMap

Definition at line 223 of file agp_converter.hpp.

◆ TOutputBioseqsFlags

Definition at line 165 of file agp_converter.hpp.

◆ TOutputFlags

Bitwise-OR of EOutputFlags.

Definition at line 63 of file agp_converter.hpp.

Member Enumeration Documentation

◆ EError

The different kinds of errors that could occur while processing.


Definition at line 66 of file agp_converter.hpp.

◆ EOutputBioseqsFlags


Enumerator
fOutputBioseqsFlags_OneObjectPerBioseq
 If set, each AGP Bioseq is written as its own object.
 * If Submit-block was given, each object is a Seq-submit.
 * Otherwise, it's a Bioseq-set, Bioseq or Seq-entry, depending on the other flags.
fOutputBioseqsFlags_WrapInSeqEntry
 Bioseqs and Bioseq-sets should always be wrapped in a Seq-entry. This has no effect if Submit-block was given, because Seq-submits must take Seq-entries.
fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets
 Specify this if Bioseq-sets with just one Bioseq in them should _NOT_ be unwrapped into a Bioseq.


Definition at line 148 of file agp_converter.hpp.
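
The flag descriptions above suggest how the type of each written object is chosen. The following is a minimal stand-alone sketch of one possible reading, not the toolkit's implementation: the enum values are copied from the synopsis so the code compiles without the toolkit, the helper OutputObjectKind is hypothetical, and the precedence among the flags (Submit-block first, then wrapping, then unwrapping) is an assumption.

```cpp
#include <string>

// Stand-alone copy of the flag values listed in the synopsis.
// In real code these come from <objtools/readers/agp_converter.hpp>.
enum EOutputBioseqsFlags {
    fOutputBioseqsFlags_OneObjectPerBioseq            = (1 << 0),
    fOutputBioseqsFlags_WrapInSeqEntry                = (1 << 1),
    fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets = (1 << 2)
};
typedef int TOutputBioseqsFlags;

// Hypothetical helper (not a toolkit function): names the ASN.1 type that
// each written object would have under the assumed precedence.
inline std::string OutputObjectKind(TOutputBioseqsFlags flags,
                                    bool hasSubmitBlock,
                                    bool onlyOneBioseq)
{
    // A Submit-block always yields Seq-submits; wrapping has no effect then.
    if (hasSubmitBlock) {
        return "Seq-submit";
    }
    // Without a Submit-block, wrapping forces a Seq-entry.
    if (flags & fOutputBioseqsFlags_WrapInSeqEntry) {
        return "Seq-entry";
    }
    // An object holds a single Bioseq if there is only one overall, or if
    // every Bioseq is written as its own object.
    const bool single = onlyOneBioseq ||
        (flags & fOutputBioseqsFlags_OneObjectPerBioseq) != 0;
    // A singular Bioseq-set is unwrapped unless the caller forbids it.
    if (single && !(flags & fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets)) {
        return "Bioseq";
    }
    return "Bioseq-set";
}
```

Under these assumptions, passing fOutputBioseqsFlags_OneObjectPerBioseq alone with no Submit-block gives "Bioseq", and adding fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets gives "Bioseq-set".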

◆ EOutputFlags


Enumerator
fOutputFlags_Fuzz100
 For gaps of length 100, put an Int-fuzz = unk in the literal.
fOutputFlags_FastaId
 Parse object ids (col. 1) as fasta-style ids if they contain '|'.
fOutputFlags_SetGapInfo
 Set Seq-gap (gap type and linkage) in delta sequence.
fOutputFlags_AGPLenMustMatchOrig
 When set, we give an error on AGP objects that don't have the same length as the original template.


Definition at line 52 of file agp_converter.hpp.
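
Because TOutputFlags is a bitwise-OR of EOutputFlags values, callers would typically combine options with | and test them with &. The sketch below copies the four enumerator values shown in the synopsis into a stand-alone enum so that it compiles without the toolkit; the helpers HasFlag and ExampleFlags are hypothetical and are not part of CAgpConverter.

```cpp
// Stand-alone copy of the flag values listed in the synopsis.
// In real code these come from <objtools/readers/agp_converter.hpp>.
enum EOutputFlags {
    fOutputFlags_Fuzz100             = (1 << 0),
    fOutputFlags_FastaId             = (1 << 1),
    fOutputFlags_SetGapInfo          = (1 << 2),
    fOutputFlags_AGPLenMustMatchOrig = (1 << 3)
};
typedef int TOutputFlags;  // bitwise-OR of EOutputFlags

// Hypothetical helper (not a toolkit function): true if 'flag' is set.
inline bool HasFlag(TOutputFlags flags, EOutputFlags flag)
{
    return (flags & flag) != 0;
}

// Example combination: parse fasta-style ids and fill in Seq-gap info.
inline TOutputFlags ExampleFlags()
{
    return fOutputFlags_FastaId | fOutputFlags_SetGapInfo;
}
```

A value built this way is what the fOutputFlags constructor parameter expects; the default of 0 means that none of the options is enabled.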

Constructor & Destructor Documentation

◆ CAgpConverter() [1/2]

CAgpConverter::CAgpConverter ( CConstRef< objects::CBioseq >  pTemplateBioseq,
const objects::CSubmit_block *  pSubmitBlock = nullptr,
TOutputFlags  fOutputFlags = 0,
CRef< CErrorHandler >  pErrorHandler = CRef< CErrorHandler >() 
)


Parameters
pTemplateBioseq
 The template Bioseq that the output is built on.
pSubmitBlock
 The Seq-submit Submit-block, which can be NULL to just output a Seq-entry.
fOutputFlags
 Flags that control the behavior of the conversion.
pErrorHandler
 This is called whenever an error occurs. Pass a subclass of CErrorHandler if you want behavior different from the default, which is to just print to stderr.

◆ CAgpConverter() [2/2]

CAgpConverter::CAgpConverter ( const CAgpConverter & )

Member Function Documentation

◆ ErrorStringToEnum()

CAgpConverter::EError CAgpConverter::ErrorStringToEnum ( const string & sEnumAsString)

◆ LoadChromosomeMap()

void CAgpConverter::LoadChromosomeMap ( CNcbiIstream & chromosomes_istr)
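
The synopsis describes the input of LoadChromosomeMap as two tab-delimited columns: id, then chromosome name. The sketch below shows how such a stream could be read into a map shaped like TChromosomeMap. It is a stand-alone illustration and not the toolkit's code: ParseChromosomeMap and ParseChromosomeMapText are hypothetical names, and skipping blank lines, rejecting extra columns, and throwing std::runtime_error on malformed or conflicting lines are all assumptions (the class itself reports problems through its error handler, with codes such as eError_ChromosomeFileBadFormat and eError_ChromosomeIsInconsistent).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Same shape as CAgpConverter::TChromosomeMap: id -> chromosome name.
typedef std::map<std::string, std::string> TChromosomeMap;

// Hypothetical stand-in for the parsing step (not the toolkit function).
inline TChromosomeMap ParseChromosomeMap(std::istream& in)
{
    TChromosomeMap result;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line.empty()) {
            continue;  // assumption: blank lines are ignored
        }
        const std::string::size_type tab = line.find('\t');
        // Require exactly two non-empty columns.
        if (tab == std::string::npos || tab == 0 || tab + 1 == line.size() ||
            line.find('\t', tab + 1) != std::string::npos) {
            throw std::runtime_error("bad chromosome line: " + line);
        }
        const std::string id  = line.substr(0, tab);
        const std::string chr = line.substr(tab + 1);
        const std::pair<TChromosomeMap::iterator, bool> inserted =
            result.insert(std::make_pair(id, chr));
        // The same id mapped to two different chromosomes is inconsistent.
        if (!inserted.second && inserted.first->second != chr) {
            throw std::runtime_error("inconsistent chromosome for id: " + id);
        }
    }
    return result;
}

// Convenience wrapper for in-memory text.
inline TChromosomeMap ParseChromosomeMapText(const std::string& text)
{
    std::istringstream in(text);
    return ParseChromosomeMap(in);
}
```

A map produced this way has the form that SetChromosomesInfo accepts, which is the alternative to loading the mapping from a stream.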

◆ operator=()

CAgpConverter& CAgpConverter::operator= ( const CAgpConverter & )

◆ OutputBioseqs()

void CAgpConverter::OutputBioseqs ( CNcbiOstream & ostrm,
const std::vector< std::string > &  vecAgpFileNames,
TOutputBioseqsFlags  fFlags = 0,
size_t  uMaxBioseqsToWrite = std::numeric_limits<size_t>::max() 
) const

Outputs the result from the AGP file names as ASN.1.

The output could be a Seq-submit, Seq-entry, Bioseq-set or Bioseq, depending on the flags and whether a pSubmitBlock was given.

Definition at line 173 of file agp_converter.cpp.

References eError_EntrySkipped, CObjectOStream::Flush(), fOutputBioseqsFlags_OneObjectPerBioseq, CRef< C, Locker >::GetPointer(), CSerialObject::GetThisTypeInfo(), ITERATE, m_pErrorHandler, CRef< C, Locker >::Reset(), CSeq_entry_Base::SetSeq(), CObjectOStream::WriteObject(), x_InitializeAndCheckCopyOfTemplate(), x_ReadAgpEntries(), and x_SetUpObjectOpeningAndClosingStrings().

Referenced by BOOST_AUTO_TEST_CASE(), and CAgpconvertApplication::Run().

◆ OutputFlagStringToEnum()

CAgpConverter::TOutputFlags CAgpConverter::OutputFlagStringToEnum ( const string & sEnumAsString)

◆ OutputOneFileForEach()

void CAgpConverter::OutputOneFileForEach ( const string & sDirName,
const std::vector< std::string > &  vecAgpFileNames,
const string & sSuffix = kEmptyStr,
IFileWrittenCallback * pFileWrittenCallback = nullptr 
) const

Outputs the results of each Seq-entry (or Seq-submit if Submit-block was given) into its own file in the given directory.

Parameters
sDirName
 The directory to put the output files into.
vecAgpFileNames
 A list of the AGP filenames to read from.
sSuffix
 The suffix for each file. If empty, it defaults to "sqn" for Seq-submits and "ent" for Seq-entries.
pFileWrittenCallback
 If non-NULL, its Notify function is called after each file is written so the caller can perform any required custom logic.
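
The suffix rule described for sSuffix can be sketched as a small stand-alone function. EffectiveSuffix is a hypothetical name used only for illustration; how the toolkit assembles the full file name from the directory, the id and the suffix is not shown here.

```cpp
#include <string>

// Hypothetical helper (not a toolkit function): resolves the suffix used
// for each output file. An explicit suffix wins; otherwise the default
// depends on whether a Submit-block was given (Seq-submit output) or not
// (Seq-entry output).
inline std::string EffectiveSuffix(const std::string& sSuffix,
                                   bool hasSubmitBlock)
{
    if (!sSuffix.empty()) {
        return sSuffix;
    }
    return hasSubmitBlock ? "sqn" : "ent";
}
```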

Definition at line 274 of file agp_converter.cpp.

References eError_EntrySkipped, eError_OutputDirNotFoundOrNotADir, CDir::Exists(), CDirEntry::GetPath(), CDirEntry::IsDir(), ITERATE, m_pErrorHandler, m_pSubmitBlock, CDirEntry::MakePath(), MSerial_AsnText, CAgpConverter::IFileWrittenCallback::Notify(), SerialClone(), CSeq_submit_Base::SetData(), CSeq_submit_Base::SetSub(), x_InitializeAndCheckCopyOfTemplate(), and x_ReadAgpEntries().

Referenced by CAgpconvertApplication::Run().

◆ SetChromosomesInfo()

void CAgpConverter::SetChromosomesInfo ( const TChromosomeMap & mapChromosomeNames)

◆ SetComponentsBioseqSet()

void CAgpConverter::SetComponentsBioseqSet ( CConstRef< objects::CBioseq_set >  pComponentsBioseqSet)

Give a Bioseq-set containing all the component pieces, for verification.

Definition at line 93 of file agp_converter.cpp.

References map_checker< Container >::clear(), CBioseq_set_Base::GetSeq_set(), ITERATE, and m_mapComponentLength.

Referenced by CAgpconvertApplication::Run().

◆ SetIdTransformer()

void CAgpConverter::SetIdTransformer ( IIdTransformer * pIdTransformer)

When this reads an id, it will use the supplied transformer (if any) to change the CSeq_id.

It is even okay to change it to a different type. Set to NULL to unset it.

Definition at line 140 of file agp_converter.hpp.

References m_pIdTransformer.

Referenced by CAgpconvertApplication::Run().

◆ x_InitializeAndCheckCopyOfTemplate()

CRef< CSeq_entry > CAgpConverter::x_InitializeAndCheckCopyOfTemplate ( const objects::CBioseq &  agp_bioseq,
string & out_id_str 
) const

◆ x_InitializeCopyOfTemplate()

CRef< CSeq_entry > CAgpConverter::x_InitializeCopyOfTemplate ( const objects::CBioseq &  agp_seq,
string & out_unparsed_id_str,
string & out_id_str 
) const

◆ x_ReadAgpEntries()

void CAgpConverter::x_ReadAgpEntries ( const string & sAgpFileName,
CAgpToSeqEntry::TSeqEntryRefVec & out_agp_entries 
) const

◆ x_SetChromosomeNameInSourceSubtype()

void CAgpConverter::x_SetChromosomeNameInSourceSubtype ( CRef< objects::CSeq_entry >  new_entry,
const string & unparsed_id_str 
) const

◆ x_SetCreateAndUpdateDatesToToday()

void CAgpConverter::x_SetCreateAndUpdateDatesToToday ( CRef< objects::CSeq_entry >  new_entry) const

◆ x_SetUpObjectOpeningAndClosingStrings()

void CAgpConverter::x_SetUpObjectOpeningAndClosingStrings ( string & out_sObjectOpeningString,
string & out_sObjectClosingString,
TOutputBioseqsFlags  fOutputBioseqsFlags,
bool  bOnlyOneBioseqInAllAGPFiles 
) const

Each Bioseq written out will have the out_sObjectOpeningString before it and out_sObjectClosingString after it.

You can use the resulting strings as follows:

* Print out_sObjectOpeningString.
* If out_sObjectOpeningString is empty, print "Bioseq ::= ".
* Print all Bioseqs in this object, comma-separated (example: "seq { [...snip...] }").
* Print out_sObjectClosingString.

If callers are printing more than one group of Bioseqs (or one Bioseq per object), they can use the openers and closers for each object.

Parameters
out_sObjectOpeningString
 This will be cleared, then set to the string that should appear before each text ASN.1 object that is output.
out_sObjectClosingString
 The closing counterpart of out_sObjectOpeningString.
fOutputBioseqsFlags
 This function needs to know the flags that will be used for output so it can determine information such as whether a Bioseq-set or a Bioseq will be used.
bOnlyOneBioseqInAllAGPFiles
 The caller sets this to true when there is only one Bioseq in all the AGP files being processed for this call.
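
The printing steps described for the opening and closing strings can be sketched as follows. This is a stand-alone illustration, not the toolkit's code: WriteObject and RenderObject are hypothetical helpers, each Bioseq is treated as an already-serialized piece of text, and the ", " separator is an assumption about layout only.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a toolkit function): writes one text ASN.1
// object using the opening and closing strings as described.
inline void WriteObject(std::ostream& out,
                        const std::string& sObjectOpeningString,
                        const std::string& sObjectClosingString,
                        const std::vector<std::string>& vecBioseqTexts)
{
    // Print the opening string.
    out << sObjectOpeningString;
    // An empty opening string means a bare Bioseq is being written.
    if (sObjectOpeningString.empty()) {
        out << "Bioseq ::= ";
    }
    // Print all Bioseqs in this object, comma-separated.
    for (std::size_t i = 0; i < vecBioseqTexts.size(); ++i) {
        if (i != 0) {
            out << ", ";
        }
        out << vecBioseqTexts[i];
    }
    // Print the closing string.
    out << sObjectClosingString;
}

// Convenience wrapper that returns the rendered text.
inline std::string RenderObject(const std::string& sObjectOpeningString,
                                const std::string& sObjectClosingString,
                                const std::vector<std::string>& vecBioseqTexts)
{
    std::ostringstream out;
    WriteObject(out, sObjectOpeningString, sObjectClosingString,
                vecBioseqTexts);
    return out.str();
}
```

When several objects are written in one run, the same pair of strings would be passed to each call, one call per object.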

Definition at line 682 of file agp_converter.cpp.

References CObjectOStream::Flush(), fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets, fOutputBioseqsFlags_OneObjectPerBioseq, fOutputBioseqsFlags_WrapInSeqEntry, CConstRef< C, Locker >::GetPointer(), m_pSubmitBlock, and CObjectOStream::WriteObject().

Referenced by OutputBioseqs().

◆ x_VerifyComponents()

bool CAgpConverter::x_VerifyComponents ( CConstRef< objects::CSeq_entry >  new_entry,
const string & id_str 
) const

Member Data Documentation

◆ m_fOutputFlags

TOutputFlags CAgpConverter::m_fOutputFlags

◆ m_mapChromosomeNames

TChromosomeMap CAgpConverter::m_mapChromosomeNames

◆ m_mapComponentLength

TCompLengthMap CAgpConverter::m_mapComponentLength

◆ m_pErrorHandler

CRef<CErrorHandler> CAgpConverter::m_pErrorHandler

◆ m_pIdTransformer

CRef<IIdTransformer> CAgpConverter::m_pIdTransformer

Definition at line 221 of file agp_converter.hpp.

Referenced by SetIdTransformer(), and x_InitializeCopyOfTemplate().

◆ m_pSubmitBlock

CConstRef<objects::CSubmit_block> CAgpConverter::m_pSubmitBlock

◆ m_pTemplateBioseq

CConstRef<objects::CBioseq> CAgpConverter::m_pTemplateBioseq

The documentation for this class was generated from the following files:
* agp_converter.hpp
* agp_converter.cpp
Modified on Sat Jul 20 11:14:04 2024 by rev. 669887