NCBI C++ ToolKit
CAgpConverter Class Reference

#include <objtools/readers/agp_converter.hpp>

Classes

class  CErrorHandler
 Subclass this to override how errors are handled (example: to stop early on some kinds of errors) More...
 
class  IFileWrittenCallback
 This gets called after each file is written, so the caller can do useful things like run asnval on every file that's output. More...
 
class  IIdTransformer
 

Public Types

enum  EOutputFlags {
  fOutputFlags_Fuzz100 = (1<<0) , fOutputFlags_FastaId = (1<<1) , fOutputFlags_SetGapInfo = (1<<2) , fOutputFlags_AGPLenMustMatchOrig = (1<<3) ,
  fOutputFlags_LAST_PLUS_ONE
}
 
enum  EError {
  eError_OutputDirNotFoundOrNotADir , eError_ComponentNotFound , eError_ComponentTooShort , eError_ChromosomeMapIgnoredBecauseChromosomeSubsourceAlreadyInTemplate ,
  eError_ChromosomeFileBadFormat , eError_ChromosomeIsInconsistent , eError_WrongNumberOfSourceDescs , eError_SubmitBlockIgnoredWhenOneBigBioseqSet ,
  eError_EntrySkippedDueToFailedComponentValidation , eError_EntrySkipped , eError_SuggestUsingFastaIdOption , eError_AGPMessage ,
  eError_AGPErrorCode , eError_AGPLengthMismatchWithTemplateLength , eError_END
}
 The different kinds of errors that could occur while processing. More...
 
enum  EOutputBioseqsFlags { fOutputBioseqsFlags_OneObjectPerBioseq = (1 << 0) , fOutputBioseqsFlags_WrapInSeqEntry = (1 << 1) , fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets = (1 << 2) , fOutputBioseqsFlags_LAST_PLUS_ONE }
 
typedef int TOutputFlags
 Bitwise-OR of EOutputFlags. More...
 
typedef map< string, string > TChromosomeMap
 Map id to chromosome name. More...
 
typedef int TOutputBioseqsFlags
 

Public Member Functions

 CAgpConverter (CConstRef< objects::CBioseq > pTemplateBioseq, const objects::CSubmit_block *pSubmitBlock=nullptr, TOutputFlags fOutputFlags=0, CRef< CErrorHandler > pErrorHandler=CRef< CErrorHandler >())
 Constructor. More...
 
void SetComponentsBioseqSet (CConstRef< objects::CBioseq_set > pComponentsBioseqSet)
 Give a bioseq-set containing all the component pieces, for verification. More...
 
void SetChromosomesInfo (const TChromosomeMap &mapChromosomeNames)
 Give the chromosomes to this object. More...
 
void LoadChromosomeMap (CNcbiIstream &chromosomes_istr)
 Input has 2 tab-delimited columns: id, then chromosome name. More...
 
void SetIdTransformer (IIdTransformer *pIdTransformer)
 When this reads an id, it will use the supplied transformer (if any) to change the CSeq_id. More...
 
void OutputBioseqs (CNcbiOstream &ostrm, const std::vector< std::string > &vecAgpFileNames, TOutputBioseqsFlags fFlags=0, size_t uMaxBioseqsToWrite=std::numeric_limits< size_t >::max()) const
 Outputs the result from the AGP file names as ASN.1. More...
 
void OutputOneFileForEach (const string &sDirName, const std::vector< std::string > &vecAgpFileNames, const string &sSuffix=kEmptyStr, IFileWrittenCallback *pFileWrittenCallback=nullptr) const
 Outputs the results of each Seq-entry (or Seq-submit if Submit-block was given) into its own file in the given directory. More...
 

Static Public Member Functions

static TOutputFlags OutputFlagStringToEnum (const string &sEnumAsString)
 Convert string to flag. More...
 
static EError ErrorStringToEnum (const string &sEnumAsString)
 Convert string to EError enum. More...
 

Private Types

typedef map< string, TSeqPos > TCompLengthMap
 

Private Member Functions

 CAgpConverter (const CAgpConverter &)
 
CAgpConverter & operator= (const CAgpConverter &)
 
void x_ReadAgpEntries (const string &sAgpFileName, CAgpToSeqEntry::TSeqEntryRefVec &out_agp_entries) const
 
CRef< objects::CSeq_entry > x_InitializeAndCheckCopyOfTemplate (const objects::CBioseq &agp_bioseq, string &out_id_str) const
 
CRef< objects::CSeq_entry > x_InitializeCopyOfTemplate (const objects::CBioseq &agp_seq, string &out_unparsed_id_str, string &out_id_str) const
 
bool x_VerifyComponents (CConstRef< objects::CSeq_entry > new_entry, const string &id_str) const
 
void x_SetChromosomeNameInSourceSubtype (CRef< objects::CSeq_entry > new_entry, const string &unparsed_id_str) const
 
void x_SetCreateAndUpdateDatesToToday (CRef< objects::CSeq_entry > new_entry) const
 
void x_SetUpObjectOpeningAndClosingStrings (string &out_sObjectOpeningString, string &out_sObjectClosingString, TOutputBioseqsFlags fOutputBioseqsFlags, bool bOnlyOneBioseqInAllAGPFiles) const
 Each Bioseq written out will have the out_sObjectOpeningString before it and out_sObjectClosingString after it. More...
 

Private Attributes

CConstRef< objects::CBioseq > m_pTemplateBioseq
 
CConstRef< objects::CSubmit_block > m_pSubmitBlock
 
TOutputFlags m_fOutputFlags
 
CRef< CErrorHandler > m_pErrorHandler
 
CRef< IIdTransformer > m_pIdTransformer
 
TCompLengthMap m_mapComponentLength
 
TChromosomeMap m_mapChromosomeNames
 

Detailed Description

Definition at line 49 of file agp_converter.hpp.

Member Typedef Documentation

◆ TChromosomeMap

Map id to chromosome name.

Definition at line 120 of file agp_converter.hpp.

◆ TCompLengthMap

Definition at line 223 of file agp_converter.hpp.

◆ TOutputBioseqsFlags

Definition at line 165 of file agp_converter.hpp.

◆ TOutputFlags

Bitwise-OR of EOutputFlags.

Definition at line 63 of file agp_converter.hpp.

Member Enumeration Documentation

◆ EError

The different kinds of errors that could occur while processing.

Enumerator
eError_OutputDirNotFoundOrNotADir 
eError_ComponentNotFound 
eError_ComponentTooShort 
eError_ChromosomeMapIgnoredBecauseChromosomeSubsourceAlreadyInTemplate 
eError_ChromosomeFileBadFormat 
eError_ChromosomeIsInconsistent 
eError_WrongNumberOfSourceDescs 
eError_SubmitBlockIgnoredWhenOneBigBioseqSet 
eError_EntrySkippedDueToFailedComponentValidation 
eError_EntrySkipped 
eError_SuggestUsingFastaIdOption 
eError_AGPMessage 
eError_AGPErrorCode 
eError_AGPLengthMismatchWithTemplateLength 
eError_END 

Definition at line 66 of file agp_converter.hpp.

◆ EOutputBioseqsFlags

Enumerator
fOutputBioseqsFlags_OneObjectPerBioseq 

If set, each AGP Bioseq is written as its own object.

* If Submit_block was given, each object is a Seq-submit.
* Otherwise, it's a Bioseq-set, Bioseq or Seq-entry depending on the other flags.

fOutputBioseqsFlags_WrapInSeqEntry 

Bioseqs and Bioseq-sets should always be wrapped in a Seq-entry.

This has no effect if Submit-block was given because Seq-submits must take Seq-entries.

fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets 

Specify this if Bioseq-sets with just one Bioseq in them should _NOT_ be unwrapped into a Bioseq.

fOutputBioseqsFlags_LAST_PLUS_ONE 

Definition at line 148 of file agp_converter.hpp.

◆ EOutputFlags

Enumerator
fOutputFlags_Fuzz100 

For gaps of length 100, put an Int-fuzz = unk in the literal.

fOutputFlags_FastaId 

Parse object ids (col. 1) as fasta-style ids if they contain '|'.

fOutputFlags_SetGapInfo 

Set Seq-gap (gap type and linkage) in delta sequence.

fOutputFlags_AGPLenMustMatchOrig 

When set, we give an error on AGP objects that don't have the same length as the original template.

fOutputFlags_LAST_PLUS_ONE 

Definition at line 52 of file agp_converter.hpp.
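
Because TOutputFlags is a bitwise-OR of EOutputFlags values, flags are combined with `|` and tested with `&`. The sketch below copies the enumerator values listed above into a standalone enum so that it compiles without the toolkit. The helpers HasFlag and ExampleFlags are hypothetical names used only for illustration; they are not part of CAgpConverter.

```cpp
// Standalone illustration: enumerator values are copied from the
// EOutputFlags listing above. HasFlag and ExampleFlags are hypothetical.
enum EOutputFlags {
    fOutputFlags_Fuzz100             = (1 << 0),
    fOutputFlags_FastaId             = (1 << 1),
    fOutputFlags_SetGapInfo          = (1 << 2),
    fOutputFlags_AGPLenMustMatchOrig = (1 << 3),
    fOutputFlags_LAST_PLUS_ONE
};
typedef int TOutputFlags;  // Bitwise-OR of EOutputFlags

// Returns true if every bit of `flag` is set in `flags`.
inline bool HasFlag(TOutputFlags flags, EOutputFlags flag)
{
    return (flags & flag) == flag;
}

// Example combination: fasta-style ids plus Seq-gap information.
inline TOutputFlags ExampleFlags()
{
    return fOutputFlags_FastaId | fOutputFlags_SetGapInfo;
}
```

With the real class, a value combined this way would presumably be passed as the fOutputFlags argument of the constructor.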

Constructor & Destructor Documentation

◆ CAgpConverter() [1/2]

CAgpConverter::CAgpConverter ( CConstRef< objects::CBioseq >  pTemplateBioseq,
const objects::CSubmit_block *  pSubmitBlock = nullptr,
TOutputFlags  fOutputFlags = 0,
CRef< CErrorHandler >  pErrorHandler = CRef< CErrorHandler >() 
)

Constructor.

Parameters
pTemplateBioseq  This holds the template bioseq that the output is built on.
pSubmitBlock  The Seq-submit Submit-block, which can be NULL to just output a Seq-entry.
fOutputFlags  Flags to control the behavior of the conversion.
pErrorHandler  This is called whenever an error occurs. Give a subclass of CErrorHandler if you want different functionality from the default (which is to just print to stderr).

◆ CAgpConverter() [2/2]

CAgpConverter::CAgpConverter ( const CAgpConverter & )
private

Member Function Documentation

◆ ErrorStringToEnum()

CAgpConverter::EError CAgpConverter::ErrorStringToEnum ( const string & sEnumAsString)
static

◆ LoadChromosomeMap()

void CAgpConverter::LoadChromosomeMap ( CNcbiIstream & chromosomes_istr)
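
The input for this method has two tab-delimited columns: id, then chromosome name. A minimal standalone sketch of reading such input into a map with the same shape as TChromosomeMap is given below. It uses std::istream in place of CNcbiIstream, and the function name ParseChromosomeMap is hypothetical. Skipping blank or malformed lines silently is an assumption of this sketch; the real method may report such lines through the error handler instead (for example as eError_ChromosomeFileBadFormat).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Same shape as TChromosomeMap: id -> chromosome name.
typedef std::map<std::string, std::string> TChromosomeMap;

// Hypothetical helper: reads "id<TAB>chromosome name" lines from `in`.
// Lines without a tab, or with an empty column, are skipped (an assumption
// of this sketch, not documented behavior).
inline TChromosomeMap ParseChromosomeMap(std::istream& in)
{
    TChromosomeMap result;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows-style line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        const std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos || tab == 0 || tab + 1 >= line.size()) {
            continue;  // not a usable two-column line
        }
        result[line.substr(0, tab)] = line.substr(tab + 1);
    }
    return result;
}
```

A map built this way has the form that SetChromosomesInfo() accepts; for a quick check, the input can be supplied from a std::istringstream.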

◆ operator=()

CAgpConverter& CAgpConverter::operator= ( const CAgpConverter & )
private

◆ OutputBioseqs()

void CAgpConverter::OutputBioseqs ( CNcbiOstream &  ostrm,
const std::vector< std::string > &  vecAgpFileNames,
TOutputBioseqsFlags  fFlags = 0,
size_t  uMaxBioseqsToWrite = std::numeric_limits<size_t>::max() 
) const

Outputs the result from the AGP file names as ASN.1.

The output could be a Seq-submit, Seq-entry, Bioseq-set or Bioseq, depending on the flags and whether a pSubmitBlock was given.

Definition at line 173 of file agp_converter.cpp.

References eError_EntrySkipped, CObjectOStream::Flush(), fOutputBioseqsFlags_OneObjectPerBioseq, CRef< C, Locker >::GetPointer(), CSerialObject::GetThisTypeInfo(), ITERATE, m_pErrorHandler, CRef< C, Locker >::Reset(), CSeq_entry_Base::SetSeq(), CObjectOStream::WriteObject(), x_InitializeAndCheckCopyOfTemplate(), x_ReadAgpEntries(), and x_SetUpObjectOpeningAndClosingStrings().

Referenced by BOOST_AUTO_TEST_CASE(), and CAgpconvertApplication::Run().

◆ OutputFlagStringToEnum()

CAgpConverter::TOutputFlags CAgpConverter::OutputFlagStringToEnum ( const string & sEnumAsString)
static

◆ OutputOneFileForEach()

void CAgpConverter::OutputOneFileForEach ( const string &  sDirName,
const std::vector< std::string > &  vecAgpFileNames,
const string &  sSuffix = kEmptyStr,
IFileWrittenCallback *  pFileWrittenCallback = nullptr 
) const

Outputs the results of each Seq-entry (or Seq-submit if Submit-block was given) into its own file in the given directory.

Parameters
sDirName  The directory to put the output files into.
vecAgpFileNames  A list of the AGP filenames to read from.
sSuffix  The suffix for each file. If empty, it defaults to "sqn" for Seq-submits and "ent" for Seq-entries.
pFileWrittenCallback  If non-NULL, its Notify function is called after each file is written so the caller can perform any required custom logic.
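
The default described for sSuffix can be sketched as a small standalone function. The name ChooseSuffix and its boolean parameter are hypothetical; only the "sqn" and "ent" defaults come from the parameter description above, and whether the real implementation adds a separating dot is not stated here.

```cpp
#include <string>

// Hypothetical helper mirroring the documented default: an empty suffix
// becomes "sqn" when a Submit-block was given (Seq-submit output) and
// "ent" otherwise (Seq-entry output). A non-empty suffix is used as-is.
inline std::string ChooseSuffix(const std::string& sSuffix,
                                bool bHaveSubmitBlock)
{
    if (!sSuffix.empty()) {
        return sSuffix;
    }
    return bHaveSubmitBlock ? "sqn" : "ent";
}
```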

Definition at line 274 of file agp_converter.cpp.

References eError_EntrySkipped, eError_OutputDirNotFoundOrNotADir, CDir::Exists(), CDirEntry::GetPath(), CDirEntry::IsDir(), ITERATE, m_pErrorHandler, m_pSubmitBlock, CDirEntry::MakePath(), MSerial_AsnText, CAgpConverter::IFileWrittenCallback::Notify(), SerialClone(), CSeq_submit_Base::SetData(), CSeq_submit_Base::SetSub(), x_InitializeAndCheckCopyOfTemplate(), and x_ReadAgpEntries().

Referenced by CAgpconvertApplication::Run().

◆ SetChromosomesInfo()

void CAgpConverter::SetChromosomesInfo ( const TChromosomeMap & mapChromosomeNames)

◆ SetComponentsBioseqSet()

void CAgpConverter::SetComponentsBioseqSet ( CConstRef< objects::CBioseq_set >  pComponentsBioseqSet)

Give a bioseq-set containing all the component pieces, for verification.

Definition at line 93 of file agp_converter.cpp.

References map_checker< Container >::clear(), CBioseq_set_Base::GetSeq_set(), ITERATE, and m_mapComponentLength.

Referenced by CAgpconvertApplication::Run().

◆ SetIdTransformer()

void CAgpConverter::SetIdTransformer ( IIdTransformer * pIdTransformer)
inline

When this reads an id, it will use the supplied transformer (if any) to change the CSeq_id.

It is even okay to change it to a different type. Set to NULL to unset it.

Definition at line 140 of file agp_converter.hpp.

References m_pIdTransformer.

Referenced by CAgpconvertApplication::Run().

◆ x_InitializeAndCheckCopyOfTemplate()

CRef< CSeq_entry > CAgpConverter::x_InitializeAndCheckCopyOfTemplate ( const objects::CBioseq &  agp_bioseq,
string &  out_id_str 
) const
private

◆ x_InitializeCopyOfTemplate()

CRef< CSeq_entry > CAgpConverter::x_InitializeCopyOfTemplate ( const objects::CBioseq &  agp_seq,
string &  out_unparsed_id_str,
string &  out_id_str 
) const
private

◆ x_ReadAgpEntries()

void CAgpConverter::x_ReadAgpEntries ( const string &  sAgpFileName,
CAgpToSeqEntry::TSeqEntryRefVec &  out_agp_entries 
) const
private

◆ x_SetChromosomeNameInSourceSubtype()

void CAgpConverter::x_SetChromosomeNameInSourceSubtype ( CRef< objects::CSeq_entry >  new_entry,
const string &  unparsed_id_str 
) const
private

◆ x_SetCreateAndUpdateDatesToToday()

void CAgpConverter::x_SetCreateAndUpdateDatesToToday ( CRef< objects::CSeq_entry >  new_entry) const
private

◆ x_SetUpObjectOpeningAndClosingStrings()

void CAgpConverter::x_SetUpObjectOpeningAndClosingStrings ( string &  out_sObjectOpeningString,
string &  out_sObjectClosingString,
TOutputBioseqsFlags  fOutputBioseqsFlags,
bool  bOnlyOneBioseqInAllAGPFiles 
) const
private

Each Bioseq written out will have the out_sObjectOpeningString before it and out_sObjectClosingString after it.

You can use the resulting strings as follows:

* Print out_sObjectOpeningString.
* If out_sObjectOpeningString is empty, print "Bioseq ::= ".
* Print all Bioseqs in this object, comma-separated (example: "seq { [...snip...] }").
* Print out_sObjectClosingString.

If callers are printing more than one group of Bioseqs (or one Bioseq per object), they can use the openers and closers for each object.
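
The printing steps listed above can be sketched as a standalone function. The name WriteAsnTextObject is hypothetical, the Bioseqs are assumed to be already rendered as text ASN.1 strings, and the exact separator (", ") is an assumption; the real code writes through CObjectOStream rather than a plain std::ostream.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one text ASN.1 object from pre-rendered
// Bioseq strings, following the steps described above.
inline void WriteAsnTextObject(std::ostream& out,
                               const std::string& sObjectOpeningString,
                               const std::string& sObjectClosingString,
                               const std::vector<std::string>& vecBioseqText)
{
    // Step 1: the opener (may be empty).
    out << sObjectOpeningString;
    // Step 2: with an empty opener, a bare Bioseq is being written.
    if (sObjectOpeningString.empty()) {
        out << "Bioseq ::= ";
    }
    // Step 3: all Bioseqs of this object, comma-separated.
    for (std::size_t i = 0; i < vecBioseqText.size(); ++i) {
        if (i > 0) {
            out << ", ";
        }
        out << vecBioseqText[i];
    }
    // Step 4: the closer.
    out << sObjectClosingString;
}
```

When writing several objects, a caller would invoke such a function once per object with the same opening and closing strings.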

Parameters
out_sObjectOpeningString  This will be cleared, then set to the string that should appear before each text ASN.1 object that is output.
out_sObjectClosingString  The closing analog of out_sObjectOpeningString.
fOutputBioseqsFlags  This function needs to know the flags that will be used for output so it can determine information such as whether a Bioseq-set or a Bioseq will be used.
bOnlyOneBioseqInAllAGPFiles  Caller sets this to true when there is only one Bioseq in all the AGP files being processed for this call.

Definition at line 682 of file agp_converter.cpp.

References CObjectOStream::Flush(), fOutputBioseqsFlags_DoNOTUnwrapSingularBioseqSets, fOutputBioseqsFlags_OneObjectPerBioseq, fOutputBioseqsFlags_WrapInSeqEntry, CConstRef< C, Locker >::GetPointer(), m_pSubmitBlock, and CObjectOStream::WriteObject().

Referenced by OutputBioseqs().

◆ x_VerifyComponents()

bool CAgpConverter::x_VerifyComponents ( CConstRef< objects::CSeq_entry >  new_entry,
const string &  id_str 
) const
private

Member Data Documentation

◆ m_fOutputFlags

TOutputFlags CAgpConverter::m_fOutputFlags
private

◆ m_mapChromosomeNames

TChromosomeMap CAgpConverter::m_mapChromosomeNames
private

◆ m_mapComponentLength

TCompLengthMap CAgpConverter::m_mapComponentLength
private

◆ m_pErrorHandler

CRef<CErrorHandler> CAgpConverter::m_pErrorHandler
private

◆ m_pIdTransformer

CRef<IIdTransformer> CAgpConverter::m_pIdTransformer
private

Definition at line 221 of file agp_converter.hpp.

Referenced by SetIdTransformer(), and x_InitializeCopyOfTemplate().

◆ m_pSubmitBlock

CConstRef<objects::CSubmit_block> CAgpConverter::m_pSubmitBlock
private

◆ m_pTemplateBioseq

CConstRef<objects::CBioseq> CAgpConverter::m_pTemplateBioseq
private

The documentation for this class was generated from the following files:
Modified on Sat Jul 20 11:14:04 2024 by modify_doxy.py rev. 669887