NCBI C++ ToolKit
UTF-8 Conversion

Classes

struct  SUnicodeTranslation
 Structure to keep substitutions for a particular Unicode character.
 

Typedefs

typedef SUnicodeTranslation TUnicodePlan[256]
 
typedef TUnicodePlan * TUnicodeTable[256]
 
typedef unsigned int TUnicode
 

Enumerations

enum  ESubstType {
  eSkip = 0 , eAsIs , eString , eException ,
  eHTML , ePicture , eOther
}
 Types of substitutors.
 
enum  EConversionResult { eConvertedFine , eDefaultTranslationUsed }
 
enum  EConversionStatus { eSuccess , eSkipChar , eOutrangeChar }
 

Functions

const SUnicodeTranslation * UnicodeToAscii (TUnicode character, const TUnicodeTable *table=NULL, const SUnicodeTranslation *default_translation=NULL)
 Convert Unicode character into ASCII string.
 
size_t UTF8ToUnicode (const char *utf, TUnicode *unicode)
 Convert UTF8 into Unicode character.
 
size_t UnicodeToUTF8 (TUnicode unicode, char *buffer, size_t buf_length)
 Convert Unicode character into UTF8.
 
string UnicodeToUTF8 (TUnicode unicode)
 Convert Unicode character into UTF8.
 
ssize_t UTF8ToAscii (const char *src, char *dst, size_t dst_len, const SUnicodeTranslation *default_translation, const TUnicodeTable *table=NULL, EConversionResult *result=NULL)
 Convert UTF8 into ASCII character buffer.
 
string UTF8ToAsciiString (const char *src, const SUnicodeTranslation *default_translation, const TUnicodeTable *table=NULL, EConversionResult *result=NULL)
 Convert UTF8 into ASCII string.
 
char StringToChar (const string &src, size_t *seq_len=0, bool ascii_table=true, EConversionStatus *status=0)
 
string StringToAscii (const string &src, bool ascii_table=true)
 
long StringToCode (const string &src, size_t *seq_len=0, EConversionStatus *status=0)
 
vector< long > StringToVector (const string &src)
 
char CodeToChar (const long src, EConversionStatus *status=0)
 

Variables

const char * SUnicodeTranslation::Subst
 Substitutor for unicode.
 
ESubstType SUnicodeTranslation::Type
 Type of the substitutor.
 
const char kOutrangeChar = '?'
 
const char kSkipChar = '\xFF'
 

Detailed Description

Typedef Documentation

◆ TUnicode

typedef unsigned int TUnicode

Definition at line 77 of file unicode.hpp.

◆ TUnicodePlan

typedef SUnicodeTranslation TUnicodePlan[256]

Definition at line 75 of file unicode.hpp.

◆ TUnicodeTable

typedef TUnicodePlan* TUnicodeTable[256]

Definition at line 76 of file unicode.hpp.
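
The two typedefs above describe a two-level table: 256 pointers to plans, each plan holding 256 translations. The sketch below shows one way such a table could be indexed. It assumes that bits 8..15 of the code point select the plan and bits 0..7 select the entry, which the 256 x 256 shape suggests but this page does not state. The declarations are local stand-ins that mirror this page, and `LookupSketch` is a hypothetical helper, not a toolkit function.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-ins mirroring the declarations on this page (not the toolkit headers).
enum ESubstType { eSkip = 0, eAsIs, eString, eException, eHTML, ePicture, eOther };

struct SUnicodeTranslation {
    const char* Subst;  // substitution text
    ESubstType  Type;   // kind of substitution
};

typedef SUnicodeTranslation TUnicodePlan[256];
typedef TUnicodePlan*       TUnicodeTable[256];
typedef unsigned int        TUnicode;

// Hypothetical helper: returns the entry for `ch`, or nullptr when the code
// point is above U+FFFF or the plan for its high byte is absent.
// Assumption: the high byte picks the plan, the low byte picks the entry.
const SUnicodeTranslation* LookupSketch(const TUnicodeTable& table, TUnicode ch)
{
    if (ch > 0xFFFF) {
        return nullptr;  // a 256 x 256 table cannot address this code point
    }
    TUnicodePlan* plan = table[(ch >> 8) & 0xFF];
    if (plan == nullptr) {
        return nullptr;  // no plan registered for this block of 256 characters
    }
    return &(*plan)[ch & 0xFF];
}
```

A caller would typically fall back to a default translation when the helper returns nullptr.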

Enumeration Type Documentation

◆ EConversionResult

Enumerator
eConvertedFine 
eDefaultTranslationUsed 

Definition at line 62 of file unicode.hpp.

◆ EConversionStatus

Enumerator
eSuccess 
eSkipChar 
eOutrangeChar 

Definition at line 64 of file utf8.hpp.

◆ ESubstType

enum ESubstType

Types of substitutors.

Enumerator
eSkip 

Unicode character to be skipped in translation. Usually it is a combining mark.

eAsIs 

Unicode characters that should go into the text as is.

eString 

String of symbols.

eException 

Throw exception (CUtilException, with type eWrongData)

eHTML 

HTML tag or, for example, HTML entity.

ePicture 

Path to the picture, or possibly the picture itself.

eOther 

Something else.

Definition at line 50 of file unicode.hpp.
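
As an illustration of how the substitutor types might be consumed, the sketch below appends the result of one translation to an output string. It is one plausible reading of the enumerator descriptions above, not the toolkit's code: `AppendSubstSketch` is a hypothetical helper, `std::runtime_error` stands in for CUtilException, and treating eHTML, ePicture and eOther like eString is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local stand-ins mirroring the declarations on this page (not the toolkit headers).
enum ESubstType { eSkip = 0, eAsIs, eString, eException, eHTML, ePicture, eOther };

struct SUnicodeTranslation {
    const char* Subst;
    ESubstType  Type;
};

// Hypothetical helper. `original` holds the UTF-8 bytes of the character
// being translated; `out` receives the translated text.
void AppendSubstSketch(const SUnicodeTranslation& t,
                       const std::string&         original,
                       std::string&               out)
{
    switch (t.Type) {
    case eSkip:
        break;                       // drop the character entirely
    case eAsIs:
        out += original;             // keep the character unchanged
        break;
    case eException:
        // Stand-in for CUtilException with type eWrongData.
        throw std::runtime_error("character has no translation");
    case eString:
    case eHTML:
    case ePicture:
    case eOther:
        if (t.Subst != nullptr) {
            out += t.Subst;          // assumption: all of these carry text in Subst
        }
        break;
    }
}
```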

Function Documentation

◆ CodeToChar()

char CodeToChar ( const long  src,
EConversionStatus *  status = 0 
)

Definition at line 295 of file utf8.cpp.

References eOutrangeChar, eSkipChar, eSuccess, kOutrangeChar, kSkipChar, RETURN_S, tblTrans, and tblTransA.

Referenced by StringToChar().

◆ StringToAscii()

string StringToAscii ( const string &  src,
bool  ascii_table = true 
)

Definition at line 187 of file utf8.cpp.

References i, kSkipChar, and StringToChar().

Referenced by CLBLASTObjectLoader::CreateLoader().

◆ StringToChar()

char StringToChar ( const string &  src,
size_t *  seq_len = 0,
bool  ascii_table = true,
EConversionStatus *  status = 0 
)

Definition at line 149 of file utf8.cpp.

References CodeToChar(), eOutrangeChar, eSuccess, kOutrangeChar, RETURN_S, and StringToCode().

Referenced by StringToAscii().

◆ StringToCode()

long StringToCode ( const string &  src,
size_t *  seq_len = 0,
EConversionStatus *  status = 0 
)

Definition at line 215 of file utf8.cpp.

References eOutrangeChar, eSkipChar, eSuccess, kOutrangeChar, kSkipChar, mask, and RETURN_LS.

Referenced by StringToChar(), and StringToVector().

◆ StringToVector()

vector<long> StringToVector ( const string &  src)

Definition at line 268 of file utf8.cpp.

References i, and StringToCode().

◆ UnicodeToAscii()

const SUnicodeTranslation* UnicodeToAscii ( TUnicode  character,
const TUnicodeTable *  table = NULL,
const SUnicodeTranslation *  default_translation = NULL 
)

Convert Unicode character into ASCII string.

Parameters
character  Character to translate
table      Table to use in translation. If no table is specified, the internal default one is used.
Returns
Pointer to substitute structure

Definition at line 324 of file unicode.cpp.

References eException, g_DefaultUnicodeTable, g_UnicodeTranslation, NCBI_THROW, NULL, t, table, and SUnicodeTranslation::Type.

Referenced by CWordPairIndexer::ConvertUTF8ToAscii(), CUnicodeToAsciiTranslation::CUnicodeToAsciiTranslation(), UTF8ToAscii(), and UTF8ToAsciiString().

◆ UnicodeToUTF8() [1/2]

string UnicodeToUTF8 ( TUnicode  unicode)

Convert Unicode character into UTF8.

Parameters
unicode  Unicode character
Returns
UTF8 buffer as a string

Definition at line 416 of file unicode.cpp.

References string.

◆ UnicodeToUTF8() [2/2]

size_t UnicodeToUTF8 ( TUnicode  unicode,
char *  buffer,
size_t  buf_length 
)

Convert Unicode character into UTF8.

Parameters
unicode     Unicode character
buffer      UTF8 buffer to store the result
buf_length  UTF8 buffer size
Returns
Length of the generated UTF8 sequence

Definition at line 424 of file unicode.cpp.
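
For readers unfamiliar with the encoding, here is a minimal sketch of what a conversion with this signature has to do. It is not the toolkit's implementation: `EncodeUtf8Sketch` is a hypothetical name, and returning 0 for an out-of-range code point or a too-small buffer is an assumption, since this page documents only the success case.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int TUnicode;  // mirrors the typedef on this page

// Hypothetical helper: writes the UTF-8 form of `cp` into `buffer` and
// returns the number of bytes written. Returns 0 (assumed convention) when
// `cp` is above U+10FFFF or the buffer is too small. No terminator is written,
// and surrogate code points are not rejected.
size_t EncodeUtf8Sketch(TUnicode cp, char* buffer, size_t buf_length)
{
    size_t len;
    if      (cp < 0x80)     len = 1;
    else if (cp < 0x800)    len = 2;
    else if (cp < 0x10000)  len = 3;
    else if (cp < 0x110000) len = 4;
    else return 0;

    if (len > buf_length) {
        return 0;
    }
    if (len == 1) {
        buffer[0] = static_cast<char>(cp);
        return 1;
    }
    // Lead-byte prefixes for 2-, 3- and 4-byte sequences.
    static const unsigned char kLead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
    // Fill continuation bytes from the end, six payload bits each.
    for (size_t i = len - 1; i > 0; --i) {
        buffer[i] = static_cast<char>(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buffer[0] = static_cast<char>(kLead[len] | cp);
    return len;
}
```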

◆ UTF8ToAscii()

ssize_t UTF8ToAscii ( const char *  src,
char *  dst,
size_t  dst_len,
const SUnicodeTranslation *  default_translation,
const TUnicodeTable *  table = NULL,
EConversionResult *  result = NULL 
)

Convert UTF8 into ASCII character buffer.

Decodes the UTF8 buffer and replaces each Unicode character with the appropriate symbols or words from the dictionary.

Parameters
src                  UTF8 buffer to decode
dst                  Buffer to put the result in
dst_len              Length of the destination buffer
default_translation  Default translation of unknown Unicode symbols
table                Table to use in translation. If no table is specified, the internal default one is used.
result               Result of the conversion
Returns
Length of decoded string or -1 if buffer is too small

Definition at line 458 of file unicode.cpp.

References eAsIs, eConvertedFine, eDefaultTranslationUsed, eSkip, result, SUnicodeTranslation::Subst, table, SUnicodeTranslation::Type, UnicodeToAscii(), and UTF8ToUnicode().
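
Because the function reports a too-small buffer by returning -1, a caller that cannot bound the output size in advance may want to retry with a larger buffer. The sketch below shows that pattern in isolation. `ConvertWithGrowingBuffer` and `CopyStub` are hypothetical names; the converter is passed in as a callable so the example does not depend on the toolkit, and `long` is used in place of `ssize_t` for portability. Whether the real function writes or counts a terminating NUL is not stated on this page, so the sketch relies only on the returned length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper. `convert(src, dst, dst_len)` must return the decoded
// length, or a negative value when `dst` is too small.
template <class TConvert>
std::string ConvertWithGrowingBuffer(TConvert    convert,
                                     const char* src,
                                     size_t      initial  = 16,
                                     size_t      max_size = 1 << 20)
{
    std::vector<char> buf(initial > 0 ? initial : 1);
    for (;;) {
        long n = static_cast<long>(convert(src, buf.data(), buf.size()));
        if (n >= 0) {
            return std::string(buf.data(), static_cast<size_t>(n));
        }
        if (buf.size() >= max_size) {
            // Guard against looping forever if the converter keeps failing.
            throw std::length_error("conversion output exceeds max_size");
        }
        buf.resize(buf.size() * 2);
    }
}

// Stub converter used only to exercise the helper: copies `src` unchanged.
inline long CopyStub(const char* src, char* dst, size_t dst_len)
{
    size_t n = std::strlen(src);
    if (n > dst_len) {
        return -1;
    }
    std::memcpy(dst, src, n);
    return static_cast<long>(n);
}
```

With the toolkit available, the callable would presumably be a small lambda forwarding to UTF8ToAscii() with the chosen default translation. When an owning string is acceptable, UTF8ToAsciiString() avoids the buffer management altogether.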

◆ UTF8ToAsciiString()

string UTF8ToAsciiString ( const char *  src,
const SUnicodeTranslation *  default_translation,
const TUnicodeTable *  table = NULL,
EConversionResult *  result = NULL 
)

Convert UTF8 into ASCII string.

Decodes the UTF8 buffer and replaces each Unicode character with the appropriate symbols or words from the dictionary.

Parameters
src                  UTF8 buffer to decode
default_translation  Default translation of unknown Unicode symbols
table                Table to use in translation. If no table is specified, the internal default one is used.
result               Result of the conversion
Returns
String with decoded text

Definition at line 526 of file unicode.cpp.

References eAsIs, eConvertedFine, eDefaultTranslationUsed, eSkip, kEmptyStr, result, string, SUnicodeTranslation::Subst, table, SUnicodeTranslation::Type, UnicodeToAscii(), and UTF8ToUnicode().

Referenced by ToAsciiStdString(), and utf8_to_string().

◆ UTF8ToUnicode()

size_t UTF8ToUnicode ( const char *  utf,
TUnicode *  unicode 
)

Convert UTF8 into Unicode character.

Parameters
utf      Start of UTF8 character buffer
unicode  Pointer to Unicode character to store the result in
Returns
Length of the translated UTF8 or 0 in case of error.

Definition at line 382 of file unicode.cpp.

Referenced by CWordPairIndexer::ConvertUTF8ToAscii().
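
The contract above (decode one character, return its byte length, return 0 on error) can be illustrated with a minimal decoder. This is a sketch under stated assumptions, not the toolkit's code: `DecodeUtf8Sketch` is a hypothetical name, the input is assumed to be NUL-terminated, and overlong or surrogate encodings are not rejected.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int TUnicode;  // mirrors the typedef on this page

// Hypothetical helper: decodes the UTF-8 sequence starting at `utf` into
// `*unicode` and returns the number of bytes consumed, or 0 on a malformed
// sequence. `*unicode` is left untouched on failure.
size_t DecodeUtf8Sketch(const char* utf, TUnicode* unicode)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf);
    size_t   len;
    TUnicode cp;

    // The lead byte determines the sequence length and the first payload bits.
    if      (p[0] < 0x80)           { len = 1; cp = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else return 0;  // stray continuation byte or invalid lead byte

    // Each continuation byte must look like 10xxxxxx and adds six bits.
    // A terminating NUL fails this check, so the loop never reads past it.
    for (size_t i = 1; i < len; ++i) {
        if ((p[i] & 0xC0) != 0x80) {
            return 0;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *unicode = cp;
    return len;
}
```

A caller walking a whole buffer would advance by the returned length and needs its own policy for a 0 return, for example skipping one byte.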

Variable Documentation

◆ kOutrangeChar

const char kOutrangeChar = '?'

Definition at line 54 of file utf8.hpp.

Referenced by CodeToChar(), StringToChar(), and StringToCode().

◆ kSkipChar

const char kSkipChar = '\xFF'

Definition at line 61 of file utf8.hpp.

Referenced by CodeToChar(), StringToAscii(), and StringToCode().

◆ Subst

const char* SUnicodeTranslation::Subst

Substitutor for unicode.

Definition at line 71 of file unicode.hpp.

Referenced by UTF8ToAscii(), UTF8ToAsciiString(), and CUnicodeToAsciiTranslation::x_Initialize().

◆ Type

ESubstType SUnicodeTranslation::Type

Type of the substitutor.

Definition at line 72 of file unicode.hpp.

Referenced by UnicodeToAscii(), UTF8ToAscii(), UTF8ToAsciiString(), and CUnicodeToAsciiTranslation::x_Initialize().

Modified on Sat May 25 14:22:04 2024 by modify_doxy.py rev. 669887