Ionflux::Tools::Tokenizer Class Reference
[String tokenizer]

Generic byte string tokenizer.

#include <Tokenizer.hpp>


Public Member Functions
	Tokenizer ()
	Constructor.
	Tokenizer (const std::string &initInput)
	Constructor.
	Tokenizer (const std::vector< TokenType > &initTokenTypes)
	Constructor.
	Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput)
	Constructor.
virtual	~Tokenizer ()
	Destructor.
virtual void	clearTokenTypes ()
	Clear token types.
virtual void	useDefaultTokenTypes ()
	Use default token types.
virtual void	setTokenTypes (const std::vector< TokenType > &newTokenTypes)
	Set token types.
virtual void	addTokenType (const TokenType &newTokenType)
	Add a token type.
virtual void	addTokenTypes (const std::vector< TokenType > &newTokenTypes)
	Add token types.
virtual void	setInput (const std::string &newInput)
	Set input.
virtual Token	nextToken ()
	Get next token.
virtual Token	getNextToken (const TokenTypeMap &otherMap)
	Get next token.
virtual Token	getNextToken ()
	Get next token.
virtual Token	getCurrentToken ()
	Get current token.
virtual int	getCurrentTokenType ()
	Get type of current token.
virtual void	reset ()
	Reset the parser.
virtual void	setTokenTypeAnything ()
	Set special token type TT_ANYTHING.
virtual void	setExtractQuoted (bool newExtractQuoted)
	Set quoted string extraction flag.
virtual void	setExtractEscaped (bool newExtractEscaped)
	Set escaped character extraction flag.
virtual unsigned int	getCurrentPos ()
	Get current position.
virtual unsigned int	getCurrentTokenPos ()
	Get position of current token.
virtual char	getQuoteChar ()
	Get quote character.
Static Public Member Functions
static bool	isOneOf (char c, const std::string &testChars, bool invert)
	Check type of a character.
static bool	isValid (Token &token)
	Check whether a token is valid.
Public Attributes
TokenType	TT_ANYTHING
	Token type: Anything. (special).
Static Public Attributes
static const TokenType	TT_INVALID = {-1, "", false, 0}
	Token type: Invalid token. (special).
static const TokenType	TT_NONE = {0, "", false, 0}
	Token type: No token. (special).
static const TokenType	TT_QUOTED = {2, "", false, 0}
	Token type: Quoted string. (special).
static const TokenType	TT_ESCAPED = {3, "", false, 0}
	Token type: Escaped character. (special).
static const TokenType	TT_WHITESPACE = {4, " \t", false, 0}
	Token type: Linear whitespace.
static const TokenType	TT_LINETERM = {5, "\n\r", false, 1}
	Token type: Line terminator.
static const TokenType	TT_NUMBER = {7, "0123456789", false, 0}
	Token type: Number.
static const TokenType	TT_ALPHA
	Token type: Alpha (latin).
static const TokenType	TT_DEFAULT_SEP = {7, "_-.", false, 0}
	Token type: Default separator characters.
static const TokenType	TT_IDENTIFIER
	Token type: Identifier.
static const Token	TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""}
	Token: Invalid token. (special).
static const Token	TOK_NONE = {Tokenizer::TT_NONE.typeID, ""}
	Token: No token. (special).
static const int	TT_ANYTHING_TYPE_ID = 1
	Type ID of the TT_ANYTHING token type.
static const std::string	QUOTE_CHARS = "\"'"
	Quote characters.
static const char	ESCAPE_CHAR = '\\'
	Escape character.
Protected Attributes
std::string	theInput
	The input string to be parsed.
unsigned int	currentPos
	Current parsing position in the input string.
unsigned int	currentTokenPos
	Position of current token in the input string.
Token	currentToken
	Current token.
bool	extractQuoted
	Extract quoted strings flag.
char	currentQuoteChar
	Quote character.
bool	extractEscaped
	Extract escaped characters flag.
TokenTypeMap *	typeMap
	Token type map.

Detailed Description

Generic byte string tokenizer.

A generic tokenizer for parsing byte strings. To set up a tokenizer, first create a Tokenizer object. It is set up with the default token types Tokenizer::TT_WHITESPACE, Tokenizer::TT_LINETERM and Tokenizer::TT_IDENTIFIER. You may then add your own custom token types and optionally set up the Tokenizer::TT_ANYTHING token type, which matches anything not matched by previously defined token types. To enable extraction of quoted strings and escaped characters, call Tokenizer::setExtractQuoted() with true as an argument.
To get a token from the token stream, call Tokenizer::getNextToken(). Make sure your code handles the Tokenizer::TT_NONE and Tokenizer::TT_INVALID special token types, which cannot be disabled: Tokenizer::getNextToken() always returns a token of type Tokenizer::TT_NONE at the end of the token stream, and a token of type Tokenizer::TT_INVALID if an invalid token is encountered.
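The token loop described above can be illustrated with a small self-contained model. This is a sketch, not the iftools implementation: TokenType and Token below are hypothetical stand-ins (only the typeID field is confirmed by this reference; the names validChars, invertMatch, maxChars and value, and the meanings of the boolean and integer fields, are assumptions based on the initializer order {typeID, characters, flag, count}), and MiniTokenizer is a hypothetical class. It groups consecutive characters that belong to the same token type, returns a TT_NONE token at the end of the input, and returns a TT_INVALID token for a character that no type accepts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the iftools types. Only typeID is confirmed by
// this reference; the other field names (and meanings) are assumptions.
struct TokenType {
    int typeID;
    std::string validChars;  // characters belonging to this type
    bool invertMatch;        // assumed: match characters NOT in validChars
    unsigned int maxChars;   // assumed: maximum token length, 0 = unlimited
};

struct Token {
    int typeID;
    std::string value;
};

const TokenType TT_INVALID = {-1, "", false, 0};
const TokenType TT_NONE = {0, "", false, 0};

// Hypothetical model of the documented token loop (not the library code).
class MiniTokenizer {
public:
    MiniTokenizer(const std::vector<TokenType>& types, const std::string& input)
        : types_(types), input_(input), pos_(0) {}

    Token getNextToken() {
        if (pos_ >= input_.size())
            return {TT_NONE.typeID, ""};  // end of the token stream
        // Try each type in order; take the longest run of its characters.
        for (const TokenType& t : types_) {
            std::size_t end = pos_;
            while (end < input_.size() && matches(t, input_[end])
                   && (t.maxChars == 0 || end - pos_ < t.maxChars))
                ++end;
            if (end > pos_) {
                Token tok = {t.typeID, input_.substr(pos_, end - pos_)};
                pos_ = end;
                return tok;
            }
        }
        // No type accepts this character: report it as an invalid token and
        // skip it (a choice made for this sketch).
        return {TT_INVALID.typeID, std::string(1, input_[pos_++])};
    }

private:
    static bool matches(const TokenType& t, char c) {
        bool found = t.validChars.find(c) != std::string::npos;
        return t.invertMatch ? !found : found;
    }

    std::vector<TokenType> types_;
    std::string input_;
    std::size_t pos_;
};
```

In this model, a tokenizer set up with the three default types splits "foo bar\n\n" into foo, a space, bar and two separate line terminator tokens (TT_LINETERM's last field being read as a length limit of 1). A consumer loops until it receives TT_NONE and handles TT_INVALID along the way. How the library's TokenTypeMap resolves a character claimed by several types is not covered by this sketch, which simply tries the types in order.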

Constructor & Destructor Documentation

Ionflux::Tools::Tokenizer::Tokenizer ( )

Constructor.
Construct a new Tokenizer object.

Ionflux::Tools::Tokenizer::Tokenizer ( const std::string & initInput )

Constructor.
Construct a new Tokenizer object.

Parameters:

initInput The input string to be parsed.

Ionflux::Tools::Tokenizer::Tokenizer ( const std::vector< TokenType > & initTokenTypes )

Constructor.
Construct a new Tokenizer object.

Parameters:

initTokenTypes Token types this tokenizer recognizes.

Ionflux::Tools::Tokenizer::Tokenizer ( const std::vector< TokenType > & initTokenTypes, const std::string & initInput )

Constructor.
Construct a new Tokenizer object.

Parameters:

initTokenTypes Token types this tokenizer recognizes.

initInput The input string to be parsed.

Ionflux::Tools::Tokenizer::~Tokenizer ( ) [virtual]

Destructor.
Destroy the Tokenizer object.

Member Function Documentation

void Ionflux::Tools::Tokenizer::addTokenType ( const TokenType & newTokenType ) [virtual]

Add a token type.
Adds a token type (possibly user-defined) to the set of token types recognized by this tokenizer.

Parameters:

newTokenType Token type to be added.

void Ionflux::Tools::Tokenizer::addTokenTypes ( const std::vector< TokenType > & newTokenTypes ) [virtual]

Add token types.
Adds token types (possibly user-defined) to the set of token types recognized by this tokenizer.

Parameters:

newTokenTypes Set of token types to be added.

void Ionflux::Tools::Tokenizer::clearTokenTypes ( ) [virtual]

Clear token types.
Removes all token types from the set of recognized token types.

Note:
Special token types will still be available to the tokenizer. You can always restore the default set of token types with useDefaultTokenTypes().

See also:
useDefaultTokenTypes()

unsigned int Ionflux::Tools::Tokenizer::getCurrentPos ( ) [virtual]

Get current position.
Get the current parsing position relative to the first character of the input string.

Returns:
Current parsing position.

Token Ionflux::Tools::Tokenizer::getCurrentToken ( ) [virtual]

Get current token.
Get the current token.

Returns:
The current token.

unsigned int Ionflux::Tools::Tokenizer::getCurrentTokenPos ( ) [virtual]

Get position of current token.
Get the position of the current token relative to the first character of the input string.

Returns:
Position of current token.

int Ionflux::Tools::Tokenizer::getCurrentTokenType ( ) [virtual]

Get type of current token.
Get the type of the current token.

Returns:
Type ID of the current token.

Token Ionflux::Tools::Tokenizer::getNextToken ( ) [virtual]

Get next token.
Parse the input string and get the next token.

Returns:
The next token from the current input.

Token Ionflux::Tools::Tokenizer::getNextToken ( const TokenTypeMap & otherMap ) [virtual]

Get next token.
Parse the input string and get the next token.

Parameters:

otherMap Token type map to be used for extracting the next token.

Returns:
The next token from the current input.

char Ionflux::Tools::Tokenizer::getQuoteChar ( ) [virtual]

Get quote character.
Get the quote character of a quoted string.

Returns:
Quote character of the current token if this token is a quoted string, or 0 if the current token is not a quoted string.

bool Ionflux::Tools::Tokenizer::isOneOf ( char c, const std::string & testChars, bool invert ) [static]

Check type of a character.
Returns true if the character c is one of the characters of testChars (if invert is false). If you pass true to invert, the return value is inverted, i.e. the function returns true if c is not one of the characters of testChars.

Deprecated:
You should not use this, since it is obsolete and may be removed in future versions. Use Ionflux::Tools::isOneOf() instead. This function is provided for backward compatibility only.

Parameters:

c Character to be checked.

testChars String of characters.

invert Whether to invert the result.

Returns:
true if the character is one of testChars, false otherwise. The result is inverted if true is passed to invert.
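The matching rule, including the effect of invert, fits in a few lines. The sketch below uses the hypothetical name isOneOfSketch() and mirrors the documented semantics only; it is not the library's implementation, and new code should call Ionflux::Tools::isOneOf() as the deprecation note advises.

```cpp
#include <string>

// Sketch of the documented semantics of Tokenizer::isOneOf() (hypothetical
// name, not the library code): true if c occurs in testChars, negated when
// invert is true.
bool isOneOfSketch(char c, const std::string& testChars, bool invert) {
    bool found = testChars.find(c) != std::string::npos;
    return invert ? !found : found;
}
```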

bool Ionflux::Tools::Tokenizer::isValid ( Token & token ) [static]

Check whether a token is valid.
Check whether a token is a valid, well-defined token, i.e. its type is neither TT_NONE nor TT_INVALID.

Parameters:

token Token to be checked.

Returns:
true if the token is valid, false otherwise.
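The validity rule amounts to a check against the two special type IDs, -1 (TT_INVALID) and 0 (TT_NONE). In this sketch, Token is a hypothetical stand-in (only its typeID field is confirmed by this reference), and isValidSketch() is a hypothetical name; it takes a const reference, whereas the documented signature takes Token&.

```cpp
#include <string>

// Hypothetical Token stand-in; only the typeID field is confirmed here.
struct Token {
    int typeID;
    std::string value;
};

// Type IDs of the special token types TT_INVALID and TT_NONE.
const int TT_INVALID_ID = -1;
const int TT_NONE_ID = 0;

// Sketch of the documented rule: a token is valid unless its type is
// TT_NONE or TT_INVALID.
bool isValidSketch(const Token& token) {
    return token.typeID != TT_NONE_ID && token.typeID != TT_INVALID_ID;
}
```

A consumer loop can therefore continue for as long as the token it has just received is valid.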

Token Ionflux::Tools::Tokenizer::nextToken ( ) [virtual]

Get next token.
Parse the input string and get the next token.

Deprecated:
You should not use this function because its name is inconsistent with the interface. Use getNextToken() instead. This function is provided for backward compatibility only.

Returns:
The next token from the current input.

See also:
getNextToken()

void Ionflux::Tools::Tokenizer::reset ( ) [virtual]

Reset the parser.
Reset the parser so the input can be parsed again from the beginning.
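getCurrentPos(), getCurrentTokenPos() and reset() involve two positions that move separately. PositionModel below is a hypothetical model, not part of iftools: for brevity it splits only on spaces (the real tokenizer reports whitespace as TT_WHITESPACE tokens), and it assumes that the token position marks where the current token starts while the parsing position ends up just past it. reset() rewinds both to the beginning of the input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the position bookkeeping behind getCurrentPos(),
// getCurrentTokenPos() and reset(). Splits on spaces only, for brevity.
class PositionModel {
public:
    explicit PositionModel(const std::string& input) : input_(input) { reset(); }

    // Returns the next space-delimited word, or "" at the end of the input.
    std::string next() {
        while (currentPos_ < input_.size() && input_[currentPos_] == ' ')
            ++currentPos_;                  // skip separators
        currentTokenPos_ = currentPos_;     // the token starts here
        while (currentPos_ < input_.size() && input_[currentPos_] != ' ')
            ++currentPos_;                  // parsing position moves past it
        current_ = input_.substr(currentTokenPos_, currentPos_ - currentTokenPos_);
        return current_;
    }

    // Rewind so the input can be parsed again from the beginning.
    void reset() {
        currentPos_ = 0;
        currentTokenPos_ = 0;
        current_.clear();
    }

    std::size_t getCurrentPos() const { return currentPos_; }
    std::size_t getCurrentTokenPos() const { return currentTokenPos_; }

private:
    std::string input_;
    std::size_t currentPos_ = 0;
    std::size_t currentTokenPos_ = 0;
    std::string current_;
};
```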

void Ionflux::Tools::Tokenizer::setExtractEscaped ( bool newExtractEscaped ) [virtual]

Set escaped character extraction flag.
Pass true to this function to enable extraction of escaped characters, or disable this feature by passing false.

Note:
If you enable extraction of escaped characters, you should make sure that your code handles the TT_ESCAPED special token type. If quoted string extraction is enabled, escaped character extraction is enabled as well and cannot be disabled.

Parameters:

newExtractEscaped Whether to extract escaped characters.
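Escaped character extraction can be modeled as a two-character lookahead on ESCAPE_CHAR. In this sketch, extractEscaped() and the Token field names are hypothetical, and the resulting TT_ESCAPED token (type ID 3) is assumed to carry only the escaped character; whether the library keeps the backslash in the token value is not stated in this reference.

```cpp
#include <cstddef>
#include <string>

// Hypothetical Token stand-in; only typeID is confirmed by this reference.
struct Token {
    int typeID;
    std::string value;
};

const int TT_ESCAPED_ID = 3;
const char ESCAPE_CHAR = '\\';

// Sketch: if input[pos] is ESCAPE_CHAR followed by another character,
// consume both and produce a TT_ESCAPED token. Assumption: the token value
// is the escaped character alone, without the backslash.
bool extractEscaped(const std::string& input, std::size_t& pos, Token& out) {
    if (pos + 1 >= input.size() || input[pos] != ESCAPE_CHAR)
        return false;  // nothing to extract; pos is left unchanged
    out = {TT_ESCAPED_ID, std::string(1, input[pos + 1])};
    pos += 2;
    return true;
}
```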

void Ionflux::Tools::Tokenizer::setExtractQuoted ( bool newExtractQuoted ) [virtual]

Set quoted string extraction flag.
Pass true to this function to enable extraction of quoted strings (and escaped characters), or disable this feature by passing false.

Note:
If you enable extraction of quoted strings, you should make sure that your code handles the TT_QUOTED and TT_ESCAPED special token types.

Parameters:

newExtractQuoted Whether to extract quoted strings.
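Quoted string extraction can be sketched as a standalone helper. Everything here is an assumption for illustration: extractQuoted() is a hypothetical function, a quoted string is taken to run from a character in QUOTE_CHARS to the next unescaped occurrence of the same character, ESCAPE_CHAR makes the following character literal, and an unterminated string is reported as an invalid token. The quote character is reported separately, in the spirit of getQuoteChar().

```cpp
#include <cstddef>
#include <string>

// Hypothetical Token stand-in; only typeID is confirmed by this reference.
struct Token {
    int typeID;
    std::string value;
};

const int TT_INVALID_ID = -1;
const int TT_QUOTED_ID = 2;
const std::string QUOTE_CHARS = "\"'";
const char ESCAPE_CHAR = '\\';

// Sketch: if input[pos] opens a quoted string, consume it and produce a
// TT_QUOTED token holding the unquoted text. quoteChar receives the quote
// used, or 0 if no quoted string was extracted (cf. getQuoteChar()).
bool extractQuoted(const std::string& input, std::size_t& pos,
                   Token& out, char& quoteChar) {
    quoteChar = 0;
    if (pos >= input.size() || QUOTE_CHARS.find(input[pos]) == std::string::npos)
        return false;
    const char quote = input[pos];
    std::string value;
    for (std::size_t i = pos + 1; i < input.size(); ++i) {
        char c = input[i];
        if (c == ESCAPE_CHAR && i + 1 < input.size()) {
            value += input[++i];  // keep the escaped character literally
        } else if (c == quote) {
            out = {TT_QUOTED_ID, value};
            quoteChar = quote;
            pos = i + 1;          // continue after the closing quote
            return true;
        } else {
            value += c;
        }
    }
    out = {TT_INVALID_ID, input.substr(pos)};  // no closing quote found
    pos = input.size();
    return true;
}
```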

void Ionflux::Tools::Tokenizer::setInput ( const std::string & newInput ) [virtual]

Set input.
Sets the input string to be parsed.

Parameters:

newInput The input string to be parsed.

void Ionflux::Tools::Tokenizer::setTokenTypeAnything ( ) [virtual]

Set special token type TT_ANYTHING.
This sets up a special token type TT_ANYTHING that will match any characters not matched by any of the previously defined token types.

Note:
You may call this again to update TT_ANYTHING if you add further token types after a call to setTokenTypeAnything().
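One plausible way to build TT_ANYTHING, assuming the boolean in a TokenType initializer inverts the match, is to collect every character claimed by the current token types and match its complement. makeAnything() and the field names below are hypothetical; the library's actual construction is not documented here. The type ID comes from TT_ANYTHING_TYPE_ID.

```cpp
#include <string>
#include <vector>

// Hypothetical TokenType stand-in; only typeID is confirmed by this
// reference, the other field names and meanings are assumptions.
struct TokenType {
    int typeID;
    std::string validChars;
    bool invertMatch;
    unsigned int maxChars;
};

const int TT_ANYTHING_TYPE_ID = 1;

// Sketch: TT_ANYTHING as an inverted match over every character claimed by
// the currently defined (non-inverted) token types.
TokenType makeAnything(const std::vector<TokenType>& types) {
    std::string claimed;
    for (const TokenType& t : types)
        if (!t.invertMatch)
            for (char c : t.validChars)
                if (claimed.find(c) == std::string::npos)
                    claimed += c;  // record each claimed character once
    return {TT_ANYTHING_TYPE_ID, claimed, true, 0};
}
```

Because the character set is captured when makeAnything() runs, a type added afterwards is still matched by TT_ANYTHING in this model until it is rebuilt, which is consistent with the note above.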

void Ionflux::Tools::Tokenizer::setTokenTypes ( const std::vector< TokenType > & newTokenTypes ) [virtual]

Set token types.
Set the token types recognized by this tokenizer.

Note:
The special token types are always available, whether or not they are included in newTokenTypes.

Parameters:

newTokenTypes Set of token types.

void Ionflux::Tools::Tokenizer::useDefaultTokenTypes ( ) [virtual]

Use default token types.
Initializes the set of recognized token types with the default token types.

Member Data Documentation

unsigned int Ionflux::Tools::Tokenizer::currentPos [protected]

Current parsing position in the input string.

char Ionflux::Tools::Tokenizer::currentQuoteChar [protected]

Quote character.

Token Ionflux::Tools::Tokenizer::currentToken [protected]

Current token.

unsigned int Ionflux::Tools::Tokenizer::currentTokenPos [protected]

Position of current token in the input string.

const char Ionflux::Tools::Tokenizer::ESCAPE_CHAR = '\\' [static]

Escape character.

bool Ionflux::Tools::Tokenizer::extractEscaped [protected]

Extract escaped characters flag.

bool Ionflux::Tools::Tokenizer::extractQuoted [protected]

Extract quoted strings flag.

const std::string Ionflux::Tools::Tokenizer::QUOTE_CHARS = "\"'" [static]

Quote characters.

std::string Ionflux::Tools::Tokenizer::theInput [protected]

The input string to be parsed.

const Token Ionflux::Tools::Tokenizer::TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""} [static]

Token: Invalid token. (special).

const Token Ionflux::Tools::Tokenizer::TOK_NONE = {Tokenizer::TT_NONE.typeID, ""} [static]

Token: No token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_ALPHA [static]

Initial value:
{8, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", false, 0}
Token type: Alpha (latin).

TokenType Ionflux::Tools::Tokenizer::TT_ANYTHING

Token type: Anything. (special).

const int Ionflux::Tools::Tokenizer::TT_ANYTHING_TYPE_ID = 1 [static]

Type ID of the TT_ANYTHING token type.

const TokenType Ionflux::Tools::Tokenizer::TT_DEFAULT_SEP = {7, "_-.", false, 0} [static]

Token type: Default separator characters.

const TokenType Ionflux::Tools::Tokenizer::TT_ESCAPED = {3, "", false, 0} [static]

Token type: Escaped character. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_IDENTIFIER [static]

Initial value:
{6, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_", false, 0}
Token type: Identifier.

const TokenType Ionflux::Tools::Tokenizer::TT_INVALID = {-1, "", false, 0} [static]

Token type: Invalid token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_LINETERM = {5, "\n\r", false, 1} [static]

Token type: Line terminator.

const TokenType Ionflux::Tools::Tokenizer::TT_NONE = {0, "", false, 0} [static]

Token type: No token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_NUMBER = {7, "0123456789", false, 0} [static]

Token type: Number.

const TokenType Ionflux::Tools::Tokenizer::TT_QUOTED = {2, "", false, 0} [static]

Token type: Quoted string. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_WHITESPACE = {4, " \t", false, 0} [static]

Token type: Linear whitespace.

TokenTypeMap* Ionflux::Tools::Tokenizer::typeMap [protected]

Token type map.

Generated on Tue Mar 14 21:11:19 2006 for Ionflux Tools Class Library (iftools) by doxygen 1.4.6
