#include <Tokenizer.hpp>
Public Member Functions
Tokenizer ()
Constructor.
Tokenizer (const std::string &initInput)
Constructor.
Tokenizer (const std::vector< TokenType > &initTokenTypes)
Constructor.
Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput)
Constructor.
virtual ~Tokenizer ()
Destructor.
virtual void clearTokenTypes ()
Clear token types.
virtual void useDefaultTokenTypes ()
Use default token types.
virtual void setTokenTypes (const std::vector< TokenType > &newTokenTypes)
Set token types.
virtual void addTokenType (const TokenType &newTokenType)
Add a token type.
virtual void addTokenTypes (const std::vector< TokenType > &newTokenTypes)
Add token types.
virtual void setInput (const std::string &newInput)
Set input.
virtual Token nextToken ()
Get next token.
virtual Token getNextToken (const TokenTypeMap &otherMap)
Get next token.
virtual Token getNextToken ()
Get next token.
virtual Token getCurrentToken ()
Get current token.
virtual int getCurrentTokenType ()
Get type of current token.
virtual void reset ()
Reset the parser.
virtual void setTokenTypeAnything ()
Set special token type TT_ANYTHING.
virtual void setExtractQuoted (bool newExtractQuoted)
Set quoted string extraction flag.
virtual void setExtractEscaped (bool newExtractEscaped)
Set escaped character extraction flag.
virtual unsigned int getCurrentPos ()
Get current position.
virtual unsigned int getCurrentTokenPos ()
Get position of current token.
virtual char getQuoteChar ()
Get quote character.

Static Public Member Functions
static bool isOneOf (char c, const std::string &testChars, bool invert)
Check type of a character.
static bool isValid (Token &token)
Check whether a token is valid.

Public Attributes
TokenType TT_ANYTHING
Token type: Anything (special).

Static Public Attributes
static const TokenType TT_INVALID = {-1, "", false, 0}
Token type: Invalid token (special).
static const TokenType TT_NONE = {0, "", false, 0}
Token type: No token (special).
static const TokenType TT_QUOTED = {2, "", false, 0}
Token type: Quoted string (special).
static const TokenType TT_ESCAPED = {3, "", false, 0}
Token type: Escaped character (special).
static const TokenType TT_WHITESPACE = {4, " \t", false, 0}
Token type: Linear whitespace.
static const TokenType TT_LINETERM = {5, "\n\r", false, 1}
Token type: Line terminator.
static const TokenType TT_NUMBER = {7, "0123456789", false, 0}
Token type: Number.
static const TokenType TT_ALPHA
Token type: Alpha (Latin).
static const TokenType TT_DEFAULT_SEP = {9, "_-.", false, 0}
Token type: Default separator characters.
static const TokenType TT_IDENTIFIER
Token type: Identifier.
static const Token TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""}
Token: Invalid token (special).
static const Token TOK_NONE = {Tokenizer::TT_NONE.typeID, ""}
Token: No token (special).
static const int TT_ANYTHING_TYPE_ID = 1
Type ID of the TT_ANYTHING token type.
static const std::string QUOTE_CHARS = "\"'"
Quote characters.
static const char ESCAPE_CHAR = '\\'
Escape character.

Protected Attributes
std::string theInput
The input string to be parsed.
unsigned int currentPos
Current parsing position in the input string.
unsigned int currentTokenPos
Position of the current token in the input string.
Token currentToken
Current token.
bool extractQuoted
Extract quoted strings flag.
char currentQuoteChar
Quote character.
bool extractEscaped
Extract escaped characters flag.
TokenTypeMap * typeMap
Token type map.

Detailed Description

A generic tokenizer for parsing byte strings. To set up a tokenizer, first create a Tokenizer object. It will be set up with the default token types Tokenizer::TT_WHITESPACE, Tokenizer::TT_LINETERM and Tokenizer::TT_IDENTIFIER. You may then add your own custom token types and optionally set up the Tokenizer::TT_ANYTHING token type, which matches anything not matched by the previously defined token types. To enable extraction of quoted strings and escaped characters, call Tokenizer::setExtractQuoted() with true as an argument.
To get a token from the token stream, call Tokenizer::getNextToken(). Make sure your code handles the special token types Tokenizer::TT_NONE and Tokenizer::TT_INVALID, which cannot be disabled: Tokenizer::getNextToken() always returns a token of type Tokenizer::TT_NONE at the end of the token stream, and a token of type Tokenizer::TT_INVALID when it encounters an invalid token.
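The workflow described above (default token types, an optional custom type, then a loop over getNextToken() that stops at TT_NONE or TT_INVALID) can be sketched without the library. This is a self-contained stand-in, not Ionflux code. The class MiniTokenizer, the helper collectTokens(), and the meaning of the four TokenType fields (type ID, accepted characters, invert flag, maximum length with 0 meaning unlimited, inferred from initializers such as TT_LINETERM = {5, "\n\r", false, 1}) are assumptions, as is the rule that the first matching token type wins.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed field layout, inferred from initializers such as
// TT_LINETERM = {5, "\n\r", false, 1}: type ID, accepted characters,
// whether the accepted set is inverted, maximum token length (0 = unlimited).
struct TokenType {
    int typeID;
    std::string validChars;
    bool invert;
    unsigned int maxChars;
};

struct Token {
    int typeID;
    std::string value;
};

// Type IDs and character sets copied from the class documentation.
const TokenType TT_INVALID = {-1, "", false, 0};
const TokenType TT_NONE = {0, "", false, 0};
const TokenType TT_WHITESPACE = {4, " \t", false, 0};
const TokenType TT_LINETERM = {5, "\n\r", false, 1};
const TokenType TT_IDENTIFIER = {
    6, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_", false, 0};

// Hypothetical stand-in for Ionflux::Tools::Tokenizer (not the library's code).
class MiniTokenizer {
public:
    explicit MiniTokenizer(const std::string& text)
        : types{TT_WHITESPACE, TT_LINETERM, TT_IDENTIFIER}, input(text) {}

    void addTokenType(const TokenType& type) { types.push_back(type); }

    Token getNextToken() {
        if (pos >= input.size())
            return {TT_NONE.typeID, ""};  // end of the token stream
        // Try token types in order; the first one that matches wins (assumption).
        for (const TokenType& t : types) {
            std::size_t end = pos;
            while (end < input.size() && accepts(t, input[end]) &&
                   (t.maxChars == 0 || end - pos < t.maxChars))
                ++end;
            if (end > pos) {
                Token tok{t.typeID, input.substr(pos, end - pos)};
                pos = end;
                return tok;
            }
        }
        // No type matched. The character is not consumed, so callers must stop here.
        return {TT_INVALID.typeID, std::string(1, input[pos])};
    }

private:
    static bool accepts(const TokenType& t, char c) {
        bool found = t.validChars.find(c) != std::string::npos;
        return t.invert ? !found : found;
    }

    std::vector<TokenType> types;
    std::string input;
    std::size_t pos = 0;
};

// Driver loop: collect tokens until TT_NONE (end of input) or TT_INVALID.
std::vector<Token> collectTokens(MiniTokenizer& tokenizer) {
    std::vector<Token> tokens;
    for (Token t = tokenizer.getNextToken();
         t.typeID != TT_NONE.typeID && t.typeID != TT_INVALID.typeID;
         t = tokenizer.getNextToken())
        tokens.push_back(t);
    return tokens;
}
```

For the input "foo bar\n" this loop yields an identifier, a whitespace token, a second identifier and a line terminator; adding a hypothetical operator type such as {100, "+-*/", false, 1} makes "a+b" split into three tokens. With the real class, the same loop would compare each token's typeID against Tokenizer::TT_NONE and Tokenizer::TT_INVALID.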
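
Constructor & Destructor Documentation

Tokenizer::Tokenizer ()
Constructor. Construct a new Tokenizer object.

Tokenizer::Tokenizer (const std::string &initInput)
Constructor. Construct a new Tokenizer object with the given input string.

Tokenizer::Tokenizer (const std::vector< TokenType > &initTokenTypes)
Constructor. Construct a new Tokenizer object with the given initial token types.

Tokenizer::Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput)
Constructor. Construct a new Tokenizer object with the given initial token types and input string.

virtual Tokenizer::~Tokenizer ()
Destructor. Destroy the Tokenizer object.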
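Member Function Documentation

virtual void Tokenizer::addTokenType (const TokenType &newTokenType)
Add a token type. Adds a token type (possibly user-defined) to the set of token types recognized by this Tokenizer.

virtual void Tokenizer::addTokenTypes (const std::vector< TokenType > &newTokenTypes)
Add token types. Adds token types (possibly user-defined) to the set of token types recognized by this Tokenizer.

virtual void Tokenizer::clearTokenTypes ()
Clear token types. Removes all token types from the set of recognized token types.

virtual unsigned int Tokenizer::getCurrentPos ()
Get current position. Get the current parsing position relative to the first character of the input string.

virtual Token Tokenizer::getCurrentToken ()
Get current token. Get the current token.

virtual unsigned int Tokenizer::getCurrentTokenPos ()
Get position of current token. Get the position of the current token relative to the first character of the input string.

virtual int Tokenizer::getCurrentTokenType ()
Get type of current token. Get the type of the current token.

virtual Token Tokenizer::getNextToken (const TokenTypeMap &otherMap)
Get next token. Parse the input string and get the next token.

virtual Token Tokenizer::getNextToken ()
Get next token. Parse the input string and get the next token.

virtual char Tokenizer::getQuoteChar ()
Get quote character. Get the quote character of a quoted string.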
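
static bool Tokenizer::isOneOf (char c, const std::string &testChars, bool invert)
Check type of a character.
Returns true if c is one of the characters in testChars, false otherwise. If invert is true, the result is inverted.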
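Only the signature isOneOf (char c, const std::string &testChars, bool invert) is certain here. Reading the parameter names, a likely behavior is: true when c occurs in testChars, with invert negating the result, mirroring the invert field that appears in every TokenType initializer. The free function below is a sketch of that assumed behavior, not the Ionflux source.

```cpp
#include <string>

// Assumed semantics (inferred from the parameter names, not from the source):
// true if c occurs in testChars; invert = true negates the result.
bool isOneOf(char c, const std::string& testChars, bool invert) {
    bool found = testChars.find(c) != std::string::npos;
    return invert ? !found : found;
}
```

Applied character by character with a token type's character set and invert flag, a test like this is presumably how a token type decides whether a character belongs to it.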

static bool Tokenizer::isValid (Token &token)
Check whether a token is valid. Check whether a token is a valid and well-defined token (i.e., not of type TT_NONE or TT_INVALID).
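Per this description, a token is valid unless it has the type of TT_NONE or TT_INVALID, whose type IDs are 0 and -1 according to their initializers. A minimal sketch of the check follows; the Token layout {typeID, value} is inferred from TOK_NONE = {Tokenizer::TT_NONE.typeID, ""}, the constant names TT_NONE_ID and TT_INVALID_ID are hypothetical, and the free function takes a const reference whereas the documented static member takes Token &.

```cpp
#include <string>

// Token layout inferred from TOK_NONE = {Tokenizer::TT_NONE.typeID, ""}.
struct Token {
    int typeID;
    std::string value;
};

// Type IDs from TT_INVALID = {-1, ...} and TT_NONE = {0, ...}.
const int TT_INVALID_ID = -1;
const int TT_NONE_ID = 0;

// Valid means "not a TT_NONE or TT_INVALID token".
bool isValid(const Token& token) {
    return token.typeID != TT_NONE_ID && token.typeID != TT_INVALID_ID;
}
```

With the real class, this suggests a compact loop condition such as while (Tokenizer::isValid(tok = t.getNextToken())), which stops at both the end of the stream and the first invalid token.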
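
virtual Token Tokenizer::nextToken ()
Get next token. Parse the input string and get the next token.

virtual void Tokenizer::reset ()
Reset the parser. Reset the parser so the input can be parsed again from the beginning.

virtual void Tokenizer::setExtractEscaped (bool newExtractEscaped)
Set escaped character extraction flag.
Pass true to enable extraction of escaped characters, or false to disable it.

virtual void Tokenizer::setExtractQuoted (bool newExtractQuoted)
Set quoted string extraction flag.
Pass true to enable extraction of quoted strings, or false to disable it.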
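The extraction flags only switch behavior on and off; the documentation does not spell out how a quoted string is scanned. As a sketch under stated assumptions, the helper below scans one quoted string using the documented QUOTE_CHARS and ESCAPE_CHAR values and records which quote character opened it, roughly the information getQuoteChar() reports. The function extractQuoted(), the QuotedResult struct, and the choice to strip the quotes and escape characters from the result are illustrative assumptions, not Ionflux behavior.

```cpp
#include <cstddef>
#include <string>

// Values from the class documentation.
const std::string QUOTE_CHARS = "\"'";
const char ESCAPE_CHAR = '\\';

// Result of scanning one quoted string (hypothetical helper type).
struct QuotedResult {
    bool ok;            // false if input[pos] is not a quote or the string is unterminated
    std::string value;  // text between the quotes, with escape characters removed
    char quoteChar;     // the quote character that opened the string (cf. getQuoteChar())
    std::size_t next;   // position just after the scanned text
};

// Scan a quoted string starting at input[pos]. ESCAPE_CHAR makes the following
// character literal, so \" inside "..." does not end the string.
QuotedResult extractQuoted(const std::string& input, std::size_t pos) {
    if (pos >= input.size() || QUOTE_CHARS.find(input[pos]) == std::string::npos)
        return {false, "", '\0', pos};
    QuotedResult r{false, "", input[pos], pos + 1};
    std::size_t i = pos + 1;
    while (i < input.size()) {
        char c = input[i];
        if (c == ESCAPE_CHAR && i + 1 < input.size()) {
            r.value += input[i + 1];  // keep the escaped character, drop the escape
            i += 2;
        } else if (c == r.quoteChar) {
            r.ok = true;
            r.next = i + 1;
            return r;
        } else {
            r.value += c;  // includes the other quote character and a trailing escape
            ++i;
        }
    }
    r.next = input.size();  // no closing quote: consumed to the end of input
    return r;
}
```

Only the quote character that opened the string closes it, so a double quote inside '...' is kept as ordinary text.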

virtual void Tokenizer::setInput (const std::string &newInput)
Set input. Sets the input string to be parsed.

virtual void Tokenizer::setTokenTypeAnything ()
Set special token type TT_ANYTHING. This sets up a special token type TT_ANYTHING that will match any characters not matched by any of the previously defined token types.
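The documentation states what TT_ANYTHING matches but not how it is built. One plausible construction, sketched here purely as an assumption, uses the third (boolean) field of every TokenType initializer, assumed to invert the character set: merge the characters of the token types defined so far and invert the merged set, giving a type with ID TT_ANYTHING_TYPE_ID (1). makeAnythingType() and accepts() are hypothetical helpers, and already-inverted types are skipped to keep the sketch short.

```cpp
#include <string>
#include <vector>

// Assumed TokenType layout: type ID, character set, invert flag, max length (0 = unlimited).
struct TokenType {
    int typeID;
    std::string validChars;
    bool invert;
    unsigned int maxChars;
};

const int TT_ANYTHING_TYPE_ID = 1;  // value from the class documentation

// True if a character belongs to the token type's (possibly inverted) set.
bool accepts(const TokenType& type, char c) {
    bool found = type.validChars.find(c) != std::string::npos;
    return type.invert ? !found : found;
}

// Hypothetical construction of TT_ANYTHING: merge the character sets of the
// non-inverted token types defined so far and invert the result, so the new
// type accepts the characters that none of the merged types accepts.
TokenType makeAnythingType(const std::vector<TokenType>& defined) {
    std::string used;
    for (const TokenType& type : defined) {
        if (type.invert)
            continue;  // inverted sets are not merged in this simplified sketch
        for (char c : type.validChars)
            if (used.find(c) == std::string::npos)
                used += c;
    }
    return {TT_ANYTHING_TYPE_ID, used, true, 0};
}
```

Because the set is computed from the types present at call time, a type added afterwards would not be excluded, which is consistent with the wording "previously defined token types".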
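
virtual void Tokenizer::setTokenTypes (const std::vector< TokenType > &newTokenTypes)
Set token types. Replaces the set of token types recognized by this Tokenizer.

virtual void Tokenizer::useDefaultTokenTypes ()
Use default token types. Initializes the set of recognized token types with the default token types (Tokenizer::TT_WHITESPACE, Tokenizer::TT_LINETERM and Tokenizer::TT_IDENTIFIER).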
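
Member Data Documentation

unsigned int Tokenizer::currentPos
Current parsing position in the input string.

char Tokenizer::currentQuoteChar
Quote character.

Token Tokenizer::currentToken
Current token.

unsigned int Tokenizer::currentTokenPos
Position of the current token in the input string.

static const char Tokenizer::ESCAPE_CHAR
Escape character.

bool Tokenizer::extractEscaped
Extract escaped characters flag.

bool Tokenizer::extractQuoted
Extract quoted strings flag.

static const std::string Tokenizer::QUOTE_CHARS
Quote characters.

std::string Tokenizer::theInput
The input string to be parsed.

static const Token Tokenizer::TOK_INVALID
Token: Invalid token (special).

static const Token Tokenizer::TOK_NONE
Token: No token (special).

static const TokenType Tokenizer::TT_ALPHA
Token type: Alpha (Latin).
Initial value: {8, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", false, 0}

TokenType Tokenizer::TT_ANYTHING
Token type: Anything (special).

static const int Tokenizer::TT_ANYTHING_TYPE_ID
Type ID of the TT_ANYTHING token type.

static const TokenType Tokenizer::TT_DEFAULT_SEP
Token type: Default separator characters.

static const TokenType Tokenizer::TT_ESCAPED
Token type: Escaped character (special).

static const TokenType Tokenizer::TT_IDENTIFIER
Token type: Identifier.
Initial value: {6, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_", false, 0}

static const TokenType Tokenizer::TT_INVALID
Token type: Invalid token (special).

static const TokenType Tokenizer::TT_LINETERM
Token type: Line terminator.

static const TokenType Tokenizer::TT_NONE
Token type: No token (special).

static const TokenType Tokenizer::TT_NUMBER
Token type: Number.

static const TokenType Tokenizer::TT_QUOTED
Token type: Quoted string (special).

static const TokenType Tokenizer::TT_WHITESPACE
Token type: Linear whitespace.

TokenTypeMap * Tokenizer::typeMap
Token type map.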