| Topic: |
DEVELOP > c-Plus-Plus |
| User: |
"christopher diggins" |
| Date: |
10 Dec 2004 05:04:14 PM |
| Object: |
YARD : Generic regular expression parser |
There seem to be a gazillion regular expression libraries. Most of them
only work on text, but I wanted something that also works on arbitrary
sequences of data (this is useful, for instance, in building parse trees
from token lists). This is possible, I think, using the Spirit library from
Boost, but its syntax and complexity are too much for me. I have almost
finished the YARD (yet another recursive descent) parser, which is a really
lightweight, truly generic regex parser (and runs like a bat out of hell).
You define rules as follows:
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> >
    Ident_parser;
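
To make the rule definitions above concrete, here is a minimal sketch of how such combinators could be written. It is not YARD's actual source: the Stream struct, the bodies of the Accept functions, and the MatchIdentLength helper are all assumptions made for illustration. Only the template names and the typedefs come from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal input stream; YARD's real stream class may differ.
struct Stream {
    const char* p;    // current position
    const char* end;  // one past the last character
    bool AtEnd() const { return p == end; }
};

// Matches exactly the character C.
template<char C>
struct Char_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || *s.p != C) return false;
        ++s.p;
        return true;
    }
};

// Matches one character in the inclusive range [Lo, Hi].
template<char Lo, char Hi>
struct CharRange_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || *s.p < Lo || *s.p > Hi) return false;
        ++s.p;
        return true;
    }
};

// Ordered choice: try A, then try B from the same position.
template<typename A, typename B>
struct re_or {
    static bool Accept(Stream& s) {
        const char* start = s.p;
        if (A::Accept(s)) return true;
        s.p = start;
        return B::Accept(s);
    }
};

// Sequence: A followed by B; the position is restored on failure.
template<typename A, typename B>
struct re_and {
    static bool Accept(Stream& s) {
        const char* start = s.p;
        if (A::Accept(s) && B::Accept(s)) return true;
        s.p = start;
        return false;
    }
};

// Zero or more repetitions; always succeeds.
template<typename P>
struct re_star {
    static bool Accept(Stream& s) {
        for (;;) {
            const char* before = s.p;
            // Stop on failure, or if P matched without consuming input.
            if (!P::Accept(s) || s.p == before) break;
        }
        return true;
    }
};

// One or more repetitions.
template<typename P>
struct re_plus {
    static bool Accept(Stream& s) {
        return P::Accept(s) && re_star<P>::Accept(s);
    }
};

// Rules copied from the post.
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> >
    Ident_parser;

// Illustrative helper: length of the identifier at the start of text, or 0.
std::size_t MatchIdentLength(const std::string& text) {
    Stream s = { text.data(), text.data() + text.size() };
    if (!Ident_parser::Accept(s)) return 0;
    return static_cast<std::size_t>(s.p - text.data());
}
```

Under these assumptions, MatchIdentLength("foo_42 bar") returns 6 and MatchIdentLength("9lives") returns 0, since an identifier may not start with a digit. Each rule is a type with a static Accept function, so a grammar is composed entirely at compile time.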
Then you hand them to a tokenizer as follows:
int main()
{
    // sFileName and GetFileSize() are assumed to be defined elsewhere.
    int nBufSize = GetFileSize(sFileName);
    char* pBuf = static_cast<char*>(calloc(nBufSize, 1));
    ifstream f;
    f.open(sFileName);
    f.read(pBuf, nBufSize);
    f.close();
    Tokenizer<Word_parser> tknzr;
    tknzr.Parse(pBuf, nBufSize);
    OutputTokens(tknzr.Begin(), tknzr.End());
    free(pBuf);
    getchar();  // wait for a key press before exiting
    return 0;
}
A tokenizer in this case is really simple:
template<typename Parser_T>
struct Tokenizer {
    void Parse(char* pText, int nSize)
    {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            // Remember where this match attempt starts.
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                // Record the matched range [index, current position).
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() { return mTkns.begin(); }
    TokenIter End() { return mTkns.end(); }
private:
    TokenList mTkns;
};
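
The tokenizer above depends on types the post does not show (ParseInputStream, Token, TokenList, TokenIter). The following is a self-contained sketch with hypothetical stand-ins for those types, plus a hand-written Letters_parser rule and an ExtractWords helper that are not part of YARD, so the Parse loop can be run end to end.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for YARD's ParseInputStream.
class ParseInputStream {
public:
    ParseInputStream(const char* text, int size)
        : mText(text), mSize(size), mPos(0) {}
    bool AtEnd() const { return mPos >= mSize; }
    int GetIndex() const { return mPos; }
    char Peek() const { return mText[mPos]; }
    void GotoNext() { if (!AtEnd()) ++mPos; }
private:
    const char* mText;
    int mSize;
    int mPos;
};

// Assumed token representation: a half-open index range [first, second).
typedef std::pair<int, int> Token;
typedef std::vector<Token> TokenList;
typedef TokenList::const_iterator TokenIter;

// Illustrative rule: one or more ASCII letters.
// On failure it leaves the stream position unchanged.
struct Letters_parser {
    static bool Accept(ParseInputStream& s) {
        int start = s.GetIndex();
        while (!s.AtEnd() &&
               std::isalpha(static_cast<unsigned char>(s.Peek()))) {
            s.GotoNext();
        }
        return s.GetIndex() > start;
    }
};

// Same loop as in the post, with a const input buffer.
template<typename Parser_T>
struct Tokenizer {
    void Parse(const char* pText, int nSize) {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() const { return mTkns.begin(); }
    TokenIter End() const { return mTkns.end(); }
private:
    TokenList mTkns;
};

// Illustrative helper: copy each token's text out of the input.
std::vector<std::string> ExtractWords(const std::string& text) {
    Tokenizer<Letters_parser> tknzr;
    tknzr.Parse(text.data(), static_cast<int>(text.size()));
    std::vector<std::string> words;
    for (TokenIter it = tknzr.Begin(); it != tknzr.End(); ++it) {
        words.push_back(text.substr(it->first, it->second - it->first));
    }
    return words;
}
```

With these stand-ins, ExtractWords("hello, big world") yields the three words hello, big and world. Note that GotoNext() runs after every attempt, so the character right after each token is skipped. That is harmless for a greedy word rule, because that character cannot itself start a word, but a rule set with adjacent tokens would need a different loop.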
What I want to know is: is it obvious to programmers how this works and how
to use it? Is the verbosity acceptable? Also, would it interest people more
if I showed some benchmarks comparing it to other libraries?
TIA
--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
| User: "Markus Elfring" |
| Title: Re: YARD : Generic regular expression parser |
| Date: 05 Jan 2005 01:34:04 PM |
Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf
| User: "christopher diggins" |
| Title: Re: YARD : Generic regular expression parser |
| Date: 05 Jan 2005 02:05:50 PM |
"Markus Elfring" <Markus.Elfring@web.de> wrote in message
news:342u4pF470ueqU1@individual.net...
> Can the definitions that are described in the section "7 Regular
> expressions [tr.re]" of the document "(Draft) Technical Report on Standard
> Library Extensions" be changed with other template parameters to match
> your suggested use cases?
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1687.pdf
Sorry, but I don't quite understand the question (nor the document). Could
you explain more?
--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com
| User: "Markus Elfring" |
| Title: Re: YARD : Generic regular expression parser |
| Date: 09 Jan 2005 12:53:11 PM |
> Sorry but I don't quite understand the question ( nor the document ),
> could you explain more?
What don't you understand in the referenced document?
Would you like to reuse anything from this template library for regular
expressions, which is still in development?
When do you want a regexp to be evaluated: at compile time
(Boost::Spirit / Phoenix) or at run time?
Regards,
Markus
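
For comparison with the run-time option raised above: the regular expression library described in the draft Technical Report was later standardized as std::regex in C++11. There the pattern is a string compiled at run time, and the basic_regex template is parameterized on a character type and a traits class. A minimal sketch follows, with FindWords as an illustrative helper name, using a pattern that matches the same characters as Word_parser from the first post.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Illustrative helper: collect every run of letters and apostrophes.
std::vector<std::string> FindWords(const std::string& text) {
    // The pattern is parsed when the regex object is constructed, at run time.
    static const std::regex word("[A-Za-z']+");
    std::vector<std::string> out;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Here FindWords("don't panic, 42") yields the two words don't and panic. In this approach the pattern can be changed without recompiling, whereas the typedef-based rules are fixed when the program is built.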