RegEx. A RegEx dialect is Search Complete if it can, just by its syntax, find any pattern inside a text. I was inspired by existing RegEx dialects but tried to simplify them, in C++. The character - links the first to the last element of a set. For example, a-z searches for a character between a and z according to the ASCII table.
The characters { and } enclose a number giving the repetition count of the previous condition. a-z{5} will search for 5 contiguous characters between a and z.
a-z{+5} will search for 5 or more contiguous characters between a and z.
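The range and repetition rules above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the project's actual code: the names countInRange and matchRepeat are hypothetical, and characters are compared as single bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many contiguous characters, starting at
// `pos`, fall in the inclusive range [lo, hi] (for example 'a' to 'z').
std::size_t countInRange(const std::string& text, std::size_t pos,
                         char lo, char hi) {
    assert(lo <= hi);
    std::size_t n = 0;
    while (pos + n < text.size() &&
           text[pos + n] >= lo && text[pos + n] <= hi) {
        ++n;
    }
    return n;
}

// Hypothetical helper for lo-hi{count} (orMore = false) and
// lo-hi{+count} (orMore = true).
// On success, `consumed` receives the number of characters taken:
// exactly `count` for {count}, the whole run for {+count}.
bool matchRepeat(const std::string& text, std::size_t pos, char lo, char hi,
                 std::size_t count, bool orMore, std::size_t& consumed) {
    const std::size_t run = countInRange(text, pos, lo, hi);
    if (run < count) {
        return false;  // not enough contiguous characters in the range
    }
    consumed = orMore ? run : count;
    return true;
}
```

With this sketch, a-z{5} on "hello world" consumes 5 characters, and a-z{+5} on "helloooo!" consumes 8.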
{0}
Or context: alternatives between conditions are made possible by introducing the special characters [ and ].
We'll escape the special characters with \.
For example, [a-z{3}5-9{2}]{+2} will search for contiguous characters between a and z 3*2 = 6 times or more, or characters between 5 and 9 2*2 = 4 times or more.
- Ok, but what about the implementation?
Firstly it is important to note that the first condition is crucial.
Indeed, this condition tells us the possible patterns to match inside a text.
For example, suppose I have A-Za-z{8} as a regular expression and the following text:
Pierre et Marie ont acheté une baguette à Joséphine ce midi.
We must first find the entry points of the condition: here, all the indices of the string that hold an uppercase letter (since the first condition is A-Z). The indices (starting from 0) are: [0, 10, 42].
The algorithm then starts from each of these indices and evaluates the remaining conditions of the regular expression contiguously.
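A minimal sketch of the entry-point step, assuming the first condition is a single range. The function name entryPoints is hypothetical. The sketch indexes bytes rather than characters, so the usage note relies on an ASCII-only variant of the sentence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every index whose character lies in the
// inclusive range [lo, hi]. These are the candidate starting positions
// for evaluating the rest of the expression.
std::vector<std::size_t> entryPoints(const std::string& text,
                                     char lo, char hi) {
    assert(lo <= hi);
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] >= lo && text[i] <= hi) {
            indices.push_back(i);
        }
    }
    return indices;
}
```

For instance, entryPoints("Pierre et Marie ont vu Josephine", 'A', 'Z') yields the indices 0, 10 and 23.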
Testing the first index:
ierre is 5 characters long between a and z; it's too small because we want 8.
Next index: same result, too small, 4 against the 8 required.
Last index: we have oséphine, which is 8 characters long between a and z.
The regular expression will match Joséphine.
Note that if we had A-Za-z{2}, the expression would match Pierre because, even though ierre is 5 characters long, the expression only evaluates whether the 2 characters after P are between a and z.
To match exactly 2 characters we must use this expression:
A-Za-z{2}a-z{0}
It means to match for an uppercase, then 2 lowercase, and then no lowercase.
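The walk-through above can be sketched as a single function. This is an illustration under assumptions, not the actual implementation: matchCapitalized is a hypothetical name, the expression is hard-coded as A-Za-z{n} (or A-Za-z{n}a-z{0} when exact is true), and only ASCII letters are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
bool isLower(char c) { return c >= 'a' && c <= 'z'; }
bool isUpper(char c) { return c >= 'A' && c <= 'Z'; }
}  // namespace

// Hypothetical sketch of A-Za-z{n} (exact = false) and
// A-Za-z{n}a-z{0} (exact = true).
// Returns the characters consumed by the first match, or an empty
// string when no entry point satisfies the expression.
std::string matchCapitalized(const std::string& text, std::size_t n,
                             bool exact) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!isUpper(text[i])) {
            continue;  // not an entry point
        }
        // Count the contiguous lowercase letters that follow.
        std::size_t run = 0;
        while (i + 1 + run < text.size() && isLower(text[i + 1 + run])) {
            ++run;
        }
        if (run < n) {
            continue;  // too small, try the next entry point
        }
        if (exact && run != n) {
            continue;  // a-z{0} forbids another lowercase letter
        }
        assert(i + 1 + n <= text.size());  // guaranteed by run >= n
        return text.substr(i, 1 + n);
    }
    return "";
}
```

On the ASCII-only text "Pierre et Marie ont vu Josephine", n = 8 returns "Josephine" and n = 2 returns "Pie". With exact set to true and n = 2, nothing matches in that text, while "Pierre et Bob" returns "Bob".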
An interesting and powerful feature is to allow a condition to match characters until a pattern is reached. We will call it a break pattern.
This is made possible by introducing the character ?, used as follows:
condition1{?condition2}
For example, A-Z[a-z{+1} {+1}]{?sympa} on Connais-tu un endroit sympa où l'on pourrait randonner ?
will match Connais-tu un endroit , that is, all the lowercase letters, uppercase letters or spaces before the last pattern sympa.
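A possible sketch of the break pattern idea, with several assumptions: matchUntil and isLetterOrSpace are hypothetical names, the break pattern is a literal string, and the search stops at its first occurrence after the starting position. The example text drops the hyphen so that it only contains letters and spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical character class for the example: ASCII letters or a space.
bool isLetterOrSpace(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' ';
}

// Hypothetical sketch of condition{?breakPattern}.
// Consumes characters from `pos` as long as they satisfy `allowed`,
// stopping right before `breakPattern`.
// Returns the consumed characters, or an empty string when the break
// pattern is absent or a disallowed character appears before it.
std::string matchUntil(const std::string& text, std::size_t pos,
                       const std::string& breakPattern,
                       bool (*allowed)(char)) {
    assert(!breakPattern.empty());
    const std::size_t stop = text.find(breakPattern, pos);
    if (stop == std::string::npos) {
        return "";  // break pattern never reached
    }
    for (std::size_t i = pos; i < stop; ++i) {
        if (!allowed(text[i])) {
            return "";  // condition fails before the break pattern
        }
    }
    return text.substr(pos, stop - pos);
}
```

Calling matchUntil("Connais tu un endroit sympa ici", 0, "sympa", isLetterOrSpace) returns "Connais tu un endroit ", trailing space included.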
Implementation details: