RegEx
. RegEx
? RegEx
dialect is Search Complete
if it can, just by its syntax, find any pattern inside a text. RegEx
dialects. I was inspired by these dialects but tried to simplify them, in C++
. Search Complete
. -
which will link the first to the last element of a set. For example, `a-z` searches for a character between `a` and `z` according to the ASCII table.
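A minimal sketch of what such a range test could look like, assuming single-byte ASCII input. The name `in_range` is hypothetical and not taken from the project:

```cpp
#include <cassert>

// Hypothetical helper: true if c lies in the inclusive ASCII range [first, last],
// which is what a condition such as a-z asks for.
bool in_range(char c, char first, char last) {
    // Compare as unsigned so bytes above 127 are not treated as negative values.
    const auto u = static_cast<unsigned char>(c);
    const auto lo = static_cast<unsigned char>(first);
    const auto hi = static_cast<unsigned char>(last);
    assert(lo <= hi);  // the set must be written first-to-last
    return lo <= u && u <= hi;
}
```

With this reading, `in_range('m', 'a', 'z')` holds, while `in_range('M', 'a', 'z')` does not, because uppercase letters sit before lowercase ones in the ASCII table.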
`{` and `}`, which enclose a number giving the repetition count of the previous condition. `a-z{5}` will search for 5 contiguous characters between `a` and `z`.
With a `+` inside `{` and `}`, `a-z{+5}` will search for 5 or more contiguous characters between `a` and `z`.
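The two repetition forms can be sketched as follows. This is only an illustration: the helper names, the `kNoMatch` sentinel and the greedy behaviour of `{+n}` (consuming the whole run) are assumptions, not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sentinel returned when a condition does not match (an assumption of this sketch).
constexpr std::size_t kNoMatch = std::string::npos;

// Length of the run of characters in [first, last] starting at pos.
std::size_t run_length(const std::string& text, std::size_t pos, char first, char last) {
    assert(first <= last);
    std::size_t n = 0;
    while (pos + n < text.size() && first <= text[pos + n] && text[pos + n] <= last) {
        ++n;
    }
    return n;
}

// a-z{n} style: succeeds if the next n characters are in range, and consumes n of them.
std::size_t consume_count(const std::string& text, std::size_t pos,
                          char first, char last, std::size_t n) {
    return run_length(text, pos, first, last) >= n ? n : kNoMatch;
}

// a-z{+n} style: succeeds if at least n characters are in range, and consumes the whole run.
std::size_t consume_at_least(const std::string& text, std::size_t pos,
                             char first, char last, std::size_t n) {
    const std::size_t run = run_length(text, pos, first, last);
    return run >= n ? run : kNoMatch;
}
```

On `"abcdefg1"` starting at index 0, `consume_count(..., 5)` consumes 5 characters and `consume_at_least(..., 5)` consumes all 7 letters.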
{0}
or
context
This will be possible by the introduction of the following special characters `[` and `]`.
We'll escape the special characters with `\`.
For example, `[a-z{3}5-9{2}]{+2}` will search for contiguous characters between `a` and `z` 3*2 = 6 times or more, or characters between `5` and `9` 2*2 = 4 times or more.
- Ok, but what about the implementation?
Firstly it is important to note that the first condition is crucial.
Indeed, this condition tells us the possible patterns to match inside a text.
For example, if I have `A-Za-z{8}` as a regular expression and `Pierre et Marie ont acheté une baguette à Joséphine ce midi.` as the text, we must first evaluate the entry points of the condition. Here, these are all the indices of the string that hold an uppercase letter (since the first condition is `A-Z`, i.e. uppercase). The indices (starting from 0) are `[0, 10, 42]`.
So, the algorithm will start from these indices and evaluate the remaining conditions of the regular expression contiguously.
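The entry-point step can be sketched like this. The function name `entry_points` and the constant `kSample` are hypothetical. The sketch uses `std::u32string` so that each index counts one character, which is what the indices above assume; a UTF-8 `std::string` would count `é` and `à` as two bytes each and shift the last index.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: indices of every character that satisfies the first
// condition, here an inclusive range such as A-Z.
std::vector<std::size_t> entry_points(const std::u32string& text,
                                      char32_t first, char32_t last) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (first <= text[i] && text[i] <= last) {
            indices.push_back(i);
        }
    }
    return indices;
}

// The sample sentence, with accented letters written as escapes so the
// source file encoding does not matter.
const std::u32string kSample =
    U"Pierre et Marie ont achet\u00e9 une baguette \u00e0 Jos\u00e9phine ce midi.";
```

Calling `entry_points(kSample, U'A', U'Z')` yields the three indices 0, 10 and 42.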
Testing the first index: `ierre` is 5 characters long between `a` and `z`, which is too small because we want 8.
Next index, and it's the same: too small, 4 against 8 required.
Last index: we have `oséphine`, which is 8 characters long between `a` and `z`.
The regular expression will match `Joséphine`.
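The walkthrough above can be sketched as a hard-coded matcher for `A-Za-z{n}`. This is an illustration under assumptions, not the project's implementation: the function names are hypothetical, and the test sentence is an unaccented variant of the sample, since a strict ASCII range check would not accept `é` as a character between `a` and `z`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length of the run of characters in [first, last] starting at pos.
std::size_t run_length(const std::string& text, std::size_t pos, char first, char last) {
    std::size_t n = 0;
    while (pos + n < text.size() && first <= text[pos + n] && text[pos + n] <= last) {
        ++n;
    }
    return n;
}

// Hard-coded sketch of A-Za-z{n}: returns the entry points (uppercase letters)
// that are followed by at least n lowercase letters.
std::vector<std::size_t> match_upper_then_lower(const std::string& text, std::size_t n) {
    std::vector<std::size_t> matches;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] < 'A' || text[i] > 'Z') {
            continue;  // not an entry point
        }
        if (run_length(text, i + 1, 'a', 'z') >= n) {
            matches.push_back(i);
        }
    }
    return matches;
}
```

On `"Pierre et Marie ont achete une baguette a Josephine ce midi."`, asking for 8 lowercase letters keeps only index 42, while asking for 2 keeps all three entry points.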
Note that if we had `A-Za-z{2}`, the expression would match on `Pierre` because, even if `ierre` is 5 characters long, the expression only evaluates whether the 2 characters after `P` are between `a` and `z`.
To match with only 2 characters we must have this expression:
`A-Za-z{2}a-z{0}`
It means: match an uppercase, then 2 lowercase, and then no lowercase.
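A hard-coded sketch of `A-Za-z{2}a-z{0}` follows. It assumes that `{0}` means "the next character must not satisfy the condition" and that reaching the end of the text also satisfies `{0}`. Both points, like the function names, are assumptions of this sketch.

```cpp
#include <cstddef>
#include <string>

// True if c lies in the inclusive range [first, last].
bool in_range(char c, char first, char last) {
    return first <= c && c <= last;
}

// Sketch of A-Za-z{2}a-z{0} evaluated at index i:
// one uppercase, exactly two lowercase, then no further lowercase.
bool match_exactly_two_lower(const std::string& text, std::size_t i) {
    if (i + 2 >= text.size()) {
        return false;  // need characters at i, i + 1 and i + 2
    }
    if (!in_range(text[i], 'A', 'Z')) {
        return false;  // A-Z
    }
    if (!in_range(text[i + 1], 'a', 'z') || !in_range(text[i + 2], 'a', 'z')) {
        return false;  // a-z{2}
    }
    // a-z{0}: either the text ends here or the next character is not lowercase.
    return i + 3 >= text.size() || !in_range(text[i + 3], 'a', 'z');
}
```

Under these assumptions `Pierre` is rejected at index 0, because a third lowercase letter follows, while `Pie rre` is accepted.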
An interesting and powerful feature is to allow a condition to match characters until a pattern. We will call it a break pattern.
This is made possible by the character `?`, used this way:
`condition1{?condition2}`
For example, `A-Z[a-z{+1} {+1}]{?sympa}` on `Connais-tu un endroit sympa où l'on pourrait randonner ?` will match `Connais-tu un endroit`, all the lowercases, uppercases or spaces before the last pattern `sympa`.
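One possible reading of the break pattern, sketched for a literal stop string. Everything here is an assumption: the names, the `kNoMatch` sentinel, the choice to stop at the first occurrence of the stop string, and the hard-coded condition "lowercase letter or space".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sentinel returned when the break pattern is never reached (an assumption of this sketch).
constexpr std::size_t kNoMatch = std::string::npos;

// Consumes lowercase letters and spaces from pos until the literal `stop`
// begins. Returns the number of characters consumed before `stop`, or
// kNoMatch if another character or the end of the text comes first.
std::size_t match_until_break(const std::string& text, std::size_t pos,
                              const std::string& stop) {
    assert(!stop.empty());
    for (std::size_t i = pos; i < text.size(); ++i) {
        if (text.compare(i, stop.size(), stop) == 0) {
            return i - pos;  // break pattern reached
        }
        const bool allowed = (text[i] >= 'a' && text[i] <= 'z') || text[i] == ' ';
        if (!allowed) {
            return kNoMatch;  // condition1 failed before the break pattern
        }
    }
    return kNoMatch;  // text ended without meeting the break pattern
}
```

With the hypothetical sentence `"Un endroit sympa pour marcher"`, starting right after the uppercase `U`, the sketch consumes the 10 characters `n endroit ` and stops at `sympa`. Under this strict reading a character such as `-` ends the match, so the example sentence is hyphen-free.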
Implementation details: