Guide to Write Regular Expression Rules in PopTray


vitoco
Veteran
Posts: 422
Joined: Wed Jul 09, 2003 9:22 pm
Location: Chile

Post by vitoco » Sat Mar 20, 2004 3:07 am

Guide to Write Regular Expression Rules in PopTray (Introduction)

Don't be afraid of this new, alternative way to write rule conditions in PopTray. I'll tell you some secrets you need to know to take advantage of this tool, so you won't have to write so many simple rules just to catch each variant of a condition.

In this topic I'll post the sections of my growing guide. If you notice that something is not true or not well written (Spanish is my native language), please send me a PM instead of a reply and I'll update it here. That way, this topic will stay clean and easy to read.

I added colors to the text to emphasize examples and Reg Expr patterns.

I hope this helps more than 10% of the PopTray users. :roll:

++Vitoco
Last edited by vitoco on Tue Apr 13, 2004 4:25 pm, edited 2 times in total.


Post by vitoco » Sat Mar 20, 2004 3:14 am

Guide to Write Regular Expression Rules in PopTray (Part 1)

Understanding what a Reg Expr is

Your rules may currently use one of the following test conditions that PopTray provides:

- Contains
- Equals
- Wildcard
- Empty

All these conditions can be rewritten using the new Reg Expr condition, as I will show you now. But you should read this whole article to get the main idea and start writing your own patterns. If you just pick up a few hints, you will have to come back every time you need to write a new rule... and that is not the idea.

Patterns are the Regular Expression's search strings. By the way, PopTray's pattern syntax is a subset of that of a major Regular Expression package, so what you learn here will also work in other applications, programming languages, ... The page describing the package used by PopTray also documents some other extensions, which may not work elsewhere.

If the whole pattern matches a contiguous portion of a text field, such as an email subject, we say that the Reg Expr succeeds, and the rule actions are fired.

Usually, patterns are case sensitive, but since the text in PopTray rules is compared case insensitively for all condition types, patterns written in PopTray are case insensitive too. :o
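To try patterns outside PopTray, here is a minimal Python sketch. It assumes that a Reg Expr rule behaves like Python's `re.search()` (the pattern may match anywhere in the field) and emulates PopTray's case insensitivity with `re.IGNORECASE`. The helper `reg_expr_rule` is hypothetical, and Python's regex dialect is not identical to the package PopTray uses, so take it as an approximation.

```python
import re

def reg_expr_rule(pattern: str, field: str) -> bool:
    """Hypothetical stand-in for a 'Field | Reg Expr | pattern' rule.

    re.search() succeeds if the pattern matches any contiguous portion
    of the field; re.IGNORECASE mimics PopTray's case-insensitive rules.
    """
    return re.search(pattern, field, re.IGNORECASE) is not None

# Like "Contains": the pattern may appear anywhere in the field.
print(reg_expr_rule("Renier", "Renier Smith"))  # True
print(reg_expr_rule("renier", "From RENIER"))   # True (case ignored)
print(reg_expr_rule("Renier", "René"))          # False
```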


Contains

Contains is the simplest Regular Expression. It only tests whether a given string appears somewhere inside a field of the email.

You can replace the rule:

From (Name) | Contains | Renier

with:

From (Name) | Reg Expr | Renier

giving the same result. Congratulations, you wrote your first Regular Expression pattern in a rule :D


Equals

This is a condition that forces a text field to match exactly the string you specified in the rule.

For example, your rule:

To | Equals | my_account@yahoo.com

will not catch the alternative address "my_account@yahoo.com.es", as expected.

The Reg Expr equivalent of Equals is:

To | Reg Expr | ^my_account@yahoo\.com$

Three new elements were introduced here:

- A leading "^" means that the text must start with the string that follows it.

- A trailing "$" means that the text must end with the string that precedes it.

- A "\." matches a real dot. A bare "." would match any character (more about this in Part 2), so "^my_account@yahoo.com$" would also accept "my_account@yahooXcom".

Whether a "$" placed elsewhere in the pattern is taken as a plain dollar sign depends on the package, so the safe way to search for a dollar sign is to precede it with a "\" (backslash, the escape character): "\$".

Then, if your rule says:

To | Reg Expr | ^my_account@yahoo\.com

it will also match "my_account@yahoo.com.es", "my_account@yahoo.com.mx", ...
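Here is the same idea as a small Python sketch (Python's `re` stands in for PopTray's engine, and the helper name is made up). The fully anchored pattern behaves like Equals, dropping the "$" also accepts the longer addresses, and the escaped dots keep "my_account@yahooXcom" out.

```python
import re

def reg_expr_rule(pattern: str, field: str) -> bool:
    # Hypothetical helper: match anywhere in the field, ignoring case.
    return re.search(pattern, field, re.IGNORECASE) is not None

EXACT = r"^my_account@yahoo\.com$"   # behaves like Equals
PREFIX = r"^my_account@yahoo\.com"   # only the beginning is anchored

for to in ["my_account@yahoo.com", "MY_ACCOUNT@YAHOO.COM",
           "my_account@yahoo.com.es", "my_account@yahooXcom"]:
    print(f"{to:26} exact={reg_expr_rule(EXACT, to)} "
          f"prefix={reg_expr_rule(PREFIX, to)}")
```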


Empty

Did you really understand the meaning of "^" and "$"? Well, then, what does a "^$" pattern mean for a Regular Expression? Guess! Right, it means "nothing between the beginning and the end of the text", i.e., an empty field.

So,

Subject | Empty |

becomes:

Subject | Reg Expr | ^$

and you will find messages without a subject.
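A tiny Python check of the "^$" pattern. Python's `re` is used as a stand-in for PopTray's engine, and the function name is made up. Note that a subject containing only a space is not empty for this pattern.

```python
import re

def is_empty_subject(subject: str) -> bool:
    # "^$": the start of the text is immediately followed by its end.
    return re.search(r"^$", subject) is not None

print(is_empty_subject(""))       # True
print(is_empty_subject("Hello"))  # False
print(is_empty_subject(" "))      # False: a lone space is not "empty"
```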


Wildcard

Wildcards are quite familiar to us. We used them back in the Apple, Atari and MS-DOS days to list "*.BAS" programs, "PHOTO???.PIC" images, and so on. If you have never used them, here is how they work:

- "?" matches exactly one character: a letter, a digit or any other sign you can type with your keyboard or another input device.

- "*" matches zero or more characters... it usually means "whatever is there".

Obviously, these two wildcards have corresponding notations in a Regular Expression:

- "." (dot) is a replacement for "?" wildcard.

- ".*" (dot-asterisk) is a replacement for "*" wildcard.

Very simple... Now you can rewrite your rule:

Subject | Wildcard | *buy*online*

where the middle "*" replaces any product name, as:

Subject | Reg Expr | buy.*online

Note that I omitted the leading and trailing "*" because, as with the Contains type, you don't need to specify anything around the pattern, unless you want to anchor one or both ends of the text using "^" and "$".
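The translation above is mechanical enough to automate. This Python sketch, with a hypothetical `wildcard_to_regex` function, turns "?" into ".", "*" into ".*", and escapes every other character so it is matched literally. It assumes, as the leading and trailing "*" in the example suggest, that a wildcard must cover the whole field, so the result is anchored with "^" and "$". (Python's standard library also offers `fnmatch.translate()` for the same job, with a different output format.)

```python
import re

def wildcard_to_regex(wildcard: str) -> str:
    """Convert a wildcard like '*buy*online*' into an anchored pattern."""
    parts = []
    for ch in wildcard:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # any other char is literal
    return "^" + "".join(parts) + "$"

full = wildcard_to_regex("*buy*online*")
print(full)  # ^.*buy.*online.*$

subject = "Buy cheap watches ONLINE"
# The anchored translation and the shorter "buy.*online" agree under search().
print(bool(re.search(full, subject, re.I)),
      bool(re.search("buy.*online", subject, re.I)))  # True True
```

Dropping the leading and trailing ".*" together with the anchors gives exactly the shorter pattern used in the rule above.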


And that's all, folks! Now, you can rewrite all your rules as Regular Expressions.

But wait!!! Someone told you that Regular Expressions are powerful, and you see nothing new in the preceding paragraphs? That's because so far your rules contained only constant strings and wildcards, but in Reg Expr patterns you can write more useful wildcards and use classes of characters instead of constants.

++Vitoco
Last edited by vitoco on Tue Apr 13, 2004 4:29 pm, edited 2 times in total.


Post by vitoco » Sat Mar 20, 2004 3:34 am

Guide to Write Regular Expression Rules in PopTray (Part 2)

Let's start with the character classes. A class is a group of characters that have something in common. Some useful predefined classes in Regular Expressions are:

\w = Alphanumeric: "A" to "Z", "a" to "z", "0" to "9" or "_"
\d = Numeric: "0" to "9"
\s = Whitespace: a space, tab or newline (platform specific, PopTray uses CR/LF pairs)
. = Any available char except a newline (exception disabled in PopTray)

The code in the first column is what you put in your pattern to match one character of that class. Note that a "7" can be matched both by "\w" and by "\d".

For example, the rule:

Subject | Reg Expr | m\wster

will find "master", "Mister" and "M1STER", but not "M.Ster" in the subject. Remember that patterns are case insensitive in PopTray.
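A quick Python check of the classes and of the "m\wster" rule. Python's `\w` normally also accepts accented letters, so this sketch passes `re.ASCII` to keep the "A" to "Z", "a" to "z", "0" to "9", "_" definition given above, plus `re.IGNORECASE` to mimic PopTray. How PopTray's own engine treats accented letters is not covered here.

```python
import re

# ASCII-only classes, case-insensitive matching (mimicking PopTray).
FLAGS = re.ASCII | re.IGNORECASE

# A "7" belongs to both \w and \d.
print(bool(re.fullmatch(r"\w", "7")), bool(re.fullmatch(r"\d", "7")))  # True True

for subject in ["master", "Mister", "M1STER", "M.Ster"]:
    # Only "M.Ster" fails: "." is not an alphanumeric char.
    print(subject, bool(re.search(r"m\wster", subject, FLAGS)))
```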

There are also complementary classes for the ones described above:

\W = Non-Alphanumeric: anything except "A" to "Z", "a" to "z", "0" to "9" or "_"
\D = Non-Numeric: anything except "0" to "9"
\S = Non-Whitespace: anything except a space, tab or newline
\n = Only a newline, i.e., the one char that "." does not normally match (not used in PopTray, I think)

Now, you can write a pattern to find a version number, for example the rule:

Subject | Reg Expr | \D\d\.\d

will succeed if the subject contains something like "V1.0", "Webcopy/0.9beta7" or "PopTray_3.1.0", but not "v123.4". Note that the dot in the pattern is preceded by a "\": this means that I do not want to match any character, but a real dot, and in the "v123.4" string that position contains a "2". Also note that anything following the first digit after the dot is ignored. Guess why... The pattern says "a nondigit, followed by a digit, followed by a dot, followed by a digit", whatever surrounds it.
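The following Python sketch runs the version-number rule on the examples above and prints the part of each subject that satisfied the pattern. This shows why anything after the first digit following the dot is ignored. Python's `re` stands in for PopTray's engine.

```python
import re

VERSION = r"\D\d\.\d"

for subject in ["V1.0", "Webcopy/0.9beta7", "PopTray_3.1.0", "v123.4"]:
    m = re.search(VERSION, subject, re.IGNORECASE)
    # m.group() is the part of the subject that satisfied the pattern.
    print(subject, "->", m.group() if m else None)
```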

There are also custom classes: any characters you choose, written between "[" and "]", where you can use ranges to simplify the notation. Read a custom class as "any one of these enclosed characters". Look at this rule:

Subject | Reg Expr | x[0-9a-f][0-9a-f]

This will succeed if the Subject contains an "x" followed by at least two hexadecimal digits, as in "xFF" or "x41".
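Trying the custom class in Python. Again, `re.IGNORECASE` stands in for PopTray's case insensitivity, which is what lets "[0-9a-f]" accept "F".

```python
import re

HEX_PAIR = r"x[0-9a-f][0-9a-f]"

for subject in ["xFF", "Code x41 received", "xG1", "x7"]:
    # "xG1" fails ("G" is not a hex digit); "x7" has only one digit.
    print(subject, bool(re.search(HEX_PAIR, subject, re.IGNORECASE)))
```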

To write the complement of a custom class, put a "^" as the first character inside the brackets; the class is then read as "none of these enclosed characters". An example?

Subject | Reg Expr | m[^aeiou]ster

will not succeed for "master" or "MISTER", but it will for "M.Ster" and "mystery".
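The complemented class, checked in Python under the same assumptions. With case-insensitive matching, "[^aeiou]" also rejects the uppercase "I" in "MISTER".

```python
import re

NO_VOWEL = r"m[^aeiou]ster"

for subject in ["master", "MISTER", "M.Ster", "mystery"]:
    print(subject, bool(re.search(NO_VOWEL, subject, re.IGNORECASE)))
```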

Be careful: if you want to find subjects without an email address in them, the following rule will not work as expected:

Subject | Reg Expr | [^@]

because it says "any character that is not @". The subject "my_account@yahoo.com" has 19 characters that are not "@", so the rule succeeds for it, and in fact for almost every subject that has at least one character other than "@", whether or not it contains an email address. Instead, you should write:

Subject | Reg Expr | @

and set the Not option at the end of the rule line in the PopTray Rules window. But note that this "@" pattern also matches "SP@M", which is not an email address, so such subjects will be excluded too.
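This Python sketch contrasts the two rules. The `negate` parameter of the hypothetical `rule` helper plays the role of PopTray's Not option.

```python
import re

def rule(pattern: str, field: str, negate: bool = False) -> bool:
    # Hypothetical helper; negate=True acts like PopTray's "Not" option.
    found = re.search(pattern, field, re.IGNORECASE) is not None
    return found != negate

subjects = ["my_account@yahoo.com", "Hello there", "SP@M"]
print([rule(r"[^@]", s) for s in subjects])            # [True, True, True]
print([rule(r"@", s, negate=True) for s in subjects])  # [False, True, False]
```

The last `False` is the "SP@M" case: it has no email address, but the "@" pattern still excludes it.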

If you want to include a literal "-", "^" or "]" inside a custom class, precede it with a "\" as usual.

There are some more complex character classes and signs with special meanings, but I'll stop here on this topic for now.

++Vitoco
Last edited by vitoco on Tue Apr 13, 2004 4:37 pm, edited 2 times in total.


Post by vitoco » Sat Mar 20, 2004 3:54 am

Guide to Write Regular Expression Rules in PopTray (Part 3)

Now, let's see iterators.

Well, the "*" in Regular Expression patterns has some more magic built in. It really means "the previous char, zero or more times". As "." means "any character", ".*" means "any char, zero or more times".

How can you search for "one or more chars" instead of "zero or more chars"? Using old wildcards you would write "?*", and using patterns you could write "..*", right? This works, but there is a special way to say this in a pattern: ".+". So, the plus sign "+" really means "the previous char, one or more times", or in other words, "the previous char, at least once".

Our search for a version number in the subject can be written as:

Subject | Reg Expr | \D\d+\.\d+

so now we can also find "v123.4", because we accept one or more digits on each side of the dot.
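The "+" version of the rule in Python, printing the matched part of each subject (Python's `re` as a stand-in for PopTray's engine):

```python
import re

VERSION_PLUS = r"\D\d+\.\d+"

for subject in ["V1.0", "v123.4", "PopTray_3.1.0"]:
    m = re.search(VERSION_PLUS, subject, re.IGNORECASE)
    print(subject, "->", m.group() if m else None)
```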

Another way to repeat something is to follow it with a count in curly brackets. This can be specified in many forms like the following examples:

{2} Exactly two times
{1,3} From one to three times
{5,} At least five times

For example, you may want to search for some exclamation signs using:

Subject | Reg Expr | !{3}

to find subjects that have three exclamation signs together, as in "Help!!!", but not "!HELP!ME!". Since the pattern can match anywhere in the subject, "Help!!!!" (four together) will also succeed.
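In Python, the "{3}" rule behaves the same way: it looks for three exclamation signs in a row anywhere in the subject. If you really need exactly three, one possible sketch (not one of the original rules) requires either a non-"!" character or an end of the text on each side: `(^|[^!])!{3}([^!]|$)`.

```python
import re

AT_LEAST_THREE = r"!{3}"
EXACTLY_THREE = r"(^|[^!])!{3}([^!]|$)"

for subject in ["Help!!!", "Help!!!!", "!HELP!ME!"]:
    print(subject,
          bool(re.search(AT_LEAST_THREE, subject)),
          bool(re.search(EXACTLY_THREE, subject)))
```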

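Note that you have the following equivalences (here "?" is used as an iterator: it means "the previous char or group, zero or one time", i.e., optional):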

? = {0,1}
+ = {1,}
* = {0,}

But how can I repeat some strings or sub-patterns instead of just a character or class? The answer is parentheses for grouping. An example:

Subject | Reg Expr | Please respond this( urgent)? mail

"urgent" is optional in this subject. It is not a bug the place where I put the "(", because the leading space is also optional, or the subject without "urgent" should require two spaces between "this" and "mail". Did you notice that?

Sometimes you need to accept one of several exact words. The operator for that is "|".

Subject | Reg Expr | Please respond this (very,? )*(urgent|important|special)? ?mail

Wow, too many ideas in one example. And you got all of them, right? Let's see... "very" is optional and can appear many times consecutively in the subject, each one followed by an optional comma and a space. At most one of the other three words may be present, because there is also a "?" after that group of alternatives. Finally, the "?" following the last space is just a trick to avoid writing that space inside the parentheses for each alternative.
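Checking the alternatives example in Python, with Python's `re` standing in for PopTray's engine. The last subject fails because at most one word from the group may appear.

```python
import re

PATTERN = r"Please respond this (very,? )*(urgent|important|special)? ?mail"

subjects = [
    "Please respond this mail",
    "Please respond this very, very important mail",
    "Please respond this very very very urgent mail",
    "Please respond this urgent important mail",
]
for s in subjects:
    print(bool(re.search(PATTERN, s, re.IGNORECASE)), s)
```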

Another example? A modification of the buy-online example:

Subject | Reg Expr | (buy|purchase|rent|try).*(online|now|today|cheap)

This will succeed if it finds one word of each group in the subject, in that order, but with or without some other words between them. If the order is not required we can change that to:

Subject | Reg Expr | ((buy|purchase|rent|try|online|now|today|cheap).*){2}

to succeed if there are two of these words in the subject. As you can see, it is possible to put groups inside groups. Mmmm... not so bad, but it will also succeed with "I made a snowman for my parents!", because "now" hides in "snowman" and "rent" in "parents". :-(
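Both versions of the buy-online rule, sketched in Python with a made-up `hit` helper (case-insensitive search, as an approximation of PopTray):

```python
import re

ORDERED = r"(buy|purchase|rent|try).*(online|now|today|cheap)"
ANY_TWO = r"((buy|purchase|rent|try|online|now|today|cheap).*){2}"

def hit(pattern: str, subject: str) -> bool:
    return re.search(pattern, subject, re.IGNORECASE) is not None

print(hit(ORDERED, "Buy watches online"))                # True
print(hit(ORDERED, "Online offer: buy watches"))         # False (wrong order)
print(hit(ANY_TWO, "Online offer: buy watches"))         # True (any order)
print(hit(ANY_TWO, "I made a snowman for my parents!"))  # True ("now", "rent")
```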

Now we can combine all of the above to write a useful rule. Suppose that you are a member of some mailing lists, and all those lists insert a tag in the subject. PopTray can notify you with a balloon using this rule:

Subject | Reg Expr | ^[\(\[]?((Re|Fwd|FW):\s*)*\[[\w\-]+\]

This will succeed with any of the following:

[PopTray] New release!
Re: [Linux-Windows] please no more flames.
(FWD: Re: [Origami] try this model)

and it won't with:

Try our brand new [mustard] at home!
[our meeting is at 9:30]
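The mailing-list rule, tested in Python against the subjects listed above. Python's `re` is an approximation of PopTray's engine, and `re.IGNORECASE` is what lets "Fwd" accept "FWD".

```python
import re

LIST_TAG = r"^[\(\[]?((Re|Fwd|FW):\s*)*\[[\w\-]+\]"

should_match = [
    "[PopTray] New release!",
    "Re: [Linux-Windows] please no more flames.",
    "(FWD: Re: [Origami] try this model)",
]
should_not_match = [
    "Try our brand new [mustard] at home!",  # tag is not at the start
    "[our meeting is at 9:30]",              # spaces are not allowed in a tag
]
for s in should_match + should_not_match:
    print(bool(re.search(LIST_TAG, s, re.IGNORECASE)), s)
```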

As you can see in this example:

- you can write groups inside groups (nesting expressions).
- you can put built-in classes inside custom classes (only "\s", "\d" and "\w" are supported).
- you can write a custom class to enhance a predefined one (e.g., adding "-" to "\w").
- you must escape special characters to match them literally.
- you can use "*" for optional repetitions, but a "+" if at least one item is required.

Now, try by yourself to rewrite the example that finds hexadecimal numbers in the subject, but with the condition that an even number of digits is required, followed by a space... The answer is at the end of this article. ;)

++Vitoco
Last edited by vitoco on Tue Apr 13, 2004 4:41 pm, edited 2 times in total.


Post by vitoco » Sat Mar 20, 2004 4:08 am

Guide to Write Regular Expression Rules in PopTray (Part 4)

About SPAM

If you want to control spam using Regular Expressions in PopTray rules, your results will be poor, because advanced spammers use many tricks, like inserting symbols (between|instead of) alphabetic characters. It's better to have some kind of antispam filter somewhere between PopTray and your mail server, and then check for the flags generated by that software. If you don't have one, you can still try using classes and other tricks. 8)

Keep this in mind; I'll tell you more about it later.


General Recommendations

- A class with a "+" iterator does not mean the same char repeated one or more times; it means a sequence of one or more chars of that class, so "\w+" may stand for a full word, without punctuation inside it.

- Instead of a space in a pattern, write a "\s+" to also match tabs and multiple spaces together.

- Remember that there are some characters that have a special meaning in a pattern, and you must precede them with a "\" if you want to search for them in your fields.
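The first two recommendations, tried in Python (as a stand-in for PopTray's engine):

```python
import re

subjects = ["Special offer", "Special  offer", "Special\toffer"]
print([bool(re.search(r"special offer", s, re.I)) for s in subjects])    # [True, False, False]
print([bool(re.search(r"special\s+offer", s, re.I)) for s in subjects])  # [True, True, True]

# "\w+" picks out whole words; punctuation splits them.
print(re.findall(r"\w+", "Re: [Linux-Windows] flames!"))  # ['Re', 'Linux', 'Windows', 'flames']
```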


These are the special chars in patterns:

\ Allows you to search for the following char without its special meaning.
. Any character (except newline; this exception is disabled in PopTray).
^ Text begins with the following (as the first char inside a class, it negates the class).
$ Text ends with the preceding.
* Zero or more times the preceding char or group.
+ One or more times the preceding char or group.
? The preceding char or group may appear or not.
( Begins a sub-pattern.
| Delimits patterns in an OR way.
) Ends a sub-pattern.
[ Begins a class of characters.
- Delimits a range of chars (ASCII code) inside a class.
] Ends a class of characters.
{ Begins a required count of the preceding char or group.
, Delimits a range of counts.
} Ends a required count of the preceding char or group.
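If you build a pattern from literal text, such as a price or a subject you copied, every special char from the list above needs a "\". This Python sketch does it with a hypothetical `escape_literal` function that escapes exactly the chars listed above. Python's own `re.escape()` does the same job, but escapes a slightly different set of characters.

```python
import re

# The special chars from the list above (assumed to be the full set).
SPECIAL = set("\\.^$*+?()|[-]{,}")

def escape_literal(text: str) -> str:
    """Precede every special char with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

pattern = escape_literal("Price: $9.99 (50% off!)")
print(pattern)  # Price: \$9\.99 \(50% off!\)
print(re.search(pattern, "Price: $9.99 (50% off!) today") is not None)  # True
```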


Oh! Here is one possible answer to the exercise at the end of the previous part:

Subject | Reg Expr | x([0-9a-f]{2})+\s
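Checking the answer in Python (as a stand-in for PopTray's engine). Pairs of hex digits repeat, so "xFFF" fails, and the trailing "\s" rejects a number at the very end of the subject.

```python
import re

EVEN_HEX = r"x([0-9a-f]{2})+\s"

for subject in ["Code xFF here", "Code x1234 here", "Code xFFF here", "Code xFF"]:
    print(repr(subject), bool(re.search(EVEN_HEX, subject, re.IGNORECASE)))
```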

And if you want an example for spam control, here is a tricky one, but very simple if you really understood this article up to this point:

Subject | Reg Expr | (\\/|v)_*\W*_*[iíïìî1l!¡:\|]+_*\W*_*[aáäàâ4@]+_*\W*_*[gr6]+_*\W*_*[GR6]+_*\W*_*[aáäàâ4@]+_*(\W.*)?$

without any space in the Reg Expr pattern. While reading it, look for the matching parentheses, the escaped chars ("\\" and "\|"), the built-in class "\W" repeated between the letters, and the "or" separator "|" in the first group.
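The spam pattern, joined onto one line and tested in Python (again only an approximation of PopTray's engine). The pattern is split into two adjacent raw strings just to keep the code lines short; Python concatenates them into the exact single-line pattern above. The sample subjects are invented for illustration.

```python
import re

SPAM = (r"(\\/|v)_*\W*_*[iíïìî1l!¡:\|]+_*\W*_*[aáäàâ4@]+_*\W*_*"
        r"[gr6]+_*\W*_*[GR6]+_*\W*_*[aáäàâ4@]+_*(\W.*)?$")

samples = ["VIAGRA", "V-1-A-G-R-A !!!", "\\/iagra",
           "Cheap v.i.a.g.r.a here", "viagrafalls", "Vitamins for everyone"]
for s in samples:
    # "viagrafalls" fails: after the last "a" only a non-word char or the end is allowed.
    print(f"{s!r:28} {bool(re.search(SPAM, s, re.IGNORECASE))}")
```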

++Vitoco
