Regular expression to catch re[4] in subject?

henchard
Still here
Posts: 19
Joined: Fri Apr 16, 2004 9:54 am
Location: Dorset UK

Regular expression to catch re[4] in subject?

Post by henchard » Sat Jan 01, 2005 1:36 pm

Happy new year. Whilst a tad hung over I'm not thinking too straight so can someone more in tune with Regular expressions help?

I'm getting a number of emails with re[x] in the subject line where x can seem to be any number from one upwards. At present I use subject|contains|re[x] putting each number for x on a separate line in my rule.

Is there a regular expression that could catch all instances where this (i.e. x is any number) is in the subject line?

Thanks

KY Dave
Not the Developer
Posts: 1599
Joined: Thu Mar 14, 2002 7:29 pm
Location: Burkesville, KY. U.S.A.

Re: Regular expression to catch re[4] in subject?

Post by KY Dave » Sat Jan 01, 2005 2:36 pm

henchard wrote: I'm getting a number of emails with re[x] in the subject line where x can seem to be any number from one upwards. At present I use subject|contains|re[x] putting each number for x on a separate line in my rule.

Is there a regular expression that could catch all instances where this (i.e x is any number) is in the subject line?
I don't know about REG EXPR but you can do it with a wildcard.
Use a RULE with the CRITERIA of SUBJECT, WILDCARD, re[*
SAVE RULES, QUIT PopTray and RESTART for changes to take effect.
KY Dave

Family Blog
You can STOP SPAM using PopFile and PopTray.

Vanguard
Enthusiast
Posts: 40
Joined: Tue Oct 21, 2003 10:36 am

Post by Vanguard » Sat Jan 01, 2005 9:50 pm

Rule:
Area = Subject
Compare = Reg Expr
Text = ^re\[.*\]:
Not = unchecked

^ = Start the match at the beginning of the tested string.
re = Well, that's the prefix string you're looking for.
\[ = The left bracket, which you need to escape using the "\" because unescaped brackets denote a character class, as in "[a-z0-9]", meaning to match on any *single* alphanumeric character (you would use "[a-z0-9]*" to mean zero or more occurrences of an alphanumeric and "[a-z0-9]+" to mean one or more occurrences).

I'm brand new to PopTray (got it yesterday). I'm assuming the author uses PCRE (Perl Compatible Regular Expressions), although I see in the sticky post in the forum that all tests are case-insensitive, so you don't need to use a match spec like "^[rR][eE]\[.*\]:". Note that the ".*" construct doesn't match on just alphanumeric characters but on any character, so a Subject with "re[go$ofy--#]:" would also match. If you want to match on only alphanumerics enclosed within the brackets, use:

^re\[[a-z0-9]+\]:

where,

^ = Start match at start of tested string.
re = The prefix string you're looking for (case-insensitive for PopTray).
\[ = The left bracket (the "\" escapes the normal use of the bracket so it gets used as just a character).
[a-z0-9]+ = One, or more, occurrences of any alphanumeric character.
\] = Right bracket
: = Just another character in the string you want to find.

Note that this will not match on "re[]:". If you want to include that, too, then use "^re\[[a-z0-9]*\]:". It is easy to get lost in regular expressions, especially because of the need to escape any special characters.
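For anyone who wants to try these patterns outside PopTray, here is a minimal Python sketch. It assumes Python's re module with re.IGNORECASE behaves like PopTray's case-insensitive matching for these simple patterns, which may not hold if PopTray uses a different regex flavor. The sample subjects and the helper name hits are made up for illustration.

```python
import re

# Patterns from the post above. re.IGNORECASE stands in for PopTray's
# case-insensitive matching (an assumption about its regex engine).
ANY_CHARS = re.compile(r"^re\[.*\]:", re.IGNORECASE)
ALNUM_ONE_OR_MORE = re.compile(r"^re\[[a-z0-9]+\]:", re.IGNORECASE)
ALNUM_ZERO_OR_MORE = re.compile(r"^re\[[a-z0-9]*\]:", re.IGNORECASE)


def hits(pattern, subject):
    """Return True if the pattern is found in the subject."""
    return pattern.search(subject) is not None


# Made-up sample subjects.
for subject in ["Re[4]: hello", "re[go$ofy--#]: hi", "re[]: empty", "Re: normal reply"]:
    print(
        subject,
        hits(ANY_CHARS, subject),
        hits(ALNUM_ONE_OR_MORE, subject),
        hits(ALNUM_ZERO_OR_MORE, subject),
    )
```

Only the ".*" version fires on "re[go$ofy--#]:", and only the "+" version skips "re[]:".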

Be aware that wildcards (glob constructs) are not the same as quantifiers in regular expressions. Because they have somewhat similar meanings, users get confused between the two and when to use them. For example:

"that*string"

when wildcarding will match on "thatxxxxxstring". The wildcard is a placeholder marker for a variable length insert point. However, it will NOT behave the same if used as a regular expression, where "*" is a quantifier which means zero or more occurrences of the PRECEDING character or group.

When using "that*string" in wildcarded string matching:
- It will NOT match on "thastring" (because "*" is a placeholder, not a quantifier, so the literal "that" must be present).
- It will match on "thattttttstring" (a bunch of "t" between "that" and "string").
- It will match on "thatxxxxxstring" (because any characters are allowed at the insert point).

When using "that*string" where "*" is a quantifier in a regular expression:
- It WILL match on "thastring" (because "t*" means zero or more occurrences of the second "t", and there are zero occurrences in this case).
- It will match on "thattttttstring" (because there are "or more" occurrences from the "t*" construct).
- It will NOT match on "thatxxxxxstring" (the "t*" construct only permits zero or more occurrences of the "t" character, not of any character).
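The two readings of "that*string" can be compared side by side with a short Python sketch. Python's fnmatch module is used here only as a stand-in for wildcard (glob) matching; it is an assumption that PopTray's Wildcard compare behaves the same way for this simple pattern.

```python
import re
from fnmatch import fnmatchcase

PATTERN = "that*string"


def as_wildcard(text):
    # Glob reading: "*" stands for any run of characters (possibly empty).
    return fnmatchcase(text, PATTERN)


def as_regex(text):
    # Regex reading: "*" repeats the preceding "t" zero or more times.
    return re.fullmatch(PATTERN, text) is not None


for text in ["thastring", "thattttttstring", "thatxxxxxstring"]:
    print(f"{text:16} wildcard={as_wildcard(text)} regex={as_regex(text)}")
```

The same three test strings from the lists above give opposite answers for "thastring" and "thatxxxxxstring", which is the whole point of the distinction.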

So you will get different match results depending on whether you wildcard in a string search or you use quantifiers in a regular expression. Alas, it seems everyone that implements regular expressions has their own way of doing so or has their own peculiar flavor of regular expressions. It is possible that the author doesn't use PCRE. For example, I've seen other characters than the "\" used as the escape character. For PopTray, the sticky post says all matching is case insensitive. Well, that means instead of using:

^[rR][eE]\[[a-zA-Z0-9]*\]:

which would match on "re", "RE", "rE", and "Re" at the start of the string, followed by a left bracket, followed by zero, or more, of any alphanumeric characters, followed by a right bracket, and followed by a colon, you can just use:

^re\[[a-z0-9]*\]:

If you want to search on only numbers within the brackets, use:

^re\[[0-9]*\]:

And if you don't want to include matches where there wasn't anything within the brackets, as in "re[]:", then use:

^re\[[0-9]+\]:

This works because the "+" quantifier means ONE or more occurrences instead of zero or more.
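The difference between the two digit-only patterns can be checked with a small Python sketch. As before, re.IGNORECASE is an assumption standing in for PopTray's case-insensitive matching, and the sample subjects and the helper name matches are invented for the test.

```python
import re

# Digits only inside the brackets; "*" allows empty brackets, "+" does not.
ZERO_OR_MORE_DIGITS = re.compile(r"^re\[[0-9]*\]:", re.IGNORECASE)
ONE_OR_MORE_DIGITS = re.compile(r"^re\[[0-9]+\]:", re.IGNORECASE)


def matches(pattern, subject):
    """Return True if the pattern is found in the subject."""
    return pattern.search(subject) is not None


# Made-up sample subjects.
for subject in ["Re[4]: offer", "RE[123]: offer", "re[]: offer", "re[abc]: offer"]:
    print(
        subject,
        matches(ZERO_OR_MORE_DIGITS, subject),
        matches(ONE_OR_MORE_DIGITS, subject),
    )
```

Both patterns reject "re[abc]:", and they disagree only on "re[]:".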

You might just want to stick with wildcarded string matching by using a rule like:

Rule:
Area = Subject
Compare = Wildcard
Text = re[*]:
Not = unchecked

Although, in your case, you get messages with numbers between the brackets, it's probably not important that they are numbers. The fact that there are square brackets after the "Re" is important because most replies are prefixed with "Re:", not "Re[something]:". However, when using wildcarding, this rule would also match on "How do I check for Re[number]: in the Subject?". That's why you end up using regular expressions to be very specific WHERE to find the substring.

Like I said, I'm completely new at using PopTray so it might have its own peculiar flavor of regular expressions that I'm not yet familiar with.

KY Dave
Not the Developer
Posts: 1599
Joined: Thu Mar 14, 2002 7:29 pm
Location: Burkesville, KY. U.S.A.

Post by KY Dave » Sat Jan 01, 2005 11:06 pm

Vanguard wrote: The fact that there are square brackets after the "Re" is important because most replies are prefixed with "Re:", not "Re[something]:". However, when using wildcarding, this rule would also match on "How do I check for Re[number]: in the Subject?". That's why you end up using regular expressions to be very specific WHERE to find the substring.
By using the criteria I had suggested, RE[* or as you suggested RE:*, the rule would NOT match your example of "How do I check for Re[number]:".

It would only match where the RE[ or RE: were the first 3 characters of the subject with any number of characters after.

In order to match your example you would need the criteria of *RE[* or *RE:*
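The anchoring behaviour described here can be modelled with a few lines of Python. This is only a sketch of the behaviour as described in this thread, not PopTray's actual code: the function wildcard_match is hypothetical, and it assumes "*" is the only special character, that every other character (including "[") is literal, and that matching is case-insensitive and covers the whole subject.

```python
import re


def wildcard_match(pattern, subject):
    """Hypothetical model of an anchored wildcard compare.

    "*" matches any run of characters; everything else is literal.
    The pattern must cover the whole subject, which is what makes a
    pattern without a leading "*" match only at the start.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, subject, re.IGNORECASE | re.DOTALL) is not None


embedded = "How do I check for Re[number]: in the Subject?"
print(wildcard_match("re[*", "Re[4]: hello"))  # subject starts with re[
print(wildcard_match("re[*", embedded))        # re[ is not at the start
print(wildcard_match("*re[*", embedded))       # leading * allows any prefix
```

Python's own fnmatch module is deliberately not used for this one, because it treats "[" as the start of a character class, which would change the meaning of a pattern like re[*]:.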

Vanguard
Enthusiast
Posts: 40
Joined: Tue Oct 21, 2003 10:36 am

Post by Vanguard » Sat Jan 01, 2005 11:32 pm

Ah, I didn't realize that the wildcarded string matching actually anchored the string. So, as you say, using wildcarding with "re[*]: " would work (I included the closing bracket, colon, and space because perhaps henchard doesn't want to fire on "re[somestringWithNoClosingBracketAndMissingAcolon", although I did assume there would be a colon even though henchard didn't show one). Since wildcarding anchors the match string, you should also be able to look for a string at the end, like "*string". Well, that makes wildcarding a bit more useful. Thanks for the heads up. Still learning PopTray.

henchard
Still here
Posts: 19
Joined: Fri Apr 16, 2004 9:54 am
Location: Dorset UK

Thanks

Post by henchard » Sun Jan 02, 2005 12:42 pm

Thanks guys

KY Dave's solution does the trick nicely (I had forgotten/never used the wild card option). I find regular expressions very useful but have to think quite hard (and probably too logically for me) but thanks to Vanguard also for a fairly detailed explanation even if I didn't use it on this occasion.

lemming
Groupie
Posts: 55
Joined: Sun Jan 09, 2005 3:51 am
Location: Malaysia

catching only re[4]

Post by lemming » Fri Jan 14, 2005 6:21 am

I have a similar problem, though my spams consist of only re[4], with no other text after that.

It's fairly simple to catch such subject headings with a regex. I use

^re ?\[\d{1,2}\]:$

This will catch everything from re[0]: to re[99]:.

^re limits the search to only subjects that begin with re.

" ?" (a space followed by a question mark) makes a single space optional, so it will also catch
re [4]:

$ anchors the match at the end of the subject, so the rule only catches subjects that consist of nothing but re[4]:

Note that some e-mail programs do generate valid subject headings like re[4]:

This happens when you reply to a reply. So the program may turn

re: re: re: re: finalize presentation

into

re[4]: finalize presentation

That's why I use the $ for my regex.
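A quick Python sketch shows what the trailing $ buys you. It assumes Python's re module with re.IGNORECASE is close enough to PopTray's regex engine for this pattern; the helper name is_bare_reply_counter is made up.

```python
import re

# The regex from this post; re.IGNORECASE mimics PopTray's
# case-insensitive matching (an assumption about its engine).
BARE_COUNTER = re.compile(r"^re ?\[\d{1,2}\]:$", re.IGNORECASE)


def is_bare_reply_counter(subject):
    """True when the subject is nothing but re[N]: with N of 1-2 digits."""
    return BARE_COUNTER.search(subject) is not None


for subject in ["re[4]:", "Re [99]:", "re[4]: finalize presentation", "re[100]:"]:
    print(repr(subject), is_bare_reply_counter(subject))
```

The legitimate "re[4]: finalize presentation" is left alone because text follows the colon, and "re[100]:" is skipped because \d{1,2} allows at most two digits.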

-Lemming.

Bateman
PopTray Family
Posts: 664
Joined: Sun Nov 11, 2001 9:53 pm
Location: Germany

Post by Bateman » Sat Aug 13, 2005 2:11 pm

Since RegExp are still pretty confusing (to me), maybe some of the geniuses over here can help.

Problem:
I get flooded by spam mails starting with e.g. Re: [0/18]: YOu don't need.... The figures in brackets are random, most of the time followed by the usual crap like enlargement offers, medical BS, invoice or mortgage stuff and the like.

Question:
Can somebody please write me a nice RegExp that filters these Re: [??/??] subject lines?

Thanks in advance :D

lemming
Groupie
Posts: 55
Joined: Sun Jan 09, 2005 3:51 am
Location: Malaysia

Post by lemming » Sun Aug 14, 2005 6:55 pm

Hi bateman, a slight variation of the previous regex should do the trick:

^re:? ?\[\d{1,2}(\\|/)\d{1,2}\]:

The (\\|/) section matches either a backslash or a forward slash between the two numbers.

This should match the following:

Re: [0/18]:
Re:[0/18]:
Re[11/20]:
Re [23/1]:
Re[11\20]:

and all variants. However, you should test this rule first by having it "mark as spam" instead of deleting.
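Before trusting the rule in PopTray, the pattern can be tried against the sample subjects in Python. This is only a sketch: it assumes Python's re module with re.IGNORECASE matches the way PopTray does for this pattern, and the non-spam subjects and the helper name looks_like_slash_counter are invented for the test.

```python
import re

# The regex from this post, with re.IGNORECASE standing in for
# PopTray's case-insensitive matching (an assumption).
SLASH_COUNTER = re.compile(r"^re:? ?\[\d{1,2}(\\|/)\d{1,2}\]:", re.IGNORECASE)


def looks_like_slash_counter(subject):
    """True when the subject starts with Re, then [NN/NN]: or [NN\\NN]:."""
    return SLASH_COUNTER.search(subject) is not None


should_match = [
    "Re: [0/18]: YOu don't need",
    "Re:[0/18]:",
    "Re[11/20]:",
    "Re [23/1]:",
    "Re[11\\20]:",  # a single backslash in the actual subject
]
should_not_match = [
    "Re: finalize presentation",
    "Re[4]: finalize presentation",
]

for subject in should_match + should_not_match:
    print(repr(subject), looks_like_slash_counter(subject))
```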

-Lemming 8)

Bateman
PopTray Family
Posts: 664
Joined: Sun Nov 11, 2001 9:53 pm
Location: Germany

Post by Bateman » Mon Aug 15, 2005 1:39 pm

Great, thanks a lot! I will test the rule and keep you updated.

Bateman
PopTray Family
Posts: 664
Joined: Sun Nov 11, 2001 9:53 pm
Location: Germany

Post by Bateman » Fri Aug 19, 2005 1:43 pm

Update:
Although I don't have to suffer that much from these annoying mails anymore, the few that still arrived were perfectly caught by that regex.

Thanks again, lemming. Works like a charm.
