<Deleted> | <Deleted> | 2010-04-25 08:41:21 |
|
Actually, you bring up a good point. | by kelli217 | 2010-04-25 08:23:03 |
The way it's been explained before... | by Sharku | 2010-04-25 09:08:21 |
...is that spiders aren't really that smart: they'll trigger on anything that matches [[:alpha:]]+(\.[[:alpha:]]+)* at-sign [[:alpha:]]+(\.[[:alpha:]]+)*
I think that's a comprehensive regex for it, not sure though and too lazy to actually check. Anyway, long story short: no, they don't check whether the TLD is valid. Mind, I'm not claiming to be authoritative on this, it's just what I remember as being the lore around here wrt email spiders, gleaned from both other UFies and TPTB (Illiad, Myke...).
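For the curious, here's roughly what such a dumb matcher could look like in Python. This is only a sketch of the pattern above, not anything a real harvester is known to run: Python's re module has no [[:alpha:]] class, so I'm assuming plain ASCII letters ([A-Za-z]) instead, and the names NAIVE_EMAIL and harvest are made up for illustration.

```python
import re

# Assumed stand-in for the pattern above: letters-only labels separated by
# dots on both sides of the at-sign. [A-Za-z] replaces [[:alpha:]] because
# Python's re module does not support POSIX character classes.
NAIVE_EMAIL = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*@[A-Za-z]+(?:\.[A-Za-z]+)*")

def harvest(text):
    """Return every substring that merely looks like an address.

    No TLD check, no validation of any kind: whatever matches gets collected.
    """
    return NAIVE_EMAIL.findall(text)
```

Note that a ROT13'd address still matches, so a spider like this would happily collect the scrambled string as if it were a real address.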
It makes sense from a spammer's point of view to keep the matching rules as simple as possible: the goal is mostly high throughput, not so much data validation. Since both email harvesting and spamming are mostly done by botnets anyway, you could argue that they have ample CPU and bandwidth available; but then you have to take into account that while it's fairly easy for a human to recognize or determine from the context whether something is ROT13 (or ROT1 through ROT25, or any other obfuscation method), there's probably no easy way for a computer to do so other than to bruteforce it and check the result against a list of known TLDs. And even that is ambiguous: a quick check on Wikipedia tells me there are at least two symmetrical pairs of TLDs. .am (Armenia) ROT13s into .nz (New Zealand) and vice versa, and .mz (Mozambique) and .zm (Zambia) do the same.
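To make the bruteforce idea concrete, here's a quick Python sketch. Everything in it is an assumption for illustration: KNOWN_TLDS is a tiny hand-picked sample rather than a real TLD list, rot and candidate_decodings are made-up names, and only the letters are rotated while dots and the at-sign are left alone.

```python
import string

# Tiny illustrative sample, NOT a real TLD list; a spammer would have to
# maintain the full one.
KNOWN_TLDS = {"com", "org", "net", "am", "nz", "mz", "zm"}

def rot(text, n):
    """Rotate ASCII letters by n positions; leave everything else untouched."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[n:] + lower[:n] + upper[n:] + upper[:n],
    )
    return text.translate(table)

def candidate_decodings(address):
    """Try all 26 rotations and keep those whose last label is a known TLD."""
    hits = []
    for n in range(26):
        decoded = rot(address, n)
        tld = decoded.rsplit(".", 1)[-1].lower()
        if tld in KNOWN_TLDS:
            hits.append((n, decoded))
    return hits
```

With this sample list, candidate_decodings("user@example.am") comes back with two hits, the untouched address (rotation 0) and the ROT13 version ending in .nz, and the code has no way of telling which one is the real address.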
Finally, checking against a list of TLDs adds a layer of responsibility for the spammer: they have to maintain that list for when new TLDs get added or old ones are deprecated.
Again, not saying I'm authoritative on any of this, just giving a coder's view on it. "How I'd do it if I were a spammer" if you will.