"…and not for five minutes will I be distracted from the wonder…"

Sitting

Uncategorized — d-ashes on August 31, 2005 at 6:50 pm

New Orleans is only an hour from here, but it might as well be
another world now. I can’t begin to imagine what people who lived
there, much less people who grew up there and still lived there, are
feeling right now. It is immeasurable, as are some of the more
quantifiable results of Katrina’s devastation. The death toll, it
seems, stands to be large. So does the amount of time it will take
to drain New Orleans, and longer still before more than half a million
people can return to…what? Nothing, really. Such is also the case
with many towns along the Mississippi Gulf Coast. The national news
showed pictures of what used to be Pass Christian, which was right in
Katrina’s path, and it too is just gone.

Even when you factor in the sensationalism that the media brings to
covering such events, it’s overwhelming. It’s to the point, really,
where being overwhelmed is itself overwhelming. Every new set of
pictures or camera footage I see just leaves me staring in disbelief.
According to the news, Baton Rouge’s population stands to double in
the next couple of days. Once again, factoring in media embellishment,
that still means there are going to be a lot more people here.

Kelly and I will find somewhere in Baton Rouge to volunteer tomorrow.
I think that will help a bit: to get a small, realistic purchase on
one part of this huge thing and do something to help rather than sit
at home and let all of this wash over us. Steve and his mother will
come stay with us at some point in the next week, I’m sure, so I’ll get
his take on the whole thing. I finally got in touch with him yesterday
and was thankful to find he had left (Steve can be hard-headed about
his home turf sometimes). His parents’ house, where his mother lived,
was in Kenner, which was hit pretty hard early on, so it’s most likely
gone.

Most immediately, though, it seems that just about everyone I know
who was living in NO or on the coast is accounted for, which is what
matters most. Most of my college friends from the area were already up
in Brookhaven at Brad Boerner’s for the fantasy football draft party
before the evacuation was ever called. The house here is back to
normal with power and cable (though I’m loath to turn on the TV now,
knowing what I’ll see), and I have to give a big thumbs up to the
Baton Rouge utility services. They worked quite quickly, and we count
ourselves lucky to have our power back so soon after the storm. The
place I work is still without power. We got the servers up and running
on generators today, and with the internet back up at the house I will
be working from here until we get full power back at the office.

Katrina Cometh

Uncategorized — d-ashes on August 28, 2005 at 8:47 am

Well, it looks like time has almost run out for Katrina to change
direction, though it’s not very nice to wish that kind of devastation
on anyone else. It seems that all the doomsday talk of the ‘perfect
storm’ that could practically sink New Orleans wasn’t so far-fetched
after all. I’d been following the tracking charts pretty closely (the
design firm I work for created one of them, so I spend a fair amount
of time with it), but it wasn’t until yesterday evening that I saw a
satellite image of Katrina. Damn, the girl’s big, for sure.

While Baton Rouge stands to get a lot of rain and maybe some wind,
the fact that we’re on the weaker, western edge of the hurricane means
we’ll probably be okay aside from power outages (unless a tree takes up
residence in the house, that is). But say a little prayer for New
Orleans, friends, and the people who can’t get out. I think they may
need it.

In the meantime, it has been an absolutely beautiful day here. The
‘calm before the storm’ has brought a nice breeze and temperatures in
the low 80s. I guess if we have to consider the possibility of as much
as a week without power, then having some early fall weather beforehand
is some consolation, though not a great one.

Comment Spam and PHP’s similar_text()

Uncategorized — d-ashes on August 21, 2005 at 8:22 pm

If you’ve wandered by the site in the last month or so (not that
I’ve given you much reason to, admittedly), then you may have noticed
that I now have a small comment spam problem. I’m only getting between
7 and 10 a week, but it is annoying and will probably only increase.
So this morning I got up and started looking at ways that PHP can
automatically filter out at least some of the spam I’m receiving. The
most prevalent method I’ve found is to create an array of keywords to
search for; when a submitted comment contains any of those keywords,
it is identified as spam. I have two problems with this method:

  • It requires you to manage the list of keywords. Let’s face
    it, I’m lazy, and I don’t want to have to go digging through code for
    that array every time that I want to add a new banned word. I could
    store the words in a database table instead, but that would require me
    to manage that table either manually or by writing a section for it in
    the admin panel.
  • Based on the type of comment spam I’m getting (mostly gambling
    sites) and the company I keep, I would very likely block a
    legitimate post at some point. I like to play poker and sometimes
    mention either taking the table or losing my ass in a post, and
    whoever I lost my ass to usually takes the opportunity to rub salt
    in the wound. What good is a spam filter if it’s filtering the
    wrong things (though this does beg for developing a comment
    politeness filter)?
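For comparison, the keyword approach can be sketched in a few lines of PHP. This is a minimal illustration, not code from this site: keywordSpam() and the words in $banned_words are hypothetical. It shows the second problem directly, since a plain substring match flags a poker-night comment just as readily as a casino ad.

```php
<?php
// Hypothetical banned-word list; the words are illustrative only.
$banned_words = array('casino', 'poker', 'viagra');

// Hypothetical helper: returns true if $comment contains any banned
// word anywhere in its text (case-insensitive substring match).
function keywordSpam($comment, array $banned_words) {
    foreach ($banned_words as $word) {
        if (stripos($comment, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(keywordSpam('Best online casino bonuses here!', $banned_words));    // bool(true)
var_dump(keywordSpam('Great post, see you at poker night.', $banned_words)); // bool(true), a false positive
var_dump(keywordSpam('Nice pictures of the new house.', $banned_words));     // bool(false)
```

The list also lives in the code itself, which is the first problem: every new banned word means editing the array.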

So, since I didn’t want to have to constantly manage the comment spam
catcher and I wanted it to possess a bit of ‘intelligence’, I opted to
compare the text of each comment as it is received with all the
messages that had been flagged as comment spam (a true/false field
stored in the comments table of my database). Based on its similarity
to previous spam, the filter either publishes the message without
marking it as spam, or holds it back and marks it as spam (thereby
automatically maintaining a bank of messages to check against in the
future).

Here’s the function I wrote and included in the ‘add comment’ script. It’s pretty simple stuff, really:

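function commentSpam($new_comment) {
	// Fetch the text of every comment already flagged as spam.
	// (The mysql_* functions used here were removed in PHP 7.0;
	// mysqli or PDO provide the equivalent calls.)
	$sql = 'SELECT comment FROM comments WHERE spam = 1';
	$result = mysql_query($sql) or die(mysql_error());
	while ($row = mysql_fetch_assoc($result)) {
		// similar_text() returns the number of matching characters
		$compare = similar_text($new_comment, $row['comment']);
		// Flag the new comment at a score of 100 or more
		if ($compare >= 100) {
			return true;
		}
	}
	return false;
}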

You pass the text of the incoming comment as a parameter, and the
function looks up the text of all existing comments marked as spam.
Then, for each comment spam already in the database, it compares that
text to the new comment’s text with the similar_text() function, which
returns the number of matching characters in the two strings; I treat
that number as a ‘similarity score’. I ran the function on the text of
all of my comments and found a very wide margin between the non-spam
comments and the spam comments I’d already received. In my tests,
spam comments returned a score of 150 or more (more similar), with
most of the scores in the 300 range. Legitimate comments usually
returned a similarity score of no more than 30 (less similar). For the
time being I have the function flagging a comment as spam if it
returns a score of 100 or more, which is pretty strict and may be
loosened (a higher threshold) as the function is tested in the real
world.
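To experiment with the threshold without touching the database, the comparison loop can be pulled out into a function that takes the known spam as a plain array. This is a sketch under assumptions: isSimilarToSpam() is a hypothetical name, and the sample spam text is made up rather than taken from my comments table.

```php
<?php
// Hypothetical DB-free version of the comparison loop in commentSpam().
// $spam_comments stands in for the rows returned by the spam = 1 query.
function isSimilarToSpam($new_comment, array $spam_comments, $threshold = 100) {
    foreach ($spam_comments as $spam_comment) {
        // similar_text() returns the number of matching characters.
        if (similar_text($new_comment, $spam_comment) >= $threshold) {
            return true;
        }
    }
    return false;
}

// Made-up sample spam (174 characters), not real data from the site.
$known_spam = array(str_repeat('Play online casino games now at cheap-casino dot example! ', 3));

// Identical text scores its full length, 174, so it is flagged.
var_dump(isSimilarToSpam($known_spam[0], $known_spam)); // bool(true)
// A 26-character comment can share at most 26 characters, so it can't reach 100.
var_dump(isSimilarToSpam('Congrats on the new house!', $known_spam)); // bool(false)
```

similar_text() also accepts an optional third argument, passed by reference, that receives the similarity as a percentage. Because the raw score can only grow with the length of the comments being compared, a percentage cutoff might hold up better for long legitimate comments, though that is untested here.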

As with any spam countermeasure, this method certainly has its limitations.

  • It can only filter based on the comment spam that you’ve
    already received. As more varied spam comes your way, the filter
    will probably miss a few comments that you’ll have to mark as spam
    manually before it can catch similar ones. Really, though, if this
    thing will flag 3 out of every 5 spam comments automatically, I’ll
    be happy.
  • It still stores comment spam in your database. If you have a
    high-traffic site that gets a lot of comment spam daily, this will
    continue to inflate the size of your database. A possible solution
    there is to write a function that goes through comments marked as
    spam and removes duplicate entries or entries with extra high
    similarity scores, since you’d only need one of those spam comments
    to test against an incoming comment in the future.
  • similar_text() is expensive. From what I’ve read, it is a heavy
    draw on the server’s processing power (the PHP manual puts its
    complexity at O(N^3), where N is the length of the longer string).
    Looping through each existing comment spam and running
    similar_text() on each one multiplies that cost by the number of
    stored spam comments. It ran quite quickly on my server going
    through about 20 existing comment spam records, but the server load
    will increase as the number of spam comments in the database
    increases. Some possible fixes are:

    • Add a LIMIT clause to the SQL statement in the above function
      (and maybe also sort by the date submitted to get the most recent
      spam attempts) so it only fetches a certain number of existing
      comment spam records, though this then limits the number of
      possible matches that the function can make to existing spam.
    • Nest the function in other logic that uses less taxing methods
      (like the keyword search mentioned above) to identify possible
      ‘problem comments’, and then run the similar_text() check only if
      the comment text meets those preliminary criteria. This means you
      wouldn’t necessarily have to run the taxing code for every comment
      received, but it does mean you have to be diligent in managing
      your keyword list.

Anyways, I started running the filter today and will post again in
the next couple of weeks about how good a job it is doing. I haven’t
seen this method mentioned at all when searching for PHP solutions to
fight comment spam, which is surprising to me, as it seems pretty
effective in theory and in tests. If any of you fellow geeks know a
good reason why this method isn’t very popular, or if you have any
suggestions for improving the function, please let me know.

Settling

The Wonder — d-ashes on August 21, 2005 at 8:10 pm

Alright, the posting drought officially ends today. Kelly and I have moved into our little yellow house in Baton Rouge and while there are still some boxes to finish unpacking we are quite comfortable and settled. More pictures to come soon.


North(er) Towards Home

Uncategorized — d-ashes on August 4, 2005 at 9:15 am

Howdy all. Time again for my bi-monthly check-in. To all you cats in
J-Town: Kelly and I will be up getting her stuff out of storage, and
I, at least, will be putting in my 4th year as author groupie at the
Stories from the Blue Moon Cafe 4 celebration @ Lemuria (see also the
CL and JFP articles). I am sure I’ll be out on the town Saturday
night, so I’ll come haunt the regular haunts looking for all my
favorite ghouls. Hope to see you there.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. | Ashes & Water