PHP - Analysing algorithm, possibly based on regular intervals, to check for bots and spiders
I'm trying to build a script that shows me a list of the IPs of bots/spiders.
I wrote a script that imports the Apache access log into a MySQL DB so I can try to manage it with PHP and MySQL.
I've noticed that a lot of bots have regular intervals: they send out a request every 2 or 3 seconds. Is there an easy way of showing these patterns with a query or a PHP script? Or, harder I think, is there an algorithm that can recognise these bots/spiders?
DB:
CREATE TABLE IF NOT EXISTS `access_log` (
  `ip` VARCHAR(16) NOT NULL,
  `datetime` DATETIME NOT NULL,
  `method` VARCHAR(255) NOT NULL,
  `status` VARCHAR(255) NOT NULL,
  `referrer` VARCHAR(255) NOT NULL,
  `agent` VARCHAR(255) NOT NULL,
  `site` SMALLINT(6) NOT NULL
);
Official bots identify themselves. There's a list at http://www.robotstxt.org/db.html
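As a rough sketch of that first check, something like the following could flag rows whose `agent` value announces itself as a crawler. The function name `looksLikeDeclaredBot` and the token list are my own assumptions for illustration; they are not taken from the robotstxt.org database, and a real list would need to be much longer.

```php
<?php
/**
 * Return true if the User-Agent string contains a token that
 * self-identifying crawlers commonly use.
 * The token list is an illustrative assumption, not an official list.
 */
function looksLikeDeclaredBot(string $agent): bool
{
    $tokens = ['bot', 'crawl', 'spider', 'slurp'];
    foreach ($tokens as $token) {
        // stripos() does a case-insensitive substring search
        if (stripos($agent, $token) !== false) {
            return true;
        }
    }
    return false;
}
```

This only catches bots that are honest about what they are, so it is a first filter at best.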
For the unofficial ones, I guess you could try looking for some of the following:
- Page requests with no other resource requests (images, CSS, JavaScript, etc.)
- Strange URL requests (lots of requests for login pages, or for ones that don't exist, such as wp-admin on a Drupal site)
- Successive page views in a short amount of time
- Exactly the same URL signatures coming from many different IPs
- No HTTP referrer from IPs you've never seen before
- Lots of comment posts in a short session
- Requests from public proxy servers
Those are some of the things I've noticed the annoying ba***s doing as they keep trying to scrape and spam my site anyway. Some of them need to be combined in order to filter out real requests that share the same characteristics.
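For the regular-interval pattern from the question, one possible approach is to look at the gaps between consecutive requests from a single IP and check how much they vary. Below is a minimal sketch, not a tested detector. It assumes you have already pulled the timestamps for one IP as Unix seconds (for example with `UNIX_TIMESTAMP(datetime)` in a query against `access_log`). The function names `gapStats` and `hasRegularInterval` and both thresholds are hypothetical and would need tuning against your own log.

```php
<?php
/**
 * Mean and standard deviation of the gaps (in seconds) between
 * consecutive request timestamps. Returns null if there are fewer
 * than two timestamps, since no gap can be computed.
 */
function gapStats(array $timestamps): ?array
{
    sort($timestamps);
    $gaps = [];
    for ($i = 1; $i < count($timestamps); $i++) {
        $gaps[] = $timestamps[$i] - $timestamps[$i - 1];
    }
    if ($gaps === []) {
        return null;
    }
    $mean = array_sum($gaps) / count($gaps);
    $variance = 0.0;
    foreach ($gaps as $gap) {
        $variance += ($gap - $mean) ** 2;
    }
    $variance /= count($gaps);
    return ['mean' => $mean, 'stddev' => sqrt($variance), 'count' => count($gaps)];
}

/**
 * True if there are enough gaps and they are nearly constant.
 * $minGaps and $maxStddev are assumed starting values, not known-good ones.
 */
function hasRegularInterval(array $timestamps, int $minGaps = 10, float $maxStddev = 0.5): bool
{
    $stats = gapStats($timestamps);
    return $stats !== null
        && $stats['count'] >= $minGaps
        && $stats['stddev'] <= $maxStddev;
}
```

You would call `hasRegularInterval()` once per IP with that IP's timestamps. Since the `datetime` column only has one-second resolution, a bot requesting every 2.5 seconds will show gaps alternating between 2 and 3, so the tolerance should not be set too tight. As with the other signs, I'd combine this with one or two of them rather than block on it alone.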