php - Analysing algorithm possibly based on regular intervals to check for bots and spiders


I'm trying to build a script that shows me a list of the IPs of bots/spiders.

I wrote a script that imports the Apache access log into a MySQL DB, so I can try to manage it with PHP and MySQL.

I've noticed that a lot of bots have regular intervals: they send out a request every 2 or 3 seconds. Is there an easy way of showing these patterns with a query or a PHP script? Or, harder I think, is there an algorithm that can recognise these bots/spiders?

DB:

CREATE TABLE IF NOT EXISTS `access_log` (
  `ip` VARCHAR(16) NOT NULL,
  `datetime` DATETIME NOT NULL,
  `method` VARCHAR(255) NOT NULL,
  `status` VARCHAR(255) NOT NULL,
  `referrer` VARCHAR(255) NOT NULL,
  `agent` VARCHAR(255) NOT NULL,
  `site` SMALLINT(6) NOT NULL
);

Official bots identify themselves. There's a list at http://www.robotstxt.org/db.html
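As a rough sketch of that first check, the `agent` column can be matched against a few substrings that self-declared crawlers tend to include. The function name `isDeclaredBot` and the marker list are assumptions made up for illustration; they are not taken from the robotstxt.org database, which is far longer.

```php
<?php
// Hypothetical helper: true when the user agent contains one of the given
// markers (case-insensitive). The default marker list is an assumption and
// only a tiny stand-in for a real list of known robots.
function isDeclaredBot(string $agent, array $markers = ['bot', 'crawl', 'spider', 'slurp']): bool
{
    foreach ($markers as $marker) {
        if (stripos($agent, $marker) !== false) {
            return true;
        }
    }
    return false;
}
```

This only catches bots that are honest about who they are, so it is a first filter rather than a detector.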

For the unofficial ones, I guess you could try looking for any of the following:

  • Page requests with no other resource requests (images, CSS, JavaScript, etc.)
  • Strange URL requests (lots of requests for login pages, or for ones that don't exist, such as wp-admin on a Drupal site)
  • Successive page views in a short amount of time
  • Exactly the same URL signatures coming from many different IPs
  • No HTTP referrer from IPs you've never seen before
  • Lots of comment posts in a short session
  • Requests from public proxy servers
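For the "regular intervals" pattern from the question, one possible approach is to look at the gaps between successive requests per IP and flag the IPs whose gaps barely vary. The sketch below assumes the rows have already been fetched from `access_log` as arrays with `ip` and `datetime` keys; the function name `findRegularIps` and both thresholds are hypothetical and would need tuning on real logs.

```php
<?php
// $rows: list of ['ip' => string, 'datetime' => 'Y-m-d H:i:s'] from access_log.
// Returns the IPs whose request gaps are nearly constant, measured by the
// coefficient of variation (standard deviation divided by mean gap).
// $minRequests and $maxCv are assumed defaults, not proven values.
function findRegularIps(array $rows, int $minRequests = 5, float $maxCv = 0.2): array
{
    // Group request timestamps (Unix seconds) by IP.
    $times = [];
    foreach ($rows as $row) {
        $times[$row['ip']][] = strtotime($row['datetime']);
    }

    $suspects = [];
    foreach ($times as $ip => $stamps) {
        if (count($stamps) < $minRequests) {
            continue; // too few requests to judge
        }
        sort($stamps);

        // Seconds between each request and the previous one.
        $gaps = [];
        for ($i = 1; $i < count($stamps); $i++) {
            $gaps[] = $stamps[$i] - $stamps[$i - 1];
        }

        $mean = array_sum($gaps) / count($gaps);
        if ($mean <= 0) {
            continue; // every request in the same second; nothing to measure
        }

        $variance = 0.0;
        foreach ($gaps as $gap) {
            $variance += ($gap - $mean) ** 2;
        }
        $variance /= count($gaps);

        $cv = sqrt($variance) / $mean;
        if ($cv <= $maxCv) {
            $suspects[$ip] = ['mean_gap' => $mean, 'cv' => $cv];
        }
    }
    return $suspects;
}
```

A bot that polls every 3 seconds gets a coefficient of variation close to 0, while a person clicking around usually has gaps that range from a second to several minutes. The access log only has one-second resolution, so very fast bots may need a different check, such as a plain requests-per-minute count.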

Those are some of the things I've noticed the annoying ba***s doing; they keep trying to scrape and spam my site anyway. Some of these checks need to be combined in order to filter out real requests that share the same characteristics.
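One simple way to combine them is a weighted score per IP or session, so that a single weak signal (such as a missing referrer) never flags a visitor on its own. Everything named here is hypothetical: the signal keys, the weights, and the threshold are placeholders to be tuned against traffic you have classified by hand.

```php
<?php
// Hypothetical weights for the signals listed above; higher means more suspicious.
const SIGNAL_WEIGHTS = [
    'no_assets'      => 2, // pages requested without images, CSS or JavaScript
    'strange_urls'   => 2, // login pages or paths that do not exist on the site
    'rapid_requests' => 1, // many page views in a short amount of time
    'no_referrer'    => 1, // no HTTP referrer from a previously unseen IP
    'comment_flood'  => 3, // many comment posts in a short session
    'public_proxy'   => 1, // request came from a known public proxy
];

// Sums the weights of every signal that is set to a truthy value.
function botScore(array $signals): int
{
    $score = 0;
    foreach (SIGNAL_WEIGHTS as $name => $weight) {
        if (!empty($signals[$name])) {
            $score += $weight;
        }
    }
    return $score;
}

// Assumed threshold of 4: at least two strong signals, or several weak ones.
function looksLikeBot(array $signals, int $threshold = 4): bool
{
    return botScore($signals) >= $threshold;
}
```

For example, `looksLikeBot(['no_referrer' => true])` stays false, while `looksLikeBot(['no_assets' => true, 'strange_urls' => true])` reaches the threshold.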

