Hi all,
I have a script that reads an Apache server log file and collects the
host values into a set. Here is the script:
require 'rubygems'
require 'apachelogregex'
require 'set'
require 'pp'
urls = Set.new
format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
parser = ApacheLogRegex.new(format)
File.readlines('aleurier_access.log').each do |line|
  @line = parser.parse(line)
  urls.add(@line["%h"])
  # {"%r"=>"GET /blog/index.xml HTTP/1.1", "%h"=>"87.18.183.252", ... }
end
puts urls
pp @line
For example, when I run this script it prints every host address. What I
would like is to print each IP address only once, whether it appears
repeatedly in the log or just a single time.
I tried uniq, but it works on arrays and does not read through a file. I
also tried the version below; it works for a small file but is not
suitable for larger files.
require 'rubygems'
require 'apachelogregex'
require 'set'
require 'pp'
urls = Set.new
format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
parser = ApacheLogRegex.new(format)
l = nil # declared outside the block so it is still visible to pp below
File.readlines('aleurier_access.log').each do |line|
  l = parser.parse(line)
  urls.add(l["%h"])
  # {"%r"=>"GET /blog/index.xml HTTP/1.1", "%h"=>"87.18.183.252", ... }
end
puts urls.to_a.sort
pp l
--
Cheers,
Ranjith Kumar.K,
Software Engineer,
Sedin Technologies,
http://ranjithtenz.wordpress.com/
http://victusads.com/