Removing duplicated lines



Dear users,

Although I'm a newbie to awk, I did a web search for my question, without
result.

I have a squid log and I need to extract all IP addresses, count them,
and suppress repeated lines.

I would use

awk '{print $3}' access.log |    # print all IP addresses
  uniq -c                        # and pipe them to uniq


which yields:
  40 192.168.1.130
   7 192.168.1.254
   3 192.168.1.130
   1 192.168.1.254
   2 192.168.1.130
   1 192.168.1.254
   2 192.168.1.130
   1 192.168.1.254
   1 192.168.1.130
   4 192.168.1.254
   2 192.168.1.130
  11 192.168.1.254
   3 192.168.1.130
2576 192.168.1.254
  95 192.168.1.147
  34 192.168.1.100
   1 192.168.1.254
  43 192.168.1.100

How can I suppress the output of duplicated IP addresses and count the
occurrences of each IP address using solely awk?

Svata
Reply svatoboj 9/24/2004 7:34:33 AM

Svatoboj wrote:

> I have a squid log and I need to extract all IP addresses, count them,
> and suppress repeated lines.
> 
> I would use
> 
> awk '{print $3}' access.log |    # print all IP addresses
>   uniq -c                        # and pipe them to uniq

Your solution is almost correct:

  awk '{print $3}'  access.log  | sort | uniq -c
Reply Jürgen 9/24/2004 8:03:20 AM
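
The difference the added sort makes can be seen on a tiny input. This is a minimal sketch with made-up, deliberately interleaved addresses, not data from the real log; the trailing awk only strips the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample: one client IP per line, deliberately interleaved.
ips='192.168.1.130
192.168.1.254
192.168.1.130
192.168.1.130
192.168.1.254'

# uniq -c merges only neighbouring duplicates, so an address can show up several times.
unsorted=$(printf '%s\n' "$ips" | uniq -c | awk '{ print $1, $2 }')

# Sorting first makes equal lines adjacent, so each address is counted exactly once.
sorted=$(printf '%s\n' "$ips" | sort | uniq -c | awk '{ print $1, $2 }')

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

Without the sort, the sample produces four lines for two addresses; with it, one line per address.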


On 2004-09-24, Svatoboj wrote:
> Dear users,
>
> Although I'm a newbie to awk, I did a web search for my question, without
> result.
>
> I have a squid log and I need to extract all IP addresses, count them,
> and suppress repeated lines.
>
> I would use
>
> awk '{print $3}' access.log |    # print all IP addresses
>   uniq -c                        # and pipe them to uniq
>
>
> which yields:
>   40 192.168.1.130
[snip]
>   43 192.168.1.100
>
> How can I suppress the output of duplicated IP addresses and count the
> occurrences of each IP address using solely awk?

    If you want to show the count:

awk '{x[$3]++}
    END { for ( ip in x ) print x[ip], ip }'

    Otherwise:

awk 'x[$3]++ == 0 { print $3 }'

-- 
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License
Reply Chris 9/24/2004 8:26:51 AM
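
Both one-liners can be tried on a few lines before running them on the real log. The sketch below assumes squid's native log layout, where the third field is the client address; the three log lines are invented for illustration. The order of a for-in loop is unspecified in awk, so the counts are piped through sort here only to get a stable order.

```shell
# Hypothetical squid-style lines; only the third field (client IP) matters here.
log='1096011273.123 45 192.168.1.130 TCP_MISS/200 512 GET http://example.com/ - DIRECT/10.0.0.1 text/html
1096011274.456 12 192.168.1.254 TCP_HIT/200 128 GET http://example.com/a - NONE/- text/html
1096011275.789 30 192.168.1.130 TCP_MISS/200 256 GET http://example.com/b - DIRECT/10.0.0.1 text/html'

# Count every address in an array, then print "count address" once the input ends.
counts=$(printf '%s\n' "$log" |
  awk '{ x[$3]++ } END { for (ip in x) print x[ip], ip }' | sort -k 2)

# Print each address only the first time it is seen, without counts.
firsts=$(printf '%s\n' "$log" | awk 'x[$3]++ == 0 { print $3 }')

printf 'counts:\n%s\n\nfirst occurrences:\n%s\n' "$counts" "$firsts"
```

The second form relies on x[$3]++ returning the old value, which is 0 only on the first occurrence of an address.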

On Fri, 24 Sep 2004 10:03:20 +0200, Jürgen Kahrs wrote:
> Svatoboj wrote:
>> I have a squid log and I need to extract all IP addresses, count them,
>> and suppress repeated lines.
>> 
>> I would use
>> 
>> awk '{print $3}' access.log |    # print all IP addresses
>>   uniq -c                        # and pipe them to uniq
> 
> Your solution is almost correct:
> 
>   awk '{print $3}'  access.log  | sort | uniq -c

I suspect he wants to avoid doing a sort on the full list, since that list
might be really huge for a busy web site. Hence, I think his ideal
solution would be something that takes the output of his "uniq -c" and
combines those counts. Sorting his summary list might be feasible.

However, if one is going to reprocess the list (using awk?) anyway, then I
would prefer Chris Johnson's first proposed solution, which counts up the
occurrences of IP addresses in the original log, and outputs the counts
and IP addresses at the END. Love that generalized awk table indexing!

-- 
Juhan Leemet
Logicognosis, Inc.

Reply Juhan 9/25/2004 6:24:51 PM
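
Combining the partial counts that uniq -c already produced might look like the sketch below. The input is the first four lines of the output quoted at the top of the thread; the array name sum is arbitrary, and the final sort is there only to give the result a fixed order.

```shell
# First four lines of the `uniq -c` output quoted earlier in the thread.
partial='  40 192.168.1.130
   7 192.168.1.254
   3 192.168.1.130
   1 192.168.1.254'

# Field 1 is a partial count and field 2 the address: add up the counts per address.
totals=$(printf '%s\n' "$partial" |
  awk '{ sum[$2] += $1 } END { for (ip in sum) print sum[ip], ip }' | sort -k 2)

printf '%s\n' "$totals"
```

Only the much shorter summary is reprocessed this way, and nothing needs to be sorted before the counting.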

Juhan Leemet wrote:

> However, if one is going to reprocess the list (using awk?) anyway, then I
> would prefer Chris Johnson's first proposed solution, which counts up the
> occurrences of IP addresses in the original log, and outputs the counts
> and IP addresses at the END. Love that generalized awk table indexing!

Yes, that's a very short, elegant and efficient solution.
Reply Jürgen 9/25/2004 7:11:12 PM

On 24 Sep 2004 08:26:51 GMT in comp.lang.awk, "Chris F.A. Johnson"
<cfajohnson@gmail.com> wrote:

>On 2004-09-24, Svatoboj wrote:

>> How can I suppress the output of duplicated IP addresses and count the
>> occurrences of each IP address using solely awk?
>
>    If you want to show the count:
>
>awk '{x[$3]++}
>    END { for ( ip in x ) print x[ip], ip }'

To sort the output, add '| "sort +0n"' or '| "sort +1"' before the
final '}'. 

-- 
Thanks. Take care, Brian Inglis 	Calgary, Alberta, Canada

Brian.Inglis@CSi.com 	(Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address		use address above to reply
Reply Brian 9/26/2004 7:29:26 AM
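
The "+0n" and "+1" key notation is the old zero-based sort syntax, which many current sort implementations no longer accept by default. A sketch of the same idea with the POSIX forms follows, on the assumption that "sort -n" and "sort -k 2" are the intended equivalents; the sample addresses are made up.

```shell
# Hypothetical sample: .100 appears three times, .130 twice, .254 once.
ips='192.168.1.100
192.168.1.254
192.168.1.100
192.168.1.130
192.168.1.100
192.168.1.130'

# awk pipes its own output into sort; "sort -n" orders numerically by the leading count.
by_count=$(printf '%s\n' "$ips" |
  awk '{ x[$1]++ } END { for (ip in x) print x[ip], ip | "sort -n" }')

# "sort -k 2" orders by the second field, i.e. by the address text.
by_ip=$(printf '%s\n' "$ips" |
  awk '{ x[$1]++ } END { for (ip in x) print x[ip], ip | "sort -k 2" }')

printf 'by count:\n%s\n\nby address:\n%s\n' "$by_count" "$by_ip"
```

Note that "sort -k 2" compares the addresses as plain text, not octet by octet, which is enough when they share a prefix of equal length as they do here.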

"Chris F.A. Johnson" <cfajohnson@gmail.com> wrote in message news:<2ri42bF1a2h16U1@uni-berlin.de>...

> 
>     If you want to show the count:
> 
> awk '{x[$3]++}
>     END { for ( ip in x ) print x[ip], ip }'
> 
>     Otherwise:
> 
> awk 'x[$3]++ == 0 { print $3 }'

The top one didn't work as expected. Is this a one-line command?
Here is the result I got:

 192.168.1.254
 192.168.1.142
 192.168.1.110
 192.168.1.147
 192.168.1.130
 192.168.1.100
 192.168.1.101

The second one yields this:

192.168.1.100
192.168.1.101
192.168.1.254
192.168.1.130
192.168.1.142
192.168.1.147
192.168.1.110

So the only difference is the indentation.

So there is no sign of occurrence counting.

Svata

BTW: Might it be an OS-dependent issue?
Reply svatoboj 9/27/2004 11:23:17 AM

On 2004-09-27, Svatoboj wrote:
> "Chris F.A. Johnson" <cfajohnson@gmail.com> wrote in message news:<2ri42bF1a2h16U1@uni-berlin.de>...
>
>> 
>>     If you want to show the count:
>> 
>> awk '{x[$3]++}
>>     END { for ( ip in x ) print x[ip], ip }'
>> 
>>     Otherwise:
>> 
>> awk 'x[$3]++ == 0 { print $3 }'
>
> The top one didn't work as expected.

   It works for me in various OSs, and various versions of awk.

   Make sure you are using exactly what I posted.

   If you still cannot get it to work, post some example input.

> Is this a one-line command?
> Here is the result I got:
>
>  192.168.1.254
>  192.168.1.142
>  192.168.1.110
>  192.168.1.147
>  192.168.1.130
>  192.168.1.100
>  192.168.1.101
>
> The second one yields this:
>
> 192.168.1.100
> 192.168.1.101
> 192.168.1.254
> 192.168.1.130
> 192.168.1.142
> 192.168.1.147
> 192.168.1.110
>
> So the only difference is the indentation.
>
> So there is no sign of occurrence counting.
>
> Svata
>
> BTW: Might it be an OS-dependent issue?


-- 
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License
Reply Chris 9/27/2004 7:13:29 PM
