Number lines of file2 based on the contents of file1.

Hi all,

I have two files, file1 and file2, with the following contents:

file1:

line_number_1 line_1
line_number_2 line_2
....
line_number_n line_n

file2:

line_1
line_2
...
line_m

Now I want to add a line number to each line of file2. The rules are
as follows:

[1] Line numbers begin at 1 and increase by one.
[2] If a line also appears in file1, skip it: it is not numbered.
[3] If a line number already appears in file1, do not use it again.

To do this, I wrote the following code:

awk '
BEGIN { ind = 1 }
!x { 
  appeared_ind[$1]
  appeared_line[$2]
  next
}

! ($0 in appeared_line) {
  while (ind in appeared_ind) ind ++ 
  print ind, $0
  ind ++
}
' file1 x=1 file2

This seems to do the job, but the code does not look very elegant. Could
you please give me some hints on tidying it up?
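
To make the three rules concrete, here is a small worked example that runs the program above on sample input. The file contents and the temporary directory are made up for illustration; only the awk program comes from the post.

```shell
# Hypothetical sample data: file1 already uses the numbers 1 and 3.
dir=$(mktemp -d)
printf '1 apple\n3 cherry\n' > "$dir/file1"
printf 'apple\nbanana\ndate\nelder\n' > "$dir/file2"

# The program from the post, with comments added.
result=$(awk '
BEGIN { ind = 1 }
!x {
  appeared_ind[$1]    # numbers already used in file1
  appeared_line[$2]   # lines already present in file1
  next
}
! ($0 in appeared_line) {
  while (ind in appeared_ind) ind ++   # rule [3]: skip used numbers
  print ind, $0
  ind ++
}
' "$dir/file1" x=1 "$dir/file2")

printf '%s\n' "$result"
rm -r "$dir"
```

On this input, "apple" is dropped by rule [2], and the numbers 1 and 3 are skipped by rule [3], so the output is "2 banana", "4 date", "5 elder".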

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/17/2016 12:40:04 PM
comp.lang.awk

On 17/12/2016 at 13:40, Hongyi Zhao wrote:
> Hi all,
>
> I've two files, file1 and file2, which have the contents as following:
>
> file1:
>
> line_number_1 line_1
> line_number_2 line_2
> ...
> line_number_n line_n
>
> file2:
>
> line_1
> line_2
> ..
> line_m
>
> Now, I want to add line numbers for each line in the file2, the rules are
> as follows:
>
> [1] The line numbers begin from 1, and increase naturally.
> [2] If the line is also appeared in the file1, then just not use it for
> numbering.
> [3] If the line number also appeared in the file1, then just not use it.
>
> For the above purpose, I write the following codes:
>
> awk '
> BEGIN { ind = 1 }
> !x {
>    appeared_ind[$1]
>    appeared_line[$2]
>    next
> }
>
> ! ($0 in appeared_line) {
>    while (ind in appeared_ind) ind ++
>    print ind, $0
>    ind ++
> }
> ' file1 x=1 file2
>
> It seems this can do my job.  But my code seems not so graceful, could
> you please give me some notes/hints on touching-up my above codes?

1. Let 'ind' start at 0 and keep it as the last used line number. No 
need for a BEGIN rule.

2. Forget about the 'x' control variable. Use the usual NR==FNR idiom to 
detect records from the first file. And invoke awk just as
     awk '....' file1 file2

3. Use 'do ... while' instead of 'while ...' to search for the next 
unused line number:
     do ind++ while (ind in appeared_ind)
This way the last ind++ is unnecessary.
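
Taken together, the three suggestions might look like the sketch below. The wrapper name number_lines and the sample data are assumptions for illustration only; the do-while is written with braces so that it parses on a single line.

```shell
# number_lines is a hypothetical wrapper name, used only for this sketch.
number_lines() {
  awk '
  NR == FNR {              # records of the first file
    appeared_ind[$1]       # numbers already taken
    appeared_line[$2]      # lines that must not be numbered again
    next
  }
  !($0 in appeared_line) {
    # ind holds the last number handed out; step to the next free one.
    do { ind++ } while (ind in appeared_ind)
    print ind, $0
  }
  ' "$1" "$2"
}

# Made-up sample data.
dir=$(mktemp -d)
printf '1 apple\n3 cherry\n' > "$dir/file1"
printf 'apple\nbanana\ndate\nelder\n' > "$dir/file2"
result=$(number_lines "$dir/file1" "$dir/file2")
printf '%s\n' "$result"
rm -r "$dir"
```

Because ind starts out uninitialized, the first ind++ yields 1, so no BEGIN rule is needed. On the sample data this prints "2 banana", "4 date", "5 elder".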

HTH.
Manuel
12/17/2016 6:08:52 PM
On Sat, 17 Dec 2016 19:08:52 +0100, Manuel Collado wrote:

> 2. Forget about the 'x' control variable. Use the usual NR==FNR idiom to
> detect records from the first file. And invoke awk just as
>      awk '....' file1 file2

I use the `x' control variable to handle the case where file1 is
empty, which the `NR==FNR' method cannot handle.
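
The empty-file1 case can be reproduced with the sketch below. It also shows one possible alternative to the x variable, comparing FILENAME with ARGV[1], which assumes file1 is the first command-line operand. The sample data and shell variable names are made up.

```shell
dir=$(mktemp -d)
: > "$dir/file1"                      # empty first file
printf 'a\nb\n' > "$dir/file2"

# With an empty file1, NR == FNR is also true for every record of
# file2, so all of file2 is treated as file1 and nothing is printed.
with_nr_fnr=$(awk '
NR == FNR { appeared_ind[$1]; appeared_line[$2]; next }
!($0 in appeared_line) {
  do { ind++ } while (ind in appeared_ind)
  print ind, $0
}
' "$dir/file1" "$dir/file2")

# Comparing FILENAME with ARGV[1] does not depend on record counts.
with_filename=$(awk '
FILENAME == ARGV[1] { appeared_ind[$1]; appeared_line[$2]; next }
!($0 in appeared_line) {
  do { ind++ } while (ind in appeared_ind)
  print ind, $0
}
' "$dir/file1" "$dir/file2")

printf 'NR==FNR:  [%s]\n' "$with_nr_fnr"
printf 'FILENAME: [%s]\n' "$with_filename"
rm -r "$dir"
```

The first run prints nothing for file2, while the second numbers both lines as 1 and 2.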

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/17/2016 11:47:28 PM
On Sat, 17 Dec 2016 19:08:52 +0100, Manuel Collado wrote:

> 3. Use 'do ... while' instead of 'while ...' to search for the next
> unused line number:
>      do ind++ while (ind in appeared_ind)
> This way the last ind++ is unnecessary.

What is the corresponding for-loop version of the above code? I
tried a for loop but have not succeeded.
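
One possible reading of this question is a C-style for loop rather than a for-in loop. Under that assumption, a sketch equivalent to the do-while could look like this; the wrapper name number_lines_for and the sample data are hypothetical.

```shell
# number_lines_for is a hypothetical name for this sketch.
number_lines_for() {
  awk '
  NR == FNR { appeared_ind[$1]; appeared_line[$2]; next }
  !($0 in appeared_line) {
    # Step once, then keep stepping while the number is taken.
    # The loop body is an empty statement.
    for (ind++; (ind in appeared_ind); ind++)
      ;
    print ind, $0
  }
  ' "$1" "$2"
}

# Made-up sample data: file1 already uses 2, 3 and 5.
dir=$(mktemp -d)
printf '2 b\n3 c\n5 e\n' > "$dir/file1"
printf 'x\nb\ny\nz\n' > "$dir/file2"
for_result=$(number_lines_for "$dir/file1" "$dir/file2")
printf '%s\n' "$for_result"
rm -r "$dir"
```

The initializer does the unconditional first increment that the do-while body performs, and the loop's third expression does every later one. On the sample data this prints "1 x", "4 y", "6 z".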

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 12:09:20 AM
On Sat, 17 Dec 2016 19:08:52 +0100, Manuel Collado wrote:

>      do ind++ while (ind in appeared_ind)

I tried and it seems that this must be written as follows:

do { ind ++ } while (ind in appeared_ind)

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 12:42:25 AM
On 18.12.2016 01:42, Hongyi Zhao wrote:
> On Sat, 17 Dec 2016 19:08:52 +0100, Manuel Collado wrote:
> 
>>      do ind++ while (ind in appeared_ind)
> 
> I tried and it seems that this must be written as follows:
> 
> do { ind ++ } while (ind in appeared_ind)

or...

  do ind++; while (ind in appeared_ind)

or...

  do ind++
  while (ind in appeared_ind)


Janis

Janis
12/18/2016 12:53:20 AM
On Sun, 18 Dec 2016 01:53:20 +0100, Janis Papanagnou wrote:

> or...
> 
>   do ind++; while (ind in appeared_ind)
> 
> or...
> 
>   do ind++
>   while (ind in appeared_ind)

Thanks for your notes.

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 1:01:41 AM
On Sun, 18 Dec 2016 01:53:20 +0100, Janis Papanagnou wrote:

>> do { ind ++ } while (ind in appeared_ind)
> 
> or...
> 
>   do ind++; while (ind in appeared_ind)
> 
> or...
> 
>   do ind++
>   while (ind in appeared_ind)

For readability, I usually put a space between the `++' operator and
its operand:

var ++ 

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 2:16:30 AM
On Sat, 17 Dec 2016 19:08:52 +0100, Manuel Collado wrote:

> 1. Let 'ind' start at 0 and keep it as the last used line number. No
> need for a BEGIN rule.
> 
> 2. Forget about the 'x' control variable. Use the usual NR==FNR idiom to
> detect records from the first file. And invoke awk just as
>      awk '....' file1 file2
> 
> 3. Use 'do ... while' instead of 'while ...' to search for the next
> unused line number:
>      do ind++ while (ind in appeared_ind)
> This way the last ind++ is unnecessary.

Based on the above notes, I rewrote my code as follows:

awk '
!x { 
  appeared_ind[$1]
  appeared_line[$2]
  next
}

! ($0 in appeared_line) {
  do { ind ++ } while (ind in appeared_ind) 
  print ind, $0
}  
' file1 x=1 file2

This does the job and is more concise. I also tried a for-loop based
method, as follows:

awk '
!x { 
  appeared[$2] = $1
  next
}

! ($0 in appeared) {
  ind ++
  for (i in appeared) {
    if (ind == appeared[i]) { ind ++; continue }
    else 
      break 
  }
  print ind, $0
} 
' file1 x=1 file2

But the second method gives wrong results. What is the bug in my
code?
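
One likely cause, as far as I can tell from the code: the for-in loop makes a single pass over appeared in an unspecified order and stops at the first stored number that differs from ind, so a used number that comes later in the traversal is never checked. The sketch below imitates that scan with a fixed visiting order (3, 1, 2, chosen arbitrarily) and compares it with a membership test. All names in it are hypothetical.

```shell
# Single pass that stops at the first mismatch, as in the for-in attempt.
scan_once=$(awk 'BEGIN {
  # Used numbers, visited in the order 3, 1, 2 to stand in for the
  # unspecified order of a for-in loop.
  n = split("3 1 2", visit, " ")
  ind = 1
  for (k = 1; k <= n; k++) {
    if (ind == visit[k]) { ind++; continue }
    else break
  }
  print ind
}')

# Membership test keyed by number, as in the do-while version.
by_membership=$(awk 'BEGIN {
  n = split("3 1 2", visit, " ")
  for (k = 1; k <= n; k++) used[visit[k]]
  ind = 0
  do { ind++ } while (ind in used)
  print ind
}')

printf 'single pass: %s, membership test: %s\n' "$scan_once" "$by_membership"
```

The single pass reports 1 even though 1 is already used, while the membership test reports 4.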

Regards

-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 7:51:09 AM
On Sun, 18 Dec 2016 07:51:09 +0000, Hongyi Zhao wrote:

> awk '
> !x {
>   appeared[$2] = $1
>   next
> }
> 
> ! ($0 in appeared) {
>   ind ++
>   for (i in appeared) {
>     if (ind == appeared[i]) { ind ++; continue }
>     else
>       break
>   }
>   print ind, $0
> }
> ' file1 x=1 file2

After some thought, I found that the following does the trick:

awk '
BEGIN { PROCINFO["sorted_in"] = "@val_num_asc" }
!x { 
  appeared[$2] = $1
  next
}

! ($0 in appeared) {
  ind ++
  for (i in appeared)
    if (ind == appeared[i]) 
      ind ++
  print ind, $0
} 
' file1 x=1 file2

Still, the above code does not seem to be the most elegant solution.
If you can suggest any improvements, please give me some hints.
Thanks in advance.

Regards
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Hongyi
12/18/2016 2:54:18 PM