Invalid pointer - Munmap_chunk

Hi everyone,

I'm having some trouble with awk (mawk, specifically).
When I use gawk, I have no problem.
But when I use mawk, I get this error:

*** glibc detected *** awk: munmap_chunk(): invalid pointer: 0x000000000061944c ***
This is followed by a backtrace in /lib/libc.so at line 6 for awk.

Now, an explanation: I use a shell script to read about 1 million lines and extract some information. The lines look like: <ID> <num> xxxx xxxx <TYPE_INFO> <INFO> xxxxx...
So I use a 3-dimensional array to save <INFO> for each pair <ID, Num>, and then write the values to a file. The array looks like: array[ID, Num, type1] = "blabla", array[ID, Num, type2] = "something else", etc., with different IDs and Nums, of course.
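To make the array structure described above concrete: in POSIX awk (mawk included) a subscript list such as array[ID, Num, type1] does not create nested storage. It is one flat array whose key is the subscripts joined with the built-in variable SUBSEP. The sketch below uses only standard awk features; the function name show_keys and the sample values are illustrative, not part of the original script.

```shell
# Sketch: array[ID, Num, type] is one flat array whose key is the three
# subscripts joined with SUBSEP (by default the character "\034").
# gawk 4+ also offers true arrays of arrays, which is a different feature.
show_keys() {
    awk 'BEGIN {
        array["ID1", 7, 1] = "blabla"
        array["ID1", 7, 2] = "something else"

        # Building the joined string by hand gives exactly the same key.
        key = "ID1" SUBSEP 7 SUBSEP 1
        if (key in array)
            print "same key: " array[key]

        # Membership test with the parenthesized subscript list.
        if (("ID1", 7, 2) in array)
            print "found type 2"

        # split() on SUBSEP recovers the individual subscripts.
        n = split(key, parts, SUBSEP)
        print n, parts[1], parts[2], parts[3]
    }'
}

show_keys
# same key: blabla
# found type 2
# 3 ID1 7 1
```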

I suppose the error is due to the huge array being created. Using gawk works, but I guess only until the array gets too big again. So it's not a "real" solution.

Is there a way to free the memory used by those arrays when I no longer need it? Or maybe I'm doing something wrong, which is very likely :(

Well, I hope this is understandable... Here is an example of my code (reading myFileIn):

awk -v FILE_OUT="$myFileOut" '
  BEGIN {
    # reading from another file
  }
  {
    attributs = $6
    for (i = 7; i <= NF; ++i) # we get all the data
      attributs = attributs " " $i
    nb = split(attributs, tab, "|") # args are separated by "|"

    # awk array elements are accessed with [], not ()
    if ($5 == "THIS") {
      myTab[$1, $2, 1] = tab[9]
      myTab[$1, $2, 2] = tab[10]
    }
    else if ($5 == "THAT") {
      myTab[$1, $2, 3] = tab[2]
      myTab[$1, $2, 4] = tab[6]
      myTab[$1, $2, 5] = tab[7]
    }
    else if ($5 == "FINALLY") {
      write_file($1, $2)
    }
  }
  END {
  }

  function write_file(id, num) {
    # one conversion per argument (the real script has more fields)
    printf "%3.3s%.7d%5.5s%5.5s%.2d\n",
      myTab[id, num, 1],
      myTab[id, num, 2],
      myTab[id, num, 3],
      myTab[id, num, 4],
      myTab[id, num, 5] >> FILE_OUT
  }
' myFileIn
  

I use more fields in my array, but the semantics are the same.
Thanks for your help... 

Regards,

Robin
robin
10/22/2013 12:55:06 PM
comp.lang.awk

In article <81e17b26-02d8-4b8b-a3d7-1c4e7df6e19d@googlegroups.com>,
 <robin.geffroy@gmail.com> wrote:
>Hi everyone,
>
>I'm having some trouble with awk (mawk, specifically).
>When I use gawk, I have no problem.
>But when I use mawk, I get this error:
>
>*** glibc detected *** awk: munmap_chunk(): invalid pointer: 0x000000000061944c ***
>This is followed by a backtrace in /lib/libc.so at line 6 for awk.

Well, the error message probably doesn't matter much or mean anything
(unless you're a Mawk developer).  It just means (as you've correctly deduced)
"too much array storage".

One thing I don't completely understand from your description is whether or
not gawk really does solve your problem.  Above you say that you "get no
problem", but then below you write:

    I suppose the error is due to the huge array being created. Using gawk
    works, but I guess only until the array gets too big again. So it's
    not a "real" solution.

(See P.S. below)

>Now, an explanation: I use a shell script to read about 1 million lines and
>extract some information. The lines look like: <ID> <num> xxxx xxxx
><TYPE_INFO> <INFO> xxxxx...

Anyway, if my analysis of your code is correct, then it looks like your
code logic is:

read a record
do some processing
write out the results
repeat

and there is no need for any inter-record storage.  I.e., each record's
results don't depend on any other record's results.  So, you don't need to
accumulate a big array.  In that case, there are two ways to go about
eliminating the large array:

    1) Start each loop with "delete bigarray".  That will clean out any old
	results and prevent the array from growing incrementally.

    2) Re-write the code so that it doesn't create the array in the first
	place.  Generally, if you don't need inter-record storage, you
	don't need an array like this.  I leave the details of this
	re-analysis/re-write to you.
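Option 1 can be illustrated with a small, hypothetical example. The function name distinct_fields and the per-record work (counting distinct fields) are stand-ins chosen for illustration; the point is only that the array is emptied with "delete seen" at the start of each record, so it never accumulates data across records. Whole-array delete is accepted by gawk, mawk and BWK awk; split("", seen) is the traditional portable alternative.

```shell
# Hypothetical helper: report how many distinct fields each record has.
# The array "seen" is emptied at the start of every record, so it never
# holds more than one record worth of keys.
distinct_fields() {
    awk '{
        delete seen                  # or: split("", seen) in very old awks
        for (i = 1; i <= NF; i++)
            seen[$i]++
        n = 0
        for (k in seen)
            n++
        print "record " NR ": " n " distinct field(s)"
    }'
}

printf 'a b a\nc\n' | distinct_fields
# record 1: 2 distinct field(s)
# record 2: 1 distinct field(s)
```

Without the delete, the second record would report 3, because "a" and "b" from the first record would still be in the array.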

P.S., a side question/comment: I'm actually surprised that you did have a
problem with gawk.  My understanding was that gawk was not supposed to have
limits like this.  I know that sounds "fantastic" and, of course, all
software has limits, but still, I hope those who are familiar with gawk and
its design goals will see what I mean here.  Out of curiosity, what
platform (DOS, Windows, Unix, Linux, Mac OSX, ...) are you running on?

gazelle
10/22/2013 1:14:22 PM
Thanks for your answer.

First, about gawk: it DOES solve the problem, but I'm afraid the problem may
appear again if I get an even bigger array from even bigger files... That's the
only reason I said it was not a "real" solution; sorry for the misunderstanding :)
(I haven't hit the problem so far, even with a 10-million-line file, but who
knows, maybe it will happen again...)
FYI, I'm using Debian / Linux 2.6 (64-bit), over an ssh connection.

As for your analysis, it's not quite accurate. Actually my lines look more like
this:
<ID_1> <num_1> xxxx xxxx <TYPE_INFO> <INFO> xxxxx
<ID_2> <num_2> xxxx xxxx <TYPE_INFO> <INFO> xxxxx
<ID_1> <num_2> xxxx xxxx <TYPE_INFO> <INFO> xxxxx
<ID_3> <num_1> xxxx xxxx <TYPE_INFO> <INFO> xxxxx
<ID_2> <num_1> xxxx xxxx <TYPE_INFO> <INFO> xxxxx
....


And I don't know how many "num" values, or how many "ID" values, I will have to
read.
That's why I'm using an array: I store data related to "ID_1" in an array, and
when I get a line like "ID_1 num_x xxx xxx <type> <MY_FINAL_INFO>", I write the
contents of my array to a file.

Thus, I can't delete my array at the beginning of each loop. I could delete it
(field by field) after writing it to the file.
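Deleting field by field after writing works in mawk too, since deleting a single element (delete myTab[id, num, k]) is standard awk. Below is a minimal sketch, assuming a simplified input layout ($1 = ID, $2 = num, $5 = type, $6 = values separated by "|") and made-up slot numbers. The names collect and NFIELDS are hypothetical and are not part of the original script.

```shell
# Sketch: keep data per (ID, num) only until its FINALLY line arrives,
# then print it and delete those elements one by one (plain POSIX awk).
collect() {
    awk -v NFIELDS=5 '
    function write_file(id, num,    k, line) {
        line = id " " num
        for (k = 1; k <= NFIELDS; k++) {
            line = line " " myTab[id, num, k]
            delete myTab[id, num, k]    # remove this element once written
        }
        print line
    }
    {
        split($6, tab, "|")             # values are separated by "|"
        if ($5 == "THIS") {
            myTab[$1, $2, 1] = tab[1]
            myTab[$1, $2, 2] = tab[2]
        } else if ($5 == "THAT") {
            myTab[$1, $2, 3] = tab[1]
            myTab[$1, $2, 4] = tab[2]
            myTab[$1, $2, 5] = tab[3]
        } else if ($5 == "FINALLY") {
            write_file($1, $2)
        }
    }
    END {
        # Elements still present belong to pairs that never got FINALLY.
        left = 0
        for (key in myTab)
            left++
        print "elements left:", left
    }'
}

printf '%s\n' 'A 1 x x THIS a|b' 'B 1 x x THAT c|d|e' \
    'A 1 x x THAT f|g|h' 'A 1 x x FINALLY done' 'B 1 x x THIS i|j' | collect
# A 1 a b f g h
# elements left: 5
```

With this approach the array only holds pairs whose FINALLY line has not arrived yet. Its size therefore depends on how far apart related lines are in the input, not on the total file size.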

Maybe there is a way to do this without any array (or at least without a
3-dimensional array, which would use less memory), but I haven't found it
(yet?).

Regards,

Robin
0
robin
10/22/2013 1:28:36 PM
In article <19b91627-db63-40ae-ab27-bce3f6283658@googlegroups.com>,
 <robin.geffroy@gmail.com> wrote:
....
>Maybe there is a way to do this without any array (or at least without a
>3-dimensional array, which would use less memory), but I haven't
>found it (yet?).

Comments:
    1) Is there any particular reason why you *want* to use Mawk (rather
	than Gawk) ?  Note that I am not saying there isn't; there may very
	well be good reasons to do so.  Among them the fact that in some
	Debian distros, the default AWK is Mawk not Gawk.  Mistakenly so,
	IMHO.
    2) If you're in a hurry, I'd say just go with GAWK.  In this modern
	world, a million lines of input isn't really that big.  If/when
	you do try it on 10,000,000 lines of input and it fails, then you
	can consider the next point.
    3) If you are using (or can use) Gawk 4.x (or later), then you can use
	the new (true) multi-dimensional arrays, so you could delete things
	you don't need once you've written them.  I.e., what I have in mind
	is modifying your write_file() routine so it looks something like:

	function write_file(a, b) {
	    # do stuff with myTab[a][b][c], e.g. print it to FILE_OUT
	    delete myTab[a]
	}

	That should delete everything in the array that has 'a' as its
	first subscript, while leaving everything else intact.
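Here is a sketch of point 3. It assumes gawk 4.0 or later, because mawk does not accept the myTab[a][b][c] syntax. The input layout ($1 = ID, $2 = num, $5 = type, $6 = value), the function name collect_gawk and the two slots are hypothetical simplifications.

```shell
# Sketch of point 3; requires gawk 4.0 or later (true arrays of arrays).
collect_gawk() {
    gawk '
    function write_file(id, num,    k, line) {
        line = id " " num
        for (k = 1; k <= 2; k++)
            line = line " " myTab[id][num][k]
        print line
        # Drops every [num][k] stored under this ID, not only this num;
        # use "delete myTab[id][num]" to drop just the one pair.
        delete myTab[id]
    }
    $5 == "THIS"    { myTab[$1][$2][1] = $6 }
    $5 == "THAT"    { myTab[$1][$2][2] = $6 }
    $5 == "FINALLY" { write_file($1, $2) }
    END             { print "IDs left:", length(myTab) }'
}

printf '%s\n' 'A 1 x x THIS p' 'B 1 x x THIS q' 'A 1 x x THAT r' \
    'A 1 x x FINALLY -' | collect_gawk
# A 1 p r
# IDs left: 1
```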

gazelle
10/22/2013 4:27:57 PM
In article <l45tne$nf5$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>P.S., a side question/comment: I'm actually surprised that you did have a
>problem with gawk.

He didn't, which you probably know by now.

> My understanding was that gawk was not supposed to have limits like this.

Indeed, that is true.  "No arbitrary limits" is one of the major
principles behind GNU software. To my knowledge gawk is only limited by
what the operating system will let it allocate.

> I know that sounds "fantastic" and, of course, all software has limits,
> but still, I hope those who are familiar with gawk and its design goals
> will see what I mean here.

There are limits, of course.  But they are imposed by the OS
and/or compiler and/or system architecture.

Arnold
-- 
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972 8 979-0381
Nof Ayalon
D.N. Shimshon 9978500	ISRAEL
arnold
10/22/2013 5:46:02 PM
Hi.

In article <19b91627-db63-40ae-ab27-bce3f6283658@googlegroups.com>,
 <robin.geffroy@gmail.com> wrote:
>First, for gawk : it DOES solve the problem, but I'm afraid the problem
>can appear again, if a get a huger array with huger files... That's the
>only reason I said it was not a "real" solution,

See the note that I just posted. On a 64-bit system with modern OS
(GNU/Linux, other Unix, 64-bit Windows [if gawk was compiled in 64 bit
mode]) the limits are so large that you really should not have to worry
about this.

If you can use gawk, you probably should.

Thanks,

Arnold
arnold
10/22/2013 5:48:18 PM
Well, thanks to all of you :)

Kenny, there is no particular reason I want to use mawk. I was just wondering
whether there was a way to make things "cleaner".
I can use Gawk without any problem, and I will. But, as you said, the default
awk on this machine is Mawk, not Gawk. I just wanted to use "awk" (-> mawk) so
that my script would not differ from the others (there are a bunch of scripts
like mine that use "awk" rather than "gawk", that's all. But those scripts are
less complex, so they don't need gawk...)

I didn't know I could use multi-dimensional arrays like that. I'll take a look
at that :)

Anyway, subject closed; that's fine for me, thanks to you :)

Regards,

Robin
robin
10/23/2013 9:08:13 AM
In article <c847d792-38d5-4e19-80e9-8d27557fc093@googlegroups.com>,
 <robin.geffroy@gmail.com> wrote:
>Well, thanks to all of you :)
>[...]

Great!  Sounds like everything is under control.

The net at its best...

gazelle
10/23/2013 12:25:22 PM