Value too large for defined data type...

This error has been popping up since a few days back on our production
servers. Googling it retrieved the following article:
http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view

I'm not into Solaris administration but this issue has been bugging me
since quite some time. If anyone can help explain it to me in simple
terms; also what are the recommended solutions.

Thanks.
Reply mapsiddiqui (2) 12/22/2009 5:20:19 PM

maps wrote:
> This error has been popping up since a few days back on our production
> servers. Googling it retrieved the following article:
> http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
> 
> I'm not into Solaris administration but this issue has been bugging me
> since quite some time. If anyone can help explain it to me in simple
> terms; also what are the recommended solutions.
> 
> Thanks.

In plain English, it's trying to put ten pounds of shit in a five pound 
bag!  Most likely it's trying to put a 32 bit value into a 16 bit field.
Possibly trying to put a 16 bit value into an eight bit field or even a 
64 bit value into a smaller field.  Whichever sizes are involved, the 
value you are trying to save has more bits than the variable you are 
trying to store it in.
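
As a rough illustration of a value not fitting its field (a Python sketch, not code from any Solaris program; the helper name fits is made up for this example), struct.pack refuses to store a number that needs more bits than the requested fixed-width field provides:

```python
import struct


def fits(value, fmt):
    """Return True if value can be stored in the fixed-width field fmt."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<H" is an unsigned 16-bit field, "<I" an unsigned 32-bit field.
print(fits(40_000, "<H"))  # True: 40000 needs 16 bits
print(fits(70_000, "<H"))  # False: 70000 needs 17 bits
print(fits(70_000, "<I"))  # True: the wider field has room
```

The error in the subject line is the operating system's way of reporting the same kind of mismatch.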

http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
explains in a little more detail.

Whatever program is issuing this error message is carelessly written! 
The compiler may have recognized the problem and issued an error or 
warning message.  If so, somebody chose to ignore it.

Go thou and clean your house and maybe demote a programmer to a position 
better suited to his abilities.  If you wrote it, put a bag over your 
head and hope that no one recognizes you.
Reply Richard 12/22/2009 6:43:02 PM


In article <39d2df64-5d73-4650-a5e9-266dc5d9211d@t42g2000yqd.googlegroups.com>,
	maps <mapsiddiqui@gmail.com> writes:
> This error has been popping up since a few days back on our production
> servers. Googling it retrieved the following article:
> http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
> 
> I'm not into Solaris administration but this issue has been bugging me
> since quite some time. If anyone can help explain it to me in simple
> terms; also what are the recommended solutions.

I can think of several things, but first, you'll have to give some context.

-- 
Andrew Gabriel
[email address is not usable -- followup in the newsgroup]
Reply andrew 12/22/2009 7:55:21 PM

On 22 Dec, 17:20, maps <mapsiddi...@gmail.com> wrote:
> This error has been popping up since a few days back on our production
> servers. Googling it retrieved the following article:http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
>
> I'm not into Solaris administration but this issue has been bugging me
> since quite some time. If anyone can help explain it to me in simple
> terms; also what are the recommended solutions.
>
> Thanks.

The program that you are running has not been compiled to handle large
user IDs or large group IDs and it is trying to process a large user
ID or group ID.

http://docs.sun.com/app/docs/doc/802-5366/6i94lvccc?a=view

"Previous Solaris 2.x software releases used 32-bit data types to
contain the user IDs (UIDs) and group IDs (GIDs), but UIDs and GIDs
were constrained to a maximum useful value of 60000. In the Solaris
2.5.1 release, the limit on UID and GID values has been raised to the
maximum value of a signed integer, or 2147483647."
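
For illustration only, the two limits quoted above can be turned into a small check. This is a Python sketch; the function name and the category labels are invented here and are not part of Solaris.

```python
OLD_USEFUL_MAX = 60000   # limit quoted for earlier Solaris 2.x releases
NEW_MAX = 2147483647     # 2**31 - 1, limit quoted for Solaris 2.5.1


def classify_id(value):
    """Label a UID or GID according to the two limits quoted above."""
    if value < 0 or value > NEW_MAX:
        return "out of range"
    if value <= OLD_USEFUL_MAX:
        return "within the old limit"
    return "large ID"


for candidate in (100, 60000, 60001, 2147483647, 2147483648):
    print(candidate, classify_id(candidate))
```

A program built only for the old limit could be expected to have trouble with anything the sketch labels "large ID".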

David.
Reply david 12/22/2009 11:16:09 PM

Thanks to all for your replies! I am just a humble programmer who
needs to use Solaris because our production servers run on it. There is
a separate Solaris admin team which handles all administration tasks.

Coming back to the topic, allow me to cite a few examples and my
understanding on this whole issue:
1. we started facing this problem over the weekend with sendmail with
the following command erroring out:
       sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate | /usr/lib/sendmail -t
       stdin: Value too large for defined data type
    when this stopped working we came up with a workaround:
       sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate > mailfilefinal
       /usr/lib/sendmail -t < mailfilefinal
    and this worked.
2. The following also stopped working:
       zcat somearchive.Z | diff somefile -
       diff: stdin: Value too large for defined data type
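
One difference between the failing and the working forms is what standard input is connected to: a pipe in the first case, a regular file in the workaround. Whether that difference explains the error is not established at this point; the Python sketch below only shows that a program can tell the two apart by calling fstat on the descriptor. The helper name describe_fd is made up, and a Unix-like system is assumed.

```python
import os
import stat
import tempfile


def describe_fd(fd):
    """Classify an open file descriptor from its fstat() mode bits."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


read_end, write_end = os.pipe()
with tempfile.TemporaryFile() as tmp:
    pipe_kind = describe_fd(read_end)
    file_kind = describe_fd(tmp.fileno())
os.close(read_end)
os.close(write_end)

print(pipe_kind)   # pipe
print(file_kind)   # regular file
```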

It is interesting to note that none of the above programs (sendmail,
diff etc) have a modification date in the past one week (so they were
not compiled/replaced/modified). I am not sure if this has anything to
do with a 32-bit binary being executed on a 64-bit system (which
should work perfectly fine, as far as i know).

By the way, our production box has Solaris 5.9 64-bit for Sun Sparc
(obtained using isainfo -kv)

Thanks.
Reply maps 12/23/2009 4:15:35 PM

On 2009-12-23 16:15:35 +0000, maps said:

> Thanks to all for your replies ! I am just a humble programmer who is
> needed to use solaris as our production servers run on it. there is a
> separate solaris admin team which handles all administration tasks.
> 
> Coming back to the topic, allow me to cite a few examples and my
> understanding on this whole issue:
> 1. we started facing this problem over the weekend with sendmail with
> the following command erroring out:
>        sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate | /usr/lib/sendmail -t
>        stdin: Value too large for defined data type
>     when this stopped working we came up with a workaround:
>        sed 's/RECIPIENT_EMAIL_ID/someemailid/' mailfiletemplate > mailfilefinal
>        /usr/lib/sendmail -t < mailfilefinal
>     and this worked.
> 2. The following also stopped working:
>        zcat somearchive.Z | diff somefile -
>        diff: stdin: Value too large for defined data type
> 
> It is interesting to note that none of the above programs (sendmail,
> diff etc) have a modification date in the past one week (so they were
> not compiled/replaced/modified). I am not sure if this has anything to
> do with a 32-bit binary being executed on a 64-bit system (which
> should work perfectly fine, as far as i know).
> 
> By the way, our production box has Solaris 5.9 64-bit for Sun Sparc
> (obtained using isainfo -kv)

It looks more like pipes aren't working. Has the shell changed?

-- 
Chris

Reply Chris 12/23/2009 4:21:41 PM

I actually do not know if somebody from the admin team changed it. I
have tried it in bash, ksh and csh and this still fails.
Reply maps 12/23/2009 4:31:10 PM

On 2009-12-23 16:31:10 +0000, maps said:

> I actually do not know if somebody from the admin team changed it. I
> have tried it in bash, ksh and csh and this still fails.

It probably isn't a shell problem then. Has libc changed? The shells 
will be calling pipe(2) which is in that library.

Does stracing each side of the pipe show the call that's failing?

-- 
Chris

Reply Chris 12/23/2009 4:38:47 PM

Chris, how does one check libc? How do I strace? I know about dtrace but
haven't used it yet. And isn't dtrace available only since Solaris 10?

Let me know.

Thanks.
Reply maps 12/23/2009 7:06:53 PM

On 2009-12-23 19:06:53 +0000, maps said:

> Chris, how does one check libc ? How to strace ? I know abt dtrace but
> havent used it yet. And isnt dtrace available only since solaris 10 ?

Check the modification time on /usr/lib/libc.*

I meant truss, not strace. Sorry! Truss has a good manpage.

-- 
Chris

Reply Chris 12/23/2009 7:54:29 PM

> Check the modification time on /usr/lib/libc.*

None of the files were modified last weekend (when the problem
actually started). All of them are at least over 2 months old.

> I meant truss, not strace. Sorry! Truss has a good manpage.

Good idea! I will check it out. thanks!

-maps.
Reply maps 12/23/2009 8:34:31 PM

maps <mapsiddiqui@gmail.com> writes:

> This error has been popping up since a few days back on our production
> servers. Googling it retrieved the following article:
> http://docs.sun.com/app/docs/doc/806-1075/6jacsnin5?a=view
>
> I'm not into Solaris administration but this issue has been bugging me
> since quite some time. If anyone can help explain it to me in simple
> terms; also what are the recommended solutions.
>
> Thanks.

I have seen this issue when some field in a stat buffer is out of
range, like atime, mtime or ctime, each of which is defined as a time_t,
which is typedef'd as a long. A long is 32-bit in a 32-bit binary and
64-bit in a 64-bit binary, so you can set the times on a file with a
64-bit application that will be too large for a 32-bit application.
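
To put numbers on this, here is a small Python sketch (the names are invented for the example) of the range a signed 32-bit time value can hold, and a check for whether a given timestamp fits in it:

```python
from datetime import datetime, timezone

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_32bit_time(seconds):
    """Return True if a timestamp in seconds fits a signed 32-bit field."""
    return INT32_MIN <= seconds <= INT32_MAX


last_32bit_moment = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_32bit_moment.isoformat())   # 2038-01-19T03:14:07+00:00
print(fits_32bit_time(INT32_MAX))      # True
print(fits_32bit_time(INT32_MAX + 1))  # False
```

A file whose times were set beyond that point by a 64-bit program is the kind of case described above.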

If you know how to provoke the issue, run the application under truss
to see if this is the issue, or if you are on s10 (or later) you may
try "dtrace" to see what happens.

Thomas
Reply Thomas 12/23/2009 11:22:06 PM

Here's an update:

A case was opened with Sun and they suggested a workaround for the "-"
problem in the following manner:

zcat somearchive.Z | diff somefile /dev/stdin

I compared the truss results from both variations and the output looks
the same except for write:

diff: stdin: Value too large for defined data type
write(1, " C o m p a n y , S t o r".., 3748)    Err#32 EPIPE
    Received signal #13, SIGPIPE [default]

for /dev/stdin:

write(1, " C o m p a n y , S t o r".., 3748)    = 3748

Is there a way I can dig deeper ?

-maps.
Reply maps 12/28/2009 5:00:01 PM

On 2009-12-28 17:00:01 +0000, maps said:

> Heres an update:
> 
> A case was opened with Sun and they suggested a workaround for the -
> problem in the following manner:
> 
> zcat somearchive.Z | diff somefile /dev/stdin
> 
> I compared the truss results from both variations and the output looks
> the same except for write:
> 
> diff: stdin: Value too large for defined data type
> write(1, " C o m p a n y , S t o r".., 3748)    Err#32 EPIPE
>     Received signal #13, SIGPIPE [default]
> 
> for /dev/stdin:
> 
> write(1, " C o m p a n y , S t o r".., 3748)    = 3748
> 
> Is there a way I can dig deeper ?

The man page for write says (Solaris 10) it returns EPIPE when:

     EPIPE      An attempt is made to write to a pipe or  a  FIFO
                that  is  not open for reading by any process, or
                that has only one end open (or to a file descrip-
                tor   created   by  socket(3SOCKET),  using  type
                SOCK_STREAM that is no longer connected to a peer
                endpoint).  A SIGPIPE signal will also be sent to
                the thread. The process dies unless special  pro-
                visions were taken to catch or ignore the signal.

So what's happening to the process with the other end of the pipe?
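
The behaviour in that man page excerpt is easy to reproduce in isolation. A minimal Python sketch, assuming a Unix-like system: the reading end of a pipe is closed before the write, and SIGPIPE is ignored so that the failure shows up as an error code rather than killing the process.

```python
import errno
import os
import signal

# Ignore SIGPIPE so the failed write is reported as an error instead of
# killing the process (Python normally arranges this at startup already).
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

read_end, write_end = os.pipe()
os.close(read_end)  # nobody is left to read from the pipe

try:
    os.write(write_end, b"Company,Stor")
    outcome = "write succeeded"
except OSError as exc:
    outcome = errno.errorcode[exc.errno]
finally:
    os.close(write_end)

print(outcome)  # EPIPE
```

In other words, the EPIPE on the writing side is a symptom of the reader having gone away.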

-- 
Chris

Reply Chris 12/28/2009 7:17:11 PM

> So what's happening to the process with the other end of the pipe?

we are comparing the standard input with somefile


zcat somearchive.Z | diff somefile -

-maps.
Reply maps 12/28/2009 7:58:46 PM

On Mon, 28 Dec 2009 11:58:46 -0800, maps wrote:

>> So what's happening to the process with the other end of the pipe?
> 
> we are comparing the standard input with somefile
> 
> 
> zcat somearchive.Z | diff somefile -

How big are these files?  Any chance you're running
into a 2GB or 4GB limit?

-- 
Jeremy

Reply jgh 12/28/2009 8:15:23 PM

> How big are these files?  Any chance you're running
> into a 2GB or 4GB limit?

The files are 3 to 4 kilobytes.

-maps.
Reply maps 12/28/2009 8:16:57 PM

On 2009-12-28 19:58:46 +0000, maps said:

>> 
>> So what's happening to the process with the other end of the pipe?
> 
> we are comparing the standard input with somefile
> 
> 
> zcat somearchive.Z | diff somefile -

I mean at a lower level - is the process with the other end of the pipe 
going away earlier than expected, or closing the pipe for some other 
reason?

How's disk space in /tmp and /var? (Long shot is you're out of 
temporary space somewhere.)

-- 
Chris

Reply Chris 12/29/2009 9:40:20 AM

> I mean at a lower level - is the process with the other end of the pipe
> going away earlier than expected, or closing the pipe for some other
> reason?

I cannot say for sure; diff is a standard unix tool and as yet I never
cared to learn how it works at the lower level.

> How's disk space in /tmp and /var? (Long shot is you're out of
> temporary space somewhere.)

No problems with disk space; there's plenty of space available for
both directories. /var is rwxr-xr-x; not sure if this will cause any
problems ?

-maps.
Reply maps 12/29/2009 6:47:37 PM

On Mon, 28 Dec 2009 19:17:11 +0000, Chris Ridd wrote:

> On 2009-12-28 17:00:01 +0000, maps said:
> 
>> Heres an update:
>> 
>> A case was opened with Sun and they suggested a workaround for the -
>> problem in the following manner:
>> 
>> zcat somearchive.Z | diff somefile /dev/stdin
>> 
>> I compared the truss results from both variations and the output looks
>> the same except for write:
>> 
>> diff: stdin: Value too large for defined data type
>> write(1, " C o m p a n y , S t o r".., 3748)    Err#32 EPIPE
>>     Received signal #13, SIGPIPE [default]

Exactly what were you trussing here?  The zcat process,
the diff process, or both?

I'd like to see a truss of the diff, but I don't
think that was it.

-- 
jgh
Reply jgh 12/29/2009 7:04:43 PM

> Exactly what were you trussing here?  The zcat process,
> the diff process, or both?

trussed the entire command line (i.e. zcat and diff)


> I'd like to see a truss of the diff, but I don't
> think that was it.

diff, per se, works and it's only in this particular usage that it fails. So
I am not sure if trussing diff itself would help.

-maps.
Reply maps 12/29/2009 7:41:42 PM

On 2009-12-29 19:41:42 +0000, maps said:

>> 
>> Exactly what were you trussing here?  The zcat process,
>> the diff process, or both?
> 
> trussed the entire command line (i.e. zcat and diff)

In that case you only trussed the zcat :-) You need to do something 
like this instead to truss both ends of the pipe:

truss -o /tmp/lhs zcat foo.Z | truss -o /tmp/rhs whatever command

> 
> 
>> I'd like to see a truss of the diff, but I don't
>> think that was it.
> 
> diff, per se, works and its only in this particular usage it fails. So
> I am not sure if trussing diff itself would help

The left hand side of the pipe is failing to write data, and the main 
documented way that can happen is if there's something odd happening to 
the process on the right hand side of the pipe. So you need to look 
closely at that.

You'll probably want to use truss's -d option (perhaps a better 
timestamping option?) on both invocations so you can correlate what's 
happening at any given point.

-- 
Chris

Reply Chris 12/29/2009 7:54:32 PM

OK, here's the truss output after calling truss on both sides with the -d
option:

Base time stamp:  1262120381.9711  [ Tue Dec 29 14:59:41 CST 2009 ]
 0.0000	execve("/usr/bin/diff", 0xFFBFFABC, 0xFFBFFACC)  argc = 3
 0.0041	resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
 0.0044	resolvepath("/usr/bin/diff", "/usr/bin/diff", 1023) = 13
 0.0048	stat("/usr/bin/diff", 0xFFBFF880)		= 0
 0.0048	open("/var/ld/ld.config", O_RDONLY)		Err#2 ENOENT
 0.0052	stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libc.so.1", 0xFFBFF388) Err#2 ENOENT
 0.0056	stat("/usr/lib/libc.so.1", 0xFFBFF388)		= 0
 0.0057	resolvepath("/usr/lib/libc.so.1", "/usr/lib/libc.so.1", 1023) = 18
 0.0064	open("/usr/lib/libc.so.1", O_RDONLY)		= 3
 0.0067	mmap(0x00010000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFF3B0000
 0.0072	mmap(0x00010000, 802816, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF280000
 0.0076	mmap(0xFF280000, 703464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF280000
 0.0078	mmap(0xFF33C000, 24496, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 704512) = 0xFF33C000
 0.0080	mmap(0xFF342000, 6720, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFF342000
 0.0081	munmap(0xFF32C000, 65536)			= 0
 0.0086	memcntl(0xFF280000, 117696, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
 0.0087	close(3)					= 0
 0.0089	stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libdl.so.1", 0xFFBFF388) Err#2 ENOENT
 0.0094	stat("/usr/lib/libdl.so.1", 0xFFBFF388)		= 0
 0.0096	resolvepath("/usr/lib/libdl.so.1", "/usr/lib/libdl.so.1", 1023) = 19
 0.0098	open("/usr/lib/libdl.so.1", O_RDONLY)		= 3
 0.0103	mmap(0xFF3B0000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3B0000
 0.0107	mmap(0x00010000, 8192, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A0000
 0.0111	mmap(0xFF3A0000, 2210, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
 0.0115	close(3)					= 0
 0.0117	stat("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", 0xFFBFF088) = 0
 0.0122	resolvepath("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", "/usr/platform/FJSV,GPUZC-M/lib/libc_psr.so.1", 1023) = 44
 0.0128	open("/usr/platform/FJSV,GPUZC-L/lib/libc_psr.so.1", O_RDONLY) = 3
 0.0133	mmap(0xFF3B0000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3B0000
 0.0136	munmap(0xFF3B2000, 24576)			= 0
 0.0138	close(3)					= 0
 0.0140	mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF390000
 0.0148	getustack(0xFFBFF6C4)
 0.0152	getrlimit(RLIMIT_STACK, 0xFFBFF6BC)		= 0
 0.0155	getcontext(0xFFBFF4F8)
 0.0158	setustack(0xFF3439B4)
 0.0165	issetugid()					= 0
 0.0167	brk(0x00028CD0)					= 0
 0.0168	brk(0x0002ACD0)					= 0
 0.0172	stat("/xxxxx/temp/20091221.src_file.csv.new", 0x00028BA0) = 0
 0.0178	fstat(0, 0x00028C28)				Err#79 EOVERFLOW
 0.0188	fstat64(2, 0xFFBFE308)				= 0
 0.0192	write(2, " d i f f :  ", 6)			= 6
 0.0196	open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
 0.0199	open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT
 0.0206	write(2, " s t d i n", 5)			= 5
 0.0208	write(2, " :  ", 2)				= 2
 0.0212	write(2, " V a l u e   t o o   l a".., 37)	= 37
 0.0216	write(2, "\n", 1)				= 1
 0.0220	_exit(2)
= 0
 0.0231	brk(0x000E8FA8)					= 0
 0.0237	fstat64(3, 0xFFBFE9E8)				= 0
 0.0240	ioctl(3, TCGETA, 0xFFBFEACC)			Err#25 ENOTTY
 0.0246	read(3, "1F9D90 CDEB48113C6 M1E16".., 8192)	= 2206
 0.0248	ioctl(1, TCGETA, 0xFFBFEA04)			Err#22 EINVAL
 0.0250	fstat64(1, 0xFFBFEA78)				= 0
 0.0252	brk(0x000E8FA8)					= 0
 0.0253	brk(0x000EAFA8)					= 0
 0.0254	fstat64(1, 0xFFBFE920)				= 0
 0.0258	read(3, 0x000E61CC, 8192)			= 0
 0.0259	write(1, " C o m p a n y , S t o r".., 3748)	= 3748
 0.0261	llseek(3, 0, SEEK_CUR)				= 2206
 0.0262	_exit(0)
Reply maps 12/29/2009 9:08:39 PM

0.0178 fstat(0, 0x00028C28)                            Err#79 EOVERFLOW

This is probably the root cause of the issue. But then we already had
guessed it earlier; so how can this problem be resolved now ?
Reply maps 12/29/2009 9:10:08 PM

On Dec 29, 1:10 pm, maps <mapsiddi...@gmail.com> wrote:
> 0.0178 fstat(0, 0x00028C28)                            Err#79 EOVERFLOW
>
> This is probably the root cause of the issue. But then we already had
> guessed it earlier; so how can this problem be resolved now ?

     EOVERFLOW
           The file size in bytes or the number of  blocks  allo-
           cated  to the file or the file serial number cannot be
           represented correctly in the structure pointed  to  by
           buf.


I wouldn't expect this failure on a pipe which doesn't have a size or
a serial number.  I would expect it on a "large" file, but not how
you're using it.
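
As a way of picturing the man page text, here is a hedged Python sketch that checks the three fields it names against 32-bit limits. The limits in the table are an assumption about a 32-bit program built without large-file support, not values taken from the Solaris headers, and overflowing_fields is a made-up helper. A Unix-like system is assumed for st_blocks.

```python
import os
from types import SimpleNamespace

# Assumed limits for a 32-bit program built without large-file support.
ASSUMED_LIMITS = {
    "st_size": 2**31 - 1,    # file size in bytes
    "st_blocks": 2**31 - 1,  # number of blocks allocated
    "st_ino": 2**32 - 1,     # file serial number
}


def overflowing_fields(st):
    """List the stat fields that would not fit the assumed 32-bit limits."""
    return [name for name, limit in ASSUMED_LIMITS.items()
            if getattr(st, name) > limit]


# Hypothetical stat results, used only to exercise the check.
small_file = SimpleNamespace(st_size=3748, st_blocks=8, st_ino=12345)
huge_serial = SimpleNamespace(st_size=3748, st_blocks=8, st_ino=2**32 + 5)

print(overflowing_fields(small_file))   # []
print(overflowing_fields(huge_serial))  # ['st_ino']
print(overflowing_fields(os.stat(os.curdir)))
```

The second case is the interesting one here: a tiny file can still fail the check if one of its identifiers is too wide.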

Given that this used to work and now doesn't in more than one case,
and that the error doesn't make sense to me, I wonder if something got
screwed up on the system.  Seems very odd to me.

--
Darren
Reply Darren 12/29/2009 9:59:49 PM

On 2009-12-29 21:59:49 +0000, Darren Dunham said:

> On Dec 29, 1:10 pm, maps <mapsiddi...@gmail.com> wrote:
>> 0.0178 fstat(0, 0x00028C28)                           Err#79 EOVERFLOW
>> 
>> This is probably the root cause of the issue. But then we already had
>> guessed it earlier; so how can this problem be resolved now ?
> 
>      EOVERFLOW
>            The file size in bytes or the number of  blocks  allo-
>            cated  to the file or the file serial number cannot be
>            represented correctly in the structure pointed  to  by
>            buf.
> 
> 
> I wouldn't expect this failure on a pipe which doesn't have a size or
> a serial number.  I would expect it on a "large" file, but not how
> you're using it.

It would appear in this truss that diff is opening the largefile 
"/xxxxx/temp/20091221.src_file.csv.new".

> Given that this used to work and now doesn't in more than one case,
> and that the error doesn't make sense to me, I wonder if something got
> screwed up on the system.  Seems very odd to me.

Me too. I'd repeat the trusses on the original pipe sequence which 
didn't involve diff (IIRC).

-- 
Chris

Reply Chris 12/30/2009 9:34:32 AM

> Given that this used to work and now doesn't in more than one case,
> and that the error doesn't make sense to me, I wonder if something got
> screwed up on the system.  Seems very odd to me.

We've been wondering the same; and no one has been able to resolve it
so far. I am wondering how do we move beyond this point ? How do we
dig deeper ? or backtrace, maybe ?

-maps.
Reply maps 12/30/2009 3:38:14 PM

On Dec 30, 7:38 am, maps <mapsiddi...@gmail.com> wrote:
> > Given that this used to work and now doesn't in more than one case,
> > and that the error doesn't make sense to me, I wonder if something got
> > screwed up on the system.  Seems very odd to me.
>
> We've been wondering the same; and no one has been able to resolve it
> so far. I am wondering how do we move beyond this point ? How do we
> dig deeper ? or backtrace, maybe ?
>
> -maps.

Get another system.  Try the same commands there.  If it works,
something on your current system is screwed up.  Consider
reinstalling.

If the same commands don't work on another system (and fail in the
same way), then we're all missing something.

Report back.

If you don't have other hardware, do you have anything that would run
VMware or Virtualbox?  Maybe you could spin up a Solaris virtual
machine pretty quickly to try as another data point.

--
Darren
Reply Darren 12/30/2009 5:01:22 PM

> Get another system.  Try the same commands there.  If it works,
> something on your current system is screwed up.  Consider
> reinstalling.

Works on other systems. Interestingly, on one of the other systems fstat
with the same parameters works. I tried this on another server running
Solaris 10 and the command runs just fine.


> If you don't have other hardware, do you have anything that would run
> VMware or Virtualbox?  Maybe you could spin up a Solaris virtual
> machine pretty quickly to try as another data point.

Maybe, but the problem is that we don't understand where the problem
originates, and unless we do it may not be possible to
recreate it.

-maps.
Reply maps 12/30/2009 7:18:52 PM

maps wrote:
> 0.0178 fstat(0, 0x00028C28)                            Err#79 EOVERFLOW
> 
> This is probably the root cause of the issue. But then we already had
> guessed it earlier; so how can this problem be resolved now ?

Do I assume that 0.0178 is a line number rather than a part of the fstat 
call?  Where did the value 0x00028C28 come from?  I assume it's a 
pointer to something, but what?

I really don't want to try to go back to the beginning of this thread in 
order to make sense of your post.  Try posting a "reproducer"; e.g. 
reproduce the error with fewer than, say, fifteen lines of code.
Reply Richard 12/30/2009 7:32:26 PM

> Do I assume that 0.0178 is a line number rather than a part of the fstat
> call?  Where did the value 0x00028C28 come from?  I assume it's a
> pointer to something, but what?

That's the timestamp from the truss output; it wasn't a part of the fstat
call.

> I really don't want to try to go back to the beginning of this thread in
> order to make sense of your post.  Try posting a "reproducer"; e.g.
> reproduce the error with fewer than, say, fifteen lines of code.

Well, this error didn't come up while executing any code of ours. It started
appearing all of a sudden on one of our production servers whenever we
used a pipe ( | ). In the specific case I quoted above, it occurred in
the following manner:

zcat foo.txt.Z | diff foo.txt -
diff: stdin: value too large for defined data type.

-maps.
Reply maps 12/30/2009 7:44:54 PM

maps wrote:
>> Do I assume that 0.0178 is a line number rather than a part of the fstat
>> call?  Where did the value 0x00028C28 come from?  I assume it's a
>> pointer to something, but what?
> 
> thats the timestamp from truss output; it wasnt a part of the fstat
> call.
> 
>> I really don't want to try to go back to the beginning of this thread in
>> order to make sense of your post.  Try posting a "reproducer"; e.g.
>> reproduce the error with fewer than, say, fifteen lines of code.
> 
> Well this error didnt come up while executing a code. It started
> appearing all of a sudden on one of our production servers whenever we
> used a pipe ( | ). In the specific case I quoted above, it occurred in
> the following manner :
> 
> zcat foo.txt.Z | diff foo.txt -
> diff: stdin: value too large for defined data type.
> 
> -maps.

What has changed since it last worked?  O/S upgrades?  Patches 
installed?  Different hardware platform?

If you don't use "change control", problems like this are the reason why 
you should!  I thought change control was a PITA when my employers first 
introduced it but I've seen the advantages.  Briefly: before making any 
change to the hardware, firmware, software, or operating procedures, you 
document exactly what you are going to do and how you plan to back out 
the change if it causes problems.

Reply Richard 12/30/2009 9:28:18 PM

> What has changed since it last worked?  O/S upgrades?  Patches
> installed?  Different hardware platform?

None to my knowledge.

> If you don't use "change control", problems like this are the reason why
> you should!  I thought change control was a PITA when my employers first
> introduced it but I've seen the advantages.  Briefly: before making any
> change to the hardware, firmware, software, or operating procedures, you
> document exactly what you are going to do and how you plan to back out
> the change if it causes problems.

Oh we are quite sound in this aspect. Trust me, we have so many
processes that they do become PITA (good abbrn by the way, lol) and I
really mean it.

Coming back to the problem: I was wondering where fstat gets invoked
from. Is it present in libc.so? One of our admins suggested that this
might be due to a 64-bit library getting replaced by a 32-bit one. But
the last-modification timestamps on all of the files under suspicion
look far too old to suggest that possibility.

-maps.
Reply maps 12/30/2009 9:37:35 PM

maps wrote:
>> What has changed since it last worked?  O/S upgrades?  Patches
>> installed?  Different hardware platform?
> 
> None to my knowledge.
> 
>> If you don't use "change control", problems like this are the reason why
>> you should!  I thought change control was a PITA when my employers first
>> introduced it but I've seen the advantages.  Briefly: before making any
>> change to the hardware, firmware, software, or operating procedures, you
>> document exactly what you are going to do and how you plan to back out
>> the change if it causes problems.
> 
> Oh we are quite sound in this aspect. Trust me, we have so many
> processes that they do become PITA (good abbrn by the way, lol) and I
> really mean it.
> 
> Coming back to the problem; I was wondering where does fstat get

Can't help you there!  The program that was executing at the time of the 
failure is the guilty program.  If it's part of the O/S you need to talk 
to Sun about it.  If it's something you bought from a third party, you 
need to talk to the vendor.  If it's home made you need to get a "loader 
map" for the program to be able to pin down exactly what's going on.
Reply Richard 12/30/2009 9:50:30 PM

On 2009-12-30 19:18:52 +0000, maps said:

>> 
>> Get another system.  Try the same commands there.  If it works,
>> something on your current system is screwed up.  Consider
>> reinstalling.
> 
> Works on other systems. Interestingly on one of the other system fstat
> with the same parameters works. I tried this on another server having
> solaris 10 and the command runs just fine.

Are you using exactly the same input files on each system? As it looks 
like one file's now larger than 32-bits on the problem system, you need 
to keep all the input the same when you're testing.

-- 
Chris

Reply Chris 12/31/2009 7:28:49 AM

maps <mapsiddiqui@gmail.com> writes:

>> Do I assume that 0.0178 is a line number rather than a part of the fstat
>> call?  Where did the value 0x00028C28 come from?  I assume it's a
>> pointer to something, but what?

>thats the timestamp from truss output; it wasnt a part of the fstat
>call.

>> I really don't want to try to go back to the beginning of this thread in
>> order to make sense of your post.  Try posting a "reproducer"; e.g.
>> reproduce the error with fewer than, say, fifteen lines of code.

>Well this error didnt come up while executing a code. It started
>appearing all of a sudden on one of our production servers whenever we
>used a pipe ( | ). In the specific case I quoted above, it occurred in
>the following manner :

>zcat foo.txt.Z | diff foo.txt -
>diff: stdin: value too large for defined data type.

Since a pipe is a file with a small number of bytes, the only possible issue
is the dev number of the pipe.

Is this a 64-bit system, and has it been up for quite some time?
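
For anyone who wants to look at what fstat reports for a pipe on whatever system is at hand, here is a small Python sketch. It assumes a Unix-like system, the function name is invented, and the numbers it prints will differ from machine to machine; it makes no claim about what Solaris 9 itself returns.

```python
import os
import stat


def pipe_stat_summary():
    """Create a throwaway pipe and summarise what fstat() says about it."""
    read_end, write_end = os.pipe()
    try:
        st = os.fstat(read_end)
    finally:
        os.close(read_end)
        os.close(write_end)
    return {
        "is_fifo": stat.S_ISFIFO(st.st_mode),
        "size": st.st_size,
        "dev": st.st_dev,
        "ino": st.st_ino,
        "dev_fits_32_bits": 0 <= st.st_dev < 2**32,
    }


summary = pipe_stat_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

With the size ruled out, the dev and ino values are the ones worth comparing between a working and a failing machine.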

Casper
-- 
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply Casper 12/31/2009 4:17:34 PM

> Are you using exactly the same input files on each system? As it looks
> like one file's now larger than 32-bits on the problem system, you need
> to keep all the input the same when you're testing.


Not sure I understand; can you please elaborate ?

-maps.
Reply maps 1/2/2010 2:28:04 AM

> Since a pipe is a file with a small number of bytes, the only possible issue
> is the dev number of the pipe.
>
> Is this a 64 bit and is the system up for quite some time?

Spot on ! But how do these factors relate to the problem at hand ?

-maps.
Reply maps 1/2/2010 2:29:55 AM

I see a couple of puzzling things in the truss output:

1) In several lines, this /usr/bin/diff tries to find standard OS
   files (libc.so.1, libdl.so.1, Locale files) in a strange place:

> 0.0052	stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libc.so.1", 0xFFBFF388) Err#2 ENOENT
....
> 0.0089	stat("/opt/app/xxxxxx/ncr/tbuild/12.00.00.00/lib/libdl.so.1",
....
> 0.0196	open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
> 0.0199	open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT


2) /usr/bin/diff performs a different fstat() on stdin and stderr
   (and in the middle of outputting the complaint to stderr, makes
   the locale calls noted above):

> 0.0178	fstat(0, 0x00028C28)				Err#79 EOVERFLOW
> 0.0188	fstat64(2, 0xFFBFE308)				= 0
> 0.0192	write(2, " d i f f :  ", 6)			= 6
> 0.0196	open("/opt/app/xxxxx/ncr/tbuild/12.00.00.00/msg/SUNW_OST_OSLIB", O_RDONLY) Err#2 ENOENT
> 0.0199	open("/usr/lib/locale/C/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) Err#2 ENOENT
> 0.0206	write(2, " s t d i n", 5)			= 5
> 0.0208	write(2, " :  ", 2)				= 2
> 0.0212	write(2, " V a l u e   t o o   l a".., 37)	= 37
> 0.0216	write(2, "\n", 1)				= 1
> 0.0220	_exit(2)


Why would /usr/bin/diff invoke fstat() on stdin, but fstat64() on stderr?

But the real head scratcher is why the Solaris /usr/bin/diff would be
searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
looks like an application's build directory.

Has the environment variable LD_LIBRARY_PATH been set to something on
this server?  If so, what happens when it is removed?

  -Greg
-- 
Do NOT reply via e-mail.
Reply in the newsgroup.
Reply gerg 1/4/2010 8:21:09 PM

> Why would /usr/bin/diff invoke fstat() on stdin, but fstat64() on stderr?

This is the real head scratcher !!

> But the real head scratcher is why the Solaris /usr/bin/diff would be
> searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
> looks like an application's build directory.
>
> Has the environment variable LD_LIBRARY_PATH been set to something on
> this server?  If so, what happens when it is removed?

Indeed it contains the lib path you mentioned above.

-maps.
Reply maps 1/4/2010 9:52:23 PM

On Jan 4, 1:52 pm, maps <mapsiddi...@gmail.com> wrote:
> > Has the environment variable LD_LIBRARY_PATH been set to something on
> > this server?  If so, what happens when it is removed?
>
> Indeed it contains the lib path u mentioned above.

He is suggesting that you unset LD_LIBRARY_PATH and run the command
again.  Unexpected things can happen when you populate
LD_LIBRARY_PATH.

Does the behavior change?
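
If unsetting the variable in an interactive shell is awkward (for example because a wrapper sets it again), the same experiment can be scripted. This is only a sketch in Python; run_without_ld_library_path is a made-up helper, and the probe command simply reports whether the child process still sees the variable.

```python
import os
import subprocess
import sys


def run_without_ld_library_path(argv):
    """Run argv with LD_LIBRARY_PATH removed from the child's environment."""
    env = {k: v for k, v in os.environ.items() if k != "LD_LIBRARY_PATH"}
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Probe: a child interpreter reports whether it can see the variable.
probe = [sys.executable, "-c",
         "import os; print('LD_LIBRARY_PATH' in os.environ)"]
result = run_without_ld_library_path(probe)
print(result.stdout.strip())  # False
```

Replacing the probe with the failing command line would show whether the error depends on the variable.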

--
Darren
Reply Darren 1/4/2010 11:26:45 PM

On Dec 31 2009, 5:17 pm, Casper H.S. Dik <Casper....@Sun.COM> wrote:
> maps <mapsiddi...@gmail.com> writes:
> >> Do I assume that 0.0178 is a line number rather than a part of the fstat
> >> call?  Where did the value 0x00028C28 come from?  I assume it's a
> >> pointer to something, but what?
> >thats the timestamp from truss output; it wasnt a part of the fstat
> >call.
> >> I really don't want to try to go back to the beginning of this thread in
> >> order to make sense of your post.  Try posting a "reproducer"; e.g.
> >> reproduce the error with fewer than, say, fifteen lines of code.
> >Well this error didnt come up while executing a code. It started
> >appearing all of a sudden on one of our production servers whenever we
> >used a pipe ( | ). In the specific case I quoted above, it occurred in
> >the following manner :
> >zcat foo.txt.Z | diff foo.txt -
> >diff: stdin: value too large for defined data type.
>
> Since a pipe is a file with a small number of bytes, the only possible issue
> is the dev number of the pipe.
>
> Is this a 64 bit and is the system up for quite some time?
>
> Casper
> --
> Expressed in this posting are my opinions.  They are in no way related
> to opinions held by my employer, Sun Microsystems.
> Statements on Sun products included here are not gospel and may
> be fiction rather than truth.

Hi Casper

We have had the same problem using PHP-mail(). The server had been
running for over 1000 days without a reboot. PHP-mail() stopped working
suddenly on 2 Jan 2010. After a reboot yesterday it works fine again.

Regards Ruedi :-)
Reply bi 1/6/2010 7:59:43 AM

> He is suggesting that you unset LD_LIBRARY_PATH and run the command
> again.  Unexpected things can happen when you populate
> LD_LIBRARY_PATH.
>
> Does the behavior change?

No it does not.

-maps.
Reply maps 1/7/2010 3:26:57 PM

gerg@panix.com (Greg Andrews) writes:

> But the real head scratcher is why the Solaris /usr/bin/diff would be
> searching for libs under "/opt/app/xxxxx/ncr/tbuild/12.00.00.00", which
> looks like an application's build directory.
>
> Has the environment variable LD_LIBRARY_PATH been set to something on
> this server?  If so, what happens when it is removed?

Is diff being called from within the app, where a
different environment has been set (e.g. through a shell
script wrapper?).

Also, what does "crle" say?

hth
t
Reply tony_curtis32 1/7/2010 9:14:15 PM
