Just reading "man lf64" about the so-called transitional interfaces for
largefile-aware 32-bit programs. I do understand why we have stat64() et
al: because they have to operate on a "struct stat64" which has members
capable of representing 64-bit offsets. That makes sense. But it
suddenly occurred to me that I have no idea why we have such interfaces
as open64() and fopen64().
Neither of these has any offsets in its signature. E.g. open() takes a
string and a bitmask (more or less) and returns an int and the same is
true for open64(). The number of available file descriptors might change
in a true 64-bit program but not in a "transitional" 32-bit program. And
neither variant of open ever has a need to "see" the entire file anyway.
I figured that for fopen64 it might be because it returns a FILE64 * but
I don't see such a thing in stdio.h ... So what do these interfaces do
that needs to be aware of file offsets?
--
Thanks,
M.Biswas
Posted by Mohun, 7/13/2004 9:48:05 PM

In article <oMYIc.69505$a24.59741@attbi_s03>, Mohun Biswas wrote:
>suddenly occurred to me that I have no idea why we have such interfaces
>as open64() and fopen64().
>I figured that for fopen64 it might be because it returns a FILE64 * but
>I don't see such a thing in stdio.h ... So what do these interfaces do
>that needs to be aware of file offsets?
What happens after you've started reading a file and it wants to
keep track of your current position ? fseek(), ftell() sort of stuff ?
--
Elvis Notargiacomo master AT barefaced DOT cheek
http://www.notatla.org.uk/goen/
Posted by elvis, 7/13/2004 10:16:13 PM

all mail refused wrote:
> In article <oMYIc.69505$a24.59741@attbi_s03>, Mohun Biswas wrote:
>
>
>>suddenly occurred to me that I have no idea why we have such interfaces
>>as open64() and fopen64().
>
>
>>I figured that for fopen64 it might be because it returns a FILE64 * but
>>I don't see such a thing in stdio.h ... So what do these interfaces do
>>that needs to be aware of file offsets?
>
>
> What happens after you've started reading a file and it wants to
> keep track of your current position ? fseek(), ftell() sort of stuff ?
I didn't ask about why we have fseeko64. The answer to that is obvious.
The question is why do we have open64 and fopen64.
Put it another way: when adding functionality the A plan is always to
integrate it seamlessly. Only when that doesn't work do we go to plan B,
which is to add a new interface and deprecate the old one. Why is
open/open64 on the B plan?
--
Thanks,
M.Biswas
Posted by Mohun, 7/14/2004 12:29:07 AM

Mohun Biswas wrote:
> Put it another way: when adding functionality the A plan is always to
> integrate it seamlessly. Only when that doesn't work do we go to plan B,
> which is to add a new interface and deprecate the old one. Why is
> open/open64 on the B plan?
Responding to my own post to clarify the question even further. Let's
say I open a file with open(), e.g.:
int fd = open("foo", O_WRONLY); // error checking elided
Now, what's to stop me from pumping data into that file till there's
more than 2GB accumulated? What data structure would overflow? How would
it have worked better if I'd used open64()? They'd both return the same
fd value, right? And if the kernel is going to prepare the file
descriptor differently in some way with open64(), enabling it to hold
more than 2GB of data, why shouldn't it give the same gift to a user of
open() since doing so wouldn't break the interface or SUS in any way?
--
Thanks,
M.Biswas
Posted by Mohun, 7/14/2004 12:38:43 AM

Mohun Biswas <m.biswas@invalid.addr> writes:
>Neither of these has any offsets in its signature. E.g. open() takes a
>string and a bitmask (more or less) and returns an int and the same is
>true for open64(). The number of available file descriptors might change
>in a true 64-bit program but not in a "transitional" 32-bit program. And
>neither variant of open ever has a need to "see" the entire file anyway.
>I figured that for fopen64 it might be because it returns a FILE64 * but
>I don't see such a thing in stdio.h ... So what do these interfaces do
>that needs to be aware of file offsets?
The point is that a file opened with an old interface cannot be extended
beyond 2GB and open of files over 2GB will fail.
The reason for this is that such files might be corrupted when
manipulated with 32 bit functions. (E.g., 0xffffffff is a valid return value
from lseek(fd, 0, SEEK_CUR) and lseek "cannot fail", so when lseek
returns -1 and EOVERFLOW applications may believe that that is the
current file position and corruption may result.)
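
A minimal sketch of the ambiguity described above, assuming a 32-bit signed offset type. The helper name as_32bit_offset is hypothetical and is not part of any real interface; it only shows that a position of 0xffffffff, once squeezed into 32 bits, has the same bit pattern as the -1 that lseek uses to signal an error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a real libc function): the value a 32-bit
 * signed offset would hold if a 64-bit file position were stored in it.
 * The low 32 bits are kept and then reinterpreted as signed. */
int32_t as_32bit_offset(int64_t pos)
{
    uint32_t low;
    int32_t out;

    assert(pos >= 0);               /* real file positions are never negative */
    low = (uint32_t)pos;            /* well-defined reduction modulo 2^32 */
    memcpy(&out, &low, sizeof out); /* reinterpret the bits as signed */
    return out;
}
```

Passing 4294967295 (0xffffffff) yields -1, which a caller that only compares against -1 cannot tell apart from a failed lseek, and a caller that never checks will treat as a position.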
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Posted by Casper, 7/14/2004 8:40:53 AM

Casper H.S. Dik wrote:
> The point is that a file opened with an old interface cannot be extended
> beyond 2GB and open of files over 2GB will fail.
>
> The reason for this is that such files might be corrupted when
> manipulated with 32 bit functions. (E.g., 0xffffffff is a valid return value
> from lseek(fd, 0, SEEK_CUR) and lseek "cannot fail", so when lseek
> returns -1 and EOVERFLOW applications may believe that that is the
> current file position and corruption may result.)
I don't know if you read the other side of this thread but the same
response applies. People continue to speak only of lseek but it's clear
why lseek needs an lseek64. I am asking not about lseek but
specifically about *open*.
The difference between lseek and lseek64 is obvious from their signatures:
off_t lseek(int fildes, off_t offset, int whence)
off64_t lseek64(int fildes, off64_t offset, int whence)
int open(const char *path, int oflag, /* mode_t mode */...)
int open64(const char *path, int oflag, /* mode_t mode */...)
Lseek takes and returns an off_t, lseek64 takes and returns an off64_t.
However, the signatures of open and open64 are identical. So what do
open/open64 do differently, and why doesn't open() *always* return a
64-bit-seekable file descriptor?
Are you trying to say that the only reason for open64 is "human
engineering"? I.e. that because a 32-bit seek on a 64-bit file pointer
is so dangerous, you want to force the user to state explicitly at open
time that he/she plans to use the xxx64() interfaces, by making an
otherwise artificial distinction between open and open64?
--
Thanks,
M.Biswas
Posted by Mohun, 7/14/2004 3:42:21 PM

On Wed, 14 Jul 2004, Mohun Biswas wrote:
> Are you trying to say that the only reason for open64 is "human
> engineering"? I.e. that because a 32-bit seek on a 64-bit file pointer
> is so dangerous, you want to force the user to state explicitly at open
> time that he/she plans to use the xxx64() interfaces, by making an
> otherwise artificial distinction between open and open64?
I'm not sure of the reason (and what you suggest might indeed
be correct), but if this is such an issue, I would avoid the
transitional *64 interfaces altogether. Either compile the
program as a 64-bit executable, or (perhaps more useful) compile
it to be large file aware using the large file compilation
environment (i.e., specify -D_FILE_OFFSET_BITS=64 on the
command line).
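
A minimal sketch of what the large file compilation environment changes, assuming a system whose headers honor _FILE_OFFSET_BITS. The macro is defined in the source here only so the example is self-contained; normally it would be given as -D_FILE_OFFSET_BITS=64 on the command line. The function names off_t_bits and require_large_file_env are hypothetical.

```c
/* Must be defined before any system header is included. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Width of off_t, in bits, in this compilation environment. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Abort early rather than run large-file code paths with a narrow off_t. */
void require_large_file_env(void)
{
    assert(off_t_bits() >= 64);
}
```

With the macro in effect, plain open(), lseek(), stat() and friends operate on a 64-bit off_t, so the source never has to name the *64 variants.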
--
Rich Teer, SCNA, SCSA
President,
Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-online.net
Posted by Rich, 7/14/2004 4:12:21 PM

Mohun Biswas <m.biswas@invalid.addr> writes:
>Are you trying to say that the only reason for open64 is "human
>engineering"? I.e. that because a 32-bit seek on a 64-bit file pointer
>is so dangerous, you want to force the user to state explicitly at open
>time that he/she plans to use the xxx64() interfaces, by making an
>otherwise artificial distinction between open and open64?
The issue is that we don't want to mix 64 bit file opens with 32 bit
file ops; Unix programmers are notorious for checking error returns badly, especially
for "cannot fail" system calls like lseek().
BTW, it is specifically *not* the intention that you directly use
fopen64(), lstat64() etc.
You should use the special compile time options which map off_t to 64 bit
and which map all the functions to their "extended file"
counterparts.
Casper
Posted by Casper, 7/14/2004 7:20:19 PM

Mohun Biswas <m.biswas@invalid.addr> wrote:
> Casper H.S. Dik wrote:
>> The point is that a file opened with an old interface cannot be extended
>> beyond 2GB and open of files over 2GB will fail.
>>
>> The reason for this is that such files might be corrupted when
>> manipulated with 32 bit functions. (E.g., 0xffffffff is a valid return value
>> from lseek(fd, 0, SEEK_CUR) and lseek "cannot fail", so when lseek
>> returns -1 and EOVERFLOW applications may believe that that is the
>> current file position and corruption may result.)
> I don't know if you read the other side of this thread but the same
> response applies. People continue to speak only of lseek but it's clear
> why lseek needs an lseek64. I am asking not about lseek but
> specifically about *open*.
Yes. An open64 will succeed on a largefile, an open will fail. If this
were not true, you could open (32 bit) a 2+ GB file, followed by 32 bit
manipulation (like the above 32 bit lseek).
Having open fail in such situations prevents the misinterpretation of
the return codes from the 32 bit operations.
--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Posted by Darren, 7/14/2004 7:47:26 PM

On 14 Jul 2004 19:20:19 GMT Casper H.S. Dik <Casper.Dik@Sun.COM> wrote:
> Mohun Biswas <m.biswas@invalid.addr> writes:
>
>>Are you trying to say that the only reason for open64 is "human
>>engineering"? I.e. that because a 32-bit seek on a 64-bit file pointer
>>is so dangerous, you want to force the user to state explicitly at open
>>time that he/she plans to use the xxx64() interfaces, by making an
>>otherwise artificial distinction between open and open64?
>
> The issue is that we don't want to mix 64 bit file opens with 32 bit
> file ops; Unix programmers are notorious for checking error returns badly, especially
> for "cannot fail" system calls like lseek().
>
> BTW, it is specifically *not* the intention that you directly use
> fopen64(), lstat64() etc.
>
> You should use the special compile time options which map off_t to 64 bit
> and which map all the functions to their "extended file"
> counterparts.
And that's why open64() exists.
Not "human engineering", but because you *cannot* allow the sequence
off_t off;
int fd;
fd = open(...);
...
off = lseek(fd, 0, SEEK_END);
to even *occur* if off_t is 32 bits and the file is >=2GB. By not occur,
I mean you can't even return an error, this sequence cannot be allowed.
exercise<-reader
/fc
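
For comparison, a sketch of the same open/lseek sequence with the error checks that programs often leave out. The helper name file_size_of is hypothetical. It assumes a POSIX system; on a system following the large file rules discussed in this thread, the open() step is where a too-large file would be rejected (with EOVERFLOW) when off_t is 32 bits.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store the size of the file at path in *size.
 * Returns 0 on success or a negated errno value on failure. */
int file_size_of(const char *path, off_t *size)
{
    int fd, err;
    off_t end;

    assert(path != NULL && size != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -errno;              /* may be EOVERFLOW for a too-large file */

    end = lseek(fd, 0, SEEK_END);
    err = (end == (off_t)-1) ? errno : 0;  /* save errno before close() */
    close(fd);

    if (err != 0)
        return -err;
    *size = end;
    return 0;
}
```

The check on lseek only helps callers who bother to make it, which is why the failure is pushed forward to open() instead.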
Posted by Frank, 7/15/2004 7:27:45 AM

In article <Pine.SOL.4.58.0407140904091.16715@zaphod.rite-online.net>,
Rich Teer <rich.teer@rite-group.com> wrote:
>I'm not sure of the reason (and what you suggest might indeed
>be correct), but if this is such an issue, I would avoid the
>transitional *64 interfaces altogether. Either compile the
>program as a 64-bit executable, or (perhaps more useful) compile
>it to be large file aware using the large file compilation
>environment (i.e., specify -D_FILE_OFFSET_BITS=64 on the
>command line).
This is bad advice :-(
If a program uses "int" typed vars to store file offsets internally,
compiling in 64 bit mode will result in a program that may corrupt the
content of large files.
If a program has not been converted to use off_t wherever it may be needed
(and lint cannot tell you this; consider a large-file-aware editor
like "ved", where vars not directly related to file offsets also need to be
bigger than "long") and you compile using -D_FILE_OFFSET_BITS=64, the resulting
binary may corrupt files.
If a program has not been converted to use fseeko()/ftello() instead of
fseek()/ftell(), and you compile using -D_FILE_OFFSET_BITS=64, the resulting
binary may corrupt files.
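
A minimal sketch of the fseeko()/ftello() style referred to above, in which positions stay in off_t from end to end instead of passing through the long that fseek()/ftell() use. The helper names stream_position and stream_seek_to are hypothetical, and the feature macros are defined in the source only to keep the example self-contained.

```c
/* Normally supplied as -D_FILE_OFFSET_BITS=64 on the command line;
 * both macros must be defined before any system header is included. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: current position of fp, or (off_t)-1 on error.
 * ftell() would return a long, which is 32 bits in a 32-bit program. */
off_t stream_position(FILE *fp)
{
    assert(fp != NULL);
    return ftello(fp);
}

/* Hypothetical helper: seek to an absolute position.
 * Returns 0 on success, -1 on error (errno is set by fseeko). */
int stream_seek_to(FILE *fp, off_t pos)
{
    assert(fp != NULL);
    return fseeko(fp, pos, SEEK_SET);
}
```

Any intermediate variable that holds one of these positions has to be an off_t as well; a single int or long in the chain brings the truncation back.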
--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
js@cs.tu-berlin.de (uni) If you don't have iso-8859-1
schilling@fokus.fraunhofer.de (work) chars I am J"org Schilling
URL: http://www.fokus.fraunhofer.de/usr/schilling ftp://ftp.berlios.de/pub/schily
Posted by js, 7/15/2004 8:44:47 AM

js@cs.tu-berlin.de (Joerg Schilling) writes:
>In article <Pine.SOL.4.58.0407140904091.16715@zaphod.rite-online.net>,
>Rich Teer <rich.teer@rite-group.com> wrote:
>>I'm not sure of the reason (and what you suggest might indeed
>>be correct), but if this is such an issue, I would avoid the
>>transitional *64 interfaces altogether. Either compile the
>>program as a 64-bit executable, or (perhaps more useful) compile
>>it to be large file aware using the large file compilation
>>environment (i.e., specify -D_FILE_OFFSET_BITS=64 on the
>>command line).
>This is bad advice :-(
>If a program uses "int" typed vars to store file offsets internally,
>compiling in 64 bit mode will result in a program that may corrupt the
>content of large files.
Obviously, you can't just compile it and expect it to work.
You will need to analyze your code and test your program with large
files.
Programs should not use the transitional interfaces directly; that is
not their intended use (the direct use interfaces are primarily for
library developers).
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Posted by Casper, 7/15/2004 11:18:39 AM

Frank Cusack wrote:
> And that's why open64() exists.
>
> Not "human engineering", but because you *cannot* allow the sequence
>
> off_t off;
> int fd;
>
> fd = open(...);
> ...
> off = lseek(fd, 0, SEEK_END);
>
> to even *occur* if off_t is 32 bits and the file is >=2GB. By not occur,
> I mean you can't even return an error, this sequence cannot be allowed.
>
> exercise<-reader
Understood (I think), and sorry to be so thick but your example raises a
question about write behavior. What if I open() a "smallfile" in append
mode, write enough bytes to make it a "largefile", then try to lseek?
--
Thanks,
M.Biswas
Posted by Mohun, 7/15/2004 1:30:06 PM

Mohun Biswas <m.biswas@invalid.addr> wrote:
> Understood (I think), and sorry to be so thick but your example raises a
> question about write behavior. What if I open() a "smallfile" in append
> mode, write enough bytes to make it a "largefile", then try to lseek?
Because it was a 32bit open, the writes that drive the file past 2G will
fail. It will never be a "largefile", so the lseek is still valid.
This should be the same behavior as on systems that do not support
largefiles (which is what a program written for that environment should
expect).
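
A sketch of how a program might notice that failure, assuming a POSIX system where a write that would grow the file past the limit fails with EFBIG. The helper name write_all is hypothetical. A write that straddles the limit may first succeed partially, so the loop keeps going until the error actually shows up.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer to fd.
 * Returns 0 on success or a negated errno value on failure. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -errno;          /* -EFBIG when the file cannot grow further */
        }
        buf += n;                   /* partial write: advance and try again */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that sees -EFBIG knows the file stopped short of the limit, so any later 32-bit lseek on it still returns a representable position.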
--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
Posted by Darren, 7/15/2004 2:50:08 PM

Mohun Biswas <m.biswas@invalid.addr> writes:
>Frank Cusack wrote:
>> And that's why open64() exists.
>>
>> Not "human engineering", but because you *cannot* allow the sequence
>>
>> off_t off;
>> int fd;
>>
>> fd = open(...);
>> ...
>> off = lseek(fd, 0, SEEK_END);
>>
>> to even *occur* if off_t is 32 bits and the file is >=2GB. By not occur,
>> I mean you can't even return an error, this sequence cannot be allowed.
>>
>> exercise<-reader
>Understood (I think), and sorry to be so thick but your example raises a
>question about write behavior. What if I open() a "smallfile" in append
>mode, write enough bytes to make it a "largefile", then try to lseek?
The write will fail at the 2GB mark.
Casper
Posted by Casper, 7/15/2004 4:42:47 PM

On Thu, 15 Jul 2004, Joerg Schilling wrote:
> This is bad advice :-(
>
> If a program uses "int" typed vars to store file offsets internally,
Stop right there! Correct programs - or at least, 64-bit clean
programs - should not be using fundamental types to store the
file offset in the first place!
> compiling in 64 bit mode will result in a program that may corrupt the
> content of large files.
Agreed - which is why that sort of practice is to be discouraged
in the first place.
> If a program has not been converted to use off_t wherever it may be needed
.... then it is not 64-bit clean.
--
Rich Teer, SCNA, SCSA
President,
Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-online.net
Posted by Rich, 7/15/2004 5:39:42 PM

In article <Pine.SOL.4.58.0407151034540.16715@zaphod.rite-online.net>,
Rich Teer <rich.teer@rite-group.com> wrote:
Rich, the assumptions you give us now were not in your first posting.
>> If a program uses "int" typed vars to store file offsets internally,
>
>Stop right there! Correct programs - or at least, 64-bit clean
>programs - should not be using fundamental types to store the
>file offset in the first place!
This is correct, but how do you find this out?
If you simply compile it, you will most likely succeed although the
program is not 64 bit clean.
You wrote that a simple 64 bit compilation will make a program able to
use large files, and my reply was that this is not always the case.
People who want large-file-aware programs should definitely check
the related white papers and thoroughly check their code.....
http://www.sas.com/standards/large.file/x_open.20Mar96.html
--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
js@cs.tu-berlin.de (uni) If you don't have iso-8859-1
schilling@fokus.fraunhofer.de (work) chars I am J"org Schilling
URL: http://www.fokus.fraunhofer.de/usr/schilling ftp://ftp.berlios.de/pub/schily
Posted by js, 7/16/2004 9:23:10 AM

Joerg Schilling wrote:
> People who want large-file-aware programs should definitely check
> the related white papers and thoroughly check their code.....
>
> http://www.sas.com/standards/large.file/x_open.20Mar96.html
And thank you for this link which provides the definitive answer to my
original question:
> To protect existing binaries from arbitrarily large files, a new value
> (offset maximum) will be part of the open file description. An offset
> maximum is the largest offset that can be used as a file offset.
> Operations attempting to go beyond the offset maximum will return an
> error. The offset maximum is normally established as the size of the
> off_t "extended signed integral type" used by the program creating the
> file description.
>
> The open() function and other interfaces establish the offset maximum
> for a file description, returning an error if the file size is larger
> than the offset maximum at the time of the call. Returning errors when
> the offset maximum is (or is likely to be) exceeded protects existing
> binaries effectively.
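
To put numbers on the quoted definition, here is a small sketch that works out the offset maximum implied by a signed offset type of a given width. It is only arithmetic on the text above, not the kernel's implementation, and the names offset_max_for_bits and exceeds_offset_max are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of a signed two's-complement integer that is `bits` wide,
 * i.e. 2^(bits-1) - 1. */
int64_t offset_max_for_bits(int bits)
{
    assert(bits >= 2 && bits <= 64);
    return (int64_t)(UINT64_MAX >> (65 - bits));
}

/* Nonzero if a file of this size is too large for an offset type that is
 * `bits` wide, which is the case where open() is described as failing. */
int exceeds_offset_max(int64_t file_size, int bits)
{
    return file_size > offset_max_for_bits(bits);
}
```

For a 32-bit off_t the offset maximum comes out as 2147483647, one byte short of 2GB, which matches the "2GB mark" mentioned earlier in the thread.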
--
Thanks,
M.Biswas
Posted by Mohun, 7/16/2004 1:46:50 PM

On Fri, 16 Jul 2004, Joerg Schilling wrote:
> In article <Pine.SOL.4.58.0407151034540.16715@zaphod.rite-online.net>,
> Rich Teer <rich.teer@rite-group.com> wrote:
>
> Rich, the assumptions you give us now were not in your first posting.
True. But I think the assumptions are not that unreasonable,
especially for new code.
> >Stop right there! Correct programs - or at least, 64-bit clean
> >programs - should not be using fundamental types to store the
> >file offset in the first place!
>
> This is correct, but how do you find this out?
To me it's intuitive: if one were supposed to use an int for the
file offset, then the functions that manipulate it would have
int in the function prototype.
Also, I read about it a couple of years ago while researching
my book, Solaris Systems Programming. Coming to all good book
sellers next month, it has a complete section on this sort of
thing.
> People who want large-file-aware programs should definitely check
> the related white papers and thoroughly check their code.....
>
> http://www.sas.com/standards/large.file/x_open.20Mar96.html
Good advice. They should also buy a copy of my book! :-)
--
Rich Teer, SCNA, SCSA
President,
Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-online.net
Posted by Rich, 7/16/2004 4:43:01 PM