I'm trying to figure out how to read large amounts (over 4 GB) from
disk on Linux using C. I read all the documentation regarding reading
amounts over 4GB (64-bit), but no matter what I do, it always reads in
2GB.
I'm on an AMD64 machine running Ubuntu.
Here's some sample code that demonstrates the problem:
#define _LARGE_FILE_API
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
    int filedes;
    off64_t off = 5;
    uint64_t amount = 5000000000LL;
    char* buf = (char*) malloc(amount);
    filedes = open64("doc_repos.heap", O_RDONLY);
    perror("open64");
    uint64_t ret = pread64(filedes, buf, amount, off);
    printf("We read %lld bytes.\n", ret);
}
Is there something I'm missing here? Maybe there is a special compile
flag for gcc I am missing, although I see nothing like that in the gcc
man pages.
chsalvia (83) | 5/3/2007 12:23:15 AM

On 3 May, 01:23, chsal...@gmail.com wrote:
> I'm trying to figure out how to read large amounts (over 4 GB) from
> disk on Linux using C. I read all the documentation regarding reading
> amounts over 4GB (64-bit), but no matter what I do, it always reads in
> 2GB.
http://en.wikipedia.org/wiki/Large_file_support
http://www.suse.de/~aj/linux_lfs.html
maxim.yegorushkin (752) | 5/3/2007 7:50:22 AM

On 2007-05-03, chsalvia@gmail.com <chsalvia@gmail.com> wrote:
> I'm trying to figure out how to read large amounts (over 4 GB) from
> disk on Linux using C. I read all the documentation regarding reading
> amounts over 4GB (64-bit), but no matter what I do, it always reads in
> 2GB.
>
> I'm running on an AMD64 machine running ubuntu.
Is your Ubuntu 64-bit or 32-bit?
>
> Here's some sample code that demonstrates the problem:
Your example does not compile on my 64-bit linux.
If your Linux is 64-bit, then off_t is already 64-bit, you don't need
off64_t. You also don't need special functions because normal ones
already use appropriate size for the variables.
>
> #define _LARGE_FILE_API
Sorry, why do you define this constant? I couldn't find it in any header
on my system.
> uint64_t amount = 5000000000LL;
> char* buf = (char*) malloc(amount);
Why don't you check the return from malloc? Why do you cast the return
value? Do you have 5G of RAM and swap space to back your request? When
you read into memory, that memory must be available.
Also, pread64 takes the amount of data to read as a size_t. On a 32-bit
OS this is only 32 bits wide.
I usually never read that much data into memory. It does not make much
sense. I use mmap to map the file into a memory region and then simply
access that memory.
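
The mmap approach described above can be sketched roughly as follows. This is a minimal sketch, not code from the thread: `map_file` is a hypothetical helper name, and it assumes a regular, non-empty file that fits in the process address space (so a 64-bit build for multi-gigabyte files).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): map a whole file read-only.
 * On success returns the mapping and stores the file size in *len.
 * Returns NULL if the file cannot be opened, is empty, or cannot be mapped. */
const unsigned char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);                  /* empty file or too large to address */
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

A caller would index the returned pointer like an ordinary array and release it with `munmap((void *)p, len)` when done. Pages are brought in from disk on demand, so nothing has to be allocated or copied up front.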
--
Minds, like parachutes, function best when open
avorop (41) | 5/3/2007 8:01:52 AM

On May 3, 3:50 am, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 3 May, 01:23, chsal...@gmail.com wrote:
>
> > I'm trying to figure out how to read large amounts (over 4 GB) from
> > disk on Linux using C. I read all the documentation regarding reading
> > amounts over 4GB (64-bit), but no matter what I do, it always reads in
> > 2GB.
>
> http://en.wikipedia.org/wiki/Large_file_support
> http://www.suse.de/~aj/linux_lfs.html
I'm aware of the historical limitations of file sizes to 2GB due to
the address space limit provided by a signed 32 bit integer. But, the
problem here is not file offsets. The problem here is read length.
I'm able to read at offsets greater than 2GB, but I can't actually
read in more than 2GB of data. I'm beginning to think that is just a
limitation of the OS, and that the only way to read in more than 2GB
is to call pread in a loop.
chsalvia (83) | 5/3/2007 6:45:34 PM

On May 3, 4:01 am, Andrei Voropaev <avo...@mail.ru> wrote:
> On 2007-05-03, chsal...@gmail.com <chsal...@gmail.com> wrote:
>
> > I'm trying to figure out how to read large amounts (over 4 GB) from
> > disk on Linux using C. I read all the documentation regarding reading
> > amounts over 4GB (64-bit), but no matter what I do, it always reads in
> > 2GB.
>
> > I'm running on an AMD64 machine running ubuntu.
>
> Is your Ubuntu 64-bit or 32-bit?
Yes I am running the 64-bit version of ubuntu.
> > Here's some sample code that demonstrates the problem:
>
> Your example does not compile on my 64-bit linux.
That's strange. It compiles for me under gcc.
> If your Linux is 64-bit, then off_t is already 64-bit, you don't need
> off64_t. You also don't need special functions because normal ones
> already use appropriate size for the variables.
Right, off_t is already 8 bytes on a 64 bit machine. I was trying
various things to get this to work.
> > #define _LARGE_FILE_API
>
> Sorry, why do you define this constant? I couldn't find it in any header
> on my system?
It seems to be useless. I put it in there because I thought it might
help, based on something I read on the Internet.
> > uint64_t amount = 5000000000LL;
> > char* buf = (char*) malloc(amount);
>
> Why don't you check the return from malloc? Why do you cast the return
> value? Do you have 5G of RAM and swap space to backup your request? When
> you read into memory, that memory must be available.
In this case, the call to malloc() is successful. I checked the
return value. The machine I'm using has 8GB RAM, with approx. 7 GB
free. I cast the return value because I originally tried this in C++,
and forgot to remove that when I changed this to C.
> Also, the pread64 takes the amount of data to read as size_t. On 32-bit
> OS this is only 32-bit.
>
> Usually I never read that much of data into memory. It does not make any
> sense. I use mmap to attach file to the memory region and then simply
> access that memory.
True, I've never had to read that much memory before. But in this
project, I need to process huge amounts of data. Perhaps mmap()
would be a better solution, as you suggest. I'm just surprised that
even 64-bit OSes don't let you actually read in more than 2GB in one
call to pread().
chsalvia (83) | 5/3/2007 6:50:36 PM

On 3 May 2007 11:45:34 -0700, chsalvia@gmail.com <chsalvia@gmail.com> wrote:
> problem here is not file offsets. The problem here is read length.
> I'm able to read at offsets greater than 2GB, but I can't actually
> read in more than 2GB of data. I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.
Probably; I can't be bothered to check. The essential lesson here is
to _always_ call your (p)read(2)s in a loop. They don't guarantee
to fill your buffer; they only guarantee not to overflow it.
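
A minimal sketch of such a loop, under some assumptions: `pread_full` and `read_range` are hypothetical helper names that appear nowhere in the thread, and the total is returned as a `ssize_t`, so requests above `SSIZE_MAX` are not handled. For background, the Linux read(2) manual page notes that a single call transfers at most 0x7ffff000 bytes (just under 2 GiB) on 32-bit and 64-bit systems alike, which would match the behaviour reported here.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): keep calling pread() until
 * `count` bytes have arrived, end-of-file is reached, or a real error occurs.
 * Returns the number of bytes read, or -1 with errno set by pread(). */
ssize_t pread_full(int fd, void *buf, size_t count, off_t offset)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || count == 0);

    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);
        if (n > 0) {
            done += (size_t)n;      /* short read: ask for the remainder */
        } else if (n == 0) {
            break;                  /* end of file */
        } else if (errno != EINTR) {
            return -1;              /* real error */
        }
        /* EINTR: interrupted before any data arrived, so simply retry */
    }
    return (ssize_t)done;
}

/* Hypothetical convenience wrapper: open `path`, read `count` bytes starting
 * at `offset` with pread_full(), and close the descriptor again. */
ssize_t read_range(const char *path, void *buf, size_t count, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t got = pread_full(fd, buf, count, offset);
    int saved = errno;              /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return got;
}
```

With the original example this would be called as `read_range("doc_repos.heap", buf, amount, off)`. A return value smaller than `amount` then means end-of-file was reached, not that the kernel stopped early. On a 32-bit build, `off_t` is only 64 bits wide if `_FILE_OFFSET_BITS` is defined as 64 before the includes.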
--
Mikko Rauhala - mjr@iki.fi - <URL: http://www.iki.fi/mjr/ >
Transhumanist - WTA member - <URL: http://transhumanism.org/ >
Singularitarian - SIAI supporter - <URL: http://singinst.org/ >
mjr (19) | 5/3/2007 11:44:12 PM

chsalvia wrote:
> I'm aware of the historical limitations of file sizes to 2GB due to
> the address space limit provided by a signed 32 bit integer. But, the
> problem here is not file offsets. The problem here is read length.
> I'm able to read at offsets greater than 2GB, but I can't actually
> read in more than 2GB of data. I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.
The return value of pread() has type ssize_t. On your 32-bit system
this is almost certainly a 32-bit signed integer. The return value
of pread() is either the number of bytes it read or -1 on error. The
largest value that can be returned on your system is 2^31-1.
If pread() could read more than 2^31-1 bytes (2GB minus 1 byte), how do
you expect your program to be able to determine that it did?
--
Geoff Clare <netnews@gclare.org.uk>
geoff31 (365) | 5/4/2007 12:56:09 PM

Geoff Clare wrote:
> chsalvia wrote:
>
>> I'm aware of the historical limitations of file sizes to 2GB due to
>> the address space limit provided by a signed 32 bit integer. But, the
>> problem here is not file offsets. The problem here is read length.
>> I'm able to read at offsets greater than 2GB, but I can't actually
>> read in more than 2GB of data. I'm beginning to think that is just a
>> limitation of the OS, and that the only way to read in more than 2GB
>> is to call pread in a loop.
>
> The return value of pread() has type ssize_t. On your 32-bit system [...]
He wrote: "Yes I am running the 64-bit version of ubuntu."
devnull8 (127) | 5/4/2007 3:31:44 PM

Spoon wrote:
> Geoff Clare wrote:
>
>> chsalvia wrote:
>>
>>> I'm aware of the historical limitations of file sizes to 2GB due to
>>> the address space limit provided by a signed 32 bit integer. But, the
>>> problem here is not file offsets. The problem here is read length.
>>> I'm able to read at offsets greater than 2GB, but I can't actually
>>> read in more than 2GB of data. I'm beginning to think that is just a
>>> limitation of the OS, and that the only way to read in more than 2GB
>>> is to call pread in a loop.
>>
>> The return value of pread() has type ssize_t. On your 32-bit system [...]
>
> He wrote: "Yes I am running the 64-bit version of ubuntu."
Oops, so he did. I saw his later statement, "Also, the pread64 takes
the amount of data to read as size_t. On 32-bit OS this is only 32-bit"
and took that to imply he was running 32-bit.
So, what is ssize_t defined as when running the 64-bit version of ubuntu?
Most likely it's still a 32-bit signed integer, and the rest of my post
applies just the same.
--
Geoff Clare <netnews@gclare.org.uk>
geoff31 (365) | 5/8/2007 12:44:14 PM

No, ssize_t on a 64-bit system is a 64-bit signed integer.
chsalvia (83) | 5/9/2007 5:32:20 AM

chsalvia wrote:
[context restored]
>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>>
>> Most likely it's still a 32-bit signed integer, and the rest of my post
>> applies just the same.
> No, ssize_t on a 64-bit system is a 64-bit signed integer.
Can anyone confirm this? The limited testing I have been able to do
does not match chsalvia's claim.
I don't have an x86-64 Linux system to test on, but on a 32-bit x86
Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
that a ssize_t object has a size of 4 bytes, the same as for 32-bit:
$ cat foo.c
#include <sys/types.h>
ssize_t foo;
$ gcc -c -m64 foo.c
$ nm -P foo.o
foo C 0000000000000004 0000000000000004
$ file foo.o
foo.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped
$ gcc -c foo.c
$ nm -P foo.o
foo C 00000004 00000004
$ file foo.o
foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
--
Geoff Clare <netnews@gclare.org.uk>
geoff31 (365) | 5/11/2007 1:51:55 PM

I wrote:
> chsalvia wrote:
>
> [context restored]
>>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>>>
>>> Most likely it's still a 32-bit signed integer, and the rest of my post
>>> applies just the same.
>
>> No, ssize_t on a 64-bit system is a 64-bit signed integer.
>
> Can anyone confirm this?
Scratch that. I looked at the LSB spec for the AMD64 ABI (should have
thought of doing that before my previous post!) and it specifies ssize_t
as a typedef for int64_t.
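
One way to check this on a given machine, instead of inspecting object files with nm, is to have the compiler report the widths directly. The following is only a sketch: `format_type_widths` is a hypothetical helper name, and the numbers depend on the compiler flags and headers actually in use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not from the thread): write the widths, in bits, of
 * the types discussed here into `out`. Returns what snprintf() returns. */
int format_type_widths(char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "size_t=%zu ssize_t=%zu off_t=%zu",
                    sizeof(size_t) * CHAR_BIT,
                    sizeof(ssize_t) * CHAR_BIT,
                    sizeof(off_t) * CHAR_BIT);
}
```

Printing the resulting string from a native x86-64 Linux build should show 64 for all three types, in line with the LSB definition quoted above. A 32-bit build would be expected to show 32 for `size_t` and `ssize_t`, and 32 for `off_t` as well unless `_FILE_OFFSET_BITS=64` is defined.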
> I don't have an x86-64 Linux system to test on, but on a 32-bit x86
> Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
> that a ssize_t object has a size of 4 bytes, the same as for 32-bit:
Presumably the gcc -m32 and -m64 options only work properly on x86-64
systems. Even though gcc happily produces x86-64 object files with -m64
on 32-bit x86 systems, it appears that the files are invalid because the
headers don't support choosing between 64-bit and 32-bit; they just
assume 32-bit.
--
Geoff Clare <netnews@gclare.org.uk>
geoff31 (365) | 5/11/2007 2:25:37 PM

Geoff Clare <geoff@clare.See-My-Signature.invalid> writes:
>chsalvia wrote:
>[context restored]
>>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>>>
>>> Most likely it's still a 32-bit signed integer, and the rest of my post
>>> applies just the same.
>> No, ssize_t on a 64-bit system is a 64-bit signed integer.
>Can anyone confirm this? The limited testing I have been able to do
>does not match chsalvia's claim.
It pretty much must be on a Unix system (Windows, however, seems
to differ)
>I don't have an x86-64 Linux system to test on, but on a 32-bit x86
>Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
>that a ssize_t object has a size of 4 bytes, the same as for 32-bit:
But does it use the proper include files for a 64 bit compile?
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Casper.Dik (623) | 5/11/2007 3:21:11 PM

Casper H.S. Dik <Casper.Dik@Sun.COM> writes:
> Geoff Clare <geoff@clare.See-My-Signature.invalid> writes:
[...]
>>I don't have an x86-64 Linux system to test on, but on a 32-bit x86
>>Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
>>that a ssize_t object has a size of 4 bytes, the same as for 32-bit:
>
> But does it use the proper include files for a 64 bit compile?
The relevant part is this:
#if __WORDSIZE == 32
# define __SQUAD_TYPE __quad_t
# define __UQUAD_TYPE __u_quad_t
# define __SWORD_TYPE int
# define __UWORD_TYPE unsigned int
# define __SLONG32_TYPE long int
# define __ULONG32_TYPE unsigned long int
# define __S64_TYPE __quad_t
# define __U64_TYPE __u_quad_t
#elif __WORDSIZE == 64
# define __SQUAD_TYPE long int
# define __UQUAD_TYPE unsigned long int
# define __SWORD_TYPE long int
# define __UWORD_TYPE unsigned long int
# define __SLONG32_TYPE int
# define __ULONG32_TYPE unsigned int
# define __S64_TYPE long int
# define __U64_TYPE unsigned long int
(/usr/include/bits/types.h)
__WORDSIZE is defined in /usr/include/bits/wordsize.h and is presumably
generated during a glibc build.
rweikusat (2679) | 5/11/2007 4:49:18 PM

On May 3, 11:45 am, chsal...@gmail.com wrote:
> I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.
The only way to reliably read in any amount of data greater than one
byte is to call 'pread' in a loop. There are many cases in which you
will get a short read.
DS
David | 5/11/2007 10:13:58 PM