Read more than 2GB of data



I'm trying to figure out how to read large amounts (over 4 GB) from
disk on Linux using C.  I read all the documentation regarding reading
amounts over 4GB (64-bit), but no matter what I do, it always reads in
2GB.

I'm running on an AMD64 machine running ubuntu.

Here's some sample code that demonstrates the problem:

#define _LARGE_FILE_API
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
        int filedes;
        off64_t off = 5;

        uint64_t amount = 5000000000LL;
        char* buf = (char*) malloc(amount);

        filedes = open64("doc_repos.heap", O_RDONLY);
                perror("open64");

        uint64_t ret = pread64(filedes, buf, amount, off);
        printf("We read %lld bytes.\n", ret);
}

Is there something I'm missing here?  Maybe there is a special compile
flag for gcc I am missing, although I see nothing like that in the gcc
man pages.

Reply chsalvia (83) 5/3/2007 12:23:15 AM

On 3 May, 01:23, chsal...@gmail.com wrote:
> I'm trying to figure out how to read large amounts (over 4 GB) from
> disk on Linux using C.  I read all the documentation regarding reading
> amounts over 4GB (64-bit), but no matter what I do, it always reads in
> 2GB.

http://en.wikipedia.org/wiki/Large_file_support
http://www.suse.de/~aj/linux_lfs.html

Reply maxim.yegorushkin (752) 5/3/2007 7:50:22 AM


On 2007-05-03, chsalvia@gmail.com <chsalvia@gmail.com> wrote:
> I'm trying to figure out how to read large amounts (over 4 GB) from
> disk on Linux using C.  I read all the documentation regarding reading
> amounts over 4GB (64-bit), but no matter what I do, it always reads in
> 2GB.
>
> I'm running on an AMD64 machine running ubuntu.

Is your Ubuntu 64-bit or 32-bit?

>
> Here's some sample code that demonstrates the problem:

Your example does not compile on my 64-bit linux. 

If your Linux is 64-bit, then off_t is already 64-bit, you don't need
off64_t. You also don't need special functions because normal ones
already use appropriate size for the variables.

>
> #define _LARGE_FILE_API

Sorry, why do you define this constant? I couldn't find it in any header
on my system.

>         uint64_t amount = 5000000000LL;
>         char* buf = (char*) malloc(amount);

Why don't you check the return from malloc? Why do you cast the return
value? Do you have 5G of RAM and swap space to back up your request? When
you read into memory, that memory must be available.

Also, pread64 takes the amount of data to read as a size_t. On a 32-bit
OS this is only 32 bits.

I usually never read that much data into memory. It does not make any
sense. I use mmap to map the file into a memory region and then simply
access that memory.
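
A minimal sketch of the mmap approach suggested above, assuming a POSIX system. The helper name map_file_readonly is hypothetical and not something from this thread; it maps a whole file read-only so the data can be accessed through a pointer instead of being copied into a malloc'd buffer.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): map an entire file read-only.
 * On success, returns the mapping and stores its length in *len_out.
 * Returns NULL on any failure, including an empty file or a file too
 * large to be described by a size_t on this platform. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (uintmax_t)st.st_size > SIZE_MAX) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would release the mapping with munmap() when done. Pages are faulted in on access, so no single system call has to transfer the whole file.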


-- 
Minds, like parachutes, function best when open
Reply avorop (41) 5/3/2007 8:01:52 AM

On May 3, 3:50 am, Maxim Yegorushkin <maxim.yegorush...@gmail.com>
wrote:
> On 3 May, 01:23, chsal...@gmail.com wrote:
>
> > I'm trying to figure out how to read large amounts (over 4 GB) from
> > disk on Linux using C.  I read all the documentation regarding reading
> > amounts over 4GB (64-bit), but no matter what I do, it always reads in
> > 2GB.
>
> http://en.wikipedia.org/wiki/Large_file_support
> http://www.suse.de/~aj/linux_lfs.html

I'm aware of the historical limitations of file sizes to 2GB due to
the address space limit provided by a signed 32 bit integer.  But, the
problem here is not file offsets.  The problem here is read length.
I'm able to read at offsets greater than 2GB, but I can't actually
read in more than 2GB of data.  I'm beginning to think that is just a
limitation of the OS, and that the only way to read in more than 2GB
is to call pread in a loop.

Reply chsalvia (83) 5/3/2007 6:45:34 PM

On May 3, 4:01 am, Andrei Voropaev <avo...@mail.ru> wrote:
> On 2007-05-03, chsal...@gmail.com <chsal...@gmail.com> wrote:
>
> > I'm trying to figure out how to read large amounts (over 4 GB) from
> > disk on Linux using C.  I read all the documentation regarding reading
> > amounts over 4GB (64-bit), but no matter what I do, it always reads in
> > 2GB.
>
> > I'm running on an AMD64 machine running ubuntu.
>
> Is your Ubuntu 64-bit or 32-bit?

Yes I am running the 64-bit version of ubuntu.

> > Here's some sample code that demonstrates the problem:
>
> Your example does not compile on my 64-bit linux.

That's strange.  It compiles for me under gcc.

> If your Linux is 64-bit, then off_t is already 64-bit, you don't need
> off64_t. You also don't need special functions because normal ones
> already use appropriate size for the variables.

Right, off_t is already 8 bytes on a 64 bit machine.  I was trying
various things to get this to work.

> > #define _LARGE_FILE_API
>
> Sorry, why do you define this constant? I couldn't find it in any header
> on my system?

It seems to be useless.  I put it in there because I thought it might
help, based on something I read on the Internet.

> >         uint64_t amount = 5000000000LL;
> >         char* buf = (char*) malloc(amount);
>
> Why don't you check the return from malloc? Why do you cast the return
> value? Do you have 5G of RAM and swap space to backup your request? When
> you read into memory, that memory must be available.

In this case, the call to malloc() is successful.  I checked the
return value.  The machine I'm using has 8GB RAM, with approx. 7 GB
free.  I cast the return value because I originally tried this in C++,
and forgot to remove that when I changed this to C.

> Also, the pread64 takes the amount of data to read as size_t. On 32-bit
> OS this is only 32-bit.
>
> Usually I never read that much of data into memory. It does not make any
> sense. I use mmap to attach file to the memory region and then simply
> access that memory.

True, I've never had to read that much memory before.  But in this
project, I need to process  huge amounts of data.  Perhaps mmap()
would be a better solution, as you suggest.  I'm just surprised that
even 64-bit OSes don't let you actually read in more than 2GB in one
call to pread().


Reply chsalvia (83) 5/3/2007 6:50:36 PM

On 3 May 2007 11:45:34 -0700, chsalvia@gmail.com <chsalvia@gmail.com> wrote:
> problem here is not file offsets.  The problem here is read length.
> I'm able to read at offsets greater than 2GB, but I can't actually
> read in more than 2GB of data.  I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.

Probably, can't be bothered to check. The essential teaching here is
that _always_ call your (p)read(2)s in a loop. They don't guarantee
to fill your buffer, they just guarantee not to overflow it.
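
The loop described above might look like the following sketch. The helper name pread_full is hypothetical. As background, the Linux read(2) manual page documents that a single read-style call transfers at most 0x7ffff000 (2,147,479,552) bytes, on both 32-bit and 64-bit systems, which would be consistent with the roughly 2GB result reported in this thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), which callers use to obtain fd */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): keep calling pread() until
 * `count` bytes have been read, end of file is reached, or an error
 * occurs. Returns the number of bytes read (less than `count` only at
 * end of file), or -1 on error with errno set by pread(). */
ssize_t pread_full(int fd, void *buf, size_t count, off_t offset)
{
    assert(buf != NULL || count == 0);

    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* end of file */
        done += (size_t)n;  /* short read: advance and go around again */
    }
    return (ssize_t)done;
}
```

Called as pread_full(filedes, buf, amount, off), this takes the place of the single pread64() call in the original example. The sketch assumes count fits in an ssize_t, which holds for a 5GB request on a 64-bit system.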

-- 
Mikko Rauhala   - mjr@iki.fi     - <URL: http://www.iki.fi/mjr/ >
Transhumanist   - WTA member     - <URL: http://transhumanism.org/ >
Singularitarian - SIAI supporter - <URL: http://singinst.org/ >
Reply mjr (19) 5/3/2007 11:44:12 PM

chsalvia wrote:

> I'm aware of the historical limitations of file sizes to 2GB due to
> the address space limit provided by a signed 32 bit integer.  But, the
> problem here is not file offsets.  The problem here is read length.
> I'm able to read at offsets greater than 2GB, but I can't actually
> read in more than 2GB of data.  I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.

The return value of pread() has type ssize_t.  On your 32-bit system
this is almost certainly a 32-bit signed integer.  The return value
of pread() is either the number of bytes it read or -1 on error.  The
largest value that can be returned on your system is 2^31-1.

If pread() could read more than 2^31-1 bytes (2GB minus 1 byte), how do
you expect your program to be able to determine that it did?

-- 
Geoff Clare <netnews@gclare.org.uk>
Reply geoff31 (365) 5/4/2007 12:56:09 PM

Geoff Clare wrote:

> chsalvia wrote:
> 
>> I'm aware of the historical limitations of file sizes to 2GB due to
>> the address space limit provided by a signed 32 bit integer.  But, the
>> problem here is not file offsets.  The problem here is read length.
>> I'm able to read at offsets greater than 2GB, but I can't actually
>> read in more than 2GB of data.  I'm beginning to think that is just a
>> limitation of the OS, and that the only way to read in more than 2GB
>> is to call pread in a loop.
> 
> The return value of pread() has type ssize_t.  On your 32-bit system [...]

He wrote: "Yes I am running the 64-bit version of ubuntu."
Reply devnull8 (127) 5/4/2007 3:31:44 PM

Spoon wrote:

> Geoff Clare wrote:
> 
>> chsalvia wrote:
>> 
>>> I'm aware of the historical limitations of file sizes to 2GB due to
>>> the address space limit provided by a signed 32 bit integer.  But, the
>>> problem here is not file offsets.  The problem here is read length.
>>> I'm able to read at offsets greater than 2GB, but I can't actually
>>> read in more than 2GB of data.  I'm beginning to think that is just a
>>> limitation of the OS, and that the only way to read in more than 2GB
>>> is to call pread in a loop.
>> 
>> The return value of pread() has type ssize_t.  On your 32-bit system [...]
> 
> He wrote: "Yes I am running the 64-bit version of ubuntu."

Oops, so he did.  I saw his later statement, "Also, the pread64 takes
the amount of data to read as size_t. On 32-bit OS this is only 32-bit"
and took that to imply he was running 32-bit.

So, what is ssize_t defined as when running the 64-bit version of ubuntu?

Most likely it's still a 32-bit signed integer, and the rest of my post
applies just the same.

-- 
Geoff Clare <netnews@gclare.org.uk>
Reply geoff31 (365) 5/8/2007 12:44:14 PM

No, ssize_t on a 64-bit system is a 64-bit signed integer.


Reply chsalvia (83) 5/9/2007 5:32:20 AM

chsalvia wrote:

[context restored]
>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>> 
>> Most likely it's still a 32-bit signed integer, and the rest of my post
>> applies just the same.

> No, ssize_t on a 64-bit system is a 64-bit signed integer.

Can anyone confirm this?  The limited testing I have been able to do
does not match chsalvia's claim.

I don't have an x86-64 Linux system to test on, but on a 32-bit x86
Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
that a ssize_t object has a size of 4 bytes, the same as for 32-bit:

$ cat foo.c
#include <sys/types.h>

ssize_t foo;
$ gcc -c -m64 foo.c
$ nm -P foo.o
foo C 0000000000000004 0000000000000004
$ file foo.o
foo.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped
$ gcc -c foo.c
$ nm -P foo.o 
foo C 00000004 00000004
$ file foo.o  
foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

-- 
Geoff Clare <netnews@gclare.org.uk>
Reply geoff31 (365) 5/11/2007 1:51:55 PM

I wrote:

> chsalvia wrote:
> 
> [context restored]
>>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>>> 
>>> Most likely it's still a 32-bit signed integer, and the rest of my post
>>> applies just the same.
> 
>> No, ssize_t on a 64-bit system is a 64-bit signed integer.
> 
> Can anyone confirm this?

Scratch that.  I looked at the LSB spec for the AMD64 ABI (should have
thought of doing that before my previous post!) and it specifies ssize_t
as a typedef for int64_t.
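
A direct way to check this on a given machine is to compile and run a few sizeof expressions natively, rather than inspecting object files. The sketch below uses a hypothetical helper name, print_io_type_widths, and assumes a POSIX system where ssize_t and size_t have the same width (true on Linux).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not from the thread): print the width in bits of
 * the types involved in a pread() call and return the width of ssize_t. */
int print_io_type_widths(void)
{
    /* Assumption: ssize_t is the signed counterpart of size_t. */
    assert(sizeof(ssize_t) == sizeof(size_t));

    printf("size_t:  %zu bits\n", sizeof(size_t) * CHAR_BIT);
    printf("ssize_t: %zu bits\n", sizeof(ssize_t) * CHAR_BIT);
    printf("off_t:   %zu bits\n", sizeof(off_t) * CHAR_BIT);

    return (int)(sizeof(ssize_t) * CHAR_BIT);
}
```

On a native x86-64 Linux build all three widths should come out as 64, so the return type of pread() is not what limits a single call to about 2GB there.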

> I don't have an x86-64 Linux system to test on, but on a 32-bit x86
> Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
> that a ssize_t object has a size of 4 bytes, the same as for 32-bit:

Presumably the gcc -m32 and -m64 options only work properly on x86-64
systems.  Even though gcc happily produces x86-64 object files with -m64
on 32-bit x86 systems, it appears that the files are invalid because the
headers don't support choosing between 64-bit and 32-bit; they just
assume 32-bit.

-- 
Geoff Clare <netnews@gclare.org.uk>
Reply geoff31 (365) 5/11/2007 2:25:37 PM

Geoff Clare <geoff@clare.See-My-Signature.invalid> writes:

>chsalvia wrote:

>[context restored]
>>> So, what is ssize_t defined as when running the 64-bit version of ubuntu?
>>> 
>>> Most likely it's still a 32-bit signed integer, and the rest of my post
>>> applies just the same.

>> No, ssize_t on a 64-bit system is a 64-bit signed integer.

>Can anyone confirm this?  The limited testing I have been able to do
>does not match chsalvia's claim.

It pretty much must be on a Unix system (Windows, however, seems
to differ)

>I don't have an x86-64 Linux system to test on, but on a 32-bit x86
>Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
>that a ssize_t object has a size of 4 bytes, the same as for 32-bit:

But does it use the proper include files for a 64 bit compile?

Casper
-- 
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply Casper.Dik (623) 5/11/2007 3:21:11 PM

Casper H.S. Dik <Casper.Dik@Sun.COM> writes:
> Geoff Clare <geoff@clare.See-My-Signature.invalid> writes:

[...]

>>I don't have an x86-64 Linux system to test on, but on a 32-bit x86
>>Linux system if I use "gcc -m64" (which compiles for x86-64), nm reports
>>that a ssize_t object has a size of 4 bytes, the same as for 32-bit:
>
> But does it use the proper include files for a 64 bit compile?

The relevant part is this:

#if __WORDSIZE == 32
# define __SQUAD_TYPE           __quad_t
# define __UQUAD_TYPE           __u_quad_t
# define __SWORD_TYPE           int
# define __UWORD_TYPE           unsigned int
# define __SLONG32_TYPE         long int
# define __ULONG32_TYPE         unsigned long int
# define __S64_TYPE             __quad_t
# define __U64_TYPE             __u_quad_t
#elif __WORDSIZE == 64
# define __SQUAD_TYPE           long int
# define __UQUAD_TYPE           unsigned long int
# define __SWORD_TYPE           long int
# define __UWORD_TYPE           unsigned long int
# define __SLONG32_TYPE         int
# define __ULONG32_TYPE         unsigned int
# define __S64_TYPE             long int
# define __U64_TYPE             unsigned long int

(/usr/include/bits/types.h)

__WORDSIZE is defined in /usr/include/bits/wordsize.h and presumably
generated during a glibc build.
Reply rweikusat (2679) 5/11/2007 4:49:18 PM

On May 3, 11:45 am, chsal...@gmail.com wrote:

> I'm beginning to think that is just a
> limitation of the OS, and that the only way to read in more than 2GB
> is to call pread in a loop.

The only way to reliably read in any amount of data greater than one
byte is to call 'pread' in a loop. There are many cases in which you
will get a short read.

DS

Reply David 5/11/2007 10:13:58 PM
