Reading files in a multithreaded application

Hello,

I have run into an awkward problem regarding files in a multithreaded
application. I can solve the actual problem using a variety of methods,
but the simplest one is to open the file on different LU-numbers.

(Some context: my application uses a Fortran library (DLL/shared object)
that reads several files to do its job. Because the library is now used
for several simultaneous computations in the same program, file
positioning has become a problem. I converted an ordinary program into
this library, and the files are read sequentially during the
computation.)

To test one method, I wrote this little program:

! multifile.f90 --
!     Can I open a file on more than one LU-number?
!
program multifile

    implicit none
    real :: x, y

    open( 10, file = 'multifile.inp' )
    open( 20, file = 'multifile.inp' )

    x = -999.0
    y = -999.0
    read( 10, * ) x
    read( 20, * ) y

    write(*,*) x, y

end program multifile

This program works - unexpectedly, I admit - with Intel Fortran, but not
with gfortran. With Intel Fortran the file is opened on two different
LU-numbers and each is treated as a completely separate connection (or at
least that is one way of explaining the output it produces).

With gfortran it fails on the second open statement - file already open.

Which behaviour conforms to the standard? Or is there a way to make sure
you can open a file on different LU-numbers at the same time?
(I'd rather rely on universal behaviour.)

Regards,

Arjen
Reply arjen.markus895 (634) 1/12/2011 3:19:09 PM

In article <fcf9f38e-a38c-43ae-aa79-dc8a7917308e@c39g2000yqi.googlegroups.com>,
Arjen Markus  <arjen.markus895@gmail.com> wrote:
>
>I have run into an awkward problem regarding files in a multithreaded
>application.
>I can solve the actual problem using a variety of methods, but the
>simplest one
>is to open the file at different LU-numbers.

Nope.  Sorry.  That is a historical restriction of Fortran, that
is preserved because that works only on 'disk' files.  If the file
was on magnetic tape or a socket, it wouldn't work.  The best
implementations (and not just of Fortran) have made the open
succeed only if the file supports the appropriate access.  It's
easier than most people think, except for extremely unusual files.

Virtually every language for the past 40 years has allowed the
'write once / read multiple' semantics, but Fortran has stuck with
the ancient 'open once' semantics for all files.

I should like to see that fixed (and, yes, I regard it as a defect
in the standard), but cleaning up I/O lost out to coarrays for
Fortran 2008.  There are a LOT of historical defects that really
could do with sorting out, and some of them (like this one) need
very careful wordsmithing and probably extra functionality.


Regards,
Nick Maclaren.
Reply nmm12 (898) 1/12/2011 3:01:37 PM


"Arjen Markus" wrote

> I have run into an awkward problem regarding files in a multithreaded
> application.
> I can solve the actual problem using a variety of methods, but the
> simplest one
> is to open the file at different LU-numbers.

[snip]

> !     Can I open a file on more than one LU-number?
> !
> program multifile
>
>    implicit none
>    real :: x, y
>
>    open( 10, file = 'multifile.inp' )
>    open( 20, file = 'multifile.inp' )
>
>    x = -999.0
>    y = -999.0
>    read( 10, * ) x
>    read( 20, * ) y
>
>    write(*,*) x, y
>
> end program multifile

Just to clarify, LU is an abbreviation for "logical unit". It has nothing to 
do with factoring a matrix or the condition number of a matrix!

-- Elliot


Reply epc8 (1259) 1/12/2011 3:51:23 PM

Hello,

Arjen Markus wrote:
> ! multifile.f90 --
> !     Can I open a file on more than one LU-number?
> !
> program multifile
> 
>     implicit none
>     real :: x, y
> 
>     open( 10, file = 'multifile.inp' )
>     open( 20, file = 'multifile.inp' )

The standard does not allow a file to be connected to more than one unit
number at the same time. So the example you give is non-conforming;
gfortran's behaviour is more friendly in that it indicates this fact to you.

For the case of performing I/O to a single file from multiple threads,
if you effectively perform sequential reading anyway there should be
no trouble if you use a shared LU variable *and put a critical region
around the I/O statements* (file positioning statements must of
course happen in the same critical region as the corresponding
transfer statements).
Note that e.g. the OpenMP standard does not really cover parallel I/O,
so even in the context of C (which may allow multiple handles to the
same file) you might be in trouble when attempting parallel accesses
without mutual exclusion.
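
A minimal sketch of the critical-region approach described above, assuming
OpenMP is the threading model (the thread does not say which one is
actually used). The file name shared_unit.inp, the region name file_io and
the record count are made up for illustration. Which thread receives which
record is not deterministic; only the sequential order of the reads on the
unit is guaranteed.

```fortran
! shared_unit.f90 --
!     Sketch: several OpenMP threads read sequentially from ONE unit,
!     with every READ inside a named critical region.
!     Assumption: OpenMP is used (compile with e.g. gfortran -fopenmp).
!
program shared_unit

    implicit none
    integer, parameter :: lun  = 10
    integer, parameter :: nrec = 8
    integer            :: i, ierr
    real               :: x, total

    ! Write a small input file so the sketch is self-contained
    open( lun, file = 'shared_unit.inp', status = 'replace' )
    do i = 1, nrec
        write( lun, * ) real(i)
    enddo
    close( lun )

    ! One connection only, shared by all threads
    open( lun, file = 'shared_unit.inp', status = 'old', action = 'read' )

    total = 0.0
    !$omp parallel do private(x, ierr) reduction(+:total)
    do i = 1, nrec
        ! Only one thread at a time may touch the unit; any REWIND or
        ! BACKSPACE belonging to this READ would go in the same region
        !$omp critical (file_io)
        read( lun, *, iostat = ierr ) x
        !$omp end critical (file_io)

        ! The computation itself runs outside the critical region
        if ( ierr == 0 ) total = total + x
    enddo
    !$omp end parallel do

    close( lun )
    write(*,*) 'Sum of all records: ', total

end program shared_unit
```

Every record is read exactly once, so the sum comes out as 36 no matter how
the records are distributed over the threads. Without OpenMP enabled the
directives are plain comments and the program runs serially.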


Regards
Reinhold

Reply Bader1 (138) 1/12/2011 3:52:49 PM

On 12 jan, 16:51, "e p chandler" <e...@juno.com> wrote:
> Just to clarify, LU is an abbreviation for "logical unit". It has nothing to
> do with factoring a matrix or the condition number of a matrix!
>
> -- Elliot

Oh, I had never thought "LU" could be confusing in this context.
Will have to remember that.

Regards,

Arjen
Reply arjen.markus895 (634) 1/12/2011 3:53:28 PM

On 2011-01-12, Arjen Markus <arjen.markus895@gmail.com> wrote:
> Which behaviour is conforming to the standard? Or is there a way to
> make
> sure you can open a file on different LU-number at the same time?
> (I'd rather rely on universal behaviour)

From F2008 9.5.4:

 A unit shall not be connected to more than one file at the same time,
 and a file shall not be connected to more than one unit at the same
 time.


Based on that, AFAICS, gfortran conforms to the standard.

What you can do, I suppose, is to use direct or stream access and then
READ with REC= or POS= in which case the data will be read from the
record or position specified, regardless of what the record/position
was before execution of the READ statement.
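
A minimal sketch of the POS= idea, under some assumptions: the file is
connected once for formatted stream access, and each reader keeps its own
private position (the names pos_a and pos_b and the file stream_pos.inp are
hypothetical). For formatted stream access the standard only allows POS=
values of 1 or values obtained earlier from INQUIRE, which is why each
reader refreshes its position with INQUIRE after every READ. In a threaded
program each READ/INQUIRE pair would still need mutual exclusion.

```fortran
! stream_pos.f90 --
!     Sketch: two logical readers share ONE unit, each with a private
!     file position that is passed via POS= on every READ.
!
program stream_pos

    implicit none
    integer, parameter :: lun = 10
    integer            :: pos_a, pos_b
    real               :: x, y

    ! Write a small input file so the sketch is self-contained
    open( lun, file = 'stream_pos.inp', status = 'replace' )
    write( lun, * ) 1.0
    write( lun, * ) 2.0
    close( lun )

    ! One connection only, opened for formatted stream access
    open( lun, file = 'stream_pos.inp', access = 'stream', &
          form = 'formatted', status = 'old', action = 'read' )

    ! Both readers start at the beginning of the file
    pos_a = 1
    pos_b = 1

    ! Reader A reads its first record and remembers where it stopped
    read( lun, *, pos = pos_a ) x
    inquire( lun, pos = pos_a )

    ! Reader B starts from its own position, unaffected by reader A
    read( lun, *, pos = pos_b ) y
    inquire( lun, pos = pos_b )

    ! Reader A continues where it left off, unaffected by reader B
    read( lun, *, pos = pos_a ) x
    inquire( lun, pos = pos_a )

    write(*,*) x, y
    close( lun )

end program stream_pos
```

Reader A ends up with the second value in the file and reader B with the
first, even though both use the same unit.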

Note that while gfortran's IO library is thread safe, there is a
per-unit lock so you can't actually do IO in parallel to the same unit
from multiple threads.

-- 
JB
Reply foo33 (1360) 1/12/2011 3:53:49 PM

Arjen Markus <arjen.markus895@gmail.com> wrote:

> On 12 jan, 16:51, "e p chandler" <e...@juno.com> wrote:

> > Just to clarify, LU is an abbreviation for "logical unit". It has nothing to
> > do with factoring a matrix or the condition number of a matrix!

> Oh, I had never thought "LU" could be confusing in this context.
> Will have to remember that.

My usual form, as long as one is using an acronym anyway, is to include
the "number" in the acronym, giving me LUN instead of LU number. I don't
have any particular reason for that; it is just what I'm in the habit of
doing. My related symbolic names tend to have LUN in them.

-- 
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
Reply nospam47 (9742) 1/12/2011 6:00:00 PM

On 12 jan, 16:52, Reinhold Bader <Ba...@lrz.de> wrote:
> The standard does not allow a file to be connected to more than one unit
> number at the same time. So the example you give is non-conforming;
> gfortran's behaviour is more friendly in that it indicates this fact to you.
>
> For the case of performing I/O to a single file from multiple threads,
> if you effectively perform sequential reading anyway there should be
> no trouble if you use a shared LU variable *and put a critical region
> around the I/O statements* (file positioning statements must of
> course happen in the same critical region as the corresponding
> transfer statements).
> Note that e.g. the OpenMP standard does not really cover parallel I/O,
> so even in the context of C (which may allow multiple handles to the
> same file) you might be in trouble when attempting parallel accesses
> without mutual exclusion.

That assumes that the threads will read the same part of the file or can
be synchronised to do so. I suppose it is possible in my particular case,
but it might also hinder optimal performance.

Regards,

Arjen
Reply arjen.markus895 (634) 1/13/2011 7:52:33 AM

On 12 jan, 16:53, JB <f...@bar.invalid> wrote:
> From F2008 9.5.4:
>
>  A unit shall not be connected to more than one file at the same time,
>  and a file shall not be connected to more than one unit at the same
>  time.
>
> Based on that, AFAICS, gfortran conforms to the standard.
>
> What you can do, I suppose, is to use direct or stream access and then
> READ with REC= or POS= in which case the data will be read from the
> record or position specified, regardless of what the record/position
> was before execution of the READ statement.
>
> Note that while gfortran's IO library is thread safe, there is a
> per-unit lock so you can't actually do IO in parallel to the same unit
> from multiple threads.

Yes, using a direct-access file is an alternative method I have been
considering. I am not sure all the files in question can be treated
easily in that fashion (some may have a few records at the start that
have a different length). Using stream access and then positioning the
file is probably the way to go.
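
A rough sketch of that idea, assuming an unformatted file with a header of
one length followed by fixed-length records. The layout, the file name
stream_records.bin and all variable names are invented for illustration
and do not describe the real files. The record length is derived from
INQUIRE rather than hard-coded, so the sketch does not depend on the size
of a file storage unit.

```fortran
! stream_records.f90 --
!     Sketch: unformatted stream file with a header of different length,
!     followed by fixed-length records that are read via POS=.
!
program stream_records

    implicit none
    integer, parameter :: lun  = 10
    integer, parameter :: nrec = 4
    integer            :: header(3)
    integer            :: i, irec, pos_first, pos_second, reclen
    real               :: values(2)

    open( lun, file = 'stream_records.bin', access = 'stream', &
          form = 'unformatted', status = 'replace' )

    ! Header with its own length
    header = (/ 10, 20, 30 /)
    write( lun ) header
    inquire( lun, pos = pos_first )          ! start of record 1

    ! Fixed-length records
    do i = 1, nrec
        values = (/ real(i), 10.0 * real(i) /)
        write( lun ) values
        if ( i == 1 ) inquire( lun, pos = pos_second )   ! start of record 2
    enddo

    ! Length of one record in file storage units
    reclen = pos_second - pos_first

    ! Read record irec directly, whatever the current file position is
    irec = 3
    read( lun, pos = pos_first + (irec-1) * reclen ) values
    write(*,*) 'Record ', irec, ': ', values

    close( lun )

end program stream_records
```

Record 3 holds the values 3.0 and 30.0, so those are what the final READ
retrieves. Because the position is computed per READ, each thread can ask
for its own record without relying on where the previous READ left the file.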

Regards,

Arjen
Reply arjen.markus895 (634) 1/13/2011 7:55:37 AM

On 12 jan, 16:01, n...@cam.ac.uk wrote:
> Nope.  Sorry.  That is a historical restriction of Fortran, that
> is preserved because that works only on 'disk' files.  If the file
> was on magnetic tape or a socket, it wouldn't work.  The best
> implementations (and not just of Fortran) have made the open
> succeed only if the file supports the appropriate access.  It's
> easier than most people think, except for extremely unusual files.
>
> Virtually every language for the past 40 years has allowed the
> 'write once / read multiple' semantics, but Fortran has stuck with
> the ancient 'open once' semantics for all files.
>
> I should like to see that fixed (and, yes, I regard it as a defect
> in the standard), but cleaning up I/O lost out to coarrays for
> Fortran 2008.  There are a LOT of historical defects that really
> could do with sorting out, and some of them (like this one) need
> very careful wordsmithing and probably extra functionality.
>
> Regards,
> Nick Maclaren.

Pity, though in this case I have a bunch of alternative methods
to help out.

Regards,

Arjen
Reply arjen.markus895 (634) 1/13/2011 7:56:53 AM
