Reduction of sample size in Fortran


Hi all,

I have data on 625 individuals in a 25 by 25 square grid, located at the
coordinates (1,1), (1,2), ..., (1,25), (2,1), (2,2), ..., (2,25), ...,
(25,1), ..., (25,25). They are infected at different times, from day 1 to
day 10. Only one individual is infected on the first day, some are infected
on the second day, some on the third day, and so on. Some individuals are
never infected, so they are assigned day 0. I first want to sort them into
groups with the same infection day, and then randomly sample 10% of each
infected group. I have make the codes in Fortran.

Any idea and suggestion will be great for me.

Mark
Reply gyanendra.pokharel (31) 6/13/2012 5:09:33 PM

On Wednesday, June 13, 2012 1:09:33 PM UTC-4, Mark wrote:
(snip)

Sorry for the typo. I need to program this problem in Fortran. Any idea
or suggestion would be a great help.
Thanks
Reply gyanendra.pokharel (31) 6/13/2012 5:29:29 PM


On 14/06/2012 5:09 a.m., Mark wrote:
(snip)

The programming language is not very relevant.  First you need to
develop an algorithm.  If you need help with Fortran after that, you
might find it here.
Reply g.bogle6025 (28) 6/13/2012 9:07:42 PM

On Wednesday, June 13, 2012 5:07:42 PM UTC-4, Gib Bogle wrote:
> The programming language is not very relevant.  First you need to
> develop an algorithm.  If you need help with Fortran after that, you
> might find it here.

Thanks Gib,
I have reduced the problem to this form.

X     Y    INFTIME
1     1     0
1     2     4
1     3     4
1     4     3
2     1     3
2     2     1
2     3     3
2     4     4
3     1     2
3     2     2
3     3     0
3     4     2
4     1     4
4     2     3
4     3     3
4     4     0

X and Y are the coordinates in a 4 by 4 square grid. Now I need an
algorithm to sort the matrix by INFTIME. That is, I want to group the
individuals with the same INFTIME, and then sample 10% from each group.
I think this makes the problem easier to understand.

My main question is how to write the algorithm that sorts by INFTIME and
builds the groups of individuals infected at the same time.

Thanks
Reply gyanendra.pokharel (31) 6/13/2012 9:17:49 PM

Mark <gyanendra.pokharel@gmail.com> wrote:

(snip)
> X     Y    INFTIME
> 1     1     0
> 1     2     4
> 1     3     4

(snip)
> X and Y are the coordinates in a 4 by 4 square grid.
> Now I need an algorithm to sort the matrix by INFTIME.
> That is, I want to group the individuals with the same
> INFTIME, and then sample 10% from each group. I think
> this makes the problem easier to understand.

> My main question is how to write the algorithm that sorts
> by INFTIME and builds the groups of individuals
> infected at the same time.

Unless it is part of the requirements, you don't need to sort.
For large arrays, sorting might be faster, but isn't always needed.

You can, for example, go through the array and count how many
of each, either in one pass or in 10 passes.

Sorted or not, you should have a random number generator
for the sampling. 

In either a 4x4 or 25x25 case, your computer will be plenty
fast enough to go through the array unsorted.

Imagine that you have cards with numbers on them in your hand, and
you have to go through them without sorting, but don't worry
about how long it takes.
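
As a rough illustration of the counting pass, here is a minimal sketch.
It assumes INFTIME is stored as an integer array indexed (x, y) with
values from 0 to 10. The names inf_hist, count_by_day and maxday are made
up for this example, and the demo uses the 4 by 4 table posted earlier in
the thread.

```fortran
module inf_hist
  implicit none
  integer, parameter :: maxday = 10   ! assumed upper bound on INFTIME
contains
  ! Count how many cells carry each INFTIME value in a single pass.
  ! counts(0) is the number of non-infected cells.
  function count_by_day(inftime) result(counts)
    integer, intent(in) :: inftime(:,:)
    integer :: counts(0:maxday)
    integer :: x, y, d
    counts = 0
    do y = 1, size(inftime, 2)
      do x = 1, size(inftime, 1)
        d = inftime(x, y)
        ! values outside 0..maxday are skipped rather than counted
        if (d >= 0 .and. d <= maxday) counts(d) = counts(d) + 1
      end do
    end do
  end function count_by_day
end module inf_hist

program demo_hist
  use inf_hist
  implicit none
  ! The 4 by 4 example from the thread, stored as grid(x, y).
  integer :: grid(4,4), counts(0:maxday), d
  grid = reshape([0,3,2,4, 4,1,2,3, 4,3,0,3, 3,4,2,0], [4,4])
  counts = count_by_day(grid)
  do d = 0, maxday
    if (counts(d) > 0) print '(a,i0,a,i0)', 'INFTIME ', d, ': ', counts(d)
  end do
end program demo_hist
```

For the 4 by 4 example this reports 3, 1, 3, 5 and 4 cells for INFTIME 0
through 4. No sorting is involved, and the cost is one visit per cell.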

-- glen
Reply gah (12261) 6/13/2012 10:20:21 PM


"Mark"  wrote in message 
news:e67f6943-b9ff-434b-bc6f-81b49b982b32@googlegroups.com...

(snip)

--> I can not suggest anything specific to Fortran that would be better than 
some other programming language. But I do suggest a re-think of the problem. 
A different way to look at it would be to make one pass through the x-y 
array examining each square. For the value in that cell, add the X and Y 
coordinates to a list of coordinates for that particular day number. In 
effect you are creating a tree data structure with the first level nodes 
consisting of the day numbers and their child nodes consisting of X and Y 
values. If you know a reasonable upper bound for the number of days, then 
you can actually do all of this in any Fortran by using arrays or you could 
adapt code that already works with a more desirable data structure.

For this type of problem, I would probably just write out the cell contents 
as a table of triplets: D, X, Y and then process them in some scripting 
language which has associative arrays.
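
The array version of this idea might look like the following sketch. It
assumes the data are held as three one-dimensional arrays (x, y, inftime)
and that 10 is a safe upper bound on the day number. The names day_groups,
group_by_day, members and nmem are hypothetical and do not come from any
library.

```fortran
module day_groups
  implicit none
  integer, parameter :: maxday = 10   ! assumed upper bound on the day number
contains
  ! Build one list of record indices per day in a single pass.
  ! members(1:nmem(d), d) holds the indices of records with inftime == d.
  subroutine group_by_day(inftime, members, nmem)
    integer, intent(in)  :: inftime(:)
    integer, intent(out) :: members(size(inftime), 0:maxday)
    integer, intent(out) :: nmem(0:maxday)
    integer :: i, d
    nmem = 0
    members = 0
    do i = 1, size(inftime)
      d = inftime(i)
      if (d < 0 .or. d > maxday) cycle   ! ignore out-of-range days
      nmem(d) = nmem(d) + 1
      members(nmem(d), d) = i
    end do
  end subroutine group_by_day
end module day_groups

program demo_groups
  use day_groups
  implicit none
  integer, parameter :: n = 16
  integer :: x(n), y(n), inftime(n)
  integer :: members(n, 0:maxday), nmem(0:maxday), d, j, k
  ! The 4 by 4 example from the thread, one record per cell.
  x = [1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4]
  y = [1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4]
  inftime = [0,4,4,3, 3,1,3,4, 2,2,0,2, 4,3,3,0]
  call group_by_day(inftime, members, nmem)
  ! Print the D, X, Y triplets grouped by day (infected cells only).
  do d = 1, maxday
    do j = 1, nmem(d)
      k = members(j, d)
      print '(3(i0,1x))', d, x(k), y(k)
    end do
  end do
end program demo_groups
```

The members array is sized for the worst case (every record on the same
day), which wastes memory but keeps the code short. For 625 records that
is under 7,000 integers.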

--- e

Reply epc8 (1259) 6/14/2012 12:01:58 AM

Yes gah, but the data I posted is a simple example. I have to work with data on more than 100,000 individuals, which takes a very long time to simulate. So I want to sample only 10% or 5% of the population and compute the estimates from the sample. For this I first need to sort the population, group the individuals infected on the same day, and sample a fixed percentage from each group.

Thanks
Reply gyanendra.pokharel (31) 6/14/2012 3:36:39 AM

First form a vector of pointers (indices) to the individuals in one group.
Count the number N1 in the group (the number of entries in the vector).
Now select a random number between 1 and N1 and choose that individual.
Either repeat until N1/10 individuals are chosen, rejecting any individual
already selected, OR shorten the vector by one, removing the selected entry,
and select another random number between 1 and N2 (where N2 = N1-1).
As before, repeat the operation until N1/10 individuals are selected.

It's not a good method, because the full sample set is not infinite, or even
large.

You might want to look up methods used for soil and crop sampling ("Latin
squares").
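
The second variant (shortening the vector after each pick) can be sketched
as below. The pool is assumed to hold the indices of one INFTIME group, and
draw_mod and draw_without_replacement are illustrative names. Instead of
shifting entries down, the sketch overwrites the picked entry with the last
live one, which has the same effect when the order of the pool does not
matter.

```fortran
module draw_mod
  implicit none
contains
  ! Draw up to k distinct entries from pool without replacement.
  ! pool is reordered: each picked entry is overwritten by the last
  ! live entry, so the live part shrinks by one on every draw.
  subroutine draw_without_replacement(pool, k, picked)
    integer, intent(inout) :: pool(:)
    integer, intent(in)    :: k
    integer, intent(out)   :: picked(k)
    integer :: nlive, i, j
    real :: u
    picked = 0                              ! stays 0 if k exceeds the pool
    nlive = size(pool)
    do i = 1, min(k, size(pool))
      call random_number(u)                 ! 0 <= u < 1
      j = min(nlive, 1 + int(u * nlive))    ! random position in 1..nlive
      picked(i) = pool(j)
      pool(j) = pool(nlive)                 ! fill the hole with the last entry
      nlive = nlive - 1
    end do
  end subroutine draw_without_replacement
end module draw_mod

program demo_draw
  use draw_mod
  implicit none
  integer :: pool(20), picked(2), i
  call random_seed()            ! seeding is processor dependent
  pool = [(i, i = 1, 20)]       ! stand-in for the indices of one group
  call draw_without_replacement(pool, size(picked), picked)
  print *, 'picked:', picked
end program demo_draw
```

Each draw costs constant time, so taking N1/10 entries never needs a retry,
unlike the rejection variant.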



Reply tbwright1 (218) 6/14/2012 6:27:42 AM

gyanendra.pokharel <gyanendra.pokharel@gmail.com> wrote:
> Yes geh, but the data I posted is a simple example. 
> I have to work with the data greater than 100, 000 individual. 
> Which takes plenty of long time to simulate, so what I want 
> to do is I want to sample only 10% or 5% of the data from 
> the population and find the estimate from the sampling data. 
> For this first I need to sort the population and make a group 
> of the individual infected at the same day and sample a 
> fixed % from each group.

Maybe you haven't explained it all, but I still don't see a
need to sort. You can in one pass through the unsorted data
count how many of each group. (Learn about arrays first.)

Then in a second pass through the data select and do whatever
to the randomly selected items. 100,000 is pretty small these
days, but even with 40 year old computers it would have been
just fine.

Since you didn't say what to do with the samples it is hard
to say more, but most are just as easy to do without sorting.

The problem seems to be O(N) as it is. Sorting will be either
O(N log N) for fast sort algorithms, or O(N**2) for slower ones,
which will still be very fast. But then you go through the sorted
data and still do the same thing.

What do you actually want to do with the sampled data?
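
For what it is worth, the two-pass idea can be sketched as follows. The
second pass swaps in selection sampling (Algorithm S in Knuth's The Art of
Computer Programming), which nobody in the thread names: each record of a
group is kept with probability (samples still needed)/(records still
unseen), which yields exactly the target number per group. The sketch
assumes a one-dimensional INFTIME array with days 1 to 10 and 0 for
non-infected. The names stratified, sample_by_day, frac and chosen are
hypothetical, and the demo data are random.

```fortran
module stratified
  implicit none
  integer, parameter :: maxday = 10   ! assumed last infection day
contains
  ! Flag about frac of the records in every INFTIME group 1..maxday.
  subroutine sample_by_day(inftime, frac, chosen)
    integer, intent(in)  :: inftime(:)
    real,    intent(in)  :: frac
    logical, intent(out) :: chosen(size(inftime))
    integer :: remaining(maxday), need(maxday), i, d
    real :: u
    ! Pass 1: group sizes (day 0 and out-of-range values are ignored).
    remaining = 0
    do i = 1, size(inftime)
      d = inftime(i)
      if (d >= 1 .and. d <= maxday) remaining(d) = remaining(d) + 1
    end do
    ! Target per group; at 10% a group smaller than 5 gets no sample.
    need = max(0, min(remaining, nint(frac * remaining)))
    ! Pass 2: keep a record with probability need/remaining of its group.
    chosen = .false.
    do i = 1, size(inftime)
      d = inftime(i)
      if (d < 1 .or. d > maxday) cycle
      call random_number(u)
      if (u * remaining(d) < need(d)) then
        chosen(i) = .true.
        need(d) = need(d) - 1
      end if
      remaining(d) = remaining(d) - 1
    end do
  end subroutine sample_by_day
end module stratified

program demo_stratified
  use stratified
  implicit none
  integer, parameter :: n = 625
  integer :: inftime(n), d
  logical :: chosen(n)
  real :: u(n)
  call random_seed()
  call random_number(u)
  inftime = min(maxday, int((maxday + 1) * u))   ! made-up days 0..10
  call sample_by_day(inftime, 0.1, chosen)
  do d = 1, maxday
    print '(a,i0,a,i0,a,i0)', 'day ', d, ': ', &
          count(chosen .and. inftime == d), ' of ', count(inftime == d)
  end do
end program demo_stratified
```

Both passes are O(N) and the data stay in their original order.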

-- glen
Reply gah (12261) 6/14/2012 10:52:04 AM

Well, sorting and then sampling was my idea. If we can sample 10%, or some other fixed percentage, of the population without sorting (it must be the same proportion from each INFTIME group), how would I do the sampling?

The main point of sampling is to reduce the size of the data so that the computation takes much less time than with the whole data set. I have already used the whole data set to simulate the parameters, but it took a very long time. I want to cut the computation time by sampling, which reduces the population size.

Reply gyanendra.pokharel (31) 6/14/2012 5:00:01 PM

gyanendra.pokharel <gyanendra.pokharel@gmail.com> wrote:

> Well, this is my idea to sort it out and sample. 
> If we could sample 10% or some fixed % from the population 
> (But it should be same proportion from each group of 
> INFTIME)without sorting, how could I sample? 

> The main approach  of sampling is to reduce the size of 
> data so that the computation time would be much more smaller 
> than the whole data. I already used the whole data and 
> simulated the parameters but it took pretty long time. 

It might be that there is a faster way, but anyway ...

> I want to reduce the computation time by sampling and 
> reducing the population size.

Say, just as an example, you want the average (mean) 
temperature for each. With only 10, it isn't so bad to
loop through and do one at a time:

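      INTEGER I, X, Y, NUM
      REAL SUM
! NX and NY are the grid dimensions (25 and 25 for the full data)
      DO I=1,10
! reset the counters for each INFTIME value
         NUM=0
         SUM=0
         DO X=1,NX
            DO Y=1,NY
               IF(INFTIME(X,Y).EQ.I) THEN
                  NUM=NUM+1
                  SUM=SUM+TEMP(X,Y)
               ENDIF
            ENDDO
         ENDDO
! avoid dividing by zero when no cell has this INFTIME
         IF(NUM.GT.0) PRINT *,I,SUM/NUM
      ENDDO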

So, 10 passes through the array, still probably plenty fast.

But you can do it all in one pass:

Keep an array of NUM and SUM for each INFTIME:

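      INTEGER NUMS(10), I, X, Y
      REAL SUMS(10)
! NX and NY are the grid dimensions (25 and 25 for the full data)
      NUMS=0
      SUMS=0
      DO X=1,NX
         DO Y=1,NY
            I=INFTIME(X,Y)
! skip non-infected cells (INFTIME 0) and anything out of range
            IF(I.GE.1 .AND. I.LE.10) THEN
               NUMS(I)=NUMS(I)+1
               SUMS(I)=SUMS(I)+TEMP(X,Y)
            ENDIF
         ENDDO
      ENDDO
! now print them all out
      DO I=1,10
         IF(NUMS(I).GT.0) PRINT *,I,SUMS(I)/NUMS(I)
      ENDDO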

-- glen
Reply gah (12261) 6/14/2012 9:15:07 PM

gyanendra.pokharel <gyanendra.pokharel@gmail.com> wrote:

> The main approach  of sampling is to reduce the size of data so that the
> computation time would be much more smaller than the whole data. I already
> used the whole data and simulated the parameters but it took pretty long
> time. I want to reduce the computation time by sampling and reducing the
> population size.

But it is always better to use all the data to get a higher confidence level
in the result (which after all is only a statistic).

A still better way is to randomly divide the data set into two separate 50%
samples and compute the statistics you need, then compare the two results
for the confidence level in being truly from the same population.
On the whole, between processing one tenth and all the data, the extra
simplicity in not taking samples might give a faster calculation, seeing
that any sampling requires passing the entire data set (which is always the
slowest process).


Reply tbwright1 (218) 6/14/2012 10:42:49 PM

Thanks gah,

I see what you mean, but I still have trouble fitting this code to my data, because my INFTIME is one-dimensional and I don't need the average; I just need to sample 10% of the population.
Reply gyanendra.pokharel (31) 6/15/2012 5:30:21 PM

gyanendra.pokharel <gyanendra.pokharel@gmail.com> wrote:

> I got what you meant but still problem in fitting this code to 
> my data because my data has one dimensional INFTIME and 
> I don't have to find the average but just to sample 10% 
> from the population. 

What do you want to do with the 10%?

The average was just an example. 

When you say "sample 10%" you usually mean that you want to
do something with the 10%.

Now, maybe you just want to print out those 10%.
(That is the least I can think to do to them.)

In that case, though, you get different output depending on
what order you do it, but a loop over different INFTIME is
still likely faster (and easier to write) than sorting.

Also, you might want to be careful with your 10%. 
In the smaller cases that could easily result in 0 samples.
(Theoretically it can happen for any case, but more likely
for smaller cases.)
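
One possible way to guard against empty samples is to round the sample
size up instead of rounding to the nearest integer. The sketch below
assumes that is acceptable for the estimate being computed;
sample_size_mod and sample_size are made-up names.

```fortran
module sample_size_mod
  implicit none
contains
  ! Number of samples to take from a group of n at fraction frac.
  ! Rounds up, so any non-empty group yields at least one sample
  ! (for frac > 0); an empty group yields none.
  pure integer function sample_size(n, frac) result(k)
    integer, intent(in) :: n
    real,    intent(in) :: frac
    k = max(0, min(n, ceiling(frac * real(n))))
  end function sample_size
end module sample_size_mod

program demo_size
  use sample_size_mod
  implicit none
  integer :: sizes(5), i
  sizes = [1, 4, 10, 63, 625]     ! example group sizes
  do i = 1, size(sizes)
    print '(a,i0,a,i0)', 'group of ', sizes(i), ' -> sample ', &
          sample_size(sizes(i), 0.1)
  end do
end program demo_size
```

With frac = 0.1 a group of 4 gets one sample here, where NINT(0.1*4) would
give none. Rounding up oversamples the small groups, so whether it is
appropriate depends on the statistic.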

-- glen

Reply gah (12261) 6/15/2012 8:37:57 PM

OK, here is the code I am using. There are still some errors:
program  epimatrix
IMPLICIT NONE
INTEGER ::l, i,T,K
REAL, DIMENSION(1:625):: X
REAL, DIMENSION(1:625):: Y
REAL :: u(63)
INTEGER, DIMENSION(1:625):: INFTIME

INTEGER,ALLOCATABLE,DIMENSION(:) :: seed
INTEGER,DIMENSION(8) :: time1
INTEGER :: size
!CALL RANDOM_SEED
CALL RANDOM_SEED(size=size)
 ALLOCATE (seed(size))
 CALL DATE_AND_TIME(values=time1)
 seed = 60*time1(6)+60*time1(7)+time1(8)
! print *, seed(1)
 CALL RANDOM_SEED(put=seed)

OPEN(10, FILE = 'epidemicSIR.txt', FORM = 'FORMATTED')
DO l = 1,625
   READ(10,*,END = 200) X(l), Y(l), INFTIME(l)
  ! WRITE(*,*) X(l),Y(l), INFTIME(l)
ENDDO
200 CONTINUE
CLOSE(10)

DO T = 1,10
   DO i = 1, 63
      CALL RANDOM_NUMBER(u(i))
      K = 1+int(u(i)*624)
      IF(INFTIME(K)/=0 .AND. INFTIME(K) .LE. T)THEN
         PRINT *,X(i),Y(i),INFTIME(i)
      ENDIF
   ENDDO
ENDDO

end program

1. It samples randomly from the population, but it only returns individuals from the beginning of the data.
2. It does not handle the individuals that are non-infected or have a time greater than 10.

Any idea for fixing these issues would be very helpful.
Thanks
Reply gyanendra.pokharel (31) 6/15/2012 10:07:23 PM

gyanendra.pokharel <gyanendra.pokharel@gmail.com> wrote:

(snip)

> DO T = 1,10
>   DO i = 1, 63
>      CALL RANDOM_NUMBER(u(i))
>      K = 1+int(u(i)*624)
>      IF(INFTIME(K)/=0 .AND. INFTIME(K) .LE. T)THEN
>         PRINT *,X(i),Y(i),INFTIME(i)
>      ENDIF
>   ENDDO
> ENDDO

> 1. it is sampling randomly from the population but only 
>    returning the individuals from the beginning part of data.

First, the 624 should be 625. I presume the 63 is 625/10 
rounded up. But that does 63 each time, not depending on how
many there are at each INFTIME value. You should probably
also print out T so you know what it is doing, and when.

I would start with one pass through the array histogramming
(counting) values in INFTIME. Then you can process each possible
value of INFTIME properly.

> 2. It is not controlling the individuals which are non 
>    infected and time greater than the 10.

What should it do with them?

-- glen
Reply gah (12261) 6/15/2012 10:50:36 PM

Yes, glen, I wrote the following code for the sampling:

DO T = 1,10
   my_cnt=0
!   write(*,*) "T=",T

!!!!!!!!!!!!! 10% sampling !!!!!!!!!!!!!
      do while (my_cnt <= 0.1*625)
         K = 1+int(625*rand())
            if(inftime(K)/=0 .and. inftime(K) .le. T)then
               open(10, file = "epidemicdata.txt")
               write(*,*) x(k),y(k),inftime(k)
               my_cnt =my_cnt +1
            endif
      enddo
   enddo
enddo

But it returns 10% of the total number of individuals at each time T. Can you please edit this code so that I get 10% of the infected individuals from each time T?
Reply gyanendra.pokharel (31) 6/18/2012 11:58:12 PM
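
One possible rewrite of the loop above, offered as a sketch and not as a
tested fix for the real data: for each day T it counts the individuals
with INFTIME equal to T (not less than or equal to T), sets the target to
10% of that count, and rejects draws that belong to another day or were
already picked. It uses the standard RANDOM_NUMBER instead of rand(), which
is a compiler extension. The names per_day_sample, sample_each_day and
taken are hypothetical, and the demo fills INFTIME with random values
instead of reading the data file.

```fortran
module per_day_sample
  implicit none
contains
  ! Mark about frac of the individuals infected on each day 1..maxday.
  subroutine sample_each_day(inftime, maxday, frac, taken)
    integer, intent(in)  :: inftime(:)
    integer, intent(in)  :: maxday
    real,    intent(in)  :: frac
    logical, intent(out) :: taken(size(inftime))
    integer :: t, k, n_t, want, got
    real :: u
    taken = .false.
    do t = 1, maxday
      n_t  = count(inftime == t)                   ! size of this day's group
      want = max(0, min(n_t, nint(frac * n_t)))    ! target for this day only
      got  = 0
      do while (got < want)
        call random_number(u)
        k = min(size(inftime), 1 + int(u * size(inftime)))
        ! accept only unpicked members of day t (== t, not <= t)
        if (inftime(k) == t .and. .not. taken(k)) then
          taken(k) = .true.
          got = got + 1
        end if
      end do
    end do
  end subroutine sample_each_day
end module per_day_sample

program demo_per_day
  use per_day_sample
  implicit none
  integer, parameter :: n = 625, maxday = 10
  integer :: inftime(n), k
  logical :: taken(n)
  real :: u(n)
  call random_seed()
  call random_number(u)
  inftime = min(maxday, int((maxday + 1) * u))   ! made-up days 0..10
  call sample_each_day(inftime, maxday, 0.1, taken)
  do k = 1, n
    ! x and y assume the row-by-row order of the posted table
    if (taken(k)) print '(3(i0,1x))', (k - 1) / 25 + 1, mod(k - 1, 25) + 1, inftime(k)
  end do
end program demo_per_day
```

Non-infected individuals (day 0) and days beyond maxday are never selected
because the outer loop only visits days 1 to maxday. Rejection gets slow
when a group is a tiny part of a large population, in which case building
a list of indices per day first is likely the better route.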
