multiple files into one file based on unique entry in one of the files

I have 4 flat files where each field is separated by a pipe |.  In
each file the second field has a unique value that is in all the
files.  Each line is terminated by a newline.  I want to combine the
files and create one file.  In this one file, there should be one line
for each "unique" entry that was found in the second file.  I was
able to do this in MS Access by creating each file as a table and
linking the "unique" field in the 2nd file to the other files.  I want
to do this in Tcl, so I can automate the process.  Can someone please point
me in a starting direction?  I know enough Tcl to get by, but not much
in the I/O region.  Any ideas would be great!

Thx!!
0
cesear (4)
3/17/2010 12:47:09 PM
comp.lang.tcl

On 17 Mar, 13:47, Cesear <ces...@gmail.com> wrote:
> I have 4 flat files where each field is separated by a pipe |.  In
> each file the second field has a unique value that is in all the
> files.  Each line is terminated by a newline.  I want to combine the
> files and create one file.  In this one file, there should be one line
> for each "unique" entry that was found in the second file.  I was
> able to do this in MS Access by creating each file as a table and
> linking the "unique" field in the 2nd file to the other files.  I want
> to do this in Tcl, so I can automate the process.  Can someone please point
> me in a starting direction?  I know enough Tcl to get by, but not much
> in the I/O region.  Any ideas would be great!
>
> Thx!!

What you could do is:

while {[gets $infile line] } {
    set fields   [split $line |]
    set uniqueId [lindex $fields 1]
    set data1($uniqueId) [lreplace $fields 1 1]
}

(Same for the other files)

Now you have four arrays, data1, ... data4, that hold the
non-unique information for each unique ID.

Joining them into one file:

foreach id [array names data1] {
    puts $outfile [join [concat $id $data1($id) $data2($id) \
                                $data3($id) $data4($id)] |]
}

Or code along those lines - this is mostly a sketch.
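
A fuller version of the sketch above, for reference.  It assumes Tcl 8.5
or later (for dict and {*}); the proc names readKeyed and joinFiles are
made up for this illustration and do not come from the thread.  It keeps
only the last line seen for each key, takes the key order from the first
file, and silently skips keys that are missing from one of the other files.

```tcl
# Read one pipe-delimited file into a dict keyed on the second field.
# The key itself is removed from the stored list of fields.
proc readKeyed {path} {
    set table [dict create]
    set in [open $path r]
    # gets returns -1 at end of file, so test the result explicitly
    while {[gets $in line] >= 0} {
        if {$line eq ""} continue
        set fields [split $line |]
        dict set table [lindex $fields 1] [lreplace $fields 1 1]
    }
    close $in
    return $table
}

# Join any number of input files on the second field and write one
# output line per key found in the first file.
proc joinFiles {outPath args} {
    set tables {}
    foreach path $args {
        lappend tables [readKeyed $path]
    }
    set out [open $outPath w]
    foreach id [dict keys [lindex $tables 0]] {
        set row [list $id]
        foreach table $tables {
            # A key missing from one file is skipped instead of raising an error
            if {[dict exists $table $id]} {
                lappend row {*}[dict get $table $id]
            }
        }
        puts $out [join $row |]
    }
    close $out
}
```

It would be called as, for example, joinFiles combined.txt file1 file2 file3 file4.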

Regards,

Arjen
0
Arjen
3/17/2010 1:02:07 PM
At 2010-03-17 08:47AM, "Cesear" wrote:
>  I have 4 flat files where each field is separated by a pipe |.  In
>  each file the second field has a unique value that is in all the
>  files.  Each line is terminated by a newline.  I want to combine the
>  files and create one file.  In this one file, there should be one line
>  for each "unique" entry that was found in the second file.  I was
>  able to do this in MS Access by creating each file as a table and
>  linking the "unique" field in the 2nd file to the other files.  I want
>  to do this in Tcl, so I can automate the process.  Can someone please point
>  me in a starting direction?  I know enough Tcl to get by, but not much
>  in the I/O region.  Any ideas would be great!
>  
>  Thx!!

I don't understand what you want as the result.  Is it:
    1. a single output file containing all of the 4 original files
    2. several output files depending on the 2nd field of each line of
       the 4 original files

If it's #2,

    array set output {}
    foreach file {file1 file2 file3 file4} {
        set input [open $file r]
        while {[gets $input line] != -1} {
            set field2 [lindex [split $line |] 1]
            if { ! [info exists output($field2)]} {
                set output_filename [format "output_%s.out" $field2]
                # ... or whatever you want the output file to be named

                set output($field2) [open $output_filename w]
            }
            puts $output($field2) $line
        }
        close $input
    }
    foreach item [array names output] {close $output($item)}

-- 
Glenn Jackman
    Write a wise saying and your name will live forever. -- Anonymous
0
Glenn
3/17/2010 1:03:00 PM
At 2010-03-17 09:02AM, "Arjen Markus" wrote:
>  while {[gets $infile line] } {

Note, should be
   while {[gets $infile line] != -1} {
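
The reason, as a small illustration: with a variable name given, gets
returns the number of characters read, or -1 at end of file.  An empty
line therefore yields 0 (false, so the bare loop stops early), and end of
file yields -1 (true, so the bare loop never ends).  The proc name
countLines below is hypothetical, used only for this sketch.

```tcl
# Count the lines in a file, empty lines included.
proc countLines {path} {
    set in [open $path r]
    set n 0
    # Compare against -1 explicitly: 0 only means "empty line",
    # while -1 is the end-of-file signal.
    while {[gets $in line] != -1} {
        incr n
    }
    close $in
    return $n
}
```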


-- 
Glenn Jackman
    Write a wise saying and your name will live forever. -- Anonymous
0
Glenn
3/17/2010 1:05:19 PM
On Mar 17, 1:47 pm, Cesear <ces...@gmail.com> wrote:
> I have 4 flat files where each field is separated by a pipe |.  In
> each file the second field has a unique value that is in all the
> files.  Each line is terminated by a newline.  I want to combine the
> files and create one file.  In this one file, there should be one line
> for each "unique" entry that was found in the second file.  I was
> able to do this in MS Access by creating each file as a table and
> linking the "unique" field in the 2nd file to the other files.  I want
> to do this in Tcl, so I can automate the process.  Can someone please point
> me in a starting direction?  I know enough Tcl to get by, but not much
> in the I/O region.  Any ideas would be great!

Your description is a bit unclear, please provide an example.

-Alex

0
Alexandre
3/17/2010 1:16:53 PM
On Mar 17, 8:47 am, Cesear <ces...@gmail.com> wrote:
> I have 4 flat files where each field is separated by a pipe |.  In
> each file the second field has a unique value that is in all the
> files.  Each line is terminated by a newline.  I want to combine the
> files and create one file.  In this one file, there should be one line
> for each "unique" entry that was found in the second file.


What you are describing sounds like the unix join command.
0
Larry
3/17/2010 1:37:38 PM
On 17 Mar, 14:05, Glenn Jackman <gle...@ncf.ca> wrote:
> At 2010-03-17 09:02AM, "Arjen Markus" wrote:
>
> >  while {[gets $infile line] } {
>
> Note, should be
>    while {[gets $infile line] != -1} {
>
> --
> Glenn Jackman
>     Write a wise saying and your name will live forever. -- Anonymous

Argh, yes, or [gets ...] >= 0.

Regards,

Arjen
0
Arjen
3/17/2010 1:42:19 PM
On Mar 17, 9:16 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> On Mar 17, 1:47 pm, Cesear <ces...@gmail.com> wrote:
>
> > I have 4 flat files where each field is separated by a pipe |.  In
> > each file the second field has a unique value that is in all the
> > files.  Each line is terminated by a newline.  I want to combine the
> > files and create one file.  In this one file, there should be one line
> > for each "unique" entry that was found in the second file.  I was
> > able to do this in MS Access by creating each file as a table and
> > linking the "unique" field in the 2nd file to the other files.  I want
> > to do this in Tcl, so I can automate the process.  Can someone please point
> > me in a starting direction?  I know enough Tcl to get by, but not much
> > in the I/O region.  Any ideas would be great!
>
> Your description is a bit unclear, please provide an example.
>
> -Alex

Here is an example:  In File 2, second field is unique, to the other 3
files.  In File 2, field 2 could occur in multiple lines of the file.
I want to combine all 4 files into ONE file FOREACH instance of field
2 in File 2.  Each instance should occur as ONE line in the one
combined file.  Also, I ONLY need to have field 1 and field 2 occur once in
the combined file.  So for "D0000012345678" (from File 2), the one line
would be this-->

D0001234|D0000012345678|CHUCK|BROWNSTOWN|123 WOODSON ROAD||ANYTOWN|USA|111111|(111)111-1111||02/15/1970|F|111|||51|41.9|||28.4|JOESPH|M|UNIT|03/13/2010||I|AAAA|BBBBBB|CCCCCCC|DDDDDDD|BLAH|B|C|DD"newline here"
.......
and so on field 2.....


Does that make sense??


File 1-->
D0001234|D00000123456||51|41.9|||28.4|JOE||DEPARTMENT|03/13/2010||I
D00012345|D0000012345678||51|41.9|||28.4|JOESPH|M|UNIT|03/13/2010||I
D0001234|D00000123456||51|41.9|||28.4||FRANK|UNIT|03/13/2010||I

File 2-->
D0001234|D00000123456|CHARLIE|BROWN|123 WOODS ROAD|APT 2110|ANYTOWN|USA|111111|(111)111-1111||02/15/1952|M|111113
D0001234|D0000012345678|CHUCK|BROWNSTOWN|123 WOODSON ROAD||ANYTOWN|USA|111111|(111)111-1111||02/15/1970|F|111

File 3-->
D00012345|D0000012345678|AAAA|BBBBBB|CCCCCCC|DDDDDDD
D0001234|D00000123456|KKKKK|YYYYYY|CC|DDD

File 4-->
D00012345|D0000012345678|BLAH|B|C|DD
D0001234|D00000123456|K|YY|CC|DDD
D0001234|D00000123456|KKKK|YYYY|CC|DDD

0
Cesear
3/17/2010 1:55:43 PM
Arjen Markus wrote:

> On 17 Mar, 13:47, Cesear <ces...@gmail.com> wrote:
>> I have 4 flat files where each field is separated by a pipe |.  In
>> each file the second field has a unique value that is in all the
>> files.  Each line is terminated by a newline.  I want to combine the
>> files and create one file.  In this one file, there should be one line
>> for each "unique" entry that was found in the second file.  I was
>> able to do this in MS Access by creating each file as a table and
>> linking the "unique" field in the 2nd file to the other files.  I want
>> to do this in Tcl, so I can automate the process.  Can someone please point
>> me in a starting direction?  I know enough Tcl to get by, but not much
>> in the I/O region.  Any ideas would be great!
>>
>> Thx!!
> 
> What you could do is:
> 
> while {[gets $infile line] } {
>     set fields   [split $line |]
>     set uniqueId [lindex $fields 1]
>     set data1($uniqueId) [lreplace $fields 1 1]
> }
> 
> (Same for the other files)
> 
> Now you have four arrays, data1, ... data4, that hold the
> non-unique information for each unique ID.
> 
> Joining them into one file:
> 
> foreach id [array names data1] {
>     puts $outfile [join [concat $id $data1($id) $data2($id)
> $data3($id) $data4($id)] |]
> }
> 
> Or code along those lines - this is mostly a sketch.
> 
> Regards,
> 
> Arjen

If all you want to do is to manipulate file contents, then certainly Arjen's
example (with the small correction) is a straightforward way to accomplish
that. But the original post referred to MS Access, and so if you want to
actually reason on the data, you could consider casting the file data into
relational terms where the reasoning operations are much easier to formulate.
Either SQLite or TclRAL could be brought to task for that.
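
As a rough idea of what the SQLite route could look like, here is a sketch
using the sqlite3 Tcl package.  The table layout (a key column plus the
remaining fields kept as one pipe-joined string), the table names file1 and
file2, and the proc names loadTable and joinOnKey are all assumptions made
for this illustration; a real schema would more likely have one column per
field.

```tcl
package require sqlite3

# Load a pipe-delimited file into a two-column table: the join key
# (second field) and the remaining fields as one pipe-joined string.
proc loadTable {db table path} {
    $db eval "CREATE TABLE $table (id TEXT, rest TEXT)"
    set in [open $path r]
    while {[gets $in line] >= 0} {
        if {$line eq ""} continue
        set fields [split $line |]
        set id   [lindex $fields 1]
        set rest [join [lreplace $fields 1 1] |]
        # \$id and \$rest reach SQLite unexpanded and are bound as parameters
        $db eval "INSERT INTO $table VALUES (\$id, \$rest)"
    }
    close $in
}

# Inner join of file1 and file2 on the key, in file2's original order.
proc joinOnKey {db} {
    set rows {}
    $db eval {
        SELECT a.id AS id, a.rest AS lrest, b.rest AS rrest
        FROM file1 a JOIN file2 b ON a.id = b.id
        ORDER BY b.rowid
    } {
        lappend rows "$id|$lrest|$rrest"
    }
    return $rows
}
```

Usage would be along the lines of: sqlite3 db :memory:, then
loadTable db file1 FILE1, loadTable db file2 FILE2, and finally
foreach row [joinOnKey db] {puts $row}.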

-- 
Andrew Mangogna
0
Andrew
3/17/2010 2:18:42 PM
On Mar 17, 10:18 am, Andrew Mangogna <amango...@mindspring.com> wrote:
> Arjen Markus wrote:
> > On 17 Mar, 13:47, Cesear <ces...@gmail.com> wrote:
> >> I have 4 flat files where each field is separated by a pipe |.  In
> >> each file the second field has a unique value that is in all the
> >> files.  Each line is terminated by a newline.  I want to combine the
> >> files and create one file.  In this one file, there should be one line
> >> for each "unique" entry that was found in the second file.  I was
> >> able to do this in MS Access by creating each file as a table and
> >> linking the "unique" field in the 2nd file to the other files.  I want
> >> to do this in Tcl, so I can automate the process.  Can someone please point
> >> me in a starting direction?  I know enough Tcl to get by, but not much
> >> in the I/O region.  Any ideas would be great!
>
> >> Thx!!
>
> > What you could do is:
>
> > while {[gets $infile line] } {
> >     set fields   [split $line |]
> >     set uniqueId [lindex $fields 1]
> >     set data1($uniqueId) [lreplace $fields 1 1]
> > }
>
> > (Same for the other files)
>
> > Now you have four arrays, data1, ... data4, that hold the
> > non-unique information for each unique ID.
>
> > Joining them into one file:
>
> > foreach id [array names data1] {
> >     puts $outfile [join [concat $id $data1($id) $data2($id)
> > $data3($id) $data4($id)] |]
> > }
>
> > Or code along those lines - this is mostly a sketch.
>
> > Regards,
>
> > Arjen
>
> If all you want to do is to manipulate file contents, then certainly Arjen's
> example (with the small correction) is a straightforward way to accomplish
> that. But the original post referred to MS Access, and so if you want to
> actually reason on the data, you could consider casting the file data into
> relational terms where the reasoning operations are much easier to formulate.
> Either SQLite or TclRAL could be brought to task for that.
>
> --
> Andrew Mangogna

What small correction to Arjen's post are you referring to?
0
Cesear
3/17/2010 2:41:51 PM
On 17 Mar, 15:42, Cesear <ces...@gmail.com> wrote:
> On Mar 17, 10:18 am, Andrew Mangogna <amango...@mindspring.com> wrote:
>
> > Arjen Markus wrote:
> > > On 17 Mar, 13:47, Cesear <ces...@gmail.com> wrote:
> > >> I have 4 flat files where each field is separated by a pipe |.  In
> > >> each file the second field has a unique value that is in all the
> > >> files.  Each line is terminated by a newline.  I want to combine the
> > >> files and create one file.  In this one file, there should be one line
> > >> for each "unique" entry that was found in the second file.  I was
> > >> able to do this in MS Access by creating each file as a table and
> > >> linking the "unique" field in the 2nd file to the other files.  I want
> > >> to do this in Tcl, so I can automate the process.  Can someone please point
> > >> me in a starting direction?  I know enough Tcl to get by, but not much
> > >> in the I/O region.  Any ideas would be great!
>
> > >> Thx!!
>
> > > What you could do is:
>
> > > while {[gets $infile line] } {
> > >     set fields   [split $line |]
> > >     set uniqueId [lindex $fields 1]
> > >     set data1($uniqueId) [lreplace $fields 1 1]
> > > }
>
> > > (Same for the other files)
>
> > > Now you have four arrays, data1, ... data4, that hold the
> > > non-unique information for each unique ID.
>
> > > Joining them into one file:
>
> > > foreach id [array names data1] {
> > >     puts $outfile [join [concat $id $data1($id) $data2($id)
> > > $data3($id) $data4($id)] |]
> > > }
>
> > > Or code along those lines - this is mostly a sketch.
>
> > > Regards,
>
> > > Arjen
>
> > If all you want to do is to manipulate file contents, then certainly Arjen's
> > example (with the small correction) is a straightforward way to accomplish
> > that. But the original post referred to MS Access, and so if you want to
> > actually reason on the data, you could consider casting the file data into
> > relational terms where the reasoning operations are much easier to formulate.
> > Either SQLite or TclRAL could be brought to task for that.
>
> > --
> > Andrew Mangogna
>
> What small correction to Arjen's post are you referring to?

The condition for terminating the loop:

while { [gets $infile line] >= 0 } { ... }

(Indeed, Andrew has a much more sophisticated and flexible solution
for you)

Regards,

Arjen
0
Arjen
3/17/2010 2:49:13 PM
On Mar 17, 2:55 pm, Cesear <ces...@gmail.com> wrote:
> On Mar 17, 9:16 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>
> > On Mar 17, 1:47 pm, Cesear <ces...@gmail.com> wrote:
>
> > > I have 4 flat files where each field is separated by a pipe |.  In
> > > each file the second field has a unique value that is in all the
> > > files.  Each line is terminated by a newline.  I want to combine the
> > > files and create one file.  In this one file, there should be one line
> > > for each "unique" entry that was found in the second file.  I was
> > > able to do this in MS Access by creating each file as a table and
> > > linking the "unique" field in the 2nd file to the other files.  I want
> > > to do this in Tcl, so I can automate the process.  Can someone please point
> > > me in a starting direction?  I know enough Tcl to get by, but not much
> > > in the I/O region.  Any ideas would be great!
>
> > Your description is a bit unclear, please provide an example.
>
> > -Alex
>
> Here is an example:  In File 2, second field is unique, to the other 3
> files.  In File 2, field 2 could occur in multiple lines of the file.
> I want to combine all 4 files into ONE file FOREACH instance of field
> 2 in File 2.  Each instance should occur as ONE line in the one
> combined file.  Also, I ONLY need to have field 1 and field 2 occur once in
> the combined file.  So for "D0000012345678" (from File 2), the one line
> would be this-->
> [...]
> Does that make sense??

No. You're using file indices inconsistently, and words like "unique"
are really ambiguous.

So, *please*, don't write a 3rd explanation in English, but just one
complete example with both input and wanted output files.

-Alex
0
Alexandre
3/17/2010 5:33:16 PM
On Mar 17, 9:02 am, Arjen Markus <arjen.markus...@gmail.com> wrote:
> On 17 Mar, 13:47, Cesear <ces...@gmail.com> wrote:
>
> > I have 4 flat files where each field is separated by a pipe |.  In
> > each file the second field has a unique value that is in all the
> > files.  Each line is terminated by a newline.  I want to combine the
> > files and create one file.  In this one file, there should be one line
> > for each "unique" entry that was found in the second file.  I was
> > able to do this in MS Access by creating each file as a table and
> > linking the "unique" field in the 2nd file to the other files.  I want
> > to do this in Tcl, so I can automate the process.  Can someone please point
> > me in a starting direction?  I know enough Tcl to get by, but not much
> > in the I/O region.  Any ideas would be great!
>
> > Thx!!
>
> What you could do is:
>
> while {[gets $infile line] } {
>     set fields   [split $line |]
>     set uniqueId [lindex $fields 1]
>     set data1($uniqueId) [lreplace $fields 1 1]
> }
>
> (Same for the other files)
>
> Now you have four arrays, data1, ... data4, that hold the
> non-unique information for each unique ID.
>
> Joining them into one file:
>
> foreach id [array names data1] {
>     puts $outfile [join [concat $id $data1($id) $data2($id)
> $data3($id) $data4($id)] |]
> }
>
> Or code along those lines - this is mostly a sketch.
>
> Regards,
>
> Arjen

Arjen, your code works well, but I need to modify it some.  I need to be
able to join the final file and only show the unique ID at the
beginning of the row.  How can I do that?
0
Cesear
3/18/2010 12:25:07 PM
On Mar 18, 1:52 pm, Cesear <ces...@gmail.com> wrote:
>
> Arjen, your code works well, but I need to modify it some.  I need to be
> able to join the final file and only show the unique ID at the
> beginning of the row.  How can I do that?

Instead of repeating the question 4 times, just post the full example
with inputs *and* wanted output.

-Alex

0
Alexandre
3/18/2010 2:32:59 PM
On Mar 18, 10:32 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:
> On Mar 18, 1:52 pm, Cesear <ces...@gmail.com> wrote:
>
> > Arjen, your code works well, but I need to modify it some.  I need to be
> > able to join the final file and only show the unique ID at the
> > beginning of the row.  How can I do that?
>
> Instead of repeating the question 4 times, just post the full example
> with inputs *and* wanted output.
>
> -Alex

Sorry, I must have hit reply too many times  :)  OK, here is what I
want, just data, no English :) --->>>

File ONE-->

V0100|NAME1|blah|blah|
V0102|NAME2|blah|blah|

File TWO-->

V0100|NAME1|chargeX|blah|
V0100|NAME1|chargeY|blah|
V0102|NAME2|chargeX|blah|
V0102|NAME2|chargeY|blah|
V0100|NAME1|blahcharge|blah|
V0102|NAME2|blahcahrge|blah|

The FINAL OUTPUT, I want to look like this-->

V0100|NAME1|blah|blah|chargeX|blah|
V0100|NAME1|blah|blah|chargeY|blah|
V0102|NAME2|blah|blah|chargeX|blah|
V0102|NAME2|blah|blah|chargeY|blah|
V0100|NAME1|blah|blah|blahcharge|blah|
V0102|NAME2|blah|blah|blahcahrge|blah|

0
Cesear
3/18/2010 3:12:19 PM
On Mar 18, 10:32 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

>
> Instead of repeating the question 4 times, just post the full example
> with inputs *and* wanted output.


I don't understand why, but I've been seeing repeating messages on
several usenet groups this week. While it could be user error, I am
beginning to suspect some sort of tech issue.

0
Larry
3/18/2010 4:44:45 PM
On Mar 18, 12:44 pm, "Larry W. Virden" <lvir...@gmail.com> wrote:
> On Mar 18, 10:32 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>
>
>
> > Instead of repeating the question 4 times, just post the full example
> > with inputs *and* wanted output.
>
> I don't understand why, but I've been seeing repeating messages on
> several usenet groups this week. While it could be user error, I am
> beginning to suspect some sort of tech issue.

I only hit the send once!!
0
Cesear
3/18/2010 6:53:26 PM
On Mar 18, 4:12 pm, Cesear <ces...@gmail.com> wrote:
> On Mar 18, 10:32 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
> wrote:
>
> > On Mar 18, 1:52 pm, Cesear <ces...@gmail.com> wrote:
>
> > > Arjen, your code works well, but I need to modify it some.  I need to be
> > > able to join the final file and only show the unique ID at the
> > > beginning of the row.  How can I do that?
>
> > Instead of repeating the question 4 times, just post the full example
> > with inputs *and* wanted output.
>
> > -Alex
>
> Sorry, I must have hit reply too many times  :)  OK, here is what I
> want, just data, no English :) --->>>
>
> File ONE-->
>
> V0100|NAME1|blah|blah|
> V0102|NAME2|blah|blah|
>
> File TWO-->
>
> V0100|NAME1|chargeX|blah|
> V0100|NAME1|chargeY|blah|
> V0102|NAME2|chargeX|blah|
> V0102|NAME2|chargeY|blah|
> V0100|NAME1|blahcharge|blah|
> V0102|NAME2|blahcahrge|blah|
>
> The FINAL OUTPUT, I want to look like this-->
>
> V0100|NAME1|blah|blah|chargeX|blah|
> V0100|NAME1|blah|blah|chargeY|blah|
> V0102|NAME2|blah|blah|chargeX|blah|
> V0102|NAME2|blah|blah|chargeY|blah|
> V0100|NAME1|blah|blah|blahcharge|blah|
> V0102|NAME2|blah|blah|blahcahrge|blah|

Ah, what you're after is called a "join" in database circles.
It is roughly a "diagonal" hyperplane in the cartesian product of the
input.

Now, to compute it, you can first use simple unix commands: 'sort' and
'join'.
If you accept the result to be sorted on the joined field (2nd field
in your example):

  sort -t \| -k 2,2 FILE1 > tmp1
  sort -t \| -k 2,2 FILE2 > tmp2
  join -t \| -j 2 tmp1 tmp2 > tmp3

note that the fields are not exactly in the order you want. To get
them right:

  awk -F \| '{print $2,$1,$3,$4,$7,$8}'  OFS=\| < tmp3 > OUTPUT


Now, since this is comp.lang.tcl, you can also do it in Tcl of
course ;-)
The idea is to build an internal "lookup table" based on FILE1, keyed
on 2nd field:

  set ff [open FILE1 r]
  while {[gets $ff line]>=0} {
    set key [lindex [split $line |] 1]
    set tab($key) $line
  }
  close $ff

Then you just scan the remaining inputs, and concatenate each line
with the lookup result:

  set ff [open FILE2 r]
  while {[gets $ff line]>=0} {
    set key [lindex [split $line |] 1]
    puts [join [concat [split $tab($key) |] \
                       [lrange [split $line |] 2 end]] |]
  }
  close $ff
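
The same lookup-table idea wrapped in a proc, as a sketch rather than a
finished tool.  The name joinOnSecondField and its parameters are invented
for this example.  It also makes two assumptions that go beyond the code
above: trailing | separators on the lookup line are trimmed, so the joined
row does not gain an empty field, and lines whose key has no match in the
first file are skipped (an inner join).

```tcl
# Join two pipe-delimited files on their second field.
# lookupPath: file with one line per key (FILE1 in the example)
# detailPath: file with any number of lines per key (FILE2)
# out:        channel to write to, stdout by default
proc joinOnSecondField {lookupPath detailPath {out stdout}} {
    # Build the lookup table from the first file, keyed on field 2.
    set in [open $lookupPath r]
    while {[gets $in line] >= 0} {
        if {$line eq ""} continue
        set key [lindex [split $line |] 1]
        # Trim trailing separators so no empty field appears in the result
        set tab($key) [string trimright $line |]
    }
    close $in

    # Scan the second file; append its fields 3..end to the lookup row.
    set in [open $detailPath r]
    while {[gets $in line] >= 0} {
        if {$line eq ""} continue
        set fields [split $line |]
        set key [lindex $fields 1]
        # Keys absent from the lookup file are skipped
        if {![info exists tab($key)]} continue
        puts $out "$tab($key)|[join [lrange $fields 2 end] |]"
    }
    close $in
}
```

Called as joinOnSecondField FILE1 FILE2, it prints the joined rows in the
order of FILE2, which is the order shown in the wanted output.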

-Alex







0
Alexandre
3/18/2010 10:06:59 PM