Python3, column names from array - numpy or pandas

I have a dataset in the below format.

id	A	B	C	D	E
100	1	0	0	0	0
101	0	1	1	0	0
102	1	0	0	0	0
103	0	0	0	1	1

I would like to convert this into below:
100, A
101, B C
102, A
103, D E

How do I do this? I tried numpy argsort, but I am new to Python and finding this challenging.
I would appreciate any help with this.
renjith
12/15/2016 1:56:10 AM

You can do this with pandas:

    import pandas as pd
    from io import StringIO

    io = StringIO('''\
    id        A        B        C        D        E 
    100        1        0        0        0        0 
    101        0        1        1        0        0 
    102        1        0        0        0        0 
    103        0        0        0        1        1 
    ''')

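    # Columns are whitespace-separated; use the id column as the index
    df = pd.read_csv(io, sep=r'\s+', index_col='id')
    # For each row, join the names of the columns whose value is 1
    val = df.apply(lambda row: ' '.join(df.columns[row==1]), axis=1)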

Google for "boolean indexing" to see what's going on :)
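
To make the boolean indexing step a bit more concrete, here is a minimal sketch of what happens for a single row (the one for id 101). The names `columns`, `row`, `mask`, `selected` and `joined` are only for illustration and are not part of the code above; the mask is converted with `to_numpy()` here as a cautious choice.

```python
import pandas as pd

# Column labels and one example row (id 101) from the question
columns = pd.Index(['A', 'B', 'C', 'D', 'E'])
row = pd.Series([0, 1, 1, 0, 0], index=columns)

# Comparing the row to 1 yields one True/False flag per column
mask = (row == 1).to_numpy()

# Indexing the labels with the mask keeps only the True positions
selected = columns[mask]
joined = ' '.join(selected)
print(joined)  # B C
```

The `df.apply(..., axis=1)` call repeats this same selection once per row.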
Miki
12/15/2016 5:04:41 AM
On 15/12/16 01:56, renjith madhavan wrote:
> I have a dataset in the below format.
>
> id	A	B	C	D	E
> 100	1	0	0	0	0
> 101	0	1	1	0	0
> 102	1	0	0	0	0
> 103	0	0	0	1	1
>
> I would like to convert this into below:
> 100, A
> 101, B C
> 102, A
> 103, D E
>
> How do I do this ? I tried numpy argsort but I am new to Python and finding this challenging.
> Appreciate any help in this.
>

Numpy or pandas?  Neither: this is a straightforward bit of text 
manipulation you can do without needing to import anything.  I wouldn't 
bother considering either unless your dataset is massive and speed is 
any kind of issue.

with open("data.txt") as datafile:
    # First line needs handling separately
    line = next(datafile)
    columns = line.split()[1:]
    # Now iterate through the rest
    for line in datafile:
        data = line.split()
        results = []
        for col, val in zip(columns, data[1:]):
            if val == "1":
                results.append(col)
        print("{0}, {1}".format(data[0], " ".join(results)))

Obviously there's no defensive coding for blank lines or unexpected data 
in there, and if you want to use the results later on you probably want 
to stash them in a dictionary, but that will do the job.
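
As a rough sketch of the dictionary idea, and assuming the same whitespace-separated layout as in the question, the results could be collected like this. The function name `parse_flags` and the variable `truth_by_id` are made up for the example, and the sample data is held in a `StringIO` instead of a file so the snippet runs on its own. The only defensive step is skipping blank lines.

```python
from io import StringIO

def parse_flags(lines):
    """Map each id to the list of column names whose flag is "1"."""
    lines = iter(lines)
    columns = next(lines).split()[1:]   # header row, minus the "id" label
    results = {}
    for line in lines:
        fields = line.split()
        if not fields:                  # skip blank lines
            continue
        row_id, flags = fields[0], fields[1:]
        results[row_id] = [col for col, val in zip(columns, flags) if val == "1"]
    return results

# Sample data from the question, with a blank line added on purpose
sample = StringIO("""\
id A B C D E
100 1 0 0 0 0
101 0 1 1 0 0

102 1 0 0 0 0
103 0 0 0 1 1
""")

truth_by_id = parse_flags(sample)
for row_id, names in truth_by_id.items():
    print("{0}, {1}".format(row_id, " ".join(names)))
```

An open file object can be passed to `parse_flags` in place of the `StringIO`, since both yield one line at a time.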

-- 
Rhodri James *-* Kynesim Ltd
Rhodri
12/15/2016 1:51:59 PM
Thank you for the reply.
I tried that. Here is what I am trying to do.

For context, I am trying to compute mapk (k = 3) for this list.
A, B, C, D and E are product names.

If I did this manually, I would write something like this:

TRUTH = [[A], [B, C], [A], [D, E]]
and if my prediction is:
PRED = [[B, A, D], [A, C, B], [A, B, C], [B, D, E]]

map3(truth, pred, 3)

How do I convert my input truth values into the TRUTH format?
Should I be looking at "boolean indexing" for this case?

id        A        B        C        D        E
100        1        0        0        0        0
101        0        1        1        0        0
102        1        0        0        0        0
103        0        0        0        1        1
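
One possible way to get the nested-list TRUTH format is to reuse the boolean indexing idea from the pandas reply and build a list per row instead of a joined string. This is only a sketch under the assumption that the data has the layout shown in the question; the variable names `data` and `truth` are illustrative, and the product names come out as strings.

```python
from io import StringIO
import pandas as pd

# Sample data from the question
data = StringIO("""\
id A B C D E
100 1 0 0 0 0
101 0 1 1 0 0
102 1 0 0 0 0
103 0 0 0 1 1
""")
df = pd.read_csv(data, sep=r"\s+", index_col="id")

# For each row, keep the column labels where the flag equals 1
truth = [list(df.columns[(row == 1).to_numpy()]) for _, row in df.iterrows()]
print(truth)  # [['A'], ['B', 'C'], ['A'], ['D', 'E']]
```

The resulting `truth` list is in row order, so it should line up with a PRED list built in the same order.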
renjith
12/15/2016 1:56:41 PM