I'm optimizing the innermost loop of my script. I need to convert a month
name to a month number. I'm using Python 2.6 on Linux x64.
month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
              "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
def to_dict(name):
    return month_dict[name]
def to_if(name):
    if name == "Jan": return 1
    elif name == "Feb": return 2
    elif name == "Mar": return 3
    elif name == "Apr": return 4
    elif name == "May": return 5
    elif name == "Jun": return 6
    elif name == "Jul": return 7
    elif name == "Aug": return 8
    elif name == "Sep": return 9
    elif name == "Oct": return 10
    elif name == "Nov": return 11
    elif name == "Dec": return 12
    else: raise ValueError
import random
l = [random.choice(month_dict.keys()) for _ in range(1000000)]
from time import time
t = time(); xxx=map(to_dict,l); print time() - t # 0.5
t = time(); xxx=map(to_if,l); print time() - t # 1.0
Is there a faster solution? Maybe something with str.translate?
The problem is a little different because I don't read random data, but
sorted data. For example:
l = [x for x in
("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
for _ in range(1000)] # ["Jan","Jan", ..., "Feb", "Feb", ...]
So maybe the to_if approach will be faster if I write the cases in the best
order. Look:
l = ["Jan"] * 1000000 # to_if is in the best order for "Jan"
t = time(); xxx=map(to_dict,l); print time() - t # 0.5
t = time(); xxx=map(to_if,l); print time() - t # 0.5

gtu2003 (11) | 1/6/2010 11:03:36 AM

How about storing the month names in a list and using list.index()? You may
want to measure the performance yourself and draw your own conclusion.
Regards,
Ashish Vyas
-----Original Message-----
From: python-list-bounces+ntb837=motorola.com@python.org
[mailto:python-list-bounces+ntb837=motorola.com@python.org] On Behalf Of wiso
Sent: Wednesday, January 06, 2010 4:34 PM
To: python-list@python.org
Subject: Convert month name to month number faster
--
http://mail.python.org/mailman/listinfo/python-list

VYAS | 1/6/2010 11:14:04 AM

On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:
> from time import time
> t = time(); xxx=map(to_dict,l); print time() - t # 0.5
> t = time(); xxx=map(to_if,l); print time() - t # 1.0
Don't define your own function just for attribute access. Instead just
write:
xxx = map(month_dict.__getitem__, l)
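
As a rough sketch of this suggestion (written so that it also runs on Python 3, where map() is lazy and has to be wrapped in list(); the short `names` list is a made-up sample, not the poster's data), the wrapper and the bound method give identical results:

```python
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

def to_dict(name):
    # Wrapper from the original post: adds one Python-level call per item.
    return month_dict[name]

names = ["Jan", "Mar", "Dec", "Jul"]  # hypothetical sample input

# Passing the bound method lets map() invoke the dict lookup directly,
# without going through an extra Python function frame.
via_wrapper = list(map(to_dict, names))
via_getitem = list(map(month_dict.__getitem__, names))

print(via_wrapper == via_getitem)
```

Both forms raise KeyError for an unknown month name, so the error behaviour of to_dict is preserved.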

Antoine | 1/6/2010 11:53:46 AM

On Jan 6, 9:03 pm, wiso <gtu2...@alice.it> wrote:
> I'm optimizing the inner most loop of my script. I need to convert month
> name to month number. I'm using python 2.6 on linux x64.
>
> month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
>               "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
>
> def to_dict(name):
>     return month_dict[name]
Try replacing the to_dict function with:
to_dict = month_dict.get
That removes one extra function call per lookup. On my computer, this
reduces the time for your test from 0.26 to 0.09.
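
One behavioural difference is worth keeping in mind with this replacement: dict.get returns None for an unknown key instead of raising, whereas plain indexing raises KeyError. A small sketch (the name "Foo" is just an invented bad input):

```python
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

to_dict = month_dict.get  # bound method, no wrapper function involved

known = to_dict("Feb")    # normal lookup
unknown = to_dict("Foo")  # missing key: returns None, no exception

# Plain indexing, by contrast, raises KeyError for the same bad input.
try:
    month_dict["Foo"]
except KeyError:
    indexing_raised = True
else:
    indexing_raised = False

print(known, unknown, indexing_raised)
```

If silently getting None for bad input is not acceptable, month_dict.__getitem__ keeps the exception while still avoiding the wrapper.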

alex23 | 1/6/2010 11:58:39 AM

Antoine Pitrou wrote:
> On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:
>
>> from time import time
>> t = time(); xxx=map(to_dict,l); print time() - t # 0.5
>> t = time(); xxx=map(to_if,l); print time() - t # 1.0
>
> Don't define your own function just for attribute access. Instead just
> write:
>
> xxx = map(month_dict.__getitem__, l)
t = time(); xxx=map(month_dict.__getitem__,l); print time() - t # 0.2
month_list = ("","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
t = time(); xxx=map(month_list.index,l); print time() - t # 0.6
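
A plausible reason for the gap: index() on a tuple or list scans from the start and compares element by element, so later months cost more comparisons, while the dict lookup is hash-based. The sketch below only checks that the two approaches agree; `month_dict` is rebuilt here from `month_list` purely for illustration.

```python
month_list = ("", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Build the dict from the tuple; the empty placeholder at position 0 is skipped.
month_dict = dict((name, i) for i, name in enumerate(month_list) if name)

names = month_list[1:]  # all twelve month names, in calendar order

# Linear scan per lookup: "Dec" is found only after twelve failed comparisons.
by_index = [month_list.index(n) for n in names]

# Hash lookup per item: cost does not depend on the month's position.
by_dict = [month_dict[n] for n in names]

print(by_index == by_dict)
```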

wiso | 1/6/2010 12:03:38 PM

On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:
> I'm optimizing the inner most loop of my script. I need to convert month
> name to month number. I'm using python 2.6 on linux x64.
According to your own figures below, it takes about a microsecond per
lookup, at worst, even using a remarkably inefficient technique. Are you
trying to tell us that this is the bottleneck in your script? I'm sorry,
I find that implausible. I think you're wasting your time trying to
optimise something that doesn't need optimising.
Even if you halve the time, and deal with a million data points each time
you run your script, you will only save half a second per run. I can see
from the times you posted that you've spent at least an hour trying to
optimise this. To make up for that one hour, you will need to run your
script 7200 times, before you see *any* time savings at all.
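
The break-even figure is simple arithmetic on the numbers stated above (one hour spent, half a second saved per run); spelled out as a tiny sketch:

```python
# Figures taken from the paragraph above.
seconds_spent_optimising = 60 * 60  # one hour of effort
seconds_saved_per_run = 0.5         # halving a 1.0 s step on a million items

# Number of runs needed before the time saved equals the time spent.
break_even_runs = seconds_spent_optimising / seconds_saved_per_run

print(break_even_runs)
```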
> month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
>               "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
>
> def to_dict(name):
>     return month_dict[name]
This leads to a pointless function call. Just call month_dict[name]
instead of calling a function that calls it.
> def to_if(name):
>     if name == "Jan": return 1
>     elif name == "Feb": return 2
>     elif name == "Mar": return 3
>     elif name == "Apr": return 4
>     elif name == "May": return 5
>     elif name == "Jun": return 6
>     elif name == "Jul": return 7
>     elif name == "Aug": return 8
>     elif name == "Sep": return 9
>     elif name == "Oct": return 10
>     elif name == "Nov": return 11
>     elif name == "Dec": return 12
>     else: raise ValueError
That is remarkably awful.
> import random
> l = [random.choice(month_dict.keys()) for _ in range(1000000)]
>
> from time import time
> t = time(); xxx=map(to_dict,l); print time() - t # 0.5
> t = time(); xxx=map(to_if,l); print time() - t # 1.0
This is not a reliable way to do timings. You should use the timeit
module.
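
For instance, a minimal timeit sketch might look like the following. The helper name `best_time`, the reduced input size, and the repeat counts are assumptions chosen to keep the run short, and the code is written to run on Python 3 as well (hence list(map(...))):

```python
import timeit

# Setup code runs once per repeat and is excluded from the measurement.
SETUP = """
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}
def to_dict(name):
    return month_dict[name]
names = list(month_dict) * 1000
"""

def best_time(stmt, repeat=3, number=10):
    # Hypothetical helper: the minimum of several repeats is the figure
    # least disturbed by other activity on the machine.
    return min(timeit.repeat(stmt, setup=SETUP, repeat=repeat, number=number))

wrapper_time = best_time("list(map(to_dict, names))")
getitem_time = best_time("list(map(month_dict.__getitem__, names))")

print("wrapper: %.4f s, __getitem__: %.4f s" % (wrapper_time, getitem_time))
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two statements is meaningful.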
> is there a faster solution? Maybe something with str.translate?
What makes you think str.translate is even remotely useful for this?
--
Steven

Steven | 1/6/2010 12:48:32 PM