Convert month name to month number faster


I'm optimizing the innermost loop of my script. I need to convert month 
name to month number. I'm using Python 2.6 on Linux x64.


month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
	   "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}

def to_dict(name):
  return month_dict[name]

def to_if(name):
    if name == "Jan": return 1
    elif name == "Feb": return 2
    elif name == "Mar": return 3
    elif name == "Apr": return 4
    elif name == "May": return 5
    elif name == "Jun": return 6
    elif name == "Jul": return 7
    elif name == "Aug": return 8
    elif name == "Sep": return 9
    elif name == "Oct": return 10
    elif name == "Nov": return 11
    elif name == "Dec": return 12
    else: raise ValueError

import random
l = [random.choice(month_dict.keys()) for _ in range(1000000)]

from time import time
t = time(); xxx=map(to_dict,l); print time() - t # 0.5
t = time(); xxx=map(to_if,l); print time() - t   # 1.0


Is there a faster solution? Maybe something with str.translate?

The problem is a little different because I don't read random data, but 
sorted data. For example:

l = [x for x in 
("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec") 
for _ in range(1000)] # ["Jan","Jan", ..., "Feb", "Feb", ...]

So maybe the to_if approach will be faster if I write the cases in the best 
order. Look:

l = ["Jan"] * 1000000 # to_if is in the best order for "Jan"
t = time(); xxx=map(to_dict,l); print time() - t # 0.5
t = time(); xxx=map(to_if,l); print time() - t # 0.5
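One way to exploit sorted input, sketched here as an assumption rather than something proposed in the thread, is to look each run of identical names up only once with itertools.groupby. The sketch is written for Python 3 (the thread itself uses 2.6), and the names MONTH_DICT and convert_runs are hypothetical. Whether it actually beats a plain dict lookup would have to be measured.

```python
from itertools import groupby

MONTH_DICT = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

def convert_runs(names):
    """Convert month names to numbers, doing one dict lookup per run of equal names."""
    numbers = []
    for name, run in groupby(names):
        number = MONTH_DICT[name]            # single lookup for the whole run
        numbers.extend(number for _ in run)  # emit it once per element of the run
    return numbers

# Sorted input in the style of the example above: "Jan", "Jan", ..., "Feb", ...
sorted_names = [m for m in ("Jan", "Feb", "Mar") for _ in range(3)]
print(convert_runs(sorted_names))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```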


Reply gtu2003 (11) 1/6/2010 11:03:36 AM

How about storing the month names in a list and using list.index()? You
may want to measure the performance yourself and draw your own conclusion.
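A minimal sketch of that idea, written for Python 3 (the thread uses 2.6). The names MONTH_NAMES and to_index are hypothetical, and the empty string at index 0 is only there so that "Jan" maps to 1.

```python
# Placeholder at index 0 so that list positions match month numbers.
MONTH_NAMES = ["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def to_index(name):
    # list.index scans the list from the front and raises ValueError
    # when the name is not present.
    return MONTH_NAMES.index(name)

print(to_index("Mar"))  # 3
```

Because list.index is a linear scan, later months cost more comparisons than earlier ones, unlike a dict lookup.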

Regards,
Ashish Vyas

Reply VYAS 1/6/2010 11:14:04 AM


On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:


> from time import time
> t = time(); xxx=map(to_dict,l); print time() - t # 0.5
> t = time(); xxx=map(to_if,l); print time() - t   # 1.0

Don't define your own function just for attribute access. Instead just 
write:

xxx = map(month_dict.__getitem__, l)
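As a small illustration, assuming Python 3 (where map returns an iterator, hence the list() calls) and a short hypothetical sample list called names, the bound method gives the same results as the wrapper function while skipping the extra Python-level call:

```python
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

def to_dict(name):
    return month_dict[name]

names = ["Jan", "Dec", "Mar"]  # hypothetical sample input

via_wrapper = list(map(to_dict, names))                 # one Python call per item
via_bound = list(map(month_dict.__getitem__, names))    # bound method, no wrapper

print(via_bound)  # [1, 12, 3]
```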


Reply Antoine 1/6/2010 11:53:46 AM

On Jan 6, 9:03 pm, wiso <gtu2...@alice.it> wrote:
> I'm optimizing the innermost loop of my script. I need to convert month
> name to month number. I'm using Python 2.6 on Linux x64.
>
> month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
>            "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
>
> def to_dict(name):
>   return month_dict[name]

Try replacing the to_dict function with:

   to_dict = month_dict.get

That removes one extra function call per lookup. On my computer, this
reduces the time for your test from 0.26 to 0.09.
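One behavioural difference is worth keeping in mind with this replacement: dict.get returns None for a name that is not in the dict, whereas month_dict[name] raises KeyError. A short sketch, written for Python 3:

```python
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

to_dict = month_dict.get   # bound method: no wrapper function in between

print(to_dict("Feb"))      # 2
print(to_dict("Foo"))      # None: unknown names do not raise

try:
    month_dict["Foo"]      # plain subscripting raises instead
except KeyError:
    print("KeyError")
```

If bad input should fail loudly, month_dict.__getitem__ keeps the KeyError while still avoiding the wrapper.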
Reply alex23 1/6/2010 11:58:39 AM

Antoine Pitrou wrote:

> On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:
> 
> 
>> from time import time
>> t = time(); xxx=map(to_dict,l); print time() - t # 0.5
>> t = time(); xxx=map(to_if,l); print time() - t   # 1.0
> 
> Don't define your own function just for attribute access. Instead just
> write:
> 
> xxx = map(month_dict.__getitem__, l)

t = time(); xxx=map(month_dict.__getitem__,l); print time() - t # 0.2

month_list = ("","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")

t = time(); xxx=map(month_list.index,l); print time() - t # 0.6
Reply wiso 1/6/2010 12:03:38 PM

On Wed, 06 Jan 2010 12:03:36 +0100, wiso wrote:

> I'm optimizing the innermost loop of my script. I need to convert month
> name to month number. I'm using Python 2.6 on Linux x64.

According to your own figures below, it takes about a microsecond per 
lookup, at worst, even using a remarkably inefficient technique. Are you 
trying to tell us that this is the bottleneck in your script? I'm sorry, 
I find that implausible. I think you're wasting your time trying to 
optimise something that doesn't need optimising.

Even if you halve the time, and deal with a million data points each time 
you run your script, you will only save half a second per run. I can see 
from the times you posted that you've spent at least an hour trying to 
optimise this. To make up for that one hour, you will need to run your 
script 7200 times, before you see *any* time savings at all.
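The arithmetic can be checked from the posted figures (1,000,000 lookups, 1.0 s for the slower version, 0.5 s after halving). The one-hour figure is the estimate given above, and the variable names are only illustrative:

```python
lookups = 1000000
slow_seconds = 1.0    # posted time for the slower version
fast_seconds = 0.5    # time after halving it
hours_spent = 1       # estimated time spent optimising

# Worst-case cost of a single lookup, in microseconds.
per_lookup_us = slow_seconds * 1e6 / lookups

# Runs needed before the saved time equals the time spent optimising.
saved_per_run = slow_seconds - fast_seconds
break_even_runs = hours_spent * 3600 / saved_per_run

print(per_lookup_us)    # 1.0
print(break_even_runs)  # 7200.0
```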


> month_dict = {"Jan":1,"Feb":2,"Mar":3,"Apr":4, "May":5, "Jun":6,
> 	   "Jul":7,"Aug":8,"Sep":9,"Oct":10,"Nov":11,"Dec":12}
> 
> def to_dict(name):
>   return month_dict[name]

This leads to a pointless function call. Just call month_dict[name] 
instead of calling a function that calls it.
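For example, with a short hypothetical input list called names, the lookup can sit directly in a list comprehension (shown for Python 3):

```python
month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

names = ["Jan", "Feb", "Feb", "Dec"]  # hypothetical sample input

# Subscript the dict inline; no wrapper function is called per item.
numbers = [month_dict[name] for name in names]

print(numbers)  # [1, 2, 2, 12]
```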



> def to_if(name):
>     if name == "Jan": return 1
>     elif name == "Feb": return 2
>     elif name == "Mar": return 3
>     elif name == "Apr": return 4
>     elif name == "May": return 5
>     elif name == "Jun": return 6
>     elif name == "Jul": return 7
>     elif name == "Aug": return 8
>     elif name == "Sep": return 9
>     elif name == "Oct": return 10
>     elif name == "Nov": return 11
>     elif name == "Dec": return 12
>     else: raise ValueError

That is remarkably awful.

 
> import random
> l = [random.choice(month_dict.keys()) for _ in range(1000000)]
> 
> from time import time
> t = time(); xxx=map(to_dict,l); print time() - t # 0.5 
> t = time(); xxx=map(to_if,l); print time() - t   # 1.0

This is not a reliable way to do timings. You should use the timeit 
module.
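A minimal sketch of such a measurement with timeit, written for Python 3 (the thread uses 2.6). The helper names run_wrapper and run_bound and the sample size are assumptions; timeit.repeat returns one total per repetition, and taking the minimum is the usual way to reduce noise from other processes.

```python
import random
import timeit

month_dict = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
              "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Random month names, as in the original test but with a smaller sample.
names = [random.choice(list(month_dict)) for _ in range(100000)]

def to_dict(name):
    return month_dict[name]

def run_wrapper():
    # Lookup through the wrapper function from the original post.
    return list(map(to_dict, names))

def run_bound():
    # Lookup through the dict's bound method, without a wrapper.
    return list(map(month_dict.__getitem__, names))

for label, func in (("wrapper", run_wrapper), ("bound method", run_bound)):
    # Each repetition times 5 full passes over the data; keep the best of 3.
    best = min(timeit.repeat(func, number=5, repeat=3))
    print("%-12s %.4f s for 5 passes" % (label, best))
```

The absolute numbers depend on the machine and Python version, so only the ratio between the two lines is meaningful.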



> is there a faster solution? Maybe something with str.translate?

What makes you think str.translate is even remotely useful for this?




-- 
Steven
Reply Steven 1/6/2010 12:48:32 PM
