Hi,
We are finding that SimpleDateFormat is pretty slow if you're trying to
use it to parse millions of records. We improved on it by adding some
caches in the code, for cases where things like the month stay the same,
but all in all we find it to be a hog.
For example, we can parse 20,000 records a second if they don't contain
dates, but when you add dates this can drop to 4,000.
So does anyone know of a good class out there, before we go and
build a faster one?
TIA
nick_wakefield (5) | 9/15/2003 11:17:47 PM

On 15 Sep 2003 16:17:47 -0700, nick_wakefield@hotmail.com (Niko) wrote
or quoted :
>So does anyone know of a good class out there or before we go and
>build a faster one.
You might want to look into BigDate if you are dealing only with Dates
not date/timestamps. It has a couple of toString methods. You could
roll your own on those models which should be much faster than
SimpleDateFormat.
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
roedy (1019) | 9/15/2003 11:36:22 PM

On Mon, 15 Sep 2003 23:36:22 GMT, Roedy Green <roedy@mindprod.com>
wrote or quoted :
>You might want to look into BigDate
see http://mindprod.com/jgloss/bigdate.html
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
roedy (1019) | 9/15/2003 11:37:59 PM

"Niko" <nick_wakefield@hotmail.com> wrote in message
news:9da94cd1.0309151517.3d389820@posting.google.com...
> Hi,
>
> We are finding that SimpleDateFormat is pretty slow, if your trying to
> use it to pass millions of records. We improved upon it by added some
> caches in the code, if things like the Month was the same and so on
> but in all we find it to be a hog.
>
> For example we can pass 20,000 records a second, if they don't contain
> dates in them but when you add dates this can drop to 4,000.
>
> So does anyone know of a good class out there or before we go and
> build a faster one.
If speed is the issue, you might want to consider turning the problem
around. There are only 365 days in a year (366 in leap years), so over a
100-year period there are only about 36,500 distinct dates. Pre-format those
that are most likely to be in your range of dates and put them in a hash
table, or use a simple indexing method. This completely sidesteps expensive
string formatting and is especially good if there are many repeated dates.
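A minimal sketch of the pre-formatting idea, assuming dates arrive as epoch milliseconds and are handled in UTC. The class name `DateStringTable` and its methods are hypothetical, made up for illustration; SimpleDateFormat is used only once per day while filling the table.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: pre-formats every day in a fixed range once, then answers by array index. */
final class DateStringTable {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    private final long firstDay;      // days since 1970-01-01 (UTC) of the first entry
    private final String[] formatted; // formatted[i] is the text for day firstDay + i

    DateStringTable(long firstDay, int dayCount, String pattern) {
        this.firstDay = firstDay;
        this.formatted = new String[dayCount];
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ENGLISH);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: dates are handled in UTC
        for (int i = 0; i < dayCount; i++) {
            formatted[i] = fmt.format(new Date((firstDay + i) * MILLIS_PER_DAY));
        }
    }

    /** Returns the cached text for a UTC timestamp, or null when it falls outside the table. */
    String lookup(long epochMillis) {
        long index = Math.floorDiv(epochMillis, MILLIS_PER_DAY) - firstDay;
        return (index >= 0 && index < formatted.length) ? formatted[(int) index] : null;
    }
}
```

A caller would build the table once for the expected range and fall back to SimpleDateFormat whenever `lookup` returns null. A plain array is used here instead of a hash table because consecutive days map naturally to consecutive indexes.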
Cheers,
Matt Humphrey matth@iviz.com http://www.iviz.com/
matth (145) | 9/16/2003 1:37:36 AM

Thanks for the BigDate and the index lookup ideas. Unfortunately I'm
working with DateTimes, e.g. 3rd Jun 1993 05:01:43. However, I was
thinking I could build two hash tables, one for the time and one for
the date (ignoring the year), then split the string, look up each part
in its table, and adjust for the year.
"Matt Humphrey" <matth@iviz.com> wrote in message news:<APt9b.1265$065.885327@news1.news.adelphia.net>...
> "Niko" <nick_wakefield@hotmail.com> wrote in message
> news:9da94cd1.0309151517.3d389820@posting.google.com...
> > Hi,
> >
> > We are finding that SimpleDateFormat is pretty slow, if your trying to
> > use it to pass millions of records. We improved upon it by added some
> > caches in the code, if things like the Month was the same and so on
> > but in all we find it to be a hog.
> >
> > For example we can pass 20,000 records a second, if they don't contain
> > dates in them but when you add dates this can drop to 4,000.
> >
> > So does anyone know of a good class out there or before we go and
> > build a faster one.
>
> If speed is the issue, you might want to consider turning the problem
> around. There are only 365 days in a year, so over a 100 year period there
> are only 36500 distinct dates. Pre-format those that are most likely to be
> in your range of dates and put them in a hash table, or use a simple
> indexing method. This completely sidesteps expensive string formatting
> problems and is especially good if there are many redundant dates.
>
> Cheers,
> Matt Humphrey matth@iviz.com http://www.iviz.com/
nick_wakefield (5) | 9/16/2003 5:42:28 PM

On 16 Sep 2003 10:42:28 -0700, nick_wakefield@hotmail.com (Niko) wrote
or quoted :
>However I was
>thinking I could produce two hash tables, one for the time and one for
>the date, ignoring the year, split the string and lookup in both
>tables and adjust for year.
"Adjust for year" means recreating the logic in BigDate.
You could precompute the strings for the days over a period of five
years, index into them to get the YYYYMMDD part, and then plop in the time
part, but that is a rather big chunk of RAM.
You could get BigDate to give you the date part. You have to do the
time part yourself. You need a timezone adjustment.
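One way the timezone adjustment and the date/time split could look, as a rough sketch only. It assumes the input is epoch milliseconds in UTC; the class `LocalSplit` and its method are hypothetical names, and BigDate itself is not used here.

```java
import java.util.TimeZone;

/** Hypothetical sketch: split a UTC timestamp into a local day number and a second of the day. */
final class LocalSplit {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    /** Returns {dayIndex, secondOfDay} in the given zone; dayIndex counts from 1970-01-01. */
    static long[] split(long epochMillis, TimeZone zone) {
        // The timezone adjust: shift the instant by the zone's offset at that moment.
        long local = epochMillis + zone.getOffset(epochMillis);
        long day = Math.floorDiv(local, MILLIS_PER_DAY);
        long secondOfDay = Math.floorMod(local, MILLIS_PER_DAY) / 1000L;
        return new long[] { day, secondOfDay };
    }
}
```

The day index can then select a precomputed date string, and the second of the day can be broken into hours, minutes and seconds with integer division.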
--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
roedy (1019) | 9/16/2003 7:06:47 PM

"Niko" <nick_wakefield@hotmail.com> wrote in message
news:9da94cd1.0309160942.3cc5e006@posting.google.com...
> Thanks for the bigdate and the index lookup ideas, unfortunately I'm
> working with DateTimes, i.e 3rd Jun 1993 05:01:43. However I was
> thinking I could produce two hash tables, one for the time and one for
> the date, ignoring the year, split the string and lookup in both
> tables and adjust for year.
That's workable and equivalent to forming the string via indexed lookup, but
with more lookup elements. Your lookup tables would be the day/month, the
year, and the hour, minute and second (all sharing one table covering 0..59).
Assemble them via a StringBuffer. There are 86,400 one-second timestamps in
a day, so that's a bit much for direct lookup. Part of what you're trying to
avoid is the number-to-string conversion and the string assembly. This
technique does not avoid the string assembly problem, but the
number-to-string conversion is reduced to a table index.
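A small sketch of the shared 0..59 table for the time part. `TimeAssembler` is a hypothetical name, and StringBuilder stands in for StringBuffer as its unsynchronized equivalent on Java 5 and later.

```java
/** Hypothetical sketch: assemble "HH:mm:ss" from a table of pre-built two-digit strings. */
final class TimeAssembler {
    // One shared table serves hours, minutes and seconds: "00" through "59".
    private static final String[] TWO_DIGITS = new String[60];
    static {
        for (int i = 0; i < 60; i++) {
            TWO_DIGITS[i] = (i < 10 ? "0" : "") + i;
        }
    }

    /** No range checking: values outside 0..59 throw ArrayIndexOutOfBoundsException. */
    static String format(int hour, int minute, int second) {
        StringBuilder sb = new StringBuilder(8);
        sb.append(TWO_DIGITS[hour]).append(':')
          .append(TWO_DIGITS[minute]).append(':')
          .append(TWO_DIGITS[second]);
        return sb.toString();
    }
}
```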
Another way to avoid string assembly is to arrange the string to always have
the same layout, e.g. 2 digits, st|nd|rd|th, space, 3 letters, space,
dd:dd:dd. This way you only allocate the string once and copy the elements to
fixed places. But I really only suggest this after a run with a serious
profiler.
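The fixed-layout idea might be sketched as follows. The layout here also carries a four-digit year, which is an assumption added to match the "3rd Jun 1993 05:01:43" example, and `FixedLayoutFormatter` is a hypothetical class.

```java
/** Hypothetical sketch: write the fields into fixed positions of one reusable buffer. */
final class FixedLayoutFormatter {
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // Template for the layout; separators are written once and never touched again.
    private final char[] buf = "00th Jan 0000 00:00:00".toCharArray();

    private static String suffix(int day) {
        if (day >= 11 && day <= 13) return "th";
        switch (day % 10) {
            case 1: return "st";
            case 2: return "nd";
            case 3: return "rd";
            default: return "th";
        }
    }

    private void putTwoDigits(int pos, int value) {
        buf[pos] = (char) ('0' + value / 10);
        buf[pos + 1] = (char) ('0' + value % 10);
    }

    /** month is 1-based; not thread-safe because the buffer is shared between calls. */
    String format(int day, int month, int year, int hour, int minute, int second) {
        putTwoDigits(0, day);
        suffix(day).getChars(0, 2, buf, 2);
        MONTHS[month - 1].getChars(0, 3, buf, 5);
        putTwoDigits(9, year / 100);
        putTwoDigits(11, year % 100);
        putTwoDigits(14, hour);
        putTwoDigits(17, minute);
        putTwoDigits(20, second);
        return new String(buf);
    }
}
```

Because Java strings are immutable, `new String(buf)` still copies the buffer on every call. If the consumer can accept a char[] or write it straight to a Writer, that copy can be skipped.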
Cheers,
matth (145) | 9/16/2003 7:34:44 PM

Niko wrote:
> Hi,
>
> We are finding that SimpleDateFormat is pretty slow, if your trying to
> use it to pass millions of records. We improved upon it by added some
> caches in the code, if things like the Month was the same and so on
> but in all we find it to be a hog.
>
> For example we can pass 20,000 records a second, if they don't contain
> dates in them but when you add dates this can drop to 4,000.
On what hardware?
>
> So does anyone know of a good class out there or before we go and
> build a faster one.
>
Are you using SimpleDateFormat correctly? You should not create a new
instance for each record. I get a throughput of about 150,000 calls to
format() per second using an array of one million random dates.
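For reference, reusing one instance could look roughly like this. `ReuseFormatter` is a hypothetical name and the UTC time zone is an assumption made so the output does not depend on the machine's default zone. SimpleDateFormat is not thread-safe, so a shared instance needs one formatter per thread or external synchronization.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch: one formatter and one Date object reused for every record. */
final class ReuseFormatter {
    static String[] formatAll(long[] timestamps, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ENGLISH); // created once
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC output
        Date scratch = new Date(0L);                  // reused via setTime
        String[] out = new String[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            scratch.setTime(timestamps[i]);
            out[i] = fmt.format(scratch);
        }
        return out;
    }
}
```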
> TIA
niceguy (1) | 9/16/2003 10:53:29 PM

Niko wrote:
> Thanks for the bigdate and the index lookup ideas, unfortunately I'm
> working with DateTimes, i.e 3rd Jun 1993 05:01:43.
I realise this probably won't help, but do you actually *have* to format all
the dates? If you can arrange to keep them in their initial (non-String) form
throughout, and only change them into strings when/if they are displayed to a
user, then you can avoid the overhead that way. That might well be difficult,
but not necessarily worse than messing around with faster parsing or complex
caching schemes.
-- chris
chris.uppal (3970) | 9/17/2003 8:27:40 AM

I create one single instance, but when we look at the profiler we see a
chunk of time spent in SimpleDateFormat. It may only be a few percent,
but when you are loading a file with 50 fields and maybe 8 dates
you really start to see the chunk grow. We spent a long time
optimizing other parts of the code, and even NIO showed no improvement
over our enhanced buffered IO (though we prefer NIO as it reduces the
amount of custom code), so it seems an awful shame to let
SimpleDateFormat get away without being optimized.
As for the source supplying pure dates, it sometimes can come like
that, but the code is part of a data loading tool which is configurable
for any data source that can come via Streams or Channels, and we only
format the date for display at the very end. It's the parsing that
takes the time, and creating a table with all known value sections
doesn't scare us too much, as memory is cheap and this type of software
runs on big boxes overnight.
Jim Sculley <niceguy@wisdomteeth.tlanta.com> wrote in message news:<bk849a0203i@enews3.newsguy.com>...
> Niko wrote:
nick_wakefield (5) | 9/17/2003 5:03:35 PM

On 16 Sep 2003 10:42:28 -0700, nick_wakefield@hotmail.com (Niko)
wrote:
>Thanks for the bigdate and the index lookup ideas, unfortunately I'm
>working with DateTimes, i.e 3rd Jun 1993 05:01:43. However I was
>thinking I could produce two hash tables, one for the time and one for
>the date, ignoring the year, split the string and lookup in both
>tables and adjust for year.
Why not break the date string into parts using StringTokenizer, then
evaluate each part and build the input for a Calendar object, then
evaluate on the Calendar object?
3rd -> value of the number, ignore the text
Jun -> lookup on 12 possibilities, fewer if you use progressive
lookup (i.e. check the first letter, if no match check the second, if
no match check the third)
1993 -> value
05:01:43 -> substring on the time
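The steps above might be sketched like this, assuming input of the exact form "3rd Jun 1993 05:01:43" with English month abbreviations and UTC times. `HandRolledParser` is a hypothetical name, and the month lookup is a plain linear scan over 12 names rather than the progressive letter-by-letter lookup.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.StringTokenizer;
import java.util.TimeZone;

/** Hypothetical sketch: tokenize the text by hand and let Calendar do the date arithmetic. */
final class HandRolledParser {
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    private static int monthIndex(String name) {
        for (int i = 0; i < MONTHS.length; i++) {
            if (MONTHS[i].equals(name)) return i; // 0-based, as Calendar expects
        }
        throw new IllegalArgumentException("Unknown month: " + name);
    }

    /** Parses text such as "3rd Jun 1993 05:01:43" into epoch milliseconds (UTC assumed). */
    static long parse(String text) {
        StringTokenizer st = new StringTokenizer(text, " :");
        String dayToken = st.nextToken();
        // Keep the leading digits of "3rd" or "21st" and ignore the two-letter suffix.
        int day = Integer.parseInt(dayToken.substring(0, dayToken.length() - 2));
        int month = monthIndex(st.nextToken());
        int year = Integer.parseInt(st.nextToken());
        int hour = Integer.parseInt(st.nextToken());
        int minute = Integer.parseInt(st.nextToken());
        int second = Integer.parseInt(st.nextToken());

        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear(); // zero every field, including milliseconds
        cal.set(year, month, day, hour, minute, second);
        return cal.getTimeInMillis();
    }
}
```

Creating a Calendar per call is itself not free, so in a hot loop the Calendar instance could be kept and reused in the same way as the formatter. Whether this beats SimpleDateFormat on a given workload would need to be measured.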
------------------------
Wojtek Bok
Solution Developer
su-news (43) | 9/18/2003 4:19:01 AM

10 Replies
37 Views