Veritas filesystem and directories with a large number of files



Folks,

We have directories that contain more than 1.8 million files. The
filesystems are vxfs and they are VERY slow. Is there a limit on how many
files a vxfs directory can hold before it takes a performance hit?


thanks,

keith
Reply codybear 8/10/2008 10:15:40 PM

In article <e00f6c34-51c7-4d1c-b843-4c2a7af51b97@m36g2000hse.googlegroups.com>,
	codybear <keithclay@gmail.com> writes:
> Folks,
> 
> We have directories that contain 1.8million+ files.  The filesystems
> are VERY slow and they are vxfs. Is there a limit on how many files
> can be in this a vxfs directory before it take a performance hit?
> 
> 
> thanks,
> 
> keith
Hi Keith,

From experience I'd say around 100,000. Our backups (via NetBackup) have problems if the file count is much higher.

Yours, Hans Schwengeler
Reply t7321 8/12/2008 5:40:14 AM


On Sun, 10 Aug 2008, keithclay@gmail.com wrote:

> Folks,
> 
> We have directories that contain 1.8million+ files.  The filesystems
> are VERY slow and they are vxfs. Is there a limit on how many files
> can be in this a vxfs directory before it take a performance hit?
> 
> 
> thanks,
> 
> keith

Those are some fairly large directories. :-)

What version of HP-UX are you running? And what version of VxFS?

-- 
Carl Davidson  (carl.davidson@hp.com)
Hewlett-Packard Company, Cupertino, CA 95014
You can't please all of the people any of the time.
Reply Carl 8/12/2008 5:27:03 PM

codybear wrote:
> We have directories that contain 1.8 million+ files.  The filesystems
> are VERY slow and they are vxfs. Is there a limit on how many files
> can be in this a vxfs directory before it take a performance hit?

The experts on the ITRC say you should not use the directory structure
as a database.  2 million is way too much.
By adding an extra directory level with at most 1000 entries each, you can
greatly reduce the number of files per directory.
Reply Dennis 8/13/2008 3:52:41 AM

codybear <keithclay@gmail.com> writes:

> Folks,
>
> We have directories that contain 1.8million+ files.  The filesystems
> are VERY slow and they are vxfs. Is there a limit on how many files
> can be in this a vxfs directory before it take a performance hit?

Hi!

Which layout version of VxFS do you have? Recent layouts are said to handle
many files in a directory better. The current layout for VxFS 5 is version 7.

Despite that, a hash-table-based directory lookup will suffer from hash
collisions.  Every hash table is designed for a certain number of table
entries and file name lengths. The more hash collisions you have, the more
lookups (CPU) are required to look up a name.

Also, it depends heavily on how you access a directory:
An "ll" will read all entries, get the attributes of each, and sort them,
while a "find" will only look up the names. The fastest access is to probe
directly for a single file, as in "test -f a_name".
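The three access patterns above can be sketched roughly as follows. This is
only an illustration in Python, not how "ll", "find" or "test" are actually
implemented; the function names and the make_sample_dir helper are
hypothetical.

```python
import os
import tempfile


def list_long(directory):
    """Like "ll": read every entry, stat each one, then sort by name."""
    entries = []
    for name in os.listdir(directory):
        info = os.stat(os.path.join(directory, name))
        entries.append((name, info.st_size))
    return sorted(entries)


def list_names(directory):
    """Names only: read the entries without sorting or an explicit stat."""
    with os.scandir(directory) as it:
        return [entry.name for entry in it]


def probe(directory, name):
    """Like "test -f a_name": check for one file without listing anything."""
    return os.path.isfile(os.path.join(directory, name))


def make_sample_dir(count):
    """Hypothetical helper: create a temporary directory holding `count` empty files."""
    directory = tempfile.mkdtemp()
    for i in range(count):
        with open(os.path.join(directory, "file_%d" % i), "w"):
            pass
    return directory
```

With millions of entries, list_long does the most work per call, while probe
only has to resolve one name.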

The general method to optimize lookups is to reduce the number of files in a
directory (to less than 100, I'd suggest). Instead of storing a file under a
flat name like "01234567890", distribute the files (assuming the names are
evenly distributed) in a structure like "01/23/45/67/89/0" (i.e. a 5-level
directory hierarchy with at most 100 entries each). You could also take the
original file name and compute a strong hash like MD5 or SHA-1 to determine
the directories. As those hashes are distributed quite evenly, you can pick
any "digits" for distributing the files.
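A minimal sketch of the digit-splitting idea, in Python. The function name
digit_path and its width parameter are made up for illustration; the
application would have to use the same mapping for both writing and reading.

```python
def digit_path(name, width=2):
    """Split a name into fixed-width chunks, one path component per chunk."""
    chunks = [name[i:i + width] for i in range(0, len(name), width)]
    return "/".join(chunks)


# The example name from above maps to a five-level hierarchy plus a leaf.
print(digit_path("01234567890"))
```

With width=2 and purely numeric names, no directory gets more than 100
entries.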

For example: If your eight files produce these MD5 hashes (fingerprints):
321d1b34ba06106ad8d15dbd0cff4252
8643682de5a441e36627fa810c7d5db2
8fa5e4afb2da5c0e3314c760210d3819
9c6b1d1b2bbd59b8eaaedbd1c1768a9f
9d02b629b97700478868c931969ae55a
9e35af0a9281d878f76cacfaea63ee75
a345b8cfba5c00ead9cd1c731783645e
d0b678f97e61987a160d737092a5e3cc

You could use the first four characters to put your files into
3/2/1/d/
8/6/4/3/
8/f/a/5/
9/c/6/b/
9/d/0/2/
9/e/3/5/
a/3/4/5/
d/0/b/6/

That is 16 entries per directory-level, 65536 "buckets" altogether. If you use
two characters (like 32/1d/1b/34/) you'll have 256 entries per directory
(about four billion buckets). With just three two-character levels you'd have
16 million "buckets".
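The bucket layout and the bucket counts above can be sketched like this in
Python. The function names and the levels/chars parameters are hypothetical;
only hashlib.md5 is a real library call.

```python
import hashlib


def bucket_from_digest(digest, levels=4, chars=1):
    """Turn the leading hex characters of a digest into a directory path."""
    prefix = digest[:levels * chars]
    parts = [prefix[i:i + chars] for i in range(0, len(prefix), chars)]
    return "/".join(parts)


def bucket_for_name(name, levels=4, chars=1):
    """Hash a file name with MD5 and derive its bucket directory."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return bucket_from_digest(digest, levels, chars)


def bucket_count(levels, chars):
    """Each hex character has 16 possible values."""
    return 16 ** (levels * chars)


# First fingerprint from the list above, one and two characters per level.
print(bucket_from_digest("321d1b34ba06106ad8d15dbd0cff4252"))
print(bucket_from_digest("321d1b34ba06106ad8d15dbd0cff4252", chars=2))
print(bucket_count(4, 1), bucket_count(4, 2), bucket_count(3, 2))
```

Which variant fits depends on the total file count: the aim is to keep both
the number of subdirectories per level and the number of files per bucket
small.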

So if you have control over the application that creates and accesses those
files, you could easily implement that. Alternatively, you should consider
using a lightweight database like Sleepycat's Berkeley DB. I'd recommend the
latter, because with separate files (assuming they are rather short) the
backup software still has to enumerate them all before deciding whether to
save them or not.
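As a rough sketch of the database alternative: the example below uses
Python's standard dbm module as a stand-in for a key/value store of that
kind. That substitution is an assumption made for illustration, and the
function names are hypothetical. Each former file becomes one record keyed by
its old file name.

```python
import dbm
import os
import tempfile


def store_records(db_path, records):
    """Write each (name, payload) pair as one record in a single database."""
    with dbm.open(db_path, "c") as db:
        for name, payload in records.items():
            db[name.encode("utf-8")] = payload


def fetch_record(db_path, name):
    """Look up one record by its former file name; None if it is absent."""
    with dbm.open(db_path, "r") as db:
        return db.get(name.encode("utf-8"))


def demo():
    """Store two short records in a temporary database and read one back."""
    db_path = os.path.join(tempfile.mkdtemp(), "records")
    store_records(db_path, {"01234567890": b"short payload",
                            "01234567891": b"another one"})
    return fetch_record(db_path, "01234567890")
```

The backup then sees a handful of database files instead of millions of
directory entries.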

Not an HP specialist ;-)

Regards,
Ulrich
Reply Ulrich 8/18/2008 9:09:36 AM
