If interested see previous thread: http://tinyurl.com/245w9k7
I'm waking this thread up again as I never received any conclusive resolution
to the problem of a client's system suffering from Streams issues.
The latest round occurred Dec 27 2010, when the primary system locked up. The
on-site administrator went to the computer room, was unable to log in
on the system console, and so powered the system off and then back on. He then
checked the backup server and saw that it was scrolling out-of-disk-space
messages for the root file system. He power cycled the backup server, but
it came up still out of root disk space.
Unix2 is a hot backup system that is not used (or monitored) until the live server
fails. Application and application-data files are copied nightly from the
primary server to the backup server using a find command piped to cpio
piped to rcmd unix2 cpio. So it is very disappointing that the backup server
was "down" (out of root disk space) when the live system locked up on December 27.
The client called me and I used ssh to connect to the primary system and then from there
used ssh to connect to the backup system. Unlike telnet, ssh was able to connect
me to the backup system even though there was no space on the root file system.
I managed to delete known large files in the /tmp directory until I had 9M of free space
in the root file system and was then able to issue commands and get a response
(while the root was at zero space, commands failed to execute and I had to delete
the files in /tmp blind, without getting back a "$" prompt, until the free space was
sufficient).
Investigating the reason for the lack of root disk space turned up
the nearly 1G /var/adm/syslog. I copied it to a file system with sufficient
space, zeroed /var/adm/syslog and then rebooted the system.
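A minimal sketch of the copy-then-zero step described above, assuming a POSIX shell. The function name archive_and_zero is hypothetical. Truncating the file in place (rather than removing it) keeps the same file for a logging daemon that still has it open; whether syslogd on a given release copes with that is an assumption to verify.

```shell
#!/bin/sh
# archive_and_zero SRC DESTDIR
# Copy SRC into DESTDIR, verify the copy, then truncate SRC in place.
# (Hypothetical helper; not taken from the original post.)
archive_and_zero() {
    src=$1
    destdir=$2
    copy="$destdir/$(basename "$src")"

    cp "$src" "$copy" || return 1
    # Refuse to truncate unless the copy is byte-for-byte identical.
    cmp -s "$src" "$copy" || return 1
    # ":" is the shell no-op; the redirection empties the file.
    : > "$src"
}

# Example, using the paths from the post:
#   archive_and_zero /var/adm/syslog /util/syslog
```

The cmp check means a log that is still being appended to may fail verification, in which case nothing is truncated.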
Now look at the old syslog in /util:
# cd /util/syslog
# ls -lt
total 2075788
-rw-r--r-- 1 root root 1058649630 Dec 27 15:43 syslog
# wc -l syslog
11,146,638 syslog
-rw-r--r-- 1 root root 1058638848 Dec 27 08:10 syslog
# split -500000 syslog
I investigated and deleted xab through xav and distilled xaa and xaw
to a compressed syslog file.
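One way such a log could be distilled, as a sketch only: strip the timestamp and host name from each line and count how often each distinct message occurs. This assumes the classic "Mon DD HH:MM:SS host message" layout seen in the excerpts; the function name summarize_syslog is hypothetical.

```shell
#!/bin/sh
# summarize_syslog FILE...
# Print "count message" for every distinct message, most frequent first.
# Assumes the first four whitespace-separated fields are month, day,
# time and host name, as in the syslog excerpts in this thread.
summarize_syslog() {
    awk '{
        msg = $0
        # Drop "Mon DD HH:MM:SS host " from the front of the line.
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ */, "", msg)
        count[msg]++
    }
    END {
        for (m in count)
            printf "%8d %s\n", count[m], m
    }' "$@" | sort -rn
}

# Example, using the split pieces from the post:
#   summarize_syslog xaa xaw
```

Lines garbled by the "Error log buffer overflow" condition show up as their own low-count entries rather than being merged with the intact messages.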
It turns out that on Dec 13, syslog started getting the message:
Dec 13 07:32:59 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
Dec 13 07:32:59 unix2 WARNING: strd - Cannot grow STREAMS message header table
Dec 13 08:15:36 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
Dec 13 08:15:36 unix2 WARNING: strd - Cannot grow STREAMS message header table
By Dec 14 at 00:59:03 these messages had grown syslog to nearly 1G (the Dec 14 log
entries are at the beginning of xaw). This is consistent with SCO MP5 (and MP3
and MP4), which states:
> * A system hang was fixed. It was caused by strd looping and
> trying to allocate memory for message headers when the mblock
> table was full. fz527661 / erg712281
Thank you SCO. So now the system does not hang but just fills up the syslog
until you run out of disk space.
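A sketch of a size check that a cron job on the unmonitored backup server could run, so that a runaway syslog is noticed before the root file system fills. It assumes a POSIX shell; the function names and the 50 MB limit in the example are hypothetical.

```shell
#!/bin/sh
# file_kb FILE: print the size of FILE in KB (rounded down), 0 if missing.
file_kb() {
    if [ -f "$1" ]; then
        bytes=$(wc -c < "$1")
        echo $((bytes / 1024))
    else
        echo 0
    fi
}

# log_too_big FILE LIMIT_KB: succeed when FILE is larger than LIMIT_KB.
log_too_big() {
    [ "$(file_kb "$1")" -gt "$2" ]
}

# Example cron-driven use (the 51200 KB limit is an arbitrary choice):
#   log_too_big /var/adm/syslog 51200 &&
#       echo "syslog is large on unix2" | mail -s "syslog size warning" monitor
```

Reading the file with wc is slow on a very large log; it is used here only because it avoids parsing ls output.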
Dec 14 00:59:03 unix2 WARNING: strd - Cannot grow STREAMS message header table
Dec 14 00:59:03 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
Dec 14 00:59:03 unix2 WARNING: strd - Cannot grow STREAMS message header table
Dec 14 00:59:03 unix2 WARNING: table_grow - mblock table page limit of 500 pagWARNING: err: Error log buffer overflow
Dec 14 00:59:03 unix2 ge header table
Dec 14 00:59:03 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
Dec 14 00:59:03 unix2 WARNING: strd - Cannot grow STREAMS message header table
....
146566 lines deleted from xaw essentially the same as above
....
Dec 27 08:08:30 unix2 syslogd: restart
Dec 27 08:08:30 unix2 sco_pmd[58]: PMD started - PID 59
Dec 27 08:08:32 unix2 SCO OpenServer(TM) Release 5
Dec 27 08:08:32 unix2
Dec 27 08:08:32 unix2 (C) 1976-2003 Caldera International, Inc. and its suppliers.
Dec 27 08:08:32 unix2 All rights reserved.
Dec 27 08:08:32 unix2
Dec 27 08:08:32 unix2 For complete copyright credits,
Dec 27 08:08:32 unix2 enter "copyrights" at the command prompt.
Dec 27 08:08:32 unix2
Dec 27 08:08:32 unix2 device address vec dma comment
Dec 27 08:08:32 unix2 -------------------------------------------------------------------------------
Dec 27 08:08:32 unix2 %kernel - - - rel=3.2v5.0.7 kid=2003-02-18
Dec 27 08:08:32 unix2 %cpu - - - unit=1 family=6 type=gt PentIII
Dec 27 08:08:32 unix2 %cpuid - - - unit=1 vend=GenuineIntel tfms=0:6:15:11(0)
Dec 27 08:08:32 unix2 %fpu - 13 - unit=1 type=80387-compatible
Dec 27 08:08:32 unix2 %pci 0x0CF8-0x0CFF - - am=1 sc=0 buses=6
Dec 27 08:08:32 unix2 %PnP - - - nodes=0
Dec 27 08:08:32 unix2 %clock - - - type=TSC/2.133409657Ghz
Dec 27 08:08:32 unix2 %serial 0x03F8-0x03FF 4 - unit=0 type=Standard nports=1 base=0 16550A/16
Dec 27 08:08:32 unix2 %serial 0x02F8-0x02FF 3 - unit=1 type=Standard nports=1 base=8 16550A/16
Dec 27 08:08:32 unix2 %console - - - unit=vga type=0 num=12 scoansi=1 scroll=50
Dec 27 08:08:32 unix2 %floppy 0x03F2-0x03F7 6 2 unit=0 type=135ds18
Dec 27 08:08:32 unix2 %kbmouse 0x0060-0x0064 12 - type=Keyboard|PS/2 mouseid=0xFF
Dec 27 08:08:32 unix2 %udi - - - UDI environment
Dec 27 08:08:32 unix2 %adapter - - - ha=0 type=usb_msto UDI SCSI HBA
Dec 27 08:08:32 unix2 %adapter 0x01F0-0x01F7 14 - type=IDE ctlr=0 dvr=wd
Dec 27 08:08:32 unix2 %adapter - 10 - type=aacraid ha=0 driver=B7348
Dec 27 08:08:32 unix2 %bcme0 - 10 - chip=BCM5721 mem=FF7F0000addr=00:1e:8c:d5:66:06
Dec 27 08:08:32 unix2 NOTICE: bcme0: Firmware version 5721-v3.58
Dec 27 08:08:32 unix2 %cd-rom - - - type=IDE ctlr=0 cfg=mst unit=0 dvr=Srom->wd
Dec 27 08:08:32 unix2 %disk - - - type=S ha=0 id=0 lun=0 bus=0 ht=aacraid unit=0
Dec 27 08:08:32 unix2 %Sdsk - - - cyls=8922 hds=255 secs=63 unit=0 fts=sdb
Dec 27 08:08:32 unix2 %Sdsk-0 - - - Vnd=Adaptec Prd=ASR-2130S Mirro Rev=0001
Dec 27 08:08:32 unix2 %usb_ehci - 17 - PCI bus=0 dev=29 func=7
Dec 27 08:08:32 unix2 %usb_uhci - 17 - PCI bus=0 dev=29 func=0
Dec 27 08:08:32 unix2 %usb_uhci - 18 - PCI bus=0 dev=29 func=1
Dec 27 08:08:32 unix2 NOTICE: bcme0 (slot:0 port:1): Link is down
Dec 27 08:08:32 unix2 NOTICE: bcme0 (slot:0 port:1): Link is up (1000Mbps, Full Duplex)
Dec 27 08:08:32 unix2 mem: total = 2096600k, kernel = 367904k, user = 1728696k
Dec 27 08:08:32 unix2 swapdev = 1/41, swplo = 0, nswap = 1258292, swapmem = 629144k
Dec 27 08:08:32 unix2 Autoboot from rootdev = 1/42, pipedev = 1/42, dumpdev = 1/41
Dec 27 08:08:32 unix2 kernel: Hz = 100, i/o bufs = 300000k (high bufs = 298976k)
Dec 27 08:08:32 unix2
Dec 27 08:08:32 unix2 %cpu - 255 - unit=2 family=6 type=gt PentIII
Dec 27 08:08:32 unix2 %cpuid - - - unit=2 vend=GenuineIntel tfms=0:6:15:11(0)
Dec 27 08:08:32 unix2 %fpu - - - unit=2 type=80387-compatible
Dec 27 08:08:32 unix2 %cpu - 255 - unit=3 family=6 type=gt PentIII
Dec 27 08:08:32 unix2 %cpuid - - - unit=3 vend=GenuineIntel tfms=0:6:15:11(0)
Dec 27 08:08:32 unix2 %fpu - - - unit=3 type=80387-compatible
Dec 27 08:08:32 unix2 %cpu - 255 - unit=4 family=6 type=gt PentIII
Dec 27 08:08:32 unix2 %cpuid - - - unit=4 vend=GenuineIntel tfms=0:6:15:11(0)
Dec 27 08:08:32 unix2 %fpu - - - unit=4 type=80387-compatible
Dec 27 08:08:32 unix2 prngd[92]: prngd 0.9.29 (12 Jul 2004) started up for user root
Dec 27 08:08:32 unix2 prngd[92]: have 7 out of 110 filedescriptors open
Dec 27 08:08:37 unix2 NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:08:42 unix2 CPU3: NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:08:58 unix2 last message repeated 3 times
Dec 27 08:08:58 unix2 NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:09:03 unix2 last message repeated 2 times
Dec 27 08:09:08 unix2 CPU3: NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:09:33 unix2 last message repeated 5 times
Dec 27 08:09:38 unix2 NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:09:43 unix2 CPU2: NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:09:48 unix2 CPU3: NOTICE: HTFS: No space on dev hd (1/42)
Dec 27 08:10:28 unix2 last message repeated 6 times
(END of /util/syslog/syslog)
Sometime after the nightly mirroring of data from the live server to
the backup server completed, the backup server began getting the streams
error messages, which grew the syslog file and resulted in the out of
disk space condition. From the e-mail log of the mirror job:
> Mirror unix started: Mon Dec 13 03:00:01 CST 2010
> 5571878 blocks
> Mirror unix complete: Mon Dec 13 03:02:34 CST 2010
>
> Mirror unix started: Tue Dec 14 03:00:01 CST 2010
> unix2: Connection timed out
> Mirror unix complete: Tue Dec 14 03:03:04 CST 2010
From my nstr.log file:
Mon Dec 13 02:50:00 CST 2010 streams memory in use: 10302.71KB
Mon Dec 13 02:55:00 CST 2010 streams memory in use: 10302.71KB
Mon Dec 13 03:00:00 CST 2010 streams memory in use: 10302.71KB
Mon Dec 13 03:05:00 CST 2010 streams memory in use: 10302.24KB
Mon Dec 13 03:10:00 CST 2010 streams memory in use: 10302.24KB
Mon Dec 13 03:15:00 CST 2010 streams memory in use: 10302.24KB
....
Mon Dec 13 08:10:00 CST 2010 streams memory in use: 10317.60KB
Mon Dec 13 08:15:00 CST 2010 streams memory in use: 10318.55KB
Mon Dec 13 08:20:00 CST 2010 streams memory in use: 10318.55KB
Mon Dec 13 09:00:45 CST 2010 streams memory in use: 10332.39KB
Mon Dec 13 09:36:26 CST 2010 streams memory in use: 10332.39KB
....
Mon Dec 13 20:35:24 CST 2010 streams memory in use: 10332.39KB
Mon Dec 13 22:56:57 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 01:21:18 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 01:45:28 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 02:06:10 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 02:30:29 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 03:11:38 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 03:40:46 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 03:56:16 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 04:16:37 CST 2010 streams memory in use: 10332.39KTue Dec 14 18:10:40 CST 2010
streams memory in use: 10332.39KB
Tue Dec 14 18:40:14 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 19:11:12 CST 2010 streams memory in use: 10332.39KB
Tue Dec 14 19:26:20 CST 2010 streams memory in use: 10332.39KB
Oddly, the cron job logging to /usr/adm/nstr.log continued
to function (albeit badly: it should log every 5 minutes) until
Dec 15, when the log stops; it resumes after the reboot on Dec 27:
Wed Dec 15 04:10:19 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 06:25:03 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 07:26:19 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 07:40:49 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 07:55:45 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 08:10:02 CST 2010 streams memory in use: 10332.39KB
Wed Dec 15 08:40Mon Dec 27 15:25:00 CST 2010 streams memory in use: 1772.66KB
Mon Dec 27 15:30:00 CST 2010 streams memory in use: 1773.00KB
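To see at a glance how far the figure moved, the nstr.log entries above can be reduced to a sample count with the lowest and highest value. This is a sketch that assumes every complete entry ends in a number followed by "KB"; truncated entries (such as the ones cut off when the disk filled) are skipped. The function name nstr_range is hypothetical.

```shell
#!/bin/sh
# nstr_range FILE...
# Summarise "streams memory in use: N.NNKB" lines as count, min and max.
nstr_range() {
    awk '/streams memory in use:/ {
        v = $NF
        # Skip entries whose last field is not a complete "N.NNKB" value.
        if (v !~ /^[0-9.]+KB$/) next
        sub(/KB$/, "", v)
        v += 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END {
        if (n) printf "samples=%d min=%.2fKB max=%.2fKB\n", n, min, max
    }' "$@"
}

# Example:
#   nstr_range /usr/adm/nstr.log
```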
-rw------- 1 adm adm 252 Dec 12 04:50 /etc/wtmp46
-rw------- 1 adm adm 0 Dec 12 04:50 /etc/wtmpx46
-rw------- 1 adm adm 252 Dec 5 04:50 /etc/wtmp45
-rw------- 1 adm adm 0 Dec 5 04:50 /etc/wtmpx45
-rw------- 1 adm adm 252 Nov 28 04:50 /etc/wtmp44
-rw------- 1 adm adm 0 Nov 28 04:50 /etc/wtmpx44
-rw------- 1 adm adm 252 Nov 21 04:50 /etc/wtmp43
-rw------- 1 adm adm 0 Nov 21 04:50 /etc/wtmpx43
-rw------- 1 adm adm 252 Nov 14 04:50 /etc/wtmp42
-rw------- 1 adm adm 0 Nov 14 04:50 /etc/wtmpx42
-rw------- 1 adm adm 252 Nov 7 04:50 /etc/wtmp41
-rw------- 1 adm adm 0 Nov 7 04:50 /etc/wtmpx41
-rw------- 1 adm adm 252 Oct 31 04:50 /etc/wtmp40
-rw------- 1 adm adm 0 Oct 31 04:50 /etc/wtmpx40
-rw------- 1 adm adm 324 Oct 24 04:50 /etc/wtmp39
-rw------- 1 adm adm 736 Oct 24 04:50 /etc/wtmpx39
-rw------- 1 adm adm 252 Oct 17 04:50 /etc/wtmp38
-rw------- 1 adm adm 0 Oct 17 04:50 /etc/wtmpx38
-rw------- 1 adm adm 252 Oct 10 04:50 /etc/wtmp37
-rw------- 1 adm adm 0 Oct 10 04:50 /etc/wtmpx37
-rw------- 1 adm adm 468 Oct 3 04:50 /etc/wtmp36
-rw------- 1 adm adm 2208 Oct 3 04:50 /etc/wtmpx36
-rw------- 1 adm adm 396 Sep 26 04:50 /etc/wtmp35
# last -w /etc/wtmp46
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp45
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp44
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp43
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp42
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp41
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp40
User Line Device PID Login time Elapsed Time Comments
# last -w /etc/wtmp39
User Line Device PID Login time Elapsed Time Comments
lainie p0 ttyp0 11273 Fri Oct 22 10:31 00:04
# last -w /etc/wtmp36
User Line Device PID Login time Elapsed Time Comments
root p1 ttyp1 1741 Fri Oct 1 00:38 00:03
smf p0 ttyp0 1596 Fri Oct 1 00:18 00:23
smf p0 ttyp0 1306 Thu Sep 30 23:56 00:16
Searching SCO's database for the error message:
"table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages"
finds no TA covering this situation.
> TABLE_GROW found in 5 articles.
> MBLOCK found in 3 articles.
> TABLE found in 101 articles.
> PAGE found in 198 articles.
> LIMIT found in 43 articles.
> 500 found in 144 articles.
> PAGES found in 136 articles.
> (NSTRPAGES) found in 5 articles.
> EXCEEDED found in 16 articles.
> 1 found in 775 articles.
> PAGES found in 136 articles.
> 0 Articles found for search: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
So to sum it up: we have a system that is idle except for the nightly
backup from the live server (and a Backup Edge backup to DVD-RAM prior
to the nightly mirror), and it has a stream leak that results in the
streams memory in use creeping up over time, reaching 10332.39KB (at the
last log point before the root filled up) since it was last booted Sept 7 2010.
And SCO TA 116684 provides no direct hints at what's wrong.
Only two sections of TA 116684 seem applicable:
> 5. External network hardware misbehaving:
>
> Every network packet that is destined for the machine will be
> processed by the network card and driver and streams resources
> will be allocated. It is therefore possible that some other
> machine and/or piece of network hardware can misbehave and
> cause a depletion of streams resources on the local system. You
> can use a network sniffer and/or switch diagnostics to monitor
> the traffic on your local network and locate the offending
> machine.
>
> 6. Extremely high network traffic:
>
> Occasionally, on very highly traveled networks, there are too
> many packets to be processed at interrupt time. Consider the
> following:
>
> - Tuning str_pool_size and mblk_pool_size to see if it
> remedies the situation. You can start by doubling the pool
> sizes (relink and reboot). If the problem persists and you
> are sure that there is not some other problem, try tuning
> these up another 50%. Another thing to consider here is
> that there may be too much of a burden on your network
> infrastructure.
>
> - Consider upgrading your network hardware to support higher
> transfer rates or spreading the load across other servers
> and networks.
I have changed the str_pool_size and mblk_pool_size as suggested above to see
if that has any effect on the problem.
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
smfabac (423)
12/30/2010 7:39:34 PM
Steve M. Fabac, Jr. typed (on Thu, Dec 30, 2010 at 01:39:34PM -0600):
| fails. Application and application-data files are copied nightly from the
| primary server to the backup server using a find command piped to cpio
| piped to rcmd unix2 cpio. So it is very disappointing that the backup server
Steve, I have no brilliant idea about your streams failures, but is that
find-to-cpio-to-rsh-cpio pipeline really effective, compared to rsync,
which I daresay would be faster?
--
JP
jpr5879 (1158)
12/30/2010 8:02:22 PM
Jean-Pierre Radley wrote:
> Steve M. Fabac, Jr. typed (on Thu, Dec 30, 2010 at 01:39:34PM -0600):
> | fails. Application and application-data files are copied nightly from the
> | primary server to the backup server using a find command piped to cpio
> | piped to rcmd unix2 cpio. So it is very disappointing that the backup server
>
>
> Steve, I have no brilliant idea about your streams failures, but is that
> find-to-cpio-to-rsh-cpio pipeline really effective, compared to rsync,
> which I daresay would be faster?
>
>
Actually:
find /usr1 /usr2 /usr3 /tmpwork -mtime -2 -print | cpio -oca | rcmd unix2 cpio -icvmu
JP,
I'm looking into rsync and had installed it on both systems in July 2010
and did some initial setup but continued to use the find-cpio-rcmd-cpio
pipeline for the nightly mirror at 03:00. (There is a tentative plan to
move the backup server to an off-site location where rsync would reduce the
VPN traffic.)
Since the backup system stopped receiving the backup on Dec 14, and the
find is run with mtime -2, I thought to use the rsync based script I
developed to bring the backup system up to date.
I wanted to make sure that the rsync was doing a faithful copy of the
source files so I used the dry-run switch "-n" to generate a list
of target files. From that list I created a script that would run
ls -l and sum -r on the target files logging to /tmp/sum.out then
run the script on both the sending system and the receiving system.
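A sketch of what such a verification script could look like, assuming a POSIX shell and a file holding one path per line (for example, taken from the rsync dry-run output). It records only the sum -r checksum and block count for each file, not the ls -l output; the function name checkit echoes the script name used later in this post, but the body is an illustration, not the original script.

```shell
#!/bin/sh
# checkit LIST
# For every regular file named in LIST (one path per line), print the
# "sum -r" checksum and block count followed by the path. Running this
# on both systems and comparing the two outputs shows whether the
# copies match.
checkit() {
    list=$1
    while IFS= read -r f; do
        # Skip directories and names that do not exist on this system.
        [ -f "$f" ] || continue
        # Reading from stdin keeps the file name out of sum's own output.
        echo "$(sum -r < "$f") $f"
    done < "$list"
}

# Example:
#   checkit /tmp/filelist > /tmp/sum.out
```

Comparing the two sum.out files with diff then reduces the check to a single exit status.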
Doing the testing ad hoc, I thought I was careful to preserve the
sum.out log files generated on each system before and after the
rsync based script, but when I look at the log files today, I don't
see the problem I thought I spotted on 12/27.
So I carefully generated a new checkit script today, ran it on
both systems before and after the rsync copy script and compared
the log file from the primary system (source system) to the backup
system (target system) and they are identical indicating that
rsync successfully copied the new/changed files from the source
system to the backup system.
As for speed, the log file from 12/29 (find-cpio-rcmd-cpio based)
shows:
Mirror unix started: Wed Dec 29 03:00:01 CST 2010
8952128 blocks < Approx 4.48 GByte >
Mirror unix complete: Wed Dec 29 03:04:30 CST 2010
And the log for the rsync based script executed manually:
rsync from unix to unix2::root started: Thu Dec 30 22:50:21 CST 2010
Rsync from unix to unix2::root complete: Thu Dec 30 22:56:31 CST 2010
So the rsync based script (6 min 10 s) is slower than the find-cpio-rcmd-cpio
based script (4 min 29 s) over a 1G link.
/usr1
Thu Dec 30 22:50:21 CST 2010
sending incremental file list
Number of files: 6390
Number of files transferred: 397
Total file size: 3389913927 bytes
Total transferred file size: 596233758 bytes
Literal data: 140854644 bytes
Matched data: 455379114 bytes
File list size: 119637
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 26537122
Total bytes received: 491736
sent 26537122 bytes received 491736 bytes 478386.87 bytes/sec
total size is 3389913927 speedup is 125.42
/usr2
Thu Dec 30 22:51:17 CST 2010
sending incremental file list
Number of files: 1143
Number of files transferred: 117
Total file size: 6642430636 bytes
Total transferred file size: 3936816496 bytes
Literal data: 30264412 bytes
Matched data: 3906552084 bytes
File list size: 22502
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 9036221
Total bytes received: 1961597
sent 9036221 bytes received 1961597 bytes 41267.61 bytes/sec
total size is 6642430636 speedup is 603.98
/usr3
Thu Dec 30 22:55:43 CST 2010
sending incremental file list
Number of files: 9637
Number of files transferred: 0
Total file size: 1393347385 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 176411
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 176495
Total bytes received: 79
sent 176495 bytes received 79 bytes 70629.60 bytes/sec
total size is 1393347385 speedup is 7891.01
/tmpwork
Thu Dec 30 22:55:45 CST 2010
sending incremental file list
Number of files: 1985
Number of files transferred: 427
Total file size: 320033048 bytes
Total transferred file size: 117667640 bytes
Literal data: 94778725 bytes
Matched data: 22889783 bytes
File list size: 41059
File list generation time: 0.005 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 10626439
Total bytes received: 517434
sent 10626439 bytes received 517434 bytes 239653.18 bytes/sec
total size is 320033048 speedup is 28.72
Rsync from unix to unix2::root complete: Thu Dec 30 22:56:31 CST 2010
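The two runs can be put side by side with a little arithmetic. The sketch below uses only figures from the logs above and assumes that cpio reports 512-byte blocks; the variable and function names are hypothetical.

```shell
#!/bin/sh
# hms_to_sec HH:MM:SS: print seconds since midnight.
hms_to_sec() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed START END: seconds between two times on the same day.
elapsed() {
    echo $(( $(hms_to_sec "$2") - $(hms_to_sec "$1") ))
}

# Start and end times from the two mirror logs.
cpio_secs=$(elapsed 03:00:01 03:04:30)
rsync_secs=$(elapsed 22:50:21 22:56:31)

# Assumption: the cpio "blocks" figure counts 512-byte blocks.
cpio_bytes=$((8952128 * 512))

# "Total bytes sent" for /usr1, /usr2, /usr3 and /tmpwork.
rsync_wire_bytes=$((26537122 + 9036221 + 176495 + 10626439))

echo "cpio:  $cpio_bytes bytes in $cpio_secs s =" \
     "$((cpio_bytes / cpio_secs / 1024)) KB/s"
echo "rsync: $rsync_wire_bytes bytes sent in $rsync_secs s =" \
     "$((rsync_wire_bytes / rsync_secs / 1024)) KB/s"
```

If these figures hold, the rsync run put far fewer bytes on the wire yet took longer, which suggests the time went into reading, checksumming and compressing files rather than into the network.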
Here's the rsync part of the cp_to_backup script. Perhaps you can
suggest appropriate rsync options that might speed it up:
#Rsync commands
# NOTE change to -avzn is for dryrun
/usr/local/bin/rsync -avz --stats --delete --inplace \
/usr1 ${local_backup}::root >> /tmp/rsync_log.$$
echo "\n" >> /tmp/rsync_log.$$
date >> /tmp/rsync_log.$$
/usr/local/bin/rsync -avz --stats --delete --inplace \
/usr2 ${local_backup}::root >> /tmp/rsync_log.$$
echo "\n" >> /tmp/rsync_log.$$
date >> /tmp/rsync_log.$$
/usr/local/bin/rsync -avz --stats --delete --inplace \
/usr3 ${local_backup}::root >> /tmp/rsync_log.$$
date >> /tmp/rsync_log.$$
/usr/local/bin/rsync -avz --stats --delete --inplace \
/tmpwork ${local_backup}::root >> /tmp/rsync_log.$$
echo "\n" >> /tmp/rsync_log.$$
echo "Rsync from ${host} to ${local_backup}::root complete: \c" >> /tmp/rsync_log.$$
touch /usr1/nfailupd.ran
date >> /tmp/rsync_log.$$
mail -s "copy ${host} to ${local_backup}::root status" monitor < /tmp/rsync_log.$$
sleep 10
mv /tmp/rsync_log.$$ /tmp/rsync_log.last
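One variation that may be worth timing on a fast local link, offered as a sketch rather than a known fix: drop -z (compression costs CPU and saves little on a 1G LAN) and add -W (--whole-file), which skips rsync's delta algorithm. Both are standard rsync options; whether they help here is an assumption to test. The function name mirror_dirs and the RSYNC override variable are hypothetical.

```shell
#!/bin/sh
# Path to rsync; can be overridden from the environment.
RSYNC=${RSYNC:-/usr/local/bin/rsync}

# mirror_dirs DEST LOG DIR...
# Run one rsync per directory, logging the directory name, the start
# time and the rsync statistics to LOG.
mirror_dirs() {
    dest=$1
    log=$2
    shift 2
    for dir in "$@"; do
        echo "$dir" >> "$log"
        date >> "$log"
        # -W: copy whole files instead of computing deltas.
        # No -z: leave the data uncompressed on a fast LAN.
        "$RSYNC" -avW --stats --delete --inplace "$dir" "$dest" >> "$log" 2>&1
        echo >> "$log"
    done
}

# Example, with the names used in the script above:
#   mirror_dirs "${local_backup}::root" /tmp/rsync_log.$$ \
#       /usr1 /usr2 /usr3 /tmpwork
```

For the planned off-site move over a VPN the trade-off reverses, so the original -z and delta-transfer settings would probably be the better choice there.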
--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
smfabac (423)
12/31/2010 8:08:56 AM
Steve M. Fabac, Jr. wrote:
> Unix2 is a hot backup system that is not used (or monitored) until
> the live server fails. Application and application-data files are
> copied nightly from the primary server to the backup server using a
> find command piped to cpio piped to rcmd unix2 cpio. So it is very
> disappointing that the backup server was "down" (out of root disk
> space) when the live system locked up on December 27.
Why don't you use rsync? What happens in the backup server, over a long
period of time, when files are deleted in the source server -- are they
sometime later deleted too in the backup server, or will the backup
server's filesystem just fill up?
I think your scheme is OK if you want to have in the backup server a
backup of deleted files in the source server. However, if you want the
backup server just to mirror the source server, I think you should use
rsync.
You are running OSR 5.0.7 so the tool is there for you.
> It turns out that on Dec 13, syslog started getting the message:
>
> Dec 13 07:32:59 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
> Dec 13 07:32:59 unix2 WARNING: strd - Cannot grow STREAMS message header table
> Dec 13 08:15:36 unix2 WARNING: table_grow - mblock table page limit of 500 pages (NSTRPAGES) exceeded by 1 pages
> Dec 13 08:15:36 unix2 WARNING: strd - Cannot grow STREAMS message header table
>
> And by Dec 14 at 00:59:03 had grown syslog to nearly 1G (Dec 14 log
> entries at the beginning of xaw). This is consistent with SCO MP5
> (and MP3 and MP4) as it states:
>
>> * A system hang was fixed. It was caused by strd looping and trying
>> to allocate memory for message headers when the mblock table was
>> full. fz527661 / erg712281
>
>
> Thank you SCO. So now the system does not hang but just fills up the
> syslog until you run out of disk space.
(...)
> So to sum it up. We have a system that is idle except for the nightly
> backup from the live server (and a Backup Edge backup to DVD-RAM
> prior to the nightly mirror), and has a stream leak that results in
> the streams memory in use creeping up over time reaching 10332.39KB
> (at last log point before the root filled up) since last booted Sept
> 7 2010. And the SCO TA 116684 provides no direct hints at what's
> wrong.
Well, 10 MB of RAM for a kernel structure is not much in today's world.
Are you short of RAM in the server? Why don't you just up the kernel
tunables related to STREAMS? Heavy duty servers need a config to match
such heavy duty, don't you think?
In reviewing your server's stune file, which you posted before, I see
that my testing machine, with a mere Pentium-III and 512 MB of RAM, has
larger values configured for several kernel parameters:
EVDEVS 192 -> 224
EVQUEUES 184 -> 216
NSTREAM 8192 -> 10240
TTHOG 5120 -> 8192
NCLIST 712 -> 1512
NSTREVENT 8448 -> 16512
NUMTIM 1040 -> 2064
NUMTRW 1040 -> 2064
(the values after the arrow are the ones on my Pentium-III).
Also, have you read this documentation?:
http://localhost:8457/man/html.M/messages.M.html
http://localhost:8457/en/PERFORM/streams_rsc.html
http://localhost:8457/en/PERFORM/streams_tuning.html
pepe5 (204)
12/31/2010 12:11:17 PM
On Fri, Dec 31, 2010, Pepe wrote:
>Steve M. Fabac, Jr. wrote:
>> Unix2 is a hot backup system that is not used (or monitored) until
>> the live server fails. Application and application-data files are
>> copied nightly from the primary server to the backup server using a
>> find command piped to cpio piped to rcmd unix2 cpio. So it is very
>> disappointing that the backup server was "down" (out of root disk
>> space) when the live system locked up on December 27.
>
>Why don't you use rsync? What happens in the backup server, over a long
>period of time, when files are deleted in the source server -- are they
>sometime later deleted too in the backup server, or will the backup
>server's filesystem just fill up?
That's totally up to you. We add the --delete option to rsync
periodically to remove these files, leaving them on the backup
system (hopefully) for a long enough period of time for somebody
to notice that an accidentally deleted file has gone missing.
Rsync also has a filter option, -F, which looks for a .rsync-filter
file in each directory containing patterns of files which should
not be backed up. This is used with the --delete-excluded option
which deletes any filtered files on the target machine.
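As a sketch of how those two options combine on a command line, assuming a POSIX shell: -F tells rsync to read a .rsync-filter file in each directory it visits, and --delete-excluded removes the filtered files from the target. The function name, the RSYNC override variable and the host and module names in the example are hypothetical.

```shell
#!/bin/sh
# Path to rsync; can be overridden from the environment.
RSYNC=${RSYNC:-rsync}

# mirror_filtered SRC DEST
# Archive-mode copy that honours per-directory .rsync-filter files (-F)
# and removes filtered files from the target (--delete-excluded).
mirror_filtered() {
    "$RSYNC" -a -F --delete-excluded "$1" "$2"
}

# Example (host and module names are made up):
#   mirror_filtered /var/ backuphost::var
```

Adding -n for a dry run first shows what would be deleted before anything is touched.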
I'm attaching one from the var directory of an OpenPKG instance on
this machine as an example.
We also back up each mounted file system to its own directory on
the backup server. The '/' file system is saved as 'rootfs' to
prevent overlap and others as 'home', etc. This makes it much
easier to restore a lost file system than having to sort out its
data from a mirror backup.
Back in the day when SCO's tar didn't back up non-file
information (e.g. directories, devices, etc.) we would make a
cpio archive of all non-files to a file which could then be used
to recreate all these before doing an actual restore. I forget
the cpio options used, but this gives the general idea:
find / -xdev ! -type f | cpio ...
>I think your scheme is OK if you want to have in the backup server a
>backup of deleted files in the source server. However, if you want the
>backup server just to mirror the source server, I think you should use
>rsync.
There's a difference between a backup server, and an archive server.
Apple's Time Machine does archival backups for simple systems,
and I'm sure there are others that are more appropriate for
more complex systems.
It's still a Good Idea(tm) to back up to external devices,
rotating them and keeping them off-site. Saving the external
devices permanently at month-end and year-end is also a good idea
as this enables one to get files back that go missing, but aren't
accessed on a regular basis so may have disappeared from a mirror
backup system over time (e.g. yearly tax reporting scripts).
Bill
--
INTERNET: bill@celestial.com Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
Voice: (206) 236-1676 Mercer Island, WA 98040-0820
Fax: (206) 232-9186 Skype: jwccsllc (206) 855-5792
The demands of the majority are always greater than taxation
alone can provide and thats where the FED comes in. The value of
the dollar has depreciated 97% since the creation of the FED.
Attachment: .rsync-filter
- *.pid
- *.lock
- *.sock
- csadmin/*.lock
- postgresql/run/.s.PGSQL*
- deliver/tmp/dl.*
- findutils/tmp/*
- amavisd/tmp/*
# sockets
- sasl/saslauthd/mux
- mysql/mysql.sock
- whoson/run/whoson.s
- whoson/run/whoson.d
- postfix/private/tlsmgr
- postfix/private/rewrite
- postfix/private/bounce
- postfix/private/defer
- postfix/private/trace
- postfix/private/verify
- postfix/private/proxymap
- postfix/private/smtp
- postfix/private/relay
- postfix/private/error
- postfix/private/local
- postfix/private/virtual
- postfix/private/lmtp
- postfix/private/anvil
- postfix/private/scache
- postfix/private/smtp-amavis
- postfix/public/flush
- postfix/public/showq
- postfix/public/pre-cleanup
- postfix/public/cleanup
- postgresql/run/.s.PGSQL.5432
- clamav/clamd.sock
- courier-authlib/spool/authdaemon/socket
- zope/var/zopectlsock
# FIFOs
- postfix/public/pickup
- postfix/public/qmgr
- hylafax/FIFO
bill5504 (468)
12/31/2010 7:09:41 PM