Odd BACKUP error: unsupported file structure !



Alpha VMS 8.3, the source disk is ODS5

backup/image/ignore=interlock disk1: lda2:/init

%BACKUP-E-OPENIN, error opening $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
as input
-SYSTEM-W-FILESTRUCT, unsupported file structure level



$ dir/full $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1

Directory $11$DQB0:[ALLIN1.LIB_SHARE]

BALLCAP.EPS;1                 File ID:  (58136,1,0)
Size:           12/18         Owner:    [ALLIN1]
Created:    30-MAR-1996 18:20:43.55
Revised:    30-MAR-1996 21:29:14.43 (6)
Expires:    10-JUL-2022 00:00:00.00
Backup:     <No backup recorded>
Effective:  <None specified>
Recording:  <None specified>
Accessed:   <None specified>
Attributes: <None specified>
Modified:   <None specified>
Linkcount:  1
File organization:  Sequential
Shelved state:      Online
Caching attribute:  Writethrough
File attributes:    Allocation: 18, Extend: 0, Global buffer count: 0
                    No version limit
Record format:      Stream_CR, maximum 0 bytes, longest 96 bytes
Record attributes:  Carriage return carriage control
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RE, World:RE
Access Cntrl List:  None
Client attributes:  None

Total of 1 file, 12/18 blocks.


The other *.EPS files in that directory are also Stream_CR and didn't
generate any error. ANA/RMS on the file uncovered no errors.

I do not trust the disk, which is why I am doing an /IMAGE backup to a
temporary disk, with the goal of reinitializing this disk and repopulating it.
But if the backup is unreliable, I may have to think twice about it.

INDEXF.SYS had grown to over 170,000 blocks (I had done a backup of a
Mac onto it, and it created a gazillion files which took forever to delete).

Knowing what causes this error message might help understanding the
health level of the disk and the trustability of the backup I am making.

BACKUP is being extremely slow and also slows down disk accesses in
another DCL process  despite CPU being at 0 and the disk IOs being
generally below 1 operation per second (sometimes peaks to about 25-30
for a short time)



Also, Backup is only using a working set of about 1200 pages. Isn't
backup supposed to use over 10k pages to make itself more efficient ? Or
was that only on VAX ?


Reply jfmezei.spamnot (8832) 11/10/2011 9:06:53 AM

On Nov 10, 10:06 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Alpha VMS 8.3, the source disk is ODS5
>
> backup/image/ignore=interlock disk1: lda2:/init
>
> %BACKUP-E-OPENIN, error opening $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
> as input
> -SYSTEM-W-FILESTRUCT, unsupported file structure level
>
>
>
> $ dir/full $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1

Try a
$ dump/header/block=count=0 $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
and see what it dumps for "Structure level and version:".

> The other *.EPS files in that directory are also Stream_CR and didn't
> generate any error. ANA/RMS on the file uncovered no errors.

It's in the header. But it is surprising that backup can't open the file
while ANA/RMS can.
I would rather try ana/disk/repair.
Reply becker.avd (47) 11/10/2011 12:18:36 PM


hb wrote:

>> backup/image/ignore=interlock disk1: lda2:/init
>>
>> %BACKUP-E-OPENIN, error opening $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
>> as input
>> -SYSTEM-W-FILESTRUCT, unsupported file structure level
>
> 
> Try a $ dump/header/block=count=0 $11$DQB0:
> [ALLIN1.LIB_SHARE]BALLCAP.EPS;1
> and see what it dumps for "Structure level and version:".
> 

Dump of file $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1 on 10-NOV-2011
10:52:36.10
File ID (58136,1,0)   End of file block 12 / Allocated 18

                             File Header
Header area
    Identification area offset:           40
    Map area offset:                      100
    Access control area offset:           255
    Reserved area offset:                 255
    Extension segment number:             0
    Structure level and version:          5, 1
    File identification:                  (58136,1,0)
    Extension file identification:        (0,0,0)
    VAX-11 RMS attributes
        Record type:                      CR-terminated stream
        File organization:                Sequential
        Record attributes:                Implied carriage control
        Record size:                      96
        Highest block:                    18
        End of file block:                12
        End of file byte:                 504
        Bucket size:                      0
        Fixed control area size:          0
        Maximum record size:              0
        Default extension size:           0
        Global buffer count:              0
        Directory version limit:          0
    File characteristics:                 <none specified>
    Caching attribute:                    Writethrough
    Map area words in use:                3
    Access mode:                          0


If I type the file, the contents appear to be intact.   So far, only 2
files have generated that message. Both created when the Alpha version
of ALLIN1 was installed.

I can't run ana/disk/repair now because backup is still running.
Something definitely askew, but my priority is to get the disk copied to
another one first.  I know that ana/disk/repair spots a whole bunch of
errors about some bit not  properly set (likely due to file not being
properly closed when system is shutdown, but I had never seen those before).

Reply jfmezei.spamnot (8832) 11/10/2011 4:04:33 PM

On Nov 10, 5:06 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Alpha VMS 8.3, the source disk is ODS5
>
> backup/image/ignore=interlock disk1: lda2:/init
>
[...]

> BACKUP is being extremely slow and also slows down disk accesses in
> another DCL process  despite CPU being at 0 and the disk IOs being
> generally below 1 operation per second (sometimes peaks to about 25-30
> for a short time)
>
> Also, Backup is only using a working set of about 1200 pages. Isn't
> backup supposed to use over 10k pages to make itself more efficient ? Or
> was that only on VAX ?

FWIW:

When I used to make tape backups of disk on MicroVAX 31xx's, I found
that WSQUOTA should be 1024, IIRC. Making it bigger actually slowed
things down.

AEF
Reply spamsink2001 (3065) 11/10/2011 6:45:49 PM

On Nov 10, 9:06 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Alpha VMS 8.3, the source disk is ODS5
>
> backup/image/ignore=interlock disk1: lda2:/init
>
> %BACKUP-E-OPENIN, error opening $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
> as input
> -SYSTEM-W-FILESTRUCT, unsupported file structure level
>
> $ dir/full $11$DQB0:[ALLIN1.LIB_SHARE]BALLCAP.EPS;1
>
> Directory $11$DQB0:[ALLIN1.LIB_SHARE]
>
> BALLCAP.EPS;1                 File ID:  (58136,1,0)
> Size:           12/18         Owner:    [ALLIN1]
> Created:    30-MAR-1996 18:20:43.55
> Revised:    30-MAR-1996 21:29:14.43 (6)
> Expires:    10-JUL-2022 00:00:00.00
> Backup:     <No backup recorded>
> Effective:  <None specified>
> Recording:  <None specified>
> Accessed:   <None specified>
> Attributes: <None specified>
> Modified:   <None specified>
> Linkcount:  1
> File organization:  Sequential
> Shelved state:      Online
> Caching attribute:  Writethrough
> File attributes:    Allocation: 18, Extend: 0, Global buffer count: 0
>                     No version limit
> Record format:      Stream_CR, maximum 0 bytes, longest 96 bytes
> Record attributes:  Carriage return carriage control
> RMS attributes:     None
> Journaling enabled: None
> File protection:    System:RWED, Owner:RWED, Group:RE, World:RE
> Access Cntrl List:  None
> Client attributes:  None
>
> Total of 1 file, 12/18 blocks.
>
> The other *.EPS files in that directory are also Stream_CR and didn't
> generate any error. ANA/RMS on the file uncovered no errors.
>
> I do not trust the disk (which is why I am doing an /IMAGE to a
> temporary disk with goal of reinitialzing this disk and repopulating it.
> But if the backup is unreliable, I may have to think twice about it.
>
> INDEXF.SYS had grown to over 170,000 blocks (I had done a backup of a
> mac on it it it created a gazillion files which took forever to delete.
>
> Knowing what causes this error message might help understanding the
> health level of the disk and the trustability of the backup I am making.
>
> BACKUP is being extremely slow and also slows down disk accesses in
> another DCL process  despite CPU being at 0 and the disk IOs being
> generally below 1 operation per second (sometimes peaks to about 25-30
> for a short time)
>
> Also, Backup is only using a working set of about 1200 pages. Isn't
> backup supposed to use over 10k pages to make itself more efficient ? Or
> was that only on VAX ?

Does BACKUP /IMAGE still have built-in intimate knowledge of the on-
disk structures? How current is that knowledge?

It is documented that MOUNT will produce the FILESTRUCT error message
on certain V7 versions of VMS if the index file has been extended
beyond certain limits. It's already mentioned here that INDEXF.SYS has
been unusually large. Obviously we're not looking at V7 here, but has
BACKUP /IMAGE kept up to date with the changes to MOUNT?

The file ID 58136,1,0 indicates, iirc, that this is indexfile slot
number 58136 or thereabouts, which doesn't immediately sound
excessive, but...
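
As a rough illustration of the "slot number" remark: the first field of a displayed file ID is the file number, which on disk is split into a 16-bit FID_NUM and an 8-bit extension FID_NMX (the two field names that ANALYZE/DISK mentions elsewhere in this thread). The sketch below is only a minimal model of that split; the function names are made up for illustration and are not part of any VMS tool.

```python
# Minimal sketch: split a displayed file number into the FID_NUM / FID_NMX
# fields and join them back. Function names are hypothetical.

def split_file_number(file_number):
    """Return (fid_num, fid_nmx) for a 24-bit file number."""
    if not 0 < file_number < (1 << 24):
        raise ValueError("file number must be non-zero and fit in 24 bits")
    fid_num = file_number & 0xFFFF   # low 16 bits
    fid_nmx = file_number >> 16      # high 8 bits (extension)
    return fid_num, fid_nmx


def join_file_number(fid_num, fid_nmx):
    """Rebuild the file number from its two on-disk fields."""
    return (fid_nmx << 16) | fid_num


if __name__ == "__main__":
    # File ID (58136,1,0) from the original post: small enough that
    # the extension field is not needed.
    print(split_file_number(58136))
```

For the file in question the extension field comes out as zero, which fits the observation that 58136 is not an especially high slot.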

Copy the file in question to another file in another directory,
hopefully with a file header much closer to the start of INDEXF.SYS.
Then re-try the backup, and see if the error message recurs either on
the original file or the new copy?

hth
john
Reply johnwallace43 (187) 11/10/2011 7:15:04 PM

On Nov 10, 5:04 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

>     Structure level and version:          5, 1

Looks good to me. The source disk is ODS5 so almost all its files have
the 5, 1 in their header. From the error message one would have
expected a different level than 2 or 5.
Reply becker.avd (47) 11/10/2011 8:21:22 PM

JF Mezei <jfmezei.spamnot@vaxination.ca> writes:

>I do not trust the disk (which is why I am doing an /IMAGE to a
>temporary disk with goal of reinitialzing this disk and repopulating it.
>But if the backup is unreliable, I may have to think twice about it.

I, too, recently had an IDE disk which I wanted to copy data from.
Unfortunately for me, this disk was in the process of dying. There were
many errors and, worse, many of them were in INDEXF.SYS. These corrupted
headers would often produce unsupported file structure errors when
accessed.

It was also slow as hell, MONITOR would consistently show it with a queue
length of 149.9 while the output drive was something like .05 on average.

Reply moroney (973) 11/10/2011 9:33:42 PM

A disk to saveset image back on the same system turned out to have the
same low speed (about 45KB/s according to Backup). So I decided to end
it (it would have taken about 21 hours to complete and I have electrical
work to do requiring the system be shutdown).

So I decided to do an ana/disk/repair to get the strange error messages
that don't seem to go away:

-ANALDISK-I-BAD_FIDNUM, unexpected FID_NUM or FID_NMX field
-ANALDISK-I-INVHEADER_BUSY, invalid file header marked "busy"
        in index file bitmap
%ANALDISK-W-BADHEADER, file (4479,639,0)
        invalid file header


Any hints on what this really means ? Interesting that backup only found
a couple of "invalid structure level files", but ana/disk finds a few
hundred of the above messages. It does not seem to repair them.

My hope is that with a reboot (and a volume rebuild that takes 45
minutes), I'll be able to do a proper backup and then reinitialise the
disk with a smaller indexf.sys and perhaps it will work fine.

What is interesting is that performance when running ALLIN1 did not seem
impacted at all. But Backup is slower than a TK50 on an almighty
MicroVAX II.
Reply jfmezei.spamnot (8832) 11/11/2011 2:46:00 AM

JF Mezei <jfmezei.spamnot@vaxination.ca> writes:

>So I decided to do an ana/disk/repair to get the strange error messages
>that don't seem to go away:

>-ANALDISK-I-BAD_FIDNUM, unexpected FID_NUM or FID_NMX field
>-ANALDISK-I-INVHEADER_BUSY, invalid file header marked "busy"
>        in index file bitmap
>%ANALDISK-W-BADHEADER, file (4479,639,0)
>        invalid file header


>Any hints on what this really means ? Interesting that backup only found
>a couple of "invalid structure level files", but ana/disk finds a few
>hundred of the above messages. It does not seem to repair them.

I got several of them as well.  Do you get read errors?  I got them
from corrupted file headers, bad blocks from within INDEXF.SYS

>My hope is that with a reboot (and a volume rebuild that takes 45
>minutes), I'll be able to do a proper backup and then reinitialise the
>disk with a smaller indexf.sys and perhaps it will work fine.

Good luck!  When I saw how slow it was going I figured that maybe a shadow
set copy to a spare drive would be faster.  Mounted the drive as a single
member shadowset, added a spare (SCSI) drive, and wham! The system 
crashed in the shadow driver, first time that this system crashed, ever.
(if power was more stable around here, I'd have uptime of a few years)

Then I tried BACKUP/IMAGE to the spare drive.  Got all the way through to
the end (about a day+) and it failed at the very end, and the resulting
drive won't mount.  The next attempt was a BACKUP/PHYSICAL to the spare
drive, which worked (except for all the errors) although it took nearly 3
days.  I see they changed BACKUP/PHYSICAL to only give a warning if the
dest. drive is not the same size (but is larger) than the source.  I was
ready to hack the UCB to force the copy by making the dest drive the same
size as the source.

I have yet to do an autopsy on the resulting mess to see which files I
lost.  At least I have another copy of most of what I want.
I'll look at that crash dump and see why the shadow driver tossed its
cookies someday, too.
Reply moroney (973) 11/11/2011 4:16:10 AM

An update:

I booted minimum, and backup/image was just as slow.

However, removing /image made backup much faster (about 130KB/s listed
in a <ctrl-T>, compared to between 45 and 50 with /image).

I suspect my large indexf.sys is too much for backup to handle.

So, what am I missing when not doing /image ?

I take it that during a restore, the disk will not be initialised, and
boot blocks not set. (and alias processing would be different but this
disk  doesn't have any). Anything else that will be missing from a
backup without /image ?
Reply jfmezei.spamnot (8832) 11/11/2011 7:02:08 AM

Turns out that the /NOIMAGE backup just appeared to be faster because I
kept the source to just the device name so it only copied files in the
current directory of that device.

Have now begun a /IMAGE again. Now it is telling me about 8 hours to
completion instead of 12 to 15 before. I have opened the window to cool
that room down as much as possible. I was thinking of perhaps shutting it
down for a few hours to let it cool. But I am afraid the disk might not
start up again.


That message about invalid file structure level has appeared once so far, but
on a different file, and it did not appear for the files that generated it
the last time around. That is scary.

Do I have any assurance that Backup will get the right data off the disk ?


Reply jfmezei.spamnot (8832) 11/11/2011 9:56:27 AM

On Nov 11, 3:46 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> So I decided to do an ana/disk/repair to get the strange error messages
> that don't seem to go away:
>
> -ANALDISK-I-BAD_FIDNUM, unexpected FID_NUM or FID_NMX field
> -ANALDISK-I-INVHEADER_BUSY, invalid file header marked "busy"
>         in index file bitmap
> %ANALDISK-W-BADHEADER, file (4479,639,0)
>         invalid file header

Looks like a different file; in the OP the FID was (58136,1,0), so this looks
like a different problem.
The secondary message BAD_FIDNUM is for a different file as well. Were
there other secondary messages, and any messages for FID (58136,1,0)?

If the file structure is invalid analyze/disk should give you:

%ANALDISK-W-BADHEADER, file (16,3,0)
        invalid file header
-ANALDISK-I-BAD_STRUC_LEVEL, STRUCLEV field is bad
-ANALDISK-I-INVHEADER_BUSY, invalid file header marked "busy"
        in index file bitmap

You may see subsequent messages like
%ANALDISK-W-ALLOCCLR, blocks incorrectly marked allocated
        LBN 45 to 51, RVN 1
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry
        [000000]JUNK.DAT;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory
%ANALDISK-W-FREESPADRIFT, free block count of 48 is incorrect (RVN 1);
        the correct value is 55

Yes, the manipulated file header is for JUNK.DAT;1 which has FID
(16,3,0).

On the other hand, dump/header is not designed to show you anything in
case of
an invalid file structure:
$ dump/header lda16:[000000]junk.dat/bl=co=0
%DUMP-E-OPENIN, error opening LDA16:[000000]JUNK.DAT;1 as input
-RMS-E-ACC, ACP file access failed
-SYSTEM-W-FILESTRUCT, unsupported file structure level
$
and
$ dump/header lda16:/id=16
%DUMP-E-BADHEADER, File ID references invalid or non-ODS-2 file header
$

The SYSTEM-W-FILESTRUCT secondary error message looks like the one you
had with backup/image. It looks like backup tried to read the file
header - probably the same way as dump/header <file_spec> - and it
failed.

> Any hints on what this really means ? Interesting that backup only found
> a couple of "invalid structure level files", but ana/disk finds a few
> hundred of the above messages. It does not seem to repair them.

Aha, the BAD_STRUC_LEVEL for the (58136,1,0) may be in the few
hundreds. But in another post you showed that dump/header worked on
that file. So there is very likely a different problem - maybe with
the disk.

Obviously anal/disk does not repair such errors. It assumes that the
file header is corrupt. It doesn't seem to try whether the correct
structure level would make it a valid header. I don't know much about
DFU, maybe it can do that. But I'm not yet convinced that "corrupt
headers" are the real problem.

> My hope is that with a reboot (and a volume rebuild that takes 45
> minutes), I'll be able to do a proper backup and then reinitialise the
> disk with a smaller indexf.sys and perhaps it will work fine.

I don't understand what you are going to do. If the headers are
corrupt in INDEXF.SYS, a reboot and a rebuild will not change anything.
Reply becker.avd (47) 11/11/2011 10:00:44 AM

On Nov 11, 9:56 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Turns out that the /NOIMAGE backup just appeared to be faster because I
> kept the source to just the device name so it only copied files in the
> current directory of that device.
>
> Have now begun a /IMAGE again. Now it is telling me about 8 hours to
> completion instead of 12 to 15 before. I have opened the window to cool
> that room down as much as possible. I was thinking of perhps sutting it
> down for a few hours to let it cool. But am affraid the disk might not
> start up again.
>
> That message about invalid file struture level appears once so far, but
> on a different file and did not appear for the files that generated it
> the last time around. That is scary.
>
> Do I have any assurance that Backup will get the right data off the disk ?

If you want the data off the disk and you have lost confidence in
either the disk hardware or the file structure it contains, have you
considered doing a block for block copy to something else (a container
file or a whole other drive) using your favourite block-by-block copy
tool (BACKUP /PHYSICAL? Maybe something other than BACKUP if you're
worrying about BACKUP doing the right thing quickly enough?)

OK this will copy empty space as well as occupied space but if you're
really worried about filesystem integrity that may be a GOOD thing to
do (there may be useful recoverable data in that "empty" space).

Before taking this backup I'd also want to check the error log, check
that IDE cables were properly seated, and perhaps even replace the IDE
cable in question with a "known good" one from elsewhere.

Then you hopefully have a safe (ish) copy of the data to work on.

Have fun.
Reply johnwallace43 (187) 11/11/2011 11:11:03 AM

John Wallace wrote:
>
> If you want the data off the disk and you have lost confidence in
> either the disk hardware or the file structure it contains, have you
> considered doing a block for block copy to something else 

Initially, I thought it was just the file structure. I don't experience
problems using the system, but know that ana/disk/repair generated
errors. So I decided to backup the disk and restore it to make it
"clean". It is only in doing so that I have come to realise that there
may be something wrong with the disk.

I had originally tried to send the backup to my xserve via NFS, but it
turns out that NFS has file size limitations that are well below the 10
gigs I need.

backup/physical would require 27 gigs of storage which I don't have on
the alpha.  I guess I could create about 15 2gig container files on the
xserve, mount them with LD, and then create a bound volume set that
would create a 30 gig drive allowing me to create one big saveset from a
physical backup.
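
The container count can be checked with a little arithmetic. The sketch below uses the "Total blocks" value (58633344) and the 512-byte block size from the SHOW DEVICE output posted later in this thread. Whether "2 gig" means 2 GiB or 2 decimal GB, and whether the NFS limit allows a file of exactly that size, are assumptions here, so both readings are shown.

```python
# Rough sizing sketch for a BACKUP/PHYSICAL target built from fixed-size
# container files. The container sizes are assumptions, not measured limits.

BLOCK_SIZE = 512            # bytes per disk block
TOTAL_BLOCKS = 58_633_344   # "Total blocks" reported for $11$DQB0:


def containers_needed(total_blocks, container_bytes, block_size=BLOCK_SIZE):
    """Number of containers required to hold every block (rounded up)."""
    total_bytes = total_blocks * block_size
    return -(-total_bytes // container_bytes)   # integer ceiling division


if __name__ == "__main__":
    total_bytes = TOTAL_BLOCKS * BLOCK_SIZE
    print("disk size: %.2f GiB" % (total_bytes / 1024**3))
    print("2 GiB containers:", containers_needed(TOTAL_BLOCKS, 2 * 1024**3))
    print("2 GB containers: ", containers_needed(TOTAL_BLOCKS, 2 * 1000**3))
```

Either way the estimate of "about 15" containers is in the right range; any overhead for the bound volume set and the saveset itself is not modelled.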

After the backup/image is done, I may do a backup/physical to NLA0: just
to see how fast it runs.


> tool (BACKUP /PHYSICAL? Maybe something other than BACKUP if you're
> worrying about BACKUP doing the right thing quickly enough?)

I would have no worry about backup/physical. My concern is that backup
uses many tricks to access the file system and is exhibiting problems
reading this drive that I didn't experience in daily use. I am really
hoping this is just the overly obese INDEXF.SYS that causes it. (I've
deleted 10s of thousands of files that were a backup of an older mac
done via NFS).

> Before taking this backup I'd also want to check the error log,

If the devices all show 0 errors, would there be stuff in the error log?

Do IDE drives even report errors to VMS ? This tech was designed for
running DOS on wintel.



> that IDE cables were properly seated, and perhaps even replace the IDE
> cable in question with a "known good" one from elsewhere.

Well, for now, I will let the backup complete. Once I am reasonably sure
that the saveset is wholesome, then I can open up the alpha and perform
tests.

Another possibility is to reduce the alpha to a single drive system.
Move the backup save set to a bound volume made up of 2 gig containers
on the xserve, and then use backup to restore the files (/NOIMAGE) to
the system drive.
Reply jfmezei.spamnot (8832) 11/11/2011 5:45:28 PM

My backup is now at 35% complete !  Only one file got the originl warning.


If/when it completes,  is there a magic incantatio of BACKUP that would
let me verify the save set against the disk files  ?


The system is bootem "minimum" drive was mounted /NOWRITE , so a
verification should prove to be an exact atch even of backup took days
to complete.
Reply jfmezei.spamnot (8832) 11/12/2011 4:04:00 AM

On 2011-11-11, JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
> My backup is now at 35% complete !  Only one file got the originl warning.
>
> If/when it completes,  is there a magic incantatio of BACKUP that would
> let me verify the save set against the disk files  ?
>

Why isn't $ BACKUP/COMPARE suitable ?

That would seem to be the appropriate command to use.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply clubley (1186) 11/12/2011 10:45:21 AM

On 11/11/2011 11:04 PM, JF Mezei wrote:
> My backup is now at 35% complete !  Only one file got the originl warning.
>
>
> If/when it completes,  is there a magic incantatio of BACKUP that would
> let me verify the save set against the disk files  ?
>
>
> The system is bootem "minimum" drive was mounted /NOWRITE , so a
> verification should prove to be an exact atch even of backup took days
> to complete.

What have you been drinking??   "bootem"???, "atch"???

Reply rgilbert88 (4360) 11/12/2011 2:41:25 PM

Simon Clubley wrote:

> Why isn't $ BACKUP/COMPARE suitable ?


Thanks. With my system having been essentially down for days now while
doing a backup of the drive, I haven't had access to normal HELP and had
never used /COMPARE before. I asked my buddy Mr Google and indeed, this is
exactly what I will need if/when this backup completes. (I am now at 60%
of the 9.8 gigs to back up.)


BTW, why can't one do a @TCPIP$STARTUP when booted minimum ? Is this
just some arbitrary decision, or are there specific drivers that are
needed which are not loaded in a minimum boot ?

Is it just a question of undefining some logical name that indicates
this was a minimum boot, after which one can start TCPIP ?

(It would be nice to have telnet access into the system while that
backup is monopolizing OPA0: (serial port) for days. It is too late
for this backup pass, but if backup/compare ends up taking just as long,
it would be nice to enable telnet access so that other things can be done
while the compare spends days.)
Reply jfmezei.spamnot (8832) 11/12/2011 6:58:29 PM

The plot thickens...

Backup now shows it is at 100% done but it is still chugging along, and
it emitted the following:

> %BACKUP-E-OPENIN, error opening $11$DQB0:[WWW_WAP.WEATHER]radar_pickup.log;79 as input
> -SYSTEM-W-NOSUCHFILE, no such file
> %BACKUP-E-OPENIN, error opening $11$DQB0:[]radar_pickup.log;79 as input
> -SYSTEM-W-NOSUCHFILE, no such file
> %BACKUP-E-OPENIN, error opening $11$DQB0:[]$11$DQB0:[]radar_pickup.log;79�
> 
> 
> 
> DOC$S_WWW.DIR��DEVELOPMENT.DIR[�                                          ��CREATE_DISK5.DIR;�  DECUS.DIR<�
> -SYSTEM-W-FILESTRUCT, unsupported file structure level
> %BACKUP-E-OPENIN, error opening $11$DQB0:[]ALPHA_FORT080.ZIP;1 as input
> -SYSTEM-W-FILESTRUCT, unsupported file structure level
> %BACKUP-E-OPENIN, error opening $11$DQB0:[]_CACHE_003_.;1 as input
> -SYSTEM-W-FILESTRUCT, unsupported file structure level

Does Backup/IMAGE traverse the directory structure, or does it simply
backup all files in INDEXF.SYS ?

Since this disk is mounted /NOWRITE and the system is booted "MIN", the
radar_pickup.log;79 couldn't have been deleted during the days it has
taken to do this backup.

Notice how initially, it finds the .log;79 in its proper directory, but
complains about "no such file".

Then, it finds .log;79 in the [] directory, and complains it isn't
there.  And finally, it complains that a file with plenty of garbage
attached after the .log;79 has unsupported file structure (implying it
has found the file and parsed the header).

This just in: Backup finally finished !!!!

And the device shows 0 errors.


> Disk $11$DQB0: (CHAIN), device type Maxtor 53073H6, is online, allocated,
>     deallocate on dismount, mounted, software write-locked, file-oriented
>     device, shareable, served to cluster via MSCP Server, error logging is
>     enabled.
> 
>     Error count                    0    Operations completed             423300
>     Owner process           "SYSTEM"    Owner UIC                      [SYSTEM]
>     Owner process ID        20200110    Dev Prot            S:RWPL,O:RWPL,G:R,W
>     Reference count                2    Default buffer size                 512
>     Total blocks            58633344    Sectors per track                    87
>     Total cylinders             7659    Tracks per cylinder                  88
>     Logical Volume Size     58633344    Expansion Size Limit           58662912
>     Allocation class              11
> 
>     Volume label             "MAVIC"    Relative volume number                0
>     Cluster size                   6    Transaction count                     1
>     Free blocks             38037456    Maximum files allowed           4188096
>     Extend quantity               18    Mount count                           1
>     Mount status             Process    Cache name         "_$11$DQA0:XQPCACHE"
>     Extent cache size             64    Maximum blocks in extent cache  3803745
>     File ID cache size            64    Blocks in extent cache                0
>     Quota cache size               0    Maximum buffers in FCP cache       1124
>     Volume owner UIC        [SYSTEM]    Vol Prot    S:RWCD,O:RWCD,G:RWCD,W:RWCD
> 
>   Volume Status:  ODS-5, subject to mount verification, file high-water marking,
>       write-back caching enabled, hard links enabled.


Looks like I have a lot of meat in this, with plenty of stuff to
investigate. I'll have to brush off that tool that lets you look at
INDEXF.SYS.

BACKUP/COMPARE is finding errors in specific blocks. But if I type the
file, it appears fine.
Reply jfmezei.spamnot (8832) 11/14/2011 12:03:53 AM

Am starting to suspect BACKUP having problems with very large INDEXF files.

I restored one file from the saveset to my sys$login, and then compared
it with that on the suspect disk.

$ diff
dqb0:[000000.ALLIN1.CBI_BRITISH]CBIQFIRST.SCP;1 sys$login:cbiqfirst.scp
Number of difference sections found: 0
Number of difference records found: 0

DIFFERENCES /IGNORE=()/MERGED=1-
    DQB0:[000000.ALLIN1.CBI_BRITISH]CBIQFIRST.SCP;1-
    SYS$SYSROOT:[SYSMGR]CBIQFIRST.SCP;1


yet...

When I first ran BACKUP/COMPARE , it issued verification warnings for
blocks 5 onwards of that file. And now, when I run backup/compare, it
issues warnings for different files but not the above file.


Right now, I am thinking of using Steven Schweda's ZIP utility to zip
each directory separately with the output on the NFS drive, and if
satisfied, I will do an INIT of that disk and restore from the ZIP archives.


The fact that backup tells me 2 files are different when diff says they
are not and when TYPE shows a perfect text file gets me worried.

This is on Alpha VMS 8.3. It is a shame that people like Guy Peleg no
longer work for the owner of VMS, since such real engineers might be
interested in finding out why BACKUP shows errors where there don't
appear to be any.
Reply jfmezei.spamnot (8832) 11/14/2011 12:42:27 AM

On Nov 13, 7:03 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> The plot thickens...
>
> Backup now shows it is at 100% done but it is still chugging along, and
> it emitted the follwoing:
>
> > %BACKUP-E-OPENIN, error opening $11$DQB0:[WWW_WAP.WEATHER]radar_pickup.log;79 as input
> > -SYSTEM-W-NOSUCHFILE, no such file
> > %BACKUP-E-OPENIN, error opening $11$DQB0:[]radar_pickup.log;79 as input
> > -SYSTEM-W-NOSUCHFILE, no such file
> > %BACKUP-E-OPENIN, error opening $11$DQB0:[]$11$DQB0:[]radar_pickup.log;79°
>
> > DOC$S_WWW.DIRûÿDEVELOPMENT.DIR[ÿ                                          ¸ÿCREATE_DISK5.DIR;ÿ  DECUS.DIR<ÿ
> > -SYSTEM-W-FILESTRUCT, unsupported file structure level
> > %BACKUP-E-OPENIN, error opening $11$DQB0:[]ALPHA_FORT080.ZIP;1 as input
> > -SYSTEM-W-FILESTRUCT, unsupported file structure level
> > %BACKUP-E-OPENIN, error opening $11$DQB0:[]_CACHE_003_.;1 as input
> > -SYSTEM-W-FILESTRUCT, unsupported file structure level
>
> Does Backup/IMAGE traverse the directory structure, or does it simply
> backup all files in INDEXF.SYS ?

Unless something this basic has changed since V6.2:

Both. First a bitmap is made of all the file headers in INDEXF.SYS.
Then BACKUP walks the directory structure, marking each file header in
the bitmap as it goes along. Any remaining files in the bitmap are
backed up under [].
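
The two passes described above can be modelled in a few lines. This is only a sketch of the description in this post, not BACKUP's actual implementation; the data structures (a set of file IDs standing in for the bitmap, a dict standing in for the directory tree) and all names are made up for illustration.

```python
# Toy model of the two-pass scheme described above:
#   pass 1: note every valid file header found in INDEXF.SYS
#   pass 2: walk the directories, ticking off each header as it is saved
#   then:   anything still un-ticked is saved under "[]"

def image_backup_order(index_headers, directory_tree):
    """
    index_headers : iterable of file IDs that have a header in the index file
    directory_tree: dict of directory name -> list of (file name, file ID)
    Returns (saved, errors), each a list of (directory, name, file ID).
    """
    valid = set(index_headers)
    pending = set(valid)            # plays the role of the bitmap
    saved, errors = [], []

    for directory, entries in directory_tree.items():
        for name, fid in entries:
            if fid not in valid:
                # Directory entry with no matching header: "no such file".
                errors.append((directory, name, fid))
            elif fid in pending:
                pending.discard(fid)
                saved.append((directory, name, fid))
            # else: second directory entry for a file already saved (alias)

    # Headers never reached through any directory are saved under [].
    for fid in sorted(pending):
        saved.append(("[]", None, fid))
    return saved, errors


if __name__ == "__main__":
    headers = {10, 11, 12}
    tree = {
        "[A]": [("X.TXT", 10), ("GONE.LOG", 99)],
        "[B]": [("Y.TXT", 11), ("X_ALIAS.TXT", 10)],
    }
    saved, errors = image_backup_order(headers, tree)
    print(saved)
    print(errors)
```

In this model a directory entry whose file ID has no header lands in the error list, and a header that no directory points to is picked up at the end under [], which is the same pattern as the two messages for radar_pickup.log;79.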

>
> Since this disk is mounted /NOWRITE and system is booted "MIN", the
> radar pickup.log;79 couldn't have been deleted during the days it has
> taken to do this backup.
>
> Notice how initially, it finds the .log;79 in its proper directory, but
> complains about "no such file".

There is a directory entry for it, but there is no file with its
original FID.

>
> Then, it finds .log;79 in the [] directory, and complains it isn't
> there.

This is the leftover in the bitmap, so the file was present during the
initial scan of INDEXF.SYS, which makes the "bitmap".


> And finally, it complains that a file with plenty of garbage
> attached after the .log;79 has unsupported file structure (implying it
> has found the file and parsed the header).

Must be from the corruption, or something.

Perhaps the file already wasn't there before you mounted /NOWRITE.

>
> This just in: Backup finally finished !!!!
>
> And the device shows 0 errors.

Ah, but what about "unexpected errors"? Now just exactly what is an
"unexpected error"? Some errors are just expected, I guess. "Gee, I
wasn't expecting an error. I certainly wasn't expecting an unexpected
error! Expected errors, yes. But not unexpected errors."

"But what if you actually expect unexpected errors? Does that cure
them?" (~_^)

Pilot: We're expecting some unexpected turbulence on the flight.

[...]
> >     Cluster size                   6    Transaction count                    1

Maybe try a larger cluster size next time.
[...]

AEF
Reply spamsink2001 (3065) 11/14/2011 2:52:09 AM

On Nov 13, 6:42 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

> When I first ran BACKUP/COMPARE , it issued verification warnings for
> blocks 5 onwards of that file. And now, when I run backup/compare, it
> issues warnings for different files but not the above file.

   Note that BACKUP /COMPARE may look at bytes past the
End-of-File mark, so it's not unusual to get a complaint
about the last block in the file, where DIFFERENCES is happy
about the same two files.

> Right now, I am thinking of using Steven Schweda's ZIP utility [...]

   It's not mine.  I'll admit only to adding some bits, and
changing others.

   Note also that "zip -V" stops at EOF, and so, if BACKUP
/COMPARE looks past EOF, then you may get some last-block
miscompares after Zip+UnZip.  VAX .OBJ files often get this.
("zip -VV" saves all allocated blocks, so it can avoid some
of these BACKUP /COMPARE complaints.)
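
The difference between stopping at EOF and comparing every allocated block can be shown with a small sketch. This is plain Python over byte strings, not how BACKUP or Zip actually work; the 512-byte block size matches VMS disk blocks, while the function name and the sample data are hypothetical.

```python
BLOCK_SIZE = 512  # VMS disk block size in bytes


def differing_blocks(a, b, eof=None):
    """Return the 1-based block numbers where two byte strings differ.

    If eof is given, only the first eof bytes are compared (the
    "stop at EOF" behaviour); otherwise everything allocated is compared.
    """
    if eof is not None:
        a, b = a[:eof], b[:eof]
    length = max(len(a), len(b))
    return [
        offset // BLOCK_SIZE + 1
        for offset in range(0, length, BLOCK_SIZE)
        if a[offset:offset + BLOCK_SIZE] != b[offset:offset + BLOCK_SIZE]
    ]


if __name__ == "__main__":
    data = b"x" * 600                      # real content, EOF falls in block 2
    original = data + b"\x00" * 424        # padding after EOF in the last block
    copy = data + b"\xff" * 424            # same content, different padding
    print(differing_blocks(original, copy, eof=600))  # compare up to EOF only
    print(differing_blocks(original, copy))           # compare all allocated bytes
```

With these inputs the EOF-limited comparison finds no differences, while the whole-allocation comparison flags the last block, which is the kind of complaint described above.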
Reply sms.antinode (933) 11/14/2011 3:19:59 AM

Re: unexpected errors: One expects some "file is open, but backing up
anyway because you had /ignore=interlock" messages.

I would also expect "file not found" messages on an active system (a file
purged when a new version was created, log files for instance).

Such messages I can live with. (Note that in this case there was no
activity, so the fact that it saw a file that needed backing up and then
couldn't find the file was strange.)


I have written a .ZIP procedure which backs up each main directory in
[000000] on the drive to a NDF directory.  It seems to be running much
faster than BACKUP, with no complaints.

I am really suspecting that Backup had serious problems with a large
indexf.sys.

If/when the disk is recreated with a normal indexf.sys, I'll see if
backup can then process the disk faster. If so, it would point to data
structure problems as opposed to hardware failure of the disk drive.
Reply jfmezei.spamnot (8832) 11/14/2011 5:18:16 AM

JF Mezei wrote:
> I had originally tried to send the backup to my xserve via NFS, but it
> turns out that NFS has file size limitations that are well below the 10
> gigs I need.
> 
> backup/physical would require 27 gigs of storage which I don't have on
> the alpha.  I guess I could create about 15 2gig container files on the
> xserve, mount them with LD, and then create a bound volume set that
> would create a 30 gig drive allowing me to create one big saveset from a
> physical backup.
> 
It depends on the version of NFS you are using: NFS 2.0 has a limit of
around 4GB. However, the latest ECOs for HP TCP/IP Services V5.7 contain
support for NFS 3.0, which allows for much larger file sizes (27 GB
is no problem). Note that your xserve should also run at least NFS 3.0.
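
For the sizes mentioned in this exchange, the arithmetic works out roughly as below. This is a back-of-the-envelope sketch only: it assumes the "around 4GB" limit comes from a 32-bit unsigned file size field and that a "gig" means 10^9 bytes, and both function names are made up.

```python
NFS_V2_MAX_BYTES = 2**32 - 1   # assumption: 32-bit unsigned size, about 4 GB
GIG = 10**9                    # assumption: "gig" taken as 10^9 bytes


def fits_in_one_nfs_v2_file(size_bytes):
    """True if a single file of this size stays under the assumed limit."""
    return size_bytes <= NFS_V2_MAX_BYTES


def containers_needed(total_bytes, container_bytes):
    """Smallest number of fixed-size container files that hold total_bytes."""
    return (total_bytes + container_bytes - 1) // container_bytes


if __name__ == "__main__":
    print(fits_in_one_nfs_v2_file(10 * GIG))      # the 10 gig saveset
    print(fits_in_one_nfs_v2_file(2 * GIG))       # one 2 gig container file
    print(containers_needed(27 * GIG, 2 * GIG))   # containers for a 27 gig physical backup
```

Under those assumptions a 10 gig saveset does not fit in one file, a 2 gig container does, and 14 containers are the minimum for 27 gigs, so the 15 suggested in the quoted post leaves a little headroom.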

               Jouk
Reply joukj2 (173) 11/14/2011 7:30:13 AM

An update to my problem...

I was away last week, so I didn't get to do much work.


It was confirmed to me by a real/former VMS engineer that the VMS IDE
driver does not log errors, so having an error count of 0 did not mean
the disk was healthy.

I am extremely disappointed that Digital would produce Alphas with disk
drivers that did not log errors.

I may take this IDE drive and plug it into an old mac and run the disk
utility to test it out.



I tried to copy the allin1 software which was on that drive to the
healthy system disk. Allin1 crashed or said that a premature end of file
was encountered (although ana/image discovered 0 errors). So I am in the
process of installing it from scratch and will move config files from
older backups (those don't change much) once it is installed. A pain to
do since my remaining use of A1 is to port data/documents to another
platform.

So it looks like data corruption had truly begun on the drive.


Last resort would be to run the VAX version on SIMH or fire up my
remaining VAXstation 3100 to extract the data. (I have good backups of
that data).


Here is a question:

> $ show dev dqb0/full
> 
> Disk $11$DQB0: (CHAIN), device type Maxtor 53073H6, is online, file-oriented
>     device, shareable, served to cluster via MSCP Server, error logging is
>     enabled.
> 
>     Error count                    0    Operations completed                223
>     Owner process                 ""    Owner UIC                      [SYSTEM]
>     Owner process ID        00000000    Dev Prot            S:RWPL,O:RWPL,G:R,W
>     Reference count                1    Default buffer size                 512
>     Total blocks            58633344    Sectors per track                    87
>     Total cylinders             7659    Tracks per cylinder                  88
>     Allocation class              11
> 
> $ mount/foreign dqb0:
> %MOUNT-F-DEVALLOC, device already allocated to another user


If the owner process ID is blank, why would the system think the device
has been allocated to another user?



Reply jfmezei.spamnot (8832) 11/21/2011 10:13:30 PM

On 2011-11-21, JF Mezei <jfmezei.spamnot@vaxination.ca> wrote:
>
> Here is a question:
>
>> $ show dev dqb0/full
>> 
>> Disk $11$DQB0: (CHAIN), device type Maxtor 53073H6, is online, file-oriented
>>     device, shareable, served to cluster via MSCP Server, error logging is
>>     enabled.
>> 
>>     Error count                    0    Operations completed                223
>>     Owner process                 ""    Owner UIC                      [SYSTEM]
>>     Owner process ID        00000000    Dev Prot            S:RWPL,O:RWPL,G:R,W
>>     Reference count                1    Default buffer size                 512
>>     Total blocks            58633344    Sectors per track                    87
>>     Total cylinders             7659    Tracks per cylinder                  88
>>     Allocation class              11
>> 
>> $ mount/foreign dqb0:
>> %MOUNT-F-DEVALLOC, device already allocated to another user
>
> If the owner process ID is blank, why would the system think the device
> has been allocated to another user?
>

Because the reference count appears to be wrong.

Simon.

-- 
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Reply clubley (1186) 11/21/2011 11:44:57 PM

Update: I ran an ANA/MEDIA/EXERCISE for a few hours. Not sure if this
ends by itself eventually or if disk was very slow.

I then initialised the suspect drive. I am able to write to it at full
speed.

What I might do is BACKUP/VERIFY to it to see if it can reliably read/write.

I am torn between declaring the drive hardware-dead and concluding that it
had a corrupted disk structure, which caused problems due to an overly big
indexf.sys that may have consumed too much memory somewhere, etc.
Reply jfmezei.spamnot (8832) 11/22/2011 5:58:31 AM

On Nov 21, 11:13 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

> I tried to copy the allin1 software which was on that drive to the
> healthy system disk. Allin1 crashed or said that a premature end of file
> was encountered (although ana/image discovered 0 errors). So I am in the
> process of installing it from scratch and will move config files from
> older backups (those don't change much) once it is installed. A pain to
> do since my remaining use of A1 is to port data/documents to another
> platform.
>
> So it looks like data corruption had truly begun on the drive.

Maybe. But if Allin1 crashed, then the image would have been activated
without any problem. If the image file were corrupt, the image
activator would have reported a problem. Ana/image is very likely
correct, but on the other hand, ana/image is not the tool to check the
correctness of an image file: it is a tool to format the image data.
Any error it encounters while formatting the data it will report, any
other error is undetected.

> Last resort would be to run the VAX version on SIMH or fire up my
> remaining VAXstation 3100 to extract the data. (I have good backups of
> that data).

I would have connected the disk to a PC with Linux and would have made
a disk image with dd_rescue. Then I would have read the disk image
with vdisk, LD or the Linux ods5 file system.
Reply becker.avd (47) 11/22/2011 8:46:57 AM
