VAX VMS 7.3, ana/image running out of virtual memory ?


SHORT STORY: I have a corrupted bound volume set thanks to VMS's no-longer
so robust software. I should have stayed with 7.2.


QUESTION:  can I safely ANA/IMAGE individual drives in a bound volume set ?
Trying to analyse the whole volume fails. Are there other tools to fix a
drive that is corrupted with some file data having been overwritten ?

(THE LD driver used was on Alpha 8.3 accessing the bound volume via MSCP).


Am not a happy puppy. (presently doing a backup to see how much of the
drive can be recuperated, but I have no idea how many files have had their
actual data overwritten and which may not signal any error until you try to
look at the actual contents). (like my TPU$COMMAND.TPU file which was
overwritten with junk).


I have a bound volume set (4 DSSI disks of 2 gigs). Tonight, while creating
some ISO container files, I filled the volume. I quickly deleted files to
make space.

However while using BACKUP to populate an ISO container file (LDA device:)

%BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XA_SS.EXE;1
%BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XG.EXE;1
%BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XG_SS.EXE;1
%BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM_XA.H;1
%BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DEBUG.EXE;1
%BACKUP-I-BTCROUT, routine Write error detected, output Status, Call RESTORE_ERROR
%BACKUP-I-VALUE_TRACE_D, Decimal trace value 844
%BACKUP-E-WRITEBLOCK, error writing block 5293 of LDA1:[VMS$COMMON.SYSLIB]DEBUGSHR.EXE;1
-SYSTEM-F-IVADDR, invalid media address
%SYSTEM-F-ABORT, abort
$ show dev lda1:/full

OK, figured perhaps it was some problem with LD Driver on Alpha accessing a
bound volume served by a 7.3 VAX.

But then...

$ edit sys$manager:systartup_vms.com
%DCL-S-SPAWNED, process JFMEZEI_1 spawned
$
%TPU-E-READERR, error reading USRDIR:[JFMEZEI]TPU$COMMAND.TPU;2
-RMS-F-IRC, illegal record encountered; VBN or record number = 1

And now:

$ ana/disk/repair $disk2
Analyze/Disk_Structure/Repair for _$4$DIA1: started on 13-DEC-2006 06:06:53.72

%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-I-BADHIGHWATER, file (431,5,1) AUSTRALIA.DXF;5
   inconsistent highwater mark and EFBLK
%ANALDISK-F-ALLOCMEM, error allocating virtual memory
-LIB-F-INSVIRMEM, insufficient virtual memory
$


-----------------


I have never seen ANA/DISK run out of virtual memory and was always able to
do this ana/disk on that bound volume set.


OK, and now:

DUMP USRDIR:[JFMEZEI]TPU$COMMAND.TPU;2

Yields only garbage data:
Virtual block number 1 (00000001), 512 (0200) bytes

  00000000 00000031 00000000 00000000 ........1....... 000000
  0015E768 0015E758 58000000 00000040 @......XXç..hç.. 000010
  00000000 00000032 00000000 00000000 ........2....... 000020
  0015E780 0015E770 58001800 00000040 @......Xpç...ç.. 000030
  00000000 00000031 00000000 00000000 ........1....... 000040
  0015E798 0015E788 58000800 00000040 @......X.ç...ç.. 000050
  00000008 00000031 00000000 00000000 ........1....... 000060
  0015E7B8 0015E7A8 5800B000 00000040 @....°.X¨ç..¸ç.. 000070



DAMNED VAX 7.3. It broke the OSU web server and has now corrupted my main
disk (just before I was to move it to new drives and get rid of this bound
volume set).


Reply jfmezei.spamnot4 (5184) 12/13/2006 11:24:02 AM

Update:

Mozilla places its temporary files in my SYS$LOGIN, which is on that drive.
As a result, it has problems managing its own files. It appears that
messages do get sent, however, despite the multiple error messages that are
issued when I try to send them.

And so far, BACKUP has already uncovered errors:

$ backup/list=disk2.list/image/ignore=(nobackup,interlock)/noalias $disk2: $disk4:[000000]disk2.save/save
%BACKUP-E-BADDIR, directory $DISK2:[APPLICATIONS.KERMIT] has invalid format
%BACKUP-E-BADDIR, directory $DISK2:[APPLICATIONS.MOSAIC41] has invalid format

In a BACKUP/IMAGE, would the files inside those directories still get
copied over because they would be in INDEXF.SYS ? Or are none of the files
in those directories going to be included because the directory files were
crushed with random data ?
Reply jfmezei.spamnot4 (5184) 12/13/2006 11:29:06 AM


JF Mezei wrote:
> SHORT STORY: I have a corrupted bound volume set thanks to VMS's no-longer
> so robust software. I should have stayed with 7.2.

Something went wrong somewhere. It is not at all clear that this was
caused by a newer VMS version. It could easily be a hardware problem;
it could be a corruption seeded by 7.2 and finally coming to light. I
>
>
> QUESTION:  can I safely ANA/IMAGE individual drives in a bound volume set ?

I believe it will have to go back to the root member.

Think about it for half a second: in a bound volume set, new files will be
allocated on the member disk with the most space, not where the directory
is. So in a 4-member set, roughly 75% of the files have a header in an
INDEXF.SYS for a directory which is not on the drive.
So 75% of the files will be 'lost files' when the disk is looked at in
isolation.
Some files may have an extension header on another drive. Yikes.
The bitmap, allocation and INDEXF.SYS should all be valid,
consistent Files-11, though.

> trying to analyse the whole volume fails. Are there other tools to fix a
> drive that is corrupted with some file data having been overwritten ?

I don't think you have a clue about the extent of the damage, so I
would be very reluctant to start repairing. I would switch to read-only
mode and try to copy off the files changed since the last backup.

The main alternative tool is of course DFU.
It has a built-in VERIFY option (with /FIX, if deemed appropriate
after analysis).

The other big tools for cases like this are - Brains; Patience; Dump;
and the black bible: "Guide to VMS file systems internals";

> (THE LD driver used was on Alpha 8.3 accessing the bound volume via MSCP).

Hmm, odd choice.
- Aggressive mix of technology.
- Unsupported cluster setup.
Why not write it to the local drive on the 8.3 box?

>
> Am not a happy puppy. (presently doing a backup to see how much of the
> drive can be recuperated, but I have no idea how many files have had their
> actual data overwritten and which may not signal any error until you try to
> look at the actual contents). (like my TPU$COMMAND.TPU file which was
> overwritten with junk).

So what does the 'junk' look like? That's where the DUMP comes in.
The contents may give a clue as to what went wrong.

> I have a bound volume set (4 dssi disks of 2gigs). Tonight, while creating
> some ISO container files, I filled the volume. I quickly deleted files to
> make space.

So there were some interesting dynamics going on, and boundary
conditions may have been hit, like members running out of disk space
forcing a file header extent onto another member, which the old software
did not manage correctly?
Was the V7 system up to date with its patches?

> And now:
>
> $ ana/disk/repair $disk2
> Analyze/Disk_Structure/Repair for _$4$DIA1: started on 13-DEC-2006 06:06:53.72
>
> %ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
> -SYSTEM-W-NOSUCHFILE, no such file
> %ANALDISK-I-BADHIGHWATER, file (431,5,1) AUSTRALIA.DXF;5
>    inconsistent highwater mark and EFBLK
> %ANALDISK-F-ALLOCMEM, error allocating virtual memory
> -LIB-F-INSVIRMEM, insufficient virtual memory

So did you try giving it (lots) more memory?

This could be a simple lack of PGFLQUO !?

Did you play with SET VOL/LIMIT?
I've seen one report of ANAL/DISK running out of memory in combination with
that.

and.... perhaps use /CONFIRM ?


> OK, and now:
>
> DUMP USRDIR:[JFMEZEI]TPU$COMMAND.TPU;2
>
> Yields only garbage data:
> Virtual block number 1 (00000001), 512 (0200) bytes
>
>   00000000 00000031 00000000 00000000 ........1....... 000000
>   0015E768 0015E758 58000000 00000040 @......XXç..hç.. 000010
>   00000000 00000032 00000000 00000000 ........2....... 000020

I don't recognize that either... for now.


> DAMNED VAX 7.3. It broke the OSU web server and had now corrupted my main
> disk (just before I was to move it to new drives and get rid of this bound
> volume set).

So was it 7.3 or 7.2 ?
Which dot and dash versions?
Which patches were applied?

Reply heinvandenheuvel2 (577) 12/13/2006 12:33:09 PM

(I fixed the subject: it should have been ANA/DISK from the start).


Update: I was going to post some results from the Alpha, but am now
unable to. Mozilla can no longer manage its control files where message
headers are loaded.

I did a dump of one of the directories reported as corrupt by BACKUP. It
contains textual data. It appears to be out of a VMS manual (index section).

It contains text such as :  security,  8-37      profilem 8-19
local subaddress 8-33
psi$configure 8-46
logiocal channel 8-19


So it appears that the LD CREATE command executed on Alpha 8.3 onto a
bound volume set ended up creating a container file that encompassed
real files. So when a BACKUP command started to populate that container
file, it overwrote actual real data.

This was a 600 meg file 


So now, I have to ask this:

How does ANA/DISK/REPAIR work ? 

If I delete the directory files that are now filled with junk, would
this allow ana/DISK/REPAIR to proceed without running out of virtual
memory ?


For files that are variable length, once they are overwritten with random
data, they no longer have valid records, no valid record length bytes at
the start of each line etc. Yet, ANA/RMS reports no error.

Is there a way to quickly detect which file is like that ? 

EG: write a F$SEARCH loop of every file on that volume, and if its file
organisation is variable, then run that check to see if its records are
in fact valid ?


My last backup was done at a time when the DLT drive stopped working. So
I am in a rather difficult situation here.
Reply jfmezei.spamnot4 (5184) 12/13/2006 12:40:54 PM

Hein RMS van den Heuvel wrote:
> is. So in a 4-member set, roughly 75% of the files have a header in an
> INDEXF.SYS for a directory which is not on the drive.
> So 75% of the files will be 'lost files' when the disk is looked at in
> isolation.
> Some files may have an extension header on another drive. Yikes.
> The bitmap and allocation and indexf.sys should all be valid,
> consistent, files-11 though.


From the looks of it, I lost 600 megs worth of real data. It includes
parts of my web site which are down at the moment (file not found etc).

re: NOT SUPPORTED. Sorry to rain on your parade, but Alpha 8.3 is said
to work fully with VAX 7.3 as per the SPD.

And I never read anything about LD CREATE not being supported on bound
volume sets. And the size of the containers was small enough that even older
versions of LD would have worked.


> I don't think you have a clue about the extent of the damage,

Doesn't look Good. My All-In-1 file cabinet is corrupt. 


> The other big tools for cases like this are - Brains; Patience; Dump;
> and the black bible: "Guide to VMS file systems internals";


> Why not write it to the local drive on the 8.3 box.


I was trying to build a container file so a user could then FTP it to
their system, so I decided to create it in their SYS$LOGIN so it would
save me a long copy operation. 

> So what does the 'junk' look like? That's where the DUMP comes in.
> The contents may give a clue as to what went wrong.

In another message I gave examples. The container file was to contain a
VMS kit. So there would be lots of documentation, lots of binaries as well.


> conditions may have been hit, like members running out of disk space
> forcing a file header extent onto another member, which the old software
> did not manage correctly?
> Was the V7 system up to date with its patches?

Does the LD driver have its own fancy file allocation logic (which may
not be aware of older bound volumes), or does it use very standard/basic
file allocation routines ? In the latter case, it means that standard
file allocation on VMS (Alpha 8.3) may have a big bug when dealing with
bound volume sets.

If it didn't have enough space, it should have told me, instead of
stealing space from existing files.




> > $ ana/disk/repair $disk2
> > %ANALDISK-F-ALLOCMEM, error allocating virtual memory
> > -LIB-F-INSVIRMEM, insufficient virtual memory
> 
> So did you try giving it (lots) more memory?
> This could be a simple lack of PGFLQUO !?


In the past, ANA/DISK did not have any problems with the quotas for the
SYSTEM account.  I boosted its PGFLQUO, and it still failed at the same
place. Just how much more should I give it ?

I assume that some disk structure is all fucked up and ANA/DISK uses
data that is totally out of whack to measure how much memory to allocate
and that causes it to fail.





> Did you play with SET VOL/LIMIT?

Nop.



> > DAMNED VAX 7.3. It broke the OSU web server and had now corrupted my main
> > disk (just before I was to move it to new drives and get rid of this bound
> > volume set).
> 
> So was it 7.3 or 7.2 ?
> Which dot and dash versions?
> Which patches were applied?



VAX 7.3, Alpha 8.3. For VAX, I had recently applied a number of patches
in the hopes of fixing the problem with the OSU web server.  But SHOW
SYS still reports 7.3


For the alpha, I don't recall installing any patches.
Reply jfmezei.spamnot4 (5184) 12/13/2006 1:27:07 PM

JF Mezei wrote:

> For files that are variable length, once they are overwritten with random
> data, they no longer have valid records, no valid record length bytes at
> the start of each line etc. Yet, ANA/RMS reports no error.
>
> Is there a way to quickly detect which file is like that ?

ANAL/RMS, and RMS itself, treat a record which would go beyond EOF as
EOF.
I don't like that, but that's how it is.

With a C or MACRO program it would be trivial to SYS$READ or SYS$QIOW
the first block in the file and make sure the first 16-bit word has a
reasonable value for the file.
For a text file that often means less than 132, but certainly less than
MRS (from FAB) and preferably less than LRL (from XABFHC).

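The following lines show a DCL implementation of the suggested check.

$ type CHECK_VAR.com
$! P1 = file to check. Only files with variable-length (VAR) records apply.
$rfm = f$file_attributes(p1,"RFM")
$mrs = f$file_attributes(p1,"MRS")
$lrl = f$file_attributes(p1,"LRL")
$if rfm.nes."VAR" then exit %X0001860C
$! Temporarily relabel the file as 2-byte fixed records so that the first
$! READ returns just the first record's 16-bit length word.
$set file 'p1'/attr=(rfm=fix,mrs=2,lrl=2)
$open/read/share=write file 'p1'
$read file record
$length = f$cvui ( 0, 16, record )
$close file
$! Put the original attributes back.
$set file 'p1'/attr=(rfm='rfm',mrs='mrs',lrl='lrl')
$! Largest plausible record: LRL if known, else MRS, else 200 bytes.
$max = 200
$if mrs.ne.0 then max = mrs
$if lrl.ne.0 then max = lrl
$if length .gt. max
$then
$ write sys$output "suspect record size ''length' for ''p1'"
$else
$ write sys$output "file ''p1' seems ok."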
$endif
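
The check above looks only at the first length word. As a complementary
sketch (not part of the original procedure), the whole record chain can be
walked once a file's blocks have been copied somewhere Python can read them.
It assumes the usual on-disk layout of a variable-length sequential file: a
16-bit little-endian byte count, the record data, and a pad byte after
odd-length records. It also assumes that a count of 0xFFFF means "no more
records in this block". The name first_bad_record, the 132-byte limit and the
sample byte strings are illustrative only.

```python
import struct

BLOCK_SIZE = 512        # disk block size in bytes
END_OF_BLOCK = 0xFFFF   # assumed marker: no more records in this block


def first_bad_record(data: bytes, max_len: int):
    """Walk a variable-length record chain.

    Returns the byte offset of the first implausible length word,
    or None if every record fits within max_len and within the data.
    """
    pos = 0
    while pos + 2 <= len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        if length == END_OF_BLOCK:
            # Skip to the start of the next block.
            pos = (pos // BLOCK_SIZE + 1) * BLOCK_SIZE
            continue
        if length > max_len or pos + 2 + length > len(data):
            return pos
        pos += 2 + length
        pos += pos & 1  # records start on even byte boundaries
    return None


# Two well-formed records: "hello" (odd length, so padded) and "ok".
good = struct.pack("<H", 5) + b"hello\x00" + struct.pack("<H", 2) + b"ok"
# Invented junk: zero words first, then a length word that is far too large.
junk = bytes(8) + struct.pack("<H", 0x31) + bytes(50) + b"\x80\xe7\x15\x00"

print(first_bad_record(good, 132))
print(first_bad_record(junk, 132))
```

Junk that happens to begin with zero bytes reads as an empty first record, so
a first-word test alone would accept it; walking the chain catches the bad
length word further in.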

Hein.

Reply heinvandenheuvel2 (577) 12/13/2006 1:50:12 PM

"JF Mezei" <jfmezei.spamnot@teksavvy.com> wrote in message 
news:457FFFA7.59C9390C@teksavvy.com...

> Does the LD driver have its own fancy file allocation logic (which may
> not be aware of older bound volumes), or does it use very standard/basic
> file allocation routines ?

I expect it takes out an exclusive lock on the container file when mounted.
Other than that, it's not going to access the file through standard routines,
it being a driver after all; it's going to have to figure out all the LBN
offsets itself. I wouldn't be greatly surprised if it assumed all the extents
were on the same volume, or even all on the primary volume. Do the
corrupt blocks appear at the same LBN numbers (on the other volume)
as the container file?


Reply R.Brodie (551) 12/13/2006 1:55:37 PM

Richard Brodie wrote:
> were on the same volume, or even all on the primary volume. Do the
> corrupt blocks appear at the same LBN numbers (on the other volume)
> as the container file?


So far, of all the files I have found to be corrupt, they have all been
on RVN 1.  Not a conclusive thing mind you.


What I would like to do is to build a map of the disk with filenames
sorted by location on the disk. (at least for first header). The idea
being to try to find a pattern between files I know are corrupt, which
would give me some hint on which other files to test (aka: all files
between 2 corrupt files in a list).
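
A minimal sketch of that idea, assuming the file names and the starting LBN
of each first extent have already been collected by some other means and
brought into Python. The function name suspects_between and every LBN below
are invented for illustration; only the file names come from this thread.

```python
def suspects_between(files):
    """files: iterable of (name, start_lbn, known_bad) tuples.

    Sorts by starting LBN and returns the names of files not yet known
    to be bad that lie between the lowest and highest known-bad files.
    """
    ordered = sorted(files, key=lambda f: f[1])
    bad = [i for i, f in enumerate(ordered) if f[2]]
    if len(bad) < 2:
        return []  # need two known-bad files to bracket a region
    window = ordered[bad[0]:bad[-1] + 1]
    return [name for name, _, is_bad in window if not is_bad]


# Hypothetical positions; real values would come from the file headers.
files = [
    ("TPU$COMMAND.TPU;2", 120400, True),
    ("LOGIN.COM;7", 98000, False),
    ("NOTES.TXT;1", 121900, False),
    ("KERMIT.DIR;1", 123500, True),
    ("AUSTRALIA.DXF;5", 410000, False),
]
print(suspects_between(files))
```

This only orders files by their first extent, so a fragmented file with a
later extent inside the damaged region would be missed.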


OK, just ran a bit of a test on the web site directories. (I backed them
up to a spare disk and am trying to make the site functional again).

Basically a loop of f$search, then check if the file_attribute RFM =
VAR, if so, then TYPE/OUTPUT=NLA0: and check the $STATUS. 

451 files, 133 were variable, 3 were bad files.

But testing binary files (images, .zips etc.) will require the use of a
browser once I have switched the web server to point to that spare disk.

But this is just so I can get the main web site back up. There is a hell
of a lot of testing to be done on a hell of a lot of files on that disk.

I've had to stop all queues and mail processing. Once I move the user
directories to another disk and clean the active ones up, then I can
restart mail processing. And maybe then get some sleep.
Reply jfmezei.spamnot4 (5184) 12/13/2006 2:21:08 PM

JF Mezei wrote:
> Update:
> 
> Mozilla places its temporary files in my SYS$LOGIN, which is on that
> drive. As a result, it has problems managing its own files. It appears
> that messages do get sent, however, despite the multiple error messages
> that are issued when I try to send them.
>
> And so far, BACKUP has already uncovered errors:
>
> $ backup/list=disk2.list/image/ignore=(nobackup,interlock)/noalias $disk2: $disk4:[000000]disk2.save/save
> %BACKUP-E-BADDIR, directory $DISK2:[APPLICATIONS.KERMIT] has invalid format
> %BACKUP-E-BADDIR, directory $DISK2:[APPLICATIONS.MOSAIC41] has invalid format
> 
> In a BACKUP/IMAGE, would the files inside those directories still get 
> copied over because they would be in indexF.SYS ? Or are none of the 
> files in those directories going to be included because the directory 
> files were crushed with random data ?

An image backup backs up all the files pointed to by INDEXF.SYS.

If you have a known good backup, first try deleting the directories in 
question and doing an ANALYZE /DISK /REPAIR /(no)CONFIRM.   You should 
then be able to create new directories and move files from [SYSLOST] to 
the proper directory.

If you don't have a known good backup, shame on you!
Reply rgilbert88 (4359) 12/13/2006 4:41:55 PM

"Richard B. Gilbert" wrote:
> An image backup backs up all the files pointed to by INDEXF.SYS.

Ok. in a  /LIST=  file, one has to search for [] to get a list of all
the orphaned files. 
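
A rough sketch of that search, for a listing file that has been copied to a
system with Python. It assumes the file specification is the first
whitespace-separated field on each listing line; the sample lines are
invented and the real /LIST layout may differ.

```python
def orphaned_entries(listing_lines):
    """Return file specs whose directory part is empty ("[]")."""
    found = []
    for line in listing_lines:
        fields = line.split()
        if fields and fields[0].startswith("[]"):
            found.append(fields[0])
    return found


# Invented sample lines, shaped loosely like a BACKUP listing.
sample = [
    "[APPLICATIONS.KERMIT]READ.ME;1       12  1-JAN-2000 00:00",
    "[]S_ICON_5.XPM;1                      3  1-JAN-2000 00:00",
    "",
    "[]ZUQ3XK1AB.TXT;1                     8  1-JAN-2000 00:00",
]
print(orphaned_entries(sample))
```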

Interestingly, BACKUP seems to copy them last. So I guess it builds list
of files from INDEXF, but then goes through the directory structure
marking the files as they are being backed up. And at the end, it goes
through INDEXF and picks up those orphaned files that were not caught
while traversing the directories.

QUESTION: from those files marked as []S_ICON_5.XPM that BACKUP/IMAGE
rescued, is there any data contained in the saveset that would allow me
to deduce that it was a file that belonged below the [MOSAIC41] directory ?
(eg: the backlink pointers)

For many files, I can guess where they came from. However, I have a
number of All-In-1 message files (those random file names) that come from
3 shared directories. So it would be nice to know in which of the 3
directories each of those files belongs.
Reply jfmezei.spamnot4 (5184) 12/13/2006 5:07:34 PM

Ok, mea culpa.

It's a bug in LDdriver, I just reproduced it and it's present for a
long time (since LD V8.0), on all platforms. Older versions are not
affected. It's a one-line change: it is always writing to the member
of a bound volume set where the container file is created (normally
the first one).

So stay away from LD and bound volume sets for now. I'm not surprised
that it did not show up before, as volume sets are evil, albeit supported.
They're not used much anymore nowadays.

This is one of the very few bugs in LD, and I'm sorry to see that it
had such severe consequences.

A fix is on the way, I'll update my website later tonight.

http://www.digiater.nl/lddriver

Jur.

Reply Jur 12/13/2006 5:37:41 PM

In article <45803344.59422886@teksavvy.com>,
 JF Mezei <jfmezei.spamnot@teksavvy.com> wrote:

> "Richard B. Gilbert" wrote:
> > An image backup backs up all the files pointed to by INDEXF.SYS.
> 
> Ok. in a  /LIST=  file, one has to search for [] to get a list of all
> the orphaned files. 
> 
> Interestingly, BACKUP seems to copy them last. So I guess it builds list
> of files from INDEXF, but then goes through the directory structure
> marking the files as they are being backed up. And at the end, it goes
> through INDEXF and picks up those orphaned files that were not caught
> while traversing the directories.
> 
> QUESTION: from those files marked as []S_ICON_5.XPM that BACKUP/IMAGE
> rescued, is there any data contained in the saveset that would allow me
> to deduce that it was a file that belonged below the [MOSAIC41] directory ?
> (eg: the backlink pointers)
> 
> For many files, I can guess where they came from. However, I have a
> number of All-In-1 message files (those random file names) that come from
> 3 shared directories. So it would be nice to know in which of the 3
> directories each of those files belongs.

Many moons ago I had a disk where all the files appeared as orphaned.

Given that this system had 20 users, each with a unique UIC, I could use 
the UIC to decide which main directories to restore them to.

For other files, I had a backup listing online, so with a bit of editing 
and DCL, I could work out where 95% of them belonged.
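
The UIC approach could be sketched like this, assuming each orphaned file's
owner UIC has been extracted beforehand. The function name group_by_owner,
the UICs and the directory names are all made up for illustration.

```python
from collections import defaultdict


def group_by_owner(orphans, uic_to_dir):
    """Group orphaned file names by the directory their owner UIC maps to.

    orphans: iterable of (file_name, owner_uic) pairs.
    uic_to_dir: dict mapping an owner UIC to that user's main directory.
    Files whose owner is not in the map are collected under "UNKNOWN".
    """
    placed = defaultdict(list)
    for name, uic in orphans:
        placed[uic_to_dir.get(uic, "UNKNOWN")].append(name)
    return dict(placed)


# Hypothetical owners and directories.
uic_to_dir = {"[200,1]": "[SMITH]", "[200,2]": "[JONES]"}
orphans = [
    ("REPORT.TXT;3", "[200,1]"),
    ("LOGIN.COM;1", "[200,2]"),
    ("NOTES.TXT;1", "[200,1]"),
    ("TEMP.DAT;1", "[1,4]"),
]
print(group_by_owner(orphans, uic_to_dir))
```

As the post notes, this only works where each user has a unique UIC; files in
shared directories need another source, such as an old backup listing.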

-- 
Paul Sture
Reply paul.sture.nospam (2312) 12/13/2006 6:57:11 PM

Jur van der Burg wrote:
> 
> Ok, mea culpa.
> 
> It's a bug in LDdriver, I just reproduced it and it's present for a
> long time (since LD V8.0), on all platforms.

How dare you leave a bug in VMS :-)  Perhaps if VMS engineers were
threatened with being force-fed Vegemite or Marmite, they would not
leave any bugs in VMS :-)

Seriously, thank you for confirming this. At least I know what caused it.

Can you confirm that if I have found files corrupted on RVN 1,  then
your bug will affect only disk blocks on that physical disk ? 


> A fix is on the way, I'll update my website later tonight.
> 
> http://www.digiater.nl/lddriver

Out of curiosity, will this make it back into the main VMS
distribution/patch system now that LDdriver is "part" of VMS ?
Reply jfmezei.spamnot4 (5184) 12/13/2006 11:44:55 PM

 > Can you confirm that if I have found files corrupted on RVN 1,  then
 > your bug will affect only disk blocks on that physical disk ?

Yes. Only the member where the containerfile's first block is located
is corrupted.

 > Out of curiosity, will this make it back into the main VMS
 > distribution/patch system now that LDdriver is "part" of VMS ?

Not sure. Maybe. It would certainly have been the case if HP would
have kept me employed.

Anyway, LD V8.3 is out. It ONLY fixes this bug, so if people don't
use volumesets they do not need to upgrade.

Jur.


Reply Jur 12/14/2006 6:55:41 AM

JF Mezei wrote:
> Richard Brodie wrote:
> 
>>were on the same volume, or even all on the primary volume. Do the
>>corrupt blocks appear at the same LBN numbers (on the other volume)
>>as the container file?
> 
> 
> 
> So far, of all the files I have found to be corrupt, they have all been
> on RVN 1.  Not a conclusive thing mind you.
> 
> 

I feel your pain.  Dave's Xmas present must have arrived early :-)


> What I would like to do is to build a map of the disk with filenames
> sorted by location on the disk. (at least for first header). The idea
> being to try to find a pattern between files I know are corrupt, which
> would give me some hint on which other files to test (aka: all files
> between 2 corrupt files in a list).
> 

Still more pain, but DFU will tell you what file is at a particular
block number.  So if you can find the lbn's belonging to the LD container
file (dump/header), you could check the same lbn's on the other 3 drives
to see if they are corrupted.
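
Going by Jur's description earlier in the thread, the misdirected writes
landed on the member that holds the container file's first block. A sketch of
the resulting overlap check follows. It assumes (this is an inference, not
something confirmed in the thread) that a write meant for another member hit
the same LBN on that "home" member. The Extent type, the function names and
all numbers are hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    rvn: int    # relative volume number within the volume set
    lbn: int    # starting logical block number
    count: int  # number of blocks


def overlaps(a: Extent, b: Extent) -> bool:
    """True if the two LBN ranges share at least one block (RVN ignored)."""
    return a.lbn < b.lbn + b.count and b.lbn < a.lbn + a.count


def suspect_files(container, files):
    """container: list of Extent for the LD container file, in file order.
    files: dict of file name -> list of Extent.

    Returns names of files on the container's home member whose blocks
    coincide with container extents that lived on other members.
    """
    home = container[0].rvn
    misdirected = [e for e in container if e.rvn != home]
    hits = [
        name
        for name, extents in files.items()
        if any(x.rvn == home and overlaps(x, m)
               for x in extents for m in misdirected)
    ]
    return sorted(hits)


# Invented layout: container starts on RVN 1 and continues on RVN 2.
container = [Extent(1, 1000, 500), Extent(2, 2000, 300)]
files = {
    "A.TXT": [Extent(1, 2100, 10)],   # inside 2000..2299 on RVN 1
    "B.TXT": [Extent(1, 2300, 5)],    # starts just past the range
    "C.TXT": [Extent(2, 2100, 10)],   # same LBNs but on RVN 2
    "D.TXT": [Extent(1, 1990, 11)],   # last block is LBN 2000
}
print(suspect_files(container, files))
```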

The implication, though is that LD is figuring out the logical block
number of its container file (and assuming it is contiguous?) and doing
logical or physical I/O to those blocks, possibly to the wrong volume
of the volume set.  I would think LDDRIVER is just opening the container
file and doing virtual I/O to it (and thus wouldn't care if the container
is contiguous or split across a volume set or whatever), but maybe it's
hard to do that from a driver, and it takes the much easier course of
doing logical i/o directly to the disk using qios.  If it lost track of
which physical extent each chunk of the file lived on, or worse yet,
assumed they were all on the same disk, or on the primary disk of the
volume set, and they weren't, exactly the symptoms you describe would
arise.  Maybe LD is checking for a contiguous file, but the check
erroneously succeeds on a volume set?  OUCH!  (For example, 1st extent
lives on volume 1.  Second extent lives on volume 2.  LD checks the
retrieval pointers and sees there is only one, and assumes the file is
contiguous, doesn't check for extension headers, and doesn't notice that
the allocation in the retrieval pointer doesn't match the total file
allocation.)  (This is pure speculation!  Maybe no one has ever used
LD on a non-contiguous file on a volume set before?!?)

------------------------

I've been running VAX V7.3 on several VAXes (hobbyist and work) for
nearly 5 years, and have had no problems like this.  Added an Itanium
V8.3 system to the cluster in August, and recently upgraded a couple of
Alphas from V8.2 to V8.3 in the same cluster.  Nothing broke (except
monitor cluster.)  But maybe there's a cache or locking problem in a
cluster that's addressed in one of the VAX or Alpha patches?  There
are only about four Alpha V8.3 patches so far, and about a dozen
VAX V7.3 patches.  If one of them fixes the problem, I would most
expect it to be the VAX VAXDRIV patch, or VAXF11X or VAXSYS patches.

> 
> OK, just ran a bit of a test on the web site directories. (I backed it
> up to a spare disk and try to make it functional again).
> 
> Basically a loop of f$search, then check if the file_attribute RFM =
> VAR, if so, then TYPE/OUTPUT=NLA0: and check the $STATUS. 
> 
> 451 files, 133 were variable, 3 were bad files.
> 
> But testing binary files (images, .zips etc.) will require the use of a
> browser once I have switched the web server to point to that spare disk.
> 
> But this is just so I can get the main web site back up. There is a hell
> of a lot of testing to be done on a hell of a lot of files on that disk.
> 
> I've had to stop all queues and mail processing. Once I move the user
> directories to another disk and clean the active ones up, then I can
> restart mail processing. And maybe then get some sleep.

One thing that might help sort out the lost files is to ana/disk/repair,
then set file/remove everything from [syslost] except the directories.
Rename the directories to a temp directory, and repeat the ana/disk/repair.

Any files listed in the directories will no longer be lost, and the
second ana/disk/repair won't touch them.  The residue, truly lost files,
will still end up in [syslost], but there may be many fewer of them to
sort out.  If you know which directories are top level directories, just
save those in the first step, rename them to [000000] (or wherever they
belong, if they are one or more levels down from a known location) and
keep repeating the ana/disk/repair until there are no more lost directories.
This, if done right, will rescue not just the files but the directory
structure.

-- 
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539
Reply john5 (550) 12/14/2006 9:55:41 AM

JF Mezei wrote:
> Jur van der Burg wrote:
> 
>>Ok, mea culpa.
>>
>>It's a bug in LDdriver, I just reproduced it and it's present for a
>>long time (since LD V8.0), on all platforms.
> 
> 
> How dare you leave a bug in VMS :-)  Perhaps if VMS engineers were
> threatened to be force-fed with Vegemite or Marmite, they would not
> leave any bugs in VMS :-)
> 

<homer_simpson_voice>  Hmmmm!  Vegemite!! </homer_simpson_voice>


> Seriously, thank you for confirming this. At least I know what caused it.
> 
> Can you confirm that if I have found files corrupted on RVN 1,  then
> your bug will affect only disk blocks on that physical disk ? 
> 
> 
> 
>>A fix is on the way, I'll update my website later tonight.
>>
>>http://www.digiater.nl/lddriver
> 
> 
> Out of curiosity, will this make it back into the main VMS
> distribution/patch system now that LDdriver is "part" of VMS ?


-- 
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539
Reply john5 (550) 12/14/2006 10:00:29 AM

Jur,

WADU, I must disagree with the comment about bound volume sets. They
may be far less needed and used than in prior years, but they remain a
useful capability. Conceptually, I consider them to be a different
dimension of flexibility, with the other dimensions being the RAID
dimensions of striping and shadowing. Before the (recent) advent of
Dynamic Volume Expansion, volume sets were the only effective way to
increase storage capacity of a user volume that could be done without a
DISMOUNT/MOUNT sequence and consequent interruption of user activities.

In days past, volume sets were the only way of increasing usable
"single" volume size beyond the limits of the actual disks, the largest
of which (e.g., the size of a full size home clothes washer or dryer)
were 80-300 MB.

With disk drives quickly approaching the terabyte range, and
intelligent mass storage subsystems virtualizing storage across their
drives, this is less needed, but not unneeded. That said, it is
important to be careful with one's maintenance.

- Bob Gezelter, http://www.rlgsc.com

Jur van der Burg wrote:
> Ok, mea culpa.
>
> It's a bug in LDdriver, I just reproduced it and it's present for a
> long time (since LD V8.0), on all platforms. Older versions are not
> affected. It's a one line change, it is always writing to the member
> of a bound volume set where the containerfile is created (normally
> the first one).
>
> So stay away from LD and bound volumesets for now. I'm not surprised
> that it did not show up before as volumesets are evil, albeit supported.
> They're not used much anymore nowadays.
>
> This is one of the very few bugs in LD, and I'm sorry to see that it
> had such severe consequences.
>
> A fix is on the way, I'll update my website later tonight.
>
> http://www.digiater.nl/lddriver
>
> Jur.
>
> JF Mezei wrote:
> > SHORT STORY: I have a corrupted bound volume set thanks to VMS's
> > no-longer so robust software. I should have stayed with 7.2.
> >
> >
> > QUESTION:  can I safely ANA/IMAGE individual drives in a bound volume
> > set ? trying to analyse the whole volume fails. Are there other tools to
> > fix a drive that is corrupted with some file data having been overwritten ?
> >
> > (THE LD driver used was on Alpha 8.3 accessing the bound volume via MSCP).
> >
> >
> > Am not a happy puppy. (presently doing a backup to see how much of the
> > drive can be recuperated, but I have no idea how many files have had
> > their actual data overwritten and which may not signal any error until
> > you try to look at the actual contents). (like my TPU$COMMANDS.TPU file
> > which was overwritten with junk).
> >
> >
> > I have a bound volume set (4 dssi disks of 2gigs). Tonight, while
> > creating some ISO container files, I filled the volume. I quickly
> > deleted files to make space.
> >
> > However while using BACKUP to populate an ISO container file (LDA device:)
> >
> > %BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XA_SS.EXE;1
> > %BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XG.EXE;1
> > %BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM$XG_SS.EXE;1
> > %BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DDTM_XA.H;1
> > %BACKUP-S-CREATED, created LDA1:[VMS$COMMON.SYSLIB]DEBUG.EXE;1
> > %BACKUP-I-BTCROUT, routine Write error detected, output Status, Call
> > RESTORE_ERROR
> > %BACKUP-I-VALUE_TRACE_D, Decimal trace value 844
> > %BACKUP-E-WRITEBLOCK, error writing block 5293 of
> > LDA1:[VMS$COMMON.SYSLIB]DEBUGSHR.EXE;1
> > -SYSTEM-F-IVADDR, invalid media address
> > %SYSTEM-F-ABORT, abort
> > $ show dev lda1:/full
> >
> > OK, figured perhaps it was some problem with LD Driver on alpha
> > accessing a bound volume served by a 7.3 vax.
> >
> > But then...
> >
> > $ edit sys$manager:systartup_vms.com
> > %DCL-S-SPAWNED, process JFMEZEI_1 spawned
> > $
> > %TPU-E-READERR, error reading USRDIR:[JFMEZEI]TPU$COMMAND.TPU;2
> > -RMS-F-IRC, illegal record encountered; VBN or record number = 1
> >
> > And now:
> >
> > $ ana/disk/repair $disk2
> > Analyze/Disk_Structure/Repair for _$4$DIA1: started on 13-DEC-2006
> > 06:06:53.72
> >
> > %ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
> > -SYSTEM-W-NOSUCHFILE, no such file
> > %ANALDISK-I-BADHIGHWATER, file (431,5,1) AUSTRALIA.DXF;5
> >   inconsistent highwater mark and EFBLK
> > %ANALDISK-F-ALLOCMEM, error allocating virtual memory
> > -LIB-F-INSVIRMEM, insufficient virtual memory
> > $
> >
> >
> > -----------------
> >
> >
> > I have never seen ANA/DISK run out of virtual memory and was always able
> > to do this ana/disk on that bound volume set.
> >
> >
> > OK, and now:
> >
> > DUMP USRDIR:[JFMEZEI]TPU$COMMAND.TPU;2
> >
> > Yields only garbage data in it.
> > Virtual block number 1 (00000001), 512 (0200) bytes
> >
> >  00000000 00000031 00000000 00000000 ........1....... 000000
> >  0015E768 0015E758 58000000 00000040 @......XXç..hç.. 000010
> >  00000000 00000032 00000000 00000000 ........2....... 000020
> >  0015E780 0015E770 58001800 00000040 @......Xpç...ç.. 000030
> >  00000000 00000031 00000000 00000000 ........1....... 000040
> >  0015E798 0015E788 58000800 00000040 @......X.ç...ç.. 000050
> >  00000008 00000031 00000000 00000000 ........1....... 000060
> >  0015E7B8 0015E7A8 5800B000 00000040 @....°.X¨ç..¸ç.. 000070
> >
> >
> >
> > DAMNED VAX 7.3. It broke the OSU web server and had now corrupted my
> > main disk (just before I was to move it to new drives and get rid of
> > this bound volume set).
> >
> >

Reply gezelter (537) 12/14/2006 11:00:16 AM

An update:

I fixed up almost all of the corrupt directory files (making them empty
directories). But ANA/DISK/REPAIR still failed, complaining of
insufficient virtual memory. So I splurged, and gave the SYSTEM account
tons of PGFLQUO and WSEXTENT (this was executed on the VAX). And lo and
behold, after complaining about the one file it complained about before,
it carried on, and I could now hear the disks really getting a workout.

Unfortunately, it still found a couple of bad directories (must have
gotten corrupted after I had drawn up the list) and this caused a huge
amount of garbage and messages to be displayed, causing most of the real
stuff to scroll past the DECterm buffer.

ANA/DISK/REPAIR took a lot longer than I remembered it taking for that
bound volume (but then again, it had to do a lot more work). But it did
complete. I did not really see any "incorrectly marked free" warnings,
but again, this may have happened just before the tons of messages about
one or two directory files being corrupt caused the screen to scroll too fast.


John Santos wrote:

> One thing that might help sort out the lost files is to ana/disk/repair,
> then set file/remove everything from [syslost] except the directories.
> Rename the directories to a temp directory, and repeat the ana/disk/repair.
> 
> Any files listed in the directories will no longer be lost, and the
> second ana/disk/repair won't touch them.  The residue, truly lost files,
> will still end up in [syslost], but there may be many fewer of them to
> sort out. 


I was very happy to see this message giving me some hopes... But the VMS
engineers beat you to it....

It appears that ANA/DISK/REPAIR  restores stray .DIR files into
[SYSLOST] first. And then, since ANA/DISK/REPAIR finds a directory entry
for the files contained in the subdirectory, those files need not be
entered into SYSLOST.

So the technique you outlined is not necessary (anymore ?).


I was however disappointed that ANA/DISK/REPAIR didn't place lost files
back into their original directories which I had fixed up whilst
maintaining their original fileID.

Question: theoretically speaking, shouldn't ANA/DISK/REPAIR be able to
place lost files back into their original directory if that directory
still exists ? 

I realise there may be a conscious decision to move everything to
[SYSLOST] so that the system manager can then verify the files before
moving them to their proper directory.
Reply jfmezei.spamnot4 (5184) 12/14/2006 6:55:52 PM

Let's do some rumour control.

LD is doing logical i/o to the container file. You can't do virtual i/o
from a devicedriver directly. Volumesets have been working since the
beginning, apart from the bug in V8.0 - V8.2. Contiguous files are not
needed anymore since I released LD V6.0, in November 1996, so that's
already possible for 10 years now. To check on which volume LD needs to
read or write it uses a VMS kernel routine to figure out the mapping,
so if that one would be wrong LD would be wrong too. The problem was
simply that LD ignored the ucb we got from the VMS routine, and used
the one from the device where the first block of the container file was
located.
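
To make the failure mode described above concrete, here is a toy Python model. It is not LDdriver source and does not call any VMS routine; the names `Extent`, `map_vbn`, `fixed_target` and `buggy_target` are hypothetical, and the extent values are made up. It only illustrates the idea of a correct (volume, LBN) mapping whose volume part is then ignored.

```python
# Toy model of the bug described above; NOT LDdriver source.
# All names and numbers here are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class Extent:
    rvn: int    # relative volume number holding this extent
    lbn: int    # starting logical block number on that volume
    count: int  # number of blocks in the extent


def map_vbn(extents, vbn):
    """Map a 1-based virtual block number to (rvn, lbn)."""
    offset = vbn - 1
    for ext in extents:
        if offset < ext.count:
            return ext.rvn, ext.lbn + offset
        offset -= ext.count
    raise ValueError("VBN beyond end of file")


def fixed_target(extents, vbn):
    """Write goes to the volume the mapping returned."""
    return map_vbn(extents, vbn)


def buggy_target(extents, vbn):
    """Write uses the right LBN but always the first extent's volume."""
    _, lbn = map_vbn(extents, vbn)
    return extents[0].rvn, lbn


# A container file with one extent on volume 1 and one on volume 2.
container = [Extent(1, 1000, 100), Extent(2, 5000, 100)]
print(fixed_target(container, 150))  # (2, 5049)
print(buggy_target(container, 150))  # (1, 5049)
```

In this model, blocks in the first extent are written correctly either way; only blocks whose extent lives on another volume land on the wrong disk, at an LBN that may belong to an unrelated file.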

Fwiw,

Jur.

 > The implication, though is that LD is figuring out the logical block
 > number of its container file (and assuming it is contiguous?) and doing
 > logical or physical I/O to those blocks, possibly to the wrong volume
 > of the volume set.  I would think LDDRIVER is just opening the container
 > file and doing virtual I/O to it (and thus wouldn't care if the container
 > is contiguous or split across a volume set or whatever), but maybe it's
 > hard to do that from a driver, and it takes the much easier course of
 > doing logical i/o directly to the disk using qios.  If it lost track of
 > which physical extent each chunk of the file lived on, or worse yet,
 > assumed they were all on the same disk, or on the primary disk of the
 > volume set, and they weren't, exactly the symptoms you describe would
 > arise.  Maybe LD is checking for a contiguous file, but the check
 > erroneously succeeds on a volume set?  OUCH!  (For example, 1st extent
 > lives on volume 1.  Second extent lives on volume 2.  LD checks the
 > retrieval pointers and sees there is only one, and assumes the file is
 > contiguous, doesn't check for extension headers, and doesn't notice that
 > the allocation in the retrieval pointer doesn't match the total file
 > allocation.)  (This is pure speculation!  Maybe no one has ever used
 > LD on a non-contiguous file on a volume set before?!?)

Reply Jur 12/14/2006 7:05:01 PM

Jur -

Thanks for your quick explanation (and discovery of the cause and
creating a fix for the underlying problem!)

I posted before I saw your followup - anyone Googling this in the
future should go by what Jur says, and not by what I said...

Jur van der Burg wrote:
> Let's do some rumour control.
> 
> LD is doing logical i/o to the container file. You can't do virtual i/o
> from a devicedriver directly. Volumesets have been working since the
> beginning, apart from the bug in V8.0 - V8.2. Contiguous files are not
> needed anymore since I released LD V6.0, in November 1996, so that's
> already possible for 10 years now. To check on which volume LD needs to
> read or write it uses a VMS kernel routine to figure out the mapping,
> so if that one would be wrong LD would be wrong too. The problem was
> simply that LD ignored the ucb we got from the VMS routine, and used
> the one from the device where the first block of the container file was
> located.
> 

So this bug would only occur if the LD file was on a bound volume set,
and was discontiguous, and at least some of the extents resided on a
different volume...

And all the trashed blocks should reside on the disk containing the
1st block of the container file, but only those blocks which (in the
container file) should be on other volumes of the volume set.

So to find all the trashed files, what JF could do is dump the file
headers for the container file, look for retrieval pointers that
point to disks other than the one containing the initial extent of
the file (IIRC, these will all be in extension headers, not in the
primary header.)  Make a list of the block numbers involved (due to
cluster allocation factor, he only needs to check every nth block),
and use DFU to determine which file the block belongs to on the
disk containing the initial allocation.  (I've not run DFU on a
volume set, so when you ask it to find files by lbn
(DFU SEARCH dev:/LBN=xxxxx) I don't know whether you can give it a
member volume or need to specify the entire volume set, nor whether
it will list only the file on the specified volume or all files at
that lbn on all volumes, so you may need to prune the list.)
Having extracted all the file names, sort and purge duplicates to
simplify (should shorten the list significantly), and hopefully,
you'll now have a manageable list of files to fix.
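
The block-list step above can be sketched in Python. This is only an illustration under stated assumptions: the extent list is typed in by hand from a dump of the container file's headers, the helper name `suspect_lbns` is hypothetical, and clusters are assumed to start at multiples of the cluster factor of the volume holding the first extent. It does not run DFU; it only produces the LBNs one would then look up.

```python
# Hypothetical helper: list the LBNs worth looking up on the first volume.
# Assumes clusters are aligned to multiples of cluster_size on that volume.
def suspect_lbns(extents, cluster_size):
    """extents: (rvn, start_lbn, block_count) tuples for the container
    file, in file order. Returns one LBN per cluster, sorted, on the
    volume holding the first extent, that may have been overwritten."""
    first_rvn = extents[0][0]
    suspects = set()
    for rvn, start, count in extents:
        if rvn == first_rvn:
            continue  # these writes landed where they belonged
        first_cluster = start // cluster_size
        last_cluster = (start + count - 1) // cluster_size
        for cluster in range(first_cluster, last_cluster + 1):
            suspects.add(cluster * cluster_size)
    return sorted(suspects)


# Made-up extents: one on volume 1, then pieces on volumes 2 and 3.
example = [(1, 1000, 100), (2, 5000, 9), (3, 6006, 3)]
print(suspect_lbns(example, 3))  # [4998, 5001, 5004, 5007, 6006]
```

Using a set removes duplicates when two extents touch the same cluster, which is the "sort and purge duplicates" step applied to block numbers before any file names are looked up.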

From the sound of things, though, this list probably includes
your MAIL.MAI.  :-( :-( :-(

JF, hope you can find a new DLT drive (or fix your old one)
without any Canadian import duties (does that apply only to
new equipment?) soon!

And I guess I should probably back up my VAX at home (hope the TK50
still works!)  and get my new Itanium from the porting workshop
onto our cluster backup schedule at work.  Lesson to us all :-) :)


Good luck.

> Fwiw,
> 
> Jur.
> 
>  > The implication, though is that LD is figuring out the logical block
>  > number of its container file (and assuming it is contiguous?) and doing
>  > logical or physical I/O to those blocks, possibly to the wrong volume
>  > of the volume set.  I would think LDDRIVER is just opening the container
>  > file and doing virtual I/O to it (and thus wouldn't care if the container
>  > is contiguous or split across a volume set or whatever), but maybe it's
>  > hard to do that from a driver, and it takes the much easier course of
>  > doing logical i/o directly to the disk using qios.  If it lost track of
>  > which physical extent each chunk of the file lived on, or worse yet,
>  > assumed they were all on the same disk, or on the primary disk of the
>  > volume set, and they weren't, exactly the symptoms you describe would
>  > arise.  Maybe LD is checking for a contiguous file, but the check
>  > erroneously succeeds on a volume set?  OUCH!  (For example, 1st extent
>  > lives on volume 1.  Second extent lives on volume 2.  LD checks the
>  > retrieval pointers and sees there is only one, and assumes the file is
>  > contiguous, doesn't check for extension headers, and doesn't notice that
>  > the allocation in the retrieval pointer doesn't match the total file
>  > allocation.)  (This is pure speculation!  Maybe no one has ever used
>  > LD on a non-contiguous file on a volume set before?!?)
> 


-- 
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539
Reply john5 (550) 12/15/2006 3:32:31 AM

An update.

I was wondering why ANA/DISK/REPAIR didn't find multiply allocated
blocks or blocks incorrectly marked free. But Mr van der Burg's
explanation made me realise it was in fact normal. The driver wrote data
in blocks that didn't belong to the file. So the allocation bitmap
wasn't corrupted.  And when I deleted the container file, the right
blocks were made free without causing any blocks used by real files to
be marked free.


Question: on Alpha VMS 8.3, would the file caching subsystem have been
aware of the LD driver updating blocks belonging to files that were
already cached ?  If not, it would explain why Mozilla continued to
function for some time: the various files it used were cached, so
Mozilla was unaware that those files were no longer valid on disk.
Functionality then slowly degraded until it wouldn't function anymore
(the _MOZILLA directory in my sys$login was one of the files zapped
by the LDdriver).


Also, I finally got to run my disk analyser to completion.

--------------------------------------------------------------
Total files: 33285
  Bad files: 314
Blocks total: 10434277     Bad: 836246 8%

                Total           Bad
IDX             552             8
EXE             847             15
OBJ             858             25
TXT             30889           239
ZIP             139             27
DIR             0               0
--------------------------------------------------------------
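
As a quick cross-check of the summary above, the per-type rows can be added up and the percentages recomputed. This is just arithmetic on the figures as posted; the variable names are illustrative, and it assumes the "Bad" block count is the total size of the affected files.

```python
# Figures copied from the summary above; names are illustrative only.
by_type = {
    "IDX": (552, 8),
    "EXE": (847, 15),
    "OBJ": (858, 25),
    "TXT": (30889, 239),
    "ZIP": (139, 27),
    "DIR": (0, 0),
}
total_files = sum(total for total, _ in by_type.values())
bad_files = sum(bad for _, bad in by_type.values())
blocks_total, blocks_bad = 10434277, 836246

pct_files_bad = 100 * bad_files / total_files
pct_blocks_bad = 100 * blocks_bad / blocks_total
print(total_files, bad_files)
print(f"{pct_files_bad:.1f}% of files, {pct_blocks_bad:.1f}% of blocks")
```

The rows do add up to the posted totals. Under the assumption stated above, fewer than 1% of the files account for about 8% of the blocks, which would suggest the damaged files are on average much larger than the typical file on the volume.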

Some 1800 files were placed in [SYSLOST]

Of the index files, a couple were signaled as bad due to the files being
locked.  Some EXEs were actually PC/DOS files.

For the OBJ files, I suspect the corruption rate is genuine. 

For the text files, I used TYPE/OUTPUT=NLA0:, but my login caught some
non-text files that had RFM=VAR (I had excluded decw$book files but
forgot many others).

ZIP files were tested with UNZIP -t . However, this included some .ZIP
files that came from the Mac and were transferred as MacBinary (with 128
extra bytes at the top). Also, UNZIP -t complained about some of
the JAVA files in my mozilla directory that ended with .ZIP. (would java
files in a .ZIP container be bona fide .ZIP files ?)

DIR indicated 0 files because I fixed them all manually.

Of the 1800 files, there are an awful lot of what appear to be .ZIP

I didn't test any of my many WPSPLUS files (.WPL).  

One of the indexed files is a critical ALLIN1 shared area index. It is
some 46000 blocks. The first 40,000 appear undamaged. The last 3000
appear undamaged. So I'll have to try to find some way to extract the
good portions of the file and then rebuild it. I have a backup that is
about a year old, and will probably want to merge it intelligently to
include records pointing to files that are still there but no longer in
the reconstructed file. (this file contains all sorts of document/email
file attributes such as the from, to, cc, subject etc).
Reply jfmezei.spamnot (8807) 12/16/2006 6:50:53 AM

JF Mezei wrote:
> An update.
:
> One of the indexed files is a critical ALLIN1 shared area index. It is
> some 46000 blocks. The first 40,000 appear undamaged. The last 3000
> appear undamaged. So I'll have to try to find some way to extract the
> good portions of the file and then rebuild it.

Find the last valid data level bucket in the first chunk.
Use my ZAP tool to patch its next pointer to point to the first valid
bucket in the last chunk.
Try CONVERT... it will probably barf when a bucket split
points it back into the missing zone. Patch your way forward, using
INDEX buckets as hints for the next.
In the end you may need CONVERT/SORT if you got the order wrong.
Only focus on the primary key, convert does not need the alternate keys for
extract.

> I have a backup that is
> about a year old, and will probably want to merge it intelligently to
> include records pointing to files that are still there but no longer in
> the reconstructed file.

Good plan. You may also use its level-1 index (with ANA/RMS/INT ...
DO... DO.. DO KE.. DO IND.. DO.. DO.. DO......) as a suggestion for
where buckets might be in the current file.

Hein.

Reply heinvandenheuvel2 (577) 12/16/2006 2:24:53 PM

Hein RMS van den Heuvel wrote:
> Use my ZAP tool to patch its next pointer to point to the first valid
> bucket in the last chunk.

Thanks. I need to find the silver lining in this. This is really getting
me to learn more about the file system.

> Good plan. You may also use its level-1 index (with ANA/RMS/INT ...
> DO... DO.. DO KE.. DO IND.. DO.. DO.. DO......  as a suggestion for
> where buckets might be in the current file.

I haven't used ana/rms/int yet. Something more to learn :-)
Reply jfmezei.spamnot (8807) 12/16/2006 11:05:49 PM

Update.

Feels a bit like a Star Trek episode where, after some attack by the bad 
guys, the Enterprise slowly comes back to life as each system is brought 
back online.

Recovering from this disc corruption will be a progressive thing. There are 
a number of executables that were zapped, and so far, I have been able to 
rebuild a number of them by simply recompiling them. For some, there are 
missing/corrupt source files.

Right now, Scotty has just announced that Mozilla is available again. I've 
partially reconstructed my Mozilla directory. But I need to reinstall all 
the add-ons to make it more functional. But at least my prefs.js and 
bookmarks files were recovered.


Next step will be working on restoring ALLIN1 to service. This is where 
the most damage was done since this is where there was a lot of recent 
movement of files. The attack against my disk seemed to focus more 
on recent files since this is where there would most likely be free blocks 
on the other drives.

Backing up one's Mozilla configuration and added components doesn't seem 
too obvious.  (the directory structure contains your cache with a gazillion 
files as well as newsgroup config files that are downloaded from the news 
server).
Reply jfmezei.spamnot4 (5184) 12/18/2006 9:19:59 AM
