
Datasave

Download DATASAVE.EXE:
datasave.zip (23K)

Jim Kyle's article on how to use this utility is given below as it appears in the current issue of Btrieve Developer's Journal.

Data Recovery the Easy Way

We've all been there at one time or another. You've just delivered a finished custom system. Your customer is extremely happy with it. You're sitting back, taking pride in a job well done. Then the phone rings.

Your customer is on the other end of the line. Her voice has a note of panic: "We just lost power over here for an hour. It's back on now, but we can't find any of the stuff that we spent all morning typing into the system! Can you help us?"

This doesn't happen nearly as often as it once did, since the introduction of shadow imaging at Btrieve's version 6.1, but occasionally it still does. One of Btrieve's strongest points is its ability to protect data in cases like this, but not even Btrieve can always work miracles.

When you need to recover data from a damaged Btrieve file, you can choose from a wide selection of tools. They include the Save and Restore options in BUTIL. Alternatively you can use a third-party editing/maintenance package. Both Doug Reilly (Access Microsystems, BTFILER) and André Larmet (C-Soft, BTEdit) offer these. Such tools can usually bring back most of the data after a serious database crash.

Sometimes, though, Btrieve just won't open the damaged file at all. Since all the usual data recovery tools require the Btrieve engine for getting the records out of the file, they're unusable when this happens. Fortunately, a method exists for recovering most if not all of the data from a damaged Btrieve file without even loading the Btrieve engine. While it won't recover information from an encrypted file, it does get almost everything else. To work this magic, you need a program that uses undocumented knowledge of the Btrieve file format.

I came up with just such a program while writing "Btrieve Complete," and it's one of the samples in the book. I learned the value of DUMPDATA.EXE a year ago, when a power failure crashed a just-loaded Btrieve database at a customer site, before they had created any backup files for the data. Btrieve wouldn't even recognize the damaged data file as being a valid Btrieve file. DUMPDATA, however, had no problems with it, and recovered every record. This saved the customer many hours of re-keying effort and convinced me that the technique should be more widely known.

However, the original DUMPDATA.EXE file that's in the book and on its accompanying diskette has a serious (though not fatal) error that can drive you to distraction should your data bring it out of hiding. The bug affects only compressed files with variable-length records. However, any time you need to recover such a file there's a good chance that the program will go quietly crazy and attempt to uncompress the next 4 gigabytes of data from your memory buffer! Never mind that you don't have that much available; the program will try anyway.

Fortunately, files of this sort are rare enough that I've had only three reports of the problem, and all three came from the same organization over an 18-month period. Even with a sample file that could trigger the bug every time, it still took me several weeks to track it down and apply fixes; it's just that obscure. The bug is now laid to rest, and a bit later in these columns I'll explain it in detail. To avoid confusion with the original buggy version, I've changed the program's name to DATASAVE.EXE. You don't need to know all the technical details, though, to use DATASAVE. Just follow the bouncing ball; that is, the procedure I'm about to describe.

Getting the File Package

The very first step, of course, is to have your own copy of DATASAVE.EXE available to use. You can download the ZipFile containing both the full program source and the executable file, from BDJ's Web site at /bdj/datasave.html.

Once you've downloaded the file onto your own system, use PKUnZip, WinZip, or the other unzip program of your preference to extract the EXE file from the package. Copy DATASAVE.EXE onto a floppy diskette so that you can easily take it to your customer sites when need be. Note that the program runs under MS-DOS, not under Windows. You may want to format the floppy as a bootable system diskette, just in case the customer hasn't made the DOS interface available at all workstations.

The ZipFile also includes everything you need to recompile the EXE, using either Borland or Microsoft 16-bit C compilers. This lets you use the modules within the source file as parts of your own custom package should you want to do so. If you don't need the source files or change log, you don't need to extract them from the ZipFile. Only DATASAVE.EXE is necessary for data recovery.

Recover the Data First

The first step in using DATASAVE to recover data from a damaged file is to change the working directory to the one that contains the damaged file. Then at the command prompt type A:DATASAVE FileName, where "FileName" is the name and extension of the file you're trying to recover. If you copy DATASAVE.EXE into the same directory, the "A:" prefix to the command isn't necessary.

You'll then see a banner and copyright notice from the book, which still lists the original name DUMPDATA rather than the new one. It'll be followed by two lines like these (although the values may differ and should match those of the file you're trying to recover):

File TEST.DAT is in new format; pagesize = 3072, has 25 pages
Record count = 126 (Compressed variable-length).

The screen will also show the following prompt:

Write SAVE file, or VIEW data on CRT (S or V )?

The proper response here, for recovering data, is S. The program acts on the first key you press without waiting for another keystroke (such as Enter), but if the key is anything other than S or V, it merely repeats the prompt. When you press S, you get a prompt for the name of the file in which to write the recovered data (mysave.asc was my response to this prompt):

Save to filename: mysave.asc

DATASAVE then goes through the file, recovering every record that it can locate and writing each record to the file you told it to use. The program writes each record in the format that BUTIL -LOAD expects, so that the final file can provide direct input to the standard tool in a subsequent step.
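
As a rough illustration of what "the format that BUTIL -LOAD expects" means, here is a minimal sketch. It assumes the usual sequential-file layout: the record length in ASCII digits, a comma, the raw record bytes, and a CR/LF pair. That layout is an assumption stated here for illustration, and the function name format_load_record is hypothetical; it is not taken from the DATASAVE source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one record into `out` as "<length>,<data>\r\n".
 * ASSUMPTION: this layout is what BUTIL -LOAD reads; verify against
 * your BUTIL documentation before relying on it.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t format_load_record(unsigned char *out, size_t out_cap,
                          const unsigned char *rec, size_t rec_len)
{
    char prefix[24];          /* room for the digits, comma, and NUL */
    int prefix_len;
    size_t total;

    assert(out != NULL && rec != NULL);

    prefix_len = sprintf(prefix, "%lu,", (unsigned long)rec_len);
    total = (size_t)prefix_len + rec_len + 2;   /* +2 for CR and LF */
    if (total > out_cap)
        return 0;

    memcpy(out, prefix, (size_t)prefix_len);
    memcpy(out + prefix_len, rec, rec_len);     /* data may contain NULs */
    out[(size_t)prefix_len + rec_len] = '\r';
    out[(size_t)prefix_len + rec_len + 1] = '\n';
    return total;
}
```

Because the length prefix tells the loader how many bytes to read, the record data itself can safely contain commas, line breaks, or binary zeros.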

When the recovery operation completes, DATASAVE returns to the command prompt with no additional messages.

You can use the V option if you want to view the file in a combination hex-value and ASCII format. This can be helpful if you need to rebuild DDF files or do some similar operation requiring knowledge of the record layout, but is seldom useful for data recovery.

Next, Create a New File

Once you've written a recovery file with all the records possible, the next step is to create a new, empty data file with the same record layout. One way to do this is with the BUTIL -CLONE option, which sometimes works even though BUTIL cannot recover any data from the original. If you have DDF files for the package you can rename the original damaged file, then use the DDF files to create a new empty version. Finally, if you have an older backup copy of the file (even though its data is no longer valid) you can restore that copy to obtain an undamaged version that has the proper format, then CLONE that one to provide an empty copy.

Other methods, also, may occur to you. It doesn't really matter how you get there; the goal is to obtain a totally empty file with the name and record structure of the one you're recovering. Once you've gotten this far, the rest is easy.

Finally, Restore the Recovered Data

The final step in this data recovery technique is to use the recovery file you created with DATASAVE as input to the BUTIL -LOAD option, to put all recovered records into your replacement copy of the file. This assures that Btrieve itself does all needed indexing.

It's possible that you may fail to recover every record from the damaged file. If the original crash made an entire data page unreadable, for example, then every record on that page will vanish. Nevertheless when you work with large volumes of data, it's much less painful to lose a page of data than it is to lose the entire file. Once the recovery process completes, you need to print a full report of all records so that the users can verify that all records have been restored, or identify the lost ones if this isn't the case.

With Version 6 format files, it's also possible (though unlikely) that you may recover invalid records. That is, you may recover both the pre-change and post-change copies of one or more records. This can happen if the Page Allocation Tables (PAT pages) themselves were damaged. However, I've never heard of this happening; it's possible, but not worth worrying overmuch about. Again, careful proofreading after recovery can catch such things.

How DATASAVE Works

If you're curious about the way this approach works, here's an abbreviated explanation. The complete details take up a full chapter in Btrieve Complete, and obviously we don't have room for that here.

Any Btrieve file, as you probably know, consists of a sequence of fixed-size pages that can be any multiple of 512 bytes up to 4,096. Each page can be one of several "page types" but for our purposes we can classify them into just four types: header pages, data pages, PAT pages, and all other kinds (including index pages). PAT pages didn't exist before Version 6.0.
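
The page-size rule is easy to express in code. The following is a minimal sketch only; the function names are hypothetical, and the page-count calculation assumes the file length is an exact multiple of the page size.

```c
#include <assert.h>

/* Returns 1 if `size` is a legal page size as described in the text:
 * a multiple of 512 bytes, from 512 up to 4,096. */
int is_valid_page_size(unsigned long size)
{
    return size >= 512UL && size <= 4096UL && (size % 512UL) == 0;
}

/* Number of whole pages in a file of `file_len` bytes.
 * ASSUMPTION: the file is nothing but back-to-back pages. */
unsigned long page_count(unsigned long file_len, unsigned long page_size)
{
    assert(is_valid_page_size(page_size));
    return file_len / page_size;
}
```

For a file like the TEST.DAT sample shown earlier, a length of 76,800 bytes with a page size of 3,072 gives the 25 pages that the banner reports.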

Except for the header page or pages (Version 6 formatted files have two header pages while older formats have only one) every Btrieve page begins with a 32-bit page-ID area followed by a 16-bit "usage count."

Every data page has the most significant bit of its usage-count field set to "1." Thus, if the usage count field for any page has an unsigned value less than 32,768, that page cannot be a data page. However, if the value is 32,768 or greater, it DOES hold data. Each data page contains another field that tells how many of the possible records on that page are valid.
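
In code, the data-page test comes down to one bit. This sketch assumes the usage count sits at byte offset 4 (right after the 32-bit page-ID area) and is stored low byte first, as is normal on Intel machines. The names are hypothetical and are not taken from the DATASAVE source.

```c
#include <assert.h>
#include <stddef.h>

#define USAGE_OFFSET   4        /* ASSUMPTION: follows the 4-byte page ID */
#define DATA_PAGE_FLAG 0x8000u  /* most significant bit of the usage count */

/* Reads a 16-bit little-endian value without relying on host byte order. */
unsigned int read_u16_le(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Returns 1 if the page's usage count is 32,768 or greater. */
int is_data_page(const unsigned char *page)
{
    assert(page != NULL);
    return (read_u16_le(page + USAGE_OFFSET) & DATA_PAGE_FLAG) != 0;
}
```

Testing the bit and comparing the unsigned value against 32,768 are equivalent; the bit test simply avoids any accidental signed comparison.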

DATASAVE's algorithm simply starts with the first page of the file, which is always a header page. From that page it extracts essential information such as the page size itself, the file's format, the data storage type (fixed or variable length, compressed or uncompressed), and the number of records to expect. The program then begins cycling through the pages from front to back, looking for those pages with usage-count fields of 32,768 or greater.

Each time it finds a data page, the program sets up a record counter and cycles through the records on that page, processing each in turn.
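
The front-to-back scan might be sketched as follows, working on a file image already read into memory. This is an illustration under stated assumptions, not the DATASAVE code: the function and type names are hypothetical, the usage count is assumed to be at byte offset 4 in little-endian order, and the caller supplies the number of header pages to skip (one for older formats, two for Version 6).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the page's 16-bit usage count is 32,768 or greater.
 * ASSUMPTION: the count occupies bytes 4 and 5, low byte first. */
int page_holds_data(const unsigned char *page)
{
    unsigned int usage = (unsigned int)page[4] | ((unsigned int)page[5] << 8);
    return usage >= 32768u;
}

/* Callback invoked once for every data page found. */
typedef void (*page_visitor)(const unsigned char *page, size_t page_no,
                             void *ctx);

/* Walks the pages in order, skipping the header pages, and calls
 * `visit` (if not NULL) for each data page. Returns the number found. */
size_t scan_data_pages(const unsigned char *image, size_t image_len,
                       size_t page_size, size_t header_pages,
                       page_visitor visit, void *ctx)
{
    size_t found = 0;
    size_t page_no;

    assert(image != NULL && page_size >= 512);

    /* Only whole pages are examined; a truncated tail is ignored. */
    for (page_no = header_pages;
         (page_no + 1) * page_size <= image_len;
         ++page_no) {
        const unsigned char *page = image + page_no * page_size;
        if (page_holds_data(page)) {
            ++found;
            if (visit != NULL)
                visit(page, page_no, ctx);
        }
    }
    return found;
}
```

The visitor callback is where the per-record work would go: set up the record counter for that page and process each record in turn.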

The details of processing vary, depending on the record's storage format. Variable length records, for instance, contain linkage to other page types that contain the variable parts of the records. Compressed records require internal processing to reverse the compression. The choice of which processing to apply takes place as each record appears, and all the processing routines come back together at the end of each record's actions.

The Elusive Insect

The rare bug that afflicts the original DUMPDATA program sneaked in because it's possible to have several unused pointer positions within an internal table that's part of the variable-length record processing. Such unused positions contain a key value of 0xFFFF to tell the Btrieve engine to skip over them during data recovery.

My original DUMPDATA program detected the first such unused pointer and ignored it. Unfortunately, having skipped over the first 0xFFFF value, it then quit looking for any more. Consequently when two or more consecutive pointer positions held values of 0xFFFF, the program skipped over the first one but accepted the second as being valid.

Accepting an invalid pointer as "good" was bad enough, but the problem was even worse. At this point, the program calculates the number of bytes to process by subtracting one pointer value from another. When the second pointer is 0xFFFF and the first cannot be greater than 0x0FFF (the maximum size of a Btrieve page, minus 1), this guarantees that the most significant four bits of the difference will all be "1." During the calculation, these bits force sign extension of the result, and create a final length value in the neighborhood of 4 billion! The program outruns its memory limits immediately.
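
The arithmetic can be reproduced in a few lines. This sketch assumes the difference was held in a signed 16-bit quantity (the natural int size for the 16-bit compilers mentioned earlier) and then widened to a 32-bit length; the function names are hypothetical. The sign extension is done by hand so the result doesn't depend on compiler-specific conversion rules.

```c
#include <assert.h>
#include <stdint.h>

/* Widens a 16-bit value to 32 bits the way a signed conversion would:
 * if bit 15 is set, the upper 16 bits are filled with ones. */
uint32_t sign_extend_16_to_32(uint16_t v)
{
    return (v & 0x8000u) ? (UINT32_C(0xFFFF0000) | v) : (uint32_t)v;
}

/* Models the faulty length calculation: second pointer minus first,
 * truncated to 16 bits, then sign-extended to a 32-bit byte count. */
uint32_t buggy_length(uint16_t second_ptr, uint16_t first_ptr)
{
    uint16_t diff16 = (uint16_t)(second_ptr - first_ptr);
    return sign_extend_16_to_32(diff16);
}
```

With a second pointer of 0xFFFF and a first pointer of 0x0FFF, the 16-bit difference is 0xF000, and the widened length is 0xFFFFF000, or 4,294,963,200 bytes. Two ordinary pointers such as 0x0200 and 0x0100 give the expected 256.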

To fix the problem, I added an additional variable (named "skipper") within the record-processing module, and created an additional search loop at the point where the unused pointers are detected, so that all of the unused pointers get passed over instead of just the first one. While I was at it I made a number of cosmetic changes also, and quadrupled the size of the work buffer to allow for expansion of records up to a size of 16K bytes.
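
The shape of that fix can be sketched like this. Only the variable name "skipper" comes from the actual program; the function name, the table representation as an array of 16-bit values, and the calling convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define UNUSED_PTR 0xFFFFu   /* marks an unused pointer position */

/* Starting at index `start`, steps past every consecutive unused
 * pointer rather than just the first one. Returns the index of the
 * next in-use pointer, or `count` if none remains. */
size_t skip_unused(const unsigned short *ptrs, size_t count, size_t start)
{
    size_t skipper = start;

    assert(ptrs != NULL);

    while (skipper < count && ptrs[skipper] == UNUSED_PTR)
        ++skipper;
    return skipper;
}
```

Given a table holding 0x0100, 0xFFFF, 0xFFFF, 0x0200, a search starting at index 1 now lands on index 3. The original logic would have stopped at index 2 and treated 0xFFFF as a valid pointer.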

Summing Up

The revised program has proved its worth in many situations. While I've copyrighted the source code, you can use the executable as a tool for diagnosis and data recovery with no restriction. The purpose of the copyright is primarily to prevent anyone from re-selling DUMPDATA or DATASAVE intact as a product (a secondary purpose is to protect publication rights, of course).

Download a copy today, and try it with some sample files. When the time comes that everything else fails, you'll be glad you did.


Copyright © 1988, 1998 by Smithware, Inc. All rights reserved.
Btrieve Developer's Journal is published by Smithware, Inc. Btrieve is a registered trademark of Pervasive Software, Inc. All other words or phrases are trademarks or registered trademarks of their respective manufacturers.