Chapter 6 - Archiving Document Images
Archiving is the process of moving documents from on-line storage to permanent storage. Typically, this means copying the images or documents from a local or network disk drive to a CD Stage Area, an Optical Disk, or some other permanent storage area. Only indexed documents are archived and the archive process removes the document from the basket. Multiple in-baskets can be selected.
A Cartridge is a permanent storage device for images. Typically, these are magnetic disks or optical disk cartridges. Refer to Chapter 1 for information on setting up cartridges.
A CD Stage Area is a 625 megabyte hard disk partition, typically on a file server. Images are archived or "staged" onto the CD Stage Area and, once it becomes full, CD-ROM disks are made from the partition as an image backup. The contents of the CD Stage Drive are then moved to a permanent location such as a Large Disk Partition or Snap Server.
There are a number of ways to run Archive including:
Manual – this option lets the user select a basket or baskets to archive and immediately runs the archive, showing the progress as it goes.
Auto Archive with no cartridge rotation – this option runs the archive as a scheduled task, typically overnight. A pre-configured screen is used to select what baskets are archived to which cartridges. It essentially runs the same as the manual option but unattended.
Auto Archive with cartridge rotation – this option runs the archive as a scheduled task. It can also create new cartridges automatically and limit the amount of space used on a cartridge to facilitate backing up cartridges to CD or DVD.
The options are discussed in greater detail below.
Manual archive lets the user run archive interactively. The steps to running a manual archive include:
- Click the archive icon from the main tool bar or select FILE-ARCHIVE from the main menu. The first window displayed will list the cartridges that are defined.
- Select the cartridge to which you wish to archive from the list of defined cartridges. The system will verify that the cartridge is available.
- The next screen is similar to the one shown on the following page. The list box on the left shows in-baskets that will be archived. Initially, this box is loaded with the current in-basket.
- Add baskets that you wish to include in the archive using the Add Basket button.
- Remove baskets that you wish to exclude from the archive using the Remove Basket button.
- Review the statistics shown in the middle of the screen, to make sure enough space is available for the images to be archived.
- The Delete Images after Archive check box should be checked if you want the original images to be removed from the hard disk after they have been placed onto permanent storage.
If you do not check the Delete Images after Archive box, it is your responsibility to 'clean up' archived images. You can use the File Manager or the DOS Delete command to do this.
- The Save Data for Archived Images to Cartridge check box will cause the system to save tab-delimited files of archived information. This is a great disaster recovery tool. The system saves aaadd_prime.txt (primary information) and aaadd_multi.txt (multi-entry information) to the application folder on the cartridge (aaa is the Application Id and dd is the Database Id).
- Press the Begin Archive button to start the archive process.
The archive process searches the selected in-baskets for documents that have been indexed. When an indexed document is found, the images related to that document are copied from the in-basket directory to the selected cartridge. Refer to the How Images are Stored on Cartridges section below for a discussion of the directory structure into which images are placed on the cartridge.
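The archive pass described above can be sketched in a few lines of Python. This is only an illustration of the logic, not halFILE's implementation: the `Document` record, the `archive_basket` function, and the flat destination directory are all hypothetical names and simplifications introduced here.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical record for one document sitting in an in-basket."""
    doc_id: str
    indexed: bool
    images: list = field(default_factory=list)  # list of Path objects


def archive_basket(documents, cartridge_dir, delete_after=False):
    """Copy the images of indexed documents to cartridge_dir.

    Returns the ids of the documents that were archived.
    """
    cartridge_dir = Path(cartridge_dir)
    cartridge_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for doc in documents:
        if not doc.indexed:
            continue  # only indexed documents are archived
        for image in doc.images:
            shutil.copy2(image, cartridge_dir / image.name)
            if delete_after:
                # corresponds to the Delete Images after Archive check box
                image.unlink()
        archived.append(doc.doc_id)
    return archived
```

With `delete_after=False` the original images stay in the basket directory, which matches the case where cleanup is left to the user.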
How Images are Stored on Cartridges
When images are stored on cartridges, they are placed into specific directories. The highest directory level is the application id. The second directory level used depends on the folder option specified for the application (set using File-Application). The following illustrates the directories used for the various folder options.
No limits on number of images per folder
With this folder option, the user enters the folder id and there are no restrictions on how many images can be placed into the folder. The directory structure for this option is as follows:
Application Id _____ Folder Id
Therefore, if the application id is 'TST' and the folder is 'TEST' then images archived to cartridge are placed into the directory:
Limit number of images per folder and enter the folder id
With this option, the user enters the folder id and each folder is limited to a user-defined number of images. The directory structure on the archive cartridge for this option is as follows:
Application Id ______ Folder Id ______ Folder Counter
Therefore, if the application id is 'TST' and the folder is 'TEST' and the number of images per folder is 500, then the first 500 images are placed into the following directory.
Limit number of images per folder and auto assign the folder id
With this option, each folder is limited to a user-defined number of images and the system automatically defines the folder id. The directory structure on the archive cartridge is as follows:
Database Level Archiving
An option lets you include a database level folder in the archive folder structure. This option is defined on the General tab of the Auto Archive configuration screen (Tools-Configure Auto Archive menu in the halFILE Administrator). When this option is enabled, the folder structure is as follows:
Note: The system will never split the images for a document across two cartridges. If, based on the number of images per folder, there is not enough room in the folder counter directory, the document images will be copied to the next folder counter directory. Therefore, it is possible to have fewer images in a directory than is designated by the number of images per folder parameter.
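The folder layout and the rule in the note above can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions rather than documented behavior: the folder counter is written as a plain integer, and a document larger than the per-folder limit is given a folder of its own.

```python
from pathlib import PureWindowsPath


def cartridge_dir(app_id, folder_id, counter=None, db_id=None):
    """Build the directory for a document's images on the cartridge."""
    parts = [app_id]
    if db_id is not None:
        parts.append(db_id)  # database level folder beneath the application folder
    parts.append(folder_id)
    if counter is not None:
        parts.append(str(counter))  # counter format is an assumption
    return PureWindowsPath(*parts)


def assign_counters(doc_image_counts, images_per_folder, start=1):
    """Return the folder counter for each document, in order.

    A document's images are never split: if they do not fit in the
    current folder counter directory, the whole document moves to the next.
    """
    counters = []
    counter, used = start, 0
    for count in doc_image_counts:
        if used > 0 and used + count > images_per_folder:
            counter += 1
            used = 0
        counters.append(counter)
        used += count
    return counters
```

For example, with a limit of 500 images per folder, documents of 200, 250, and 100 images land in counters 1, 1, and 2, so the first directory ends up holding 450 images rather than 500.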
halFILE's Archive utility includes an Auto Archive feature to archive selected baskets to selected cartridges on a scheduled basis. The program, HFARCHIVE32.EXE, can be run as a scheduled task on an NT Server, a SQL Server Agent job, or via another scheduling method. Multiple Archive sets can be set up to archive to different sets of cartridges.
Auto Archive Set up
You can configure Auto Archive by selecting the Tools-Archive Setup menu in the halFILE Administrator. This brings up a screen with several tabs for defining the various Auto Archive options as described below:
General tab
This tab is used to set up general auto archive options as follows:
- Enable Auto Archive Features - check this box to enable auto archive.
- Force recalculation of cartridges - as the screen suggests, this option is used when archiving to some NAS storage devices if copy requests periodically seem to fail.
- Enable database-level archiving - this option will create a database level folder beneath the application folder to further separate images on the cartridge.
- Low Disk Notify (in MB) – when the free space on the current archive drive falls to the amount configured in this box, an alert e-mail and a system message are sent as notification that the drive is getting low on space (see Alerts section).
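The Low Disk Notify check can be pictured as a simple free-space comparison. The sketch below is an assumption about the logic, not the program's actual code: the function names are hypothetical, and one megabyte is taken to be 1024 x 1024 bytes.

```python
import shutil

BYTES_PER_MB = 1024 * 1024  # assumed definition of a megabyte


def is_low_on_space(free_bytes, notify_mb):
    """True when free space has dropped to the notify threshold or below."""
    return free_bytes <= notify_mb * BYTES_PER_MB


def drive_is_low(path, notify_mb):
    """Check the drive that holds path, using the operating system's free-space figure."""
    usage = shutil.disk_usage(path)
    return is_low_on_space(usage.free, notify_mb)
```

A scheduled script could call `drive_is_low` on the archive drive and raise its own warning alongside the built-in alert.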
Log Options tab
This tab is used to configure how logging occurs during Auto Archive. It is strongly recommended that you use logging options and review the provided log files daily to ensure that archive is running properly.
- Disable Archive Logging - check this box to disable all logging.
- Append to archive logs - check this box to append to archive logs. Uncheck it to overwrite archive logs from previous jobs. It is recommended that you uncheck this box except when debugging problems.
- Turn on Debug log file - check this box to create an archive.dbg log file of detailed activity. This is useful when debugging set up problems.
- Use Dated Log - check this box to create dated log files. Normally log files are named autoarchauto1.log (where 1 is the archive set number). With this option the log file is named HFARCHIVE32_YYYYMMDD_Auto1.log (where YYYYMMDD is year, month, and day, and 1 is the archive set number).
- Path to backup dated archive log files - if the Use Dated Log box is checked, enter the path to store these files. You must create the folder.
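The two log naming patterns given for the Use Dated Log option can be reproduced with a small helper, which may be handy when a script needs to locate the log for a given day. The function name `log_file_name` is hypothetical; only the file name patterns come from the text above.

```python
from datetime import date


def log_file_name(set_number, dated=False, today=None):
    """Return the archive log file name for an archive set."""
    if not dated:
        return f"autoarchauto{set_number}.log"
    today = today or date.today()
    # YYYYMMDD is the year, month, and day of the run
    return f"HFARCHIVE32_{today:%Y%m%d}_Auto{set_number}.log"
```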
Auto Archive tab
- Archive Set - You can have many archive sets. This drop down box lets you select the archive set to configure (<NEW> lets you configure the next available set).
- Set Description - enter a description of the archive set.
- Source tab - Select the Application, Database and Basket to archive and click Add to include it in the archive. Continue to select all the baskets for this archive set. The Add All button is used to add all the baskets for the currently selected database. If you delete a basket, you should remove it from the archive set using the Remove button.
- If the auto-rotate cartridges option is checked, archive automatically rotates to a new cartridge when a cartridge reaches a certain size. It is recommended that the auto-rotate cartridges box be checked.
- The Include ALL Baskets in the database check box can be used if you always want to archive all baskets in the database. In this case, it is not necessary to add the baskets into the list of items to be archived.
This tab is used to configure the destination cartridges and drives for the auto archive. The Use Defaults button usually can configure this screen properly.
- Cart Name Mask - this field defines how new cartridges are named. Use %appl% to substitute the Application ID and %doctype% to substitute the Document Type. Use # to designate the cartridge number, with as many # placeholders as there are digits in the maximum cartridge number. For example, if your Application ID is XXX, your Document Type ID is TP, and the mask is %appl%%doctype%###, then the first cartridge created is named XXXTP001. Cartridge names are limited to 8 characters, so the name produced by the mask should not exceed 8 characters.
- Cart Prefix - defines the cartridge prefix to include when creating new cartridges. You do not need to include the cartridge name prefix; it will automatically be included.
- Cart Description - defines the cartridge description to use when creating a new cartridge. Again, you can use the %appl%, %doctype% and ### substitution values.
- Size limit (in MB) - defines the size limit in megabytes for new cartridges. When the files on the cartridge reach this size, the cartridge is considered full and the system rotates to a new cartridge.
- Cart type - defines the cartridge type to use for new cartridges. This should match the drive type used when the Archive Drive was created (Configure-Drives).
- Drive - select the drive to archive to from the list of drives provided. These drives are defined using the Configure-Drive menu.
- Overflow Drive - should the drive configured in the Drive box become full, then auto archive will begin using this drive to archive to, if it is defined.
- Current Cart Num - this shows the current cartridge number being used by Auto Archive. When you first set up archive, you would create the first cartridge and enter a 1 in this box. For example, if you wanted cartridges named in the IMG00001 form, then you would use the Configure-Cartridges menu to create IMG00001. Then enter a 1 in this box. The auto-rotate feature will create IMG00002 and enter a 2 in this box when the IMG00001 cartridge becomes full.
- Save DB Information - checking the ON box will make the system create text files containing the data for all the images placed on the cartridge. This is a good disaster recovery technique and so we strongly recommend the use of this option.
- Log System Messages - checking the ON box will log messages to the System Message area which is displayed when users login to halFILE. Normally, this option is set to OFF since we recommend checking the log files or receiving the e-mail alert of archive activity.
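The interaction between Size limit and Current Cart Num can be sketched as follows. This is a simplified model under stated assumptions, not the program's logic: the function names are hypothetical, the cartridge is treated as full once its used space reaches the limit, and the IMG00001 naming form is taken from the Current Cart Num example.

```python
def rotate_if_full(cart_num, used_mb, size_limit_mb):
    """Return (cart_num, used_mb) after applying the size limit check."""
    if used_mb >= size_limit_mb:
        # cartridge is considered full: move to the next number, starting empty
        return cart_num + 1, 0
    return cart_num, used_mb


def cart_name(cart_num, prefix="IMG", digits=5):
    """Format a cartridge name such as IMG00001."""
    return f"{prefix}{cart_num:0{digits}d}"
```

Starting from cartridge 1 with a 625 MB limit, `rotate_if_full(1, 625, 625)` moves to cartridge 2, whose name under this scheme is `cart_name(2)`.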
This tab is used to set up e-mail alerts of the Auto Archive activity to selected users. Enter the SMTP Server Address or Server Name in the SMTP Server box then enter the e-mail addresses to which the message should be sent. This feature requires HALSMTP.EXE, halFILE's SMTP e-mail program.
It is recommended that documents for different databases be archived to different sets of cartridges. To do this you should set up an archive set for each database and name the cartridges in a way that the database is identified. One good method is to use the database id as part of the cartridge mask. So, if you have an application id of HAL and two databases with database ids of DD and TP then the first auto archive set could be set up to archive documents for application HAL, database DD to cartridges using the cartridge mask DD#####. The first cartridge is named DD00001. Then a second archive set would be set up to archive documents for application HAL, database TP to cartridges using the cartridge mask of TP#####. The first cartridge is then named TP00001.
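To check a cartridge mask before entering it, the substitution rules can be modeled in Python. The function `expand_cart_mask` is a hypothetical helper written for this illustration; it assumes each run of # characters is replaced by the zero-padded cartridge number and that the 8-character limit applies to the resulting name.

```python
import re


def expand_cart_mask(mask, app_id, doc_type, cart_num):
    """Expand a cartridge name mask into a cartridge name."""

    def fill(match):
        width = len(match.group(0))
        digits = str(cart_num).zfill(width)
        if len(digits) > width:
            raise ValueError("cartridge number needs more # placeholders")
        return digits

    # Replace the # run first so ids containing '#' are not touched.
    name = re.sub(r"#+", fill, mask)
    name = name.replace("%appl%", app_id).replace("%doctype%", doc_type)
    if len(name) > 8:
        raise ValueError(f"cartridge name {name!r} exceeds 8 characters")
    return name
```

Under these assumptions, the mask DD##### with cartridge number 1 yields DD00001, and %appl%%doctype%### with XXX and TP yields XXXTP001, matching the examples in the text.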
Running Auto Archive Interactively
To run the archive routine, the command line is:
HFArchive32 Auto<n>
Where <n> is the number of the archive set. For example, HFArchive32 Auto1 will run archive set #1. For each basket being archived, a screen showing its progress is displayed. Once the basket is complete, the screen disappears as the program prepares to archive the next basket in the archive set. If there are many documents in the basket, the screen may disappear for some time while the program calculates the size of the files to be archived.
Running Auto Archive as a Scheduled Task
HFARCHIVE32 can be set up as a scheduled NT task or as a job in SQL Server Agent using a batch file containing the same command line as above. You can set up multiple tasks to run different archive sets.
Be sure to configure the user that the task runs under as a network user who has the rights to run halFILE and has access to the archive drives and basket folders.
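Where a scripting language is preferred over a batch file, the same command line can be launched from a small wrapper. The sketch below is only one possible approach: the function names are hypothetical, the executable is assumed to be reachable by the name given in `exe`, and the meaning of the program's exit code is not documented here, so the wrapper simply returns it.

```python
import subprocess


def archive_command(set_number, exe="HFARCHIVE32.EXE"):
    """Build the command line for one archive set, e.g. HFARCHIVE32.EXE Auto1."""
    return [exe, f"Auto{set_number}"]


def run_archive_set(set_number, exe="HFARCHIVE32.EXE"):
    """Run one archive set and return the process exit code."""
    result = subprocess.run(archive_command(set_number, exe))
    return result.returncode
```

A scheduled job could call `run_archive_set` once per archive set, in the same way that separate tasks can be defined for different sets.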
Reviewing Auto Archive Results
Auto Archive creates an autoarch.log file each time it runs. You can review this log to see if archive completed normally. You should also review baskets in halFILE using File | Basket Status as well as search for archived documents to ensure that documents are being properly archived.
The Alerts section of the configuration screen is very useful for reviewing results. This sends an e-mail containing the log file from archive. The log file contains the number of documents archived as well as notifications when a cartridge becomes full.
Other Auto Archive Features
Auto Archive can also be set up to:
- Send an e-mail notification of auto archive activity.
- Save the database information for archived documents.
- Bypass the CD Stage Area requirement while maintaining CD-sized or DVD-sized cartridges.
- Automatically assign the next cartridge in sequence.