[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]

An Overview of the DataBee
Extraction and Load Process

DataBee consists of five applications - each of which is dedicated to a particular function. The applications are used in sequence and form a process which accomplishes the task of creating a subset database.

Note: Download the DataBee Quick Start Guide for a complete discussion each of the DataBee software components and their usage.

Components
The applications are: the DDL Wizard, Set Planner, Set Designer, the Set Extractor and the Set Loader. A discussion of each application can be found on the Applications Overview page. This page also contains links to the individual help files for each application.

Set Planner
Makes decisions about the tables which will or will not be included in an extraction set and defines the components of the extraction set. The output of the Set Planner is a written report which can be referenced later during the creation of an extraction set.
Set Designer
Finds, adds and edits the rules which are used to extract the rows from the source database tables. The collection of rules used to create a subset database is called an extraction set.
Set Extractor
Uses the extraction set created by the Set Designer application to identify the rows from the source Oracle database which will be required in the destination database. This information is written to a binary file (on the local PC not the server) called an Extracted Set File.
Set Loader
Uses the Extracted Set File created by the Set Extractor application to pull the required rows from the source database to the target database. After the Set Loader has finished, the subset database is complete.
The DDL Wizard
Reads the DDL structure of the source database and generates (in a configurable way) the DDL source to recreate the much smaller version target environment.

Implementation Cycle
The steps taken to create an extraction set and to apply it to create a subset database generally follow the same sequence. The list below discusses each step and provides useful hints and tips for each stage. Most of the discussion below concerns the processes used to develop an extraction set. Extraction Sets need only be developed once - after they are set up correctly the subset database creation consists of a simple extract and load.

Make decisions about which tables to include
The subset database will have tables. For most subset databases some tables, such as reference tables, will be required in full. Some of the tables will be required in part and some will be irrelevant and can be excluded. The decisions about which tables to include (in whole or in part) or exclude should be made in consultation with the intended end user.

Decide on the base data
The base data is the initial rows which populate a few carefully chosen tables. Once these base tables are given a quantity of rows, the rules in the extraction set are used to make sure the rest of the tables in the schema have the rows required to support the base tables and each other. The base rows form the core of the subset database and determine the other rows which will be extracted to populate the subset database.

The decision on which base rows to take is probably the most crucial step in the process and quite a bit of care should be taken in their choice. It is quite possible to have more than one base table. With DataBee you can, for example, set up base data rules that take 5 percent of the invoices at random and also certain specified customers. One of the best ways to determine the most suitable tables for base data is to use the Set Designer Chain Finder tool to find the tables that will populate the most child tables.

Base rows are retrieved using special rules called WHERE Clause rules. There can be any number of WHERE Clause rules referencing any number of tables. The DataBee Set Extractor application will process these rules first and will use the rows they extract as a starting point for the extraction of the rest of the rows in the subset schema.

Get Schema, Constraints and build an Extraction Set
Once the decision has been made on which base rows to use, it is time to build the extraction set which will eventually create the subset database schema. Start the Set Designer application and use the Set Wizard tool to guide you through the process of creating an initial extraction set. The Set Wizard will ask all of the appropriate questions and build the basic structure of an extraction set on a step-by-step basis.

The Set Wizard will discover any referential integrity constraints in the database and build the DataBee rules to support them. It will also ask for initial table population rules (WHERE Clause rules). It may be necessary to add further WHERE Clause rules by pressing the "New Rules" button on the Set Rules tab.

Try an Extract
Having made a trial extraction set in the step above, the time has come to try it out. If the source database is in COST based optimizer mode it is important to make sure all of the tables in the schema have been analyzed. If the tables are analyzed, the extraction process will usually proceed quite quickly. If the tables are not analyzed then you might find one or two rules seem to retrieve rows extremely slowly. The lack of statistics on a table can sometimes cause Oracle to choose an inappropriate execution plan.

Start up the Set Extractor application and load the newly created extraction set into it. Note that as long as the extraction set has been saved it is not necessary to shut down the Set Designer application. Press the "Extract Data Using Set" button and observe the download progress in the Table Statistics tab. Did a satisfactory number of rows get downloaded for each base table? If not, the WHERE Clause rules will need to be edited (or additional ones added) in order to make sure the tables are appropriately populated.

If Table-To-Table rules were added to support the database referential constraint rows in the "Generate Rules from DB Constraints" step above there might well be rows extracted for other tables as well.

Update the Extraction Set
Examine the table statistics on the Set Extractor Table Statistics tab carefully and note which tables have rows and which do not. Tables for which there are no rows will need to have additional Table-To-Table rules defined in the Set Designer application to ensure that they are well populated. It might be useful to use the Set Extractor Extraction Reports button to save a copy of the table statistics to disk.

The task now is to manually add rules to the extraction set in order to make sure the tables which did not get any rows during the extract will get populated. The technique is to choose a table and add only one or two rules at a time and then try another extract. In this manner the effect of the newly added rule will be easy to determine. Manually adding rules usually requires business knowledge of how the tables are related, although the Set Designer does contain a number of tools to help discover them (see the tools on the Rule Tools page).

Try an Extract Again
Save the extraction set in the Set Designer, reload it in the Set Extractor and repeat the extract. Did the new rules cause any more rows to be extracted for other tables? Compare the row counts for this extract against the row counts for the previous one to see if the newly added rules should be kept. If a table has few or no rows even though it is directly required to do so by a rule from a populated table it generally means that the extracted rows for the Parent table simply don't contain data that forces any rows to be present. In general, the solution to such a problem is to adjust (or add) WHERE Clause statements to pick up additional rows to satisfy the conditions.

Repeat the Update and Extract until all Required Tables have Extracted Rows
The cycle of updates on the extraction set and test extracts from the source database repeat until the appropriate tables all have rows extracted for them. Once this point has been reached, the development of the extraction set is complete.

Build the Target Schema
Once the extract is complete sizing information (row counts & etc.) will be available. The DDL Wizard is used to generate the DDL source for the target environment. The DDL source is run and the structure of the target environment is created.

Perform the Load
The load of the rows into the destination schema is quite straight forward. Perform an extract of using the Set Extractor application and save the Extracted Set File to disk by pressing the "Detach Extracted Set" button. Shutdown the Set Extractor and start up the Set Loader application. Make the Set Loader aware of the extracted data by pressing the "Acquire Extracted Set" button.

Once the Extracted Set File has been acquired by the Set Loader, the load can begin. The easy way to load the data onto the destination schema is to use the Load Wizard - just press on the "Load Wizard" button. The Load Wizard contains a number of screens and panels designed to make the load process simple and straightforward - just answer the questions as they are presented and then move on to the next screen. Once the load is complete and constraints re-enabled the destination subset database is complete.


[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]