- Make decisions about which tables to include
- The subset database will have tables. For most subset databases some tables, such as reference tables, will
be required in full. Some of the tables will be required in part and some will be irrelevant and can be
excluded. The decisions about which tables to include (in whole or in part) or exclude should be made
in consultation with the intended end user.
- Decide on the base data
- The base data is the initial rows which populate a few carefully chosen tables. Once these base tables are
given a quantity of rows, the rules in the extraction set are used to make sure the rest of the tables
in the schema have the rows required to support the base tables and each other. The base rows form the core of the subset database and
determine the other rows which will be extracted to populate the subset database.
The decision on which base rows to take is probably the most crucial step in the process and quite a bit of care should be
taken in their choice. It is quite possible to have more than one base table. With DataBee you can, for example,
set up base data rules that take 5 percent of the invoices
at random and also certain specified customers. One of the best ways to determine the most suitable tables for
base data is to use the Set Designer Chain Finder tool to find the tables that will populate the most child tables.
Base rows are retrieved using special rules called WHERE Clause rules. There
can be any number of WHERE Clause rules referencing any number of tables. The DataBee Set Extractor application
will process these rules first and will use the rows they extract as a starting point for the extraction of the
rest of the rows in the subset schema.
- Get Schema, Constraints and build an Extraction Set
- Once the decision has been made on which base rows to use, it is time to build the extraction set which
will eventually create the subset database schema. Start the Set Designer application
and use the Set Wizard tool to guide you through the process of creating an initial extraction set. The Set Wizard
will ask all of the appropriate questions and build the basic structure of an extraction set on a step-by-step
basis.
The Set Wizard will discover any referential integrity constraints in the database
and build the DataBee
rules to support them. It will also ask for initial table population rules
(WHERE Clause rules). It may
be necessary to add further WHERE Clause rules by pressing the "New Rules" button on the Set Rules
tab.
- Try an Extract
- Having made a trial extraction set in the step above, the time has come to try it out. If the source
database is in COST based optimizer mode it is important to make sure all of the tables in the schema have
been analyzed. If the tables are analyzed, the extraction process will usually proceed quite quickly. If the tables are not
analyzed then you might find one or two rules seem to retrieve rows extremely slowly. The lack of statistics on
a table can sometimes cause Oracle to choose an inappropriate execution plan.
Start up the
Set Extractor application and load the newly created extraction set into it. Note
that as long as the extraction set has been saved it is not necessary to shut down the Set Designer application.
Press the "Extract Data Using Set" button and observe the download progress in the
Table Statistics tab. Did a satisfactory number of rows get downloaded
for each base table? If not, the WHERE Clause rules will need to be edited (or additional ones added) in order to
make sure the tables are appropriately populated.
If Table-To-Table rules were added to support the database referential constraint rows
in the "Generate Rules from DB Constraints" step
above there might well be rows extracted for other tables as well.
- Update the Extraction Set
- Examine the table statistics on the Set Extractor
Table Statistics tab carefully and note which tables have rows and which do not.
Tables for which there are no rows will need to have additional Table-To-Table rules
defined in the Set Designer application to ensure that they are well populated. It might be useful to use the Set
Extractor Extraction Reports button to save a copy of the table statistics to disk.
The task now is to manually add rules to the extraction set in order to make sure the tables which did not get any rows
during the extract will get populated. The technique is to choose a table and add only one or two rules
at a time and then try another extract. In this manner the effect of the newly added rule will be easy to
determine. Manually adding rules usually requires business knowledge of how the tables are related, although the Set
Designer does contain a number of tools to help discover them
(see the tools on the Rule Tools page).
- Try an Extract Again
-
Save the extraction set in the Set Designer, reload it in the Set Extractor and repeat the extract. Did the
new rules cause any more rows to be extracted for other tables? Compare the row counts for this extract
against the row counts for the previous one to see if the newly added rules should be kept. If a table has
few or no rows even though it is directly required to do so by a rule from a populated table it generally means
that the extracted rows for the Parent table simply don't contain data that forces any rows to be present. In
general, the solution to such a problem is to adjust (or add) WHERE Clause statements to pick up additional rows
to satisfy the conditions.
- Repeat the Update and Extract until all Required Tables have Extracted Rows
- The cycle of updates on the extraction set and test extracts from the source database repeat until
the appropriate tables all have rows extracted for them. Once this point has been reached, the development
of the extraction set is complete.
- Build the Target Schema
- Once the extract is complete sizing information (row counts & etc.) will be available. The DDL Wizard
is used to generate the DDL source for the target environment. The DDL source is run and the structure
of the target environment is created.
- Perform the Load
- The load of the rows into the destination schema is quite straight forward. Perform an extract of using the
Set Extractor application and save the Extracted Set File to disk by pressing the "Detach Extracted Set"
button. Shutdown the Set Extractor and start up the Set Loader application. Make the
Set Loader aware of the extracted data by pressing the "Acquire Extracted Set" button.
Once the Extracted Set File has been acquired by the Set Loader, the load can begin. The easy way to load
the data onto the destination schema is to use the Load Wizard - just press on the "Load Wizard"
button. The Load Wizard contains a number of screens and panels designed to make the load process simple and
straightforward - just answer the questions as they are presented and then move on to the next screen.
Once the load is complete and constraints re-enabled the destination subset database is complete.