The collection of information about a schema (its structure, its connection details and the extraction rules defined against it) is called an extraction set. Extraction sets are built and maintained with the DataBee Set Designer application.
Physically, extraction sets are stored on disk as standard XML text files. So yes, technically the extraction set files can be edited with a text editor. Please be aware, however, that manual editing is unsupported: the Set Designer application is the only supported way to modify an extraction set. We do recognize that situations may sometimes arise where a quick search and replace on the extraction set file can save hours of work. If you do edit an extraction set file by hand, be sure to save a backup copy first, and consider emailing Support@DataBee.com for advice before proceeding.
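If you do decide to make a quick search and replace, the following Python sketch shows one cautious way to go about it: it copies the file to a backup first and refuses to write a result that is no longer well-formed XML. It is only a sketch under stated assumptions. The `.bak` naming, the UTF-8 encoding and the function name `search_and_replace` are illustrative choices, not part of DataBee, and a file that is well-formed XML is not necessarily one the Set Designer will accept.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def search_and_replace(set_file: Path, old: str, new: str) -> Path:
    """Back up an extraction set file, then replace every occurrence of `old` with `new`.

    Returns the path of the backup copy. Raises ValueError, leaving the
    original file untouched, if the edited text is no longer well-formed XML.
    Assumes (hypothetically) that the file is UTF-8 encoded.
    """
    backup = set_file.with_name(set_file.name + ".bak")
    shutil.copy2(set_file, backup)  # always keep a backup copy first

    text = set_file.read_text(encoding="utf-8")
    edited = text.replace(old, new)

    try:
        ET.fromstring(edited)  # sanity check only: is it still well-formed XML?
    except ET.ParseError as exc:
        raise ValueError(f"edit would break the XML: {exc}") from exc

    set_file.write_text(edited, encoding="utf-8")
    return backup
```

A well-formedness check catches a broken tag or quote, but not a semantically wrong value, so reopening the set in the Set Designer afterwards is still the real test.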
Extraction sets are built with the DataBee Set Designer application and executed with the Set Extractor application. The two tools work together and communicate with each other: if both applications have the same extraction set open and an extract is run to completion in the Set Extractor, the Set Designer automatically updates its extracted row count statistics with the results of the run. This makes it easy to use the pair of applications together to iteratively build a set of extraction rules.
The recommended method of building an extraction set is to:
The list above is just a summary. The DataBee Quick Start Guide provides a detailed, practical, step-by-step walkthrough of the iterative DataBee design and extract methodology. Besides discussing the most effective method of building an extraction set (hint: do NOT add dozens of rules at once and hope for the best), the Quick Start Guide includes links to tutorials and help files which demonstrate the many tools supplied with DataBee and the techniques for using them. The Extraction and Load Process help file provides a summary of the subset database creation and population process.
As discussed above, an extraction set is a container for the connection information, table/index/foreign key structure details and for the extraction rules which define the relationships between tables and the data which must be extracted. All of this information is encoded in the extraction set in the form of rules.
One key rule, called a Rule Controller, describes the schema from which the data will be extracted. The Rule Controller also contains the schema structure (tables, indexes and foreign keys) and has dependent rules which describe the extraction relationships. Every extraction set has at least one Rule Controller (you can have more than one). Every other type of extraction rule has a parent Rule Controller and executes in the schema defined within that Rule Controller.
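To make these relationships concrete, here is a hypothetical Python data model. The class and field names (`ExtractionSet`, `RuleController`, `Rule`) are illustrative only and do not describe the actual extraction set file format; they simply encode the two constraints stated above: an extraction set has at least one Rule Controller, and every other rule runs in the schema of its parent Rule Controller.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Any non-controller extraction rule (hypothetical model)."""
    number: str
    kind: str  # e.g. "All Rows", "Where Clause", "Command"
    # Excluded from repr/eq to avoid recursing back through the controller.
    parent: "RuleController" = field(repr=False, compare=False)

    @property
    def schema(self) -> str:
        # A rule always executes in the schema defined by its parent controller.
        return self.parent.schema


@dataclass
class RuleController:
    """Hypothetical model of a Rule Controller: the schema, its structure and dependent rules."""
    schema: str
    tables: list[str] = field(default_factory=list)
    indexes: list[str] = field(default_factory=list)
    foreign_keys: list[str] = field(default_factory=list)
    rules: list[Rule] = field(default_factory=list)

    def add_rule(self, number: str, kind: str) -> Rule:
        rule = Rule(number, kind, parent=self)
        self.rules.append(rule)
        return rule


@dataclass
class ExtractionSet:
    controllers: list[RuleController] = field(default_factory=list)

    def validate(self) -> None:
        # Every extraction set must have at least one Rule Controller.
        if not self.controllers:
            raise ValueError("an extraction set needs at least one Rule Controller")
```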
The following is a brief summary of the rules which can be configured in an extraction set. Each rule has its own detailed help file and a slightly longer summary can be found on the New Extraction Rule Form help page.
The execution of an extraction set proceeds in three distinct stages. Each stage contains specific types of rules, and the stage to which a rule belongs is listed alongside the other rule information on the Extraction Rules tab of the Set Designer and Set Extractor. All rules in one stage must complete before the next stage begins, so rules from different stages never execute simultaneously.
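The stage barrier can be pictured with a minimal Python sketch, assuming each rule is modeled as a plain callable. `run_stages` and its event log are hypothetical names, not DataBee APIs, and this is not how the Set Extractor is implemented. The point is simply that one stage's thread pool is drained completely before the next stage's pool is created.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_stages(stages: dict[int, list], workers: int = 4) -> list[tuple[str, int, str]]:
    """Run each stage's rules in parallel, finishing a stage before the next starts.

    `stages` maps a stage number to a list of (hypothetical) rule callables.
    Returns an event log of (event, stage, rule name) tuples.
    """
    log: list[tuple[str, int, str]] = []
    lock = threading.Lock()

    def run(stage: int, rule) -> None:
        with lock:
            log.append(("start", stage, rule.__name__))
        rule()
        with lock:
            log.append(("end", stage, rule.__name__))

    for stage in sorted(stages):
        # Leaving the `with` block waits for every submitted rule to finish,
        # so no rule from the next stage can start while this stage is running.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(run, stage, rule) for rule in stages[stage]]
        for f in futures:
            f.result()  # stop the run if any rule in this stage failed
    return log
```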
An Example of Extraction Rule Stages
The Set Extractor application is multi-threaded: it can, and will, run multiple rules simultaneously. The number of rules which can run in parallel is determined by the Number of Extraction Threads setting on the Misc. Setup tab. Unless strictly specified (see below), each extraction rule is treated by the DataBee software as an independent entity, and its order of execution is not guaranteed.
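The effect of the Number of Extraction Threads setting can be pictured as a fixed-size worker pool. In this Python sketch the `extraction_threads` parameter stands in for that setting and each rule is a hypothetical callable. The function reports the peak number of rules that ran at the same time, which the pool caps at the configured value.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_rules(rules, extraction_threads: int) -> int:
    """Run independent rules on a pool of `extraction_threads` workers.

    Returns the peak number of rules that were running at the same time,
    which can never exceed `extraction_threads`.
    """
    running = 0
    peak = 0
    lock = threading.Lock()

    def run(rule) -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            rule()
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=extraction_threads) as pool:
        # Start and finish order depends on thread scheduling;
        # nothing here guarantees an order between independent rules.
        list(pool.map(run, rules))
    return peak
```

For example, ten rules that each sleep briefly, run with `extraction_threads=3`, never have more than three in flight at once.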
Other than for performance reasons, it does not matter if All Rows and Where Clause rules execute simultaneously in stage 1. In fact, parallel execution is to be encouraged because it reduces the elapsed time of the extraction run. However, Command rules can be configured to perform actions which assume that other Command rules have already completed their tasks. An example of this is the creation and population of a temporary table: the Command rule which creates the table must complete before the rule which populates it begins to execute. If it does not, an error will occur, because the second rule will be attempting to populate a table which does not yet exist. Clearly, two such rules must never be scheduled to execute at the same time.
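The temporary table example can be reproduced with Python's built-in `sqlite3` module, used here only as a stand-in for the source database. The two functions play the part of two Command rules (the table name `tmp_ids` is hypothetical). Running them in the wrong order fails exactly as described, because the INSERT targets a table that does not exist yet.

```python
import sqlite3


def create_temp_table(conn: sqlite3.Connection) -> None:
    # Stand-in for Command rule A: create the temporary table.
    conn.execute("CREATE TEMP TABLE tmp_ids (id INTEGER)")


def populate_temp_table(conn: sqlite3.Connection) -> None:
    # Stand-in for Command rule B: assumes rule A has already completed.
    conn.execute("INSERT INTO tmp_ids (id) VALUES (1), (2), (3)")


def run_in_order(order) -> int:
    """Run the 'rules' in the given order and return the temp table's row count."""
    conn = sqlite3.connect(":memory:")
    try:
        for rule in order:
            rule(conn)
        return conn.execute("SELECT COUNT(*) FROM tmp_ids").fetchone()[0]
    finally:
        conn.close()
```

`run_in_order([create_temp_table, populate_temp_table])` returns 3, while the reverse order raises `sqlite3.OperationalError` ("no such table: tmp_ids").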
An Example of Stage 1 Rule Blocks
Since the rules in stage 1 can, and will, execute in parallel, the DataBee software implements a concept called Rule Blocks to explicitly control the execution order. A rule block is the two-digit numeric prefix listed before the rule number. Rule blocks are processed in strict numeric order, and all rules in a given rule block will complete before any rule in the next higher rule block begins. Inside a rule block the rules execute in no guaranteed order, as determined by the optimization routines and the availability of worker threads.
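The rule block behaviour can be sketched in Python as grouping rules by their two-digit prefix and running one block at a time. This assumes rule ids of the form `10-0015`, as they are listed on the Extraction Rules tab, and models each rule as a callable. `block_of` and `run_by_block` are hypothetical helpers, not DataBee APIs.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def block_of(rule_id: str) -> int:
    # "10-0015" -> rule block 10 (the two-digit prefix before the rule number)
    return int(rule_id.split("-", 1)[0])


def run_by_block(rules: dict, workers: int = 4) -> list[str]:
    """Run stage 1 rules block by block; rules in the same block run in parallel.

    `rules` maps a rule id such as "10-0015" to a callable.
    Returns the rule ids in the order the rules finished.
    """
    blocks: dict[int, list[str]] = defaultdict(list)
    for rule_id in rules:
        blocks[block_of(rule_id)].append(rule_id)

    finished: list[str] = []
    lock = threading.Lock()

    def run(rule_id: str) -> None:
        rules[rule_id]()
        with lock:
            finished.append(rule_id)

    for block in sorted(blocks):  # strict numeric order: 1, 10, 20, ...
        # The pool is drained before the next block is submitted.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(run, rid) for rid in blocks[block]]
        for f in futures:
            f.result()  # surface any failure before moving on
    return finished
```

A fresh pool per block keeps the sketch short; a single long-lived pool that explicitly waits on each block's futures would give the same ordering guarantee.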
In the image above, seven stage 1 rules are listed. The All Rows rules are all in rule block 01, so they will execute in parallel up to the limit set by the Number of Extraction Threads setting. Because they are in rule block 01, all of these rules will complete before any rule in the next higher rule block (rule block 10 in this example) begins to execute. The chain of rules 10-0015, 20-0016, 30-0017 and 40-0018 will execute sequentially, one after the other, because each rule is in a separate rule block. This ensures (as noted by the rule comments) that the temporary table is created before its indexes are built and analyzed.
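Grouping the example's rules by rule block gives the sequence of execution waves. The ids of the All Rows rules are not visible in the text, so this Python sketch assumes the three remaining rules are those All Rows rules and uses the placeholders `01-A`, `01-B` and `01-C`. `execution_waves` is a hypothetical helper.

```python
from itertools import groupby


def execution_waves(rule_ids):
    """Group rule ids into the waves implied by their rule blocks.

    Rules in the same wave may run in parallel; each wave finishes before the next begins.
    """
    def block(rule_id: str) -> int:
        return int(rule_id.split("-", 1)[0])

    ordered = sorted(rule_ids, key=block)  # stable sort keeps listing order within a block
    return [(f"{b:02d}", list(ids)) for b, ids in groupby(ordered, key=block)]


# 01-A, 01-B, 01-C are placeholders for the All Rows rules in block 01.
example = ["01-A", "01-B", "01-C", "10-0015", "20-0016", "30-0017", "40-0018"]
for block, ids in execution_waves(example):
    print(block, ids)
# 01 ['01-A', '01-B', '01-C']
# 10 ['10-0015']
# 20 ['20-0016']
# 30 ['30-0017']
# 40 ['40-0018']
```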
Note that only stage 1 and stage 3 rules can have user-configured rule blocks. The Table-To-Table rules in stage 2 are scheduled as required by the Set Extractor application and are always listed with a rule block of fk or tt, which indicates whether the rule was derived from a foreign key or was manually implemented.
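A small hypothetical Python helper summarizes how to read the rule block column: two-digit numeric labels are user-configured blocks (stages 1 and 3), while `fk` and `tt` mark stage 2 Table-To-Table rules. The function name and the returned descriptions are illustrative only.

```python
def describe_block(label: str) -> str:
    """Interpret a rule block label as listed on the Extraction Rules tab (hypothetical helper)."""
    if label == "fk":
        return "stage 2 Table-To-Table rule derived from a foreign key"
    if label == "tt":
        return "stage 2 Table-To-Table rule implemented manually"
    if label.isdigit() and len(label) == 2:
        return f"user-configured rule block {label} (stage 1 or 3)"
    raise ValueError(f"unrecognized rule block label: {label!r}")
```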