The DataBee Create/Edit Where Clause Rule Form
Every schema which is a subset of a larger schema is based on a small defined set of rows from that larger schema. In other words, each subset schema is created by specifying certain type and quantity of data as a starting point and subsequently fetching all of the other rows in the larger schema required to support that data. The initial data items which form the basis of the subset schema are called the driver rows and they are usually selected from among the rows in a single table. All of the rows in the other tables of the subset schema will be referentially related to the chosen driver rows.
It is possible to implement multiple driver tables in a set of extraction rules, however this is something an advanced topic and it is advisable to seek assistance from the DataBee Support Team prior to implementation.
Where Clause rules are used to select the initial driver rows for a subset schema. The method used to select those driver rows can be a Where Clause, as the name implies, or a sampling mechanism and limits can be used. The Create/Edit Where Clause Rule form is used to create and edit DataBee Where Clause rules.
Important Note: The choice of the driver table and the driver rows determines the content of the subset schema. The driver table should be chosen carefully prior to building any extraction rules. See the detailed discussion in the Where Clause rule help file for more information on the best way to choose a driver table.
Configuring the driver table is a straightforward process. The Create/Edit Where Clause Rule form is started by pressing the New Where Clause Rule button on the New Extraction Rule form in the Set Designer application. It is not possible to create or edit Where Clause rules in the Set Extractor application, although they can be viewed there. Once the Create/Edit Where Clause Rule form has started, the mouse can be used to select the table from the Tables panel on the left hand side of the form. The choice of available tables is entirely determined by the Rule Controller with which the Where Clause rule is associated. Only tables from within the schema for which the Rule Controller is configured will be visible. If the required tables are not visible, the schema structure can be refreshed using the Refresh Schema and Indexes button on Options tab of the edit Rule Controller form.
Once the table has been selected, the table name will be listed in the panel on the right hand side. The Where Clause and Sampling options can be configured. These are discussed in detail in the Where Clause rules help file. For convenience, a summary clipped from that help file is presented below. It is highly recommended that the Where Clause rules help file be read in full before a Where Clause rule is implemented.
Implementing a Where Clause
The Where Clause should be added as it would be in a SELECT statement used on the SQL*Plus command line. Do not enter the entire query - only the Where Clause is needed. DataBee will add the other components of the SQL statement as required.
For example, if the desired rows could be retrieved from the INVOICE_TABLE table via a SQL query such as:
Select * from INVOICE_TABLE where INVOICE_NUM>1000 and INVOICE_NUM<2000then make an entry in the Where Clause box of:
WHERE INVOICE_NUM>1000 and INVOICE_NUM<2000
Note that it is necessary to include the word "WHERE" however a terminating semicolon is optional - DataBee will strip it off if it is present.
Notes on Row Sampling and Row Limits
If the sampling option is enabled on a Where Clause rule, the sampling sifts the values returned by the Where Clause. For example, if a where clause states where region_ID=10 and there is a 10% sample condition applied then the rows extracted by the Where Clause rule will be ten percent of the rows returned by the Where Clause - not 10% of the rows in the entire table. Limit values are similarly applied to the rows returned by the Where Clause and sampling options.
The sampling option is quite sophisticated. For example, assume a Where Clause rule is implemented with no Where Clause or limit and a 25% sample condition. The rule will not extract just the first 25% of the rows from the table, and it is also not a mechanical extraction of every fourth row. The sampling algorithm will randomly select rows from the entire table so that the resulting number comprises 25% of the total. It is possible to implement non integer sampling values. For example, a sample of 0.1% will extract 1000 rows, taken at random, from a million row table.
After the Where Clause, sampling and limit options are set, the rule is fully configured and can be saved to the extraction set. The example screen shot above shows a rule configured with both a where clause and a sampling option. Also shown in the left hand panel are columns which list the source row count and a column indicating how well that table matches the extraction set plan. These columns were added to the display using the Options button on the lower left hand side of the form.
Where Clause rules execute in Stage 1 of the extraction process. The DataBee extraction process is multithreaded and can run multiple rules simultaneously. If necessary, (it usually isn't) the execution order of the Where Clause rules can be explicitly controled by adjusting the rule block component of the rule ID number. Please see the discussion of rule blocks in the extraction set overview help page in order to understand how to explicitly control the execution order of stage 1 and stage 3 extraction rules.