[Net 2000 Ltd. Home][DataBee Home][DataBee Manual][DataBee FAQ]
The DataBee Set Extractor Run Statistics Tab
The primary function of the Set Extractor Run Statistics tab is to show the progress of the currently executing extraction rules and to display aggregate statistics for the running extraction set. The current status of each running rule is presented at the top of the page, and statistics for the set as a whole are displayed towards the bottom.
The panel at the top of the Run Statistics tab lists the rules currently executing in the extraction set. Up to eight rules can execute simultaneously, each in an independent process called a worker thread. If some rules must not execute simultaneously, their execution order must be specified explicitly using rule blocks.
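The scheduling behaviour described here can be pictured with the short Python sketch below. It also uses the condition given under the Status column, that rules only run simultaneously when they share an extraction stage and rule block. This is not DataBee's implementation: the `Rule` class, the `run_rule` placeholder and the assumption that stages and blocks run in ascending numeric order are all hypothetical. Each (stage, block) group is dispatched to a pool capped at eight workers, and the next group starts only after every rule in the current group has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import groupby

MAX_WORKERS = 8  # the Set Extractor runs at most eight rules simultaneously


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule description used only for this sketch."""
    rule_id: str
    stage: int  # extraction stage (1, 2 or 3)
    block: int  # rule block; only rules in the same block run together


def run_rule(rule: Rule) -> str:
    # Placeholder for the real extraction work done by a worker thread.
    return rule.rule_id


def run_extraction_set(rules: list[Rule], workers: int) -> list[list[str]]:
    """Run rules group by group, where a group is one (stage, block) pair."""
    workers = max(1, min(workers, MAX_WORKERS))
    # Assumption: lower stage numbers and lower block numbers run first.
    ordered = sorted(rules, key=lambda r: (r.stage, r.block))
    finished_groups = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _, group in groupby(ordered, key=lambda r: (r.stage, r.block)):
            # map() submits the whole group at once; list() waits for all of
            # its results, so the next group cannot start early.
            finished_groups.append(list(pool.map(run_rule, group)))
    return finished_groups
```

With rules A and B in stage 1 block 1, D in stage 1 block 2 and C in stage 2, the sketch returns `[["A", "B"], ["D"], ["C"]]`: A and B may run side by side, while D and C each wait for the preceding group.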
Each worker thread has its own row in the display. Only actively executing rules appear here: when an extraction rule stops executing (whether through completion or error), its information is removed from the display. Historical statistics and status information for each rule, active or not, can be found on the Rule Statistics Tab.
Double-clicking on any worker thread that is actively running a rule opens a form displaying that rule's configuration.
The extraction rule display has seven columns, which provide specific information about each worker thread and the rule it is running.
What the columns in the Worker Thread Panel mean
- Worker#
- This column indicates the ID number of the worker running the rule. It has no significance other than as an identifier for diagnostics in the log file; all workers are identical in functionality.
- Status
- A text field indicating whether the worker is currently operational. DataBee only runs rules simultaneously if they are in the same extraction stage and the same rule block, so workers may temporarily be listed as Idle when no rules remaining in the current rule block are available to run.
- Rule ID
- A summary of the rule block, rule ID and rule type. Double-clicking on the Rule ID field opens the rule so that the details of its actions can be viewed.
- Unique Rows
- This column indicates the number of unique rows extracted into the target table by that thread. This field is not used by Command rules and is blank when not required.
- Duplicate Rows
- This column indicates the number of duplicate rows which were extracted and discarded. Normally this field is zero, since most extraction rules are designed never to extract duplicates. However, in certain specific circumstances (for example, a self-referential relationship involving multiple join keys) this field might be populated.
- Run Time (sec)
- This is the run time in seconds for the thread. Since stage 2 Table-To-Table rules can activate repeatedly, this value reflects the time taken by the current activation rather than the cumulative time for that rule. To see the total time taken by all activations of a rule, look at the Rule Statistics Tab.
- Passes
- Rules executing in Stage 1 and Stage 3 execute only once, so this value is always 0 or 1 for them. Table-To-Table rules executing in Stage 2 can, and typically do, activate many times to ensure that a referentially correct set of rows is extracted for the target table. For those rules this value indicates the number of times the rule has activated.
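The relationship between the Run Time (sec) and Passes columns and the cumulative figure on the Rule Statistics Tab can be sketched in Python as follows. The `RuleActivity` class is hypothetical, not DataBee's data model, and it simplifies Run Time to the duration of the most recently completed activation.

```python
from dataclasses import dataclass, field


@dataclass
class RuleActivity:
    """Hypothetical record of one rule's activations."""
    rule_id: str
    stage: int
    activation_seconds: list[float] = field(default_factory=list)

    def record_activation(self, seconds: float) -> None:
        # Stage 1 and Stage 3 rules execute only once.
        if self.stage in (1, 3) and self.activation_seconds:
            raise ValueError("Stage 1 and Stage 3 rules execute only once")
        self.activation_seconds.append(seconds)

    @property
    def passes(self) -> int:
        # Passes column: number of activations so far.
        return len(self.activation_seconds)

    @property
    def current_run_time(self) -> float:
        # Run Time (sec) column: the latest activation only.
        return self.activation_seconds[-1] if self.activation_seconds else 0.0

    @property
    def cumulative_run_time(self) -> float:
        # Total over all activations, as shown on the Rule Statistics Tab.
        return sum(self.activation_seconds)
```

A Table-To-Table rule that activates three times taking 4.0, 1.5 and 0.5 seconds would show Passes = 3 and Run Time = 0.5 in this panel, while its cumulative time would be 6.0 seconds.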
The Number of Rule Worker Threads Setting
- The Number of Rule Worker Threads combo-box offers the option of increasing or decreasing the number of workers available for simultaneous operations. This value can be adjusted up or down while the set is running, and workers are enabled or disabled as required. Although the number of worker threads can be changed in the Set Extractor application, the change cannot be saved back to the extraction set (the Set Extractor does not have that capability). To change the number of Rule Worker Threads permanently, use the Set Designer application and save the change to the extraction set.
- The most effective number of workers depends on several factors: the speed and memory of the PC running the Set Extractor application, the speed of the network connection to the remote Oracle database, the speed of the Oracle database, and the number of CPUs on the Oracle database platform. As a general rule, setting the number of workers too high is inefficient, since bottlenecks can occur that make the total execution time longer than it would be with fewer workers. Because the correct setting depends on so many variables, finding the most effective value is largely a matter of trial and error. A typical starting point is to set the number of worker threads equal to the number of CPUs on the Oracle database server.
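The tuning advice above amounts to choosing a starting value and then comparing trial runs. The Python sketch below is only an illustration of that process; both helper functions are hypothetical and are not part of DataBee. The first clamps "one worker per database CPU" to the permitted range of 1 to 8, and the second picks the setting with the shortest total run time from timings you have measured yourself.

```python
MAX_WORKERS = 8  # the Set Extractor runs at most eight rules simultaneously


def suggested_worker_threads(db_server_cpus: int) -> int:
    """Starting point: one worker per database-server CPU, limited to 1..8."""
    return max(1, min(db_server_cpus, MAX_WORKERS))


def fastest_setting(timings: dict[int, float]) -> int:
    """Return the worker count with the shortest measured run time.

    `timings` maps a Number of Rule Worker Threads value to the total run
    time in seconds observed for a trial run at that setting.
    """
    if not timings:
        raise ValueError("no trial runs recorded")
    return min(timings, key=timings.__getitem__)
```

A database server with 16 CPUs would still start at 8 workers, the maximum the Set Extractor allows.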
The Analyze DTBTAB Now Button
- The DTBTAB temporary table contains one row for each extracted row. Sometimes the statistics on DTBTAB must be refreshed so that the Oracle database builds an optimal execution plan. The analyze of DTBTAB can be triggered automatically at pre-set levels using the tools on the Options tab of the Rule Controller or, if desired, an immediate statistics analyze can be performed by pressing this button. The button is only operational while an extraction is in progress.
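A minimal sketch of an automatic trigger of this kind, written in Python, is shown below. It assumes that the "pre-set levels" are DTBTAB row counts; the manual does not say so explicitly, and the function names are hypothetical. The `GATHER_DTBTAB_STATS` string shows one standard Oracle way of refreshing table statistics (`DBMS_STATS.GATHER_TABLE_STATS`), but DataBee may issue a different command.

```python
# One plausible Oracle statement for refreshing DTBTAB statistics.
# Assumption: DataBee may use a different command, such as ANALYZE TABLE.
GATHER_DTBTAB_STATS = (
    "BEGIN DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'DTBTAB'); END;"
)


def levels_crossed(previous_rows: int, current_rows: int, levels: list[int]) -> list[int]:
    """Return the pre-set row-count levels passed since the previous check."""
    return [level for level in sorted(levels) if previous_rows < level <= current_rows]


def should_analyze(previous_rows: int, current_rows: int, levels: list[int]) -> bool:
    """True if DTBTAB has grown past at least one pre-set level (hypothetical)."""
    return bool(levels_crossed(previous_rows, current_rows, levels))
```

Checking against the previous row count, instead of testing `current_rows >= level`, means each level fires only once as the table grows.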
The Extraction Set Statistics Panel
- Various statistics for the currently running extraction set are displayed at the bottom of the Run Statistics tab. The statistics are for the extraction set as a whole; if there are multiple Rule Controllers in the extraction set, the values shown are the aggregate values for all Rule Controllers.
- Total run time
- The total time the extraction set has been operational.
- Unique extracted rows per second
- This value indicates the number of unique rows extracted per second by the rules in the extraction set. Note that the elapsed time is only calculated to the nearest second, so for extract operations that take very little time this figure can appear lower than it really is.
- Total extracted rows per second
- This value indicates the number of rows extracted per second by the rules in the extraction set, including any duplicates which may subsequently have been discarded. The extraction algorithm ensures that duplicates are extracted only in very rare cases, so this figure is almost always identical to the Unique extracted rows per second.
- Total tables
- The total number of tables known to the extraction set.
- Tables with extracted rows
- The total number of tables which have had rows extracted for them.
- Tables with non-zero src. rows
- The number of tables which have non-zero source row counts according to the row count estimates in the Rule Controller. Since the source row count values are only estimates, this figure is also an estimate.
- Unique extracted rows
- The number of distinct rows extracted into all tables.
- Total extracted rows
- The number of rows extracted into all tables including possible duplicate rows subsequently discarded.
- Duplicate rows ignored
- The number of duplicate rows extracted and discarded.
- Tables with rows not extracted
- The total number of tables with (estimated) non-zero source row counts which have not had rows extracted for them.
- Total source row count
- The sum of all of the (estimated) source row count values for all tables.
- Subset database percentage
- The number of extracted rows divided by the total source row count, expressed as a percentage. Since the source row count values are only estimates, this figure is also an estimate.
- Total rules
- The total number of rules in the extraction set.
- Rules activated
- The number of rules activated in the last extraction set run.
- Rules not run
- The number of rules eligible to be run which were not executed. An extraction set which completes without running all of its rules is not necessarily in error. A stage 2 Table-To-Table rule may never be activated if no rows were extracted by other rules for the source table in the relationship. This is simply what the rules require, and referential integrity is still maintained.
- Estimated percent complete
- This value is calculated from the recorded time and extracted row counts from the previous run of the extraction set. It serves as a simple estimate of the length of time remaining for the extraction set run.
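The relationships between the figures in this panel can be summarised in a Python sketch. It is an illustration under stated assumptions, not DataBee's code: `TableCounts`, `panel_statistics` and the other names are hypothetical; "extracted rows" in the subset percentage is assumed to mean unique rows; aggregating several Rule Controllers is modelled as passing all of their tables together; and because the manual does not give the formula for Estimated percent complete, `estimated_percent_complete` simply averages time progress and row progress against the previous run.

```python
from dataclasses import dataclass


@dataclass
class TableCounts:
    """Hypothetical per-table counters; names are illustrative only."""
    name: str
    unique_rows: int      # distinct rows extracted into the table
    duplicate_rows: int   # duplicate rows extracted and then discarded
    est_source_rows: int  # Rule Controller's estimated source row count


def rows_per_second(rows: int, elapsed_seconds: float) -> float:
    # Elapsed time is taken to the nearest whole second, as the panel does.
    # Assumption: a floor of one second avoids dividing by zero.
    return rows / max(1, round(elapsed_seconds))


def panel_statistics(tables: list[TableCounts], elapsed_seconds: float) -> dict[str, float]:
    """Aggregate statistics; pass the tables of every Rule Controller together."""
    unique = sum(t.unique_rows for t in tables)
    duplicates = sum(t.duplicate_rows for t in tables)
    total = unique + duplicates
    source = sum(t.est_source_rows for t in tables)
    return {
        "Unique extracted rows per second": rows_per_second(unique, elapsed_seconds),
        "Total extracted rows per second": rows_per_second(total, elapsed_seconds),
        "Total tables": len(tables),
        "Tables with extracted rows": sum(1 for t in tables if t.unique_rows > 0),
        "Tables with non-zero src. rows": sum(1 for t in tables if t.est_source_rows > 0),
        "Unique extracted rows": unique,
        "Total extracted rows": total,
        "Duplicate rows ignored": duplicates,
        "Tables with rows not extracted": sum(
            1 for t in tables if t.est_source_rows > 0 and t.unique_rows == 0
        ),
        "Total source row count": source,
        # Assumption: unique rows count as "extracted rows" here.
        "Subset database percentage": 100.0 * unique / source if source else 0.0,
    }


def estimated_percent_complete(elapsed: float, rows: int,
                               prev_elapsed: float, prev_rows: int) -> float:
    """Hypothetical blend of time and row progress against the previous run."""
    time_fraction = elapsed / prev_elapsed if prev_elapsed > 0 else 0.0
    row_fraction = rows / prev_rows if prev_rows > 0 else 0.0
    return min(100.0, 50.0 * (time_fraction + row_fraction))
```

The whole-second rounding in `rows_per_second` shows why short runs can look slow: 1000 rows extracted in 0.6 seconds is reported as 1000 rows per second, although the true rate is about 1667.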
What the buttons do
- Show Error Summary
- This button displays a summary of any errors which have occurred during the run of the extraction set. Detailed information on specific errors is also available by pressing the Error button associated with each rule on the Set Extractor Rule Statistics tab.