Profiling

Understanding Profiling Settings

Ten attributes can be specified in the profile settings:

  1. Order: the sequence in which profiling is done.
  2. Day: the day of the week on which profiling is set to run.
  3. Start/End Time: the start and end times between which profiling is set to run.
  4. Number of Threads: the number of queries executed in parallel on the data source. Each thread executes a query on the database to perform one or more tasks.
  5. Profile Type: There are four main types of data profiling:
      1. Sample: Profiling runs on a sample of a given size. Column statistics (such as Min, Max, Distinct, and Null Count) will differ from those of a full profile because they are calculated only on the sample. A sample profile depends on two values: the profile type and the sample profile size. To run a sample profile, set the profile type to “Sample” and enter a Sample Profile Size (the count of records to be profiled).
      2. Auto: Profiling depends on the Row Count Constraint being set to ‘True’.
        • If the Row Count Constraint checkbox is selected (set to True) and the table row count (for example, 1000) is greater than the configured Row Count Limit (for example, 100), sample profiling is performed using the count specified in Sample Profile Size.
        • If the table row count (for example, 100) is less than the configured Row Count Limit (for example, 1000), all rows of the table are profiled without considering the count mentioned in the Row Count Limit.
      3. Query: Profiling depends on whether the table meets the Row Count Limit given in the profiler settings. To run a Query profile, set the profile type to “Query”, select the Row Count Constraint checkbox, and enter the Row Count Limit in the profiler settings.
        • If the table row count is less than the Row Count Limit, profiling is executed on the entire table.
        • If the table row count exceeds the Row Count Limit, profiling is skipped for that table to avoid performance issues.
      4. Disabled: prevents profiling on the selected data source.
  6. Row Count Constraint: when set to true, this enables the data rule profiling.
  7. Row Count Limit: number of rows of data to be profiled.
  8. Sample Data Count: total number of rows to see within the table data page in the Catalog.
  9. Sample Profile Size: total number of rows to be included in profiling.
  10. Query Timeout: length of time in seconds to allow the query to run on a remote database before timing out.
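
The way the profile type, Row Count Constraint, Row Count Limit, and Sample Profile Size interact can be easier to follow as code. The following is a minimal sketch of the rules described above, not the product's actual implementation. The names ProfileSettings and rows_to_profile are hypothetical. The sketch also makes assumptions the text does not state: a table whose row count equals the limit is profiled in full, a sample is capped at the table's row count, and Auto or Query with the constraint unchecked profiles the whole table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileSettings:
    """Hypothetical container for the settings that drive the decision."""
    profile_type: str           # "Sample", "Auto", "Query", or "Disabled"
    row_count_constraint: bool  # the Row Count Constraint checkbox
    row_count_limit: int        # Row Count Limit
    sample_profile_size: int    # Sample Profile Size


def rows_to_profile(settings: ProfileSettings, table_row_count: int) -> Optional[int]:
    """Return how many rows would be profiled, or None if profiling is skipped."""
    over_limit = (
        settings.row_count_constraint
        and table_row_count > settings.row_count_limit
    )

    if settings.profile_type == "Disabled":
        return None  # profiling is prevented on this data source

    if settings.profile_type == "Sample":
        # Assumption: a sample cannot be larger than the table itself.
        return min(settings.sample_profile_size, table_row_count)

    if settings.profile_type == "Auto":
        if over_limit:
            # Large table: fall back to sample profiling.
            return min(settings.sample_profile_size, table_row_count)
        return table_row_count  # small table: profile every row

    if settings.profile_type == "Query":
        if over_limit:
            return None  # large table: skip to avoid performance issues
        return table_row_count

    raise ValueError(f"Unknown profile type: {settings.profile_type!r}")
```

With a Row Count Limit of 100 and a Sample Profile Size of 50, a 1000-row table would be sampled (50 rows) under Auto and skipped under Query, while a table below the limit would be profiled in full under both.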