Skip to main content

Not Empty Percentage Check

The not_empty_pct check validates that the percentage of NULL values in specified dimensions does not exceed a threshold. This check is similar to not_empty but uses percentage-based thresholds instead of absolute counts.

Configuration

ParameterRequiredDescription
nameYesUnique name for the check
datasetYesTable name or SQL query
typeYesMust be not_empty_pct
dimensionsYesList of columns to check for NULL values
conditionYesComparison operator (typically le for "less than or equal")
thresholdNoMaximum allowed NULL percentage as decimal (0.0-1.0, default: 0.0)
filterNoWHERE clause conditions

Examples

Basic Percentage Check

- name: customer_data_completeness_pct
dataset: customers
type: not_empty_pct
dimensions: [phone, address]
condition: le
threshold: 0.05 # Allow up to 5% NULL values

This check ensures that no more than 5% of customers have NULL phone or address values.

Strict Completeness

- name: critical_fields_complete
dataset: orders
type: not_empty_pct
dimensions: [customer_id, product_id]
condition: le
threshold: 0.0 # No NULL values allowed (0%)

This check requires 100% completeness for critical order fields.

Higher Tolerance for Optional Fields

- name: optional_fields_check
dataset: users
type: not_empty_pct
dimensions: [middle_name, phone_secondary]
condition: le
threshold: 0.8 # Allow up to 80% NULL values

This check allows optional fields to have high NULL rates.

Default Threshold

- name: required_fields_pct
dataset: products
type: not_empty_pct
dimensions: [product_name, price]
condition: le
# No threshold specified - defaults to 0.0 (0%)

When no threshold is specified, it defaults to 0.0 (no NULL values allowed).

Threshold Format

Thresholds are specified as decimal values between 0.0 and 1.0:

  • 0.0 = 0% (no NULL values allowed)
  • 0.05 = 5% (up to 5% NULL values allowed)
  • 0.1 = 10% (up to 10% NULL values allowed)
  • 0.5 = 50% (up to 50% NULL values allowed)
  • 1.0 = 100% (all values can be NULL)

Behavior

  • Individual Dimension Checks: The check runs separately for each dimension specified
  • Result per Dimension: Each dimension generates its own check result
  • Naming Convention: Results are named as {original_name}_{dimension}_not_empty_pct
  • Percentage Calculation: Automatically calculates NULL percentage using SQL CAST operations

Use Cases

  • Data Quality with Tolerance: Allow some missing data while maintaining quality
  • Business Rules: Validate acceptable completeness rates
  • SLA Monitoring: Ensure data completeness meets service level agreements
  • Gradual Improvement: Set improving targets over time (e.g., reduce from 10% to 5%)

Generated SQL

For each dimension, the check generates percentage calculation SQL like:

SELECT CAST(SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS FLOAT) / CAST(COUNT(*) AS FLOAT)
FROM customers
WHERE status = 'active'

This automatically calculates the percentage as a decimal (e.g., 0.05 for 5%).

Example Results

If checking phone and email dimensions with a 5% threshold, you might get:

✓ customer_completeness_phone_not_empty_pct: 0.03 (3% ≤ 5%)
✗ customer_completeness_email_not_empty_pct: 0.08 (8% > 5%)

Comparison with Not Empty Check

Aspectnot_emptynot_empty_pct
Threshold TypeAbsolute countPercentage (0.0-1.0)
Use CaseFixed datasetsVariable dataset sizes
Example≤ 100 NULL values≤ 5% NULL values
ScalabilityPoor for growing dataScales with data size
  • Not Empty - Similar check but using absolute count thresholds