Row Count Check

The row_count check compares the number of rows in a dataset against a threshold. It is one of the most basic and commonly used data quality checks.

Configuration

Parameter    Required   Description
name         Yes        Unique name for the check
dataset      Yes        Table name or SQL query
type         Yes        Must be row_count
condition    Yes        Comparison operator
threshold    Yes        Expected row count value
dimensions   No         Group by columns
filter       No         WHERE clause conditions
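
To make the role of condition and threshold concrete, here is a minimal Python sketch of how a measured row count might be compared against the threshold. It is not the tool's implementation. Only the operator names gt and ge appear in the examples on this page; lt, le, eq, and ne, as well as the function name passes, are assumptions made for illustration.

```python
import operator

# Hypothetical mapping from condition names to comparisons. Only `gt` and
# `ge` appear in the examples on this page; the other names are assumptions.
CONDITIONS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def passes(row_count: int, condition: str, threshold: int) -> bool:
    """Return True when `row_count <condition> threshold` holds."""
    try:
        compare = CONDITIONS[condition]
    except KeyError:
        raise ValueError(f"unknown condition: {condition!r}") from None
    return compare(row_count, threshold)


print(passes(0, "gt", 0))      # False: an empty table fails `gt 0`
print(passes(100, "ge", 100))  # True: exactly 100 rows satisfies `ge 100`
```

Under this reading, gt with threshold 0 fails on an empty table, while ge with threshold 100 passes when the count is exactly 100.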

Examples

Basic Row Count Check

- name: orders_minimum_rows
  dataset: orders
  type: row_count
  condition: gt
  threshold: 0

This check ensures the orders table has more than 0 rows.

Row Count with Filter

- name: active_users_count
  dataset: users
  type: row_count
  condition: ge
  threshold: 100
  filter: status = 'active'

This check validates that there are at least 100 active users.

Row Count by Dimensions

- name: orders_by_region
  dataset: orders
  type: row_count
  dimensions: [region, status]
  condition: gt
  threshold: 10

This check ensures each region/status combination has more than 10 orders.
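
The per-group evaluation described above can be sketched in Python with an in-memory SQLite database. This is an illustration under assumptions, not the tool's own code: the sample rows are made up, and the way failing groups are collected is hypothetical.

```python
import sqlite3

# Sample data (made up for illustration): 12 eu/paid orders, 3 us/paid orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, status TEXT)")
rows = [("eu", "paid")] * 12 + [("us", "paid")] * 3
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

threshold = 10
grouped = conn.execute(
    "SELECT region, status, COUNT(*) FROM orders GROUP BY region, status"
).fetchall()

# A group fails the `gt` condition when its count is not above the threshold.
failing = [(region, status, n) for region, status, n in grouped if not n > threshold]
print(failing)  # [('us', 'paid', 3)]
```

Note that a GROUP BY query only returns combinations that occur in the data. A region/status pair with no rows at all produces no group, so in a sketch like this one it would not be reported as failing.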

Row Count with Time Dimension

- name: daily_orders_count
  dataset: orders
  type: row_count
  condition: gt
  threshold: 50
  time_dimension:
    name: created_at
    granularity: day

This check validates that each day has more than 50 orders.
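
This page does not show the SQL generated for a time dimension, so the following Python sketch is only an assumption about how day granularity could work: truncate created_at to its calendar day, count rows per day, and compare each count against the threshold. It uses SQLite's date() function; other engines offer equivalents such as DATE_TRUNC in PostgreSQL. The sample rows and the small threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-01 09:00:00"),
        (2, "2024-01-01 17:30:00"),
        (3, "2024-01-02 08:15:00"),
    ],
)

# date() truncates each timestamp to its day, giving one group per day.
daily = conn.execute(
    "SELECT date(created_at), COUNT(*) FROM orders "
    "GROUP BY date(created_at) ORDER BY date(created_at)"
).fetchall()

threshold = 1  # small hypothetical threshold to suit the sample data
failing_days = [day for day, n in daily if not n > threshold]
print(daily)         # [('2024-01-01', 2), ('2024-01-02', 1)]
print(failing_days)  # ['2024-01-02']
```

As with dimensions, a day with no rows at all yields no group in this sketch, so detecting fully missing days would need a separate calendar of expected dates.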

Use Cases

  • Data Freshness: Ensure new data is being loaded
  • Completeness: Verify expected data volume
  • Business Rules: Validate minimum activity levels
  • Monitoring: Track growth or decline in data volume

Generated SQL

For a check with a filter, the row count check generates SQL similar to:

SELECT COUNT(*) 
FROM orders
WHERE status = 'active'

With dimensions:

SELECT region, status, COUNT(*) 
FROM orders
GROUP BY region, status
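
The two query shapes above can be reproduced with a short Python sketch that assembles the SQL from a check's fields. The function build_row_count_sql and its parameter names are hypothetical and not part of the tool. The sketch assumes dataset is a plain table name (a SQL query would need to be wrapped as a subquery) and interpolates strings directly, which is only appropriate for trusted configuration.

```python
from typing import Optional, Sequence


def build_row_count_sql(
    dataset: str,
    filter_clause: Optional[str] = None,
    dimensions: Sequence[str] = (),
) -> str:
    """Assemble a COUNT(*) query from a check's dataset, filter, and dimensions."""
    columns = ", ".join(dimensions)
    select = f"SELECT {columns}, COUNT(*)" if dimensions else "SELECT COUNT(*)"
    lines = [select, f"FROM {dataset}"]
    if filter_clause:
        lines.append(f"WHERE {filter_clause}")
    if dimensions:
        lines.append(f"GROUP BY {columns}")
    return "\n".join(lines)


print(build_row_count_sql("orders", filter_clause="status = 'active'"))
print(build_row_count_sql("orders", dimensions=["region", "status"]))
```

The first call prints the filtered query and the second prints the grouped query, matching the two examples in this section.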