2 Comments

Great article about data QA. By the way, is this formula for assessing duplication appropriate?

> Assertion pass if (1 – primary_key_count ) / total_row_count < duplicates SLA

Shouldn't it be

`1 - primary_key_count/total_row_count < duplicates SLA`?
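
To make the difference concrete, here is a minimal Python sketch. It assumes `primary_key_count` means the number of distinct primary-key values and `total_row_count` the number of rows in the table. The function names are hypothetical and the numbers are made up for illustration; none of this comes from the article.

```python
def duplicate_ratio(primary_key_count: int, total_row_count: int) -> float:
    """Share of rows that are duplicates: 1 - distinct_keys / total_rows."""
    if total_row_count <= 0:
        raise ValueError("total_row_count must be positive")
    return 1 - primary_key_count / total_row_count


def quoted_ratio(primary_key_count: int, total_row_count: int) -> float:
    """The formula as parenthesized in the quoted line: (1 - keys) / rows."""
    if total_row_count <= 0:
        raise ValueError("total_row_count must be positive")
    return (1 - primary_key_count) / total_row_count


def assertion_passes(primary_key_count: int, total_row_count: int,
                     duplicates_sla: float) -> bool:
    """Pass when the duplicate share stays strictly below the SLA."""
    return duplicate_ratio(primary_key_count, total_row_count) < duplicates_sla


# Illustrative numbers only: 950 distinct keys in 1000 rows, SLA of 1%.
keys, rows, sla = 950, 1000, 0.01
print(duplicate_ratio(keys, rows))        # roughly 0.05, i.e. 5% duplicates
print(quoted_ratio(keys, rows))           # negative whenever keys > 1
print(assertion_passes(keys, rows, sla))  # fails: 5% is above the 1% SLA
print(quoted_ratio(keys, rows) < sla)     # passes regardless of duplicates
```

With the parentheses as quoted, the value is negative for any table with more than one key, so the check would pass no matter how many duplicates exist.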


This was a great read. Thank you! I’m looking to automate data quality monitoring at my job, and this article has given me so many ideas and validated some of the approaches I’m working toward.
