In professional "Human Evaluation"—such as ranking AI model outputs or grading creative assets—every "Score" must be technically valid and logically consistent. If an evaluator gives a '5/5' (Perfect) but their rationale mentions "Frequent errors," the data is contaminated. If an evaluator gives '3/5' (Neutral) for every item, they are "Zoning Out." "Logical Inconsistency" and "Fatigue Bias" are among the primary reasons evaluation projects fail to achieve consensus. The Score Range Forensic Guard is a forensic-grade "Reliability Firewall" that ensures your evaluation data is technically and logically sound.
This rule performs a "Multi-Layer Score Audit" on every submission. It utilizes "Range-Enforcement Logic"—ensuring that every score falls within your required scale (e.g., 1-5 or 1-10). TaskVerified identifies "Boundary Failures" and provides immediate feedback: "Score 6 is out of bounds [1, 5]. Please use the required rubric scale." This ensures that your "Aggregated Metrics" are always mathematically valid.
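The range check itself reduces to a simple bounds comparison. Here is a minimal sketch of such a check; the function name and parameters are illustrative, not TaskVerified's actual API:

```python
def check_score_range(score, low=1, high=5):
    """Range-enforcement check: return a feedback message for an
    out-of-bounds score, or None if the score fits the rubric scale."""
    if not (low <= score <= high):
        return (f"Score {score} is out of bounds [{low}, {high}]. "
                "Please use the required rubric scale.")
    return None
```

Running every submission through a gate like this before aggregation guarantees that batch means and distributions are computed only over mathematically valid values.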
"Sentiment Alignment Forensic Analysis" is the core innovation of this rule. Our validator performs a "Linguistic Cross-Check" between the numeric score and the rationale text, flagging "Logical Mismatches"—a high score paired with negative keywords (error, bad, fail) or a low score paired with positive keywords (great, perfect). When a mismatch is detected, TaskVerified requires the evaluator to reconcile their "Subjective Feeling" with their "Numeric Rating," ensuring high-fidelity data that an AI model can actually learn from.
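A minimal keyword-based version of such a cross-check might look like the sketch below. The keyword lists, thresholds, and message texts are illustrative assumptions; a production validator would presumably use a proper sentiment model rather than naive substring matching:

```python
NEGATIVE_KEYWORDS = {"error", "bad", "fail", "wrong", "poor"}
POSITIVE_KEYWORDS = {"great", "perfect", "excellent", "flawless"}

def check_sentiment_alignment(score, rationale, low=1, high=5):
    """Linguistic cross-check: flag a numeric score whose polarity
    contradicts the keywords in its rationale. Substring matching is a
    stand-in for real sentiment analysis and will have false positives."""
    text = rationale.lower()
    has_neg = any(k in text for k in NEGATIVE_KEYWORDS)
    has_pos = any(k in text for k in POSITIVE_KEYWORDS)
    if score >= high - 1 and has_neg and not has_pos:
        return "Logical mismatch: high score paired with negative rationale."
    if score <= low + 1 and has_pos and not has_neg:
        return "Logical mismatch: low score paired with positive rationale."
    return None
```

A rationale that mixes both polarities ("great structure, but frequent errors") is deliberately not flagged here, since a mid-range score can legitimately reflect mixed feedback.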
The guard also features "ISO/IEC 5259 Compliant Bias Detection." It calculates the "Batch Distribution," identifying "Skew Bias" (scores consistently too high or too low) and "Neutrality Fatigue" (overuse of the middle-ground score). If more than 35% of a batch is "Neutral," TaskVerified issues a "Batch Fatigue Alert," requiring the contributor to be more decisive. This level of oversight is essential for achieving high "Inter-Annotator Agreement" (IAA) in professional research environments.
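The batch-level audit comes down to simple distribution statistics. In the sketch below, the 35% neutrality threshold comes from the description above, while the skew margin, function name, and alert wording are illustrative assumptions:

```python
def audit_batch(scores, low=1, high=5, neutral_limit=0.35, skew_margin=1.0):
    """Batch distribution audit: flag neutrality fatigue (overuse of the
    mid-scale score) and skew bias (batch mean far from the midpoint)."""
    mid = (low + high) / 2
    n = len(scores)
    alerts = []
    # Neutrality fatigue: too large a share of exactly-neutral scores.
    neutral_share = sum(1 for s in scores if s == mid) / n
    if neutral_share > neutral_limit:
        alerts.append(f"Batch Fatigue Alert: {neutral_share:.0%} of scores are neutral.")
    # Skew bias: batch mean drifts well away from the scale midpoint.
    mean = sum(scores) / n
    if abs(mean - mid) > skew_margin:
        direction = "high" if mean > mid else "low"
        alerts.append(f"Skew Bias: batch mean {mean:.2f} trends {direction}.")
    return alerts
```

Note that both checks are heuristics over a whole batch, not verdicts on any single score: a skewed batch may be honest if the underlying items really were uniformly good or bad, which is why such alerts typically trigger review rather than automatic rejection.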
For lead researchers and QA managers, this rule is a "Consensus Multiplier." It provides a specific "Reliability Integrity Report" for every batch: "Score Logic: 100% Verified." This documented proof of logical consistency allows you to build definitive "Ground Truth" datasets with total certainty in your human metrics. It transforms a complex manual "Spot-Check" into a guaranteed technical state: "Reliability Compliance: 100%."
Logic is the foundation of truth. The Score Range Forensic Guard ensures that your "Evaluation Metrics" are as consistent as they are accurate, protecting your data authority and ensuring 100% professional precision for every project.