Score Range Forensic Guard

Hard-gate "Rating Reliability" by enforcing strict rubric boundaries and identifying "Sentiment Mismatches" (ISO/IEC 5259 Compliant).

In professional "Human Evaluation" (such as ranking AI model outputs or grading creative assets), the "Score" must be technically valid and logically consistent. If an evaluator gives a '5/5' (Perfect) but their rationale mentions "Frequent errors," the data is contaminated. If an evaluator gives '3/5' (Neutral) to every item, they are "Zoning Out." This "Logical Inconsistency" and this "Fatigue Bias" are the primary reasons evaluation projects fail to achieve consensus. The Score Range Forensic Guard is a forensic-grade "Reliability Firewall" that checks every submission for technical validity (is the score on the scale?) and logical consistency (does the score match the rationale?) before it enters your evaluation data.

This rule performs a "Multi-Layer Score Audit" on every submission. Its "Range-Enforcement Logic" checks that every score falls within your required scale (e.g., 1-5 or 1-10). When TaskVerified detects a "Boundary Failure," it returns immediate feedback: "Score 6 is out of bounds [1, 5]. Please use the required rubric scale." Because out-of-range values are rejected at submission, your "Aggregated Metrics" are computed only from valid scores and stay mathematically sound.
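A minimal sketch of range enforcement, assuming inclusive bounds. The names `RubricScale` and `check_range` are hypothetical and are not TaskVerified's API; only the wording of the feedback message is taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RubricScale:
    """Hypothetical inclusive rubric bounds, e.g. RubricScale(1, 5)."""
    low: float
    high: float


def check_range(score: float, scale: RubricScale) -> Optional[str]:
    """Return boundary-failure feedback if the score is off the scale, else None."""
    # NaN fails both comparisons, so it is also reported as out of bounds.
    if not (scale.low <= score <= scale.high):
        return (
            f"Score {score:g} is out of bounds [{scale.low:g}, {scale.high:g}]. "
            "Please use the required rubric scale."
        )
    return None
```

For example, `check_range(6, RubricScale(1, 5))` yields the boundary-failure message, while `check_range(3, RubricScale(1, 5))` returns `None`.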

"Sentiment Alignment Forensic Analysis" is the core innovation of this rule. The validator performs a "Linguistic Cross-Check" between the numeric score and the rationale text, flagging "Logical Mismatches" where a high score is paired with negative keywords (error, bad, fail) or a low score is paired with positive keywords (great, perfect). The evaluator must then reconcile their "Subjective Feeling" with their "Numeric Rating," so the resulting data is consistent enough for an AI model to actually learn from.
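A minimal keyword-based sketch of the cross-check. The keyword sets extend the examples in the text, and the cut-offs (top and bottom quarter of the scale count as "high" and "low") are illustrative assumptions; `sentiment_mismatch` is a hypothetical name, not TaskVerified's implementation.

```python
import re
from typing import Optional

# Hypothetical lexicons built from the examples in the text; a real system
# would need a far larger vocabulary or a trained sentiment model.
NEGATIVE_WORDS = frozenset({"error", "errors", "bad", "fail", "fails", "failed", "failure"})
POSITIVE_WORDS = frozenset({"great", "perfect"})


def sentiment_mismatch(score: float, scale_low: float, scale_high: float,
                       rationale: str) -> Optional[str]:
    """Flag high scores with negative wording and low scores with positive wording."""
    # Normalise to [0, 1] so the same cut-offs work for 1-5 and 1-10 scales.
    position = (score - scale_low) / (scale_high - scale_low)
    words = set(re.findall(r"[a-z]+", rationale.lower()))
    if position >= 0.75:  # assumed "high" band
        hits = sorted(words & NEGATIVE_WORDS)
        if hits:
            return f"High score {score:g} conflicts with negative terms: {', '.join(hits)}"
    elif position <= 0.25:  # assumed "low" band
        hits = sorted(words & POSITIVE_WORDS)
        if hits:
            return f"Low score {score:g} conflicts with positive terms: {', '.join(hits)}"
    return None
```

A plain keyword match ignores negation, so a rationale such as "no errors" would be flagged on a high score; handling that case needs more than a word list.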

The guard also features "ISO/IEC 5259 Compliant Bias Detection." It calculates the "Batch Distribution," identifying "Skew Bias" (consistently too high/low) and "Neutrality Fatigue" (over-use of the middle-ground score). If more than 35% of a batch is "Neutral," TaskVerified issues a "Batch Fatigue Alert," requiring the contributor to be more decisive. This level of oversight is essential for achieving high "Inter-Annotator Agreement" (IAA) in professional research environments.
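A minimal sketch of the batch checks. The 35% neutrality threshold comes from the text. Two things are assumptions: measuring skew as the distance of the batch mean from the scale midpoint, and the 0.25 skew margin. Because even-sized scales (such as 1-10) have no single middle value, the neutral score is passed in explicitly. `batch_alerts` is a hypothetical name.

```python
from statistics import mean
from typing import List, Sequence


def batch_alerts(scores: Sequence[float], scale_low: float, scale_high: float,
                 neutral_score: float, neutral_limit: float = 0.35,
                 skew_margin: float = 0.25) -> List[str]:
    """Return fatigue and skew alerts for one batch of scores."""
    if not scores:
        raise ValueError("batch is empty")
    alerts: List[str] = []

    # Neutrality Fatigue: share of the batch that uses the middle-ground score.
    neutral_share = sum(1 for s in scores if s == neutral_score) / len(scores)
    if neutral_share > neutral_limit:  # strictly "more than 35%"
        alerts.append(
            f"Batch Fatigue Alert: {neutral_share:.0%} of scores are neutral ({neutral_score:g})"
        )

    # Skew Bias (assumed definition): how far the batch mean sits from the
    # scale midpoint, as a fraction of the scale width.
    midpoint = (scale_low + scale_high) / 2
    batch_mean = mean(scores)
    offset = (batch_mean - midpoint) / (scale_high - scale_low)
    if offset > skew_margin:
        alerts.append(f"Skew Bias: scores consistently high (mean {batch_mean:.2f})")
    elif offset < -skew_margin:
        alerts.append(f"Skew Bias: scores consistently low (mean {batch_mean:.2f})")
    return alerts
```

Comparing against the scale midpoint is only one reading of "consistently too high/low"; a production check might instead compare each contributor against the other annotators on the same items.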

For lead researchers and QA managers, this rule is a "Consensus Multiplier." It provides a specific "Reliability Integrity Report" for every batch: "Score Logic: 100% Verified." This documented proof of logical consistency allows you to build definitive "Ground Truth" datasets with total certainty in your human metrics. It transforms a complex manual "Spot-Check" into a guaranteed technical state: "Reliability Compliance: 100%."

Logic is the foundation of truth. The Score Range Forensic Guard ensures that your "Evaluation Metrics" are as consistent as they are accurate, protecting your data authority and ensuring 100% professional precision for every project.

Forensic Mechanism

The validator uses a numeric parser that supports fractions (e.g., 5/5) and scores embedded in text. It applies a "Linguistic Sentiment Sieve" to cross-reference text rationales with numeric values and performs "Batch Distribution Analysis" for ISO-compliant bias detection. For any non-compliant evaluation data, it produces a specific "Rating Conflict" report.
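A minimal sketch of the parsing step, assuming scores appear as "4/5", "4 out of 5", or a bare number. `parse_score` is a hypothetical name. The regular expressions illustrate the idea and do not reproduce TaskVerified's parser.

```python
import re
from typing import Optional, Tuple

# Fraction forms: "4/5", "3.5 / 10", "4 out of 5".
_FRACTION = re.compile(r"(\d+(?:\.\d+)?)\s*(?:/|out of)\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
# Fallback: the first bare number, e.g. "Score: 4".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_score(text: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (score, denominator) from free text; denominator is None if absent."""
    match = _FRACTION.search(text)
    if match:
        return float(match.group(1)), float(match.group(2))
    match = _NUMBER.search(text)
    if match:
        return float(match.group(0)), None
    return None
```

Fractions take priority over bare numbers, so "Item 2: 4/5" parses as (4.0, 5.0). When no fraction is present, the bare-number fallback still takes the first number, which could be an item label rather than a score. A parsed denominator can also be compared with the rubric scale to catch a "4/10" submitted on a 1-5 rubric.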

Handshakes & Hand-offs

Quality is a binary state.
Verified or Rejected.

Stop managing via opinion. Use the Robot PM to enforce the objective standards your brand requires.
