
Candidate Scoring Framework

A structured rubric for evaluating candidates objectively across weighted dimensions.

Why Unstructured Scoring Fails

Most hiring panels rely on implicit criteria. Each interviewer arrives with a different mental model of what 'strong' looks like, and consensus discussions reward whoever speaks loudest, not whoever observed the most signal.

A structured scoring framework replaces this with explicit dimensions assessed before the debrief, making each interviewer's evaluation independent and auditable.

Select Your Scoring Dimensions

Start with four to six dimensions specific to the role. Generic frameworks fail because they score every candidate on the same axes regardless of what the job actually requires.

For an engineering role, example dimensions might include: Technical Depth, System Thinking, Communication Under Ambiguity, Ownership Evidence, and Culture Contribution. For an operations role, swap Technical Depth for Process Design and add Cross-Functional Influence.

Each dimension should answer a specific question ('Does this person think in systems or in steps?') rather than a vague one ('Is this person smart?').
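
As a sketch, the rubric itself can be captured as plain, versioned data so every panel scores against identical definitions. The dimension names below are the engineering examples above; the question text and all identifiers are illustrative Python, not a prescribed schema:

from dataclasses import dataclass

@dataclass(frozen=True)
class Dimension:
    name: str       # short label that appears on the scorecard
    question: str   # the specific question this dimension answers

# Dimension names from the engineering example above;
# the question text is illustrative, not prescribed.
ENGINEERING_RUBRIC = [
    Dimension("Technical Depth", "Can they reason below the abstraction they work at?"),
    Dimension("System Thinking", "Do they think in systems or in steps?"),
    Dimension("Communication Under Ambiguity", "Can they explain a half-formed idea clearly?"),
    Dimension("Ownership Evidence", "Have they driven outcomes without being told to?"),
    Dimension("Culture Contribution", "What do they add that the team currently lacks?"),
]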

Assign Weights Before Reviewing

Weight assignment must happen before any candidate is evaluated, not after. Post-hoc weighting lets unconscious bias sneak in through convenient justification of already-formed opinions.

A common approach: make every weight an explicit percentage, with the full set summing to 100%. For a senior role, Technical Depth might carry 30%, while Culture Contribution carries 10%. For a people-facing role, reverse those.

Document the weight set per job family, not per individual job posting. This prevents hiring managers from quietly adjusting weights to favor a preferred candidate.
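
A minimal sketch of that discipline, assuming weights live in a single reviewed file keyed by job family and are validated to sum to 100%; all names are hypothetical:

# Hypothetical weight sets, keyed by job family rather than by posting.
WEIGHTS_BY_JOB_FAMILY = {
    "engineering-senior": {
        "Technical Depth": 30,
        "System Thinking": 25,
        "Communication Under Ambiguity": 15,
        "Ownership Evidence": 20,
        "Culture Contribution": 10,
    },
}

def validate_weights(weights: dict[str, int]) -> None:
    # Reject any weight set that does not sum to exactly 100%.
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}%, expected 100%")

for family, weights in WEIGHTS_BY_JOB_FAMILY.items():
    validate_weights(weights)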

Score Independently, Before Debriefs

The sequence matters more than the rubric itself. Require all interviewers to submit scores before the debrief, not during and not after. Anchoring is well documented: once the first score is voiced aloud, later ratings drift toward it, which inflates apparent interviewer agreement without adding signal.

Use a 1–4 scale, not 1–5. Even-numbered scales force a directional judgment. A '3 out of 5' is easily used as a non-answer; a '2 out of 4' unambiguously means below expectations.

After scores are submitted, the debrief should resolve disagreements, not manufacture agreement by overriding low scores with social pressure.
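
One way tooling can enforce the sequence is to keep scorecards sealed until every expected interviewer has submitted, and to reject anything outside the 1–4 scale. A sketch, with all class and field names hypothetical:

from dataclasses import dataclass

VALID_SCORES = {1, 2, 3, 4}  # even-numbered scale: no midpoint non-answer

@dataclass
class Scorecard:
    interviewer: str
    scores: dict[str, int]  # dimension name -> score on the 1-4 scale

    def __post_init__(self):
        bad = {d: s for d, s in self.scores.items() if s not in VALID_SCORES}
        if bad:
            raise ValueError(f"scores must be 1-4, got {bad}")

class Debrief:
    # Hypothetical gate: scorecards stay sealed until everyone has submitted.
    def __init__(self, expected_interviewers: set[str]):
        self.expected = set(expected_interviewers)
        self._submitted = {}

    def submit(self, card: Scorecard) -> None:
        self._submitted[card.interviewer] = card

    def reveal(self) -> list[Scorecard]:
        missing = self.expected - self._submitted.keys()
        if missing:
            raise RuntimeError(f"debrief blocked; waiting on {sorted(missing)}")
        return list(self._submitted.values())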

Normalize Across Interviewers

Different interviewers calibrate differently. One interviewer's '4' might be another's '3'. Without normalization, aggregating raw scores produces noise, not signal.

Simple normalization: track each interviewer's historical score distribution and apply a per-interviewer mean and standard-deviation adjustment (a z-score) before aggregating. Even without formal statistics, reviewing score distributions during calibration sessions closes most of the gap.
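
A minimal sketch of that adjustment using only Python's standard library; the score histories below are made up to show the effect:

import statistics

def normalize(raw_score: float, history: list[float]) -> float:
    # z-score: how far this score sits from the interviewer's own mean,
    # in units of their own standard deviation (needs >= 2 past scores).
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0
    return (raw_score - mean) / stdev

# A harsh grader's 3 and a lenient grader's 4 land in a similar place:
print(normalize(3, [1, 2, 2, 3, 2, 3]))  # ~1.11 above their own mean
print(normalize(4, [3, 4, 3, 4, 4, 3]))  # ~0.91 above their own mean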

Connecting Scores to Decisions

A scored rubric produces a number. It does not produce a decision. The decision gate (hire, hold, or decline) should be defined in advance, not improvised.

Example gates: a weighted composite ≥ 75% triggers an offer recommendation; 60% to 75% triggers a hold; below 60% triggers a structured decline. These thresholds should be calibrated by role level.
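
As a concrete sketch of that gate, the code below computes the weighted composite as a percentage of the maximum possible score (4 per dimension on the 1–4 scale) and maps it through the example thresholds. The sample scores and weights are illustrative:

def weighted_composite(scores: dict[str, int], weights: dict[str, int]) -> float:
    # Composite as a percentage of the maximum possible (4 per dimension).
    earned = sum(scores[dim] * w for dim, w in weights.items())
    maximum = 4 * sum(weights.values())
    return 100.0 * earned / maximum

def decide(composite: float) -> str:
    # Thresholds from the example gates above; calibrate per role level.
    if composite >= 75:
        return "offer recommendation"
    if composite >= 60:
        return "hold"
    return "structured decline"

# Illustrative candidate: scores on the 1-4 scale, weights summing to 100.
scores = {"Technical Depth": 4, "System Thinking": 3,
          "Communication Under Ambiguity": 3, "Ownership Evidence": 4,
          "Culture Contribution": 2}
weights = {"Technical Depth": 30, "System Thinking": 25,
           "Communication Under Ambiguity": 15, "Ownership Evidence": 20,
           "Culture Contribution": 10}
composite = weighted_composite(scores, weights)
print(f"{composite:.0f}% -> {decide(composite)}")  # 85% -> offer recommendation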

This separation of score from decision is what makes hiring defensible. When a candidate later asks why they were declined, you have clean evidence rather than post-hoc rationalization.
