Most "AI job matching" pitches are a black box. You upload a resume, you get a percentage, and you're expected to trust it. That's a bad deal for job seekers, and it's the wrong design for a system whose mistakes have real consequences — a missed interview, a wasted application credit, a job you should have applied to that got filtered out by a model you can't audit.
This post walks through what actually happens inside an AI matching system: what kinds of signals it uses, where it goes wrong, and what an honest score should look like. We'll use RemoteHunt's scoring pipeline as the worked example because it's the one we can describe in detail without leaking anyone else's internals — but most of this generalizes.
The Naive Approach: Keyword Counting
The earliest "matching" tools were just keyword counters. Take the job description, extract a list of skills and tools, count how many appear in your resume, divide. A resume with 8 of the 10 listed skills scores 80%.
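The whole approach fits in a few lines. A minimal sketch (the skill list and resume text are made up for illustration):

```python
import re

def keyword_score(required_skills: list[str], resume_text: str) -> float:
    """Naive ATS-style match: fraction of required skills found verbatim."""
    resume = resume_text.lower()
    hits = sum(
        1 for skill in required_skills
        if re.search(r"\b" + re.escape(skill.lower()) + r"\b", resume)
    )
    return 100 * hits / len(required_skills)

# "Postgres" in the resume scores zero against "PostgreSQL" in the listing.
print(keyword_score(
    ["Python", "PostgreSQL", "Kubernetes", "Terraform"],
    "Built Python services on Postgres; deployed with Kubernetes.",
))
# -> 50.0
```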
This was — and still is — the engine inside most basic ATS scoring. It works approximately, but it has obvious failure modes:
- It confuses presence with proficiency. "Mentioned Kubernetes once" scores the same as "ran a 200-node Kubernetes cluster in production for three years."
- It misses synonyms. "PostgreSQL" and "Postgres" are the same thing. So are "k8s" and "Kubernetes," "GCP" and "Google Cloud," "ML" and "Machine Learning."
- It ignores recency. A skill from a 2014 internship counts the same as one from your current job.
- It rewards keyword stuffing. A resume with a wall of comma-separated buzzwords beats a tightly-written one with the same actual experience.
A pure keyword matcher is essentially a spell checker for skill lists. Useful, fast, cheap — and not nearly enough.
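Of the four failure modes, only the synonym problem has a mechanical fix: an alias table applied before matching. A sketch, with a deliberately tiny alias map (a real one needs hundreds of entries plus n-gram handling for multi-word skills):

```python
# Map surface forms to one canonical skill name before matching.
ALIASES = {
    "postgres": "postgresql",
    "k8s": "kubernetes",
    "gcp": "google cloud",
    "ml": "machine learning",
}

def canonicalize(token: str) -> str:
    t = token.lower()
    return ALIASES.get(t, t)

def normalized_overlap(required: list[str], resume_tokens: list[str]) -> float:
    need = {canonicalize(s) for s in required}
    have = {canonicalize(t) for t in resume_tokens}
    return 100 * len(need & have) / len(need)

print(normalized_overlap(
    ["PostgreSQL", "Kubernetes", "GCP"],
    ["postgres", "k8s", "terraform"],
))
# -> 66.67: "postgres" and "k8s" now count; GCP is genuinely absent.
```

That fixes exactly one failure mode. Proficiency, recency, and keyword stuffing all need something that actually reads the text.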
The Modern Approach: Embeddings and Language Models
A well-designed AI matcher in 2026 uses two ingredients:
1. An embedding model that turns text into a vector — a long list of numbers that captures meaning, not just words. Two phrases like "led migration to event-driven architecture" and "redesigned platform around async messaging" land near each other in vector space, even with no shared words.
2. A reasoning model (Gemini, Claude, GPT-4o-class) that reads your resume alongside the job posting and produces a structured judgment: what matches, what's missing, where you exceed requirements, where you fall short.
The first is good for ranking thousands of jobs cheaply. The second is good for explaining why a single job is or isn't a fit.
A serious matching system uses both. Embeddings narrow 5,000 listings down to the 50 worth a closer look. The language model evaluates those 50 in detail and produces a real score with reasoning attached.
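Here's the shape of that funnel as code. This is a sketch of the general pattern, not our pipeline: embed and score_with_llm stand in for whatever embedding and reasoning models you plug in, and in practice you'd precompute and store the listing vectors instead of embedding on every query.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_pipeline(profile_text: str, listings: list[dict],
                   embed, score_with_llm, shortlist_size: int = 50) -> list[dict]:
    """Stage 1: cheap vector ranking. Stage 2: expensive per-job reasoning."""
    profile_vec = embed(profile_text)
    ranked = sorted(
        listings,
        key=lambda job: cosine(profile_vec, embed(job["description"])),
        reverse=True,
    )
    shortlist = ranked[:shortlist_size]
    # Only the shortlist pays for a full LLM evaluation with reasoning attached.
    return [score_with_llm(profile_text, job) for job in shortlist]
```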
What a Good Score Actually Measures
A useful match score is not "how similar is your resume to this job." It's a weighted combination of several questions (a sketch of how they might combine follows the list):
- Skill match. Do you have the explicitly required tools and technologies?
- Seniority match. Is the role aligned with your years of experience and scope of impact? A staff engineer applying to a junior role is a mismatch in the other direction.
- Domain match. Have you worked in the right industry or problem space? Fintech background applying to a healthcare role is a softer match even if the tech stack overlaps.
- Location and timezone fit. "Remote" is not one thing — it can mean US-only, Americas-only, EU-only, or fully global. A US candidate applying to an EU-only listing should get a low score regardless of skill match.
- Compensation fit. A senior engineer earning $250k applying to a $100k posting is statistically a mismatch even if everything else aligns.
- Authorization and visa. Some companies sponsor; most don't. Some only hire contractors; some only employees. A score that ignores this is wasting your time.
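How these dimensions roll up into one number is a design choice. A minimal sketch of one reasonable combination, with illustrative weights (not RemoteHunt's actual values) and hard gates for the disqualifiers:

```python
WEIGHTS = {  # illustrative weights; tune against your own outcome data
    "skills": 0.35, "seniority": 0.20, "domain": 0.15,
    "location": 0.15, "compensation": 0.10, "authorization": 0.05,
}

def combine(subscores: dict[str, float]) -> float:
    """subscores: each dimension scored 0-100."""
    # Hard gates: a location or authorization mismatch caps the total,
    # no matter how strong the skill match is.
    if subscores["location"] < 20 or subscores["authorization"] < 20:
        return min(30.0, subscores["location"], subscores["authorization"])
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)
```

The gate matters more than the exact weights: a 95 skill match on a role you can't legally take should never surface as an 85.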
If a tool gives you a single score without telling you which of these dimensions are weak, the score is hard to act on. You don't know whether to skip the listing, apply with a tweaked resume, or apply confidently.
How RemoteHunt Scores: The Actual Pipeline
We'll be specific so you can compare what other tools claim against something concrete.
When a new remote listing appears in our database, the following happens:
1. Parse the listing. We pull the job title, company, location/timezone constraints, salary if disclosed, and the full description. We extract structured fields: required vs nice-to-have skills, years of experience, role function, employment type.
2. Parse your profile. Your uploaded resume is converted to a structured JSON profile (skills, roles, durations, achievements, location, target salary). This happens once on upload, not per job.
3. Score with Gemini 2.0 Flash at temperature 0. The model receives both the structured profile and the listing, plus a strict prompt asking for a 0–100 score with a rationale broken down by category: skills, seniority, domain, location, compensation, authorization.
4. Cache the score. Every (user, job) pair is scored once. Re-runs only happen if you update your profile, and we tell you when that's about to happen.
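A sketch of what step 3's call can look like using the google-genai Python SDK. The prompt wording and response shape here are illustrative, not our production prompt:

```python
import json

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

def score_match(profile_json: str, listing_json: str) -> dict:
    """One scoring call: structured profile + listing in, scored JSON out."""
    prompt = (
        "Score this candidate against this job, 0-100, with sub-scores and a "
        "short rationale for each of: skills, seniority, domain, location, "
        "compensation, authorization. Return JSON only.\n\n"
        f"PROFILE:\n{profile_json}\n\nLISTING:\n{listing_json}"
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config=types.GenerateContentConfig(
            temperature=0,  # same inputs, same score
            response_mime_type="application/json",
        ),
    )
    return json.loads(response.text)
```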
The temperature-0 part is important. It means two runs of the same scoring call return the same number. You can refresh the page and the score doesn't drift. Most consumer AI products run at temperature 0.7+ because it produces more "natural" output; for scoring, that's a bug, not a feature.
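Determinism is also what makes step 4's caching safe: if neither the profile nor the listing changed, the score can't have changed either. A sketch of one way to key the cache (the in-memory store and versioning scheme are illustrative, not our production setup):

```python
import hashlib

def score_cache_key(user_id: str, profile_version: int, job_id: str) -> str:
    # Bumping profile_version on every resume update invalidates all
    # cached scores for that user in one move.
    raw = f"{user_id}:{profile_version}:{job_id}"
    return hashlib.sha256(raw.encode()).hexdigest()

_cache: dict[str, dict] = {}  # stand-in for a real store (Redis, Postgres, ...)

def cached_score(user: dict, job: dict, score_fn) -> dict:
    key = score_cache_key(user["id"], user["profile_version"], job["id"])
    if key not in _cache:
        _cache[key] = score_fn(user, job)
    return _cache[key]
```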
Where AI Matching Goes Wrong
Even with a good model and a careful prompt, scoring fails in predictable ways. We'll list them honestly because pretending otherwise erodes trust.
Job descriptions are often badly written
A surprising number of postings list 15 "required" skills, half of which are clearly nice-to-haves. The model takes the listing at face value and penalizes candidates who could absolutely do the role. Mitigation: weight "required" less aggressively when the list is improbably long, and surface a flag like "this listing has unusually long requirements; review manually."
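Concretely, the dampening can be as simple as scaling the per-skill penalty by list length. A sketch (the threshold of 8 is illustrative):

```python
def required_skill_weight(n_required: int, threshold: int = 8) -> float:
    """Dampen the penalty per missing 'required' skill as lists get
    improbably long; an 18-item list is mostly nice-to-haves in disguise."""
    if n_required <= threshold:
        return 1.0
    return threshold / n_required  # e.g. 16 "required" skills -> 0.5 each

def needs_manual_review(n_required: int, threshold: int = 8) -> bool:
    """Surface the 'unusually long requirements' flag on the listing."""
    return n_required > 2 * threshold
```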
Resumes underrepresent recent experience
Most people stop updating their resume the moment they land a job. A resume from 18 months ago doesn't reflect what you've actually been doing. Mitigation: prompt users to refresh the most recent role before scoring; allow free-form notes to add unwritten experience.
Niche fields fool the model
If you're in a small specialized field — say, formal verification, or game-engine internals — generic LLMs may not know your tools well enough to score correctly. They may rate a perfect skill match as middling because they don't recognize "Coq" as a proof assistant. Mitigation: give users the ability to flag a low score as wrong and have it re-scored with additional context.
Compensation data is incomplete
Most US listings disclose salary; most EU listings don't. This means location fit and comp fit are entangled in ways the model can't always untangle. We tell users when comp data is missing rather than guessing.
"Remote" doesn't mean what it says
A meaningful fraction of "remote" listings turn out to be remote-eligible-with-quarterly-onsite, hybrid-after-six-months, or remote-but-only-from-three-states. Models can sometimes catch these; other times they slip through. Mitigation: encode known patterns ("remote-eligible," "occasional travel required") as scoring penalties and surface them on the listing card before you click apply.
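Here's what that pattern-encoding might look like. The patterns and penalty values below are illustrative examples, not our full list:

```python
import re

# Known tells that "remote" comes with strings attached, and the score
# penalty each applies.
REMOTE_CAVEATS = [
    (re.compile(r"remote[- ]eligible", re.I), 10),
    (re.compile(r"quarterly\s+(onsite|offsite)s?", re.I), 15),
    (re.compile(r"hybrid", re.I), 25),
    (re.compile(r"occasional travel required", re.I), 10),
    (re.compile(r"must (reside|be located) in", re.I), 20),
]

def remote_penalty(description: str) -> tuple[int, list[str]]:
    """Total penalty plus the matched phrases to show on the listing card."""
    penalty, flags = 0, []
    for pattern, points in REMOTE_CAVEATS:
        match = pattern.search(description)
        if match:
            penalty += points
            flags.append(match.group(0))
    return penalty, flags
```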
Why Transparency Matters
A score is only useful if you know what it means and what it's based on. A 73 from one tool and a 73 from another can mean entirely different things.
This is why we show the breakdown — skill match, seniority, location, compensation — instead of just a final number. If your score is 60 because of a skill gap, that's actionable: maybe a tailored resume closes it, or maybe you skip the role. If your score is 60 because of a location mismatch, no amount of resume editing will help.
It's also why we show the reasoning text. You should be able to read what the model said about you and disagree with it. Sometimes the model misreads your resume; sometimes the listing has buried information the model surfaced for you. Either way, an opaque number doesn't help you make a decision.
What to Look For in Any AI Matching Tool
If you're evaluating us, our competitors, or any other matching product, here are five questions that separate serious tools from polished demos:
1. Is the score reproducible? Refresh the page. Does the number change? If yes, the underlying model is running at high temperature, which means the score has noise baked in.
2. Can you see the rationale? Is there a paragraph explaining why this number? If not, you can't audit it.
3. Is the score broken into components? Skill match, seniority, location, comp — each as a sub-score. A single number is too compressed to act on.
4. Does it handle location and timezone correctly? Apply your profile with a non-US location and check whether US-only listings get penalized. Many tools quietly ignore this.
5. Does it tell you when it doesn't have enough data? A confident 85 on a listing with no salary disclosed and a vague description is a sign of overconfidence in the model.
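The first question is mechanically checkable if a tool exposes an API: score the same pair a few times and compare. A sketch, assuming a scoring function like the score_match example earlier and a hypothetical "overall" field in its output:

```python
def assert_reproducible(score_fn, profile: str, listing: str, runs: int = 3):
    """A score worth trusting returns the same number on every run."""
    scores = [score_fn(profile, listing)["overall"] for _ in range(runs)]
    assert len(set(scores)) == 1, f"score drifts across runs: {scores}"
```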
A good matching tool is humble about its limits. The number is a starting point for your decision, not a verdict.
The Limits of Matching
Even a perfectly scored 95 doesn't guarantee you'll get the interview, and a 60 doesn't guarantee you won't. Hiring is a human process with real noise — a recruiter is having a bad day, a hiring manager has a pet candidate, the role gets reshuffled mid-search. Scoring helps you allocate effort intelligently across a sea of listings; it doesn't predict outcomes.
The right way to use a match score is the way you'd use a stock screener: as a filter to narrow your attention, not as a buy signal. The 50 listings you actually look at and apply to thoughtfully will outperform the 500 you blast at scale, every time.
That's the goal of a good matcher — not to replace your judgment, but to give it better-quality inputs to work with.
Curious how RemoteHunt scores you against current remote listings? Sign up free — your first 50 credits are on us. Already searching? You'll probably also want How to Find Remote Jobs in 2026 and our breakdown of the Best Remote Job Boards.