
Methodology

One question drives everything here: can search engines find the website, and can AI systems understand, trust, and recommend it? This page explains exactly how we score that, and, just as importantly, what we refuse to claim.

Scoring rules

  • Undeterminable criteria are skipped, not zeroed. If a check cannot be established from public evidence (e.g. traceroute from Oman, backend geolocation behind a CDN), its weight is excluded and the section score is normalized over what could actually be measured. A site is never punished for the scanner's blind spots.
  • Oman locality is binary. Evidence either points to Oman or it does not. A UAE or wider-GCC edge earns no Oman-locality points; regional proximity is reflected only in performance-related checks.
  • One report per domain per day (Muscat time). Re-scanning the same domain on the same day serves the stored report (on-screen and PDF). Every scan issues a signed, timestamped PDF, and every criterion score is stored individually.
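The first and third rules are easy to state and easy to get subtly wrong, so here is a minimal TypeScript sketch of both. The `Criterion` shape, the `sectionScore` and `reportKey` names and the `domain:YYYY-MM-DD` key format are hypothetical, not the scanner's actual code. Undeterminable criteria carry `null` and drop out of both the earned and the possible points. The scan day relies on Muscat being a fixed UTC+4 with no daylight saving.

```typescript
// Hypothetical criterion shape: score is null when the check could not be
// established from public evidence (e.g. backend geolocation behind a CDN).
interface Criterion {
  id: string;
  weight: number;       // maximum points for this criterion
  score: number | null; // points earned, or null if undeterminable
}

// Normalize over measurable criteria only: a skipped criterion is removed
// from the denominator instead of being counted as zero.
function sectionScore(criteria: Criterion[]): number | null {
  let earned = 0;
  let possible = 0;
  for (const c of criteria) {
    if (c.score === null) continue; // skipped, not zeroed
    earned += c.score;
    possible += c.weight;
  }
  if (possible === 0) return null; // nothing in this section was measurable
  return (earned * 100) / possible;
}

// Muscat is UTC+4 all year, so shifting the instant by four hours and taking
// the UTC calendar date yields the Muscat calendar date.
const MUSCAT_OFFSET_MS = 4 * 60 * 60 * 1000;

// Hypothetical cache key: one stored report per domain per Muscat day.
function reportKey(domain: string, at: Date = new Date()): string {
  const day = new Date(at.getTime() + MUSCAT_OFFSET_MS).toISOString().slice(0, 10);
  return `${domain.toLowerCase()}:${day}`;
}
```

With this scheme, 8 of 10 measurable points plus one undeterminable 10-point criterion yields 80, not 40. A scan at 21:00 UTC on 1 January already belongs to 2 January in Muscat, so it maps to that day's stored report.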

The category model (100 points)

Ten categories, weighted by real-world impact, add up to 100 points. The report presents three top-level scores: SEO Discoverability and AI Readability, both built from these categories, plus the separately scored Oman Localization Readiness.

Crawlability & Indexability

15 pts

HTTP status, robots.txt, noindex, canonicals, sitemap, static-HTML content, internal links, and access for AI and search crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Applebot, CCBot, Googlebot, Bingbot).
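Crawler access comes down to reading robots.txt the way each bot would. The sketch below is a simplified reading of the Robots Exclusion Protocol (RFC 9309), not the scanner's own parser. A bot uses the group naming its product token, case-insensitively, and falls back to `*` only when no such group exists. The longest matching path wins, and `Allow` wins ties. Wildcards (`*`, `$`) inside paths are deliberately not handled, and the function names are hypothetical.

```typescript
type Rule = { allow: boolean; path: string };

const CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
  "PerplexityBot", "Applebot", "CCBot", "Googlebot", "Bingbot"];

// Parse robots.txt into lower-cased user-agent token -> rules.
function parseRobots(txt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let lastWasAgent = false;
  for (const raw of txt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      if (!lastWasAgent) agents = []; // consecutive user-agent lines share a group
      const ua = value.toLowerCase();
      agents.push(ua);
      if (!groups.has(ua)) groups.set(ua, []);
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if ((key === "allow" || key === "disallow") && value !== "") {
      for (const ua of agents) groups.get(ua)!.push({ allow: key === "allow", path: value });
    }
  }
  return groups;
}

// Is `path` crawlable for this bot? No matching rule means allowed.
function isAllowed(groups: Map<string, Rule[]>, bot: string, path = "/"): boolean {
  const rules = groups.get(bot.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const r of rules) {
    if (!path.startsWith(r.path)) continue;
    if (!best || r.path.length > best.path.length ||
        (r.path.length === best.path.length && r.allow)) best = r;
  }
  return best ? best.allow : true;
}

function crawlerAccess(robotsTxt: string): Record<string, boolean> {
  const groups = parseRobots(robotsTxt);
  const result: Record<string, boolean> = {};
  for (const bot of CRAWLERS) result[bot] = isAllowed(groups, bot);
  return result;
}
```

A missing robots.txt can be passed as an empty string, which leaves every crawler allowed.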

Site Architecture & Internal Linking

10 pts

Homepage positioning, service URL structure, per-service pages, breadcrumbs, cross-linking, footer completeness, orphan-page sampling.

Page Structure & AI Readability

15 pts

Single H1, definitional opening (first 100–150 words), H2/H3 hierarchy, scannable formats, audience, offering, outcomes, CTA, fluff detection.
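Because the scanner reads static HTML, the two most mechanical checks here can run on the raw markup. This is a regex-based sketch under that assumption, not a full HTML parser, and `countH1` and `openingWords` are hypothetical names. It counts `<h1>` tags and returns the first N words of visible text, which is the passage a definitional-opening check would then inspect.

```typescript
// Count <h1> opening tags (matches <h1> and <h1 class="...">).
function countH1(html: string): number {
  return (html.match(/<h1[\s>]/gi) ?? []).length;
}

// Crudely strip scripts, styles and tags, then return the first n words.
function openingWords(html: string, n = 150): string {
  const text = html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.split(" ").slice(0, n).join(" ");
}
```

A page passes the single-H1 check when `countH1` returns exactly 1. Regex stripping is good enough for a heuristic but will misread unusual markup.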

Content Usefulness & Answer Quality

15 pts

Direct answers, specificity signals, use cases, process, pricing, FAQs, limitations, keyword-stuffing check, plus five LLM-style extraction tests (heuristic in MVP).

Entity Clarity & Brand Understanding

10 pts

Company name, brand consistency, market, team, service naming, contacts, sameAs, description consistency, superlative-claim check, category positioning.

Structured Data / Schema

10 pts

Organization, LocalBusiness, Service, FAQPage, Article, BreadcrumbList; JSON-LD validity; schema-vs-visible-content match; enrichment properties.
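JSON-LD validity and type detection can be sketched as follows. The code finds each `application/ld+json` script, counts blocks that fail `JSON.parse`, and walks arrays, `@graph` containers and nested objects to collect every `@type`. This is an assumption about one reasonable approach, not the scanner's implementation. It checks JSON syntax only, not schema.org conformance or the schema-vs-visible-content match.

```typescript
interface JsonLdResult { types: string[]; invalidBlocks: number }

// Walk any JSON value, collecting string @type values at every depth.
function collectTypes(node: unknown, out: string[]): void {
  if (Array.isArray(node)) { for (const n of node) collectTypes(n, out); return; }
  if (node === null || typeof node !== "object") return;
  const obj = node as Record<string, unknown>;
  const t = obj["@type"];
  if (typeof t === "string") out.push(t);
  else if (Array.isArray(t)) for (const x of t) if (typeof x === "string") out.push(x);
  for (const v of Object.values(obj)) collectTypes(v, out); // includes @graph
}

function extractJsonLd(html: string): JsonLdResult {
  const re = /<script[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const types: string[] = [];
  let invalidBlocks = 0;
  for (const m of html.matchAll(re)) {
    try {
      collectTypes(JSON.parse(m[1]), types);
    } catch {
      invalidBlocks++; // syntax error: counts against JSON-LD validity
    }
  }
  return { types, invalidBlocks };
}
```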

Trust, Proof & Authority

10 pts

Case studies, quantified results, testimonials, demos/artifacts, team credibility, external references, dates, claim verifiability.

Agent / LLM Friendly Layer

7 pts

llms.txt, llms-full.txt, markdown twins, service index, do/don’t guidance, machine-readable contacts, last-updated dates.
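As a rough structural check, the sketch below tests llms.txt against the shape suggested by the public llms.txt proposal as we understand it. That shape is an H1 title first, an optional blockquote summary, then H2 sections containing markdown link lists. The field names are hypothetical, and the proposal is still informal, so treat a missing element as a hint rather than a failure.

```typescript
interface LlmsTxtCheck { hasH1: boolean; hasSummary: boolean; sections: number; links: number }

function checkLlmsTxt(md: string): LlmsTxtCheck {
  const lines = md.split(/\r?\n/).map((l) => l.trim());
  const firstContent = lines.find((l) => l !== "") ?? "";
  return {
    hasH1: /^#\s+\S/.test(firstContent),                 // "# Name" must come first
    hasSummary: lines.some((l) => l.startsWith(">")),    // blockquote summary
    sections: lines.filter((l) => /^##\s+\S/.test(l)).length,
    links: lines.filter((l) => /^[-*]\s*\[[^\]]+\]\([^)]+\)/.test(l)).length, // "- [title](url)"
  };
}
```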

Local, Multilingual & Market Signals

5 pts

Service area, visible contacts, Arabic/English, hreflang, local proof (OMR, +968, Oman regulations, city references).
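Local proof signals are mostly pattern matches on visible text. In this sketch, Omani numbers are assumed to appear as `+968` followed by eight digits, with optional spaces or hyphens. The city list is an illustrative subset chosen for this example, not the scanner's list, and `localSignals` is a hypothetical name.

```typescript
interface LocalSignals { omanPhones: string[]; mentionsOMR: boolean; cities: string[] }

// +968 then 8 digits, optionally grouped 4+4 with a space or hyphen.
const OMAN_PHONE = /\+968[\s-]?\d{4}[\s-]?\d{4}/g;
// Illustrative subset of Omani cities (assumption).
const CITIES = ["Muscat", "Salalah", "Sohar", "Nizwa", "Duqm"];

function localSignals(text: string): LocalSignals {
  return {
    omanPhones: Array.from(text.matchAll(OMAN_PHONE), (m) => m[0]),
    mentionsOMR: /\bOMR\b/.test(text),
    cities: CITIES.filter((c) => new RegExp(`\\b${c}\\b`, "i").test(text)),
  };
}
```

Each match counts as evidence toward the category, not as proof of a local business.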

Performance, Accessibility & UX

3 pts

Viewport, page-weight & TTFB proxies, semantic elements, labels/alt text, layout-stability hints. Lightweight MVP checks, not lab data.

Rating bands

  • 90–100 Excellent
  • 75–89 Good
  • 60–74 Average
  • 40–59 Weak
  • 0–39 Poor

SEO Discoverability and AI Readability are weighted blends of the relevant categories; Entity, Structured Data, Trust and Agent-layer scores are their categories rescaled to 100. Each score carries its own rating band; there is no single combined score.
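Rescaling and banding can be shown directly. This sketch maps a category's points to /100 and then to the bands listed above. Rounding to a whole number before banding is an assumption, so that a score like 89.6 is not stranded between bands. The weights behind the SEO and AI blends are not published here, so they are not shown.

```typescript
type Band = "Excellent" | "Good" | "Average" | "Weak" | "Poor";

// Published bands: 90-100, 75-89, 60-74, 40-59, 0-39.
function ratingBand(score: number): Band {
  if (score >= 90) return "Excellent";
  if (score >= 75) return "Good";
  if (score >= 60) return "Average";
  if (score >= 40) return "Weak";
  return "Poor";
}

// Rescale a category to /100, e.g. 7 of 10 Structured Data points -> 70.
function rescale(points: number, maxPoints: number): number {
  return Math.round((points * 100) / maxPoints);
}
```

For example, 6 of the 7 Agent-layer points rescales to 86, which falls in the Good band.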

Oman Localization Readiness (separate /100)

Checks whether a website is locally relevant and fast for Oman users, and whether there is evidence of Oman-hosted infrastructure. Locality checks (IP, ASN, DNS, API host) are binary: Oman or not Oman. A .om / .com.om domain is also credited: registration requires an Omani commercial presence, making it registry-verified local proof.

  • Business & content localization: 20 pts
  • Technical hosting proximity: 25 pts
  • Asset/resource localization: 15 pts
  • Backend/API localization evidence: 15 pts
  • Data residency evidence: 10 pts
  • Performance for Oman users: 10 pts
  • Transparency/disclosure: 5 pts
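The sub-category maximums above sum to 100, and the locality checks are all-or-nothing. The sketch below encodes both. The key names and helper functions are hypothetical, and the country code is assumed to come from whatever geolocation provider is plugged in. A UAE or wider-GCC result (`"AE"`, `"SA"`, ...) earns nothing here, exactly as the binary rule states.

```typescript
// Sub-category maximums, as listed above (hypothetical key names).
const OMAN_WEIGHTS: Record<string, number> = {
  businessContent: 20,
  hostingProximity: 25,
  assetLocalization: 15,
  backendApiEvidence: 15,
  dataResidency: 10,
  omanPerformance: 10,
  transparency: 5,
};

// Binary locality: the visible IP/ASN/DNS/API host is in Oman or it is not.
function isOmanLocal(countryCode: string | null): boolean {
  return countryCode === "OM";
}

// .om and .com.om registrations count as registry-verified local proof.
function isOmanDomain(hostname: string): boolean {
  return hostname.toLowerCase().replace(/\.$/, "").endsWith(".om");
}

// A binary criterion earns its full points or none.
function localityPoints(local: boolean, maxPoints: number): number {
  return local ? maxPoints : 0;
}
```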

Latency interpretation bands (for a real Oman probe): 30–80ms strong local/GCC signal · 100–180ms possible UAE/GCC · 200–350ms likely farther region · 400ms+ poor regional localization. Latency is always evidence, not proof. The MVP measures TTFB from the scanner’s serverless region and labels it approximate; the architecture accepts future probes in Oman, UAE, Saudi Arabia, India, Europe and the USA.
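The latency bands translate into a small lookup. This sketch applies them literally. Readings below 30 ms or in the gaps between bands (80–100, 180–200, 350–400 ms) come back as `"unbanded"` because the bands do not cover them, which is a choice made for this example. The input is assumed to come from a real Oman probe. The MVP's serverless TTFB should stay labeled approximate either way.

```typescript
type LatencyReading =
  | "strong local/GCC signal"
  | "possible UAE/GCC"
  | "likely farther region"
  | "poor regional localization"
  | "unbanded";

// Interpret latency in ms measured from an Oman probe. Evidence, not proof.
function interpretLatency(ms: number): LatencyReading {
  if (ms >= 30 && ms <= 80) return "strong local/GCC signal";
  if (ms >= 100 && ms <= 180) return "possible UAE/GCC";
  if (ms >= 200 && ms <= 350) return "likely farther region";
  if (ms >= 400) return "poor regional localization";
  return "unbanded"; // outside the published bands
}
```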

Confidence labels

Every finding carries one of six labels. They are not decoration. They are the product.

Confirmed

Directly observed in a server response or file. E.g. “robots.txt blocks GPTBot”.

Strong evidence

Multiple consistent public signals. Very likely true, but still technically falsifiable.

Moderate evidence

A reasonable inference from public signals; heuristics may misread edge cases.

Weak evidence

A hint, not a conclusion. Often from limited crawl samples or single-region measurements.

Not externally verifiable

Cannot be established from outside at all: database location, backup regions, origin servers behind CDNs. Here we report only claims, never facts.

Contradictory evidence

Public signals disagree with each other or with the site’s own claims.
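Making the label mandatory is easiest to enforce in the type system. This sketch defines the six labels as a closed union so that a finding without a label, or with an invented one, fails to compile. The `Finding` shape and `isConfidence` guard are hypothetical.

```typescript
const CONFIDENCE_LABELS = [
  "Confirmed",
  "Strong evidence",
  "Moderate evidence",
  "Weak evidence",
  "Not externally verifiable",
  "Contradictory evidence",
] as const;

type Confidence = (typeof CONFIDENCE_LABELS)[number];

interface Finding {
  message: string;
  confidence: Confidence; // required: no finding exists without a label
}

// Runtime guard for labels read back from storage.
function isConfidence(value: string): value is Confidence {
  return (CONFIDENCE_LABELS as readonly string[]).includes(value);
}

const example: Finding = { message: "robots.txt blocks GPTBot", confidence: "Confirmed" };
```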

What this scanner will never claim

Backend location. Modern websites often use CDNs and reverse proxies. Public tests may reveal the visible edge server, but not the private origin server, backend application server, database, logs, or backup location. When we detect Cloudflare, CloudFront, Akamai, Fastly, Vercel, Netlify, BunnyCDN or Azure Front Door, we say so: “The visible server appears to be a CDN/proxy. Origin backend location cannot be confirmed externally.”
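CDN detection is a header-fingerprinting heuristic. The signatures below are commonly seen response headers, listed here as illustrative assumptions rather than the scanner's actual table. Providers change headers, and some (Akamai in particular) often expose none. The note text is the sentence quoted above.

```typescript
type HeaderMap = Record<string, string>;

// Illustrative signatures, checked against lower-cased header names.
const CDN_SIGNATURES: Array<[string, (h: HeaderMap) => boolean]> = [
  ["Cloudflare", (h) => "cf-ray" in h],
  ["CloudFront", (h) => "x-amz-cf-id" in h],
  ["Vercel", (h) => "x-vercel-id" in h],
  ["Netlify", (h) => "x-nf-request-id" in h],
  ["Azure Front Door", (h) => "x-azure-ref" in h],
  ["BunnyCDN", (h) => /bunnycdn/i.test(h["server"] ?? "")],
  ["Akamai", (h) => /akamai/i.test(h["server"] ?? "")],
  ["Fastly", (h) => /^cache-/i.test(h["x-served-by"] ?? "")],
];

function detectCdn(raw: HeaderMap): string | null {
  const h: HeaderMap = {};
  for (const [k, v] of Object.entries(raw)) h[k.toLowerCase()] = v;
  for (const [name, matches] of CDN_SIGNATURES) if (matches(h)) return name;
  return null;
}

// Returns the standard disclosure when a CDN/proxy is detected, else null.
function locationNote(headers: HeaderMap): string | null {
  return detectCdn(headers)
    ? "The visible server appears to be a CDN/proxy. Origin backend location cannot be confirmed externally."
    : null;
}
```

A `null` result only means no known signature was seen. It does not show that the visible IP is the origin.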

Data residency. Database location and data residency usually cannot be verified from outside. This scanner only reports public technical evidence and visible policy claims, always labeled “externally claimed, not independently verified.”

Rich results. Structured data may help search engines understand the page, but it does not guarantee rich results or ranking improvements.

llms.txt. llms.txt and markdown twins are not guaranteed ranking factors. They are treated here as AI-readability and agent-discovery aids, not as replacements for technical SEO or high-quality content.

Blocking AI crawlers is not automatically wrong. If a site blocks GPTBot or ClaudeBot we flag the tradeoff (AI discovery may be limited if you want the site to appear in AI-assisted search or assistant answers), but content protection is a legitimate choice.

Scan mechanics

  • Fetches the homepage, robots.txt and sitemaps, then crawls up to ~10 prioritized same-domain pages (about, contact, services, pricing, case studies, blog, privacy, terms…).
  • Parses static HTML only, with no JavaScript execution. This mirrors how most AI crawlers see the site, and low static-HTML content is itself reported as a finding.
  • Probes /llms.txt, /llms-full.txt, /ai.txt, /humans.txt, /.well-known/security.txt and .md twins of crawled pages.
  • Extracts JSON-LD (plus microdata types), and collects asset hosts and API hosts from the HTML, form actions and up to 5 same-site JS bundles.
  • Resolves DNS, follows CNAME chains, fingerprints CDNs from headers, and geolocates the visible IP via a pluggable provider (free-tier accuracy; treated as evidence, not fact).
  • Reports are stored via Supabase and Vercel Blob so results are shareable by permalink.
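The probe list in the third step can be generated from the origin and the crawled paths. In this sketch, the markdown-twin rule appends `.md` to a page path and uses `index.html.md` for directory-style URLs. That is an assumption based on the convention suggested in the llms.txt proposal, not necessarily what the scanner requests. `probeUrls` is a hypothetical name.

```typescript
const FIXED_PROBES = ["/llms.txt", "/llms-full.txt", "/ai.txt", "/humans.txt", "/.well-known/security.txt"];

// Assumed twin convention: "/about" -> "/about.md", "/" -> "/index.html.md".
function markdownTwin(path: string): string {
  return path.endsWith("/") ? `${path}index.html.md` : `${path}.md`;
}

// Absolute, de-duplicated probe URLs for one scan.
function probeUrls(origin: string, crawledPaths: string[]): string[] {
  const paths = [...FIXED_PROBES, ...crawledPaths.map(markdownTwin)];
  return [...new Set(paths)].map((p) => new URL(p, origin).toString());
}
```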
