Methodology
One question drives everything here: can search engines find the website, and can AI systems understand, trust, and recommend it? This page explains exactly how we score that, and, just as importantly, what we refuse to claim.
Scoring rules
- Undeterminable criteria are skipped, not zeroed. If a check cannot be established from public evidence (e.g. traceroute from Oman, backend geolocation behind a CDN), its weight is excluded and the section score is normalized over what could actually be measured. A site is never punished for the scanner's blind spots.
- Oman locality is binary. Evidence either points to Oman or it does not. A UAE or wider-GCC edge earns no Oman-locality points; regional proximity is reflected only in performance-related checks.
- One report per domain per day (Muscat time). Re-scanning the same domain on the same day serves the stored report (on-screen and PDF). Every scan issues a signed, timestamped PDF, and every criterion score is stored individually.
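The skip-and-normalize rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the scanner's actual code: the `Criterion` type, its field names, and the example weights are hypothetical, and `earned=None` is assumed to mark a check that could not be determined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    # Hypothetical shape; the scanner's real data model is not documented here.
    name: str
    weight: float            # maximum points this check can contribute
    earned: Optional[float]  # points awarded, or None if undeterminable


def section_score(criteria: list[Criterion]) -> Optional[float]:
    """Score a section out of 100, ignoring criteria that could not be measured."""
    measured = [c for c in criteria if c.earned is not None]
    measurable_weight = sum(c.weight for c in measured)
    if measurable_weight == 0:
        return None  # nothing could be measured, so there is no score to report
    return 100 * sum(c.earned for c in measured) / measurable_weight


example = [
    Criterion("sitemap present", weight=10, earned=5),
    Criterion("canonical tags", weight=10, earned=10),
    Criterion("traceroute from Oman", weight=5, earned=None),  # skipped
]
score = section_score(example)  # normalized over 20 points, not 25
```

Because the skipped check's weight leaves both the numerator and the denominator, the section is scored only on what was observable.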
The category model (100 points)
Ten categories, weighted by real-world impact. The report presents three top-level scores built from them: SEO Discoverability, AI Readability, and the separate Oman Localization Readiness.
Crawlability & Indexability
15 pts: HTTP status, robots.txt, noindex, canonicals, sitemap, static-HTML content, internal links, AI-crawler access (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Applebot, CCBot, Googlebot, Bingbot).
Site Architecture & Internal Linking
10 pts: Homepage positioning, service URL structure, per-service pages, breadcrumbs, cross-linking, footer completeness, orphan-page sampling.
Page Structure & AI Readability
15 pts: Single H1, definitional opening (first 100–150 words), H2/H3 hierarchy, scannable formats, audience, offering, outcomes, CTA, fluff detection.
Content Usefulness & Answer Quality
15 pts: Direct answers, specificity signals, use cases, process, pricing, FAQs, limitations, keyword-stuffing check, plus five LLM-style extraction tests (heuristic in MVP).
Entity Clarity & Brand Understanding
10 pts: Company name, brand consistency, market, team, service naming, contacts, sameAs, description consistency, superlative-claim check, category positioning.
Structured Data / Schema
10 pts: Organization, LocalBusiness, Service, FAQPage, Article, BreadcrumbList; JSON-LD validity; schema-vs-visible-content match; enrichment properties.
Trust, Proof & Authority
10 pts: Case studies, quantified results, testimonials, demos/artifacts, team credibility, external references, dates, claim verifiability.
Agent / LLM Friendly Layer
7 pts: llms.txt, llms-full.txt, markdown twins, service index, do/don’t guidance, machine-readable contacts, last-updated dates.
Local, Multilingual & Market Signals
5 pts: Service area, visible contacts, Arabic/English, hreflang, local proof (OMR, +968, Oman regulations, city references).
Performance, Accessibility & UX
3 pts: Viewport, page-weight & TTFB proxies, semantic elements, labels/alt text, layout-stability hints. Lightweight MVP checks, not lab data.
Rating bands
- 90–100 Excellent
- 75–89 Good
- 60–74 Average
- 40–59 Weak
- 0–39 Poor
SEO Discoverability and AI Readability are weighted blends of the relevant categories; Entity, Structured Data, Trust and Agent-layer scores are their categories rescaled to 100. Each score carries its own rating band; there is no single combined score.
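As a rough sketch, rescaling a category to 100 and assigning its rating band might look like the code below. The function names are hypothetical, and treating fractional scores such as 89.5 as belonging to the lower band is an assumption, since the bands above are stated for whole numbers only.

```python
# Band floors taken from the rating bands listed above, highest first.
BANDS = [
    (90, "Excellent"),
    (75, "Good"),
    (60, "Average"),
    (40, "Weak"),
    (0, "Poor"),
]


def rescale_to_100(points: float, max_points: float) -> float:
    """Express a category's points (e.g. 7 of 10) on a 0-100 scale."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    return 100 * points / max_points


def rating_band(score: float) -> str:
    """Return the band whose floor is the highest one not above the score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise AssertionError("unreachable: the lowest floor is 0")


entity_score = rescale_to_100(7, 10)   # an Entity Clarity result of 7/10
entity_band = rating_band(entity_score)
```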
Oman Localization Readiness (separate /100)
Checks whether a website is locally relevant, whether it is fast for Oman users, and whether there is evidence of Oman-hosted infrastructure. Locality checks (IP, ASN, DNS, API host) are binary: Oman or not Oman. A .om / .com.om domain is also credited: registration requires an Omani commercial presence, making it registry-verified local proof.
| Component | Points |
| --- | --- |
| Business & content localization | 20 |
| Technical hosting proximity | 25 |
| Asset/resource localization | 15 |
| Backend/API localization evidence | 15 |
| Data residency evidence | 10 |
| Performance for Oman users | 10 |
| Transparency/disclosure | 5 |
Latency interpretation bands (for a real Oman probe): 30–80 ms strong local/GCC signal · 100–180 ms possible UAE/GCC · 200–350 ms likely farther region · 400 ms+ poor regional localization. Latency is always evidence, not proof. The MVP measures TTFB from the scanner’s serverless region and labels it approximate; the architecture accepts future probes in Oman, UAE, Saudi Arabia, India, Europe and the USA.
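The bands can be read as a simple lookup, sketched below. The function name and the "unclassified" label are assumptions for illustration: the published bands leave gaps (for example 81–99 ms, and anything under 30 ms), and this sketch declines to classify those values rather than guess where they belong.

```python
# (lower bound ms, upper bound ms, interpretation), as listed in the bands above.
LATENCY_BANDS = [
    (30, 80, "strong local/GCC signal"),
    (100, 180, "possible UAE/GCC"),
    (200, 350, "likely farther region"),
]


def interpret_latency(ms: float) -> str:
    """Map a measured latency to its interpretation band.

    Values that fall between or below the published bands are returned as
    "unclassified" (a hypothetical label, not one the scanner defines).
    """
    if ms >= 400:
        return "poor regional localization"
    for low, high, label in LATENCY_BANDS:
        if low <= ms <= high:
            return label
    return "unclassified"


reading = interpret_latency(55)
```

Whatever the band, the result is a hint to weigh alongside other signals, in line with the rule that latency is evidence, not proof.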
Confidence labels
Every finding carries one of six labels. They are not decoration. They are the product.
Confirmed
Directly observed in a server response or file. E.g. “robots.txt blocks GPTBot”.
Strong evidence
Multiple consistent public signals. Very likely true, technically falsifiable.
Moderate evidence
A reasonable inference from public signals; heuristics may misread edge cases.
Weak evidence
A hint, not a conclusion. Often from limited crawl samples or single-region measurements.
Not externally verifiable
Cannot be established from outside at all: database location, backup regions, origin servers behind CDNs. We report claims, never facts, here.
Contradictory evidence
Public signals disagree with each other or with the site’s own claims.
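One way to make the label mandatory on every finding is to model it as a closed set, as in this sketch. The `Confidence` enum and `Finding` type are hypothetical names chosen for illustration; only the six label strings come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "Confirmed"
    STRONG = "Strong evidence"
    MODERATE = "Moderate evidence"
    WEAK = "Weak evidence"
    NOT_VERIFIABLE = "Not externally verifiable"
    CONTRADICTORY = "Contradictory evidence"


@dataclass(frozen=True)
class Finding:
    # Hypothetical record: a finding cannot be constructed without a label.
    claim: str
    confidence: Confidence


finding = Finding("robots.txt blocks GPTBot", Confidence.CONFIRMED)
rendered = f"[{finding.confidence.value}] {finding.claim}"
```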
What this scanner will never claim
Backend location. Modern websites often use CDNs and reverse proxies. Public tests may reveal the visible edge server, but not the private origin server, backend application server, database, logs, or backup location. When we detect Cloudflare, CloudFront, Akamai, Fastly, Vercel, Netlify, BunnyCDN or Azure Front Door, we say so: “The visible server appears to be a CDN/proxy. Origin backend location cannot be confirmed externally.”
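A header-based CDN check could be sketched like this. The header-to-provider table is an illustrative subset based on response headers these providers commonly send; it is an assumption, not the scanner's actual fingerprint list, and it omits several of the providers named above. The function names are hypothetical.

```python
from typing import Mapping, Optional

# Illustrative subset of commonly seen response headers (assumed, not exhaustive).
CDN_HEADER_HINTS = {
    "cf-ray": "Cloudflare",
    "x-amz-cf-id": "CloudFront",
    "x-vercel-id": "Vercel",
    "x-nf-request-id": "Netlify",
    "x-azure-ref": "Azure Front Door",
}

ORIGIN_CAVEAT = (
    "The visible server appears to be a CDN/proxy. "
    "Origin backend location cannot be confirmed externally."
)


def detect_cdn(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first CDN whose hint header is present, else None."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    for header, cdn in CDN_HEADER_HINTS.items():
        if header in present:
            return cdn
    return None


def backend_location_note(headers: Mapping[str, str]) -> Optional[str]:
    """Attach the origin caveat whenever a CDN is detected."""
    cdn = detect_cdn(headers)
    return None if cdn is None else f"{cdn} detected. {ORIGIN_CAVEAT}"


note = backend_location_note({"CF-RAY": "abc123-DXB", "Server": "cloudflare"})
```

A missing hint header does not show that no proxy is in use, so a `None` result here should not be read as evidence about the origin.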
Data residency. Database and data residency cannot usually be verified from outside. This scanner only reports public technical evidence and visible policy claims, always labeled “externally claimed, not independently verified.”
Rich results. Structured data may help search engines understand the page, but it does not guarantee rich results or ranking improvements.
llms.txt. llms.txt and markdown twins are not guaranteed ranking factors. They are treated here as AI-readability and agent-discovery aids, not as replacements for technical SEO or high-quality content.
Blocking AI crawlers is not automatically wrong. If a site blocks GPTBot or ClaudeBot, we flag the tradeoff: the site may be less visible in AI-assisted search and assistant answers. Content protection remains a legitimate choice.
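Checking which AI crawlers a robots.txt file blocks can be sketched with Python's standard `urllib.robotparser`. The `blocked_ai_crawlers` helper and the sample file are hypothetical, and the standard-library parser's matching rules may differ in edge cases from how each crawler actually interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# AI crawler user agents named in the Crawlability category above.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "PerplexityBot", "Applebot", "CCBot",
]


def blocked_ai_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the AI crawlers that robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already fetched elsewhere
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


sample_robots = "\n".join([
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: ClaudeBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
])
blocked = blocked_ai_crawlers(sample_robots, "https://example.com/")
```

The result is a list to report as a tradeoff, not a failure in itself.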
Scan mechanics
- Fetches the homepage, robots.txt and sitemaps, then crawls up to ~10 prioritized same-domain pages (about, contact, services, pricing, case studies, blog, privacy, terms…).
- Parses static HTML only, with no JavaScript execution. This mirrors how most AI crawlers see the site, and low static-HTML content is itself reported as a finding.
- Probes /llms.txt, /llms-full.txt, /ai.txt, /humans.txt, /.well-known/security.txt and .md twins of crawled pages.
- Extracts JSON-LD (plus microdata types), asset hosts, API hosts from HTML, form actions and up to 5 same-site JS bundles.
- Resolves DNS, follows CNAME chains, fingerprints CDNs from headers, and geolocates the visible IP via a pluggable provider (free-tier accuracy; treated as evidence, not fact).
- Reports are stored via Supabase and Vercel Blob so results are shareable by permalink.
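The static-HTML JSON-LD extraction step could look roughly like the sketch below, which uses only Python's standard library and executes no JavaScript. The class and function names are hypothetical, and the handling of `@graph` and list-valued `@type` is a simplifying assumption rather than a full JSON-LD processor.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []   # successfully parsed JSON-LD documents
        self.invalid = 0   # blocks that were not valid JSON

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                self.invalid += 1


def schema_types(html: str):
    """Return (set of @type names found, count of invalid JSON-LD blocks)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    types = set()
    for block in parser.blocks:
        for item in block if isinstance(block, list) else [block]:
            if not isinstance(item, dict):
                continue
            for node in item.get("@graph", [item]):
                declared = node.get("@type") if isinstance(node, dict) else None
                if isinstance(declared, str):
                    types.add(declared)
                elif isinstance(declared, list):
                    types.update(t for t in declared if isinstance(t, str))
    return types, parser.invalid


sample_html = """
<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}</script>
<script type="application/ld+json">{ not valid json }</script>
</head></html>
"""
types, invalid = schema_types(sample_html)
```

Counting invalid blocks separately keeps a malformed script from hiding the valid schema types found on the same page.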