On-Page SEO Statistics: What 1.5 Million Website Scans Reveal
We ran our automated on-page SEO scanner across 1,528,567 websites — 1.53 million distinct domains — between 29 March and 7 June 2026. The data shows most of the web is leaving basic, fixable on-page SEO on the table. The average site scores just 41.9 out of 100.
1,528,567 scans · 1.53M domains · Mar–Jun 2026 · free to cite (credit Abby SEO)
Key findings
- The average website scores 41.9/100 for on-page SEO (median 40). Two-thirds of all sites score below 50.
- 74% of websites are missing one or more core Open Graph tags (the single most-failed check), so their links can render with an incomplete or missing preview when shared on social media.
- 61% of websites publish no structured data (no JSON-LD or microdata), forfeiting eligibility for rich results.
- Half of all websites (50%) have no meta description, and only 18% have one in the recommended length range.
- 51% of websites have no canonical tag, and 46% have no XML sitemap our crawler could find.
- 44% of websites have no H1 heading, and only 30% have a title tag in the recommended 30–60 character range.
- Website builders beat self-hosted stacks. Wix-built sites average 58.2; sites served from raw Apache average 29.8.
How websites perform on each on-page SEO check
Every scanned page is graded against the same on-page checks. Each check returns pass, warn (present but sub-optimal), or fail (absent or broken). The table below shows the share of all 1,528,567 scans landing in each state. Descriptions state exactly what each check tests.
| On-page check | Pass | Warn | Fail | What "fail" / "warn" means |
|---|---|---|---|---|
| Open Graph tags | 25.9% | — | 74.1% | Fail = missing one or more of og:title, og:description, og:image, og:url. |
| Structured data (schema) | 38.7% | — | 61.3% | Fail = no JSON-LD or microdata structured data found on the page. |
| Canonical tag | 48.3% | 0.4% | 51.3% | Fail = no canonical link tag. Warn = canonical present but a relative URL. |
| Meta description | 18.0% | 31.6% | 50.4% | Fail = no meta description. Warn = present but too short (<120) or too long (>160) characters. |
| Image alt text | 53.1% | — | 46.9% | Fail = at least one <img> on the page is missing alt text. (Pages with no images pass.) |
| XML sitemap | 53.8% | — | 46.2% | Fail = no XML sitemap found at common paths or declared in robots.txt. |
| H1 heading | 41.5% | 14.3% | 44.2% | Fail = no H1 heading. Warn = more than one H1 (best practice is exactly one). |
| HTML lang attribute | 73.5% | — | 26.5% | Fail = no lang attribute on the <html> tag. |
| Viewport meta (mobile) | 78.4% | — | 21.6% | Fail = no viewport meta tag, the signal that a page is mobile-responsive. |
| Page title | 30.4% | 50.7% | 19.0% | Fail = no <title> tag. Warn = title too short (<30) or too long (>60) characters. |
| robots.txt | 45.5% | 35.8% | 18.7% | Fail = no robots.txt. Warn = robots.txt exists but has no Sitemap directive. |
| Robots meta tag | 95.8% | 0.1% | 4.1% | Fail = page carries a noindex directive (it won't appear in search). Warn = nofollow. |
| Twitter Card tag | 42.9% | 57.1% | — | Warn = no twitter:card meta tag (a soft signal; not penalized as a failure). |
| Heading hierarchy | 32.3% | 67.7% | — | Warn = fewer than two headings on the page, or heading levels that skip (e.g. H2 → H4). |
| Breadcrumb schema | 11.4% | 88.6% | — | Warn = no BreadcrumbList structured data (an opportunity, not a hard failure). |
A dash (—) means the check never produces that outcome: Open Graph tags, for example, are scored strictly pass/fail, while Twitter Card and breadcrumb schema only ever pass or warn. Some checks treat "missing" as a warning rather than a failure because they are best-practice opportunities rather than errors.
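To make the pass / warn / fail rules concrete, here is a minimal Python sketch of three of the checks in the table: page title, meta description, and Open Graph tags. It is an illustration only, not the scanner's actual code. The thresholds and the four required `og:` properties come from the table; the names `grade_length`, `HeadCollector`, and `grade_page` are hypothetical, and treating an empty value as "absent" is an assumption.

```python
from html.parser import HTMLParser

# The four properties the table lists as required for the Open Graph check.
REQUIRED_OG = {"og:title", "og:description", "og:image", "og:url"}


def grade_length(text, low, high):
    """Return 'fail' if text is absent or blank (assumption), 'warn' if its
    length falls outside [low, high], otherwise 'pass'."""
    if text is None or not text.strip():
        return "fail"
    length = len(text.strip())
    return "pass" if low <= length <= high else "warn"


class HeadCollector(HTMLParser):
    """Collects the <title> text, the meta description, and og:* properties."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self.og = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.title = ""
        elif tag == "meta":
            content = attrs.get("content") or ""
            if (attrs.get("name") or "").lower() == "description":
                self.description = content
            prop = (attrs.get("property") or "").lower()
            # Only count an og: property when it carries a non-empty value.
            if prop in REQUIRED_OG and content.strip():
                self.og.add(prop)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def grade_page(html):
    """Grade one page's HTML against the three checks sketched here."""
    parser = HeadCollector()
    parser.feed(html)
    return {
        "title": grade_length(parser.title, 30, 60),
        "meta_description": grade_length(parser.description, 120, 160),
        "open_graph": "pass" if parser.og == REQUIRED_OG else "fail",
    }
```

A page with a five-character title, a 130-character description, and only `og:title` would grade as warn, pass, and fail respectively under these rules.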
How website SEO scores are distributed
Scores are not bell-shaped — they lean low. Nearly 45% of all websites score below 40, and only about 1 in 14 scores 70 or higher. Fewer than 1 in 1,000 sites scores 90+. Each bar is the share of all scored websites whose overall score fell in that 10-point band.
Percentages are the share of 1,530,960 scored websites. Bars are scaled relative to the largest band (30–39, the modal score range) for readability. The 90–100 band combines scores of 90–99 with the 157 sites that scored a perfect 100.
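The banding and bar scaling described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the code behind the chart: `score_band` and `bar_widths` are hypothetical names, and the sample scores in the usage note are made up.

```python
from collections import Counter


def score_band(score):
    """Map a 0-100 score to its 10-point band; a perfect 100 joins 90-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    low = min(int(score) // 10, 9) * 10
    high = 100 if low == 90 else low + 9
    return f"{low}-{high}"


def bar_widths(scores):
    """Return {band: (share_pct, relative_width)}.

    share_pct is the band's share of all scores; relative_width scales
    each bar against the largest band, which gets width 1.0.
    """
    scores = list(scores)
    if not scores:
        return {}
    counts = Counter(score_band(s) for s in scores)
    largest = max(counts.values())
    return {
        band: (100 * n / len(scores), n / largest)
        for band, n in counts.items()
    }
```

For example, `bar_widths([35, 38, 72, 100])` puts half the scores in the 30-39 band at full width and gives the 70-79 and 90-100 bands a quarter share each at half width.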
On-page SEO scores by platform: builders beat self-hosted
The clearest pattern in the data: hosted website builders and managed e-commerce platforms produce meaningfully better on-page SEO out of the box than self-hosted stacks. The four highest-scoring platforms (Wix, Shopify, WordPress, and Squarespace) all average 52 or higher. Sites we could only identify by their web server (raw Nginx or Apache, typically bespoke or hand-rolled) score worst, at 33.4 and 29.8, most likely because they ship without the defaults a builder bakes in.
| Platform | Avg. SEO score | Websites measured |
|---|---|---|
| Wix | 58.2 | 72,617 |
| Shopify | 53.2 | 34,868 |
| WordPress | 53.0 | 372,419 |
| Squarespace | 52.0 | 37,688 |
| WooCommerce | 48.3 | 398 |
| Next.js | 47.6 | 62,112 |
| Webflow | 45.6 | 593 |
| Drupal | 44.6 | 4,651 |
| Magento | 41.2 | 819 |
| Nuxt.js | 37.4 | 18,918 |
| Nginx (server only) | 33.4 | 102,016 |
| Apache (server only) | 29.8 | 74,907 |
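As a sanity check on the table, the per-platform averages can be combined into a site-weighted average. The Python sketch below copies the rows above and adds the 743,157 sites with no detectable platform (avg 35.7) that this report gives for the dataset. It is approximate by construction: platforms below the roughly 350-site threshold are not listed, so they are missing here. The function name `weighted_average` is ours, not part of the report.

```python
# (platform, average score, websites measured), copied from the table above.
PLATFORMS = [
    ("Wix", 58.2, 72_617),
    ("Shopify", 53.2, 34_868),
    ("WordPress", 53.0, 372_419),
    ("Squarespace", 52.0, 37_688),
    ("WooCommerce", 48.3, 398),
    ("Next.js", 47.6, 62_112),
    ("Webflow", 45.6, 593),
    ("Drupal", 44.6, 4_651),
    ("Magento", 41.2, 819),
    ("Nuxt.js", 37.4, 18_918),
    ("Nginx (server only)", 33.4, 102_016),
    ("Apache (server only)", 29.8, 74_907),
]
# Sites where no platform could be detected, as reported for this dataset.
UNDETECTED = ("No platform detected", 35.7, 743_157)


def weighted_average(rows):
    """Average score across rows, weighting each row by its site count."""
    total_sites = sum(n for _, _, n in rows)
    return sum(score * n for _, score, n in rows) / total_sites


listed = weighted_average(PLATFORMS)
combined = weighted_average(PLATFORMS + [UNDETECTED])
print(f"listed platforms only: {listed:.1f}")
print(f"listed + undetected:   {combined:.1f}")
```

The listed platforms alone average about 47.8. Adding the undetected sites pulls the figure down to about 41.9, in line with the overall average reported at the top of this page.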
"Server only" rows are sites where we could fingerprint the web server (Nginx, Apache) but not a CMS or framework — usually custom or hand-built sites. Platforms with fewer than ~350 sites are omitted from this table for stability. The 743,157 sites where no platform could be detected (avg 35.7) are also excluded.
A notable outlier: 38% of sites already have an llms.txt
One number stood out. The llms.txt file, an emerging convention that tells AI assistants what a site is about, passed on 38.1% of scans. That seems high for such a new convention. The reason is in how the check works: it passes when either an llms.txt file is reachable (with more than a trivial amount of content) or the page declares one via a <link> in its head. A large share of the passes come from platforms and templates that auto-generate the file or link, not from owners deliberately authoring one. We flag it here rather than in the headline stats because the number reflects platform defaults as much as deliberate AI-readiness work.
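The either/or logic of that check can be sketched as follows. Several details are assumptions made for illustration, not documented behavior of the scanner: the `MIN_CHARS` cutoff for non-trivial content, matching any `<link>` whose `href` ends in `llms.txt`, and the names `LlmsLinkFinder` and `llms_txt_passes`.

```python
from html.parser import HTMLParser

# Hypothetical cutoff for "non-trivial content"; the real threshold is not stated.
MIN_CHARS = 50


class LlmsLinkFinder(HTMLParser):
    """Sets .found when any <link> href points at a file named llms.txt.

    This sketch does not restrict the search to <head>; a stricter
    version would.
    """

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            href = (dict(attrs).get("href") or "").split("?")[0]
            if href.lower().endswith("llms.txt"):
                self.found = True


def llms_txt_passes(file_body, page_html):
    """Pass if either condition holds: a non-trivial file, or a declared link.

    file_body is the fetched llms.txt text, or None if it was unreachable.
    """
    has_file = file_body is not None and len(file_body.strip()) >= MIN_CHARS
    finder = LlmsLinkFinder()
    finder.feed(page_html)
    return has_file or finder.found
```

Under this logic a template that emits only the `<link>` passes even when no file was authored, which is one way platform defaults can inflate the pass rate.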
What this means for your website
The encouraging takeaway is that the most common failures are also the cheapest to fix. A missing meta description, absent Open Graph tags, no canonical, or no sitemap are each a one-time edit — not a rebuild. Because the average site scores 41.9, fixing even a handful of the checks above moves a site past most of the web. The widest gaps across the whole dataset, in order, are: Open Graph tags (74% missing), structured data (61%), canonical tags (51%), meta descriptions (50%), image alt text (47%), and XML sitemaps (46%).
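For hand-built sites, several of these one-time edits amount to a few tags in the page head. The Python sketch below renders a meta description, an absolute canonical link, and the four core Open Graph tags from plain values. The function name `head_tags` and the example.com URLs are placeholders, and how the output is wired into a template depends on your stack.

```python
from html import escape


def head_tags(title, description, url, image_url):
    """Render a meta description, canonical link, and core Open Graph tags.

    url should be absolute: the canonical check above warns on relative URLs.
    """
    t, d, u, i = (
        escape(value, quote=True)
        for value in (title, description, url, image_url)
    )
    return "\n".join([
        f'<meta name="description" content="{d}">',
        f'<link rel="canonical" href="{u}">',
        f'<meta property="og:title" content="{t}">',
        f'<meta property="og:description" content="{d}">',
        f'<meta property="og:image" content="{i}">',
        f'<meta property="og:url" content="{u}">',
    ])


print(head_tags(
    title="Example Bakery: Fresh Sourdough in Springfield",
    description="Placeholder description text for illustration.",
    url="https://example.com/",
    image_url="https://example.com/share.png",
))
```

Escaping the values keeps quotes or angle brackets in a title from breaking the markup. Keeping the description within the 120–160 character range is left to the caller.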
If you run on a self-hosted Nginx or Apache stack, the platform comparison is a warning worth heeding: you don't inherit the sensible defaults a builder gives you, so these checks are entirely on you. The fastest way to know where you stand is to run a free scan of your own site — you'll get the same checks measured here, scored against the same rubric, with a fix for every failure.
Want to go deeper than the homepage scan? Use the free SEO checker for a full technical audit, the website audit to grade any URL's on-page and performance signals, or keyword research to find the terms worth targeting.
Methodology
- Sample. 1,528,567 completed website scans covering 1.53 million distinct domains, collected 29 March – 7 June 2026. Score-distribution and platform figures draw on 1,530,960 scored scans.
- Source. All data comes from Abby SEO's automated on-page SEO scanner, the same tool that powers our public website checker. Each scan fetches a site's homepage (following redirects) and grades it against a fixed set of on-page checks.
- Aggregation. Figures are anonymized aggregates — counts and averages across the whole population. No individual site is identified, and per-check rates are simple shares of all scans.
- HTTPS / selection-bias caveat. This dataset only includes scans that completed. Sites that were unreachable, that a firewall blocked, or whose certificate was so broken the scan failed are not counted — so the dataset is biased toward sites that are live and well-enough configured to be crawled. For that reason we deliberately do not claim "100% of sites use HTTPS": HTTPS passed on essentially every completed scan, but that is an artifact of which sites can be scanned, not a statement about the whole web.
- Excluded checks. A few checks were left out of the published tables because they ran on too few scans to be reliable or are internal: the TLS-certificate fallback flag, the llms.txt discovery-link check, and an internal run marker. The llms.txt file check itself is discussed separately above rather than headlined, because its pass rate reflects platform defaults.
- Platform detection. Platform is inferred from page markup and response headers. "Server only" rows (Nginx, Apache) are sites where a web server was identified but no CMS or framework; 743,157 sites with no detectable platform are excluded from the platform table.
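The "simple shares of all scans" aggregation can be pictured with a short Python sketch. It assumes each scan contributes exactly one outcome per check; the function name `check_shares` is hypothetical and the sample data in the note below is invented for illustration.

```python
from collections import Counter


def check_shares(outcomes):
    """Return each state's share of all scans, as a percentage to one decimal.

    outcomes is an iterable of 'pass' / 'warn' / 'fail' strings, one per scan.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scans to aggregate")
    return {
        state: round(100 * counts[state] / total, 1)
        for state in ("pass", "warn", "fail")
    }
```

With eight invented scans (three pass, one warn, four fail) this yields 37.5%, 12.5%, and 50.0%. Because each share is rounded independently, a row can sum to slightly more or less than 100, which is presumably why the page title row adds up to 100.1.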
See where your site lands
Run the same checks on your own homepage in about a minute — free, no signup. You'll get a 0–100 score, every passed and failed check, and exactly what to fix.
Citing this data? Please credit Abby SEO and link to https://www.abbyseo.com/research/onpage-seo-statistics. The aggregates on this page are free to reuse.