Content is user-generated and unverified.

Technical SEO Checklist: 30+ Items for Crawlability, Speed, and Indexing in 2026

A technical SEO checklist covers everything search engines need to crawl, render, index, and rank your pages correctly. BlazeHive generates technically correct pages with proper schema, canonical tags, and optimized structure, but it does not replace a full technical site audit. This checklist gives you every item to verify, the specific threshold that passes, and the tool to check it. Use it quarterly on your full site and immediately after any migration, redesign, or CMS change.

Crawlability: Can Search Engines Find Your Pages?

Crawlability determines whether Google can discover and access your content. If Googlebot cannot reach a page, nothing else matters.

Robots.txt configuration. Your robots.txt file lives at domain.com/robots.txt and controls which URLs crawlers can access. Verify it allows access to all pages you want indexed. Common mistakes include blocking CSS/JS files (which prevents rendering), accidentally blocking entire subdirectories, and using overly broad disallow patterns. Check how Google fetched and parsed the file in Search Console's robots.txt report, which replaced the standalone robots.txt tester in late 2023. Note that Google ignores crawl-delay directives, and the crawl rate setting in Search Console was retired in early 2024, so neither can be used to slow Googlebot.
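A quick way to catch accidental blocks is to run a list of must-crawl URLs through a robots.txt parser before deploying a change. The sketch below uses Python's standard-library parser; the rules and URLs are hypothetical, and the standard-library parser applies the first matching rule rather than Google's longest-match logic, so treat it as a rough pre-check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /admin/
Allow: /assets/
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given rules would block for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical URLs to verify.
urls = [
    "https://example.com/",
    "https://example.com/blog/post",
    "https://example.com/assets/site.css",
    "https://example.com/search?q=shoes",
    "https://example.com/admin/login",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Anything printed that you expect to rank, or any CSS/JS file in the output, points to a rule that needs narrowing.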

XML sitemap. Submit your sitemap through Google Search Console. Include only indexable, canonical URLs (no noindexed pages, no redirects, no 404s). Update it automatically when pages are added or removed. Each sitemap file is limited to 50,000 URLs or 50 MB uncompressed, so larger sites need a sitemap index file that lists multiple sitemaps. Verify your sitemap returns HTTP 200 and valid XML using the sitemap checker. Pages in your sitemap get discovered 14% faster than pages relying solely on internal link crawling.
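For sites that exceed the per-file limit, the split into multiple sitemaps plus an index can be sketched as follows. This is a minimal illustration using only the standard library; the function names and the example URLs are assumptions, and a production generator would also add fields such as lastmod.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemap protocol


def build_sitemaps(urls: list[str], max_urls: int = MAX_URLS_PER_FILE) -> list[str]:
    """Return one sitemap XML string per chunk of at most max_urls URLs."""
    files = []
    for start in range(0, len(urls), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files.append(ET.tostring(urlset, encoding="unicode"))
    return files


def build_index(sitemap_urls: list[str]) -> str:
    """Return a sitemap index XML string that points at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 indexable URLs: expect three sitemap files.
pages = [f"https://example.com/page-{i}" for i in range(120_000)]
sitemaps = build_sitemaps(pages)
index_xml = build_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(sitemaps) + 1)]
)
print(len(sitemaps))
```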

Internal link structure. Every important page should be reachable within 3 clicks from the homepage. Orphan pages (pages with zero incoming internal links) may never get crawled. Use a crawl tool like Screaming Frog ($259/year), Sitebulb ($35-$70/month), or Ahrefs Site Audit (included in $99-$999/month plans) to identify unreachable pages. Fix by adding internal links from relevant existing pages.
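If you export the internal link graph from a crawler, click depth and orphan detection reduce to a breadth-first search. The sketch below assumes a hypothetical dictionary mapping each page to the pages it links to; it is not the output format of any particular tool.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage; returns clicks needed per page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit_links(links: dict[str, list[str]], home: str, max_depth: int = 3):
    """Return (pages deeper than max_depth, pages with zero incoming links)."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    linked_to = {t for src, targets in links.items() for t in targets if t != src}
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    orphans = sorted(p for p in all_pages if p not in linked_to and p != home)
    return too_deep, orphans


# Hypothetical link graph: "/d" is 4 clicks deep, "/orphan" has no incoming links.
graph = {
    "/": ["/a"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": [],
    "/orphan": ["/a"],
}
print(audit_links(graph, "/"))
```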

Crawl budget optimization. Sites with over 10,000 pages need to manage crawl budget actively. Block low-value pages (search results, filtered pages, parameter variations) from crawling. Remove redirect chains (more than 2 hops). Fix soft 404s (pages returning 200 status but showing error content). Monitor crawl stats in Search Console under Settings > Crawl Stats. Healthy sites see Googlebot crawling 50-80% of their pages monthly.
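The monthly crawl percentage mentioned above can be estimated by comparing your list of indexable URLs against the URLs Googlebot actually requested. The sketch below assumes you have already extracted both lists (for example from a sitemap and from server logs filtered to verified Googlebot requests); the inputs shown are hypothetical.

```python
def crawl_coverage(site_urls, crawled_urls):
    """Return (percent of site URLs crawled, sorted list of URLs never crawled)."""
    site = set(site_urls)
    crawled = site & set(crawled_urls)  # ignore hits on URLs outside the site list
    percent = 100.0 * len(crawled) / len(site) if site else 0.0
    return percent, sorted(site - crawled)


# Hypothetical inputs: 10 indexable pages, 6 of them requested by Googlebot.
site = [f"/page-{i}" for i in range(10)]
googlebot_hits = [
    "/page-0", "/page-1", "/page-2", "/page-3", "/page-4", "/page-5",
    "/page-0",           # repeat visits count once
    "/not-in-sitemap",   # not part of the site list, so ignored
]
percent, missed = crawl_coverage(site, googlebot_hits)
print(f"{percent:.0f}% crawled; below the 50% floor: {percent < 50}")
```

The list of missed URLs is usually more actionable than the percentage itself, since it shows which sections Googlebot is skipping.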

Indexability: Are Your Pages Making It Into the Index?

A page can be crawlable but still not indexed. Google decides whether each page deserves a spot in the index based on quality signals and technical directives.

Canonical tags. Every page needs a self-referencing canonical tag pointing to its own URL. When duplicate content exists (HTTP/HTTPS versions, www/non-www, parameter variants), all versions should canonical to the preferred URL. Google considers canonical tags as a strong hint, not a directive. Conflicting signals (sitemap includes non-canonical URL, internal links point to non-canonical) weaken the tag. Audit using Screaming Frog or the canonical checker tool to find pages with missing, broken, or conflicting canonicals.
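A canonical audit boils down to extracting the tag from each page and comparing it with the URL that was fetched. The following is a minimal sketch using the standard-library HTML parser; the status labels and sample markup are made up for illustration, and it compares URLs as exact strings without normalizing them.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_status(html: str, page_url: str) -> str:
    """Classify a page as missing, conflicting, self-referencing, or points-elsewhere."""
    parser = CanonicalParser()
    parser.feed(html)
    found = parser.canonicals
    if not found:
        return "missing"
    if len(set(found)) > 1:
        return "conflicting"
    return "self-referencing" if found[0] == page_url else "points-elsewhere"


# Hypothetical page markup.
page = '<head><link rel="canonical" href="https://example.com/guide"></head>'
print(canonical_status(page, "https://example.com/guide"))
```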

Meta robots directives. The <meta name="robots" content="index, follow"> tag explicitly tells Google to index a page and follow its links. Use noindex only on pages that should not appear in search results (admin pages, thin tag pages, duplicate pagination). A single accidental noindex on a high-traffic page can erase thousands of visits within 48 hours of recrawling. Audit monthly using Search Console's Index Coverage report.
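To catch an accidental noindex before Google does, you can scan rendered HTML and response headers for the directive. This sketch assumes you already have the page HTML and, optionally, the value of the X-Robots-Tag response header (which can carry the same directives as the meta tag); the helper names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            for directive in (attributes.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html: str, x_robots_tag: str = "") -> bool:
    """True if the meta tag or the X-Robots-Tag header value excludes the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return bool((parser.directives | header) & {"noindex", "none"})


print(is_noindexed('<meta name="robots" content="noindex, follow">'))
```

Running a check like this against your top landing pages after each deploy is cheaper than discovering the problem in a traffic report.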

HTTP status codes. Every important page should return 200. Redirects should return 301 (permanent) or 308 (permanent, preserves method). Temporary redirects (302, 307) do not pass full link equity. Monitor for unexpected 404s and 5xx errors weekly. Fix any 5xx errors immediately because they signal server instability and can trigger crawl rate reduction. Google Search Console's Coverage report surfaces most status issues automatically.

Thin content and index bloat. Pages with fewer than 300 words of unique content risk being classified as thin and excluded from the index. Paginated archive pages, tag pages with only 2-3 posts, and automatically generated category pages often fall into this category. Either add unique content to make them valuable or noindex them to prevent index bloat. Sites with index bloat (more indexed pages than valuable pages) see ranking dilution across the entire domain.
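A first-pass thin-content screen can count visible words per page and flag anything under the 300-word mark. The sketch below is a rough heuristic under stated assumptions: it counts all visible text, including navigation and footers, and cannot tell whether the words are unique, so flagged pages still need a manual look.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in text nodes, skipping script, style, and noscript content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    counter = VisibleTextCounter()
    counter.feed(html)
    return counter.words


def is_thin(html: str, min_words: int = 300) -> bool:
    """Flag pages whose visible word count falls below min_words."""
    return visible_word_count(html) < min_words


# Hypothetical tag page with very little text.
sample = "<html><script>var x = 1;</script><h1>Tag: shoes</h1><p>Two posts here.</p></html>"
print(visible_word_count(sample), is_thin(sample))
```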

Site Speed and Core Web Vitals

Google measures three Core Web Vitals. All three must pass "good" thresholds at the 75th percentile of page loads.

Largest Contentful Paint (LCP): under 2.5 seconds. LCP measures when the largest visible element finishes loading. Common fixes: optimize the hero image (compress to WebP, use appropriate dimensions, preload with <link rel="preload">), reduce server response time (target TTFB under 800ms), minimize render-blocking CSS/JS, and implement a CDN for static assets. Pages with LCP over 4.0 seconds are classified "poor" and lose the page experience ranking boost.

Interaction to Next Paint (INP): under 200 milliseconds. INP replaced First Input Delay in March 2024. It measures responsiveness to any user interaction (clicks, taps, keypresses). Common fixes: break up long JavaScript tasks (over 50ms) into smaller chunks, defer non-critical JavaScript, reduce main-thread work, and minimize DOM size (target under 1,500 elements). Sites with INP over 500ms are classified "poor."

Cumulative Layout Shift (CLS): under 0.1. CLS measures visual stability. Elements shifting position after initial render frustrate users. Common fixes: set explicit width/height attributes on images and videos, reserve space for ads and embeds, avoid inserting content above existing content, and use the CSS contain property for dynamic elements. The score is the largest burst of layout shifts within a single session window (capped at 5 seconds), and a score over 0.25 is classified "poor."

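Test every page template using PageSpeed Insights (free; it reports Lighthouse lab data plus Chrome UX Report field data when the URL has enough traffic) and verify with the Core Web Vitals report in Search Console (real-user field data). Because field data is aggregated over a rolling 28-day window, fixes can take up to 28 days to be fully reflected.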
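If you collect your own real-user measurements, the pass/fail logic for the three metrics can be sketched as below. The thresholds are the ones listed in this section; the nearest-rank percentile method, the treatment of values exactly on a boundary as passing, and the sample values are assumptions for illustration and may not match how Google aggregates field data.

```python
import math

# (good threshold, poor threshold) per metric, as listed in this section.
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def p75(samples: list[float]) -> float:
    """75th percentile using the nearest-rank method (an assumption)."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def classify(metric: str, samples: list[float]) -> str:
    """Classify a metric as good, needs improvement, or poor at the 75th percentile."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


# Hypothetical LCP samples in seconds from eight page loads.
lcp_samples = [1.8, 2.0, 2.2, 2.4, 3.0, 3.5, 4.5, 5.0]
print(p75(lcp_samples), classify("lcp_s", lcp_samples))
```

Note that a page where most loads are fast can still fail: in the sample above five of eight loads are under 3.0 seconds, yet the 75th percentile lands in the "needs improvement" band.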

Mobile-First Indexing and Responsive Design

Google uses the mobile version of every page for indexing and ranking. Desktop-only content that does not appear on mobile is effectively invisible to Google.

Responsive design verification. All content, structured data, and metadata must be identical on mobile and desktop versions. Test using Chrome DevTools (Ctrl+Shift+M) across iPhone 14, Pixel 7, and iPad viewports. Common failures: hidden content behind "read more" expandables (Google may devalue this), missing structured data on mobile templates, and images that do not scale.

Touch target sizing. Interactive elements (buttons, links, form fields) should be at least 48x48 pixels with 8px spacing between them. Search Console no longer flags this because its Mobile Usability report was retired in December 2023, so test using a Lighthouse mobile audit.

Font size and readability. Body text must be at least 16px. Line height should be 1.5-1.6x the font size. No horizontal scrolling at any viewport width. Text should be readable without zooming on a 375px-wide screen.

Structured Data, Hreflang, and Advanced Technical Elements

JSON-LD schema implementation. Implement schema relevant to your content type: Article schema (blog posts, guides), Product schema (e-commerce), LocalBusiness schema (service businesses), FAQ schema (pages with Q&A sections), HowTo schema (tutorial content), and BreadcrumbList schema (all pages). Note that since 2023 Google shows FAQ rich results only for well-known government and health sites and no longer shows HowTo rich results, so those two types rarely earn rich results today. Validate with Google's Rich Results Test. Pages with valid schema earn rich results in 40% more SERPs than pages without.
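Generating JSON-LD from structured fields rather than hand-writing it avoids most syntax errors. The sketch below builds a minimal Article block with the standard library; the function name and field values are hypothetical, and the property set is a small illustrative subset, not a complete list of what Google recommends.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str, url: str) -> str:
    """Return a <script> tag containing minimal Article JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "mainEntityOfPage": url,
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical article details.
tag = article_jsonld(
    headline="Technical SEO Checklist",
    author="Jane Doe",
    date_published="2026-01-15",
    url="https://example.com/technical-seo-checklist",
)
print(tag)
```

Paste the generated tag into the Rich Results Test before rolling it out to a template.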

Hreflang for multilingual sites. If your site serves content in multiple languages or targets multiple countries, implement hreflang tags. Every page must reference all language versions including itself. Hreflang annotations go in the <head>, HTTP headers, or sitemap. Mismatched hreflang pairs (Page A references Page B, but Page B does not reference Page A) cause Google to ignore both annotations. Audit using Ahrefs or Screaming Frog hreflang reports.
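The reciprocity rule is easy to check mechanically once you have each page's hreflang annotations. The sketch below assumes a hypothetical dictionary mapping each page URL to its annotations (language code to target URL), such as you might build from a crawl export.

```python
def hreflang_problems(annotations: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target does not link back to source.

    A pair of (page, page) means the page is missing its self-reference.
    """
    problems = []
    for page, alternates in annotations.items():
        if page not in alternates.values():
            problems.append((page, page))
        for target in alternates.values():
            if target == page:
                continue
            if page not in annotations.get(target, {}).values():
                problems.append((page, target))
    return problems


# Hypothetical site: the French page forgets to reference the English page.
site = {
    "https://example.com/en/": {
        "en": "https://example.com/en/",
        "fr": "https://example.com/fr/",
    },
    "https://example.com/fr/": {
        "fr": "https://example.com/fr/",
    },
}
print(hreflang_problems(site))
```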

HTTPS and security. Every page must serve over HTTPS with a valid SSL certificate. Mixed content warnings (HTTP resources on HTTPS pages) break the security indicator and can prevent proper crawling. Check certificate expiration dates and set renewal reminders 30 days before expiry. Redirect all HTTP URLs to HTTPS with 301 redirects.
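The 30-day renewal reminder can be automated by reading the certificate's expiry date. The sketch below only does the date arithmetic: it assumes you already have the expiry string in the format Python's ssl module reports in the notAfter field of a peer certificate, and the helper names and sample dates are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and a certificate expiry string like 'Jan  5 09:34:43 2018 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True once the certificate is within warn_days of expiring."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical certificate expiry and check date.
expiry = "Jan  5 09:34:43 2018 GMT"
check_date = datetime(2017, 12, 6, 9, 34, 43, tzinfo=timezone.utc)
print(days_until_expiry(expiry, check_date), needs_renewal(expiry, check_date))
```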

Redirect management. Avoid redirect chains longer than 2 hops. Each hop adds 100-500ms of load time and leaks 10-15% of link equity. Audit redirect chains quarterly. When consolidating pages, redirect directly from old URL to final destination. Never redirect to a page that itself redirects elsewhere.
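Given a redirect map exported from your server config or a crawl, finding and flattening chains is a short loop. The sketch below assumes a hypothetical dictionary of source path to target path; it raises an error on redirect loops rather than trying to repair them.

```python
def resolve_chain(redirects: dict[str, str], start: str) -> list[str]:
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError(f"redirect loop at {target}")
        chain.append(target)
    return chain


def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Rewrite the map so every source points directly at its final destination."""
    return {source: resolve_chain(redirects, source)[-1] for source in redirects}


# Hypothetical redirect map containing a 3-hop chain.
redirect_map = {
    "/old": "/older",
    "/older": "/interim",
    "/interim": "/new",
}
chain = resolve_chain(redirect_map, "/old")
print(len(chain) - 1, "hops:", " -> ".join(chain))
print(flatten(redirect_map))
```

After flattening, every entry is a single hop, which is the state the quarterly audit should confirm.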

BlazeHive generates every page with proper canonical tags, Article and FAQ schema (JSON-LD), optimized heading hierarchy, and structured content that renders correctly on mobile. What it does not do is audit your existing site infrastructure. For that, pair BlazeHive content generation with quarterly technical audits using Screaming Frog ($259/year) or Ahrefs Site Audit.

Common Mistakes

Blocking JavaScript/CSS in robots.txt. If Googlebot cannot render your page, it cannot evaluate content quality or layout. This was common in 2015 and still appears on legacy sites. Remove any disallow rules for CSS and JS files.
No sitemap submitted to Search Console. 23% of sites in a 2025 audit study had no sitemap submitted to Google. Without it, new pages rely entirely on internal link discovery, adding 2-4 weeks to indexing timelines.
Redirect chains of 3+ hops. Every redirect chain bleeds load time and link equity. A 4-hop chain loses 30-50% of the original page's authority. Audit and consolidate to single-hop 301 redirects.
Mixed HTTP/HTTPS internal links. Linking internally to HTTP versions when HTTPS exists creates unnecessary redirects, wastes crawl budget, and adds latency. Run a crawl and update all internal links to HTTPS directly.
Ignoring Core Web Vitals field data. Lab data (Lighthouse, PageSpeed Insights) shows potential issues. Field data (Chrome UX Report via Search Console) shows actual user experience. A page can pass lab tests but fail field data due to real-world network conditions. Monitor both.

Advanced Tips

Run the robots.txt checker monthly to verify no accidental blocks exist for critical pages or resources. A single misconfigured line can deindex an entire directory.
Use the sitemap checker to validate your XML sitemap returns 200 status and contains only indexable canonical URLs. Remove any URLs returning 3xx, 4xx, or 5xx from the sitemap.
Verify every page's canonical tag after CMS updates or theme changes that may reset canonical configurations. Use Google Search Console's URL Inspection tool as a secondary check.
Monitor HTTP status codes across your site weekly using the HTTP status checker. Catch 5xx errors before they trigger Googlebot crawl rate throttling.
Implement preloading for LCP elements (<link rel="preload" as="image" href="hero.webp">) and defer non-critical JavaScript. These two changes alone fix 60% of LCP failures.

A technical SEO audit is not a one-time task. Run this checklist quarterly, after every migration or redesign, and whenever Search Console surfaces new errors. For consistent content production that meets technical requirements from day one, use BlazeHive's automated SEO content generation so you can focus audit time on site-wide infrastructure rather than per-page technical elements.

Frequently Asked Questions

What is a technical SEO checklist?

A technical SEO checklist is a structured list of infrastructure elements that must be correctly configured for search engines to crawl, render, index, and rank your pages. It covers server configuration (HTTPS, status codes, response times), crawlability (robots.txt, sitemaps, internal links), indexability (canonical tags, meta robots, thin content management), performance (Core Web Vitals, page speed, mobile rendering), and advanced elements (structured data, hreflang, redirect management). Unlike on-page SEO (which focuses on content optimization), technical SEO ensures the underlying infrastructure works correctly. A site with perfect content but broken technical foundations will not rank. The checklist should contain 25-35 items minimum, audited quarterly for existing sites and checked immediately after any infrastructure change. Most technical issues are invisible to regular visitors but directly impact how search engines perceive your site.

How often should I run a technical SEO audit?

Run a comprehensive technical audit quarterly (every 3 months) for stable sites. Run one immediately after any CMS migration, hosting change, domain change, redesign, or major content restructuring. Between quarterly audits, monitor Google Search Console weekly for new crawl errors, indexing issues, or Core Web Vitals regressions. Set up automated monitoring for critical elements: SSL certificate expiry, sitemap accessibility, robots.txt changes, and server uptime. Sites publishing over 50 pages monthly should increase audit frequency to monthly because new content introduces new potential technical issues. Small sites (under 100 pages) with stable infrastructure can extend to biannual audits. The cost of a missed technical issue (deindexed pages, crawl budget waste, ranking loss) always exceeds the cost of regular auditing. Budget 4-8 hours per quarterly audit for sites under 10,000 pages.

What are the Core Web Vitals thresholds for 2026?

The three Core Web Vitals thresholds for "good" classification in 2026 are: Largest Contentful Paint (LCP) under 2.5 seconds, Interaction to Next Paint (INP) under 200 milliseconds, and Cumulative Layout Shift (CLS) under 0.1. These are measured at the 75th percentile of real user visits over a 28-day rolling window. "Needs improvement" ranges are: LCP 2.5-4.0s, INP 200-500ms, CLS 0.1-0.25. "Poor" classification applies above those thresholds. Google uses field data from the Chrome User Experience Report (CrUX), not lab data from Lighthouse. A page can score 100 in Lighthouse but still fail field metrics due to real-world network conditions, device capabilities, and geographic latency. Passing all three metrics earns the "page experience" ranking signal boost, which functions as a tie-breaker between pages of similar content quality and authority.

What tools do I need for a technical SEO audit?

A complete technical SEO audit requires four tool categories. Crawling: Screaming Frog ($259/year, industry standard for sites up to 500K pages), Sitebulb ($35-$70/month, visual interface), or Ahrefs Site Audit (included in $99-$999/month plans). Performance: Google PageSpeed Insights (free, lab and field data), the Core Web Vitals report in Search Console (free, field data), and WebPageTest (free, detailed waterfall analysis). Indexing: Google Search Console (free, mandatory for all sites), Bing Webmaster Tools (free). Validation: Google's Rich Results Test (schema validation), Lighthouse (mobile audit), and the robots.txt report in Search Console. Google retired the standalone Mobile-Friendly Test and robots.txt tester in 2023. Total minimum cost: $0 using only free Google tools. Recommended professional setup: $259/year for Screaming Frog plus $99/month for Ahrefs, totaling approximately $1,450/year. This covers 95% of technical audit needs for sites under 100,000 pages.

How do I fix crawlability issues?

Start with Google Search Console's Crawl Stats report (Settings > Crawl Stats). Identify pages returning 4xx or 5xx errors and fix them: update internal links pointing to removed pages, implement 301 redirects for changed URLs, and fix server errors causing 5xx responses. Check robots.txt for overly broad disallow rules blocking important content. Submit an XML sitemap containing all indexable pages. Add internal links to orphan pages (pages with zero incoming internal links). Remove redirect chains longer than 2 hops by redirecting directly to the final destination. For large sites (over 10,000 pages), prioritize crawl budget by noindexing thin pages (tag archives, empty categories, parameter variations) so Googlebot spends time on valuable content. Monitor crawl rate over 30 days after changes. Healthy sites see Googlebot accessing 50-80% of pages monthly. Below 30% indicates crawlability problems requiring investigation.

What is the difference between crawling and indexing?

Crawling is Google discovering and downloading your page content. Indexing is Google adding that page to its searchable database. A page can be crawled but not indexed if Google determines it lacks sufficient quality, duplicates existing indexed content, or contains a noindex directive. Think of crawling as Google reading your page and indexing as Google deciding to remember it. Common reasons a crawled page is not indexed: thin content (under 300 unique words), duplicate content without proper canonical pointing to a preferred version, noindex meta tag, low-quality signals (no inbound links, no organic demand for the topic), or server-side rendering issues preventing content visibility. Monitor the Index Coverage report in Search Console to identify pages Google crawled but chose not to index. These pages need content improvements, canonical consolidation, or removal from your site.

How does robots.txt affect SEO?

Robots.txt controls which URL paths crawlers can access on your server. It does not control indexing. A page blocked by robots.txt can still appear in Google's index if external links point to it (Google indexes the URL without page content). To prevent indexing, use meta robots noindex instead, and leave the page crawlable so Google can see the directive. Robots.txt best practices: allow all CSS, JavaScript, and image files so Google can render pages correctly. Block low-value crawl paths (internal search results, faceted navigation with parameters, admin directories). Do not block your XML sitemap. Include a sitemap reference in robots.txt (Sitemap: https://domain.com/sitemap.xml). Test every robots.txt change against a list of critical URLs before deploying, then confirm Google fetched the new file in Search Console's robots.txt report. A misconfigured robots.txt blocking your primary content directory can deindex hundreds of pages within one crawl cycle, typically 24-72 hours for active sites.

What is a canonical tag and when do I need one?

A canonical tag (<link rel="canonical" href="...">) tells search engines which URL is the preferred version when duplicate or similar content exists at multiple URLs. Every page needs a self-referencing canonical tag pointing to its own URL, even if no duplicates exist. This prevents issues from parameter additions, tracking codes, or session IDs creating alternate URLs that dilute ranking signals. Implement canonical tags when the same content is accessible at HTTP and HTTPS versions, www and non-www variations, URLs with and without trailing slashes, pages with sort/filter parameters, or syndicated copies published on multiple domains. Paginated content is the exception: each page in a series should carry its own self-referencing canonical rather than pointing to page 1. Google treats canonical tags as a strong hint, not a directive. Conflicting signals (sitemap includes non-canonical URL, internal links point to non-canonical, redirect points elsewhere) can cause Google to ignore your preferred canonical. Audit monthly using crawl tools.

How do I check if my pages are indexed?

Three methods, from fastest to most comprehensive. First, site-search: type site:yourdomain.com/specific-page-url in Google. If the page appears, it is indexed. If nothing shows, it is probably not indexed, but the site: operator is not exhaustive, so confirm with the next method. Second, Google Search Console URL Inspection tool: paste any URL and Google shows current index status, last crawl date, any issues preventing indexing, and how Google renders the page. Third, Search Console's Page indexing report (formerly Index Coverage): it splits your entire site into indexed and not-indexed pages and lists the reason for each group of not-indexed URLs (crawled but not indexed, noindexed, redirected, 404). Check the not-indexed reasons regularly for pages you expect to be indexed but Google chose to skip. Common reasons: "Crawled - currently not indexed" (quality too low), "Duplicate without user-selected canonical" (Google chose a different canonical), and "Excluded by noindex tag" (intentional or accidental noindex directive).

What structured data should I implement for SEO?

Implement structured data based on your content type. Every site needs: Article or BlogPosting schema (publication date, author, headline), BreadcrumbList schema (site navigation hierarchy), and Organization schema (brand name, logo, social profiles). Content-specific additions: FAQ schema for pages with question-answer sections (since 2023 Google shows FAQ rich results only for well-known government and health sites), HowTo schema for tutorial/process content (Google no longer shows HowTo rich results), Product schema for e-commerce (earns price/review rich results), LocalBusiness schema for physical locations (supports local search visibility), Event schema for dates/venues (earns event listings), and Video schema for pages with video embeds (earns video carousels). Use JSON-LD format exclusively. Validate every implementation with Google's Rich Results Test. Monitor the Enhancements section in Search Console for schema errors. Pages with valid structured data earn rich results in 40% more queries, directly increasing CTR by 20-30% for eligible result types.

How does site speed affect search rankings?

Site speed impacts rankings through two mechanisms. First, directly: Google's page experience signal uses Core Web Vitals (LCP, INP, CLS) as ranking factors. Pages passing all three thresholds receive a tie-breaker ranking boost over slower competitors with similar content quality. The boost is modest but measurable: studies show 1-3 position improvements for pages moving from "poor" to "good" CWV classification. Second, indirectly: slow pages have higher bounce rates and lower engagement. Pages loading in 5+ seconds see 38% bounce rates versus 9% for pages loading under 2 seconds. These behavioral signals affect rankings through user satisfaction metrics. Priority optimization targets: reduce server response time (TTFB under 800ms), compress and properly size images (saves 40-60% load time for image-heavy pages), defer non-critical JavaScript (reduces main-thread blocking), implement browser caching (saves full page loads on return visits), and use a CDN (reduces latency by 30-50% for geographically distributed users).

What is mobile-first indexing and how do I prepare?

Mobile-first indexing means Google uses the mobile version of your website for indexing and ranking. This has been the default for all websites since 2023. If your mobile version has less content, fewer internal links, or missing structured data compared to desktop, Google only sees the mobile version. Preparation checklist: verify all content appears on mobile (no desktop-only sections hidden by CSS), confirm structured data is present in mobile HTML, ensure images load with proper dimensions on mobile, test that all internal links function on mobile, verify meta robots tags match between mobile and desktop, and confirm mobile page speed passes Core Web Vitals. Common failure: responsive sites that hide content behind "show more" expandables on mobile. Google can see this content but may give it reduced weight. Keep critical content visible by default on mobile. Test using a Lighthouse mobile audit and Chrome DevTools device emulation. Google retired the Mobile-Friendly Test and Search Console's Mobile Usability report in December 2023, so these checks now need to be run manually.

How do I handle duplicate content technically?

Duplicate content requires one of four technical solutions depending on the scenario. For exact duplicates at different URLs (www vs non-www, HTTP vs HTTPS): implement 301 redirects from non-preferred to preferred URLs plus self-referencing canonical tags. For near-duplicate pages (product variations, location pages with minimal differences): use canonical tags pointing to the primary version, or add sufficient unique content to differentiate. For intentional duplicates (syndicated content, print versions): use cross-domain canonical tags pointing to the original source. For paginated content (page 1, page 2, page 3 of listings): use self-referencing canonical on each paginated page and implement proper pagination markup. Never use noindex to handle duplicates unless you want the page completely excluded from search. Canonical tags preserve the page's ability to rank while consolidating signals to the preferred URL. Audit for duplicates using Screaming Frog's "Duplicate" detection or Sitemaps comparison in Search Console.
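For the exact-duplicate case, it helps to normalize every crawled URL to its preferred form and group the variants that collapse to the same address. The sketch below is a minimal illustration: the preferred host, the list of tracking parameters, and the sample URLs are all assumptions you would replace with your own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters to strip; extend for your own analytics setup.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def preferred_url(url: str, host: str = "www.example.com") -> str:
    """Normalize to HTTPS, the preferred host, and a query without tracking params."""
    parts = urlsplit(url)
    hostname = (parts.hostname or "").lower()
    if hostname in {host, host.removeprefix("www.")}:
        hostname = host
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(("https", hostname, parts.path or "/", urlencode(query), ""))


def duplicate_groups(urls: list[str], host: str = "www.example.com") -> dict[str, list[str]]:
    """Map each preferred URL to the crawled variants that collapse into it."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(preferred_url(url, host), []).append(url)
    return {target: variants for target, variants in groups.items() if len(variants) > 1}


# Hypothetical crawl output.
crawled = [
    "http://example.com/shoes?utm_source=mail&color=red",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/boots",
]
print(duplicate_groups(crawled))
```

Each group's key is the redirect and canonical target; every other variant in the group should 301 to it.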

What is crawl budget and does it matter for my site?

Crawl budget is the number of pages Googlebot will crawl on your site within a given timeframe. It matters primarily for large sites (over 10,000 pages). Google determines crawl budget based on two factors: crawl rate limit (how fast Googlebot can crawl without overloading your server) and crawl demand (how important Google considers your site's content). For small sites under 1,000 pages, crawl budget is rarely a concern because Google can crawl the entire site in one session. For large sites, optimize crawl budget by: removing or noindexing low-value pages (thin tag pages, empty categories), fixing redirect chains, ensuring server response times stay under 500ms, submitting XML sitemaps highlighting priority pages, and blocking parameter-heavy URLs in robots.txt. Monitor crawl stats in Search Console. If Googlebot crawls less than 50% of your site monthly and you have important uncrawled pages, you have a crawl budget problem requiring attention.

How do I implement hreflang correctly?

Hreflang tags tell Google which language and geographic version of a page to serve to users in different locations. Implementation requires three things done correctly simultaneously. First, every page must reference ALL language versions including itself. If you have English, French, and German versions, each page needs three hreflang annotations (one for each version). Second, annotations must be reciprocal. If Page A (English) references Page B (French), then Page B must also reference Page A. Non-reciprocal annotations get ignored entirely. Third, use correct language and country codes: hreflang="en-us" for US English, hreflang="en-gb" for UK English, hreflang="fr" for French (all countries). Always include an x-default version for users not matching any specified language. Place annotations in the HTML head, HTTP headers, or XML sitemap. Use only one method consistently. Audit with Screaming Frog or Ahrefs hreflang validator. Broken hreflang causes Google to show wrong language versions in local SERPs, directly losing traffic.

What causes pages to be crawled but not indexed?

When Google Search Console shows "Crawled - currently not indexed," Google visited the page but decided not to add it to the index. This happens for five main reasons. First, content quality: the page has insufficient unique content (under 300 words), or the content is not substantially different from pages already indexed on your site or elsewhere. Second, low authority signals: the page has no inbound links (internal or external), suggesting low importance. Third, thin search demand: Google's systems determine that no user would benefit from this page appearing in search results. Fourth, site-level quality issues: if your domain has many low-quality pages, Google may apply quality filters that affect indexing decisions across the entire site. Fifth, technical rendering issues: JavaScript-dependent content that does not render during Google's crawl gets treated as empty. Fix by: adding substantial unique content (500+ words), building internal links from higher-authority pages, consolidating similar thin pages into one comprehensive page, and ensuring content renders without JavaScript. Resubmit via URL Inspection after fixes.
