My Website Didn't Scrape Properly

Scrape Issues: Common Causes & Fixes

  • Pages missing: dynamic JS content or pages behind a login. Fix: add pages manually.
  • Thin content scraped: the website has vague copy that is not useful for AI. Fix: add manual Q&As.
  • Irrelevant pages imported: cookie notices, T&Cs, and nav text were imported. Fix: delete them from the KB.
  • Scrape completed: review the KB tab and check which pages were imported.
  • Re-scrape option: after updating your site, run the scrape again.

Website scraping doesn't always capture everything you expect. Here are the most common scraping issues and exactly how to fix each one.

Issue 1: Important pages didn't get imported

Why it happens:

  • The page loads its content dynamically via JavaScript (common on modern sites built as single-page applications, or SPAs)
  • The page is behind a login, paywall, or form submission
  • The page URL doesn't appear in any navigation link from your homepage
  • The page is blocked by your website's robots.txt file

Fix: Manually add the specific page URL in the Knowledge Base tab under "Add Website" → "Add Specific Page". You can add individual URLs one at a time.
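
If you want to find out which of the causes above applies to a missing page, a quick check outside the product can help. The following is a minimal sketch in Python, not part of the scraper itself: the function names are hypothetical, and it assumes you have already saved your site's robots.txt and the raw homepage HTML as strings (for example via "View Source" in your browser). It tests two of the causes: whether robots.txt blocks the page, and whether the page is linked from the homepage's raw HTML. A link that only appears after JavaScript runs will not be found, which is itself a useful hint.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, page_url, user_agent="*"):
    """Return True if the given robots.txt text disallows page_url.

    Assumption: the scraper honours the generic "*" rules; the user agent
    it actually sends is not documented here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, page_url)


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_from(html, base_url, page_url):
    """Return True if page_url appears as a link in the raw (pre-JavaScript) HTML."""
    collector = _LinkCollector()
    collector.feed(html)
    target = urlparse(page_url)
    target_key = (target.netloc, target.path.rstrip("/"))
    for href in collector.hrefs:
        # Resolve relative links such as "/pricing/" against the homepage URL.
        candidate = urlparse(urljoin(base_url, href))
        if (candidate.netloc, candidate.path.rstrip("/")) == target_key:
            return True
    return False
```

If `blocked_by_robots` returns True, or `linked_from` returns False, the page is a likely candidate for adding manually as described in the fix above.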

Issue 2: Content was scraped but is too vague

Why it happens: Your website has marketing copy ("We provide excellent service") rather than specific, factual content that the AI can use to answer questions.

Fix: This is a content quality problem, not a scraping problem. Two options:

  1. Add manual Q&A entries for each question that needs a specific answer
  2. Improve your website copy to include specific information, then re-scrape

Issue 3: Irrelevant pages got imported

Why it happens: The scraper picks up all public pages, including cookie policy pages, privacy policies, navigation text, and legal disclaimers.

Fix: In the Knowledge Base tab, click on each scraped source and review what was imported. Delete any entries that contain irrelevant boilerplate. You can identify these quickly by looking at the preview text.
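
If you have many scraped sources, the preview-text check can be roughed out as a keyword filter. This is a minimal sketch in Python under stated assumptions: the Knowledge Base is not known to offer an export, so the entries here are (title, preview) pairs you have copied out by hand, and the hint phrases and function names are illustrative rather than anything the product uses.

```python
# Hypothetical hint phrases; adjust them to match your own site's boilerplate.
BOILERPLATE_HINTS = (
    "cookie",
    "privacy policy",
    "terms and conditions",
    "terms of service",
    "all rights reserved",
    "disclaimer",
)


def looks_like_boilerplate(text, hints=BOILERPLATE_HINTS):
    """Return True if the text contains any of the hint phrases (case-insensitive)."""
    lowered = text.lower()
    return any(hint in lowered for hint in hints)


def flag_entries(entries):
    """Return the titles of entries whose title or preview looks like boilerplate.

    `entries` is an iterable of (title, preview) pairs copied by hand
    from the Knowledge Base tab (an assumption, not a product export).
    """
    return [
        title
        for title, preview in entries
        if looks_like_boilerplate(f"{title} {preview}")
    ]
```

Treat the result as a list of entries to look at first, not a list to delete blindly: a simple keyword match will also flag a legitimate page that happens to mention one of the phrases.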

Issue 4: Scrape seems to have failed or timed out

Signs: No pages were imported, the import shows 0 pages, or the scrape hangs indefinitely.

Fix:

  1. Check that your URL includes https://, since a missing protocol is a common cause of scrape failure
  2. Try scraping again, since temporary connectivity issues can cause one-off failures
  3. Try adding individual page URLs manually if the full domain scrape continues to fail
  4. Check if your website is accessible, since maintenance mode or password protection sometimes blocks scraping
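
Steps 1 and 4 above can be checked from your own machine before you retry. The following is a minimal sketch in Python using only the standard library. The function names and the User-Agent string are hypothetical, and how it interprets status codes (401/403 as access protection, 503 as possible maintenance mode) is a rule of thumb, not a description of how the scraper itself decides.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def normalize_url(raw):
    """Add https:// when the protocol is missing and validate the result."""
    url = raw.strip()
    if not url:
        raise ValueError("URL is empty")
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a usable web URL: {raw!r}")
    return url


def check_access(url, timeout=10.0):
    """Fetch the URL once and describe what an anonymous visitor would see."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "scrape-precheck/0.1"}  # hypothetical name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return f"blocked (HTTP {err.code}): login or password protection?"
        if err.code == 503:
            return "unavailable (HTTP 503): maintenance mode?"
        return f"error (HTTP {err.code})"
    except urllib.error.URLError as err:
        return f"unreachable ({err.reason})"
    except OSError as err:
        # Covers timeouts raised while reading the response.
        return f"unreachable ({err})"
```

For example, `check_access(normalize_url("example.com"))` first turns the input into `https://example.com` and then reports whether the site responds without a login. If it does not, a full-domain scrape is unlikely to succeed either.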

Tip: After any scrape, always review the KB

Spend 5 minutes looking at what was imported. This is the fastest way to spot issues (missing pages, irrelevant content, or thin copy) before they affect the agent's answers.