Website scraping doesn't always capture everything you expect. Here are the most common scraping issues and exactly how to fix each one.
Issue 1: Important pages didn't get imported
Why it happens:
- The page loads content dynamically via JavaScript (common in modern websites and SPA frameworks)
- The page is behind a login, paywall, or form submission
- The page URL doesn't appear in any navigation link from your homepage
- The page is blocked by your website's robots.txt file
Fix: Manually add the specific page URL in the Knowledge Base tab under "Add Website" > "Add Specific Page". You can add individual URLs one at a time.
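Before adding pages by hand, it can help to find out which of the causes above applies. The following is a minimal sketch, not part of the product: it checks whether a robots.txt file blocks a URL and whether a page is linked from the homepage HTML. The function names and example.com URLs are illustrative assumptions, and you would supply the robots.txt and homepage content yourself. Because it reads raw HTML without running JavaScript, links that are rendered dynamically will also show up as missing, which is itself a useful signal.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def is_linked_from(html, base_url, page_url):
    """Return True if page_url appears as a link in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return page_url in collector.links


def is_allowed_by_robots(robots_txt, page_url, user_agent="*"):
    """Return True if the robots.txt content permits fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Illustrative inputs (hypothetical site content).
robots_txt = "User-agent: *\nDisallow: /private/\n"
homepage_html = '<a href="/pricing">Pricing</a> <a href="about.html#team">About</a>'
```

A page that fails either check is a good candidate for "Add Specific Page", with the caveat that a page blocked by robots.txt may need that rule relaxed first.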
Issue 2: Content was scraped but is too vague
Why it happens: Your website has marketing copy ("We provide excellent service") rather than specific, factual content that the AI can use to answer questions.
Fix: This is a content quality problem, not a scraping problem. Two options:
- Add manual Q&A entries for each question that needs a specific answer
- Improve your website copy to include specific information, then re-scrape
Issue 3: Irrelevant pages got imported
Why it happens: The scraper picks up all public pages, including cookie policy pages, privacy policies, navigation text, and legal disclaimers.
Fix: In the Knowledge Base tab, click on each scraped source and review what was imported. Delete any entries that contain irrelevant boilerplate. You can identify these quickly by looking at the preview text.
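If you have many sources to review, a simple keyword pass can suggest which ones to look at first. The sketch below is an assumption-heavy illustration rather than a product feature: it presumes you have each entry's URL and preview text available (for example, copied out by hand), and the hint list and function name are hypothetical. Treat a match as a prompt to check the preview, not as a reason to delete automatically.

```python
# Hypothetical keywords that often appear in boilerplate pages.
BOILERPLATE_HINTS = ("cookie", "privacy", "disclaimer", "terms-of", "legal")


def looks_like_boilerplate(url, preview_text):
    """Return True if the URL or preview text contains a boilerplate hint."""
    haystack = f"{url} {preview_text}".lower()
    return any(hint in haystack for hint in BOILERPLATE_HINTS)


# Illustrative entries as (url, preview text) pairs.
entries = [
    ("https://example.com/cookie-policy", "We use cookies to improve your experience."),
    ("https://example.com/pricing", "Plans start at $10 per month."),
]

# Collect the URLs worth reviewing first.
flagged = [url for url, preview in entries if looks_like_boilerplate(url, preview)]
```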
Issue 4: Scrape seems to have failed or timed out
Signs: No pages were imported, the import shows 0 pages, or the scrape hangs indefinitely.
Fix:
- Check that your URL includes https://. Missing the protocol is a common cause of scrape failure
- Try scraping again. Temporary connectivity issues can cause one-off failures
- Try adding individual page URLs manually if the full domain scrape continues to fail
- Check if your website is accessible. Sometimes maintenance mode or password protection blocks scraping
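The first and last checks above can be run outside the product before you retry. This is a minimal sketch using only the Python standard library; the function names are illustrative assumptions, and the reachability check reflects what a plain HTTP client sees, which may differ from what the scraper sees.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def has_valid_protocol(url):
    """Return True if the URL starts with http:// or https:// and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def ensure_https(url):
    """Prepend https:// when the URL has no protocol at all."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url
    return url


def is_reachable(url, timeout=10):
    """Return True if the URL answers with a 2xx status within the timeout.

    Login walls and maintenance pages often answer with 401, 403, or 503,
    which urlopen raises as an error, so they are reported as unreachable.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        return False
```

For example, `ensure_https("example.com/pricing")` yields a URL with the protocol in place, and `is_reachable` on that result gives a quick signal about whether the site is publicly accessible at all.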