Help

Image Extraction

The Image Extraction tab configures how the system chooses a representative image for each page. That image is used in search results and in agent citations.

The crawler first checks standard metadata (e.g. og:image). If none is present, or if you want to override it, configure Image URL Patterns and/or Image XPaths (under Advanced). These settings are on the Image Extraction tab of Crawler Settings.
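As a rough illustration of the metadata check, the sketch below reads the og:image value from a page's HTML using only the Python standard library. The class and function names (OgImageParser, find_og_image) are hypothetical, and the crawler's actual implementation may differ, for example by also checking other metadata tags.

```python
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only the first og:image is kept.
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image" and attr_map.get("content"):
            self.og_image = attr_map["content"]


def find_og_image(html):
    """Return the og:image URL declared in the page, or None if absent."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image


if __name__ == "__main__":
    page = '<html><head><meta property="og:image" content="https://example.com/a.jpg"></head></html>'
    print(find_og_image(page))
```

A result of None corresponds to the case where the page has no og:image, which is when the patterns, XPaths, or auto-detection described on this page come into play.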


Enable AI Auto-Detection

When enabled, the system automatically picks a representative image for pages that don’t have suitable og:image (or similar) metadata. It runs after each crawl in two steps: first it learns which site-wide images to exclude (such as logos and icons), then it uses heuristics to pick the best content image for each page.

  • On — Use auto-detection when metadata is missing or you want a fallback.
  • Off — Rely only on metadata and any Image URL Patterns or Image XPaths you configure.
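The two steps of auto-detection can be pictured with the sketch below. It is not the system's actual algorithm: the heuristics here (treating an image that appears on more than half of the crawled pages as global, then preferring the largest remaining image) are assumptions chosen for illustration, as are the names learn_global_images and pick_content_image and the max_share and min_side parameters.

```python
from collections import Counter


def learn_global_images(pages, max_share=0.5):
    """Return image URLs that appear on more than max_share of all pages.

    pages maps a page URL to a list of (image_url, width, height) tuples.
    Assumption: images repeated across many pages are logos or icons.
    """
    counts = Counter()
    for images in pages.values():
        # Count each image at most once per page.
        counts.update({url for url, _, _ in images})
    limit = max_share * len(pages)
    return {url for url, seen_on in counts.items() if seen_on > limit}


def pick_content_image(images, excluded, min_side=100):
    """Pick the largest non-excluded image, or None if nothing qualifies.

    Assumption: the biggest image by pixel area is the best content image.
    """
    candidates = [
        (url, width * height)
        for url, width, height in images
        if url not in excluded and min(width, height) >= min_side
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]


if __name__ == "__main__":
    crawl = {
        "/a": [("/logo.png", 120, 120), ("/img/a.jpg", 800, 600)],
        "/b": [("/logo.png", 120, 120), ("/img/b.jpg", 640, 480)],
        "/c": [("/logo.png", 120, 120)],
    }
    excluded = learn_global_images(crawl)
    for page_url, images in crawl.items():
        print(page_url, pick_content_image(images, excluded))
```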

Image URL Patterns

URL substrings that identify product or content images (e.g. /images/products/, /uploads/). The first image whose URL contains any of these patterns is used as the page thumbnail. Pattern matching is more flexible than XPaths because it does not depend on page structure, and it also considers images referenced in CSS background-image styles.

  • Add one pattern per row.
  • Order matters: the first matching image wins.
  • Leave empty if you only use metadata and/or XPaths.
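The sketch below shows one way substring matching of this kind could work, including images referenced in inline background-image styles. It assumes that row order sets pattern priority and that, within one pattern, the first image in document order wins; the crawler's real matching order may differ. The names ImageUrlCollector and pick_by_patterns are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches the URL inside an inline style such as: background-image: url('/x.jpg')
BACKGROUND_URL = re.compile(
    r"background-image\s*:\s*url\(\s*['\"]?([^'\")]+)['\"]?\s*\)"
)


class ImageUrlCollector(HTMLParser):
    """Collects image URLs from <img src> and inline background-image styles."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and attr_map.get("src"):
            self.urls.append(attr_map["src"])
        style = attr_map.get("style") or ""
        self.urls.extend(BACKGROUND_URL.findall(style))


def pick_by_patterns(html, patterns):
    """Return the first image URL containing a pattern, or None.

    Assumption: earlier patterns take priority over later ones.
    """
    collector = ImageUrlCollector()
    collector.feed(html)
    for pattern in patterns:
        for url in collector.urls:
            if pattern in url:
                return url
    return None


if __name__ == "__main__":
    page = (
        '<div style="background-image: url(/uploads/hero.jpg)"></div>'
        '<img src="/static/logo.png">'
        '<img src="/images/products/1.jpg">'
    )
    print(pick_by_patterns(page, ["/images/products/", "/uploads/"]))
```

An empty pattern list returns None, which mirrors leaving the setting empty when you rely on metadata and/or XPaths only.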

Advanced: Image XPaths

XPath expressions used to find image URLs on pages that lack standard og:image (or similar) metadata. They are evaluated in the order you list them. Image URL Patterns are tried before XPaths.

  • Examples:
      • //img[@id='main_image']/@src
      • //div[@class='article-image']/img/@src
  • Use when your pages have a consistent structure but no og:image. Add one XPath per row and reorder as needed.
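To show what evaluating XPaths in listed order means, the sketch below tries each expression in turn and returns the first image URL found. It uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed markup, so the trailing /@attribute step is handled by hand; a real crawler would more likely use a full XPath engine. The function name first_xpath_match is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_xpath_match(xhtml, xpaths):
    """Try each XPath in order and return the first attribute value found.

    Only expressions of the form <element path>/@<attribute> are handled,
    because ElementTree cannot select attributes directly.
    """
    root = ET.fromstring(xhtml)
    for xpath in xpaths:
        element_path, separator, attribute = xpath.rpartition("/@")
        if not separator:
            continue  # Not an attribute-selecting expression; skip it.
        # ElementTree paths must be relative, so "//img" becomes ".//img".
        for element in root.findall("." + element_path):
            value = element.get(attribute)
            if value:
                return value
    return None


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<div class='article-image'><img src='/a.jpg'/></div>"
        "<img id='main_image' src='/m.jpg'/>"
        "</body></html>"
    )
    rows = [
        "//img[@id='main_image']/@src",
        "//div[@class='article-image']/img/@src",
    ]
    print(first_xpath_match(page, rows))
```

Reversing the two rows makes the article image win instead, which is why reordering rows changes the result on pages where more than one expression matches.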

