Crawler Settings Image Extraction

Airgentic Help

The Image Extraction tab configures how the system chooses a representative image for each page. That image is used in search results and in agent citations.

The crawler first checks standard metadata (e.g. og:image). If none is present or you want to override it, you can use Image URL Patterns and/or Image XPaths (under Advanced).
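
As an illustration of the metadata check, here is a minimal sketch of how an og:image value can be read from a page using only the Python standard library. It is not the crawler's actual implementation; the names OgImageParser and find_og_image are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class OgImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only the first og:image is kept.
        if tag != "meta" or self.og_image is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image" and attributes.get("content"):
            self.og_image = attributes["content"]


def find_og_image(html: str) -> Optional[str]:
    """Return the og:image URL declared in the page, or None if there is none."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image
```

When find_og_image returns None, the page has no usable og:image, which is the case the other settings on this tab are meant to cover.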

Enable AI Auto-Detection

When enabled, the system automatically picks a representative image for pages that don’t have suitable og:image (or similar) metadata. Auto-detection runs after each crawl in two steps: it first learns which images to exclude globally (such as logos and icons), then uses heuristics to pick the best content image for each page.

  • On — Use auto-detection when metadata is missing or you want a fallback.
  • Off — Rely only on metadata and any Image URL Patterns or Image XPaths you configure.
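
The two-step idea behind auto-detection can be sketched as follows. This is only an illustration under stated assumptions: the real heuristics are not documented here, so the sketch assumes that an image appearing on more than half of the crawled pages is a global image (logo, icon), and that the largest remaining image by pixel area is the best content image. The Image type, the max_share threshold, and both function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Image:
    url: str
    width: int
    height: int


def learn_global_exclusions(pages: Dict[str, List[Image]], max_share: float = 0.5) -> Set[str]:
    """Step 1 (assumed rule): exclude images seen on more than max_share of all pages."""
    counts: Counter = Counter()
    for images in pages.values():
        # Count each image once per page, however often it repeats on that page.
        counts.update({image.url for image in images})
    limit = max_share * len(pages)
    return {url for url, seen_on in counts.items() if seen_on > limit}


def pick_image(images: List[Image], excluded: Set[str]) -> Optional[str]:
    """Step 2 (assumed heuristic): pick the largest image that is not excluded."""
    candidates = [image for image in images if image.url not in excluded]
    if not candidates:
        return None
    return max(candidates, key=lambda image: image.width * image.height).url
```

A frequency rule like this is one simple way to separate site-wide images from page content; the product may weigh other signals as well.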

Image URL Patterns

URL substrings that identify product or content images (e.g. /images/products/, /uploads/). The first image whose URL contains any of these patterns is used as the page thumbnail. Pattern matching is more flexible than XPaths because it does not depend on page structure, and it also considers images set through background-image styles.

  • Add one pattern per row.
  • Order matters: the first matching image wins.
  • Leave empty if you only use metadata and/or XPaths.
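
The matching rule described above can be sketched as a plain substring test. This is an illustrative sketch, not the product's code: first_matching_image is a hypothetical name, and the list of image URLs is assumed to be already collected in page order, including any URLs taken from background-image styles.

```python
from typing import Iterable, Optional, Sequence


def first_matching_image(image_urls: Iterable[str], patterns: Sequence[str]) -> Optional[str]:
    """Return the first image URL that contains any configured pattern."""
    for url in image_urls:
        # A pattern is a plain substring, not a regular expression.
        if any(pattern in url for pattern in patterns):
            return url
    # No pattern matched, or no patterns are configured.
    return None


# Example: the logo is skipped because its URL contains neither pattern.
thumbnail = first_matching_image(
    ["/static/logo.png", "/images/products/chair.jpg", "/uploads/banner.jpg"],
    ["/images/products/", "/uploads/"],
)
```

With an empty pattern list the function returns None, which corresponds to relying on metadata and/or XPaths only.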

Advanced: Image XPaths

XPath expressions used to find image URLs on pages that lack standard og:image (or similar) metadata. They are evaluated in the order you list them. Image URL Patterns are tried before XPaths.

  • Examples:
      • //img[@id='main_image']/@src
      • //div[@class='article-image']/img/@src
  • Use when your pages have a consistent structure but no og:image. Add one XPath per row and reorder as needed.
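
To show what evaluating XPaths in listed order means, here is a minimal sketch using Python's standard xml.etree.ElementTree. It rests on two assumptions that do not come from the product: the page is well-formed XML (real HTML usually needs an HTML-tolerant parser), and every expression ends in an attribute step such as /@src, which ElementTree cannot select directly, so the sketch splits it off. The names split_attribute and first_xpath_image are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence, Tuple


def split_attribute(xpath: str) -> Tuple[str, str]:
    """Split '//img/@src' into the element path '//img' and the attribute 'src'."""
    path, separator, attribute = xpath.rpartition("/@")
    if not separator or not path:
        raise ValueError(f"expected an expression ending in /@attribute: {xpath!r}")
    return path, attribute


def first_xpath_image(root: ET.Element, xpaths: Sequence[str]) -> Optional[str]:
    """Try each XPath in the listed order and return the first non-empty value."""
    for xpath in xpaths:
        path, attribute = split_attribute(xpath)
        # ElementTree needs a relative path, so '//img' becomes './/img'.
        for element in root.findall("." + path):
            value = element.get(attribute)
            if value:
                return value
    return None


page = ET.fromstring(
    "<html><body>"
    "<div class='article-image'><img src='/uploads/story.jpg'/></div>"
    "</body></html>"
)
image_url = first_xpath_image(
    page,
    ["//img[@id='main_image']/@src", "//div[@class='article-image']/img/@src"],
)
```

In this example the first expression finds nothing, so the second one supplies the image. Note that a predicate such as @class='article-image' matches only when the class attribute is exactly that value.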
