Crawler + AEM: How it Works

Modified on Mon, 9 Dec at 4:09 PM

In addition to the JavaScript deployment (which uses highlight.js), Schema App can generate schema markup with its web crawler. Use the Crawler when you want no impact on page speed and can accept schema markup that updates less frequently.


The setup of the AEM connector is the same for both deployments except for the configuration option you choose: Crawler or JavaScript. In Crawler mode, the Schema App connector does not include the highlight.js script on the page and does not request highlight.js-produced schema markup from Schema App's CDN. The flow of the integration is described below.



The Crawler runs on a schedule, typically once a week, and crawls your site using a combination of the sitemap and links extracted from the pages. If no sitemap is present, it starts on the homepage and follows links to reach the rest of the site.
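
The link-discovery behaviour described above can be sketched as a breadth-first traversal. This is only an illustration, not Schema App's actual crawler: the function name `crawl_order` and the `links_by_page` dictionary (a stand-in for fetching a page and extracting its links) are hypothetical, and the same-host restriction is an assumption.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(homepage, sitemap_urls, links_by_page):
    """Return URLs in the order a breadth-first crawl would visit them.

    links_by_page is a hypothetical stand-in for downloading a page
    and extracting the links it contains.
    """
    site = urlparse(homepage).netloc
    # Seed from the sitemap when one exists, otherwise from the homepage.
    seeds = list(dict.fromkeys(sitemap_urls)) if sitemap_urls else [homepage]
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for href in links_by_page.get(url, []):
            target = urljoin(url, href)  # resolve relative links
            # Assumption: stay on the same host and skip known pages.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

With an empty `sitemap_urls` list the traversal starts from `homepage` alone, which mirrors the fallback described above.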


As it parses each page, the Crawler attempts to match templates created in the Schema App Highlighter. If a template matches, schema markup is produced and sent to Schema App servers, where it is converted to JSON-LD. The JSON-LD is then stored on S3 behind a CloudFront CDN.


The Schema App connector in AEM sends requests to the CDN on a schedule and pulls the schema markup produced by the Crawler. If the schema markup has changed, it is updated in the page and pushed out to the dispatcher.
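
The "update only if changed" step can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the article does not say how the connector detects a change, so the hash comparison, the function name `sync_markup`, the `fetch_from_cdn` callable, and the `stored_hashes` dictionary are all hypothetical.

```python
import hashlib


def sync_markup(page_path, fetch_from_cdn, stored_hashes):
    """Return new JSON-LD for a page if it changed since the last sync.

    fetch_from_cdn is a hypothetical callable that returns the JSON-LD
    string for a page path, or None when the CDN has nothing for it.
    stored_hashes remembers the digest of the markup last applied.
    """
    markup = fetch_from_cdn(page_path)
    if markup is None:
        return None  # no markup has been generated for this page
    digest = hashlib.sha256(markup.encode("utf-8")).hexdigest()
    if stored_hashes.get(page_path) == digest:
        return None  # unchanged, so the page is left alone
    stored_hashes[page_path] = digest
    return markup  # caller would update the page and republish it
```

A scheduled job could call `sync_markup` once per page and republish only the pages for which it returns a value.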

No action is required in the front end to support this feature.

Cache Validation
The site is crawled on a schedule and the results update the CDN; the markup reaches the page via a call from AEM to the CDN. In general, it is not necessary to invalidate the dispatcher cache. Instead, the solution relies on the dispatcher refreshing its cache when content is republished and on the CDN respecting expiration headers.
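
To illustrate what "respecting expiration headers" means in practice, the sketch below checks whether a cached response is still fresh according to a `Cache-Control` `max-age` directive. It is a simplified illustration only: the function name `is_fresh` is hypothetical, the header values actually used by Schema App's CDN are not stated in this article, and other directives such as `s-maxage` or the `Expires` header are ignored here.

```python
import re


def is_fresh(cache_control, age_seconds):
    """Return True while a cached response is younger than its max-age."""
    # Responses marked no-store or no-cache are never served as fresh.
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False
    match = re.search(r"max-age=(\d+)", cache_control)
    if not match:
        return False  # no explicit lifetime, so treat as stale
    return age_seconds < int(match.group(1))
```

Once the lifetime has elapsed, the cache requests the object again and picks up whatever markup the Crawler most recently produced.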
