- Source of truth
- The principle that the rendered page itself — not a side-channel file or self-declared metadata — is the authoritative content an engine should read. It is why the GEF Score measures the page rather than declared signals.
- Structured data (JSON-LD @graph)
- Machine-readable schema.org annotations embedded in the page that declare entities (Organization, Article, DefinedTerm, FAQPage) and their relationships. A connected @graph links nodes by @id so engines resolve them as one entity graph instead of disconnected fragments.
- Server-side rendering (SSR)
- Generating a page's HTML on the server so the full content is present in the initial response, readable by crawlers that do not execute JavaScript. A precondition for Accessibility.
- Canonical URL
- The rel="canonical" declaration of the preferred URL for a piece of content, removing duplicate-content ambiguity for crawlers and consolidating signals on one address.
- E-E-A-T
- Experience, Expertise, Authoritativeness, Trustworthiness: the framework Google's Search Quality Rater Guidelines use to describe source quality, here extended with machine-verifiable author and entity identity so an engine can confirm who is behind a page.
- Entity clarity (sameAs)
- Making a brand's or author's identity unambiguous to engines by linking the on-page entity to authoritative external references via schema.org sameAs, so the engine knows exactly which entity the page is about.
- Answer-first passage (direct answer)
- A complete, self-contained answer to the page's core question placed early — typically within the first ~120 words — so an engine can extract it without parsing the whole document.
- Chunking (chunk-friendly content)
- Structuring content into self-contained passages an engine can retrieve and cite independently. Contested as a lever: Google states it is not required, because its systems comprehend multiple topics on a page, so the effect is treated as engine-dependent and measured rather than assumed.
- Freshness
- Signals that content is current — dateModified, sitemap lastmod — used by engines as a relevance and quality cue when choosing which source to cite.
- llms.txt
- A proposed plain-markdown file at /llms.txt that points machine consumers to clean markdown versions of key pages, paired with llms-full.txt for the full corpus. Consumed in practice by some coding tools and answer-engine retrieval; not required by, and given no special treatment by, Google Search.
- Content negotiation (markdown for agents)
- Serving a clean markdown representation of a page to machine consumers via the HTTP Accept: text/markdown header or .md endpoints, while humans receive HTML. A standards-based practice (RFC 9110), not cloaking, since it is the same content in a different format.
- AI crawlers (user agents)
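- The bots generative engines use to fetch pages, such as GPTBot (OpenAI), ClaudeBot (Anthropic) and PerplexityBot (Perplexity). Google-Extended works differently: it is a robots.txt product token rather than a separate crawler, letting sites control whether content fetched by Google's existing crawlers may be used for its Gemini models, and it does not affect inclusion in Google Search. Whether an engine's agent is allowed or blocked in robots.txt largely determines whether that engine can fetch your content, and therefore whether it can cite it at all.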
- Retrieval-augmented generation (RAG)
- The pattern where an engine retrieves source documents and grounds its generated answer in them, citing the sources. It is the mechanism by which a generative-engine-friendly page turns into an actual citation.
- Academic Repository Presence (ARP)
- An exploratory off-page variable measuring whether a brand or author has presence in academic repositories such as OSF or arXiv. Studied as a possible authority signal, it is not part of the core GEF Score.
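For the Structured data (JSON-LD @graph) and Entity clarity (sameAs) entries, a minimal Python sketch that builds one connected @graph: an Organization, a Person and an Article that point at each other by @id, with sameAs links on the Organization. All names, URLs and the Wikidata ID are hypothetical placeholders, and `build_graph`, `unresolved_refs` and `to_script_tag` are illustrative helpers, not part of any library.

```python
import json

# Hypothetical site and entity values; substitute your own.
SITE = "https://example.com"
ORG_ID = f"{SITE}/#organization"
AUTHOR_ID = f"{SITE}/#author"
ARTICLE_ID = f"{SITE}/guide/#article"


def build_graph() -> dict:
    """Return a JSON-LD document whose nodes reference each other by @id."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": ORG_ID,
                "name": "Example Co",
                "url": SITE,
                # sameAs points at external profiles of the same entity
                # (placeholder URLs, not real records).
                "sameAs": [
                    "https://www.wikidata.org/wiki/Q00000000",
                    "https://www.linkedin.com/company/example-co",
                ],
            },
            {
                "@type": "Person",
                "@id": AUTHOR_ID,
                "name": "Jane Doe",
                "worksFor": {"@id": ORG_ID},
            },
            {
                "@type": "Article",
                "@id": ARTICLE_ID,
                "headline": "Example guide",
                "author": {"@id": AUTHOR_ID},
                "publisher": {"@id": ORG_ID},
            },
        ],
    }


def unresolved_refs(doc: dict) -> set:
    """Return top-level {"@id": ...} references that match no node in the @graph."""
    defined = {node["@id"] for node in doc["@graph"]}
    referenced = {
        value["@id"]
        for node in doc["@graph"]
        for value in node.values()
        if isinstance(value, dict) and "@id" in value
    }
    return referenced - defined


def to_script_tag(doc: dict) -> str:
    """Serialize the graph as the <script> element embedded in the page."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


if __name__ == "__main__":
    graph = build_graph()
    assert not unresolved_refs(graph), "every @id reference should resolve"
    print(to_script_tag(graph))
```

An empty result from `unresolved_refs` is what "connected" means in this sketch: every reference lands on a node in the same graph, so the Article, its author and its publisher resolve as one entity graph. The check only inspects top-level property values; arrays of references or nested objects would need a recursive walk.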
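For the Canonical URL entry, a minimal sketch using only Python's standard-library `html.parser` to read the rel="canonical" declaration out of a page's HTML. `CanonicalFinder` and `find_canonical` are hypothetical names, and treating conflicting declarations as "no usable canonical" is an assumption of this sketch, chosen to surface the ambiguity rather than guess. Relative hrefs are returned as written, not resolved against the page URL.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link> whose rel tokens include "canonical"."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <link ... /> tags are routed here by handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html: str) -> Optional[str]:
    """Return the single canonical URL, or None if absent or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    # Assumption: differing canonical declarations are treated as unusable.
    if len(set(finder.canonicals)) == 1:
        return finder.canonicals[0]
    return None
```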
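For the Freshness entry, a sketch that emits sitemap `<url>` entries with a `lastmod` value using Python's `xml.etree.ElementTree`. The namespace is the standard sitemaps.org 0.9 schema; `build_sitemap` is an illustrative name, and the page URL and date in the usage are hypothetical. The sketch assumes the caller passes each page's real last-modified date, the same value the page publishes as dateModified.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict) -> str:
    """Render {url: last_modified_date} as a sitemap <urlset> string."""
    # Serialize the sitemap namespace as the default xmlns, with no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, modified in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Date-only W3C Datetime form (YYYY-MM-DD), accepted by the sitemap protocol.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap({"https://example.com/glossary": date(2024, 5, 1)}))
```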
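For the llms.txt entry, a sketch that renders the file in the layout the llms.txt proposal describes: an H1 with the site or project name, a blockquote summary, then H2 sections listing markdown links with optional notes. `Link` and `render_llms_txt` are hypothetical names, and the name, summary and URLs in the usage are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional description shown after the link


def render_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render {section heading: [Link, ...]} as llms.txt markdown."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Co",
        "Glossary and guides for generative engine friendliness.",
        {
            "Docs": [Link("Glossary", "https://example.com/glossary.md", "Term definitions")],
            "Optional": [Link("Full corpus", "https://example.com/llms-full.txt")],
        },
    ))
```

In the proposal, a section headed `## Optional` marks links a consumer may skip when it needs a shorter context.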
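For the Content negotiation entry, a sketch of the server-side decision: parse the request's Accept header, weigh `text/markdown` against `text/html` by their q-values, and fall back to HTML on ties or when the header is absent. It simplifies RFC 9110 proactive negotiation (it scores only these two types and ignores media-type parameters other than q); `parse_accept`, `quality_for` and `choose_representation` are hypothetical helpers, not a framework API.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality_for(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """Return the q of the most specific range matching media_type (0 if none)."""
    main_type = media_type.split("/")[0]
    best, best_specificity = 0.0, -1
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == f"{main_type}/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best, best_specificity = q, specificity
    return best


def choose_representation(accept: Optional[str]) -> str:
    """Pick markdown only when the client prefers it strictly over HTML."""
    if not accept:
        return "text/html"
    ranges = parse_accept(accept)
    markdown_q = quality_for("text/markdown", ranges)
    html_q = quality_for("text/html", ranges)
    return "text/markdown" if markdown_q > html_q else "text/html"
```

Because the same URL returns different bodies depending on a request header, the response should also carry `Vary: Accept` so shared caches do not hand the markdown body to a browser or the HTML to an agent.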
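For the AI crawlers (user agents) entry, a sketch using Python's standard-library `urllib.robotparser` to check which of the listed crawlers a robots.txt admits. The robots.txt content is a hypothetical example, and `crawler_access` is an illustrative helper. The standard-library parser is simpler than RFC 9309 (it matches user-agent tokens by case-insensitive substring and applies the first matching rule rather than the longest match), so treat the result as a quick audit, not a guarantee of how each engine reads the file.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot, admits ClaudeBot and everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Allow: /
"""

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def crawler_access(robots_txt: str, url: str, agents: List[str]) -> Dict[str, bool]:
    """Map each user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text; no network fetch
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    print(crawler_access(ROBOTS_TXT, "https://example.com/glossary", AI_AGENTS))
```

PerplexityBot has no group of its own here, so it falls through to the `User-agent: *` group and is allowed.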