Category: Blog

  • What Marketers Get Wrong About llms.txt


    TL;DR: Most marketers think llms.txt is for SEO – it’s not. It’s for making your content cheaper and easier for AI tools to process, reducing token costs and friction for the thousands of AI-powered products your customers actually use.

    I see the discussion about llms.txt pop up all the time among digital marketers, and I think many are a bit confused about its purpose and potential. So let’s shed some light on it:

    What is llms.txt?

llms.txt is a proposal by Jeremy Howard, an AI researcher, not a digital marketer! Let's have a look at the definition:

    A proposal to standardise on using an /llms.txt file to provide information to help LLMs use a website at inference time.

    The key insight: Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
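To make this concrete, here is a minimal sketch of what such a file looks like, following the proposal's markdown conventions (an H1 title, a blockquote summary, then H2 sections of links). The company name and URLs are invented for illustration:

```markdown
# Acme Analytics

> Acme Analytics is a web analytics platform. This file points LLMs to the
> most useful pages in clean, markdown-friendly form.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): set up tracking in five minutes
- [API reference](https://example.com/docs/api.md): endpoints and authentication

## Optional

- [Blog](https://example.com/blog.md): product announcements
```

An AI tool can fetch this one small file instead of scraping and de-cluttering your full HTML pages.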

    The Common Misconception

    As you can see, the standard is talking about how websites can provide a format that is simple to read for an LLM. It is NOT talking about improving visibility in Google AI Overviews or Perplexity!

    Let’s be frank: Google, Perplexity and ChatGPT don’t need llms.txt to index your page and remove all the boilerplate (ads, cookie banners, etc.). They already have sophisticated pipelines in place to do exactly that.

    The REAL Marketing Opportunity

    So why do I think marketers are missing a huge potential?

If you step back from the AI search engines for a moment, you will have noticed that nearly every product on the market now advertises some form of AI integration.

These tool providers usually don't have their own search index. Sure, they could use the search functionality of the LLM provider and/or run a proper scraper against your website, but that is expensive for the company (e.g. $14/1k searches on Gemini, $10/1k searches on Anthropic) and often impractical when the tool needs to access a specific page.

    Key Benefits for Marketers

    • Reduced friction: Providing an llms.txt makes it easier and cheaper for the LLMs inside AI tools to read and reason about your page!
    • Better performance: A cleaned-up markdown version (without ads, header/footer, cookie banner, newsletter signup) has a significantly lower token count, letting the LLM answer faster and more accurately about your content
    • Cost efficiency: Reduces processing costs for AI tools integrating your content

    Smart Companies Are Already Doing This

    That’s why many SaaS companies have started to add a ‘copy to markdown’ button to their website, which makes it very easy to copy/paste content into whatever AI tool the user is currently using.

    Bottom Line for Marketers

    llms.txt isn’t about SEO or search rankings – it’s about making your content more accessible and cost-effective for the growing ecosystem of AI-powered tools that your customers are using every day. By implementing it, you’re reducing friction for AI integration and positioning your content for the future of how people interact with information.

  • What are vector embeddings? What AIO managers need to know


    tl;dr: vector embeddings capture semantic meaning; cheap to compute, used in initial content ranking; good first quality indicator

    Key facts:

    • Vector embeddings encapsulate the meaning of your content
    • Embeddings are extremely cheap to compute
    • Therefore used in the initial content ranking for AI
    • Should be viewed as an indicator rather than ground truth

Vector embeddings & cosine similarity are all the buzz in the AIO social sphere right now, so here is a quick explainer:

    1. Vector embeddings explained (No PhD required)

Specialised AI models called encoder models read your content and compress its meaning into mathematical vectors (lists of numbers). Think of it as a massive neural network, trained on billions of documents, that distils your entire content down to a handful of numbers: a point in the vector space.

Here is how this looks for 2D vectors:

    Illustration of vector embeddings with cosine similarity

    Content about the same topic cluster together in the vector space – even if they use completely different words.

    Real example:

    • ‘What is email marketing automation?’ → [0.34, 0.78]
    • ‘Top 15 newsletter automation tools’ → [0.31, 0.82]

    But how do we measure whether two vectors are similar?

This is where cosine similarity comes into play. Cosine similarity measures the angle between two vectors; the score ranges from -1 to 1, though for text embeddings it typically lands between 0 and 1. If your article about 'CRM software' gets 0.89 similarity with the query 'customer management tools', you're on the right track!
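As a quick sketch, cosine similarity for the toy 2D vectors above takes only a few lines of Python (the vectors are the illustrative ones from the example, not real model output):

```python
import math

def cosine_similarity(a, b):
    # dot product of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms (vector lengths)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.34, 0.78]  # 'What is email marketing automation?'
doc = [0.31, 0.82]    # 'Top 15 newsletter automation tools'

print(round(cosine_similarity(query, doc), 3))  # → 0.999
```

The two vectors point in almost the same direction, so the score is close to 1 – exactly what "same topic, different words" looks like in vector space.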

    2. Where vector embeddings fail

    Vector embeddings are fast and cheap – perfect for initial screening. But they have two major blind spots:

    Problem #1: Terrible at exact keyword matching

    • ‘iPhone 15’ vs ‘iPhone 16’ = almost identical embeddings
    • ‘React developer’ vs ‘Vue developer’ = high similarity score
    • ‘2023 data’ vs ‘2024 data’ = embeddings can’t tell the difference

    That’s why AI systems typically pair embeddings with traditional keyword matching algorithms such as BM25.

    Problem #2: Can’t validate search intent

    Content embeddings are calculated independently of user queries. They don’t know:

    • Whether your content actually answers the specific question
    • If the user wants a tutorial vs comparison vs definition
    • Whether your ‘beginner guide’ matches an ‘advanced techniques’ query

    Enter rerankers: These AI models look at both your content and the specific query to determine: ‘Does this actually answer what the user asked?’

    Modern AI search engines such as Perplexity therefore use a combination of embeddings, traditional matching algorithms and rerankers:

    1. Embeddings + BM25: Fast filtering for semantic + keyword relevance
    2. Rerankers: Expensive but precise – validates actual search intent match
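A toy sketch of that two-stage pipeline (the scores, weights and the word-overlap "reranker" are made up for illustration – real systems use trained models for both stages):

```python
def hybrid_retrieve(docs, k=2):
    """Stage 1: blend cheap embedding + BM25 scores, keep only the top-k candidates."""
    for d in docs:
        # simple weighted blend of semantic and keyword relevance
        d["stage1"] = 0.6 * d["embed_sim"] + 0.4 * d["bm25_norm"]
    return sorted(docs, key=lambda d: d["stage1"], reverse=True)[:k]

def rerank(query, candidates):
    """Stage 2 stub: a real reranker scores (query, doc) pairs with a neural model."""
    # faked here with word overlap between query and doc text
    q_words = set(query.lower().split())
    for d in candidates:
        overlap = len(q_words & set(d["text"].lower().split()))
        d["final"] = overlap / max(len(q_words), 1)
    return sorted(candidates, key=lambda d: d["final"], reverse=True)

docs = [
    {"text": "guide to customer management tools", "embed_sim": 0.89, "bm25_norm": 0.40},
    {"text": "history of spreadsheets", "embed_sim": 0.55, "bm25_norm": 0.10},
    {"text": "best CRM software compared", "embed_sim": 0.84, "bm25_norm": 0.70},
]

candidates = hybrid_retrieve(docs)                         # cheap filter over everything
results = rerank("customer management tools", candidates)  # precise ordering of survivors
print([d["text"] for d in results])  # most relevant first
```

Note how the cheap stage runs over every document, while the expensive reranker only ever sees the short candidate list – that asymmetry is the whole point of the design.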

    3. How AI search uses vector embeddings

You may be surprised to hear that vector embeddings are not new: Google has used them as a ranking factor since as early as 2019!

    Computing embeddings is extremely fast and cheap. Using cosine similarity gives a good first estimate of whether the content is relevant to the search intent of the user.

    AI search engines like Perplexity therefore use embeddings (alongside other ranking factors) to create an initial candidate set of content that is fed to the AI model to provide the final answer.

    4. What this means for your AI optimisation strategy

    For you as an AIO manager this means:

    • Embeddings are an important signal for AI search engines of whether your content matches a given search intent
    • Different AI search engines use multiple, different embedding models
    • Cosine similarity values should not be over-interpreted, however

    Summary

    Vector embeddings are the semantic DNA of your content – AI compresses meaning into numbers that cluster similar topics together. While fast and cheap for initial screening, they fail at exact keywords and can’t validate search intent alone. Modern AI search uses embeddings + BM25 + rerankers for complete relevance matching. Bottom line: important relevance indicator, but don’t overstate cosine similarity scores.

    Pro tip: You can check semantic similarity of your content with tools like searchattention.