orion/docs/proposals/poc-content-mapping.md
Samir Boulahtit 27ac7f3e28
docs: add nav fix to POC content mapping proposal
E-commerce nav (Products, Cart, Account) shows on hosting POC sites.
Preview mode should render only CMS pages (Services, Projects, Contact)
in the nav, not module-defined e-commerce items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 21:32:15 +02:00

POC Content Mapping — Scraped Content → Template Sections

Problem

The POC builder creates pages from industry templates, but almost none of the content scraped from the prospect's site ends up on those pages. The homepage shows generic template text ("Quality construction and renovation") instead of the prospect's actual content ("Depuis trois générations, nous mettons notre savoir-faire au service de la qualité...").

For batirenovation-strasbourg.fr:

  • Scraped: 30 headings, 21 paragraphs, 13 images, 3 contacts
  • Shows on POC: only business name, phone, email, address via placeholders
  • Missing: all the prose content, service descriptions, company history, project descriptions

Current Flow

Scraped content → context dict → {{placeholder}} replacement → CMS pages

Placeholders are limited to simple fields: {{business_name}}, {{phone}}, {{email}}, {{address}}, {{meta_description}}, {{about_paragraph}}.

The rich content (paragraphs, headings, images) is stored in prospect.scraped_content_json but never mapped into the template sections.

Desired Flow

Scraped content → intelligent mapping → template sections populated with real content

Without AI (Phase 1 — programmatic mapping)

Map scraped content to template sections by position and keyword matching:

| Template Section | Scraped Source | Logic |
| --- | --- | --- |
| Hero title | headings[0] | First heading = main title |
| Hero subtitle | headings[1] or meta_description | Second heading or meta desc |
| Hero background | images[0] | First large image |
| Features items | headings containing service keywords | Match headings to service names, use following paragraph as description |
| About content | paragraphs[0:3] | First 3 paragraphs = company story |
| Services content | paragraphs matching service keywords | Group paragraphs by service |
| Projects/Gallery | images[1:8] | Scraped images as gallery |
| Contact details | contacts (email, phone, address) | Already working |
| Social links | social_links from scrape | Footer social icons |
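
The position and keyword rules in the table can be sketched as one small function. This is a minimal illustration, not the planned implementation: the function name `map_scraped_to_sections`, the `SERVICE_KEYWORDS` list, and the output keys are all assumptions made for the example, and the scraped dict is assumed to hold plain lists of strings.

```python
from typing import Any

# Hypothetical keyword list; a real one would come from the industry template.
SERVICE_KEYWORDS = ("rénovation", "construction", "toiture", "isolation")


def map_scraped_to_sections(scraped: dict[str, Any]) -> dict[str, Any]:
    """Map scraped content to template sections by position and keyword."""
    headings = scraped.get("headings", [])
    paragraphs = scraped.get("paragraphs", [])
    images = scraped.get("images", [])

    # Features: headings that mention a service keyword (case-insensitive).
    features = [
        h for h in headings
        if any(kw in h.lower() for kw in SERVICE_KEYWORDS)
    ]

    return {
        # Position-based rules: first heading, second heading, first image.
        "hero_title": headings[0] if headings else "",
        "hero_subtitle": (
            headings[1] if len(headings) > 1
            else scraped.get("meta_description", "")
        ),
        "hero_background": images[0] if images else None,
        "features": features,
        "about": paragraphs[:3],
        # images[0] is used by the hero, so the gallery starts at index 1.
        "gallery": images[1:8],
    }
```

Every lookup falls back to an empty value, so a prospect with a sparse scrape still yields a complete mapping and the template defaults can remain in place.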

With AI (Phase 2 — Workstream 4)

Send the scraped content and the template structure to an LLM with a prompt such as:

Given this scraped content from a construction company website and this
template structure, generate professional marketing copy for each section.
Rewrite and enhance the original text, keeping the facts but improving
tone and clarity. Output JSON matching the template section format.

AI would:

  1. Extract the company's key selling points from paragraphs
  2. Write a compelling hero tagline
  3. Generate professional service descriptions from raw text
  4. Create an about section from company history paragraphs
  5. Translate to multiple languages
  6. Generate missing content (testimonial placeholders, CTA copy)
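
As a rough sketch of how the Phase 2 request could be assembled, the snippet below only builds the prompt text and calls no provider, since provider integration is deferred. The function name `build_llm_prompt`, the `max_paragraphs` cap, and the payload layout are assumptions for illustration.

```python
import json

# Prompt text taken from the proposal above.
PROMPT_INSTRUCTIONS = (
    "Given this scraped content from a construction company website and this "
    "template structure, generate professional marketing copy for each section. "
    "Rewrite and enhance the original text, keeping the facts but improving "
    "tone and clarity. Output JSON matching the template section format."
)


def build_llm_prompt(
    scraped: dict, template_sections: dict, max_paragraphs: int = 20
) -> str:
    """Combine instructions, scraped content, and template structure."""
    payload = {
        "headings": scraped.get("headings", []),
        # Cap the number of paragraphs to keep the prompt size bounded.
        "paragraphs": scraped.get("paragraphs", [])[:max_paragraphs],
    }
    return "\n\n".join([
        PROMPT_INSTRUCTIONS,
        # ensure_ascii=False keeps accented French text readable in the prompt.
        "SCRAPED CONTENT:\n" + json.dumps(payload, ensure_ascii=False, indent=2),
        "TEMPLATE STRUCTURE:\n"
        + json.dumps(template_sections, ensure_ascii=False, indent=2),
    ])
```

Keeping prompt construction separate from the provider call would let this step be unit-tested before any provider is chosen.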

Plan — Phase 1 (Programmatic, no AI)

Changes to poc_builder_service.py

1. Enhanced _build_context() — extract more from scraped content

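# `scraped` is assumed to be the parsed prospect.scraped_content_json; any key may be missing
headings = scraped.get("headings", [])
paragraphs = scraped.get("paragraphs", [])
images = scraped.get("images", [])

# Second heading, falling back to the meta description as in the mapping table
context["hero_subtitle"] = headings[1] if len(headings) > 1 else context.get("meta_description", "")
context["hero_image"] = images[0] if images else None
context["about_paragraphs"] = paragraphs[:3]
context["all_paragraphs_html"] = "\n".join(f"<p>{p}</p>" for p in paragraphs[:8])
context["gallery_images"] = images[1:8]  # images[0] is reserved for the hero background
context["social_links"] = scraped.get("social_links", {})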
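
The `all_paragraphs_html` line above interpolates scraped text straight into HTML. If the scraper returns plain text rather than sanitized markup (an assumption, the proposal does not say), it may be worth escaping it first. A minimal sketch using the standard library, with `paragraphs_to_html` as a hypothetical helper name:

```python
from html import escape


def paragraphs_to_html(paragraphs: list[str], limit: int = 8) -> str:
    """Wrap up to `limit` scraped paragraphs in <p> tags, escaping their text."""
    return "\n".join(
        f"<p>{escape(p.strip())}</p>"
        for p in paragraphs[:limit]
        if p.strip()  # skip empty or whitespace-only paragraphs
    )
```

For example, `paragraphs_to_html(["Qualité & savoir-faire"])` yields `<p>Qualité &amp; savoir-faire</p>`.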

2. New _enrich_homepage_sections() — inject scraped data into sections JSON

After placeholder replacement, before saving to DB:

def _enrich_homepage_sections(self, sections: dict, context: dict) -> dict:
    """Inject scraped data into the homepage sections JSON (mutates and returns it)."""
    hero = sections.get("hero")
    if hero:
        # Hero subtitle: overwrite every existing translation with the scraped text
        translations = (hero.get("subtitle") or {}).get("translations") or {}
        if context.get("hero_subtitle"):
            for lang in translations:
                translations[lang] = context["hero_subtitle"]
        # Hero image: applied even when no subtitle was scraped
        if context.get("hero_image"):
            hero["background_image"] = context["hero_image"]

    # Gallery: only add the section when at least 3 scraped images are available
    gallery_images = context.get("gallery_images") or []
    if len(gallery_images) > 2:
        sections["gallery"] = {
            "enabled": True,
            "title": {"translations": {"en": "Our Work", "fr": "Nos Réalisations"}},
            "images": [{"src": img, "alt": ""} for img in gallery_images],
        }

    return sections
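
If more fields than the hero subtitle end up being overwritten this way, the translation loop could be factored into a small helper. The sketch below assumes every translatable field has the `{"translations": {lang: text}}` shape shown above; `set_all_translations` is a hypothetical name.

```python
def set_all_translations(section: dict, field: str, value: str) -> None:
    """Overwrite every existing translation of `field` with `value`, in place."""
    # A missing or None field is treated as having no translations.
    translations = (section.get(field) or {}).get("translations") or {}
    for lang in translations:
        translations[lang] = value


hero = {"subtitle": {"translations": {"en": "Template text", "fr": "Texte du modèle"}}}
set_all_translations(hero, "subtitle", "Depuis trois générations")
```

Only languages already present in the template are touched, so no new language key is created and a field without translations is left unchanged.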

3. Enrich subpages with scraped paragraphs

Already partially done (appending scraped_paragraphs_html to about/services/projects). Improve by:

  • Using scraped headings as section titles when they match service keywords
  • Distributing paragraphs across pages based on keyword proximity
  • Adding scraped images inline in content pages
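
One possible reading of "keyword proximity" is a plain keyword-hit count per page. The sketch below takes that simplified approach; the `PAGE_KEYWORDS` mapping, the function name, and the fallback to the about page are assumptions for illustration only.

```python
# Hypothetical page -> keyword mapping; a real one would depend on the industry.
PAGE_KEYWORDS = {
    "services": ("rénovation", "construction", "travaux"),
    "projects": ("réalisation", "chantier", "projet"),
}


def distribute_paragraphs(
    paragraphs: list[str], default_page: str = "about"
) -> dict[str, list[str]]:
    """Assign each paragraph to the page whose keywords it mentions most."""
    pages: dict[str, list[str]] = {page: [] for page in PAGE_KEYWORDS}
    pages.setdefault(default_page, [])

    for paragraph in paragraphs:
        text = paragraph.lower()
        # Count keyword occurrences per page.
        scores = {
            page: sum(text.count(kw) for kw in keywords)
            for page, keywords in PAGE_KEYWORDS.items()
        }
        best = max(scores, key=scores.get)
        # Paragraphs with no keyword hit go to the default page.
        target = best if scores[best] > 0 else default_page
        pages[target].append(paragraph)

    return pages
```

Paragraphs keep their original order within each page, and ties go to the page listed first in `PAGE_KEYWORDS`.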

Changes to templates

Hero section: support background_image field

{% if hero.background_image %}
<section style="background-image: url('{{ hero.background_image }}'); background-size: cover;">
{% else %}
<section class="gradient-primary">
{% endif %}

Gallery section: already supported via the _gallery.html macro; only the section data needs to be populated.

Changes to storefront base template

When request.state.is_preview is True, render social links from the store's CMS data or from the prospect's scraped social links.

Files to modify

| File | Change |
| --- | --- |
| hosting/services/poc_builder_service.py | Enhanced _build_context(), new _enrich_homepage_sections(), better content distribution |
| cms/platform/sections/_hero.html | Support background_image field for scraped hero images |
| cms/storefront/landing-full.html | Ensure gallery section renders |
| Template JSON files | Add {{hero_subtitle}}, {{hero_image}} placeholders |

Additional Issue: Storefront Navigation

The storefront base template shows e-commerce navigation (Products, Cart, Account), which is out of place on a hosting POC: a construction company should not show "Products" in its menu.

Fix options:

  1. Platform-aware nav — the hosting platform should have its own storefront base template (or at minimum, a nav override) that hides e-commerce menus and only shows CMS pages
  2. CMS-only nav for POC — when is_preview is True, render navigation only from the store's published CMS pages (show_in_header=True), not from module-defined nav items
  3. Storefront nav config per platform — each platform defines which nav items are visible (hosting = CMS pages only, OMS = full e-commerce, loyalty = loyalty-specific)

Recommended: Option 2 — simplest, preview-specific. In storefront/base.html, when request.state.is_preview, replace the entire nav with links to CMS pages:

{% if request.state.is_preview|default(false) %}
    {# Preview mode: show only CMS pages, no e-commerce nav #}
    {% for page in header_pages %}
        <a href="{{ base_url }}{{ page.slug }}">{{ page.title }}</a>
    {% endfor %}
{% else %}
    {# Normal storefront nav #}
    ... existing e-commerce nav ...
{% endif %}

The header_pages variable is already populated from CMS ContentPages where show_in_header=True.
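
The same branching can be modelled in plain Python, which may help when unit-testing the rule without rendering the template. This is a sketch under assumptions: `NavLink`, `build_nav_links`, and the dict shape of `header_pages` are hypothetical, and `module_nav` stands in for the existing e-commerce nav items.

```python
from dataclasses import dataclass


@dataclass
class NavLink:
    title: str
    href: str


def build_nav_links(
    is_preview: bool,
    header_pages: list[dict],
    module_nav: list[NavLink],
    base_url: str = "/",
) -> list[NavLink]:
    """Return CMS-only links in preview mode, the module nav otherwise."""
    if is_preview:
        # Mirrors the template: href is base_url followed by the page slug.
        return [
            NavLink(page["title"], f"{base_url}{page['slug']}")
            for page in header_pages
        ]
    # Normal storefront: leave the module-defined nav untouched.
    return list(module_nav)
```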

Files to modify (navigation fix)

| File | Change |
| --- | --- |
| app/templates/storefront/base.html | Wrap main nav in is_preview check, show CMS-only nav in preview mode |

Estimated effort

  • Phase 1 (programmatic content mapping + nav fix): ~3-4 hours
  • Phase 2 (AI): depends on provider integration (deferred)