Prompt Pipeline
===============

The prompt rewrite described in ``REWRITE-TASK.md`` is now the live runtime
architecture.

Synthetic Heart no longer treats prompt assembly as a single pretty-printed
JSON blob that every engine must parse as text. The canonical pipeline is now:

1. ``core.plugin_instance`` calls ``core.prompt_engine.build_prompt_request()``.
2. The prompt builder assembles a typed ``PromptRequest`` plus compatibility
   data for legacy callers.
3. Engines render that typed request into their native transport format with a
   renderer from ``core.prompt_renderers``.
4. The model returns either plain text or native tool/function calls, which are
   normalized back into SyntH's ``{"actions": [...]}`` format.

Compatibility status
--------------------

- ``build_prompt_request()`` is the canonical prompt builder.
- ``build_json_prompt()`` still exists as a deprecated alias for backward
  compatibility.
- The legacy dict payload is still returned today, but it now carries the typed
  request under ``__prompt_request`` so migrated engines can use the new path
  immediately.
- Engines may also accept a ``PromptRequest`` object directly.

PromptRequest
-------------

``core.prompt_request.PromptRequest`` is the engine-agnostic intermediate
representation. It splits prompt state by stability so renderers can preserve
conversation structure and enable prompt caching where the provider supports it.

Stable fields:

- ``system_instruction``: persona, safety rules, and high-level response rules.
- ``tool_declarations``: tool manifests derived from the actions registry.

Moderately stable field:

- ``context_summary``: diary, memories, cross-chat recaps, and participant
  summaries formatted as plain text.

Dynamic fields:

- ``conversation_history``: parsed user / assistant turns for the active chat.
- ``current_text``: the current user turn.
- ``runtime_ctx``: timestamp, scope, language, tone, emotions, interface data,
  and grillo flags.
- ``attachments``: multimodal payload metadata for the current turn.
- ``reply_to``: optional reply metadata.

Modes
-----

The rewrite is not chat-only. The builder now produces different
``PromptRequest.mode`` values for different runtime surfaces.

``chat``
   Standard message processing with full conversation history and context.

``grillo``
   Internal autonomous beats. These omit normal conversation history and use a
   minimal context summary.

``delivery``
   Auto-response delivery prompts created by
   ``core.prompt_engine.build_delivery_request()``. These contain persona,
   delivery instructions, action outputs, and only ``message_*`` tools.

``live``
   Live voice prompts created by ``core.prompt_engine.build_live_prompt_request()``
   and rendered to one flat instruction string for live sessions.

Renderers
---------

``core.prompt_renderers`` contains the provider-specific renderers.

``OpenAIRenderer``
   Baseline renderer for OpenAI-compatible chat-completions APIs. Produces
   ``messages`` arrays and optional tool schemas.

``AnthropicRenderer``
   Produces Anthropic Messages payloads. The stable system block is emitted
   with ``cache_control`` when ``ENABLE_PROMPT_CACHING`` is enabled.

``GeminiRenderer``
   Produces Gemini-native ``system_instruction_text`` + ``contents`` payloads
   and Gemini function declarations.

``TextRenderer``
   Compact fallback for engines that cannot consume structured conversation
   turns. This is still smaller than the old indented JSON blob path.

``LiveRenderer``
   Flattens ``PromptRequest(mode='live')`` into the plain-text instruction used
   by live voice callers.

Where the new path is used
--------------------------

The rewrite is already active in the main engine families:

- ``engines/external_engines/openapi.py`` uses ``OpenAIRenderer``.
- ``engines/external_engines/openrouter.py`` uses ``OpenAIRenderer``.
- ``engines/external_engines/anthropic.py`` uses ``AnthropicRenderer``.
- ``engines/external_engines/gemini_api.py`` uses ``GeminiRenderer``.
- ``core/external_endpoints/bridges/cortex_bridge.py`` uses
  ``OpenAIRenderer`` for external endpoint Cortex engines.
- ``core.auto_response.AutoResponseSystem`` attaches
  ``PromptRequest(mode='delivery')`` to action-result deliveries.
- ``core.prompt_engine.build_live_system_instruction()`` now renders a
  ``PromptRequest(mode='live')`` through ``LiveRenderer``.

Multimodal handling
-------------------

Attachments are no longer documented as a giant base64 blob living inside a
 single user message by default. The typed pipeline keeps attachment metadata in
 ``PromptRequest.attachments`` and migrated engines extract native multimodal
 parts before sending the request to the provider.

This keeps text prompts smaller and avoids duplicating heavy binary payloads in
 both the text and multimodal layers.

Operational notes
-----------------

- ``instructions_verbose`` and ``instructions`` still exist in the compatibility
  dict for legacy callers, but renderer-backed engines treat
  ``PromptRequest.system_instruction`` as the canonical system prompt.
- ``system_message`` payloads are still used for correction and delivery flows.
  They coexist with ``__prompt_request`` during the transition.
- Debugging may still show compatibility prompt dicts in logs, but the runtime
  decision point for modern engines is the typed prompt object.

See also
--------

- ``core/prompt_request.py``
- ``core/prompt_renderers.py``
- ``core/prompt_engine.py``
- ``core/auto_response.py``
- ``engines/external_engines/openapi.py``
- ``engines/external_engines/anthropic.py``
- ``engines/external_engines/gemini_api.py``