# PROMPT.md — Voice-Preserving Style Analysis for Academic Talks

A reusable, four-phase prompt for getting an LLM to (a) elicit who you are and who you talk to, (b) characterise your distinctive presentational voice across a corpus of past talks, (c) split that characterisation by audience, and (d) produce style guides that can be loaded selectively when authoring or revising future talks. Designed to **preserve voice during revision** rather than smooth it into a generic register.

## What you'll end up with

Markdown files in a `voice/` directory:

- `customisation.md` — your role, voice markers, calibration speakers, and audiences (produced by Phase 0)
- `style-v1.md` — first-pass thematic analysis of your corpus (produced by Phase 1; reviewed by you before Phase 2)
- `style-global.md` — voice immutables, argument structure, visual conventions; loaded for every talk
- `style-[audience].md` — one file per audience you define (e.g. `style-students.md`, `style-conference-researchers.md`, `style-departmental-colleagues.md`)
- `style-mechanical.md` — voice-neutral technical reference; loaded alongside the voice guides when authoring. Content adapts to your platform (Quarto/RevealJS, Beamer/LaTeX, PowerPoint, Keynote, Google Slides, etc.) — see **A note on platforms** below.

Each voice file contains a **"Things to NEVER strip out when revising"** section so the model has an explicit envelope around your voice when editing.

Files are kept in Git; previous versions can be overwritten without renaming.

## Prerequisites

- A corpus of at least 5–6 past talks. Source files preferred where they exist (`.qmd`, `.tex`, `.md`); binary slide formats (`.pptx`, `.key`, `.odp`) work too (see **A note on platforms** below). More files are better, but recency matters more than volume.
- Rendered output (PDF, HTML) is optional but useful for visual observations.
- A `voice/` directory in your project root.

## A note on platforms

The voice analysis (Phases 0, 1, 2) is platform-agnostic — narrative arc, register, recurring phrases, and audience-tuning don't depend on what software produced the slides. Only the **mechanical** file changes shape by platform.

**If your slides are in Quarto / RevealJS / Beamer / Markdown:** source files are plain text. The model reads them directly. The mechanical file can be rich: YAML conventions, CSS variables, layout primitives, fragment patterns, citation syntax. The corpus behind the **Worked example** appendix is of this kind, although the appendix shows only `customisation.md`, not the mechanical file.

**If your slides are in PowerPoint (`.pptx`):** the file is a ZIP archive of XML. The model can read structured content several ways — pick whichever its tools support:

- Extract text and notes via `python-pptx` (cleanest if Python is available)
- Unzip the file directly: slide content lives in `ppt/slides/slide*.xml`, notes in `ppt/notesSlides/notesSlide*.xml`, theme in `ppt/theme/theme1.xml`, masters in `ppt/slideMasters/`
- Render to PDF/PNG and read visually (best for layout/visual analysis; weakest for speaker notes)
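
If `python-pptx` isn't available, the unzip route needs only the Python standard library. The following is a minimal sketch rather than a definitive extractor: `read_pptx_text` and its helpers are illustrative names, and it assumes the numbers in the part filenames are good enough for ordering. Deck order and the slide-to-notes pairing are actually defined elsewhere in the archive (the presentation part and its relationship files), so treat the numbering as approximate. Notes parts may also include placeholder text such as slide numbers.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: text runs in slides and notes are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def _numbered_parts(names, pattern):
    """Return sorted (number, member name) pairs for members matching pattern."""
    found = []
    for name in names:
        match = re.fullmatch(pattern, name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


def _text_runs(archive, member):
    """Collect the non-blank text runs from one XML part of the archive."""
    root = ET.fromstring(archive.read(member))
    return [t.text for t in root.iter(A_NS + "t") if t.text and t.text.strip()]


def read_pptx_text(path):
    """Return (slides, notes), each mapping a part number to its text runs."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        slide_parts = _numbered_parts(names, r"ppt/slides/slide(\d+)\.xml")
        notes_parts = _numbered_parts(names, r"ppt/notesSlides/notesSlide(\d+)\.xml")
        slides = {n: _text_runs(archive, member) for n, member in slide_parts}
        notes = {n: _text_runs(archive, member) for n, member in notes_parts}
    return slides, notes
```

Calling `read_pptx_text("talk.pptx")` (a hypothetical filename) gives two dictionaries you can dump to plain text and hand to the model alongside your source-format talks.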

The mechanical file for PowerPoint should cover what's *actually* in the corpus, e.g. **slide master and layout choices** (which built-in layouts recur — Title and Content, Two Content, Section Header, Comparison, Picture with Caption — and any custom layouts), **theme colours** (which palette slot is used for what), **font scheme** (heading vs. body), **build animations and transitions** (which are used, on what content), **image placement patterns** (full-bleed via background fill, picture placeholders, freeform), **table and chart styling**, **speaker notes conventions**, and **footer/logo placement**. Drop everything that doesn't apply (no YAML, no CSS, no `r-stack`).
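
For the theme-colour item, the palette slots can be listed straight from the theme part. This is a sketch under stated assumptions: `read_theme_colours` is an illustrative name, the theme is assumed to sit at `ppt/theme/theme1.xml`, and the function only reports what each slot contains. Working out which slot is used for what still means looking at the slides themselves.

```python
import zipfile
import xml.etree.ElementTree as ET

A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def read_theme_colours(path, member="ppt/theme/theme1.xml"):
    """Return {slot name: colour value} from the theme's colour scheme."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read(member))
    scheme = root.find(f".//{A_NS}clrScheme")
    colours = {}
    if scheme is None:
        return colours
    for slot in scheme:
        slot_name = slot.tag.replace(A_NS, "")  # e.g. dk1, lt1, accent1
        for colour in slot:
            # srgbClr keeps its hex value in "val"; sysClr keeps a system
            # colour name in "val" and the last rendered hex in "lastClr".
            colours[slot_name] = colour.get("lastClr") or colour.get("val")
    return colours
```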

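**If your slides are in Keynote (`.key`) or LibreOffice Impress (`.odp`):** these are also ZIP archives, but their internals differ from PPTX, so the `ppt/...` paths above don't apply. In an `.odp` file the slide text lives in `content.xml`. Keynote is harder to parse without macOS tools, so exporting to PPTX or PDF first is usually the path of least resistance.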
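
For Impress files, the slide text sits in `content.xml` at the top of the archive rather than under `ppt/`. A minimal standard-library sketch, with `read_odp_text` as an illustrative name: it returns the paragraphs found inside each `draw:page` element. Because the format nests speaker notes inside each page, notes text comes back mixed in with the slide text.

```python
import zipfile
import xml.etree.ElementTree as ET

DRAW_NS = "{urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}"
TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def read_odp_text(path):
    """Return one list of paragraph strings per slide (draw:page) in an .odp file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    slides = []
    for page in root.iter(DRAW_NS + "page"):
        paragraphs = []
        for para in page.iter(TEXT_NS + "p"):
            # itertext() joins the paragraph's own text with any nested spans.
            text = "".join(para.itertext()).strip()
            if text:
                paragraphs.append(text)
        slides.append(paragraphs)
    return slides
```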

**If your slides are in Google Slides:** export the corpus to PPTX or PDF and treat as PowerPoint.

Whatever the platform, the **voice files are the same** — only `style-mechanical.md` shifts vocabulary.

---

# Phase 0: Interview (→ `voice/customisation.md`)

This phase elicits everything the later phases need: your role, your voice markers, the speakers you want to calibrate against, and — most importantly — your audiences. Most users haven't articulated all of this explicitly, so an interview is more reliable than a fill-in-the-blank form. (If you'd rather skip the interview and write it yourself, see the **Customisation file structure** appendix at the end of this document and write `voice/customisation.md` directly.)

## Phase 0 prompt

> Conduct a structured interview to help me articulate the inputs needed for the rest of this workflow. Save your output incrementally to `voice/customisation.md` — append as we go, so we don't lose progress. The final file must follow the structure in the **Customisation file structure** appendix in `voice/PROMPT.md`.
>
> Conduct the interview conversationally. Ask one to three related questions at a time. Probe vague or thin answers — "substantive" or "I want to engage them" are not specific enough to work with. Read answers back periodically and ask "is that right?" before moving on. Allow me to skip questions that don't apply (e.g. brand colour, font stack).
>
> **Sections to cover, in this order:**
>
> **A. Who you are.** Open with: "What's your role and field, in two sentences?" Then probe: where you trained, whether you came to academia from elsewhere, what makes your professional positioning unusual or specific.
>
> **B. Voice markers.** This is the hardest section — most users can't name their voice cold. Ask in stages:
> - "What kinds of phrases or moves do colleagues recognise as 'yours'? Anything they'd quote back at you?"
> - "What kind of humour, if any, do you use? Dry irony, observational, self-deprecating, deadpan, none?"
> - "What kinds of references do you reach for — pop culture, literary, historical, methodological, none?"
> - "Are there positions you hold that aren't standard in your field? Any 'this time it's different' / 'most people in my field assume X but I think Y' moves?"
> - "Is there a register you actively avoid? (Corporate, motivational, jargon-heavy, etc.)"
>
> Synthesise their answers into 2–4 phrases that capture what's distinctive — read them back for confirmation.
>
> **C. Stack notes (optional, fast).** Brand colour, font stack, citation style, anything technical worth carrying through. Skip cleanly if they don't have preferences.
>
> **D. Calibration speakers.** Likely the trickiest because most people don't have a list of names ready. Approach in this order:
> - "Whose talks have you walked out of and thought 'I want to talk like that'?"
> - If thin: "Think across categories — a historian, a quantitative-data communicator, a theorist, a policy-translator, an industry voice, a teacher. Anyone come to mind in any of those?"
> - If still thin: offer a small menu of candidates spanning registers (e.g. Mary Beard, Stuart Hall, Hans Rosling, Tim Harford, Richard Feynman, Brené Brown, Simon Sinek, Adam Curtis, Doreen Massey) and ask which feel close to what they want.
> - Aim for three to six names.
>
> Then ask explicitly about TED. **Frame it as a real choice, not a moral one:** "TED-style stylists — Brené Brown, Simon Sinek, Hans Rosling at his most TED-ish — produce a different kind of guide. Suggestions calibrated against them lean toward narrative architecture, dramatic reveals, and tight emotional pacing. Some users want that. Some find it incompatible with academic substance. Where do you land?" Capture their answer verbatim.
>
> **E. Audiences.** This is the section the rest of the workflow most depends on. Approach in stages:
> 1. "Who do you give talks to? List the kinds of rooms — don't worry about being exhaustive yet."
> 2. From their list, identify which audiences have *materially different expectations and registers*. Group or split as needed: "You said 'students' and 'undergrads' — is that the same audience or two? You said 'conferences' and 'industry' — different rooms?"
> 3. For each audience that survives consolidation, ask:
>    - "Describe them in 2–4 sentences. Who they are, what they know about you and your work coming in, what register lands and what doesn't, what (if any) running jokes or set pieces translate."
>    - "Where in your corpus do these talks live? Folders, globs, or specific files. (e.g. `casa/`, or `*` excluding `casa/`, or just `dcdc22/index.qmd`)"
>    - "Which 2–4 specific talks best exemplify this audience? These will be the anchors I weight most heavily."
>
> Aim for two to five audiences. More than five usually means the model is being asked to make distinctions too fine to be useful.
>
> **At the end of the interview**, write the full `voice/customisation.md` in the structure specified in the appendix. Read it back as a final check: "Here's what I've captured. Anything to change before we run Phase 1?"

The output of Phase 0 is `voice/customisation.md`. Phases 1 and 2 read from this file.

---

# Phase 1: Corpus Analysis (→ `voice/style-v1.md`)

Run after `voice/customisation.md` exists. Produces a single thematic analysis file that you'll review before running Phase 2.

## Phase 1 prompt

> Read `voice/customisation.md` first — it tells you who I am, who I talk to, and how to calibrate. Use it to inform your analysis.
>
> Analyse my corpus of past talks across all audiences defined in the Audiences section of `voice/customisation.md`. Read at least the 8–10 most recent talks across audiences, or the whole corpus if it holds fewer than that.
>
> **Reading approach by file type:**
> - Plain-text source (`.qmd`, `.tex`, `.md`, `.rmd`): read directly. Prefer source over rendered output for content analysis.
> - PowerPoint (`.pptx`), Keynote (`.key`), Impress (`.odp`): these are ZIP archives. For `.pptx`, extract via `python-pptx` if available, or unzip and read `ppt/slides/slide*.xml` + `ppt/notesSlides/notesSlide*.xml` for content and notes; `ppt/theme/theme1.xml` and `ppt/slideMasters/` for theme/layout patterns. For `.odp`, read `content.xml`. For `.key`, ask me to export to PPTX or PDF first if you cannot parse it. For visual layout, also render a few to PDF/PNG and read visually.
> - Rendered output (HTML, PDF) for any platform: scan a few for visual pattern confirmation.
> - Shared config: read CSS/SCSS, `_quarto.yml`, `.tex` preambles, or PowerPoint slide masters as relevant.
>
> **Weighting:** Weight recent files more heavily than older ones. Note where you observe evolution across time. Make sure each audience defined in `customisation.md` is represented in your reading.
>
> **Themes to characterise:**
> 1. Narrative approach (how I open, build tension, structure the arc)
> 2. Word choice and voice (register, characteristic phrases, humour, hedging patterns, rhetorical moves)
> 3. Slide structure (typical anatomy, layout primitives, recurring patterns)
> 4. Visual style (colour, image placement, branding, recurring assets)
> 5. Use of interactive/computational elements
> 6. Typography (heading style, emphasis patterns, blockquotes)
> 7. Metadata conventions (YAML, footers, dates)
> 8. Self-referential / institutional positioning
> 9. Data and evidence style (citation, uncertainty, tables vs. charts)
> 10. Recurring motifs or phrases
> 11. **How patterns differ across the audiences I defined.** Where do conventions change? Where do they hold?
>
> Add additional themes if you find something significant.
>
> **Output:** Write to `voice/style-v1.md`. Format: brief preamble (corpus, date range, weighting, audience coverage) → one section per theme (`##` heading) → 3–8 specific, cited observations under each (quote the source file and approximate location) → a "Cross-audience differences" section explicitly noting what varies → a final "Candidate rules for a style guide" section distilling the most consistent patterns into imperative statements.
>
> Be specific and evidence-based. Every claim should be backed by at least one example from the corpus. Aim for 1,200–1,800 words.
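
If you would rather hand the model an explicit reading list than leave the selection to it, the weighting rule above (newest first, every audience represented) can be sketched as below. Everything here is an assumption layered on the prompt: `pick_recent_talks` is a hypothetical helper, and the timestamps could be file modification times, dates from YAML headers, or plain years, whichever you trust most.

```python
def pick_recent_talks(files_by_audience, total=10):
    """Pick up to `total` paths, newest first, with every audience represented.

    files_by_audience maps an audience name to a list of (path, timestamp)
    pairs, where path is a string and timestamp is any sortable value.
    """
    chosen = {}
    # First pass: the newest talk from each audience guarantees coverage.
    for audience, files in files_by_audience.items():
        if files:
            path, stamp = max(files, key=lambda item: item[1])
            chosen[path] = stamp
    # Second pass: fill the remaining places with the newest talks overall.
    everything = sorted(
        (
            (stamp, path)
            for files in files_by_audience.values()
            for path, stamp in files
        ),
        reverse=True,
    )
    for stamp, path in everything:
        if len(chosen) >= total:
            break
        chosen.setdefault(path, stamp)
    return sorted(chosen, key=lambda p: chosen[p], reverse=True)
```

The coverage pass runs first on purpose: an audience whose talks are all old would otherwise drop out of a purely recency-ranked list.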

After v1 is written, **read it carefully**. Note where it's right, where it's wrong, and what's missing. Phase 2 is where you encode your judgements.

---

# Phase 2: Voice-First Split (→ one file per audience + global + mechanical)

Run after reviewing `style-v1.md`. Addresses two problems with v1: (a) it usually over-indexes on mechanical formatting at the expense of voice, and (b) a single guide can't be loaded selectively.

## Phase 2 prompt

> Read `voice/customisation.md` and `voice/style-v1.md`. Use them to write voice-first style guides plus one technical reference. Goal: I want to be able to load only the relevant context when revising a talk for a specific audience, and I want the rules to encode my *voice* (so suggested edits don't strip it out) rather than the mechanics.
>
> **Files to write into `voice/`:**
>
> 1. `style-global.md` — applies to every talk regardless of audience
> 2. **One file per audience defined in the Audiences section of `customisation.md`.** Filename convention: `style-[audience-slug].md`, where the slug is the audience heading lowercased and hyphenated (e.g. "Departmental colleagues" → `style-departmental-colleagues.md`). Read each `### [Audience]` block: the description, scope, and anchor talks tell you what to include in that file.
> 3. `style-mechanical.md` — voice-neutral technical reference (YAML, layout primitives, CSS classes, citation conventions). Loaded alongside the voice guides; not a voice document.
>
> If a previous version of any of these files exists, **overwrite it**. Versioning is handled by Git.
>
> **Each voice guide must contain:**
>
> - **Scope** (when to load this file — quote the audience description from `customisation.md`)
> - **The voice (immutable core)** — the non-mechanical things that make a talk identifiably mine for this audience. Use the voice markers from `customisation.md` as the spine. Note where the voice differs across audiences.
> - **Argument and structure** — how I build a talk for this audience
> - **Visual / typographic conventions** (briefly; the technical file has the detail)
> - **Things to NEVER strip out when revising** — an explicit list. This is the envelope around my voice. Treat anything in this list as load-bearing during later revisions; flag rather than silently change.
> - **Suggestions for improvement** — engagement and technique suggestions calibrated against the speakers named in the Calibration section of `customisation.md`. Honour my stated position on TED-style stylists. Three to ten suggestions per file, treated as a menu rather than prescriptions.
>
> **Calibration rules:**
>
> - Prioritise voice and argument over mechanics. A rule like "open with provocation, not agenda" is more important than "use `r-fit-text` only on dividers".
> - When v1 has both voice rules and mechanical rules, voice rules go in `style-global.md` / audience files; mechanical rules go in `style-mechanical.md`.
> - The "things to NEVER strip out" sections must be specific and unambiguous — they are constraints, not aspirations.
> - For improvement suggestions, name the speaker or technique you're calibrating against. "Beard does this — she names the contrary view before refuting it" beats "consider naming counterarguments".
> - Honour my TED position from `customisation.md`. If I said to avoid it, do not import that register. If I said to lean in, draw on the named TED speakers explicitly.
>
> **Length:** 700–1,500 words per voice file; 1,000–2,000 for the mechanical file.
>
> For `style-mechanical.md`, keep the content voice-neutral and adapted to my platform. Pick the right vocabulary based on the corpus you read:
>
> **For Quarto / RevealJS / Beamer / Markdown corpora:** YAML metadata patterns (with snippets, per-audience variation if relevant), section dividers, layout primitives, reveals (incremental/fragment/auto-animate), speaker notes convention, image conventions and recurring asset registry, citation/footnote conventions, table conventions, code blocks and inline computed values, typography stack as CSS variables, recurring class names, closing-slide patterns, **what NOT to use** (patterns from the broader ecosystem absent from my corpus), and suggestions for technical improvement (tooling, automation, includes, linters).
>
> **For PowerPoint / Keynote / Impress corpora:** slide master and layout choices (which built-in layouts recur, any custom layouts), theme colour usage (which palette slot does what), font scheme (heading vs. body, recurring typographic choices), build animations and slide transitions (which are used, on what content), image placement patterns (full-bleed background fills, picture placeholders, freeform), table and chart styling conventions, speaker notes conventions, footer/logo/slide number placement, recurring asset registry (logos, recurring images, master-slide elements), **what NOT to use** (animation effects, layouts, or styling habits absent from my corpus), and suggestions for technical improvement (slide master cleanup, layout consolidation, custom theme file, PPTX-template extraction).
>
> Use snippets, examples, or screenshot descriptions liberally — this is a reference, not an essay. Drop sections that don't apply to my platform; don't invent conventions the corpus doesn't actually use.
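
Once Phase 2 has run, a quick check can confirm that each voice guide has the sections and length the prompt asks for. This is a rough sketch, not part of the workflow itself: `check_voice_guide` is a hypothetical helper, it assumes the model wrote each required section as a Markdown heading, and it matches headings by substring so small wording differences still pass. Lines beginning with `#` inside code fences would be miscounted as headings.

```python
import re

# Shortened forms of the section names required by the Phase 2 prompt.
REQUIRED_SECTIONS = [
    "Scope",
    "The voice",
    "Argument and structure",
    "Visual",
    "Things to NEVER strip out",
    "Suggestions for improvement",
]


def check_voice_guide(markdown_text, min_words=700, max_words=1500):
    """Return a list of problems found in one voice guide (empty means it passes)."""
    headings = [
        line.lstrip("#").strip()
        for line in markdown_text.splitlines()
        if line.startswith("#")
    ]
    problems = []
    for section in REQUIRED_SECTIONS:
        if not any(section.lower() in heading.lower() for heading in headings):
            problems.append(f"missing section: {section}")
    # Whitespace-separated tokens are a crude but serviceable word count.
    word_count = len(re.findall(r"\S+", markdown_text))
    if not min_words <= word_count <= max_words:
        problems.append(f"word count {word_count} outside {min_words}-{max_words}")
    return problems
```

A missing "Things to NEVER strip out" section is the failure worth catching early, since later revision prompts treat that section as binding.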

---

## What to keep generic (don't customise these)

Structural elements that make the workflow work:

- **The four-phase workflow.** Phase 0 (interview) → Phase 1 (analysis) → review → Phase 2 (split). Skipping Phase 0 produces thin guides; skipping the review between Phase 1 and Phase 2 produces guides that smooth your voice into a generic shape.
- **The output structure.** Global + per-audience + mechanical. Adding or removing layers destroys the selective-context property.
- **The "Things to NEVER strip out" pattern.** This is the load-bearing concept that makes the guides useful as a *constraint* on revision, not just a description.
- **Calibrating improvement suggestions against named speakers.** The named comparison is what keeps the suggestions concrete and field-appropriate.
- **The customisation file as the single source of truth for who you are and who you talk to.** Phase 1 and Phase 2 both read from it; if you change your audiences later, edit `customisation.md` and re-run Phase 2.

## On the calibration speakers (be deliberate)

The improvement suggestions are only as useful as the speakers you calibrate against. There is **no neutral default** — different speakers produce different guides, and they will all sound coherent. Choose deliberately.

Axes to consider:

- **Substantive vs. stylistic:** Speakers known for argument vs. for delivery. Both produce real engagement. They produce different guides.
- **Field-adjacent vs. field-distant:** Adjacent speakers transfer techniques readily; distant ones introduce moves you wouldn't otherwise consider.
- **Register variety:** Pick across registers (one historian, one quantitative-data communicator, one theorist, one policy-translator) so suggestions don't collapse into one mode.
- **TED-style vs. other:** A real choice, not a moral one. Some users genuinely admire TED stylists (Brené Brown, Simon Sinek, Hans Rosling at his most TED-ish) — naming them gets suggestions in that register. Others find the emotional architecture incompatible with academic substance — say so explicitly so the model doesn't default to it. Either is valid; **the harm is in picking unconsciously**.

The Phase 0 interview surfaces this choice explicitly so it doesn't get made by accident.

## On versioning

Files are kept in Git. Previous versions are accessible via `git log` and `git show`. Overwrite freely:

- Phase 0 writes `voice/customisation.md`
- Phase 1 writes `voice/style-v1.md` (the analysis pass; it keeps the `v1` suffix so it reads clearly as your "before" snapshot)
- Phase 2 writes `voice/style-global.md`, `voice/style-[audience-slug].md`, `voice/style-mechanical.md` (no version suffix)

When revising the guides later, overwrite the same filenames. Use Git for history.
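
To see which filenames Phase 2 should be overwriting, the slug rule from the Phase 2 prompt (heading lowercased, words joined with hyphens) can be sketched as follows. Both function names are illustrative, and the sketch assumes ASCII audience headings: accented or non-Latin characters would simply be dropped.

```python
import re


def audience_slug(heading):
    """Lowercase an audience heading and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", heading.lower())
    return "-".join(words)


def guide_filenames(audience_headings):
    """List the files Phase 2 is expected to write into voice/."""
    names = ["voice/style-global.md"]
    names += [f"voice/style-{audience_slug(h)}.md" for h in audience_headings]
    names.append("voice/style-mechanical.md")
    return names
```

For example, `audience_slug("Departmental colleagues")` returns `"departmental-colleagues"`, matching the filename given in the Phase 2 prompt.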

---

## Use after the guides exist

When asking the model to revise a talk, instruct it to:

1. Read `voice/style-global.md` and the relevant `voice/style-[audience].md` (pick the audience based on where the talk lives or where it's being delivered)
2. Read `voice/style-mechanical.md` if the revision is structural rather than purely textual
3. Treat the "Things to NEVER strip out" sections as binding constraints
4. Treat the "Suggestions for improvement" sections as a menu — apply only when explicitly invited

Example revision prompt:

> Revise `path/to/talk.qmd` for [delivery context]. Load `voice/style-global.md` and `voice/style-[audience].md` first. Preserve everything in the "Things to NEVER strip out" sections; flag any place where my current draft conflicts with them. Don't apply suggestions from the "Suggestions for improvement" sections unless I ask.
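
If you revise talks often, the example prompt can be generated instead of retyped. A small sketch, assuming you already know the audience slug. `build_revision_prompt` and its parameters are hypothetical names, and the wording simply mirrors the example above, adding the mechanical guide only for structural revisions.

```python
def build_revision_prompt(talk_path, audience_slug, delivery_context, structural=False):
    """Assemble the revision prompt, adding the mechanical guide for structural edits."""
    guides = ["voice/style-global.md", f"voice/style-{audience_slug}.md"]
    if structural:
        guides.append("voice/style-mechanical.md")
    load_list = ", ".join(f"`{guide}`" for guide in guides)
    return (
        f"Revise `{talk_path}` for {delivery_context}. "
        f"Load {load_list} first. "
        'Preserve everything in the "Things to NEVER strip out" sections; '
        "flag any place where my current draft conflicts with them. "
        'Don\'t apply suggestions from the "Suggestions for improvement" '
        "sections unless I ask."
    )
```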

---

# Appendix: Customisation file structure

This is the structure Phase 0 should produce. If you're skipping the interview and writing it yourself, follow this structure exactly — Phase 1 and Phase 2 read it.

```markdown
# Customisation

## Who I am

**Role and field:** [1–2 sentences. Role, discipline, anything unusual about the professional positioning.]

**Voice markers:** [2–4 phrases describing what makes my voice non-generic. The model will treat these as load-bearing and protect them during revision.]

**Stack notes:** [Optional. Brand colour, font stack, citation style. Skip if not relevant.]

## Calibration speakers

[Three to six names. Optionally a one-line note on each saying what makes them a useful calibration target.]

**On TED:** [Explicit position. "Avoid", "lean in", or a nuanced version. Captured verbatim from the interview where possible.]

## Audiences

### [Audience name]

[2–4 sentence description: who they are, what they know coming in, what register lands, what running jokes or set pieces translate.]

- **Include:** [folder, glob, or specific paths]
- **Exclude:** [optional]
- **Anchor talks:** [2–4 specific files weighted most heavily as exemplars]

### [Next audience name]

[…]
```
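
Because Phases 1 and 2 read this file, it can be worth checking that a hand-written version parses the way you expect. The sketch below is one possible reading of the structure, not a parser the workflow depends on: `parse_audiences` is a hypothetical helper, and it assumes paths are written in backticks, as in the worked example that follows.

```python
import re

FIELD_PATTERN = re.compile(r"- \*\*(Include|Exclude|Anchor talks):\*\*\s*(.*)")


def parse_audiences(customisation_text):
    """Return {audience name: {"include": [...], "exclude": [...], "anchor talks": [...]}}."""
    audiences = {}
    in_audiences = False
    current = None
    for line in customisation_text.splitlines():
        if line.startswith("## "):
            # Only "###" blocks under the "## Audiences" heading count.
            in_audiences = line[3:].strip().lower() == "audiences"
            current = None
        elif in_audiences and line.startswith("### "):
            current = {"include": [], "exclude": [], "anchor talks": []}
            audiences[line[4:].strip()] = current
        elif current is not None:
            match = FIELD_PATTERN.match(line)
            if match:
                # Keep only the backtick-quoted paths on the field line.
                current[match.group(1).lower()] = re.findall(r"`([^`]+)`", match.group(2))
    return audiences
```

An audience that comes back with an empty `include` list or no anchor talks is a sign the block needs filling in before Phase 1 runs.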

# Appendix: Worked example

The following is one user's `customisation.md` (mine), illustrating the format. The audiences and voice markers are particular to me — yours will differ. My corpus is in Quarto/RevealJS, so my `style-mechanical.md` (not shown here) is full of YAML, CSS variables, and layout primitives. A PowerPoint user with the same audience structure would get the same `customisation.md` shape but a `style-mechanical.md` focused on slide masters, theme colours, layout choices, and animation conventions.

```markdown
# Customisation

## Who I am

**Role and field:** Head of the Centre for Advanced Spatial Analysis (CASA) at UCL. Quantitative urban geographer with a background in both Comparative Literature and dot-com web development before further study.

**Voice markers:** British humour, dry irony rather than punchlines; literary and pop-cultural references doing argument work (Dr Strangelove, BL archive prints, Steve Jobs "One More Thing..."); refusal of the corporate line while operating within the corporate context; long-view scepticism that refuses "this time it's different" claims.

**Stack notes:** CASA purple `rgb(78, 60, 86)` as sole accent; Amatic SC headings, Source Serif Pro body, Caveat blockquotes; Harvard inline citations via `harvard-cite-them-right.csl`.

## Calibration speakers

Mary Beard, Stuart Hall, Hans Rosling, Tim Harford, Doreen Massey, Adam Curtis.

**On TED:** Avoid TED-style stylists. I want substance with engagement, not engagement substituting for substance.

## Audiences

### Departmental colleagues

People who work in my department, attend all-staff meetings, and have heard me speak before. They have full institutional context, recognise running jokes (the "free yoga" footnote, the HoD KPI gag), and reward frankness about institutional binds. Trust earned through bureaucratic specificity ("three messages to Procurement") rather than performance.

- **Include:** `casa/`
- **Anchor talks:** `casa/SoTN-202511.qmd`, `casa/HoD-Intro.qmd`, `casa/Strangelove.qmd`

### Conference researchers

Topic-specialist peers from other institutions at academic conferences. They do not have departmental context. Running jokes from internal talks will not land. Trust is earned through argument, evidence, and substantive cultural anchoring. They expect a bibliography, formal citation, and an argument-driven structure.

- **Include:** `rsa/`, `frag2024/`
- **Anchor talks:** `rsa/RSA_2025.qmd`, `frag2024/Gathered_Field.qmd`

### Industry and policy

Practitioners — surveyors, planners, policy staff, civic technologists. Mixed academic exposure. They want operational consequence, not method detail. The "this time it's different" scepticism does most of its work here.

- **Include:** `f2f/`, `dcdc22/`, `arc/`
- **Anchor talks:** `f2f/Carter_Jonas.qmd`, `dcdc22/index.qmd`

### Pedagogy and other institutions

Educators, fellows, and cross-institutional research-software audiences. They want transferable practice, not just findings. Practitioner-not-theorist self-positioning matters most here.

- **Include:** `reproducible/`, `in2science/`, `grow_your_data/`
- **Anchor talks:** `reproducible/index.qmd`, `in2science/Part1.qmd`
```
