Keenan Finkelstein

We Remediated a Real Public-Records PDF: From Scanned Deed to CAV-Style Accessible Parity PDF

Some of the authored accessible HTML that created the document

State and local government records portals are full of PDFs that look readable to sighted users but are functionally blank to assistive technology. A scanned deed may display perfectly in a browser and still expose no headings, no selectable text, no reading order, no document language, and no way for a screen-reader user to reach the actual content.

That is the problem this demo solves with Fullbleed.

Why This Matters Now

The Department of Justice's Title II web and mobile accessibility rule requires state and local government web content and mobile apps to meet WCAG 2.1 Level AA. The DOJ fact sheet says the April 2026 interim final rule extended the compliance dates to April 26, 2027 for public entities serving 50,000 or more people and April 26, 2028 for smaller entities and special district governments.

The practical issue is not abstract. Public portals often provide essential records as image-only PDFs. A user can see the page image, but the document is not meaningfully available to many people who use screen readers, text resizing, copy/paste, document search, or other assistive workflows.

The Test Record

For this remediation pass, we used a publicly available Escambia County Official Records document from LandmarkWeb:

  • Portal: https://dory.escambiaclerk.com/LandmarkWeb

  • Search path: Record Date search for May 21, 2026

  • Clerk file number: 2026039980

  • Document ID: 7400460

  • Source document type: Warranty Deed

  • Scan caption: Special Warranty Deed

  • Book/page: OR 9495, pages 1539 through 1543

The source record is a five-page scanned PDF. Fullbleed inspection showed no language metadata, no structure tree, no marked content, no embedded fonts, and no declared accessibility profile. A local text-operator probe found zero PDF text objects and zero text-showing operators across all five pages.

In plain English: the PDF looked like a record, but to software it was five pictures.

What Fullbleed Produced

We created two authoring paths for a CAV (conforming alternate version) and one shorter sales summary.

The first CAV path is direct semantic HTML/CSS. The primary output is a 12-page CAV-style accessible parity PDF. It is not a certified copy or a legal substitute. It is a remediation artifact: it preserves the original scan as evidence, keeps the original page sequence, includes a visual facsimile of each source page, and follows each facsimile with a structured transcript or field equivalent.

The second CAV path is component based. The source file components/escambia_cfn_2026039980_component_cav.py defines the same kind of CAV using Fullbleed accessibility components such as Document, Region, Heading, FieldGrid, FieldItem, SemanticTable, ColumnHeader, RowHeader, and DataCell. That source emits accessible HTML, validates the component tree, renders a PDF/UA-targeted PDF through the AccessibilityEngine bundle path, and writes run reports and traces.

The secondary output is a shorter 5-page accessible summary that can be used as sales collateral. It should not be presented as the parity/CAV deliverable.

The CAV-style PDF includes:

  • A declared document language.

  • A meaningful title.

  • Headings that expose the outline.

  • Tables with headers for record metadata and field values.

  • Visual facsimiles of all five source pages.

  • Page-by-page accessible transcripts and field equivalents.

  • CAV profile/family attributes and an explicit source-page parity map.

  • Machine-readable signature status attributes for present, missing, and review-required marks.

  • Text descriptions of signatures, seals, handwritten marks, and unclear checkboxes.

  • Review flags where the scan is ambiguous.

  • A preserved source PDF and evidence log.

The generated CAV output inspected as:

  • PDF version: 1.7

  • Page count: 12

  • PDF/UA claim: present

  • Metadata: present

  • Language: present

  • Structure tree: present

  • Marked content: present

  • Embedded fonts: present

  • Extractable text: present

The component CAV output inspected as the same 12-page PDF/UA-targeted class of artifact, with metadata, language, structure tree, marked content, embedded font, and extractable text present. Its component validation report showed zero accessibility contract errors and zero warnings.

The Workflow

  1. Find a real public record in the portal.

  2. Save the original PDF as evidence.

  3. Inspect the source PDF for accessibility blockers.

  4. Render page images for human review.

  5. Rebuild the practical record information as semantic HTML, keeping the original page sequence.

  6. Render with Fullbleed using --pdf-profile pdfua1, --document-lang, embedded font assets, page images, manifest output, and a deterministic hash.

  7. Inspect the generated PDF and compare the result against the source.

The important part is the feedback loop. Fullbleed can render HTML/CSS into PDF and PNG pages, then inspect the PDF structure. That gives a human or AI agent enough evidence to iterate: fix text overflow, confirm the reading order, catch missing metadata, preserve page parity, and regenerate deterministically.

There are two practical implementation styles:

  • Direct HTML/CSS: fastest for a one-off remediation where the reviewer wants to see and edit the exact source markup.

  • Component CAV: better for repeatable record families because the authoring layer encodes reusable field grids, semantic tables, source-page mappings, and signature-state components before rendering to HTML/PDF.

How a Human or Agent Can Replicate It

The repeatable remediation pattern is to treat the original scan, the semantic source, the generated PDF, and the verification files as one evidence bundle. In this case study, that bundle lives in three folders:

  • source-records/ for the original PDF and page images.

  • artifacts/ for the HTML, CSS, generated PDF, rendered output pages, manifest, hash, and reproduction record.

  • docs/ for the evidence log, user guide, and AI-agent runbook.

The first technical step is source inspection. This tells you whether the PDF is already accessible or whether it is just a scanned container:

fullbleed --json-only inspect pdf source-records\escambia-cfn-2026039980-original.pdf > source-records\escambia-cfn-2026039980-original-inspect.json

For this record, the inspection output showed metadata_present: false, lang_present: false, struct_tree_root_present: false, mark_info_present: false, and embedded_font_count: 0. A local stdlib-only probe also checked the decoded page content streams for text-showing operators:

python tools\pdf_text_operator_probe.py source-records\escambia-cfn-2026039980-original.pdf --out source-records\escambia-cfn-2026039980-original-text-probe.json

For the source scan, that probe reported text_objects: 0, text_show_operators: 0, and candidate_string_bytes: 0. The same probe on the CAV output reported text-showing operators on every generated page.
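The probe script itself is not reproduced in this post. As a rough illustration of the idea, here is a minimal stdlib-only sketch that inflates Flate-compressed streams and counts BT (begin text) and Tj/TJ (show text) operators. The function names are hypothetical, and the approach is deliberately naive: it does not parse object dictionaries, does not handle other stream filters, and skips any stream that is neither Flate data nor plain ASCII, so treat it as a sanity check rather than a replacement for the real tool.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# BT opens a text object; Tj and TJ are text-showing operators.
BEGIN_TEXT_RE = re.compile(rb"(?<!\S)BT(?!\S)")
SHOW_TEXT_RE = re.compile(rb"(?<=[\s)\]>])T[jJ](?!\S)")


def decoded_streams(pdf_bytes):
    """Yield stream bodies, inflating the ones that are Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data. Keep it only if it looks like an uncompressed
            # content stream; binary image data (JPEG, CCITT) is skipped.
            if raw.isascii():
                yield raw


def probe(pdf_bytes):
    """Count text objects and text-showing operators across all streams."""
    counts = {"text_objects": 0, "text_show_operators": 0}
    for body in decoded_streams(pdf_bytes):
        counts["text_objects"] += len(BEGIN_TEXT_RE.findall(body))
        counts["text_show_operators"] += len(SHOW_TEXT_RE.findall(body))
    return counts
```

Calling probe(open(path, "rb").read()) on a scan-only file should return zero for both counters, while a tagged, text-bearing PDF should return non-zero counts.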

Then render the scanned pages to PNG so a human reviewer, OCR process, or AI vision pass can inspect what is actually on the page:

python tools\render_pdf_pages.py source-records\escambia-cfn-2026039980-original.pdf source-records\rendered-pages --zoom 2.5

The remediation source is ordinary semantic HTML and CSS. That is intentional. Humans can review it, agents can edit it, and Fullbleed can render it deterministically. The CAV HTML uses document-level structure instead of visual-only positioning: h1, h2, h3, paragraphs, lists, and tables with header cells. It also carries machine-readable hooks such as data-fb-cav-family, data-fb-cav-profile, data-fb-source-page, and data-fb-a11y-signature-status. Each source page gets a visual facsimile section and a matching accessible transcript or field-equivalent section. Ambiguous scanned fields are not guessed; they are explicitly marked for human review.
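To make the facsimile-plus-transcript pairing concrete, here is a small sketch that emits one such pair for a source page. The data-fb-source-page and data-fb-a11y-signature-status attribute names come from the CAV HTML described above; the function name, the class names, the heading wording, and the placement of the attributes on section elements are assumptions for illustration and may not match the actual artifact.

```python
from html import escape


def source_page_sections(page, image_src, transcript, signature_status=None):
    """Return a facsimile section followed by its transcript section.

    `transcript` is a list of paragraph strings. `signature_status` is an
    optional status token; the exact vocabulary is assumed here.
    """
    number = int(page)
    page_attr = f'data-fb-source-page="{number}"'
    status_attr = ""
    if signature_status is not None:
        status_value = escape(signature_status, quote=True)
        status_attr = f' data-fb-a11y-signature-status="{status_value}"'
    # Escape transcript text so characters such as "&" stay valid HTML.
    paragraphs = "\n".join(f"  <p>{escape(text)}</p>" for text in transcript)
    return (
        f'<section class="facsimile" {page_attr}>\n'
        f"  <h2>Source page {number}: visual facsimile</h2>\n"
        f'  <img src="{escape(image_src, quote=True)}" '
        f'alt="Scanned image of source page {number}. '
        f'The accessible transcript follows.">\n'
        f"</section>\n"
        f'<section class="transcript" {page_attr}{status_attr}>\n'
        f"  <h2>Source page {number}: accessible transcript</h2>\n"
        f"{paragraphs}\n"
        f"</section>\n"
    )
```

Generating both sections from one call keeps the page number in sync between the image and its text equivalent, which is the property the parity map depends on.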

For the component path, the same semantics are generated from reusable Python components:

python components\escambia_cfn_2026039980_component_cav.py

That command emits:

  • artifacts/component-cav/escambia-cfn-2026039980-component-cav.html

  • artifacts/component-cav/escambia-cfn-2026039980-component-cav.css

  • artifacts/component-cav/escambia-cfn-2026039980-component-cav.pdf

  • artifacts/component-cav/escambia-cfn-2026039980-component-cav_run_report.json

  • reading-order, PDF-structure, pagination, typography, and region-alignment traces

This matters commercially because it shows where the accessibility semantics come from. The component tree produces HTML elements that screen readers understand, and Fullbleed carries those semantics into PDF tagging and verification artifacts.

The render command asks Fullbleed for a PDF/UA-oriented output, declares document language and title metadata, embeds a font, emits PNG pages, writes a compiler manifest, writes a deterministic hash, and records reproduction metadata:

fullbleed render `
  --html artifacts\escambia-cfn-2026039980-cav.html `
  --css artifacts\escambia-cfn-2026039980-cav.css `
  --asset <path-to-NotoSans-Regular.ttf> `
  --asset-kind font `
  --pdf-profile pdfua1 `
  --document-lang en-US `
  --document-title "Conforming Alternate Version - Escambia County CFN 2026039980" `
  --profile preflight `
  --emit-image artifacts\cav-pages `
  --emit-manifest artifacts\escambia-cfn-2026039980-cav-manifest.json `
  --deterministic-hash artifacts\escambia-cfn-2026039980-cav.sha256 `
  --repro-record artifacts\escambia-cfn-2026039980-cav-repro.json `
  --out artifacts\escambia-cfn-2026039980-cav.pdf
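For teams driving the render from a script rather than a PowerShell prompt, the same invocation can be assembled as an argument list. This is a sketch only: the flags are exactly the ones shown in the command above, while the helper names and the assumption that every artifact shares one file stem under one folder are ours.

```python
import subprocess
from pathlib import Path


def build_render_args(stem, font_path, title, lang="en-US", artifacts="artifacts"):
    """Build the fullbleed render argument list for one CAV document."""
    out = Path(artifacts)
    return [
        "fullbleed", "render",
        "--html", str(out / f"{stem}.html"),
        "--css", str(out / f"{stem}.css"),
        "--asset", str(font_path),
        "--asset-kind", "font",
        "--pdf-profile", "pdfua1",
        "--document-lang", lang,
        "--document-title", title,
        "--profile", "preflight",
        "--emit-image", str(out / "cav-pages"),
        "--emit-manifest", str(out / f"{stem}-manifest.json"),
        "--deterministic-hash", str(out / f"{stem}.sha256"),
        "--repro-record", str(out / f"{stem}-repro.json"),
        "--out", str(out / f"{stem}.pdf"),
    ]


def render(args):
    """Run the command; check=True raises if fullbleed exits non-zero."""
    subprocess.run(args, check=True)
```

Passing a list instead of a shell string avoids quoting problems with the document title, which contains spaces and a hyphen.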

After rendering, inspect the output and compare it to the source failure state:

fullbleed --json-only inspect pdf artifacts\escambia-cfn-2026039980-cav.pdf > artifacts\escambia-cfn-2026039980-cav-inspect.json

Then run the same text-operator probe against the generated PDF:

python tools\pdf_text_operator_probe.py artifacts\escambia-cfn-2026039980-cav.pdf --out artifacts\escambia-cfn-2026039980-cav-text-probe.json

The generated CAV-style PDF passes the practical gates we care about for this demonstration: PDF/UA claim present, metadata present, language present, structure tree present, marked content present, embedded font present, and extractable text present on every generated page.
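Those gates are easy to check mechanically. The sketch below uses the five field names quoted earlier from the inspection output and assumes they sit at the top level of the JSON; the real report may nest them differently, and the PDF/UA claim and extractable-text gates are left out because their key names are not shown in this post.

```python
import json

# Boolean fields that must be true for the output to pass.
REQUIRED_TRUE = (
    "metadata_present",
    "lang_present",
    "struct_tree_root_present",
    "mark_info_present",
)


def failed_gates(report):
    """Return the names of inspection gates that the report does not pass."""
    failures = [key for key in REQUIRED_TRUE if report.get(key) is not True]
    # At least one embedded font is required.
    if int(report.get("embedded_font_count") or 0) < 1:
        failures.append("embedded_font_count")
    return failures


def failed_gates_from_file(path):
    """Load an inspection JSON file and evaluate the same gates."""
    with open(path, encoding="utf-8") as handle:
        return failed_gates(json.load(handle))
```

An empty list means every checked gate passed. Run against the source scan's values, the same function would list all five fields, which makes the before-and-after comparison explicit.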

For an AI agent, the loop is straightforward:

  1. Read the source inspection JSON and determine what is missing.

  2. Render page images from the source PDF.

  3. Draft or update semantic HTML, or update the component payload/source, preserving original page order and pairing each source-page facsimile with its transcript or field equivalent.

  4. Run Fullbleed render with strict metadata and asset inputs.

  5. For component CAVs, run the component bundle to validate the component tree and emit AccessibilityEngine reports.

  6. Inspect the generated PDF.

  7. Open the generated PNG pages and look for clipped text, overlap, awkward page breaks, and missing content.

  8. Patch the HTML/CSS or component source and rerun until the inspection gates and visual review both pass.

That loop is the product story. Fullbleed gives the agent the same controls a careful human production engineer wants: a deterministic input, a rendered PDF, visual page artifacts, inspection JSON, and a reproducible hash.

Step-by-Step Guide: Create a CAV-Style Accessible PDF for a Public Record

This guide is written for a records, web, accessibility, or compliance team that needs a practical workflow for scan-only public PDFs.

1. Pick a Public Record

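Use a publicly available source and record the exact search path. For this demo, the portal, search path, clerk file number, document ID, and book/page are the ones listed under The Test Record.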

Prefer records that minimize unnecessary exposure of personal information. This demo used an entity-to-entity deed record rather than a marriage license, mortgage, or individual lien.

2. Save the Original

Preserve the original PDF before changing anything.

Example artifact:

source-records\escambia-cfn-2026039980-original.pdf

The original remains the official source of truth. The CAV-style accessible version is an alternative, not a certified replacement.

3. Inspect the Source

Run Fullbleed inspection:

fullbleed --json-only inspect pdf source-records\escambia-cfn-2026039980-original.pdf > source-records\escambia-cfn-2026039980-original-inspect.json

For the demo record, inspection found no language metadata, no structure tree, no marked content, no embedded fonts, and no accessibility profile. A local text-operator probe found zero PDF text objects and zero text-showing operators in the source page content streams.

python tools\pdf_text_operator_probe.py source-records\escambia-cfn-2026039980-original.pdf --out source-records\escambia-cfn-2026039980-original-text-probe.json

4. Recover the Record Information

Render page images and review them carefully.

python tools\render_pdf_pages.py source-records\escambia-cfn-2026039980-original.pdf source-records\rendered-pages --zoom 2.5

In this workspace, page images are stored at:

source-records\rendered-pages\

Capture the information users need: record number, parties, dates, legal description, fee fields, signature descriptions, notary details, and form fields.

5. Mark Uncertain Fields

Do not guess. If a checkbox, handwritten note, seal, or signature is unclear, mark it for review.

Example:

County maintenance field: Checkbox or selection state is not clear from the scan and needs human review against the original.
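One way to keep "do not guess" enforceable is to make the review state part of the data model instead of free text. The sketch below is hypothetical and not taken from the Fullbleed components: the class names are ours, and the three status tokens mirror the present, missing, and review-required states mentioned earlier, with the exact spelling assumed.

```python
from dataclasses import dataclass
from enum import Enum


class MarkStatus(Enum):
    PRESENT = "present"
    MISSING = "missing"
    REVIEW_REQUIRED = "review-required"


@dataclass(frozen=True)
class FieldEquivalent:
    label: str
    status: MarkStatus
    value: str = ""
    note: str = ""

    def display_text(self):
        """Text shown in the transcript; uncertain fields never get a value."""
        if self.status is MarkStatus.REVIEW_REQUIRED:
            reason = self.note or "Needs human review against the original."
            return f"{self.label}: {reason}"
        if self.status is MarkStatus.MISSING:
            return f"{self.label}: Not present on the scan."
        return f"{self.label}: {self.value}"


def needs_review(fields):
    """Return the fields a human reviewer still has to resolve."""
    return [f for f in fields if f.status is MarkStatus.REVIEW_REQUIRED]
```

Because a review-required field ignores its value when rendered, a guessed value cannot leak into the published transcript by accident.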

6. Choose an Authoring Path

For a one-off remediation, build semantic HTML directly. Use headings, paragraphs, lists, and tables with headers. Avoid putting all content into positioned boxes or images. Preserve source-page order and pair each scanned page facsimile with a matching transcript or field equivalent. Add stable CAV hooks such as profile/family attributes, source-page markers, and signature-status attributes so later tools can audit what each section represents.

Example artifact:

artifacts\escambia-cfn-2026039980-cav.html
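Because the source-page markers are plain attributes, a later audit needs nothing beyond the standard library. The following sketch counts elements carrying data-fb-source-page and reports any source page that lacks the expected pair of facsimile and equivalent. The rule that each page needs at least two marked elements is an assumption about how the markers are applied, and the names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class SourcePageCounter(HTMLParser):
    """Counts elements that carry a data-fb-source-page attribute."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-fb-source-page" and value:
                self.counts[value.strip()] += 1


def pages_missing_parity(html_text, page_count, required=2):
    """Return source page numbers with fewer than `required` marked elements."""
    parser = SourcePageCounter()
    parser.feed(html_text)
    parser.close()
    return [
        page
        for page in range(1, page_count + 1)
        if parser.counts[str(page)] < required
    ]
```

For the five-page deed, an empty result from pages_missing_parity(html_text, 5) would indicate that every source page has both of its sections.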

For repeatable record families, use Fullbleed accessibility components. The component source in this demo is:

components\escambia_cfn_2026039980_component_cav.py

It uses reusable Document, Region, Heading, FieldGrid, FieldItem, SemanticTable, and signature-state helpers to generate the accessible HTML that drives screen readers and PDF tagging.

7. Render With Fullbleed

Render a PDF/UA-oriented output with document language, title metadata, embedded fonts, page images, a manifest, and reproducibility evidence.

fullbleed render `
  --html artifacts\escambia-cfn-2026039980-cav.html `
  --css artifacts\escambia-cfn-2026039980-cav.css `
  --asset <path-to-NotoSans-Regular.ttf> `
  --asset-kind font `
  --pdf-profile pdfua1 `
  --document-lang en-US `
  --document-title "Conforming Alternate Version - Escambia County CFN 2026039980" `
  --profile preflight `
  --emit-image artifacts\cav-pages `
  --emit-manifest artifacts\escambia-cfn-2026039980-cav-manifest.json `
  --deterministic-hash artifacts\escambia-cfn-2026039980-cav.sha256 `
  --repro-record artifacts\escambia-cfn-2026039980-cav-repro.json `
  --out artifacts\escambia-cfn-2026039980-cav.pdf

To run the component-based bundle:

python components\escambia_cfn_2026039980_component_cav.py

That writes a PDF, emitted HTML/CSS, component validation JSON, a run report, page previews, and accessibility traces under:

artifacts\component-cav\

8. Verify the Output

Inspect the generated PDF:

fullbleed --json-only inspect pdf artifacts\escambia-cfn-2026039980-cav.pdf > artifacts\escambia-cfn-2026039980-cav-inspect.json

Check for:

  • Metadata present.

  • Language present.

  • Structure tree present.

  • Marked content present.

  • Embedded fonts present.

  • Extractable text present.

  • Rendered page images with no overlapping or clipped text.

Run the text-operator probe against the generated PDF as a second check:

python tools\pdf_text_operator_probe.py artifacts\escambia-cfn-2026039980-cav.pdf --out artifacts\escambia-cfn-2026039980-cav-text-probe.json

For the component CAV, inspect and probe the component-generated PDF:

fullbleed --json-only inspect pdf artifacts\component-cav\escambia-cfn-2026039980-component-cav.pdf > artifacts\component-cav\escambia-cfn-2026039980-component-cav-inspect.json
python tools\pdf_text_operator_probe.py artifacts\component-cav\escambia-cfn-2026039980-component-cav.pdf --out artifacts\component-cav\escambia-cfn-2026039980-component-cav-text-probe.json

9. Publish Responsibly

Publish the CAV or accessible alternative next to the source scan. Label it clearly:

Accessible alternative for reading and navigation. Not a certified copy. Refer to the official recorded instrument for legal reliance.

10. Keep the Evidence

Keep the source, HTML, CSS, inspect JSON, rendered PNG pages, hash, and reproduction record together. That is what makes the remediation auditable.
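A simple way to show that the bundle has not drifted is to record a SHA-256 digest for every file in it. This sketch is separate from Fullbleed's own --deterministic-hash output and is not part of the demo. The three folder names are the ones used in this case study; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_digests(root, folders=("source-records", "artifacts", "docs")):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            continue  # A missing folder is skipped, not treated as an error.
        for path in sorted(base.rglob("*")):
            if path.is_file():
                digests[path.relative_to(root).as_posix()] = sha256_file(path)
    return digests
```

Writing the resulting mapping to a JSON file alongside the bundle gives a reviewer a quick way to confirm that the source scan and the generated artifacts are the same ones that were inspected.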
