A hidden data problem: What your PDF may be saying…without you even knowing.

4 Jun

A little while ago, I was reviewing a publicly available data sharing agreement (a PDF) as part of some research we were carrying out here at Privacy Path. I read through the document carefully. It looked clean, professional, and complete. Names were consistent throughout, the parties were correctly identified, and nothing appeared out of place. Then, almost as an afterthought, I ran a simple Ctrl+F search for a name I had been curious about. It returned four hits: four references to an entirely separate third-party organisation, one whose name appeared nowhere in the visible text of the document. The document had been telling a story I could not see just by reading it.

That sent me down a rabbit hole. What I found is a surprisingly common and underappreciated data protection risk, one that affects organisations of all sizes, in both the public and private sectors. And the consequences, in a GDPR context, can be more serious than most people realise.

How Does Hidden Data End Up in a Published PDF?

The answer usually lies in Word documents being converted to PDF without first being cleaned. When Track Changes is switched on, Microsoft Word records every edit made to a document: insertions, deletions, and comments, each tagged with the name of its author. Every Word file also carries metadata such as author and last-modified-by fields. This functionality is enormously useful during drafting but becomes a liability at the point of publication. When a document is saved as PDF with tracked changes still unaccepted, comments unresolved, or metadata unchecked, that underlying information can be carried into the exported file.
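To make "underlying information" concrete: a .docx file is a ZIP archive of XML parts. Tracked insertions and deletions are stored as w:ins and w:del elements (deleted text sits in w:delText), comments live in word/comments.xml, and the author and last-modified-by fields sit in docProps/core.xml. The sketch below uses only Python's standard library to list those items for a given file. The function name inspect_docx is a hypothetical illustration, not part of any existing tool, and it is no substitute for Word's Document Inspector, which also covers hidden text, headers, footers and other properties.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the Office Open XML standard.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
DC = "{http://purl.org/dc/elements/1.1/}"
CP = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"


def inspect_docx(path):
    """Hypothetical helper: list tracked changes, comments and author fields in a .docx."""
    report = {"insertions": [], "deletions": [], "comments": [], "properties": {}}
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
        body = ET.fromstring(z.read("word/document.xml"))
        # Tracked insertions carry a w:author attribute; inserted text is in <w:t>.
        # Entries with empty text can appear for changed paragraph marks.
        for ins in body.iter(W + "ins"):
            text = "".join(t.text or "" for t in ins.iter(W + "t"))
            report["insertions"].append((ins.get(W + "author"), text))
        # Tracked deletions keep the removed wording in <w:delText>.
        for dele in body.iter(W + "del"):
            text = "".join(t.text or "" for t in dele.iter(W + "delText"))
            report["deletions"].append((dele.get(W + "author"), text))
        # Comments are stored in a separate part, present only if comments exist.
        if "word/comments.xml" in names:
            comments = ET.fromstring(z.read("word/comments.xml"))
            for c in comments.iter(W + "comment"):
                report["comments"].append((c.get(W + "author"), "".join(c.itertext())))
        # Core document properties: original author and last editor.
        if "docProps/core.xml" in names:
            core = ET.fromstring(z.read("docProps/core.xml"))
            for tag, label in ((DC + "creator", "creator"), (CP + "lastModifiedBy", "lastModifiedBy")):
                el = core.find(tag)
                if el is not None and el.text:
                    report["properties"][label] = el.text
    return report
```

For example, print(inspect_docx("agreement.docx")) would show each tracked change and comment alongside the name of whoever made it, which is exactly the kind of detail that should be resolved before export.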

In the case I encountered, the document had clearly been produced from a template originally drafted for a different organisation. The visible text had been updated: names changed, references corrected, parties substituted. On screen, the document looked clean. But somewhere in the revision history, the earlier organisation's name had survived, invisible in the rendered PDF but fully searchable in the underlying file structure. Anyone with a few seconds and a search bar could find it. The organisation named had no idea it was referenced in a published document, and the organisation that published it almost certainly had no idea either.

This is not an isolated incident. It is a structural risk that arises whenever template-based documents are converted to PDF without a proper document inspection step, and it is far more widespread than most organisations appreciate.

Why This Matters Under GDPR

GDPR's accountability principle (Article 5(2)) requires organisations to be able to demonstrate that personal data is processed in a controlled, transparent, and purposeful way. Inadvertent disclosure of third-party names, internal commentary, or draft processing details through hidden content embedded in a PDF undermines each of those requirements. Where the embedded content relates to an identifiable individual (an author name in the document properties, a comment attributing a decision to a named person, or a tracked change identifying who drafted a clause), it constitutes personal data. Its unintended publication may amount to a personal data breach, which under Article 33 GDPR must be notified to the Data Protection Commission without undue delay and, where feasible, within 72 hours, unless it is unlikely to result in a risk to individuals' rights and freedoms.

Beyond the regulatory risk, there is a straightforward reputational dimension. Clients, counterparties, and regulators who discover that your published documents contain hidden references to other organisations or internal drafting notes are unlikely to draw a charitable conclusion. Document hygiene is a basic operational control, and its absence signals wider weaknesses in data governance.

Dos and Don'ts: Document Publication Checklist

DO

DO Use Word's built-in Document Inspector (File → Info → Check for Issues → Inspect Document) before converting any document to PDF. It will surface tracked changes, comments, hidden text, and personal data in document properties.
DO Accept or reject all tracked changes before exporting. Resolving every change ensures nothing from the revision history is carried into the PDF.
DO Remove all comments and author annotations before publication, particularly where comments reference third parties, clients, or internal decision-making.
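DO Clear document metadata (author name, company name, and last-modified-by fields) before publishing externally. These fields can be reviewed under File → Info → Properties and removed using the Document Inspector's "Document Properties and Personal Information" option.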
DO Use print-to-PDF or a dedicated PDF export with metadata stripping enabled, rather than simply saving as PDF, for documents intended for public or third-party distribution.
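DO Conduct a post-export search of the PDF using Ctrl+F to verify that no legacy references, client names, or internal notes are discoverable in the published file. Check the PDF's own document properties too (in Adobe Acrobat Reader, File → Properties), since a text search will not show them.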
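For teams that publish many documents, the metadata step can also be scripted. The following is a minimal sketch, assuming .docx input and Python's standard library only: it copies the archive and blanks the dc:creator (author) and cp:lastModifiedBy fields in docProps/core.xml. clear_docx_metadata is a hypothetical helper, not an existing tool. It does not touch the company name (stored separately in docProps/app.xml), tracked changes, or comments, so the Document Inspector and the other steps above still apply.

```python
import zipfile
import xml.etree.ElementTree as ET

# Core-properties namespaces; registering them keeps the usual prefixes on output.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Assumed policy: the identifying fields to blank. Adjust to your own requirements.
CORE_FIELDS = ["dc:creator", "cp:lastModifiedBy"]


def _blank_core(xml_bytes):
    """Empty the selected child elements of docProps/core.xml."""
    root = ET.fromstring(xml_bytes)
    for path in CORE_FIELDS:
        for el in root.findall(path, NS):
            el.text = ""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def clear_docx_metadata(src, dst):
    """Hypothetical helper: copy a .docx from src to dst with author fields blanked."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        # Copy every part in its original order, rewriting only the core properties.
        for name in zin.namelist():
            data = zin.read(name)
            if name == "docProps/core.xml":
                data = _blank_core(data)
            zout.writestr(name, data)
```

Writing to a new file rather than editing in place keeps the original draft intact, so the cleaned copy can be checked before it is converted to PDF.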

DON'T

DON'T Assume that invisible text is inaccessible. Text hidden through formatting, colour matching, or tracked deletions may still be searchable in the exported PDF.
DON'T Publish template-derived documents without checking for residual references to the original template client or context.
DON'T Treat document inspection as a one-off task. Any time a document is updated and re-exported, the inspection process should be repeated from scratch.
DON'T Overlook third-party names embedded in earlier drafts. Even where the party named is an organisation rather than an individual, and so not a data subject, disclosure of its involvement in a transaction or processing arrangement may have legal and reputational consequences.
DON'T Rely on recipients not looking. PDF search tools are standard, and regulators conducting document reviews will use them.
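The points above about invisible text and template residue can be partly checked before export. A minimal sketch, assuming the source is a .docx and using Python's standard library: it searches the text and attribute values of every XML part (body, deleted text, comments, headers, footers, relationship files) for a watch-list of names, so a change author recorded only as an attribute is caught as well. find_residual_terms and the example watch-list are hypothetical. Because this inspects the Word file rather than the exported PDF, it complements rather than replaces the post-export search.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_residual_terms(path, terms):
    """Hypothetical helper: map each XML part of a .docx to the watch-list terms it contains."""
    hits = {}
    with zipfile.ZipFile(path) as z:
        for name in z.namelist():
            # Only XML and relationship parts hold text; skip images and other media.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                root = ET.fromstring(z.read(name))
            except ET.ParseError:
                continue
            # Join text nodes so a name split across several runs is still matched,
            # then add attribute values (e.g. w:author on tracked changes).
            chunks = ["".join(root.itertext())]
            chunks += [v for el in root.iter() for v in el.attrib.values()]
            haystack = "\n".join(chunks).casefold()
            found = [t for t in terms if t.casefold() in haystack]
            if found:
                hits[name] = found
    return hits
```

A call such as find_residual_terms("agreement.docx", ["Acme Holdings", "Old Client Ltd"]) returns an empty dictionary when none of the listed names survive anywhere in the file.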

Need a document compliance review?

At Privacy Path, we regularly review contracts, policies, and published documents for exactly these kinds of hidden compliance risks. If you'd like us to cast an eye over your document templates or publication processes, get in touch with our team.

Contact us: privacy@privacypath.ie | privacypath.ie

Maeve Dunne
