Access PDF Text Layer Elements in React with react-pdf

When using react-pdf to render PDF files in a React application, a common challenge is accessing elements such as the text layer. The element often shows up when you expand the console.log() output, yet reading it from the collection returned by document.getElementsByClassName() gives undefined.

In this article, we’ll break down why this happens and how to fix it, with a quick workaround, working examples, and more robust alternatives.


Problem Summary

You’ve likely tried something like this in your code:

var element = document.getElementsByClassName("react-pdf__Page__textContent textLayer");
console.log("page element: ", element);

You might see the element printed in the console, but element[0] is undefined because the collection is empty at the moment your code runs.

Why This Happens

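react-pdf renders the text layer asynchronously: it is built only after the page has loaded and its text content has been fetched from pdf.js. By the time the Document's onLoadSuccess callback fires, the element with the classes react-pdf__Page__textContent and textLayer typically does not exist yet.

The console output is misleading because getElementsByClassName() returns a live HTMLCollection, and browser devtools typically read a logged object's contents only when you expand it. By then the text layer has rendered, so the expanded collection shows the element even though it was empty when your code ran.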


Recommended Solution: Wait for Render Completion

The key to accessing the text layer is waiting until it has been fully rendered in the DOM. The simplest way to do this, without setting up a MutationObserver, is to add a short delay after the Document's onLoadSuccess event before querying the DOM.

✅ Working Example

import { Document, Page } from 'react-pdf';
import { useState } from 'react';

const PdfViewer = ({ pdfUrl }) => {
  const [numPages, setNumPages] = useState(null);
  const [pageNumber, setPageNumber] = useState(1);

  const onDocumentLoadSuccess = ({ numPages }) => {
    setNumPages(numPages);

    // Delay to allow textLayer to render
    setTimeout(() => {
      const elements = document.getElementsByClassName("react-pdf__Page__textContent textLayer");

      if (elements.length > 0) {
        const firstTextLayer = elements[0];
        console.log("✅ TextLayer found:", firstTextLayer);
        // You can now interact with the text layer
      } else {
        console.warn("⚠️ TextLayer not found yet.");
      }
    }, 1000); // Adjust timeout if needed
  };

  return (
    <Document file={pdfUrl} onLoadSuccess={onDocumentLoadSuccess}>
      <Page
        pageNumber={pageNumber}
        className="pdf-page"
        renderTextLayer={true}
        loading={<div>Loading your PDF...</div>}
      />
    </Document>
  );
};
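The fixed 1000 ms delay in the example is a guess: it can be too short for a large page on a slow device and longer than necessary on a fast one. A minimal sketch of an alternative polls for the element until it appears or a timeout expires. `waitForElement`, `intervalMs`, and `timeoutMs` are hypothetical names chosen for illustration, not part of react-pdf.

```javascript
// Hypothetical helper (not part of react-pdf): call `query` repeatedly until it
// returns a truthy value, then resolve with that value. Rejects if nothing is
// found within `timeoutMs`.
function waitForElement(query, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  const start = Date.now();

  return new Promise((resolve, reject) => {
    const check = () => {
      const result = query();
      if (result) {
        resolve(result); // found: stop polling
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error(`Element not found within ${timeoutMs} ms`));
      } else {
        setTimeout(check, intervalMs); // not rendered yet: try again shortly
      }
    };
    check();
  });
}

// Browser usage: could replace the setTimeout block inside onDocumentLoadSuccess.
if (typeof document !== 'undefined') {
  waitForElement(() => document.querySelector('.react-pdf__Page__textContent.textLayer'))
    .then((textLayer) => console.log('TextLayer found:', textLayer))
    .catch((err) => console.warn(err.message));
}
```

Polling keeps the wait as short as the actual render time while still giving up cleanly if the text layer never appears, for example when renderTextLayer is false.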

Important Notes

1. Use Correct Class Names

Ensure you’re using the correct combination of class names for the textLayer:

document.getElementsByClassName("react-pdf__Page__textContent textLayer");

or, to get the first match directly (querySelector returns null if there is none):

document.querySelector('.react-pdf__Page__textContent.textLayer');

Note that getElementsByTagName("react-pdf__Page__textContent textLayer") is incorrect—those are class names, not tag names.

2. Consider Asynchronous Rendering

Because react-pdf is built on pdf.js, each page is rendered in separate layers:

  • Canvas rendering
  • Text layer rendering (used for search/select/copy)

That means the text layer can appear after the PDF has loaded, and it is not guaranteed to be ready when the canvas is. How long it takes depends on the page's content and the device.


Alternatives to setTimeout

If you need a more robust solution in production environments:

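  • Use MutationObserver to watch for DOM changes and react as soon as the text layer is inserted.
  • Attach a ref to the page container (<Page /> accepts an inputRef prop) and query inside it, so you find this page's text layer rather than the first one in the whole document.
  • Use the onRenderTextLayerSuccess callback on <Page />, which fires once the text layer has rendered. onRenderSuccess fires when the page canvas has rendered and does not guarantee that the text layer is in the DOM. Check which callbacks your version of react-pdf supports.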
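The MutationObserver approach listed above can be sketched as a small promise-based helper. This is an illustration under assumptions: `waitForSelector` is a hypothetical name, and the observer class is a parameter only so the helper can be exercised outside a browser; in the browser the default (the global MutationObserver) is used.

```javascript
// Hypothetical helper (not part of react-pdf): resolve with the first element
// under `root` that matches `selector`, waiting with a MutationObserver if it
// has not been rendered yet. Rejects after `timeoutMs`.
function waitForSelector(root, selector, timeoutMs = 5000, Observer = globalThis.MutationObserver) {
  return new Promise((resolve, reject) => {
    // Fast path: the text layer may already be in the DOM.
    const existing = root.querySelector(selector);
    if (existing) {
      resolve(existing);
      return;
    }

    let timer;
    // Re-run the query whenever child nodes change anywhere under `root`.
    const observer = new Observer(() => {
      const found = root.querySelector(selector);
      if (found) {
        clearTimeout(timer);
        observer.disconnect();
        resolve(found);
      }
    });
    observer.observe(root, { childList: true, subtree: true });

    // Stop watching if the element never appears (e.g. renderTextLayer={false}).
    timer = setTimeout(() => {
      observer.disconnect();
      reject(new Error(`Timed out waiting for ${selector}`));
    }, timeoutMs);
  });
}

// Browser usage: wait for the first react-pdf text layer on the page.
if (typeof document !== 'undefined') {
  waitForSelector(document, '.react-pdf__Page__textContent.textLayer')
    .then((textLayer) => console.log('TextLayer found:', textLayer))
    .catch((err) => console.warn(err.message));
}
```

Passing the element from a ref instead of document as `root` limits the search, and the observation, to a single page.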

Conclusion

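Accessing the text layer in react-pdf requires awareness of rendering delays. A short setTimeout after the PDF loads is the quickest way to reach the textLayer without resorting to more complex DOM observers, but the delay is only an estimate: keep the empty-collection check and increase the delay if the text layer is not found.

This workaround is simple and easy to integrate into most React applications; for production code, the alternatives above are more dependable.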
