video699.document.pdf

This module implements reading a document from a PDF document file.

Module Contents

video699.document.pdf.LOGGER
video699.document.pdf.CONFIGURATION
video699.document.pdf.LRU_CACHE_MAXSIZE
class video699.document.pdf.PDFDocumentPage(document, page)

Bases: video699.interface.PageABC

A page of a PDF document read from a PDF document file.

Parameters:
  • document (DocumentABC) – The document containing the page.
  • page (fitz.Page) – The internal representation of the page by the PyMuPDF library.
document

DocumentABC – The document containing the page.

image

array_like – The image data of the page as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) denotes the weight of a pixel. Fully transparent pixels, i.e. pixels with zero alpha, SHOULD be completely disregarded in subsequent computation. Any margins added to the image data, e.g. by keeping the aspect ratio of the page, MUST be fully transparent.
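
As an illustration of treating alpha as a pixel weight, the following sketch computes a weighted mean color over a flat sequence of RGBA pixels. The function name `weighted_mean_color` and the plain-tuple pixel representation are assumptions made for this example; they are not part of the module.

```python
def weighted_mean_color(pixels):
    """Return the alpha-weighted mean (R, G, B) of an iterable of RGBA tuples.

    Pixels with zero alpha contribute nothing to the result. If every pixel
    is fully transparent, there is no meaningful color and None is returned.
    """
    pixels = list(pixels)
    total_weight = sum(alpha for _, _, _, alpha in pixels)
    if total_weight == 0:
        return None
    return tuple(
        sum(pixel[channel] * pixel[3] for pixel in pixels) / total_weight
        for channel in range(3)
    )


# A fully opaque red pixel and a fully transparent blue pixel:
# the transparent pixel is disregarded entirely.
mean = weighted_mean_color([(255, 0, 0, 255), (0, 0, 255, 0)])
```

With image data held in a NumPy array, the same idea would typically be expressed with array operations on the alpha channel rather than a Python loop.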

number

int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.
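
The one-based convention can be sketched as follows. The helper names `number_pages` and `page_index` are hypothetical and only illustrate the mapping between a zero-based sequence position and a page number.

```python
def number_pages(pages):
    """Pair each page with its one-based page number."""
    return list(enumerate(pages, start=1))


def page_index(number):
    """Convert a one-based page number to a zero-based sequence index."""
    if number < 1:
        raise ValueError('Page numbers start at 1, got {}'.format(number))
    return number - 1


numbered = number_pages(['cover', 'contents', 'chapter'])
```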

render(self, width=None, height=None)
__hash__(self)
class video699.document.pdf.PDFDocument(pathname)

Bases: video699.interface.DocumentABC

A PDF document read from a PDF document file.

Note

A document file is opened as soon as the class is instantiated, and closed only after the finalization of the object.
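
The lifetime described in this note can be sketched with `weakref.finalize` from the standard library. The class `FileBackedDocument` below is a hypothetical stand-in and not the actual implementation of `PDFDocument`; it only shows one way to open a file at instantiation and close it at finalization.

```python
import weakref


class FileBackedDocument:
    """Hypothetical stand-in that holds a file open for its whole lifetime."""

    def __init__(self, pathname):
        self.pathname = pathname
        # The file is opened as soon as the object is created.
        self._file = open(pathname, 'rb')
        # The callback is bound to the file, not to self, so registering it
        # does not keep the document object alive.
        self._finalizer = weakref.finalize(self, self._file.close)
```

A practical consequence is that the number of simultaneously open files grows with the number of live document objects, so references to documents that are no longer needed should be dropped.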

Parameters:
  • pathname (str) – The pathname of a PDF document file.
title

str – The title of a document.

author

str – The author of a document.

pathname

str – The pathname of a PDF document file.

uri

str – An IRI, as defined in RFC 3987, that uniquely identifies the document over the entire lifetime of a program.
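
How the identifier is actually constructed is not stated here. As one possible approach, and purely as an assumption, a stable identifier could be derived from the resolved pathname. The helper `pathname_to_uri` is hypothetical.

```python
import os
from pathlib import Path


def pathname_to_uri(pathname):
    """Derive a file URI from a pathname (an assumed scheme, for illustration).

    The pathname is resolved to an absolute, symlink-free form first, so that
    different spellings of the same location yield the same identifier.
    """
    return Path(os.path.realpath(pathname)).as_uri()


uri = pathname_to_uri('slides.pdf')
```

Note that an identifier built this way is unique per file rather than per object, which may or may not match the guarantee the attribute is meant to provide.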

Raises: ValueError – If the pathname does not specify a PDF document file or if the PDF document contains no pages.
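
Since construction can raise ValueError, callers that open many files may want to collect failures instead of aborting. The sketch below assumes a hypothetical helper `open_documents`; the document class is passed in as a parameter so that the example does not depend on the package being installed.

```python
def open_documents(pathnames, document_class):
    """Instantiate document_class for each pathname, collecting rejections.

    Returns a pair (documents, rejected), where rejected maps each pathname
    that raised ValueError to the corresponding error message.
    """
    documents = []
    rejected = {}
    for pathname in pathnames:
        try:
            documents.append(document_class(pathname))
        except ValueError as error:
            rejected[pathname] = str(error)
    return documents, rejected
```

With the package installed, this would be called as `open_documents(pathnames, PDFDocument)` after `from video699.document.pdf import PDFDocument`.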
__iter__(self)
__hash__(self)