video699.document.image_file

This module implements a document represented by image files containing the individual pages.

Module Contents

class video699.document.image_file.ImageFileDocumentPage(document, number, image_pathname)

Bases: video699.interface.PageABC

A document page represented by a NumPy matrix containing image data.

Parameters:
  • document (DocumentABC) – The document containing the page.
  • number (int) – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.
  • image_pathname (str) – The pathname of the image file containing the document page.
document

DocumentABC – The document containing the page.

number

int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.

image

array_like – The image data of the page as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) denotes the weight of a pixel. Fully transparent pixels, i.e. pixels with zero alpha, SHOULD be completely disregarded in subsequent computation. Any margins added to the image data, e.g. to preserve the aspect ratio of the page, MUST be fully transparent.
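The RGBA contract above can be illustrated with a minimal NumPy sketch. It assumes the page image was already decoded into an H x W x 3 uint8 array in OpenCV's default BGR channel order; the helpers `bgr_to_rgba`, `pad_to_size`, and `weighted_mean_intensity` are hypothetical and not part of video699. Scaling is omitted, so the margins here come only from centering the unscaled page on a larger canvas.

```python
import numpy as np


def bgr_to_rgba(bgr):
    """Hypothetical helper: H x W x 3 BGR uint8 -> H x W x 4 RGBA uint8, fully opaque."""
    rgb = bgr[:, :, ::-1]  # reverse the channel order: BGR -> RGB
    alpha = np.full(bgr.shape[:2] + (1,), 255, dtype=np.uint8)
    return np.concatenate([rgb, alpha], axis=2)


def pad_to_size(rgba, width, height):
    """Hypothetical helper: center an RGBA page on a width x height canvas.

    The added margins are all zeros, so their alpha is 0 (fully transparent).
    """
    h, w = rgba.shape[:2]
    if h > height or w > width:
        raise ValueError('image is larger than the target canvas')
    canvas = np.zeros((height, width, 4), dtype=np.uint8)
    top = (height - h) // 2
    left = (width - w) // 2
    canvas[top:top + h, left:left + w] = rgba
    return canvas


def weighted_mean_intensity(rgba):
    """Mean RGB intensity weighted by alpha; transparent pixels contribute nothing."""
    weights = rgba[:, :, 3].astype(np.float64)
    if weights.sum() == 0:
        raise ValueError('image is fully transparent')
    intensity = rgba[:, :, :3].astype(np.float64).mean(axis=2)
    return float((intensity * weights).sum() / weights.sum())
```

Because the margins carry zero alpha, the alpha-weighted mean of the padded image equals that of the original page, which is consistent with the requirement that transparent pixels be disregarded.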

__hash__(self)
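For illustration only, the page contract can be sketched as a small hashable class. The name `PageSketch` is hypothetical, and so is the hashing rule: the reference lists `__hash__` without describing it, so this sketch assumes a page is identified by its document and its one-based page number.

```python
class PageSketch:
    """Hypothetical stand-in for ImageFileDocumentPage; not video699's code."""

    def __init__(self, document, number, image_pathname):
        # Page indexing is one-based, so 0 and negative numbers are rejected.
        if number < 1:
            raise ValueError('page numbers are one-based')
        self.document = document
        self.number = number
        self.image_pathname = image_pathname

    def __eq__(self, other):
        # Assumption: equality ignores the pathname and compares position only.
        if not isinstance(other, PageSketch):
            return NotImplemented
        return self.document == other.document and self.number == other.number

    def __hash__(self):
        # Must agree with __eq__: equal pages produce equal hashes.
        return hash((self.document, self.number))
```

Under this assumption, two pages with the same document and number collapse to a single set element even if their pathnames differ; whether video699 behaves this way is not confirmed by the reference above.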
class video699.document.image_file.ImageFileDocument(image_pathnames, title=None, author=None)

Bases: video699.interface.DocumentABC

A document that consists of pages represented by NumPy matrices containing image data.

Parameters:
  • image_pathnames (iterable of str) – The pathnames of the image files containing the individual pages in the document.
  • title (str or None, optional) – The title of the document. None when unspecified.
  • author (str or None, optional) – The author of the document. None when unspecified.
title

str or None – The title of the document.

author

str or None – The author of the document.

uri

str – An IRI, as defined in RFC 3987, that uniquely identifies the document over the entire lifetime of the program.

Raises: ValueError – If no pathnames to image files containing document pages were provided.
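The constructor contract (an arbitrary iterable of pathnames, one-based page numbers, and a ValueError for empty input) can be sketched as follows. `DocumentSketch` is a hypothetical stand-in that yields `(number, pathname)` pairs instead of real page objects; it is not the library's implementation.

```python
class DocumentSketch:
    """Hypothetical stand-in for ImageFileDocument; not video699's code."""

    def __init__(self, image_pathnames, title=None, author=None):
        # Materialize the iterable first: a generator cannot be checked for
        # emptiness without consuming it.
        pathnames = list(image_pathnames)
        if not pathnames:
            raise ValueError(
                'no pathnames to image files containing document pages were provided')
        self.title = title
        self.author = author
        # Page numbers are one-based, so enumeration starts at 1.
        self._pages = [(number, pathname)
                       for number, pathname in enumerate(pathnames, start=1)]

    def __iter__(self):
        # Iterating over the document yields its pages in order.
        return iter(self._pages)
```

Storing the pages in a list also lets the document be iterated more than once, which a one-shot generator argument would not allow on its own.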
_num_documents = 0
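The class attribute `_num_documents` suggests a per-process counter. One plausible way to mint an IRI that stays unique for the lifetime of the program is to combine such a counter with a fixed prefix. This is an assumption: the sketch below uses the hypothetical class `UriSketch` and the reserved example namespace `urn:example:`, not video699's actual IRI scheme.

```python
class UriSketch:
    """Hypothetical illustration of counter-based IRIs; not video699's scheme."""

    _num_documents = 0  # shared by all instances for the lifetime of the process

    def __init__(self):
        # Increment the class-level counter so every instance gets a new number.
        UriSketch._num_documents += 1
        self.uri = 'urn:example:image-file-document:{}'.format(
            UriSketch._num_documents)
```

The increment is a plain read-modify-write, so it should be guarded with a `threading.Lock` if documents may be created from several threads.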
__iter__(self)
__hash__(self)