video699.video.annotated

This module implements reading a sample of a video from a dataset with XML human annotations, together with related classes.

Module Contents

video699.video.annotated.LOGGER
video699.video.annotated.RESOURCES_PATHNAME
video699.video.annotated.DATASET_PATHNAME
video699.video.annotated.DOCUMENT_ANNOTATIONS
video699.video.annotated.FRAME_ANNOTATIONS
video699.video.annotated.VIDEO_ANNOTATIONS
video699.video.annotated.VIDEOS
video699.video.annotated.PAGES
video699.video.annotated.SCREENS
video699.video.annotated.URI_REGEX
video699.video.annotated._init_dataset()

Reads human annotations from an XML dataset, converts them into objects and sorts them.
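The read-convert-sort step can be pictured with a minimal sketch built on the standard library's xml.etree.ElementTree. The XML layout, the element name frame, the attribute names number and filename, and the function read_frame_annotations are all hypothetical, because this reference does not describe the dataset schema. The sketch only shows parsing annotations into plain tuples and sorting them by the one-based frame number.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: the real dataset schema is not described in this
# reference, so the element and attribute names below are assumptions.
EXAMPLE_XML = """
<video uri="https://example.org/lecture.mp4">
  <frame filename="frame003.png" number="3"/>
  <frame filename="frame001.png" number="1"/>
  <frame filename="frame002.png" number="2"/>
</video>
"""


def read_frame_annotations(xml_text):
    """Parse <frame> elements into (number, filename) pairs sorted by frame number."""
    root = ET.fromstring(xml_text)
    frames = [
        (int(element.get("number")), element.get("filename"))
        for element in root.iter("frame")
    ]
    # Sort by the one-based frame number, mirroring the "sorts them" step.
    return sorted(frames, key=lambda pair: pair[0])


if __name__ == "__main__":
    for number, filename in read_frame_annotations(EXAMPLE_XML):
        print(number, filename)
```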

video699.video.annotated.get_videos()

Returns all videos from an XML dataset.

Returns: videos – A map between video file URIs and all videos from an XML dataset.
Return type: dict of (str, AnnotatedSampledVideo)
class video699.video.annotated.VGG256Features(imagenet, imagenet_and_places2)

Bases: object

Two feature vectors obtained from the 256-dimensional last hidden layers of [VGG] ConvNets.

[VGG] Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.
Parameters:
  • imagenet (array_like) – A 256-dimensional feature vector obtained from a network trained on the ImageNet dataset.
  • imagenet_and_places2 (array_like) – A 256-dimensional feature vector obtained from a network trained on the ImageNet and Places2 datasets.
imagenet

np.ndarray – A 256-dimensional feature vector obtained from a network trained on the ImageNet dataset, stored as a NumPy array of 64-bit floats.

imagenet_and_places2

np.ndarray – A 256-dimensional feature vector obtained from a network trained on the ImageNet and Places2 datasets, stored as a NumPy array of 64-bit floats.
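A minimal sketch of how such a pair of vectors might be held and compared. FeaturePair is a hypothetical stand-in for VGG256Features: it uses float tuples instead of NumPy arrays so that it depends only on the standard library. Cosine similarity is an illustrative choice; this reference does not say how the vectors are compared.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FeaturePair:
    """Hypothetical stand-in for VGG256Features holding two 256-D vectors."""
    imagenet: Sequence[float]
    imagenet_and_places2: Sequence[float]

    def __post_init__(self) -> None:
        for name in ("imagenet", "imagenet_and_places2"):
            # Python floats are IEEE 754 doubles, i.e. 64-bit floats.
            vector = tuple(float(value) for value in getattr(self, name))
            if len(vector) != 256:
                raise ValueError(f"{name} must have 256 dimensions, got {len(vector)}")
            object.__setattr__(self, name, vector)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equally long vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, a page and a screen could be compared with cosine_similarity(page.imagenet, screen.imagenet), where both arguments are hypothetical FeaturePair instances.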

class video699.video.annotated._DocumentAnnotations(filename, pages)

Bases: object

Human annotations associated with a single document.

Parameters:
  • filename (str) – The filename of the corresponding PDF document. The filename is unique in the video.
  • pages (dict of (str, _PageAnnotations)) – A map between page keys and human annotations associated with the pages of the document.
filename

str – The filename of the corresponding PDF document. The filename is unique in the video.

pages

dict of (str, _PageAnnotations) – A map between page keys and human annotations associated with the pages of the document.

class video699.video.annotated._PageAnnotations(key, number, filename, vgg256)

Bases: object

Human annotations associated with a single page of a document.

Parameters:
  • key (str) – An identifier of a page in a document. The identifier is unique in the video associated with the document.
  • number (int) – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.
  • filename (str) – The filename of the corresponding document page image. The filename is unique in the video associated with the document.
  • vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.
key

str – An identifier of a page in a document. The identifier is unique in the video associated with the document.

number

int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.

filename

str – The filename of the corresponding document page image. The filename is unique in the video associated with the document.

vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.

class video699.video.annotated.AnnotatedSampledVideoDocumentPage(document, key)

Bases: video699.interface.PageABC

A single page of a document extracted from a dataset with XML human annotations.

Parameters:
  • document (AnnotatedSampledVideoDocument) – The document containing the page.
  • key (str) – A page identifier. The identifier is unique in the video associated with the document.
document

DocumentABC – The document containing the page.

image

array_like – The image data of the page as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) denotes the weight of a pixel. Fully transparent pixels, i.e. pixels with zero alpha, SHOULD be completely disregarded in subsequent computation. Any margins added to the image data, e.g. by keeping the aspect ratio of the page, MUST be fully transparent.
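To illustrate what disregarding fully transparent pixels means in practice, here is a minimal sketch that computes an alpha-weighted mean intensity over RGBA pixels given as plain tuples. The function name and the use of the plain channel average as grayscale are assumptions for illustration; the real code operates on OpenCV matrices.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A), each in 0..255


def alpha_weighted_mean_intensity(pixels: Iterable[Pixel]) -> float:
    """Mean grayscale intensity where each pixel is weighted by its alpha.

    Fully transparent pixels (alpha == 0), such as aspect-ratio margins,
    are skipped and therefore do not influence the result.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, g, b, a in pixels:
        if a == 0:
            continue  # disregard fully transparent pixels entirely
        intensity = (r + g + b) / 3.0  # assumption: plain channel average
        weighted_sum += a * intensity
        total_weight += a
    if total_weight == 0:
        raise ValueError("image contains no pixels with non-zero alpha")
    return weighted_sum / total_weight
```

A transparent margin pixel such as (0, 0, 0, 0) leaves the result unchanged, which is exactly why margins must be fully transparent.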

number

int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.

filename

str – The filename of the corresponding document page image. The filename is unique in the video associated with the document.

pathname

str – The full pathname of the corresponding document page image. The pathname is unique in the video associated with the document.

key

str – A page identifier. The identifier is unique in the video associated with the document.

vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.

__hash__(self)
class video699.video.annotated.AnnotatedSampledVideoDocument(video, filename)

Bases: video699.interface.DocumentABC

A sequence of images forming a document extracted from a dataset with XML human annotations.

Parameters:
  • video (AnnotatedSampledVideo) – The video associated with this document.
  • filename (str) – The filename of the corresponding PDF document. The filename is unique in the video.
video

AnnotatedSampledVideo – The video associated with this document.

filename

str – The filename of the corresponding PDF document. The filename is unique in the video.

pathname

str – The full pathname of the corresponding PDF document. The pathname is unique in the video.

title

str or None – The title of a document.

author

str or None – The author of a document.

uri

str – An IRI, as defined in RFC 3987, that uniquely identifies the document over the entire lifetime of a program.

Raises: ValueError – If the document contains no pages.
__iter__(self)
__hash__(self)
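A minimal sketch of a document that rejects an empty page map, as the Raises entry states, and iterates its pages. PageRecord and SketchDocument are hypothetical simplified stand-ins, and ordering the iteration by one-based page number is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical, simplified stand-in for _PageAnnotations."""
    key: str
    number: int  # one-based position of the page in the document
    filename: str


class SketchDocument:
    """Rejects empty documents; iterates pages by ascending page number (assumed)."""

    def __init__(self, filename: str, pages: Dict[str, PageRecord]) -> None:
        if not pages:
            raise ValueError(f"document {filename!r} contains no pages")
        self.filename = filename
        self._pages = sorted(pages.values(), key=lambda page: page.number)

    def __iter__(self) -> Iterator[PageRecord]:
        return iter(self._pages)
```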
class video699.video.annotated._FrameAnnotations(filename, number, screens, vgg256)

Bases: object

Human annotations associated with a single frame of a video.

Parameters:
  • filename (str) – The filename of the corresponding video frame image. The filename is unique in the video.
  • number (int) – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.
  • screens (list of _ScreenAnnotations) – A list of human annotations associated with the lit projection screens in the frame.
  • vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.
filename

str – The filename of the corresponding video frame image. The filename is unique in the video.

number

int – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.

screens

list of _ScreenAnnotations – A list of human annotations associated with the lit projection screens in the frame.

vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.

class video699.video.annotated.AnnotatedSampledVideoFrame(video, number)

Bases: video699.interface.FrameABC

A frame of a video extracted from a dataset with XML human annotations.

Parameters:
  • video (VideoABC) – The video containing the frame.
  • number (int) – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.
video

VideoABC – The video containing the frame.

number

int – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.

filename

str – The filename of the corresponding video frame image. The filename is unique in the video.

pathname

str – The full pathname of the corresponding video frame image. The pathname is unique in the video.

image

ndarray – The image data of the frame as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) is currently unused and all pixels are fully opaque, i.e. they have the maximum alpha of 255.

width

int – The width of the image data.

height

int – The height of the image data.

datetime

aware datetime – The date and time at which the frame was captured.

vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.
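A frame's capture time could plausibly be derived from the capture time of its video, the framerate, and the one-based frame number. This minimal sketch assumes a constant framerate and that frame 1 is captured exactly when the video starts. The function frame_capture_time is hypothetical, and this reference does not state how datetime is actually computed.

```python
from datetime import datetime, timedelta, timezone


def frame_capture_time(video_start: datetime, fps: float, number: int) -> datetime:
    """Capture time of a one-based frame number, assuming a constant framerate."""
    if video_start.tzinfo is None:
        raise ValueError("video_start must be an aware datetime")
    if number < 1:
        raise ValueError("frame numbers are one-based")
    # Frame 1 starts the video; each later frame adds 1/fps seconds.
    return video_start + timedelta(seconds=(number - 1) / fps)


if __name__ == "__main__":
    start = datetime(2018, 1, 1, 10, 0, tzinfo=timezone.utc)
    print(frame_capture_time(start, 25, 26))
```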

class video699.video.annotated._VideoAnnotations(uri, dirname, datetime, fps, num_frames, width, height)

Bases: object

Human annotations associated with a single video.

Parameters:
  • uri (str) – The URI of the video file. The URI is unique in the dataset.
  • dirname (str) – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.
  • datetime (aware datetime) – The date and time at which the video was captured.
  • num_frames (int) – The total number of frames in the original video file.
  • fps (scalar) – The framerate of the video in frames per second.
  • width (int) – The width of the video.
  • height (int) – The height of the video.
uri

str – The URI of the video file. The URI is unique in the dataset.

dirname

str – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

datetime

aware datetime – The date and time at which the video was captured.

num_frames

int – The total number of frames in the original video file.

fps

scalar – The framerate of the video in frames per second.

width

int – The width of the video.

height

int – The height of the video.

class video699.video.annotated.AnnotatedSampledVideo(uri)

Bases: video699.interface.VideoABC, collections.abc.Sized

A sample of a video file extracted from a dataset with XML human annotations.

Notes

It is possible to repeatedly iterate over all video frames.

Parameters: uri (str) – The URI of the video file. The URI is unique in the dataset.
dirname

str – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

pathname

str – The full pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

filename

str – The filename of the video file.

num_frames

int – The total number of frames in the original video file.

fps

scalar – The framerate of the video in frames per second.

width

int – The width of the video.

height

int – The height of the video.

duration

timedelta – The elapsed time since the beginning of the video.

datetime

aware datetime – The date and time at which the video was captured.

documents

dict of (str, AnnotatedSampledVideoDocument) – A map between PDF document filenames and the documents associated with the video.

uri

str – The URI of the video file. The URI is unique in the dataset.

__iter__(self)
__len__(self)

Produces the number of video frames.

Returns: length – The number of video frames.
Return type: int
class video699.video.annotated._KeyRefAnnotations(key, similarity)

Bases: object

Human annotations describing a document page shown in a lit projection screen.

Parameters:
  • key (str) – An identifier of a page in a document. The identifier is unique in the video associated with the document.
  • similarity (str) –

    The similarity between what is shown in the projection screen and the document page. The following values are legal:

    • full specifies that there is a 1:1 correspondence between what is shown in the projection screen and the document page.
    • incremental specifies that in a document attached to the ancestor video, a single logical page is split across multiple physical pages and incrementally uncovered; the slide and the frame correspond to the same logical page, but not the same physical page.
key

str – An identifier of a page in a document. The identifier is unique in the video associated with the document.

similarity

str – The similarity between what is shown in the projection screen and the document page. The following values are legal:

  • full specifies that there is a 1:1 correspondence between what is shown in the projection screen and the document page.
  • incremental specifies that in a document attached to the ancestor video, a single logical page is split across multiple physical pages and incrementally uncovered; the slide and the frame correspond to the same logical page, but not the same physical page.
class video699.video.annotated._ScreenAnnotations(coordinates, condition, keyrefs, vgg256)

Bases: object

Human annotations associated with a single lit projection screen in a frame of a video.

Parameters:
  • coordinates (ConvexQuadrangleABC) – A map between frame and screen coordinates.
  • condition (str) –

    The condition of what is being shown in the screen. The following values are legal:

    • pristine specifies that there is no significant degradation beyond photon noise.
    • windowed specifies that a slide is being shown, but the slide does not cover the full screen.
    • obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.
  • keyrefs (dict of (str, _KeyRefAnnotations)) – A map between document page keys and human annotations specifying the relationship between the projection screen and the document pages.
  • vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.
coordinates

ConvexQuadrangleABC – A map between frame and screen coordinates.

condition

str – The condition of what is being shown in the screen. The following values are legal:

  • pristine specifies that there is no significant degradation beyond photon noise.
  • windowed specifies that a slide is being shown, but the slide does not cover the full screen.
  • obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.
keyrefs

dict of (str, _KeyRefAnnotations) – A map between document page keys and human annotations specifying the relationship between the projection screen and the document pages.

vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.

class video699.video.annotated.AnnotatedSampledVideoScreen(frame, screen_index)

Bases: video699.interface.ScreenABC

A projection screen extracted from XML human annotations.

Parameters:
  • frame (FrameABC) – A video frame containing the projection screen.
  • screen_index (int) – The index of the projection screen in the human annotations for the video frame. Screen indexing is zero-based, i.e. the first screen in the human annotations has index 0.
frame

FrameABC – A video frame containing the projection screen.

coordinates

ConvexQuadrangleABC – A map between frame and screen coordinates.

condition

str – The condition of what is being shown in the screen. The following values are legal:

  • pristine specifies that there is no significant degradation beyond photon noise.
  • windowed specifies that a slide is being shown, but the slide does not cover the full screen.
  • obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.
vgg256

VGG256Features – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.

matching_pages(self)

Returns iterables of document pages related to the screen \(s\) based on human annotations.

Note

When a projection screen \(s\) shows a document page \(p\), we say that \(s\) fully matches \(p\) and we write \(s\approx p\).

When a single logical document page is split across several physical document pages, one of which is \(p\), and a projection screen \(s\) shows the same logical page as \(p\), we say that \(s\) incrementally matches \(p\) and we write \(s\sim p\).

We say that \(s\) matches \(p\) the closest if and only if \(s\approx p\lor (\nexists p'(s\approx p') \land s \sim p)\).

Returns:
  • full_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that fully match \(s\).
  • incremental_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that incrementally match \(s\).
  • closest_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that match \(s\) the closest.
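The definition of the closest match reduces to a simple rule: if any page fully matches \(s\), the closest matches are exactly the full matches; otherwise they are the incremental matches. The minimal sketch below applies this rule to a map from page keys to the similarity values full and incremental. The function classify_matches is hypothetical and returns page keys rather than page objects.

```python
from typing import Dict, List, Tuple


def classify_matches(keyrefs: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Split page keys by similarity and derive the closest matches.

    keyrefs maps a page key to its similarity, either "full" or "incremental".
    Returns (full_matches, incremental_matches, closest_matches) as key lists.
    """
    unknown = set(keyrefs.values()) - {"full", "incremental"}
    if unknown:
        raise ValueError(f"illegal similarity values: {sorted(unknown)}")
    full = [key for key, similarity in keyrefs.items() if similarity == "full"]
    incremental = [key for key, similarity in keyrefs.items() if similarity == "incremental"]
    # s matches p the closest iff s fully matches p, or s fully matches
    # no page at all and s incrementally matches p.
    closest = full if full else incremental
    return full, incremental, closest
```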
class video699.video.annotated.AnnotatedSampledVideoPageDetector

Bases: video699.interface.PageDetectorABC

A page detector that maps each video screen to its closest matching document pages using XML human annotations.

detect(self, frame, appeared_screens, existing_screens, disappeared_screens)
class video699.video.annotated.AnnotatedSampledVideoScreenDetector(conditions=('pristine', 'windowed', 'obstacle'), beyond_bounds=True)

Bases: video699.interface.ScreenDetectorABC

A screen detector that maps an annotated video frame to screens using XML human annotations.

Parameters:
  • conditions (iterable of str, optional) –

    A set of admissible conditions of a screen. The following condition strings are legal:

    • pristine specifies that there is no significant degradation beyond photon noise.
    • windowed specifies that a slide is being shown, but the slide does not cover the full screen.
    • obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.

    Screens with inadmissible conditions will not be detected. When unspecified, all conditions are admissible.

  • beyond_bounds (bool, optional) – Whether a screen may extend beyond the bounds of a video frame. When unspecified, a screen may extend beyond the bounds.
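The two parameters act as filters on the annotated screens. The minimal sketch below represents each screen as a condition string plus four corner points in frame coordinates. The names filter_screens and within_frame are hypothetical, and dropping out-of-bounds screens when beyond_bounds is False (rather than clipping them) is an assumption about the detector's behavior. Checking only the corners suffices for a convex quadrangle: if all four corners lie inside the convex frame rectangle, so does the whole quadrangle.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Screen = Tuple[str, Sequence[Point]]  # (condition, corner points in frame coordinates)

LEGAL_CONDITIONS = frozenset({"pristine", "windowed", "obstacle"})


def within_frame(corners: Sequence[Point], width: int, height: int) -> bool:
    """True if every corner lies inside the rectangle [0, width] x [0, height]."""
    return all(0 <= x <= width and 0 <= y <= height for x, y in corners)


def filter_screens(
    screens: Sequence[Screen],
    width: int,
    height: int,
    conditions: Sequence[str] = ("pristine", "windowed", "obstacle"),
    beyond_bounds: bool = True,
) -> List[Screen]:
    """Keep screens whose condition is admissible and, optionally, that fit in the frame."""
    admissible = set(conditions)
    unknown = admissible - LEGAL_CONDITIONS
    if unknown:
        raise ValueError(f"illegal conditions: {sorted(unknown)}")
    kept = []
    for condition, corners in screens:
        if condition not in admissible:
            continue  # screens with inadmissible conditions are not detected
        if not beyond_bounds and not within_frame(corners, width, height):
            continue  # assumption: out-of-bounds screens are dropped, not clipped
        kept.append((condition, corners))
    return kept
```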
detect(self, frame)
video699.video.annotated.evaluate_event_detector(annotated_video, event_detector)

Processes a video using a screen event detector and counts successful trials.

A video file is processed using a screen event detector. When an annotated video frame is encountered, a trial takes place. A trial is successful if and only if:

  1. the intersection of detected pages and the pages that match a pristine screen is non-empty for all pristine screens with matching pages, and
  2. the number of additional detected pages is less than or equal to the number of pages that match the non-pristine screens the closest according to the human annotations.
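The two success criteria of a single trial can be sketched as a predicate over sets of page keys. The function is_successful_trial is hypothetical, and two readings are assumptions not confirmed by this reference: "additional detected pages" are taken to be the detected pages outside every pristine screen's matching pages, and the pages that match the non-pristine screens the closest are counted as distinct pages.

```python
from typing import Iterable, Set


def is_successful_trial(
    detected: Set[str],
    pristine_matches: Iterable[Set[str]],
    non_pristine_closest: Iterable[Set[str]],
) -> bool:
    """Decide whether one trial succeeds, given page keys per screen.

    detected: pages reported by the event detector for the annotated frame.
    pristine_matches: for each pristine screen, the pages that match it.
    non_pristine_closest: for each non-pristine screen, its closest matches.
    """
    # Only pristine screens that have matching pages take part in criterion 1.
    pristine_matches = [matches for matches in pristine_matches if matches]
    # Criterion 1: each such screen shares at least one page with the detections.
    if any(not (detected & matches) for matches in pristine_matches):
        return False
    # Assumption: "additional" pages are detections outside all pristine matches.
    additional = detected - set().union(*pristine_matches)
    # Criterion 2: bounded by the distinct closest matches of non-pristine screens.
    allowed = set().union(*non_pristine_closest)
    return len(additional) <= len(allowed)
```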
Parameters:
Returns:

  • num_successes (int) – The number of successful trials.
  • num_trials (int) – The number of trials.