`video699.video.annotated`

This module implements reading a sample of a video from a dataset with XML human annotations, together with related classes.

Module Contents

video699.video.annotated.LOGGER

video699.video.annotated.RESOURCES_PATHNAME

video699.video.annotated.DATASET_PATHNAME

video699.video.annotated.DOCUMENT_ANNOTATIONS

video699.video.annotated.FRAME_ANNOTATIONS

video699.video.annotated.VIDEO_ANNOTATIONS

video699.video.annotated.VIDEOS

video699.video.annotated.PAGES

video699.video.annotated.SCREENS

video699.video.annotated.URI_REGEX

video699.video.annotated._init_dataset()

Reads human annotations from an XML dataset, converts them into objects, and sorts them.
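The read, convert, and sort steps can be illustrated with the standard library's `xml.etree.ElementTree`. This is only a sketch: the XML layout, the `FrameAnnotation` class, and the `read_frames` function are hypothetical and invented for illustration, and the actual dataset schema and the conversions performed by `_init_dataset` may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML layout, invented for this sketch only.
EXAMPLE_XML = """<video uri="https://example.org/lecture.mp4">
  <frame number="3" filename="frame003.png"/>
  <frame number="1" filename="frame001.png"/>
  <frame number="2" filename="frame002.png"/>
</video>"""


@dataclass(frozen=True)
class FrameAnnotation:
    number: int  # one-based frame number
    filename: str


def read_frames(xml_text):
    """Parse <frame> elements into objects and return them sorted by frame number."""
    root = ET.fromstring(xml_text)
    frames = [
        FrameAnnotation(int(element.get("number")), element.get("filename"))
        for element in root.iter("frame")
    ]
    return sorted(frames, key=lambda frame: frame.number)


frames = read_frames(EXAMPLE_XML)
print([frame.number for frame in frames])  # [1, 2, 3]
```

Sorting once after parsing means later lookups can rely on frame order without re-sorting.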

video699.video.annotated.get_videos()

Returns all videos from an XML dataset.

Returns:	videos – A map between video file URIs and all videos from an XML dataset.
Return type:	dict of (str, AnnotatedSampledVideo)

class video699.video.annotated.VGG256Features(imagenet, imagenet_and_places2)

Bases: object

Two feature vectors obtained from the 256-dimensional last hidden layers of [VGG] ConvNets.

[VGG]

Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.

Parameters:

imagenet (array_like) – A 256-dimensional feature vector obtained from a network trained on the ImageNet dataset.
imagenet_and_places2 (array_like) – A 256-dimensional feature vector obtained from a network trained on the ImageNet and Places2 datasets.

imagenet: ndarray – A 256-dimensional feature vector obtained from a network trained on the ImageNet dataset. The feature vector is stored in a NumPy array of 64-bit floats.

imagenet_and_places2: ndarray – A 256-dimensional feature vector obtained from a network trained on the ImageNet and Places2 datasets. The feature vector is stored in a NumPy array of 64-bit floats.
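The attribute descriptions imply a conversion step: each input is turned into an array of 64-bit floats that should hold 256 elements. The following dependency-free sketch uses the standard library's `array('d', ...)` in place of NumPy; with NumPy, `np.asarray(values, dtype=np.float64)` would be the analogous conversion. The class `VGG256FeaturesSketch` and its length check are assumptions, not the module's implementation.

```python
from array import array

FEATURE_DIMENSION = 256


class VGG256FeaturesSketch:
    """Hypothetical stand-in storing two 256-dimensional feature vectors."""

    def __init__(self, imagenet, imagenet_and_places2):
        self.imagenet = self._convert(imagenet)
        self.imagenet_and_places2 = self._convert(imagenet_and_places2)

    @staticmethod
    def _convert(values):
        # 'd' is a C double, which is a 64-bit float on common platforms.
        vector = array("d", values)
        if len(vector) != FEATURE_DIMENSION:
            raise ValueError(f"expected {FEATURE_DIMENSION} values, got {len(vector)}")
        return vector


features = VGG256FeaturesSketch(range(256), [0.5] * 256)
print(len(features.imagenet), features.imagenet[2])  # 256 2.0
```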

class video699.video.annotated._DocumentAnnotations(filename, pages)

Bases: object

Human annotations associated with a single document.

Parameters:

filename (str) – The filename of the corresponding PDF document. The filename is unique in the video.
pages (dict of (str, _PageAnnotations)) – A map between page keys and human annotations associated with the pages of the document.

filename: str – The filename of the corresponding PDF document. The filename is unique in the video.

pages: dict of (str, _PageAnnotations) – A map between page keys and human annotations associated with the pages of the document.

class video699.video.annotated._PageAnnotations(key, number, filename, vgg256)

Bases: object

Human annotations associated with a single page of a document.

Parameters:

key (str) – An identifier of a page in a document. The identifier is unique in the video associated with the document.
number (int) – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.
filename (str) – The filename of the corresponding document page image. The filename is unique in the video associated with the document.
vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.

key: str – An identifier of a page in a document. The identifier is unique in the video associated with the document.

number: int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.

filename: str – The filename of the corresponding document page image. The filename is unique in the video associated with the document.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.
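The relationship between document and page annotations can be sketched as follows: pages are stored in a dict keyed by their unique page key and can be recovered in document order through their one-based page numbers. `PageAnnotationsSketch`, `DocumentAnnotationsSketch`, and their methods are hypothetical simplifications (the VGG features are omitted), not the module's classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageAnnotationsSketch:
    key: str       # unique within the video
    number: int    # one-based position of the page in the document
    filename: str


@dataclass
class DocumentAnnotationsSketch:
    filename: str
    pages: dict = field(default_factory=dict)  # page key -> PageAnnotationsSketch

    def add_page(self, page):
        if page.key in self.pages:
            raise ValueError(f"duplicate page key: {page.key}")
        self.pages[page.key] = page

    def pages_in_order(self):
        """Return the pages ordered by their one-based page number."""
        return sorted(self.pages.values(), key=lambda page: page.number)


document = DocumentAnnotationsSketch("slides.pdf")
document.add_page(PageAnnotationsSketch("slides-2", 2, "slides-2.png"))
document.add_page(PageAnnotationsSketch("slides-1", 1, "slides-1.png"))
print([page.key for page in document.pages_in_order()])  # ['slides-1', 'slides-2']
```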

class video699.video.annotated.AnnotatedSampledVideoDocumentPage(document, key)

Bases: video699.interface.PageABC

A single page of a document extracted from a dataset with XML human annotations.

Parameters:

document (AnnotatedSampledVideoDocument) – The document containing the page.
key (str) – A page identifier. The identifier is unique in the video associated with the document.

document: DocumentABC – The document containing the page.

image: array_like – The image data of the page as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) denotes the weight of a pixel. Fully transparent pixels, i.e. pixels with zero alpha, SHOULD be completely disregarded in subsequent computation. Any margins added to the image data, e.g. by keeping the aspect ratio of the page, MUST be fully transparent.

number: int – The page number, i.e. the position of the page in the document. Page indexing is one-based, i.e. the first page has number 1.

filename: str – The filename of the corresponding document page image. The filename is unique in the video associated with the document.

pathname: str – The full pathname of the corresponding document page image. The pathname is unique in the video associated with the document.

key: str – A page identifier. The identifier is unique in the video associated with the document.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the page image data into VGG ConvNets.
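To illustrate how the alpha channel of a page image can act as a per-pixel weight, the following sketch computes an alpha-weighted mean intensity in which fully transparent pixels, such as margins added to keep the aspect ratio of the page, contribute nothing. The function `weighted_mean_intensity` is hypothetical, and pixels are plain (R, G, B, A) tuples rather than an OpenCV matrix to keep the example dependency-free.

```python
def weighted_mean_intensity(pixels):
    """Alpha-weighted mean of (R + G + B) / 3 over an iterable of RGBA tuples."""
    total_weight = 0
    weighted_sum = 0.0
    for red, green, blue, alpha in pixels:
        if alpha == 0:
            continue  # fully transparent: disregard the pixel completely
        intensity = (red + green + blue) / 3
        weighted_sum += alpha * intensity
        total_weight += alpha
    if total_weight == 0:
        raise ValueError("all pixels are fully transparent")
    return weighted_sum / total_weight


# An opaque white pixel, a black pixel with alpha 51, and a transparent margin pixel.
pixels = [(255, 255, 255, 255), (0, 0, 0, 51), (0, 0, 0, 0)]
print(weighted_mean_intensity(pixels))  # 212.5
```

Because the margin pixel has zero alpha, adding or removing margins leaves the result unchanged.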


__hash__(self)

class video699.video.annotated.AnnotatedSampledVideoDocument(video, filename)

Bases: video699.interface.DocumentABC

A sequence of images forming a document extracted from a dataset with XML human annotations.

Parameters:

video (AnnotatedSampledVideo) – The video associated with this document.
filename (str) – The filename of the corresponding PDF document. The filename is unique in the video.

video: AnnotatedSampledVideo – The video associated with this document.

filename: str – The filename of the corresponding PDF document. The filename is unique in the video.

pathname: str – The full pathname of the corresponding PDF document. The pathname is unique in the video.

title: str or None – The title of the document.

author: str or None – The author of the document.

uri: str – An IRI, as defined in RFC 3987, that uniquely identifies the document over the entire lifetime of a program.

Raises:	`ValueError` – If the document contains no pages.
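A minimal sketch of the documented `ValueError`, assuming the pages are available as an iterable when the document is constructed. `DocumentSketch` is hypothetical; it stores the pages in a list so that iterating over the document does not exhaust it.

```python
class DocumentSketch:
    """Hypothetical iterable document that refuses to be empty."""

    def __init__(self, filename, pages):
        pages = list(pages)  # materialize once so iteration can be repeated
        if not pages:
            raise ValueError(f"document {filename!r} contains no pages")
        self.filename = filename
        self._pages = pages

    def __iter__(self):
        return iter(self._pages)


document = DocumentSketch("slides.pdf", ["page-1", "page-2"])
print(list(document))  # ['page-1', 'page-2']
```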


__iter__(self)

__hash__(self)

class video699.video.annotated._FrameAnnotations(filename, number, screens, vgg256)

Bases: object

Human annotations associated with a single frame of a video.

Parameters:

filename (str) – The filename of the corresponding video frame image. The filename is unique in the video.
number (int) – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.
screens (list of _ScreenAnnotations) – A list of human annotations associated with the lit projection screens in the frame.
vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.

filename: str – The filename of the corresponding video frame image. The filename is unique in the video.

number: int – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.

screens: list of _ScreenAnnotations – A list of human annotations associated with the lit projection screens in the frame.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.

class video699.video.annotated.AnnotatedSampledVideoFrame(video, number)

Bases: video699.interface.FrameABC

A frame of a video extracted from a dataset with XML human annotations.

Parameters:

video (VideoABC) – The video containing the frame.
number (int) – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.

video: VideoABC – The video containing the frame.

number: int – The frame number, i.e. the position of the frame in the video. Frame indexing is one-based, i.e. the first frame has number 1. The frame number is unique in the video.

filename: str – The filename of the corresponding video frame image. The filename is unique in the video.

pathname: str – The full pathname of the corresponding video frame image. The pathname is unique in the video.

image: ndarray – The image data of the frame as an OpenCV CV_8UC4 RGBA matrix, where the alpha channel (A) is currently unused and all pixels are fully opaque, i.e. they have the maximum alpha of 255.

width: int – The width of the image data.

height: int – The height of the image data.

datetime: aware datetime – The date and time at which the frame was captured.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the frame image data into VGG ConvNets.
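A frame image is described as an RGBA matrix whose alpha is always the maximum of 255. The following sketch shows that conversion on a tiny frame stored as nested lists of (R, G, B) tuples, a stand-in chosen to keep the example dependency-free; `to_opaque_rgba` is a hypothetical helper. On a decoded OpenCV frame, which is in BGR order, `cv2.cvtColor(image, cv2.COLOR_BGR2RGBA)` should perform the equivalent conversion.

```python
def to_opaque_rgba(rgb_rows):
    """Append the maximum alpha of 255 to every RGB pixel of a frame."""
    return [[(red, green, blue, 255) for red, green, blue in row] for row in rgb_rows]


frame = [[(10, 20, 30), (40, 50, 60)]]  # one row of two pixels
rgba = to_opaque_rgba(frame)
height, width = len(rgba), len(rgba[0])
print(width, height, rgba[0][1])  # 2 1 (40, 50, 60, 255)
```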


class video699.video.annotated._VideoAnnotations(uri, dirname, datetime, fps, num_frames, width, height)

Bases: object

Human annotations associated with a single video.

Parameters:

uri (str) – The URI of the video file. The URI is unique in the dataset.
dirname (str) – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.
datetime (aware datetime) – The date and time at which the video was captured.
fps (scalar) – The frame rate of the video in frames per second.
num_frames (int) – The total number of frames in the original video file.
width (int) – The width of the video.
height (int) – The height of the video.

uri: str – The URI of the video file. The URI is unique in the dataset.

dirname: str – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

datetime: aware datetime – The date and time at which the video was captured.

fps: scalar – The frame rate of the video in frames per second.

num_frames: int – The total number of frames in the original video file.

width: int – The width of the video.

height: int – The height of the video.
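How `num_frames` and `fps` relate to a video's duration and to the capture time of an individual frame is not spelled out here. Assuming a constant frame rate, they could be computed as in the following sketch, where `video_duration` and `frame_datetime` are hypothetical helpers and the timestamps are made-up example values.

```python
from datetime import datetime, timedelta, timezone


def video_duration(num_frames, fps):
    """Duration of a video, assuming a constant frame rate."""
    return timedelta(seconds=num_frames / fps)


def frame_datetime(video_datetime, frame_number, fps):
    """Capture time of a one-based frame number, assuming a constant frame rate."""
    return video_datetime + timedelta(seconds=(frame_number - 1) / fps)


start = datetime(2018, 3, 1, 9, 0, tzinfo=timezone.utc)  # an aware datetime
print(video_duration(num_frames=75000, fps=25))  # 0:50:00
print(frame_datetime(start, frame_number=26, fps=25))  # 2018-03-01 09:00:01+00:00
```

The `- 1` accounts for one-based frame indexing: frame 1 is captured at the video's own datetime.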

class video699.video.annotated.AnnotatedSampledVideo(uri)

Bases: video699.interface.VideoABC, collections.abc.Sized

A sample of a video file extracted from a dataset with XML human annotations.

Notes

The video frames can be iterated over repeatedly.

Parameters:	uri (str) – The URI of the video file. The URI is unique in the dataset.

dirname: str – The pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

pathname: str – The full pathname of the directory where the frames, documents, and XML human annotations associated with the video are stored.

filename: str – The filename of the video file.

num_frames: int – The total number of frames in the original video file.

fps: scalar – The frame rate of the video in frames per second.

width: int – The width of the video.

height: int – The height of the video.

duration: timedelta – The elapsed time since the beginning of the video.

datetime: aware datetime – The date and time at which the video was captured.

documents: dict of (str, AnnotatedSampledVideoDocument) – A map between PDF document filenames and the documents associated with the video.

uri: str – The URI of the video file. The URI is unique in the dataset.


__iter__(self)

__len__(self)

Returns the number of video frames.

Returns:	length – The number of video frames.
Return type:	int
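The class notes that its frames can be iterated over repeatedly, and it also implements `collections.abc.Sized`. One way to provide both properties is sketched below: `__iter__` returns a fresh iterator on every call instead of the object being a generator that is exhausted after one pass. `SampledVideoSketch` is hypothetical and stores frame numbers rather than frame objects.

```python
from collections.abc import Iterable, Sized


class SampledVideoSketch(Iterable, Sized):
    """Hypothetical re-iterable, sized collection of sampled frame numbers."""

    def __init__(self, frame_numbers):
        self._frame_numbers = sorted(frame_numbers)

    def __iter__(self):
        # A fresh iterator on every call makes repeated iteration possible.
        return iter(self._frame_numbers)

    def __len__(self):
        return len(self._frame_numbers)


video = SampledVideoSketch([250, 1, 125])
print(len(video), list(video), list(video))  # 3 [1, 125, 250] [1, 125, 250]
```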

class video699.video.annotated._KeyRefAnnotations(key, similarity)

Bases: object

Human annotations describing a document page shown on a lit projection screen.

Parameters:

key (str) – An identifier of a page in a document. The identifier is unique in the video associated with the document.
similarity (str) –
The similarity between what is shown on the projection screen and the document page. The following values are legal:
- full specifies that there is a 1:1 correspondence between what is shown on the projection screen and the document page.
- incremental specifies that, in a document attached to the ancestor video, a single logical page is split across multiple physical pages that are uncovered incrementally; the slide and the frame correspond to the same logical page, but not to the same physical page.

key: str – An identifier of a page in a document. The identifier is unique in the video associated with the document.

similarity: str – The similarity between what is shown on the projection screen and the document page. The following values are legal:

- full specifies that there is a 1:1 correspondence between what is shown on the projection screen and the document page.
- incremental specifies that, in a document attached to the ancestor video, a single logical page is split across multiple physical pages that are uncovered incrementally; the slide and the frame correspond to the same logical page, but not to the same physical page.
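Because only two similarity values are legal, an enumeration is a convenient way to validate them when parsing annotations. This is a hypothetical sketch: `Similarity` and `parse_similarity` are not part of the module, which documents the similarity as a plain string.

```python
from enum import Enum


class Similarity(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


def parse_similarity(value):
    """Convert an annotation string into a Similarity, rejecting illegal values."""
    try:
        return Similarity(value)
    except ValueError:
        legal = ", ".join(member.value for member in Similarity)
        raise ValueError(f"illegal similarity {value!r}; expected one of: {legal}") from None


print(parse_similarity("incremental").name)  # INCREMENTAL
```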

class video699.video.annotated._ScreenAnnotations(coordinates, condition, keyrefs, vgg256)

Bases: object

Human annotations associated with a single lit projection screen in a frame of a video.

Parameters:

coordinates (ConvexQuadrangleABC) – A map between frame and screen coordinates.
condition (str) –
The condition of what is being shown on the screen. The following values are legal:
- pristine specifies that there is no significant degradation beyond photon noise.
- windowed specifies that a slide is being shown, but the slide does not cover the full screen.
- obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.
keyrefs (dict of (str, _KeyRefAnnotations)) – A map between document page keys and human annotations specifying the relationship between the projection screen and the document pages.
vgg256 (VGG256Features) – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.

coordinates: ConvexQuadrangleABC – A map between frame and screen coordinates.

condition: str – The condition of what is being shown on the screen. The following values are legal:

- pristine specifies that there is no significant degradation beyond photon noise.
- windowed specifies that a slide is being shown, but the slide does not cover the full screen.
- obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.

keyrefs: dict of (str, _KeyRefAnnotations) – A map between document page keys and human annotations specifying the relationship between the projection screen and the document pages.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.

class video699.video.annotated.AnnotatedSampledVideoScreen(frame, screen_index)

Bases: video699.interface.ScreenABC

A projection screen extracted from XML human annotations.

Parameters:

frame (FrameABC) – A video frame containing the projection screen.
screen_index (int) – The index of the projection screen in the human annotations for the video frame. Screen indexing is zero-based, i.e. the first screen in the human annotations has index 0.

frame: FrameABC – A video frame containing the projection screen.

coordinates: ConvexQuadrangleABC – A map between frame and screen coordinates.

condition: str – The condition of what is being shown on the screen. The following values are legal:

- pristine specifies that there is no significant degradation beyond photon noise.
- windowed specifies that a slide is being shown, but the slide does not cover the full screen.
- obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.

vgg256: VGG256Features – 256-dimensional feature vectors obtained by feeding the screen image data into VGG ConvNets.


matching_pages(self)

Returns iterables of the document pages related to the screen \(s\) according to the human annotations.

Note

When a projection screen \(s\) shows a document page \(p\), we say that \(s\) fully matches \(p\) and write \(s\approx p\).

When a single logical document page is split across several physical document pages, one of which is \(p\), and a projection screen \(s\) shows the same logical page as \(p\), we say that \(s\) incrementally matches \(p\) and write \(s\sim p\).

We say that \(s\) matches \(p\) the closest if and only if \(s\approx p\lor (\nexists p'(s\approx p') \land s \sim p)\), i.e. either \(s\) fully matches \(p\), or no page fully matches \(s\) and \(s\) incrementally matches \(p\).

Returns:

full_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that fully match \(s\).
incremental_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that incrementally match \(s\).
closest_matches (iterable of AnnotatedSampledVideoDocumentPage) – An iterable of all document pages \(p\) that match \(s\) the closest.
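The closest-match rule reduces to a case split: if any page fully matches \(s\), the closest matches are exactly the full matches; otherwise they are the incremental matches. The following sketch assumes, hypothetically, that the annotations for a screen are given as a plain dict from page keys to the similarity strings `"full"` and `"incremental"`; the actual method returns AnnotatedSampledVideoDocumentPage objects rather than keys.

```python
def matching_pages(keyrefs):
    """Split annotated page keys into full, incremental, and closest matches.

    keyrefs maps a page key to its similarity, either "full" or "incremental".
    """
    full_matches = [key for key, similarity in keyrefs.items() if similarity == "full"]
    incremental_matches = [
        key for key, similarity in keyrefs.items() if similarity == "incremental"
    ]
    # s matches p the closest iff s fully matches p, or no page fully matches s
    # and s incrementally matches p.
    closest_matches = list(full_matches or incremental_matches)
    return full_matches, incremental_matches, closest_matches


print(matching_pages({"p3": "incremental", "p4": "full"}))
# (['p4'], ['p3'], ['p4'])
print(matching_pages({"p1": "incremental", "p2": "incremental"}))
# ([], ['p1', 'p2'], ['p1', 'p2'])
```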

class video699.video.annotated.AnnotatedSampledVideoPageDetector

Bases: video699.interface.PageDetectorABC

A page detector that maps each video screen to a closest matching document page using XML human annotations.

detect(self, frame, appeared_screens, existing_screens, disappeared_screens)

class video699.video.annotated.AnnotatedSampledVideoScreenDetector(conditions=('pristine', 'windowed', 'obstacle'), beyond_bounds=True)

Bases: video699.interface.ScreenDetectorABC

A screen detector that maps an annotated video frame to screens using XML human annotations.

Parameters:

conditions (iterable of str, optional) –
A set of admissible screen conditions. The following condition strings are legal:
- pristine specifies that there is no significant degradation beyond photon noise.
- windowed specifies that a slide is being shown, but the slide does not cover the full screen.
- obstacle specifies that a part of the screen or of the projector light is obscured, either by a physical obstacle or by a different GUI window.
Screens with inadmissible conditions are not detected. When unspecified, all conditions are admissible.
beyond_bounds (bool, optional) – Whether a screen may extend beyond the bounds of a video frame. When unspecified, a screen may extend beyond the bounds.
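The two filters controlled by `conditions` and `beyond_bounds` can be sketched as follows. The sketch assumes, hypothetically, that each annotated screen carries a condition string and four corner points in frame pixel coordinates, and that a frame spans the closed rectangle from (0, 0) to (width, height); the real detector works with ConvexQuadrangleABC objects, and its exact bounds test may differ. `ScreenSketch`, `within_bounds`, and `detect_screens` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenSketch:
    condition: str
    corners: tuple  # four (x, y) corners in frame coordinates


def within_bounds(corners, width, height):
    return all(0 <= x <= width and 0 <= y <= height for x, y in corners)


def detect_screens(screens, width, height,
                   conditions=("pristine", "windowed", "obstacle"),
                   beyond_bounds=True):
    """Keep screens with admissible conditions and, optionally, only those inside the frame."""
    admissible = set(conditions)
    return [
        screen for screen in screens
        if screen.condition in admissible
        and (beyond_bounds or within_bounds(screen.corners, width, height))
    ]


screens = [
    ScreenSketch("pristine", ((10, 10), (90, 10), (90, 60), (10, 60))),
    ScreenSketch("obstacle", ((-5, 0), (50, 0), (50, 40), (-5, 40))),
]
print(len(detect_screens(screens, 100, 80)))  # 2
print(len(detect_screens(screens, 100, 80, conditions=("pristine",))))  # 1
print(len(detect_screens(screens, 100, 80, beyond_bounds=False)))  # 1
```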

detect(self, frame)

video699.video.annotated.evaluate_event_detector(annotated_video, event_detector)

Processes a video using a screen event detector and counts successful trials.

A video file is processed using a screen event detector. Each time an annotated video frame is encountered, a trial takes place. A trial is successful if and only if:

for every pristine screen with at least one matching page, the intersection of the detected pages and the pages that match the screen is non-empty, and
the number of additional detected pages is less than or equal to the number of pages that match the non-pristine screens the closest according to the human annotations.

Parameters:

annotated_video (AnnotatedSampledVideo) – An annotated video file.
event_detector (ScreenEventDetectorABC) – The screen event detector.

Returns:

num_successes (int) – The number of successful trials.
num_trials (int) – The number of trials.
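The success criterion can be sketched with Python sets. The sketch rests on two assumptions that the description leaves open: "additional detected pages" is read as the detected pages that match none of the pristine screens, and the pages matching non-pristine screens the closest are counted once each across all such screens. The trial data layout and the names `is_successful_trial` and `evaluate` are hypothetical; the actual function consumes screen events from a ScreenEventDetectorABC.

```python
def is_successful_trial(detected_pages, screens):
    """Decide one trial.

    detected_pages: page keys reported by the detector for the frame.
    screens: (condition, matching_page_keys) pairs from the annotations.
    """
    detected = set(detected_pages)
    pristine_matches = set()
    non_pristine_matches = set()
    for condition, matches in screens:
        matches = set(matches)
        if condition == "pristine":
            if matches and not detected & matches:
                return False  # a pristine screen with matching pages was missed
            pristine_matches |= matches
        else:
            non_pristine_matches |= matches
    additional = detected - pristine_matches
    return len(additional) <= len(non_pristine_matches)


def evaluate(trials):
    """Count successful trials over (detected_pages, screens) pairs."""
    num_successes = sum(1 for detected, screens in trials if is_successful_trial(detected, screens))
    return num_successes, len(trials)


trials = [
    ({"p1"}, [("pristine", {"p1"})]),                                     # success
    ({"p2"}, [("pristine", {"p1"})]),                                     # pristine page missed
    ({"p1", "p5"}, [("pristine", {"p1"}), ("obstacle", {"p5"})]),         # one extra, one allowed
    ({"p1", "p5", "p6"}, [("pristine", {"p1"}), ("obstacle", {"p5"})]),   # two extra, one allowed
]
print(evaluate(trials))  # (2, 4)
```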
