# Posts Tagged ‘semantics’

### Integrating metadata into the data model

Mathematical models define infinite-precision real numbers and functions with infinite domains, whereas computer data objects contain finite amounts of information and must therefore be approximations to the mathematical objects that they represent. Several forms of scientific metadata serve to specify how computer data objects approximate mathematical objects, and these are integrated into our data model. For example, missing data codes (used for fallible sensor systems) may be viewed as approximations that carry no information. Any value or sub-object in a VIS-AD data object may be set to the missing value. Scientists often use arrays for finite samplings of continuous functions; for example, satellite image arrays are finite samplings of continuous radiance fields. Sampling metadata, such as those that assign Earth locations to pixels and those that assign real radiances to coded (e.g., 8-bit) pixel values, quantify how arrays approximate functions and are integrated with VIS-AD array data objects.

The integration of metadata into our data model has practical consequences for the semantics of computation and display. For example, we define a data type goes_image as an array of ir radiances indexed by lat_lon values. Arrays of this data type are indexed by pairs of real numbers rather than by integers. If goes_west is a data object of type goes_image and loc is a data object of type lat_lon then the expression goes_west[loc] is evaluated by picking the sample of goes_west nearest to loc. If loc falls outside the region of the Earth covered by goes_west pixels then goes_west[loc] evaluates to the missing value. If goes_east is another data object of type goes_image, generated by a satellite with a different Earth perspective, then the expression goes_west – goes_east is evaluated by resampling goes_east to the samples of goes_west (i.e., by warping the goes_east image) before subtracting radiances. In Earth regions where the goes_west and goes_east images do not overlap, their difference is set to missing values. Thus metadata about map projections and missing data contribute to the semantics of computations.
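The indexing and subtraction semantics described above can be illustrated with a minimal Python sketch. It is not VIS-AD code: the class `SampledImage`, the `MISSING` marker, and the `max_distance` coverage threshold are hypothetical names introduced here, and plain Euclidean distance in latitude/longitude degrees stands in for whatever distance a real map projection would require.

```python
import math

MISSING = None  # hypothetical stand-in for the missing value


class SampledImage:
    """Radiances indexed by real-valued (lat, lon) sample locations."""

    def __init__(self, samples, max_distance):
        # samples: dict mapping (lat, lon) -> radiance (or MISSING)
        # max_distance: assumed coverage radius around each sample
        self.samples = dict(samples)
        self.max_distance = max_distance

    def __getitem__(self, loc):
        # Pick the sample nearest to loc; outside coverage yields MISSING.
        nearest = min(self.samples, key=lambda s: math.dist(s, loc), default=None)
        if nearest is None or math.dist(nearest, loc) > self.max_distance:
            return MISSING
        return self.samples[nearest]

    def __sub__(self, other):
        # Resample `other` at this image's sample locations, then subtract.
        result = {}
        for loc, value in self.samples.items():
            other_value = other[loc]
            if value is MISSING or other_value is MISSING:
                result[loc] = MISSING  # no overlap: difference is missing
            else:
                result[loc] = value - other_value
        return SampledImage(result, self.max_distance)


goes_west = SampledImage({(0.0, 0.0): 10.0, (0.0, 1.0): 12.0}, max_distance=0.75)
goes_east = SampledImage({(0.0, 0.9): 5.0, (0.0, 2.0): 7.0}, max_distance=0.75)

inside = goes_west[(0.0, 0.4)]    # nearest sample is (0.0, 0.0)
outside = goes_west[(5.0, 5.0)]   # far from every sample
diff = goes_west - goes_east      # goes_east is resampled to goes_west's samples
```

In this toy example the difference is defined only at `(0.0, 1.0)`, where the two images overlap; at `(0.0, 0.0)` no `goes_east` sample is close enough, so the result there is the missing value.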

Metadata similarly contribute to display semantics. If both goes_east and goes_west are selected for display, the system uses the sampling of their indices to co-register these two images in a common Earth frame of reference. The samplings of 2-D and 3-D array indices need not be Cartesian. For example, the sampling of lat_lon may define virtually any map projection. Thus data may be displayed in non-Cartesian coordinate systems.

### Hapax Semantic Clustering

To show the generic nature of the approach, we apply it at different levels of abstraction to case studies written in different languages.

1. In the first case-study we analyze the core and the plugins of a large framework, the Moose re-engineering environment. This experiment focuses on the relation between architecture and semantics. It reveals, among other findings, four cases of duplicated code and a core functionality misplaced in one of the plug-ins.

2. The second case-study is the class MSEModel, which is one of the largest classes in Moose. This experiment applies our approach at a different level of abstraction to focus on more detailed findings. It visualizes the relationships among the methods of a large class, and reveals that the class should be split as it serves at least two different purposes.

3. The third case-study, the JEdit open-source Java editor, focuses on the relationships among classes and demonstrates the strength of our approach in identifying and labeling semantic concepts.

The following table summarizes the problem size of each case study. It lists the number of documents and terms in the vector-space-model, and the rank to which the vector space has been broken down with LSI. Moose and JEdit use classes as input documents, and MSEModel uses methods.

### Video Content Analysis and Adaptive Rate Allocation

In classical rate control, all GoPs are treated equally, and frame-level bit allocation is based on the frame type and a complexity measure, sometimes with multiple passes over the video, but without considering the semantics of picture content. In the H.264/AVC reference encoder, the GoP borders are determined according to a predefined pattern of frames, and the same target bit rate is used for each GoP given the available channel rate. As a result, the video quality varies from GoP to GoP depending on the video content. The problem with this approach is that in some applications, e.g., wireless video, the total bit budget is not sufficient to encode the entire content at an acceptable quality. Video segments with high motion and/or small details may become unacceptable when all GoPs are encoded at the same low rate. Inter-GoP rate control schemes, which vary the target bitrate from GoP to GoP, have been proposed to offer uniform video quality over the entire video. For example, an optimal solution for the buffer-constrained adaptive quantization problem is formulated. The rate-distortion characteristics of the encoded video are used to find the frame rate and quantization parameters that provide the minimum distortion under rate constraints. The minimization is done iteratively so that at each step the measured distortion is smaller than in the previous iteration. However, these methods do not consider the semantics of the video content either in GoP definition or in GoP target bitrate allocation.

As the available computing power at the encoders increases, so does the level of sophistication of the encoders and their associated control techniques. By using appropriate content analysis, it is now possible to define GoPs according to shot boundaries, and allocate target bit rates to each GoP based on the shot type considering the “relevance” or “semantics” of each type of shot. Such a rate control scheme will be called “content-adaptive rate control.” In content-adaptive rate control, video will be encoded according to a pre-specified or user-defined relevance-distortion policy. In effect, we accept a priori that some losses are going to occur due to the high compression ratios needed, and we force these losses to occur in less relevant parts of the video content. We note that “relevance of the content” is highly context (domain) dependent. For example, in the context of a soccer game, the temporal video segments showing a goal event and the spatial segments around the ball are definitely more important than any other part of the video. There are a variety of other domains, such as other sports videos and broadcast news, where the relevance of the content can easily be classified. In content-adaptive video coding, the temporal segmentation policy used has a major effect on the overall efficiency and rate distribution among temporal segments. There exist techniques for automatically locating such content. A summarization of the available multimedia access technologies that support Universal Multimedia Access (UMA) is presented. Segmentation and summarization of audio-video content are discussed in detail and the transcoding techniques for such content are demonstrated.
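As a rough illustration of content-adaptive rate control, the sketch below splits a total bit budget across shot-aligned GoPs in proportion to a relevance weight times the shot length. The proportional rule, the function name `allocate_gop_bits`, the shot types, and the weights are all assumptions made for this example; the text above does not specify a particular relevance-distortion policy.

```python
def allocate_gop_bits(shots, total_bits, relevance_weights):
    """Split total_bits across shots in proportion to weight * frame count.

    shots: list of dicts with hypothetical keys "type" and "frames"
    relevance_weights: dict mapping shot type -> non-negative weight
    """
    shares = [relevance_weights[shot["type"]] * shot["frames"] for shot in shots]
    total_share = sum(shares)
    if total_share <= 0:
        raise ValueError("at least one shot must have positive weight and length")
    return [total_bits * share / total_share for share in shares]


# Hypothetical soccer example: goal shots matter most, crowd shots least.
weights = {"goal": 3.0, "midfield": 1.0, "crowd": 0.5}
shots = [
    {"type": "goal", "frames": 100},
    {"type": "midfield", "frames": 200},
    {"type": "crowd", "frames": 100},
]
budget = allocate_gop_bits(shots, total_bits=550_000, relevance_weights=weights)
bits_per_frame = [b / s["frames"] for b, s in zip(budget, shots)]
```

With these numbers the goal shot receives three times as many bits per frame as the midfield shot, so the unavoidable losses are pushed toward the less relevant segments. A real encoder would additionally have to respect buffer and channel constraints, which this sketch ignores.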

Content-adaptive rate allocation ideas have been introduced in the literature before. The input video is segmented and encoded as two streams for different relevance levels with “predetermined bit rates,” namely, the high target bitrate (highly relevant) and the low target bitrate (less relevant) streams. The less relevant shots are then encoded such that they are shown as still images at the receiving side, and the more important shots are encoded at full quality. In this pioneering work, the decision to restrict the number of relevance levels to two and the determination of the relative bit allocations are made in an ad hoc manner. Quality of Service (QoS) is required for continuous playback to be guaranteed, and the low and high rates are determined by the client buffer size and the channel bandwidth. The required server buffer size is set afterwards, which effectively determines the pre-roll delay.

There are also techniques that divide the input video into segments by considering various statistics along these segments that affect the ease of coding, without taking into account any relevance issues. For example, MPEG-7 metadata are used for video transcoding for home networks. Concepts like “difficulty hints” and “motion hints” are described. Difficulty hints are a kind of metadata that denotes the encoding difficulty of the given content. The motion hints describe the motion un-compensability metadata, which contains information about the GoP structure, frame rate and bitrate control, and also the search range metadata, which reduces the complexity of the transcoding process. In this work, the boundaries of the temporal segments of the content are determined by the points where the motion un-compensability metadata peaks, and then the video is transcoded using the difficulty hints. Here, the GoP size is varied according to the motion un-compensability metadata. A hybrid scaling algorithm using a quality metric based on the features of the human visual system has also been introduced; it tries to make full use of the communication channel by scaling video in either the temporal or the spatial dimension. In this work, the frame rate of the encoded video is reduced at scenes where motion jitter is low (high temporal resolution), and all the frames are kept for scenes with high motion at the expense of reduced spatial resolution.
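The peak-based temporal segmentation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the per-frame un-compensability values, the `threshold` parameter, and the function names are hypothetical, and the actual MPEG-7 descriptors are not modelled here.

```python
def find_peak_boundaries(uncompensability, threshold):
    """Return frame indices that are local maxima at or above threshold."""
    boundaries = []
    for i in range(1, len(uncompensability) - 1):
        prev, cur, nxt = uncompensability[i - 1 : i + 2]
        # Strictly above the left neighbour so a plateau is reported once.
        if cur >= threshold and cur > prev and cur >= nxt:
            boundaries.append(i)
    return boundaries


def split_segments(num_frames, boundaries):
    """Turn boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + boundaries
    ends = boundaries + [num_frames]
    return list(zip(starts, ends))


# Hypothetical per-frame motion un-compensability values in [0, 1].
values = [0.1, 0.2, 0.9, 0.3, 0.2, 0.1, 0.8, 0.85, 0.2, 0.1]
peaks = find_peak_boundaries(values, threshold=0.5)
segments = split_segments(len(values), peaks)
```

Each resulting segment could then be treated as one variable-length GoP, so that a new GoP starts where motion compensation is least effective.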