Posts Tagged ‘metadata’

Integrating metadata into the data model

Mathematical models define infinite precision real numbers and functions with infinite domains, whereas computer data objects contain finite amounts of information and must therefore be approximations to the mathematical objects that they represent. Several forms of scientific metadata specify how computer data objects approximate mathematical objects, and these are integrated into our data model. For example, missing data codes (used for fallible sensor systems) may be viewed as approximations that carry no information. Any value or sub-object in a VIS-AD data object may be set to the missing value. Scientists often use arrays for finite samplings of continuous functions; for example, satellite image arrays are finite samplings of continuous radiance fields. Sampling metadata, such as those that assign Earth locations to pixels and those that assign real radiances to coded (e.g., 8-bit) pixel values, quantify how arrays approximate functions and are integrated with VIS-AD array data objects.

The integration of metadata into our data model has practical consequences for the semantics of computation and display. For example, we define a data type goes_image as an array of ir radiances indexed by lat_lon values. Arrays of this data type are indexed by pairs of real numbers rather than by integers. If goes_west is a data object of type goes_image and loc is a data object of type lat_lon then the expression goes_west[loc] is evaluated by picking the sample of goes_west nearest to loc. If loc falls outside the region of the Earth covered by goes_west pixels then goes_west[loc] evaluates to the missing value. If goes_east is another data object of type goes_image, generated by a satellite with a different Earth perspective, then the expression goes_west – goes_east is evaluated by resampling goes_east to the samples of goes_west (i.e., by warping the goes_east image) before subtracting radiances. In Earth regions where the goes_west and goes_east images do not overlap, their difference is set to missing values. Thus metadata about map projections and missing data contribute to the semantics of computations.
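As an illustration of these semantics (not the actual VIS-AD implementation; the type layout, the brute-force nearest-neighbor search, and the max_dist coverage threshold are assumptions), a small C sketch of nearest-sample indexing, missing values, and differencing by resampling might look like this:

    #include <math.h>
    #include <stddef.h>

    /* Hypothetical illustration: a finitely sampled image indexed by real
     * Earth coordinates, with NAN standing in for the missing value. */
    typedef struct {
        size_t n;            /* number of pixel samples                */
        const double *lat;   /* latitude of each sample                */
        const double *lon;   /* longitude of each sample               */
        const double *ir;    /* radiance of each sample (may be NAN)   */
    } goes_image;

    /* Evaluate image[lat,lon] by picking the nearest sample; return the
     * missing value (NAN) when the location lies outside the sampled region. */
    double goes_lookup(const goes_image *img, double lat, double lon,
                       double max_dist)
    {
        double best = INFINITY, value = NAN;
        for (size_t i = 0; i < img->n; i++) {
            double d = hypot(img->lat[i] - lat, img->lon[i] - lon);
            if (d < best) { best = d; value = img->ir[i]; }
        }
        return (best <= max_dist) ? value : NAN;  /* outside coverage -> missing */
    }

    /* goes_west - goes_east: resample goes_east to goes_west's samples, then
     * subtract; non-overlapping regions become missing (NAN propagates). */
    void goes_difference(const goes_image *west, const goes_image *east,
                         double max_dist, double *out)
    {
        for (size_t i = 0; i < west->n; i++)
            out[i] = west->ir[i] -
                     goes_lookup(east, west->lat[i], west->lon[i], max_dist);
    }

Because the missing value is represented here as NAN, it propagates through the subtraction exactly as described above for Earth regions where the two images do not overlap.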

Metadata similarly contribute to display semantics. If both goes_east and goes_west are selected for display, the system uses the sampling of their indices to co-register these two images in a common Earth frame of reference. The samplings of 2-D and 3-D array indices need not be Cartesian. For example, the sampling of lat_lon may define virtually any map projection. Thus data may be displayed in non-Cartesian coordinate systems.


Replica Creation and Replica Selection in Data Grid Service

Replica selection is interesting because it does not build on top of the core services, but rather relies on the functions provided by the replica management component described in the preceding section. Replica selection is the process of choosing a replica that will provide an application with data access characteristics that optimize a desired performance criterion, such as absolute performance (i.e., speed), cost, or security. The selected file instance may be local or accessed remotely. Alternatively, the selection process may initiate the creation of a new replica whose performance will be superior to that of the existing ones.

Where replicas are to be selected based on access time, Grid information services can provide information about network performance, and perhaps the ability to reserve network bandwidth, while the metadata repository can provide information about the size of the file. Based on this, the selector can rank all of the existing replicas to determine which one will yield the fastest data access time. Alternatively, the selector can consult the same information sources to determine whether there is a storage system that would yield better performance if a replica were created on it.
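A minimal sketch of such a ranking step in C, assuming the selector has already gathered the file size from the metadata repository and per-replica bandwidth and latency estimates from the information service (all struct and function names are illustrative):

    #include <stddef.h>

    /* Hypothetical sketch: rank existing replicas by estimated access time. */
    typedef struct {
        const char *location;     /* storage system holding the replica       */
        double bandwidth_mbps;    /* measured/predicted bandwidth to client   */
        double latency_s;         /* round-trip latency to the storage system */
    } replica_info;

    /* Estimated transfer time: startup latency plus size divided by bandwidth. */
    static double est_access_time(double file_size_mb, const replica_info *r)
    {
        return r->latency_s + file_size_mb * 8.0 / r->bandwidth_mbps;
    }

    /* Return the index of the replica expected to yield the fastest access. */
    size_t select_replica(double file_size_mb,
                          const replica_info *replicas, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (est_access_time(file_size_mb, &replicas[i]) <
                est_access_time(file_size_mb, &replicas[best]))
                best = i;
        return best;
    }

The same estimate could be computed for candidate storage systems that do not yet hold a replica, to decide whether creating a new one would yield better performance than any existing replica.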

A more general selection service may consider access to subsets of a file instance. Scientific experiments often produce large files containing data for many variables, time steps, or events, and some application processing may require only a subset of this data. In this case, the selection function may provide an application with a file instance that contains only the needed subset of the data found in the original file instance. This can obviously reduce the amount of data that must be accessed or moved.

This type of replica management has been implemented in other data-management systems. For example, STACS is often capable of satisfying requests from High Energy Physics applications by extracting a subset of data from a file instance. It does this using a complex indexing scheme that represents application metadata for the events contained within the file. Other mechanisms for providing similar functionality may be built on application metadata obtainable from self-describing file formats such as NetCDF or HDF.

Providing this capability requires the ability to invoke filtering or extraction programs that understand the structure of the file and produce the required subset of data. This subset becomes a file instance with its own metadata and physical characteristics, which are provided to the replica manager. Replication policies determine whether this subset is recognized as a new logical file (with an entry in the metadata repository and a file instance recorded in the replica catalog), or whether the file should be known only locally, to the selection manager.

Data selection with subsetting may exploit Grid-enabled servers, whose capabilities involve common operations such as reformatting data, extracting a subset, converting data for storage in a different type of system, or transferring data directly to another storage system in the Grid. The utility of this approach has been demonstrated as part of the Active Data Repository. The subsetting function could also exploit the more general capabilities of a computational Grid such as that provided by Globus. This offers the ability to support arbitrary extraction and processing operations on files as part of a data management activity.


METADATA

In this context, metadata refers to the activity styles and constraints of mobile databases. It consists of connectivity, availability, data access patterns, and user-defined constraints. The roles of metadata include: 1. identifying the data a user cares about (Domain): what data interests the user? 2. specifying the relative worth of data (Utility): what is its relative worth? 3. indicating the power of the mobile host (Priority): what is my activity pattern? The focused subset of metadata includes connection duration, disconnection frequency, read frequency, tentative update frequency, commit rate, and commit delay.

A sliding window scheme is used to implement incremental metadata collection. A sliding window is a time frame with a predefined time interval, with the end point of the frame being the current local time. The window width is the time duration of the sliding window. Events in such a sliding window include connection, disconnection, reads, tentative updates, and update commitment. These events are logged in the time order in which they occur. Since the capacity of a mobile host is limited, we cannot afford to store them all from the very beginning, so the metadata collection algorithm is designed to log only the events in the time range of a sliding window. The log is called a bounded window log. Sliding windows consume events as well as store them. When an event occurs, if the size of the sliding window has not reached its limit, the event is logged and its effect is reflected in the metadata. If the sliding window has grown to its upper limit, one or more of the oldest events are retired and their effects are eliminated from the metadata when the new event is added. In this way, the metadata always remembers the effects of events in the most recent bounded history.
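A possible implementation sketch in C, assuming a fixed-capacity ring buffer for the bounded window log and simple per-event-type counters as the incrementally maintained metadata (the window width and log capacity are arbitrary placeholders):

    #include <time.h>

    #define WINDOW_WIDTH  (24 * 3600)   /* sliding-window width in seconds (assumed) */
    #define LOG_CAPACITY  1024          /* bounded window log size (assumed)         */

    typedef enum { EV_CONNECT, EV_DISCONNECT, EV_READ,
                   EV_TENTATIVE_UPDATE, EV_COMMIT } event_kind;

    typedef struct { event_kind kind; time_t when; } event;

    typedef struct {
        event  log[LOG_CAPACITY];   /* bounded window log (ring buffer)            */
        int    head, count;
        long   counts[5];           /* incremental metadata: one counter per kind  */
    } window_log;

    /* Retire events that fell out of the window (or overflowed the log)
     * and eliminate their effect from the metadata. */
    static void retire_old(window_log *w, time_t now)
    {
        while (w->count > 0) {
            event *oldest = &w->log[w->head];
            if (now - oldest->when <= WINDOW_WIDTH && w->count < LOG_CAPACITY)
                break;
            w->counts[oldest->kind]--;
            w->head = (w->head + 1) % LOG_CAPACITY;
            w->count--;
        }
    }

    /* Log a new event and reflect its effect in the metadata. */
    void record_event(window_log *w, event_kind kind)
    {
        time_t now = time(NULL);
        retire_old(w, now);
        int tail = (w->head + w->count) % LOG_CAPACITY;
        w->log[tail] = (event){ kind, now };
        w->count++;
        w->counts[kind]++;
    }

Derived quantities such as disconnection frequency or commit rate can then be read off the counters divided by the window width.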

The overhead of collecting metadata for a replica is trivial. The main overhead is the cost of storing the event and updating the metadata when an event occurs. There is no message cost, since collection happens entirely locally.


Out-of-Band Heap Metadata

The first requirement protects against use-after-free vulnerabilities with dangling pointers to freed, not yet reallocated, memory. If the memory allocator uses freed memory for metadata, such as free-list pointers, these allocator metadata can be interpreted as object fields, e.g. vtable pointers, when free memory is referenced through a dangling pointer. Memory allocator designers have considered using out-of-band metadata before, because attackers have targeted in-band heap metadata in several ways: attacker-controlled data in freed objects can be interpreted as heap metadata through double-free vulnerabilities, and heap-based overflows can corrupt allocator metadata adjacent to heap-based buffers. If the allocator uses corrupt heap metadata during its linked-list operations, attackers can write an arbitrary value to an arbitrary location.

Although out-of-band heap metadata can solve these problems, some memory allocators mitigate heap metadata corruption without resorting to this solution. For example, attacks corrupting heap metadata can be addressed by detecting the use of corrupted metadata with sanity checks on free-list pointers before unlinking a free chunk, or by using heap canaries to detect corruption due to heap-based buffer overflows. In some cases, corruption can be prevented in the first place, e.g. by detecting attempts to free objects already in a free list. These techniques avoid the memory overhead of out-of-band metadata, but are insufficient for preventing use-after-free vulnerabilities, where no corruption of heap metadata takes place.

An approach to address this problem in allocator designs that reuse free memory for heap metadata is to ensure that these metadata point to invalid memory if interpreted as pointers by the application. Merely randomizing the metadata by XORing with a secret value may not be sufficient in the face of heap spraying. One option is setting the top bit of every metadata word to ensure it points to protected kernel memory, raising a hardware fault if the program dereferences a dangling pointer to heap metadata, while the allocator would flip the top bit back before using the metadata. However, it is still possible that the attacker can tamper with the dangling pointer before dereferencing it. This approach may be preferred when modifying an existing allocator design, but for Cling, we chose to keep metadata out-of-band instead.
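A sketch of what such top-bit encoding could look like in C, assuming a 64-bit layout where addresses with the top bit set fall in protected kernel space (this illustrates the idea discussed above, not Cling's code):

    #include <stdint.h>

    /* Illustrative sketch only: encode free-list metadata stored inside freed
     * chunks so that, read through a dangling pointer, it looks like a pointer
     * into protected kernel space (top bit set) and faults on dereference. */
    #define METADATA_TAG (1ULL << 63)

    static inline uintptr_t encode_meta(void *next_free)
    {
        return (uintptr_t)next_free | METADATA_TAG;   /* store into freed chunk    */
    }

    static inline void *decode_meta(uintptr_t stored)
    {
        return (void *)(stored & ~METADATA_TAG);      /* allocator flips bit back  */
    }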

An allocator can keep its metadata outside deallocated memory using non-intrusive linked lists (next and prev pointers stored outside objects) or bitmaps. Non-intrusive linked lists can have significant memory overhead for small allocations, thus Cling uses a two-level allocation scheme where non-intrusive linked lists chain large memory chunks into free lists and small allocations are carved out of buckets holding objects of the same size class using bitmaps. Bitmap allocation schemes have been used successfully in popular memory allocators aiming for performance, so they should not pose an inherent performance limitation.
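For illustration, a bitmap-based bucket for one size class might look like the following C sketch (field names, bucket capacity, and the use of a compiler builtin for bit scanning are assumptions, not Cling's actual layout):

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch of bitmap-based small-object allocation within a bucket holding
     * objects of a single size class, in the spirit of the two-level scheme
     * described above. */
    typedef struct {
        uint64_t bitmap[4];     /* 1 bit per slot: 1 = allocated, 0 = free */
        size_t   slot_size;     /* size class served by this bucket        */
        char    *slots;         /* start of the object slots               */
    } bucket;

    void *bucket_alloc(bucket *b)
    {
        for (size_t w = 0; w < 4; w++) {
            if (b->bitmap[w] == ~0ULL)
                continue;                                /* word fully allocated */
            unsigned bit = (unsigned)__builtin_ctzll(~b->bitmap[w]); /* first free slot */
            b->bitmap[w] |= 1ULL << bit;                 /* mark it allocated    */
            return b->slots + (w * 64 + bit) * b->slot_size;
        }
        return NULL;                                     /* bucket exhausted     */
    }

    void bucket_free(bucket *b, void *p)
    {
        size_t idx = ((char *)p - b->slots) / b->slot_size;
        b->bitmap[idx / 64] &= ~(1ULL << (idx % 64));    /* clear the slot's bit */
    }

Keeping the bitmap in the bucket header rather than inside the freed slots is what keeps the metadata out-of-band.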


Video Content Analysis and Adaptive Rate Allocation

In classical rate control, all GoPs are treated equally, and frame-level bit allocation is based on the frame type and a complexity measure, sometimes with multiple passes over the video, but without considering the semantics of the picture content. In the H.264/AVC reference encoder, the GoP borders are determined according to a predefined pattern of frames, and the same target bit rate is used for each GoP given the available channel rate. As a result, the video quality varies from GoP to GoP depending on the video content. The problem with this approach is that in some applications, e.g., wireless video, the total bit budget is not sufficient to encode the entire content at an acceptable quality. Video segments with high motion and/or small details may become unacceptable when all GoPs are encoded at the same low rate. Inter-GoP rate control schemes, that is, variation of the target bitrate from GoP to GoP, have been proposed to offer uniform video quality over the entire video. For example, an optimal solution for the buffer-constrained adaptive quantization problem has been formulated. The rate-distortion characteristics of the encoded video are used to find the frame rate and quantization parameters that provide the minimum distortion under rate constraints. The minimization is done in an iterative manner, so that at each step the measured distortion is smaller than that of the previous iteration. However, these methods do not consider the semantics of the video content either in GoP definition or in GoP target bitrate allocation.

As the available computing power at the encoders increases, so does the level of sophistication of the encoders and their associated control techniques. By using appropriate content analysis, it is now possible to define GoPs according to shot boundaries and allocate target bit rates to each GoP based on the shot type, considering the “relevance” or “semantics” of each type of shot. Such a rate control scheme will be called “content-adaptive rate control.” In content-adaptive rate control, video will be encoded according to a pre-specified or user-defined relevance-distortion policy. In effect, we accept a priori that some losses are going to occur due to the high compression ratios needed, and we force these losses to occur in less relevant parts of the video content. We note that the “relevance of the content” is highly context (domain) dependent. For example, in the context of a soccer game, the temporal video segments showing a goal event and the spatial segments around the ball are definitely more important than any other part of the video. There are a variety of other domains, such as other sports videos and broadcast news, where the relevance of the content can easily be classified. In content-adaptive video coding, the temporal segmentation policy used has a major effect on the overall efficiency and rate distribution among temporal segments. There exist techniques for automatically locating such content. A summary of the available multimedia access technologies that support Universal Multimedia Access (UMA) is presented. Segmentation and summarization of audio-video content are discussed in detail, and the transcoding techniques for such content are demonstrated.
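As a toy example of the idea (the proportional weighting rule below is an assumption for illustration, not a published allocation scheme), GoP-level budgets could be derived from relevance scores like this:

    #include <stddef.h>

    /* Illustrative sketch of content-adaptive GoP-level bit allocation:
     * distribute a total bit budget across GoPs (defined by shot boundaries)
     * in proportion to a relevance weight assigned by content analysis. */
    typedef struct {
        double relevance;   /* e.g., 1.0 for a goal event, 0.2 for crowd shots */
        double target_bits; /* output: bit budget for this GoP                 */
    } gop_info;

    void allocate_gop_bits(gop_info *gops, size_t n, double total_bits)
    {
        double total_weight = 0.0;
        for (size_t i = 0; i < n; i++)
            total_weight += gops[i].relevance;
        if (total_weight <= 0.0)
            return;                       /* nothing relevant to weight against */
        for (size_t i = 0; i < n; i++)
            gops[i].target_bits = total_bits * gops[i].relevance / total_weight;
    }

Actual schemes would additionally respect buffer constraints and per-GoP rate-distortion characteristics; the sketch only shows how relevance can steer the budget.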

Content-adaptive rate allocation ideas have been introduced in the literature before. The input video is segmented and encoded as two streams for different relevance levels with “predetermined bit rates,” namely, the high target bitrate (highly relevant) and the low target bitrate (less relevant) streams. The less relevant shots are then encoded such that they are shown as still images at the receiving side, and the more important shots are encoded at full quality. In this pioneering work, the decision to restrict the number of relevance levels to two and the determination of the relative bit allocations are done in an ad-hoc manner. Quality of Service (QoS) is required for continuous playback to be guaranteed, and the low and high rates are determined by the client buffer size and the channel bandwidth. The server buffer size required is set afterwards, which effectively determines the pre-roll delay.

There are also techniques that divide the input video into segments by considering various statistics along these segments that affect the ease of coding, without taking into account any relevance issues. For example, MPEG-7 metadata are used for video transcoding for home networks. Concepts like “difficulty hints” and “motion hints” are described. Difficulty hints are a kind of metadata that denotes the encoding difficulty of the given content. The motion hints describe the motion un-compensability metadata, which contains information about the GoP structure, frame rate and bitrate control, and also the search range metadata that reduces the complexity of the transcoding process. In this work, the boundaries of the temporal segments of the content are determined by the points where the motion un-compensability metadata makes a peak, and then the video is transcoded using the difficulty hints. Here, the GoP size is varied according to the motion un-compensability metadata. A hybrid scaling algorithm using a quality metric based on the features of the human visual system is introduced in, which tries to make full utilization of the communication channel by scaling video in either temporal or spatial dimensions. In this work, the frame rate of the encoded video is reduced at scenes where motion jitter is low (high temporal resolution), and all the frames are kept for scenes with high motion at the expense of reduced spatial resolution.


Techniques that guarantee absence of dangling pointer references

SafeC is one of the earliest systems to detect (with high probability) all memory errors, including all dangling pointer errors, in C programs. SafeC creates a unique capability (a 32-bit value) for each memory allocation and puts it in a Global Capability Store (GCS). It also stores this capability with the metadata of the returned pointer. This metadata gets propagated with pointer copying and arithmetic. Before every access via a pointer, its capability is checked for membership in the global capability store. A free removes the capability from the global capability store, so all dangling pointer accesses are detected. Fischer and Patil, and Xu et al., propose improvements to the basic scheme by eliminating the need for fat pointers and storing the metadata separately from the pointer for better backwards compatibility. To be able to track metadata, they disallow arbitrary casts in the program, including casts from pointers to integers and back. Their overheads for detecting only the temporal errors on the allocation-intensive Olden benchmarks are much less than ours, about 56% on average (they do not report overheads for system software).
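A highly simplified sketch of the capability idea in C (gcs_insert, gcs_contains, and gcs_remove stand for some set implementation; in the real systems the checks are inserted by compiler instrumentation):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Each allocation gets a unique capability recorded in a global capability
     * store (GCS) and in the pointer's metadata; accesses check membership,
     * and free() removes the capability so later accesses through dangling
     * pointers are detected. */
    extern void gcs_insert(uint32_t cap);
    extern bool gcs_contains(uint32_t cap);
    extern void gcs_remove(uint32_t cap);

    typedef struct {
        void    *ptr;   /* the actual pointer           */
        uint32_t cap;   /* capability of its allocation */
    } checked_ptr;

    static uint32_t next_cap = 1;

    checked_ptr checked_alloc(void *(*alloc)(size_t), size_t n)
    {
        checked_ptr p = { alloc(n), next_cap++ };
        gcs_insert(p.cap);                 /* register the new allocation */
        return p;
    }

    bool checked_access(checked_ptr p)
    {
        return gcs_contains(p.cap);        /* false => dangling pointer use */
    }

    void checked_free(checked_ptr p, void (*dealloc)(void *))
    {
        gcs_remove(p.cap);                 /* all later accesses now fail */
        dealloc(p.ptr);
    }

A checked_access returning false corresponds to a detected temporal error: the allocation's capability has already been removed from the GCS by an earlier free.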

However, the GCS can consume significant memory: they report increases in (physical and virtual) memory consumption by factors of 1.6x to 4x for different benchmark sets. For servers in particular, we believe that such significant increases in memory consumption would be a serious limitation.

Our approach provides better backwards compatibility: we allow arbitrary casts, including casts from pointers to integers and back. Furthermore, our approach uses the memory management unit to do a hardware runtime check and does not incur any per-access penalty. Our overheads in our experiments on system software, with low allocation frequency, are negligible in most cases and less than 15% in all cases. However, for programs that perform frequent memory allocations and deallocations, like the Olden benchmarks, our overheads are significantly worse (up to 11x slowdown). It would be an interesting experiment to see if a combination of these two techniques can work better for general programs.
