Application Development

Replica Creation and Replica Selection in Data Grid Service

Replica selection is interesting because it does not build on top of the core services, but rather relies on the functions provided by the replica management component described in the preceding section. Replica selection is the process of choosing a replica that will provide an application with data access characteristics that optimize a desired performance criterion, such as absolute performance (i.e. speed), cost, or security. The selected file instance may be local or accessed remotely. Alternatively, the selection process may initiate the creation of a new replica whose performance will be superior to the existing ones.

Where replicas are to be selected based on access time, Grid information services can provide information about network performance, and perhaps the ability to reserve network bandwidth, while the metadata repository can provide information about the size of the file. Based on this, the selector can rank all of the existing replicas to determine which one will yield the fastest data access time. Alternatively, the selector can consult the same information sources to determine whether there is a storage system that would yield better performance if a replica were created on it.
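
As a rough illustration of this ranking step, the Python sketch below scores each candidate replica with a naive cost model; the bandwidth and latency figures stand in for what a Grid information service would report, the file size for what the metadata repository would supply, and all names and numbers are hypothetical.

from dataclasses import dataclass

@dataclass
class Replica:
    location: str
    bandwidth_mbps: float   # path bandwidth reported by the information service
    latency_s: float        # round-trip latency to the storage system

def estimated_access_time(replica: Replica, file_size_mb: float) -> float:
    # Crude cost model: startup latency plus transfer time (size / bandwidth).
    return replica.latency_s + (file_size_mb * 8) / replica.bandwidth_mbps

def select_replica(replicas: list[Replica], file_size_mb: float) -> Replica:
    # Rank all existing replicas and pick the one with the fastest estimated access.
    return min(replicas, key=lambda r: estimated_access_time(r, file_size_mb))

candidates = [
    Replica("storage.siteA.example.org", bandwidth_mbps=100.0, latency_s=0.020),
    Replica("storage.siteB.example.org", bandwidth_mbps=30.0, latency_s=0.005),
]
print(select_replica(candidates, file_size_mb=2048.0).location)

The same scores could also be computed for a prospective new replica location to decide whether creating a replica there would beat all existing ones.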

A more general selection service may consider access to subsets of a file instance. Scientific experiments often produce large files containing data for many variables, time steps, or events, and some application processing may require only a subset of this data. In this case, the selection function may provide an application with a file instance that contains only the needed subset of the data found in the original file instance. This can obviously reduce the amount of data that must be accessed or moved.

This type of replica management has been implemented in other data-management systems. For example, STACS is often capable of satisfying requests from High Energy Physics applications by extracting a subset of data from a file instance. It does this using a complex indexing scheme that represents application metadata for the events contained within the file. Other mechanisms for providing similar functionality may be built on application metadata obtainable from self-describing file formats such as NetCDF or HDF.

Providing this capability requires the ability to invoke filtering or extraction programs that understand the structure of the file and produce the required subset of data. This subset becomes a file instance with its own metadata and physical characteristics, which are provided to the replica manager. Replication policies determine whether this subset is recognized as a new logical file (with an entry in the metadata repository and a file instance recorded in the replica catalog), or whether the file should be known only locally, to the selection manager.
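
As a sketch of such an extraction step, the following uses the h5py interface to HDF5 to carve a subset out of a larger file instance; the file names, dataset path, and record count are hypothetical, and registration with the replica manager is reduced to a stub.

import h5py

def extract_subset(src_path: str, dst_path: str, dataset: str, count: int) -> None:
    # Copy only the first `count` records of one dataset into a new, smaller file instance.
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        dst.create_dataset(dataset, data=src[dataset][:count])

def register_with_replica_manager(path: str) -> None:
    # Stub: a real system would record the new file instance and its metadata here.
    print("registering new file instance:", path)

extract_subset("events_full.h5", "events_subset.h5", "events/energy", 100000)
register_with_replica_manager("events_subset.h5")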

Data selection with subsetting may exploit Grid-enabled servers, whose capabilities involve common operations such as reformatting data, extracting a subset, converting data for storage in a different type of system, or transferring data directly to another storage system in the Grid. The utility of this approach has been demonstrated as part of the Active Data Repository. The subsetting function could also exploit the more general capabilities of a computational Grid such as that provided by Globus. This offers the ability to support arbitrary extraction and processing operations on files as part of a data management activity.

Professional certification in computer security

Certification programs in computer security have been provided by government agencies, professional organizations, and private corporations. By examining the certification requirements set by these certification bodies, I hope to identify common themes that will provide useful insights into the design of a computer security curriculum. The identified certification programs include the Certified Information Systems Auditor (CISA) program, the Certified Information Systems Security Professional (CISSP) program, the SNAP program, and the SAGE program.

The Certified Information Systems Auditor (CISA®) program was established in 1978 by the Information Systems Audit and Control Association (ISACA). The CISA certification focuses on five domain areas: Information Systems Audit Standards and Practices and Information Systems Security and Control Practices (8%); Information Systems Organization and Management (15%); Information Systems Process (22%); Information Systems Integrity, Confidentiality, and Availability (29%); and Information Systems Development, Acquisition, and Maintenance (26%).

The Certified Information Systems Security Professional (CISSP) program was created by the International Information Systems Security Certification Consortium (ISC)², which is supported by the Computer Security Institute (CSI), the Information Systems Security Association (ISSA), the Canadian Information Processing Society (CIPS), and other industry organizations (Power 1997). CISSP certification requires participants to pass the CISSP exam, which consists of questions covering 10 test domains: Access Control Systems & Methodology; Computer Operations Security; Cryptography; Application & Systems Development; Business Continuity & Disaster Recovery Planning; Telecommunications & Network Security; Security Architecture & Models; Physical Security; Security Management Practices; and Law, Investigations & Ethics.

The SNAP program, administered by GIAC of the SANS Institute, is designed to serve people who are or will be responsible for managing and protecting important information systems and networks. The GIAC program consists of a Level One module covering the basics of information security, followed by advanced and targeted Level Two subject-area modules. The Level One module consists of 18 elements, including: Information Assurance Foundations; IP Concepts; IP Behavior; Internet Threats; and Computer Security Policies.

Shortest Path Algorithm for Multicast Routing in Multimedia Applications

A new heuristic algorithm is proposed for constructing a multicast tree for multimedia and real-time applications. The tree is used to concurrently transmit packets from a source to multiple destinations such that exactly one copy of any packet traverses the links of the multicast tree. Since multimedia applications require some Quality of Service (QoS), the multicast tree must satisfy two main goals: minimum path cost from the source to each destination (a shortest path tree) and a certain end-to-end delay constraint from the source to each destination. This problem is known to be NP-complete. The proposed heuristic algorithm solves it in polynomial time and gives a near-optimal tree. We first mention some related work in this area, then we formalize the problem and introduce the new algorithm with its pseudocode, a proof of its complexity, and a proof of its correctness, showing that it always finds a feasible tree if one exists. Other heuristic algorithms are examined and compared with the proposed algorithm via simulation.

Handling group communication is a key requirement for numerous applications in which one source sends the same information concurrently to multiple destinations. Multicast routing refers to the construction of a tree rooted at the source and spanning all destinations. Generally, there are two types of such trees: the Steiner tree and the shortest path tree. The Steiner tree, or group-shared tree, tends to minimize the total cost of the resulting tree; this is an NP-complete problem. The shortest path tree, or source-based tree, tends to minimize the cost of each path from the source to any destination; this can be achieved in polynomial time by using one of the two well-known algorithms of Bellman-Ford and Dijkstra and pruning the undesired links. Recently, with the rapid evolution of multimedia and real-time applications such as audio/video conferencing, interactive distributed games, and real-time remote control systems, certain QoS guarantees are needed in the resulting tree. One such QoS parameter, and the most important one, is the end-to-end delay between the source and each destination, where the information must be delivered within a certain delay constraint D. Adding this constraint to the original problem of multicast routing reformulates it: the multicast tree should be either a delay-constrained Steiner tree or a delay-constrained shortest path tree. The delay-constrained Steiner tree is an NP-complete problem; several heuristics have been introduced for it, each trying to obtain a near-optimal tree cost without regard to the cost of each individual path to each destination. The delay-constrained shortest path tree is also an NP-complete problem. An optimal algorithm for this problem has been presented, but its execution time is exponential and it is used only for comparison with other algorithms. A heuristic for this problem has been presented that tries to obtain a near-optimal tree from the point of view of each destination without regard to the total cost of the tree. An exhaustive comparison between the previous heuristics for the two problems can be found in the literature.

We investigate the problem of the delay-constrained shortest path tree since it is appropriate for applications such as Video on Demand (VoD), where the multicast group changes frequently and every user wants to receive his information at the lowest possible cost to him, without regard to the total cost of the routing tree. In addition, the shortest path tree always gives a lower average cost per destination than the Steiner tree. We present a new heuristic algorithm that finds the required tree in polynomial time.
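
To make the baseline concrete, here is a simplified Python sketch of the source-based construction: run Dijkstra on link cost, prune branches that reach no destination, and flag destinations whose accumulated delay exceeds the bound D. The repair of violating paths, which is the heart of the proposed heuristic, is deliberately not shown, and the example graph is hypothetical.

import heapq

def shortest_path_tree(graph, source, destinations, delay_bound):
    # graph: {u: [(v, cost, delay), ...]} for a directed network.
    INF = float("inf")
    cost, delay, parent = {source: 0.0}, {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost.get(u, INF):
            continue                      # stale heap entry
        for v, edge_cost, edge_delay in graph.get(u, []):
            if c + edge_cost < cost.get(v, INF):
                cost[v] = c + edge_cost
                delay[v] = delay[u] + edge_delay
                parent[v] = u
                heapq.heappush(heap, (cost[v], v))
    # Prune: keep only the links on least-cost paths from the source to a destination.
    tree_links = set()
    for d in destinations:
        node = d
        while parent.get(node) is not None:
            tree_links.add((parent[node], node))
            node = parent[node]
    violating = [d for d in destinations if delay.get(d, INF) > delay_bound]
    return tree_links, violating

graph = {"s": [("a", 1, 2), ("b", 4, 1)], "a": [("b", 1, 1), ("d1", 3, 5)],
         "b": [("d1", 1, 1), ("d2", 2, 2)]}
print(shortest_path_tree(graph, "s", ["d1", "d2"], delay_bound=6))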

Specialized Parallel Relational Operators

Some algorithms for relational operators are especially appropriate for parallel execution, either because they minimize data flow, or because they better tolerate data and execution skew. Improved algorithms have been found for most of the relational operators. The evolution of join operator algorithms is sketched here as an example of these improved algorithms.

Recall that the join operator combines two relations, A and B, to produce a third relation containing all tuple pairs from A and B with matching attribute values. The conventional way of computing the join is to sort both A and B into new relations ordered by the join attribute. These two intermediate relations are then compared in sorted order, and matching tuples are inserted in the output stream. This algorithm is called sort-merge join.
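
A minimal single-threaded sketch of sort-merge join on one attribute follows (tuples as Python dicts; the relation contents and the attribute name are illustrative, not tied to any particular system).

def sort_merge_join(a, b, key):
    # Sort both inputs on the join attribute, then merge the matching runs.
    a = sorted(a, key=lambda t: t[key])
    b = sorted(b, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][key] < b[j][key]:
            i += 1
        elif a[i][key] > b[j][key]:
            j += 1
        else:
            # Pair up the runs of equal join-attribute values on both sides.
            k, i2, j2 = a[i][key], i, j
            while i2 < len(a) and a[i2][key] == k:
                i2 += 1
            while j2 < len(b) and b[j2][key] == k:
                j2 += 1
            out.extend({**ta, **tb} for ta in a[i:i2] for tb in b[j:j2])
            i, j = i2, j2
    return out

A = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
B = [{"id": 2, "y": "u"}, {"id": 3, "y": "v"}]
print(sort_merge_join(A, B, "id"))   # one matching pair, id = 2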

Many optimizations of sort-merge join are possible, but since sorting has n log(n) execution cost, sort-merge join has an n log(n) execution cost. Sort-merge join works well in a parallel dataflow environment unless there is data skew. In case of data skew, some sort partitions may be much larger than others. This in turn creates execution skew and limits speedup and scaleup. These skew problems do not appear in centralized sort-merge joins.

Hash join is an alternative to sort-merge join. It has linear execution cost rather than n log(n) execution cost, and it is more resistant to data skew. It is superior to sort-merge join unless the input streams are already in sorted order. Hash join works as follows. Each of the relations A and B is first hash-partitioned on the join attribute. A hash partition of relation A is then hashed into a main-memory hash table. The corresponding partition of relation B is scanned, and each tuple is compared against the main-memory hash table for the A partition. If there is a match, the pair of tuples is sent to the output stream. Each pair of hash partitions is compared in this way.
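
The corresponding hash-join sketch, again on in-memory dicts, is below; in a real parallel system each (A, B) partition pair would be shipped to and joined by a different processor.

def hash_join(a, b, key, num_partitions=8):
    # Partition both relations on the join attribute.
    parts_a = [[] for _ in range(num_partitions)]
    parts_b = [[] for _ in range(num_partitions)]
    for t in a:
        parts_a[hash(t[key]) % num_partitions].append(t)
    for t in b:
        parts_b[hash(t[key]) % num_partitions].append(t)
    out = []
    for pa, pb in zip(parts_a, parts_b):
        table = {}
        for t in pa:                                   # build: hash the A partition into memory
            table.setdefault(t[key], []).append(t)
        for t in pb:                                   # probe: scan the matching B partition
            out.extend({**m, **t} for m in table.get(t[key], []))
    return out

A = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
B = [{"id": 2, "y": "u"}, {"id": 3, "y": "v"}]
print(hash_join(A, B, "id"))   # same result as the sort-merge version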

The hash join algorithm breaks a big join into many little joins. If the hash function is good and the data skew is not too bad, then there will be little variance in hash bucket size. In these cases hash join is a linear-time join algorithm with linear speedup and scaleup. Many optimizations of the parallel hash-join algorithm have been discovered over the last decade. In pathological skew cases, when many or all tuples have the same attribute value, one bucket may contain all the tuples. In these cases no algorithm is known that provides speedup or scaleup.

Direct Cryptanalytic Attacks

During a cryptanalytic attack the adversary observes the output and tries to gain any information about the inner state or future output of the generator. Many RNGs (random number generators) use cryptographic primitives such as hash functions (e.g. SHA-1 or MD5) or block ciphers (DES, Triple-DES, AES) to prevent this kind of attack. The underlying assumption is that the cryptographic security of the primitives transfers to the generators that employ them. Generally, the confidence in the security of these primitives is based only partially on mathematical analysis and mainly on empirical results and statistical tests. Since most of the applications that apply cryptographic RNGs rely on those primitives, we may have confidence in their security as well.

Nevertheless, it is not advisable to blindly trust generators that are built on cryptographic primitives, as the example of the Kerberos 4 session key generator shows. The specific method of employing the primitives has a major impact on the security of the generator as well. The Kerberos 4 generator produces a 56-bit key for a DES block cipher by two successive calls to the UNIX random function, which uses only a 32-bit seed. The random function is seeded every time a key is requested. Consequently, the strength of the encryption and, thus, the resistance against cryptanalytic attacks is reduced from 56 to 32 bits. It still takes about 6 hours on a DEC Alpha to recover the proper key of a plaintext-ciphertext pair by brute force, but we see that the 56-bit strength of the encryption is only an illusion. It is the weakest link in the chain that counts.
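
The keyspace argument can be made concrete with a toy Python sketch; the key-derivation function below is a hypothetical stand-in for "seed random() and draw a 56-bit DES key", not the real Kerberos 4 code, and the search space is truncated so the demo runs quickly.

import hashlib

def key_from_seed(seed: int) -> bytes:
    # Stand-in derivation: expand a 32-bit seed deterministically into 7 bytes (56 bits).
    return hashlib.sha256(seed.to_bytes(4, "big")).digest()[:7]

def brute_force_seed(target_key: bytes, limit: int = 2**20):
    # A real attack would sweep all 2**32 seeds, which is feasible; 2**56 keys are not.
    for seed in range(limit):
        if key_from_seed(seed) == target_key:
            return seed
    return None

secret_seed = 123456
print(brute_force_seed(key_from_seed(secret_seed)) == secret_seed)   # True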

Why is signing XML documents different?

Why rely on XML to solve the “what you see is what you sign” problem? Our ideas can be summarized in two points:

  1. If a document to be signed is either not well-formed in the sense of XML, or not valid in the sense of its accompanying schema, or both, then it must strictly be assumed that the document has been manipulated. In consequence, it has to be dropped, and the user has to be notified.
  2. A smart card application can extract certain content items from a structured and formally described document for display on the smart card reader. The extraction and display operations are fully controlled by the tamper-proof smart card, which is the same environment that generates the digital signature.

The fundamental property of XML documents is well-formedness. According to the XML specification, every XML processing entity has to check and assert this property. For digital signing, well-formedness is important because it ensures the uniqueness of an XML document’s interpretation. Well-formedness also ensures the usage of proper Unicode characters and the specification of their encoding. This is also very important for digital signatures, since character set manipulation can be used to perform “what you see is what you sign” attacks.

Validity is a much more restrictive property of XML documents compared to well-formedness. A smart card that checks the validity of XML documents with respect to a given schema before signing ensures, thanks to its tamper resistance, that only certain types of XML documents are signed. Consider for example a smart card that contains your private key but only signs XML documents that are valid with respect to a purchase order schema. You could give this card to your secretary and be sure that nothing other than purchase orders is signed using your signature. Using additional constraints in the schema, e.g. restricting the maximum amount to 100 Euro, eliminates the last chance of misuse.
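
Outside the card, the same pre-signing checks can be sketched in Python with lxml; the schema file name is a placeholder, and the actual signature computation is assumed to happen on the smart card.

from lxml import etree

def check_before_signing(xml_bytes: bytes, schema_path: str) -> bytes:
    # Well-formedness check: parsing fails for a malformed document.
    try:
        doc = etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError as err:
        raise ValueError(f"not well-formed, refusing to sign: {err}")
    # Validity check against the purchase-order schema.
    schema = etree.XMLSchema(etree.parse(schema_path))
    if not schema.validate(doc):
        raise ValueError(f"not valid, refusing to sign: {schema.error_log}")
    # Return a canonical form; the card would sign exactly these bytes.
    return etree.tostring(doc, method="c14n")

# canonical = check_before_signing(order_xml, "purchase-order.xsd")
# signature = smart_card_sign(canonical)   # placeholder for the on-card signing step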

When operated in a class 3 card reader (i.e. a card reader including a display and a keypad), the card can display selected content and request user confirmation. This finally solves the “what you see is what you sign” problem. Obviously, XML processing is not an easy task to perform on resource-constrained smart cards. The following table therefore summarizes the challenging XML properties and the resulting opportunities for improving the signing process.

Reducing Multi Block Queries to Single Block

The technique described in this section shows how, under some conditions, it is possible to collapse a multi-block SQL query into a single-block SQL query.

1. Merging Views

Let us consider a conjunctive query using SELECT ANY. If one or more relations in the query are views, but each is defined through a conjunctive query, then the view definitions can simply be “unfolded” to obtain a single block SQL query. For example, if a query Q = Join(R,V) and view V = Join(S,T), then the query Q can be unfolded to Join(R,Join(S,T)) and may be freely reordered. Such a step may require some renaming of the variables in the view definitions.

Figure 1. Group By and Join

Unfortunately, this simple unfolding fails to work when the views are more complex than simple SPJ queries. When one or more of the views contain SELECT DISTINCT, transformations to move or pull up DISTINCT need to be careful to preserve the number of duplicates correctly. More generally, when the view contains a group-by operator, unfolding requires the ability to pull up the group-by operator and then to freely reorder not only the joins but also the group-by operator to ensure optimality. In particular, we are given a query such as the one in Fig. 1(b) and we are trying to consider how we can transform it into a form such as Fig. 1(a) so that R1 and R2 may be freely reordered.

2. Merging Nested Subqueries

Consider the following example of a nested query, where Emp# and Dept# are keys of the corresponding relations:

SELECT Emp.Name
FROM Emp
WHERE Emp.Dept# IN
    (SELECT Dept.Dept# FROM Dept
     WHERE Dept.Loc = 'Denver'
     AND Emp.Emp# = Dept.Mgr)

If tuple iteration semantics are used to answer the query, then the inner query is evaluated once for each tuple of the Emp relation. An obvious optimization applies when the inner query block contains no variables from the outer query block (i.e., it is uncorrelated). In such cases, the inner query block needs to be evaluated only once. However, when there is indeed a variable from the outer block, we say that the query blocks are correlated. For example, in the query above, Emp.Emp# acts as the correlating variable. Techniques have been identified to unnest a correlated nested SQL query and “flatten” it to a single query. For example, the above nested query reduces to:

SELECT E.Name
FROM Emp E, Dept D
WHERE E.Dept# = D.Dept#
AND D.Loc = 'Denver' AND E.Emp# = D.Mgr

The complexity of the problem depends on the structure of the nesting, i.e., whether the nested subquery has quantifiers (e.g., ALL, EXISTS), aggregates, or neither. In the simplest case, of which the above query is an example, it has been observed that the tuple semantics can be modeled as Semijoin(Emp, Dept, Emp.Dept# = Dept.Dept#). Once viewed this way, it is not hard to see why the query may be merged, since:

Semijoin(Emp, Dept, Emp.Dept# = Dept.Dept#) = Project(Join(Emp, Dept), Emp.*)

where Join(Emp, Dept) is on the predicate Emp.Dept# = Dept.Dept#. The second argument of the Project operator indicates that all columns of the relation Emp must be retained. The problem is more complex when aggregates are present in the nested subquery, as in the example below, since merging query blocks now requires pulling up the aggregation without violating the semantics of the nested query:

SELECT Dept.name FROM Dept
WHERE Dept.num-of-machines ≥
    (SELECT COUNT(Emp.*) FROM Emp WHERE Dept.name = Emp.Dept_name)

It is especially tricky to preserve duplicates and nulls. To appreciate the subtlety, observe that if for a specific value of Dept.name (say d) there are no tuples with a matching Emp.Dept_name, i.e., even if the predicate Dept.name = Emp.Dept_name fails, then there is still an output tuple for the Dept tuple d. However, if we were to adopt the transformation used in the first query of this section, then there would be no output tuple for the dept d, since the join predicate fails. Therefore, in the presence of aggregation, we must preserve all the tuples of the outer query block by using a left outer join. In particular, the above query can be correctly transformed to:

SELECT Dept.name FROM Dept LEFT OUTER JOIN Emp
ON (Dept.name = Emp.Dept_name)
GROUP BY Dept.name
HAVING Dept.num-of-machines ≥ COUNT(Emp.*)

Thus, for this class of queries the merged single-block query has outer joins. If the nesting structure among query blocks is linear, then this approach is applicable and the transformations produce a single-block query that consists of a linear sequence of joins and outer joins.

METADATA

Metadata here describes the activity patterns and constraints of mobile databases. It consists of connectivity, availability, data access patterns, and user-defined constraints. The roles of metadata include:

  1. identify the data a user cares about (Domain): what data interests a user?
  2. specify the relative worth of data (Utility): what is its relative worth?
  3. indicate the power of the mobile host (Priority): what is my activity pattern?

The focused subset of metadata includes connection duration, disconnection frequency, read frequency, tentative update frequency, commit rate, and commit delay.

A sliding window scheme is used to implement incremental metadata collection. A sliding window is a time frame with a predefined time interval, the end point of the frame being the current local time. Window width is the time duration of the sliding window. Events in such a sliding window include connection, disconnection, read, tentative update, and update commitment. These events are logged in the time order in which they occur. Since the capacity of a mobile host is limited, we cannot afford to store them all from the very beginning, so the metadata collection algorithm is designed to log only the events in the time range of a sliding window. The log is called the bounded window log. Sliding windows consume events as well as store them. When an event occurs, if the sliding window has not reached its limit, the event is logged and its effect is reflected in the metadata. If the sliding window has grown to its upper limit, one or more of the oldest events are retired and their effects are eliminated from the metadata when the new event is added. In this way, the metadata always remembers the effects of events in the most recent bounded history.
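
A compact Python sketch of the bounded window log follows; the event names and the derived read-frequency statistic are illustrative, and a real implementation would track all of the metadata listed above.

import time
from collections import deque

class BoundedWindowLog:
    # Log events inside a sliding time window and keep per-type counts
    # incrementally up to date as old events retire.

    def __init__(self, window_width_s):
        self.window_width_s = window_width_s
        self.events = deque()      # (timestamp, event_type), oldest first
        self.counts = {}           # event_type -> count inside the window

    def log(self, event_type, now=None):
        now = time.time() if now is None else now
        self._retire(now)
        self.events.append((now, event_type))
        self.counts[event_type] = self.counts.get(event_type, 0) + 1

    def _retire(self, now):
        # Drop events that slid out of the window and undo their effect on the metadata.
        while self.events and now - self.events[0][0] > self.window_width_s:
            _, old_type = self.events.popleft()
            self.counts[old_type] -= 1

    def frequency(self, event_type):
        # Events of this type per second over the window, e.g. the read frequency.
        return self.counts.get(event_type, 0) / self.window_width_s

log = BoundedWindowLog(window_width_s=3600)
log.log("read"); log.log("tentative_update"); log.log("read")
print(log.frequency("read"))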

The overhead of collecting metadata for a replica is trivial. The main overhead is the cost to store the event and update the metadata when an event occurs. There is no message cost, since everything happens locally.

Building Personal Ethical Decision Making Abilities

Analyzing case studies provides an excellent mechanism for building practical ethical decision-making abilities. Bynum and Rogerson suggest a multistaged approach to case study analysis.

Clearly, a person’s key ethical principles will vary based on such factors as fundamental ethical approach (for example, consequentialism or virtue ethics) and can change depending on the normative values of one’s culture. Although the specific ethical issues raised by a case will depend on the combination of these ethical principles and the case in question, we have found the following generic questions to be useful in building ethical decision-making expertise across a wide range of cases and philosophies:

  1. Does the research aim to protect a specific population, and if so, which population (for example, the owners of infected hosts, the victims of secondary attacks, or the general Internet user)?
  2. Can studying malicious behavior achieve multiple simultaneous benefits to society (for example, developing new defenses while aiding investigation of criminal acts and assisting victimized network sites)?
  3. Who benefits more from publication of research findings, and in what order (for example, victims of criminal acts, the researchers themselves, or the criminals perpetrating computer crimes)?
  4. Is there another way to accomplish the desired research results?
  5. What is the safest way to disseminate research results without risking improper use by individuals who might not share the researchers’ ethical standards?

Out-of-Band Heap Metadata

The first requirement protects against use-after-free vulnerabilities with dangling pointers to free, not yet reallocated, memory. If the memory allocator uses freed memory for metadata, such as free-list pointers, these allocator metadata can be interpreted as object fields, e.g. vtable pointers, when free memory is referenced through a dangling pointer. Memory allocator designers have considered using out-of-band metadata before, because attackers have targeted in-band heap metadata in several ways: attacker-controlled data in freed objects can be interpreted as heap metadata through double-free vulnerabilities, and heap-based overflows can corrupt allocator metadata adjacent to heap-based buffers. If the allocator uses corrupt heap metadata during its linked-list operations, attackers can write an arbitrary value to an arbitrary location.

Although out-of-band heap metadata can solve these problems, some memory allocators mitigate heap metadata corruption without resorting to this solution. For example, attacks corrupting heap metadata can be addressed by detecting the use of corrupted metadata with sanity checks on free-list pointers before unlinking a free chunk, or by using heap canaries to detect corruption due to heap-based buffer overflows. In some cases, corruption can be prevented in the first place, e.g. by detecting attempts to free objects already in a free list. These techniques avoid the memory overhead of out-of-band metadata, but they are insufficient for preventing use-after-free vulnerabilities, where no corruption of heap metadata takes place.
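
For illustration, the classic safe-unlinking sanity check looks roughly like the following (a doubly linked free list modelled in Python rather than a real allocator; the class and function names are our own).

class Chunk:
    def __init__(self):
        self.prev = None
        self.next = None

def safe_unlink(chunk):
    # Refuse to unlink a chunk whose neighbours do not point back at it:
    # that is the symptom of corrupted in-band free-list metadata.
    if chunk.next is not None and chunk.next.prev is not chunk:
        raise RuntimeError("heap metadata corruption detected")
    if chunk.prev is not None and chunk.prev.next is not chunk:
        raise RuntimeError("heap metadata corruption detected")
    if chunk.prev is not None:
        chunk.prev.next = chunk.next
    if chunk.next is not None:
        chunk.next.prev = chunk.prev
    chunk.prev = chunk.next = None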

An approach to addressing this problem in allocator designs that reuse free memory for heap metadata is to ensure that these metadata point to invalid memory if interpreted as pointers by the application. Merely randomizing the metadata by XORing with a secret value may not be sufficient in the face of heap spraying. One option is setting the top bit of every metadata word to ensure it points to protected kernel memory, raising a hardware fault if the program dereferences a dangling pointer to heap metadata, while the allocator would flip the top bit back before using the metadata. However, it is still possible for the attacker to tamper with the dangling pointer before dereferencing it. This approach may be preferred when modifying an existing allocator design, but for Cling, we chose to keep metadata out-of-band instead.

An allocator can keep its metadata outside deallocated memory using non-intrusive linked lists (next and prev pointers stored outside objects) or bitmaps. Non-intrusive linked lists can have significant memory overhead for small allocations, so Cling uses a two-level allocation scheme: non-intrusive linked lists chain large memory chunks into free lists, and small allocations are carved out of buckets holding objects of the same size class using bitmaps. Bitmap allocation schemes have been used successfully in popular memory allocators aiming for performance, so they should not pose an inherent performance limitation.
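
A toy version of the two-level idea with out-of-band metadata follows (sizes and names are illustrative, not Cling's actual layout): a bucket holds fixed-size slots, and a separate bitmap, stored outside the bucket's payload, records which slots are in use.

class Bucket:
    # Fixed-size-class bucket; the allocation bitmap lives outside the payload,
    # so freed slots never hold allocator metadata that a dangling pointer could read.

    def __init__(self, slot_size, num_slots):
        self.slot_size = slot_size
        self.payload = bytearray(slot_size * num_slots)
        self.bitmap = [False] * num_slots        # out-of-band metadata

    def alloc(self):
        for i, used in enumerate(self.bitmap):
            if not used:
                self.bitmap[i] = True
                return i * self.slot_size        # offset of the slot inside the payload
        return None                              # bucket is full

    def free(self, offset):
        self.bitmap[offset // self.slot_size] = False

bucket = Bucket(slot_size=64, num_slots=128)
offset = bucket.alloc()
bucket.free(offset)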
