Posts Tagged ‘IT services

Computer Network Routing with a Fuzzy Neural Network

As more individuals transmit data through a computer network, the quality of service received by the users begins to degrade. A major aspect of computer networks that is vital to quality of service is data routing. A more effective method for routing data through a computer network can assist with the new problems being encountered with today’s growing networks. Effective routing algorithms use various techniques to determine the most appropriate route for transmitting data. Determining the best route through a wide area network (WAN), requires the routing algorithm to obtain information concerning all of the nodes, links, and devices present on the network. The most relevant routing information involves various measures that are often obtained in an imprecise or inaccurate manner, thus suggesting that fuzzy reasoning is a natural method to employ in an improved routing scheme. The neural network is deemed as a suitable accompaniment because it maintains the ability to learn in dynamic situations.

Once the neural network is initially designed, any alterations in the computer routing environment can easily be learned by this adaptive artificial intelligence method. The capability to learn and adapt is essential in today’s rapidly growing and changing computer networks. These techniques, fuzzy reasoning and neural networks, when combined together provide a very effective routing algorithm for computer networks. Computer simulation is employed to prove the new fuzzy routing algorithm outperforms the Shortest Path First (SPF) algorithm in most computer network situations. The benefits increase as the computer network migrates from a stable network to a more variable one. The advantages of applying this fuzzy routing algorithm are apparent when considering the dynamic nature of modern computer networks.

Applying artificial intelligence to specific areas of network management allows the network engineer to dedicate additional time and effort to the more specialized and intricate details of the system. Many forms of artificial intelligence have previously been introduced to network management; however, it appears that one of the more applicable areas, fuzzy reasoning, has been somewhat overlooked. Computer network managers are often challenged with decision-making based on vague or partial information. Similarly, computer networks frequently perform operational adjustments based on this same vague or partial information. The imprecise nature of this information can lead to difficulties and inaccuracies when automating network management using currently applied artificial intelligence techniques. Fuzzy reasoning will allow this type of imprecise information to be dealt with in a precise and well-defined manner, providing a more flawless method of automating the network management decision making process.

The objective of this research is to explore the use of fuzzy reasoning in one area of network management, namely the routing aspect of configuration management. A more effective method for routing data through a computer network needs to be discovered to assist with the new problems being encountered on today’s networks. Although traffic management is only one aspect of  configuration management, at this time it is one of the most visible networking issues. This becomes apparent as consideration is given to the increasing number of network users and the tremendous growth driven by Internet-based multimedia applications. Because of the number of users and the distances between WAN users, efficient routing is more critical in wide area networks than in LANs (also, many LAN architectures such as token ring do not allow any flexibility in the nature of message passing). In order to determine the best route over the WAN, it is necessary to obtain information concerning all of the nodes, links, and LANs present
in the wide area network. The most relevant routing information involves various measures regarding each link. These measures include the distance a message will travel, bandwidth available for transmitting that message (maximum signal frequency), packet size used to segment the message (size of the data group being sent), and the likelihood of a link failure. These are often measured in an imprecise or inaccurate manner, thus suggesting that fuzzy reasoning is a natural method to employ in an improved routing scheme.

Utilizing fuzzy reasoning should assist in expressing these imprecise network measures; however, there still remains the massive growth issue concerning traffic levels. Most routing algorithms currently being implemented as a means of transmitting data from a source node to a destination node cannot effectively handle this large traffic growth. Most network routing methods are designed to be efficient for a current network situation; therefore, when the network deviates from the original situation, the methods begin to lose efficiency. This suggests that an effective routing method should also be capable of learning how to successfully adapt to network growth. Neural networks are extremely capable of adapting to system changes, and thus will be applied as a second artificial intelligence technique to the proposed routing method in this research. The proposed routing approach incorporates fuzzy reasoning in order to prepare a more accurate assessment of the network’s traffic conditions, and hence provide a faster, more reliable, or more efficient route for data exchange. Neural networks will be incorporated into the routing method as a means for the routing method to adapt and learn how to successfully handle network traffic growth. The combination of these two tools is expected to produce a more effective routing method than is currently available.

In order to achieve the primary objective of more efficient routing, several minor objectives also need to be accomplished. A method of data collection is needed throughout the different phases of the study. Data collection will be accomplished through the use of simulation methods; therefore, a simulation model must be accurately designed before proceeding with experimenting or analysis. Additional requirements include building and training the neural network and defining the fuzzy system. The objective of this research is to demonstrate the effective applicability of fuzzy reasoning to only one area of network management, traffic routing.

Tags : , , , , , , , , , , , ,

Reasons for not using assembly code

There are so many disadvantages and problems involved in assembly programming that it is advisable to consider the alternatives before deciding to use assembly code for a particular task. The most important reasons for not using assembly programming are:

1. Development time : Writing code in assembly language takes much longer time than in a high level language.

2. Reliability and security : It is easy to make errors in assembly code. The assembler is not checking if the calling conventions and register save conventions are obeyed. Nobody is checking for you if the number of PUSH and POP instructions is the same in
all possible branches and paths. There are so many possibilities for hidden errors in assembly code that it affects the reliability and security of the project unless you have a very systematic approach to testing and verifying.

3. Debugging and verifying : Assembly code is more difficult to debug and verify because there are more possibilities for errors than in high level code.

4. Maintainability : Assembly code is more difficult to modify and maintain because the language allows unstructured spaghetti code and all kinds of dirty tricks that are difficult for others to understand. Thorough documentation and a consistent programming style is needed.

5. System code can use intrinsic functions instead of assembly : The best modern C++ compilers have intrinsic functions for accessing system control registers and other system instructions. Assembly code is no longer needed for device drivers and other system code when intrinsic functions are available.

6. Application code can use intrinsic functions or vector classes instead of assembly: The best modern C++ compilers have intrinsic functions for vector operations and other special instructions that previously required assembly programming. It is no longer necessary to use old fashioned assembly code to take advantage of the Single-Instruction-Multiple-Data (SIMD)  instructions.

7. Portability: Assembly code is very platform-specific. Porting to a different platform is difficult. Code that uses intrinsic functions instead of assembly are portable to all x86 and x86-64 platforms.

8. Compilers have been improved a lot in recent years : The best compilers are now better than the average assembly programmer in many situations.

9. Compiled code may be faster than assembly code because compilers can make inter-procedural optimization and whole-program optimization : The assembly programmer usually has to make well-defined functions with a well-defined call interface that obeys all calling conventions in order to make the code testable and verifiable. This prevents many of the optimization methods that compilers use, such as function inlining, register allocation, constant propagation, common subexpression elimination across functions, scheduling across functions, etc. These advantages can be obtained by using C++ code with intrinsic functions instead of
assembly code.


Tags : , , , , , , , , ,

Fabric Shortest Path First

FSPF (Fabric Shortest Path First) is a Link State path selection protocol, similar to OSPF, which is an Interior Gateway Protocol (IGP) widely used in IP networks.This protocol keeps track of the state of the links on all switches in the Fabric. It also associates a cost with eachlink. The protocol computes paths from a switch to all the other switches in the fabric, by adding the cost of allthe links traversed by the path, and choosing the path that minimizes the cost. In order for the protocol to work,the cost must be a positive integer number. The collection of link states (including cost) of all the switches in a fabric constitutes the topology database.

FSPF has four major components:

• A Hello protocol, used to establish connectivity with a neighbor switch, to establish the identity of the neighborswitch, and to exchange FSPF parameters and capabilities.

• A replicated topology database, with the protocols and mechanisms to keep the databases synchronizedacross the fabric.

• A path computation algorithm.

• A routing table update.

The topology database synchronization in turn consists of two major components: an initial database synchronization,and an update mechanism. The initial database synchronization is used when a switch is booted, or whenan Inter Switch Link (ISL) comes up. The update mechanism is used in two circumstances:

• When there is a link state change, for example an ISL going down or coming up.

• On a periodic basis, to prevent switches from deleting topology information from the database.

The path computation algorithm can be any algorithm that yields a minimum cost path. A path selection protocol requires an addressing scheme to uniquely identify every switch in the Fabric. FSPF supports the FC-SW addressing scheme, with a 24-bit Fibre Channel address subdivided in three parts, with 8bits used as switch identifier. However, FSPF uses a 32-bit field as a switch identifier, so that other addressing schemes could be supported as well in the future.

Three messages are defined: Hello (HLO), Link State Update (LSU) and Link State Acknowledgment (LSA). The message formats are listed below. All the messages carry a Protocol Version field, to insure backward compatibilitywith future versions, and a command field to discriminate between messages, ‘Reserved’ fields should be set to 0 by the sender. Undefined bits in bitmap fields should be set to 0 by the sender.

Currently defined commands are:

Table 1, Commands

FSPF Header

The format of the messages header is:

Table 2, FSPF Header


Version :  The current version of FSPF, which is 2.

Section ID :  Identifies a Section, that is a set of switches that share an identical topology database.It is used for hierarchical routing. It should be set to 0 for a non-hierarchical fabric.

Authentication Type :  The type of authentication to be used in the ‘Authentication’ field. It should be set to 0for no authentication.

Originating Domain ID :  The ID of the switch that transmitted the message.

Authentication :  The Authentication string. It should be 0 if Authentication Type is 0.

Hello Message Format

The format of the Hello message is:

Table 3, Hello Message


Options :  Bit map that indicates the options supported by the sending switch. An option is supported if the corresponding bit is set to 1. All other bits should be 0. Unicast routing issupported by default, so it does not require an Option bit.

Hello Interval :  The interval in seconds between two consecutive Hello messages that the sending switch will transmit for the life of the adjacency.

Dead Interval :  The maximum interval, in seconds, that the sending switch will wait for two consecutive Hello messages from the neighbor switch, before declaring the adjacency down.

Recipient Domain ID :  The ID of the switch that the Hello message is transmitted to.

Originating Port ID :  The port ID that the Hello message is transmitted to.

Tags : , , , , , , , , , ,

More Efficient Distance Vector Routing Algorithm

A more efficient distance vector routing algorithm(EDVA) for computer networks is presented. EDVA is based on enhancements to two classes of distance vector algorithms: the path finding algorithms that report complete path information incrementally, and the diffusing update algorithm (DUAL) which is used in Cisco’s Enhanced Interior Gateway Routing Protocol (EIGRP) and is based on diffusing computations for inter nodal synchronization. EDVA operates by specifying a distance, a feasible distance, a predecessor and diffusing hops toeach destination. This information is used together withdiffusing computations of limited span to avoid routing table loops. The correctness of EDVA is discussed, and its benefits compared with DUAL and path finding algorithms are illustrated.

Information  Exchanged and Maintained in EDVA

Each router maintains a link-cost table, a distance table and a routing table. The link-cost table lists the cost of each link adjacent to the router. The cost of the link from i to n is denoted by Lki and is considered to be infinity when the link fails. The distance table at each router i is a matrix containing, for each destination j and for each neighbor n of router i, the distance, the feasible distance (defined subsequently), the initial diffusing step number (defined subsequently), the successor and the predecessor reported by router n. As Table 1 states, these variables are denoted by Djn, FDjn, IMjn, Sjn and Pjn, respectively.

The routing table at router i is a column vector containing, for each destination j, the minimum distance(denoted by Dji), the feasible distance (denoted by FDji),  the predecessor (denoted by Pji), the successor (denoted bySji), the initial diffusing step number (denoted by IMji), diffusing step number (denoted by Mji) and a marker(denoted by TAGji) used to update the routing table. For destination j, TAGji specifies whether the entry corresponds to a simple path (TAGji = correct), a loop(TAGji = error) or a destination that has not been marked (TAGji = null).

Table 1.    Notation

If any routing information is changed at a router, it sends this information in an update message to certain neighbors. An update message from router i can consist ofone or more entries for one or more destinations, respectively. Each entry specifies an update flag (denotedby Uji), a destination, the predecessor, the distance, andthe feasible distance for that destination.

The update flag indicates whether the entry is an update (Uji =0), a query(Uji =1) or a reply to a query (Uji = 2). In the specification of EDVA, the successor to destination j for a router is simply referred to as the successor of the router, and the same reference applies to other information maintained by a router. Similarly,updates, queries and replies refer to destination j, unless stated otherwise.  When router i receives an input event regarding its neighbor n (an update message from neighbor n or a change in the cost or status of link (i, n)), it updates its destination table and link-cost table accordingly.

Tags : , , , , , , , , , ,

Shortest Path First with Emergency Exits

Under heavy and dynamic traffic, the SPF routing algorithm often suffers from wild oscillation and severe congestion, and results in degradation of the network performance. Here we presenta new routing algorithm (SPF-EE) which attempts to eliminate the problems associated with the SPF algorithm by providing alternate paths as emergency exits.With the SPF-EE algorithm, traffic is routed along the shortest-paths under normal condition. However, in the presence of congestion and resource failures, the traffic can be dispersed temporarily to alternate paths without route re-computation. Simulation experiments show that the SPF-EE algorithm achieves grater throughput, higher responsiveness, better congestion control and fault tolerance, and substantially improves the performance of routing in a dynamic environment.

A distributed routing algorithm can be decomposed into four procedures: distance measurement, information updating, route computation and packet forwarding. The distance measurement procedure monitors and collects certain network parameters according to the particular routing metric used. The collected information is distributed over the entire network by the information distribution procedure. In each node, the route computation procedure then constructs the routing table based on the received information and the packet forwarding procedure actually routes the traffic to the next hop. The new routing algorithm is an improvement over the SPF algorithm. It uses the same distance measurement procedure and information updating procedure as the SPF algorithm. Each node measures the actual delay to its neighbors and periodically broadcasts the changes to all other nodes in the network. However, the new routing algorithm is equipped with more sophisticated route computation procedure and packet forwarding procedure to deal with heavy and dynamic traffic.

The problem with the SPF algorithm is that there are no mechanisms to alter the routing other than updating the routing tables while route updating is too slow and costly for responding to traffic fluctuations. Under heavy traffic load, frequent route updating may also lead to instability. To solve this dilemma, the new routing algorithm, maintains a stable routing table and meanwhile provides alternate paths to disperse traffic when it is accumulating in some points of the network. When congestion and network failures do occur, instead of initiating route updating, the node forwards the traffic along the alternate paths temporarily and pass around the congested or failed areas. If the changes are persistent, the routing tables will be updated eventually when the next route updating time is due.

In the new routing algorithm, the alternate paths are only used as emergency exits when the shortest paths are experiencing problems. In normal conditions, the new routing algorithm performs exactly the same as the SPF algorithm does. Because of this feature, we call the new routing algorithm Shortest Path First with Emergency Exits (SPF-EE). It should be emphasized that the alternate path as anemergency exit does not have to be the shorest path to theparticular destination, nor does it have to be disjoint fromthe current shortest path. When congestion or networkfailures occur, the primary goal is to !ind an alternativepath for the traflic and avoid the packets accumulating inthe network.


Tags : , , , , , , , , , ,

Parallel Dataflow Approach to SQL Software

Terabyte online databases, consisting of billions of records, are becoming common as the price of online storage decreases. These databases are often represented and manipulated using the SQL relational model. A relational database consists of relations (files in COBOL terminology) that in turn contain tuples (records in COBOL terminology). All the tuples in a relation have the same set of attributes (fields in COBOL terminology).

Relations are created, updated, and queried by writing SQL statements. These statements are syntactic sugar for a simple set of operators chosen from the relational algebra. Select project, here called scan, is the simplest and most common operator – it produces a row-and column subset of a relational table. A scan of relation R using predicate P and attribute list L produces a relational data stream as output. The scan reads each tuple, t, of R and applies thepredicate P to it. If P(t) is true, the scan discards any attributes of t not in L and inserts the resulting tuple in the scan output stream. Expressed in SQL, a scan of a telephone book relation to find the phone numbers of all people named Smith would be written:

SELECT     telephone_number      /* the output attribute(s) */

FROM        telephone_book           /* the input relation */

WHERE     last_name = ‘Smith’;   /* the predicate */

A scan’s output stream can be sent to another relational operator, returned to an application, displayed on a terminal, or printed in a report. There in lies the beauty and utility of the relational model. The uniformity of the data and operators allow them to be arbitrarily composed into data flow graphs. The output of a scan may be sent to a sort operator that will reorder the tuples based onan attribute sort criteria, optionally eliminating duplicates. SQL defines several aggregate operators to summarize attributes into a single value, for example, taking the sum, min, or maxof an attribute, or counting the number of distinct values of the attribute. The insert operator adds tuples from a stream to an existing relation. The update and delete operators alter and delete tuples in a relation matching a scan stream.

The relational model defines several operators to combine and compare two or more relations. It provides the usual set operators union, inter section, difference, and some more exoticones like join and division. Discussion here will focus on the equi-join operator (here called join). The join operator composes two relations, A and B, on some attribute to produce a third relation. For each tuple, ta, in A, the join finds all tuples, tb, in B with attribute value equal to that of ta. For each matching pair of tuples, the join operator inserts into the output steam a tuple built by concatenating the pair. Codd, in a classic paper, showed that the relational data model can represent any form of data, and that these operators are complete [CODD70]. Today, SQL applications are typically a combination of conventional programs and SQL statements. The programs interact with clients, perform data display, and provide high-level direction of the SQL dataflow.

The SQL data model was originally proposed to improve programmer productivity by offering a non-procedural database language. Data independence was and additional benefit; since the programs do not specify how the query is to be executed, SQL programs continue to operate as the logical and physical database schema evolves. Parallelism is an unanticipated benefit of the relational model. Since relational queries are really just relational operators applied to very large collections of data, they offer many opportunities for parallelism. Since the queries are presented in a non-procedural language, they offer considerable latitude in executing the queries. Relational queries can be executed as a dataflow graph. As mentioned in the introduction, these graphs can use both pipelined parallelism and partitioned parallelism. If one operator sends its output to another, the two operators can execute in parallel giving potential speedup of two.

The benefits of pipeline parallelism are limited because of three factors: (1) Relational pipelines are rarely very long – a chain of length ten is unusual. (2) Some relational operators donot emit their first output until they have consumed all their inputs. Aggregate and sort operators have this property. One cannot pipeline these operators. (3) Often, the execution cost of one operator is much greater than the others (this is an example of skew). In such cases, the speedup obtained by pipelining will be very limited. Partitioned execution offers much better opportunities for speedup and scaleup. Bytaking the large relational operators and partitioning their inputs and outputs, it is possible to use divide-and-conquer to turn one big job into many independent little ones. This is an ideal situation for speedup and scaleup. Partitioned data is the key to partitioned execution.

Tags : , , , , , , , , , ,

Semantic optimization of OQL queries

This explores all the phases of developing a query processor for OQL, the Object Query Language proposed by the Object Data Management Group (ODMG 3.0). There has been a lot of research on the execution of relational queries and their optimization using syntactic or semantic transformations. However, there is no context that has integrated and tested all the phases of processing an object query language, including the use of semantic optimization heuristics. This research is motivated by the need for query execution tools that combine two valuable properties: i) the expressive power to encompass all the features of the object-oriented paradigm and ii) the flexibility to benefit from the experience gained with relational systems, such as the use of semantic knowledge to speedup query execution. The contribution of this work is two fold. First, it establishes a rigorous basis for OQL by defining a type inference model for OQL queries and proposing a complete framework for their translation into calculus and algebraic representations. Second, in order to enhance query execution it provides algorithms for applying two semantic optimization heuristics: constraint introduction and constraint elimination techniques. By taking into consideration a set of association rules with exceptions, it is possible to add or remove predicates from an OQL query, thus transforming it to a more efficient form.

We have implemented this framework, which enables us to measure the benefits and the cost of exploiting semantic knowledge during query execution. The experiments showed significant benefits, especially in the application of the constraint introduction technique. In contexts where queries are optimized once and are then executed repeatedly, we can ignore the cost of optimization, and it is always worth carrying out the proposed transformation. In the context of adhoc queries the cost ofthe optimization becomes an important consideration. We have developed heuristics to estimate the cost as well as the benefits of optimization. The optimizer will carry out a semantic transformation only when the overhead is less than the expected benefit. Thus transformations are performed safely even with adhoc queries. The framework can often speed up the execution of an OQL query to a considerable extent.

Semantic optimization is a third important aspect of our query processing framework. The underlying idea is to optimize a query based on semantic knowledge in the form of association rules. This first requires identifying the rules using standard data mining methods; these rules must then be combined into forms suitable for the optimization purposes. Finally, assessing the performance benefits of semantic optimization requires estimating the costand the scalability of the methods developed.

The query language used in our framework is OQL, the Object Query Language proposed as part ofthe Object Data Standard (version 3.0) published by the ODMG [CBB+00]. This standard provides specifications for storing and retrieving objects in databases. Its objective is to encompass all the important features of the OO paradigm and to ensure portability across database systems that complywith it. It consists of specifications of the following components:

1.  the Object Model defines the type system of the object paradigm,

2. the Object Definition Language (ODL) is a specification language used to define the classes of a particular schema,

3. the Object Interchange Format (OIF) is a specification language used to dump and load the state of the database to and from a file,

4. the Object Query Language (OQL) is a functional language that allows the user to query objects,

5. C++, Java, Smalltalk bindings enable developers to write code in the respective language to manipulate persistent objects.

It is a functional language, in which queries can be composed to an arbitrary depth, as long as their operands and results respect the type system defined in the Object Model. Many of the constructs of OQL are syntactically similar to those of SQL-92; however, OQL has a number of object-oriented features. First, it deals with complex objects, i.e. objects containing or being related to other objects,or containing collections (or structures) of objects. Secondly, it enables users to navigate over the objects using path expressions of arbitrary length. Thirdly, OQL allows the comparison of objects based on their object identity instead of their content values. Fourthly, it enables method invocations within a query. Fifthly, it manipulates a series of collections (sets, bags, lists, arrays, dictionaries) with the aid of additional language constructs. Sixthly, it handles polymorphic collections, i.e. collections with objects belonging to different classes. The late binding mechanism enables users to perform generic operations on the elements of these collections. The casting mechanism allows them to declare explicitly the class of an object, going down the class hierarchy.

The expressive power of OQL makes it an interesting query language to develop. In addition to its independent use as a query language, it can potentially be embedded in any object-oriented programming language. The object-oriented nature of OQL makes the gap between the query and the programming language much narrower than in the relational context. In certain cases, it allows the programmer to perform exactly the same operation by either using the commands available in the binding of the programming language, or by embedding an OQL query in it.

Tags : , , , , , , , , , , , , ,

Determining InnoDB Resource Requirements

InnoDB clearly requires far more memory for these reasons,  but it gets slightly difficult to pin down exactly how much more memory. This is true for several reasons:

a. How did you load your database?

InnoDB table size is not a constant. If you took a straight SQL dump from a MyISAM table and inserted it into an InnoDB table, it is likely larger than it really needs to be. This is because the data was loaded out of primary key order and the index isn’t tightly packed because of that. If you took the dump with the ­­order ­by ­primary argument to mysql dump, you likely have a much smaller table and will need less memory to buffer it.

b. What exactly is your table size?

This is an easy question to answer with MyISAM: that information is directly in the output of “SHOW TABLE STATUS”. However, the numbers from that same source for InnoDB are known to be estimates only. The sizes shown are the physical sizes reserved for the tables and have nothing to do with the actual data size at that point. Even the row count is a best guess.

c. How large is your primary key?

It was mentioned above that InnoDB clusters the data for a table around the primary key.This means that any secondary index leaves must contain the primary key of the data they “point to.” Thus, if you have tables with a large primary key, you will need more memory to buffer a secondary index and more disk space to hold them. This is one of the reasons some people argue for short “artificial” primary keys for InnoDB tables when there isn’t one “natural” primary key.

In summary, there is no set method that will work for everyone to predict the needed resources. Worse than that, your needed resources will change with time as more inserts to your table increase its size and fragment the packing of the BTree. The best advice to be offered is use the mysql report tool available here ( to monitor your available innodb_buffer_pool and adjust accordingly, the most important InnoDB tunable. It is important to not run at 100% usage of the innodb buffer, as this likely means that you’re not buffering as much as you could for reads, and that you are starving your write buffer which also lives in the same global innodb_buffer.

Tags : , , , , , , , , , , ,

Constrained Shortest Path Algorithm for Multicast Routing in Multimedia Applications

A  new  heuristic  algorithm  is  proposed  for  constructing multicast  tree  for multimedia  and  real-time  applications. The  tree  is  used  to  concurrently  transmit  packets  from source to multiple destinations such that exactly one copy of  any  packet  traverses  the  links  of  the  multicast  tree. Since  multimedia  applications  require  some  Quality  of Service,  QoS,  a  multicast  tree  is  needed  to  satisfy  two main  goals,  the minimum  path  cost  from  source  to  each destination (Shortest Path Tree) and a certain end-to-end delay  constraint  from  source  to  each  destination.  This problem  is  known  to  be  NP-Complete.  The  proposed heuristic algorithm solves this problem in polynomial time and gives near optimal tree. We first mention some related work  in  this  area  then  we  formalize  the  problem  and introduce the new algorithm with its pseudo code and the proof of its complexity and its correctness by showing that it always finds a feasible tree if one exists. Other heuristic algorithms are examined and compared with the proposed algorithm via simulation.

Shortest  path  tree  or source-based trees tends to minimize the cost of each path from  source  to  any  destination,  this  can  be  achieved  in polynomial  time  by  using  one  of  the  two  famous algorithms of  Bellman  and Dijkstra  and pruning the undesired  links.  Recently,  with  the  rapid  evolution  of multimedia  and  real-time  applications  like  audio / video conferencing,  interactive  distributed  games  and  real-time remote control system, certain QoS need to be guaranteed in the resulted tree. One such QoS, and the most important one,  is  the  end-to-end  delay  between  source  and  each destination, where  the  information must  be  sent within  a certain delay constraint D. By adding this constraint to the original  problem  of  multicast  routing,  the  problem  is reformulated and the multicast tree should be either delayconstrained Steiner tree, or delay-constrained shortest path tree.  Delay  constrained  Steiner  tree  is  an  NP-Complete problem,  several  heuristics  are  introduced  for  this problem each trying to get near optimal tree cost, without regarding to the cost of each individual path for  each  destination. Delay-constrained  shortest  path  tree is  also  an  NP-Complete  problem.  An  optimal algorithm  for  this  problem  is  presented  at,  but  its execution  time  is  exponential  and  used  only  for comparison  with  other  algorithms.  Heuristic  for  this problem  is  presented  in, which  tries  to  get  a  near optimal  tree  from  the  point  of  view  of  each  destination without regarding the total cost of the tree. An exhaustive comparison  between  the  previous  heuristics  for  the  two problems  can  be  found  in.  We investigate  the problem of delay constrained  shortest path tree since it is appropriate in some applications like Video on  Demand  (VoD),  where  the  multicast  group  has  a frequent  change,  and  every  user  wants  to  get  his information  in  the  lowest  possible  cost  for  him  without regarding  the  total  cost  of  the  routing  tree. Also  shortest path  tree  always  gives  average  cost  per  destination  less than  Steiner  tree.


Tags : , , , , , , , , , , , , , , ,

Distributed Query Optimization

The distributed query optimization is one of the hardest problems in the database area. The great commercial success of database systems is partly due to the development of sophisticated query optimization technology where users pose queries in a declarative way using SQL or OQL and the optimizer of the database system finds a good way (i. e. plan) to execute these queries. The optimizer, for example, determines which indices should be used to execute a query and in which order the operations of a query (e. g. joins, selects, and projects) should be executed. To this end, the optimizer enumerates alternative plans, estimates the cost of every plan using a cost model, and chooses the plan with lowest cost. There has been much research into this field. Here we study the problem of distributed query optimization; we focus on the basic components of the distributed query optimizer, i. e. search space, search strategy, and cost model. A survey of the available work into this field is given. Finally, some future work is highlighted based on some recent work that uses mobile agent technologies.

A database is physically distributed across different sites by fragmenting and replicating the data. Fragmentation subdivides each relation into horizontal fragments using select operation or vertical fragments using project operation. Fragmentation is desirable because it enables the placement of data in close proximity to its place of use. Each fragment may also be replicated to a number of sites. This is preferable when the same data is accessed from applications that run at a number of sites. The great commercial success of database systemsis partly due to the development of sophisticated query optimization technology, where users pose queries in a declarative way using SQL or OQL and the optimizer of the database system finds a good way (i. e. plan) to execute these queries. The optimizer, for example, determines which indices should be used to execute a query and in which order the operations of a query (e.g. joins, selects, and projects) should be executed. To this end, the optimizer enumerates alternative plans, estimates the cost of every plan using a cost model, and chooses the plan with lowest cost. Selecting the optimal execution strategy for a query is NP-hard in the number of relations. For complex queries with many relations, this incurs a prohibitive optimization cost. Therefore, the actual objective of the optimizer is to find a strategy close to optimal and to avoid bad strategies. The selection of the optimal strategy generally requires the prediction of execution cost of the alternative candidate ordering prior to actually executing the query. The execution cost is expressed as a weighted combination of I/ O, CPU, and communication costs.

In distributed query optimization two more steps are involved between query decomposition and query optimization: Data localization and global query optimization. The input to data localization is the initial algebraic query generated by the query decomposition step. The initial algebraic query is specified on global relations irrespective of their fragmentation or   distribution. The main role of data localization is to localize thequery using data distributed information. In this step, the fragments that are involved in the query are determined and the query is transformed into one that operates on fragments rather than global relations. Thus during the data localization step, each global relation is first replaced by its localization program,which is union of the fragment of a horizontally or vertically fragment query, and then the resulting fragment query is simplified and restructured to produce another good query. Simplification and restructuring may be done according to the same rules used in the decomposition step. The final fragment query is generally far from optimal; this process only eliminates bad queries.

The input to the third step is a fragment query, that is an algebraic query on fragments. By permuting the ordering of operations within one fragment query, many equivalent query execution plans may be found. The goal of query optimization is to find an execution strategy for the query that is close optimal. An execution strategy for a distributed query can be described with relational algebra operations and communication primitives (send/ receive operations) for transferring data between sites. The query optimizer that follows this approach is seen as three components: A search space, a search strategy and a cost model. The search space is the set of alternative execution to represent the input query. These strategies are equivalent, in the sense that they yield the same result but they differ on the execution order of operations and the way these operations are implemented. The search strategy explores the search space and selects the best plan. It defines which plans are examined and in which order. The cost model predicts the cost of a given execution plan which may consist of the following components.

· Secondary storage cost: This is the cost of searching for reading and writing data blocks on secondary storage.

· Memory storage cost: This is the cost pertaining to the number of memory buffers needed during query execution.

· Computation cost: This the cost of performing in memory operations on the data buffers during query optimization.

· Communication cost: This is the cost of shipping the query and its results from the database site to the site or terminal where the query originated.

Tags : , , , , , , , , , , , , , , , , ,