
Computer Network Routing with a Fuzzy Neural Network

As more individuals transmit data through a computer network, the quality of service received by the users begins to degrade. A major aspect of computer networks that is vital to quality of service is data routing. A more effective method for routing data through a computer network can help address the new problems encountered on today’s growing networks. Effective routing algorithms use various techniques to determine the most appropriate route for transmitting data. Determining the best route through a wide area network (WAN) requires the routing algorithm to obtain information concerning all of the nodes, links, and devices present on the network. The most relevant routing information involves various measures that are often obtained in an imprecise or inaccurate manner, which suggests that fuzzy reasoning is a natural method to employ in an improved routing scheme. A neural network is a suitable accompaniment because it can learn in dynamic situations.

Once the neural network is initially designed, any alterations in the routing environment can easily be learned by this adaptive artificial intelligence method. The capability to learn and adapt is essential in today’s rapidly growing and changing computer networks. Combined, fuzzy reasoning and neural networks provide a very effective routing algorithm for computer networks. Computer simulation is employed to show that the new fuzzy routing algorithm outperforms the Shortest Path First (SPF) algorithm in most computer network situations. The benefits increase as the computer network migrates from a stable network to a more variable one. The advantages of applying this fuzzy routing algorithm are apparent when considering the dynamic nature of modern computer networks.

Applying artificial intelligence to specific areas of network management allows the network engineer to dedicate additional time and effort to the more specialized and intricate details of the system. Many forms of artificial intelligence have previously been introduced to network management; however, one of the more applicable areas, fuzzy reasoning, has been somewhat overlooked. Computer network managers are often challenged with decision-making based on vague or partial information. Similarly, computer networks frequently perform operational adjustments based on this same vague or partial information. The imprecise nature of this information can lead to difficulties and inaccuracies when automating network management using currently applied artificial intelligence techniques. Fuzzy reasoning allows this type of imprecise information to be handled in a precise and well-defined manner, providing a more reliable way of automating the network management decision-making process.

The objective of this research is to explore the use of fuzzy reasoning in one area of network management, namely the routing aspect of configuration management. A more effective method for routing data through a computer network is needed to address the new problems being encountered on today’s networks. Although traffic management is only one aspect of configuration management, at this time it is one of the most visible networking issues. This becomes apparent when considering the increasing number of network users and the tremendous growth driven by Internet-based multimedia applications. Because of the number of users and the distances between WAN users, efficient routing is more critical in wide area networks than in LANs (also, many LAN architectures such as token ring do not allow any flexibility in the nature of message passing). In order to determine the best route over the WAN, it is necessary to obtain information concerning all of the nodes, links, and LANs present in the wide area network. The most relevant routing information involves various measures regarding each link. These measures include the distance a message will travel, the bandwidth available for transmitting that message (maximum signal frequency), the packet size used to segment the message (size of the data group being sent), and the likelihood of a link failure. These are often measured in an imprecise or inaccurate manner, which suggests that fuzzy reasoning is a natural method to employ in an improved routing scheme.
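As a rough illustration of how such imprecise link measures might be combined, the sketch below maps delay, available bandwidth and failure likelihood onto fuzzy membership values and defuzzifies them into a single link cost. It is only a sketch of the general idea, not the membership functions used in this research; the breakpoints and weights are assumptions chosen for illustration.

#include <iostream>

// Shoulder membership function: 0 below lo, 1 above hi, linear in between.
double highMembership(double x, double lo, double hi) {
    if (x <= lo) return 0.0;
    if (x >= hi) return 1.0;
    return (x - lo) / (hi - lo);
}

// Fuzzy "badness" of a link derived from three imprecise measures.
// Breakpoints and weights are illustrative assumptions, not measured values.
double fuzzyLinkCost(double delayMs, double freeBandwidthMbps, double failureLikelihood) {
    double slow       = highMembership(delayMs, 5.0, 100.0);                  // degree "link is slow"
    double congested  = 1.0 - highMembership(freeBandwidthMbps, 1.0, 50.0);   // degree "link is congested"
    double unreliable = highMembership(failureLikelihood, 0.0, 0.2);          // degree "link is unreliable"

    // Weighted defuzzification into a single crisp cost that a
    // shortest-path style routing computation can consume.
    return 0.4 * slow + 0.4 * congested + 0.2 * unreliable;
}

int main() {
    std::cout << fuzzyLinkCost(20.0, 10.0, 0.05) << "\n";  // moderately loaded link
    std::cout << fuzzyLinkCost(80.0, 2.0, 0.15) << "\n";   // heavily loaded, flaky link
}

A neural network could then adjust such weights and breakpoints as traffic conditions change, which is the adaptive role proposed for it below.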

Utilizing fuzzy reasoning should assist in expressing these imprecise network measures; however, there still remains the issue of massive growth in traffic levels. Most routing algorithms currently being implemented as a means of transmitting data from a source node to a destination node cannot effectively handle this large traffic growth. Most network routing methods are designed to be efficient for a current network situation; therefore, when the network deviates from the original situation, the methods begin to lose efficiency. This suggests that an effective routing method should also be capable of learning how to successfully adapt to network growth. Neural networks are extremely capable of adapting to system changes, and thus will be applied as a second artificial intelligence technique to the proposed routing method in this research. The proposed routing approach incorporates fuzzy reasoning in order to prepare a more accurate assessment of the network’s traffic conditions, and hence provide a faster, more reliable, or more efficient route for data exchange. Neural networks will be incorporated into the routing method as a means for the routing method to adapt and learn how to successfully handle network traffic growth. The combination of these two tools is expected to produce a more effective routing method than is currently available.

In order to achieve the primary objective of more efficient routing, several minor objectives also need to be accomplished. A method of data collection is needed throughout the different phases of the study. Data collection will be accomplished through the use of simulation methods; therefore, a simulation model must be accurately designed before proceeding with experimenting or analysis. Additional requirements include building and training the neural network and defining the fuzzy system. The objective of this research is to demonstrate the effective applicability of fuzzy reasoning to only one area of network management, traffic routing.


Reasons for not using assembly code

There are so many disadvantages and problems involved in assembly programming that it is advisable to consider the alternatives before deciding to use assembly code for a particular task. The most important reasons for not using assembly programming are:

1. Development time : Writing code in assembly language takes much longer than writing it in a high-level language.

2. Reliability and security : It is easy to make errors in assembly code. The assembler does not check whether the calling conventions and register save conventions are obeyed. Nobody checks for you whether the number of PUSH and POP instructions is the same in all possible branches and paths. There are so many possibilities for hidden errors in assembly code that it affects the reliability and security of the project unless you have a very systematic approach to testing and verifying.

3. Debugging and verifying : Assembly code is more difficult to debug and verify because there are more possibilities for errors than in high level code.

4. Maintainability : Assembly code is more difficult to modify and maintain because the language allows unstructured spaghetti code and all kinds of dirty tricks that are difficult for others to understand. Thorough documentation and a consistent programming style are needed.

5. System code can use intrinsic functions instead of assembly : The best modern C++ compilers have intrinsic functions for accessing system control registers and other system instructions. Assembly code is no longer needed for device drivers and other system code when intrinsic functions are available.

6. Application code can use intrinsic functions or vector classes instead of assembly : The best modern C++ compilers have intrinsic functions for vector operations and other special instructions that previously required assembly programming. It is no longer necessary to use old-fashioned assembly code to take advantage of the Single-Instruction-Multiple-Data (SIMD) instructions (see the sketch after this list).

7. Portability : Assembly code is very platform-specific. Porting to a different platform is difficult. Code that uses intrinsic functions instead of assembly is portable to all x86 and x86-64 platforms.

8. Compilers have improved a lot in recent years : The best compilers are now better than the average assembly programmer in many situations.

9. Compiled code may be faster than assembly code because compilers can perform inter-procedural optimization and whole-program optimization : The assembly programmer usually has to make well-defined functions with a well-defined call interface that obeys all calling conventions in order to make the code testable and verifiable. This prevents many of the optimization methods that compilers use, such as function inlining, register allocation, constant propagation, common subexpression elimination across functions, scheduling across functions, etc. These advantages can be obtained by using C++ code with intrinsic functions instead of assembly code.
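As a concrete example of point 6, the fragment below adds two float arrays using SSE intrinsics from <immintrin.h> rather than hand-written assembly. It is a minimal sketch that assumes the array length is a multiple of 4 and that the target CPU supports SSE; a production routine would also handle the remainder elements.

#include <immintrin.h>   // SSE/AVX intrinsic functions

// Add two float arrays four elements at a time using SSE intrinsics.
// Assumes n is a multiple of 4; a real routine would handle the remainder.
void add_arrays(const float* a, const float* b, float* result, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va   = _mm_loadu_ps(a + i);     // load 4 unaligned floats
        __m128 vb   = _mm_loadu_ps(b + i);
        __m128 vsum = _mm_add_ps(va, vb);      // packed single-precision add
        _mm_storeu_ps(result + i, vsum);       // store 4 floats
    }
}

The compiler schedules registers and instructions itself, so this stays portable across x86 and x86-64 toolchains while still using the SIMD hardware.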

 


Flaws in Authentication Protocols

Several database systems have plain text authentication protocols, by which we mean authentication protocols in which the password is passed “on the wire” in a plain text or easily decrypted format. In a default configuration (which Sybase warns against, but which we have still seen in use), Sybase passes passwords in plain text. By default, Microsoft SQL Server obfuscates passwords by swapping the nibbles (4-bit halves of a byte) and XORing with 0xA5. In both of these cases, the vendors warn against using the plain text versions of their authentication protocols and provide strong, encrypted mechanisms that are relatively easy to deploy — but the defaults are still there, and still dangerous.
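To see why that kind of obfuscation offers no real protection, the sketch below applies and reverses the transformation described above (swap the two nibbles of each byte, then XOR with 0xA5). It illustrates the idea only and ignores the wire encoding details of the real protocol; anyone who captures the traffic can undo it instantly.

#include <cstdint>
#include <string>

// Swap the two 4-bit halves of a byte.
static uint8_t swap_nibbles(uint8_t b) {
    return static_cast<uint8_t>((b << 4) | (b >> 4));
}

// "Obfuscate" a password: nibble swap, then XOR with 0xA5.
std::string obfuscate(const std::string& password) {
    std::string out(password);
    for (char& c : out)
        c = static_cast<char>(swap_nibbles(static_cast<uint8_t>(c)) ^ 0xA5);
    return out;
}

// Recover the password from the obfuscated form: XOR with 0xA5, then swap back.
// Both steps are involutions, so no key or secret is needed.
std::string deobfuscate(const std::string& wire) {
    std::string out(wire);
    for (char& c : out)
        c = static_cast<char>(swap_nibbles(static_cast<uint8_t>(c) ^ 0xA5));
    return out;
}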

MySQL has historically had a number of serious problems with its authentication protocol. Although the protocol isn’t plaintext, the mathematical basis of the authentication algorithm prior to version 4.1 was called into question by Ariel Waissbein, Emiliano Kargieman, Carlos Sarraute, Gerardo Richarte, and Agustin Azubel of CORE SDI (CVE-2000-0981). This describes an attack in which an attacker who can observe multiple authentications is quickly able to determine the password hash.

A further conceptual problem with the authentication protocol in MySQL prior to version 4.1 is that the protocol only tests knowledge of the password hash, not the password itself. This leads to serious problems if a user is able to somehow determine another user’s password hash — and MySQL has been subject to a number of issues in which that was possible. Robert van der Meulen found an issue (CVE-2000-0148) in MySQL versions prior to 3.23.11 whereby an attacker could authenticate using only a single byte of the expected response to the server’s challenge, leading to a situation whereby if you knew a user’s username, you could authenticate as that user in around 32 attempts.

Chris Anley recently found a very similar problem in MySQL (CAN-2004-0627) whereby a user could authenticate using an empty response to the server’s challenge, provided he or she passed certain flags to the remote server. This category of bugs is almost as dangerous as the “unauthenticated flaws in network protocols” category, because in many cases the traffic simply looks like a normal authentication. Attackers don’t need to exploit an overflow or do anything clever; they simply authenticate without necessarily needing the password — or, if they have been able to sniff the password, they just authenticate. The best defense against this kind of bug is to ensure that your database patches are up to date and that you don’t have any plaintext authentication mechanisms exposed on your databases. If your DBMS cannot support encrypted authentication in your environment, you could use IPSec or SSH to provide an encrypted tunnel. MySQL provides explicit guidelines on how to do this in its documentation, and recent versions of MySQL allow authentication to take place over an SSL-encrypted channel.

 


Fabric Shortest Path First

FSPF (Fabric Shortest Path First) is a link state path selection protocol, similar to OSPF, an Interior Gateway Protocol (IGP) widely used in IP networks. The protocol keeps track of the state of the links on all switches in the fabric and associates a cost with each link. It computes paths from a switch to all the other switches in the fabric by adding the cost of all the links traversed by the path and choosing the path that minimizes the cost. In order for the protocol to work, the cost must be a positive integer number. The collection of link states (including cost) of all the switches in a fabric constitutes the topology database.
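The path computation over the topology database is typically a Dijkstra-style minimum-cost search, as sketched below. The data structures here are illustrative only, not the on-switch representation; the sketch simply shows how positive link costs are summed to pick the cheapest path to every other switch.

#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <vector>

// Minimal Dijkstra over a link state database: for each reachable switch,
// return the minimum path cost from the local switch. Link costs must be positive.
using SwitchId = uint32_t;
using Topology = std::map<SwitchId, std::vector<std::pair<SwitchId, uint32_t>>>; // switch -> (neighbor, link cost)

std::map<SwitchId, uint64_t> computeCosts(const Topology& db, SwitchId self) {
    std::map<SwitchId, uint64_t> dist;
    using Item = std::pair<uint64_t, SwitchId>;                       // (cost so far, switch)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[self] = 0;
    pq.push({0, self});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;                                    // stale queue entry
        auto it = db.find(u);
        if (it == db.end()) continue;
        for (auto [v, cost] : it->second) {
            uint64_t nd = d + cost;
            if (!dist.count(v) || nd < dist[v]) {                     // found a cheaper path to v
                dist[v] = nd;
                pq.push({nd, v});
            }
        }
    }
    return dist;
}

The routing table update component then translates these per-destination costs into next-hop entries.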

FSPF has four major components:

• A Hello protocol, used to establish connectivity with a neighbor switch, to establish the identity of the neighbor switch, and to exchange FSPF parameters and capabilities.

• A replicated topology database, with the protocols and mechanisms to keep the databases synchronized across the fabric.

• A path computation algorithm.

• A routing table update.

The topology database synchronization in turn consists of two major components: an initial database synchronization, and an update mechanism. The initial database synchronization is used when a switch is booted, or when an Inter Switch Link (ISL) comes up. The update mechanism is used in two circumstances:

• When there is a link state change, for example an ISL going down or coming up.

• On a periodic basis, to prevent switches from deleting topology information from the database.

The path computation algorithm can be any algorithm that yields a minimum cost path. A path selection protocol requires an addressing scheme to uniquely identify every switch in the fabric. FSPF supports the FC-SW addressing scheme, with a 24-bit Fibre Channel address subdivided in three parts, with 8 bits used as the switch identifier. However, FSPF uses a 32-bit field as a switch identifier, so that other addressing schemes could be supported as well in the future.

Three messages are defined: Hello (HLO), Link State Update (LSU) and Link State Acknowledgment (LSA). The message formats are listed below. All the messages carry a Protocol Version field, to ensure backward compatibility with future versions, and a Command field to discriminate between messages. ‘Reserved’ fields should be set to 0 by the sender. Undefined bits in bitmap fields should be set to 0 by the sender.

Currently defined commands are:

Table 1, Commands

FSPF Header

The format of the message header is:

Table 2, FSPF Header

 

Version :  The current version of FSPF, which is 2.

Section ID :  Identifies a Section, that is, a set of switches that share an identical topology database. It is used for hierarchical routing. It should be set to 0 for a non-hierarchical fabric.

Authentication Type :  The type of authentication to be used in the ‘Authentication’ field. It should be set to 0 for no authentication.

Originating Domain ID :  The ID of the switch that transmitted the message.

Authentication :  The Authentication string. It should be 0 if Authentication Type is 0.

Hello Message Format

The format of the Hello message is:

Table 3, Hello Message

 

Options :  Bit map that indicates the options supported by the sending switch. An option is supported if the corresponding bit is set to 1. All other bits should be 0. Unicast routing is supported by default, so it does not require an Option bit.

Hello Interval :  The interval in seconds between two consecutive Hello messages that the sending switch will transmit for the life of the adjacency.

Dead Interval :  The maximum interval, in seconds, that the sending switch will wait for two consecutive Hello messages from the neighbor switch, before declaring the adjacency down.

Recipient Domain ID :  The ID of the switch that the Hello message is transmitted to.

Originating Port ID :  The ID of the port on which the Hello message is transmitted.
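The Hello and Dead intervals drive a simple liveness check on each ISL: send a Hello every Hello Interval, and declare the adjacency down if nothing is heard from the neighbor for Dead Interval seconds. The sketch below illustrates only that timer logic, not the full FSPF neighbor state machine; the interval values are example numbers, since the real values are exchanged in the Hello itself.

#include <chrono>

// Illustrative adjacency liveness tracking driven by the Hello and Dead intervals.
using Clock = std::chrono::steady_clock;

struct Adjacency {
    Clock::time_point lastHelloReceived = Clock::now();
    Clock::time_point lastHelloSent     = Clock::now();
    int  helloIntervalSec = 20;   // example values; peers exchange the real ones
    int  deadIntervalSec  = 80;
    bool up = true;
};

// Called periodically for every ISL.
template <typename SendHelloFn>
void tick(Adjacency& adj, SendHelloFn sendHello) {
    auto now = Clock::now();
    if (now - adj.lastHelloSent >= std::chrono::seconds(adj.helloIntervalSec)) {
        sendHello();                       // transmit an HLO to the neighbor switch
        adj.lastHelloSent = now;
    }
    if (now - adj.lastHelloReceived >= std::chrono::seconds(adj.deadIntervalSec)) {
        adj.up = false;                    // declare the adjacency down; topology update follows
    }
}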


More Efficient Distance Vector Routing Algorithm

A more efficient distance vector routing algorithm (EDVA) for computer networks is presented. EDVA is based on enhancements to two classes of distance vector algorithms: the path finding algorithms, which report complete path information incrementally, and the diffusing update algorithm (DUAL), which is used in Cisco’s Enhanced Interior Gateway Routing Protocol (EIGRP) and is based on diffusing computations for internodal synchronization. EDVA operates by specifying a distance, a feasible distance, a predecessor and diffusing hops to each destination. This information is used together with diffusing computations of limited span to avoid routing table loops. The correctness of EDVA is discussed, and its benefits compared with DUAL and path finding algorithms are illustrated.

Information  Exchanged and Maintained in EDVA

Each router maintains a link-cost table, a distance table and a routing table. The link-cost table lists the cost of each link adjacent to the router. The cost of the link from i to n is denoted by L_in and is considered to be infinity when the link fails. The distance table at each router i is a matrix containing, for each destination j and for each neighbor n of router i, the distance, the feasible distance (defined subsequently), the initial diffusing step number (defined subsequently), the successor and the predecessor reported by router n. As Table 1 states, these variables are denoted by Djn, FDjn, IMjn, Sjn and Pjn, respectively.

The routing table at router i is a column vector containing, for each destination j, the minimum distance (denoted by Dji), the feasible distance (denoted by FDji), the predecessor (denoted by Pji), the successor (denoted by Sji), the initial diffusing step number (denoted by IMji), the diffusing step number (denoted by Mji) and a marker (denoted by TAGji) used to update the routing table. For destination j, TAGji specifies whether the entry corresponds to a simple path (TAGji = correct), a loop (TAGji = error) or a destination that has not been marked (TAGji = null).

Table 1.    Notation

If any routing information is changed at a router, it sends this information in an update message to certain neighbors. An update message from router i can consist of one or more entries for one or more destinations, respectively. Each entry specifies an update flag (denoted by Uji), a destination, the predecessor, the distance, and the feasible distance for that destination.

The update flag indicates whether the entry is an update (Uji = 0), a query (Uji = 1) or a reply to a query (Uji = 2). In the specification of EDVA, the successor to destination j for a router is simply referred to as the successor of the router, and the same reference applies to other information maintained by a router. Similarly, updates, queries and replies refer to destination j, unless stated otherwise. When router i receives an input event regarding its neighbor n (an update message from neighbor n or a change in the cost or status of link (i, n)), it updates its distance table and link-cost table accordingly.
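The tables above map naturally onto per-destination records. The sketch below shows one illustrative in-memory layout for routing table entries and update message entries; the field names follow the notation of Table 1, but the layout itself is an assumption rather than the algorithm's specified representation.

#include <cstdint>
#include <map>

// Illustrative in-memory layout of the EDVA routing state described above.
enum class Tag { Correct, Error, Null };

struct RoutingEntry {              // one row of router i's routing table, per destination j
    uint32_t D  = UINT32_MAX;      // minimum distance to j
    uint32_t FD = UINT32_MAX;      // feasible distance
    uint32_t P  = 0;               // predecessor on the path to j
    uint32_t S  = 0;               // successor (next hop) toward j
    uint32_t IM = 0;               // initial diffusing step number
    uint32_t M  = 0;               // diffusing step number
    Tag      tag = Tag::Null;      // correct path, loop, or not yet marked
};

struct UpdateEntry {               // one entry of an update message
    uint8_t  U;                    // 0 = update, 1 = query, 2 = reply
    uint32_t destination;
    uint32_t predecessor;
    uint32_t distance;
    uint32_t feasibleDistance;
};

using RoutingTable = std::map<uint32_t, RoutingEntry>;   // destination -> entry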


Shortest Path First with Emergency Exits

Under heavy and dynamic traffic, the SPF routing algorithm often suffers from wild oscillation and severe congestion, resulting in degraded network performance. Here we present a new routing algorithm (SPF-EE) which attempts to eliminate the problems associated with the SPF algorithm by providing alternate paths as emergency exits. With the SPF-EE algorithm, traffic is routed along the shortest paths under normal conditions. However, in the presence of congestion and resource failures, the traffic can be dispersed temporarily to alternate paths without route re-computation. Simulation experiments show that the SPF-EE algorithm achieves greater throughput, higher responsiveness, better congestion control and fault tolerance, and substantially improves the performance of routing in a dynamic environment.

A distributed routing algorithm can be decomposed into four procedures: distance measurement, information updating, route computation and packet forwarding. The distance measurement procedure monitors and collects certain network parameters according to the particular routing metric used. The collected information is distributed over the entire network by the information updating procedure. In each node, the route computation procedure then constructs the routing table based on the received information, and the packet forwarding procedure actually routes the traffic to the next hop. The new routing algorithm is an improvement over the SPF algorithm. It uses the same distance measurement procedure and information updating procedure as the SPF algorithm. Each node measures the actual delay to its neighbors and periodically broadcasts the changes to all other nodes in the network. However, the new routing algorithm is equipped with more sophisticated route computation and packet forwarding procedures to deal with heavy and dynamic traffic.

The problem with the SPF algorithm is that there are no mechanisms to alter the routing other than updating the routing tables, while route updating is too slow and costly for responding to traffic fluctuations. Under heavy traffic load, frequent route updating may also lead to instability. To solve this dilemma, the new routing algorithm maintains a stable routing table and meanwhile provides alternate paths to disperse traffic when it is accumulating at some points of the network. When congestion and network failures do occur, instead of initiating route updating, the node temporarily forwards the traffic along the alternate paths, passing it around the congested or failed areas. If the changes are persistent, the routing tables will eventually be updated when the next route updating time is due.

In the new routing algorithm, the alternate paths are only used as emergency exits when the shortest paths are experiencing problems. In normal conditions, the new routing algorithm performs exactly the same as the SPF algorithm does. Because of this feature, we call the new routing algorithm Shortest Path First with Emergency Exits (SPF-EE). It should be emphasized that the alternate path as an emergency exit does not have to be the shortest path to the particular destination, nor does it have to be disjoint from the current shortest path. When congestion or network failures occur, the primary goal is to find an alternative path for the traffic and avoid packets accumulating in the network.
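The forwarding rule can be stated compactly: use the shortest-path next hop unless it is congested or down, and only then fall back to a precomputed alternate next hop. The sketch below illustrates that decision; the congestion test, the threshold and the table layout are assumptions for illustration, not the exact SPF-EE procedure.

#include <cstddef>
#include <optional>

// Illustrative SPF-EE style forwarding decision: prefer the shortest-path
// next hop, fall back to the precomputed alternate path ("emergency exit")
// when that hop is congested or its link has failed. No route recomputation
// is triggered here; the routing table is refreshed only at the next update.
struct NextHops {
    int primary;                        // next hop on the shortest path
    std::optional<int> alternate;       // emergency exit, if one exists
};

struct LinkState {
    bool up;
    std::size_t queueLength;
};

constexpr std::size_t kCongestionThreshold = 64;   // assumed threshold, in queued packets

std::optional<int> chooseNextHop(const NextHops& hops,
                                 const LinkState& primaryLink,
                                 const LinkState& alternateLink) {
    bool primaryUsable = primaryLink.up && primaryLink.queueLength < kCongestionThreshold;
    if (primaryUsable)
        return hops.primary;
    if (hops.alternate && alternateLink.up)
        return hops.alternate;          // disperse traffic along the emergency exit
    return std::nullopt;                // nothing usable: queue or drop the packet
}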

 


Factors examined during query optimization

Query plans consist of retrieval tactics and an ordered set of execution steps to retrieve the data needed by the query. In developing query plans, the optimizer examines:

• The size of each table in the query, both in rows and data pages, and the number of OAM and allocation pages that need to be read.

• The indexes that exist on the tables and columns used in the query, the type of index, and the height, number of leaf pages, and cluster ratios for each index.

• Whether the index covers the query, that is, whether the query can be satisfied by retrieving data from the index leaf pages without having to access the data pages. Adaptive Server can use indexes that cover queries, even if no where clauses are included in the query.

• The density and distribution of keys in the indexes.

• The size of the available data cache or caches, the size of I/O supported by the caches, and the cache strategy to be used.

• The cost of physical and logical reads.

• Join clauses and the best join order and join type, considering the costs and number of scans required for each join and the usefulness of indexes in limiting the I/O.

• Whether building a worktable (an internal, temporary table) with an index on the join columns would be faster than repeated table scans if there are no useful indexes for the inner table in a join.

• Whether the query contains a max or min aggregate that can use an index to find the value without scanning the table.

• Whether the data or index pages will be needed repeatedly to satisfy a query such as a join or whether a fetch-and-discard strategy can be employed because the pages need to be scanned only once.

For each plan, the optimizer determines the total cost by computing the logical and physical I/Os. Adaptive Server then uses the cheapest plan. Stored procedures and triggers are optimized when the object is first executed, and the query plan is stored in the procedure cache. If other users execute the same procedure while an unused copy of the plan resides in cache, the compiled query plan is copied in cache, rather than being recompiled.
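The final choice reduces to comparing estimated I/O costs across candidate plans. The sketch below shows that comparison in minimal form; the relative weights of logical and physical reads are placeholders for illustration, not Adaptive Server's internal cost constants.

#include <algorithm>
#include <string>
#include <vector>

// Sketch of cost-based plan selection: estimate each candidate plan's total
// cost from its logical and physical I/Os and keep the cheapest.
struct CandidatePlan {
    std::string description;
    long logicalReads;      // pages expected to be found in cache
    long physicalReads;     // pages expected to be read from disk
};

double estimatedCost(const CandidatePlan& p,
                     double logicalReadCost = 1.0,     // illustrative weights only
                     double physicalReadCost = 25.0) {
    return p.logicalReads * logicalReadCost + p.physicalReads * physicalReadCost;
}

// Assumes the candidate list is non-empty.
const CandidatePlan& cheapestPlan(const std::vector<CandidatePlan>& plans) {
    return *std::min_element(plans.begin(), plans.end(),
        [](const CandidatePlan& a, const CandidatePlan& b) {
            return estimatedCost(a) < estimatedCost(b);
        });
}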


Parallel Dataflow Approach to SQL Software

Terabyte online databases, consisting of billions of records, are becoming common as the price of online storage decreases. These databases are often represented and manipulated using the SQL relational model. A relational database consists of relations (files in COBOL terminology) that in turn contain tuples (records in COBOL terminology). All the tuples in a relation have the same set of attributes (fields in COBOL terminology).

Relations are created, updated, and queried by writing SQL statements. These statements are syntactic sugar for a simple set of operators chosen from the relational algebra. Select-project, here called scan, is the simplest and most common operator – it produces a row-and-column subset of a relational table. A scan of relation R using predicate P and attribute list L produces a relational data stream as output. The scan reads each tuple, t, of R and applies the predicate P to it. If P(t) is true, the scan discards any attributes of t not in L and inserts the resulting tuple in the scan output stream. Expressed in SQL, a scan of a telephone book relation to find the phone numbers of all people named Smith would be written:

SELECT     telephone_number      /* the output attribute(s) */
FROM       telephone_book        /* the input relation */
WHERE      last_name = 'Smith';  /* the predicate */

A scan’s output stream can be sent to another relational operator, returned to an application, displayed on a terminal, or printed in a report. Therein lies the beauty and utility of the relational model. The uniformity of the data and operators allows them to be arbitrarily composed into dataflow graphs. The output of a scan may be sent to a sort operator that will reorder the tuples based on an attribute sort criterion, optionally eliminating duplicates. SQL defines several aggregate operators to summarize attributes into a single value, for example, taking the sum, min, or max of an attribute, or counting the number of distinct values of the attribute. The insert operator adds tuples from a stream to an existing relation. The update and delete operators alter and delete tuples in a relation matching a scan stream.

The relational model defines several operators to combine and compare two or more relations. It provides the usual set operators union, intersection, difference, and some more exotic ones like join and division. Discussion here will focus on the equi-join operator (here called join). The join operator composes two relations, A and B, on some attribute to produce a third relation. For each tuple, ta, in A, the join finds all tuples, tb, in B with attribute value equal to that of ta. For each matching pair of tuples, the join operator inserts into the output stream a tuple built by concatenating the pair. Codd, in a classic paper, showed that the relational data model can represent any form of data, and that these operators are complete [CODD70]. Today, SQL applications are typically a combination of conventional programs and SQL statements. The programs interact with clients, perform data display, and provide high-level direction of the SQL dataflow.
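The equi-join semantics just described (for every tuple ta in A, find the tuples tb in B with a matching attribute and emit their concatenation) can be implemented several ways; the hash-based sketch below is one common choice and is illustrative, not any particular system's operator. Tuples are modeled simply as vectors of strings.

#include <string>
#include <unordered_map>
#include <vector>

// Illustrative equi-join: for each tuple of A, emit one output tuple for
// every tuple of B whose join attribute matches, built by concatenation.
// A hash table over B implements the matching step.
using Tuple = std::vector<std::string>;

std::vector<Tuple> equiJoin(const std::vector<Tuple>& a, std::size_t aKey,
                            const std::vector<Tuple>& b, std::size_t bKey) {
    std::unordered_multimap<std::string, const Tuple*> index;
    for (const Tuple& tb : b)
        index.emplace(tb[bKey], &tb);            // build phase over B

    std::vector<Tuple> out;
    for (const Tuple& ta : a) {                  // probe phase over A
        auto [lo, hi] = index.equal_range(ta[aKey]);
        for (auto it = lo; it != hi; ++it) {
            Tuple joined = ta;
            joined.insert(joined.end(), it->second->begin(), it->second->end());
            out.push_back(std::move(joined));
        }
    }
    return out;
}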

The SQL data model was originally proposed to improve programmer productivity by offering a non-procedural database language. Data independence was an additional benefit; since the programs do not specify how the query is to be executed, SQL programs continue to operate as the logical and physical database schema evolves. Parallelism is an unanticipated benefit of the relational model. Since relational queries are really just relational operators applied to very large collections of data, they offer many opportunities for parallelism. Since the queries are presented in a non-procedural language, they offer considerable latitude in executing the queries. Relational queries can be executed as a dataflow graph. As mentioned in the introduction, these graphs can use both pipelined parallelism and partitioned parallelism. If one operator sends its output to another, the two operators can execute in parallel, giving a potential speedup of two.

The benefits of pipeline parallelism are limited because of three factors: (1) Relational pipelines are rarely very long – a chain of length ten is unusual. (2) Some relational operators do not emit their first output until they have consumed all their inputs. Aggregate and sort operators have this property. One cannot pipeline these operators. (3) Often, the execution cost of one operator is much greater than the others (this is an example of skew). In such cases, the speedup obtained by pipelining will be very limited. Partitioned execution offers much better opportunities for speedup and scaleup. By taking the large relational operators and partitioning their inputs and outputs, it is possible to use divide-and-conquer to turn one big job into many independent little ones. This is an ideal situation for speedup and scaleup. Partitioned data is the key to partitioned execution.
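Partitioned execution in its simplest form is just hashing each tuple's partitioning attribute to pick one of N independent workers; the sketch below shows that split step, which is the building block for the divide-and-conquer execution described above. It is a minimal illustration and ignores how partitions are shipped to the workers.

#include <functional>
#include <string>
#include <vector>

// Hash-partition a stream of tuples into N independent partitions so that
// N copies of an operator (scan, join, sort, ...) can work on them in parallel.
// Tuples with equal partitioning attributes always land in the same partition,
// which is what makes partitioned joins and aggregates possible.
using Tuple = std::vector<std::string>;

std::vector<std::vector<Tuple>> hashPartition(const std::vector<Tuple>& input,
                                              std::size_t keyIndex,
                                              std::size_t partitions) {
    std::vector<std::vector<Tuple>> out(partitions);
    std::hash<std::string> h;
    for (const Tuple& t : input)
        out[h(t[keyIndex]) % partitions].push_back(t);
    return out;
}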


Semantic optimization of OQL queries

This work explores all the phases of developing a query processor for OQL, the Object Query Language proposed by the Object Data Management Group (ODMG 3.0). There has been a lot of research on the execution of relational queries and their optimization using syntactic or semantic transformations. However, there is no context that has integrated and tested all the phases of processing an object query language, including the use of semantic optimization heuristics. This research is motivated by the need for query execution tools that combine two valuable properties: i) the expressive power to encompass all the features of the object-oriented paradigm and ii) the flexibility to benefit from the experience gained with relational systems, such as the use of semantic knowledge to speed up query execution. The contribution of this work is twofold. First, it establishes a rigorous basis for OQL by defining a type inference model for OQL queries and proposing a complete framework for their translation into calculus and algebraic representations. Second, in order to enhance query execution it provides algorithms for applying two semantic optimization heuristics: constraint introduction and constraint elimination. By taking into consideration a set of association rules with exceptions, it is possible to add or remove predicates from an OQL query, thus transforming it into a more efficient form.

We have implemented this framework, which enables us to measure the benefits and the cost of exploiting semantic knowledge during query execution. The experiments showed significant benefits, especially in the application of the constraint introduction technique. In contexts where queries are optimized once and are then executed repeatedly, we can ignore the cost of optimization, and it is always worth carrying out the proposed transformation. In the context of ad hoc queries the cost of the optimization becomes an important consideration. We have developed heuristics to estimate the cost as well as the benefits of optimization. The optimizer will carry out a semantic transformation only when the overhead is less than the expected benefit. Thus transformations are performed safely even with ad hoc queries. The framework can often speed up the execution of an OQL query to a considerable extent.

Semantic optimization is a third important aspect of our query processing framework. The underlying idea is to optimize a query based on semantic knowledge in the form of association rules. This first requires identifying the rules using standard data mining methods; these rules must then be combined into forms suitable for the optimization purposes. Finally, assessing the performance benefits of semantic optimization requires estimating the cost and the scalability of the methods developed.
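To make the two heuristics concrete, the sketch below treats a query's WHERE clause as a set of predicate strings and an association rule as "antecedent implies consequent": constraint introduction adds the consequent when the antecedent is already present (for example so an index on it can be exploited), and constraint elimination drops a predicate that such a rule makes redundant. This is only a conceptual illustration, not the framework's actual representation of OQL queries, rules, or their exceptions.

#include <set>
#include <string>

// Conceptual sketch of the two semantic optimization heuristics, with a
// query's WHERE clause modeled as a set of predicate strings and a rule
// modeled as "antecedent implies consequent".
struct Rule {
    std::string antecedent;   // e.g. "e.dept = 'R&D'"        (hypothetical predicate)
    std::string consequent;   // e.g. "e.salary > 50000"      (hypothetical predicate)
};

// Constraint introduction: if the query already contains the antecedent,
// the consequent can be added; exceptions to the rule must be handled
// separately, as the framework does for rules with exceptions.
void introduceConstraint(std::set<std::string>& predicates, const Rule& r) {
    if (predicates.count(r.antecedent))
        predicates.insert(r.consequent);
}

// Constraint elimination: if the antecedent is present, an explicitly stated
// consequent is redundant and can be removed to save its evaluation cost.
void eliminateConstraint(std::set<std::string>& predicates, const Rule& r) {
    if (predicates.count(r.antecedent))
        predicates.erase(r.consequent);
}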

The query language used in our framework is OQL, the Object Query Language proposed as part of the Object Data Standard (version 3.0) published by the ODMG [CBB+00]. This standard provides specifications for storing and retrieving objects in databases. Its objective is to encompass all the important features of the OO paradigm and to ensure portability across database systems that comply with it. It consists of specifications of the following components:

1.  the Object Model defines the type system of the object paradigm,

2. the Object Definition Language (ODL) is a specification language used to define the classes of a particular schema,

3. the Object Interchange Format (OIF) is a specification language used to dump and load the state of the database to and from a file,

4. the Object Query Language (OQL) is a functional language that allows the user to query objects,

5. C++, Java, Smalltalk bindings enable developers to write code in the respective language to manipulate persistent objects.

OQL is a functional language, in which queries can be composed to an arbitrary depth, as long as their operands and results respect the type system defined in the Object Model. Many of the constructs of OQL are syntactically similar to those of SQL-92; however, OQL has a number of object-oriented features. First, it deals with complex objects, i.e. objects containing or being related to other objects, or containing collections (or structures) of objects. Secondly, it enables users to navigate over the objects using path expressions of arbitrary length. Thirdly, OQL allows the comparison of objects based on their object identity instead of their content values. Fourthly, it enables method invocations within a query. Fifthly, it manipulates a series of collections (sets, bags, lists, arrays, dictionaries) with the aid of additional language constructs. Sixthly, it handles polymorphic collections, i.e. collections with objects belonging to different classes. The late binding mechanism enables users to perform generic operations on the elements of these collections. The casting mechanism allows them to declare explicitly the class of an object, going down the class hierarchy.

The expressive power of OQL makes it an interesting query language to develop. In addition to its independent use as a query language, it can potentially be embedded in any object-oriented programming language. The object-oriented nature of OQL makes the gap between the query and the programming language much narrower than in the relational context. In certain cases, it allows the programmer to perform exactly the same operation by either using the commands available in the binding of the programming language, or by embedding an OQL query in it.


Determining InnoDB Resource Requirements

InnoDB clearly requires far more memory for these reasons, but it is difficult to pin down exactly how much more. This is true for several reasons:

a. How did you load your database?

InnoDB table size is not a constant. If you took a straight SQL dump from a MyISAM table and inserted it into an InnoDB table, it is likely larger than it really needs to be. This is because the data was loaded out of primary key order, and the index isn’t tightly packed because of that. If you took the dump with the --order-by-primary argument to mysqldump, you likely have a much smaller table and will need less memory to buffer it.

b. What exactly is your table size?

This is an easy question to answer with MyISAM: that information is directly in the output of “SHOW TABLE STATUS”. However, the numbers from that same source for InnoDB are known to be estimates only. The sizes shown are the physical sizes reserved for the tables and have nothing to do with the actual data size at that point. Even the row count is a best guess.

c. How large is your primary key?

It was mentioned above that InnoDB clusters the data for a table around the primary key. This means that any secondary index leaves must contain the primary key of the data they “point to.” Thus, if you have tables with a large primary key, you will need more memory to buffer a secondary index and more disk space to hold them. This is one of the reasons some people argue for short “artificial” primary keys for InnoDB tables when there isn’t one “natural” primary key.
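A rough back-of-the-envelope illustration of that effect: each secondary index entry carries its own key columns plus a copy of the primary key, so the sketch below (which ignores page headers, fill factor and per-record overhead, and so gives only an order-of-magnitude figure) shows how primary key width scales secondary index size.

#include <iostream>

// Rough estimate of secondary index size for a clustered table: each
// secondary index entry stores its own key plus a copy of the primary key.
// Ignores page headers, fill factor and per-record overhead.
long long approxSecondaryIndexBytes(long long rows,
                                    int secondaryKeyBytes,
                                    int primaryKeyBytes) {
    return rows * (static_cast<long long>(secondaryKeyBytes) + primaryKeyBytes);
}

int main() {
    long long rows = 10000000;  // ten million rows, purely as an example
    std::cout << approxSecondaryIndexBytes(rows, 8, 8)  << " bytes with an 8-byte primary key\n";
    std::cout << approxSecondaryIndexBytes(rows, 8, 40) << " bytes with a 40-byte natural primary key\n";
}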

In summary, there is no set method that will work for everyone to predict the needed resources. Worse than that, your needed resources will change with time as more inserts to your table increase its size and fragment the packing of the B-tree. The best advice to be offered is to use the mysqlreport tool available at http://hackmysql.com/mysqlreport to monitor the usage of your innodb_buffer_pool, the most important InnoDB tunable, and adjust it accordingly. It is important not to run at 100% usage of the InnoDB buffer, as this likely means that you’re not buffering as much as you could for reads, and that you are starving your write buffer, which also lives in the same global InnoDB buffer pool.
