Posts Tagged ‘IT Consulting

Profiling the TCP switch implementation in userspace

We measured the performance of the TCP switch(user space) & observed that in average, the forwarding is fast, but there is high fluctuation of forwarding time performance.Also, the connection setup and teardown is slow. The fluctuation of performance happens when garbage collector runs to reclaim the unused resources. We profiled each part of the system tofigure out the reasons of such behavior. First, we measured the time to perform a function to insert or delete an entry in the classification table of iptables and to insert or delete a queue and a filter for a particular connection in output scheduler. Each function call takes about 300us. It’s partly due to the communication overhead between kernel and user space. To setup and tear down a connection,we need to do the transaction twice for iptables and twice for traffic control. This gives us motivation to implement the TCP switch inside the Linux kernel.

The more important issue is to make the forwarding performance more predictable(less variable). To find the reason of the fluctuation, we measured thread context switching time and insertion, deletion, and lookup time of iptables in the kernel. One factor of the fluctuation may be the conflict between lookup of forwarding data and operation of the admission controller and the garbage collector. The thread context switching time measurement program creates the given number of threads and each thread continuously yields. Then, we counted the number of yields in given time. The machine we experiment is Pentium III 1GHz PC. The process context switching time is almost same as the thread context switching time.

In average, it’s smaller than 1 us, although there is some variability in the number of context switches in different threads. From this result, we conclude that the thread context switching time is not the major factor of the fluctuation of the forwarding performance. Next, we examined the locking conflict between code that read iptables and code that writes iptables. iptables use read-write lock which is the spin lock with multiple readers and one writer support. When no reader has the lock, writer can get the lock. When writer has the lock, any reader cannot get the lock. The measured lookup (read) time of iptables is 1~2us, but the insertion or deletion of an entry in iptables takes about 100us. When a new entry is inserted oran entry is deleted, a new table is created by vmalloc, modification is done, old table is copiedto the new table by memcpy, and old table is freed. This seems to be an inefficent table management. But it’s a good design considering the usage of iptables. In general, iptables entries are static (i.e. they do not change often). So, the operation of copying whole table is not a problem. By creating a new table and deleting an old one, the memory allocated is compact. However, this is not suitable for the TCP switch operation. The TCP switch inserts or deletes entries dynamically. And this latency in modification of iptables delays the forwarding of packets. When garbage collector runs, it tries to delete inactive entries. Because this deletion is slow, the forwarding of packet gets delayed in this period. The reason of the fluctuation of forwarding time can be explained with this slow modification of iptables. This gives the motivation to change iptables to the data structure suitable for the TCP switch.


Tags : , , , ,

Applying the Lessons of eXtreme Programming


The benefits available from adopting XP style unit testing on aproject, and then moves on to identify useful lessons that can be learned from other XP (eXtreme Programming) practices. This concludes by asking questions about the nature of process improvement in software development and how we can make our software serve society.

Applying JUnit

JUnit is a deceptively simple testing framework that can create a major shift in your personal development process and the enjoyment of programming. Prior to using JUnit, developers typically have resistance to making changes late in a project “because something might break.”With JUnit the worry associated with breaking something goes away. Yes, it might still break, but the tests will detect it. Because every method has a set of tests, it is easy to see which methods have been broken by the changes, and hence make the necessary fix. So changes can bemade late on in a project with confidence, since the tests provide a safety net.

Lesson: Automated tests pay off by improving developer confidence in their code.

In order to create XP style Unit Tests however, developers need to make sure that their classes and methods are well designed. This means that there is minimum coupling between the class being tested and the rest of the classes in the system. Since the test case subclass needs to be able to create an object to test it, the discipline of testing all methods forces developers to create classes with well-defined responsibilities.

Lesson: Requiring all methods to have unit tests forces developers to create better designs.

It is interesting to compare classes that have unit tests with those that do not have unit tests.By following the discipline of unit tests, methods tend to be smaller but more numerous. The implementations tend to be simpler, especially when the XP practice of  “Write the Unit Tests first”  is followed. The big difference  that the really long methods full of nested and twisted conditional code don’t exist in the unit tested code. That kind of code is impossible to write unit tests for, so it ends up being re-factored into a better design.

Lesson: Design for testability is easier if you design and implement the tests first.

Adopting XP style unit testing also drastically alters the minute by minute development process. We write a test, compile and run (after adding just enough implementation to make itrun) so that the test will fail. It notes “This may seem funny – don’t we want the tests to pass? Yes, we do. But by seeing them fail first, we get some assurance that the test isvalid.”  Now that we have a failing test, we can implement the body of the method and run the test again. This time it will probably pass, so now it’s time to run all the other unit tests to see ifthe latest changes broke anything else. Now that we know that it works, we can now tidy up the code, Refactor as needed and possibly optimize this correct implementation. Once we have done this we are ready to restart the cycle by writing the next unit test.

Lesson: Make it Run, Make it Right, Make it Fast (but only if you need to).

Early successes with JUnit encouraged me to experiment with other XP Practices. Just havingthe unit tests in place made programming fun again, and if that practice was valuable, may be other were as well.

Tags : , , , ,

The Open Digital Library

Technology Platform Requirements

By its nature, digital collection development requires extensive use of technological resources. In the early days of digital library development, when collections were typically small and experimental, a wide variety of hardware and software was utilized.Today, the leading digital library developers are putting substantial collections online. Some of these collections include millions of digital objects; collections are being planned that will require storage measured in petabytes—the equivalent of more than 50,000 desktop computers with 20-gigabyte hard drives. As digital libraries scale in size and functionality, it is critical for the underlying technology platform to deliver the performance and reliability required. Many digital libraries are considered “mission critical” to the overall institution. In addition, patrons expect high service levels, which means that downtime and poor response time are not tolerable. Moreover, because cost is a foremost concern, scalability and efficiency with a low total cost of ownership are also key requirements. This type of digital library implementation requires a scalable enterprise-level technology solution with built-in reliability, availability, and service ability (RAS) features.

Storage capacity also must be scalable to adapt to rapid growth in demand, and must be adapted to the mix of media types that may be stored in a digital library, such as:

• Text, which is relatively compact.

• Graphics, which can be data-intensive.

• Audio, which is highly dynamic.

• Video, which is highly dynamic and data intensive.

Storage capacity should be expandable in economical increments and should not require redesign or re-engineering of the system design as requirements grow. An open systems architecture provides both a robust platform and the best selection of digital media management solutions and development tools. The inherent reliability and scalability of open platforms have made them the most popular choice of IT professionals for Internet computing. This computing model features an architecture that is oriented totally around Internet protocols and stresses the role of Websites for a vast and diverse array of services that follow a utility model.

Evolution to Web Services

The digital library of the future will deliver “smart media services”; that is, Webservices that can match “media content” to user “context” in a way that provides a customized, personalized experience. Media content is digital content that includes elements of interactivity. Context includes such information as the identity and location of the user.

Several key technologies must interact to allow Web services to work in this way.Extensible Markup Language (XML) and Standard General Markup Language (SGML)are important standards influencing our ability to create broadly interoperable Webbased applications. SGML is an international standard for text markup systems and is very large and complex, describing thousands of different document types in many fields of human activity. XML is a standard for describing other languages, written in SGML. XML allows the design of customized markup languages for limitless types of documents, providing a very flexible and simple way to write Web-based applications.This differs from HTML, which is a single, predefined markup language and can be considered one application of XML. The primary standards powering Web services today are XML-based. These include:

• Simple Object Access Protocol (SOAP).

• Universal Description, Discovery and Integration (UDDI).

• Web Services Description Language (WSDL).

• Electronic Business XML (ebXML).

These standards are emerging as the basis for the new Web services model. While not all are fully defined standards, they are maturing quickly with broad industry support.


Tags : , , , ,

Stronger Password Authentication Using Browser Extensions

Keystream monitor

A natural idea for anyone who is trying to implement web password hashing is a keystream monitor that detects unsafe user behavior. This defense would consist of a recording component and a monitor component. The recording component records all passwords that the user types while the extension is in password mode and stores a one-way hash of these passwords on disk. The monitor component monitors the entire keyboard key stream for a consecutive sequence of keystrokes that matches one of the user’s passwords. If such a sequence is keyed while the extension is not in password mode, the user is alerted.

We do not use a keystream monitor in Pwd-Hash, but this feature might be useful for an extension that automatically enables password mode when a password field is focused, rather than relying on the user to press the password-key or password-prefix. However, this approach suffers from several limitations.The most severe is that the keystream monitor does not defend against an online mock password field. By the time the monitor detects thata password has been entered, it is too late — the phisher has already obtained all but the last characterof the user’s password. Another problem is that storing hashes of user passwords on disk facilitates an offline password dictionary attack if the user’s machine is infiltrated. However, the same is true of the browser’s auto-complete password database. And finally,novice users tend to choose poor passwords that might occur naturally in the keystream, when the extension is not in password mode. Although the threat of constant warnings might encourage the userto choose unique and unusual passwords, excessive false alarms could also cause the user to disregard monitor warnings.

Tags : , , ,

Transaction Management in the R* Distributed Database Management System

It concentrates primarily on the description of the R* commit protocols, Presumed Abort (PA) and Presumed Commit (PC). PA and PC are extensions of the well-known, two-phase (2P) commit protocol. PA is optimized for read-only transactions and a class of multisite update transactions, and PC is optimized for other classes of multisite update transactions. The optimizations result in reduced intersite message traffic and log writes, and, consequently, a better response time. R* is an experimental, distributed database management system (DDBMS).When a transaction execution starts, its actions and operands are not constrained. Conditional execution and ad hoc SQL statements are available to the application program.The whole transaction need not be fully specified and made known to the systemin advance. A distributed transaction commit protocol is required in order toensure either that all the effects of the transaction persist or that none of the Transaction Management in the R’ Distributed Database Management System effects persist, despite intermittent site or communication link failures. In otherwords, a commit protocol is needed to guarantee the uniform commitment of distributed transaction executions.

Guaranteeing uniformity requires that certain facilities exist in the distributed database system. We assume that each process of a transaction is able to provisionally perform the actions of the transaction in such a way that they canbe undone if the transaction is or needs to be aborted. Also, each database of the distributed database system has a log that is used to recoverably record the state changes of the transaction during the execution of the commit protocol and the transaction’s changes to the database (the UNDO/REDO log ). The log records are carefully written sequentially in a file that is kept in stable(nonvolatile) storage.

The transaction writing the log record is not allowed to continue execution until this operation is completed. This means that, if the site crashes(assuming that a crash results in the loss of the contents of the virtual memory)after the force-write has completed, then the forced record and the ones preceding it will have survived the crash and will be available, from the stable storage,when the site recovers. It is important to be able to “batch” force-writes for high performance. R* does rudimentary batching of force-writes. On the other hand, in the asynchronous case, the record gets written to virtual memory buffer storage and is allowed to migrate to the stable storage later on(due to a subsequent force or when a log page buffer fills up). The transaction writing the record is allowed to continue execution before the migration takesplace. This means that, if the site crashes after the log write, then the record may not be available for reading when the site recovers. An important point to note is that a synchronous write increases the response time of the transaction compared to an asynchronous write. Hereafter, we refer to the latter as simply awrite and the former as a force-write.

Some of the desirable characteristics in a commit protocol are (1) guaranteed transaction atomicity always, (2) ability to “forget” out come of commit processingafter a short amount of time, (3) minimal overhead in terms of log writes and message traffic, (4) optimized performance in the no-failure case, (5) exploitationof completely or partially read-only transactions, and (6) maximizing the abilityto perform unilateral aborts. R*, which is an evolution of the centralized DBMS System R, like its predecessor, supports transaction serializability and uses the two-phase locking(2PL) protocol as the concurrency control mechanism. The use of 2PL introduces the possibility of deadlocks. R*, instead of preventing deadlocks,allows them (even distributed ones) to occur and then resolves them by deadlockdetection and victim transaction abort.

Some of the desirable characteristics in a distributed deadlock detection protocol are (1) all deadlocks are resolved in spite of site and link failures,(2) each deadlock is detected only once, (3) overhead in terms of messages exchanged is small, and (4) once a distributed deadlock is detected the time takento resolve it (by choosing a victim and aborting it) is small.Here we concentrate on the specific implementation of that distributed algorithm in R* and the solution adopted for the global deadlock victim selection problem. In general, as far as global deadlock management is concerned, we suggest that if distributed detection of global deadlocks is to be performed then, in the event of a global deadlock, it makes sense to choose as the victim a transaction that is local to the site of detection of that deadlock (inpreference to, say, the “youngest” transaction which may be a nonlocal transaction),assuming that such a local transaction exists.

First, we give a careful presentation of 2P. Next, we derive from 2P in a stepwise fashion the two new protocols,namely, PA and PC. We then present performance comparisons, optimizations, and extensions of PA and PC. Next, we present the R* approach to global deadlock detection and resolution. We then conclude by outlining the current status of R*.

Tags : , , , ,

Gaps in CBMIR using Different Methods

In the field of information technology along with a digital imaging revolution in the medical domain facilitate the generation and storage of large collections of images by hospitals and clinics. To search these large image collections effectively and efficiently poses significant technical challenges, and it raises the necessity of constructing intelligent retrieval systems. Content-based Medical Image Retrieval (CBMIR) consists of retrieving the most visually similar images to a given query image from a database of images. Medical CBIR (content-based image retrieval) applications pose unique challenges but at the same time offer many new opportunities. On one hand, while one can easily understand news or sports videos,a medical image is often completely incomprehensible to untrained eyes.


The difficulty faced by CBIR methods in making inroads into medical applications can be attributed to a combination of several factors.

a. The Content Gap

It is important to consider image content in light of the context of the medical application for which a CBIR system has been optimized. Too often, we find a generic image retrieval model where the goal is to find medical images thatare similar in overall appearance. The critical factor in medical images, however, is the pathology – the primary reason forwhich the image was taken. This pathology may be expressed in details within the image (e.g., shape of a vertebra or textureand color of a lesion) rather than the entire image (e.g. spinex-ray or cervicographic image).

b. The Feature Gap

Extracted features are used to define the image content. As such, decisions on the types of features, scale(s) at which the features are extracted, and their use individually or in combination determines the extent to which the system“knows” the image and, to a large extent the system capability. It is necessary for the system to support as many types of features as possible and also capture them at several scales.

c. The Performance Gap

Benefits of medical imaging to science and healthcare haveled to an explosive growth in the volume (and rate) of acquired medical images. Additionally, clinical protocols determine the acquisition of these images. There is a need forthe system response to be meaningful, timely and sensitive tothe image acquisition process. These requirements make linear searches of image feature data, very often presented in the literature, impractical and a significant hurdle in the inclusion of CBIR into medical applications.

d. The Usability Gap

This gap is rarely addressed during the design and development of CBIR systems. However, it is the one of most concern to the end user of the system and therefore hasthe greatest potential for affecting the acceptance of a new technology. An idealized system can be designed to overcome all the above gaps, but still fall short of being accepted intothe medical community for lack of (i) useful and clear querying capability; (ii) meaningful and easily understandable responses; and (iii) provision to adapt to user feedback.

Tags : , , , ,

Privacy and accountability in database systems

Databases that preserve a historical record of activities and data offer the important benefit of system accountability, past events can be analyzed to detect breaches and maintain data quality. But the retention of history can also pose a threat to privacy. System designers need to carefully balance the need for privacy and accountability by controlling how and when data is retained by the system and who will be able to recover and analyze it. This  describes the technical challenges faced in enhancing database systems so that they can securely manage history. These include  first, assessing the unintended retention of data in existing database systems that can threaten privacy; second, redesigning system components to avoid this unintended retention; and third,developing new system features to support accountability when it is desired.

Computer forensics is an emerging field which has studied the recovery of data from file systems, and the unintended retention of data by applications like web browsers and document files. Forensic tools like the Sleuth Toolkit and EnCASE Forensic  are commonly used by investigators to recover data from computer systems. These tools are sometimes able to interpret common file types but, to our knowledge, none provide support for analysis of database files. Forensic analysts typically have unrestricted access to storage on disk. We consider as our threat model adversaries with such access, as this models the capabilities of system administrators, a hacker who has gained privileges on the system, or an adversary who has breached physical security.We also note that database storage is increasingly embedded into a wide range of common applications for persistence.For example, embedded database libraries like BerkeleyDB and SQLite  are used as the underlying storage mechanisms for email clients, web browsers, LDAP implementations, and Google Desktop. For example, Apple Mail.appuses an SQLite database file to support searches on subject,recipient, and sender (stored as !/Library/Mail/EnvelopeIndex). Recently, Mozilla has adopted SQLite as a unified storage model for all its applications. In Firefox 2.0, remote sites can store data that persists across sessions in an SQLite database as a sophisticated replacement of cookies. The forensic analysis of such embedded storage is particularly interesting because it impacts everyday users of desktop applications, and because embedded database storage is harder to protect from such investigation.

Forensic analysis can be applied to various components of a database system, and it reveals not only data currently active in the database, but also previously deleted data and historical information about operations performed on thesystem. Record storage in databases contains data that has been logically deleted, but not destroyed. Indexes also contain such deleted values, and in addition may reveal, through their structure, clues about the history of operations that led to their current state. Naturally, the transaction log contains a wealth of forensic information since it often includes before- and after-images of each database update. Other sources of forensic information include temporary relations (often written to disk for large sort operations), the database catalog, and even hidden tuple identifiers that may reveal the order of creation of tuples in the database. The goal here is to understand the magnitude of data retention, and to measure (or bound) the expected lifetime of data.

As a practical matter encrypted storageis not widely used. In databases, encryption often introduces unacceptable performance costs. In addition, forensic investigators or adversaries may recover cryptographic keys because they are shared by many employees, easily subpoenaed,or stored on disk and recovered.

Tags : , , , ,

Fuzzy Logic Based Intelligent Negotiation Agent (FINA) in E-Commerce

The fuzzy logic based intelligent negotiation agent. This is able to interact autonomously and consequently save human labor in negotiations. The aim of modeling a negotiation agent is to reach mutual agreement efficiently and intelligently. The negotiation agent is able to negotiate with other such agents, over various sets of issues, on behalf of the real-world parties they represent, i.e. it can handle multi-issue negotiation.

The reasoning model of the negotiation agent has been implemented partially by using c# based on Microsoft .NET. The reliability and the flexibility of the reasoning model are finally evaluated. The results show that performance of the proposed agent model is acceptable for negotiation parties to achieve mutual benefits.

Software agent technology is widely used in agent-based e-Commerce. These software agents have a certain degree of intelligence, i.e. they can make their own decisions. The agents interact with other agents to achieve certain goals. However,software agents can not directly control other agents because every agent is an independent decision maker. Negotiation becomes the necessary method to achieve mutual agreement between agents.This focuses on modeling multi-issue, one-to-one negotiation agents for a third party driven virtual market place.We consider one-to-one negotiation because it is the characteristic of individual negotiations and because it allows cooperative negotiation which is not suitable for many-to-many auction based negotiations.

When building autonomous negotiation agents which are capable of flexible and sophisticated negotiation, three broadareas need to be considered:

Negotiation protocols – the set of rules which govern the interaction

Negotiation issues – the range of issues over which agreement must be reached

Agent reasoning models – the agents employ to act in line with the negotiation protocol in order to achieve their negotiation objectives.

This reasoning model aims at the negotiation process. The process of matching and hand shaking in a pre-negotiation process has been solved in several papers. We assume that the buyer agent and vendor agent have roughly matched their similarity and start a negotiation on issues which they have not reached agreement. In a certain round of negotiation, the negotiation agent can pre-prepare a counter offer for the next round of negotiation. The counter offer is generated by the new offer generation engine. Both the incoming offer from the opponent negotiation agent and the counter offer are sent to the offer evaluation block. This does the analysis of the offer and calculates the degree of satisfaction (acceptance of the agent)of the incoming offer and the counter offer. The result is scaled over the range from 0 to 100. Finally, the decision making block makes the decision. It could be acceptance of the current incoming offer, rejection of the current incoming offer, or counter offer.

Tags : , , , ,

Sedna : A Native XML DBMS

Sedna is an XML database system. It implements XQuery and its data model exploiting techniques developed specially for this language.

Sedna is designed with having two main goals in mind.First, it should be the full-featured database system. That requires support for all traditional database services such as external memory management, query and update facilities,concurrency control, query optimization, etc. Second, itshould provide a run-time environment for XML-data intensive applications. That involves tight integration of database management functionality and that of a programming language. Developing Sedna,  not to adopt any existing database system. Instead of building a super structure upon an existing database system, use a native system from scratch. It took more time and eff ort but gave us more freedom in making design decisions and allowed avoiding undesirable run-time overheads resulting from interfacing with the data model of the underlying database system.

We take the XQuery 1.0 language  and its data mode as a basis for our implementation. In order to support updates we extend XQuery with an update language named XUpdate. Sedna is written in Scheme and C++. Static query analysis and optimization is written in Scheme. Parser, executor,memory and transaction management are written in C++.The implementation platform is Windows.

Tags : , , , ,

The use of the random oracle model in cryptography

Possibly the most controversial issue in provable security research is the use of the random oracle model (Bellare & Rogaway 1993). The random oracle model involves modelling certain parts of cryptosystems, called hash functions, as totally random functions about whose internal workings the attacker has no information.This theoretical model vastly simplies the analysis of cryptosystems and allows many schemes to be ‘proven’ secure that would otherwise be too complicated to be proven secure.

A hash function is a keyless algorithm that takes arbitrary-length inputs and outputs a fixed-length hash value or hash. There are several properties that one would expect a hash function to exhibit, including pre-image resistance (given a random element of the output set, it should be computationally infeasible to find a pre-image of that element)  and collision resistance  (it should be computationally infeasible to find two elements that have the same hash value). However, there are many more properties that we might require of a hash function depending on the circumstances. For example, it might be hoped that if the hash function is evaluated on two related inputs, then the outputs will appear unrelated.

From a provable security point of view, hash functions present a difficult problem. They are usually developed using symmetric techniques, either as standalone algorithms or based on the use of a block cipher. Thus it is difficult to apply the reductionist theory of provable security to them because there are no natural candidate problems to which we may reduce the security. There are constructions of hash functions from block ciphers for which it can be proven that the hash function has certain properties (such as pre-image and collision resistance) as long as the underlying block cipher is indistinguishable from a random permutation however, it is impossible for any publicly known function to produce outputs that appear independent when evaluated on two known inputs.

The random oracle model attempts to overcome our inability to make strong statements about the security of hash functions by modelling them as completely random functions about which an attacker has no information. The attacker (andall other parties in the security model) may evaluate such a random hash functionby querying an oracle. The original interpretation of this simplifcation was that it heuristically demonstrated that a cryptosystem was secure up to attacks against the system that may be introduced via the use of a specific hash function. Equivalently,it was thought that a proof of security in the random oracle model meant that, with overwhelming probability, the cryptosystem was secure when instantiated with a randomly chosen hash function.This interpretation of the random oracle model is correct up to a point. It ispossible to construct families of efficient hash functions for which it is computationally infeasible to differentiate between access to an oracle which computes a randomly selected hash function, and access to an oracle which computes a random function. If such a hash function is used in place of the random oracle, then we can be sure that the scheme is secure against attackers whose only interaction with the hash function is to directly compute the output of the hash function on certain inputs.

The one key difference between the random oracle model and the use of a hash function selected at random from a random looking function family is that in the latter case the attacker is given access to a description of a Turing machine that can compute the hash function  in the former the attacker is not given such a description.This led to the cataclysmic result of Ran Canetti  (2004) who demonstrated that it was possible to have a scheme that was provably secure in the random oracle model,and yet insecure when the random oracle was replaced with any hash function. The trick Canetti  employ is to use knowledge of the Turing machine that computes the hash function like a password that forces the cryptosystem to release sensitive information.

Tags : , , ,