In most enterprises there are two types of passwords: local and domain. Domain passwords are centralized passwords that are authenticated at an authentication server (e.g., a Lightweight Directory Access Protocol server, an Active Directory server). Local passwords are passwords that are stored and authenticated on the local system (e.g., a workstation or server). Although most local passwords can be managed using centralized password management mechanisms, some can only be managed through third-party tools, scripts, or manual means. A common example is built-in administrator and root accounts. Having a common password shared among all local administrator or root accounts on all machines within a network simplifies system maintenance, but it is a widespread weakness. If a single machine is compromised, an attacker may be able to recover the password and use it to gain access to all other machines that use the shared password. Organizations should avoid using the same local administrator or root account password across many systems. Also, built-in accounts are often not affected by password policies and filters, so it may be easier to just disable the built-in accounts and use other administrator-level accounts instead.
A solution to this local password management problem is the use of randomly generated passwords, unique to each machine, and a central password database that is used to keep track of local passwords on client machines. Such a database should be strongly secured and access to it limited to only the minimum needed. Specific security controls to implement include only permitting authorized administrators from authorized hosts to access the data, requiring strong authentication to access the database (for example, multi-factor authentication), storing the passwords in the database in an encrypted form (e.g., cryptographic hash), and requiring administrators to verify the identity of the database server before providing authentication credentials to it.
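As one illustration (not from the source), the random-per-machine approach might be sketched in Python as follows: each password comes from a CSPRNG, and the central database keeps only a salted PBKDF2 hash, matching the advice to store passwords in an encrypted form. All names here are illustrative.

```python
import hashlib
import os
import secrets

def generate_local_password(length: int = 20) -> str:
    """Generate a random password unique to one machine."""
    return secrets.token_urlsafe(length)[:length]

def store_password(db: dict, host: str, password: str) -> None:
    """Keep only a salted PBKDF2 hash in the central database."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    db[host] = (salt, digest)

def verify_password(db: dict, host: str, password: str) -> bool:
    """Check a candidate password against the stored salted hash."""
    salt, digest = db[host]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return secrets.compare_digest(digest, candidate)
```

Because each machine's password is independently random, compromising one host reveals nothing about the others, unlike the shared-password weakness described above.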
Another solution to management of local account passwords is to generate passwords based on system characteristics such as machine name or media access control (MAC) address. For example, the local password could be based on a cryptographic hash of the MAC address and a standard password. A machine’s MAC address, “00:16:59:7F:2C:4D”, could be combined with the password “N1stSPsRul308” to form the string “00:16:59:7F:2C:4D N1stSPsRul308”. This string could be hashed using SHA and the first 20 characters of the hash used as the password for the machine. This would create a pseudo-salt that would prevent many attackers from discovering that there is a shared password. However, if an attacker recovers one local password, the attacker would be able to determine other local passwords relatively easily.
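The derivation scheme above can be sketched in a few lines of Python. The source says only "SHA", so SHA-256 is assumed here, and the function name is illustrative:

```python
import hashlib

def derive_local_password(mac_address: str, shared_secret: str,
                          length: int = 20) -> str:
    """Derive a per-machine password from its MAC address and a shared
    secret, as described in the text: hash the combined string and keep
    the first `length` characters of the hex digest."""
    material = f"{mac_address} {shared_secret}".encode()
    return hashlib.sha256(material).hexdigest()[:length]
```

Note the weakness the text points out: the derivation is deterministic, so an attacker who recovers one password and guesses the scheme can derive every other machine's password from its MAC address.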
The fundamental reason for building digital libraries is the belief that they will provide better delivery of information than was possible in the past. The major advantages of digital libraries over traditional libraries include:
- Digital libraries bring the library closer to the user: Information is brought to users, either at home or at work, making it more accessible and increasing its usage. This is very different from traditional libraries, where users have to physically go to the library.
- Computer technology is used for searching and browsing: Computer systems are better than manual methods for finding information, and they are especially useful for reference work that involves repeated leaps from one source of information to another.
- Information can be shared: Placing digital information on a network makes it available to everyone. Many digital libraries are maintained at a single central site. This is a vast improvement over expensive physical duplication of little-used material, or the inconvenience of unique material that is inaccessible without traveling to the location where it is stored.
- Information is always available: The digital library's doors never close; its collections can be used at hours when library buildings are closed. Materials are never checked out, mis-shelved, or stolen. Compared with traditional libraries, information is much more likely to be available when and where the user wants it.
- New forms of information become possible: Print is not always the best way to record and disseminate information; a database, for example, may serve some kinds of information better than the printed collections of conventional libraries.
Digital libraries would certainly facilitate research work, and this should appeal especially to those involved in research. However, recent studies have shown that people still prefer to read from paper despite the progress in technology. Today, with many people searching for new knowledge and information, the Internet is expected to take over the role of the human intermediary, and there is an accompanying expectation that people are digitally literate. In practice, some end users do not have the literacy to search the Internet effectively for information. The problem is compounded by the fact that the Internet as a whole is not well organized, so information retrieval is inevitably a difficult and time-consuming process.
The service-oriented automated negotiation system includes the following four participants: service registration centre, negotiation service requester, negotiation service provider, and protocol.
The service registration centre is a database of service providers' information. The negotiation service provider follows a standard service API so that customers can use negotiation services from different service providers. The service registration centre supports all customers and services in the open system, of which negotiation is just one; it also provides customers with other services, such as security and auditing, in addition to negotiation service registration.
The negotiation service requester discovers and calls the negotiation service. It provides business solutions to enterprises and individuals using the negotiation service. In general, the service requester does not interact directly with the service registration centre, but goes through an application portal system. The benefit of doing so is that the application portal can provide users with access to a wide range of services.
The negotiation service provider, managed by the negotiation software vendor, advertises the negotiation service to the service registration centre, including the registration of its functions and access interfaces. It responds to service requests. At the same time, it must ensure that any modifications to the service will not affect the requester.
The protocol is an agreement between negotiation service requesters and providers. It standardizes service request and response and ensures mutual communication.
Figure 1: Abstract architecture for automated negotiation system
Figure 1 illustrates a simple negotiation service interaction cycle, which begins with a negotiation service ADVERTISING itself through a well-known service registration centre. A negotiation requester, who may or may not run as a separate service, queries the service registration centre to DISCOVER a service that meets its negotiation needs. The service registration centre returns a (possibly empty) list of suitable services; the service requester selects one and passes a request message to it, using any mutually recognized protocol. In this example, the negotiation service responds either with the result of the requested operation or with a fault message. The two parties then INTERACT with each other continuously.
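The advertise/discover/interact cycle can be sketched in a few lines of Python. The class and method names are illustrative stand-ins, not part of any real negotiation API, and the counter-offer rule is a toy:

```python
class ServiceRegistry:
    """Minimal stand-in for the service registration centre."""
    def __init__(self):
        self._services = {}

    def advertise(self, name: str, service) -> None:
        """A provider ADVERTISES itself under a service name."""
        self._services[name] = service

    def discover(self, name: str) -> list:
        """A requester DISCOVERS services; the list may be empty."""
        return [s for n, s in self._services.items() if n == name]


class NegotiationService:
    """Toy provider: counter-offers halfway toward its asking price."""
    def __init__(self, asking_price: float):
        self.asking_price = asking_price

    def request(self, offer: float) -> float:
        if offer >= self.asking_price:
            return offer
        return (offer + self.asking_price) / 2


# Provider side: advertise through the registry.
registry = ServiceRegistry()
registry.advertise("negotiation", NegotiationService(asking_price=100.0))

# Requester side: discover, select one service, and interact.
candidates = registry.discover("negotiation")
service = candidates[0]
counter = service.request(80.0)
```

A real deployment would replace the in-process dictionary with a networked registry and a mutually recognized wire protocol, but the interaction cycle is the same.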
All of the databases we cover in this volume have had serious security flaws at some point. Oracle has published 69 security alerts on its “critical patch updates and security alerts” page — though some of these alerts relate to a large number of vulnerabilities, with patch 68 alone accounting for somewhere between 50 and 100 individual bugs. Depending on which repository you search, Microsoft SQL Server and its associated components have been subject to something like 36 serious security issues — though again, some of these patches relate to multiple bugs. According to the ICAT metabase, DB2 has had around 20 published security issues — although the authors of this book have recently worked with IBM to fix a further 13 issues. MySQL has had around 25 issues; Sybase ASE is something of a dark horse with a mere 2 published vulnerabilities. PostgreSQL has had about a dozen. Informix has had about half a dozen, depending on whose count you use.
The problem is that comparing these figures is almost entirely pointless. Different databases receive different levels of scrutiny from security researchers. To date, Microsoft SQL Server and Oracle have probably received the most, which accounts for the large number of issues documented for each of those databases. Some databases have been around for many years, and others are relatively recent. Different databases have different kinds of flaws; some databases are not vulnerable to whole classes of problems that might plague others. Even defining “database” is problematic. Oracle bundles an entire application environment with its database server, with many samples and prebuilt applications. Should these applications be considered a part of the database? Is Microsoft’s MSDE a different database than SQL Server? They are certainly used in different ways and have a number of differing components, but they were both subject to the UDP Resolution Service bug that was the basis for the “Slammer” worm.
Even if we were able to determine some weighted metric that accounted for age, stability, scrutiny, scope, and severity of published vulnerabilities, we would still be considering only “patchable” issues, rather than the inherent security features provided by the database. Is it fair to directly compare the comprehensive audit capabilities of Oracle with the rather more limited capabilities of MySQL, for instance? Should a database that supports securable views be considered “more secure” than a database that doesn’t implement that abstraction? By default, PostgreSQL is possibly the most security-aware database available — but you can’t connect to it over the network unless you explicitly enable that functionality. Should we take default configurations into account? The list of criteria is almost endless, and drawing any firm conclusions from it is extremely dangerous.
Ultimately, the more you know about a system, the better you will be able to secure it — up to a limit imposed by the features of that system. It isn’t true to say, however, that the system with the most features is the most secure, because the more functionality a system has, the more target surface there is for an attacker to abuse. The point of this book is to demonstrate the strengths and weaknesses of the various database systems we’re discussing, not — most emphatically not — to determine which is the “most secure”.
Tandem and Teradata have demonstrated that the same architecture can be used successfully to process transaction-processing workloads as well as ad-hoc queries. Running a mix of both types of queries concurrently, however, presents a number of unresolved problems. The problem is that ad-hoc relational queries tend to acquire a large number of locks (at least from a logical viewpoint) and tend to hold them for a relatively long period of time (at least from the viewpoint of a competing debit-credit style query). The solution currently offered is “browse mode” locking for ad-hoc queries that do not update the database. While such a “dirty read” solution is acceptable for some applications, it is not universally acceptable. The solution advocated by XPRS is to use a versioning mechanism to enable readers to read a consistent version of the database without setting any locks. While this approach is intuitively appealing, the associated performance implications need further exploration. Other, perhaps better, solutions for this problem may also exist.
A related issue is priority scheduling in a shared-nothing architecture. Even in centralized systems, batch jobs have a tendency to monopolize the processor, flood the memory cache, and make large demands on the I/O subsystem. It is up to the underlying operating system to quantize and limit the resources used by such batch jobs in order to ensure short response times and low variance in response times for short transactions. A particularly difficult problem is the priority inversion problem, in which a low-priority client makes a request to a high-priority server. The server must run at high priority because it is managing critical resources. Given this, the work of the low-priority client is effectively promoted to high priority when its request is serviced by the high-priority server. There have been several ad-hoc attempts at solving this problem, but considerably more work is needed.
In a Grid environment, data may be stored in different locations and on different devices with different characteristics. The mechanism neutrality implies that applications should not need to be aware of the specific low-level mechanisms required to access data at a particular location. Instead, applications should be presented with a uniform view of data and with uniform mechanisms for accessing that data. These requirements are met by the storage system abstraction and our grid storage API. Together, these define our data access service.
1. Data Abstraction: Storage Systems
We introduce as a basic data grid component what we call a storage system, which we define as an entity that can be manipulated with a set of functions for creating, destroying, reading, writing, and manipulating the attributes of named sequences of bytes called file instances. Notice that our definition of a storage system is a logical one: a storage system can be implemented by any storage technology that can support the required access functions. Implementations that target Unix file systems, HTTP servers, hierarchical storage systems such as HPSS, and network caches such as the Distributed Parallel Storage System (DPSS) are certainly envisioned. In fact, a storage system need not map directly to a single low-level storage device. For example, a distributed file system that manages files distributed over multiple storage devices or even sites can serve as a storage system, as can an SRB system that serves requests by mapping to multiple storage systems of different types.
Our definition of a file instance is also logical rather than physical. A storage system holds data, which may actually be stored in a file system, database, or other system; we do not care about how data is stored but specify simply that the basic unit that we deal with is a named sequence of uninterpreted bytes. The use of the term “file instance” for this basic unit is not intended to imply that the data must live in a conventional file system. For example, a data grid implementation might use a system such as SRB to access data stored within a database management system. A storage system will associate with each of the file instances that it contains a set of properties, including a name and attributes such as its size and access restrictions. The name assigned to a file instance by a particular storage system is arbitrary and has meaning only to that storage system. In many storage systems, a name will be a hierarchical directory path. In other systems such as SRB, it may be a set of application metadata that the storage system maps internally to a physical file instance.
2. Grid Storage API
The behavior of a storage system as seen by a data grid user is defined by the data grid storage API, which defines a variety of operations on storage systems and file instances. Our understanding of the functionality required in this API is still evolving, but it certainly should include support for remote requests to read and/or write named file instances and to determine file instance attributes such as size. In addition, to support optimized implementation of replica management services (discussed below) we require a third-party transfer operation used to transfer the entire contents of a file instance from one storage system to another.
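A hedged sketch of such a storage API, assuming Python: the interface and class names are illustrative, and the third-party transfer shown is a naive read-then-write rather than the optimized direct transfer the text calls for.

```python
from abc import ABC, abstractmethod

class StorageSystem(ABC):
    """Abstract storage system: operations on named file instances."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        """Return the contents of the named file instance."""

    @abstractmethod
    def write(self, name: str, data: bytes) -> None:
        """Create or replace the named file instance."""

    @abstractmethod
    def attributes(self, name: str) -> dict:
        """Return attributes such as size for a file instance."""

    def third_party_transfer(self, name: str,
                             destination: "StorageSystem") -> None:
        """Copy an entire file instance to another storage system.
        Naive default; a real grid would stream directly between sites."""
        destination.write(name, self.read(name))


class InMemoryStorage(StorageSystem):
    """Trivial implementation backed by a dict, for illustration only."""
    def __init__(self):
        self._files = {}

    def read(self, name):
        return self._files[name]

    def write(self, name, data):
        self._files[name] = data

    def attributes(self, name):
        return {"name": name, "size": len(self._files[name])}
```

Because the interface is logical, the same `StorageSystem` contract could equally be implemented over a Unix file system, an HTTP server, or an SRB-style mediator, which is the point of the abstraction.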
While the basic storage system functions just listed are relatively simple, various data grid considerations can increase the complexity of an implementation. For example, storage system access functions must be integrated with the security environment of each site to which remote access is required. Robust performance within higher-level functions requires reservation capabilities within storage systems and network interfaces. Applications should be able to provide storage systems with hints concerning access patterns, network performance, and so forth that the storage system can use to optimize its behavior. Similarly, storage systems should be capable of characterizing and monitoring their own performance; this information, when made available to storage system clients, allows them to optimize their behavior. Finally, data movement functions must be able to detect and report errors. While it may be possible to recover from some errors within the storage system, other errors may need to be reported back to the remote application that initiated the movement.
Terabyte online databases, consisting of billions of records, are becoming common as the price of online storage decreases. These databases are often represented and manipulated using the SQL relational model. A relational database consists of relations (files in COBOL terminology) that in turn contain tuples (records in COBOL terminology). All the tuples in a relation have the same set of attributes (fields in COBOL terminology).
Relations are created, updated, and queried by writing SQL statements. These statements are syntactic sugar for a simple set of operators chosen from the relational algebra. Select-project, here called scan, is the simplest and most common operator – it produces a row and column subset of a relational table. A scan of relation R using predicate P and attribute list L produces a relational data stream as output. The scan reads each tuple, t, of R and applies the predicate P to it. If P(t) is true, the scan discards any attributes of t not in L and inserts the resulting tuple in the scan output stream. Expressed in SQL, a scan of a telephone book relation to find the phone numbers of all people named Smith would be written:
SELECT telephone_number /* the output attribute(s) */
FROM telephone_book /* the input relation */
WHERE last_name = 'Smith'; /* the predicate */
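The scan operator just described can be sketched as a Python generator over tuples represented as dictionaries; this is an illustrative model, not an actual SQL engine:

```python
def scan(relation, predicate, attributes):
    """Scan: apply predicate P to each tuple of R, then project
    the attribute list L, yielding an output stream of tuples."""
    for t in relation:
        if predicate(t):
            yield {a: t[a] for a in attributes}

# A tiny telephone_book relation, mirroring the SQL example above.
telephone_book = [
    {"last_name": "Smith", "telephone_number": "555-0100"},
    {"last_name": "Jones", "telephone_number": "555-0101"},
    {"last_name": "Smith", "telephone_number": "555-0102"},
]

smith_numbers = list(scan(telephone_book,
                          lambda t: t["last_name"] == "Smith",
                          ["telephone_number"]))
```

Because the generator yields tuples one at a time, its output stream can be fed directly into another operator, which is exactly the dataflow composition property the text goes on to describe.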
A scan’s output stream can be sent to another relational operator, returned to an application, displayed on a terminal, or printed in a report. Therein lies the beauty and utility of the relational model. The uniformity of the data and operators allows them to be arbitrarily composed into dataflow graphs. The output of a scan may be sent to a sort operator that will reorder the tuples based on an attribute sort criterion, optionally eliminating duplicates. SQL defines several aggregate operators to summarize attributes into a single value, for example, taking the sum, min, or max of an attribute, or counting the number of distinct values of the attribute. The insert operator adds tuples from a stream to an existing relation. The update and delete operators alter and delete tuples in a relation matching a scan stream.
The relational model defines several operators to combine and compare two or more relations. It provides the usual set operators union, intersection, and difference, and some more exotic ones like join and division. Discussion here will focus on the equi-join operator (here called join). The join operator composes two relations, A and B, on some attribute to produce a third relation. For each tuple, ta, in A, the join finds all tuples, tb, in B with attribute value equal to that of ta. For each matching pair of tuples, the join operator inserts into the output stream a tuple built by concatenating the pair. Codd, in a classic paper, showed that the relational data model can represent any form of data, and that these operators are complete. Today, SQL applications are typically a combination of conventional programs and SQL statements. The programs interact with clients, perform data display, and provide high-level direction of the SQL dataflow. The SQL data model was originally proposed to improve programmer productivity by offering a non-procedural database language. Data independence was an additional benefit; since the programs do not specify how the query is to be executed, SQL programs continue to operate as the logical and physical database schema evolves.
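As an illustration of the equi-join, here is a sketch using a hash-join strategy, one common implementation that the text does not itself prescribe; tuples are again modeled as dictionaries:

```python
def equi_join(a, b, attr):
    """Equi-join relations A and B on attribute `attr`.
    Builds a hash index on B, then probes it with each tuple of A,
    yielding the concatenation of each matching pair."""
    index = {}
    for tb in b:
        index.setdefault(tb[attr], []).append(tb)
    for ta in a:
        for tb in index.get(ta[attr], []):
            merged = dict(tb)   # concatenate the matching pair
            merged.update(ta)   # (join attribute values are equal)
            yield merged
```

Usage: joining employees to departments on a shared key produces one output tuple per matching pair, exactly as the definition above requires.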
Parallelism is an unanticipated benefit of the relational model. Since relational queries are really just relational operators applied to very large collections of data, they offer many opportunities for parallelism. Since the queries are presented in a non-procedural language, they offer considerable latitude in how they are executed. Relational queries can be executed as a dataflow graph. As mentioned in the introduction, these graphs can use both pipelined parallelism and partitioned parallelism. If one operator sends its output to another, the two operators can execute in parallel, giving a potential speedup of two.
The benefits of pipeline parallelism are limited because of three factors: (1) Relational pipelines are rarely very long – a chain of length ten is unusual. (2) Some relational operators do not emit their first output until they have consumed all their inputs. Aggregate and sort operators have this property. One cannot pipeline these operators. (3) Often, the execution cost of one operator is much greater than the others (this is an example of skew). In such cases, the speedup obtained by pipelining will be very limited. Partitioned execution offers much better opportunities for speedup and scaleup. By taking the large relational operators and partitioning their inputs and outputs, it is possible to use divide-and-conquer to turn one big job into many independent little ones. This is an ideal situation for speedup and scaleup. Partitioned data is the key to partitioned execution.
Today, in a competitive world, enterprises of all kinds use and depend on timely, up-to-date information. Information volumes are growing 25-35% per year, and the traditional transaction rate has been forecast to grow by a factor of 10 over the next five years, twice the current trend in mainframe growth. In addition, there is already an increasing number of transactions arising from computer systems in business-to-business interworking and from intelligent terminals in the home, office, or factory.
The profile of the transaction load is also changing as decision-support queries, typically complex, are added to the existing simpler, largely clerical workloads. Thus, complex queries such as those macro-generated by decision support systems or system-generated as in production control will increase to demand significant throughput with acceptable response times. In addition, very complex queries on very large databases, generated by skilled staff workers or expert systems, may hurt throughput while demanding good response times.
From a database point of view, the problem is to come up with database servers that support all these types of queries efficiently on possibly very large on-line databases. However, the impressive silicon technology improvements alone cannot keep pace with these increasing requirements. Microprocessor performance is now increasing 50% per year, and memory chips are increasing in capacity by a factor of 16 every six years. RISC processors today can deliver between 50 and 100 MIPS (the new 64-bit DEC Alpha processor is predicted to deliver 200 MIPS at cruise speed!) at a much lower price/MIPS than mainframe processors. This is in contrast to much slower progress in disk technology, which has been improving by a factor of 2 in response time and throughput over the last 10 years. With such progress, the I/O bottleneck worsens with time.
The solution is therefore to use large-scale parallelism to magnify the raw power of individual components by integrating these in a complete system along with the appropriate parallel database software. Using standard hardware components is essential to exploit the continuing technology improvements with minimal delay. Then, the database software can exploit the three forms of parallelism inherent in data-intensive application workloads. Interquery parallelism enables the parallel execution of multiple queries generated by concurrent transactions. Intraquery parallelism makes the parallel execution of multiple, independent operations (e.g., select operations) possible within the same query. Both interquery and intraquery parallelism can be obtained by using data partitioning. Finally, with intraoperation parallelism, the same operation can be executed as many sub-operations using function partitioning in addition to data partitioning. The set-oriented mode of database languages (e.g., SQL) provides many opportunities for intraoperation parallelism. For example, the performance of the join operation can be increased significantly by parallelism.
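As a toy illustration of intraoperation parallelism through data partitioning, the sketch below splits a select over several workers and merges their results. This is a sketch only; a real parallel database would partition across disks and processors rather than threads, and the names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def partitioned_select(relation, predicate, partitions=4):
    """Run one select as `partitions` independent sub-operations:
    round-robin partition the input, filter each partition in
    parallel, then merge the partial results."""
    chunks = [relation[i::partitions] for i in range(partitions)]

    def select(chunk):
        return [t for t in chunk if predicate(t)]

    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(select, chunks))
    return [t for part in results for t in part]
```

Each sub-operation touches a disjoint partition, so the workers share nothing and the big job divides cleanly into independent little ones, which is the divide-and-conquer property the text attributes to partitioned execution.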