News: DBMS - June 1997 - Extended Relational DBMSs: The Technology - Part I

Jonathan, Jeon (hollobit@kisco.co.kr)
Tue, 27 May 1997 10:01:22 +0900

If you reply to this message at intranet, include:
In-Reply-To: <"/Admin:/intranet/17/39"@www.kisco.co.kr>

Source Page[1]

Extended Relational DBMSs: The Technology - Part I -

By Judy Davis; Application Requirements Driving Industry Efforts to
Extend Relational DBMSs to Handle Complex Data.
-----------------------------------------------------------------------
Exploding interest in the Internet and World Wide Web as application
platforms has had a significant effect on the relational database
management system (RDBMS) marketplace. Initially, Web hype reduced the
profile of the major RDBMS combatants and overshadowed their emphasis on
areas such as parallel database operations, data warehousing, and data
replication. But users developing Internet/Web applications quickly
recognized the need for a scalable, robust environment in which to
store, manipulate, and manage the dynamic delivery of multimedia data
and other complex datatypes. Connecting the DBMS and Web applications
via a Web server also requires effective support for a three-tiered
application architecture. In fact, a Web application can represent a
microcosm of the full range of business requirements -- scalability,
performance, access to integrated data across an organization,
transaction management, deployment on multiple platforms, Java support,
and so on.

When the Web came along, most RDBMS vendors were already working on
extensibility to handle more complex data and applications -- Web
applications simply provided an immediate need and a focal point for
these efforts. Thus we are seeing a resurgence of interest in RDBMS
technology as an underlying platform for complex-data applications,
particularly those implemented on the Web.

This article addresses the application requirements that drive the
efforts to extend the RDBMS to handle complex data. It also describes
the components of an extensible data management architecture and the
core features required to achieve RDBMS server extensibility. Part II of
this article, which will appear next month, will discuss how the five
major RDBMS vendors -- Informix Software Inc. (Menlo Park, Calif.), IBM
Corp. (Armonk, N.Y.), Microsoft Corp. (Redmond, Wash.), Oracle Corp.
(Redwood Shores, Calif.), and Sybase Inc. (Emeryville, Calif.) -- plan
to support data extensibility.

Application Requirements
Users have always wanted a unified view of and integrated access to all
data across the organization. A related requirement is integrated
content searching. Delivering these capabilities has been difficult if
not impossible in the past. Although the RDBMS has successfully met the
requirements of a wide range of applications, it natively understands
only highly structured data in the form of simple alphanumeric
datatypes. Text and images may be stored as BLOBs, but the DBMS doesn't
have a clue about the content of these BLOBs. Users who want to build
applications that intelligently access data stored as time series,
geospatial locations, dynamic Web pages, documents, spreadsheets, mail
messages, and other data have had to explore other alternatives, such as
specialized servers or logic in the application itself. The RDBMS has
not been a viable alternative.

In addition, corporate information systems are becoming more complex as
organizations strive to design and integrate operational, data
warehousing, and Web information systems. ( See Figure 1[2] .) A Web
information system, for example, lets users access corporate information
using a Web browser and Web-based applications; it essentially
Web-enables anything.

To meet these application requirements, many organizations are looking
for a single database platform that applies scalability, transaction
integrity, proactive enforcement of business rules, and other robust
DBMS functionality to complex data as well as traditional data. Because
the RDBMS is already widely used for traditional applications, it makes
sense to explore the possibility of extending it to manage a wider range
of datatypes, application-specific semantics, and complex data
relationships intelligently. This evolution of the RDBMS is referred to
as "extended relational" (because the underlying data model is
extensible) or "object-relational" (because the RDBMS can now understand
"rich datatypes" or "objects" that represent complex internal
structures, attributes, and behavior and require new search methods). A
common term for products in this space is "universal server" or
"universal database."

Another factor that is driving efforts to extend the RDBMS is the
ever-present desire to increase developer productivity. One aspect of
this is providing a single API to all data so that developers don't have
to contend with multiple APIs depending on the type and source of data.
An example is retrieving structured data from an RDBMS using SQL and
retrieving related information on documents using the native API of a
text search engine. If the data is related, why not just use one API,
such as SQL, and have a single database server manage both types of
data? The controversy that is evolving in this area is whether SQL is
the appropriate language for this process. The object-relational
approach pushes SQL3 as the single API. Microsoft, on the other hand,
would like to see its OLE DB common object method interface take on this
single-API role.

A second aspect of developer productivity is pushing support for object
modeling techniques into the database server itself. These techniques
include the ability to encapsulate data and its associated methods as
objects and to reuse code through features such as inheritance and
polymorphism.

There are two important points to be made here. One is that many people
in the industry confuse objects with datatypes. Objects encapsulate both
data and methods. Adding new datatypes to the RDBMS is only one step
toward supporting true objects. Also required is the ability to define
new methods and to associate these methods with the appropriate
datatypes.

The second point is the need to differentiate the object-relational
approach from that of object DBMSs. Object-relational DBMSs deliver some
object capabilities, but they do not yet offer the same level of support
as object DBMSs for features such as encapsulation and inheritance. In
addition, it is unlikely that object-relational DBMSs will fully support
features such as pointer navigation and tight integration with
object-oriented programming languages to provide persistent storage of
native objects created in an application. I expect object-relational
products to move in this direction to gain the benefits of the object
approach where possible, but object features will be implemented
differently in object-relational, given its underlying data structure of
tables and columns.

An Extensible Data Management Architecture
There are three major approaches to creating an extensible data
management architecture: the universal server approach, the middleware
approach, and the object layer approach, which I describe later in this
article. Perhaps a better way to present these approaches is to refer to
each one as a component of data extensibility, for they are not mutually
exclusive. In fact, a fully extensible DBMS solution should address all
of these components. ( See Figure 2[3] .)

There are two key differences among these approaches. The first is how
and where the data is managed. Is all of the data tightly integrated and
managed by a single DBMS server? Or is the data loosely integrated and
managed by multiple servers? The second differentiator is where query
optimization occurs and how well it performs. Query optimization maps
the logical view of data to the physical view and decides the best way
to execute the query. Is this handled by the database server (the
universal server approach) or in middleware (the OLE DB approach)? Good
optimization is particularly important in an environment in which data
can be stored in any combination of one table, multiple tables, multiple
databases, and/or files outside the DBMS.

The Universal Server Approach
The "universal server" approach extends RDBMS server capabilities to
understand, store, and manage complex data natively in the database
itself. Informix, IBM, and Oracle are all implementing this approach
(with Informix-Universal Server, IBM DB2 Universal Database, and
Oracle8, respectively). Sybase will also add limited complex-data
support to Sybase SQL Server in the future. This approach assumes that
all of the data is physically stored within the database.

An "extended universal server" accommodates the fact that there may be
very good reasons (such as performance) for not storing all data in the
DBMS. So the DBMS must also be able to efficiently access data stored in
external files. Large data values -- images, for example -- can be
stored externally, and a pointer to each image file is stored inside the
database as a column value. An additional step is enabling the DBMS to
also manage and ensure the integrity of this external data. The only
RDBMS vendor planning to address the latter is IBM, with its robust
file-links. However, I expect efforts in this area to receive much more
attention in the industry over the next 12 months.

Object purists already criticize the extended-relational approach
because of the fact that the DBMS must decompose objects into relational
tables -- rows and columns -- for storage and then rebuild them into
objects before delivery to the user. The vendors extending relational
are very much aware of the performance issues and the need to avoid the
overhead of joins where possible. My philosophy here is the following:
If product performance meets user requirements, it doesn't matter what
the product does under the covers (unless, of course, the architecture
hampers the ability to add future enhancements). In the real world, user
experience will demonstrate whether the extended RDBMS vendors have been
successful in adapting the relational model for complex-data support. I
discuss specific universal server extensions in more detail later in
this article.

The Middleware Approach
Another approach is to use middleware that coordinates and executes
requests across multiple, heterogeneous servers (RDBMS, text search
engine, image system, and flat files); the data itself is managed within
each specialized server. The middleware provides the unified view of the
data, executes the global optimization of user queries, and provides
global transaction management. There are two types of middleware in an
extensible data management architecture. Both types use the SQL API and
provide drivers out the back end to access each supported server. One is
database middleware, such as IBM's DataJoiner and Sybase's OmniConnect,
for integrated access to heterogeneous data. Sybase also plans to extend
this "federated server" approach in its adaptive server architecture.

Microsoft's OLE DB and DCOM, and other object request brokers (ORBs), on
the other hand, represent another type of middleware: application
middleware. OLE DB is an interface that was designed to provide
universal access to data. It "componentizes" DBMS functionality,
breaking it up into components that can run in the middleware space or
in the operating system, such as query processors, optimizers, and
transaction managers. OLE DB will be an integral component of
Microsoft's operating systems and servers. In fact, DBMSs were
originally developed because of operating-system deficiencies. If these
services are implemented at the operating-system level, will we still
need the DBMS? This discussion will become particularly interesting if
SQL3 doesn't prove that it is up to the task of handling real objects.
Oracle's Network Computing Architecture also addresses the ability to
extend the data environment at the middleware level; Oracle's Web
Application Server will evolve into a generic application server that
can provide some database functionality such as transaction management.

How well a middleware approach performs will be determined by several
factors: the level of integration among components, how smart the
middleware is about data stored in supported servers and the native
capabilities of each server, whether the user wants to manipulate the
data or simply retrieve it, and the mechanism by which components
communicate. Some RDBMS vendors claim that using RPCs to execute queries
across distributed data will have serious performance implications.
However, the comment I just made about performance as it relates to
object-relational also applies here.

One issue with OLE DB in particular and ORBs in general is the ability
of third-party software vendors to provide competent database
functionality such as global query processing and optimization. Given
the amount of effort and R&D investment that have gone into optimization
algorithms on the part of the major RDBMS vendors, it is not clear that
middleware will offer comparable functionality and performance. Another
issue is the number of data sources accessible through middleware. Is
the user dependent on the DBMS vendor, or can a customer or a third
party integrate data to meet specific business requirements?

The Object Layer Approach
The object layer in an extensible data architecture provides integrated
object views and object functionality at the application level. This can
encompass client cache management, pointer navigation among objects,
local execution of functions, and local query optimization. Object DBMSs
are clearly focused here, including persistent storage of objects
created by the application. In the case of the RDBMS, the object layer
could include the ability to map objects in the application to objects
in the database so that relational data can be materialized in the form
of native C or C++ objects, Java objects, and so on. The benefits of
this approach are tighter integration between the data manager and the
application development language and the potential for better
performance. IBM is planning to address the object layer in the future
through its client object-support development effort. Oracle will do
some of this in Oracle 8 with its object views of relational data and
client-cache management. Microsoft's OLE DB also provides support at the
object layer with a unified view of heterogeneous data.

Universal Server Extensions
Much of the marketing hype in the industry is focused on the universal
server component of the architecture. Here is a brief look at how RDBMS
vendors are extending their products to satisfy the user requirements I
described previously. Many, but not all, of these features are included
in the SQL3 draft standard. ( See Table 1[4] for a summary.)

Extensible Type System. An extended RDBMS must support user-defined
datatypes (UDTs) at both the column and the row level. Column-level UDTs
are either distinct or abstract datatypes. UDTs enable the extended
RDBMS to incorporate new datatypes and understand complex data or
business relationships. Distinct types are relatively simple UDTs that
extend an existing base datatype for a column. A strongly typed system
will not permit the user to make inappropriate direct comparisons
between types with different names, even though they share the same base
datatype and length. Abstract datatypes define more complex datatypes
that have special internal structures and attributes, such as text,
geospatial, or time-series data. As with objects, the internal structure
of an abstract datatype is hidden from the user; data is accessed and
manipulated using a set of external attributes and functions. Abstract
datatypes are defined using SQL (the database engine is aware of the
attributes and internal structure) or a host language (the type is
essentially "opaque" and seen only as a large object by the DBMS).

A row type describes an entire row or a set of nested columns in a
table, providing a way to represent hierarchical "entities" in the
database -- customer, employee, and so on -- and identify multiple
related columns. Reference types can then define relationships between
row types and uniquely identify a row within an entire database.
References enable users to replace complex-join definitions in queries
with much simpler path expressions. References also give the optimizer
an alternative way to navigate data instead of via value-based joins.

Collections are type constructors that are used to define collections of
other types, such as arrays, lists, and sets. Collections are used to
store multiple values in a single column in a table and can result in
nested tables where a column in one table actually contains another
table. The result can be a single table that represents multiple
master-detail levels. Collections add flexibility to the design of
database structures.

An important aspect of object modeling is inheritance, in which subtypes
inherit the attributes and behavior of their supertypes. Inheritance
facilitates code reuse and the ability to maintain logical integrity in
the database.

User-Defined Functions. User-defined functions (UDFs) define methods for
manipulating data and are an important adjunct to UDTs. An extended
RDBMS should provide significant flexibility in this area, such as
allowing UDFs to return complex values that can then be further
manipulated (such as tables), execution options so that the user can
decide whether performance or security is more important when running
UDFs, and support for overloading of function names to simplify
application development.

Index Structures. Traditional RDBMSs use B-tree (binary tree) indexes to
speed access to scalar data. With the ability to define more complex
datatypes in the RDBMS, specialized index structures are required for
efficient access to data. Some extended RDBMSs are beginning to support
additional index types, such as R-trees (region trees) for fast access
to two- and three-dimensional data, and the ability to index on the
output of a function. A mechanism to plug in any user-defined index
structure provides the highest level of flexibility.

Optimizer. The query optimizer is the heart of RDBMS performance and
must also be extended with knowledge about how to execute UDFs
efficiently, take advantage of new index structures, transform queries
in new ways, and navigate among data using references. Successfully
opening up such a critical and highly tuned DBMS component and educating
third parties about optimization techniques is a major challenge for
DBMS vendors.

Other Extensions. Other important extensions are support for
large-object storage either inside the database or outside in external
files, the ability to apply business rules and integrity constraints to
new datatypes, recursive queries to support complex-data relationships,
and extended language support in the server. This last area is key to
improving both flexibility and portability. Extended RDBMSs must support
the SQL3 standard (in committee draft status now) plus additional
languages for writing UDFs and stored procedures, such as 3GLs and Java.
Unfortunately, the SQL3 standard does not address some areas of
extensibility, so implementation of features such as the mechanism for
giving the optimizer cost information about UDFs and new index
structures will vary among products. The lack of a standard way for
third-party vendors to integrate their software with multiple RDBMSs
demonstrates the need for standards beyond the focus of SQL3.

We also need application language extensions to "complex-data-enable"
applications on the front end. Application development tools must be
extended to take advantage of new server functionality.

Getting Customers to Migrate
Vendors must address two major obstacles to getting customers to buy
into their universal server solutions. One is providing a substantial
portfolio of predefined extensions -- such as DataBlades for
Informix-Universal Server -- as building blocks for application
development. Most extensions will come from a combination of the DBMS
vendor and third-party software vendors who want to integrate their
products tightly within the DBMS. The most flexible solution is for DBMS
vendors to offer basic built-in extensions but also support a wide
variety of (potentially competitive) third-party extensions. Thus the
customer can assess the trade-off between one-stop shopping and a
best-of-breed solution. Third parties that write DBMS extensions must
understand complex database functionality such as transaction management
and how to assess the cost of executing a particular function. They also
must keep their products in synch with new releases of the database
server. Third-party database extensions would be similar to the way
third parties provide components used in application development tools.

The second obstacle is convincing customers that to get extensibility,
they don't have to give up what they already have, such as good
performance on existing applications. A question to ask each vendor is
how integrated complex-data extensions are with existing DBMS
functionality such as parallel processing, backup and recovery, data
integrity constraints, and data replication. Does the customer have to
choose between complex-data support and the ability to use other
features such as parallel processing, data replication, or distributed
databases? In some cases, the answer is yes.

Upgrade or Perish?
The next generation of RDBMS products has started to roll out in the
form of object-relational "universal servers." However, none of the
initial product releases from Informix, IBM, or Oracle supports a
complete set of object extensions, nor do any of the releases support
all of the components of an extensible, data management architecture.
Taking advantage of a universal server requires customers to upgrade to
a new version of the DBMS. It is important to understand what this
entails and whether existing applications must be modified. Can the
customer take advantage of new extensions in an evolutionary way without
losing the benefits of current database features and functions? Stay
tuned for a look at the specific vendor strategies and product plans of
Informix, IBM, Microsoft, Oracle, and Sybase and how they are addressing
data extensibility.

-----------------------------------------------------------------------
Judith R. Davis is a principal with InfoIT Inc., a DataBase Associates
International company that provides in-depth industry analysis of all
aspects of new and evolving information technologies. She has over 15
years of experience as a consultant and industry analyst specializing in
DBMSs and related technologies. You can email Judy at jdavis@dbaint.com.
-----------------------------------------------------------------------
-----------------------------------------------------------------------
This article has been adapted from InfoIT's in-depth research report,
Object-Relational DBMSs, which is available on the InfoIT Web site,
www.infoit.com[5] .
-----------------------------------------------------------------------
Figure 1.
[IMAGE][6]
--Corporate Information System.
Figure 2.
[IMAGE][7]
--Components of an Extensible Data Management Architecture.
-----------------------------------------------------------------------
TABLE 1. Summary of Object-Relational Extensions Feature Included in
SQL3 Extensible type system (user-defined types or UDTs) Yes Support for
strong typing Yes Support for hierarchies of types and inheritance Yes
Data replication support for UDTs No User-defined functions (UDFs) Yes
Function overloading Yes Function resolution based on multiple
attributes Yes Extensible indexing system No Extensible query optimizer
No Support for large objects (LOBs) Yes Support for external data No
Integrated searchable content Yes Extended language support Yes SQL3 and
SQL/Multimedia Yes 3GLs Yes for stored procedures 4GLs No Java No
Object-oriented languages No Predefined extensions available N/A
Facilities (API, developer's kit) for adding extensions No Application
language support for extensions No Systems management support for
extensions No Source: InfoIT, Inc., a DataBase Associates company
-----------------------------------------------------------------------
-----------------------------------------------------------------------
[1] http://www.dbmsmag.com/9706d13.html
[2] #figure1
[3] #figure2
[4] #table1
[5] http://www.infoit.com
[6] 9706d131.gif
[7] 9706d132.gif