Butler Group White Paper
Aruna Companion
Aruna
Author: Martin Butler
SUMMARY
Aruna Companion is a very timely solution to a wide
range of business problems. As data sources proliferate and data volumes
grow exponentially, it becomes more difficult for business managers to
get timely access to information. Aruna has addressed this problem
through the creation of an innovative information delivery platform
based on its own unique database technology that complements existing
technologies and builds on their strengths. Its technology platform
addresses three pressing issues:
- Business managers are prevented from asking many
questions simply because existing database technology cannot process
the resulting queries. Real database queries are usually complex,
generating joins that typically take many hours and sometimes days
to execute – even on the most powerful hardware. The techniques
developed by Aruna mean that complex queries will typically execute
within a few minutes and maybe even seconds – even on modest
hardware configurations.
- Proliferation of data sources means that an
integrated view of data is often impossible to achieve. Aruna has
addressed this problem also, supporting many data formats and
databases and facilitating consolidation within a single Query Data
Store. Subsequent queries can be applied to individual tables, or to
the whole unified data store – a unique capability in my experience.
- Business users want an easy-to-use query interface
that exploits their knowledge of the data. Aruna has incorporated a
natural language query interface that supports everyday language for
query composition. The resulting SQL is often very complex, but is
hidden from the user. This makes the technology available to anyone
who has authority to use it.
It is very important to appreciate the small footprint
of Aruna’s technology. There is little need for users to train in its
use, and even at a technical level there is almost nothing to be done
other than product installation, which is designed to be something the
business users can do for themselves. Tuning and configuration are
virtually non-existent – and very deliberately so. Add to this the
modest resources needed to execute complex queries on very large data
sets, and we have a technology that offers significant capability with
minimum overhead.
The uses for this unique technology are growing fast.
Wherever there is a need to query large data sets on a frequent basis
Aruna Companion is being seen as a cost effective, powerful solution.
Financial services and telecoms companies are typically finding uses for
the technology, and there are already some impressive success stories.
Unlike many IT developments, Aruna Companion is
non-disruptive in that it works with existing database technologies. It
is not intended to be a replacement for traditional relational
databases, but a complement that utilises the relational view of data
while offering orders of magnitude performance improvements. Aruna
Companion, as its name implies, sits next to them on its own server and
simply augments their query capabilities while reducing the burden on
the operational environment. Aruna has not created a transactional
database system – Companion has been created for rapid execution of
queries, but does not support online update (although Aruna’s Query Data
Store can be incrementally updated on a frequent basis if needed). It is
best thought of as a highly effective data warehouse, but with a much
smaller technology and cost footprint.
Perhaps most attractive of all is the "proof of value"
proposition offered by Aruna. To overcome scepticism and risk Aruna will
create a Query Data Store from a company’s existing data and demonstrate
the capability that is being offered. There is no excuse for ignoring
this technology if complex querying of large data sets has proved to be
problematical in the past. Aruna Companion is a low risk, high payback
database technology that most large corporations should have as part of
their technology arsenal.
THE NEED FOR FASTER DATABASE TECHNOLOGY
Increases in speed usually manifest as quantum jumps,
and a new generation of technologies, particularly corporate portals and
business intelligence tools, demand a quantum increase in the speed of
information access and delivery. End users are being presented with
direct access to the information assets of the enterprise, and they are
intolerant of delays. Even worse, there is some information that simply
cannot be delivered because of its inaccessibility, its complexity, or a
lack of resources to process it in a timely fashion.
Critical factors affecting business performance are
increasingly linked to the speed and availability of information.
Responsiveness to customers and market conditions is wholly dependent on
timely access to the relevant information. The ability to exercise
appropriate levels of agility depends on the speed of management decision
making, which is again dependent on speed of access to information.
There is hardly a single aspect of corporate performance that is not
adversely affected by inadequate information delivery.
Deriving benefits from information is one side of the
equation. The other is minimising the cost of information delivery. Most
things are possible with IT – but at a cost. Tighter economic conditions
mean the cost of information has to be reduced as far as possible.
Expensive, large-scale projects with implementation timescales measured
in years are generally not acceptable. There is a need for technologies
that provide rapid Return On Investment (ROI), are friendly to the end
user, and deliver benefits that are otherwise unavailable. Butler Group
divides the cost of exploiting information into three main areas of
activity:
- Origination is all costs associated with resources
used to derive information requested by the user. This includes
database technologies, data warehousing, and data mining technologies.
- Delivery is the next element of cost, and this is all
resources used to deliver information to the user (portal technology
for example).
- Execution is the cost incurred in applying the
information. Time is spent using information and formulating queries.
Origination, delivery, and execution are all
interlinked and while delivery mechanisms have developed considerably
over the last three years, origination and execution have seen little
innovation. This means that we have excellent user interfaces and
information delivery channels, but poor origination technology resulting
in delays and lack of access. Core to origination is database
technology, and while the relational database has proved to be an
excellent all purpose database technology, it is hardly well suited to
high-speed origination of information. There is no question of replacing
this technology – most large organisations have significant investments
in relational databases and they have no desire to replace them. We need
a technology that can build on the strengths of the relational model,
but not be limited by its run-time inefficiencies. This opportunity has
been very well understood by Aruna and its technology addresses these
specific needs.
You could interpret Aruna’s offering as being a
delivery platform, with a mirrored cache of multiple Origination
sources. This is a new concept that not only improves delivery speed and
scalability but ‘buffers’ the Origination sources from any impact
whatsoever during query execution. While presenting a familiar and
widely recognised relational view of data, and acting as a complement to
existing database technologies, Aruna provides the quantum increase in
database speed that portals and business intelligence applications now
require.
Of equal importance is the fact that Aruna offers a
natural language interface that directly impacts on the execution costs
associated with information usage. Not only are the timescales required
to formulate queries dramatically reduced, but so are the skill levels.
This affects execution costs quite significantly; an area that is often
overlooked in the use of IT.
The speed increases available from Aruna are
significant enough to create new business opportunities – complex
queries that might take many hours or days to execute using traditional
technology will typically execute within a few minutes or even seconds.
Organisations using the technology have been able to offer a new level
of capability to their information workers, resulting in some
significant cost savings, and the creation of opportunities that
otherwise would have been infeasible.
Aruna Companion
WHAT ARUNA COMPANION IS AND WHAT IT ISN’T
Very fast query execution is one major attribute of
Aruna Companion. Depending on the type of query this speed improvement
over conventional relational technology can be a factor of 100 or more.
The main enabling technology is sophisticated indexing called the
Universal Index, effectively indexing all the values held in a database.
This means that query execution isn’t crippled by the join overhead – a
factor that makes many complex joins totally unfeasible with traditional
technology.
Data consolidation is another major feature of the
product, allowing data from diverse sources to be pulled together within
a single database. This might include data held in applications such as
SAP or Siebel, relational databases, file systems, spreadsheets, and
almost any source of tabular data.
Real business needs dictate that this type of
functionality is available – the value of a query tends to increase with
the number of data sources it accesses. In this case there is only one
data source – an Aruna Companion database that holds data unified from
multiple sources.
The natural language interface places the formulation
of queries directly into the hands of the end-user, and the IT
department does not have to hold its breath in the expectation of
light-dimming resource consumption. This is a real win-win situation
where users get what they want without compromising the IT resources
that the rest of the organisation relies upon. From the business user’s
perspective it is simply an immediate resolution to many long-standing
areas of pain that does not rely on IT or long winded, expensive
consultancy projects.
It has to be stressed that Aruna Companion is aimed
fairly and squarely at accelerating query execution. Other performance
problems such as database update bottlenecks require other technology
and are in no way addressed by this product.
Aruna Companion is an essential complementary
technology to the relational database and data warehousing technologies
used in most large organisations, so that queries involving large data
sets can execute in a timely manner.
VERTICAL SOLUTIONS
Aruna Companion is well suited to several classes of
business solutions, all characterised by the need to speedily explore
data that is complex and has been consolidated from several data
sources. Industries dealing with large numbers of customers, usually
consumers, tend to generate very large data volumes and it is this data
that often proves to be too much for traditional database technology to
handle in a timely manner. Several verticals are briefly mentioned below
– this list is not exhaustive, but gives a flavour of the applications
of Aruna Companion.
Corporate Reporting
The aim of corporate reporting is to support better
decision-making, demonstrate compliance with appropriate legislation, and
monitor business operations. Cost effective support for this type of
application is perhaps the most appropriate use of Aruna Companion.
Conversation-style querying of very large aggregated data sets is not
only possible, but easily and efficiently executed. This avoids the
sub-optimal reporting that is a common feature with large data volumes,
frequently caused by an inability to access all relevant data sources.
The data aggregation features of Aruna Companion provide an ideal
environment in which to execute this type of task.
Telecommunications
Aruna Companion has proved to be particularly effective
in the telecoms industry. Cable & Wireless have quickly addressed
revenue assurance problems, providing an ROI measured in the hundreds of
percent. One of the mobile telecoms operators was able to analyse Call
Detail Records (CDRs) for better targeting of promotions and services.
More generally, Aruna Companion provides an excellent support technology
for CRM applications and the analysis of the many gigabytes of data
telcos store as a result of their billing and service monitoring
activities.
Fraud Detection
One of the most significant problems many organisations
face in trying to deal with fraud is the integration of data from many
disparate sources. Only a unified view of customer data can facilitate
the identification of fraud, and Aruna Companion enables this through
its global indexing technology. Support for "similarity searching" and
"sounds like" searching also helps detect fraudulent activity. Fraud
detectives are helped by the natural language interface and the very
high speeds of query execution, meaning that "train of thought" querying
is feasible.
Data Protection
Every citizen has the right to make a Subject Access
Request (SAR) to an organisation holding their details. The requestor
can ask for a description of the details that are held, what they are
used for and who uses them. Satisfying such a request is not trivial.
Details of individuals may be held in many databases and applications
and a scan of these is typically time consuming and costly. The data
consolidation features of Aruna Companion are very well suited to
addressing this type of problem, without impacting on the live
production systems.
Mergers and Acquisitions (M&A)
Integration of systems and data is a major headache
associated with M&A activity. Pre-M&A activity may require
analysis and comparison of customer databases and detailed analysis of
target financial records. Post-M&A requirements can balloon into an
immediate need to consolidate data. While Aruna Companion cannot address
the process integration problem in any significant manner, it can
provide a bridge to allow a graceful merging of two or more
organisations’ operational activity.
Web-Based Delivery of Information
Delivering access to large numbers of business users
for data interrogation that involves complex SQL queries against large
amounts of data has not previously been feasible without the support of
very significant IT infrastructures. Even then, query complexities and
volumes have needed to be limited. This is the classic ‘richness versus
reach quandary’. The characteristics of Aruna Companion enable a low
cost technology footprint to deliver high levels of query performance
even over the Web. This has enabled companies such as the 192.com
directory service to offer complex queries to half a million users per
day on a small number of Pentium-based servers. This opens up many
Web-based service opportunities that were previously not cost effective.
TECHNICAL ARCHITECTURE
Aruna’s technology has been under development for the
last six years, and the capability that is offered is not the result of
some overnight tweaking of existing technologies. Several patents have
been granted and are a measure of the technology barriers that had to be
overcome to realise what might now be so easily taken for granted –
lightning fast execution of complex queries on data that has been
consolidated from diverse sources.
There are only a few components in the Aruna Companion
architecture, but what they achieve is really quite significant. At the
highest level Aruna Companion consists of the Aruna QDS (Query Data
Store) Builder and the Aruna Query Processor. Each of these has several
components and we shall look briefly at the functionality they provide.
Aruna QDS Builder
QDS Builder creates data stores from a wide variety of
inputs. This includes ODBC data sources as well as delimited flat files,
and it is Aruna Connector that provides the connectivity to these
sources. It offers a graphical user interface and supports third-party
products as well as Aruna’s own connection mechanisms. ERP systems (most
notably SAP), text files, and XML files are all supported and indexed
using Aruna FastPath. It is FastPath that actually performs the clever
work of building indexes that will subsequently support very fast query
execution.
Four types of data structure are created to enable
Aruna’s speed and functionality. Metadata is stored that describes the
layout of data imported from the various sources. Then data is imported
in a compressed format so there is no need to go back to the sources to
load records. The third data structure is a universal index of all data
values that have been imported. The final data structure is a complex
and patented composition of pointers between indexes, metadata, and
records.
This is the real heart of Aruna Companion since it
provides very rapid access to data values, and it is important to note
that these are indexed at word level. This means for example, that a
query can be launched to find all occurrences of George. Companion
supports queries against the full data set (it obviously supports
queries against a specified table also) and a query looking for George
might deliver records with an address containing George (e.g. George
Street) as well as records where the first or middle name of an
individual is George.
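To make the idea of word-level indexing across a whole data store concrete, here is a minimal Python sketch. It is an illustration only and makes no claim about how Aruna's patented structures work: the function names, the sample tables, and the use of a plain dictionary of sets are all assumptions introduced for this example.

```python
import re
from collections import defaultdict


def build_universal_index(tables):
    """Map every lower-cased word to the (table, row number, field) places it occurs.

    `tables` is a hypothetical in-memory stand-in for imported data:
    {table name: [row dict, ...]}.
    """
    index = defaultdict(set)
    for table_name, rows in tables.items():
        for row_number, row in enumerate(rows):
            for field, value in row.items():
                # Index at word level, so "4 George Street" yields "george".
                for word in re.findall(r"\w+", str(value).lower()):
                    index[word].add((table_name, row_number, field))
    return index


def find_word(index, word, table=None):
    """Return sorted hits for a word, optionally restricted to one table."""
    hits = index.get(word.lower(), set())
    if table is not None:
        hits = {hit for hit in hits if hit[0] == table}
    return sorted(hits)


# Hypothetical sample data, invented for illustration.
tables = {
    "customers": [
        {"first_name": "George", "surname": "Hall", "address": "12 Mill Lane"},
        {"first_name": "Anne", "surname": "Price", "address": "4 George Street"},
    ],
    "suppliers": [
        {"contact": "Mary George Ward", "city": "Leeds"},
    ],
}

index = build_universal_index(tables)
for table_name, row_number, field in find_word(index, "George"):
    print(table_name, row_number, field)
```

A single lookup returns the customer whose first name is George, the customer living on George Street, and the supplier contact with George as a middle name. Passing `table="suppliers"` narrows the same lookup to one table.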
Aruna QDS Builder indexes all imported data. Clearly
this can be a time consuming activity. As a rule of thumb this activity
takes around an hour a gigabyte – however, the next release of the
product will improve this by up to an order of magnitude. Once the data
is loaded almost anything is possible, without the need for optimisation
and tuning. It is hard to convey the power of the product without
actually seeing it, but a query such as "where are the best Indian
restaurants in Birmingham" would execute in a meaningful way assuming
the database was relevant. It would also interrogate millions of records
and give a result within a few seconds.
Some modifications can be made at loading. The most
common is the need to reconcile field (or attribute) names between
different data sources. One source might label a job field as "position"
and another as "title". It is possible to specify that these mean the
same thing so that Aruna can subsequently index them appropriately.
Aruna Query Processor
The two major components in this part of the product
are Aruna Query and Aruna View. Additionally, there is an ODBC interface
for submitting SQL queries, enabling continued use of a customer’s
favourite query tools.
Aruna Query is the name for the natural language
interface – and it should be stressed that this does not require careful
phrasing or other contortions of the English language to get accurate
results. This is useful for ad-hoc queries and exploration, and popular
questions can be saved for reuse or scheduled to run at pre-defined
periods. The results from these queries can generate automatic reports
sent by e-mail. Aruna Query supports the definition of synonyms and can
even detect ambiguity in questions.
Aruna View provides a mechanism for
business users to build simple Web browser applications where repeatable
access is needed to common views of data. Business users can also
specify the content of the Aruna View screens. At the simplest level,
this involves selecting a table or row for display. Screens can be
linked, or ‘transitioned’ allowing for navigation between forms. For
example, a user can view a customer’s record and then transition to a
view of the associated invoice records for that customer.
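The customer-to-invoice transition can be pictured as two linked lookups. The Python sketch below uses the standard `sqlite3` module purely as a stand-in data store; the table layout, column names, and function names are hypothetical and are not taken from Aruna View.

```python
import sqlite3

# An in-memory SQLite database stands in for the query store (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme Ltd'), (2, 'Bright plc');
    INSERT INTO invoices VALUES (10, 1, 250.0), (11, 1, 75.5), (12, 2, 40.0);
""")


def customer_screen(conn, customer_id):
    """First screen: a single customer record."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()


def invoice_screen(conn, customer_id):
    """Linked screen: the invoices belonging to the customer being viewed."""
    return conn.execute(
        "SELECT id, amount FROM invoices WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()


customer = customer_screen(conn, 1)
print(customer)
print(invoice_screen(conn, customer[0]))
```

The key carried from the first screen to the second (here the customer id) is what makes the transition work; a nested display would simply show both results on one screen.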
Multiple tables can also be nested so that they are
displayed in the same screen. In this case, a table for customers may
appear at the top of the form and associated invoice tables are
displayed at the bottom of the screen. In fact, an unlimited number of
tables can be linked and displayed in this way.
Forms can also display the results of complex queries.
For example, on the customer form, a simple button may launch the SQL
query that answers the question "Who were my top ten customers in terms
of sales last month?" Aruna Companion provides business users with very
fast query and analytical capabilities without the need for IT support.
Aruna’s Natural Language Dictionary
Questions can only be put to Aruna Companion if it
understands the vocabulary being used. To support this a dictionary is
used that stores an English Lexicon and domain specific words and
phrases. For example, a financial database’s vocabulary would include
interest, collateral, credit, rating, borrow, default, principal,
overdrawn, current, and deposit.
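The vocabulary requirement can be sketched as a simple check that every word in a question appears in either the general lexicon or the domain vocabulary. This Python fragment is an assumption-laden illustration, not Aruna's dictionary mechanism: the tiny general lexicon, the function name, and the sample questions are invented, and only the financial terms come from the example above.

```python
import re

# A deliberately tiny stand-in for an English lexicon (hypothetical).
GENERAL_LEXICON = {
    "which", "customers", "are", "have", "a", "an", "the",
    "with", "low", "high", "and", "show", "me",
}

# Domain-specific terms taken from the financial example in the text.
FINANCE_TERMS = {
    "interest", "collateral", "credit", "rating", "borrow",
    "default", "principal", "overdrawn", "current", "deposit",
}


def unknown_words(question, *vocabularies):
    """Return the words in a question that none of the vocabularies contain."""
    known = set().union(*vocabularies)
    words = re.findall(r"[a-z']+", question.lower())
    return [word for word in words if word not in known]


print(unknown_words("Which customers are overdrawn?", GENERAL_LEXICON, FINANCE_TERMS))
print(unknown_words(
    "Show me customers with a low credit rating and high churn",
    GENERAL_LEXICON, FINANCE_TERMS,
))
```

The first question uses only known words, so nothing is reported. The second contains "churn", which neither vocabulary holds, so it would need to be added to the domain dictionary before the question could be answered.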
User-specific dictionaries are also supported, each
tailored to a specific query style. Customisation is facilitated through
synonyms and frequently used phrases (for example, "Monday sales report"
= last week’s sales for the London area, or "risky" = customers with low
balances and high monthly spend). Users can also add verbs to define a
relationship. For example,
if the verb "buy" is defined as the relationship between customers and
orders, then whenever "buy" appears in a query, or any of its synonyms
such as ‘bought’ or ‘buys’, it creates a join between the customer and
order tables.
PERFORMANCE
The radically different technology employed by Aruna
Companion delivers some surprising and pleasing characteristics. With
traditional database technologies performance tends to degrade
exponentially with the volume of data. This is not the case with
Companion. Performance degrades only logarithmically, to the point where
further increases in volume make no practical difference to the time
taken to locate database matches. Exponential
degradation is also experienced with traditional database technology as
join complexity increases. Aruna Companion demonstrates a linear
degradation that is easily addressed through linear addition of
resources (memory, disk, processor etc.).
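To give a feel for what logarithmic degradation means in practice, the following Python sketch counts the probes a binary search needs as a sorted collection grows. This is a general illustration of logarithmic growth under the assumption of a sorted, directly addressable index; it is not a description of Companion's internal lookup mechanism, and the function names are invented for the example.

```python
def lookup_steps(sorted_values, target):
    """Binary search over a sorted sequence; returns (found, probes used)."""
    low, high, steps = 0, len(sorted_values) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return True, steps
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return False, steps


def probes_for(size):
    """Probes needed to find the last value in a sorted run of `size` integers."""
    # range() is already sorted and supports indexing without using memory.
    found, steps = lookup_steps(range(size), size - 1)
    return steps


for size in (1_000, 1_000_000):
    print(size, probes_for(size))
```

A thousand values need at most 10 probes and a million need at most 20, so a thousand-fold increase in volume no more than doubles the lookup work.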
RETURN ON INVESTMENT
The investment profile of Aruna Companion is very
efficient. Training is kept to a minimum, as is the need for hardware
resources to support it. Licensing is based on volumes of data and as
such is efficient for the user (capability is only paid for when it is
needed).
Experience points to very rapid ROI – the outlay is
small and the benefits are very quickly realised. Users can be gaining
benefit from Aruna Companion within the first month of its installation.
CONCLUSION
Here is a technology that offers a platform for
delivering intelligence into the Enterprise. It provides access to
corporate information resources in a way that does not compromise the
operation of the core IT capability, at the same time providing an
easy-to-use interface to business managers hungry for information. The
risks associated with an investment in Aruna Companion are as good as
negligible – only gross negligence on the part of the host organisation
could produce undesirable outcomes.
We feel that Aruna is addressing a problem close to the
hearts of business managers in all large organisations, and it addresses
the issue of fast access to large data volumes in a unique way. Aruna’s
proof of value proposition is a low risk way for organisations to prove
the returns enabled by the technology in their own organisation – there
seem to be few reasons not to investigate the technology, and many
reasons why it should be considered.
MARTIN BUTLER
Martin Butler is Founder and President of Butler Group.
He is well known throughout the world as one of the most incisive
commentators on the business use of information technology, and has
earned a reputation for direct, unambiguous analysis. Martin has
authored numerous reports on topics ranging from e-commerce to the
competitive use of information, and his Master Class series of seminars
has now been running for over five years.
As CW360, the online IT magazine, comments: "Martin
Butler is one of the world’s foremost thought leaders on technology and
strategy." This is borne out by the thousands of business and technology
managers that attend his seminars every year, and by the tens of
thousands of professionals that subscribe to Martin’s monthly comment
piece TECHwatch.
Martin Butler Founder and President
Butler Group
Martin has over twenty years’ experience in the IT
industry. After gaining a first class honours degree in Theoretical
Physics, he worked in computing in the defence industry before going on
to work for major systems houses and large corporations in banking and
pharmaceuticals. He founded Butler Group in 1991. Today, to quote
CW360 once more: "...Butler Group, one of Europe’s leading analyst
groups."
© 1996-2005 Butler Direct Ltd.