Butler Group White Paper

Aruna Companion
Aruna

Author: Martin Butler

SUMMARY

Aruna Companion is a very timely solution to a wide range of business problems. As data sources proliferate and data volumes grow exponentially, it becomes more difficult for business managers to get timely access to information. Aruna has addressed this problem through the creation of an innovative information delivery platform based on its own unique database technology that complements existing technologies and builds on their strengths. Its technology platform addresses three pressing issues:

  1. Business managers are prohibited from asking many questions simply because existing database technology cannot process the resulting queries. Real database queries are usually complex, generating joins that typically take many hours and sometimes days to execute – even on the most powerful hardware. The techniques developed by Aruna mean that complex queries will typically execute within a few minutes and maybe even seconds – even on modest hardware configurations.

  2. Proliferation of data sources means that an integrated view of data is often impossible to achieve. Aruna has addressed this problem also, supporting many data formats and databases and facilitating consolidation within a single Query Data Store. Subsequent queries can be applied to individual tables, or to the whole unified data store – a unique capability in my experience.

  3. Business users want an easy-to-use query interface that exploits their knowledge of the data. Aruna has incorporated a natural language query interface that supports everyday language for query composition. The resulting SQL is often very complex, but is hidden from the user. This makes the technology available to anyone who has authority to use it.

It is very important to appreciate the small footprint of Aruna’s technology. There is little need for users to train in its use, and even at a technical level there is almost nothing to be done other than product installation which is designed to be something the business users can do for themselves. Tuning and configuration are virtually non-existent – and very deliberately so. Add to this the modest resources needed to execute complex queries on very large data sets, and we have a technology that offers significant capability with minimum overhead.

The uses for this unique technology are growing fast. Wherever there is a need to query large data sets on a frequent basis Aruna Companion is being seen as a cost effective, powerful solution. Financial services and telecoms companies are typically finding uses for the technology, and there are already some impressive success stories.

Unlike many IT developments, Aruna Companion is non-disruptive in that it works with existing database technologies. It is not intended to be a replacement for traditional relational databases, but a complement that utilises the relational view of data while offering orders of magnitude performance improvements. Aruna Companion, as its name implies, sits next to them on its own server and simply augments their query capabilities while reducing the burden on the operational environment. Aruna has not created a transactional database system – Companion has been created for rapid execution of queries, but does not support online update (although Aruna’s Query Data Store can be incrementally updated on a frequent basis if needed). It is best thought of as a highly effective data warehouse, but with a much smaller technology and cost footprint.

Perhaps most attractive of all is the "proof of value" proposition offered by Aruna. To overcome scepticism and risk Aruna will create a Query Data Store from a company’s existing data and demonstrate the capability that is being offered. There is no excuse for ignoring this technology if complex querying of large data sets has proved to be problematical in the past. Aruna Companion is a low risk, high payback database technology that most large corporations should have as part of their technology arsenal.

THE NEED FOR FASTER DATABASE TECHNOLOGY

Increases in speed usually manifest as quantum jumps, and a new generation of technologies, particularly corporate portals and business intelligence tools, demand a quantum increase in the speed of information access and delivery. End users are being presented with direct access to the information assets of the enterprise, and they are intolerant of delays. Even worse, there is some information that simply cannot be delivered because of its inaccessibility, its complexity, or a lack of resources to process it in a timely fashion.

Critical factors affecting business performance are increasingly linked to the speed and availability of information. Responsiveness to customers and market conditions is wholly dependent on timely access to the relevant information. The ability to exercise appropriate levels of agility depends on the speed of management decision making, which is again dependent on speed of access to information. There is hardly a single aspect of corporate performance that is not adversely affected by inadequate information delivery.

Deriving benefits from information is one side of the equation. The other is minimising the cost of information delivery. Most things are possible with IT – but at a cost. Tighter economic conditions mean the cost of information has to be reduced as far as possible. Expensive, large-scale projects with implementation timescales measured in years are generally not acceptable. There is a need for technologies that provide rapid Return On Investment (ROI), are friendly to the end user, and deliver benefits that are otherwise unavailable. Butler Group divides the cost of exploiting information into three main areas of activity:

  • Origination is all costs associated with resources used to derive information requested by the user. This includes database technologies, data warehousing, and data mining technologies.

  • Delivery is the next element of cost, and this is all resources used to deliver information to the user (portal technology for example).

  • Execution is the cost incurred in applying the information. Time is spent using information and formulating queries.

Origination, delivery, and execution are all interlinked and while delivery mechanisms have developed considerably over the last three years, origination and execution have seen little innovation. This means that we have excellent user interfaces and information delivery channels, but poor origination technology resulting in delays and lack of access. Core to origination is database technology, and while the relational database has proved to be an excellent all purpose database technology, it is hardly well suited to high-speed origination of information. There is no question of replacing this technology – most large organisations have significant investments in relational databases and they have no desire to replace them. We need a technology that can build on the strengths of the relational model, but not be limited by its run-time inefficiencies. This opportunity has been very well understood by Aruna and its technology addresses these specific needs.

You could interpret Aruna’s offering as being a delivery platform, with a mirrored cache of multiple Origination sources. This is a new concept that not only improves delivery speed and scalability but ‘buffers’ the Origination sources from any impact whatsoever during query execution. While presenting a familiar and widely recognised relational view of data, and acting as a complement to existing database technologies, Aruna provides the quantum increase in database speed that portals and business intelligence applications now require.

Of equal importance is the fact that Aruna offers a natural language interface that directly reduces the execution costs associated with information usage. Not only are the timescales required to formulate queries dramatically reduced, but so are the skill levels. This affects execution costs quite significantly, an area that is often overlooked in the use of IT.

The speed increases available from Aruna are significant enough to create new business opportunities – complex queries that might take many hours or days to execute using traditional technology will typically execute within a few minutes or even seconds. Organisations using the technology have been able to offer a new level of capability to their information workers, resulting in some significant cost savings, and the creation of opportunities that otherwise would have been infeasible.


WHAT ARUNA COMPANION IS AND WHAT IT ISN’T

Very fast query execution is one major attribute of Aruna Companion. Depending on the type of query this speed improvement over conventional relational technology can be a factor of 100 or more. The main enabling technology is sophisticated indexing called the Universal Index, effectively indexing all the values held in a database. This means that query execution isn’t crippled by the join overhead – a factor that makes many complex joins totally unfeasible with traditional technology.

Data consolidation is another major feature of the product, allowing data from diverse sources to be pulled together within a single database. This might include data held in applications such as SAP or Siebel, relational databases, file systems, spreadsheets, and almost any source of tabular data.

Real business needs dictate that this type of functionality is available – the value of a query tends to increase with the number of data sources it accesses. In this case there is only one data source – an Aruna Companion database that holds data unified from multiple sources.

The natural language interface places the formulation of queries directly into the hands of the end-user, and the IT department does not have to hold its breath in the expectation of light-dimming resource consumption. This is a real win-win situation where users get what they want without compromising the IT resources that the rest of the organisation relies upon. From the business user’s perspective it is simply an immediate resolution to many long-standing areas of pain that does not rely on IT or long winded, expensive consultancy projects.

It has to be stressed that Aruna Companion is aimed fairly and squarely at accelerating query execution. Other performance problems such as database update bottlenecks require other technology and are in no way addressed by this product.

Aruna Companion is an essential complementary technology to the relational database and data warehousing technologies used in most large organisations, so that queries involving large data sets can execute in a timely manner.

VERTICAL SOLUTIONS

Aruna Companion is well suited to several classes of business solution, all characterised by the need to explore quickly data that is complex and has been consolidated from several data sources. Industries dealing with large numbers of customers, usually consumers, tend to generate very large data volumes, and it is this data that often proves to be too much for traditional database technology to handle in a timely manner. Several verticals are briefly mentioned below – this list is not exhaustive, but gives a flavour of the applications of Aruna Companion.

Corporate Reporting

The aim of corporate reporting is to support better decision-making, demonstrate compliance with appropriate legislation, and monitor business operations. Cost effective support for this type of application is perhaps the most appropriate use of Aruna Companion. Conversational querying of very large aggregated data sets is not only possible, but easily and efficiently executed. This avoids the sub-optimal reporting that is a common feature with large data volumes, frequently caused by an inability to access all relevant data sources. The data aggregation features of Aruna Companion provide an ideal environment in which to execute this type of task.

Telecommunications

Aruna Companion has proved to be particularly effective in the telecoms industry. Cable & Wireless have quickly addressed revenue assurance problems, achieving an ROI measured in the hundreds of percent. One of the mobile telecoms operators was able to analyse Call Detail Records (CDR) for better targeting of promotions and services. More generally, Aruna Companion provides an excellent support technology for CRM applications and the analysis of the many gigabytes of data telcos store as a result of their billing and service monitoring activities.

Fraud Detection

One of the most significant problems many organisations face in trying to deal with fraud is the integration of data from many disparate sources. Only a unified view of customer data can facilitate the identification of fraud, and Aruna Companion enables this through its global indexing technology. Support for "similarity searching" and "sounds like" searching also helps detect fraudulent activity. Fraud detectives are helped by the natural language interface and the very high speeds of query execution meaning that "train of thought" querying is feasible.
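As an illustration of what "sounds like" searching involves, the sketch below uses the classic Soundex coding, in which names that are pronounced alike reduce to the same four-character code. This is an assumption made for illustration only: the paper does not say which phonetic technique Aruna Companion uses, and the function names are hypothetical.

```python
def soundex(name):
    """Return the four-character American Soundex code for a name."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                      # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for letter in letters[1:]:
        digit = codes.get(letter, "")
        if digit and digit != previous:      # skip repeats of the same code
            result += digit
        if letter not in "HW":               # H and W do not break a repeat
            previous = digit
    return (result + "000")[:4]              # pad or truncate to 4 characters


def sounds_like(query, candidates):
    """Return the candidates whose Soundex code matches that of the query."""
    target = soundex(query)
    return [name for name in candidates if soundex(name) == target]


# Hypothetical surnames drawn from several unified customer sources.
matches = sounds_like("Smith", ["Smyth", "Jones", "Smithe"])
print(matches)
```

A fraud investigator searching for "Smith" would in this way also be shown records spelled "Smyth" or "Smithe", which an exact-match query would miss.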

Data Protection

Every citizen has the right to make a Subject Access Request (SAR) to an organisation holding their details. The requestor can ask for a description of the details that are held, what they are used for and who uses them. Satisfying such a request is not trivial. Details of individuals may be held in many databases and applications and a scan of these is typically time consuming and costly. The data consolidation features of Aruna Companion are very well suited to addressing this type of problem, without impacting on the live production systems.

Mergers and Acquisitions (M&A)

Integration of systems and data is a major headache associated with M&A activity. Pre-M&A activity may require analysis and comparison of customer databases and detailed analysis of target financial records. Post-M&A requirements can balloon into an immediate need to consolidate data. While Aruna Companion cannot address the process integration problem in any significant manner, it can provide a bridge to allow a graceful merging of two or more organisations’ operational activity.

Web-Based Delivery of Information

Delivering access to large numbers of business users for data interrogation that involves complex SQL queries against large amounts of data has not previously been feasible without the support of very significant IT infrastructures. Even then, query complexities and volumes have needed to be limited. This is the classic ‘richness versus reach’ quandary. The characteristics of Aruna Companion enable a low cost technology footprint to deliver high levels of query performance even over the Web. This has enabled companies such as the 192.com directory service to offer complex queries to half a million users per day on a small number of Pentium-based servers. This opens up many Web-based service opportunities that were previously not cost effective.

TECHNICAL ARCHITECTURE

Aruna’s technology has been under development for the last six years, and the capability that is offered is not the result of some overnight tweaking of existing technologies. Several patents have been granted and are a measure of the technology barriers that had to be overcome to realise what might now be so easily taken for granted – lightning fast execution of complex queries on data that has been consolidated from diverse sources.

There are only a few components in the Aruna Companion architecture, but what they achieve is really quite significant. At the highest level Aruna Companion consists of the Aruna QDS (Query Data Store) Builder and the Aruna Query Processor. Each of these has several components and we shall look briefly at the functionality they provide.

Aruna QDS Builder

QDS Builder creates data stores from a wide variety of inputs. This includes ODBC data sources as well as delimited flat files, and it is Aruna Connector that provides the connectivity to these sources. It offers a graphical user interface and supports third-party products as well as Aruna’s own connection mechanisms. ERP systems (most notably SAP), text files, and XML files are all supported and indexed using Aruna FastPath. It is FastPath that actually performs the clever work of building indexes that will subsequently support very fast query execution.

Four types of data structure are created to enable Aruna’s speed and functionality. Metadata is stored that describes the layout of data imported from the various sources. Then data is imported in a compressed format so there is no need to go back to the sources to load records. The third data structure is a universal index of all data values that have been imported. The final data structure is a complex and patented composition of pointers between indexes, metadata, and records.

This is the real heart of Aruna Companion since it provides very rapid access to data values, and it is important to note that these are indexed at word level. This means for example, that a query can be launched to find all occurrences of George. Companion supports queries against the full data set (it obviously supports queries against a specified table also) and a query looking for George might deliver records with an address containing George (e.g. George Street) as well as records where the first or middle name of an individual is George.
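The idea of a word-level index spanning every table can be sketched in a few lines. The example below is a deliberately simplified illustration and not Aruna's patented structure: it keeps uncompressed records in memory, splits values on whitespace only, and all table names, column names, and function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(tables):
    """Map each lower-cased word to the (table, row, column) places it occurs."""
    index = defaultdict(list)
    for table_name, rows in tables.items():
        for row_number, row in enumerate(rows):
            for column, value in row.items():
                for word in str(value).lower().split():
                    index[word].append((table_name, row_number, column))
    return index


def find_word(index, word, table=None):
    """Look a word up across the whole store, or within one table only."""
    hits = index.get(word.lower(), [])
    if table is not None:
        hits = [hit for hit in hits if hit[0] == table]
    return hits


# Hypothetical data consolidated from two sources.
tables = {
    "customers": [
        {"first_name": "George", "surname": "Hall", "address": "4 Mill Lane"},
        {"first_name": "Anne", "surname": "Price", "address": "12 George Street"},
    ],
    "suppliers": [
        {"contact": "Mary George", "city": "Leeds"},
    ],
}

index = build_word_index(tables)
everywhere = find_word(index, "George")
customers_only = find_word(index, "George", table="customers")
print(everywhere)
print(customers_only)
```

Because the lookup goes through the index rather than scanning records, a search for George finds the first name, the street address, and the supplier contact in a single step, whichever table or column they came from.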

Aruna QDS Builder indexes all imported data. Clearly this can be a time consuming activity. As a rule of thumb this activity takes around an hour a gigabyte – however, the next release of the product will improve this by up to an order of magnitude. Once the data is loaded almost anything is possible, without the need for optimisation and tuning. It is hard to convey the power of the product without actually seeing it, but a query such as "where are the best Indian restaurants in Birmingham" would execute in a meaningful way assuming the database was relevant. It would also interrogate millions of records and give a result within a few seconds.

Some modifications can be made at loading. The most common is the need to reconcile field (or attribute) names between different data sources. One source might label a job field as "position" and another as "title". It is possible to specify that these mean the same thing so that Aruna can subsequently index them appropriately.

Aruna Query Processor

The two major components in this part of the product are Aruna Query and Aruna View. Additionally, there is an ODBC interface for submitting SQL queries, enabling continued use of a customer’s favourite query tools.

Aruna Query is the name for the natural language interface – and it should be stressed that this does not require careful phrasing or other contortions of the English language to get accurate results. This is useful for ad-hoc queries and exploration, and popular questions can be saved for reuse or scheduled to run at pre-defined periods. The results from these queries can generate automatic reports sent by e-mail. Aruna Query supports the definition of synonyms and can even detect ambiguity in questions.

Aruna View provides a mechanism for business users to build simple Web browser applications where repeatable access is needed to common views of data. Business users can also specify the content of the Aruna View screens. At the simplest level, this involves selecting a table or row for display. Screens can be linked, or ‘transitioned’ allowing for navigation between forms. For example, a user can view a customer’s record and then transition to a view of the associated invoice records for that customer.

Multiple tables can also be nested so that they are displayed in the same screen. In this case, a table for customers may appear at the top of the form and associated invoice tables are displayed at the bottom of the screen. In fact, an unlimited number of tables can be linked and displayed in this way.

Forms can also display the results of complex queries. For example, on the customer form, a simple button may launch the SQL query that answers the question "Who were my top ten customers in terms of sales last month?" Aruna Companion provides business users with very fast query and analytical capabilities without the need for IT support.

Aruna’s Natural Language Dictionary

Questions can only be put to Aruna Companion if it understands the vocabulary being used. To support this a dictionary is used that stores an English Lexicon and domain specific words and phrases. For example, a financial database’s vocabulary would include interest, collateral, credit, rating, borrow, default, principal, overdrawn, current, and deposit.

User-specific dictionaries are also available to accommodate a particular query style. Customisation is facilitated through synonyms and frequently used phrases (for example, "Monday sales report" = last week’s sales for the London area, or "risky" = customers with low balances and high monthly spend). Users can also add verbs to define a relationship. For example, if the verb "buy" is defined as the relationship between customers and orders, then whenever "buy" appears in a query, or any of its synonyms such as ‘bought’ or ‘buys’, it creates a join between the customer and order tables.
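A minimal sketch of how a verb entry might translate into a join is given below. It is an assumption made purely for illustration: the dictionary layout, the key column customer_id, and the function name are hypothetical and do not describe Aruna's actual dictionary format.

```python
# Hypothetical dictionary entry: a verb, its synonyms, and the join it implies.
VERB_JOINS = {
    "buy": {
        "synonyms": {"buy", "buys", "bought"},
        "tables": ("customers", "orders"),
        "on": "customers.customer_id = orders.customer_id",
    },
}


def join_clause_for(question):
    """Return a SQL FROM clause if the question contains a known verb."""
    words = {word.strip("?.,!").lower() for word in question.split()}
    for entry in VERB_JOINS.values():
        if words & entry["synonyms"]:        # any form of the verb is present
            left, right = entry["tables"]
            return f"FROM {left} JOIN {right} ON {entry['on']}"
    return None


clause = join_clause_for("Which customers bought more than ten items?")
print(clause)
```

The user never sees the join: typing "bought" is enough for the customer and order tables to be linked in the generated SQL.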

PERFORMANCE

The radically different technology employed by Aruna Companion delivers some surprising and pleasing characteristics. With traditional database technologies, performance tends to degrade exponentially with the volume of data. This is not the case with Companion: the time taken to locate database matches grows only logarithmically, to the point where further increases in volume make effectively no difference to performance. Exponential degradation is also experienced with traditional database technology as join complexity increases. Aruna Companion demonstrates a linear degradation that is easily addressed through linear addition of resources (memory, disk, processor etc.).
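To give a feel for what logarithmic growth means in practice, the sketch below counts the steps needed to locate a value in a sorted index by binary search and, for contrast, by scanning every value in turn. It is a generic textbook comparison chosen for illustration, not a description of Aruna's patented indexing or of any particular relational product.

```python
def linear_scan_steps(sorted_values, target):
    """Count the values examined when checking each one in turn."""
    steps = 0
    for value in sorted_values:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_values, target):
    """Count the probes made when halving the search range each time."""
    low, high, steps = 0, len(sorted_values) - 1, 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            break
        if sorted_values[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return steps


for size in (1_000, 1_000_000):
    values = range(size)   # already sorted; a range can be indexed like a list
    target = size - 1      # the last value is the worst case for the scan
    print(size, linear_scan_steps(values, target), binary_search_steps(values, target))
```

A thousandfold increase in volume multiplies the work of the scan a thousandfold, while the indexed lookup needs at most 20 probes even for a million values, compared with at most 10 for a thousand.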


RETURN ON INVESTMENT

The investment profile of Aruna Companion is very efficient. Training is kept to a minimum, as is the need for hardware resources to support it. Licensing is based on volumes of data and as such is efficient for the user (capability is only paid for when it is needed).

Experience points to very rapid ROI – the outlay is small and the benefits are very quickly realised. Users can be gaining benefit from Aruna Companion within the first month of its installation.

CONCLUSION

Here is a technology that offers a platform for delivering intelligence into the Enterprise. It provides access to corporate information resources in a way that does not compromise the operation of the core IT capability, at the same time providing an easy-to-use interface to business managers hungry for information. The risks associated with an investment in Aruna Companion are as good as negligible – only gross negligence on the part of the host organisation could produce undesirable outcomes.

We feel that Aruna is addressing a problem close to the hearts of business managers in all large organisations, and it addresses the issue of fast access to large data volumes in a unique way. Aruna’s proof of value proposition is a low risk way for organisations to prove the returns enabled by the technology in their own organisation – there seem to be few reasons not to investigate the technology, and many reasons why it should be considered.

MARTIN BUTLER

Martin Butler is Founder and President of Butler Group. He is well known throughout the world as one of the most incisive commentators on the business use of information technology, and has earned a reputation for direct, unambiguous analysis. Martin has authored numerous reports on topics ranging from e-commerce to the competitive use of information, and his Master Class series of seminars has now been running for over five years.

As CW360 – the online IT magazine comments – "Martin Butler is one of the world’s foremost thought leaders on technology and strategy." This is borne out by the thousands of business and technology managers that attend his seminars every year, and by the tens of thousands of professionals that subscribe to Martin’s monthly comment piece TECHwatch.


Martin has over twenty years’ experience in the IT industry. After gaining a first class honours degree in Theoretical Physics, he worked in computing in the defence industry before going on to work for major systems houses and large corporations in banking and pharmaceuticals. He founded Butler Group in 1991, and today it is, to quote CW360 once more, "...Butler Group, one of Europe’s leading analyst groups."



© 1996-2005 Butler Direct Ltd.