Thursday, December 24, 2009

The EAV Model of Data Representation

Definition

EAV = Entity-Attribute-Value. EAV/CR = EAV with Classes and Relationships.

Conceptually, a table with three columns:

  • Entity/Object ID
  • Attribute/Parameter
  • Value for the attribute

The table has one row for each Entity-Attribute-Value triple.

In practice, we prefer to segregate values by data type, both to support indexing and to let the database perform type validation where possible. So there are separate EAV tables for strings, real numbers, integers, dates, long text and binary large objects (BLOBs).
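
As a rough illustration of the typed tables described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (eav_string, eav_integer, and so on), the helper functions, and the sample attributes are all assumptions made for this example. It also assumes one value per entity and attribute, and it leaves out the long-text and BLOB tables for brevity.

```python
import sqlite3
from datetime import date

# Hypothetical table names: one EAV table per value data type.
# Each entry maps table name -> (declared column type, SQLite storage class).
TYPED_TABLES = {
    "eav_string": ("TEXT", "text"),
    "eav_integer": ("INTEGER", "integer"),
    "eav_real": ("REAL", "real"),
    "eav_date": ("TEXT", "text"),  # dates kept as ISO-8601 strings
}


def create_eav_schema(conn):
    """Create one (entity_id, attribute, value) table per data type."""
    for table, (sql_type, storage_class) in TYPED_TABLES.items():
        conn.execute(
            f"CREATE TABLE {table} ("
            "entity_id INTEGER NOT NULL, "
            "attribute TEXT NOT NULL, "
            # The CHECK lets the database reject values of the wrong type.
            f"value {sql_type} NOT NULL "
            f"CHECK (typeof(value) = '{storage_class}'), "
            "PRIMARY KEY (entity_id, attribute))"
        )
        # An index on (attribute, value) supports lookups by attribute value.
        conn.execute(f"CREATE INDEX idx_{table} ON {table} (attribute, value)")


def table_for(value):
    """Pick the typed table that matches a Python value."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return "eav_integer"
    if isinstance(value, float):
        return "eav_real"
    if isinstance(value, date):
        return "eav_date"
    if isinstance(value, str):
        return "eav_string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def put(conn, entity_id, attribute, value):
    """Store one Entity-Attribute-Value triple in the matching table."""
    table = table_for(value)
    stored = value.isoformat() if isinstance(value, date) else value
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?)", (entity_id, attribute, stored)
    )


conn = sqlite3.connect(":memory:")
create_eav_schema(conn)
put(conn, 1, "name", "Jane Doe")
put(conn, 1, "age", 42)
put(conn, 1, "temperature_c", 37.2)
put(conn, 1, "admitted", date(2009, 12, 24))
```

With this layout, inserting a non-numeric string directly into eav_integer fails the CHECK constraint, which is one way to get the type validation mentioned above.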


Benefits

  • Flexibility. There are no arbitrary limits on the number of attributes per entity. The number of parameters can grow as the database evolves, without schema redesign. (Important in the EPRS)
  • Space-efficient storage for highly sparse data: One need not reserve space for attributes whose values are null.
  • A simple physical data format with partially self-describing data. Maps naturally to interchange formats like XML (the attribute name is replaced with start-attribute and end-attribute tags.)
  • For databases describing rapidly evolving scientific domains, insulation against the consequences of change, and potential domain independence.
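
To make the XML mapping and the sparse-storage point concrete, here is a small sketch using Python's standard xml.etree.ElementTree. The element names (entities, entity) and the sample attributes are hypothetical, and the sketch assumes every attribute name is also a valid XML tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: entity 2 has no systolic_bp, so no row exists
# for it. Nothing is stored for the missing (null) attribute.
TRIPLES = [
    (1, "systolic_bp", "120"),
    (1, "heart_rate", "72"),
    (2, "heart_rate", "80"),
]


def triples_to_xml(triples):
    """Wrap each value in start/end tags named after its attribute."""
    root = ET.Element("entities")
    nodes = {}
    for entity_id, attribute, value in triples:
        node = nodes.get(entity_id)
        if node is None:
            node = ET.SubElement(root, "entity", id=str(entity_id))
            nodes[entity_id] = node
        # Assumes the attribute name is usable as an XML element name.
        ET.SubElement(node, attribute).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = triples_to_xml(TRIPLES)
```

Only the attributes that actually have values appear in the output, mirroring the way the EAV table holds no rows for nulls.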

Physical vs. Logical Schema

  • EAV is primarily a means of simplifying the physical schema of a database.
  • Users of the system (as well as analytical programs) expect the data to be conventionally structured. The logical schema of a database (which is domain-specific) reflects the users' perception of the data.
  • In an EAV database, the logical schema differs greatly from the physical schema. In a conventional database, the two do not differ appreciably.
  • The user interface of a good EAV system conforms to the logical schema as much as possible, creating the illusion of conventional data organization.
  • An EAV system must record the logical schema through metadata.
  • If sufficiently rich, metadata can also be used actively (i.e., during actual system operation), instead of only describing the system passively.
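
One way to picture metadata being used actively: the sketch below keeps a tiny, hypothetical logical schema in a dictionary and uses it both to validate incoming triples and to rebuild conventional-looking rows from them. The class name "visit", its attributes, and the pivot function are assumptions for illustration only.

```python
# Hypothetical metadata: the logical schema for one class, with the
# expected Python type of each attribute.
METADATA = {
    "visit": {"systolic_bp": int, "heart_rate": int, "note": str},
}


def pivot(triples, class_name):
    """Rebuild conventional rows (one dict per entity) from EAV triples."""
    schema = METADATA[class_name]
    rows = {}
    for entity_id, attribute, value in triples:
        expected = schema.get(attribute)
        if expected is None:
            raise KeyError(f"{attribute!r} is not defined for {class_name!r}")
        if not isinstance(value, expected):
            raise TypeError(f"{attribute!r} expects {expected.__name__}")
        # Each row starts with every logical column set to None (null),
        # so sparse entities still present the full set of columns.
        row = rows.setdefault(entity_id, dict.fromkeys(schema))
        row[attribute] = value
    return [{"entity_id": eid, **rows[eid]} for eid in sorted(rows)]


triples = [
    (2, "heart_rate", 80),
    (1, "systolic_bp", 120),
    (1, "note", "stable"),
]
table = pivot(triples, "visit")
```

The caller sees one row per entity with a fixed set of columns, which is the illusion of conventional organization that the user interface is supposed to maintain.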

EAV/CR Overview

EAV/CR overlays an object-oriented framework on top of an EAV physical structure.

  • A "class" and an "object" in EAV/CR are similar to their OOP counterparts.
  • EAV/CR allows modeling of inter-class relationships.
  • EAV/CR allows classes to contain other classes as members. For this purpose, EAV/CR supports class instances (Object IDs) as values.
  • EAV/CR permits inheritance of properties (attributes) between classes.
  • EAV/CR allows representation of non-first-normal-form (NF2) data.
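
The points above can be sketched in a few lines of plain Python. Everything here is hypothetical: the class names, the attributes, and the in-memory dictionaries standing in for metadata tables. The sketch shows inherited attributes and an attribute whose value is another object's ID, and it assumes object references never form a cycle.

```python
# Hypothetical class metadata: each class names its parent and the
# attributes it defines itself.
CLASSES = {
    "Molecule": {"parent": None, "attributes": ["name"]},
    "Protein": {"parent": "Molecule", "attributes": ["sequence"]},
    "Receptor": {"parent": "Protein", "attributes": ["ligand"]},
}

# Object ID -> class name.
OBJECTS = {1: "Receptor", 2: "Molecule"}

# Attributes whose values are object IDs rather than simple values.
OBJECT_VALUED = {"ligand"}

# EAV triples; the "ligand" value refers to object 2.
VALUES = [
    (1, "name", "Example receptor"),
    (1, "ligand", 2),
    (2, "name", "Example ligand"),
]


def attributes_of(class_name):
    """Own attributes plus inherited ones, listed ancestors first."""
    result = []
    while class_name is not None:
        cls = CLASSES[class_name]
        result = cls["attributes"] + result
        class_name = cls["parent"]
    return result


def load(object_id):
    """Assemble one object, following object-valued attributes."""
    class_name = OBJECTS[object_id]
    allowed = attributes_of(class_name)
    record = {"id": object_id, "class": class_name}
    for entity_id, attribute, value in VALUES:
        if entity_id != object_id:
            continue
        if attribute not in allowed:
            raise KeyError(f"{attribute!r} is not valid for {class_name!r}")
        # Assumes references are acyclic, so the recursion terminates.
        record[attribute] = load(value) if attribute in OBJECT_VALUED else value
    return record


receptor = load(1)
```

Because the loaded record nests one object inside another, it is not in first normal form, which is the kind of NF2 data the last bullet refers to.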

Relationship Details as Inverted-File Indexes

  • Knowing which objects are involved in a relationship does not describe the relationship itself, because the objects can interact in various ways.
  • Therefore the description of a fact in a relationship is sometimes best served by narrative text.
  • The objects allow the fact to be indexed. (This way, facts related to a particular object can be rapidly retrieved.) The set of objects that index particular facts serves the same role as the inverted files used in text information retrieval, with the difference that the objects are strongly typed because they belong to specific classes.
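
A minimal sketch of the idea, assuming facts are stored as narrative text plus the list of objects they involve. The fact IDs, class names, and texts are made up for illustration. Each index key is a (class name, object ID) pair, which is a simple way to reflect that the indexing objects are typed.

```python
from collections import defaultdict

# Hypothetical facts: fact ID -> (narrative text, objects involved).
FACTS = {
    101: ("Narrative describing how the receptor binds the ligand.",
          [("Receptor", 1), ("Molecule", 2)]),
    102: ("Narrative about the ligand on its own.",
          [("Molecule", 2)]),
    103: ("Narrative about the receptor in a given tissue.",
          [("Receptor", 1), ("Tissue", 3)]),
}


def build_index(facts):
    """Map each typed object to the set of fact IDs that mention it."""
    index = defaultdict(set)
    for fact_id, (_text, objects) in facts.items():
        for typed_object in objects:
            index[typed_object].add(fact_id)
    return index


def facts_about(index, *typed_objects):
    """Return the IDs of facts that involve every given object."""
    if not typed_objects:
        return []
    postings = [index.get(obj, set()) for obj in typed_objects]
    return sorted(set.intersection(*postings))


index = build_index(FACTS)
receptor_facts = facts_about(index, ("Receptor", 1))
shared_facts = facts_about(index, ("Receptor", 1), ("Molecule", 2))
```

Intersecting the sets of fact IDs is the same operation a text retrieval system performs on posting lists for a multi-word query.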

Drawbacks of EAV/CR

  • Considerable up-front programming (wheel reinvention) is needed to do the tasks that a conventional architecture would do automatically. However, such programming needs to be done only once, and availability of generic EAV tools could remove this limitation.
  • EAV design is less efficient than a conventional structure for retrieving data in bulk on numerous objects at a time. (For object-at-a-time retrieval, such as through a Web-based browsing interface, the volume of data is small enough that the difference is not noticeable.)
  • Complex attribute-centric queries are both significantly less efficient and technically more difficult to write. They call for a query generator. However, most queries on scientific databases are relatively straightforward, and directed toward specific objects of interest.
  • For schemas that are relatively static and/or simple (e.g., databases for business applications, such as inventory or accounting), the overhead of EAV design exceeds its advantages. So don't blindly represent all data in EAV form: use conventional tables where the objects are numerous and non-sparse.
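
To show why attribute-centric queries get awkward, here is a sketch of a very small query generator, again using sqlite3. It assumes a single hypothetical eav_integer table and only handles criteria combined with AND. Each extra criterion adds another self-join, which is where the cost and the complexity come from; in a conventional table the same question would be a single WHERE clause over two columns.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def build_query(criteria):
    """Build SQL for entities matching ALL (attribute, operator, value)
    criteria. Each criterion needs its own alias of the EAV table."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    joins, conditions, params = [], [], []
    for i, (attribute, operator, value) in enumerate(criteria):
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        alias = f"c{i}"
        if i > 0:
            joins.append(
                f"JOIN eav_integer AS {alias} "
                f"ON {alias}.entity_id = c0.entity_id"
            )
        conditions.append(f"{alias}.attribute = ? AND {alias}.value {operator} ?")
        params.extend([attribute, value])
    sql = " ".join(
        ["SELECT c0.entity_id FROM eav_integer AS c0", *joins,
         "WHERE", " AND ".join(conditions), "ORDER BY c0.entity_id"]
    )
    return sql, params


# Hypothetical sample data in a single integer-valued EAV table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eav_integer (entity_id INTEGER, attribute TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO eav_integer VALUES (?, ?, ?)",
    [
        (1, "systolic_bp", 150), (1, "heart_rate", 110),
        (2, "systolic_bp", 155), (2, "heart_rate", 70),
        (3, "heart_rate", 120),
    ],
)

sql, params = build_query([("systolic_bp", ">", 140), ("heart_rate", ">", 100)])
matches = [row[0] for row in conn.execute(sql, params)]
```

Only entity 1 satisfies both criteria here; entity 3 has no systolic_bp row at all, so the join drops it.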
