Accumulo

Apache Accumulo is a software project that developed a sorted, distributed key/value store based on the design of Google’s Bigtable. It is powered by Apache Hadoop, Apache ZooKeeper, and Apache Thrift.

Unlike some other new distributed databases, Accumulo was developed with a focus on building analytical platforms rather than simply serving as the scalable persistence layer for data generated by a web application. The flexibility of the data model and the support for building indexes make it easier to analyze data from a variety of sources. Accumulo also introduces fine-grained access control, so organizations can confidently protect data of varying sensitivity levels in the same physical cluster. A feature called locality groups makes it possible to group sets of columns together on disk, which gives analytical applications a performance advantage when they read only a subset of the columns. Unlike in some other distributed databases, the names of columns don’t have to be declared beforehand, there is no penalty for a large number of different column names, and columns can be mapped to locality groups in any way desired. We discuss locality groups in depth in Column Families.

At the most basic level, Accumulo stores key-value pairs on disk (Figure 1-6), keeping the keys sorted at all times. This allows a user to look up the value of a particular key or range of keys very quickly. Values are stored as byte arrays, and Accumulo doesn’t restrict the type or size of the values stored. The default constraint on the maximum size of the key is 1 MB.
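To make the value of sorted keys concrete, here is a minimal sketch that uses a plain java.util.TreeMap as a stand-in for a table. It is not Accumulo code: the class name SortedStoreSketch and its methods are hypothetical, and a real Accumulo key has more parts than a single string. The point is only that keeping keys sorted makes both single-key lookups and range lookups cheap.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy illustration only: a sorted in-memory map standing in for a table.
// SortedStoreSketch is a hypothetical name, not part of the Accumulo API.
public class SortedStoreSketch {
    // Keys are kept sorted at all times; values are opaque byte arrays.
    private final NavigableMap<String, byte[]> entries = new TreeMap<>();

    public void put(String key, String value) {
        entries.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    // Look up the value of one particular key (null if absent).
    public byte[] get(String key) {
        return entries.get(key);
    }

    // Look up a range of keys: start inclusive, end exclusive, in sorted order.
    public SortedMap<String, byte[]> range(String start, String end) {
        return entries.subMap(start, true, end, false);
    }

    public static void main(String[] args) {
        SortedStoreSketch store = new SortedStoreSketch();
        // Insertion order does not matter; the map keeps keys sorted.
        store.put("row3", "c");
        store.put("row1", "a");
        store.put("user9", "z");
        store.put("row2", "b");
        for (Map.Entry<String, byte[]> e : store.range("row1", "row3").entrySet()) {
            String value = new String(e.getValue(), StandardCharsets.UTF_8);
            System.out.println(e.getKey() + " -> " + value);
        }
    }
}
```

Running main prints the entries for row1 and row2 in key order, regardless of the order in which they were inserted.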

When stored in Accumulo, key-value pairs are grouped into tables. You can apply some settings at the table level to control the behavior and management of the data. The key-value pairs within tables are partitioned into tablets and distributed automatically across multiple machines in a cluster. Each table begins life as a single tablet, spanning all possible keys. Once data is written to a table and the tablet reaches a certain size threshold, the tablet server hosting it finds a good point in the middle of the tablet and splits it into two tablets.

Writing to Multiple Tables

Applications often need to write data to multiple tables. One simple approach is to create multiple BatchWriters, each writing to a different table.
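The tablet-splitting behavior described earlier in this section can be sketched in a few lines of plain Java. This is a toy model, not how Accumulo is implemented: TabletSplitSketch is a hypothetical class, the threshold counts entries rather than bytes, and the split point is simply the median key. It does show the essential idea of one tablet spanning all keys that is split in the middle as it grows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of tablet splitting; names and thresholds are hypothetical.
public class TabletSplitSketch {
    private final int maxEntriesPerTablet;
    // Tablets in key order; tablet i holds keys <= splitPoints.get(i),
    // and the last tablet holds every key above the last split point.
    private final List<TreeMap<String, String>> tablets = new ArrayList<>();
    private final List<String> splitPoints = new ArrayList<>();

    public TabletSplitSketch(int maxEntriesPerTablet) {
        this.maxEntriesPerTablet = maxEntriesPerTablet;
        tablets.add(new TreeMap<>()); // one tablet spanning all possible keys
    }

    public void put(String key, String value) {
        int i = findTablet(key);
        TreeMap<String, String> tablet = tablets.get(i);
        tablet.put(key, value);
        if (tablet.size() > maxEntriesPerTablet) {
            split(i);
        }
    }

    public String get(String key) {
        return tablets.get(findTablet(key)).get(key);
    }

    public int tabletCount() {
        return tablets.size();
    }

    public List<String> splitPoints() {
        return new ArrayList<>(splitPoints);
    }

    // Linear search for the first tablet whose split point covers the key.
    private int findTablet(String key) {
        for (int i = 0; i < splitPoints.size(); i++) {
            if (key.compareTo(splitPoints.get(i)) <= 0) {
                return i;
            }
        }
        return splitPoints.size();
    }

    // Split tablet i at its median key into a lower and an upper tablet.
    private void split(int i) {
        TreeMap<String, String> tablet = tablets.get(i);
        List<String> keys = new ArrayList<>(tablet.keySet());
        String splitKey = keys.get((keys.size() - 1) / 2);
        TreeMap<String, String> lower = new TreeMap<>(tablet.headMap(splitKey, true));
        TreeMap<String, String> upper = new TreeMap<>(tablet.tailMap(splitKey, false));
        tablets.set(i, lower);
        tablets.add(i + 1, upper);
        splitPoints.add(i, splitKey);
    }

    public static void main(String[] args) {
        TabletSplitSketch table = new TabletSplitSketch(2);
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(k, k.toUpperCase());
        }
        System.out.println("tablets: " + table.tabletCount());
        System.out.println("split points: " + table.splitPoints());
    }
}
```

In a real cluster the resulting tablets would be assigned to different tablet servers; the sketch keeps them in one list to stay self-contained.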


Lookups and Scanning

Reading data from Accumulo is accomplished with a Scanner. To create a Scanner, simply specify the table over which to scan and provide an Authorizations object representing the authorizations of the user.

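import java.util.Map.Entry;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

Scanner scanner = connector.createScanner("table_name", new Authorizations());
for (Entry<Key,Value> entry : scanner) {
    Key k = entry.getKey();
    Value v = entry.getValue();
    // process the key and value here
}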

 

Testing

Applications can be tested in several ways other than with a fully distributed Accumulo instance. These include the MockAccumulo and the MiniAccumuloCluster classes. MockAccumulo is an in-memory instance that can be used to test applications without setting up an Accumulo instance.

A new MockInstance is obtained as follows:

Instance instance = new MockInstance();

 

Common iterators in the tablet server for analytics and management of key/value pairs:

-Aggregations
-Partitioned Joins
-File Reads
-Block Caching
-Merging
-Deletion
-Isolation
-Locality Groups
-Range Selection
-Column Selection
-Cell level security
-Versioning
-Filtering
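Two of the items above, versioning and filtering, can be illustrated as stacked passes over a sorted stream of cells. The sketch below uses only java.util and is not the Accumulo iterator API: the Cell class, the Lookahead helper, and the versioning and filtering functions are all hypothetical names, and the input is assumed to be sorted by key with the newest timestamp first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Toy iterator stack; all names here are hypothetical, not Accumulo classes.
public class IteratorStackSketch {
    static final class Cell {
        final String key;
        final long timestamp;
        final String value;

        Cell(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Small helper: subclasses return the next cell to emit, or null when done.
    abstract static class Lookahead implements Iterator<Cell> {
        private Cell pending;

        protected abstract Cell computeNext();

        @Override
        public boolean hasNext() {
            if (pending == null) {
                pending = computeNext();
            }
            return pending != null;
        }

        @Override
        public Cell next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Cell out = pending;
            pending = null;
            return out;
        }
    }

    // Versioning: keep at most maxVersions cells per key (the newest ones,
    // given that the source is sorted by key and then newest first).
    static Iterator<Cell> versioning(Iterator<Cell> source, int maxVersions) {
        return new Lookahead() {
            private String currentKey;
            private int emitted;

            @Override
            protected Cell computeNext() {
                while (source.hasNext()) {
                    Cell c = source.next();
                    if (!c.key.equals(currentKey)) {
                        currentKey = c.key;
                        emitted = 0;
                    }
                    if (emitted < maxVersions) {
                        emitted++;
                        return c;
                    }
                }
                return null;
            }
        };
    }

    // Filtering: pass through only the cells accepted by the predicate.
    static Iterator<Cell> filtering(Iterator<Cell> source, Predicate<Cell> keep) {
        return new Lookahead() {
            @Override
            protected Cell computeNext() {
                while (source.hasNext()) {
                    Cell c = source.next();
                    if (keep.test(c)) {
                        return c;
                    }
                }
                return null;
            }
        };
    }

    // Drain an iterator into readable "key@timestamp=value" strings.
    static List<String> collect(Iterator<Cell> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            Cell c = it.next();
            out.add(c.key + "@" + c.timestamp + "=" + c.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("a", 3, "x3"), new Cell("a", 2, "x2"),
            new Cell("a", 1, "x1"), new Cell("b", 5, "y5"));
        // Stack the passes: versioning first, then a timestamp filter on top.
        Iterator<Cell> stack =
            filtering(versioning(cells.iterator(), 1), c -> c.timestamp >= 4);
        System.out.println(collect(stack));
    }
}
```

Because each pass wraps the one below it and pulls cells on demand, the data is processed in a single streaming read, which is the general reason this kind of work can run on the server next to the data.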

Throughput:

Ingest: 500K records/s/node
Scan: 1M records/s/node

Notes: Sqrrl Enterprise is a proprietary tool that runs on Accumulo and provides automatic indexing, Lucene integration, security extensions, and custom iterators.
