Exasol In-Memory Database
What is Exasol?
Exasol was established in 2000 in Nuremberg, Germany, with one ambition: to engineer the world’s fastest database for data warehousing and data analytics, with no limits on data volume.
The Exasol database is an in-memory, massively parallel, columnar database for data warehousing and complex analytical query processing. It is built on three technologies:
- In-memory Computation
- Massively Parallel Processing
- Columnar storage
A database becomes dramatically faster when it uses all three of these technologies together.
Exasol’s recommended configuration for a server instance is listed below:
- 2 Intel Xeon CPUs each with at least 4 cores
- 128 GB of RAM
- 8 to 24 SAS hard drives in RAID-1 pairs
The cluster interconnect is typically implemented via 1-Gbit to 10-Gbit Ethernet.
Exasol runs on low-cost commodity hardware with a shared-nothing architecture for high resilience, and can scale from a single node to hundreds of nodes to support thousands of concurrent users and up to 100 TB of data.
What makes it better?
In-memory Database: In-memory distributed systems are one of the more recent developments in Big Data technology. In-memory computation tools such as Apache Spark were developed to run on Hadoop-style parallel processing clusters.
An in-memory database keeps the data it needs in main memory, which reduces disk I/O and makes analytical processing much more efficient.
RAM sizing: Exasol uses compression algorithms that can compress data by a factor of about 2.5.
Example:
- Raw data volume: 2.4 TB
- Compression factor: 2.5x (about 0.96 TB after compression)
- DB RAM estimate: 10% of raw data (about 240 GB)
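The arithmetic behind this example can be written out as a small sketch. The function name and parameters are made up for illustration, and 2.4 TB is taken as 2400 GB (decimal units); this is only the rule of thumb from the list above, not an official sizing tool.

```python
def estimate_sizing(raw_gb, compression_factor, ram_percent):
    """Return the compressed data size and the RAM estimate, both in GB."""
    compressed_gb = raw_gb / compression_factor  # size on disk after compression
    ram_gb = raw_gb * ram_percent / 100          # RAM as a share of the raw volume
    return {"compressed_gb": compressed_gb, "ram_gb": ram_gb}


# Figures from the example: 2.4 TB raw, 2.5x compression, 10% RAM rule.
sizing = estimate_sizing(raw_gb=2400, compression_factor=2.5, ram_percent=10)
print(sizing)
```

For the figures above this gives roughly 960 GB of compressed data and 240 GB of database RAM across the cluster.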
Note: Exasol is an in-memory database. It keeps the table data that queries actively use in main memory, which avoids most file I/O. This results in very fast query execution.
Massively Parallel Processing: A massively parallel processing (MPP) architecture connects multiple systems so that they perform a task together, splitting the computation among different nodes that operate concurrently.
A massively parallel processing cluster uses a shared-nothing architecture. Each node or server holds the data for which it is responsible and the computing power to analyze that data.
Note: Exasol uses MPP to split query processing across a multi-node cluster, which increases processing speed.
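The shared-nothing idea can be illustrated with a toy simulation. This is a minimal sketch, not Exasol's implementation: the "nodes" are just Python lists, worker threads stand in for separate servers, and the names distribute, local_sum, and parallel_total are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count):
    """Assign each row to exactly one node by key modulo node count."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[key % node_count].append((key, amount))
    return nodes


def local_sum(partition):
    """Work done independently on one node: aggregate only its own rows."""
    return sum(amount for _, amount in partition)


def parallel_total(rows, node_count=4):
    """Run the per-node aggregation concurrently, then merge the partial results."""
    partitions = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partitions))
    return sum(partials)


# 100 hypothetical orders: (order_id, amount)
rows = [(order_id, order_id * 10) for order_id in range(1, 101)]
total = parallel_total(rows)
print(total)
```

Each partition is aggregated without looking at any other partition, and only the small partial results are combined at the end. That is the pattern an MPP database applies to a query such as a SUM over a large table.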
Columnar Storage: Columnar storage is a database storage technique that stores tables column by column rather than row by row.
Disk I/O, meaning the speed at which a database reads and writes data, is critical for any database. The higher the disk I/O speed, the better the database performs.
Examples of columnar file formats include Parquet, ORC, and RCFile.
A conventional RDBMS stores data in a row format, where each row contains the field values for a single record. In row storage, each row's data is stored in disk blocks, and the number of blocks a single row occupies depends on the data in that row.
If ROW DATA SIZE > BLOCKSIZE
The row data is split across blocks, and part of the last block remains unused. That unused space cannot be used by the next row.
If ROW DATA SIZE < BLOCKSIZE
Because the row data is smaller than the block, the block space is not fully utilized.
Columnar storage addresses this utilization problem by storing a whole column's data in a block.
This way each block holds a defined amount of data, and space is not wasted.
Solution: Columnar Storage
Column Data Size = Defined Block Size
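The block-utilization argument can be made concrete with a toy model. It follows the simplification used above (every row starts in a fresh block, while column data is packed back to back) and does not describe how any real engine lays out its pages. The function names and byte sizes are invented for illustration.

```python
def blocks_needed(size, block_size):
    """Number of whole blocks required to hold `size` bytes."""
    return (size + block_size - 1) // block_size


def row_storage_waste(row_sizes, block_size):
    """Each row starts in a fresh block, so the tail of its last block is wasted."""
    return sum(blocks_needed(size, block_size) * block_size - size for size in row_sizes)


def packed_storage_waste(sizes, block_size):
    """Data packed back to back: only the final block can be partly empty."""
    total = sum(sizes)
    return blocks_needed(total, block_size) * block_size - total


row_sizes = [150, 60, 90]   # hypothetical row sizes in bytes
block_size = 100            # hypothetical block size in bytes

row_waste = row_storage_waste(row_sizes, block_size)
packed_waste = packed_storage_waste(row_sizes, block_size)
print(row_waste, packed_waste)
```

With these numbers the row-per-block layout leaves 100 bytes unused across four blocks, while packing the same 300 bytes fills three blocks exactly.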
Key advantages of Columnar Storage:
- Each column is stored as a separate file, so a compression algorithm suited to that column's data type can be used.
- Disk I/O is used efficiently because only the columns a query actually needs are read.
- For analytical queries that touch only a few columns, reading from columnar storage is faster than reading from row storage.
Note: Columnar storage helps Exasol read and write data quickly, which is how it improves performance.
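The first two advantages can be sketched in a few lines. The table, the column names, and the use of run-length encoding are assumptions chosen for illustration; the source does not say which compression algorithms Exasol applies.

```python
from itertools import groupby

# A small hypothetical table in row format: one dict per record.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 20},
    {"id": 3, "country": "DE", "amount": 30},
    {"id": 4, "country": "FR", "amount": 40},
]


def to_columns(rows):
    """Pivot row dicts into one list per column (the columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress runs of repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Like SELECT SUM(amount): only the "amount" column is read.
total_amount = sum(columns["amount"])

# A low-cardinality column compresses well with a scheme chosen for it.
encoded_country = run_length_encode(columns["country"])
print(total_amount, encoded_country)
```

The aggregate never touches the id or country values, and the country column shrinks from four entries to two pairs. A row layout would have to read every field of every record to answer the same query.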
Advanced Exasol features:
- Virtual Schema: A virtual schema can be created in Exasol to run analyses against an external database.
- Self-tuning and query optimization: Exasol has an automatic query optimizer that optimizes queries.
- Scalability: Exasol can be scaled easily from a single node to hundreds of nodes.
- Big Data integration: Exasol can be configured to work with the Hadoop ecosystem and with Hadoop clusters.
- Easy integration with other tools via ODBC and JDBC connectors.
- Supports UDF (user-defined function) scripts in Python, Java, Lua, and R.
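As a rough idea of what a Python UDF looks like, here is a minimal sketch. It assumes the Exasol convention that a Python script defines a run(ctx) function and reads input columns as attributes of ctx. The schema, script name, and column names are hypothetical, and the SimpleNamespace object is only a local stand-in so the function can be tried outside the database.

```python
from types import SimpleNamespace

# Inside Exasol, this function body would sit in a script definition such as:
#   CREATE OR REPLACE PYTHON3 SCALAR SCRIPT demo.net_price(price DOUBLE, discount_pct DOUBLE)
#   RETURNS DOUBLE AS
#   ...
#   /
# "demo", "net_price", "price", and "discount_pct" are hypothetical names.


def run(ctx):
    """Called once per input row; returns the price after the discount."""
    if ctx.price is None or ctx.discount_pct is None:
        return None  # pass SQL NULL through unchanged
    return ctx.price * (100 - ctx.discount_pct) / 100


# Local stand-in for the context object (not part of Exasol).
fake_row = SimpleNamespace(price=200.0, discount_pct=25.0)
net = run(fake_row)
print(net)
```

Once such a script is created, it would be called from SQL like any other scalar function, with the database running it in parallel across the cluster nodes.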
Conclusion: Exasol delivers high-performance analytics on a distributed cluster that is easy to use, highly scalable, and cost-efficient. It has reliable support and integrates easily with other BI tools, which makes the ETL process more comfortable.
If your organization is looking for an in-memory database solution, Exasol is a strong candidate. Several comparable databases are listed below.
- Redshift (from Amazon)
- SAP HANA (from SAP)
- VoltDB
- MemSQL
- solidDB (from IBM)