
Overview

A document-oriented database is a database designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term “document-oriented database” has grown with the use of the term NoSQL itself. XML databases are a subclass of document-oriented databases that are optimized to work with XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal.

Document-oriented databases are inherently a subclass of the key-value store, another NoSQL database concept. The difference lies in the way the data is processed; in a key-value store, the data is considered to be inherently opaque to the database, whereas a document-oriented system relies on internal structure in the document in order to extract metadata that the database engine uses for further optimization. Although the difference is often moot due to tools in the systems, conceptually the document-store is designed to offer a richer experience with modern programming techniques.
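The sketch below makes the distinction concrete. It is a minimal illustration, not modeled on any particular product. The key-value store treats each value as opaque bytes and can only look it up by key. The document store parses each JSON document, so it can build a secondary index on a field inside it. The class names and the `index_on`/`find` methods are hypothetical.

```python
import json
from collections import defaultdict


class KeyValueStore:
    """Values are opaque bytes: the store can only look them up by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key) -> bytes:
        return self._data[key]


class DocumentStore:
    """Values are JSON documents whose internal structure the store can read."""

    def __init__(self):
        self._docs = {}
        self._indexes = {}  # field -> {field value -> set of document keys}

    def index_on(self, field):
        # Build a secondary index from a field inside the documents.
        index = defaultdict(set)
        for key, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(key)
        self._indexes[field] = index

    def put(self, key, raw):
        doc = json.loads(raw)  # the engine parses the document...
        old = self._docs.get(key)
        for field, index in self._indexes.items():
            # ...and keeps every index in sync (indexed values must be hashable scalars).
            if old is not None and field in old:
                index[old[field]].discard(key)
            if field in doc:
                index[doc[field]].add(key)
        self._docs[key] = doc

    def find(self, field, value):
        # Answered from the index, without scanning every document.
        return sorted(self._indexes[field].get(value, set()))
```

A query such as "all users in London" is only possible in the document store. The key-value store would have to hand every value back to the application to be parsed there.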

Document databases contrast strongly with the traditional relational database (RDB). Relational databases generally store data in separate tables that are defined by the programmer, and a single object may be spread across several tables. Document databases store all information for a given object in a single instance in the database, and every stored object can be different from every other. This eliminates the need for object-relational mapping while loading data into the database.
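As an illustration, the same order can be stored two ways. The `orders`/`order_items` schema and all field names are hypothetical. In the relational version, the order is spread across two SQLite tables, so reading it back needs a join plus hand-written mapping code. In the document version, the order is one nested JSON instance that is read back whole.

```python
import json
import sqlite3

# Relational: one order spread across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id),
                              sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Ada');
    INSERT INTO order_items VALUES (1, 'pen', 2), (1, 'ink', 1);
""")


def load_order_relational(order_id):
    # Object-relational mapping by hand: join the rows, then rebuild the object.
    rows = conn.execute(
        "SELECT o.id, o.customer, i.sku, i.qty FROM orders o "
        "JOIN order_items i ON i.order_id = o.id "
        "WHERE o.id = ? ORDER BY i.rowid",
        (order_id,),
    ).fetchall()
    return {
        "id": rows[0][0],
        "customer": rows[0][1],
        "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
    }


# Document: the whole order is one self-contained instance.
documents = {
    1: json.dumps({"id": 1, "customer": "Ada",
                   "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]})
}


def load_order_document(order_id):
    return json.loads(documents[order_id])
```

Both functions return the same Python object. Only the relational path needs the join and the reconstruction step.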

Next, we discuss some popular document-oriented databases:

ArangoDB

(https://www.arangodb.com/)

ArangoDB is the open-source native multi-model database for graph, document, key/value and search needs.

For developers - Map data natively to the database and access it with the best patterns for the job: traversals, joins, search, ranking, geospatial, aggregations, you name it. ArangoDB is available under the Apache License, and it supports languages such as C, .NET, Java, Python, Node.js, PHP, Scala, Go, Ruby and Elixir.

For architects - Polyglot persistence without the costs. Easily design, scale and adapt your architectures to changing needs with much less effort.

For data scientists - Combine the flexibility of JSON with semantic search and graph technology for next-generation feature extraction, even on large datasets.

ArangoDB OASIS is a scalable, fully managed service for ArangoDB that can be deployed on AWS, Google Cloud or Azure. ArangoDB itself can also be run under Docker or Kubernetes.

AQL (ArangoDB’s query language) is a declarative query language that lets you access the same data through a broad range of access patterns, such as traversals, JOINs, search, geospatial queries or any combination of them. Anyone experienced with SQL should find AQL easy to start with, although it may feel more like coding.
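Below is a short AQL query and a plain-Python rendering of what it computes, clause by clause. The `users` collection and its `age`, `name` and `address.city` fields are hypothetical. The Python function only mimics the query's semantics over an in-memory list and does not talk to ArangoDB. One difference: AQL returns `null` for a missing attribute, whereas this sketch would raise `KeyError`.

```python
# The AQL text, as it might be submitted to ArangoDB (e.g. from its web UI).
AQL = """
FOR u IN users
  FILTER u.age >= 30
  SORT u.name ASC
  LIMIT 2
  RETURN { name: u.name, city: u.address.city }
"""


def run_equivalent(users):
    """Plain-Python equivalent of the AQL above, one clause per step."""
    matched = [u for u in users if u["age"] >= 30]   # FILTER
    matched.sort(key=lambda u: u["name"])            # SORT ... ASC
    matched = matched[:2]                            # LIMIT 2
    # RETURN builds a new object per result, reaching into a nested field.
    return [{"name": u["name"], "city": u["address"]["city"]} for u in matched]
```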

CouchDB

(https://couchdb.apache.org/)

Seamless multi-master sync that scales from Big Data to mobile, with an intuitive HTTP/JSON API, designed for reliability. Apache CouchDB™ lets you access your data where you need it. The Couch Replication Protocol is implemented in a variety of projects and products that span every imaginable computing environment, from globally distributed server clusters to mobile phones and web browsers.

Store your data safely, on your own servers, or with any leading cloud provider. Your web- and native applications love CouchDB, because it speaks JSON natively and supports binary data for all your data storage needs.

The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling offline-first user-experience while maintaining high performance and strong reliability. CouchDB comes with a developer-friendly query language, and optionally MapReduce for simple, efficient, and comprehensive data retrieval.
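CouchDB views are normally written as JavaScript map functions with an optional reduce, for example the built-in `_sum`. The Python sketch below only emulates that model and is not CouchDB code. A map function emits key/value pairs, the rows are sorted by key, and an optional reduce step aggregates them, either per key (roughly what querying a view with `group=true` returns) or into a single value. The documents and field names are hypothetical.

```python
from itertools import groupby

# A comparable CouchDB view, for reference:
#   "map": "function (doc) { if (doc.type === 'order') emit(doc.customer, doc.total); }"
#   "reduce": "_sum"


def map_fn(doc):
    # Like CouchDB's emit(): yield zero or more (key, value) pairs per document.
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def query_view(docs, map_fn, reduce_fn=None, group=False):
    # View rows are kept sorted by key (the sort is stable for equal keys).
    rows = sorted((kv for doc in docs for kv in map_fn(doc)), key=lambda kv: kv[0])
    if reduce_fn is None:
        return rows
    if group:
        # One reduced value per distinct key.
        return [(key, reduce_fn([v for _, v in grp]))
                for key, grp in groupby(rows, key=lambda kv: kv[0])]
    # Otherwise reduce all rows to a single value.
    return reduce_fn([v for _, v in rows])
```

For example, with Python's built-in `sum` as the reducer, grouping by customer gives each customer's order total.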

Single node or cluster - CouchDB is a terrific single-node database that works just like any other database behind an application server of your choice. Most people start with a single-node CouchDB instance, and more demanding projects can seamlessly upgrade to a cluster. CouchDB is also a clustered database that lets you run a single logical database server on any number of servers or VMs. A CouchDB cluster improves on the single-node setup with higher capacity and high availability without changing any APIs.

HTTP/JSON - CouchDB uses the ubiquitous HTTP protocol and JSON data format, so it is compatible with any software that supports them. It also works well with external tools such as HTTP proxy servers and load balancers.
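Because the interface is plain HTTP and JSON, Python's standard library is enough to describe the calls. The sketch below builds, but does not send, requests for four operations: creating a database (`PUT /{db}`), creating a document (`PUT /{db}/{docid}`), reading it back (`GET /{db}/{docid}`) and running a Mango query (`POST /{db}/_find`). The base URL assumes a local server on CouchDB's default port 5984. The database name `inventory` and the document fields are hypothetical.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) a CouchDB HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    req.add_header("Accept", "application/json")
    return req


requests_to_send = [
    couch_request("PUT", "/inventory"),                  # create a database
    couch_request("PUT", "/inventory/pen-001",           # create a document
                  {"type": "item", "name": "pen", "qty": 12}),
    couch_request("GET", "/inventory/pen-001"),          # read it back
    couch_request("POST", "/inventory/_find",            # Mango query
                  {"selector": {"qty": {"$gt": 10}}}),
]
```

To actually send one of these requests, pass it to `urllib.request.urlopen`. A real server will usually also require credentials, for example for creating databases.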

Offline First Data Sync - CouchDB’s unique Replication Protocol is the foundation for a whole new generation of “Offline First” applications on mobile devices and in other environments with challenging network infrastructure.
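A toy model helps show why checkpointed replication suits offline-first apps. In this model, each database keeps a sequence-numbered changes feed. A replicator pulls only the changes since its last checkpoint and copies any revision the target is missing. This is a heavily simplified sketch, not the Couch Replication Protocol itself. It uses plain integer revisions and ignores conflicts, whereas CouchDB tracks revision trees and flags conflicting edits. All class and function names are hypothetical.

```python
class Database:
    """Toy database with a sequence-numbered changes feed (hypothetical)."""

    def __init__(self):
        self.docs = {}      # doc_id -> (revision number, body)
        self.seq = 0
        self.changes = []   # list of (seq, doc_id), one entry per write

    def save(self, doc_id, body, rev=None):
        current_rev = self.docs.get(doc_id, (0, None))[0]
        # Local edits bump the revision; replicated writes keep the source's.
        rev = current_rev + 1 if rev is None else rev
        self.docs[doc_id] = (rev, body)
        self.seq += 1
        self.changes.append((self.seq, doc_id))

    def changes_since(self, since):
        # Only the latest change per document, in sequence order.
        latest = {}
        for seq, doc_id in self.changes:
            if seq > since:
                latest[doc_id] = seq
        return sorted((seq, doc_id) for doc_id, seq in latest.items())


def replicate(source, target, checkpoint=0):
    """One-way pull replication; returns the new checkpoint."""
    for seq, doc_id in source.changes_since(checkpoint):
        rev, body = source.docs[doc_id]
        if rev > target.docs.get(doc_id, (0, None))[0]:  # target lacks this revision
            target.save(doc_id, body, rev=rev)
        checkpoint = seq
    return checkpoint
```

A phone can pull from a server, edit while offline, and later push its edit back. Each direction resumes from its own checkpoint.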

Ecosystem - CouchDB is built for servers (from a Raspberry Pi to big cloud installations), while PouchDB is built for mobile and desktop web browsers and Couchbase Lite is built for native iOS and Android apps. All of them can seamlessly replicate data with each other.

Reliability - CouchDB is serious about data reliability. Individual nodes use a crash-resistant, append-only data structure, and a multi-node CouchDB cluster stores all data redundantly, so it is always available when you need it.
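The sketch below shows why append-only storage is crash-resistant. Writes only ever add bytes at the end of the file, so a crash can at worst leave a torn final record, and recovery simply discards it. CouchDB's actual on-disk format is an append-only B-tree file. This sketch substitutes a simpler JSON-lines log, and the function names and record layout are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    # Records are only ever appended; existing bytes are never overwritten.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make the write durable before acknowledging it


def recover(path):
    """Replay the log; later records win, and a torn final record is ignored."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # partial write left by a crash: discard it
            record = json.loads(line)
            state[record["id"]] = record["body"]
    return state
```

Updates are new records appended after old ones. Replaying the log therefore rebuilds the latest state, and older versions remain intact on disk until compaction.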

Crate.IO

(https://crate.io/)

Crate.io claims its database is the #1 database for IoT scale, purpose-built to scale modern applications in a world of machine data.

It is a highly scalable SQL database that grows seamlessly with the use case, letting you process, store, query and analyze massive amounts of data in real time.

SQL Ease + NoSQL Agility - A distributed SQL DBMS built atop NoSQL storage and indexing, delivering the best of SQL and NoSQL in one database.

Simple Scalability, Always On - A masterless architecture with auto-sharding and replication makes it simple to scale and to keep running 24x7.
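To show how auto-sharding and replication can work without a master node, here is a generic sketch of hash-based routing. Every node can compute which shard holds a row from a stable hash of its routing key, and each shard's copies are placed on distinct nodes. It is written in the spirit of hashing a routing column, not as CrateDB's actual algorithm. The functions, node names and round-robin placement are hypothetical.

```python
import hashlib


def shard_for(routing_key: str, num_shards: int) -> int:
    # A stable hash (unlike Python's randomized hash()) so every node agrees.
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place_shards(num_shards, nodes, replicas=1):
    """Assign each shard's primary and replica copies to distinct nodes."""
    if replicas + 1 > len(nodes):
        raise ValueError("need more nodes than replicas")
    # Round-robin: the primary goes on one node, replicas on the following ones.
    return {shard: [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]
            for shard in range(num_shards)}
```

With one replica per shard, losing any single node still leaves at least one copy of every shard available.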

Real-time Performance - Distributed, in-memory and columnar. Query a firehose of data in real time: time series, geospatial, joins, aggregations, text search and more.

Dynamic Schema - The schema evolves automatically as new columns are inserted, elegantly handling tabular and non-tabular data to support a wide range of use cases.
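The idea of a schema that evolves on insert can be sketched in a few lines. An unseen key becomes a new column, a value whose type conflicts with an existing column is rejected, and older rows read new columns as NULL. This toy class is hypothetical and only illustrates the concept. CrateDB has its own type system and rules for dynamic columns.

```python
class DynamicTable:
    """Toy table whose schema grows as rows with new keys arrive (hypothetical)."""

    def __init__(self):
        self.columns = {}  # column name -> type of the first value seen
        self.rows = []

    def insert(self, record):
        new_columns = {}
        for name, value in record.items():
            known = self.columns.get(name)
            if known is None:
                new_columns[name] = type(value)  # unseen key: becomes a new column
            elif not isinstance(value, known):
                raise TypeError(f"column {name!r} expects {known.__name__}")
        self.columns.update(new_columns)  # the schema evolves only if the row is valid
        self.rows.append(record)

    def select(self, *names):
        # Rows inserted before a column existed read it as None (SQL NULL).
        return [tuple(row.get(n) for n in names) for row in self.rows]
```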

Instant Results - Monitor your data in real time, connect it to any SQL-based visualization tool, and turn it into action.

OrientDB

(https://orientdb.org)

OrientDB describes itself as the first multi-model open-source NoSQL DBMS, combining the power of graphs and the flexibility of documents in one scalable, high-performance operational database.

According to the OrientDB project, the days when a database supported only a single data model are gone. As a direct response to polyglot persistence, multi-model databases acknowledge the need for multiple data models and combine them to reduce operational complexity and maintain data consistency. Although graph databases have grown in popularity, most NoSQL products are still used to add scalability to applications that sit on a relational DBMS. OrientDB positions advanced second-generation NoSQL products like itself as the future: offering more functionality and flexibility while being powerful enough to replace your operational DBMS.
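The sketch below shows, in plain Python, what combining the two models buys. Vertices are ordinary schemaless documents, edges are first-class links between them, and a single query can mix a graph traversal with a document filter, such as "people within two hops of Ada who live in London". OrientDB itself expresses this with SQL extensions such as `TRAVERSE` and `MATCH`. This code does not use OrientDB's API, and all data and names are hypothetical.

```python
from collections import deque

# Vertices are ordinary documents with arbitrary fields...
people = {
    "p1": {"name": "Ada", "city": "London"},
    "p2": {"name": "Alan", "city": "Manchester"},
    "p3": {"name": "Grace", "city": "London"},
    "p4": {"name": "Linus", "city": "Helsinki"},
}
# ...and edges are first-class links between them.
knows = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]


def traverse(start, edges, max_depth):
    """Breadth-first traversal; returns vertices reachable within max_depth hops."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, reached = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append((nxt, depth + 1))
    return reached


# Graph step (traversal) combined with a document step (field filter).
nearby_londoners = [people[p]["name"] for p in traverse("p1", knows, 2)
                    if people[p]["city"] == "London"]
```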

SednaDB

(https://www.sedna.org/)

Sedna is a free native XML database that provides a full range of core database services: persistent storage, ACID transactions, security, indices and hot backup. Its flexible XML processing facilities include a W3C XQuery implementation, tight integration of XQuery with full-text search facilities, and a node-level update language.
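Python's standard library has no XQuery engine, so the sketch below shows an XQuery query against a stored document as a comment. It then reproduces that query's path step, `where` clause and ordering with `xml.etree.ElementTree`, which is a stand-in here and not Sedna's API. It also performs a node-level insert on the tree. The `library` document, its elements and the book titles are hypothetical.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library>
  <book><title>XQuery Basics</title><year>2007</year></book>
  <book><title>Old Markup</title><year>1998</year></book>
  <book><title>Native XML Storage</title><year>2004</year></book>
</library>"""

# Roughly the XQuery one might run against a stored document in Sedna:
#   for $b in doc("library")/library/book
#   where $b/year > 2000
#   order by $b/title
#   return $b/title/text()


def recent_titles(root, after_year):
    books = root.findall("./book")                       # path step: /library/book
    return sorted(b.findtext("title") for b in books     # order by title
                  if int(b.findtext("year")) > after_year)  # where clause


root = ET.fromstring(LIBRARY)

# A node-level update: insert a new <book> node into the existing tree.
new_book = ET.SubElement(root, "book")
ET.SubElement(new_book, "title").text = "Triggers in Practice"
ET.SubElement(new_book, "year").text = "2009"
```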

Basic Features

  • Available for free in open source form under Apache License 2.0
  • Native XML database system implemented in C/C++
  • Support for W3C XQuery language validated by W3C XQuery Test Suite
  • Full-text search indices (native or based on dtSearch)
  • Support for a declarative node-level update language
  • Support for ACID transactions
  • Support for fine-grained XML triggers
  • Incremental hot backup
  • Indices (based on B-tree)
  • Support for Unicode (utf8)
  • SQL connection from XQuery
  • XQuery external functions implemented in C
  • Database security (users, roles and privileges)

Apache Solr

(https://lucene.apache.org/solr/)

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world’s largest internet sites.

Features

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Traffic
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Comprehensive Administration Interfaces
  • Easy Monitoring
  • Highly Scalable and Fault Tolerant
  • Flexible and Adaptable with easy configuration
  • Near Real-Time Indexing
  • Extensible Plugin Architecture
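Full-text search in Lucene-based engines such as Solr rests on an inverted index, which maps each term to the documents that contain it. The minimal sketch below has a very basic tokenizer, AND semantics across query terms, and ranking by raw term frequency. That ranking is a deliberate simplification: Lucene and Solr use BM25 scoring by default. The class and method names are hypothetical, and re-adding an existing document ID is not handled.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a very basic analyzer).
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def add(self, doc_id, text):
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a document must contain every query term.
        matching = set.intersection(*(set(self.postings.get(t, {})) for t in terms))
        # Rank by summed term frequency; ties are broken by document ID.
        scores = {d: sum(self.postings[t][d] for t in terms) for d in matching}
        return sorted(scores, key=lambda d: (-scores[d], d))
```

Lookups touch only the posting lists of the query terms rather than every document, which is what makes this structure fast at high query volumes.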

RethinkDB

(https://rethinkdb.com/)

RethinkDB is an open-source, scalable database that makes building realtime apps dramatically easier. Instead of having your app poll for data, which becomes slow, unscalable and cumbersome to maintain, RethinkDB pushes JSON to your apps in realtime.

Use Cases

Web + mobile apps

Web apps like Google Docs, Trello, and Quora pioneered the realtime experience on the web. With RethinkDB, you can build amazing realtime apps with dramatically less engineering effort.

Multiplayer games

When a player takes an action in a multiplayer game, every other player in the game needs to see the change in realtime. RethinkDB dramatically simplifies the data infrastructure for low latency, high throughput realtime interactions.

Realtime marketplaces

RethinkDB dramatically reduces the complexity of building realtime trading and optimization engines. Publish realtime updates to thousands of clients, and provide pricing updates to users in milliseconds.

Streaming analytics

Build realtime dashboards with RethinkDB data push notifications, and make instantaneous business decisions.

Connected devices

RethinkDB dramatically simplifies modern IoT infrastructures. Stream data between connected devices, enable messaging and signaling, and trigger actions in millions of devices in milliseconds.

Other features

Work with your favorite stack - Query JSON documents with Python, Ruby, Node.js or dozens of other languages, and build modern apps using your favorite web framework paired with realtime technologies like Socket.io or SignalR.

Everything you need to build modern apps - Express relationships using joins, build location-aware apps, or store multimedia and time-series data. Do analytics with aggregation and map/reduce, and speed up your apps using flexible indexing.

Built with love by the open source community - Originally developed by a core team of database experts together with over 100 contributors from around the world, RethinkDB is shaped by developers like you through an open source community development process.

Robust architecture - RethinkDB integrates recent advances in database technology: a modern distributed architecture, a highly optimized buffer cache and a state-of-the-art storage engine. These components work together to create a robust, scalable, high-performance database.

Changefeeds in RethinkDB - Learn about changefeeds, RethinkDB’s realtime push technology, and how they can be used to build and scale realtime apps.
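A changefeed turns writes into pushed events. With the official Python driver, a subscription is opened with something like `r.table("scores").changes().run(conn)`, and each event is a document with `old_val` and `new_val` fields. The sketch below does not use the driver and needs no server. It is a hypothetical in-memory table that pushes events of that same shape to registered callbacks, which shows how inserts, updates and deletes map onto `old_val`/`new_val`.

```python
class ChangefeedTable:
    """Toy table that pushes {'old_val', 'new_val'} events to subscribers."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def changes(self, callback):
        # Register a listener, loosely like table.changes() in RethinkDB.
        self.subscribers.append(callback)

    def _emit(self, old, new):
        for callback in self.subscribers:
            callback({"old_val": old, "new_val": new})

    def insert(self, row):
        self.rows[row["id"]] = row
        self._emit(None, row)            # insert: no old value

    def update(self, row_id, **fields):
        old = self.rows[row_id]
        new = {**old, **fields}          # new dict, so the old value stays intact
        self.rows[row_id] = new
        self._emit(old, new)

    def delete(self, row_id):
        self._emit(self.rows.pop(row_id), None)  # delete: no new value
```

The application never polls. It reacts to each event as it arrives, which is the pattern behind the multiplayer and marketplace use cases described earlier.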

Map-reduce in RethinkDB - RethinkDB has powerful Hadoop-style map-reduce tools that integrate cleanly into the query language. Learn how they work, and try a few examples.

Geospatial queries - Learn how to use GeoJSON features to build location-aware apps in RethinkDB.
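Location-aware queries usually reduce to "which points lie within some distance of here". The sketch below answers that over GeoJSON points with the haversine formula. Note that GeoJSON stores coordinates as `[longitude, latitude]`. The formula treats the Earth as a sphere of mean radius 6,371 km, whereas RethinkDB's own distance functions use an ellipsoidal model (WGS84 by default), so their results will differ slightly. The place data and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two GeoJSON Point objects."""
    (lon1, lat1), (lon2, lat2) = a["coordinates"], b["coordinates"]  # [lon, lat]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest(point, places, max_dist_m):
    """Names of places within max_dist_m of point, closest first."""
    hits = sorted((haversine_m(point, p["location"]), p["name"]) for p in places)
    return [name for dist, name in hits if dist <= max_dist_m]
```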

Deploying with a PaaS - Learn how to deploy RethinkDB on cloud services such as Compose.io, AWS and others.

MongoDB

(https://www.mongodb.com/)

MongoDB claims to be the most popular database. It is a general-purpose, document-based, distributed database built for modern application developers and for the cloud era, and its makers say no database makes you more productive.

MongoDB is a document database, which means it stores data in JSON-like documents. Its makers argue that this is the most natural way to think about data, and that it is much more expressive and powerful than the traditional row/column model.

Rich JSON Documents - The most natural and productive way to work with data. Arrays and nested objects are supported as values, allowing flexible and dynamic schemas.

Powerful query language - A rich and expressive query language lets you filter and sort by any field, no matter how deeply it is nested within a document. It supports aggregations and other modern use cases such as geo-based search, graph search and text search. Queries are themselves JSON, and thus easily composable: no more concatenating strings to generate SQL queries dynamically.
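Because MongoDB filters are themselves documents, a query like `{"address.city": "Paris", "age": {"$gt": 30}}` is just a dictionary. The same dict could be passed to PyMongo's `collection.find(...)`. To show what such a filter means without a server, the sketch below evaluates a tiny subset of the language in plain Python: dot-notation paths into nested documents, implicit equality, and the `$eq`, `$gt`, `$lt` and `$in` operators. It is a hypothetical illustration. Real MongoDB matching also handles arrays, cross-type comparison rules and many more operators.

```python
def get_path(doc, dotted):
    """Resolve a dot-notation path like 'address.city' (missing -> None)."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


OPS = {
    "$eq": lambda a, b: a == b,
    "$gt": lambda a, b: a is not None and a > b,
    "$lt": lambda a, b: a is not None and a < b,
    "$in": lambda a, b: a in b,
}


def matches(doc, query):
    """True if doc satisfies every condition in the (subset) MongoDB-style filter."""
    for path, cond in query.items():
        value = get_path(doc, path)
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            # Operator document, e.g. {"$gt": 30, "$lt": 50}: all must hold.
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:  # plain value: implicit equality
            return False
    return True
```

Since filters are plain dicts, composing them is ordinary dict manipulation, for example `{**base_filter, "age": {"$gt": 30}}`, rather than string concatenation.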

All the power of a relational database, and more - Full ACID transactions, support for joins in queries, and two types of relationships instead of one: reference and embedded.

Other features and tools advertised by MongoDB include:

  • Fully Automated
  • Global Clusters
  • Backup
  • Monitoring & Alerts
  • Serverless Triggers
  • Best-In-Class Security
  • Charts
  • BI Connector
  • Compass

NOTE - For any question/suggestion, please send email to amu.prashant1@gmail.com

References

  • Wikipedia
  • Websites of mentioned databases