文档结构  
翻译进度:11%     翻译赏金:0 元 (?)    ¥ 我要打赏

你知道SQL吗?如果你是一名软件开发人员、DBA 或者开发运维人员,那么我很确定你对这个问题的答案。

这就是为什么OrientDB采用SQL作为其主要查询语言的原因。

SQL是一种优雅,清晰,定义良好,功能强大的查询语言(对于RDBMS而言),但它是在表是唯一允许的数据结构的时期创建的(你知道的,有关关系数据库的结构有:表,主键,外键等)。

现在,随着 NoSQL 的到来,事情发生了变化。 数据结构现在要复杂得多,所以SQL需要补充一些新内容来适应这种新模式。

第 1 段(可获 1.19 积分)

我们来看看OrientDB SQL与标准SQL的相似之处以及它们的不同点。

过去: Join(连接), Filtering(过滤), Projection(投影), Aggregation(聚合)

这是标准SQL在关系数据库中的工作方式。

让我们来看一个例子:计算所有欧洲国家的人口(即所有城市人口的总和)。

城市

name

countryId

Population

Rome

32

2.500.000

国家

id

name

32

Italy

SELECT country.name, sum(city.population)
FROM city, country 
WHERE city.countryId = country.id
AND country.continent = 'Europe'
GROUP BY country.name

您可以通过以下四个步骤来思考此查询的执行情况:

第 2 段(可获 1.04 积分)
  1. Join: Match city and country tables based on a primary key (country.id) and a foreign key (city.countryId).

  2. Filtering: Filter results based on some attributes (the continent name in this case)

  3. Projection: Choose a subset of attributes to be returned (country name and city population in this case)

  4. Aggregation: Aggregate data, calculate the sum of populations grouping by country name.

This is a typical pattern in SQL, but it has some problems:

Problem 1: The join operation is EXPENSIVE, it relies on a convention (the same value on country.id and city.countryId), and it has to be re-calculated at every query execution.

第 3 段(可获 1.3 积分)

The situation is even more critical when you have MULTIPLE joins. Let’s make it more complicated:

SELECT country.name, sum(city.population)
FROM city, region, state, country, continent
WHERE city.regionId = regiony.id
AND region.stateId = state.id
AND state.countryId = country.id
AND country.continentId = continent.id
AND continent.name = 'Europe'
GROUP BY country.name

How much time will it take to execute?

Problem 2: We are assuming that we know how deep the hierarchy is, but in some cases, we simply don't know. Suppose you are dealing with an organization structure, with departments, managers and so on, how can you write a query that returns all the sub-trees under a manager?

第 4 段(可获 0.94 积分)

Problem 3: Until now, we just assumed that our data is made of simple attributes, eg. the country name is a string, the city population is an integer. But this does not represent reality very well.

An example could be having a list of names for the same city.

names

countryId

population

[Rome, Roma]

32

2.500.000

Or even better, a list of names with languages:

names

countryId

population

[

  {name: Rome, language: EN},

  {name: Roma, language: IT}

]

32

2.500.000

It is not possible in standard RDBMS (but it is in NoSQL), so regular SQL lacks the commands to manage this kind of data.

第 5 段(可获 1.33 积分)

Problem 4: The schema is well-defined here, so we know exactly which properties a single table has and SQL was designed to work on this assumption. It's not true in NoSQL, so we will need some extensions to manage cases where we don’t know the schema (or where there is no defined schema at all).

From Join to Links

A join operation is expensive, so OrientDB replaced it with something better: physical links.

City:
{
   @rid: "#11:50",
   name: "Rome",
   population: 2500000,
   country: "#12:32"
}

Country:
{
   @rid: "#12:32",
   name: "Italy"
}
第 6 段(可获 0.94 积分)

The big difference here is that @rid values are not a convention, they just represent a physical position of a record, so calculating a relationship is a very fast operation, O(1) while the cost of a single lookup for a primary key in an RDBMS is O(logN).

Efficiency is a big advantage, of course, but SQL has to be enhanced to leverage this new element.

In OrientDB SQL, a join is not allowed, so relationships are calculated using dot notation:

Example 1:

Relational database:

SELECT country.name, sum(city.population) 
FROM city, country 
WHERE city.countryId = country.id 
AND country.continent = 'Europe'
GROUP BY country.name
第 7 段(可获 1.08 积分)

OrientDB:

SELECT country.name, sum(population)
FROM city
WHERE country.continent = 'Europe'
GROUP BY country.name

Example 2:

Relational database:

SELECT country.name, sum(city.population) 
FROM city, region, state, country, continent
WHERE city.regionId = regiony.id
AND region.stateId = state.id
AND state.countryId = country.id
AND country.continentId = continent.id
AND continent.name = 'Europe'
GROUP BY country.name

OrientDB:

SELECT region.state.country.name as contryName, sum(population)
FROM city 
WHERE region.state.country.continent.name = 'Europe'
GROUP BY countryName
第 8 段(可获 0.08 积分)

From Links to Edges

OrientDB is also a Graph Database, so you can also represent your data elements as Vertices and relationships as Edges.

The big difference is that Edges are bidirectional, which means you can traverse them in both directions with the same efficiency, using out() and in() operators for outgoing and incoming edges:

Image title

You can retrieve the continent of a city with:

SELECT out('inRegion').out('inState').out('inCountry').out('inContinent')
FROM city 
WHERE name = 'Rome'

But you can also retrieve all the cities in a continent:

SELECT in('inContinent').in('inContry').in('inState').in('inRegion')
FROM continent 
WHERE name = 'Europe'
第 9 段(可获 0.94 积分)

Deep Traversal

Sometimes you know how deep you have to traverse (eg. city -> region -> state -> country), but in other cases, you just don't know. Consider having an organization structure:

Image title

If you want to retrieve all of the subtrees of a given department, but you don't know how deep the structure is, there is no way to write a single standard SQL statement to perform the job.

OrientDB SQL has an extension called TRAVERSE, specifically designed for this use case

The following statement retrieves all the sub-departments of Dept 1:

TRAVERSE in("Parent") from (
    SELECT * FROM Department WHERE name = "Dept 1"
)
第 10 段(可获 1.18 积分)

The following one retrieves the first three levels of sub-departments of Dept 1:

TRAVERSE in("Parent") from (
    SELECT * FROM Department WHERE name = "Dept 1"
) WHILE $depth <= 3

Or, using the new MAXDEPTH operator:

TRAVERSE in("Parent") from (
    SELECT * FROM Department WHERE name = "Dept 1"
) MAXDEPTH 3

You can also filter the result of this traversal to retrieve only the third level:

SELECT * FROM (
    TRAVERSE in("Parent") from (
       SELECT * FROM Department WHERE name = "Dept 1"
   ) WHILE $depth <= 3
) WHERE $depth = 3

The WHILE condition is used by the query to decide if the traversal has to continue on the current traversal branch. It can contain any boolean expression that could be evaluated in a WHERE condition.

第 11 段(可获 0.86 积分)

Dealing With Complex Types

Standard SQL was designed to manage simple data types (eg. strings and numbers). OrientDB data model is quite complex compared to RDBMS: records can have simple attributes (strings, numbers, dates), link attributes (see dot notation), list and set attributes, embedded objects. As a consequence, OrientDB SQL was enriched to manage this kind of data.

Managing Collections and Embedded Objects

Records can be created with complex attributes:

INSERT INTO Conference SET
name = "OrientDB Next World Conference", 
tags = ["NoSQL", "Graph", "Document"], 
location = {address: "abc Street", city: {name: "London", country: {name: "UK"} } }
第 12 段(可获 0.88 积分)

Image title

The result will be the following:

{
   @class: Conference,
   @rid: "#12:10", //self generated and managed by OrientDB
   name:  "OrientDB Next World Conference",
   tags = ["NoSQL", "Graph", "Document"], 
   location = {
      address: "abc Street", 
      city: {
         name: "London", 
         country: {
            name: "UK"
         }
      } 
   }
}

As you can see, complex attributes are not managed as strings or BLOBs: they are first-class, structured objects that can be queried and manipulated via SQL.

To add a new tag to the conference, we can just use the UPDATE ADD syntax:

第 13 段(可获 0.6 积分)
UPDATE Conference ADD tags = "Multi-Model" 
WHERE name = "OrientDB Next World Conference"

To remove a tag, you can just use UPDATE REMOVE syntax:

UPDATE Conference REMOVE tags = "NoSQL"
WHERE name = "OrientDB Next World Conference"

You can also use DOT notation to update embedded objects content, eg.

UPDATE Conference SET location.city.country.name = "United Kingdom"
WHERE name = "OrientDB Next World Conference"

Querying the Conference class will result in a single record:

Image title

UNWINDing Collections

In some cases, you will need to unwind collection attributes to have a single row per collection item.

第 14 段(可获 0.66 积分)

Since v 2.1, OrientDB provides a new UNWIND operator, specifically designed for this need:

SELECT FROM Conference UNWIND tags

Image title

UNWIND operator can also be used to unwind multiple collections in the same query.

Querying Collections and Embedded Objects

Of course, you will be able to compare collections with other collections using a WHERE condition:

SELECT FROM Conference WHERE tags = ["Graph", "Document", "Multi-Model"]

But in some cases, you will have to query for collections that contain a particular value:

SELECT FROM Conference WHERE tags contains "Graph"
第 15 段(可获 0.83 积分)

Or for embedded objects that contain a particular value as a key or as a value:

SELECT FROM Conference WHERE location containsKey 'address'
 
SELECT FROM Conference WHERE location containsValue 'abc Street'

OrientDB allows you to work with a dynamic schema or in a completely schemaless mode, which means that two records in the same class can have completely different attribute sets and types.

OrientDB is very tolerant when dealing with non-existing attributes, eg. queries will never result in an error if attributes in the WHERE condition are not defined for a particular record.

第 16 段(可获 0.98 积分)

Sometimes you just want to query a set of records based on their content, without having all the complete information about their schema. Consider querying the Conference class for any records that contain the word ‘OrientDB’ in any of their attributes, you can use any() operator for this:

SELECT FROM Conference WHERE any() like "%OrientDB%"

Of course, if needed, you can define a strict schema for your classes, so you can define attribute names, types, and constraints (not null, min, max etc.):

Image title

When you have a defined schema, you can query it using SQL:

第 17 段(可获 1.09 积分)
SELECT expand(classes) from metadata:schema

Image title

Conclusion

SQL is a fantastic language because it’s clear and effective, but to harness all the power of a Multi-Model database, it needs some small improvements. OrientDB SQL is very similar to the standard, but includes the addition of operators to operate graph traversal, manage and query complex attributes, and to manage schema and schema-less documents.

If you already know SQL, you are already at 70% of the learning curve and the remaining 30% will be very smooth. All you need to do is get used to a richer, more powerful, versatile and more efficient data model: the multi-model database.

第 18 段(可获 1.26 积分)

文章评论