This post covers collation in three parts: an overview, how to use it, and the operations that support it. Let's start with the overview.
1. Overview of collation
A collation is a set of rules for comparing strings according to the conventions of a particular language. For example, in Canadian French, the last accent in a given word determines its sort order.
Consider the following French words:
cote < coté < côte < côté
With the Canadian French collation, the sort order is:
cote < côte < coté < côté
If no collation is specified, MongoDB uses simple binary comparison for strings. Under that rule, the words above sort as:
cote < coté < côte < côté
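The difference between the two orderings can be reproduced in plain Ruby, without a server. The sketch below is only an illustration: it assumes the four words are cote, coté, côte and côté, uses Ruby's byte-wise String comparison as a stand-in for the binary sort, and models the Canadian French rule with a hypothetical backwards_accent_key helper that compares base letters first and then accents starting from the end of the word. It is not how ICU or MongoDB implement collation.

```ruby
# Strip accents: decompose to NFD, then drop the combining marks.
def base_letters(word)
  word.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# One flag per character: 1 if it carries an accent, 0 otherwise.
def accent_flags(word)
  word.unicode_normalize(:nfc).each_char.map do |ch|
    ch.unicode_normalize(:nfd).length > 1 ? 1 : 0
  end
end

# Hypothetical sort key: base letters first, then accents read backwards.
def backwards_accent_key(word)
  [base_letters(word), accent_flags(word).reverse]
end

# Stand-in for the binary sort: Ruby compares strings byte by byte.
def binary_sort(words)
  words.sort
end

# Toy model of the "last accent decides" rule.
def backwards_accent_sort(words)
  words.sort_by { |w| backwards_accent_key(w) }
end

words = %w[côté coté côte cote]
puts binary_sort(words).join(' < ')           # cote < coté < côte < côté
puts backwards_accent_sort(words).join(' < ') # cote < côte < coté < côté
```

The second key component is reversed so that an accent on the last letter outweighs an accent earlier in the word, which is what moves côte ahead of coté.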
2. Use of collation
You can specify a default collation for collections and indexes when you create them, or specify a collation for individual CRUD operations and aggregations. For operations that support collation, MongoDB uses the collection's default collation unless the operation specifies a different one.
Collation parameters
'collation' => {
  'locale' => <string>,
  'caseLevel' => <bool>,
  'caseFirst' => <string>,
  'strength' => <int>,
  'numericOrdering' => <bool>,
  'alternate' => <string>,
  'maxVariable' => <string>,
  'normalization' => <bool>,
  'backwards' => <bool>
}
The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US for American English or fr_CA for Canadian French. For a complete description of the available parameters, see the MongoDB manual entry.
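As a small illustration of "only locale is required", here is a sketch of a helper that builds a collation document in plain Ruby. build_collation and COLLATION_OPTIONS are hypothetical names, not part of the Ruby driver; the option names are simply the ones listed above, and the helper does not check value types.

```ruby
# Option names taken from the parameter list above.
COLLATION_OPTIONS = %w[
  locale caseLevel caseFirst strength numericOrdering
  alternate maxVariable normalization backwards
].freeze

# Hypothetical helper (not part of the Ruby driver): builds a collation
# document, insisting on a locale and rejecting unknown option names.
def build_collation(locale, options = {})
  raise ArgumentError, 'locale is required' if locale.to_s.empty?

  extras = options.transform_keys(&:to_s)
  unknown = extras.keys - COLLATION_OPTIONS
  raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?

  { 'locale' => locale.to_s }.merge(extras)
end

p build_collation('fr_CA')
p build_collation('en_US', strength: 2, caseLevel: true)
```

The resulting hash can be passed wherever the examples in this post pass a "collation" option.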
2.1 Specify a collation for a collection
The following example creates a collection named contacts in the test database and assigns it a default collation with the fr_CA locale. Specifying a collation when you create the collection ensures that all operations on contacts that support collation, including queries, use the fr_CA collation unless the operation specifies a different one. New indexes on the collection also inherit the default collation unless another collation is specified when the index is created.
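require 'mongo'
client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'test')
# Calling create sends the create command with the default collation.
client[:contacts, { "collation" => { "locale" => "fr_CA" } }].create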
2.2 Specify a collation for an index
To specify a collation for an index, pass the collation parameter when you create the index. The following example creates an index on the first_name field of the address_book collection, makes it a unique index, and gives it a default collation whose locale is en_US.
client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'test')
client[:address_book].indexes.create_one(
  { "first_name" => 1 },
  "unique" => true,
  "collation" => { "locale" => "en_US" }
)
To use this index, a query must specify the same collation. The following query uses the index defined above:
client[:address_book].find({ "first_name" => "Adam" }, "collation" => { "locale" => "en_US" })
The following queries cannot use the index defined above. The first specifies no collation, and the second specifies a collation with a different strength value than the index's collation.
client[:address_book].find({ "first_name" => "Adam" })
client[:address_book].find({ "first_name" => "Adam" }, "collation" => { "locale" => "en_US", "strength" => 2 })
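The matching rule behind these three queries can be written down as a small predicate. The sketch below is a toy model in plain Ruby for string comparisons, not the server's query planner. It assumes that a query with no collation falls back to simple binary comparison (which holds here only because address_book has no default collation) and that an omitted strength means the default of 3. effective_collation and index_usable? are hypothetical names.

```ruby
DEFAULT_STRENGTH = 3 # assumed default when strength is omitted

# Fill in the defaults so that two collation specs can be compared directly.
def effective_collation(collation)
  return { 'locale' => 'simple' } if collation.nil?

  { 'locale' => collation['locale'],
    'strength' => collation.fetch('strength', DEFAULT_STRENGTH) }
end

# Toy rule: the index is usable only when both effective collations match.
def index_usable?(index_collation, query_collation)
  effective_collation(index_collation) == effective_collation(query_collation)
end

index = { 'locale' => 'en_US' }
puts index_usable?(index, { 'locale' => 'en_US' })                  # true
puts index_usable?(index, nil)                                       # false
puts index_usable?(index, { 'locale' => 'en_US', 'strength' => 2 }) # false
```

The model compares only locale and strength; a real collation has more fields, all of which would have to agree.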
3. Operations that support collation
In MongoDB, all query, update, and delete methods support collation. Here are some common ones:
3.1 find and sort methods
A single query can specify the collation used to match and sort its results. The following query-and-sort example sets the collation's locale to de, so German sorting rules apply:
client = Mongo::Client.new(['127.0.0.1:27017'], :database => "test")
docs = client[:contacts].find(
  { "city" => "New York" },
  { "collation" => { "locale" => "de" } }
).sort({ "name" => 1 })
3.2 find_one_and_update method
Suppose a collection named names contains the following documents:
{ "_id" : 1, "first_name" : "Hans" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }
The following find_one_and_update operation does not specify a collation:
client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'test')
doc = client[:names].find_one_and_update(
  { "first_name" => { "$lt" => "Gunter" } },
  { "$set" => { "verified" => true } }
)
Because Gunter sorts first lexically among the names in the collection, the operation above matches nothing and updates no documents. Now consider the same find_one_and_update operation with a collation specified, with the locale set to de@collation=phonebook.
Some locales offer a collation=phonebook option for languages that sort proper nouns differently from other words. Under the de@collation=phonebook collation, characters with umlauts sort before the same characters without umlauts.
client = Mongo::Client.new(["127.0.0.1:27017"], :database => "test")
doc = client[:names].find_one_and_update(
  { "first_name" => { "$lt" => "Gunter" } },
  { "$set" => { "verified" => true } },
  { "collation" => { "locale" => "de@collation=phonebook" }, :return_document => :after }
)
The results are as follows:
{ "_id" => 3, "first_name" => "Günter", "verified" => true }
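To see why the filter matches nothing under binary comparison but matches Günter under the phonebook collation, the comparison can be approximated in plain Ruby. This is a rough sketch, not ICU's algorithm: it assumes German phonebook order can be approximated by expanding ä, ö and ü to ae, oe and ue before comparing, and phonebook_key is a hypothetical helper.

```ruby
UMLAUT_EXPANSIONS = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
  'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue'
}.freeze

# Hypothetical approximation of a phonebook sort key: expand umlauts,
# then ignore case.
def phonebook_key(name)
  name.gsub(/[äöüÄÖÜ]/, UMLAUT_EXPANSIONS).downcase
end

# Mimics {"first_name" => {"$lt" => bound}} with byte-wise comparison.
def binary_lt(names, bound)
  names.select { |n| n < bound }
end

# Mimics the same filter using the approximate phonebook key.
def phonebook_lt(names, bound)
  names.select { |n| phonebook_key(n) < phonebook_key(bound) }
end

names = %w[Hans Gunter Günter Jürgen]
p binary_lt(names, 'Gunter')    # no matches
p phonebook_lt(names, 'Gunter') # only Günter matches
```

With the expansion, Günter compares as guenter, which falls before gunter because e precedes n.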
3.3 find_one_and_delete method
Setting the numericOrdering parameter to true makes the server compare numeric strings by their numeric values. For example, suppose the numbers collection contains the following documents:
{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }
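The following example finds the first document in which the value of a is numerically greater than 100 and deletes it:
# client is the Mongo::Client created in the earlier examples
numbers = client[:numbers]
# The bound is the string "100"; numericOrdering makes the comparison numeric.
docs = numbers.find_one_and_delete(
  { "a" => { "$gt" => "100" } },
  { "collation" => { "locale" => "en", "numericOrdering" => true } }
)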
After this operation, the following documents remain in the collection:
{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }
The document { "_id" : 3, "a" : "179" } has been deleted.
If you run the same operation without a collation, the server compares the strings lexically, so it deletes the first document it finds whose a value is lexically greater than "100".
In this case the first document, whose a value is "16", is deleted, and the following documents remain:
{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }
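The two outcomes follow directly from how the strings are compared, which can be checked in plain Ruby. The sketch below only mimics the filter on the three a values; lexical_gt and numeric_gt are hypothetical helpers, and parsing with Integer stands in for what numericOrdering does on the server.

```ruby
# Mimics the filter without a collation: byte-wise string comparison.
def lexical_gt(values, bound)
  values.select { |v| v > bound }
end

# Mimics the filter with numericOrdering: compare the parsed integers.
def numeric_gt(values, bound)
  values.select { |v| Integer(v, 10) > Integer(bound, 10) }
end

values = %w[16 84 179]
p lexical_gt(values, '100') # all three strings match, so "16" is found first
p numeric_gt(values, '100') # only "179" matches
```

Lexically, "16" is greater than "100" because the comparison is decided at the second character, where 6 comes after 0.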
3.4 delete_many method
Collation parameters are available for all bulk operations in the Ruby driver. Suppose a collection named recipes contains the following documents:
{ "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" }
{ "_id" : 2, "dish" : "beef bourgignon", "cuisine" : "French" }
{ "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" }
{ "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" }
{ "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }
Setting the collation's strength parameter to 1 or 2 makes the server ignore case when it evaluates the query filter. The following example uses a case-insensitive query filter to delete every document whose cuisine field matches French.
client = Mongo::Client.new(['127.0.0.1:27017'], :database => 'test')
recipes = client[:recipes]
docs = recipes.delete_many(
  { "cuisine" => "French" },
  { "collation" => { "locale" => "en_US", "strength" => 1 } }
)
Documents with _id values of 2 and 4 were deleted after executing the above instructions.
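The effect of the case-insensitive filter can be previewed in plain Ruby. This is only a sketch: it uses String#casecmp? as a stand-in for the comparison, which covers the case-insensitivity discussed here but not the other differences a low strength value ignores. RECIPES and the two helper methods are hypothetical names, and the data is the sample collection above.

```ruby
RECIPES = [
  { '_id' => 1, 'dish' => 'veggie empanadas', 'cuisine' => 'Spanish' },
  { '_id' => 2, 'dish' => 'beef bourgignon',  'cuisine' => 'French' },
  { '_id' => 3, 'dish' => 'chicken molé',     'cuisine' => 'Mexican' },
  { '_id' => 4, 'dish' => 'chicken paillard', 'cuisine' => 'french' },
  { '_id' => 5, 'dish' => 'pozole verde',     'cuisine' => 'Mexican' }
].freeze

# Ids matched by an exact (case-sensitive) comparison.
def exact_match_ids(docs, field, value)
  docs.select { |doc| doc[field] == value }.map { |doc| doc['_id'] }
end

# Ids matched when case is ignored.
def case_insensitive_match_ids(docs, field, value)
  docs.select { |doc| doc[field].casecmp?(value) }.map { |doc| doc['_id'] }
end

p exact_match_ids(RECIPES, 'cuisine', 'French')            # [2]
p case_insensitive_match_ids(RECIPES, 'cuisine', 'French') # [2, 4]
```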
3.5 Aggregation
To use a collation with an aggregation operation, specify it in the aggregation options. The following example uses a collection named names: it groups the documents by first_name, counts the documents in each group, and sorts the results in German phonebook order.
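# client is the Mongo::Client created in the earlier examples
names = client[:names]
aggregation = names.aggregate(
  [
    { "$group" => { "_id" => "$first_name", "name_count" => { "$sum" => 1 } } },
    { "$sort" => { "_id" => 1 } }
  ],
  { "collation" => { "locale" => "de@collation=phonebook" } }
)
aggregation.each do |doc|
  p doc
end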