@felliott
Contributor

felliott commented Mar 3, 2014

Hello,

I'm filing this placeholder pull request so that my branch adding Elasticsearch support will be visible from the main repo. There are four outstanding tasks before it's ready for merge:

1.) mapping returned values to native types
2.) case-insensitive searching
3.) escaping metacharacters in contains regex
4.) abstract backref support
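
For item 3, one possible approach is to backslash-escape every character that Lucene's regular expression syntax treats as an operator before the user's string is embedded in a regexp filter. The sketch below is an illustration under that assumption, not code from the branch: `escape_lucene_regex` and `contains_pattern` are hypothetical names, and the reserved-character set is the one listed in the Elasticsearch regexp documentation.

```python
# Characters with special meaning in Lucene regexp syntax: the standard
# operators plus the optional ones (# @ & < > ~) that Elasticsearch enables.
_LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape every reserved character so `text` matches literally."""
    return ''.join('\\' + ch if ch in _LUCENE_REGEX_RESERVED else ch
                   for ch in text)


def contains_pattern(text):
    """Build a pattern matching any value that contains `text`.

    Lucene patterns are anchored to the whole term, hence the `.*` on
    both sides.
    """
    return '.*' + escape_lucene_regex(text) + '.*'
```

Python's own `re.escape` is not a drop-in replacement here, since it targets Python's regex dialect rather than Lucene's.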

I plan on working on these this week and will hopefully have a final PR soon.

Cheers,
Fitz

felliott added 23 commits July 7, 2014 07:28
 * This is essentially just a search and replace on mongostorage.py
 * I assumed that a modular-odm collection was the same thing as an
   elasticsearch index.  Nope. A collection is the doc-type in ES
   parlance.  Add a required es_index attribute to ElasticsearchStorage
   constructor and update ES calls to use correct nomenclature.

 * The test ES storage object now provides es_index.
 * ElasticsearchQuerySet.data is an array, not a real cursor. Stop
   calling count() on it, just use len() instead.
 * ES search is only near real-time, so we have to refresh the index to
   make sure all of the fixtures have been inserted before searching.
 * Get find(), find_one(), update(), remove() working properly
 * Turn modular-odm query objects into filter structures suitable for
   passing to Elasticsearch. We now support everything in the tests
   except for icontains.
 * Since we're storing a list of results rather than an actual cursor,
   our queryset implementation is basically the same as Pickle's rather
   than Mongo's.
 * ES is returning integer ID fields as strings instead of integers. Add
   a stub _to_native_types method where casting will take place.
 * delete_by_query() won't accept a plain filter as the query
   body. Instead, pass a "filtered" query that does a match_all + filter
 * Backrefs are stored as tuples of (id, ref_name).  If id is an
   integer, then Elasticsearch assumes all elements in the tuple will be
   integers and chokes when it encounters ref_name (a string).  For now,
   explicitly cast the first element of a tuple as a str().
 * Elasticsearch search is only near real-time, so searching immediately
   after save() may not return up-to-date results.  Add a default-noop
   refresh() method to Storage() and implement for Elasticsearch
   backend.  Call after save().
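
Two of the workarounds in the commit notes above can be sketched in plain Python. This is a minimal illustration, not the code in the branch: `as_filtered_query` and `stringify_backref` are hypothetical helper names, and whether the result of `as_filtered_query` must additionally be nested under a top-level "query" key depends on the Elasticsearch version in use.

```python
def as_filtered_query(es_filter):
    """Wrap a bare filter in an ES 1.x "filtered" query (match_all + filter)."""
    return {
        'filtered': {
            'query': {'match_all': {}},
            'filter': es_filter,
        }
    }


def stringify_backref(backref):
    """Cast the id of an (id, ref_name) pair to str.

    Both elements then share one type, so Elasticsearch does not infer an
    integer mapping from the first element and reject the second.
    """
    ref_id, ref_name = backref
    return (str(ref_id), ref_name)
```

For example, `stringify_backref((3, 'foo'))` yields a pair of two strings, which is the shape the commit note describes storing.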
@felliott
Contributor Author

felliott commented Jul 7, 2014

Hello,

I've updated this PR with Elasticsearch v1.0 support. It's still failing three tests:

test_foreign_queries.py:test_eq_abstract
test_foreign_queries.py:test_eq_abstract_list
test_string_operators.py:test_icontains

The test_eq_abstract tests are failing because searching for an array like ["b3e4d", "foo"] in ES has match-any semantics. Since the "foo" key is present in all of the stored backrefs, it will falsely return a match. More details and possible solutions are here:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_finding_multiple_exact_values.html
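
One way to tighten the match-any behaviour described above is to require every value with a bool filter of term clauses instead of a single multi-value lookup. The following is a rough sketch only: `all_values_filter` is a hypothetical helper, and the field name in the usage note is made up. It is also only a partial fix, because Elasticsearch flattens arrays, so the filter cannot tell whether the matched values came from the same stored backref.

```python
def all_values_filter(field, values):
    """Build an ES 1.x bool filter that requires every value to be present."""
    return {
        'bool': {
            # One term clause per value; "must" means all clauses have to match.
            'must': [{'term': {field: value}} for value in values],
        }
    }
```

Calling `all_values_filter('my_abstract', ['b3e4d', 'foo'])` produces a filter with two `term` clauses, so a document that only contains "foo" no longer matches.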

The icontains test is failing because the default analyzer in ES is case sensitive. To be able to search both case-sensitive and case-insensitive, you'll need to set up a custom analyzer on init.
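
As a starting point for whoever picks this up, the custom analyzer could look roughly like the settings below, passed when the index is created. This is an untested sketch that assumes ES 1.x mapping syntax; the analyzer name `lowercase_keyword`, the `lower` sub-field, and the helpers `string_field_mapping` and `icontains_target` are all made up for illustration and do not exist in the branch.

```python
# Index settings defining an analyzer that keeps the whole value as a single
# token (keyword tokenizer) and lowercases it.
INDEX_SETTINGS = {
    'settings': {
        'analysis': {
            'analyzer': {
                'lowercase_keyword': {
                    'type': 'custom',
                    'tokenizer': 'keyword',
                    'filter': ['lowercase'],
                }
            }
        }
    }
}


def string_field_mapping():
    """Mapping for a string field that is searchable both ways.

    The main field stays exact (case-sensitive); the hypothetical `lower`
    sub-field is indexed through the lowercasing analyzer.
    """
    return {
        'type': 'string',
        'index': 'not_analyzed',
        'fields': {
            'lower': {'type': 'string', 'analyzer': 'lowercase_keyword'},
        },
    }


def icontains_target(field, text):
    """Return the (field, text) pair a case-insensitive query would use."""
    return (field + '.lower', text.lower())
```

With this layout, case-sensitive operators would keep querying the plain field, while icontains would query the `.lower` sub-field with a lowercased search string.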

I'm not sure I'll be able to spend more time on this, but if anyone else would like to use the code or ask me some questions, feel free!

Cheers,
Fitz

@jmcarp
Contributor

jmcarp commented Jul 7, 2014

Thanks for the updates! Will look into the remaining failing tests and merge when I get a chance.
