Commit 450bfa2

Merge pull request #2092 from MTG/vector-search-speedup
Remove old similarity code and improve speed of vector-search queries
2 parents 9396797 + 6d74384 commit 450bfa2

33 files changed

Lines changed: 191 additions & 3890 deletions

_docs/api/source/overview.rst

Lines changed: 5 additions & 14 deletions
@@ -41,31 +41,22 @@ Searching
 =========

 There are several ways in which you can search sounds using the Freesound APIv2.
-The most basic one is using the :ref:`sound-text-search` resource which allows you to define some query terms and other parameters to filter query results.
+The most basic one is using the :ref:`sound-text-search` resource which allows you to define query terms and other parameters to filter query results.
 As a quick example, the following request would return all sorts of dog sounds:

 ::

-    curl "https://freesound.org/apiv2/search/text/?query=dogs&token=YOUR_API_KEY"
+    curl "https://freesound.org/apiv2/search/?query=dogs&token=YOUR_API_KEY"


-Besides text-search, you can also use the :ref:`sound-content-search` resource to perform queries and define filters based on audio features (descriptors) rather than tags and textual metadata.
+Search queries can include filters based on audio features (descriptors) rather than tags and textual metadata.
 That means that you can retrieve sounds that, for example, have a particular pitch or bpm. These queries may include almost any of the audio features listed in :ref:`analysis-docs`.
 Note however that these features are automatically extracted and might not be always accurate.
-As a quick example, you can retrieve sounds that feature a particular pitch mean as follows:
+As a quick example, you can retrieve sounds that feature a particular pitch as follows:

 ::

-    curl "https://freesound.org/apiv2/search/content/?descriptors_filter=lowlevel.pitch.mean:\[219.9%20TO%20220.1\]"
-
-
-Furthermore, you can combine both textual and content based search strategies using the :ref:`sound-combined-search` resource.
-This is useful as it allows you to specify a query or filter both in terms of metadata and audio features.
-For example, you could search for loops with a particular bpm using the following query:
-
-::
-
-    curl "https://freesound.org/apiv2/search/combined/?filter=tag:loop&descriptors_filter=rhythm.bpm:\[119%20TO%20121\]"
+    curl "https://freesound.org/apiv2/search/?filter=pitch:\[219.9%20TO%20220.1\]"


 Downloading sounds
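
Note for client code: the updated overview sends both text queries and descriptor filters to the single ``/apiv2/search/`` endpoint. The following is a minimal sketch of how a client might assemble such a request URL, assuming only the endpoint and the ``query``, ``filter`` and ``token`` parameters shown in the new text. The helper names ``build_search_url`` and ``range_filter`` are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Endpoint as it appears in the updated overview examples.
BASE_URL = "https://freesound.org/apiv2/search/"


def range_filter(field, low, high):
    """Hypothetical helper: build a range expression such as pitch:[219.9 TO 220.1]."""
    return f"{field}:[{low} TO {high}]"


def build_search_url(query=None, filter_expr=None, token="YOUR_API_KEY"):
    """Hypothetical helper: assemble a search URL from an optional query and filter."""
    params = {}
    if query is not None:
        params["query"] = query
    if filter_expr is not None:
        params["filter"] = filter_expr
    params["token"] = token
    # quote_via=quote encodes spaces as %20 (as in the curl examples) rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url(filter_expr=range_filter("pitch", 219.9, 220.1))
```

Percent-encoding the brackets and colon here plays the same role as the backslash escapes in the curl commands: it keeps the range expression intact on its way to the server.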

_docs/api/source/resources.rst

Lines changed: 0 additions & 201 deletions
@@ -255,207 +255,6 @@ Examples
 {{examples_Search}}


-.. _sound-content-search:
-
-Content Search (deprecated)
-=========================================================
-
-::
-
-    GET /apiv2/search/content/
-    POST /apiv2/search/content/
-
-This resource allows searching sounds in Freesound based on their content descriptors.
-
-.. warning:: As of December 2023, this resource is deprecated and will be removed in the comming months. Similar functionality
-    will be achievable using the :ref:`sound-search` resource. Documentation about how to do this will be added in due time
-    but in the meantime, please contact us if you need help with this.
-
-.. _sound-content-search-parameters:
-
-Parameters (content search parameters)
-----------------------------------------------
-
-Content search queries are defined using the following request parameters:
-
-.. rst-class:: fieldstable
-=========================  =========================  ======================
-Name                       Type                       Description
-=========================  =========================  ======================
-target                     string or numeric          This parameter defines a target based on content-based descriptors to sort the search results. It can be set as a number of descriptor name and value pairs, or as a sound id. See below.
-analysis_file              file                       **Experimental** - Alternatively, targets can be specified by uploading a file with the output of the Essentia Freesound Extractor analysis of any sound that you analyzed locally (see below). This parameter overrides ``target``, and requires the use of POST method.
-descriptors_filter         string                     This parameter allows filtering query results by values of the content-based descriptors. See below for more information.
-=========================  =========================  ======================
-
-**The 'target' and 'analysis_file' parameters**
-
-The ``target`` parameter can be used to specify a content-based sorting of your search results.
-Using ``target`` you can sort the query results so that the first results will be the sounds featuring the most similar descriptors to the given target.
-To specify a target you must use a syntax like ``target=descriptor_name:value``.
-You can also set multiple descriptor/value pairs in a target separating them with spaces (``target=descriptor_name:value descriptor_name:value``).
-Descriptor names must be chosen from those listed in :ref:`analysis-docs`.
-Only numerical descriptors are allowed.
-Multidimensional descriptors with fixed-length (that always have the same number of dimensions) are allowed too (see below).
-Consider the following two ``target`` examples::
-
-    (A) target=lowlevel.pitch.mean:220
-    (B) target=lowlevel.pitch.mean:220 lowlevel.pitch.var:0
-
-Example A will sort the query results so that the first results will have a mean pitch as close to 220Hz as possible.
-Example B will sort the query results so that the first results will have a mean pitch as close to 220Hz as possible and a pitch variance as close as possible to 0.
-In that case example B will promote sounds that have a steady pitch close to 220Hz.
-
-Multidimensional descriptors can also be used in the ``target`` parameter::
-
-    target=sfx.tristimulus.mean:0,1,0
-
-Alternatively, ``target`` can also be set to point to a Freesound sound.
-In that case the descriptors of the sound will be used as the target for the query, therefore query results will be sorted according to their similarity to the targeted sound.
-To set a sound as a target of the query you must use the sound id. For example, to use sound with id 1234 as target::
-
-    target=1234
-
-
-There is even another way to specify a target for the query, which is by uploading an analysis file generated using the Essentia Freesound Extractor.
-For doing that you will need to download and compile Essentia (we recommend using release 2.0.1), an open source feature extraction library developed at the Music Technology Group (https://github.com/mtg/essentia/tree/2.0.1),
-and use the 'streaming_extractor_freesound' example to analyze any sound you have in your local computer.
-As a result, the extractor will create a JSON file that you can use as target in your Freesound API content search queries.
-To use this file as target you will need to use the POST method (instead of GET) and attach the file as an ``analysis_file`` POST parameter (see example below).
-Setting the target as an ``analysis_file`` allows you to to find sounds in Freesound that are similar to any other sound that you have in your local computer and that it is not part of Freesound.
-When using ``analysis_file``, the contents of ``target`` are ignored. Note that **this feature is experimental**. Some users reported not being able to generate compatible analysis files.
-
-Note that if ``target`` (or ``analysis_file``) is not used in combination with ``descriptors_filter``, the results of the query will
-include all sounds from Freesound indexed in the similarity server, sorted by similarity to the target.
-
-
-**The 'descriptors_filter' parameter**
-
-The ``descriptors_filter`` parameter is used to restrict the query results to those sounds whose content descriptor values match with the defined filter.
-To define ``descriptors_filter`` parameter you can use the same syntax as for the normal ``filter`` parameter, including numeric ranges and simple logic operators.
-For example, ``descriptors_filter=lowlevel.pitch.mean:220`` will only return sounds that have an EXACT pitch mean of 220hz.
-Note that this would probably return no results as a sound will rarely have that exact pitch (might be very close like 219.999 or 220.000001 but not exactly 220).
-For this reason, in general it might be better to indicate ``descriptors_filter`` using ranges.
-Descriptor names must be chosen from those listed in :ref:`analysis-docs`.
-Note that most of the descriptors provide several statistics (var, mean, min, max...). In that case, the descriptor name must include also the desired statistic (see examples below).
-Non fixed-length descriptors are not allowed.
-Some examples of ``descriptors_filter`` for numerical descriptors::
-
-    descriptors_filter=lowlevel.pitch.mean:[219.9 TO 220.1]
-    descriptors_filter=lowlevel.pitch.mean:[219.9 TO 220.1] AND lowlevel.pitch_salience.mean:[0.6 TO *]
-    descriptors_filter=lowlevel.mfcc.mean[0]:[-1124 TO -1121]
-    descriptors_filter=lowlevel.mfcc.mean[1]:[17 TO 20] AND lowlevel.mfcc.mean[4]:[0 TO 20]
-
-Note how in the last two examples the filter operates in a particular dimension of a multidimensional descriptor (with dimension index starting at 0).
-
-``descriptors_filter`` can also be defined using non numerical descriptors such as 'tonal.key_key' or 'tonal.key_scale'.
-In that case, the value must be enclosed in double quotes '"', and the character '#' (for example for an A# key) must be indicated with the string 'sharp'.
-Non numerical descriptors can not be indicated using ranges.
-For example::
-
-    descriptors_filter=tonal.key_key:"Asharp"
-    descriptors_filter=tonal.key_scale:"major"
-    descriptors_filter=(tonal.key_key:"C" AND tonal.key_scale:"major") OR (tonal.key_key:"A" AND tonal.key_scale:"minor")
-
-You can combine both numerical and non numerical descriptors as well::
-
-    descriptors_filter=tonal.key_key:"C" tonal.key_scale="major" tonal.key_strength:[0.8 TO *]
-
-
-Response
---------
-
-The Content Search resource returns a sound list just like :ref:`sound-list-response`.
-The same extra request parameters apply (``page``, ``page_size``, ``fields``, ``descriptors`` and ``normalized``).
-
-
-Examples
---------
-
-{{examples_ContentSearch}}
-
-
-.. _sound-combined-search:
-
-Combined Search (deprecated)
-=========================================================
-
-::
-
-    GET /apiv2/search/combined/
-    POST /apiv2/search/combined/
-
-This resource is a combination of :ref:`sound-search` and :ref:`sound-content-search`, and allows searching sounds in Freesound based on their tags, metadata and content-based descriptors.
-
-.. warning:: As of December 2023, this resource is deprecated and will be removed in the comming months. Similar functionality
-    will be achievable using the :ref:`sound-search` resource. Documentation about how to do this will be added in due time
-    but in the meantime, please contact us if you need help with this.
-
-Parameters
-------------------
-
-Combined Search request parameters can include any of the parameters from text-based search queries (``query``, ``filter`` and ``sort``, :ref:`sound-search-parameters`)
-and content-based search queries (``target``, ``analysis_file`` and ``descriptors_filter`` and, :ref:`sound-content-search-parameters`).
-Note that ``group_by_pack`` **is not** available in combined search queries.
-
-In Combined Search, queries can be defined both like a standard textual query or as a target of content-descriptors, and
-query results can be filtered by values of sounds' metadata and sounds' content-descriptors... all at once!
-
-To perform a Combined Search query you must at least specify a ``query`` or a ``target`` parameter (as you would do in text-based and content-based searches respectively),
-and at least one text-based or content-based filter (``filter`` and ``descriptors_filter``).
-Request parameters ``query`` and ``target`` can not be used at the same time, but ``filter`` and ``descriptors_filter`` can both be present in a single Combined Search query.
-In any case, you must always use at least one text-based search request parameter and one content-based search request parameter.
-Note that ``sort`` parameter must always be accompanied by a ``query`` or ``filter`` parameter (or both), otherwise it is ignored.
-``sort`` parameter will also be ignored if parameter ``target`` (or ``analysis_file``) is present in the query.
-
-Combined Search requests might **require significant computational resources** on our servers depending on the particular
-query that is made. Therefore, responses might take longer than usual. Fortunately, response times can vary a lot
-with some small modifications in the query, and this is in your hands ;).
-As a general rule, we recommend not to use the text-search parameter ``query``, and instead define metadata stuff in a ``filter``.
-For example, instead of setting the parameter ``query=loop``, try filtering results to sounds that have the tag loop (``filter=tag:loop``).
-Furthermore, you can try narrowing down your filter or filters (``filter`` and ``descriptors_filter``) and possibly make the queries faster.
-Best response times are normally obtained by specifying a content-based ``target`` in combination with text-based and
-content-based filters (``filter`` and ``descriptors_filter``).
-
-
-Response
---------
-
-The Combined Search resource **returns a variation** of the standard sound list response :ref:`sound-list-response`.
-Combined Search responses are dictionaries with the following structure:
-
-::
-
-    {
-        "results": [
-            <sound result #1 info>,
-            <sound result #2 info>,
-            ...
-        ],
-        "more": <link to get more results (null if there are no more results)>,
-    }
-
-The ``results`` field will include a list of sounds just like in the normal sound list response.
-The length of this list can be defined using the ``page_size`` request parameter like in normal sound list responses.
-However, Combined Search responses **do not guarantee** that the number of elements inside ``results`` will be equal to
-the number specified in ``page_size``. In some cases, you might find less results, so **you should verify the length of the list**.
-
-Furthermore, instead of the ``next`` and ``previous`` links to navigate among results, Combined Search responses
-only offer a ``more`` link that you can use to obtain more results. You can think of the ``more`` link as a
-rough equivalent to ``next``, but it does not work by indicating page numbers as in normal sound list responses.
-
-Also, note that ``count`` field is not present in the Combined Search response, therefore you do not know in advance the total
-amount of results that a query can return.
-
-Finally, Combined Search responses does allow you to use the ``fields``, ``descriptors`` and ``normalized``
-parameters just like you would do in standard sound list responses.
-
-
-Examples
---------
-
-{{examples_CombinedSearch}}
-
-
 Sound resources
 >>>>>>>>>>>>>>>

accounts/tests/test_profile.py

Lines changed: 1 addition & 2 deletions
@@ -673,9 +673,8 @@ def setUp(self):
             num_sounds=3, num_packs=3, processing_state="OK", moderation_state="OK"
         )

-    @mock.patch("sounds.models.delete_sound_from_gaia")
     @mock.patch("sounds.models.delete_sounds_from_search_engine")
-    def test_download_sound_count_field_is_updated(self, delete_sounds_from_search_engine, delete_sound_from_gaia):
+    def test_download_sound_count_field_is_updated(self, delete_sounds_from_search_engine):
         # Test downloading sounds increases the "num_sound_downloads" field
         for i in range(len(self.sounds)):
             Download.objects.create(user=self.user, sound=self.sounds[i], license_id=self.sounds[i].license_id)

accounts/tests/test_user.py

Lines changed: 4 additions & 11 deletions
@@ -306,9 +306,8 @@ def test_user_delete_keep_sounds(self):
         self.assertTrue(Sound.objects.filter(user__id=user.id).exists())
         self.assertTrue(Sound.objects.filter(user__id=user.id)[0].is_index_dirty)

-    @mock.patch("sounds.models.delete_sound_from_gaia")
     @mock.patch("sounds.models.delete_sounds_from_search_engine")
-    def test_user_delete_include_sounds(self, delete_sounds_from_search_engine, delete_sound_from_gaia):
+    def test_user_delete_include_sounds(self, delete_sounds_from_search_engine):
         # This should set user's attribute deleted_user to True and anonymize it
         # Sounds and Packs should be deleted (creating DeletedSound objects), but other user content should be preserved
         user = self.create_user_and_content()
@@ -334,11 +333,9 @@ def test_user_delete_include_sounds(self, delete_sounds_from_search_engine, dele
         self.assertTrue(DeletedSound.objects.filter(user__id=user.id).exists())

         delete_sounds_from_search_engine.assert_has_calls([mock.call([i]) for i in user_sound_ids], any_order=True)
-        delete_sound_from_gaia.assert_has_calls([mock.call(i) for i in user_sound_ids], any_order=True)

-    @mock.patch("sounds.models.delete_sound_from_gaia")
     @mock.patch("sounds.models.delete_sounds_from_search_engine")
-    def test_user_delete_sounds_and_user_object(self, delete_sounds_from_search_engine, delete_sound_from_gaia):
+    def test_user_delete_sounds_and_user_object(self, delete_sounds_from_search_engine):
         # This should delete all user content, including the User object, and create a DeletedUser object. This will
         # create DeletedSound objects.
         user = self.create_user_and_content()
@@ -357,12 +354,10 @@ def test_user_delete_sounds_and_user_object(self, delete_sounds_from_search_engi
         self.assertFalse(OldUsername.objects.filter(user__id=user.id).exists())

         delete_sounds_from_search_engine.assert_has_calls([mock.call([i]) for i in user_sound_ids], any_order=True)
-        delete_sound_from_gaia.assert_has_calls([mock.call(i) for i in user_sound_ids], any_order=True)

     @skipIf(True, "This tests a method that should never be called")
-    @mock.patch("sounds.models.delete_sound_from_gaia")
     @mock.patch("sounds.models.delete_sounds_from_search_engine")
-    def test_user_full_delete(self, delete_sounds_from_search_engine, delete_sound_from_gaia):
+    def test_user_full_delete(self, delete_sounds_from_search_engine):
         # This should delete all user content, including the User object and without creating DeletedUser. It does
         # create however DeletedSound objects.
         user = self.create_user_and_content()
@@ -386,7 +381,6 @@ def test_user_full_delete(self, delete_sounds_from_search_engine, delete_sound_f

         calls = [mock.call(i) for i in user_sound_ids]
         delete_sounds_from_search_engine.assert_has_calls(calls, any_order=True)
-        delete_sound_from_gaia.assert_has_calls(calls, any_order=True)

     @mock.patch("general.tasks.delete_user.delay")
     def test_user_delete_include_sounds_using_web_form(self, submit_job):
@@ -506,11 +500,10 @@ def test_delete_user_reasons(self):
             user.profile.delete_user(deletion_reason=reason)
             self.assertEqual(DeletedUser.objects.get(user_id=user.id).reason, reason)

-    @mock.patch("sounds.models.delete_sound_from_gaia")
     @mock.patch("sounds.models.delete_sounds_from_search_engine")
     @mock.patch("forum.models.delete_posts_from_search_engine")
     def test_delete_user_with_count_fields_out_of_sync(
-        self, delete_posts_from_search_engine, delete_sounds_from_search_engine, delete_sound_from_gaia
+        self, delete_posts_from_search_engine, delete_sounds_from_search_engine
     ):
         # Test that deleting a user work properly even when the profile count fields (num_sounds, num_posts,
         # num_sound_downloads and num_pack_downloads) are out of sync. This is a potential issue because if the
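
Note on the signature changes in these tests: stacked `mock.patch` decorators are applied bottom-up, so the decorator closest to the function supplies the first mock argument and the top-most decorator supplies the last. Removing the top-most `delete_sound_from_gaia` patch therefore removes the final parameter of each test method. The sketch below illustrates this with standard-library targets; `os.getcwd` and `os.getpid` are stand-ins chosen for illustration and are not part of this commit.

```python
import os
from unittest import mock


@mock.patch("os.getpid")  # top-most patch: its mock is passed last
@mock.patch("os.getcwd")  # closest to the function: its mock is passed first
def call_with_two_patches(getcwd_mock, getpid_mock):
    getcwd_mock.return_value = "/tmp/fake"
    getpid_mock.return_value = 1234
    return os.getcwd(), os.getpid()


@mock.patch("os.getcwd")  # with the outer patch removed, only one mock is injected
def call_with_one_patch(getcwd_mock):
    getcwd_mock.return_value = "/tmp/fake"
    return os.getcwd()


two = call_with_two_patches()
one = call_with_one_patch()
```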
