[almighty] Fwd: Re: How would the 'search' URLs look like for the ALM Search service ?

Thu Oct 20 18:30:52 UTC 2016

Oops, this was supposed to go to the list.
---------- Forwarded message ----------
From: "Aslak Knutsen" <aslak at redhat.com>
Date: Oct 20, 2016 14:07
Subject: Re: How would the 'search' URLs look like for the ALM Search
service ?
To: "Shoubhik Bose" <shbose at redhat.com>
Cc:

On Thu, Oct 20, 2016 at 12:32 PM, Shoubhik Bose <shbose at redhat.com> wrote:

> Thanks Aslak --
> Please find my comments inline:
>
> On Thu, Oct 20, 2016 at 3:14 PM, Aslak Knutsen <aslak at redhat.com> wrote:
>
>>
>>
>> On Thu, Oct 20, 2016 at 9:23 AM, Shoubhik Bose <shbose at redhat.com> wrote:
>>
>>> Hi folks,
>>>
>>> Wanted to discuss what the /api/search URL(s) would look like.
>>>
>>> To set the scope of the discussion right, I must mention that in this
>>> sprint we are working on supporting search on ID , URL and full text ( for
>>> title and description only ) of a workitem.
>>>
>>>
>>>
>>>    1. The ID search is currently only on workitems. ( But in future it
>>>    would be supported for other objects as well, e.g. Users, etc. )
>>>
>>>    To search by ID --
>>>
>>>    /api/search/q=id:4324
>>>    This is going to look up the workitem table *only*.
>>>
>>>
>>>
>> I would remove the need for "id:". It's fine to support a direct field
>> search, but it shouldn't be required.
>>
>> Any object can have an "id", so it can't 'just' look in the workitem table,
>> even though that is the only one supported for now.
>>
>
>
> If I'm doing an ID lookup, should I be doing a /api/search/q=20596  ?
>

Yes. It's fine to support "id:20596" and "20596", but both should 'at least'
find the Object with id 20596 (the non-direct field search could find more,
as 20596 could match some text somewhere as well).
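
A minimal sketch of what "optional field prefix" could mean when parsing a
single term. The function name parse_term and the KNOWN_FIELDS set are
hypothetical, made up for illustration; they are not part of the service.

```python
# Hypothetical set of field names that may be used as a "field:" prefix.
KNOWN_FIELDS = {"id", "url", "title", "description"}

def parse_term(term):
    """Split 'field:value' into (field, value); field is None for bare terms."""
    field, sep, value = term.partition(":")
    if sep and field in KNOWN_FIELDS:
        return field, value
    # No recognised prefix: treat the whole term as a search across all fields.
    return None, term

print(parse_term("id:20596"))
print(parse_term("20596"))
print(parse_term("http://demo.almighty.io/#/detail/71"))
```

Only a recognised prefix is split off, so a bare URL (whose "http" scheme
also ends in a colon) stays intact and is searched like any other term.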

>
>
>>
>>
>>>
>>>    2. To search by URL --
>>>
>>>    /api/search/q=url:http://demo.almighty.io/#/detail/71
>>>
>>>
>>>
>> I would remove the need for "url:". It's fine to support a direct field
>> search, but it shouldn't be required.
>>
>>
>
> If I'm doing a URL lookup, should I be doing a
> /api/search/q=http://demo.almighty.io/#/detail/71 ?
>
>

Yes
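
One caveat: "#" starts the fragment part of a URL, so a client would have to
percent-encode the search value before sending it. A small sketch using
Python's standard library, purely to illustrate the encoding (which client
library the UI would use is an open assumption):

```python
from urllib.parse import quote, unquote

raw = "http://demo.almighty.io/#/detail/71"

# safe="" makes quote() escape every reserved character, including "/" and "#".
encoded = quote(raw, safe="")
print(encoded)

# The server side can recover the original value by decoding.
assert unquote(encoded) == raw
```

The same encoding covers a literal "+" in the search text, which becomes %2B.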

>
>>>    3. Search by ID *AND* URL is not supported.
>>>    If a q=xxyzz contains both URL and ID, we only pick up the URL and
>>>    discard the rest.
>>>
>>>    Example:
>>>    /api/search/q=id:4324+url:http://demo.almighty.io/#/detail/71
>>>    is effectively
>>>    /api/search/q=url:http://demo.almighty.io/#/detail/71
>>>
>> Personally I wouldn't ignore any fields; I would rather return no
>> result. There is no 'Object' with id 4324 and URL x.
>>
>> e.g. this JQL is perfectly legal, it just doesn't return anything:
>>
>> issue = "ARQ-100" AND issue = "ARQ-200"
>>
>>
>
> Got it , makes sense.
>
>
>>
>>>    4. Free text search is supported only on workitem title and description
>>>
>>>    /api/search/q=title:some title substring+description:some description substring
>>>
>>>
>> I would remove the need for "title:" and "description:". It's fine to
>> support a direct field search, but it shouldn't be required.
>>
>> I would expect "some title substring some description substring" to end
>> up as a search like:
>>
>> *some* AND *title* AND *substring* AND *description* -> search across all
>> available fields for all objects
>>
>>
>
>
> If I'm doing a search as
>
> /api/search/q=20596 my_title my_description http://some_url
> ( here the spaces are AND )
>
> As per your below comment on delimiters,
>
> We get the following tokens:
>
> 1. 20596
> 2. my_title
> 3. my_description
> 4. http://some_url
>
> How would you recommend the search to go on from here?
>
>
>

That is a very good question.

1. We can detect that this 'might' be an ID  (ref 5.)
4. We can detect this is a known URL and extract the ID (ref 6.)

1. 2. and 3. are random pieces of text, while 5. and 6. might be IDs.

I think we can start off with the assumption that the user wants to find
1.-6. somewhere in the document, but not necessarily in the same 'field'.
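
As a rough sketch of the detection step: every token stays usable as plain
text, and some tokens additionally look like an ID or a URL. The function
name classify and the "all digits means possible ID" rule are assumptions
for illustration only.

```python
import re

def classify(token):
    """Return the set of ways a token could be interpreted."""
    kinds = {"text"}  # every token can always be searched as plain text
    if token.isdigit():
        kinds.add("id")  # assumption: IDs are purely numeric
    if re.match(r"https?://", token):
        kinds.add("url")
    return kinds

tokens = "20596 my_title my_description http://some_url".split()
for token in tokens:
    print(token, sorted(classify(token)))
```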

Assuming we have a document with

{
  id: 20596,
  title: "Label identification pattern",
  description: "As a User I want to have the labels / tags configured with
a predefined set of colors and options"
}

I would expect a query like: "20596 label user http://some_url/20596" to
find that document.
(note, the complete doc matches the complete query, but the individual fields
do not match the complete query)

I would expect match on: "20596"
I would expect match on: "user label"
I would expect match on: "http://demo.almighty.io/20596"
I would expect match on: "labels users"

I would not expect match on: "20595 http://demo.almighty.io/20596"
(assuming 20595 doesn't mention http://demo.almighty.io/20596 as text
somewhere)

I would not expect match on: "user label authentication"
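
To make the expectations above concrete, here is a toy matcher. It is a
sketch under loud assumptions, not a proposal for the real implementation:
stem() is a crude stand-in for a proper stemmer (it only drops one trailing
"s"), a URL token is reduced to the numeric ID at its end, and all fields
are pooled so that a query can match across them.

```python
import re

doc = {
    "id": 20596,
    "title": "Label identification pattern",
    "description": "As a User I want to have the labels / tags configured "
                   "with a predefined set of colors and options",
}

def stem(word):
    # Crude stand-in for a real stemmer: lowercase and drop one trailing "s".
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 1 else word

def doc_terms(doc):
    # Pool the terms of every field, so a query may match across fields.
    text = " ".join(str(value) for value in doc.values())
    return {stem(w) for w in re.findall(r"\w+", text)}

def term_for(token):
    # A URL token contributes the trailing ID it points at; others are stemmed.
    url = re.match(r"https?://\S*/(\d+)$", token)
    return url.group(1) if url else stem(token)

def matches(doc, query):
    # Tokenize on whitespace and require every token to match (AND).
    terms = doc_terms(doc)
    return all(term_for(token) in terms for token in query.split())

for query in ["20596", "user label", "labels users",
              "http://demo.almighty.io/20596",
              "20595 http://demo.almighty.io/20596",
              "user label authentication"]:
    print(query, "->", matches(doc, query))
```

With this toy document the first four queries match and the last two do not,
because "20595" and "authentication" appear nowhere in it.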

>
>>>    5. The delimiter for multiple clauses ( we shall support "AND" now )
>>>    will be "+" ( inspired by GitHub ), as already used above. We need to
>>>    consider situations where the string itself contains a "+" by escaping it
>>>    properly.
>>>
>>>
>> I think tokenizing on 'space' and using that as AND would work for now.
>>
>>
>>>
>>>    6. If the search query contains a mix of ID, URL, title, description:
>>>    The order in which we are going to look for fields to make a
>>>    decision on the search is:
>>>
>>>    1. URL
>>>    2. ID
>>>    3. title and description ( for free text search )
>>>
>>>
>>>
>>>
>>>    Let me know your thoughts :)
>>>
>>>
>>>    -
>>>    Shoubhik
>>>
>>>
>>>
>>>
>>
>> -aslak-
>>
>
>
> ---
> Shoubhik
>