[katello-devel] search over rest api - interface design

Mon Jul 18 17:59:53 UTC 2011

On 07/18/2011 02:52 AM, Amos Benari wrote:
>
> ----- Original Message -----
>> From: "Justin Sherrill"<jsherril at redhat.com>
>> To: katello-devel at redhat.com
>> Sent: Monday, July 18, 2011 4:28:17 AM
>> Subject: Re: [katello-devel] search over rest api - interface design
>> On 07/17/2011 11:32 AM, Amos Benari wrote:
>>> ----- Original Message -----
>>>> From: "Lukas Zapletal"<lzap at redhat.com>
>>>> To: katello-devel at redhat.com
>>>> Sent: Thursday, July 14, 2011 4:07:24 PM
>>>> Subject: Re: [katello-devel] search over rest api - interface
>>>> design
>>>> On 07/14/2011 12:01 PM, Amos Benari wrote:
>>>    ...
>>>> Well, I am surprised that scoped_search is not a full-text search.
>>>> As a long-time Apache Lucene user I have to recommend this project.
>>> Lucene vs. scoped search
>>> -------------------------
>>> I am not really going to compare them because it's a bit like
>>> comparing oranges to apples.
>>> I am just going to write a short description of scoped search and
>>> try to point out which use case each one fits.
>>>
>>> scoped_search is not an indexer, and it's not a data store either.
>>> It's a lightweight query parser, auto-completer and SQL query builder.
>>> It enables a feature-rich search box in the GUI and a powerful API on
>>> top of any RDBMS; no schema change is needed.
>>> scoped_search on top of an RDBMS can easily handle structured data, and
>>> it returns updated data as soon as it's stored in the database.
>>> No scoring, no ranking, just a simple "order by" clause in the SQL.
>>> It will scale as far as the underlying database does.
>>>
>>> Lucene will index your data and store it in a document format
>>> optimized for searching.
>>> It will scale-out well as long as you can live with eventual
>>> consistency.
>>>
>>> So does Lucene fit our needs?
>>> In Kalpana, Foreman and Candlepin the data set is not large enough
>>> to make scale an issue.
>> Is this true? From earlier discussions I seem to remember that Katello
>> should be able to support ~100,000 systems eventually. I would think
>> this could indicate scale may be an issue?
> Well, I guess that if the rest of the application is going to behave nicely with
>   ~100K systems then a search over the database should be fine.
> I think the main area of concern here is that if we are going to have a very high
> rate of updates to the systems table, it might slow read operations significantly.
> Otherwise, the indexes and cache of the database engine should be able to handle that scale.
> Amos.
>
I wouldn't think a high number of updates would happen very often (e.g. 
you aren't going to update 100,000 systems very often).  But the reading 
of many systems could happen quite a bit (through search).  You may 
schedule some action to ~10,000 systems, but that type of data probably 
won't be searchable (but maybe?).

Currently in scoped_search we search across all fields of an object
when you search.  To extend this thinking to systems (which I assume we
want to do in order to be consistent?), let's say you search for
'192.168.0.2'.  If we do not designate which field this is actually
searching and want to search across all system data, we would need to
search system information (including facts) in Candlepin (to possibly
get what products they are subscribed to), Pulp (in order to search
package profile information), Katello (to search any metadata we have
there), and Foreman (in order to get configuration-related information
and Puppet facts).
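
To make the fan-out concrete, here is a minimal sketch of the difference
between an unscoped term and a scoped one. It is Python over in-memory
SQLite, not scoped_search itself, and it only covers a single table. The
`systems` table, its columns, the sample rows and the simplified
`field = value` syntax are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical searchable columns; real system data would span several services.
SEARCHABLE = ("name", "ip", "description")


def build_query(term):
    """Return (sql, params) for a scoped ('field = value') or unscoped term."""
    if "=" in term:
        field, value = (part.strip() for part in term.split("=", 1))
        if field not in SEARCHABLE:
            raise ValueError(f"unknown field: {field}")
        # Scoped: exactly one column is compared.
        sql = f"SELECT name FROM systems WHERE {field} = ? ORDER BY name"
        return sql, [value]
    # Unscoped: every searchable column gets its own LIKE clause.
    where = " OR ".join(f"{col} LIKE ?" for col in SEARCHABLE)
    sql = f"SELECT name FROM systems WHERE {where} ORDER BY name"
    return sql, [f"%{term}%"] * len(SEARCHABLE)


def search(conn, term):
    """Run the generated query and return the matching system names."""
    sql, params = build_query(term)
    return [row[0] for row in conn.execute(sql, params)]


def make_db():
    """Create a tiny in-memory table with made-up sample systems."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE systems (name TEXT, ip TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO systems VALUES (?, ?, ?)",
        [
            ("web01", "192.168.0.2", "front end"),
            ("db01", "192.168.0.20", "replicates from 192.168.0.2"),
            ("app01", "10.0.0.5", "batch jobs"),
        ],
    )
    return conn
```

With these sample rows, searching for '192.168.0.2' unscoped matches both
web01 and db01 (the address shows up in db01's description and as a prefix
of its IP), while 'ip = 192.168.0.2' returns only web01. Every extra
searchable column, or every extra backend, adds another clause or another
round trip to the unscoped case.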

This seems daunting, and I don't know whether scoped_search scales well
in the above case, but ++ to Brad's idea of doing some performance
testing.
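
A first pass at such a test could be as small as the sketch below: fill a
table with synthetic systems, then time an unscoped all-column search
against a scoped, indexed lookup. Python and in-memory SQLite are only a
stand-in here; the table layout, the default row count and the `timed`
helper are assumptions, and timings from it say nothing about the real
Katello schema or its production database.

```python
import sqlite3
import time


def make_db(n_systems=100_000):
    """Build an in-memory table with n_systems synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE systems (name TEXT, ip TEXT, description TEXT)")
    rows = (
        (
            f"host{i}",
            f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}",
            f"system number {i}",
        )
        for i in range(n_systems)
    )
    conn.executemany("INSERT INTO systems VALUES (?, ?, ?)", rows)
    # Index only the column used by the scoped lookup below.
    conn.execute("CREATE INDEX idx_systems_ip ON systems (ip)")
    return conn


def timed(conn, sql, params):
    """Run a query and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - start


def run(conn, term):
    """Time an unscoped substring search and a scoped exact match."""
    pattern = f"%{term}%"
    unscoped = timed(
        conn,
        "SELECT name FROM systems "
        "WHERE name LIKE ? OR ip LIKE ? OR description LIKE ?",
        (pattern, pattern, pattern),
    )
    scoped = timed(conn, "SELECT name FROM systems WHERE ip = ?", (term,))
    return {"unscoped": unscoped, "scoped": scoped}
```

Calling `run(make_db(), "10.0.1.5")` returns the row count and elapsed
seconds for each variant. A leading-wildcard LIKE generally cannot use an
ordinary B-tree index, so the unscoped variant is the one to watch as the
row count grows.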

-Justin

>>> In Pulp we already have a NoSQL data store (MongoDB) with its own
>>> search interface.
>>>
>>> Adding an external index store for searching can be an interesting
>>> idea, but it comes at the cost of:
>>> - Modeling the searchable data into a document format.
>>> - Updating the index on write; otherwise, when a user updates an item,
>>>   he might not see the changes he made reflected in the GUI.
>>> - Re-implementing the RBAC model in the index data store to
>>>   prevent reading unauthorized data.
>>>
>>> Seems to me a bit of a complex solution.
>>> Amos.
>>>
>>> _______________________________________________
>>> katello-devel mailing list
>>> katello-devel at redhat.com
>>> https://www.redhat.com/mailman/listinfo/katello-devel