Discussion:
Boosting Lucene fields with ARQ and comparing ARQ to SIREn
Mikael Pesonen
2016-06-14 12:02:34 UTC
Hi,

we are building a document search system that consists of a document
database for storing text and Jena for storing all document metadata
(DCMI terms). We need to find documents while boosting certain metadata
fields over content, and also find similar documents with custom
per-field boosting. Search targets both content and metadata. In the
search results we need to return all related metadata stored in Jena.

I have already built a separate Lucene index for content and some
metadata fields, and just noticed that the ARQ extension can do that (yes,
I should have read the Jena documentation first). But is it possible to
boost Lucene fields for search and for similar-document queries when
using ARQ?

Also found this SIREn:
http://semtech2011.semanticweb.com/uploads/handouts/THUR_1110_Hugo_3867.pdf

Does anyone have experience with SIREn? How does it compare to ARQ?

Thanks,
Mikael
--
www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: ***@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND
Andy Seaborne
2016-06-15 20:56:30 UTC
Post by Mikael Pesonen
Hi,
we are making a document search system which consists of document
database for storing text and Jena for storing all document metadata
(DCMI terms). We need to find documents by boosting certain metadata
fields over content, and also find similar documents with custom
boosting of fields. Search is targeted to content and metadata. In
search results we need to return all related metadata stored in Jena.
I have already made a separate Lucene index for content and some
metadata fields and just noticed ARQ extension can do that (yes, should
have read Jena documentation first). But is it possible to boost Lucene
fields for search and similar when using ARQ?
At query time:

The query string can be any Lucene syntax, so the "^" boost operator should work.
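A minimal sketch of the query-time approach above, assuming jena-text passes the `text:query` string to Lucene's standard query parser unchanged. It builds a Lucene query string that boosts metadata fields with `^` and embeds it in a SPARQL `text:query` pattern. The field names (`title`, `subject`, `text`) and the helper names `luceneQuery` and `sparql` are hypothetical; the field names must match whatever fields your jena-text entity map actually defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BoostedTextQuery {

    // Builds a Lucene query string such as "title:(jena)^3.0 OR text:(jena)",
    // applying the "^" boost operator to each field's clause.
    static String luceneQuery(Map<String, Float> fieldBoosts, String terms) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":(" + terms + ")";
            // A boost of 1.0 is Lucene's default, so it is left implicit.
            if (e.getValue() != 1.0f) {
                clause += "^" + e.getValue();
            }
            joiner.add(clause);
        }
        return joiner.toString();
    }

    // Wraps the Lucene query string in a jena-text text:query pattern,
    // escaping backslashes and double quotes for the SPARQL string literal.
    static String sparql(String luceneQuery) {
        String escaped = luceneQuery.replace("\\", "\\\\").replace("\"", "\\\"");
        return "PREFIX text: <http://jena.apache.org/text#>\n"
             + "SELECT ?doc WHERE {\n"
             + "  ?doc text:query \"" + escaped + "\" .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Hypothetical field names; they must exist in the jena-text entity map.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0f);
        boosts.put("subject", 2.0f);
        boosts.put("text", 1.0f);
        String q = luceneQuery(boosts, "semantic search");
        System.out.println(q);
        System.out.println(sparql(q));
    }
}
```

With these example boosts, `q` is `title:(semantic search)^3.0 OR subject:(semantic search)^2.0 OR text:(semantic search)`. Boosting at query time like this leaves the index untouched, which matters here because index-time boosting through ARQ does not appear to be supported.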

At index build time through ARQ:
Sorry - don't know for sure; it doesn't look like it.

Andy
Mikael Pesonen
2016-06-16 08:48:58 UTC
OK, thanks! It looks like we need result highlighting too, so it seems
best to stick with a separate Lucene index for now.

So basically I'm duplicating all the RDF data in a Lucene index, which is
not the most elegant solution...

Br,
Mikael
Osma Suominen
2016-06-16 10:59:13 UTC
Hi Mikael!

I assume you are talking about using (or not) jena-text here?

If needed features such as result highlighting are missing from
jena-text, please consider creating one or more JIRA issues on
issues.apache.org so that they can be discussed and possibly addressed
in future versions. Also pull requests for jena-text are very welcome!

The idea with jena-text is to have text index functionality built into
the RDF store, so that there is no need for an application to maintain
an external Lucene (or similar) index. It obviously exposes only a
subset of Lucene (or Solr, Elasticsearch and the like) capabilities, but
the subset has expanded over time according to users' requirements.

-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi
Mikael Pesonen
2016-06-16 11:07:22 UTC
Hi Osma!

Yes, sorry, I was talking about jena-text. It would be really great to be
able to use one tool for all RDF and text index queries. I'll check how
JIRA works and what has already been submitted there.

Br,
Mikael