Text Search Using Lucene

Discussion:

Kumar,Abhishek

2016-11-28 23:31:03 UTC

Hi,

I am trying to implement text search in Jena via Fuseki. I have followed the documentation and created an assembler file.

But after starting the Fuseki server with the --config parameter, there is no data in the dataset, so both simple queries and text queries return no results.

What I have tried so far:

1. Built the TDB dataset using:
   java -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --tdb=assembler_file data_file

2. Built the index using:
   java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=assembler_file

3. Started the Fuseki server using:
   fuseki-server --config ../assembler_file.ttl

I tried the answer on StackOverflow at http://stackoverflow.com/questions/30447536/fuseki-indexed-lucene-text-search-returns-no-results but using --desc gives a "no service name" error.

Another user had a similar issue a year ago, but there is no solution in that thread either: http://thread.gmane.org/gmane.comp.apache.jena.user/7892

Can someone please help here?

Thanks & Regards

Abhishek Kumar

Kumar,Abhishek

2016-11-28 23:39:17 UTC


This is my config file:

@prefix : <http://localhost/jena_example/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

## Example of a TDB dataset and text index

## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .

## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
# Solr index
text:TextIndexSolr rdfs:subClassOf text:TextIndex .

## ---------------------------------------------------------------

## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    tdb:unionDefaultGraph true ; # Optional
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .

[] rdf:type fuseki:Server ;
    # Server-wide context parameters can be given here.
    # For example, to set query timeouts on a server-wide basis:
    # Format 1: "1000" -- 1 second timeout
    # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
    # See the Javadoc for ARQ.queryTimeout
    # ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;
    # Load custom code (rarely needed)
    # ja:loadClass "your.code.Class" ;
    # Services available. Only explicitly listed services are configured.
    # If there is a service description not linked from this list, it is ignored.
    fuseki:services (
        <#service_text_tdb>
    ) .

<#service_text_tdb> rdf:type fuseki:Service ;
    fuseki:name "Music" ; # http://host:port/Music
    fuseki:serviceQuery "query" ; # SPARQL query service
    fuseki:serviceQuery "sparql" ; # SPARQL query service
    fuseki:serviceUpdate "update" ; # SPARQL update service
    fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph store protocol (read and write)
    fuseki:dataset :text_dataset ;
    .


Rob Vesse

2016-11-29 10:11:14 UTC


One possible problem is that both your database and text index locations are given as relative paths, so depending on which directory you run the commands from, you can get completely different results. I would strongly recommend using absolute paths if possible.
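A minimal sketch of what that change could look like, assuming a hypothetical /srv/jena/ base directory (substitute whatever absolute location you actually use). Only the tdb:location and text:directory values differ from the configuration posted above; the rest of the file stays as it is:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix text: <http://jena.apache.org/text#> .

# /srv/jena/ is a placeholder path, not part of the original configuration.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/srv/jena/DB" ;              # absolute TDB directory
    tdb:unionDefaultGraph true ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/srv/jena/Lucene> ;   # absolute Lucene index directory
    text:entityMap <#entMap> ;
    .
```

Since tdbloader, textindexer and fuseki-server are all given this same file, they should then read and write the same DB and Lucene directories regardless of which directory each command is started from.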

Rob


Osma Suominen

2016-11-29 08:55:33 UTC


Hi Abhishek,

What are the contents of the Lucene index directory (called "Lucene"
according to your configuration) after the text indexing operation?

In other words, is the directory:
- nonexistent or completely empty?
- containing only a few empty or very small (up to a few kilobytes) files?
- containing real index files of several megabytes?

You mention on StackOverflow that you are using Fuseki 2.0.0. That is a
rather old version; could you upgrade to something newer? I'm not sure
which version of jena-text was included in 2.0.0, but it must be
old, and I'm unsure what issues it may have.

-Osma


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi

Osma Suominen

2016-11-29 09:08:34 UTC


Sorry, I misinterpreted the StackOverflow link, so please ignore the part
about the version. I'm assuming you are using a recent Fuseki version.

-Osma
