Discussion:
Extending Jena Text to Support ElasticSearch as Indexing/Querying Engine
anuj kumar
2017-02-14 11:32:59 UTC
Permalink
Hi,
I am working on an application where my data (in N-Triples format) is
stored in HBase. We now want to provide free-text search to our end users
and have decided to use ElasticSearch. (I am not listing the actual reasons
here to keep things simple, but in gist, using Lucene or Solr directly has
been ruled out.)

Thus I started looking at how Jena works with existing indexing
capabilities (Lucene and Solr) to extend Jena to support ElasticSearch.
I figured out that I probably need to perform the following steps/changes:


- Create a Java class that implements the org.apache.jena.query.text.TextIndex
interface. I called this class TextIndexES.java. For now it is simply a
copy-paste of the TextIndexLucene class.
- Create a Java class that extends the
org.apache.jena.assembler.assemblers.AssemblerBase class. I called this
class TextIndexESAssembler.java.
- Update the org.apache.jena.query.text.TextDatasetFactory class to
include a new method:

public static TextIndex createESIndex(Directory dir,
TextIndexConfig config) {}

This method instantiates the TextIndexES class when no multilingual
support is specified.

- Create a TTL (Turtle) assembler file that includes the ES index mapping
configuration.
- Create a simple test that tries to load this TTL file.

The Test fails with the following error:

org.apache.jena.assembler.exceptions.NoSpecificTypeException: the root file:///Users/LT-Mac-Akumar/personal-projects/jena/jena-text/testing/TextQuery/text-config-es.ttl#indexES has no most specific type that is a subclass of ja:Object

doing:
root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler

at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:125)
at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:81)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:62)
at org.apache.jena.query.text.assembler.TextDatasetAssembler.open(TextDatasetAssembler.java:42)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:143)
at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.open(AssemblerGroup.java:130)
at org.apache.jena.assembler.assemblers.AssemblerGroup$ExpandingAssemblerGroup.open(AssemblerGroup.java:81)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:39)
at org.apache.jena.assembler.assemblers.AssemblerBase.open(AssemblerBase.java:35)
at org.apache.jena.query.DatasetFactory.assemble(DatasetFactory.java:290)
at org.apache.jena.query.DatasetFactory.assemble(DatasetFactory.java:264)
at org.apache.jena.query.text.TestBuildTextDataset.createAssembler(TestBuildTextDataset.java:124)
at org.apache.jena.query.text.TestBuildTextDataset.buildText_99(TestBuildTextDataset.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)


Can someone please point out what I am doing wrong or what I may have
missed updating?
For reference, all the classes that I created or modified are attached, so
that, if required, the issue can be reproduced.

Thanks and looking forward to some pointers/resolutions.
--
*Anuj Kumar*
Lorenz B.
2017-02-14 11:46:19 UTC
Permalink
Attachments do not work on this mailing list, thus, it's better to share
resources via some service like Github etc.
--
Lorenz Bühmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center
anuj kumar
2017-02-14 12:06:44 UTC
Permalink
Thanks, Lorenz, for the quick heads-up. Here is the GitHub link to the listed
files: https://github.com/EaseTech/jena-text

Thanks,
Anuj Kumar

Osma Suominen
2017-02-14 12:47:25 UTC
Permalink
Hi Anuj,

I'm not sure what the problem is - maybe others more familiar with the
assembler can help - but would it be helpful to work on a fork of the
Jena source tree instead of a separate project? Then all the scaffolding
to load the right classes etc. would already be in place. Maybe you are
already doing it that way (I see that the package declaration is
"package org.apache.jena.query.text;") but it's not obvious from the
files you posted to GitHub.

If you make a good implementation of jena-text with ES (including
writing unit tests), I don't see why it couldn't later be merged to Jena
itself. If you were working on a fork, you could then do a pull request
so that it can be reviewed and, if appropriate, merged.

-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi
anuj kumar
2017-02-14 13:03:14 UTC
Permalink
Thanks Osma.
I was working on a local copy of the Jena source code initially.
I have now forked Jena and added the files I described in my previous
email, to ease debugging by more experienced Jena developers.
The forked repo can be found here: https://github.com/EaseTech/jena

You will see that most of the code in these new files is simply copied from
the existing Lucene-based files. My first goal is to instantiate the
TextIndexES class and get the test case working. I will then move on to
implementing the actual ES code, which IMO should be much faster.

Thanks,
Anuj Kumar
Osma Suominen
2017-02-14 13:13:33 UTC
Permalink
Post by anuj kumar
I was working on a local copy of Jena source code initially.
I have now forked Jena and added my specific files as I specified in my
previous email to ease debugging by more experienced Jena developers.
The forked repo can be found here : https://github.com/EaseTech/jena
You will see that most of the code in these new files is simply the one
that existed for Lucene based files. My first goal is to instantiate the
TextIndexES file and get the test case working. I will then move to
implement the actual ES code, which IMO, should be much faster.
Great, I hope you get this working!

If you feel that you are duplicating existing Lucene code in your ES
implementation, consider abstracting that out into e.g. a common
superclass instead. This is something that already bothers me in the
current Lucene vs Solr implementations - there's even a "DRY" comment in
the code showing that somebody else has thought about it too.

Also it might be helpful to try to reuse all the Lucene unit tests for
ES as well, if you can figure out a way to do that.
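
As a rough illustration of the two suggestions above (a common superclass and reusable tests), here is a minimal, self-contained sketch. It does not use Jena's real classes: AbstractTextIndexSketch, InMemoryTextIndex and TextIndexContract are hypothetical names, and the in-memory backend merely stands in for a Lucene- or ES-specific subclass. The shared validation lives in the base class, and the contract check is written once against that base class so each backend could run it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical backend-neutral base class: shared validation and normalisation live here. */
abstract class AbstractTextIndexSketch {
    /** Template method: common checks run once, storage is delegated to the backend. */
    final void addEntity(String entityUri, String text) {
        if (entityUri == null || entityUri.isEmpty()) {
            throw new IllegalArgumentException("entity URI is required");
        }
        store(entityUri, text == null ? "" : text.trim());
    }

    /** Lower-cases the term once so every backend sees the same input. */
    final List<String> query(String term) {
        return search(term.toLowerCase(Locale.ROOT));
    }

    protected abstract void store(String entityUri, String text);

    protected abstract List<String> search(String lowerCaseTerm);
}

/** Toy in-memory backend standing in for a Lucene- or ES-specific subclass. */
final class InMemoryTextIndex extends AbstractTextIndexSketch {
    private final List<String[]> docs = new ArrayList<>();

    @Override
    protected void store(String entityUri, String text) {
        docs.add(new String[] { entityUri, text.toLowerCase(Locale.ROOT) });
    }

    @Override
    protected List<String> search(String lowerCaseTerm) {
        List<String> hits = new ArrayList<>();
        for (String[] doc : docs) {
            if (doc[1].contains(lowerCaseTerm)) {
                hits.add(doc[0]);
            }
        }
        return hits;
    }
}

/** Contract check written against the base class so that every backend can reuse it. */
final class TextIndexContract {
    static boolean findsIndexedLabel(AbstractTextIndexSketch index) {
        index.addEntity("http://example.org/a", "  Apache Jena ");
        return index.query("JENA").equals(Collections.singletonList("http://example.org/a"));
    }
}
```

Each backend's test class would then only need to construct its own index and pass it to TextIndexContract.findsIndexedLabel.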

-Osma
anuj kumar
2017-02-14 13:15:42 UTC
Permalink
I will do it. But I first need to get the simple test working in order to
move forward. I hope someone here can help me.

Thanks,
Anuj Kumar
Osma Suominen
2017-02-14 13:22:48 UTC
Permalink
Post by anuj kumar
I will do it. But I need to first get the simple test working in order to
move forward. I hope someone here can help me.
Maybe you need to add an implementWith declaration to TextAssembler.java?
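
For readers who hit the same NoSpecificTypeException: the assembler machinery maps each resource's rdf:type to a registered assembler, and a type that nobody registered cannot be built. The sketch below is a simplified, self-contained model of that lookup, not Jena's actual code. AssemblerRegistry and AssemblerRegistryDemo are hypothetical names, the ES type URI is an assumption made for illustration, and the string results stand in for real index objects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Simplified stand-in for an assembler group: maps an rdf:type URI to a builder. */
final class AssemblerRegistry {
    private final Map<String, Function<String, Object>> byType = new HashMap<>();

    /** Mirrors the idea behind implementWith: associate a type with the code that builds it. */
    AssemblerRegistry implementWith(String typeUri, Function<String, Object> assembler) {
        byType.put(typeUri, assembler);
        return this;
    }

    /** Builds the object for a root resource, or fails when its type was never registered. */
    Object open(String rootUri, String typeUri) {
        Function<String, Object> assembler = byType.get(typeUri);
        if (assembler == null) {
            throw new IllegalStateException(
                "the root " + rootUri + " has no registered assembler for type " + typeUri);
        }
        return assembler.apply(rootUri);
    }
}

final class AssemblerRegistryDemo {
    // Type URIs used only for this illustration; the ES one is an assumption.
    static final String LUCENE_TYPE = "http://jena.apache.org/text#TextIndexLucene";
    static final String ES_TYPE = "http://jena.apache.org/text#TextIndexES";

    public static void main(String[] args) {
        AssemblerRegistry registry = new AssemblerRegistry()
            .implementWith(LUCENE_TYPE, root -> "lucene index for " + root);
        try {
            // The ES type is not registered yet, so opening the root fails.
            registry.open("#indexES", ES_TYPE);
        } catch (IllegalStateException e) {
            System.out.println("before registration: " + e.getMessage());
        }
        // Registering the new type is the step that was missing.
        registry.implementWith(ES_TYPE, root -> "es index for " + root);
        System.out.println("after registration: " + registry.open("#indexES", ES_TYPE));
    }
}
```

In the real code base the equivalent step would be one more implementWith call for the new index type next to the existing Lucene and Solr registrations.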

-Osma
anuj kumar
2017-02-14 13:27:13 UTC
Permalink
My saviour, Osma. It worked :)
Thanks for pointing that out. I really appreciate it.
I am now moving on to my next task: implementing the actual code for the
ElasticSearch integration with Jena.

Thanks once again.

Anuj Kumar
anuj kumar
2017-02-27 12:08:24 UTC
Permalink
Hi All,

*Apologies for the long email.*

As some of you know, I have been working on extending Jena to support
ElasticSearch for text indexing (in addition to Lucene and Solr).

I have reached a point where I have basic (read: non-production) code that
can index rdfs:label text data into ElasticSearch 5.2.1.
The code is working and testable. You simply have to download ElasticSearch
5.2.1 and run it locally to execute the test within the ES
implementation.
The code is NOT production-ready; it is just a PoC. You can find the
first cut of the code here: https://github.com/EaseTech/jena (look inside
the module jena-text-es).

I need feedback from the Jena maintainers and community on the
structure of the code, as this is important for me to finalize before I
move on to implementing the full-blown, production-ready code for the Jena
Text ElasticSearch integration.

Here is the short description of what I did and the reasoning behind it:

1. Created a separate module, *jena-text-es*, that extends *jena-text*
and excludes all the Lucene-related and Solr-related dependencies. I had to
do this because the *jena-text* module depends on Lucene 4.9.1, whereas
ElasticSearch 5.2.1 depends on Lucene 6.4.1. Putting the ElasticSearch
code inside the *jena-text* module resulted in Lucene version conflicts,
hence the need for a separate module.
2. As a side effect of creating a separate module, I had to extend the
TextDatasetFactory class in the *jena-text* module to include
methods for creating ElasticSearch index objects. I named the subclass
ESTextDataSetFactory. At this point I do not know whether this is the
right approach, or whether Jena ALWAYS instantiates index objects using the
TextDatasetFactory class. My initial investigation showed it is fine,
but I would like the Jena experts to confirm.
3. I have tested a simple integration with ElasticSearch by defining a test
class under
src/test/java/org/apache/jena/query/text/TestBuildTextDataSet.java. You can
run this test after first starting an instance of ElasticSearch 5.2.1 locally.

*My Queries*
1. Is it acceptable to the Jena community that I create a separate module
for ElasticSearch support and call it *jena-text-es*?
2. Is it fine if I extend the TextDatasetFactory class within the
*jena-text-es* module?

*Food for Thought*

While implementing the ElasticSearch integration, I could not help but
notice that the *jena-text* module contains not only the core classes for
performing text queries, but also technology-specific (e.g. Lucene and
Solr) classes.
IMO, these should be separated and defined in their own modules to enable
separation of concerns.
This would also make maintenance easier and simplify adding extensions later
on.

I think we should have the following modules:

jena-text - Containing core Jena text specific classes that are technology
agnostic.
jena-text-lucene - Lucene specific implementation of Jena-Text
jena-text-solr - Solr specific implementation of Jena-Text
jena-text-es - ElasticSearch specific implementation of Jena-Text

What does everyone think?

Thanks,
Anuj Kumar
Osma Suominen
2017-02-28 08:20:40 UTC
Permalink
Hi Anuj!

Congratulations on getting the PoC working!

I'm not sure I like the idea of having a separate jena-text-es module.

Am I right that your main concern with creating a separate module is
that the Elasticsearch client library requires a newer Lucene version
than what jena-text currently uses? In that case, I think the solution
should be upgrading the Lucene version everywhere, i.e. the current
jena-text and jena-spatial modules. This work has already started (see
JENA-1250) but it has recently stalled and has not yet been merged.

I don't think it should be a problem to have multiple implementations
(Lucene, Solr, ES) within the same module. Ideally a lot of the
infrastructure could be shared (which is of course possible also with
separate modules, as you have done), and I would hope that also the unit
tests could be reused for the different implementations, although that
is currently not the case (the unit tests only target Lucene).

The Solr side of jena-text has unfortunately bitrotted even more than
the Lucene support. I've previously suggested that it should be removed
entirely [1], but there were no responses to my suggestion at the time.

-Osma
A. Soroka
2017-02-28 15:12:22 UTC
Permalink
I second Osma's congrats!

Do we want to take this into account:

https://lists.apache.org/thread.html/***@1431107516@%3Cdev.jena.apache.org%3E

? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?

I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)


---
A. Soroka
The University of Virginia Library
Osma Suominen
2017-02-28 16:23:40 UTC
Permalink
Post by A. Soroka
? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
Last time I tried it (it was a while ago) I couldn't figure out how to
get it running... If you could just try that with some toy data, then
your data point would be a lot less blurry :) I haven't used Solr for
anything, so I'm not very familiar with how to set it up, and the
jena-text instructions are pretty vague unfortunately.

-Osma
anuj kumar
2017-02-28 22:47:03 UTC
Permalink
Hi,

My 2 Cents :

The reason I proposed separate modules for Lucene, Solr and ES is
exactly to avoid the "all or nothing" approach we would need to take if we
club them all together. If they stay together and in the near future I
want to upgrade ES to another version, I would also need to upgrade Lucene
and Solr again, and possibly any other implementation that may have been
added in the meantime. As we all know, this means weeks, if not months, of
work to get the changes released. Personally, this would demotivate me from
doing anything, and I would probably start maintaining my own version of
jena-text, as that would be much simpler than upgrading, testing and, in
the process, owning (read: fixing bugs in) the upgrade for each and every
technology.

If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the Solr
implementation.

We can start with a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.

Again this is just a suggestion based on my limited industry experience.

Thanks,
Anuj Kumar
Osma Suominen
2017-03-01 06:27:33 UTC
Permalink
Hi Anuj,

I understand your concerns. However, we also need to balance between the
needs of individual modules/features and the whole codebase. I'm willing
to put in the effort to keep the other modules up to date with newer
Lucene versions. Lucene upgrade requirements are well documented; the
only hitches seen in JENA-1250 were related to how jena-text (ab)used
some Lucene features that were dropped from newer versions.

A perhaps stupid question to more experienced Java developers: is it
even possible to mix modules that depend on different versions of the
Lucene libraries within the same project? In my (quite limited)
understanding of Java projects and libraries, this requires special
arrangements (e.g. shading) as the Java package/class namespace is
shared by all the code running within the same JVM.

So can you create, say, a Fuseki build that contains the current
jena-text module (depending on Lucene 4.x) and the new jena-text-es
module (depending on Lucene 6.4.1) without any compatibility issues?

-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi
anuj kumar
2017-03-01 09:03:04 UTC
Permalink
Hi Osma,

I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules, but I will
not delve into those now. I am not enough of a Jena expert to say convincingly
that it is possible without any hiccups, but I can take a guess and say
that it is indeed possible :)

For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"

I do not fully understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how. But before
that, here are some assumptions I want to call out explicitly.

*Assumptions:*
1. At a given point in time, only a single indexing technology is used for
text-based indexing and searching via Jena. This means that we will
use either the Lucene implementation OR the Solr implementation OR the ES
implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.

Based on these assumptions, it is possible to create a build that contains
the common jena-text classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
of the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency conflict by using the Lucene
libraries that we included in our ES specific pom. Have a look at the pom of
the jena-text-es module here to see how it can be done:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml


Thanks,
Anuj Kumar
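
One way to check which Lucene JAR a build like this actually ends up with is to ask the JVM at runtime. The following is only a minimal sketch, not part of jena-text or jena-text-es: the class name WhichJar and its two helper methods are made up for illustration, and it assumes the Lucene JAR declares an Implementation-Version in its manifest. The only Lucene name used is org.apache.lucene.index.IndexWriter, which is looked up by name so that the sketch also runs when Lucene is absent.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper (not part of Jena): reports where a class
// was loaded from and which version its JAR manifest declares.
public class WhichJar {

    /** Location (JAR or directory) the class was loaded from, if known. */
    public static Optional<URL> origin(String className) {
        try {
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes typically have no CodeSource at all.
            return src == null ? Optional.empty() : Optional.ofNullable(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty(); // class is not visible to this classloader
        }
    }

    /** Implementation-Version from the JAR manifest, if the JAR declares one. */
    public static Optional<String> manifestVersion(String className) {
        try {
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            Package p = c.getPackage();
            return p == null ? Optional.empty() : Optional.ofNullable(p.getImplementationVersion());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Default probe: a class that lives in lucene-core.
        String probe = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        System.out.println(probe);
        System.out.println("  loaded from: "
                + origin(probe).map(URL::toString).orElse("(unknown or not on classpath)"));
        System.out.println("  version:     "
                + manifestVersion(probe).orElse("(not declared)"));
    }
}
```

Run with the same classpath as the application under test, this should show a single Lucene JAR. If both the 4.x and the 6.x JARs are on a flat classpath, it reports only the one the classloader found first, which is the situation the build needs to avoid.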
Osma Suominen
2017-03-01 14:03:30 UTC
Permalink
Hi Anuj!

Thanks for the clarification.

However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just a
convenient way to structure a Java project. Maven cannot change the
fact that at runtime, module divisions don't really matter (except that
they usually correspond to package sub-namespaces). The Java
classloader only sees a single, flat package/class namespace and a set
of compiled classes (usually within JARs) on the classpath that it needs
to check to find the right classes. If there are two versions of the
same library (e.g. Lucene) with overlapping class names, that's going to
cause trouble. The only way around that is to shade some of the
libraries, i.e. rename them so that they end up in another,
non-conflicting namespace. Apparently Elasticsearch also did some of
that in the past [1] but nowadays tries to avoid it.

Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES
backends? Or how do you run something like Fuseki that contains (in a
single big JAR) both the jena-text and jena-text-es modules with all
their dependencies, one of which requires the Lucene 4.x classes and the
other one the Lucene 6.4.1 classes? How do you ensure that only one of
them is used at a time, and that the Java classloader, even though it
has access to both versions of Lucene, only loads classes from the
single, correct one and not the other? Or do you need to have separate
"Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with
two Lucene versions within the same Fuseki JAR?

-Osma

[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
A. Soroka
2017-03-01 14:36:08 UTC
Permalink
Osma--

The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.

You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".

This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)

Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there to this purpose, but I'm not aware of any myself.

Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.

---
A. Soroka
The University of Virginia Library

[1] http://openjdk.java.net/projects/jigsaw/
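
The "ANY GIVEN classloader" point above can be seen with nothing but the JDK. The following is a minimal sketch, not OSGi and not anything Jena does today: the class ClassLoaderVisibility and its helpers are hypothetical names. It builds a URLClassLoader whose parent is null, so it delegates only to the bootstrap loader, and shows that such a loader cannot see application classes even though they are loaded in the same JVM.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class: class visibility is a property of each
// classloader, not of the JVM as a whole.
public class ClassLoaderVisibility {

    /** A loader that sees only the given JARs plus the JDK bootstrap classes. */
    public static URLClassLoader isolatedLoader(URL... jars) {
        // parent == null: do not delegate to the application classpath.
        return new URLClassLoader(jars, null);
    }

    /** True if the given loader can resolve the class name. */
    public static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String self = ClassLoaderVisibility.class.getName();
        ClassLoader app = ClassLoaderVisibility.class.getClassLoader();
        try (URLClassLoader isolated = isolatedLoader()) {
            System.out.println("app loader sees this class:      " + canSee(app, self));
            System.out.println("isolated loader sees this class: " + canSee(isolated, self));
            System.out.println("isolated loader sees String:     "
                    + canSee(isolated, "java.lang.String"));
        }
    }
}
```

Passing the URLs of two different Lucene JARs to two such loaders would, in principle, keep both versions apart in one JVM. All code touching each version would then have to be loaded through the matching loader, which is the kind of hand-rolled classloader management described above as a rabbit hole.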
anuj kumar
2017-03-01 14:59:03 UTC
Permalink
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
That said, I prefer doing it in a specific way because, IMO, it is
modular, which makes it much easier to maintain in the long run. But again,
it may not be the quickest approach.

I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). This means that I will be
maintaining the ES code extension to Jena Text, at least locally, for
some time to come. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a separate
module. Until Lucene and Solr are upgraded to the latest
version, I will have to maintain a separate module for jena-text-es.

Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Post by Osma Suominen
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
Post by Osma Suominen
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules, but I will
not delve into those now. I am not enough of a Jena expert to say
convincingly that it is possible without any hiccups, but I can take a
guess and say that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I do not actually understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how, but first
some assumptions that I want to call out explicitly.
*Assumptions:*
1. At a given point in time, only a single indexing technology is used for
text-based indexing and searching via Jena. This means that we will use
either the Lucene implementation OR the Solr implementation OR the ES
implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1-specific classes,
but only on jena-text classes, if at all.
Based on these assumptions, it is possible to create a build that contains
the common jena-text classes + the ES-specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
of the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES-specific pom. Have a look at the pom:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
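As a rough illustration of why the directly declared Lucene version ends up
in the build, here is a simplified model of Maven's dependency mediation
(the nearest declaration wins; at equal depth, the first declaration wins).
The class and method names are invented for this sketch, the jena-text
version string is a placeholder, and real Maven also handles scopes,
exclusions and dependencyManagement, all of which are ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearestWinsDemo {

    /** One node of a dependency tree: an artifact, its version, its own dependencies. */
    public static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        public Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Walks the tree level by level, in declaration order. The first version
     * seen for an artifact is kept; later (deeper or later-declared) ones are
     * dropped together with their subtrees.
     */
    public static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        List<Dep> level = directDependencies;
        while (!level.isEmpty()) {
            List<Dep> next = new ArrayList<>();
            for (Dep dep : level) {
                boolean won = chosen.putIfAbsent(dep.artifact, dep.version) == null;
                if (won) {
                    next.addAll(dep.children);
                }
            }
            level = next;
        }
        return chosen;
    }

    /** Models a jena-text-es style pom: Lucene 6.4.1 directly, 4.9.1 via jena-text. */
    public static Map<String, String> esModuleResolution() {
        Dep lucene = new Dep("lucene-core", "6.4.1");
        Dep jenaText = new Dep("jena-text", "placeholder",
                new Dep("lucene-core", "4.9.1"));
        return resolve(Arrays.asList(lucene, jenaText));
    }

    public static void main(String[] args) {
        System.out.println(esModuleResolution());
    }
}
```

Note that mediation only selects which Lucene JAR is packaged. It does not
by itself guarantee that jena-text classes compiled against 4.9.1 behave
correctly when run against 6.4.1.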
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Post by Osma Suominen
Hi Anuj,
I understand your concerns. However, we also need to balance between the
needs of individual modules/features and the whole codebase. I'm willing to
put in the effort to keep the other modules up to date with newer Lucene
versions. Lucene upgrade requirements are well documented, the only hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it even
possible to mix modules that depend on different versions of the Lucene
libraries within the same project? In my (quite limited) understanding of
Java projects and libraries, this requires special arrangements (e.g.
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.
So can you create, say, a Fuseki build that contains the current jena-text
module (depending on Lucene 4.x) and the new jena-text-es module (depending
on Lucene 6.4.1) without any compatibility issues?
-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly to avoid the "all or nothing" approach we need to take if we club
them all together. If they stay together and in the near future I want to
upgrade ES to another version, I will also need to upgrade Lucene and Solr
again, and possibly any other implementation that may have been added in
the meantime. As we all know, this means weeks of work, if not months, to
get the changes released. This would personally demotivate me from doing
anything, and I would probably start maintaining my own version of
jena-text, as that would be much simpler than upgrading, testing and, in
the process, owning (read: fixing bugs in) the upgrade for each and every
technology.
If they are developed as separate modules, they can evolve independently of
each other, and we can avoid situations where we can't upgrade to the
latest version of Lucene because we do not know what effect it will have on
the Solr implementation.
We can start with a separate module for Jena Text ES and see how things go.
If they go well, we could extract Solr and Lucene out of Jena Text.
Again, this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
%3E
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point.
:)
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
--
*Anuj Kumar*
anuj kumar
2017-03-01 15:33:24 UTC
Permalink
BTW, I have one more question:

How do I add more than one field to be indexed in my Index?
Basically, if I want to index both rdfs:label and rdfs:comment in the same
index document, how do I do it?

I tried:

EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());

But it doesn't work. Could you please point me to a way to do this? This
is an important piece of functionality that I need.

Thanks,
Anuj Kumar
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I do have a personal preference for doing it in a specific way because, IMO,
it is modular, which makes it much easier to maintain in the long run. But
then again, it may not be the quickest approach.
I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). This means that I will be maintaining
the ES extension to Jena Text at least locally for the time being. I would
be more than happy to contribute to the Jena community whatever is required
to have a proper ElasticSearch implementation in place, whether within the
jena-text module or as a separate module. Until Lucene and Solr are upgraded
to the latest version, I will have to maintain a separate module for
jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
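The "any given classloader" point can be seen with nothing but the JDK. The
sketch below is a toy, not a proposal for Jena: LoaderIsolationDemo, Payload
and IsolatingLoader are names invented here, and Payload merely stands in
for a library class such as a Lucene type. Two loader instances each define
their own copy of the same class name, and the JVM treats the results as
different classes. It assumes the compiled class bytes can be read back as
a classpath resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoaderIsolationDemo {

    /** Stand-in for a library class (hypothetical, for illustration only). */
    public static class Payload { }

    /** Defines the target class itself instead of delegating to its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(String target, ClassLoader parent) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // usual parent-first path
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled bytes of the class through the parent loader. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if two loaders yield same-named classes that are distinct to the JVM. */
    public static boolean sameNameButDistinct() {
        String name = Payload.class.getName();
        ClassLoader parent = LoaderIsolationDemo.class.getClassLoader();
        try {
            Class<?> first = new IsolatingLoader(name, parent).loadClass(name);
            Class<?> second = new IsolatingLoader(name, parent).loadClass(name);
            return first.getName().equals(second.getName())
                    && first != second
                    && first != Payload.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class bytes not found on the classpath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same name, distinct classes: " + sameNameButDistinct());
    }
}
```

This is roughly the property that lets a framework give each module its own
version of a library. The loading itself is the easy part; deciding which
loader serves which module is where the real effort goes.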
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
--
*Anuj Kumar*
Osma Suominen
2017-03-01 18:36:03 UTC
Permalink
Hi Anuj!

Generally I use assembler descriptions to configure the jena-text index.
An example with multiple properties (SKOS label properties) is here:
https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index

For examples on how to use assembler descriptions from Java code, take a
look at the jena-text unit tests. They generally contain a snippet of
assembler definition that configures the text index in a particular way,
then test that it does what it should when using that configuration.

You didn't provide a full example. What is your data and what query did
you use? What results did you expect? What happened instead?

One possible problem in your configuration is that you have set the
primary predicate to rdfs:label, but not set a field for it. Try adding
this:

entDef.set("label", RDFS.label.asNode());

To query any field other than the default one, you need to specify the
predicate at query time. With your configuration, it should be possible to
query rdfs:comment values like this:

?s text:query (rdfs:comment "word") .
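
Pulling the snippets from this thread together, a configuration sketch
might look like the following. It only uses the EntityDefinition calls
already shown in this thread; the class name and the field-name constants
are placeholders invented for illustration, and the sketch has not been
run against an Elasticsearch-backed index.

```java
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.vocabulary.RDFS;

public class MultiFieldEntityDefinition {

    // Placeholder field names (assumptions, not values from this thread).
    static final String ENTITY_FIELD = "uri";
    static final String PRIMARY_FIELD = "label";
    static final String GRAPH_FIELD = "graph";

    /** Maps rdfs:label to the primary field and rdfs:comment to a second field. */
    static EntityDefinition build() {
        EntityDefinition entDef = new EntityDefinition(ENTITY_FIELD, PRIMARY_FIELD);
        entDef.setPrimaryPredicate(RDFS.label);
        entDef.setGraphField(GRAPH_FIELD);
        entDef.set("label", RDFS.label.asNode());     // field for the primary predicate
        entDef.set("comment", RDFS.comment.asNode()); // additional indexed field
        return entDef;
    }

    /** Query against the non-default field, naming the predicate explicitly. */
    static final String COMMENT_QUERY = String.join("\n",
            "PREFIX text: <http://jena.apache.org/text#>",
            "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
            "SELECT ?s WHERE {",
            "  ?s text:query (rdfs:comment \"word\") .",
            "}");
}
```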

Hope this helps!

-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly to avoid the "All or Nothing" approach we need to take if we
club them all together. If they stay together and if in the near future I
want to upgrade ES to another version, I also need to again upgrade Lucene
and Solr and possibly another implementation that may have been added
during the time. As we all know, this means weeks of work if not months to
get the changes released. This will personally de-motivate me to do
anything and I will probably start maintaining my version of Jena-Text as
that would be much simpler to do than to upgrade and test and in the
process own (read: fix bugs in) the upgrade for each and every technology.
If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the
Solr implementation.
We can start with having a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.
Again this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
che.org%3E
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
Post by Osma Suominen
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point. :)
Post by Osma Suominen
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
--
*Anuj Kumar*
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi
anuj kumar
2017-03-01 18:40:27 UTC
Permalink
Thanks Osma. I sent my previous email just a minute early. I will try your
suggestion and if it doesn't work will send you the entire example.

Thanks again.
Anuj
Post by Osma Suominen
Hi Anuj!
Generally I use assembler descriptions to configure the jena-text index.
https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#creating-a-text-index
For examples on how to use assembler descriptions from Java code, take a
look at the jena-text unit tests. They generally contain a snippet of
assembler definition that configures the text index in a particular way,
then test that it does what it should when using that configuration.
You didn't provide a full example. What is your data and what query did
you use? What results did you expect? What happened instead?
One possible problem in your configuration is that you have set the
primary predicate to rdfs:label, but not set a field for it. Try adding
entDef.set("label", RDFS.label.asNode());
For querying everything else but the default field, you need to specify
the predicate at query time. With your configuration, it should be possible
?s text:query (rdfs:comment "word") .
Hope this helps!
-Osma
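The advice above (one named field per predicate, and naming the predicate at query time for anything other than the default field) can be pictured with a small stand-alone sketch. This is a toy model only: the class FieldMappingSketch and its methods are hypothetical names invented for illustration, it uses no Jena classes, and it does not claim to reproduce how jena-text resolves fields internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of a field-to-predicate mapping. Hypothetical; NOT the jena-text implementation. */
final class FieldMappingSketch {
    static final String RDFS = "http://www.w3.org/2000/01/rdf-schema#";

    private final String primaryField;
    private final Map<String, String> fieldToPredicate = new LinkedHashMap<>();

    FieldMappingSketch(String primaryField) {
        this.primaryField = primaryField;
    }

    /** Registers one index field for one predicate (the idea behind entDef.set(field, predicate)). */
    FieldMappingSketch set(String field, String predicateUri) {
        fieldToPredicate.put(field, predicateUri);
        return this;
    }

    /**
     * Picks the field a query would search. A null predicate means "no predicate
     * given", so the primary (default) field is used; otherwise the field mapped
     * to that predicate is returned, or empty if the predicate is not indexed.
     */
    Optional<String> fieldFor(String predicateUri) {
        if (predicateUri == null) {
            return Optional.of(primaryField);
        }
        for (Map.Entry<String, String> e : fieldToPredicate.entrySet()) {
            if (e.getValue().equals(predicateUri)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FieldMappingSketch def = new FieldMappingSketch("label")
                .set("label", RDFS + "label")
                .set("comment", RDFS + "comment");
        // Corresponds to: ?s text:query "word" .
        System.out.println(def.fieldFor(null));
        // Corresponds to: ?s text:query (rdfs:comment "word") .
        System.out.println(def.fieldFor(RDFS + "comment"));
    }
}
```

In this model both rdfs:label and rdfs:comment live in the same index document, and only the query decides which field is searched.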
Post by anuj kumar
How do I add more than one field to be indexed in my Index?
Basically, if I want to index rdfs:label , rdfs:comment in the same index
document, how do I do it?
EntityDefinition entDef = new EntityDefinition(DOC_TYPE, FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());
But it doesn't work. Can you please point me to a way to do it? This
is an important piece of functionality I need.
Thanks,
Anuj Kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I have a personal preference for doing it in a specific way because, IMO, it
is modular, which makes it much easier to maintain in the long run. But
again, it may not be the quickest one.
I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text at least locally for the
coming period of time. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a separate
module. Until Lucene and Solr are upgraded to the latest
version, I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
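One way to check which copy of a library the classloader actually picked at runtime is to ask a class where it was loaded from. The following is a minimal diagnostic sketch, not something proposed in the thread: CodeSourceProbe is a hypothetical helper name, and org.apache.lucene.util.Version is used only as an example class name that would need a Lucene JAR on the classpath to resolve.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where a class was loaded from. */
final class CodeSourceProbe {

    /** Returns the JAR or directory URL of a loaded class, or a marker for JDK classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Same as above, but looks the class up by name without initializing it. */
    static String locationOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, CodeSourceProbe.class.getClassLoader());
            return locationOf(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Example class name only; pass another fully qualified name as an argument.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.util.Version";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

If two Lucene versions end up in the same classpath, the printed location shows which JAR supplied that particular class, but it says nothing about whether the rest of the classes came from the same JAR.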
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not enough of an expert in Jena to say
convincingly that it is possible without any hiccups. But I can take a
guess and say that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how it is
possible. But before that, some assumptions which I want to call out
explicitly.
*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used for
text-based indexing and searching via Jena. What this means is that we will
either use the Lucene Implementation OR the Solr Implementation OR the ES
Implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.
Based on these assumptions it is possible to create a build that contains
jena-text based common classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
in the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES-specific pom. Have a look at the pom:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
Thanks,
Anuj Kumar
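The version selection described above matches Maven's documented dependency mediation rule: the definition nearest to the project in the dependency tree wins, and on equal depth the first declaration wins. The sketch below is a toy model of that rule only, with hypothetical names (MediationSketch, Candidate); it is not Maven code, and the depths and ordering are assumptions about how a jena-text-es pom might look.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of "nearest definition wins" mediation. Hypothetical; not Maven code. */
final class MediationSketch {

    /** One version of an artifact found somewhere in the dependency tree. */
    static final class Candidate {
        final String version;
        final int depth;            // 1 = declared directly in the pom, 2 = via one dependency, ...
        final int declarationOrder; // position in which the path was encountered

        Candidate(String version, int depth, int declarationOrder) {
            this.version = version;
            this.depth = depth;
            this.declarationOrder = declarationOrder;
        }
    }

    /** Picks the nearest candidate; ties on depth go to the earliest declaration. */
    static String resolve(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null
                    || c.depth < best.depth
                    || (c.depth == best.depth && c.declarationOrder < best.declarationOrder)) {
                best = c;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no candidates");
        }
        return best.version;
    }

    public static void main(String[] args) {
        // lucene-core as seen from an assumed jena-text-es pom:
        List<Candidate> luceneCore = Arrays.asList(
                new Candidate("4.9.1", 2, 1),   // transitive, pulled in through jena-text
                new Candidate("6.4.1", 1, 0));  // declared directly in the ES-specific pom
        System.out.println(resolve(luceneCore)); // prints 6.4.1
    }
}
```

Note that this only models which single version the build selects; the jena-text classes compiled against Lucene 4.9.1 would then run against the selected 6.4.1 JAR.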
anuj kumar
2017-03-02 10:24:48 UTC
Permalink
Just FYI, I was able to index multiple fields in ElasticSearch using Jena
Text capability.
The issue was in my ElasticSearch code where I was doing insert every time
instead of an update :/

Cheers!
Anuj Kumar
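For readers hitting the same symptom: when each triple for an entity arrives separately, an insert that replaces the stored document keeps only the most recent field, whereas an update merges the new field into what is already there. The sketch below is a toy in-memory model of that difference; IndexVsUpdateSketch and its methods are hypothetical names and no ElasticSearch client calls are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of replace-on-insert versus merge-on-update. Hypothetical names. */
final class IndexVsUpdateSketch {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    /** Insert: the new body replaces whatever was stored under the same id. */
    void index(String id, Map<String, String> body) {
        docs.put(id, new HashMap<>(body));
    }

    /** Update (upsert): merges the given fields into the stored document, creating it if absent. */
    void upsert(String id, Map<String, String> partial) {
        docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(partial);
    }

    /** Returns a copy of the stored document, or an empty map if there is none. */
    Map<String, String> get(String id) {
        return new HashMap<>(docs.getOrDefault(id, Collections.<String, String>emptyMap()));
    }

    public static void main(String[] args) {
        IndexVsUpdateSketch replaced = new IndexVsUpdateSketch();
        replaced.index("s1", Collections.singletonMap("label", "a label"));
        replaced.index("s1", Collections.singletonMap("comment", "a comment"));
        System.out.println(replaced.get("s1").keySet()); // only "comment" survives

        IndexVsUpdateSketch merged = new IndexVsUpdateSketch();
        merged.upsert("s1", Collections.singletonMap("label", "a label"));
        merged.upsert("s1", Collections.singletonMap("comment", "a comment"));
        System.out.println(merged.get("s1").keySet()); // both fields are present
    }
}
```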
Post by anuj kumar
Thanks Osma. I sent my previous email just a minute early. I will try your
suggestion and if it doesn't work will send you the entire example.
Thanks again.
Anuj
Post by Osma Suominen
Hi Anuj!
Generally I use assembler descriptions to configure the jena-text index.
https://github.com/NatLibFi/Skosmos/wiki/InstallTutorial#cre
ating-a-text-index
For examples on how to use assembler descriptions from Java code, take a
look at the jena-text unit tests. They generally contain a snippet of
assembler definition that configures the text index in a particular way,
then test that it does what it should when using that configuration.
You didn't provide a full example. What is your data and what query did
you use? What results did you expect? What happened instead?
One possible problem in your configuration is that you have set the
primary predicate to rdfs:label, but not set a field for it. Try adding
entDef.set("label", RDFS.label.asNode());
For querying everything else but the default field, you need to specify
the predicate at query time. With your configuration, it should be possible
?s text:query (rdfs:comment "word") .
Hope this helps!
-Osma
Post by anuj kumar
How do I add more than one field to be indexed in my Index?
Basically, if I want to index rdfs:label , rdfs:comment in the same index
document, how do I do it?
EntityDefinition entDef = new EntityDefinition(DOC_TYPE,
FIELD_TO_SEARCH);
entDef.setPrimaryPredicate(RDFS.label);
entDef.setGraphField(GRAPH_FIELD_NAME);
entDef.set("comment", RDFS.comment.asNode());
But it doesnt work. Can you please point me on a way to do it please. This
is an important piece of functionality I need.
Thanks,
Anuj Kumar
I personally have no preference as to how the code in Jena should be
Post by anuj kumar
structured, as long as I am able to use it :).
I have personal preference of doing it in a specific way because IMO, it
is modular which makes it much easier to maintain in the long run. But
again it may not be the quickest one.
I already have been given a deadline, by the company to have ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text at-least locally for a
coming period of time. I would be more than happy to contribute to Jena
community whatever is required to have a proper ElasticSearch
Implementation in place, whether within jena-text module or as a separate
module. Till the time Lucene and Solr is not upgraded to the latest
version, I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Osma--
Post by A. Soroka
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Post by Osma Suominen
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler
configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their
dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
Post by Osma Suominen
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I
will
not delve into those now. I am not an expert in Jena to convincingly
say
that it is possible, without any hiccups. But I can take a guess and
say
that it is indeed possible :)
Post by anuj kumar
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume
you
mean having jena-text and jena-text-es as dependencies in a build
without
causing the build to conflict. If that is what you mean than the
answer is
yes it is possible and quite simple as well. Let me explain how it is
Post by anuj kumar
possible. But before that some assumption which I want to call out
explicitly.
*Assumption:*
1. At a given point in time, only a single Indexing Technology is used
for
text based indexing and searching via Jean. What this means is that we
will
either use Lucene Implementation OR Solr Implementation OR ES
Post by anuj kumar
Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
but
only on jena-text classes, if at all.
Post by anuj kumar
Based on these assumptions it is possible to create a build that
contains
jena-text based common classes + ES specific classes without any
Post by anuj kumar
compatibility issues. And it is infact quite simple. I did it in the
current jena-text-es module and ran the entire build which succeeded.
The key is to include the latest Lucene dependencies at the very
beginning
in the pom and then include jena-text dependency. Maven will then
Post by anuj kumar
automatically resolve the dependency issues by including the Lucene
librarires that we included in our es specific pom. Have a look the
pom of
Post by anuj kumar
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Hi Anuj,
Post by Osma Suominen
I understand your concerns. However, we also need to balance between
the
needs of individual modules/features and the whole codebase. I'm
Post by anuj kumar
willing to
put in the effort to keep the other modules up to date with newer
Post by anuj kumar
Lucene
versions. Lucene upgrade requirements are well documented, the only
Post by anuj kumar
hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
Post by anuj kumar
Post by Osma Suominen
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it
even
possible to mix modules that depend on different versions of the
Post by anuj kumar
Lucene
libraries within the same project? In my (quite limited)
Post by anuj kumar
understanding of
Java projects and libraries, this requires special arrangements (e.g.
Post by anuj kumar
Post by Osma Suominen
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.
So can you create, say, a Fuseki build that contains the current
jena-text
module (depending on Lucene 4.x) and the new jena-text-es module
Post by anuj kumar
(depending
on Lucene 6.4.1) without any compatibility issues?
Post by anuj kumar
Post by Osma Suominen
-Osma
Hi,
Post by anuj kumar
The reason I proposed to have separate modules for Lucene, Solr and
ES is
exactly for avoiding the "All or Nothing" approach we need to take
Post by anuj kumar
Post by Osma Suominen
if we
club them all together. If they stay together and if in the near
Post by anuj kumar
Post by Osma Suominen
future I
want to upgrade ES to another version, I also need to again upgrade
Post by anuj kumar
Post by Osma Suominen
Lucene
and Solr and possibly another implementation that may have been added
Post by anuj kumar
Post by Osma Suominen
Post by anuj kumar
during the time. As we all know, this means weeks of work if not
months to
get the changes released. This will personally de-motivate me to do
Post by anuj kumar
Post by Osma Suominen
Post by anuj kumar
anything and I will probably start maintaining my version of
Jena-Text as
that would be much simpler to do than to upgrade and test and in the
Post by anuj kumar
Post by Osma Suominen
Post by anuj kumar
process own(read fix bugs) the upgrade for each and every technology.
If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the
Solr implementation.
We can start with having a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.
Again, this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
che.org%3E
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
Post by Osma Suominen
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point. :)
Post by Osma Suominen
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
Osma Suominen
2017-03-01 18:27:35 UTC
Permalink
Hi Anuj!

I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).

In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having
to maintain your own separate module.

-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I do prefer doing it in a specific way because, IMO, it is modular, which
makes it much easier to maintain in the long run. But again, it may not be
the quickest one.
I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text, at least locally, for the
coming period. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a
separate module. Until Lucene and Solr are upgraded to the latest version,
I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
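
The "ANY GIVEN classloader" point can be illustrated with a minimal sketch. It is not Jena or OSGi code: the class names LoaderIsolationDemo, Payload and IsolatingLoader are made up for the illustration, and Payload merely stands in for a library class that exists in two versions. Two loaders each define their own copy of a class under the same fully qualified name, and the JVM treats the copies as unrelated types. It needs Java 9 or later (for readAllBytes) and assumes the compiled class file is readable as a classpath resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class LoaderIsolationDemo {

    /** Stand-in for a library class that exists in two versions (hypothetical). */
    public static class Payload { }

    /** A loader with no application parent: it defines its own copy of a class from raw bytes. */
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(null); // parent is the bootstrap loader only, so java.lang.Object still resolves
        }

        Class<?> defineCopyOf(Class<?> original) {
            String resource = original.getName().replace('.', '/') + ".class";
            try (InputStream in = original.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("class bytes not found: " + resource);
                }
                byte[] bytes = in.readAllBytes(); // Java 9+
                return defineClass(original.getName(), bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** True when two loaders hold distinct classes that share one fully qualified name. */
    static boolean sameNameDifferentClass() {
        Class<?> first = new IsolatingLoader().defineCopyOf(Payload.class);
        Class<?> second = new IsolatingLoader().defineCopyOf(Payload.class);
        return first.getName().equals(second.getName())
                && first != second
                && first != Payload.class;
    }

    public static void main(String[] args) {
        System.out.println("same name, different classes: " + sameNameDifferentClass());
    }
}
```

Frameworks such as OSGi build on this mechanism, but they also have to wire the loaders together so that shared types line up, which is where most of the weight comes from.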
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
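
When two versions of a library may be on the same classpath, one way to see which one the JVM actually picked is to ask a class where it was loaded from. The following is only a diagnostic sketch under stated assumptions: ClassOrigin and describe are hypothetical names, and org.apache.lucene.util.Version is used as the probe because that class is present in both Lucene 4.x and 6.x. The version string comes from the JAR manifest and may be missing.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOrigin {

    /** Describes which JAR (or directory) a class was loaded from, if it can be found at all. */
    static String describe(String className) {
        try {
            // initialize=false: we only want to locate the class, not run its static initializers
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            Package pkg = type.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className
                    + " <- " + (location == null ? "JDK runtime / unknown" : location.toString())
                    + " (version " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Probe a Lucene class; with two Lucene JARs present, only one location is reported.
        System.out.println(describe("org.apache.lucene.util.Version"));
        System.out.println(describe("java.lang.String"));
    }
}
```

Run inside a combined build, this would show that only one of the two Lucene JARs supplies the class, regardless of which module asked for it.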
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not enough of an expert in Jena to say
convincingly that it is possible without any hiccups. But I can take a
guess and say that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how it is
possible. But before that, some assumptions which I want to call out
explicitly.
*Assumptions:*
1. At a given point in time, only a single indexing technology is used for
text-based indexing and searching via Jena. What this means is that we will
use either the Lucene implementation OR the Solr implementation OR the ES
implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.
Based on these assumptions it is possible to create a build that contains
jena-text based common classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
in the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES specific pom. Have a look at the pom:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
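
The resolution step described above can be modelled with a small sketch. This is a simplified simulation of Maven's "nearest definition wins" mediation, not Maven code: it ignores scopes, exclusions and dependencyManagement, and the names NearestWins, Dep and resolve, as well as the "placeholder" version for jena-text, are made up for the illustration. Under these assumptions, a Lucene version declared directly in the pom wins over the one that jena-text brings in transitively, because it is nearer to the root; declaration order only matters between declarations at the same depth.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearestWins {

    /** One node of a dependency tree: an artifact, its version, and what it depends on. */
    static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Breadth-first walk over the tree. The first time an artifact is seen is
     * its shallowest declaration, so that version is kept; later (deeper or
     * later-declared) versions are dropped together with their subtrees.
     */
    static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep dep = queue.removeFirst();
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    /** Hypothetical jena-text-es pom: Lucene 6.4.1 declared directly, jena-text pulling in 4.9.1. */
    static Map<String, String> esModuleExample() {
        Dep luceneNew = new Dep("lucene-core", "6.4.1");
        Dep jenaText = new Dep("jena-text", "placeholder", new Dep("lucene-core", "4.9.1"));
        return resolve(Arrays.asList(luceneNew, jenaText));
    }

    public static void main(String[] args) {
        System.out.println(esModuleExample());
    }
}
```

Note that mediation only selects one version per artifact. It does not make code compiled against the other version compatible with the selected one, so the jena-text classes in such a build would run against Lucene 6.4.1.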
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Post by Osma Suominen
Hi Anuj,
I understand your concerns. However, we also need to balance between the
needs of individual modules/features and the whole codebase. I'm willing to
put in the effort to keep the other modules up to date with newer Lucene
versions. Lucene upgrade requirements are well documented, the only hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it even
possible to mix modules that depend on different versions of the Lucene
libraries within the same project? In my (quite limited) understanding of
Java projects and libraries, this requires special arrangements (e.g.
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.
So can you create, say, a Fuseki build that contains the current jena-text
module (depending on Lucene 4.x) and the new jena-text-es module (depending
on Lucene 6.4.1) without any compatibility issues?
-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly to avoid the "All or Nothing" approach we need to take if we
club them all together. If they stay together and if in the near future I
want to upgrade ES to another version, I also need to upgrade Lucene
and Solr again, and possibly another implementation that may have been
added in the meantime. As we all know, this means weeks of work, if not
months, to get the changes released. This will personally de-motivate me to
do anything and I will probably start maintaining my own version of
Jena-Text, as that would be much simpler to do than to upgrade and test and
in the process own (read: fix bugs in) the upgrade for each and every
technology.
If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the
Solr implementation.
We can start with having a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.
Again, this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
%3E
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
Post by Osma Suominen
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point. :)
Post by Osma Suominen
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
anuj kumar
2017-03-01 18:37:51 UTC
Permalink
I agree, Osma. If Lucene is upgraded to 6.4.1, it would be much easier for me
to integrate the ElasticSearch implementation.

But I am still waiting for someone to give me a hint as to how I can
index multiple predicate values. This is the most pressing issue for me
currently.

Thanks,
Anuj Kumar
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).
In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having to
maintain your own separate module.
-Osma
A. Soroka
2017-03-02 21:28:50 UTC
Permalink
I do agree that trying to juggle different versions of Lucene libraries is probably not a realistic option right now. Luckily (if I understand the conversation thus far correctly) we have a solid alternative: getting our current Lucene dependency upgraded should allow us to (eventually) merge Anuj's work into the mainstream of development. Someone please tell me if I have that wrong! :)

Let me reiterate that this seems like very good work and, speaking for myself, I certainly want to get it included in Jena. It's just a question of fitting it in correctly, which might take a bit of time.

---
A. Soroka
The University of Virginia Library
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how your proposal could work in practice for the Fuseki build, due to the reasons I mentioned in my previous message (and Adam seemed to concur).
In any case, I'll see what I can do to get the Lucene upgrade moving again. If all current Jena modules (ie jena-text and jena-spatial) were upgraded to Lucene 6.4.1, then you could just add your ES classes to jena-text, right? I think that would be better for everyone than having to maintain your own separate module.
-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I do prefer doing it in a specific way because, IMO, it is modular, which
makes it much easier to maintain in the long run. But again, it may not be
the quickest one.
I have already been given a deadline by the company to have the ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text, at least locally, for the
coming period. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a
separate module. Until Lucene and Solr are upgraded to the latest version,
I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
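To make the "ANY GIVEN Java classloader" point above concrete, here is a minimal sketch. It is not Jena or OSGi code: `IsolatedLoaderDemo`, `Library` and `IsolatingLoader` are made-up names, and it assumes Java 9+ (for `InputStream.readAllBytes`) and that the compiled class file can be read as a classpath resource. `Library` stands in for any library class, such as one from Lucene.

```java
import java.io.IOException;
import java.io.InputStream;

class IsolatedLoaderDemo {

    /** Hypothetical stand-in for a library type (think of one Lucene class). */
    static class Library {
    }

    /**
     * Defines Library itself from its class-file bytes instead of delegating
     * to the parent, so every instance of this loader owns a separate copy.
     */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName = Library.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                // Everything else uses the normal parent-first delegation.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the compiled class file through the parent loader. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Library through a fresh IsolatingLoader. */
    static Class<?> loadIsolatedCopy() {
        ClassLoader parent = IsolatedLoaderDemo.class.getClassLoader();
        try {
            return new IsolatingLoader(parent).loadClass(Library.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadIsolatedCopy();
        Class<?> second = loadIsolatedCopy();
        // Same fully qualified name in both loaders ...
        System.out.println(first.getName().equals(second.getName()));
        // ... but each loader defined its own Class object.
        System.out.println(first == second);
    }
}
```

Two loaders can each hold a class with the same name without clashing, which is roughly the mechanism OSGi builds on. The hard part, and the reason it is "heavy", is deciding which loader each piece of code should see; this sketch does not attempt that.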
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not an expert in Jena to convincingly say
that it is possible, without any hiccups. But I can take a guess and say
that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean then the answer is
yes, it is possible and quite simple as well. Let me explain how it is
possible. But before that, some assumptions which I want to call out
explicitly.
*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used for
text based indexing and searching via Jena. What this means is that we will
either use Lucene Implementation OR Solr Implementation OR ES
Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but
only on jena-text classes, if at all.
Based on these assumptions it is possible to create a build that contains
jena-text based common classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
in the pom and then include jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our es specific pom. Have a look at the pom of
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
Thanks,
Anuj Kumar
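The Maven behaviour relied on in the message above is its "nearest definition wins" rule for version conflicts. The following is a rough, simplified model of that rule only (no scopes, exclusions or dependencyManagement). `NearestWinsDemo`, `Dep` and the tiny dependency tree are hypothetical, made up for illustration; the Lucene version numbers are the ones mentioned in this thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NearestWinsDemo {

    /** One node of a dependency tree: artifact, version, and its own dependencies. */
    static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Breadth-first walk: the first version met for an artifact is the
     * shallowest one (ties go to declaration order); later ones are dropped
     * together with their subtrees.
     */
    static Map<String, String> resolve(List<Dep> directDependencies) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(directDependencies);
        while (!queue.isEmpty()) {
            Dep dep = queue.removeFirst();
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    /** Hypothetical, heavily simplified tree for a jena-text-es style pom. */
    static List<Dep> exampleDependencies() {
        return Arrays.asList(
                // Lucene declared directly in the pom (depth 1).
                new Dep("lucene-core", "6.4.1"),
                // jena-text brings its own Lucene transitively (depth 2).
                new Dep("jena-text", "(unspecified)",
                        new Dep("lucene-core", "4.9.1")));
    }

    public static void main(String[] args) {
        System.out.println(resolve(exampleDependencies()));
    }
}
```

In this model the directly declared Lucene wins because it is closer to the project, whatever the declaration order; order only breaks ties at equal depth. Only one lucene-core version ends up on the classpath, so a successful build does not by itself show that code compiled against the other version will work at runtime.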
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Post by Osma Suominen
Hi Anuj,
I understand your concerns. However, we also need to balance between the
needs of individual modules/features and the whole codebase. I'm willing to
put in the effort to keep the other modules up to date with newer Lucene
versions. Lucene upgrade requirements are well documented, the only hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it even
possible to mix modules that depend on different versions of the Lucene
libraries within the same project? In my (quite limited) understanding of
Java projects and libraries, this requires special arrangements (e.g.
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.
So can you create, say, a Fuseki build that contains the current jena-text
module (depending on Lucene 4.x) and the new jena-text-es module (depending
on Lucene 6.4.1) without any compatibility issues?
-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly for avoiding the "All or Nothing" approach we need to take if we
club them all together. If they stay together and if in the near future I
want to upgrade ES to another version, I also need to again upgrade Lucene
and Solr and possibly another implementation that may have been added
during the time. As we all know, this means weeks of work if not months to
get the changes released. This will personally de-motivate me to do
anything and I will probably start maintaining my version of Jena-Text as
that would be much simpler to do than to upgrade and test and in the
process own (read: fix bugs) the upgrade for each and every technology.
If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the
Solr Implementation.
We can start with having a separate Module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of
Jena Text.
Again this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
%3E
Post by A. Soroka
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
Post by Osma Suominen
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point.
:)
Post by Osma Suominen
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
anuj kumar
2017-03-03 00:45:18 UTC
Permalink
I second that. I am now finalising the integration of ES and should have a
good production-quality implementation ready in a week's time. At that
point I would like you guys to have a look at the implementation and provide
feedback. Once you have upgraded Lucene to 6.4.1, I can merge the
code into the jena-text module and do a round of testing.

Thanks,
Anuj Kumar
Post by A. Soroka
I do agree that trying to juggle different versions of Lucene libraries is
probably not a realistic option right now. Luckily (if I understand the
conversation thus far correctly) we have a solid alternative; getting our
current Lucene dependency upgraded should allow us to (eventually) merge
Anuj's work into the mainstream of development. Someone please tell me if I
Let me reiterate that this seems like very good work and speaking for
myself, I certainly want to get it included into Jena. It's just a question
of fitting it in correctly, which might take a bit of time.
---
A. Soroka
The University of Virginia Library
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).
In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having to
maintain your own separate module.
-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I have a personal preference for doing it in a specific way because IMO, it
is modular, which makes it much easier to maintain in the long run. But
again it may not be the quickest one.
I already have been given a deadline by the company to have the ES extension
implemented in the next 15 days :). What this means is that I will be
maintaining the ES code extension to Jena Text at least locally for a
coming period of time. I would be more than happy to contribute to Jena
community whatever is required to have a proper ElasticSearch
Implementation in place, whether within jena-text module or as a separate
module. Until Lucene and Solr are upgraded to the latest version, I will
have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Osma Suominen
2017-03-03 10:22:26 UTC
Permalink
Hi Anuj,

It's great that we found agreement over this!

I've restarted the Lucene upgrade effort (JENA-1250) that had stalled
and made a PR [1] that implements the upgrade up to version 6.4.1 (with
5.5.4 as an intermediate step). I'll wait for comments on the PR and if
people think it's OK I will merge it soon to Jena master. Meanwhile, you
can already base your ES implementation on that branch [2] if you like.

Could you please open a JIRA issue on issues.apache.org explaining the
Elasticsearch support feature, so that we have a place for tracking this
work, requesting comments etc.?

Also I suggest we move the discussion around this to the developers'
list (***@jena.apache.org) where it's more appropriate.

-Osma

[1] https://github.com/apache/jena/pull/219

[2] https://github.com/osma/jena/tree/jena-1250-lucene6
Post by anuj kumar
I second that. I am now finalising the integration of ES and should have a
good production quality implementation ready in a week's time. At that
time I would want you guys to have a look at the implementation and provide
feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
code in jena-text module and do a round of testing.
Thanks,
Anuj Kumar
Post by A. Soroka
I do agree that trying to juggle different versions of Lucene libraries is
probably not a realistic option right now. Luckily (if I understand the
conversation thus far correctly) we have a solid alternative; getting our
current Lucene dependency upgraded should allow us to (eventually) merge
Anuj's work into the mainstream of development. Someone please tell me if I
Let me reiterate that this seems like very good work and speaking for
myself, I certainly want to get it included into Jena. It's just a question
of fitting it in correctly, which might take a bit of time.
---
A. Soroka
The University of Virginia Library
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).
In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having to
maintain your own separate module.
-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I have a personal preference for doing it in a specific way because, IMO,
it is modular, which makes it much easier to maintain in the long run. But
again, it may not be the quickest one.
I have already been given a deadline by the company to have the ES
extension implemented in the next 15 days :). What this means is that I
will be maintaining the ES code extension to Jena Text at least locally for
some time to come. I would be more than happy to contribute to the Jena
community whatever is required to have a proper ElasticSearch
implementation in place, whether within the jena-text module or as a
separate module. Until Lucene and Solr are upgraded to the latest version,
I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer answer
is that it's probably not a viable alternative for Jena for this problem,
at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart
of this question, but I must alter your remark just slightly. From "the
Java classloader only sees a single, flat package/class namespace and a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time). Each
OSGi bundle sees its own classloader, and the framework is responsible for
connecting bundles up to ensure that every bundle has what it needs in the
way of types to function, based on metadata that the bundles provide to the
framework. It's an incredibly powerful system (I use it every day and enjoy
it enormously) but it's also very "heavy" and requires a good deal of
investment to use. In particular, it's probably too large to put _inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management for
this problem. That sounds like more than a bit of a rabbit hole to me.
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for
Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a
thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
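To make the "ANY GIVEN classloader" remark concrete, here is a minimal
sketch of what "rolling our own" classloader management could look like: a
child-first loader that searches its own JARs before asking its parent, so
that code loaded through it can see a different library version than the
rest of the JVM. The class name ChildFirstClassLoader and the default JAR
path in main are hypothetical and exist only for illustration; nothing like
this is part of Jena, and it is not what anyone in this thread proposed to
ship.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

/** Hypothetical child-first ("parent-last") loader, for illustration only. */
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            // Core java.* classes must always come from the parent chain.
            if (c == null && !name.startsWith("java.")) {
                try {
                    // Look in our own URLs first (the "child-first" part).
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our URLs: fall back to normal delegation below.
                }
            }
            if (c == null) {
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed path: pass a JAR or directory holding one library version.
        String jar = args.length > 0 ? args[0] : "lib/lucene-core-6.4.1.jar";
        URL[] urls = { Paths.get(jar).toUri().toURL() };
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(
                urls, ChildFirstClassLoader.class.getClassLoader())) {
            Class<?> c = loader.loadClass("java.lang.String");
            System.out.println(c + " defined by " + c.getClassLoader());
        }
    }
}
```

Two such loaders pointed at different Lucene JARs would each define their
own copy of the Lucene classes, but objects from one copy cannot be passed
to code linked against the other, which is part of why this gets
complicated quickly.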
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the fact
that at runtime, module divisions don't really matter (except that they
usually correspond to package sub-namespaces) and the Java classloader only
sees a single, flat package/class namespace and a set of compiled classes
(usually within JARs) in the classpath that it needs to check to find the
right classes, and if there are two versions of the same library (eg
Lucene) with overlapping class names, that's going to cause trouble. The
only way around that is to shade some of the libraries, i.e. rename them so
that they end up in another, non-conflicting namespace. Apparently
Elasticsearch also did some of that in the past [1] but nowadays tries to
avoid it.
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler configuration,
you cannot have ja:loadClass declarations for both Lucene and ES backends?
Or how do you run something like Fuseki that contains (in a single big JAR)
both the jena-text and jena-text-es modules with all their dependencies,
one of which requires the Lucene 4.x classes and the other one the Lucene
6.4.1 classes? How do you ensure that only one of them is used at a time,
and that the Java classloader, even though it has access to both versions
of Lucene, only loads classes from the single, correct one and not the
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the same
Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
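One way to check whether a given classpath really does contain two copies
of the same Lucene class is to ask the class loader for every location of
that class file. The sketch below does this with the standard
ClassLoader.getResources call. The class name ClasspathDuplicates is made
up for this example, and org.apache.lucene.index.IndexWriter is used only
as a convenient probe class; any class present in both Lucene versions
would do.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper, for illustration only. */
class ClasspathDuplicates {

    /** Returns every location from which the loader could read the class file. */
    static List<URL> locate(String className, ClassLoader loader)
            throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.index.IndexWriter";
        List<URL> hits = locate(name, ClassLoader.getSystemClassLoader());
        System.out.println(name + ": " + hits.size() + " location(s)");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
        if (hits.size() > 1) {
            // Only the first hit is normally loaded; the rest are shadowed.
            System.out.println("More than one copy is visible on the classpath.");
        }
    }
}
```

Run against a combined build, more than one location for the same class
name would indicate exactly the overlap described above, where which
version gets loaded depends on classpath order.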
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and
balance the refactoring without affecting the existing modules. But I will
not delve into those now. I am not enough of an expert in Jena to say
convincingly that it is possible without any hiccups. But I can take a
guess and say that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume you
mean having jena-text and jena-text-es as dependencies in a build without
causing the build to conflict. If that is what you mean, then the answer is
yes, it is possible and quite simple as well. Let me explain how it is
possible. But before that, some assumptions which I want to call out
explicitly.
*Assumptions:*
1. At a given point in time, only a single Indexing Technology is used for
text based indexing and searching via Jena. What this means is that we will
either use the Lucene Implementation OR the Solr Implementation OR the ES
Implementation at any given point in time.
2. The Fuseki build does not depend on any Lucene 4.9.1 specific classes
but only on jena-text classes, if at all.
Based on these assumptions it is possible to create a build that contains
jena-text based common classes + ES specific classes without any
compatibility issues. And it is in fact quite simple. I did it in the
current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning
in the pom and then include the jena-text dependency. Maven will then
automatically resolve the dependency issues by including the Lucene
libraries that we included in our ES specific pom. Have a look at the pom:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Post by Osma Suominen
Hi Anuj,
I understand your concerns. However, we also need to balance between the
needs of individual modules/features and the whole codebase. I'm willing to
put in the effort to keep the other modules up to date with newer Lucene
versions. Lucene upgrade requirements are well documented, the only hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it even
possible to mix modules that depend on different versions of the Lucene
libraries within the same project? In my (quite limited) understanding of
Java projects and libraries, this requires special arrangements (e.g.
shading) as the Java package/class namespace is shared by all the code
running within the same JVM.
So can you create, say, a Fuseki build that contains the current jena-text
module (depending on Lucene 4.x) and the new jena-text-es module (depending
on Lucene 6.4.1) without any compatibility issues?
-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is
exactly to avoid the "All or Nothing" approach we need to take if we club
them all together. If they stay together and if in the near future I want
to upgrade ES to another version, I also need to again upgrade Lucene and
Solr and possibly another implementation that may have been added in the
meantime. As we all know, this means weeks of work if not months to get the
changes released. This will personally de-motivate me to do anything and I
will probably start maintaining my own version of Jena-Text, as that would
be much simpler to do than to upgrade and test and in the process own
(read: fix bugs in) the upgrade for each and every technology.
If they are developed as separate modules, they can evolve independently of
each other and we can avoid situations where we can't upgrade to the latest
version of Lucene because we do not know what effect it will have on the
Solr Implementation.
We can start with having a separate module for Jena Text ES and see how
things go. If they go well, we could extract Solr and Lucene out of Jena
Text.
Again, this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
apache.org
Post by Osma Suominen
Post by anuj kumar
Post by A. Soroka
%3E
Post by A. Soroka
? In other words, might it be better to factor out between -text and
-spatial and _then_ try to upgrade the Lucene version?
Post by Osma Suominen
I certainly wouldn't object to that, but somebody has to volunteer to do
the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing...
that's pretty vague, I know, and I'm not in a position to do any work to
maintain it, so consider that just a very small and blurry data point.
:)
Post by Osma Suominen
Last time I tried it (it was a while ago) I couldn't figure out how to get
it running... If you could just try that with some toy data, then your data
point would be a lot less blurry :) I haven't used Solr for anything, so
I'm not very familiar with how to set it up, and the jena-text instructions
are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi
anuj kumar
2017-03-03 12:23:01 UTC
Permalink
Hi Osma,
I briefly looked at the pull request. I believe we need to upgrade Lucene
and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on
Lucene 4.9.1.

Also, how do I log into issues.apache.org, and where do I file this bug?

Thanks,
Anuj Kumar
Post by Osma Suominen
Hi Anuj,
It's great that we found agreement over this!
I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
as an intermediate step). I'll wait for comments on the PR and if people
think it's OK I will merge it soon to Jena master. Meanwhile, you can
already base your ES implementation on that branch [2] if you like.
Could you please open a JIRA issue on issues.apache.org explaining the
Elasticsearch support feature, so that we have a place for tracking this
work, request comments etc.
Also I suggest we move the discussion around this to the developers' list (
-Osma
[1] https://github.com/apache/jena/pull/219
[2] https://github.com/osma/jena/tree/jena-1250-lucene6
Post by anuj kumar
I second that. I am now finalising the integration of ES and should have a
good production quality implementation ready in a week's time. At that
time I would want you guys to have a look at the implementation and provide
feedback. Once you guys have upgraded Lucene to 6.4.1 , I can merge the
code in jena-text module and do a round of testing.
Thanks,
Anuj Kumar
I do agree that trying to juggle different versions of Lucene libraries is
Post by A. Soroka
probably not a realistic option right now. Luckily (if I understand the
conversation thus far correctly) we have a solid alternative; getting our
current Lucene dependency upgraded should allow us to (eventually) merge
Anuj's work into the mainstream of development. Someone please tell me if I
Let me reiterate that this seems like very good work and speaking for
myself, I certainly want to get it included into Jena. It's just a question
of fitting it in correctly, which might take a bit of time.
---
A. Soroka
The University of Virginia Library
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).
Post by Osma Suominen
In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having to
maintain your own separate module.
Post by Osma Suominen
-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be
structured, as long as I am able to use it :).
I have personal preference of doing it in a specific way because IMO,
it is
modular which makes it much easier to maintain in the long run. But
again
it may not be the quickest one.
Post by anuj kumar
I already have been given a deadline, by the company to have ES
extension
implemented in the next 15 days :). What this means is that I will be
Post by anuj kumar
maintaining the ES code extension to Jena Text at-least locally for a
coming period of time. I would be more than happy to contribute to Jena
community whatever is required to have a proper ElasticSearch
Implementation in place, whether within jena-text module or as a
separate
module. Till the time Lucene and Solr is not upgraded to the latest
Post by anuj kumar
version, I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Osma--
Post by A. Soroka
The short answer is that yes, given the right tools you _can_ have
different versions of code accessible in different ways. The longer
answer
is that it's probably not a viable alternative for Jena for this
Post by anuj kumar
problem,
at least not without a lot of other change.
Post by anuj kumar
Post by A. Soroka
You are right to point to the classloader mechanism as being at the
heart
of this question, but I must alter your remark just slightly. From "the
Post by anuj kumar
Post by A. Soroka
Java classloader only sees a single, flat package/class namespace and
a set
of compiled classes" to "ANY GIVEN Java classloader only sees a single,
Post by anuj kumar
Post by A. Soroka
flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict
module boundaries (and even dynamic module relationships at run-time).
Each
OSGi bundle sees its own classloader, and the framework is responsible
Post by anuj kumar
for
connecting bundles up to ensure that every bundle has what it needs in
Post by anuj kumar
the
way of types to function, based on metadata that the bundles provide
Post by anuj kumar
to the
framework. It's an incredibly powerful system (I use it every day and
Post by anuj kumar
enjoy
it enormously) but it's also very "heavy" and requires a good deal of
Post by anuj kumar
Post by A. Soroka
investment to use. In particular, it's probably too large to put
_inside_
Jena. (I frequently put Jena inside an OSGi instance, on the other
Post by anuj kumar
hand.)
Post by A. Soroka
Java 9 Jigsaw [1] offers some possibility for strong modularization of
this kind, but it's really meant for the JDK itself, not application
libraries. In theory, we could "roll our own" classloader management
for
this problem. That sounds like more than a bit of a rabbit hole to me.
Post by anuj kumar
Post by A. Soroka
There might be another, more lightweight, toolkit out there to this
purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that
for
Guava for now because of HADOOP-10101 (grumble grumble) but it's
Post by anuj kumar
hardly a
thing we want to do any more of than needed, I don't think.
Post by anuj kumar
Post by A. Soroka
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I
know Maven can perform a lot of tricks, but Maven modules are just
convenient ways to structure a Java project. Maven cannot change the
fact
that at runtime, module divisions don't really matter (except that they
Post by anuj kumar
Post by A. Soroka
usually correspond to package sub-namespaces) and the Java classloader
only
sees a single, flat package/class namespace and a set of compiled
Post by anuj kumar
classes
(usually within JARs) in the classpath that it needs to check to find
Post by anuj kumar
the
right classes, and if there are two versions of the same library (eg
Post by anuj kumar
Post by A. Soroka
Lucene) with overlapping class names, that's going to cause trouble.
The
only way around that is to shade some of the libraries, i.e. rename
Post by anuj kumar
them so
that they end up in another, non-conflicting namespace. Apparently
Post by anuj kumar
Post by A. Soroka
Elasticsearch also did some of that in the past [1] but nowadays tries
to
avoid it.
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Does your assumption 1 ("At a given point in time, only a single
Indexing Technology is used") imply that in the assembler
configuration,
you cannot have ja:loadClass declarations for both Lucene and ES
Post by anuj kumar
backends?
Or how do you run something like Fuseki that contains (in a single big
Post by anuj kumar
JAR)
both the jena-text and jena-text-es modules with all their
Post by anuj kumar
dependencies,
one of which requires the Lucene 4.x classes and the other one the
Post by anuj kumar
Lucene
6.4.1 classes? How do you ensure that only one of them is used at a
Post by anuj kumar
time,
and that the Java classloader, even though it has access to both
Post by anuj kumar
versions
of Lucene, only loads classes from the single, correct one and not the
Post by anuj kumar
Post by A. Soroka
other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES"
packages, so that you don't end up with two Lucene versions within the
same
Fuseki JAR?
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks
and
balance the refactoring without affecting the existing modules. But I
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
will
not delve into those now. I am not an expert in Jena to convincingly
say
that it is possible, without any hiccups. But I can take a guess and
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
say
that it is indeed possible :)
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
For the question: "is it even possible to mix modules that depend on
different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I
assume
you
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
mean having jena-text and jena-text-es as dependencies in a build
without
causing the build to conflict. If that is what you mean than the
answer
is
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
yes it is possible and quite simple as well. Let me explain how it is
Post by anuj kumar
possible. But before that some assumption which I want to call out
explicitly.
*Assumption:*
1. At a given point in time, only a single Indexing Technology is
used
for
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
text based indexing and searching via Jean. What this means is that
we
will
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
either use Lucene Implementation OR Solr Implementation OR ES
Post by anuj kumar
Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes
but
only on jena-text classes, if at all.
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
Based on these assumptions it is possible to create a build that
contains
jena-text based common classes + ES specific classes without any
Post by anuj kumar
compatibility issues. And it is infact quite simple. I did it in the
current jena-text-es module and ran the entire build which succeeded.
The key is to include the latest Lucene dependencies at the very
beginning
in the pom and then include jena-text dependency. Maven will then
Post by anuj kumar
automatically resolve the dependency issues by including the Lucene
librarires that we included in our es specific pom. Have a look the
pom
of
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Hi Anuj,
Post by Osma Suominen
I understand your concerns. However, we also need to balance between
the
needs of individual modules/features and the whole codebase. I'm
Post by anuj kumar
willing to
put in the effort to keep the other modules up to date with newer
Post by anuj kumar
Lucene
versions. Lucene upgrade requirements are well documented, the only
Post by anuj kumar
hitches
seen in JENA-1250 were related to how jena-text (ab)used some Lucene
Post by anuj kumar
Post by Osma Suominen
features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it
even
possible to mix modules that depend on different versions of the
Post by anuj kumar
Lucene
libraries within the same project? In my (quite limited)
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
understanding
of
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Java projects and libraries, this requires special arrangements
Post by anuj kumar
(e.g.
shading) as the Java package/class namespace is shared by all the
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
code
running within the same JVM.
Post by anuj kumar
Post by A. Soroka
Post by Osma Suominen
Post by anuj kumar
Post by Osma Suominen
So can you create, say, a Fuseki build that contains the current
jena-text
module (depending on Lucene 4.x) and the new jena-text-es module
Post by anuj kumar
(depending
on Lucene 6.4.1) without any compatibility issues?
Post by anuj kumar
Post by Osma Suominen
-Osma
Post by anuj kumar
Hi,
The reason I proposed to have separate modules for Lucene, Solr and ES is exactly to avoid the "All or Nothing" approach we need to take if we club them all together. If they stay together and in the near future I want to upgrade ES to another version, I also need to upgrade Lucene and Solr again, and possibly another implementation that may have been added in the meantime. As we all know, this means weeks of work, if not months, to get the changes released. This will personally de-motivate me to do anything, and I will probably start maintaining my own version of Jena-Text, as that would be much simpler than upgrading and testing and, in the process, owning (read: fixing bugs in) the upgrade for each and every technology.
If they are developed as separate modules, they can evolve independently of each other, and we can avoid situations where we can't upgrade to the latest version of Lucene because we do not know what effect it will have on the Solr implementation.
We can start with a separate module for Jena Text ES and see how things go. If they go well, we could extract Solr and Lucene out of Jena Text.
Again, this is just a suggestion based on my limited industry experience.
Thanks,
Anuj Kumar
On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <
Post by A. Soroka
https://lists.apache.org/thread.html/dce0d502b11891c28e57bbc
apache.org
%3E
? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
I certainly wouldn't object to that, but somebody has to volunteer to do the actual work!
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
Last time I tried it (it was a while ago) I couldn't figure out how to get it running... If you could just try that with some toy data, then your data point would be a lot less blurry :) I haven't used Solr for anything, so I'm not very familiar with how to set it up, and the jena-text instructions are pretty vague unfortunately.
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
--
*Anuj Kumar*
anuj kumar
2017-03-03 14:43:52 UTC
Permalink
Hey,
I just saw https://issues.apache.org/jira/browse/JENA-1301
Should we not first officially deprecate it and give any users of Solr a
chance to move to a different indexing technology?

BTW, I don't know yet how to log in to the Apache JIRA.

Thanks,
Anuj Kumar
Post by anuj kumar
Hi Osma,
I briefly looked at the pull request. I believe we need to upgrade Lucene
and Solr in one go, don't we? The reason being that Solr 4.9.1 depends on
Lucene 4.9.1.
Also, how do I log into issues.apache.org, and where do I file this bug?
Thanks,
Anuj Kumar
Post by Osma Suominen
Hi Anuj,
It's great that we found agreement over this!
I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and
made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4
as an intermediate step). I'll wait for comments on the PR and if people
think it's OK I will merge it soon to Jena master. Meanwhile, you can
already base your ES implementation on that branch [2] if you like.
Could you please open a JIRA issue on issues.apache.org explaining the
Elasticsearch support feature, so that we have a place for tracking this
work, request comments etc.
Also I suggest we move the discussion around this to the developers' list
-Osma
[1] https://github.com/apache/jena/pull/219
[2] https://github.com/osma/jena/tree/jena-1250-lucene6
Post by anuj kumar
I second that. I am now finalising the integration of ES and should have a
good production quality implementation ready in a week's time. At that
time I would want you guys to have a look at the implementation and provide
feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the
code into the jena-text module and do a round of testing.
Thanks,
Anuj Kumar
Post by A. Soroka
I do agree that trying to juggle different versions of Lucene libraries is
probably not a realistic option right now. Luckily (if I understand the
conversation thus far correctly) we have a solid alternative; getting our
current Lucene dependency upgraded should allow us to (eventually) merge
Anuj's work into the mainstream of development. Someone please tell me if I
Let me reiterate that this seems like very good work and speaking for
myself, I certainly want to get it included into Jena. It's just a question
of fitting it in correctly, which might take a bit of time.
---
A. Soroka
The University of Virginia Library
Post by Osma Suominen
Hi Anuj!
I have nothing against modularity in general. However, I cannot see how
your proposal could work in practice for the Fuseki build, due to the
reasons I mentioned in my previous message (and Adam seemed to concur).
In any case, I'll see what I can do to get the Lucene upgrade moving
again. If all current Jena modules (ie jena-text and jena-spatial) were
upgraded to Lucene 6.4.1, then you could just add your ES classes to
jena-text, right? I think that would be better for everyone than having to
maintain your own separate module.
-Osma
Post by anuj kumar
I personally have no preference as to how the code in Jena should be structured, as long as I am able to use it :).
I have a personal preference for doing it in a specific way because, IMO, it is modular, which makes it much easier to maintain in the long run. But again, it may not be the quickest one.
I have already been given a deadline by the company to have the ES extension implemented in the next 15 days :). What this means is that I will be maintaining the ES code extension to Jena Text at least locally for the coming period. I would be more than happy to contribute to the Jena community whatever is required to have a proper ElasticSearch implementation in place, whether within the jena-text module or as a separate module. Until Lucene and Solr are upgraded to the latest version, I will have to maintain a separate module for jena-text-es.
Cheers!
Anuj Kumar
Post by A. Soroka
Osma--
The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.
You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".
This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there to this purpose, but I'm not aware of any myself.
Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.
---
A. Soroka
The University of Virginia Library
[1] http://openjdk.java.net/projects/jigsaw/
Post by Osma Suominen
Hi Anuj!
Thanks for the clarification.
However, I'm still not sure I understand the situation completely. I know Maven can perform a lot of tricks, but Maven modules are just convenient ways to structure a Java project. Maven cannot change the fact that at runtime, module divisions don't really matter (except that they usually correspond to package sub-namespaces) and the Java classloader only sees a single, flat package/class namespace and a set of compiled classes (usually within JARs) in the classpath that it needs to check to find the right classes, and if there are two versions of the same library (eg Lucene) with overlapping class names, that's going to cause trouble. The only way around that is to shade some of the libraries, i.e. rename them so that they end up in another, non-conflicting namespace. Apparently Elasticsearch also did some of that in the past [1] but nowadays tries to avoid it.
Does your assumption 1 ("At a given point in time, only a single Indexing Technology is used") imply that in the assembler configuration, you cannot have ja:loadClass declarations for both Lucene and ES backends? Or how do you run something like Fuseki that contains (in a single big JAR) both the jena-text and jena-text-es modules with all their dependencies, one of which requires the Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How do you ensure that only one of them is used at a time, and that the Java classloader, even though it has access to both versions of Lucene, only loads classes from the single, correct one and not the other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with two Lucene versions within the same Fuseki JAR?
-Osma
[1] https://www.elastic.co/blog/to-shade-or-not-to-shade
Post by anuj kumar
Hi Osma,
I understand what you are saying. There are ways to mitigate risks and balance the refactoring without affecting the existing modules. But I will not delve into those now. I am not enough of an expert in Jena to say convincingly that it is possible without any hiccups. But I can take a guess and say that it is indeed possible :)
For the question: "is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project?"
I actually do not understand what you mean by mixing modules. I assume you mean having jena-text and jena-text-es as dependencies in a build without causing the build to conflict. If that is what you mean, then the answer is yes, it is possible and quite simple as well. Let me explain how it is possible. But before that, some assumptions which I want to call out explicitly.
*Assumption:*
1. At a given point in time, only a single Indexing Technology is used for text based indexing and searching via Jena. What this means is that we will either use Lucene Implementation OR Solr Implementation OR ES Implementation at any given point in time.
2. Fuseki build does not depend on any Lucene 4.9.1 specific classes but only on jena-text classes, if at all.
Based on these assumptions it is possible to create a build that contains jena-text based common classes + ES specific classes without any compatibility issues. And it is in fact quite simple. I did it in the current jena-text-es module and ran the entire build, which succeeded.
The key is to include the latest Lucene dependencies at the very beginning in the pom and then include the jena-text dependency. Maven will then automatically resolve the dependency issues by including the Lucene libraries that we included in our ES specific pom. Have a look at the pom:
https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
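For readers unfamiliar with how Maven picks one version when the same artifact appears more than once in a dependency tree, here is a toy model of its "nearest definition wins" mediation (ties at equal depth go to the first declaration). It is only a sketch, not Maven's actual code: the class `NearestWins`, the helper `Dep`, the simplified tree and the `example` version strings are all hypothetical, and the real tree of the pom linked above may look different.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Maven's "nearest definition wins" mediation; not Maven's actual code. */
public class NearestWins {

    /** One node of a dependency tree: artifact id, version, and its own dependencies. */
    public static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, List<Dep> children) {
            this.artifact = artifact;
            this.version = version;
            this.children = children;
        }

        public static Dep of(String artifact, String version, Dep... children) {
            return new Dep(artifact, version, Arrays.asList(children));
        }
    }

    /**
     * Walks the tree breadth-first, so shallower declarations are visited first.
     * The first version seen for an artifact is kept; later ones are ignored.
     */
    public static Map<String, String> resolve(List<Dep> direct) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(direct);
        while (!queue.isEmpty()) {
            Dep d = queue.removeFirst();
            if (chosen.putIfAbsent(d.artifact, d.version) == null) {
                // Only the winning version's own dependencies are followed.
                queue.addAll(d.children);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical, simplified tree: Lucene declared directly (depth 1),
        // and again transitively through jena-text (depth 2).
        List<Dep> pom = Arrays.asList(
            Dep.of("lucene-core", "6.4.1"),
            Dep.of("jena-text", "example", Dep.of("lucene-core", "4.9.1")));
        System.out.println(resolve(pom));
    }
}
```

In this model the directly declared `lucene-core` 6.4.1 is selected and the 4.9.1 version pulled in through jena-text is dropped, so only one Lucene version ends up in the build. Any jena-text class that was compiled against the 4.x API would then run against 6.4.1, so a successful build does not by itself show runtime compatibility.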
Thanks,
Anuj Kumar
On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <
Post by Osma Suominen
Hi Anuj,
I understand your concerns. However, we also need to balance between the needs of individual modules/features and the whole codebase. I'm willing to put in the effort to keep the other modules up to date with newer Lucene versions. Lucene upgrade requirements are well documented, the only hitches seen in JENA-1250 were related to how jena-text (ab)used some Lucene features that were dropped from newer versions.
A perhaps stupid question to more experienced Java developers: is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project? In my (quite limited) understanding of Java projects and libraries, this requires special arrangements (e.g. shading) as the Java package/class namespace is shared by all the code running within the same JVM.
So can you create, say, a Fuseki build that contains the current jena-text module (depending on Lucene 4.x) and the new jena-text-es module (depending on Lucene 6.4.1) without any compatibility issues?
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
--
*Anuj Kumar*
A. Soroka
2017-03-01 15:36:23 UTC
Permalink
Post by A. Soroka
? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
I certainly wouldn't object to that, but somebody has to volunteer to do the actual work!
Yes, you are right, and I admit I haven't got anything like the time for it now.
Post by A. Soroka
I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
Last time I tried it (it was a while ago) I couldn't figure out how to get it running... If you could just try that with some toy data, then your data point would be a lot less blurry :) I haven't used Solr for anything, so I'm not very familiar with how to set it up, and the jena-text instructions are pretty vague unfortunately.
I will try to perform a test sometime in the next week or so. Hopefully I will at least get it running. If not, then we maybe don't need to worry about it so much! :grin:

ajs6f