Osma Suominen
2017-03-09 14:48:47 UTC
Hi,
I want to report a performance regression I found. It was probably
introduced by a change to the query optimizer during Jena 3.1.1
development. The change itself may be rather benign, but the result was
a severe performance regression in my application.
With YSO [1] as data loaded into TDB, this query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  <http://www.yso.fi/onto/yso/p8627> ?p ?o .
  OPTIONAL {
    { ?p rdfs:subPropertyOf ?pp }
    UNION
    { ?o a ?ot }
  }
}
takes about 300 ms on Jena 3.2.0, while it took only around 25 ms on
Jena 3.1.0.
The fix was to separate the single OPTIONAL block into two:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  <http://www.yso.fi/onto/yso/p8627> ?p ?o .
  OPTIONAL { ?p rdfs:subPropertyOf ?pp }
  OPTIONAL { ?o a ?ot }
}
With this change, both Jena versions execute the query in around 25 ms.
You may wonder why I had a query like that in the first place. This is
not the actual query I started with; the original is a much more complex
CONSTRUCT query with many UNIONs within the OPTIONAL block (see [2]).
The important thing was to separate the OPTIONAL block dealing with ?p
from the OPTIONAL block dealing with ?o. As long as a block only deals
with one variable from the pattern above, it may contain multiple
UNIONs. In fact it makes sense to use UNIONs there, to avoid internal
cross products and combinatorial explosion when there are multiple
solutions for each pattern.
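To illustrate the cross-product point, here is a rough sketch in Python (not SPARQL, so the row counts can be checked without a triple store). It is not how Jena evaluates anything; it only counts result rows under the standard OPTIONAL and UNION semantics. The function names and the match counts (3 and 4) are made up for the example, and it assumes each branch binds its own fresh variable.

```python
def rows_with_union(left_rows, branch_counts):
    """Row count for: pattern OPTIONAL { b1 UNION b2 UNION ... }."""
    total = 0
    for row in left_rows:
        # UNION concatenates the branch solutions, so the counts add up.
        matches = sum(branch_counts(row))
        # OPTIONAL keeps the left row once when nothing matches.
        total += max(1, matches)
    return total


def rows_with_separate_optionals(left_rows, branch_counts):
    """Row count for: pattern OPTIONAL { b1 } OPTIONAL { b2 } ..."""
    total = 0
    for row in left_rows:
        combos = 1
        for n in branch_counts(row):
            # Each OPTIONAL extends every row built so far, so counts multiply.
            combos *= max(1, n)
        total += combos
    return total


def example_counts(row):
    # Hypothetical: every left row has 3 matches in one branch and 4 in the other.
    return [3, 4]


left = ["row1", "row2"]
union_rows = rows_with_union(left, example_counts)
separate_rows = rows_with_separate_optionals(left, example_counts)
print(union_rows, separate_rows)  # prints: 14 24
```

With two left rows, the UNION form gives 2 * (3 + 4) = 14 rows, while separate OPTIONALs give 2 * (3 * 4) = 24, and the gap widens quickly as more patterns are added.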
-Osma
[1] http://api.finto.fi/download/yso/yso-skos.ttl
[2] https://github.com/NatLibFi/Skosmos/blob/master/model/sparql/GenericSparql.php#L404
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
***@helsinki.fi
http://www.nationallibrary.fi