Michael Brunnbauer
2015-06-02 13:58:41 UTC
Hi all,
I have performance problems with queries using property paths on a Fuseki
2.0.0 TDB with half a billion triples from Wikidata. Random disk access does
not seem to be the cause: I use an SSD and see low IO tps values during queries
but high CPU usage. I tried with and without the automatically generated
stats.opt.
Counting all birds takes about 8 s on every run after the first (no disk
access, everything in memory):
select count(*) where {
?d1 ( <http://www.wikidata.org/entity/P171s> / <http://www.wikidata.org/entity/P171v> )+ <http://www.wikidata.org/entity/Q5113>
}
Counting all beetles does not seem to finish:
select count(*) where {
?d1 ( <http://www.wikidata.org/entity/P171s> / <http://www.wikidata.org/entity/P171v> )+ <http://www.wikidata.org/entity/Q22671>
}
For both queries I tried with and without stats.opt and also with inverse
paths (^property), without success.
I guess this is not the "Counting Beyond a Yottabyte" problem?
http://www.w3.org/blog/SW/2012/04/19/no-more-counting-beyond-a-yottabyte-or-why-the-w3c-process-works/
https://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0003.html
If I do a count(distinct ?d1) in the Bird query, I get the same number so I
guess that the + makes the query "non-counting".
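To illustrate what I mean by "non-counting", here is a minimal Python sketch of
how I understand the + operator: it yields the set of nodes that can reach the
target, not one solution per route. The toy graph and the function name
one_or_more are made up for illustration, and this is not meant to describe how
Jena/ARQ actually evaluates paths.

```python
from collections import deque


def one_or_more(edges, target):
    """Return the set of nodes that reach `target` via one or more edges.

    `edges` is an iterable of (subject, object) pairs, standing in for the
    composed step P171s / P171v. This is a hypothetical helper, not an ARQ API.
    """
    # Index edges by object so we can walk backwards from the target.
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, set()).add(src)

    seen = set()            # each reachable node is recorded once
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for src in incoming.get(node, ()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


# Toy taxonomy: "sparrow" reaches "bird" by two different routes.
edges = [
    ("sparrow", "passerine"),
    ("passerine", "bird"),
    ("sparrow", "bird"),
    ("eagle", "bird"),
]
result = one_or_more(edges, "bird")
print(len(result), sorted(result))
```

Under these set semantics "sparrow" appears once although two routes lead to
"bird", which would explain why count(*) and count(distinct ?d1) agree.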
Any idea if this slow performance is to be expected and why?
Regards,
Michael Brunnbauer
--
++ Michael Brunnbauer
++ netEstate GmbH
++ Geisenhausener Straße 11a
++ 81379 München
++ Tel +49 89 32 19 77 80
++ Fax +49 89 32 19 77 89
++ E-Mail ***@netestate.de
++ http://www.netestate.de/
++
++ Sitz: München, HRB Nr.142452 (Handelsregister B München)
++ USt-IdNr. DE221033342
++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel