Discussion:
query by insertion order
Marijn Schraagen
2016-12-12 09:01:01 UTC
I have Fuseki server 2.4.0 running. New triples are added using INSERT
queries over the SPARQL endpoint. It turns out that it would be
convenient for me to know the order in which the insert queries have
been performed. For new queries, I could add a timestamp somewhere, but
for already existing triples I would like to know the insertion order as
well.
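
The timestamp idea for new inserts could look roughly like the sketch below. It is not something Fuseki does for you: the predicate `http://example/insertedAt` and the helper name `timestamped_insert` are hypothetical, and stamping the subject only identifies a single insert if each subject is inserted once. `NOW()` is the standard SPARQL 1.1 function for the current dateTime.

```python
# Hypothetical predicate used to record when a subject was inserted.
STAMP = "<http://example/insertedAt>"


def timestamped_insert(s, p, o, stamp_predicate=STAMP):
    """Build a SPARQL Update that adds one triple and stamps its subject with NOW()."""
    return (
        "INSERT {\n"
        f"  {s} {p} {o} .\n"
        f"  {s} {stamp_predicate} ?now .\n"
        "}\n"
        "WHERE { BIND(NOW() AS ?now) }"
    )


# Update text to send to the SPARQL update endpoint.
update = timestamped_insert(
    "<http://example/s1>", "<http://example/p>", "<http://example/o1>"
)

# Query text that reads the subjects back in recorded order.
ordered_query = (
    "SELECT ?s ?when\n"
    f"WHERE {{ ?s {STAMP} ?when }}\n"
    "ORDER BY ?when"
)

print(update)
print(ordered_query)
```

With an explicit timestamp, the order comes from ORDER BY rather than from whatever the store happens to return.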

I know that a query on a triple store is not intended to guarantee order
of results. However, when I perform a simple 'SELECT ?s ?p ?o WHERE { ?s
?p ?o }' then the triples always come out in the same order.
Furthermore, this order appears to be the insertion order.

My question is: how is a SELECT query, without an ORDER BY clause,
implemented? Does it actually return results in insertion order? Does it
matter whether or not UPDATE and DELETE queries are performed, on these
triples, or on unrelated triples?

Thanks in advance, Marijn
Andy Seaborne
2016-12-12 13:24:58 UTC
Post by Marijn Schraagen
I have Fuseki server 2.4.0 running. New triples are added using INSERT
queries over the SPARQL endpoint. It turns out that it would be
convenient for me to know the order in which the insert queries have
been performed. For new queries, I could add a timestamp somewhere, but
for already existing triples I would like to know the insertion order as
well.
I know that a query on a triple store is not intended to guarantee order
of results. However, when I perform a simple 'SELECT ?s ?p ?o WHERE { ?s
?p ?o }' then the triples always come out in the same order.
Furthermore, this order appears to be the insertion order.
depends :-)

Are you using TDB? The same general points apply to the in-memory graph
as well, but it uses less predictable internal ids.

Things will come out clustered by subject.

Try:
PREFIX ex: <http://example/>

ex:s1 ex:p ex:o1 .
ex:s2 ex:p ex:o2 .
ex:s1 ex:p ex:o3 .

Note that ex:s1 appears in two places, before and after ex:s2.

From TDB:

------------------------------------------------------------------
| s                   | p                  | o                   |
==================================================================
| <http://example/s1> | <http://example/p> | <http://example/o1> |
| <http://example/s1> | <http://example/p> | <http://example/o3> |
| <http://example/s2> | <http://example/p> | <http://example/o2> |
------------------------------------------------------------------

The two ex:s1 triples come out together.

WHERE { ?s ?p ?o }

will use the SPO index so things come out in subject clusters.

In real data, all the triples for the same subject often arrive together,
which hides the fact that the order is unpredictable.

In TDB, the internal id is an increasing number (currently; it may well
change in a future major revision) and the index scan is from low to high.
An id is allocated whenever a new item is seen, and the previous one is
reused if the item has been seen before (ex:s1).

So data where subjects are grouped tends to come out in order, but that is
a side effect: it is not guaranteed and does not happen in all cases.
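
The clustering described above can be reproduced with a small toy model. This is only a sketch under assumptions taken from the explanation: ids are handed out in first-seen order (subject, then predicate, then object), and the SPO index is scanned from low to high id. `ToyStore` and its methods are made-up names for illustration; they are not TDB's API or implementation.

```python
from itertools import count


class ToyStore:
    """Toy model of increasing node ids plus a sorted SPO index (not TDB code)."""

    def __init__(self):
        self._ids = {}        # term -> internal id
        self._terms = {}      # internal id -> term
        self._next = count(1)
        self._spo = set()     # (subject id, predicate id, object id)

    def _node_id(self, term):
        # Allocate a new id the first time a term is seen; reuse it afterwards.
        if term not in self._ids:
            node_id = next(self._next)
            self._ids[term] = node_id
            self._terms[node_id] = term
        return self._ids[term]

    def add(self, s, p, o):
        self._spo.add((self._node_id(s), self._node_id(p), self._node_id(o)))

    def scan_spo(self):
        # Scan the index from low to high id and map ids back to terms.
        return [tuple(self._terms[i] for i in key) for key in sorted(self._spo)]


# The example from this post: ex:s1 appears before and after ex:s2.
mixed = ToyStore()
mixed.add("ex:s1", "ex:p", "ex:o1")
mixed.add("ex:s2", "ex:p", "ex:o2")
mixed.add("ex:s1", "ex:p", "ex:o3")

# Subjects that each appear once: the scan happens to match insertion order.
grouped = ToyStore()
grouped.add("ex:s42", "ex:p", "ex:a")
grouped.add("ex:s7", "ex:p", "ex:b")
grouped.add("ex:s19", "ex:p", "ex:a")

for row in mixed.scan_spo():
    print(row)
for row in grouped.scan_spo():
    print(row)
```

In the first store ex:s1 reuses its earlier id, so its second triple jumps ahead of ex:s2. In the second store every subject gets a fresh, larger id, which is why the output looks like insertion order even though nothing guarantees it.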

It also relies on the parser.

Another example:

ex:s9 ex:p ex:o ;
ex:p ( 1 2 3) .

This comes out of the parser in a mixed order, then in a different order
again once loaded and queried.

All of the pretty output formats sort the data.
Post by Marijn Schraagen
My question is: how is a SELECT query, without an ORDER BY clause,
implemented?
Looking in an index ... then maybe some joins.
Post by Marijn Schraagen
Does it actually return results in insertion order?
No.
Post by Marijn Schraagen
Does it
matter whether or not UPDATE and DELETE queries are performed, on these
triples, or on unrelated triples?
Yes, though more so for "?s <predicate> value" which uses the OSP index.

O can be an encoded value (integer etc.) as well as a pointer to a URI.
Post by Marijn Schraagen
Thanks in advance, Marijn
Andy
Marijn Schraagen
2016-12-14 10:34:05 UTC
Hi Andy,

Thanks for your fast, clear and extensive reply. I understand much
better now the various mechanisms that influence the query result order.
I noticed myself that adding WHERE conditions or a VALUES clause changes
the order, presumably because of the joins etc. However, I do think
that, based on what you explained, in my particular situation I could
retrieve the original insertion order, without guarantees of course.
Could you please confirm? Thanks in advance.

The situation is as follows: I'm using TDB. My subject names are unique.
To simplify, assume they are called #subject-n with 'n' a unique random
(non-sequential and unordered) number. Therefore, each newly inserted
triple gets assigned a new id in TDB, if I understand correctly. The
possible values for ?p and ?o are from a small fixed set (i.e., not
unique, each value repeated many times).

Now, performing the following query:

SELECT ?s ?p ?o
WHERE {
?s ?p ?o .
}

will give me, most likely, the triples in insertion order. Is that
correct? (this is my actual situation, so I would be happy if it did).

Another question: I would like to use OFFSET (although it is not
necessary). Could this, in any way, influence the order of the results?

Thanks again,

Marijn