Duplicate entries produced by SPARQL query

Discussion:

lookman sanni

2016-11-15 00:09:33 UTC

Hi,

I have the following RDF:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://www.somelink.org/ofdf#">
<rdf:Description rdf:about="
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678">
<j.0:creditedToCust>ABCDEFG1_5#XYZ1</j.0:creditedToCust>
<j.0:createdOnDate>ABCDEFG1_5#20151208</j.0:createdOnDate>
<j.0:consumedOnDate>ABCDEFG1_5#20151111</j.0:consumedOnDate>
<j.0:creditedFor>ABCDEFG1_5#124.50</j.0:creditedFor>
<j.0:createdOnDate>ABCDEFG2#20151104</j.0:createdOnDate>
<j.0:creditedToCust>ABCDEFG2#XYZ2</j.0:creditedToCust>
<j.0:creditedFor>ABCDEFG2#124.50</j.0:creditedFor>
<j.0:consumedOnDate>ABCDEFG2#20151111</j.0:consumedOnDate>
</rdf:Description>
</rdf:RDF>

and the following SPARQL query:

String qs1 = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> " +
"PREFIX ticket: <http://www.lookouster.org/ofdf#> " +
"SELECT ?lticket ?type ?bf1 ?cust1 ?tAmount1 ?dtCreation1 ?dtTravel1 ?bf2
?cust2 ?tAmount2 ?dtCreation2 ?dtTravel2 " +
"WHERE { " +
"?lticket ticket:creditedToCust ?_cust1 .\n" +
"?lticket ticket:creditedToCust ?_cust2 .\n" +
"?lticket ticket:creditedFor ?_tAmount1 .\n" +
"?lticket ticket:creditedFor ?_tAmount2 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation1 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation2 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel1 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel2 .\n" +
"FILTER ( strafter( ?_cust1, '#') != strafter( ?_cust2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_tAmount1, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtCreation1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtTravel1, '#') ) .\n"
+
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_tAmount2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtCreation2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtTravel2, '#') ) .\n"
+
"FILTER ( strafter( ?_tAmount1, '#' ) = strafter( ?_tAmount2, '#' ) )
.\n" +
"bind(strafter( ?_cust1, '#' ) as ?cust1) .\n" +
"bind(strafter( ?_cust2, '#' ) as ?cust2) .\n" +
"bind(strbefore( ?_cust1, '#' ) as ?bf1) .\n" +
"bind(strbefore( ?_cust2, '#' ) as ?bf2) .\n" +
"bind(strafter( ?_tAmount1, '#' ) as ?tAmount1) .\n" +
"bind(strafter( ?_tAmount2, '#' ) as ?tAmount2) .\n" +
"bind(strafter( ?_dtCreation1, '#' ) as ?dtCreation1) .\n" +
"bind(strafter( ?_dtCreation2, '#' ) as ?dtCreation2) .\n" +
"bind(strafter( ?_dtTravel1, '#' ) as ?dtTravel1) .\n" +
"bind(strafter( ?_dtTravel2, '#' ) as ?dtTravel2) .\n" +
"values ?type { '2x Credit Notes' } .\n" +
"} ORDER BY ?lticket ?dtCreation1 " ;

The query ran well, but return the following 2 lines:

lticket,type,bf1,cust1,tAmount1,dtCreation1,dtTravel1,bf2,cust2,tAmount2,dtCreation2,dtTravel2
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG2,XYZ2,124.50,20151104,20151111,ABCDEFG1,XYZ1,124.50,20151208,20151111
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG1,XYZ1,124.50,20151208,20151111,ABCDEFG2,XYZ2,124.50,20151104,20151111

How do I manage to only output one line, given that the two lines are just
inverted ?

Thank you for your help.

--
Best Regards

Lookman SANNI

Brian McBride

2016-11-15 08:27:50 UTC

Permalink

Would adding a filter

FILTER (?bf1 <= ?bf2)

or similar do the trick?

Brian

Post by lookman sanni
Hi,
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://www.somelink.org/ofdf#">
<rdf:Description rdf:about="
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678">
<j.0:creditedToCust>ABCDEFG1_5#XYZ1</j.0:creditedToCust>
<j.0:createdOnDate>ABCDEFG1_5#20151208</j.0:createdOnDate>
<j.0:consumedOnDate>ABCDEFG1_5#20151111</j.0:consumedOnDate>
<j.0:creditedFor>ABCDEFG1_5#124.50</j.0:creditedFor>
<j.0:createdOnDate>ABCDEFG2#20151104</j.0:createdOnDate>
<j.0:creditedToCust>ABCDEFG2#XYZ2</j.0:creditedToCust>
<j.0:creditedFor>ABCDEFG2#124.50</j.0:creditedFor>
<j.0:consumedOnDate>ABCDEFG2#20151111</j.0:consumedOnDate>
</rdf:Description>
</rdf:RDF>
String qs1 = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> " +
"PREFIX ticket: <http://www.lookouster.org/ofdf#> " +
"SELECT ?lticket ?type ?bf1 ?cust1 ?tAmount1 ?dtCreation1 ?dtTravel1 ?bf2
?cust2 ?tAmount2 ?dtCreation2 ?dtTravel2 " +
"WHERE { " +
"?lticket ticket:creditedToCust ?_cust1 .\n" +
"?lticket ticket:creditedToCust ?_cust2 .\n" +
"?lticket ticket:creditedFor ?_tAmount1 .\n" +
"?lticket ticket:creditedFor ?_tAmount2 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation1 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation2 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel1 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel2 .\n" +
"FILTER ( strafter( ?_cust1, '#') != strafter( ?_cust2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_tAmount1, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtCreation1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtTravel1, '#') ) .\n"
+
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_tAmount2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtCreation2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtTravel2, '#') ) .\n"
+
"FILTER ( strafter( ?_tAmount1, '#' ) = strafter( ?_tAmount2, '#' ) )
.\n" +
"bind(strafter( ?_cust1, '#' ) as ?cust1) .\n" +
"bind(strafter( ?_cust2, '#' ) as ?cust2) .\n" +
"bind(strbefore( ?_cust1, '#' ) as ?bf1) .\n" +
"bind(strbefore( ?_cust2, '#' ) as ?bf2) .\n" +
"bind(strafter( ?_tAmount1, '#' ) as ?tAmount1) .\n" +
"bind(strafter( ?_tAmount2, '#' ) as ?tAmount2) .\n" +
"bind(strafter( ?_dtCreation1, '#' ) as ?dtCreation1) .\n" +
"bind(strafter( ?_dtCreation2, '#' ) as ?dtCreation2) .\n" +
"bind(strafter( ?_dtTravel1, '#' ) as ?dtTravel1) .\n" +
"bind(strafter( ?_dtTravel2, '#' ) as ?dtTravel2) .\n" +
"values ?type { '2x Credit Notes' } .\n" +
"} ORDER BY ?lticket ?dtCreation1 " ;
lticket,type,bf1,cust1,tAmount1,dtCreation1,dtTravel1,bf2,cust2,tAmount2,dtCreation2,dtTravel2
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG2,XYZ2,124.50,20151104,20151111,ABCDEFG1,XYZ1,124.50,20151208,20151111
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG1,XYZ1,124.50,20151208,20151111,ABCDEFG2,XYZ2,124.50,20151104,20151111
How do I manage to only output one line, given that the two lines are just
inverted ?
Thank you for your help.

lookman sanni

2016-11-15 21:51:59 UTC

Permalink

That's what I actually implemented. I was hoping there would be something
sexier.

Thanks though Brian.

Post by Brian McBride
Would adding a filter
FILTER (?bf1 <= ?bf2)
or similar do the trick?
Brian

Post by lookman sanni
Hi,
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://www.somelink.org/ofdf#">
<rdf:Description rdf:about="
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678">
<j.0:creditedToCust>ABCDEFG1_5#XYZ1</j.0:creditedToCust>
<j.0:createdOnDate>ABCDEFG1_5#20151208</j.0:createdOnDate>
<j.0:consumedOnDate>ABCDEFG1_5#20151111</j.0:consumedOnDate>
<j.0:creditedFor>ABCDEFG1_5#124.50</j.0:creditedFor>
<j.0:createdOnDate>ABCDEFG2#20151104</j.0:createdOnDate>
<j.0:creditedToCust>ABCDEFG2#XYZ2</j.0:creditedToCust>
<j.0:creditedFor>ABCDEFG2#124.50</j.0:creditedFor>
<j.0:consumedOnDate>ABCDEFG2#20151111</j.0:consumedOnDate>
</rdf:Description>
</rdf:RDF>
String qs1 = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> " +
"PREFIX ticket: <http://www.lookouster.org/ofdf#> " +
"SELECT ?lticket ?type ?bf1 ?cust1 ?tAmount1 ?dtCreation1 ?dtTravel1 ?bf2
?cust2 ?tAmount2 ?dtCreation2 ?dtTravel2 " +
"WHERE { " +
"?lticket ticket:creditedToCust ?_cust1 .\n" +
"?lticket ticket:creditedToCust ?_cust2 .\n" +
"?lticket ticket:creditedFor ?_tAmount1 .\n" +
"?lticket ticket:creditedFor ?_tAmount2 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation1 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation2 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel1 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel2 .\n" +
"FILTER ( strafter( ?_cust1, '#') != strafter( ?_cust2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_tAmount1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtCreation1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtTravel1, '#') )
.\n"
+
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_tAmount2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtCreation2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtTravel2, '#') )
.\n"
+
"FILTER ( strafter( ?_tAmount1, '#' ) = strafter( ?_tAmount2, '#' ) )
.\n" +
"bind(strafter( ?_cust1, '#' ) as ?cust1) .\n" +
"bind(strafter( ?_cust2, '#' ) as ?cust2) .\n" +
"bind(strbefore( ?_cust1, '#' ) as ?bf1) .\n" +
"bind(strbefore( ?_cust2, '#' ) as ?bf2) .\n" +
"bind(strafter( ?_tAmount1, '#' ) as ?tAmount1) .\n" +
"bind(strafter( ?_tAmount2, '#' ) as ?tAmount2) .\n" +
"bind(strafter( ?_dtCreation1, '#' ) as ?dtCreation1) .\n" +
"bind(strafter( ?_dtCreation2, '#' ) as ?dtCreation2) .\n" +
"bind(strafter( ?_dtTravel1, '#' ) as ?dtTravel1) .\n" +
"bind(strafter( ?_dtTravel2, '#' ) as ?dtTravel2) .\n" +
"values ?type { '2x Credit Notes' } .\n" +
"} ORDER BY ?lticket ?dtCreation1 " ;
lticket,type,bf1,cust1,tAmount1,dtCreation1,dtTravel1,bf2,
cust2,tAmount2,dtCreation2,dtTravel2
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG2,XYZ2,124.50,20151104,20151111,ABCDEFG1,XYZ1,
124.50,20151208,20151111
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG1,XYZ1,124.50,20151208,20151111,ABCDEFG2,XYZ2,
124.50,20151104,20151111
How do I manage to only output one line, given that the two lines are just
inverted ?
Thank you for your help.

--
Best Regards

Lookman SANNI

Lorenz B.

2016-11-16 10:32:33 UTC

Permalink

No, that's the usual way how to avoid duplicates.

One comment for the future: Please

a) show the data in Turtle format instead of RDF/XML - RDF/XML is
verbose and difficult to read and Turtle is what SPARQL BGPs are made of.
b) show the "real" SPARQL query, i.e. not as Java String concatenation
- by the way, this can also be used to debug syntax errors more easily.

Both will help us to give faster and maybe also better support, although
in the current case it wasn't the case.

Cheers,
Lorenz

Post by lookman sanni
That's what I actually implemented. I was hoping there would be something
sexier.
Thanks though Brian.

Post by Brian McBride
Would adding a filter
FILTER (?bf1 <= ?bf2)
or similar do the trick?
Brian

Post by lookman sanni
Hi,
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://www.somelink.org/ofdf#">
<rdf:Description rdf:about="
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678">
<j.0:creditedToCust>ABCDEFG1_5#XYZ1</j.0:creditedToCust>
<j.0:createdOnDate>ABCDEFG1_5#20151208</j.0:createdOnDate>
<j.0:consumedOnDate>ABCDEFG1_5#20151111</j.0:consumedOnDate>
<j.0:creditedFor>ABCDEFG1_5#124.50</j.0:creditedFor>
<j.0:createdOnDate>ABCDEFG2#20151104</j.0:createdOnDate>
<j.0:creditedToCust>ABCDEFG2#XYZ2</j.0:creditedToCust>
<j.0:creditedFor>ABCDEFG2#124.50</j.0:creditedFor>
<j.0:consumedOnDate>ABCDEFG2#20151111</j.0:consumedOnDate>
</rdf:Description>
</rdf:RDF>
String qs1 = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
"PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> " +
"PREFIX ticket: <http://www.lookouster.org/ofdf#> " +
"SELECT ?lticket ?type ?bf1 ?cust1 ?tAmount1 ?dtCreation1 ?dtTravel1 ?bf2
?cust2 ?tAmount2 ?dtCreation2 ?dtTravel2 " +
"WHERE { " +
"?lticket ticket:creditedToCust ?_cust1 .\n" +
"?lticket ticket:creditedToCust ?_cust2 .\n" +
"?lticket ticket:creditedFor ?_tAmount1 .\n" +
"?lticket ticket:creditedFor ?_tAmount2 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation1 .\n" +
"?lticket ticket:createdOnDate ?_dtCreation2 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel1 .\n" +
"?lticket ticket:consumedOnDate ?_dtTravel2 .\n" +
"FILTER ( strafter( ?_cust1, '#') != strafter( ?_cust2, '#') ) .\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_tAmount1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtCreation1, '#') )
.\n" +
"FILTER ( strbefore( ?_cust1, '#') = strbefore( ?_dtTravel1, '#') )
.\n"
+
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_tAmount2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtCreation2, '#') )
.\n" +
"FILTER ( strbefore( ?_cust2, '#') = strbefore( ?_dtTravel2, '#') )
.\n"
+
"FILTER ( strafter( ?_tAmount1, '#' ) = strafter( ?_tAmount2, '#' ) )
.\n" +
"bind(strafter( ?_cust1, '#' ) as ?cust1) .\n" +
"bind(strafter( ?_cust2, '#' ) as ?cust2) .\n" +
"bind(strbefore( ?_cust1, '#' ) as ?bf1) .\n" +
"bind(strbefore( ?_cust2, '#' ) as ?bf2) .\n" +
"bind(strafter( ?_tAmount1, '#' ) as ?tAmount1) .\n" +
"bind(strafter( ?_tAmount2, '#' ) as ?tAmount2) .\n" +
"bind(strafter( ?_dtCreation1, '#' ) as ?dtCreation1) .\n" +
"bind(strafter( ?_dtCreation2, '#' ) as ?dtCreation2) .\n" +
"bind(strafter( ?_dtTravel1, '#' ) as ?dtTravel1) .\n" +
"bind(strafter( ?_dtTravel2, '#' ) as ?dtTravel2) .\n" +
"values ?type { '2x Credit Notes' } .\n" +
"} ORDER BY ?lticket ?dtCreation1 " ;
lticket,type,bf1,cust1,tAmount1,dtCreation1,dtTravel1,bf2,
cust2,tAmount2,dtCreation2,dtTravel2
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG2,XYZ2,124.50,20151104,20151111,ABCDEFG1,XYZ1,
124.50,20151208,20151111
http://www.somelink.org/ofdf#ticket/supplier-123/ticket-12345678,2x Credit
Notes,ABCDEFG1,XYZ1,124.50,20151208,20151111,ABCDEFG2,XYZ2,
124.50,20151104,20151111
How do I manage to only output one line, given that the two lines are just
inverted ?
Thank you for your help.

--
Lorenz BÃŒhmann
AKSW group, University of Leipzig
Group: http://aksw.org - semantic web research center