RDF/XML serializer issue with blank nodes
Tim Harsch
12 years ago
The following gist:

https://gist.github.com/harschware/6835202

shows what I think may be a bug in the RDF/XML serializer.  If I'm not mistaken, the output should look something like this:
  <rdf:Description rdf:nodeID="b0">
    <ns0:p xmlns:ns0="http://" rdf:resource="http://o"/>
  </rdf:Description>

I can file a bug if there isn't one already.
Thanks,
Tim
Andy Seaborne
12 years ago
Hi Tim,

It's not to do with blank nodes.
Post by Tim Harsch
https://gist.github.com/harschware/6835202
<rdf:Description rdf:nodeID="b0">
<ns0:p xmlns:ns0="http://" rdf:resource="http://o"/>
</rdf:Description>
That would be:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:nodeID="b0">
<ns0:p xmlns:ns0="http://" rdf:resource="http://o"/>
</rdf:Description>
</rdf:RDF>

(I don't do "something like this"!)
Post by Tim Harsch
I can file a bug if there isn't one already.
Thanks,
Tim
[[
Exception in thread "main" com.hp.hpl.jena.shared.BadURIException: Only
well-formed absolute URIrefs can be included in RDF/XML output:
<http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that
is required by the scheme is missing.
]]


In Turtle etc., prefixed names are expanded by string concatenation.

XML has namespace rules. They are different.

http://www.w3.org/TR/REC-xml-names/#sec-namespaces

[[
Definition: An XML namespace is identified by a URI reference [RFC3986];
element and attribute names may be placed in an XML namespace using the
mechanisms described in this specification.
]]

so the namespace name must be a URI.

http://p is a legal URI but to create the property in RDF/XML you need a
qname.

The local part of a qname can't be the empty string (this is a gotcha if
you think Turtle).

Hence your thinking of "http://" as the namespace, but that's not a valid URI.

It would have been better if the code had not tried http:// in the first
place (e.g. http://example/123 gives "InvalidPropertyURIException")
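
The splitting described above can be sketched in a few lines of Java. This is only an illustration, not Jena's actual code: the class and method names are hypothetical, and the name-character tests cover just the ASCII subset of the XML NCName rules.

```java
// Simplified sketch: ASCII-only approximation of the XML NCName rules.
// Class and method names are hypothetical, not part of Jena.
final class QNameSplitSketch {

    // True if c may start the local part (ASCII subset of NCName start chars).
    static boolean isNameStartChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // True if c may appear anywhere in the local part (ASCII subset).
    static boolean isNameChar(char c) {
        return isNameStartChar(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    // Returns the index where the local part would start.
    // A result equal to uri.length() means the local part would be empty,
    // so no qname can be formed for this URI.
    static int splitIndex(String uri) {
        int i = uri.length();
        // Walk back over the trailing run of name characters.
        while (i > 0 && isNameChar(uri.charAt(i - 1))) {
            i--;
        }
        // Move forward until the local part begins with a legal start character.
        while (i < uri.length() && !isNameStartChar(uri.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String[] uris = {"http://p", "http://example/123", "http://example/ns#name"};
        for (String uri : uris) {
            int i = splitIndex(uri);
            System.out.println(uri + " -> namespace=\"" + uri.substring(0, i)
                    + "\" local=\"" + uri.substring(i) + "\"");
        }
    }
}
```

With this sketch, "http://p" splits into the namespace "http://" and the local part "p", while "http://example/123" leaves an empty local part because a local name cannot start with a digit.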

The writer has stopped you creating illegal XML. Some XML parsers will
reject it; there are some very strict parsers. Xerces is a bit more
forgiving.

Jena parses the form above with a warning:

"""
WARN {W124} toAscii failed for namespace URI: <http://>. Bad
Internationalized Domain Name: String index out of range: 0
"""

so

"Be strict in what you output, be generous in what you accept."

Andy
Tim Harsch
12 years ago
Hi Andy,
Thanks for your careful explanation.  It helps a lot.   As an aside, I did rerun the experiment with
writer.setProperty("allowBadURIs","true");
which allowed the code to produce the following:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://" > 
  <rdf:Description rdf:nodeID="A0">
    <j.0:p rdf:resource="http://o"/>
  </rdf:Description>
</rdf:RDF>

I was running down an issue in our stack that I still believe is related to blank nodes when I concocted the example we are discussing.  I made an error in thinking I could test the blank node handling with the data I created, which had the missing-qname-in-URI issue.  I then tried to validate using Rapper, which is what generated the snippet I provided.  Since this was not related to bnodes, the issue becomes much less important for me.  However, now that I've run into this and experienced what our users will experience, I do have some minor concerns that remain.  It seems to me the error message could be improved if some context were provided.  If a user were seeing this when serializing a result set of thousands or millions of statements, I think they would be hard pressed to find a way to isolate the URI causing the issue.

The current message:
Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that 
is required by the scheme is missing.


doesn't point to the URI that caused the issue or, better yet, the statement.  Perhaps an improved message would look something like:
The statement: 
_:b0 <http://p> <http://o> 
contains the malformed URI <http://p>.   Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that is required by the scheme is missing.


If you agree this would be a useful enhancement then I could file an RFE and try to come up with a patch as well.

Thanks,
Tim
Andy Seaborne
12 years ago
to isolate the URI causing the issue.
Post by Tim Harsch
Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that
is required by the scheme is missing.
_:b0 <http://p> <http://o>
contains the malformed URI <http://p>. Only well-formed absolute URIrefs can be included in RDF/XML output: <http://> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A component that is required by the scheme is missing.
It can point to the namespace. For printing, especially RDF/XML-ABBREV,
namespaces are decided separately from processing statements.

What should change is to not call that code in the first place. I think
the "generate a namespace" code is wrong - it should only generate legal
ones.

BaseWriter.xmlnsDecl at a guess.
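
As a rough illustration of "only generate legal ones", a guard like the following could be applied to a candidate namespace before a prefix is invented for it. This is a sketch under assumptions: the helper name is hypothetical, it uses java.net.URI rather than Jena's own IRI checker, and the rule that http/https namespaces need a host is a simplification of the scheme-specific checks.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical guard, not Jena code: decides whether a candidate namespace
// is worth declaring in RDF/XML output.
final class NamespaceCheckSketch {

    // Accept a candidate only if it parses as an absolute URI and,
    // for http/https (an assumption of this sketch), carries a host.
    static boolean isUsableNamespace(String ns) {
        try {
            URI u = new URI(ns);
            if (!u.isAbsolute()) {
                return false;
            }
            String scheme = u.getScheme();
            boolean needsHost = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return !needsHost || u.getHost() != null;
        } catch (URISyntaxException e) {
            // Unparseable strings are never usable as namespaces.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableNamespace("http://"));
        System.out.println(isUsableNamespace("http://example/"));
    }
}
```

A writer using such a guard would reject "http://" up front and report the property URI as unserializable, instead of failing later on the generated namespace.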
Post by Tim Harsch
If you agree this would be a useful enhancement then I could file an RFE and try to come up with a patch as well.
(some general observations ...)

RFE? What's that for an open source project? !!!

The reality is that what counts is contribution.

Filing JIRA, doing testing etc is great but consider the critical question:

Who is going to do the work?
What motivates them?

If there's a patch, then sure!

The committers' and PMC's responsibility is applying patches, not taking
on RFEs.

Similarly, patches that require significant work to integrate will
make slow progress, if any. If you look at projects in the
Hadoop-o-sphere, you'll see this very sharply. They can be quite direct
about this but it's really just a simple matter of resourcing and
motivation.

When the RDF world was smaller (and when HP was backing the work) things
were different. That was then, not now.

Andy