Discussion:
EXCEPTION: org.apache.jena.atlas.AtlasException: java.nio.charset.MalformedInputException: Input length = 1, while i read a NT file
Marco Tenti
2015-02-20 08:01:30 UTC
Hi everyone, I'm loading millions of triples, split across hundreds of files,
on the same server. It is a very simple setup, but during the process some
files with the .nt extension occasionally fail to load. These files are
generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/)
Specifically, I get these errors:

InputStream in = FileManager.get().open("filename");

//1)
model.read(in, "NT");

Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *
//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*

Any idea why Jena throws these exceptions?
ty in advance. Greetings.
Dave Reynolds
2015-02-20 08:35:23 UTC
Post by Marco Tenti
Hi everyone, I'm loading millions of triples, split across hundreds of files,
on the same server. It is a very simple setup, but during the process some
files with the .nt extension occasionally fail to load. These files are
generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/)
InputStream in = FileManager.get().open("filename");
//1)
model.read(in, "NT");
Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *
The exception is telling you there is a syntax error in your data. Look
at that line of the data to see what the problem is.
Post by Marco Tenti
//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*
Similarly, this is telling you that your data has a problem, but in this
case it is a low-level error: an invalid byte sequence. N-Triples (at least
in RDF 1.1) is supposed to be UTF-8; perhaps you are using an incorrect
encoding.

Dave
Andy Seaborne
2015-02-20 09:35:58 UTC
Post by Marco Tenti
Hi everyone, I'm loading millions of triples, split across hundreds of files,
on the same server. It is a very simple setup, but during the process some
files with the .nt extension occasionally fail to load. These files are
generated with SILK (
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/)
InputStream in = FileManager.get().open("filename");
//1)
model.read(in, "NT");
That call is:
model.read(in, baseURI)

so "NT" is passed as the base URI; it does not set the language.
Post by Marco Tenti
Console: [line: 1, col: 7 ] Element or attribute do not match QName
production: QName::=(NCName':')?NCName.
*Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1,
col: 7 ] Element or attribute do not match QName production:
QName::=(NCName':')?NCName. *
It thinks it's RDF/XML because you set the base URI to "NT" and the
default language is RDF/XML.

Better:

RDFDataMgr.read(model, in, Lang.NT) ;

as it uses typed constants.

RDFDataMgr.read(model, "filename") ;

will work with file extension .nt/.ttl etc

(actually, model.read("filename") works nowadays)
Post by Marco Tenti
//2)
org.apache.jena.riot.RDFDataMgr.read(model,in,"NT");
*Exception org.apache.jena.atlas.AtlasException:
java.nio.charset.MalformedInputException: Input length = 1*
Any idea why Jena throws these exceptions?
Bad data.

If you get

java.nio.charset.MalformedInputException

it means the file is not valid UTF-8. Exactly where is hard to
determine from the error, because Jena reads blocks of 128K bytes for
efficiency reasons (it's a major cost of N-Triples parsing) and Java's
bytes-to-chars conversion for UTF-8 does not say where the error occurs.

A common cause is iso-8859-1 data. N-Triples is UTF-8 only.

There is a utility in jena "riotcmd.utf8" that does a careful utf8 read
of the file character by character.
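
As a rough illustration of what a careful UTF-8 read looks like, here is a
minimal sketch using only the Java standard library. It is not the riotcmd.utf8
implementation; the class name Utf8Check and the method firstInvalidOffset are
made up for this example. It runs a strict decoder over the bytes and returns
the byte offset at which decoding first fails. It loads the whole file into
memory, so it assumes the file fits in the heap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jena.
class Utf8Check {

    /** Returns the byte offset of the first invalid UTF-8 sequence, or -1 if the data is valid. */
    static int firstInvalidOffset(byte[] data) {
        // REPORT makes the decoder stop at bad input instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(8192);
        while (true) {
            // endOfInput = true: a truncated sequence at the end also counts as malformed.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position is left at the start of the offending sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard the decoded chars and continue
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        System.out.println(offset < 0
                ? "valid UTF-8"
                : "first invalid byte at offset " + offset);
    }
}
```

Run it with the path of a suspect .nt file as the only argument; the reported
offset can then be inspected with a hex viewer.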

Look at your data and very carefully check how the program you are using
is set up. It's all too easy to accidentally view a file in the platform's
native encoding.
Post by Marco Tenti
ty in advance. Greetings.
Andy
Marco Tenti
2015-02-20 12:36:09 UTC
OK, solved. Thanks Dave and Andy for the responses, and sorry for writing the
wrong call; I was in a rush. In my case the specific problem was the
ISO-8859-1 encoding, as Andy said.
Thanks all.