Discussion:
TDB2 read-only dataset
Laura Morales
2018-11-11 06:54:45 UTC
I have a 40MB NT file that, once loaded with tdb2.tdbloader, produces an 85MB database. Looking inside the Data-0001 directory, I see many .dat and .idn files that are 8MB each. If I remember correctly, this layout anticipates future modifications of the dataset, so some extra space is allocated up front.
I would like to suggest a new feature for TDB2 and tdb2.tdbloader: a "read-only" flag that creates the dataset (or even a single graph) on the assumption that it will not change in the future, and therefore optimizes it for space. Alternatively, a tool similar to tdb2.tdbcompact could compact the dataset into a space-optimized read-only structure.
If this is possible and doesn't require a complete overhaul of TDB2, it would be a really useful feature, especially when importing third-party graphs that I only need to read from and want to keep compact.
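To check how much of that space is actually allocated on disk, as opposed to just reported as file size, something like the following rough sketch could be run against the Data-0001 directory. It is not part of Jena. The function name size_report is made up for illustration, and whether the two numbers differ at all depends on the operating system and filesystem (the st_blocks field is only available on POSIX systems, so the script falls back to the apparent size elsewhere).

```python
import os
import sys


def size_report(directory):
    """Return (name, apparent_bytes, allocated_bytes) for each regular file.

    Hypothetical helper for illustration only; not a Jena/TDB2 tool.
    """
    rows = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        st = entry.stat(follow_symlinks=False)
        # On POSIX, st_blocks counts 512-byte units actually allocated.
        # Where it is missing (e.g. Windows), fall back to the apparent size.
        blocks = getattr(st, "st_blocks", None)
        allocated = blocks * 512 if blocks is not None else st.st_size
        rows.append((entry.name, st.st_size, allocated))
    return rows


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, apparent, allocated in size_report(path):
        print(f"{name:30} apparent={apparent:>12} allocated={allocated:>12}")
```

Saved under a name of your choice (say sizes.py), it would be invoked as `python sizes.py <database-dir>/Data-0001`. If the allocated figure is much smaller than the apparent one, the 8MB files are taking less real disk space than a directory listing suggests.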

Should I maybe open a ticket?
ajs6f
2018-11-11 15:42:44 UTC
I'll let Andy comment on the TDB2 question, but as for compact read-only storage, you may wish to experiment with HDT [1][2]. It is not supported by the Jena project, but some Jena users have used it with success.

ajs6f

[1] www.rdfhdt.org
[2] https://github.com/rdfhdt/hdt-java/tree/master/hdt-jena