Discussion:
tdb2.tdbloader performance
Laura Morales
2017-11-28 14:06:57 UTC
Permalink
So I had a laptop at hand with a 3GHz i7 CPU, 8GB DDR3 1600MHz RAM, and SATA3 HDD available. I decided to try the conversion again on a 1.1GB .nt file.
I used `./tdb2.tdbloader --loc xxx --verbose file.nt`.
Reading the .nt file from the HDD, and writing to the HDD, gave me about 60K triples/second on average. I don't have an SSD, but this PC seems to have enough RAM, so I started a livecd to be sure that I was running everything from RAM, with all disks unmounted. I ran the same command, and the average number of triples/second is pretty much the same, perhaps only slightly better by 2K or 3K per second. Conversion from the livecd seemed to use a full thread at 100%, 25% RAM, 0 swap.

This is... very surprising; I wasn't expecting this. I was expecting a significant improvement since I was running everything from RAM. Should I conclude that SATA3 disks are OK? That an SSD won't really make any difference? Are faster RAM, a faster CPU, or maybe more RAM/CPU cache the only ways to get more performance out of tdb2.tdbloader (since more RAM capacity doesn't seem to make any difference)?

Or does tdb2.tdbloader (or maybe Java) have any mechanism in place that is slowing down conversion? Like, for example, using less RAM than is available, or whatever?
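One rough way to answer that is to check whether the loader is actually CPU-bound rather than disk-bound: sample the kernel's aggregate CPU counters while a load is running (a Linux-only sketch; the field positions follow the first line of /proc/stat):

```shell
# Sample the aggregate CPU counters twice, one second apart, and report how
# many ticks were spent busy vs. waiting on I/O. A loader that pegs one core
# with near-zero iowait is CPU-bound, so a faster disk won't help much.
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
echo "busy=$(( (u2-u1) + (n2-n1) + (s2-s1) )) iowait=$(( w2-w1 ))"
```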
dandh988
2017-11-28 15:14:06 UTC
Permalink
Eventually something will give and you'll get a wait as something is spilled to something, i.e. cache to physical drive.
Also, different settings suit different workloads. I have a number of 128GB+ units configured differently depending on what they need to do. The ETL setting only gives Java 8GB, but the OS will consume close to 90GB virtual for the process as it basically dumps into file cache. At some point, though, that cache is written out to non-volatile storage. As the units have 24 cores I can actually run close to 12 processes before things start to affect each other. If you consider server-class hardware, there's a lot of thought given to cache levels and how they cascade.
Switch the SATA for M.2 and you'll move the issue somewhere else...
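That "dumps into file cache" behaviour is visible directly on Linux; a read-only sketch (kernel counters, nothing Jena-specific) showing the cache the OS absorbs writes into:

```shell
# Cached: data held in the page cache; Dirty/Writeback: pages modified in RAM
# but not yet flushed to the drive. During a big load, Dirty grows until the
# kernel starts writing it out, which is when the "wait" appears.
grep -E '^(Cached|Dirty|Writeback):' /proc/meminfo
```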


Dick
Laura Morales
2017-11-28 15:30:37 UTC
Permalink
Post by dandh988
Eventually something will give and you'll get a wait as something is spilled to something, i.e. cache to physical drive.
Also, different settings suit different workloads. I have a number of 128GB+ units configured differently depending on what they need to do. The ETL setting only gives Java 8GB, but the OS will consume close to 90GB virtual for the process as it basically dumps into file cache. At some point, though, that cache is written out to non-volatile storage. As the units have 24 cores I can actually run close to 12 processes before things start to affect each other. If you consider server-class hardware, there's a lot of thought given to cache levels and how they cascade.
Switch the SATA for M.2 and you'll move the issue somewhere else...
Well yeah, but having a problem at 10K triples/second is not the same problem as at 1M triples/second. I'll gladly "move the problem elsewhere" if I knew how to get to 1M triples/second.
Moving from SATA to M.2, I don't know if it's worth the trouble (and money), given that on my computer running from SATA3 disks or a RAM disk doesn't seem to make any difference. And RAM is much faster than M.2 too.
Just out of curiosity, how many triples/second on average can you get with your server-class hardware when converting a .nt file to TDB2 using tdb2.tdbloader?
Dick Murray
2017-11-28 18:03:34 UTC
Permalink
LOL, there's lots of things where I'd like to "move the problem elsewhere".

I've achieved concurrent 120K on the server hardware, but it depends on the
input. There's another recent Jena thread regarding sizing, and that's tied
up with what's in the input. I see the same thing with loading data: some
files fly, others seem to drag, and it's not just the size. What the server
hardware does do is allow me to run multiple processes and average 60K.
Also, up to a certain size, I have an overclocked AMD (4.5GHz) and it will
outperform everything until it hits its cache limit.

We tend towards running multiple TDBs and presenting them as one, a legacy
of overcoming the single writer in TDB1. This brings its own issues, such
as DISTINCT being high-cost, which we mitigate with a few tricks.

On the minefield subject of hardware: do you have DDR3 or DDR4? What
chipset is driving it? Haswell's dual-channel memory controller is going
to have a hard time keeping up with the quad-channel memory controllers
on Ivy Bridge-E and Haswell-E. And yes, Corsair quote 47GB/s for DDR4,
but you still need to write that somewhere, and an M.2 drive on PCI-E 2.0
x4 at 1.6GB/s has almost 3x the throughput of SATA III at 600MB/s; PCI-E
3.0 x4 is 3.9GB/s. Plus you now have Optane or 3D XPoint, depending on
what sounds better.
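As a sanity check on those ratios (figures as quoted above; just arithmetic, nothing measured):

```shell
# PCI-E 2.0 x4 (1600 MB/s) and PCI-E 3.0 x4 (3900 MB/s) relative to
# SATA III (600 MB/s).
awk 'BEGIN { printf "%.2fx %.2fx\n", 1600/600, 3900/600 }'
# prints "2.67x 6.50x"
```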

What files are you trying to import, and I'll run them through?

Regards Dick
Laura Morales
2017-11-28 18:34:45 UTC
Permalink
Post by Dick Murray
I've achieved concurrent 120K on the server hardware but it depends on the
input.

Good to see that it can go faster. I do understand that this metric is dependent on input, but it still looks rather slow considering that datasets keep growing. At this (constant) rate, Wikidata would still take at least 12-13 hours.
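The back-of-the-envelope is easy to reproduce (the triple count is the `wc -l` figure for latest-truthy.nt quoted later in this thread; the 60K rate is the hypothetical constant rate):

```shell
# hours = triples / rate / 3600; at a constant 60K triples/second the truthy
# dump alone would take roughly ten hours, and the full dump is larger still.
awk 'BEGIN { triples = 2199382887; rate = 60000; printf "%.1f hours\n", triples / rate / 3600 }'
# prints "10.2 hours"
```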
Post by Dick Murray
What the server hardware does do is allow me to run multiple processes and average 60K.
tdb2.tdbloader is single-threaded, though; I don't know how multiple cores are going to help.
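(More cores can help by running several independent loads side by side, one single-threaded loader per chunk, each writing to its own TDB2 location. A minimal sketch of the shell pattern, where the hypothetical `load` function stands in for a real `tdb2.tdbloader --loc db-N chunk-N.nt` run:)

```shell
# Launch one stand-in loader per chunk in the background; each process gets
# its own core and its own database directory, then wait for all to finish.
load() { echo "loading $1 into $2"; }   # stand-in for tdb2.tdbloader
for i in 0 1 2 3; do
  load "chunk-$i.nt" "db-$i" &
done
wait
```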
Post by Dick Murray
We tend towards running multiple TDB's and present them as one, a legacy of
overcoming the one writer in TDB1.

One graph per TDB store?
Post by Dick Murray
On the minefield subject of hardware, do you have DDR3 or DDR4?
DDR3 1600MHz
Post by Dick Murray
What
chipset is driving it because Haswell’s dual-channel memory controller is
going to have a hard time keeping up with the quad-channel memory
controllers on Ivy Bridge-E and Haswell-E
Haswell, dual-channel I think.
Post by Dick Murray
What files are you trying to import and i'll run them through?
The 1.1GB that I mentioned contains data that I can't make public on the Internet, but you can try with the Wikidata dump https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz
You probably don't have to convert all of it; just by starting the conversion you should already see how many triples it's handling. I ran this command: `./tdb2.tdbloader --loc wikidata --verbose wikidata.nt`.
If it goes any faster than 70K average triples/second, I'd be interested to know what hardware components you've got.
dandh988
2017-11-28 18:46:30 UTC
Permalink
I've had loads take over 24 hours and produce 350GB TDB1 instances...
You can run multiple loaders into separate instances, and on sufficient kit they don't slow down. As background, I convert CAD files to triples or quads, typically 100M but some can be 500M; that's triples output, not file input size.
OK with the data, I have it somewhere and will run it through, hopefully tonight if paid work doesn't get in the way ;-)

Dick
Laura Morales
2017-11-28 19:08:39 UTC
Permalink
Post by dandh988
I've had loads take over 24 hours and produce 350GB TDB1 instances...
Yeah, 24 hours is still acceptable, but it's very borderline; running a conversion that takes days becomes frustrating very quickly. Of course I'm not trying to be mean here, but I think it's good to push the limits, because we are already at a point where graphs have several billion triples. If my computer, which is an average consumer PC at best, can do 60-70K, two "average-grade" nodes could already outperform your beefy server, if only I could share the load across multiple PCs.
Post by dandh988
Ok with the data, I have that somewhere and will run it through, hopefully tonight if paid work doesn't get in the way ;-)
Thank you very much for trying this and for offering feedback. I'd be interested to know

- what components do you have (cpu/ram/disks/...)
- the AVG number of triples/second
- the final size of the TDB2 store

Also since you're already running this test, would you mind sharing the final TDB2 store instead of deleting it? :) If the output is not too large...
Dick Murray
2017-12-01 20:11:39 UTC
Permalink
Hi.

Sorry for the delay :-)

Short story: I used the following "reasonable" device

Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads

to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
disk and;

@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec

The full source file contains 2.2G triples in a 10GB bz2, which
decompresses to ~250GB of N-Triples; I split it into 10M-triple chunks and
used the first one to test.

Check with Andy but I think it's limited by CPU, which is why my 24 core (4
x Xeon 6 Core @2.5GHz) 128GB server is able to run concurrent loads with no
performance hit.

I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
next few days and I will try and test against it.

I haven't run the full import because (a) I'm guessing the resulting TDB2
will be "large" and (b) my servers are currently importing other "large"
TDB2s!

Long story follows...

decompress the file;

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

# CPUs: 4
Maximum Memory: 1024 MB
Ignore Trailing Garbage: off
-------------------------------------------
File #: 1 of 1
Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt

BWT Block Size: 900k
Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
-------------------------------------------

Wall Clock: 5871.550948 seconds

count the lines;

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

split the file into 10M chunks;

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...

Restart!

sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt

ps aux | grep tdb2
root 3358 0.0 0.0 222844 5756 pts/0 S+ 19:22 0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3359 0.0 0.0 4500 776 pts/0 S+ 19:22 0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3360 0.0 0.0 120304 3288 pts/0 S+ 19:22 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 3361 4.9 0.0 4500 92 pts/0 S<+ 19:22 0:05 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3366 95.7 14.8 7866116 2418768 pts/0 Sl+ 19:22 1:42 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 3477 0.0 0.0 119728 972 pts/1 S+ 19:24 0:00 grep
--color=auto tdb2

Notice PID 3366 is -Xmx2G default.

19:26:49 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.28s (Avg: 42,404)

After the first pass there is no read from the 1TB source as the OS has
cached the 1.2G source.

19:33:50 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 245.70s (Avg: 42,677)

export JVM_ARGS="-Xmx4G" i.e. increase the max heap and help the GC

sudo ps aux | grep tdb2
root 4317 0.0 0.0 222848 6236 pts/0 S+ 19:35 0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4321 0.0 0.0 4500 924 pts/0 S+ 19:35 0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4322 0.0 0.0 120304 3356 pts/0 S+ 19:35 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4323 4.8 0.0 4500 88 pts/0 S<+ 19:35 0:09 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4328 94.8 18.5 8406788 3036188 pts/0 Sl+ 19:35 3:01 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4594 0.0 0.0 119728 1024 pts/1 S+ 19:38 0:00 grep
--color=auto tdb2

At 800K PID was 3GB and peaked at 3.4GB just prior to completion.

19:39:23 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.65s (Avg: 42,340)

Throw all CPU resources at it, i.e. a limit of 800%

sudo cpulimit -v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt

Average was at +45K by 350K and +60K by 1.2M

19:43:38 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 166.91s (Avg: 62,823)

sudo ps aux | grep tdb2
root 4740 0.0 0.0 222848 6264 pts/0 S+ 19:40 0:00 sudo
cpulimit -v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4744 0.0 0.0 4500 720 pts/0 S+ 19:40 0:00 cpulimit
-v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4745 0.0 0.0 120304 3208 pts/0 S+ 19:40 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4746 4.7 0.0 4500 92 pts/0 R<+ 19:40 0:07 cpulimit
-v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4751 131 21.1 8693508 3448252 pts/0 Sl+ 19:40 3:32 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4808 0.0 0.0 119728 1060 pts/1 S+ 19:43 0:00 grep
--color=auto tdb2

Heap peaked at 3.4GB

sudo cpulimit -v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt

sudo ps aux | grep tdb2
root 4898 0.0 0.0 222844 5672 pts/0 S+ 19:45 0:00 sudo
cpulimit -v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4899 0.0 0.0 4500 724 pts/0 S+ 19:45 0:00 cpulimit
-v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4900 0.0 0.0 120304 3244 pts/0 T+ 19:45 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4901 5.5 0.0 4500 92 pts/0 S<+ 19:45 0:25 cpulimit
-v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4906 50.5 20.7 8685316 3395236 pts/0 Tl+ 19:45 3:55 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4983 0.0 0.0 119728 1072 pts/1 S+ 19:53 0:00 grep
--color=auto tdb2

19:53:38 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 482.27s (Avg: 21,742)
Laura Morales
2017-12-01 22:28:15 UTC
Permalink
Thank you very much, this is great feedback!
Your setup was very similar to mine, except:

- I have 8GB RAM single bank, you have 16GB probably on two banks
- my CPU is "half" of yours, 2 cores 4 threads

Despite this, the results are very similar; maybe yours are slightly better. I don't understand why this "60K" seems so hard to beat. What's so special about it? It's so difficult to understand what to do to improve the conversion speed... do I buy more RAM? Faster RAM? A faster CPU? More cores? A CPU with more cache? More memory channels? I still can't find an answer. Why would more cores help if tdb2.tdbloader runs in a single thread? Maybe the reason is that with more cores your Xeon can handle more RAM concurrently? I don't understand...
With your Xeon, you said you were able to get to 120K, right? What Xeon, mobo, and RAM did you use?
If anybody has a Xeon or an Opteron, it would be nice if they could offer more feedback too, even with slower RAM such as DDR3-1333. I certainly can't wait to read your feedback with the Threadripper :)

keep us posted!
Andy Seaborne
2017-12-02 21:16:56 UTC
Permalink
Post by Laura Morales
Thank you very much, this is great feedback!
- I have 8GB RAM single bank, you have 16GB probably on two banks
- my CPU is "half" of yours, 2 cores 4 threads
despite this, the results are very similar; maybe yours are slightly better. I don't understand why this "60K" seems so hard to beat. What's so special about it?? It's so difficult to understand what to do to improve the conversion speed... do I buy more ram? Faster ram? A faster CPU? More cores? Or a CPU with more cache? Or more memory channels? I still can't find an answer... Why would more cores help if tdb2.tdbloader
As already said, tdb2.tdbloader in its current form is not suitable for
loading billion-triple datasets (unless there is a lot of RAM... I'd
guess upwards of 256G for truthy, and a tuned server (swappiness=0, for
example), not that I've tried).
Post by Laura Morales
runs in a single thread? Maybe the reason is that with more cores, your xeon can handle more RAM concurrently? I don't understand...
With your xeon, you said you were able to get to 120K? Right?
"concurrent 120K"

I understood that to mean more than one load running at once. Dick's
system has multiple TDB databases and a large disk cache.

(I got 76K, single load, on somewhat less hardware so that suggests 120K
may be affected by I/O contention.)
Post by Laura Morales
What xeon, mobo, and RAM did you use?
If anybody has any xeon or opteron, it would be nice if they could offer more feedback too. Even with slower RAM such as DDR3-1333. I certainly can't wait to read your feedback with the Threadripper :)
Threads will not help a single load, except for tdbloader2 (which is for
TDB1) if tuned; see the command help and notes. It uses sort(1), which
can utilize multiple threads.
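(For reference, the multi-threading there comes from GNU sort's own flags rather than Java; a toy run showing the relevant knobs, not a Jena command, just sort(1) itself:)

```shell
# Sort 100,000 shuffled numbers using up to 4 threads and a 64MB in-memory
# buffer; a larger -S buffer means fewer temporary-file merge passes.
seq 100000 | shuf | sort -n --parallel=4 -S 64M | tail -1
# prints "100000"
```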

Andy
Post by Laura Morales
keep us posted!
Sent: Friday, December 01, 2017 at 9:11 PM
Subject: Re: tdb2.tdbloader performance
Hi.
Sorry for the delay :-)
Short story I used the following "reasonable" device
Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
disk and;
@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec
The full source file contains 2.2G of triples in 10GB bz2 which
decompresses to 250GB nt, which I split into 10M triple chunks and used the
first one to test.
Check with Andy but I think it's limited by CPU, which is why my 24 core (4
performance hit.
I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
next few days and I will try and test against it.
I haven't run the full import because (a) I'm guessing the resulting TDB2
will be "large" and (b) my servers are currently importing other "large"
TDB2's!!!
Long story follows...
decompress the file;
pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward
# CPUs: 4
Maximum Memory: 1024 MB
Ignore Trailing Garbage: off
-------------------------------------------
File #: 1 of 1
Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt
BWT Block Size: 900k
Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
-------------------------------------------
Wall Clock: 5871.550948 seconds
count the lines;
wc -l latest-truthy.nt
2199382887 latest-truthy.nt
Just short of 2200M...
split the file into 10M chunks;
split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...
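As a quick sanity check on the chunking (a sketch, reusing the file names from the split above), the chunk line counts should add back up to the original total:

```shell
# Total across chunks should equal the wc -l figure for the source file,
# and every chunk except the last should hold exactly 10485760 lines.
cat latest-truthy.nt.* | wc -l        # expect 2199382887
wc -l latest-truthy.nt.000            # expect 10485760
```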
Restart!
sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
ps aux | grep tdb2
root 3358 0.0 0.0 222844 5756 pts/0 S+ 19:22 0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3359 0.0 0.0 4500 776 pts/0 S+ 19:22 0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3360 0.0 0.0 120304 3288 pts/0 S+ 19:22 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 3361 4.9 0.0 4500 92 pts/0 S<+ 19:22 0:05 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 3366 95.7 14.8 7866116 2418768 pts/0 Sl+ 19:22 1:42 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 3477 0.0 0.0 119728 972 pts/1 S+ 19:24 0:00 grep
--color=auto tdb2
Notice PID 3366 is -Xmx2G default.
19:26:49 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.28s (Avg: 42,404)
After the first pass there is no read from the 1TB source as the OS has
cached the 1.2G source.
19:33:50 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 245.70s (Avg: 42,677)
export JVM_ARGS="-Xmx4G" i.e. increase the max heap and help the GC
sudo ps aux | grep tdb2
root 4317 0.0 0.0 222848 6236 pts/0 S+ 19:35 0:00 sudo
cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4321 0.0 0.0 4500 924 pts/0 S+ 19:35 0:00 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4322 0.0 0.0 120304 3356 pts/0 S+ 19:35 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4323 4.8 0.0 4500 88 pts/0 S<+ 19:35 0:09 cpulimit
-v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4328 94.8 18.5 8406788 3036188 pts/0 Sl+ 19:35 3:01 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4594 0.0 0.0 119728 1024 pts/1 S+ 19:38 0:00 grep
--color=auto tdb2
At 800K triples the process was at 3GB and peaked at 3.4GB just prior to completion.
19:39:23 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 247.65s (Avg: 42,340)
Throw all CPU resources at it, i.e. 800%
sudo cpulimit -v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
Average was at +45K by 350K and +60K by 1.2M
19:43:38 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 166.91s (Avg: 62,823)
sudo ps aux | grep tdb2
root 4740 0.0 0.0 222848 6264 pts/0 S+ 19:40 0:00 sudo
cpulimit -v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4744 0.0 0.0 4500 720 pts/0 S+ 19:40 0:00 cpulimit
-v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4745 0.0 0.0 120304 3208 pts/0 S+ 19:40 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4746 4.7 0.0 4500 92 pts/0 R<+ 19:40 0:07 cpulimit
-v -l 800 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4751 131 21.1 8693508 3448252 pts/0 Sl+ 19:40 3:32 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4808 0.0 0.0 119728 1060 pts/1 S+ 19:43 0:00 grep
--color=auto tdb2
Heap peaked at 3.4GB
sudo cpulimit -v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
sudo ps aux | grep tdb2
root 4898 0.0 0.0 222844 5672 pts/0 S+ 19:45 0:00 sudo
cpulimit -v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4899 0.0 0.0 4500 724 pts/0 S+ 19:45 0:00 cpulimit
-v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4900 0.0 0.0 120304 3244 pts/0 T+ 19:45 0:00 sh
./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/
latest-truthy.000.nt
root 4901 5.5 0.0 4500 92 pts/0 S<+ 19:45 0:25 cpulimit
-v -l 50 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc
/media/ramdisk/ latest-truthy.000.nt
root 4906 50.5 20.7 8685316 3395236 pts/0 Tl+ 19:45 3:55 java
-Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties
-cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v
--loc /media/ramdisk/ latest-truthy.000.nt
dick 4983 0.0 0.0 119728 1072 pts/1 S+ 19:53 0:00 grep
--color=auto tdb2
19:53:38 INFO TDB2 :: Finished: 10,485,760
latest-truthy.000.nt 482.27s (Avg: 21,742)
ajs6f
2017-12-02 21:59:29 UTC
Permalink
Threads will not help a single load except for tdbloader2 (which is for TDB1) if tuned - see the command help and notes. It uses sort(1) which can utilize multiple threads.
This was worth tuning for me. sort generally picks good parameters for a system, but I was able to get noticeably better performance by adjusting (up) the parallelism manually. But of course, that's a limited amount of improvement. (It's also worth making sure your locale is set appropriately. Avoid using Unicode collation and it will speed things up impressively.)
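For reference, the kind of sort(1) tuning being discussed looks roughly like this (the parallelism, buffer size, and temp-directory values are illustrative, not recommendations; size them for your own box):

```shell
# C locale = plain byte-order collation, skipping Unicode collation entirely.
export LC_ALL=C
# Raise the worker-thread count and in-memory buffer, and point the
# temporary files at fast storage.
sort --parallel=8 -S 4G -T /fast/tmp input.nt > sorted.nt
```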

ajs6f
Andy Seaborne
2017-12-02 22:37:23 UTC
Permalink
Post by ajs6f
Threads will not help a single load except for tdbloader2 (which is for TDB1) if tuned - see the command help and notes. It uses sort(1) which can utilize multiple threads.
This was worth tuning for me. sort generally picks good parameters for a system, but I was able to get noticeably better performance by adjusting (up) the parallelism manually. But of course, that's a limited amount of improvement. (It's also worth making sure your locale is set appropriately. Avoid using Unicode collation and it will speed things up impressively.)
Shouldn't be necessary - tdbloader2index sets

export LC_ALL="C"

(see sort(1))
[[
*** WARNING *** The locale specified by the environment affects
sort order. Set LC_ALL=C to get the traditional sort order
that uses native byte values.
]]

If that didn't work, it needs a fix.

Andy
Andy Seaborne
2017-12-02 20:55:07 UTC
Permalink
Post by Dick Murray
Short story I used the following "reasonable" device
Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
disk and;
@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec
The full source file contains 2.2G of triples in 10GB bz2 which
decompresses to 250GB nt, which I split into 10M triple chunks and used the
first one to test.
Which tdb loader?

For TDB1, the two loaders behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
hours (76K triples/s) using TDB1 tdbloader2.

I'll write it up soon.
Post by Dick Murray
Check with Andy but I think it's limited by CPU, which is why my 24 core (4
performance hit.
The limit at scale is the I/O handling and disk cache. 128G RAM gives a
better disk cache and that server machine probably has better I/O. It's
big enough to fit one whole index (if all RAM is available - and that
depends on the swappiness setting which should be set to zero ideally).

CPU is a limit for a while but you'll see the load speed slows down so
it is not purely CPU as the limit. (As the indexes are 200-way trees,
they don't get very deep.)

tdbloader (loader1) does one index at a time so that the I/O is
constrained, unlike simply adding triples to all 3 indexes together
(which is what TDB2 loader does currently).

loader1 degrades at large scale due to random I/O write patterns on
secondary indexes. Hence an SSD makes a big difference.

loader2 (which has high overhead) avoids the problem and only writes
indexes from sorted input, so there is no random access to the indexes.
An SSD makes less difference.
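The loader2 strategy described here can be sketched in miniature with plain shell tools (the file and index names are made up for illustration): build each index ordering, sort it, and write the result sequentially instead of doing random B-tree inserts.

```shell
# Produce the three TDB-style permutations (SPO, POS, OSP) of a triples file,
# each sorted with byte-order collation so the index can be written in order.
export LC_ALL=C
awk '{print $1, $2, $3}' triples.tmp | sort > index.spo
awk '{print $2, $3, $1}' triples.tmp | sort > index.pos
awk '{print $3, $1, $2}' triples.tmp | sort > index.osp
```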
Post by Dick Murray
I might have access to an AMD ThreadRipper 12 core 24 thread 5GHz in the
next few days and I will try and test against it.
I haven't run the full import because a: i'm guessing the resulting TDB2
will be "large" b: my servers are currently importing other "large"
TDB2's!!!
The TDB2 database for a single graph will be the same size as TDB1's
built with tdbloader (not tdbloader2).
Post by Dick Murray
Long story follows...
<lots of interesting numbers>
Dick Murray
2017-12-02 21:34:56 UTC
Permalink
Hello.

On 2 Dec 2017 8:55 pm, "Andy Seaborne" <***@apache.org> wrote:


Short story I used the following "reasonable" device
Post by Dick Murray
Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
to load part of the latest-truthy.nt from a USB3.0 1TB drive to a 6GB RAM
disk and;
@800% 60K/Sec
@100% 40K/Sec
@50% 20K/Sec
The full source file contains 2.2G of triples in 10GB bz2 which
decompresses to 250GB nt, which I split into 10M triple chunks and used the
first one to test.
Which tdb loader?


TDB2


For TDB1, the two loader behave very differently.

I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
hours (76K triples/s) using TDB1 tdbloader2.

I'll write it up soon.


Loaded truthy on the server in 9 hours using RAID 5 with 10 10k-RPM 1TB SAS
drives. Loaded 4 truthys concurrently in 9.5 hours. I think that's the
biggest concurrent load the server has handled. Fans work!



Andy Seaborne
2017-12-02 22:38:23 UTC
Permalink
Post by Andy Seaborne
Which tdb loader?
TDB2
tdb2.tdbloader?

It does fine, until RAM (file system cache) gets stressed ... and for
2.2B triples, it gets stressed.

(TDB2 has a fast node table).

Andy
Laura Morales
2017-12-03 05:48:43 UTC
Permalink
Post by Dick Murray
Loaded truthy on the server in 9 hours using raid 5 with 10 10k 1TB SAS.
Loaded 4 truthy's concurrently in 9.5 hours. I think that's the biggest
concurrent source the server has handled. Fans work!
cool, this is another interesting statistic. Looks like there is quite some room for speed-ups on a single machine (much simpler to deal with than distributing the work across several nodes), if TDB2 can be parallelized more...
Laura Morales
2017-12-03 05:31:56 UTC
Permalink
@Andy
Post by Andy Seaborne
Which tdb loader?
I'd guess tdb2.tdbloader since he was replying to my previous email
Post by Andy Seaborne
I loaded truthy, 2.199 billion triples, on a 16G Dell XPS with SSD in 8
hours (76K triples/s) using TDB1 tdbloader2.
Post by Andy Seaborne
I'll write it up soon.
Could you please also share the model names of the hardware components? So we can check the various frequencies, bandwidths, latencies?
Post by Andy Seaborne
The limit at scale is the I/O handling and disk cache. 128G RAM gives a
better disk cache and that server machine probably has better I/O. It's
big enough to fit one whole index (if all RAM is available - and that
depends on the swappiness setting which should be set to zero ideally).
Do you have any idea then why executing everything from ramdisk doesn't seem to bring any significant improvements over reading/writing from a SATA3 disk (at least in my tests)?
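One way to probe this question (a sketch; Linux-only, field positions per proc(5)) is to watch iowait while the loader runs. If iowait stays near zero while one core is pegged, the single loader thread, not the disk, is the limit, which would fit the ramdisk result.

```shell
# The 'cpu' line of /proc/stat lists cumulative jiffies as:
#   user nice system idle iowait ...  -> iowait is field 6.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
# (vmstat 5, or iostat -x 5 if sysstat is installed, gives the same
# picture sampled over time.)
```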