Black Magic C
Oct 9th, 10:34am
vendu
-----
whee
morning coffee + a couple of small jointies done :)
the newly-re-engineered malloc is starting to kick ass :)
Oct 9th, 2:09pm
raster
------
Not sure what smoking weed has to do with writing malloc code...
Nov 3rd, 9:21pm
vendu
-----
smoking weed has everything to do with writing code - it makes you too lazy to wait for benchmarks to finish to come up with solutions like this (notice how i use mask to avoid a break-if-branch every item) :)
LINK UPDATED:
hash chain-table traversal without branches
raster
------
weed has made you write "clever" code, not readable or maintainable code. have you checked the ASM output of that vs a simple if/else? you might want to do that (with optimization). let me do this for you... just isolating it into a single function per method:
void *f1(void *p1, void *p2, void *p3) {
return (void *)((-((long)(p1 == p2))) & (long)p3);
}
void *f2(void *p1, void *p2, void *p3) {
if (p1 == p2) return p3;
return 0x0;
}
f1 is your fancy method of "ooh i avoided an if". well.. you did.. but the compiler still need to do a compare anyway... i think we can all agree that someone reading the code will look at f1 and go "wtf?" and have to sit for a bit and do the math in their head and eventually realize it really means the 2nd one. right? the 2'nd one is the readable "oh so that's the logic" one. right?
well let's look at the assembly output. gcc -O3 (on x86-64):
f1:
xorl %eax, %eax
cmpq %rsi, %rdi
sete %al
negq %rax
andq %rdx, %rax
ret
f2:
movq %rdx, %rax
cmpq %rsi, %rdi
movl $0, %edx
cmovne %rdx, %rax
ret
i removed some other asm junk and labels to get down to the meat of the algorithms...
the simple obvious one produces fewer instructions than your "clever" one. yes. instructions may take a different amount of time to execute. blah blah blah. we can benchmark if you like... but i might say f2 will be a bit faster than f1. they both have to do a cmp anyway and results o the cmp will need speculative execution of the result. pipelining means the and for yours has to wait on the result of the cmp to do its task, and the cmovne - a conditional instruction has to wait. the cpu pipeline will stall there one way or another. but your version does have more instructions to get through. lots of cpu archs have simple conditional instructions for just this kind of case to avoid through in a jmp for simple responses.
so my take would be... weed has dulled your brain making you THINK you can outsmart the compiler with REALLY basic things in micro-optimizations, without actually reading the assembly code to be sure you can outsmart the compiler. i would have betted compilers are pretty good at this kind of stuff. it's their bread and butter to optimize this. i was right. i did find out maybe a month ago that compilers are not so smart about where they PLACE code for if branches when if branches are non-trivial and this pollutes l1 cache lines with unused code, so either restructuring to have the most common branch "in line" helps with l1 cacheline hits vs misses. this is a less trivial optimization even if you use the likely or unlikely attributes for an if - this simple reverses the if to assume 486-style prediction (always predict false or true - i don't remember so structure the cmp + branch to have the most likely one the cpu was going to insist on predicting by default - fixed after 486 to follow statistics and track history of cmp's)... anyway. point is... i think you THINK you're being clever... but you're actually making your code slower AND harder to read and maintain. :) proof in the assembly. you should go read your asm some day. :)
vendu
-----
whoa, that was a long rant :)
but thanks :)
i'll investigate one of these days :P
Nov 4th, 2:37pm
vendu
-----
in my defense, yes i'm too lazy to read asm, but i also don't use gcc optimisations beyond -O most of the time :)
raster
------
Even -O should do this. Compilers are good at this stuff. Knowing what your computer's do for you in general is kind of important so you don't write unmaintainable code. :p
vendu
-----
:)
vendu
-----
whee
morning coffee + a couple of small jointies done :)
the newly-re-engineered malloc is starting to kick ass :)
Oct 9th, 2:09pm
raster
------
Not sure what smoking weed has to do with writing malloc code...
Nov 3rd, 9:21pm
vendu
-----
smoking weed has everything to do with writing code - it makes you too lazy to wait for benchmarks to finish to come up with solutions like this (notice how i use mask to avoid a break-if-branch every item) :)
LINK UPDATED:
hash chain-table traversal without branches
raster
------
weed has made you write "clever" code, not readable or maintainable code. have you checked the ASM output of that vs a simple if/else? you might want to do that (with optimization). let me do this for you... just isolating it into a single function per method:
void *f1(void *p1, void *p2, void *p3) {
return (void *)((-((long)(p1 == p2))) & (long)p3);
}
void *f2(void *p1, void *p2, void *p3) {
if (p1 == p2) return p3;
return 0x0;
}
f1 is your fancy method of "ooh i avoided an if". well.. you did.. but the compiler still need to do a compare anyway... i think we can all agree that someone reading the code will look at f1 and go "wtf?" and have to sit for a bit and do the math in their head and eventually realize it really means the 2nd one. right? the 2'nd one is the readable "oh so that's the logic" one. right?
well let's look at the assembly output. gcc -O3 (on x86-64):
f1:
xorl %eax, %eax
cmpq %rsi, %rdi
sete %al
negq %rax
andq %rdx, %rax
ret
f2:
movq %rdx, %rax
cmpq %rsi, %rdi
movl $0, %edx
cmovne %rdx, %rax
ret
i removed some other asm junk and labels to get down to the meat of the algorithms...
the simple obvious one produces fewer instructions than your "clever" one. yes. instructions may take a different amount of time to execute. blah blah blah. we can benchmark if you like... but i might say f2 will be a bit faster than f1. they both have to do a cmp anyway and results o the cmp will need speculative execution of the result. pipelining means the and for yours has to wait on the result of the cmp to do its task, and the cmovne - a conditional instruction has to wait. the cpu pipeline will stall there one way or another. but your version does have more instructions to get through. lots of cpu archs have simple conditional instructions for just this kind of case to avoid through in a jmp for simple responses.
so my take would be... weed has dulled your brain making you THINK you can outsmart the compiler with REALLY basic things in micro-optimizations, without actually reading the assembly code to be sure you can outsmart the compiler. i would have betted compilers are pretty good at this kind of stuff. it's their bread and butter to optimize this. i was right. i did find out maybe a month ago that compilers are not so smart about where they PLACE code for if branches when if branches are non-trivial and this pollutes l1 cache lines with unused code, so either restructuring to have the most common branch "in line" helps with l1 cacheline hits vs misses. this is a less trivial optimization even if you use the likely or unlikely attributes for an if - this simple reverses the if to assume 486-style prediction (always predict false or true - i don't remember so structure the cmp + branch to have the most likely one the cpu was going to insist on predicting by default - fixed after 486 to follow statistics and track history of cmp's)... anyway. point is... i think you THINK you're being clever... but you're actually making your code slower AND harder to read and maintain. :) proof in the assembly. you should go read your asm some day. :)
vendu
-----
whoa, that was a long rant :)
but thanks :)
i'll investigate one of these days :P
Nov 4th, 2:37pm
vendu
-----
in my defense, yes i'm too lazy to read asm, but i also don't use gcc optimisations beyond -O most of the time :)
raster
------
Even -O should do this. Compilers are good at this stuff. Knowing what your computer's do for you in general is kind of important so you don't write unmaintainable code. :p
vendu
-----
:)
Comments
Post a Comment