Assembly Read a Byte and Compare It
Byte array compare - Speed
Hey guys, my first mail service on here so I'll just say "Hullo everbody!"
Ok heres my question for y'all.
Is at that place a faster way to compare one byte assortment to another?
This is my current lawmaking
// Cheque for a match
lucifer = truthful;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
friction match = simulated;
suspension;
}
}
Thanks in advance guys!!
Jul 22 '05 #one
13 19036
MrCoder posted:
Hey guys, my first post on here then I'll only say "Hello everbody!"Ok heres my question for you lot lot.
Is there a faster way to compare 1 byte array to another?
This is my current lawmaking
// Cheque for a match
friction match = true;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
match = false;
break;
}
}
Thanks in advance guys!!
If the assortment is of a variable length, as in your above case, then I would
say that Yes, your lawmaking is pretty efficent, although mayhap there's some
memory functions I'm unaware of that could do the chore ameliorate?
If the array is of constant length, I'd suggest the post-obit:
Struct ChocolateArray
{
int monkey[fifty];
};
int main(void)
{
bool match;
ChocolateArray aa;
ChocolateArray bb;
//Mess with the arrays
if (aa == bb)
{
match = true;
}
else
{
match = false;
}
}
-JKop
Jul 22 '05 #ii
On Sun, 6 Jun 2004 09:23:28 +0000 (UTC), "MrCoder" <Mr*****@hotmail.com> wrote:
Hey guys, my first mail service on here so I'll simply say "Hello everbody!"Ok heres my question for yous lot.
Is at that place a faster way to compare one byte assortment to some other?
This is my electric current code
// Bank check for a lucifer
match = true;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
friction match = false;
break;
}
}
Thank you in advance guys!!
memcmp( data, data2, arraylen );
Jul 22 '05 #3
MrCoder wrote:
Hey guys, my first post on here and then I'll just say "Howdy everbody!"Ok heres my question for you lot.
Is there a faster way to compare i byte array to some other?
This is my current code
// Check for a match
friction match = truthful;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
friction match = false;
break;
}
}
On about systems, memcmp may be nicely optimized, information technology may (on some
platforms) be inlined past the compiler.
i.e.
#include <cstring>
bool isequal( const char * a, const char * b, int len )
{
return ::memcmp( a, b, len ) == 0;
}
using gcc: g++ -mtune=i686 -O3 generates:
memcmp_tst.o: file format elf32-i386
Disassembly of section .text:
00000000 <_Z7isequalPKcS0_i>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
three: 83 ec 0c sub $0xc,%esp
6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
ix: 8b 4d ten mov 0x10(%ebp),%ecx
c: 8b 75 08 mov 0x8(%ebp),%esi
f: 89 7d fc mov %edi,0xfffffffc(%ebp)
12: 8b 7d 0c mov 0xc(%ebp),%edi
15: fc cld
xvi: 39 c9 cmp %ecx,%ecx
18: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
1a: 0f 94 c0 sete %al
1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
23: 89 ec mov %ebp,%esp
25: 0f b6 c0 movzbl %al,%eax
28: 5d popular %ebp
29: c3 ret
(Observe no office calls !)
However, on some architectures (Risc - i.e. mips, sparc, hppa etc) there
are a couple of different fast paths and it depends alot on the
processor, cache and usage patterns.
On 64 chip cpu, y'all might be able to take advantage of reading in large
64 scrap chunks at a time. Functioning optimizations at this level tin can get
very involved just unless the profiling indicates that at that place is a huge
proportion of time spent in memcmp, it'south only not worth your time and
try to optimize it.
memcmp is probably the best choice for a balance of operation and
portability.
Jul 22 '05 #4
"Gianni Mariani" <gi*******@mariani.ws> wrote in message news:c9vfo4
MrCoder wrote:friction match = truthful;for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
match = imitation;
break;
}
#include <cstring>
bool isequal( const char * a, const char * b, int len )
{
return ::memcmp( a, b, len ) == 0;
}using gcc: chiliad++ -mtune=i686 -O3 generates:
memcmp_tst.o: file format elf32-i386Disassembly of section .text:
00000000 <_Z7isequalPKcS0_i>:
0: 55 push %ebp
one: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
nine: 8b 4d 10 mov 0x10(%ebp),%ecx
c: 8b 75 08 mov 0x8(%ebp),%esi
f: 89 7d fc mov %edi,0xfffffffc(%ebp)
12: 8b 7d 0c mov 0xc(%ebp),%edi
fifteen: fc cld
sixteen: 39 c9 cmp %ecx,%ecx
xviii: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
1a: 0f 94 c0 sete %al
1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
20: 8b 7d fc mov 0xfffffffc(%ebp),%edi
23: 89 ec mov %ebp,%esp
25: 0f b6 c0 movzbl %al,%eax
28: 5d popular %ebp
29: c3 ret
How is the memcmp version different from the original version? Is information technology
faster? Sorry, I don't know how to read assembly well.memcmp is probably the best choice for a balance of performance and
portability.
An implementation may specialize std::equal(begin, end, begin2) to telephone call
std::memcmp for POD types. I don't know how many actually do though. My
implementation (Borland 6) does not.
Jul 22 '05 #v
Thanks for the answers guys!
Im going to try it using memcmp presently and see what sort of speed increase
I get.
Jul 22 '05 #half dozen
"JKop" <NU**@Cipher.NULL> wrote in message news:7FDwc.1527
If the array is of constant length, I'd advise the following:Struct ChocolateArray
{
int monkey[l];
};
int main(void)
{
bool match;ChocolateArray aa;
ChocolateArray bb;
//Mess with the arrays
if (aa == bb)
There is no compiler defined operator== for structs or classes. There's
just operator=, copy constructor, and destructor.
Jul 22 '05 #7
Siemel Naran wrote:
"Gianni Mariani" <gi*******@mariani.ws> wrote in bulletin news:c9vfo4
#include <cstring>
bool isequal( const char * a, const char * b, int len )
{
return ::memcmp( a, b, len ) == 0;
}using gcc: m++ -mtune=i686 -O3 generates:
memcmp_tst.o: file format elf32-i386Disassembly of department .text:
00000000 <_Z7isequalPKcS0_i>:
0: 55 push %ebp
one: 89 e5 mov %esp,%ebp
3: 83 ec 0c sub $0xc,%esp
6: 89 75 f8 mov %esi,0xfffffff8(%ebp)
9: 8b 4d 10 mov 0x10(%ebp),%ecx
c: 8b 75 08 mov 0x8(%ebp),%esi
f: 89 7d fc mov %edi,0xfffffffc(%ebp)
12: 8b 7d 0c mov 0xc(%ebp),%edi
fifteen: fc cld
16: 39 c9 cmp %ecx,%ecx
18: f3 a6 repz cmpsb %es:(%edi),%ds:(%esi)
1a: 0f 94 c0 sete %al
1d: 8b 75 f8 mov 0xfffffff8(%ebp),%esi
xx: 8b 7d fc mov 0xfffffffc(%ebp),%edi
23: 89 ec mov %ebp,%esp
25: 0f b6 c0 movzbl %al,%eax
28: 5d pop %ebp
29: c3 retHow is the memcmp version different from the original version? Is information technology
faster? Sad, I don't know how to read assembly well.The disassembly shows that memcmp functionality is inlined - there is no
telephone call to the memcmp role. It's replaced with the ia32 "repz" prefix
to the "cmps" instruction. This is a single instuction that is
equivalent to the entire for loop in the op's example.isequal() equivalent to isequal2() beneath.
bool isequal2( const char * a, const char * b, int len )
{while ( len -- )
{
if ( * (a++) != * (b++) )
{
render false;
}
}
return true;}
memcmp_tst.o: file format elf32-i386
Disassembly of section .text:
00000000 <_Z8isequal2PKcS0_i>:
0: 55 button %ebp
ane: 89 e5 mov %esp,%ebp
iii: 56 push %esi
four: 53 button %ebx
5: 8b 75 08 mov 0x8(%ebp),%esi
8: 8b 5d 0c mov 0xc(%ebp),%ebx
b: 8b 4d 10 mov 0x10(%ebp),%ecx
e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
10: 0f b6 13 movzbl (%ebx),%edx
13: 43 inc %ebx
14: 0f b6 06 movzbl (%esi),%eax
17: 46 inc %esi
xviii: 38 d0 cmp %dl,%al
1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
1c: 49 dec %ecx
1d: 83 f9 ff cmp $0xffffffff,%ecx
twenty: 75 ee jne 10 <_Z8isequal2PKcS0_i+0x10>
22: 5b pop %ebx
23: b8 01 00 00 00 mov $0x1,%eax
28: 5e pop %esi
29: 5d pop %ebp
2a: c3 ret
2b: 31 c0 xor %eax,%eax
2d: 5b pop %ebx
2e: 5e pop %esi
2f: 5d popular %ebp
30: c3 retEqually y'all can run across, it has 2 provisional jumps "jne = jump if not equal".
Conditional jumps are plush in modern CPU's that do co-operative prediction
when the prediction is wrong. If the compiler was really actually smart,
it *might* be able to figure out how to practise the optimization to use "repz
cmp", merely I take yet to encounter ane !memcmp is probably the best option for a residual of performance and
portability.
An implementation may specialize std::equal(begin, finish, begin2) to call
std::memcmp for POD types. I don't know how many actually do though. My
implementation (Borland six) does not.
It appears that gcc also does not. Using std::equal
00000040 <_Z8isequal3PKcS0_i>:
xl: 55 push %ebp
41: 89 e5 mov %esp,%ebp
43: 53 push %ebx
44: 8b 55 08 mov 0x8(%ebp),%edx
47: 8b 45 10 mov 0x10(%ebp),%eax
4a: 8b 4d 0c mov 0xc(%ebp),%ecx
4d: 89 d3 mov %edx,%ebx
4f: 01 c3 add together %eax,%ebx
51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
53: 0f b6 01 movzbl (%ecx),%eax
56: 38 02 cmp %al,(%edx)
58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
5a: 42 inc %edx
5b: 41 inc %ecx
5c: 39 da cmp %ebx,%edx
5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
lx: 5b pop %ebx
61: b8 01 00 00 00 mov $0x1,%eax
66: 0f b6 c0 movzbl %al,%eax
69: 5d pop %ebp
6a: c3 ret
6b: 5b popular %ebx
6c: 31 c0 xor %eax,%eax
6e: 0f b6 c0 movzbl %al,%eax
71: 5d pop %ebp
72: c3 ret
Note the 2 jne'southward - basically the same as the previous loop.
Jul 22 '05 #eight
MrCoder wrote:
Thank you for the answers guys!Im going to attempt it using memcmp soon and see what sort of speed
increase I get.
When y'all know the size, you can likewise unroll (or partially unroll) the loop.
This is a semi-generalized implementation:#include <functional>
#include <iterator>// identity
template<class T> struct identity {
typedef T type;
};// map_integral
template<form T, T 5> struct map_integral {
static const T value = V;
};template<class T, T V> const T map_integral<T, V>::value;
// enable_if
template<bool, class R> struct enable_if { };
template<class R> struct enable_if<true, R> : identity<R> { };// is_class
template<class T> grade is_class {
private:
template<form U> static char check(int U::*);
template<class U> static char (& bank check(...))[ii];
public:
static const bool value = sizeof(check<T>(0)) == 1;
};template<class T> const bool is_class<T>::value;
// is_same
template<class T, form U> struct is_same : map_integral<bool, faux> { };
template<grade T> struct is_same<T, T> : map_integral<bool, truthful> { };// is_pointer_to_function
template<class> struct is_pointer_to_function
: map_integral<bool, faux> { };template<> struct is_pointer_to_function<void*>
: map_integral<bool, false> { };template<class T> struct is_pointer_to_function<T*>
: is_same<void (T), void (T*)> { };// is_reference_to_function
template<class> struct is_reference_to_function
: map_integral<bool, false> { };template<form T> struct is_reference_to_function<T&>
: is_same<void (T), void (T*)> { };// compare
namespace detail {
template<course iterator, class predicate>
bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred, std::input_iterator_tag
) {
while (a1 != a2) {
if (b1 == b2 || !pred(*a1++, *b1++)) {
return simulated;
}
}
return b1 == b2;
}template<course iterator, class predicate>
bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred, std::random_access_iterator_tag
) {
typedef typename std::iterator_traits<iterator>::difference_type
difference_type;
difference_type size(a2 - a1);
if (size != b2 - b1) {
render false;
}
// unrolling cistron == 4
register difference_type n((size + 3) / 4);
#define body() \
if (!pred(*a1++, *b1++)) { \
render false; \
} \
/**/
switch (size % 4 - !size) {
case 0:
do {
body()
case 3: body()
case 2: body()
example i: body()
} while (--n);
}
#undef body
render true;
}} // item
template<course iterator, course predicate>
inline bool compare(
iterator a1, iterator a2,
iterator b1, iterator b2,
predicate pred
) {
render detail::compare(
a1, a2, b1, b2,
pred, typename std::iterator_traits<iterator>::iterator_category( )
);
}template<class iterator>
inline bool compare(iterator a1, iterator a2, iterator b1, iterator b2) {
return compare(
a1, a2, b1, b2,
std::equal_to<typename std::iterator_traits<iterator>::value_type>()
);
}template<form T, form predicate>
inline bool compare(const T* a, const T* b, unsigned size, predicate pred) {
return compare(a, a + size, b, b + size, pred);
}template<grade T, unsigned long s1, unsigned long s2, class predicate>
inline typename enable_if<
is_class<predicate>::value
|| is_pointer_to_function<predicate>::value
|| is_reference_to_function<predicate>::value,
bool::type compare(T (& a)[s1], T (& b)[s2], predicate pred) {
render compare(a, a + s1, b, b + s2, pred);
}
template<class T> inline bool compare(const T* a, const T* b, unsigned size) {
return compare(a, b, size, std::equal_to<const T>());
}
template<course T, unsigned long s1, unsigned long s2>
inline bool compare(T (& a)[s1], T (& b)[s2]) {
render compare(a, b, std::equal_to<T>());
}
Regards,
Paul Mensonides
Jul 22 '05 #9
"Gianni Mariani" <gi*******@mariani.ws> wrote in bulletin news:c9vr41
Siemel Naran wrote:
How is the memcmp version unlike from the original version? Is it
faster? Sorry, I don't know how to read assembly well.The disassembly shows that memcmp functionality is inlined - in that location is no
phone call to the memcmp part. It'south replaced with the ia32 "repz" prefix
to the "cmps" instruction. This is a single instuction that is
equivalent to the entire for loop in the op'southward case.isequal() equivalent to isequal2() below.
bool isequal2( const char * a, const char * b, int len )
{while ( len -- )
{
if ( * (a++) != * (b++) )
{
render fake;
}
}
return true;}
memcmp_tst.o: file format elf32-i386
Disassembly of section .text:
00000000 <_Z8isequal2PKcS0_i>:
0: 55 push button %ebp
1: 89 e5 mov %esp,%ebp
3: 56 push button %esi
iv: 53 push %ebx
5: 8b 75 08 mov 0x8(%ebp),%esi
8: 8b 5d 0c mov 0xc(%ebp),%ebx
b: 8b 4d 10 mov 0x10(%ebp),%ecx
e: eb 0c jmp 1c <_Z8isequal2PKcS0_i+0x1c>
x: 0f b6 thirteen movzbl (%ebx),%edx
13: 43 inc %ebx
14: 0f b6 06 movzbl (%esi),%eax
17: 46 inc %esi
eighteen: 38 d0 cmp %dl,%al
1a: 75 0f jne 2b <_Z8isequal2PKcS0_i+0x2b>
1c: 49 dec %ecx
1d: 83 f9 ff cmp $0xffffffff,%ecx
20: 75 ee jne ten <_Z8isequal2PKcS0_i+0x10>
22: 5b pop %ebx
23: b8 01 00 00 00 mov $0x1,%eax
28: 5e popular %esi
29: 5d popular %ebp
2a: c3 ret
2b: 31 c0 xor %eax,%eax
second: 5b pop %ebx
2e: 5e pop %esi
2f: 5d pop %ebp
30: c3 retAs you lot tin encounter, information technology has ii conditional jumps "jne = bound if non equal".
Conditional jumps are costly in modern CPU'due south that do co-operative prediction
when the prediction is incorrect. If the compiler was really really smart,
it *might* be able to effigy out how to do the optimization to use "repz
cmp", but I have yet to run into one !memcmp is probably the best choice for a residual of performance and
portability.
An implementation may specialize std::equal(begin, end, begin2) to phone call
std::memcmp for POD types. I don't know how many actually do though. My implementation (Borland 6) does not.It appears that gcc as well does not. Using std::equal
00000040 <_Z8isequal3PKcS0_i>:
40: 55 button %ebp
41: 89 e5 mov %esp,%ebp
43: 53 push %ebx
44: 8b 55 08 mov 0x8(%ebp),%edx
47: 8b 45 10 mov 0x10(%ebp),%eax
4a: 8b 4d 0c mov 0xc(%ebp),%ecx
4d: 89 d3 mov %edx,%ebx
4f: 01 c3 add %eax,%ebx
51: eb 09 jmp 5c <_Z8isequal3PKcS0_i+0x1c>
53: 0f b6 01 movzbl (%ecx),%eax
56: 38 02 cmp %al,(%edx)
58: 75 11 jne 6b <_Z8isequal3PKcS0_i+0x2b>
5a: 42 inc %edx
5b: 41 inc %ecx
5c: 39 da cmp %ebx,%edx
5e: 75 f3 jne 53 <_Z8isequal3PKcS0_i+0x13>
sixty: 5b pop %ebx
61: b8 01 00 00 00 mov $0x1,%eax
66: 0f b6 c0 movzbl %al,%eax
69: 5d popular %ebp
6a: c3 ret
6b: 5b popular %ebx
6c: 31 c0 xor %eax,%eax
6e: 0f b6 c0 movzbl %al,%eax
71: 5d pop %ebp
72: c3 ret
Notation the 2 jne'south - basically the aforementioned as the previous loop.
Thanks for your answers!
Jul 22 '05 #10
MrCoder wrote:
Hey guys, my kickoff postal service on here so I'll merely say "Hello everbody!"
Ok heres my question for y'all.
Is in that location a faster way to compare ane byte array to another?
This is my current code
// Cheque for a match
match = true;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
match = simulated;
intermission;
}
}Thanks in advance guys!!
First off, search the spider web and the newgroups for
"premature optimization". But optimize when absolutely
necessary.
Loop Unrolling (also search the web on this one).
--------------
In most CPU architectures, branches (a.k.a. jumps) are
more than expensive (timewise) than data processing instructions.
Some processors also allow for conditional execution of
instructions. Anyhow, the fundamental outcome here is to
perform more comparisons than branches. Try this:
bool equal(true);
for (unsigned int i = 0;
equal && (i < arraylen);)
{
equal = information[i] == data2[i++];
// Repeat the following line many times.
equal = equal && (i < array_length)
&& (data[i] == data2[i++]);
}
The objective here is to utilise boolean logic and the
short-circuiting of the logical AND operator to reduce
the amount of branching. The more data operations per
branch is better.
Library Routines
----------------
As others have said, utilise library routines. Some
library routines may exist optimized for your platform's
processor while a simple for loop may non be.
Customize In Assembly.
----------------------
You tin can write a custom memory compare role that
will accept advantage of any processor special functions.
The processor I'1000 working with has the capability to
fetch much data from memory into registers with 1
instruction. So for large sized arrays, I would
use a "cake" method that would haul in a lot of
data into registers, and so compare each register pair.
--
Thomas Matthews
C++ newsgroup welcome bulletin:
http://world wide web.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.united kingdom/acllc-c++/faq.html
Other sites:
http://world wide web.josuttis.com -- C++ STL Library book
Jul 22 '05 #11
"Thomas Matthews" <Thursday****************************@sbcglobal.net> wrote in
message
Loop Unrolling (also search the web on this i).
--------------
In most CPU architectures, branches (a.k.a. jumps) are
more expensive (timewise) than data processing instructions.
Some processors besides allow for provisional execution of
instructions. Anyway, the fundamental issue here is to
perform more comparisons than branches. Try this:
bool equal(true);
for (unsigned int i = 0;
equal && (i < arraylen);)
{
equal = information[i] == data2[i++];
// Repeat the following line many times.
equal = equal && (i < array_length)
&& (information[i] == data2[i++]);
}
The objective here is to utilize boolean logic and the
curt-circuiting of the logical AND operator to reduce
the amount of branching. The more data operations per
branch is better.
Accept yous measured the speed increment?
Jul 22 '05 #12
Siemel Naran wrote:
"Thomas Matthews" <Th****************************@sbcglobal.cyberspace> wrote in
messageLoop Unrolling (also search the web on this one).
--------------
In nigh CPU architectures, branches (a.k.a. jumps) are
more than expensive (timewise) than information processing instructions.
Some processors also allow for conditional execution of
instructions. Anyhow, the fundamental issue here is to
perform more comparisons than branches. Try this:
bool equal(truthful);
for (unsigned int i = 0;
equal && (i < arraylen);)
{
equal = information[i] == data2[i++];
// Repeat the following line many times.
equal = equal && (i < array_length)
&& (data[i] == data2[i++]);
}
The objective here is to apply boolean logic and the
brusque-circuiting of the logical AND operator to reduce
the amount of branching. The more data operations per
branch is meliorate.
Have you measured the speed increase?
Only on my assembly language version for an embbeded
system using the ARM processor at 30Mhz.
I've found that it is faster for larger chunks of information,
sizes of 32 bytes or more than, than using a unproblematic byte
comparing.
--
Thomas Matthews
C++ newsgroup welcome message:
http://world wide web.slack.cyberspace/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.larn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library volume
Jul 22 '05 #13
And nobody'south mentioned Duff's device notwithstanding?
"MrCoder" <Mr*****@hotmail.com> wrote in bulletin
news:c9**********@sparta.btinternet.com...
Hey guys, my beginning post on here then I'll only say "Hello everbody!"Ok heres my question for you lot lot.
Is at that place a faster way to compare 1 byte array to another?
This is my current code
// Check for a match
match = true;
for ( int i = 0; i < arraylen; i++)
{
if( data[i] != data2[i] )
{
lucifer = faux;
break;
}
}
Cheers in advance guys!!
Jul 22 '05 #fourteen
This discussion thread is closed
Replies have been disabled for this give-and-take.
Source: https://bytes.com/topic/c/answers/132410-byte-array-compare-speed
0 Response to "Assembly Read a Byte and Compare It"
Post a Comment