Enumerators Again
In Enumerator.h I declare the same generic enumerator interface, and then offer some convenience C macros which call a function to create the enumerator and then drive a traversal of it:
```cpp
template< typename AnyT >
struct Enumerator {
    virtual bool next(const AnyT * &) = 0;
    virtual ~Enumerator() {}
};
```
The stack macros within Enumerator.h are as follows (note the use of "placement new" and the explicit call to the virtual destructor):
```cpp
#define stack_for_each_start(EFUN,Tp,X) \
    { /*new scope*/ \
        /* runtime-sized buffer: a VLA, which g++ allows as an extension */ \
        char _enum_buff[EFUN##__storage_size]; \
        typedef Tp _enum_Tp; \
        Enumerator<Tp> * _enum##X = EFUN(_enum_buff); \
        Enumerator<Tp> * _enum = _enum##X; /*for cleanup*/ \
        const Tp * X = 0; \
        while((_enum##X) && (_enum##X)->next(X)) {

#define stack_for_each_end \
        } \
        if(_enum) { (_enum)->~Enumerator<_enum_Tp>(); } \
    }
```
So that is reasonably ugly, and most C macros are, but the beauty on the application side is coming -- be patient! Here are the heap allocated macros:
```cpp
#define heap_for_each_start(EFUN,Tp,X) \
    { /*new scope*/ \
        Enumerator<Tp> * _enum##X = EFUN(); \
        Enumerator<Tp> * _enum = _enum##X; /*for cleanup*/ \
        const Tp * X = 0; \
        while((_enum##X) && (_enum##X)->next(X)) {

#define heap_for_each_end \
        } \
        delete (_enum); \
    }
```
Our Library
In order to write our application library header file app.h, we want to minimize the amount of junk we expose to the user application code. Here we declare a single function app::get_all_ts() which will return the enumerator, using existing storage if it is provided as an argument (which is our way of allocating on the stack):
```cpp
#include "enumerator.h"
#include <cstdlib>

namespace app {

struct T { int id; char name[40]; }; // T defined here

Enumerator<T> * get_all_ts(void * storage = 0);
extern const size_t get_all_ts__storage_size;

} // namespace app
```
The app::get_all_ts__storage_size symbol is provided so that we know how much storage the underlying enumerator implementation needs. The implementation of the library code behind app::get_all_ts hides almost all the detail of the collection being enumerated, and how the enumerator implementation works:
```cpp
#include "app.h"
#include <new> // placement new

namespace app {

T t1 = { 1, "sam" };
T t2 = { 2, "bill" };
T * tarray[] = { &t1, &t2, 0 }; // null-terminated

struct AllEnumeratorImpl : public Enumerator<T> {
    AllEnumeratorImpl() : at(&(tarray[0])) {}
    virtual bool next(const T * & tp) {
        if((tp = *at)) { ++at; return true; }
        return false;
    }
private:
    T ** at;
};

Enumerator<T> * get_all_ts(void * storage)
{
    return storage ? new(storage) AllEnumeratorImpl()
                   : new AllEnumeratorImpl();
}

const size_t get_all_ts__storage_size = sizeof(AllEnumeratorImpl);

} // namespace app
```
My simple C array tarray of objects is just an example; a real library would have a more interesting, less static collection.
User Code
Finally we can use these mechanics to do a traversal in the ex.cpp user code:
```cpp
#include "app.h"
#include <cstdio>

int main()
{
    stack_for_each_start(app::get_all_ts, app::T, tptr)
        printf("%d %s\n", tptr->id, tptr->name);
    stack_for_each_end
    return 0;
}
```
Building and running the ex.cpp code we get:
```
% g++ -g app.cpp ex.cpp -o ex
% ./ex
1 sam
2 bill
```
Checking for Leaks
Using the heap_ versions of the macros available in Enumerator.h, the same result is obtained. The difference in the Valgrind output of the two implementations is mildly interesting (first, the stack macros):
```
% valgrind --tool=memcheck --leak-check=full ./ex
==29306== Memcheck, a memory error detector.
==29306== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==29306== Using LibVEX rev 1884, a library for dynamic binary translation.
==29306== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==29306== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.
==29306== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==29306== For more details, rerun with: -v
==29306==
1 sam
2 bill
==29306==
==29306== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 1)
==29306== malloc/free: in use at exit: 0 bytes in 0 blocks.
==29306== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==29306== For counts of detected errors, rerun with: -v
==29306== All heap blocks were freed -- no leaks are possible.
```
Comparing this with the heap implementation which does see one alloc happen:
```
% valgrind --tool=memcheck --leak-check=full ./ex
==29288== Memcheck, a memory error detector.
==29288== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==29288== Using LibVEX rev 1884, a library for dynamic binary translation.
==29288== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==29288== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.
==29288== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==29288== For more details, rerun with: -v
==29288==
1 sam
2 bill
==29288==
==29288== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 17 from 1)
==29288== malloc/free: in use at exit: 0 bytes in 0 blocks.
==29288== malloc/free: 1 allocs, 1 frees, 8 bytes allocated.
==29288== For counts of detected errors, rerun with: -v
==29288== All heap blocks were freed -- no leaks are possible.
```
At this point we also know that neither method leaks memory or causes a segmentation fault, so from the user program's point of view they are pretty much equivalent.
Performance
To compare performance I made a new sample ex_timing.cpp with no I/O but iterated 10 million times:
```cpp
#include "app.h"
#include <cstdio>

int main()
{
    for(long i = 0; i < 10000000; ++i) {
        heap_for_each_start(app::get_all_ts, app::T, tptr)
        heap_for_each_end
    }
    return 0;
}
```
I ran each version (stack and heap) 3 times and picked the middle user time to build this table (both versions were compiled with optimization -O3):
```
macro prefix | user time for 10 million iterations
-------------+-------------------------------------
stack        |  292 ms
heap         |  960 ms
```
The stack version is over three times faster!
Summary
For local things like enumerators, the usual way of encapsulating the functionality had us returning a heap allocated object which the user code had to delete (via our macro) when it was done. This was quite functional, but it carries a performance penalty relative to what can be done with the stack. The trouble with the stack is that the memory for the enumerator object has to be "allocated" prior to the library call which returns the enumerator -- and this forced us to expose the size of the implementation enumerator somehow. I think that is an acceptable cost (some minor symbol pollution) in exchange for the performance benefit. Of course, I wish for a way to do this which does not involve C macros.