| Topic: |
DEVELOP > c-Plus-Plus |
| User: |
"" |
| Date: |
28 Jan 2008 04:57:04 PM |
| Object: |
Container functionality and marshalling |
I've been thinking about implementing some of the ideas
discussed in this thread on clc++m.
http://preview.tinyurl.com/2l8cnh
Mainly how to go about calculating the total message
length and using it in a header before sending the payload.
In some cases like vector<int>, it is easy to multiply the size
of the vector by the size of an int and determine how many bytes are
involved. If it is a vector<string> though, I have to add up the
lengths of all of the strings.
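To make the two cases concrete, here is a minimal sketch. The function name PayloadBytes is made up for illustration, and the sketch counts only the element bytes, not any length prefixes a real wire format would add.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Payload bytes for a vector<int>: element count times element size.
inline std::size_t PayloadBytes(const std::vector<int>& v)
{
    return v.size() * sizeof(int);
}

// Payload bytes for a vector<string>: every string has to be visited.
inline std::size_t PayloadBytes(const std::vector<std::string>& v)
{
    std::size_t total = 0;
    for (std::vector<std::string>::const_iterator it = v.begin();
         it != v.end(); ++it) {
        total += it->length();
    }
    return total;
}
```

The first overload is constant time; the second is linear in the number of strings, which is the cost the question is about.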
I've wondered whether it would be helpful to have containers
that tracked the total number of bytes they are managing
rather than going through this calculation each time.
For example, if a set<string> has thousands of elements
and only a handful of changes occur to the set between
uses of the set as a marshalling parameter, the work to
count up everything from scratch seems like a waste
compared to just making a few additions/subtractions to
a count.
Any thoughts on the utility and design of containers
like that?
Brian Wood
Ebenezer Enterprises
www.webebenezer.net
.
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
28 Jan 2008 08:44:56 PM |
|
|
You can implement a class derived from set and redefine
all the methods that change the total bytes: update the running total
in those methods, then call the same method of the base class.
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
29 Jan 2008 03:47:03 AM |
|
|
That will be a little tricky. You could change the erase and insert
functions to update the count. However, many functions allow client code to
change elements by returning references (e.g., dereferencing non-const
iterators). Since these functions return references and not
smart references (which we could only have if the dot operator were
overloadable), there is no hook to sneak in update code.
It would probably be easier to implement a container-like class with a
minimal interface (just enough for the application) that uses a set
internally. Then one can force client code to go through functions that
update the byte count.
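A minimal sketch of such a wrapper, assuming the application only needs insert, erase, and read-only traversal. The class name ByteCountingSet and its member names are hypothetical, not taken from any library.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical wrapper: owns a std::set<std::string> and exposes only
// operations that keep the running byte count in sync.
class ByteCountingSet
{
public:
    ByteCountingSet() : bytes_(0) {}

    // Returns true if the value was inserted (it was not already present).
    bool Insert(const std::string& s)
    {
        const bool inserted = set_.insert(s).second;
        if (inserted) bytes_ += s.length();
        return inserted;
    }

    // Returns true if the value was present and has been removed.
    bool Erase(const std::string& s)
    {
        std::set<std::string>::iterator it = set_.find(s);
        if (it == set_.end()) return false;
        bytes_ -= it->length();
        set_.erase(it);
        return true;
    }

    std::size_t Size() const { return set_.size(); }
    std::size_t TotalBytes() const { return bytes_; }

    // Read-only traversal; no mutable references are handed out.
    std::set<std::string>::const_iterator begin() const { return set_.begin(); }
    std::set<std::string>::const_iterator end() const { return set_.end(); }

private:
    std::set<std::string> set_;
    std::size_t bytes_;
};
```

Because the set is a private member rather than a base class, client code has no way to reach the underlying container and bypass the bookkeeping.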
Best
Kai-Uwe Bux
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
30 Jan 2008 11:35:18 AM |
|
|
On Jan 29, 3:47 am, wrote:
> That will be a little tricky. You could change the erase and insert
> functions to update the count. However, many functions allow client code to
> change elements by returning references (e.g., dereferencing non-const
> iterators). Since these functions return references and not
> smart references (which we could only have if the dot operator were
> overloadable), there is no hook to sneak in update code.
So even if a mutex is associated with the container and inserts and
erases are synchronized, a sum of the lengths may be out of date before
the summation is finished. Even the simple approach can't guarantee much.
Brian Wood
.
|
|
|
|
|
|
| User: "James Kanze" |
|
| Title: Re: Container functionality and marshalling |
30 Jan 2008 02:57:27 AM |
|
|
On Jan 28, 11:57 pm, wrote:
> I've been thinking about implementing some of the ideas
> discussed in this thread on clc++m. http://preview.tinyurl.com/2l8cnh
> Mainly how to go about calculating the total message
> length and using it in a header before sending the payload.
The usual solution is to set it to 0, and fill it in after
you've finished marshalling. Much easier, and it avoids all
sorts of problems. (For example, a machine with IBM floats
might choose to serialize them as double, rather than float, since
the range of an IBM float is greater than that of an IEEE
float.)
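A minimal sketch of that approach, assuming the whole message is built in an in-memory byte vector and the length field is an unsigned int in host byte order. The names Bytes, AppendU32 and MarshalWithLengthHeader are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Append the raw bytes of an unsigned int (host byte order, for brevity).
inline void AppendU32(Bytes& out, unsigned int v)
{
    unsigned char raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Marshal the strings behind a length header that is written as 0 first
// and patched once the payload size is known.
inline Bytes MarshalWithLengthHeader(const std::vector<std::string>& items)
{
    Bytes out;
    AppendU32(out, 0);                           // placeholder for the length
    const std::size_t payloadStart = out.size();

    AppendU32(out, static_cast<unsigned int>(items.size()));
    for (std::size_t i = 0; i < items.size(); ++i) {
        AppendU32(out, static_cast<unsigned int>(items[i].length()));
        out.insert(out.end(), items[i].begin(), items[i].end());
    }

    const unsigned int payloadLen =
        static_cast<unsigned int>(out.size() - payloadStart);
    std::memcpy(&out[0], &payloadLen, sizeof payloadLen);  // fill in header
    return out;
}
```

The header always matches what was actually written, at the cost of buffering the whole message before any of it can be sent.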
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
30 Jan 2008 12:11:43 PM |
|
|
On Jan 30, 2:57 am, James Kanze <james.ka...@gmail.com> wrote:
> The usual solution is to set it to 0, and fill it in after
> you've finished marshalling. Much easier, and it avoids all
> sorts of problems.
I think it avoids the possibility of saying the total length
is one thing and it being something else. However, that
approach has some drawbacks. You have to (probably) copy
everything as you go and if there is a maximum message
size that winds up being exceeded, you have done a bunch of
copying for nothing. You also cannot start sending anything
until everything has been marshalled. And it also means you
have to have buffers as big as the max msg size. I don't
think buffer sizes should be tied to that parameter.
Given the current containers, though, perhaps what you suggest
is necessary. I hope, though, that container technology will
mature and permit a more efficient approach here.
> (For example, a machine with IBM floats
> might choose to serialize them as double, rather than float, since
> the range of an IBM float is greater than that of an IEEE
> float.)
OK
Brian Wood
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
30 Jan 2008 12:55:08 PM |
|
|
Over here,
http://www.gamedev.net/community/forums/topic.asp?topic_id=480778
"Antheus" says, "Also, the longer the message, the smaller the
overhead. Serializing large packets will yield higher throughput."
I think there is some truth to that. That puts a little pressure
on you to have larger messages and possibly exceed the max msg
size. And since the messages are relatively large, you're holding
more data back from heading on its way.
Brian Wood
.
|
|
|
|
| User: "James Kanze" |
|
| Title: Re: Container functionality and marshalling |
31 Jan 2008 01:50:50 PM |
|
|
On Jan 30, 7:11 pm, wrote:
> I think it avoids the possibility of saying the total length
> is one thing and it being something else. However, that
> approach has some drawbacks. You have to (probably) copy
> everything as you go and if there is a maximum message
> size that winds up being exceeded, you have done a bunch of
> copying for nothing. You also cannot start sending anything
> until everything has been marshalled. And it also means you
> have to have buffers as big as the max msg size. I don't
> think buffer sizes should be tied to that parameter.
I said it was the usual solution, not the perfect solution. On
modern hardware, I suspect that it is also the most appropriate
solution in most cases.
> Given the current containers, though, perhaps what you suggest
> is necessary. I hope, though, that container technology will
> mature and permit a more efficient approach here.
If the containers are all in memory, it's generally a pretty
efficient approach as well. It does become problematic when
you're returning disk-based data, however: for example, if you're
using the keep-alive option in an HTTP server and are serving up
a dynamically generated page which can be several gigabytes
(hopefully not to someone using a dial-up connection :-).
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
01 Feb 2008 03:02:05 PM |
|
|
On Jan 31, 1:50 pm, James Kanze <james.ka...@gmail.com> wrote:
> I said it was the usual solution, not the perfect solution. On
> modern hardware, I suspect that it is also the most appropriate
> solution in most cases.
I'm having second thoughts. I forgot that I make the assumption
that the arguments passed to a marshalling function are fixed
while the function is executing.* When marshalling a list<int>,
first the size() is marshalled and then the elements. If another
thread inserts elements into the list between those two steps it
will lead to undefined behaviour. I think Boost Serialization
makes this assumption also. Probably I forgot about that because
it isn't documented anywhere.
I don't think I'm going to try to factor IBM floats into the
equation at this point. To start off with I'd be happy to
support a total message length for ints, containers/string and
IEEE floats. Given the above assumption I'm not aware of any
reason why I shouldn't calculate the total length upfront.
* It may be possible to refine that to "fixed until the
argument has been marshalled."
Brian Wood
Ebenezer Enterprises
.
|
|
|
| User: "James Kanze" |
|
| Title: Re: Container functionality and marshalling |
01 Feb 2008 05:30:37 PM |
|
|
On Feb 1, 10:02 pm, wrote:
[...]
> I'm having second thoughts. I forgot that I make the assumption
> that the arguments passed to a marshalling function are fixed
> while the function is executing.* When marshalling a list<int>,
> first the size() is marshalled and then the elements. If another
> thread inserts elements into the list between those two steps it
> will lead to undefined behaviour.
If any thread can modify the list, then all threads accessing
it must use some sort of lock. Unsynchronized access to a
container (or any other object) is only allowed if no thread is
modifying it.
> I think Boost Serialization makes this assumption also.
> Probably I forgot about that because it isn't documented
> anywhere.
It's a basic rule of thread safety. Individual objects only
have to document it when they don't follow the rule.
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
03 Feb 2008 02:07:06 PM |
|
|
On Feb 1, 5:30 pm, James Kanze <james.ka...@gmail.com> wrote:
> If any thread can modify the list, then all threads accessing
> it must use some sort of lock. Unsynchronized access to a
> container (or any other object) is only allowed if no thread is
> modifying it.
I agree. The marshalling code doesn't know if a lock is needed
or not, but if it is, it's the caller's responsibility to get a
lock on the container before calling a marshalling function.
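As a rough illustration of that division of responsibility, here is a sketch using std::mutex and std::lock_guard (C++11, so newer than the compilers of this thread's era). SharedList, Append, MarshalList and MarshalLocked are hypothetical names, and the "marshalling" just copies into a vector to keep the example short.

```cpp
#include <list>
#include <mutex>
#include <vector>

// Hypothetical shared state: the list and the mutex that guards it.
struct SharedList
{
    std::mutex mtx;
    std::list<int> values;
};

// The marshalling routine itself takes no lock; it assumes the list
// does not change while it runs.
inline void MarshalList(const std::list<int>& src, std::vector<int>& out)
{
    out.push_back(static_cast<int>(src.size()));    // element count first
    out.insert(out.end(), src.begin(), src.end());  // then the elements
}

// Writers lock before modifying the list.
inline void Append(SharedList& shared, int v)
{
    std::lock_guard<std::mutex> guard(shared.mtx);
    shared.values.push_back(v);
}

// The caller of the marshalling routine holds the same lock for the
// whole call, so size() and the elements stay consistent.
inline std::vector<int> MarshalLocked(SharedList& shared)
{
    std::lock_guard<std::mutex> guard(shared.mtx);
    std::vector<int> out;
    MarshalList(shared.values, out);
    return out;
}
```

Single-threaded callers can call MarshalList directly and pay nothing for locking.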
Below is the output I get (this isn't available online yet)
when the input is:
Msgs
(list<int>, deque<string>) @MSGID_1
}
I snipped the Receive function as it isn't ready yet.
I'm considering moving the code that calculates the
total message size into a separate function called
CalculateMarshallingSize; it might be useful by itself.
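A sketch of what such a stand-alone function might look like for a (list<int>, deque<string>) message. The signature, the AddBounded helper, and the layout (an int-sized count before each container and an int-sized length before each string) are assumptions for illustration; this is not the generated code and does not use the Counter class.

```cpp
#include <cstddef>
#include <deque>
#include <list>
#include <string>

// Adds n to total unless that would exceed limit (total <= limit on entry).
inline bool AddBounded(std::size_t& total, std::size_t n, std::size_t limit)
{
    if (n > limit - total) return false;
    total += n;
    return true;
}

// Hypothetical size calculation: count + ints, then count +
// (length, bytes) per string. Returns false if maxLen would be exceeded.
inline bool CalculateMarshallingSize(const std::list<int>& a,
                                     const std::deque<std::string>& b,
                                     std::size_t maxLen,
                                     std::size_t& result)
{
    std::size_t total = 0;
    if (!AddBounded(total, sizeof(int), maxLen)) return false;

    // Divide instead of multiplying so the check itself cannot overflow.
    if (a.size() > (maxLen - total) / sizeof(int)) return false;
    total += a.size() * sizeof(int);

    if (!AddBounded(total, sizeof(int), maxLen)) return false;
    for (std::deque<std::string>::const_iterator it = b.begin();
         it != b.end(); ++it) {
        if (!AddBounded(total, sizeof(int), maxLen)) return false;
        if (!AddBounded(total, it->length(), maxLen)) return false;
    }
    result = total;
    return true;
}
```

Because it touches no buffer, a function like this can reject an oversized message before anything has been copied.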
These files are included below
http://home.seventy7.com/misc/Buffer.hh
http://home.seventy7.com/misc/Counter.hh
http://home.seventy7.com/misc/ErrorWordsShepherd.hh
Also, I assume that these are defined elsewhere.
const unsigned int MSGID_1 = 4000;
const unsigned int MAX_MSGLENGTH = 100000;
The first argument to SetErrorWords sometimes reflects the
marshalling argument, but at other times it is useless.
// computer-generated output
#include <deque>
#include <list>
#include <string>
#include <Counter.hh>
#include <Buffer.hh>
struct Msgs
{
inline
Msgs() {}
inline
~Msgs() {}
inline
int
Send(Buffer* buf, const list<int>& about1, const deque<string>& about2)
{
  unsigned int headCount = 0;
  unsigned int slen = 0;
  if (!buf->Receive(&MSGID_1, sizeof(MSGID_1))) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }
  // Determine total length of the message.
  Counter cntr(MAX_MSGLENGTH);
  if (!cntr.Add(sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }
  if (!cntr.MultiplyAndAdd(about1.size(), sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }
  if (!cntr.Add(sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  if (!cntr.MultiplyAndAdd(about2.size(), sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  deque<string >::const_iterator mediator1 = about2.begin();
  deque<string >::const_iterator omega1 = about2.end();
  for (; mediator1 != omega1; ++mediator1) {
    if (!cntr.Add((*mediator1).length())) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
  }
  if (!buf->Receive(&cntr.value_, sizeof(cntr.value_))) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }
  headCount = about1.size();
  if (!buf->Receive(&headCount, sizeof(int))) {
    buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
    return 0;
  }
  list<int >::const_iterator mediator2 = about1.begin();
  list<int >::const_iterator omega2 = about1.end();
  for (; mediator2 != omega2; ++mediator2) {
    if (!buf->Receive(&(*mediator2), sizeof(int))) {
      buf->ews_.SetErrorWords(1, __FILE__, __LINE__);
      return 0;
    }
  }
  headCount = about2.size();
  if (!buf->Receive(&headCount, sizeof(int))) {
    buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
    return 0;
  }
  deque<string >::const_iterator mediator3 = about2.begin();
  deque<string >::const_iterator omega3 = about2.end();
  for (; mediator3 != omega3; ++mediator3) {
    slen = (*mediator3).length();
    if (!buf->Receive(&slen, sizeof(slen))) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
    if (!buf->Receive((*mediator3).c_str(), slen)) {
      buf->ews_.SetErrorWords(2, __FILE__, __LINE__);
      return 0;
    }
  }
  if (!buf->SendStoredData()) {
    buf->ews_.SetErrorWords(3, __FILE__, __LINE__);
    return 0;
  }
  return 1;
}
// end of computer-generated output
I've compiled it but that's about it.
Brian Wood
Ebenezer Enterprises
.
|
|
|
| User: "" |
|
| Title: Re: Container functionality and marshalling |
03 Feb 2008 03:28:58 PM |
|
|
Whoops. It looks like I accidentally caught the }; that corresponds to
struct Msgs
{
when I snipped the Receive function.
Brian Wood
.
|
|
|
|
|
|
|
|
|
|
