| Topic: |
DEVELOP > c-Plus-Plus |
| User: |
"" |
| Date: |
04 Jan 2008 11:59:54 AM |
| Object: |
Memory issues with Map |
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
char iEID[22+1];
char acEN[6+1];
int iDed;
};
map<string, struct strE> mEro;
struct strE eStr = {0,};
if ( 0 < vDataColumn.size() )
eStr.iDzReceived =
atol(vDataColumn.at(0).c_str());
//vDataColumn has columns from file which is "|" delimiter separated.
We split the line into columns and store it in vDataColumn
mEro.insert(std::pair<std::string,
struct strERO>(key, eStr)
);
I am seeing that for 6MB file the resident memory is around 12 MB. It
looks like either there is a memory leak or limitation in Map. I am
not sure what kind of memory mangement is used by Map. Could somebody
shed some light as what would be the best approach and dos and donts
when dealing with Maps
.
|
|
| User: "Phil Endecott" |
|
| Title: Re: Memory issues with Map |
04 Jan 2008 12:25:36 PM |
|
|
wrote:
I have written a program that loads a file having 3 columns into a
map.
I am seeing that for 6MB file the resident memory is around 12 MB.
Each string and each element in the map has some overhead, i.e. pointers
to the child nodes if the map uses a tree implementation internally (as
it probably does), and pointers to the start and the end of the data for
the string. How many elements are there in your 6MB file? I would
expect of the order of 20 bytes per element for your code.
Now, how to reduce the overhead.... that is the interesting question.
Phil.
.
|
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
04 Jan 2008 03:00:13 PM |
|
|
wrote:
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
char iEID[22+1];
char acEN[6+1];
int iDed;
};
map<string, struct strE> mEro;
struct strE eStr = {0,};
if ( 0 < vDataColumn.size() )
eStr.iDzReceived =
atol(vDataColumn.at(0).c_str());
//vDataColumn has columns from file which is "|" delimiter separated.
We split the line into columns and store it in vDataColumn
mEro.insert(std::pair<std::string,
struct strERO>(key, eStr)
);
I am seeing that for 6MB file the resident memory is around 12 MB. It
looks like either there is a memory leak or limitation in Map. I am
not sure what kind of memory mangement is used by Map. Could somebody
shed some light as what would be the best approach and dos and donts
when dealing with Maps
Aside from the inconstancies that Victor pointed out in another post,
there are a few things that might be useful to know.
How many entries in the 6MB file.
You say that your file has three columns. What are they? I ask because
your struct strE (or strERO?) has three member variables and the key of
the map would make four values per entry. So if your data has three
values per entry and you're storing per entry (key and three others)
would that account for the change in size?
What types are the three values in the data file? And for each of the
'string' type values what is the largest, smallest, and average size?
Is it possible that you'd be better off rewriting your struct with
std::string instead of char arrays? Or perhaps writing some very
lightweight string class if memory usage is a big concern?
Also, what is sizeof(int) on your system and how many bits is a char?
I think given the information you've provided so far, it's difficult to
give a useful answer.
LR
.
|
|
|
| User: "Victor Bazarov" |
|
| Title: Re: Memory issues with Map |
04 Jan 2008 03:17:18 PM |
|
|
LR wrote:
mohitanchlia@gmail.com wrote:
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
char iEID[22+1];
char acEN[6+1];
int iDed;
};
map<string, struct strE> mEro;
[..]
Is it possible that you'd be better off rewriting your struct with
std::string instead of char arrays? Or perhaps writing some very
lightweight string class if memory usage is a big concern?
Considering that the arrays are only 23 bytes and 7 bytes long, it
seems that using 'std::string' (which usually contains at least
a pointer and a size_t, making it 8 bytes, plus it allocates its
buffer somewhere on the heap requiring heap overhead) might not be
beneficial (aside from the convenience point of view). Lightweight
string classes also take memory...
Also, what is sizeof(int) on your system and how many bits is a char?
Wouldn't they be the same in the file?
I think given the information you've provided so far, it's difficult
to give a useful answer.
std::map carries some overhead. With elements that are small, the
overhead is going to be felt stronger than with elements that are
large. Paying for search speed and convenience with as much of the
memory as required to store just the data is not that awful, if you
ask me.
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
.
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
04 Jan 2008 04:52:11 PM |
|
|
Victor Bazarov wrote:
LR wrote:
mohitanchlia@gmail.com wrote:
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
char iEID[22+1];
char acEN[6+1];
int iDed;
};
map<string, struct strE> mEro;
[..]
Is it possible that you'd be better off rewriting your struct with
std::string instead of char arrays? Or perhaps writing some very
lightweight string class if memory usage is a big concern?
Considering that the arrays are only 23 bytes and 7 bytes long, it
seems that using 'std::string' (which usually contains at least
a pointer and a size_t, making it 8 bytes, plus it allocates its
buffer somewhere on the heap requiring heap overhead) might not be
beneficial (aside from the convenience point of view). Lightweight
string classes also take memory...
I guess it depends on how big the fields are in the file.
If they're all fixed length then we can save 1 byte for each char array
at the expense of having to write some class that implements a fixed
length string. Assuming that's important to us.
For the 23 char array, if the lengths vary all over the place, but tend
toward most of them being say, less than ten bytes with the largest
being 23 chars, then maybe a lightweight string class would be useful.
Maybe just a pointer to zero terminated strings.
Also, what is sizeof(int) on your system and how many bits is a char?
Wouldn't they be the same in the file?
The number of bits in a char? Yes, almost certainly. Don't know what I
was thinking there. But the OP seems to be storing the int as text in
the file. I suppose I should also have asked about these values as
well. Do these tend to have more than sizeof(int) digits? Or fewer?
I think given the information you've provided so far, it's difficult
to give a useful answer.
std::map carries some overhead. With elements that are small, the
overhead is going to be felt stronger than with elements that are
large.
Yes. But I think that we still don't really know what the keys are like
and what kind of data is being stored in that struct.
Paying for search speed and convenience with as much of the
memory as required to store just the data is not that awful, if you
ask me.
I tend to agree, it's generally not a big concern for me either, but I
don't know what limitations the OP has for memory usage or if it's
possible that he does in fact have a leak somewhere.
I think finding out how many entries he's storing in the map and the
size of the key strings would be useful.
LR
.
|
|
|
| User: "James Kanze" |
|
| Title: Re: Memory issues with Map |
05 Jan 2008 03:02:59 AM |
|
|
On Jan 4, 11:52 pm, LR <lr...@superlink.net> wrote:
Victor Bazarov wrote:
[...]
Also, what is sizeof(int) on your system and how many bits is a char?
Wouldn't they be the same in the file?
The number of bits in a char? Yes, almost certainly.
What makes you think that?
In practice, all general purpose machines today use 8 bit bytes,
and of course, 8 bits will always be 8 bits. Some more
specialized machines don't, however---there is a mainframe with
9 bit bytes, and a lot of DSP's use 16 or 32 bit bytes. Because
8 bit bytes are so widespread, however, such machines
doubtlessly have facilities for reading files with 8 bit bytes
(that were written on other machines), at least in so far as
they have facilities for reading files.
Depending on context, you may or may not have to take such
machines into consideration.
Don't know what I was thinking there. But the OP seems to be
storing the int as text in the file. I suppose I should also
have asked about these values as well. Do these tend to have
more than sizeof(int) digits? Or fewer?
When storing as text, don't forget the needed separators when
considering size.
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34
.
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
05 Jan 2008 08:37:55 AM |
|
|
James Kanze wrote:
On Jan 4, 11:52 pm, LR <lr...@superlink.net> wrote:
Victor Bazarov wrote:
[...]
Also, what is sizeof(int) on your system and how many bits is a char?
Wouldn't they be the same in the file?
The number of bits in a char? Yes, almost certainly.
What makes you think that?
Think that it's almost certain? Because as you say...
In practice, all general purpose machines today use 8 bit bytes,
and of course, 8 bits will always be 8 bits.
True for all general purpose machines today, yes. But I've worked on
system with some strange arrangements. 8 bit chars in 12 bits. 8 bit
characters as 6 bit chars with escape sequences. 8 bit chars packed in
60 bit words. 8 bit chars in 9 bits. 12 bit chars stored as 8 bit chars
with escape sequences.
Some more
specialized machines don't, however---there is a mainframe with
9 bit bytes, and a lot of DSP's use 16 or 32 bit bytes. Because
8 bit bytes are so widespread, however, such machines
doubtlessly have facilities for reading files with 8 bit bytes
(that were written on other machines), at least in so far as
they have facilities for reading files.
In my limited experience with mainframes, these systems may do 8 bit IO
for terminals, but the IO of 8 bit char files on these systems have
often been limited to rather complicated conversion programs.
Depending on context, you may or may not have to take such
machines into consideration.
Ok, I stand corrected again. I was right the first time. ;)
Don't know what I was thinking there. But the OP seems to be
storing the int as text in the file. I suppose I should also
have asked about these values as well. Do these tend to have
more than sizeof(int) digits? Or fewer?
When storing as text, don't forget the needed separators when
considering size.
We haven't heard from the OP, so I don't know for sure, but the OP said
that there were three columns in the file and these were delimited by '|'.
The char arrays in the struct the OP presented were almost certainly
zero terminated. I conclude this, because of the way the OP defined the
arrays [22+1] and [6+1], which seems to be a popular idiom for
indicating the size of the data and adds one for the termination character.
If we assume (Danger!) that two of the fields in the file are char, and
the third is a character representation of an int, then the two
delimiters are accounted for already since we (likely) have two zero
terminated char arrays.
Although, I suppose it's possible that the file contains something like:
|5|hello|world|
instead of
5|hello|world
LR
.
|
|
|
|
|
| User: "" |
|
| Title: Re: Memory issues with Map |
07 Jan 2008 07:24:37 PM |
|
|
On Jan 4, 2:52=A0pm, LR <lr...@superlink.net> wrote:
Victor Bazarov wrote:
LR wrote:
mohitanch...@gmail.com wrote:
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
=A0 =A0char iEID[22+1];
=A0 =A0char acEN[6+1];
=A0 =A0int =A0iDed;
};
map<string, struct strE> =A0mEro;
[..]
Is it possible that you'd be better off rewriting your struct with
std::string instead of char arrays? =A0Or perhaps writing some very
lightweight string class ifmemoryusage is a big concern?
Considering that the arrays are only 23 bytes and 7 bytes long, it
seems that using 'std::string' (which usually contains at least
a pointer and a size_t, making it 8 bytes, plus it allocates its
buffer somewhere on the heap requiring heap overhead) might not be
beneficial (aside from the convenience point of view). =A0Lightweight
string classes also takememory...
I guess it depends on how big the fields are in the file.
If they're all fixed length then we can save 1 byte for each char array
at the expense of having to write some class that implements a fixed
length string. Assuming that's important to us.
For the 23 char array, if the lengths vary all over the place, but tend
toward most of them being say, less than ten bytes with the largest
being 23 chars, then maybe a lightweight string class would be useful.
Maybe just a pointer to zero terminated strings.
Also, what is sizeof(int) on your system and how many bits is a char?
Wouldn't they be the same in the file?
The number of bits in a char? =A0Yes, almost certainly. Don't know what I
was thinking there. =A0But the OP seems to be storing the int as text in
the file. =A0I suppose I should also have asked about these values as
well. =A0Do these tend to have more than sizeof(int) digits? Or fewer?
I think given the information you've provided so far, it's difficult
to give a useful answer.
std::mapcarries some overhead. =A0With elements that are small, the
overhead is going to be felt stronger than with elements that are
large. =A0
Yes. =A0But I think that we still don't really know what the keys are like=
and what kind of data is being stored in that struct.
=A0> Paying for search speed and convenience with as much of the
=A0>memoryas required to store just the data is not that awful, if you
=A0> ask me.
I tend to agree, it's generally not a big concern for me either, but I
don't know what limitations the OP has formemoryusage or if it's
possible that he does in fact have a leak somewhere.
I think finding out how many entries he's storing in themapand the
size of the key strings would be useful.
LR- Hide quoted text -
- Show quoted text -
The file has 3 columns. First 2 columns are the key. That file
probably has 200000 lines that is being stored in map.
.
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
07 Jan 2008 08:54:31 PM |
|
|
wrote:
The file has 3 columns. First 2 columns are the key. That file
probably has 200000 lines that is being stored in map.
Didn't you say that the map was something like,
std::map<std::string, strE>
Where strE was some sort of struct?
If the first two columns of your data file are the key, why aren't you
using something like, at least, a std::pair for the key, like,
std::map<std::pair<std::string,std::string>, strE>
or whatever the two types should be for the pair?
I think it's very difficult to figure out exactly what problem you're
having if you can't tell us a little bit more explicitly what data you
have and what you're doing with it.
But ok. You have 200000 entries. As previously pointed out in this
thread, the overhead for each entry might be roughly estimated as
follows, assuming that each of an int, size_t or a pointer is 4 bytes.
Three pointers per map entry plus a flag, call it 16 bytes.
One pointer and one size_t for the string you used as a key, 8 bytes.
34 bytes for struct strE.
Now I have to add a ridiculous estimate for the size of the keys. I'll
guess that it's 20 characters per key.
So that's around 78 bytes per entry, multiply by 200000 and you get a
number that's around 15M.
So I'd say that 12M isn't bad at all, assuming my assumptions are
correct. Maybe I'm off on the size of the keys. Or over a bit on the
number of bytes for each map entry. I also neglected to add in the,
probably 4 bytes, for each memory allocation. Probably two per entry.
I very much doubt that you have a memory leak. Or at least the amount
of memory you say this code uses at run time probably doesn't indicate that.
I think that as I already pointed if you really want to you might save
some space using a lightweight string class. There may be other things
you can do too.
If you're willing to give more specific information you will probably
get more specific advice.
The struct you showed us had three fields. Are they generated from some
other data? From the keys?
LR
.
|
|
|
| User: "" |
|
| Title: Re: Memory issues with Map |
08 Jan 2008 10:05:15 AM |
|
|
On Jan 7, 6:54=A0pm, LR <lr...@superlink.net> wrote:
mohitanch...@gmail.com wrote:
The file has 3 columns. First 2 columns are the key. That file
probably has 200000 lines that is being stored inmap.
Didn't you say that themapwas something like,
std::map<std::string, strE>
Where strE was some sort of struct?
If the first two columns of your data file are the key, why aren't you
using something like, at least, a std::pair for the key, like,
std::map<std::pair<std::string,std::string>, strE>
or whatever the two types should be for the pair?
I think it's very difficult to figure out exactly what problem you're
having if you can't tell us a little bit more explicitly what data you
have and what you're doing with it.
But ok. =A0You have 200000 entries. =A0As previously pointed out in this
thread, the overhead for each entry might be roughly estimated as
follows, assuming that each of an int, size_t or a pointer is 4 bytes.
Three pointers permapentry plus a flag, call it 16 bytes.
One pointer and one size_t for the string you used as a key, 8 bytes.
34 bytes for struct strE.
Now I have to add a ridiculous estimate for the size of the keys. =A0I'll
guess that it's 20 characters per key.
So that's around 78 bytes per entry, multiply by 200000 and you get a
number that's around 15M.
So I'd say that 12M isn't bad at all, assuming my assumptions are
correct. =A0Maybe I'm off on the size of the keys. =A0Or over a bit on the=
number of bytes for eachmapentry. =A0I also neglected to add in the,
probably 4 bytes, for eachmemoryallocation. Probably two per entry.
I very much doubt that you have amemoryleak. =A0Or at least the amount
ofmemoryyou say this code uses at run time probably doesn't indicate that.=
I think that as I already pointed if you really want to you might save
some space using a lightweight string class. There may be other things
you can do too.
If you're willing to give more specific information you will probably
get more specific advice.
The struct you showed us had three fields. =A0Are they generated from some=
other data? =A0From the keys?
LR
I think it's my mistake that I didn't give you complete information:
Here is how data looks like
1234567891234567891.1000|2
So the syntax of the file is "firstkey.secondkey|remaining fields"
strE is struct for the format that I just mentioned.
.
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
08 Jan 2008 11:33:56 AM |
|
|
wrote:
Here is how data looks like
1234567891234567891.1000|2
Could the firstKey ever be as long as 22 chars? Could the second key
ever be as long as 6?
So the syntax of the file is "firstkey.secondkey|remaining fields"
Remaining fields? You've only shown one. What could the others be? Is
there only ever one?
strE is struct for the format that I just mentioned.
I'm still struggling to understand. Do you mean that you essentially do
this:
Note it's just a snippet and it's not tested and it won't work well.
struct strE {
char iEID[22+1];
char acEN[6+1];
int iDed;
strE(const char *p1, const char *p2, const int i) {
strcpy(iEID,p1);
strcpy(acEN,p2);
iDed = i;
}
};
std::map<std::string, strE> m;
and then for each data record in your file
const std::string firstKey = extractFirstKeyFromRecord();
const std::string secondKey = extractSecondKeyFromRecord();
const int i = extractIntFromRecord();
m[firstKey+secondKey] = strE(firstKey.c_str(), secondKey.c_str(), i);
Is that what you're doing?
If that's it, then you can save plenty of memory at the price of a
little time and space overhead, by removing the two char fields from
your struct and creating a wrapper for the key from the two extracted
strings from the data. I estimate the data space overhead at something
like 24 more bytes per entry but you'd save 30 from the struct for a
savings of 6 bytes per entry. Around 1.2M. Use a lightweight string
class and it gets a little bit better. Or you can make a struct for the
key that uses char arrays to store the data and do even better.
But I'm still not sure what you're doing with the data that you read in,
or what data ends up in strE.
I'm missing something.
LR
.
|
|
|
| User: "" |
|
| Title: Re: Memory issues with Map |
08 Jan 2008 11:48:18 AM |
|
|
On Jan 8, 9:33=A0am, LR <lr...@superlink.net> wrote:
mohitanch...@gmail.com wrote:
Here is how data looks like
1234567891234567891.1000|2
Could the firstKey ever be as long as 22 chars? =A0Could the second key
ever be as long as 6?
So the syntax of the file is "firstkey.secondkey|remaining fields"
Remaining fields? =A0You've only shown one. =A0What could the others be? =
=A0Is
there only ever one?
strE is struct for the format that I just mentioned.
I'm still struggling to understand. =A0Do you mean that you essentially do=
this:
Note it's just a snippet and it's not tested and it won't work well.
struct strE {
=A0 =A0char iEID[22+1];
=A0 =A0char acEN[6+1];
=A0 =A0int =A0iDed;
=A0 =A0strE(const char *p1, const char *p2, const int i) {
=A0 =A0 =A0 strcpy(iEID,p1);
=A0 =A0 =A0 strcpy(acEN,p2);
=A0 =A0 =A0 iDed =3D i;
=A0 =A0}
};
std::map<std::string, strE> m;
and then for each data record in your file
const std::string firstKey =3D extractFirstKeyFromRecord();
const std::string secondKey =3D extractSecondKeyFromRecord();
const int i =3D extractIntFromRecord();
m[firstKey+secondKey] =3D strE(firstKey.c_str(), secondKey.c_str(), i);
Is that what you're doing?
If that's it, then you can save plenty ofmemoryat the price of a
little time and space overhead, by removing the two char fields from
your struct and creating a wrapper for the key from the two extracted
strings from the data. =A0I estimate the data space overhead at something
like 24 more bytes per entry but you'd save 30 from the struct for a
savings of 6 bytes per entry. =A0Around 1.2M. Use a lightweight string
class and it gets a little bit better. =A0Or you can make a struct for the=
key that uses char arrays to store the data and do even better.
But I'm still not sure what you're doing with the data that you read in,
or what data ends up in strE.
I'm missing something.
LR
For example here us what I am doing:
1. From File
1234567891234567891.1000|2
2. After substituting data
std::map< std::string, struct strE> mE;
struct strE eStr =3D {0,};
eStr.iDed =3D 2 // this is second field in the
file
mE.insert(std::pair<std::string,
struct
strERO>(1234567891234567891.1000, 2)
);
First 2 fields in the struct are not used, but I am it's being set to
0 by using {0,}.
You mentioned lightweight string, I am already using std::string, did
you mean some other class ?
.
|
|
|
| User: "LR" |
|
| Title: Re: Memory issues with Map |
08 Jan 2008 06:49:14 PM |
|
|
wrote:
On Jan 8, 9:33 am, LR <lr...@superlink.net> wrote:
For example here us what I am doing:
1. From File
1234567891234567891.1000|2
2. After substituting data
std::map< std::string, struct strE> mE;
struct strE eStr = {0,};
eStr.iDed = 2 // this is second field in the
file
mE.insert(std::pair<std::string,
struct
strERO>(1234567891234567891.1000, 2)
);
First 2 fields in the struct are not used, but I am it's being set to
0 by using {0,}.
If those two fields aren't being used I think you can easily save around
6MB. Just take those fields out of the struct or use a different struct
or just use an int.
If they're not being used, what are they doing in there?
You mentioned lightweight string, I am already using std::string, did
you mean some other class ?
You'd have to roll your own. But you'd likely save around 4 bytes per
entry minus the code overhead.
Start with this:
class LWS {
char *p_;
public:
LWS () : p_(0) {}
.... add more as needed....
};
Add a function to compare them
bool operator<(const LWS &s1, const LWS &s2);
LWS is not a good name for this class.
LR
.
|
|
|
|
|
|
|
|
|
|
|
|
| User: "Victor Bazarov" |
|
| Title: Re: Memory issues with Map |
04 Jan 2008 12:17:11 PM |
|
|
wrote:
I have written a program that loads a file having 3 columns into a
map. Here is the declaration:
struct strE
{
char iEID[22+1];
char acEN[6+1];
int iDed;
};
map<string, struct strE> mEro;
You may omit the keyword 'struct' here. It's not needed. 'strE'
is the name of the *type* (just like 'string'). We're not in C any
more.
struct strE eStr = {0,};
Same here, you may omit 'struct '.
if ( 0 < vDataColumn.size() )
eStr.iDzReceived =
There is no member 'iDzReceived' in 'strE'.
atol(vDataColumn.at(0).c_str());
//vDataColumn has columns from file which is "|" delimiter separated.
We split the line into columns and store it in vDataColumn
mEro.insert(std::pair<std::string,
struct strERO>(key, eStr)
What's 'strERO'?
);
I am seeing that for 6MB file the resident memory is around 12 MB. It
looks like either there is a memory leak or limitation in Map.
Memory leak can be discovered by using the tools that exist for that
particular purpose. Limitation in std::map? Looks a lot more like
"overhead" than "limitation" to me...
I am
not sure what kind of memory mangement is used by Map. Could somebody
shed some light as what would be the best approach and dos and donts
when dealing with Maps
Please find a copy of 'The C++ Standard Library' by Josuttis. It
contains so much information on memory allocation for standard
containers that the margins of this posting are too narrow to hold
it.
V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
.
|
|
|
|
| User: "Phil Endecott" |
|
| Title: Re: Memory issues with Map |
07 Jan 2008 06:47:13 AM |
|
|
Well the OP hasn't replied, but I think this is an interesting subject,
having had to solve similar problems myself in the past.
Here's a simpler case: say I have a std::map<int,int>, which I
initialise from a file and subsequently only read from. The memory
required will be substantially more than 2*N*sizeof(int); I'm not sure
exactly what the overhead of a typical map implementation is, but I
guess that each tree node has pointers to two child nodes, a parent
node, and the data; if sizeof(T*)==sizeof(int), then that needs
6*N*sizeof(int) plus any allocator overhead.
So here's one way to improve that: use a std::vector< std::pair<int,int>
. Read them in and then sort the array, and then use a binary search
to do lookups. That should have the same complexity as using a map, and
needs only 2*N*sizeof(int) storage.
But it becomes more interesting if you want to be able to modify the
data after initially loading it. With the vector, insert will be O(N)
rather than O(log N). So is there a solution that has O(log N) insert
and lookup complexity and ~ 2*N*sizeof(int) storage? For example, can
you group together M of the pair<int,int>s, and store those groups in a
map? The overhead of the map is then amortised over the M elements.
Example (just the flavour, lots of details ommitted - feel free to
finish it if you like!):
template <typename KEY, typename VALUE>
class dense_map {
typedef std::vector<VALUE> vec_t;
typedef std::map<KEY,vec_t> impl_t;
impl_t impl;
public:
VALUE& operator[](KEY k) {
impl_t::iterator i = impl.upper_bound(k); --i;
vec_t& v = i->second;
vec_t::iterator j = std::find(v.begin(),v.end(),k);
return *j;
}
void insert(std::pair<KEY,VALUE> p) {
KEY& k = p.first;
impl_t::iterator i = impl.upper_bound(k); --i;
vec_t& v = i->second;
if (v.size()>max_size_per_chunk) {
// split v into two chunks, and re-start the insert
} else {
v.push_back(p);
}
}
};
Question: can something like this be implemented that has the same
guarantees (complexity, iterator validity, etc) as a std::map? In
particular, I think there's an unavoidable conflict between the split
(and join, in erase) operations and iterator / reference validity. So
it can't be used as a drop-in replacement for a std::map. Any thoughts?
Cheers, Phil.
.
|
|
|
| User: "Jerry Coffin" |
|
| Title: Re: Memory issues with Map |
11 Jan 2008 11:40:11 PM |
|
|
In article <lbpgj.23560$ov2.19952@newsfe5-win.ntli.net>,
spam_from_usenet_0606@chezphil.org says...
Question: can something like this be implemented that has the same
guarantees (complexity, iterator validity, etc) as a std::map? In
particular, I think there's an unavoidable conflict between the split
(and join, in erase) operations and iterator / reference validity. So
it can't be used as a drop-in replacement for a std::map. Any thoughts?
I've never tried to work through it in detail, but I think it should be
possible to create something with implicit pointers to child nodes (a
bit like a heap) to come close. For example, the root item would be at x
[0]. Its children would be at x[1] and x[2]. The children of x[1] would
be at x[3] and x[4], and the children of x[2] at x[5] and x[6], and so
on.
From there it's a matter of math to provide addressing that works
correctly. The tree is exponential -- i.e. the root of the tree has only
one item. The next level has two items, the next four items, and so on.
IOW, the root of the tree has 2^0 items. The next level of the tree has
2^1 items. The next level has 2^2 items, and so on.
That implies that the addressing is also exponential. Level N of the
tree will follow immediately after level N-1, and the data for all the
preceding levels will contain 2^N-1 nodes, so at level N, the nodes
start at x[2^N]. Nicely enough, given a node x[N], we can easily work
the math in the other direction to find x[N]'s parent. IOW, the tree is
also automatically threaded so it's easy to traverse the tree without
recursion.
My immediate guess is that attempting to maintain balance (like a
red/black or AVL tree) probably won't work out for this structure.
Specifically, we can't just shift an entire sub-tree by manipulating
pointers -- such a thing would require moving all the sub-tree's data,
which probably makes it impractical. Leaving it as a plain binary tree
means (among other things) that all insertions and deletions take place
at the leaf nodes -- which will generally be toward the end of the
vector, exactly where we'd like.
The last thing we need is something equivalent to a null pointer to
indicate which nodes have no children. Since we don't have any actual
pointers, however, it's probably more sensible to use a bitmap to
indicate which nodes are valid. This would use the same addressing as
the nodes themselves, except (of course) addressing bits instead of
nodes.
--
Later,
Jerry.
The universe is a figment of its own imagination.
.
|
|
|
|
|

|
Related Articles |
|
|