| Topic: |
DEVELOP > c-Plus-Plus |
| User: |
"" |
| Date: |
18 Oct 2006 07:14:25 AM |
| Object: |
UTF-8 encoding problem |
Hi All,
I have a GUI which accepts a Unicode string and searches a given
set of XML files for that string.
Now, I have two XML files, both saved in UTF-8 format and containing
characters from different languages.
Both of them start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.
Now, when I search for a character from another language in that
directory using a third-party GUI for desktop search, it shows that the
character exists in the first file (the one that also has the XML
declaration), but not in the second file (which has only the BOM).
Initially I thought the problem was mainly because UTF-8 supports
both multibyte and Unicode, but I could not find much on it.
Please help.
Regards,
Shreshth
.
|
|
| User: "Ron Natalie" |
|
| Title: Re: UTF-8 encoding problem |
18 Oct 2006 07:35:43 AM |
|
|
wrote:
Initially I thought the problem was mainly because UTF-8 supports
both multibyte and Unicode, but I could not find much on it.
What does this have to do with C++ at all?
UTF-8 is a multibyte encoding of Unicode (which is effectively
a 32-bit character space), but I doubt that's your problem.
Your problem is your document isn't conforming with the document
rules that the search program is using.
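To make "multibyte encoding" concrete, here is a minimal sketch of how one Unicode code point becomes one to four UTF-8 bytes. The function name `encode_utf8` is made up for illustration and does not come from any library; it returns an empty string for values that are not valid Unicode scalar values.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Hypothetical helper: returns an empty string for surrogates
// (U+D800..U+DFFF) and for values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx: plain ASCII, one byte
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate, invalid
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

ASCII characters keep their single-byte form, so the bytes of a search term and of a file differ only for non-ASCII characters. That is exactly where a wrong encoding guess by a tool would show up.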
.
|
|
|
| User: "" |
|
| Title: Re: UTF-8 encoding problem |
18 Oct 2006 07:49:45 AM |
|
|
I know this has nothing to do with C++ in particular, but where better
to ask such a question?
Anyway,
your problem is your document isn't conforming with the document
rules that the search program is using.
I do not understand what you mean by this.
Of course I cannot do anything about the search program (which is
certainly using Unicode).
But the question is: if both files are in UTF-8 format, why does the
search program work only for the one that also has UTF-8 in its XML
declaration?
Does it really make any difference in this regard?
Thanks for your reply.
Shreshth
.
|
|
|
| User: "Peter Jansson" |
|
| Title: Re: UTF-8 encoding problem |
18 Oct 2006 11:28:10 AM |
|
|
wrote:
I know this has nothing to do with C++ in particular, but where better
to ask such a question?
The statement above is the best I have seen in a long time here.
If you know your question has "nothing to do with C++ in particular",
then why do you ask in a newsgroup dedicated to the C++ language? That
is like asking for help with your car in a bicycle shop.
You will probably get a much better response if you ask in a forum
dedicated to your problem.
Sincerely,
Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
.
|
|
|
|
| User: "Bhushan" |
|
| Title: Re: UTF-8 encoding problem |
19 Oct 2006 03:05:20 PM |
|
|
Check your 3rd party search tool documentation about how it searches
XML files.
.
|
|
|
|
|
| User: "loufoque" |
|
| Title: Re: UTF-8 encoding problem |
18 Oct 2006 08:39:52 AM |
|
|
Ron Natalie wrote:
Unicode (which is effectively
a 32-bit character space)
Unicode only reserves 2^20 + 2^16 code points (U+0000 to U+10FFFF).
21 bits is enough to store that.
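The arithmetic can be checked at compile time. This is a small sketch assuming C++14 or later; the constant names and `bits_needed` are invented for illustration. It relies on the fact that the Unicode code space consists of 17 planes of 2^16 code points each.

```cpp
#include <cstdint>

// The Unicode code space is 17 planes of 2^16 code points each,
// running from U+0000 to U+10FFFF. Names below are illustrative only.
constexpr std::uint32_t kPlaneSize = 1u << 16;
constexpr std::uint32_t kPlaneCount = 17;
constexpr std::uint32_t kCodeSpaceSize = kPlaneCount * kPlaneSize;
constexpr std::uint32_t kMaxCodePoint = kCodeSpaceSize - 1;

// Smallest number of bits able to represent `value`.
constexpr int bits_needed(std::uint32_t value) {
    int bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

static_assert(kCodeSpaceSize == (1u << 20) + (1u << 16),
              "17 planes give 2^20 + 2^16 code points");
static_assert(kMaxCodePoint == 0x10FFFF,
              "highest code point is U+10FFFF");
static_assert(bits_needed(kMaxCodePoint) == 21,
              "20 bits are too few, 21 bits suffice");
```

So a 32-bit type is a convenient container for a code point, but only the low 21 bits are ever used.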
.
|
|
|
|
|
| User: "loufoque" |
|
| Title: Re: UTF-8 encoding problem |
18 Oct 2006 08:37:53 AM |
|
|
wrote:
Both of them start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.
BOMs are quite useless for UTF-8. They are entirely optional.
And according to the XML spec (AFAIK), the default encoding when no
encoding is declared is UTF-8.
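For anyone who wants to check their own files, here is a rough sketch of the two signals being discussed: the UTF-8 BOM (bytes EF BB BF) and the `encoding` pseudo-attribute in the XML declaration. All function names are hypothetical. The parsing is deliberately simplified: it assumes an ASCII-compatible encoding, assumes no whitespace around `=`, and ignores UTF-16 detection entirely, so it is not a substitute for a real XML parser.

```cpp
#include <cstddef>
#include <string>

// True if `data` starts with the UTF-8 byte order mark (EF BB BF).
bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Value of the encoding pseudo-attribute in the XML declaration,
// or an empty string if there is no declaration or no encoding in it.
// Simplified: expects encoding="..." or encoding='...' with no spaces.
std::string declared_encoding(const std::string& data) {
    const std::size_t start = has_utf8_bom(data) ? 3 : 0;
    if (data.compare(start, 5, "<?xml") != 0) return "";
    const std::size_t end = data.find("?>", start);
    if (end == std::string::npos) return "";
    const std::string decl = data.substr(start, end - start);
    const std::size_t key = decl.find("encoding=");
    if (key == std::string::npos) return "";
    const std::size_t quote_pos = key + 9;  // length of "encoding="
    if (quote_pos >= decl.size()) return "";
    const char quote = decl[quote_pos];
    if (quote != '"' && quote != '\'') return "";
    const std::size_t close = decl.find(quote, quote_pos + 1);
    if (close == std::string::npos) return "";
    return decl.substr(quote_pos + 1, close - quote_pos - 1);
}

// Encoding a consumer would fall back to under the simplification above:
// the declared one if present, otherwise UTF-8.
std::string effective_encoding(const std::string& data) {
    const std::string declared = declared_encoding(data);
    return declared.empty() ? "UTF-8" : declared;
}
```

Under these assumptions, a file with a BOM but no `encoding` in its declaration still comes out as UTF-8, the same as a file that declares it. A tool that treats the two differently is presumably relying on the declaration alone.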
Now, when I search for a character from another language in that
directory using a third-party GUI for desktop search, it shows that the
character exists in the first file (the one that also has the XML
declaration), but not in the second file (which has only the BOM).
OK, so you have a problem with your broken third-party application.
How is that related to C++?
Initially I thought the problem was mainly because UTF-8 supports
both multibyte and Unicode, but I could not find much on it.
Like most of your message, what you say just doesn't make much sense.
Please help.
Getting a basic understanding of what Unicode and its encoding formats
are would surely help.
.
|
|
|
|
