UTF-8 encoding problem




 
Topic: DEVELOP > c-Plus-Plus
User: ""
Date: 18 Oct 2006 07:14:25 AM
Object: UTF-8 encoding problem
Hi all,
I have a GUI that accepts a Unicode string and searches a given set of
XML files for that string.
I have two XML files, both saved as UTF-8 and containing characters
from different languages.
Both files start with a UTF-8 BOM, but only the first file also declares
UTF-8 in the XML declaration at the top of the file.
When I search that directory for a character from another language
using a third-party desktop search GUI, it reports that the character
exists in the first file (the one with the XML declaration) but not in
the second file (the one with only the BOM).
Initially I thought the problem was caused by UTF-8 supporting both
multibyte and Unicode, but I could not find much about it.
Please help.
Regards,
Shreshth
.

User: "Ron Natalie"

Title: Re: UTF-8 encoding problem 18 Oct 2006 07:35:43 AM
wrote:


Initially I thought the problem was caused by UTF-8 supporting both
multibyte and Unicode, but I could not find much about it.


What does this have to do with C++ at all?
UTF-8 is a multibyte encoding of Unicode (which effectively
is a 32-bit character space), but I doubt that's your problem.
Your problem is that your document doesn't conform to the document
rules that the search program is using.
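
To make "multibyte encoding" concrete, here is a minimal sketch of how a single Unicode code point turns into one to four UTF-8 bytes. The function name encode_utf8 is made up for illustration, and returning an empty string for invalid input is just one possible convention, not anything the thread or a particular library prescribes.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Hypothetical helper: returns an empty string for surrogates
// (U+D800..U+DFFF) and for values above U+10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not encodable
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+00E9 comes out as the two bytes C3 A9 and U+20AC as the three bytes E2 82 AC, while anything in the ASCII range stays a single byte.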
.
User: ""

Title: Re: UTF-8 encoding problem 18 Oct 2006 07:49:45 AM
I know this has nothing to do with C++ in particular but where better
to ask such a question.
Anyway,

your problem is your document isn't conforming with the document
rules that the search program is using.

I don't understand what you mean by this.
Of course I cannot do anything about the search program (which is
certainly using Unicode).
But the question is: if both files are in UTF-8 format, why does the
search program work only for the one that also has UTF-8 in its XML
declaration?
Does the declaration really make a difference here?
Thanks for your reply.
Shreshth
.
User: "Peter Jansson"

Title: Re: UTF-8 encoding problem 18 Oct 2006 11:28:10 AM
wrote:

I know this has nothing to do with C++ in particular but where better
to ask such a question.

The statement above is the best I have seen here in a long time.
If you know your question has "nothing to do with C++ in particular",
then why do you ask in a newsgroup dedicated to the C++ language? That
is like asking for help with your car in a bicycle shop.
You will probably get a much better response if you ask in a forum
dedicated to your problem.
Sincerely,
Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
.

User: "Bhushan"

Title: Re: UTF-8 encoding problem 19 Oct 2006 03:05:20 PM
Check your third-party search tool's documentation to see how it
searches XML files.
.


User: "loufoque"

Title: Re: UTF-8 encoding problem 18 Oct 2006 08:39:52 AM
Ron Natalie wrote:

Unicode (which effectively
is a 32-bit character space)

Unicode only reserves 2^20 + 2^16 code points (U+0000 through U+10FFFF).
21 bits are enough to store any of them.
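
The arithmetic can be checked at compile time. This is only a sketch: the names kMaxCodePoint, kCodeSpaceSize and bits_needed are invented for the example, and the loop inside a constexpr function assumes C++14 or later.

```cpp
#include <cstdint>

// Highest code point in the Unicode code space.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Number of code points: U+0000 through U+10FFFF inclusive.
constexpr std::uint32_t kCodeSpaceSize = kMaxCodePoint + 1;

// Smallest number of bits that can represent every value in [0, max_value].
// Hypothetical helper; needs C++14 for the loop in a constexpr function.
constexpr unsigned bits_needed(std::uint32_t max_value) {
    unsigned bits = 0;
    while (max_value != 0) {
        ++bits;
        max_value >>= 1;
    }
    return bits;
}

static_assert(kCodeSpaceSize == (1u << 20) + (1u << 16), "2^20 + 2^16 code points");
static_assert(kCodeSpaceSize == 17u * 65536u, "17 planes of 65,536 code points");
static_assert(bits_needed(kMaxCodePoint) == 21, "21 bits cover U+10FFFF");
```

So a 32-bit integer is a convenient container for a code point, but the code space itself needs no more than 21 bits.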
.


User: "loufoque"

Title: Re: UTF-8 encoding problem 18 Oct 2006 08:37:53 AM
wrote:

Both files start with a UTF-8 BOM, but only the first file also declares
UTF-8 in the XML declaration at the top of the file.

BOMs are quite useless for UTF-8. They are purely optional.
And according to the XML spec (AFAIK), the default encoding when no
encoding is declared is UTF-8.
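
As a rough illustration of those two points, the sketch below checks a buffer for the UTF-8 BOM (the bytes EF BB BF) and falls back to UTF-8 when the XML declaration names no encoding. Both function names are invented for this example, the declaration parsing is deliberately naive, and it only handles input that is already ASCII-compatible (it does not attempt UTF-16 detection). It says nothing about what the search tool in question actually does.

```cpp
#include <cstddef>
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Naive guess at the encoding of an XML document held in memory.
// Hypothetical helper: returns the name given in the declaration's
// encoding="..." pseudo-attribute if present, otherwise "UTF-8".
std::string guess_xml_encoding(const std::string& bytes) {
    const std::size_t start = has_utf8_bom(bytes) ? 3 : 0;  // skip the BOM
    if (bytes.compare(start, 5, "<?xml") != 0) return "UTF-8";  // no declaration
    const std::size_t end = bytes.find("?>", start);
    if (end == std::string::npos) return "UTF-8";
    const std::string decl = bytes.substr(start, end - start);
    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) return "UTF-8";  // declaration without encoding
    const std::size_t open = decl.find_first_of("\"'", key);
    if (open == std::string::npos) return "UTF-8";
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) return "UTF-8";
    return decl.substr(open + 1, close - open - 1);
}
```

By this logic a file with only a BOM and a file with an explicit UTF-8 declaration both come out as UTF-8, so a tool that finds text in one but not the other is presumably taking its cue from the declaration alone.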

When I search that directory for a character from another language
using a third-party desktop search GUI, it reports that the character
exists in the first file (the one with the XML declaration) but not in
the second file (the one with only the BOM).

OK, so you have a problem with your broken third-party application.
How is that related to C++?

Initially I thought the problem was caused by UTF-8 supporting both
multibyte and Unicode, but I could not find much about it.

Like most of your message, what you say just doesn't make much sense.

Please help.

Getting a basic understanding of what Unicode and its encoding formats
are would surely help.
.

