unicode string to C string?



Topic: DEVELOP > c-Plus-Plus
User: "Bint"
Date: 10 Jan 2008 08:28:26 PM
Subject: unicode string to C string?
Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?
Thanks
B
.

User: "Sherman Pendley"

Title: Re: unicode string to C string? 11 Jan 2008 12:24:05 PM
"Bint" <bint@csgs.com> writes:

Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

A very popular Unicode library is IBM's ICU:
<http://www.icu-project.org/>
sherm--
--
My blog: http://shermspace.blogspot.com
Cocoa programming in Perl: http://camelbones.sourceforge.net
.

User: "cr88192"

Title: Re: unicode string to C string? 10 Jan 2008 11:11:26 PM
"Bint" <bint@csgs.com> wrote in message
news:13odl3cq69fpsc6@corp.supernews.com...

Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I may also mention that, depending on one's needs, UTF-8 may be worth
looking into.
Reason:
UTF-8, for plain ASCII characters, is exactly the same as the ASCII version;
it can also preserve the full Unicode range by encoding non-ASCII characters
as multiple bytes, and so, very often, we can work with UTF-8 data in much
the same way as with plain ASCII.
Also, unlike the wider Unicode encodings (in particular, UTF-16), in most
cases we can use ASCII and UTF-8 strings interchangeably.
The main cost, however, is that with most common compilers char is signed by
default, so the bytes of non-ASCII characters show up as negative char values
unless we cast to 'unsigned char' in the cases where we actually care. This
is a pretty minor issue most of the time.
Just my opinion mostly...
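
To make the point about UTF-8 concrete, here is a minimal sketch of how a code point becomes one to four bytes. The helper names encode_utf8 and is_ascii_byte are made up for this illustration, and the encoder assumes a valid code point (it does not reject surrogates or values above U+10FFFF). An ASCII character comes out as the same single byte, and the cast to unsigned char is what keeps the ASCII test correct where char is signed.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value; no validation is done here.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // Plain ASCII: one byte, identical to the ASCII encoding.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: true if the byte is plain ASCII.
// Without the cast, a signed char holding 0xC3 would compare as negative.
bool is_ascii_byte(char c) {
    return static_cast<unsigned char>(c) < 0x80;
}
```

For example, encode_utf8(0x41) yields the single byte 'A', while encode_utf8(0xE9) (e with acute accent) yields the two bytes 0xC3 0xA9, neither of which passes is_ascii_byte.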

.

User: "Alf P. Steinbach"

Title: Re: unicode string to C string? 10 Jan 2008 08:42:53 PM
* Bint:

Hello, is Unicode part of C/C++?

No, unfortunately not, if you're talking /support/ for Unicode, except
that characters can be denoted via their Unicode character codes.

I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

Yes.
You will need to decide on what to do about non-ASCII characters.
However, in practice, if the Unicode data is in the Basic Multilingual
Plane (original 16-bit Unicode), then all you need to do technically is
to check that the most significant byte is zero, then retain only the least
significant byte, because ASCII is a subset of Unicode. And in
practice, I think you can do that via narrow() (a member of the
std::ctype<wchar_t> facet) in the standard library.
Read documentation. :-)
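
As a rough sketch of the narrow() route: the standard std::ctype<wchar_t> facet has a range overload of narrow() that writes a fallback character wherever a wide character has no narrow equivalent. The wrapper name narrow_with_default and the choice of the classic "C" locale are assumptions made for this example; which non-ASCII characters get the fallback depends on the locale and the implementation.

```cpp
#include <locale>
#include <string>

// Hypothetical wrapper around std::ctype<wchar_t>::narrow().
// Uses the classic "C" locale; characters it cannot narrow become `fallback`.
std::string narrow_with_default(const std::wstring& in, char fallback = '?') {
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(std::locale::classic());
    std::string out(in.size(), '\0');
    if (!in.empty()) {
        // Range overload: narrows [begin, end) into the output buffer.
        facet.narrow(in.data(), in.data() + in.size(), fallback, &out[0]);
    }
    return out;
}
```

Calling narrow_with_default(L"Hello") gives back "Hello" as a narrow string.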
Cheers, & hth.,
- Alf
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
.
User: "Alf P. Steinbach"

Title: Re: unicode string to C string? 10 Jan 2008 08:47:25 PM
Forgot to mention: if you don't use narrow() but DIY, then you also need to
check that at most the lowest 7 bits of the least significant byte are set,
i.e., in practice (and I assumed 8-bit bytes above), that the most
significant bit of the least significant byte is zero. Which with signed
char is in practice the same as checking that it's non-negative.
That's assuming you really want ASCII.
If Latin-1 is acceptable, you don't need to check the most significant bit.
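
The DIY check described here could look something like the following sketch. It assumes the input is a sequence of 16-bit code units held in a C++11 std::u16string; the function name utf16_to_narrow and its max_unit parameter are invented for the example. Comparing a unit against 0x7F covers both checks at once (high byte zero, top bit of the low byte zero), and raising the limit to 0xFF gives the Latin-1 variant.

```cpp
#include <string>

// Hypothetical helper: copy 16-bit code units into a narrow string.
// Returns false (leaving `out` partially filled) at the first unit above
// max_unit. Use 0x7F for strict ASCII, 0xFF if Latin-1 is acceptable.
bool utf16_to_narrow(const std::u16string& in, std::string& out,
                     char16_t max_unit = 0x7F) {
    out.clear();
    for (char16_t unit : in) {
        if (unit > max_unit) {
            return false;  // caller decides what to do about this character
        }
        out += static_cast<char>(unit);  // keep only the low byte
    }
    return true;
}
```

With the default limit, u"abc" converts cleanly, while a string containing U+00E9 is rejected unless the limit is raised to 0xFF.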
Cheers, & hth.,
- Alf
.
User: "Bint"

Title: Re: unicode string to C string? 10 Jan 2008 09:06:08 PM
thanks!
.



User: "Rui Maciel"

Title: Re: unicode string to C string? 11 Jan 2008 07:34:12 AM
Bint wrote:

Hello, is Unicode part of C/C++? I have some Unicode data, and I need to
get some kind of ASCII C string from it. Is that possible?

I don't believe it is, at least not explicitly. As far as I know, the wchar_t
type has been in C since C90, and the wchar.h library was added by the 1995
amendment (and carried into C99) in order to allow implementations to
provide some sort of Unicode support.
Regarding the Unicode to ASCII conversion, you should keep in mind that
ASCII is a subset of UTF-8. That means that if you convert information
encoded as UTF-8 to ASCII, you run the danger of losing information. So,
why not abandon ASCII and instead implement support for UTF-8? Some
operating systems have already done that.
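
To illustrate the information loss, here is one possible lossy conversion, sketched under the assumption that the input is well-formed UTF-8. The name utf8_to_ascii_lossy is hypothetical. Each non-ASCII character collapses to a single replacement character, so distinct inputs can produce the same output and the original cannot be recovered.

```cpp
#include <string>

// Hypothetical helper: replace every non-ASCII UTF-8 sequence with one
// `replacement` character. Assumes well-formed UTF-8; no validation is done.
std::string utf8_to_ascii_lossy(const std::string& in, char replacement = '?') {
    std::string out;
    for (char c : in) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out += c;            // ASCII byte: copied unchanged
        } else if ((b & 0xC0) != 0x80) {
            out += replacement;  // lead byte of a multi-byte sequence
        }
        // Continuation bytes (10xxxxxx) are skipped.
    }
    return out;
}
```

Both "caf\xC3\xA9" (ending in U+00E9) and "caf\xC3\xA8" (ending in U+00E8) come out as "caf?".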
Rui Maciel
.

