UTF-7 transcoding

List:       imap
I tried the code you referenced (the exact program and compilation script
are in attachments), but it failed. The program takes input as modified
UTF-7, uses MailboxToURL routine to change it to UTF-8 and then uses the
URLtoMailbox routine to change it to UTF-7 again:

int main(int argc, char* argv[]){
  char out[OUTSIZE];
  char in[OUTSIZE]; 

  strcpy(in,argv[1]);
  printf("in:   %s\n",in);

  MailboxToURL(out,in);
  printf("out:  %s\n",out);

  URLtoMailbox(in,out);
  printf("in:   %s\n",in);  
}

As an input I gave it the following UTF-7 code:
a&AQUBBA-e&AFC-f
which is the code produced by Microsoft Outlook and contains a bunch of
Polish letters.

The output of the program is the following:
[tomcat@fatcat]$ ./utf7test 'a&AQUBBA-e&AFC-f'
in:   a&AQUBBA-e&AFC-f
out:  a%C4%85%C4%84e%50f
in:   a&AQUBBA-ePf

So, as you can see, the conversion is not 1:1 ;-)))) Strangely enough, if I
use the resulting output (a&AQUBBA-ePf) as the input to another iteration, it
starts behaving correctly ;-)))

Can you help me?
Marek.

P.S. I tried the code on Linux. There are a couple of strange assignments in
the code (like unsigned long variable = char variable), so I mention it just
in case this might be of some importance.

> -----Original Message-----
> From: Chris Newman [mailto:chris+imap@innosoft.com]
> Sent: Tuesday, July 24, 2001 8:43 PM
> To: Marek Kowal; imap@u.washington.edu
> Subject: Re: modified UTF7 to UTF8 conversion
> 
> 
> Try the code in:
>   <http://www.innosoft.com/rfc/rfc2192.html>
> 
> It's missing a security check for invalid UTF-8 characters on input, but
> is otherwise correct to my knowledge.  If it's broken, please email me the
> example which breaks it so I can fix the code.
> 
> 		- Chris
> 
> --On Monday, July 23, 2001 19:52 +0200 Marek Kowal <kowalm@onet.pl> wrote:
> 
> > Hi there,
> >
> > I am having a HARD time trying to convert modified UTF7 mailbox names to
> > UTF8 (which I then convert to ISO-8859-2 using the iconv library, BTW). I
> > tried the UTF7 to URL UTF8 code (which I found in the imap discussion list,
> > http://www.washington.edu/imap/listarch/1997/msg00800.html), but it does
> > not seem to work correctly - if I run it one-way and then back on some
> > string, sometimes I get different results - the resulting UTF7 code is
> > not the same.
> >
> > Anyway, can anybody point me to the proper conversion routines which can
> > transform between modified UTF7 and UTF8? It could be separate code, but
> > if anybody did it already as an iconv conversion table, that would be great.
> 


["compile" (application/octet-stream)]
["utf7test.c" (application/octet-stream)]

#include <stdio.h>
#include <string.h>
#include <iconv.h>
#define OUTSIZE 1024


/* hexadecimal lookup table */
static char hex[] = "0123456789ABCDEF";

/* URL unsafe printable characters */
static char urlunsafe[] = " \"#%&+:;<=>?@[\\]^`{|}";

/* UTF7 modified base64 alphabet */

static char base64chars[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,"; 
#define UNDEFINED 64

/* UTF16 definitions */
#define UTF16MASK       0x03FFUL
#define UTF16SHIFT      10
#define UTF16BASE       0x10000UL

#define UTF16HIGHSTART  0xD800UL
#define UTF16HIGHEND    0xDBFFUL
#define UTF16LOSTART    0xDC00UL
#define UTF16LOEND      0xDFFFUL

/* Convert an IMAP mailbox to a URL path
 *  dst needs to have roughly 4 times the storage space of src
 *    Hex encoding can triple the size of the input
 *    UTF-7 can be slightly denser than UTF-8
 *     (worst case: 8 octets UTF-7 becomes 9 octets UTF-8)
 */

void MailboxToURL(char *dst, char *src)
{
  unsigned char c, i, bitcount;
  unsigned long ucs4, utf16, bitbuf;
  unsigned char base64[256], utf8[6];

  /* initialize modified base64 decoding table */
  memset(base64, UNDEFINED, sizeof (base64));
  for (i = 0; i < sizeof (base64chars); ++i) {
    base64[base64chars[i]] = i;
  }

  /* loop until end of string */
  while (*src != '\0') {
    c = *src++;
    /* deal with literal characters and &- */
    if (c != '&' || *src == '-') {

[remainder of "utf7test.c" is truncated in this copy]

  
