Java - Emoji not encoding
I am retrieving Twitter tweets and attempting to save them to a flat file. I have the following code:
String jsonString = new Gson().toJson(tweets);
byte[] utf8JsonString = jsonString.getBytes("UTF-8");
String utf8Json = new String(utf8JsonString, "UTF-8");
System.out.println(utf8Json);
Output:
..."id":768260789744443392,"text":"#emojicity5 ?","source"...
The emoji (just after #emojicity5) is appearing as ?. I have attempted to encode using UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE, to no avail. The system is using JDK 1.6 and version 3.0.3 of Twitter4J. What am I missing here?
A String already contains Unicode, so there is no need to convert it to the same String again. Only when converting to or from byte[] does one need to indicate the encoding of the bytes.
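To illustrate why the encode/decode step in the question changes nothing, here is a minimal sketch. The class name RoundTripDemo and its helper methods are made up for this example, and the emoji U+1F600 is only an assumed stand-in, since the actual emoji in the tweet is not known.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original question or of Twitter4J.
class RoundTripDemo {
    // Charset.forName is used so the sketch also compiles on JDK 1.6.
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Encodes the String to UTF-8 bytes and decodes those bytes again.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(UTF_8);
        return new String(bytes, UTF_8);
    }

    // Number of bytes the String occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 written as a surrogate pair so the source file stays ASCII.
        String text = "#emojicity5 \uD83D\uDE00";
        // The round trip yields an equal String: nothing was converted.
        System.out.println(text.equals(roundTrip(text)));
        // The encoding only matters for the bytes: the emoji takes 4 of them.
        System.out.println(utf8Length("\uD83D\uDE00"));
    }
}
```

The round trip is a no-op because a String holds UTF-16 code units internally regardless of which charset was used to produce or consume the bytes.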
However, the problem is that the console may not use a Unicode encoding such as UTF-8, and might not have the emoji in its fonts. This is a problem of System.out.println. If System.out uses some other encoding that cannot represent the emoji, a question mark is printed instead.
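Since the goal is a flat file rather than the console, one way to sidestep the console encoding is to write the file with an explicit charset. The following is a rough sketch under assumptions: the class Utf8FileDemo, its methods, and the file name tweets.json are all hypothetical, and the old-style try/finally is used so that it would also compile on JDK 1.6.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question.
class Utf8FileDemo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Writes the text as UTF-8, independent of the platform default encoding.
    static void writeUtf8(File file, String text) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(file), UTF_8);
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    static String readUtf8(File file) throws IOException {
        Reader in = new InputStreamReader(new FileInputStream(file), UTF_8);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"text\":\"#emojicity5 \uD83D\uDE00\"}";
        writeUtf8(new File("tweets.json"), json); // hypothetical file name

        // System.out can also be wrapped to emit UTF-8 bytes, but whether the
        // emoji is displayed still depends on the terminal and its fonts.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(json);
    }
}
```

Opening the resulting file in an editor that understands UTF-8 is usually a more reliable check than looking at console output.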
What you can do to check whether the emoji arrived is dump the Unicode code points.
In Java 8:
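// Print every code point outside Latin-1 together with its Unicode name.
jsonString.codePoints()
    .filter(cp -> cp >= 256)
    .forEach(cp -> {
        System.out.printf("U+%X = %s%n", cp, Character.getName(cp));
    });

// True if any code point lies in the Emoticons block.
// UnicodeBlock.of may return null, so the constant is the equals receiver.
boolean containsEmoji(String s) {
    return s.codePoints().anyMatch(cp ->
        Character.UnicodeBlock.EMOTICONS.equals(Character.UnicodeBlock.of(cp)));
}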
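Because the question mentions JDK 1.6, where streams and lambdas are not available, the same check could be sketched with a plain loop. The class CodePointDump and the method nonLatin1CodePoints are names invented for this example, and U+1F600 is again only an assumed sample emoji.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class CodePointDump {
    // Collects every code point >= 256 as a "U+XXXX" label.
    static List<String> nonLatin1CodePoints(String s) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= 256) {
                result.add(String.format("U+%X", cp));
            }
            // Advance by 2 chars for a surrogate pair, otherwise by 1.
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // The output is pure ASCII, so it is readable on any console.
        System.out.println(nonLatin1CodePoints("#emojicity5 \uD83D\uDE00"));
    }
}
```

If the list comes back empty for a tweet that should contain an emoji, the character was most likely already lost before it reached the String, rather than at print time.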