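Chinese Encoding in ZKUI, and Running ZKUI in Docker
Chinese encoding in ZKUI
The problem
Last week a colleague reported that when a node value containing Chinese is uploaded through ZKUI, the Chinese text is not displayed correctly. It turned out that the value had been stored in NCR form, short for Numeric Character Reference.
What is NCR?
Here is the description from Wikipedia.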
A numeric character reference (NCR) is a common markup construct used in SGML and SGML-derived markup languages such as HTML and XML. It consists of a short sequence of characters that, in turn, represents a single character. Since WebSgml, XML and HTML 4, the code points of the Universal Character Set (UCS) of Unicode are used. NCRs are typically used in order to represent characters that are not directly encodable in a particular document (for example, because they are international characters that don't fit in the 8-bit character set being used, or because they have special syntactic meaning in the language). When the document is interpreted by a markup-aware reader, each NCR is treated as if it were the character it represents.
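To make the definition concrete, here is a minimal sketch of converting between characters and NCRs. It is not taken from ZKUI; the class name NcrCodec and its methods are hypothetical, and it only handles numeric references (decimal such as &#20013; and hexadecimal such as &#x4E2D;), not named entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ZKUI: converts numeric character references.
final class NcrCodec {
    // Group 1: hexadecimal digits (&#x4E2D;), group 2: decimal digits (&#20013;).
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Replaces every well-formed NCR with the character it stands for.
    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave out-of-range references untouched
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    // Writes every non-ASCII code point as a decimal NCR.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("\u4e2d\u6587"));          // the two characters of "Chinese" as NCRs
        System.out.println(decode("&#20013;&#25991;"));      // back to the original characters
    }
}
```

A node value that shows up as a run of &#NNNNN; sequences is therefore still recoverable: the information is intact, it has only been escaped.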
Confirming whether ZKUI is at fault
ZooKeeper itself can store Chinese (see the excerpt from the ZooKeeper website below), so the problem almost certainly lies in ZKUI.
The ZooKeeper Data Model
ZooKeeper has a hierarchal name space, much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. Paths to nodes are always expressed as canonical, absolute, slash-separated paths; there are no relative reference. Any unicode character can be used in a path subject to the following constraints:
- The null character (\u0000) cannot be part of a path name. (This causes problems with the C binding.)
- The following characters can't be used because they don't display well, or render in confusing ways: \u0001 - \u0019 and \u007F - \u009F.
- The following characters are not allowed: \ud800 -uF8FFF, \uFFF0-uFFFF, \uXFFFE - \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.
- The "." character can be used as part of another name, but "." and ".." cannot alone be used to indicate a node along a path, because ZooKeeper doesn't use relative paths. The following would be invalid: "/a/b/./c" or "/a/b/../c".
- The token "zookeeper" is reserved.
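What is ZKUI built on?
ZKUI is an open-source Java tool, so the source code can be downloaded and inspected. Its web layer is implemented directly on HttpServlet, without a higher-level framework such as Spring MVC. Once you know it is plain HttpServlet, you can compare it with how Spring MVC handles Chinese text, and the fix for the NCR-encoded Chinese becomes fairly obvious.
Doesn't the project structure look a lot like a Spring MVC project?
Fixing the encoding of HttpServlet requests
Add a filter that sets UTF-8 on both the request and the response objects. With this in place, the single-node CRUD operations in ZKUI handle Chinese node values correctly.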
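import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebFilter(filterName = "filtercharset", urlPatterns = "/*")
public class CharsetFilter implements Filter {

    @Override
    public void init(FilterConfig fc) throws ServletException {
        // Do nothing
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain fc)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        // Must run before any parameter is read or any output is written.
        request.setCharacterEncoding(StandardCharsets.UTF_8.toString());
        response.setCharacterEncoding(StandardCharsets.UTF_8.toString());
        fc.doFilter(req, res);
    }

    @Override
    public void destroy() {
        // Do nothing
    }
}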
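Fixing the encoding of uploaded files
The filter above only fixes the encoding for single-node CRUD over ordinary GET and POST requests; file upload needs separate handling. Look at the source below. I removed the details and kept only the overall structure and the encoding-related code: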
public class Import extends HttpServlet {

    private final static Logger logger = LoggerFactory.getLogger(Import.class);

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        logger.debug("Importing Action!");
        try {
            // (declarations and multipart parsing omitted)
            Iterator iter = items.iterator();
            while (iter.hasNext()) {
                FileItem item = (FileItem) iter.next();
                if (item.isFormField()) {
                    if (item.getFieldName().equals("scmOverwrite")) {
                        scmOverwrite = item.getString();
                    }
                    if (item.getFieldName().equals("scmServer")) {
                        scmServer = item.getString();
                    }
                    if (item.getFieldName().equals("scmFilePath")) {
                        scmFilePath = item.getString();
                    }
                    if (item.getFieldName().equals("scmFileRevision")) {
                        scmFileRevision = item.getString();
                    }
                } else {
                    uploadFileName = item.getName();
                    // The original logic was item.getString(); I added an explicit encoding.
                    sbFile.append(item.getString(StandardCharsets.UTF_8.toString()));
                }
            }
            List<String> importFile = new ArrayList<>();
            // Read the node values ......
            ZooKeeperUtil.INSTANCE.importData(importFile, Boolean.valueOf(scmOverwrite), ServletUtil.INSTANCE.getZookeeper(request,
            // Handle the remaining details
            request.getSession().setAttribute("flashMsg", "Import Completed!");
            response.sendRedirect("/home");
        } catch (FileUploadException | IOException | InterruptedException | KeeperException ex) {
            logger.error(Arrays.toString(ex.getStackTrace()));
            ServletUtil.INSTANCE.renderError(request, response, ex.getMessage());
        }
    }
}
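Why the explicit charset matters can be shown without any servlet code. The sketch below assumes that a no-argument getString() falls back to a single-byte default such as ISO-8859-1 when the upload carries no charset (that is my understanding of Commons FileUpload, not something verified against ZKUI's exact version). The class name CharsetDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not part of ZKUI: same bytes, two different decodings.
final class CharsetDemo {

    // Decodes raw upload bytes with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, as they would arrive in a UTF-8 encoded file.
        byte[] uploaded = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // Assumed default behaviour: every byte becomes one Latin-1 character.
        String garbled = decode(uploaded, StandardCharsets.ISO_8859_1);
        // Explicit UTF-8: three bytes per character are combined correctly.
        String correct = decode(uploaded, StandardCharsets.UTF_8);

        System.out.println("bytes uploaded:  " + uploaded.length);
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("correct length:  " + correct.length());
    }
}
```

The wrong decoding does not throw; it silently produces a longer string of unrelated characters, which is why the problem only shows up later when the value is displayed.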
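The uploaded file also has to meet some format requirements:
- It must be UTF-8 without a BOM. If the file has a BOM, either strip the leading magic bytes by hand before processing, or adjust this regular expression: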
else if (!inputLine.matches("/.+=.+=.*")) {
    throw new IOException("Invalid format at line " + lineCnt + ": " + inputLine);
}
- If the file is not encoded as UTF-8, the imported content will be garbled.
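The BOM requirement follows from the regular expression: a UTF-8 BOM decodes to the single character U+FEFF, so the first line no longer starts with "/". A minimal sketch of stripping it before validation is shown below. The class ImportLineCheck and its methods are hypothetical and not ZKUI code; only the regular expression is taken from the source above.

```java
// Hypothetical helper, not part of ZKUI: BOM stripping plus the import line check.
final class ImportLineCheck {
    // The UTF-8 byte sequence EF BB BF decodes to this one character.
    private static final char BOM = '\uFEFF';

    // Removes a leading BOM if present; otherwise returns the text unchanged.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Same pattern as the ZKUI check quoted above: /path=key=value
    static boolean isValidLine(String line) {
        return line.matches("/.+=.+=.*");
    }

    public static void main(String[] args) {
        String firstLine = BOM + "/app/config=greeting=\u4f60\u597d";
        System.out.println("with BOM:    " + isValidLine(firstLine));
        System.out.println("BOM removed: " + isValidLine(stripBom(firstLine)));
    }
}
```

Stripping the BOM once on the whole file content, before it is split into lines, keeps the regular expression itself unchanged.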
Publishing ZKUI as a Docker image
Upload the compiled jar, config.cfg, and the project's Dockerfile to the Linux server.
A brief walkthrough of the Dockerfile:
FROM java:8
MAINTAINER jim <jiangmin168168@hotmail.com>
WORKDIR /var/app
ADD zkui-*.jar /var/app/zkui.jar
ADD config.cfg /var/app/config.cfg
ENTRYPOINT [ "java", "-jar", "/var/app/zkui.jar" ]
EXPOSE 9090
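- FROM: the base image that provides the environment the build depends on, for example java:8, ubuntu, or redis.
- MAINTAINER: contact information for the maintainer.
- WORKDIR: the working directory. When you open a shell in the container you land directly in this directory, e.g. root@zkui-host:/var/app#
- ADD: copies files into the image.
- ENTRYPOINT: the command run automatically when the container starts; here it launches our jar directly.
- EXPOSE: declares the port the container listens on. It does not publish the port by itself; when starting the container, use -p to map a host port to this container port.
Building the image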
docker build -t jiangmin168168/zkui .
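The jiangmin168168 above is my ID on Docker's website, which allows the image to be uploaded there. If the image is only used locally, the ID prefix can be left out. Even with the ID, build alone does not upload anything; you still have to log in to Docker and use the push command.
docker build is very slow
The official Docker registry is very slow from within China. My builds almost never succeeded: after a long wait they simply timed out. On a colleague's recommendation I switched to the Docker accelerator provided by DaoCloud, which is a registry mirror. Most of what would otherwise be downloaded from docker.io now comes from the domestic mirror, and the speed-up is dramatic.
Inspecting the built image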
docker images
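The image turns out to be quite large, probably because it pulls in the java:8 base image. Later I may look into bundling a JDK myself to see whether that reduces the size.
Starting the container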
docker run -dit --name zkui --hostname zkui-host -v /data:/data -p 9090:9090 zkui:latest
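A brief explanation of the parameters:
- -dit: -d runs the container in the background (detached); -i keeps STDIN open and -t allocates a pseudo-terminal, so you can attach to the container interactively later.
- --name: the name of the container.
- --hostname: the host name shown once you are logged in to the container.
- -v: maps a host directory to a container directory, so the container can access files on the host.
- -p: maps a host port to a container port. The container port is fixed, while the host port is chosen at start time, which makes it possible to run several container instances on one host.
- zkui:latest: the image name followed by the tag. If only the image name is given, the tag defaults to latest. This assumes the image was built locally under the name zkui; if it was built with the jiangmin168168/ prefix as in the build command above, use jiangmin168168/zkui here.
Open questions
Where in ZKUI is the NCR encoding introduced? I still need to track that down.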