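Thinking in Java (12): Strings (2)
The previous post covered basic String usage and compared StringBuilder with String. Let's continue.
To give you a sense of how formidable RednaxelaFX is, these are the books he read in college.
Yes, that is the level it takes to get to Silicon Valley, so let's keep reading.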
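1) Formatted output
Indeed, C's printf does not build its output with an overloaded + operator the way Java string concatenation does.
In printf("%d %f", x, y); the %d and %f are format specifiers. %d stands for an integer, so x is inserted at the position of %d; %f stands for a floating-point number, so y is inserted at the position of %f.
Java borrowed this from C: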

    public class TestString {
        public static void main(String[] args) {
            int x = 1;
            float y = 1.223f;
            System.out.printf("%d %f", x, y);
            System.out.println();
            System.out.format("%d %f", x, y);
        }
    }

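With Formatter you can control column spacing on the console precisely, without counting spaces yourself.

    import java.util.Formatter;

    public class TestString {
        public static void main(String[] args) {
            Formatter fm = new Formatter(System.out);
            fm.format("%-5s %5s %10s ", "Name", "Age", "School");
        }
    }

An expression such as % followed by a number and s works as in C: the number is the minimum field width, and a leading - left-justifies the value instead of right-justifying it.

    System.out.println(String.format("%h", 17));
    fm.format("%h", 17);

Both produce 11. Strictly speaking, %h formats the hash code of its argument in hexadecimal; the hash code of the Integer 17 is 17 itself, so the result looks like a plain hexadecimal conversion. To print an integer in hexadecimal directly, use %x.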
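The width and alignment flags described above can be sketched with String.format. This is only an illustration: the class name FormatDemo, the helper names and the sample data are made up here and do not come from the book. Locale.ROOT is passed so the digits do not depend on the machine's default locale.

```java
import java.util.Locale;

// Hypothetical demo class, not from the original post.
class FormatDemo {
    // Left-justify the name in 8 columns, right-justify the age in 5 columns.
    static String row(String name, int age) {
        return String.format(Locale.ROOT, "%-8s|%5d|", name, age);
    }

    // %x formats the integer value itself as hexadecimal.
    static String hex(int value) {
        return String.format(Locale.ROOT, "%x", value);
    }

    public static void main(String[] args) {
        System.out.println(row("Tom", 7));
        System.out.println(row("Jessica", 31));
        System.out.println(hex(255)); // ff
    }
}
```

Because both rows use the same widths, the | separators line up regardless of how long each name is, as long as it fits in 8 columns.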
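2) Regular expressions (regex: regular expression)
They come up constantly in string processing and batch file handling. They are very handy and also easy to forget. This section combines the book with some material from the web.
-? matches an optional minus sign; the pattern by itself does not include the digits.
\d matches a single digit. Note that in other languages \\ means a literal backslash inside the regular expression, whereas in a Java string literal \\ means "I am inserting a regular-expression backslash", so that the character after it gets its special meaning.
By the same reasoning, \d is written \\d in Java, and to match a literal backslash you have to write \\\\.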
String matching uses the matches method of String:

    public class TestString {
        public static void main(String[] args) {
            System.out.println("-3444".matches("-?\\d+"));
            System.out.println("-3".matches("-?\\d"));
            System.out.println("-3".matches("(-|\\+)?\\d"));
        }
    }

Result: all three print true.
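(-|\\+)? is a bit more involved. | means "or", and because the plus sign has a special meaning in a regex it has to be escaped as \\+. The whole expression therefore matches either a minus sign or a plus sign, or neither.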
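As a small sketch of the escaping rules above, the following checks an optionally signed integer and a single literal backslash. The class and method names are invented for this example.

```java
// Hypothetical demo class, not from the original post.
class EscapeDemo {
    // An optional + or - sign followed by one or more digits.
    static boolean isSignedInteger(String s) {
        return s.matches("(-|\\+)?\\d+");
    }

    // Four backslashes in Java source become two in the regex,
    // and those two match exactly one literal backslash in the input.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(isSignedInteger("+42"));   // true
        System.out.println(isSignedInteger("4-2"));   // false
        System.out.println(isSingleBackslash("\\"));  // true
    }
}
```

Note that matches has to cover the whole string, which is why "4-2" is rejected even though it contains digits.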
The split method:
Its most common use is splitting on spaces.
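
    String s = Arrays.toString("sdfsdf sf sdf".split(" "));

You can also pass a regular expression to split:

    String s = Arrays.toString("sdfsdf sf sdf".split("\\W+"));
    String s2 = Arrays.toString("sdfsdf sf sdf".split("n\\W+"));

\W is a non-word character and \w is a word character. n\\W+ matches the letter n followed by one or more non-word characters. The sample string contains no n, so the second call returns the whole string as a single element.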
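To see the three delimiters actually behave differently, here is a sketch on a sentence that does contain the letter n. The class name SplitDemo, the helper splitOn and the sample sentence are assumptions made for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class, not from the original post.
class SplitDemo {
    // Splits input around matches of regex and renders the pieces as text.
    static String splitOn(String regex, String input) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String text = "Then, when you have found the shrubbery";
        System.out.println(splitOn(" ", text));      // comma stays attached to "Then,"
        System.out.println(splitOn("\\W+", text));   // punctuation and spaces are removed
        System.out.println(splitOn("n\\W+", text));  // the n is consumed with the delimiter
    }
}
```

The last call shows that whatever the delimiter matches is dropped from the result, including the letter n.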
References:
http://blog.csdn.net/kdnuggets/article/details/2526588
and the JDK Pattern class:
Construct | Matches |
---|---|
Characters | |
x | The character x |
\\ | The backslash character |
\0n | The character with octal value 0n (0 <= n <= 7) |
\0nn | The character with octal value 0nn (0 <= n <= 7) |
\0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
\xhh | The character with hexadecimal value 0xhh |
\uhhhh | The character with hexadecimal value 0xhhhh |
\x{h...h} | The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT ) |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
\f | The form-feed character ('\u000C') |
\a | The alert (bell) character ('\u0007') |
\e | The escape character ('\u001B') |
\cx | The control character corresponding to x |
Character classes | |
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
Predefined character classes | |
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
POSIX character classes (US-ASCII only) | |
\p{Lower} | A lower-case alphabetic character: [a-z] |
\p{Upper} | An upper-case alphabetic character: [A-Z] |
\p{ASCII} | All ASCII: [\x00-\x7F] |
\p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}] |
\p{Digit} | A decimal digit: [0-9] |
\p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}] |
\p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
\p{Graph} | A visible character: [\p{Alnum}\p{Punct}] |
\p{Print} | A printable character: [\p{Graph}\x20] |
\p{Blank} | A space or a tab: [ \t] |
\p{Cntrl} | A control character: [\x00-\x1F\x7F] |
\p{XDigit} | A hexadecimal digit: [0-9a-fA-F] |
\p{Space} | A whitespace character: [ \t\n\x0B\f\r] |
java.lang.Character classes (simple java character type) | |
\p{javaLowerCase} | Equivalent to java.lang.Character.isLowerCase() |
\p{javaUpperCase} | Equivalent to java.lang.Character.isUpperCase() |
\p{javaWhitespace} | Equivalent to java.lang.Character.isWhitespace() |
\p{javaMirrored} | Equivalent to java.lang.Character.isMirrored() |
Classes for Unicode scripts, blocks, categories and binary properties | |
\p{IsLatin} | A Latin script character (script) |
\p{InGreek} | A character in the Greek block (block) |
\p{Lu} | An uppercase letter (category) |
\p{IsAlphabetic} | An alphabetic character (binary property) |
\p{Sc} | A currency symbol |
\P{InGreek} | Any character except one in the Greek block (negation) |
[\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction) |
Boundary matchers | |
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
Quantifiers: how they consume text | |
Greedy quantifiers | |
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
X{n} | X, exactly n times |
X{n,} | X, at least n times |
X{n,m} | X, at least n but not more than m times |
Reluctant quantifiers | |
X?? | X, once or not at all |
X*? | X, zero or more times |
X+? | X, one or more times |
X{n}? | X, exactly n times |
X{n,}? | X, at least n times |
X{n,m}? | X, at least n but not more than m times |
Possessive quantifiers | |
X?+ | X, once or not at all |
X*+ | X, zero or more times |
X++ | X, one or more times |
X{n}+ | X, exactly n times |
X{n,}+ | X, at least n times |
X{n,m}+ | X, at least n but not more than m times |
Logical operators | |
XY | X followed by Y |
X|Y | Either X or Y |
(X) | X, as a capturing group |
Back references | |
\n | Whatever the nth capturing group matched |
\k<name> | Whatever the named-capturing group "name" matched |
Quotation | |
\ | Nothing, but quotes the following character |
\Q | Nothing, but quotes all characters until \E |
\E | Nothing, but ends quoting started by \Q |
Special constructs (named-capturing and non-capturing) | |
(?<name>X) | X, as a named-capturing group |
(?:X) | X, as a non-capturing group |
(?idmsuxU-idmsuxU) | Nothing, but turns match flags i d m s u x U on - off |
(?idmsux-idmsux:X) | X, as a non-capturing group with the given flags i d m s u x on - off |
(?=X) | X, via zero-width positive lookahead |
(?!X) | X, via zero-width negative lookahead |
(?<=X) | X, via zero-width positive lookbehind |
(?<!X) | X, via zero-width negative lookbehind |
(?>X) | X, as an independent, non-capturing group |
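The greedy, reluctant and possessive rows of the table differ only in how much text they take and whether they give any back. A minimal sketch of that difference, with an invented class QuantifierDemo and an arbitrary sample input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ takes as much as possible, then backs off just enough for the final >.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: .+? takes as little as possible.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: .++ takes everything and never gives it back, so the final > cannot match.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```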
3) Pattern and Matcher

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestString {
        public static void main(String[] args) {
            Pattern p = Pattern.compile("\\W+");
            Matcher m = p.matcher("qw");
            // Prints false: "qw" consists only of word characters.
            System.out.println(m.matches());
        }
    }

Pattern.compile is a static method: "Compiles the given regular expression into a pattern."
p.matcher: "Creates a matcher that will match the given input against this pattern."
m.matches: "Attempts to match the entire region against the pattern."
It returns a boolean with the result of the match.
This way you can pass in a regular expression and match a string against it.
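Since matches only succeeds when the entire input fits the pattern, it can help to compare it with two other Matcher methods, lookingAt (match a prefix) and find (match anywhere). The sketch below is an illustration only; the class MatchModesDemo and the helper describe are invented names, and a fresh Matcher is created for each call so the three checks do not affect one another.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class MatchModesDemo {
    // Compile once and reuse; a Pattern can create any number of Matchers.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static String describe(String input) {
        boolean whole = DIGITS.matcher(input).matches();      // entire input must match
        boolean prefix = DIGITS.matcher(input).lookingAt();   // match must start at index 0
        boolean anywhere = DIGITS.matcher(input).find();      // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("123"));
        System.out.println(describe("123abc"));
        System.out.println(describe("abc123"));
    }
}
```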
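1. find and group

    public class TestString {
        public static void main(String[] args) {
            String s = "You're kidding me!";
            Pattern p = Pattern.compile("\\w+");
            Matcher m = p.matcher(s);
            // Walk through every match in order.
            while (m.find()) {
                System.out.print(m.group() + " ");
            }
            // find(i) resets the matcher and searches from index i.
            int i = 0;
            while (m.find(i)) {
                System.out.print(m.group() + " ");
                i++;
            }
        }
    }

Result:

    You re kidding me You ou u re re e kidding kidding idding dding ding ing ng g me me e

find walks through the string looking for matches of the regular expression, and group "Returns the input subsequence matched by the previous match." The first call therefore returns the first run of word characters, which is You. The apostrophe is not a word character, which is why You're comes out as You and re. When find is given an argument, it resets the matcher and starts searching at that index: with 0 it matches You, and once i has been incremented to 1 it matches ou.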
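2. end and start

    while (m.find()) {
        System.out.println(m.group() + " Start:" + m.start() + " End:" + m.end());
    }

Result:

    You Start:0 End:3
    re Start:4 End:6
    kidding Start:7 End:14
    me Start:15 End:17

start returns the index of the first character of the match, and end returns the index just past its last character.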
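group, start and end also accept a group number, which ties them to the (X) capturing groups listed in the Pattern table. The following sketch assumes a made-up "low-high" input format and an invented class GroupDemo purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class GroupDemo {
    // Describes every group of the first "digits-digits" pair found in the input.
    static List<String> describe(String input) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the parenthesized parts.
            for (int g = 0; g <= m.groupCount(); g++) {
                out.add(g + ":" + m.group(g) + "[" + m.start(g) + "," + m.end(g) + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("range 10-25"));
        // [0:10-25[6,11), 1:10[6,8), 2:25[9,11)]
    }
}
```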
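3. split
The parts of the book that simply list API members are actually the easiest, because the documentation already covers them. For anything the documentation covers, look it up and type the code yourself. Pattern also has two split methods: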
String[] | split(CharSequence input) Splits the given input sequence around matches of this pattern. |
String[] | split(CharSequence input, int limit) Splits the given input sequence around matches of this pattern. |
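
    String string = "kjj~~lkjl~~lkjlJ~~lkj~~";
    System.out.println(Arrays.toString(Pattern.compile("~~").split(string)));
    System.out.println(Arrays.toString(Pattern.compile("~~").split(string, 2)));

Result:

    [kjj, lkjl, lkjlJ, lkj]
    [kjj, lkjl~~lkjlJ~~lkj~~]

With a limit of 2 the input is split at most once, so everything after the first delimiter stays in the second element.
(Ha, the author actually mocks Sun's Java designers right in the book for making the Pattern flags so hard to understand.)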
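The limit argument also controls what happens to trailing empty strings, which the output above hides. A brief sketch, reusing the same sample string; the class SplitLimitDemo and its helper are invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class SplitLimitDemo {
    private static final Pattern SEP = Pattern.compile("~~");

    static String[] split(String input, int limit) {
        return SEP.split(input, limit);
    }

    public static void main(String[] args) {
        String s = "kjj~~lkjl~~lkjlJ~~lkj~~";
        // limit 0 (same as the one-argument form): trailing empty strings are discarded.
        System.out.println(Arrays.toString(split(s, 0)));
        // positive limit: at most that many elements.
        System.out.println(Arrays.toString(split(s, 2)));
        // negative limit: no cap, and the trailing empty string is kept.
        System.out.println(Arrays.toString(split(s, -1)));
    }
}
```

The negative limit is useful when the number of fields matters, for example when the last column of a record may be empty.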
4) Replacement operations
String | replaceAll(String regex,String replacement) Replaces each substring of this string that matches the given regular expression with the given replacement. |
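String | replaceFirst(String regex,String replacement) Replaces the first substring of this string that matches the given regular expression with the given replacement. |
replaceFirst replaces only the first match, while replaceAll replaces every match. There is also an approach that is more flexible than either. Suppose you want to find the letters a, b, c and d and replace each with its uppercase form; with the two methods above you would need several passes.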

    String string = "asdfb sdfoiwer sdfcdf wer sd d sdf cxvxzcv s ef bob b b ";
    StringBuffer s2 = new StringBuffer();
    Pattern pa = Pattern.compile("[abcd]");
    Matcher mc = pa.matcher(string);
    while (mc.find()) {
        // Replace each match with its own uppercase form.
        mc.appendReplacement(s2, mc.group().toUpperCase());
    }
    mc.appendTail(s2);
    System.out.println(s2);

Result:

    AsDfB sDfoiwer sDfCDf wer sD D sDf CxvxzCv s ef BoB B B

With a single find in place of the loop:

    mc.find();
    mc.appendReplacement(s2, mc.group().toUpperCase());
    mc.appendTail(s2);
    System.out.println(s2);

Result:

    Asdfb sdfoiwer sdfcdf wer sd d sdf cxvxzcv s ef bob b b

The replacement text can be computed from the match itself. With the while loop every match is replaced. Without the loop, after a single find and appendReplacement, s2 holds only A. To get the effect of replaceFirst you need appendTail, which appends the part of the input that has not been processed yet; only then is the complete string printed.
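One thing to watch with appendReplacement is that the replacement string is not taken literally: $ refers to a captured group and \ escapes the next character. A minimal sketch of guarding against that with Matcher.quoteReplacement follows; the class name, method name and sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class QuoteReplacementDemo {
    // Puts a literal dollar sign in front of every run of digits.
    static String addDollarSign(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$3" would be read as a reference to group 3.
            m.appendReplacement(out, Matcher.quoteReplacement("$" + m.group()));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addDollarSign("tea 3, cake 12")); // tea $3, cake $12
    }
}
```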
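The reset method:

    Matcher mc = pa.matcher(string);

A Matcher works on one input string at a time. With reset you can point the same Matcher at another string:

    mc.reset(newString); // newString is the next CharSequence to match against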
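As a sketch of how reset lets one Matcher serve several inputs, the following counts word matches in different strings. ResetDemo and countWords are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class ResetDemo {
    // Points the given matcher at a new input and counts its matches.
    static int countWords(Matcher m, String input) {
        m.reset(input); // discards the old input and all match state
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Start with an empty input; the real inputs are supplied through reset.
        Matcher m = Pattern.compile("\\w+").matcher("");
        System.out.println(countWords(m, "You're kidding me!")); // 4
        System.out.println(countWords(m, "one two"));            // 2
    }
}
```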
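5) Scanning input
Input in C is simple. In Java I type Syso all the time (the Eclipse shortcut for System.out.println, which a senior colleague showed me long ago and which I have used ever since). I print output constantly, yet I had forgotten how to read input.
Readable stream objects: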

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;

    public class TestScanner {
        public static void main(String[] args) {
            BufferedReader br = new BufferedReader(new StringReader("sdfsdf\nsdfsdf\nsdfsdf"));
            try {
                // Each readLine call returns the next line without its terminator.
                System.out.println(br.readLine());
                System.out.println(br.readLine());
                System.out.println(br.readLine());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Scanner works as well.

    Scanner s = new Scanner(System.in);
    System.out.println(s.nextLine());

Whatever you type on the console is read and printed back.
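Scanner can also convert tokens while it reads, which BufferedReader leaves to the caller. The sketch below reads from a String instead of System.in so that it runs without typing anything; the "name age" input layout and the class ScannerTokensDemo are assumptions made for this example, and malformed input is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not from the original post.
class ScannerTokensDemo {
    // Assumes the input is a whitespace-separated sequence of "name age" pairs.
    static List<String> parse(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                String name = sc.next();   // next token as text
                int age = sc.nextInt();    // next token converted to int
                out.add(name + " is " + age);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("Tom 25\nJerry 3")); // [Tom is 25, Jerry is 3]
    }
}
```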
Scanner delimiters:

    import java.util.Scanner;

    public class TestScanner {
        public static void main(String[] args) {
            Scanner s = new Scanner("12, 323, 34, 34, 5");
            // Tokens are separated by a comma with optional whitespace around it.
            s.useDelimiter("\\s*,\\s*");
            while (s.hasNextInt()) {
                System.out.println(s.nextInt());
            }
        }
    }

I got stuck here yesterday. By default, Scanner splits its input into tokens on whitespace:

    Scanner s = new Scanner("12 323 34 34 5");
    while (s.hasNextInt()) {
        System.out.println(s.nextInt());
    }

This prints each number.
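What puzzled me for a long time yesterday was the delimiter: why does \\d+,\\d+ not work? Looking at it again today, it makes sense. The delimiter describes the separator between tokens, not the tokens themselves. \\s is a whitespace character and * means zero or more times, so \\s*,\\s* says that the content of the Scanner is split at each comma, with no whitespace or any amount of whitespace on either side of it.
\\d+,\\d+ would use a comma with digits directly on both sides as the separator. In this input every comma is followed by a space, so the pattern never matches and no integer tokens are produced. To check my understanding I replaced s with W (\\W*,\\W*), and that works too.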
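The three delimiters discussed above can be compared side by side with a small helper. This is a sketch for illustration; DelimiterDemo and readInts are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not from the original post.
class DelimiterDemo {
    // Reads integers from input for as long as the next token is one.
    static List<Integer> readInts(String input, String delimiterRegex) {
        List<Integer> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNextInt()) {
                out.add(sc.nextInt());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "12, 323, 34, 34, 5";
        System.out.println(readInts(input, "\\s*,\\s*")); // [12, 323, 34, 34, 5]
        System.out.println(readInts(input, "\\W*,\\W*")); // [12, 323, 34, 34, 5]
        // The delimiter never matches, so the whole input is one non-integer token.
        System.out.println(readInts(input, "\\d+,\\d+")); // []
    }
}
```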
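Before Scanner and regular expressions existed, Java code used StringTokenizer. It is rarely used now, although the IDE does not flag it as Deprecated.
That is all for String: input and output, formatted output and regular expressions. Used well, they are very powerful for batch processing. When I have time I will add something on String immutability and memory allocation.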