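String Source Code Analysis (Part 1)
The Javadoc comment above each method quoted in this article already explains it clearly, so only the key points are discussed here.
The String class in Java is declared as follows: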
    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence { ... }
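As the declaration shows, String is final and implements (rather than extends) the Serializable, Comparable and CharSequence interfaces.
String objects are immutable, which is why they can be shared. For example, the following two ways of creating a string are equivalent: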
    String str = "abc";

is equivalent to:

    char data[] = {'a', 'b', 'c'};
    String str = new String(data);
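To make the equivalence and the sharing concrete, here is a minimal sketch. The class name StringEquivalenceDemo and its method names are made up for this illustration; only the String behavior itself comes from the JDK.

```java
// Illustrative sketch; StringEquivalenceDemo and its methods are hypothetical names.
class StringEquivalenceDemo {

    // A literal and a string built from a char array hold the same characters.
    static boolean sameContent() {
        String literal = "abc";
        char[] data = {'a', 'b', 'c'};
        String built = new String(data);
        return literal.equals(built);
    }

    // Two identical literals refer to one shared (interned) object.
    static boolean literalsAreShared() {
        String first = "abc";
        String second = "abc";
        return first == second;
    }

    // new String(...) always creates a separate object, even with equal content.
    static boolean constructorCreatesNewObject() {
        String literal = "abc";
        String built = new String(new char[] {'a', 'b', 'c'});
        return literal != built;
    }

    public static void main(String[] args) {
        System.out.println(sameContent());
        System.out.println(literalsAreShared());
        System.out.println(constructorCreatesNewObject());
    }
}
```

The two forms are equivalent in content (equals), not in identity: only literals and interned strings are guaranteed to share one object.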
The String class defines a final character array, value[], to store the characters:
    /** The value is used for character storage. */
    private final char value[];
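Note that value is final, so the reference cannot be reassigned once it has been set. That alone does not make the contents unchangeable, because the elements of a final array can still be written. Strings are immutable because value is also private, is never exposed to callers, and no String method modifies it after construction.
This final must be distinguished from the final in front of the String class declaration: the final on the class means that String cannot be subclassed, so no subclass can override its behavior.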
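The difference between a final reference and immutable contents can be sketched as follows. ImmutableChars and FinalArrayDemo are hypothetical classes written for this illustration; they imitate the general recipe (final class, private final field, defensive copy, no mutators) and are not the JDK's implementation.

```java
// Hypothetical miniature of the immutability recipe; not the real String class.
final class ImmutableChars {            // final class: cannot be subclassed
    private final char[] value;         // private and final: never exposed, never reassigned

    ImmutableChars(char[] source) {
        this.value = source.clone();    // defensive copy, so the caller's array is not shared
    }

    char charAt(int index) {
        return value[index];
    }

    int length() {
        return value.length;
    }
}

class FinalArrayDemo {

    // final protects the reference only; the elements remain writable.
    static String writeThroughFinalReference() {
        final char[] value = {'a', 'b', 'c'};
        value[0] = 'x';                 // compiles and changes the array
        // value = new char[3];         // would not compile: reference is final
        return new String(value);
    }

    // Mutating the source array afterwards does not reach the wrapper's private copy.
    static char firstCharAfterSourceMutation() {
        char[] source = {'a', 'b', 'c'};
        ImmutableChars chars = new ImmutableChars(source);
        source[0] = 'x';
        return chars.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(writeThroughFinalReference());
        System.out.println(firstCharAfterSourceMutation());
    }
}
```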
Let's now look at the concrete implementation of the String class.
1. Constructors
1.1 No-argument constructor
    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }
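Calling this constructor creates an empty string object. Since strings are immutable, there is little point in calling it; the literal "" serves the same purpose.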
1.2
    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
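This constructor takes a String object, original, and initializes the newly created string to represent the same character sequence; in other words, the new string is a copy of the argument. Note that in the code above the new object shares original's value array and cached hash rather than copying the characters, which is safe because the array is never modified.
Unless an explicit copy of original is needed, do not use this constructor.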
1.3
    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }
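This constructor takes a character array, value, and builds a new string representing the characters the array currently contains. The contents are copied with Arrays.copyOf, so later modifications to the array do not affect the newly created string.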
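The effect of the copy is easy to observe from the outside. The following is a small sketch; CharArrayCopyDemo is a hypothetical name used only here.

```java
// Illustrative sketch; CharArrayCopyDemo is a hypothetical name.
class CharArrayCopyDemo {

    // Builds a string from an array, then overwrites the array.
    // The string keeps the characters it was created with.
    static String buildThenMutateSource() {
        char[] data = {'a', 'b', 'c'};
        String text = new String(data);
        data[0] = 'x';                  // changes only the caller's array
        return text;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutateSource());
    }
}
```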
1.4
    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
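This constructor allocates a new string whose characters come from a subarray of value: offset is the index of the first character of the subarray, and count is its length. The subarray is copied, so later changes to value do not affect the string. A negative offset or count, or a range that runs past the end of the array, throws StringIndexOutOfBoundsException.
When count is 0 and offset <= value.length, the result is an empty string.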
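A short sketch of the three cases described above (a normal subarray, the empty result, and an out-of-range request). SubarrayConstructorDemo and its method names are hypothetical.

```java
// Illustrative sketch; SubarrayConstructorDemo is a hypothetical name.
class SubarrayConstructorDemo {

    // Takes 5 chars starting at index 6 of "hello world".
    static String normalSubarray() {
        char[] data = "hello world".toCharArray();
        return new String(data, 6, 5);
    }

    // count == 0 with offset == value.length is allowed and yields "".
    static String emptyAtEnd() {
        char[] data = {'a', 'b', 'c'};
        return new String(data, 3, 0);
    }

    // offset + count runs past the end of the array, so the constructor throws.
    static boolean rejectsRangePastEnd() {
        try {
            new String(new char[] {'a', 'b', 'c'}, 2, 5);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalSubarray());
        System.out.println(emptyAtEnd().isEmpty());
        System.out.println(rejectsRangePastEnd());
    }
}
```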
1.5
    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     *
     * @param  codePoints
     *         Array that is the source of Unicode code points
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IllegalArgumentException
     *          If any invalid Unicode code point is found in {@code
     *          codePoints}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code codePoints} array
     *
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
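This constructor builds a string from an array of code points:
It first checks offset and count against the array bounds, then makes a first pass to compute the exact size of the char array (a supplementary code point needs two chars), and finally makes a second pass that writes the code points into the array v, which becomes the string's value. (This involves character-encoding details that will be covered in the Character source code analysis.)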
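The size computation in the first pass can be seen from the outside: three code points, one of them supplementary, produce four chars. This is a sketch only; CodePointConstructorDemo is a hypothetical name, and U+1F600 was chosen arbitrarily as a code point outside the BMP.

```java
// Illustrative sketch; CodePointConstructorDemo is a hypothetical name.
class CodePointConstructorDemo {

    // 'A', U+1F600 (outside the BMP, stored as a surrogate pair), 'B'.
    static String fromCodePoints() {
        int[] codePoints = {0x41, 0x1F600, 0x42};
        return new String(codePoints, 0, codePoints.length);
    }

    // 0x110000 is above the largest valid code point (0x10FFFF).
    static boolean rejectsInvalidCodePoint() {
        try {
            new String(new int[] {0x110000}, 0, 1);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = fromCodePoints();
        System.out.println(text.length());      // number of chars, not code points
        System.out.println(rejectsInvalidCodePoint());
    }
}
```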
1.6
    /* Common private utility method used to bounds check the byte array
     * and requested offset & length values used by the String(byte[],..)
     * constructors.
     */
    private static void checkBounds(byte[] bytes, int offset, int length) {
        if (length < 0)
            throw new StringIndexOutOfBoundsException(length);
        if (offset < 0)
            throw new StringIndexOutOfBoundsException(offset);
        if (offset > bytes.length - length)
            throw new StringIndexOutOfBoundsException(offset + length);
    }
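This method only performs bounds checking: length and offset must not be negative, and offset + length must not exceed the length of the byte array. The last check is written as offset > bytes.length - length so that the comparison cannot overflow.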
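Why the check subtracts instead of adding can be shown with two hypothetical helper methods, naiveInBounds and safeInBounds, written only for this comparison. With an offset near Integer.MAX_VALUE, offset + length wraps around to a negative number and the naive test wrongly passes.

```java
// Illustrative sketch; BoundsCheckDemo and both helpers are hypothetical.
class BoundsCheckDemo {

    // Naive form: offset + length can overflow int and become negative.
    static boolean naiveInBounds(int arrayLength, int offset, int length) {
        return offset >= 0 && length >= 0 && offset + length <= arrayLength;
    }

    // Form used by checkBounds: arrayLength - length cannot overflow
    // when both values are non-negative.
    static boolean safeInBounds(int arrayLength, int offset, int length) {
        return offset >= 0 && length >= 0 && offset <= arrayLength - length;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE;
        System.out.println(naiveInBounds(10, huge, 1)); // wrongly accepts the range
        System.out.println(safeInBounds(10, huge, 1));  // correctly rejects it
    }
}
```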
    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified charset.  The length of the new {@code String}
     * is a function of the charset, and hence may not be equal to the length
     * of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode

     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the subarray.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  1.6
     */
    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charset, bytes, offset, length);
    }
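These two constructors decode a subarray of bytes with the specified charset to build a new string. The charset can be given either by name or as a Charset object.
The decode method is explained in the StringCoding source code analysis.
Note that the two overloads differ in how they handle bad input. With the charsetName overload, an unsupported charset name throws UnsupportedEncodingException, and the behavior when the bytes are not valid in that charset is unspecified. The Charset overload always replaces malformed-input and unmappable-character sequences with the charset's default replacement string.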
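The following sketch exercises both overloads. ByteDecodingDemo, its method names and the charset name "no-such-charset" are invented for the illustration; StandardCharsets.UTF_8 is the standard JDK constant. It shows that the string length depends on the charset (four bytes become two chars here) and that the Charset overload substitutes a replacement character for a malformed byte.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; ByteDecodingDemo is a hypothetical name.
class ByteDecodingDemo {

    // E4 B8 AD is the UTF-8 encoding of U+4E2D; the fourth byte is ASCII 'a'.
    static String decodeValidUtf8() {
        byte[] bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD, 'a'};
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8);
    }

    // 0xFF can never appear in UTF-8, so the Charset overload replaces it.
    static String decodeMalformedUtf8() {
        byte[] bytes = {'a', (byte) 0xFF, 'b'};
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The charsetName overload reports an unknown name with a checked exception.
    static boolean unknownCharsetNameThrows() {
        try {
            new String(new byte[] {'a'}, "no-such-charset"); // deliberately unsupported name
            return false;
        } catch (UnsupportedEncodingException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeValidUtf8().length());
        System.out.println(decodeMalformedUtf8().charAt(1) == '\uFFFD');
        System.out.println(unknownCharsetNameThrows());
    }
}
```

Passing a Charset object avoids both the checked exception and the name lookup, which is one reason it is usually the more convenient overload when the charset is known at compile time.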
1.7
    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the specified {@linkplain java.nio.charset.Charset charset}.  The
     * length of the new {@code String} is a function of the charset, and hence
     * may not be equal to the length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the byte array.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @since  1.6
     */
    public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the platform's default charset.  The length of the new
     * {@code String} is a function of the charset, and hence may not be equal
     * to the length of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and the {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the platform's default charset.  The length of the new {@code
     * String} is a function of the charset, and hence may not be equal to the
     * length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @since  JDK1.1
     */
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }
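The constructors above either delegate to the ones already discussed or decode with the platform's default charset, so they need no further explanation.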
1.8
    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string buffer argument. The contents of the
     * string buffer are copied; subsequent modification of the string buffer
     * does not affect the newly created string.
     *
     * @param  buffer
     *         A {@code StringBuffer}
     */
    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string builder argument. The contents of the
     * string builder are copied; subsequent modification of the string builder
     * does not affect the newly created string.
     *
     * <p> This constructor is provided to ease migration to {@code
     * StringBuilder}. Obtaining a string from a string builder via the {@code
     * toString} method is likely to run faster and is generally preferred.
     *
     * @param   builder
     *          A {@code StringBuilder}
     *
     * @since  1.5
     */
    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }
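Besides strings, character arrays, code point arrays and byte arrays, as shown earlier, a string can also be constructed from a StringBuffer or a StringBuilder. The StringBuffer version synchronizes on the buffer while copying; for a StringBuilder, the Javadoc notes that toString is likely to be faster and is generally preferred.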
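As with the char array constructor, the new string is a snapshot: appending to the builder or buffer afterwards does not change it. A brief sketch, with BuilderCopyDemo as a hypothetical name:

```java
// Illustrative sketch; BuilderCopyDemo is a hypothetical name.
class BuilderCopyDemo {

    // Copies the builder's current contents, then keeps appending to the builder.
    static String snapshotOfBuilder() {
        StringBuilder builder = new StringBuilder("abc");
        String snapshot = new String(builder);
        builder.append("def");          // does not affect the snapshot
        return snapshot;
    }

    // The same holds for StringBuffer.
    static String snapshotOfBuffer() {
        StringBuffer buffer = new StringBuffer("abc");
        String snapshot = new String(buffer);
        buffer.append("def");
        return snapshot;
    }

    // toString() gives the same result and is the generally preferred form.
    static String viaToString() {
        return new StringBuilder("abc").toString();
    }

    public static void main(String[] args) {
        System.out.println(snapshotOfBuilder());
        System.out.println(snapshotOfBuffer());
        System.out.println(viaToString());
    }
}
```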
2. length()
    /**
     * Returns the length of this string.
     * The length is equal to the number of <a href="Character.html#unicode">Unicode
     * code units</a> in the string.
     *
     * @return  the length of the sequence of characters represented by this
     *          object.
     */
    public int length() {
        return value.length;
    }
The length() method returns the length of the string, that is, the number of Unicode code units (char values) it contains.
3. isEmpty()
    /**
     * Returns {@code true} if, and only if, {@link #length()} is {@code 0}.
     *
     * @return {@code true} if {@link #length()} is {@code 0}, otherwise
     * {@code false}
     *
     * @since 1.6
     */
    public boolean isEmpty() {
        return value.length == 0;
    }
Returns true if and only if the string's length is 0.
4. charAt(int index)
    /**
     * Returns the {@code char} value at the
     * specified index. An index ranges from {@code 0} to
     * {@code length() - 1}. The first {@code char} value of the sequence
     * is at index {@code 0}, the next at index {@code 1},
     * and so on, as for array indexing.
     *
     * <p>If the {@code char} value specified by the index is a
     * <a href="Character.html#unicode">surrogate</a>, the surrogate
     * value is returned.
     *
     * @param      index   the index of the {@code char} value.
     * @return     the {@code char} value at the specified index of this string.
     *             The first {@code char} value is at index {@code 0}.
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     */
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }
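Returns the char value at the specified index. Valid indices range from 0 to length() - 1; any other index throws StringIndexOutOfBoundsException.
If the char value at the index is a surrogate, the surrogate value itself is returned.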
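The surrogate case can be illustrated with a string that contains one supplementary character. CharAtDemo and the sample text are assumptions made for this sketch; U+1F600 is stored as the pair \uD83D \uDE00.

```java
// Illustrative sketch; CharAtDemo and TEXT are hypothetical.
class CharAtDemo {

    // 'a', then U+1F600 as a surrogate pair, then 'b': four chars in total.
    static final String TEXT = "a\uD83D\uDE00b";

    static char unitAt(int index) {
        return TEXT.charAt(index);
    }

    // charAt rejects any index outside 0 .. length() - 1.
    static boolean throwsAt(int index) {
        try {
            TEXT.charAt(index);
            return false;
        } catch (StringIndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());
        System.out.println(Character.isHighSurrogate(unitAt(1)));
        System.out.println(Character.isLowSurrogate(unitAt(2)));
        System.out.println(throwsAt(TEXT.length()));
    }
}
```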
5. codePointAt(int index)
    /**
     * Returns the character (Unicode code point) at the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 0} to
     * {@link #length()}{@code  - 1}.
     *
     * <p> If the {@code char} value specified at the given index
     * is in the high-surrogate range, the following index is less
     * than the length of this {@code String}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param      index the index to the {@code char} values
     * @return     the code point value of the character at the
     *             {@code index}
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     * @since      1.5
     */
    public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }
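For now, it is enough to remember that this returns the code point at index. If the char there is a high surrogate followed by a low surrogate, the supplementary code point of the pair is returned; otherwise the char value itself is returned.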
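A small sketch of the rule, using a sample string with one surrogate pair. CodePointAtDemo and TEXT are hypothetical names chosen for the example.

```java
// Illustrative sketch; CodePointAtDemo and TEXT are hypothetical.
class CodePointAtDemo {

    // 'a', then U+1F600 as the pair \uD83D \uDE00, then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    static int at(int index) {
        return TEXT.codePointAt(index);
    }

    public static void main(String[] args) {
        // Index 0: an ordinary BMP char.
        System.out.println(Integer.toHexString(at(0)));
        // Index 1: high surrogate followed by a low surrogate, so the pair is combined.
        System.out.println(Integer.toHexString(at(1)));
        // Index 2: starts in the middle of the pair, so only the low surrogate is returned.
        System.out.println(Integer.toHexString(at(2)));
    }
}
```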
6. codePointBefore(int index)
    /**
     * Returns the character (Unicode code point) before the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 1} to {@link
     * CharSequence#length() length}.
     *
     * <p> If the {@code char} value at {@code (index - 1)}
     * is in the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index -
     * 2)} is in the high-surrogate range, then the
     * supplementary code point value of the surrogate pair is
     * returned. If the {@code char} value at {@code index -
     * 1} is an unpaired low-surrogate or a high-surrogate, the
     * surrogate value is returned.
     *
     * @param     index the index following the code point that should be returned
     * @return    the Unicode code point value before the given index.
     * @exception IndexOutOfBoundsException if the {@code index}
     *            argument is less than 1 or greater than the length
     *            of this string.
     * @since     1.5
     */
    public int codePointBefore(int index) {
        int i = index - 1;
        if ((i < 0) || (i >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointBeforeImpl(value, index, 0);
    }
For now, it is enough to remember that this returns the code point before index. Code points are covered in detail in the Character source code analysis.
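The mirror-image behavior can be sketched with the same kind of sample string. CodePointBeforeDemo and TEXT are hypothetical names.

```java
// Illustrative sketch; CodePointBeforeDemo and TEXT are hypothetical.
class CodePointBeforeDemo {

    // 'a', then U+1F600 as the pair \uD83D \uDE00, then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    static int before(int index) {
        return TEXT.codePointBefore(index);
    }

    // Valid indices run from 1 to length(), so 0 is rejected.
    static boolean throwsAtZero() {
        try {
            TEXT.codePointBefore(0);
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Index 3: chars at 2 and 1 form a low/high pair, so the pair is combined.
        System.out.println(Integer.toHexString(before(3)));
        // Index 2: the char at 1 is a high surrogate, so it is returned as is.
        System.out.println(Integer.toHexString(before(2)));
        System.out.println(throwsAtZero());
    }
}
```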
7. codePointCount(int beginIndex, int endIndex)
    /**
     * Returns the number of Unicode code points in the specified text
     * range of this {@code String}. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}. Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of this {@code String}, or
     * {@code beginIndex} is larger than {@code endIndex}.
     * @since  1.5
     */
    public int codePointCount(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
    }
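Returns the number of Unicode code points in the specified text range of this string. The range starts at beginIndex and extends to the char at index endIndex - 1, so its length in chars is endIndex - beginIndex. Each unpaired surrogate within the range counts as one code point.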
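The sketch below contrasts the count over the whole string with counts over ranges that cut a surrogate pair in half. CodePointCountDemo and TEXT are hypothetical names used for this illustration.

```java
// Illustrative sketch; CodePointCountDemo and TEXT are hypothetical.
class CodePointCountDemo {

    // 'a', then U+1F600 as the pair \uD83D \uDE00, then 'b': 4 chars, 3 code points.
    static final String TEXT = "a\uD83D\uDE00b";

    static int count(int beginIndex, int endIndex) {
        return TEXT.codePointCount(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        // Whole string: the surrogate pair counts once.
        System.out.println(count(0, TEXT.length()));
        // Range [0, 2) ends between the two surrogates, so the high
        // surrogate is unpaired within the range and counts on its own.
        System.out.println(count(0, 2));
        // Range [2, 4) starts at the low surrogate, which is likewise unpaired.
        System.out.println(count(2, 4));
    }
}
```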
8. offsetByCodePoints(int index, int codePointOffset)
    /**
     * Returns the index within this {@code String} that is
     * offset from the given {@code index} by
     * {@code codePointOffset} code points. Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within this {@code String}
     * @exception IndexOutOfBoundsException if {@code index}
     *   is negative or larger then the length of this
     *   {@code String}, or if {@code codePointOffset} is positive
     *   and the substring starting with {@code index} has fewer
     *   than {@code codePointOffset} code points,
     *   or if {@code codePointOffset} is negative and the substring
     *   before {@code index} has fewer than the absolute value
     *   of {@code codePointOffset} code points.
     * @since 1.5
     */
    public int offsetByCodePoints(int index, int codePointOffset) {
        if (index < 0 || index > value.length) {
            throw new IndexOutOfBoundsException();
        }
        return Character.offsetByCodePointsImpl(value, 0, value.length,
                index, codePointOffset);
    }
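Returns the index within this string that is offset from the given index by codePointOffset code points. Unpaired surrogates within the range count as one code point each.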
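One plausible use of this method is to cut a string after a given number of code points without splitting a surrogate pair. The helper firstCodePoints below is a hypothetical utility written for this sketch, not part of the JDK, and OffsetByCodePointsDemo and TEXT are likewise made-up names.

```java
// Illustrative sketch; OffsetByCodePointsDemo, TEXT and firstCodePoints are hypothetical.
class OffsetByCodePointsDemo {

    // 'a', then U+1F600 as the pair \uD83D \uDE00, then 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    // Hypothetical helper: the prefix of text containing its first n code points.
    static String firstCodePoints(String text, int n) {
        int endIndex = text.offsetByCodePoints(0, n);
        return text.substring(0, endIndex);
    }

    // Asking for more code points than remain is an error.
    static boolean throwsWhenTooFar() {
        try {
            TEXT.offsetByCodePoints(0, 4);   // TEXT has only 3 code points
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two code points forward from index 0 skips 'a' and the whole pair.
        System.out.println(TEXT.offsetByCodePoints(0, 2));
        // One code point back from index 3 steps over the whole pair.
        System.out.println(TEXT.offsetByCodePoints(3, -1));
        System.out.println(firstCodePoints(TEXT, 2).length());
        System.out.println(throwsWhenTooFar());
    }
}
```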