
String Source Code Analysis (Part 1)

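The Javadoc comment above each method covered in this article already explains the method clearly, so only the key points are discussed here.

The String class in Java is defined as follows: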

    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence { ... }

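As shown, String is declared final and implements the Serializable, Comparable<String>, and CharSequence interfaces.

Because String objects are immutable, they can be shared. For example, the following two strings are equivalent: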

    String str = "abc";

is equivalent to:

    char data[] = {'a', 'b', 'c'};
    String str = new String(data);
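To make the equivalence concrete, here is a minimal sketch. The class name LiteralEquivalenceDemo and its method names are made up for illustration. It relies on two points of standard Java behavior: a string built from a char array has the same contents as the matching literal, and identical literals refer to one shared, pooled object.

```java
class LiteralEquivalenceDemo {
    // Same contents, whether built from a literal or from a char array.
    static boolean sameContents() {
        char[] data = {'a', 'b', 'c'};
        String fromLiteral = "abc";
        String fromArray = new String(data);
        return fromLiteral.equals(fromArray);
    }

    // Two identical literals refer to one shared String object.
    static boolean literalsAreShared() {
        String first = "abc";
        String second = "abc";
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameContents());      // true
        System.out.println(literalsAreShared()); // true
    }
}
```

Sharing is only safe because neither reference can change the characters that the other one sees.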

The String class defines a final character array, value[], to store the characters:

    /** The value is used for character storage. */
    private final char value[];

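Note that value is final, so the field cannot be reassigned to another array once it has been set. On its own this does not freeze the array's elements; strings cannot be modified because value is also private and no String method changes the array after construction.

This final must be distinguished from the final in front of the String class declaration: the final on the class only means that String cannot be subclassed. It says nothing about whether instances can be modified.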
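A point that is easy to miss here: final on an array field fixes only the reference, not the array's elements. The sketch below uses a hypothetical Holder class (not part of the JDK) that keeps a final char[] without copying it, to show that the contents can still change through another reference. String avoids this by keeping value private and by copying caller-supplied arrays.

```java
class FinalArrayDemo {
    // Hypothetical holder: the field is final, but the array it points to is not frozen.
    static final class Holder {
        final char[] value;

        Holder(char[] value) {
            this.value = value; // deliberately no defensive copy
        }
    }

    static char firstCharAfterMutation() {
        char[] data = {'a', 'b', 'c'};
        Holder holder = new Holder(data);
        data[0] = 'x'; // legal: final protects the reference, not the elements
        return holder.value[0];
    }

    public static void main(String[] args) {
        System.out.println(firstCharAfterMutation()); // x
    }
}
```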

Let's now look at the concrete implementation of the String class.

1. Constructors

1.1 No-argument constructor

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

Calling this constructor creates an empty string object. Because strings are immutable, however, there is little point in calling it.

 

1.2

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

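This constructor takes a String object, original, as its argument and initializes the newly created String so that it represents the same sequence of characters; in other words, the new string is a copy of the argument string. As the code shows, the copy reuses the original's value array and cached hash, so only the String object itself is new.

Unless an explicit copy of original is needed, do not use this constructor.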
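A short sketch of what "explicit copy" means in practice, with an illustrative class name: the copy compares equal to the original, but it is a separate object, so a reference comparison fails.

```java
class CopyConstructorDemo {
    // The copy has equal contents but is a different object.
    static boolean equalButDistinct() {
        String original = "hello";
        String copy = new String(original);
        return copy.equals(original) && copy != original;
    }

    public static void main(String[] args) {
        System.out.println(equalButDistinct()); // true
    }
}
```

Since equals() already treats the two as the same, the extra object rarely buys anything, which is why the Javadoc calls the constructor unnecessary in most cases.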

 

1.3

    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

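This constructor takes a character array, value, as the initial value and builds a new string that represents the sequence of characters currently contained in the array. The contents of value are copied into the string object, so later modifications to the array do not affect the newly created string.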
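The effect of the copy can be checked with a small sketch (the class and method names are illustrative): the caller's array is modified after the string has been built, and the string keeps its original contents.

```java
class CharArrayCopyDemo {
    static String buildThenMutate() {
        char[] data = {'a', 'b', 'c'};
        String s = new String(data);
        data[0] = 'x'; // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // abc
    }
}
```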

 

1.4

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value    Array that is the source of characters
     * @param  offset   The initial offset
     * @param  count    The length
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

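This constructor allocates a new string whose initial value is taken from a subarray of the character array value. The offset argument is the index of the first character of the subarray, and the count argument specifies its length.

When count is 0 and offset <= value.length, the result is an empty string.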
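The boundary cases are easier to see with concrete values. The following sketch uses illustrative helper names (slice, throwsOutOfBounds) and catches the general IndexOutOfBoundsException, which is the type the Javadoc documents.

```java
class SubarrayConstructorDemo {
    static String slice(char[] value, int offset, int count) {
        return new String(value, offset, count);
    }

    // Reports whether the constructor rejects the given offset/count pair.
    static boolean throwsOutOfBounds(char[] value, int offset, int count) {
        try {
            new String(value, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] data = "hello".toCharArray();
        System.out.println(slice(data, 1, 3));             // ell
        System.out.println(slice(data, 5, 0).isEmpty());   // true: count 0, offset == length
        System.out.println(throwsOutOfBounds(data, 6, 0)); // true: offset past the end
        System.out.println(throwsOutOfBounds(data, 3, 3)); // true: range runs past the end
    }
}
```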

 

1.5

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     *
     * @param  codePoints     Array that is the source of Unicode code points
     * @param  offset     The initial offset
     * @param  count     The length
     * @throws  IllegalArgumentException      If any invalid Unicode code point is found in codePoints
     * @throws  IndexOutOfBoundsException    If the offset and count arguments index characters outside the bounds of the codePoints array
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

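This constructor builds a string from an array of code points:

It first checks offset and count against the array bounds, then computes the exact size of the char array (each supplementary code point needs two chars), and finally converts the code points into the array v and assigns v to value. This involves character encoding, which will be covered in detail in the Character source code analysis.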
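As a rough illustration of the two passes, the sketch below builds a string from one BMP code point and one supplementary code point (U+1F600, chosen arbitrarily). The helper names are assumptions made for this example. The supplementary code point occupies two chars, so the string is longer than the code point array.

```java
class CodePointConstructorDemo {
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Reports whether the constructor rejects the value as an invalid code point.
    static boolean rejects(int codePoint) {
        try {
            fromCodePoints(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = fromCodePoints('A', 0x1F600);
        System.out.println(s.length());                      // 3: one char plus a surrogate pair
        System.out.println(s.codePointCount(0, s.length())); // 2
        System.out.println(rejects(0x110000));               // true: above the Unicode range
    }
}
```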

 

1.6

    /* Common private utility method used to bounds check the byte array
     * and requested offset & length values used by the String(byte[],..)
     * constructors.
     */
    private static void checkBounds(byte[] bytes, int offset, int length) {
        if (length < 0)
            throw new StringIndexOutOfBoundsException(length);
        if (offset < 0)
            throw new StringIndexOutOfBoundsException(offset);
        if (offset > bytes.length - length)
            throw new StringIndexOutOfBoundsException(offset + length);
    }

This method only performs bounds checking: length and offset must not be negative, and offset + length must not exceed the length of the byte array.
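One detail worth spelling out is why the last test is written as offset > bytes.length - length rather than offset + length > bytes.length. The sketch below compares the two forms as plain boolean helpers; both helper names are invented for this example and are not JDK methods. With a very large offset the sum overflows int and wraps to a negative number, so the naive form accepts an invalid range.

```java
class BoundsCheckDemo {
    // Naive form: offset + length can overflow and wrap around to a negative value.
    static boolean naiveOutOfBounds(int arrayLength, int offset, int length) {
        return length < 0 || offset < 0 || offset + length > arrayLength;
    }

    // Form used by checkBounds: once length >= 0, arrayLength - length cannot overflow.
    static boolean safeOutOfBounds(int arrayLength, int offset, int length) {
        return length < 0 || offset < 0 || offset > arrayLength - length;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE;
        System.out.println(naiveOutOfBounds(10, huge, 1)); // false: the overflow hides the error
        System.out.println(safeOutOfBounds(10, huge, 1));  // true
    }
}
```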

 

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified charset.  The length of the new {@code String}
     * is a function of the charset, and hence may not be equal to the length
     * of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the subarray.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  1.6
     */
    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

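These two constructors decode a subarray of bytes using the specified charset to build a new string. The charset can be given either by name or by passing a Charset object directly.

The decode method is explained in the StringCoding source code analysis.

Note that, for the charsetName version, the behavior is unspecified when the given bytes are not valid in the given charset. The Charset version always replaces malformed-input and unmappable-character sequences with the charset's default replacement string.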
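The sketch below exercises both constructors with UTF-8; the class and helper names, the sample text, and the charset name "no-such-charset" are all chosen for this example. It shows that the decoded length can differ from the byte count, that the Charset version substitutes the replacement character (U+FFFD for UTF-8) for a malformed byte, and that the name-based version reports an unknown charset with UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetDecodingDemo {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.UTF_8);
    }

    // Reports whether the name-based constructor rejects the charset name.
    static boolean isUnsupported(String charsetName) {
        try {
            new String(new byte[] {65}, 0, 1, charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two CJK characters, written as escapes so the source file encoding does not matter.
        byte[] utf8 = "\u4F60\u597D".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);              // 6 bytes
        System.out.println(decodeUtf8(utf8).length()); // 2 chars

        // 0xFF never appears in valid UTF-8, so it decodes to the replacement character.
        String replaced = decodeUtf8(new byte[] {(byte) 0xFF});
        System.out.println(replaced.equals("\uFFFD")); // true

        System.out.println(isUnsupported("no-such-charset")); // true
    }
}
```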

 

1.7

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the specified {@linkplain java.nio.charset.Charset charset}.  The
     * length of the new {@code String} is a function of the charset, and hence
     * may not be equal to the length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the given charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @throws  UnsupportedEncodingException
     *          If the named charset is not supported
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of
     * bytes using the specified {@linkplain java.nio.charset.Charset charset}.
     * The length of the new {@code String} is a function of the charset, and
     * hence may not be equal to the length of the byte array.
     *
     * <p> This method always replaces malformed-input and unmappable-character
     * sequences with this charset's default replacement string.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charset
     *         The {@linkplain java.nio.charset.Charset charset} to be used to
     *         decode the {@code bytes}
     *
     * @since  1.6
     */
    public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }

    /**
     * Constructs a new {@code String} by decoding the specified subarray of
     * bytes using the platform's default charset.  The length of the new
     * {@code String} is a function of the charset, and hence may not be equal
     * to the length of the subarray.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and the {@code length} arguments index
     *          characters outside the bounds of the {@code bytes} array
     *
     * @since  JDK1.1
     */
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the platform's default charset.  The length of the new {@code
     * String} is a function of the charset, and hence may not be equal to the
     * length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @since  JDK1.1
     */
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }

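The constructors above are straightforward: they either delegate to the subarray versions with offset 0 and the full array length, or decode with the platform's default charset.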

 

1.8

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string buffer argument. The contents of the
     * string buffer are copied; subsequent modification of the string buffer
     * does not affect the newly created string.
     *
     * @param  buffer
     *         A {@code StringBuffer}
     */
    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string builder argument. The contents of the
     * string builder are copied; subsequent modification of the string builder
     * does not affect the newly created string.
     *
     * <p> This constructor is provided to ease migration to {@code
     * StringBuilder}. Obtaining a string from a string builder via the {@code
     * toString} method is likely to run faster and is generally preferred.
     *
     * @param   builder
     *          A {@code StringBuilder}
     *
     * @since  1.5
     */
    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

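Besides the sources shown earlier (strings, character arrays, code point arrays, and byte arrays), a string can also be constructed from a StringBuffer or a StringBuilder. In both cases the contents are copied; the StringBuffer version holds the buffer's lock while copying.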
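A brief sketch of the copy semantics, using illustrative class and method names: the string is a snapshot, so appending to the builder or buffer afterwards does not change it.

```java
class BuilderConstructorDemo {
    static String snapshotOfBuilder() {
        StringBuilder builder = new StringBuilder("abc");
        String snapshot = new String(builder);
        builder.append("def"); // affects only the builder
        return snapshot;
    }

    static String snapshotOfBuffer() {
        StringBuffer buffer = new StringBuffer("abc");
        String snapshot = new String(buffer);
        buffer.append("def"); // affects only the buffer
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println(snapshotOfBuilder()); // abc
        System.out.println(snapshotOfBuffer());  // abc
    }
}
```

As the Javadoc notes, builder.toString() is generally the preferred way to get the string in everyday code.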

2. length()

    /**
     * Returns the length of this string.
     * The length is equal to the number of <a href="Character.html#unicode">Unicode
     * code units</a> in the string.
     *
     * @return  the length of the sequence of characters represented by this
     *          object.
     */
    public int length() {
        return value.length;
    }

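The length() method returns the length of the string, that is, the number of Unicode code units (chars) in it, which is simply the length of the value array.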

3.isEmpty()

    /**
     * Returns {@code true} if, and only if, {@link #length()} is {@code 0}.
     *
     * @return {@code true} if {@link #length()} is {@code 0}, otherwise
     * {@code false}
     *
     * @since 1.6
     */
    public boolean isEmpty() {
        return value.length == 0;
    }

Checks whether the string is empty, that is, whether its length is 0.

4.charAt(int index)

    /**
     * Returns the {@code char} value at the
     * specified index. An index ranges from {@code 0} to
     * {@code length() - 1}. The first {@code char} value of the sequence
     * is at index {@code 0}, the next at index {@code 1},
     * and so on, as for array indexing.
     *
     * <p>If the {@code char} value specified by the index is a
     * <a href="Character.html#unicode">surrogate</a>, the surrogate
     * value is returned.
     *
     * @param      index   the index of the {@code char} value.
     * @return     the {@code char} value at the specified index of this string.
     *             The first {@code char} value is at index {@code 0}.
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     */
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

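Returns the char at the specified index; valid indexes range from 0 to length() - 1.

If the char value at the index is a surrogate, the surrogate value itself is returned.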
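The sketch below uses a sample string made of 'a' followed by U+1F600 written as a surrogate pair (an arbitrary choice for illustration; the class and helper names are also made up). charAt returns each half of the pair separately and rejects indexes outside 0 to length() - 1.

```java
class CharAtDemo {
    // 'a' followed by U+1F600 encoded as the surrogate pair D83D DE00.
    static final String TEXT = "a\uD83D\uDE00";

    // Reports whether charAt rejects the index.
    static boolean throwsAt(int index) {
        try {
            TEXT.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(TEXT.charAt(0));                              // a
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1)));   // true
        System.out.println(Character.isLowSurrogate(TEXT.charAt(2)));    // true
        System.out.println(throwsAt(3));                                 // true: index == length()
    }
}
```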

5.codePointAt(int index)

    /**
     * Returns the character (Unicode code point) at the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 0} to
     * {@link #length()}{@code  - 1}.
     *
     * <p> If the {@code char} value specified at the given index
     * is in the high-surrogate range, the following index is less
     * than the length of this {@code String}, and the
     * {@code char} value at the following index is in the
     * low-surrogate range, then the supplementary code point
     * corresponding to this surrogate pair is returned. Otherwise,
     * the {@code char} value at the given index is returned.
     *
     * @param      index the index to the {@code char} values
     * @return     the code point value of the character at the
     *             {@code index}
     * @exception  IndexOutOfBoundsException  if the {@code index}
     *             argument is negative or not less than the length of this
     *             string.
     * @since      1.5
     */
    public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }

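For now, it is enough to remember that this method returns the code point at index. Note that index still counts chars, not code points.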
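A small sketch may help here. The sample string (with U+1F600 as the supplementary character) and the class name are assumptions made for this example. Pointing at the high surrogate yields the whole supplementary code point, while pointing at the low surrogate yields just that surrogate.

```java
class CodePointAtDemo {
    // Layout by char index: 0 = 'a', 1 = high surrogate, 2 = low surrogate, 3 = 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(TEXT.codePointAt(0))); // 61
        System.out.println(Integer.toHexString(TEXT.codePointAt(1))); // 1f600
        System.out.println(Integer.toHexString(TEXT.codePointAt(2))); // de00
        System.out.println(Integer.toHexString(TEXT.codePointAt(3))); // 62
    }
}
```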

6.codePointBefore(int index)

    /**
     * Returns the character (Unicode code point) before the specified
     * index. The index refers to {@code char} values
     * (Unicode code units) and ranges from {@code 1} to {@link
     * CharSequence#length() length}.
     *
     * <p> If the {@code char} value at {@code (index - 1)}
     * is in the low-surrogate range, {@code (index - 2)} is not
     * negative, and the {@code char} value at {@code (index -
     * 2)} is in the high-surrogate range, then the
     * supplementary code point value of the surrogate pair is
     * returned. If the {@code char} value at {@code index -
     * 1} is an unpaired low-surrogate or a high-surrogate, the
     * surrogate value is returned.
     *
     * @param     index the index following the code point that should be returned
     * @return    the Unicode code point value before the given index.
     * @exception IndexOutOfBoundsException if the {@code index}
     *            argument is less than 1 or greater than the length
     *            of this string.
     * @since     1.5
     */
    public int codePointBefore(int index) {
        int i = index - 1;
        if ((i < 0) || (i >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointBeforeImpl(value, index, 0);
    }

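For now, it is enough to remember that this method returns the code point before index; code points will be covered in detail in the Character source code analysis.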
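The mirror-image behavior can be sketched as follows, again with an assumed sample string containing U+1F600 and illustrative names. An index just past the low surrogate yields the full code point; an index just past the high surrogate yields only that surrogate; index 0 is rejected because nothing precedes it.

```java
class CodePointBeforeDemo {
    // Layout by char index: 0 = 'a', 1 = high surrogate, 2 = low surrogate, 3 = 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    // Reports whether codePointBefore rejects the index.
    static boolean throwsAt(int index) {
        try {
            TEXT.codePointBefore(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(TEXT.codePointBefore(1))); // 61
        System.out.println(Integer.toHexString(TEXT.codePointBefore(2))); // d83d
        System.out.println(Integer.toHexString(TEXT.codePointBefore(3))); // 1f600
        System.out.println(throwsAt(0));                                  // true
    }
}
```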

7.codePointCount(int beginIndex, int endIndex)

    /**
     * Returns the number of Unicode code points in the specified text
     * range of this {@code String}. The text range begins at the
     * specified {@code beginIndex} and extends to the
     * {@code char} at index {@code endIndex - 1}. Thus the
     * length (in {@code char}s) of the text range is
     * {@code endIndex-beginIndex}. Unpaired surrogates within
     * the text range count as one code point each.
     *
     * @param beginIndex the index to the first {@code char} of
     * the text range.
     * @param endIndex the index after the last {@code char} of
     * the text range.
     * @return the number of Unicode code points in the specified text
     * range
     * @exception IndexOutOfBoundsException if the
     * {@code beginIndex} is negative, or {@code endIndex}
     * is larger than the length of this {@code String}, or
     * {@code beginIndex} is larger than {@code endIndex}.
     * @since  1.5
     */
    public int codePointCount(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
    }

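Returns the number of Unicode code points in the specified text range of this string. The text range begins at beginIndex and extends to the char at index endIndex - 1, so its length (in chars) is endIndex - beginIndex. Each unpaired surrogate within the range counts as one code point.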
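To illustrate the counting rule, the sketch below uses an assumed sample string with one surrogate pair (U+1F600) and an illustrative class name. Over the whole string the pair counts once; when the range cuts the pair in half, the stranded surrogate counts as a code point of its own.

```java
class CodePointCountDemo {
    // Layout by char index: 0 = 'a', 1 = high surrogate, 2 = low surrogate, 3 = 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(TEXT.length());             // 4 chars
        System.out.println(TEXT.codePointCount(0, 4)); // 3: 'a', the pair, 'b'
        System.out.println(TEXT.codePointCount(0, 2)); // 2: 'a' and an unpaired high surrogate
        System.out.println(TEXT.codePointCount(2, 4)); // 2: an unpaired low surrogate and 'b'
    }
}
```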

8.offsetByCodePoints(int index, int codePointOffset)

    /**
     * Returns the index within this {@code String} that is
     * offset from the given {@code index} by
     * {@code codePointOffset} code points. Unpaired surrogates
     * within the text range given by {@code index} and
     * {@code codePointOffset} count as one code point each.
     *
     * @param index the index to be offset
     * @param codePointOffset the offset in code points
     * @return the index within this {@code String}
     * @exception IndexOutOfBoundsException if {@code index}
     *   is negative or larger then the length of this
     *   {@code String}, or if {@code codePointOffset} is positive
     *   and the substring starting with {@code index} has fewer
     *   than {@code codePointOffset} code points,
     *   or if {@code codePointOffset} is negative and the substring
     *   before {@code index} has fewer than the absolute value
     *   of {@code codePointOffset} code points.
     * @since 1.5
     */
    public int offsetByCodePoints(int index, int codePointOffset) {
        if (index < 0 || index > value.length) {
            throw new IndexOutOfBoundsException();
        }
        return Character.offsetByCodePointsImpl(value, 0, value.length,
                index, codePointOffset);
    }

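Returns the char index within this string that is offset from the given index by codePointOffset code points. A negative codePointOffset moves backward.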
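As a rough illustration, the sketch below moves through an assumed sample string (one surrogate pair, U+1F600) by code points rather than by chars; the class and helper names are invented for this example. Stepping over the pair advances the char index by two.

```java
class OffsetByCodePointsDemo {
    // Layout by char index: 0 = 'a', 1 = high surrogate, 2 = low surrogate, 3 = 'b'.
    static final String TEXT = "a\uD83D\uDE00b";

    // Reports whether the requested move runs off either end of the string.
    static boolean throwsFor(int index, int codePointOffset) {
        try {
            TEXT.offsetByCodePoints(index, codePointOffset);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(TEXT.offsetByCodePoints(0, 1));  // 1: past 'a'
        System.out.println(TEXT.offsetByCodePoints(0, 2));  // 3: past 'a' and the pair
        System.out.println(TEXT.offsetByCodePoints(4, -2)); // 1: back over 'b' and the pair
        System.out.println(throwsFor(0, 4));                // true: only 3 code points exist
    }
}
```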

 
