How string concatenation works in C#


  • 1. The phenomenon: string does not overload operator +, yet + works
  • 2. string.Format
  • 3. StringBuilder
  • 4. String.Join
  • 5. String interpolation
  • Summary
  • References


1. The phenomenon: string does not overload operator +, yet + works

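[Image 1: sample code that adds a string and an int, with its output]
Looking at the output of the code above, adding a string and an int produces a perfectly normal result, yet the string.cs source contains no overload of operator + for string. So what is actually happening? Opening the assembly in ildasm and locating the relevant IL shows:
[Image 2: the IL for the sample, viewed in ildasm]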

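The IL shows that ToString is first called on the Int32, and then string's Concat method is called. In other words, the compiler rewrites the + expression into a Concat call at compile time. Here is the Concat method: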

//code from string.cs
public static String Concat(Object arg0, Object arg1) {
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.EndContractBlock();

    if (arg0 == null)
    {
        arg0 = String.Empty;
    }

    if (arg1==null) {
        arg1 = String.Empty;
    }
    return Concat(arg0.ToString(), arg1.ToString());
}
[System.Security.SecuritySafeCritical]  // auto-generated
public static String Concat(String str0, String str1) {
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.Ensures(Contract.Result<String>().Length ==
        (str0 == null ? 0 : str0.Length) +
        (str1 == null ? 0 : str1.Length));
    Contract.EndContractBlock();

    if (IsNullOrEmpty(str0)) {
        if (IsNullOrEmpty(str1)) {
            return String.Empty;
        }
        return str1;
    }

    if (IsNullOrEmpty(str1)) {
        return str0;
    }

    int str0Length = str0.Length;
    
    String result = FastAllocateString(str0Length + str1.Length);
    
    FillStringChecked(result, 0,        str0);
    FillStringChecked(result, str0Length, str1);
    
    return result;
}

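As the source shows, the Object overload of Concat takes two Object arguments, calls ToString on each, and then forwards to the String overload below it. That overload calls FastAllocateString to create a new String and then copies the contents of both source strings into it. What happens when three strings are added together? Let's try it: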
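[Image 3: sample code that adds three strings in two different ways]
[Image 4: the IL generated for both versions]
Both versions add three strings, but the second one ends up calling the two-argument string.Concat three times, so it creates three strings. Adding them in a single expression produces one call to the three-argument string.Concat and therefore a single allocation. String concatenation can be done in other ways too, so let's look at those as well.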
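Since the original screenshots are not available, here is a minimal sketch of the two behaviours discussed in this section. The class and method names are made up for illustration, and the exact Concat overload the compiler picks can differ between compiler versions.

```csharp
using System;

public static class ConcatDemo
{
    // Roughly what the compiler produces for `text + number`:
    // convert the int to a string, then call string.Concat.
    public static string AddNumber(string text, int number)
    {
        return string.Concat(text, number.ToString());
    }

    // A single expression: compiled to one three-argument Concat call.
    public static string JoinInOneExpression(string a, string b, string c)
    {
        return a + b + c;
    }

    // Step by step: every += is its own two-argument Concat call,
    // and each call that combines two non-empty strings allocates a new string.
    public static string JoinStepByStep(string a, string b, string c)
    {
        string result = "";
        result += a;
        result += b;
        result += c;
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(("abc" + 1) == AddNumber("abc", 1));
        Console.WriteLine(JoinInOneExpression("a", "b", "c") == JoinStepByStep("a", "b", "c"));
    }
}
```

Both comparisons print True: the results are identical, and only the number of intermediate strings differs.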

2. string.Format

The source first:

public static String Format(String format, Object arg0) {
     if (format == null)
         throw new ArgumentNullException("format");
     Contract.Ensures(Contract.Result<String>() != null);
     Contract.EndContractBlock();
     return Format(null, format, new Object[] {arg0});
 }

 public static String Format(String format, Object arg0, Object arg1) {
     if (format == null)
         throw new ArgumentNullException("format");
     Contract.Ensures(Contract.Result<String>() != null);
     Contract.EndContractBlock();
     return Format(null, format, new Object[] {arg0, arg1});
 }

 public static String Format(String format, Object arg0, Object arg1, Object arg2) {
     if (format == null)
         throw new ArgumentNullException("format");
     Contract.Ensures(Contract.Result<String>() != null);
     Contract.EndContractBlock();

     return Format(null, format, new Object[] {arg0, arg1, arg2});
 }


 public static String Format(String format, params Object[] args) {
     if (format == null || args == null)
         throw new ArgumentNullException((format == null) ? "format" : "args");
     Contract.Ensures(Contract.Result<String>() != null);
     Contract.EndContractBlock();

     return Format(null, format, args);
 }

 public static String Format( IFormatProvider provider, String format, params Object[] args) {
     if (format == null || args == null)
         throw new ArgumentNullException((format == null) ? "format" : "args");
     Contract.Ensures(Contract.Result<String>() != null);
     Contract.EndContractBlock();

     StringBuilder sb = StringBuilderCache.Acquire(format.Length + args.Length * 8);
     sb.AppendFormat(provider,format,args);
     return StringBuilderCache.GetStringAndRelease(sb);
 }

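As the code above shows, every String.Format overload eventually calls:
String Format( IFormatProvider provider, String format, params Object[] args)
That method first calls StringBuilderCache.Acquire to obtain a StringBuilder, appends the formatted output with AppendFormat, and finally calls StringBuilderCache.GetStringAndRelease(sb).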
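To make that call chain concrete, the following sketch reproduces the same steps with a plain StringBuilder. FormatByHand is a hypothetical helper, not a framework API, and it leaves out the StringBuilderCache step because that class is internal.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FormatDemo
{
    // Hand-written equivalent of String.Format(provider, format, args),
    // without the internal builder cache.
    public static string FormatByHand(IFormatProvider provider, string format, params object[] args)
    {
        if (format == null) throw new ArgumentNullException(nameof(format));
        if (args == null) throw new ArgumentNullException(nameof(args));

        // Same initial capacity estimate as the framework source shown above.
        var sb = new StringBuilder(format.Length + args.Length * 8);
        sb.AppendFormat(provider, format, args);
        return sb.ToString();
    }

    public static void Main()
    {
        string expected = string.Format(CultureInfo.InvariantCulture, "{0} has {1} items", "cart", 3);
        string actual = FormatByHand(CultureInfo.InvariantCulture, "{0} has {1} items", "cart", 3);
        Console.WriteLine(expected == actual);
    }
}
```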

So let's look at these two StringBuilderCache methods:

//stringbuildercache.cs
namespace System.Text
{
    internal static class StringBuilderCache
    {
        // The value 360 was chosen in discussion with performance experts as a compromise between using
        // as litle memory (per thread) as possible and still covering a large part of short-lived
        // StringBuilder creations on the startup path of VS designers.
        private const int MAX_BUILDER_SIZE = 360;

        [ThreadStatic]
        private static StringBuilder CachedInstance;

        public static StringBuilder Acquire(int capacity = StringBuilder.DefaultCapacity)
        {
            if(capacity <= MAX_BUILDER_SIZE)
            {
                StringBuilder sb = StringBuilderCache.CachedInstance;
                if (sb != null)
                {
                    // Avoid stringbuilder block fragmentation by getting a new StringBuilder
                    // when the requested size is larger than the current capacity
                    if(capacity <= sb.Capacity)
                    {
                        StringBuilderCache.CachedInstance = null;
                        sb.Clear();
                        return sb;
                    }
                }
            }
            return new StringBuilder(capacity);
        }

        public static void Release(StringBuilder sb)
        {
            if (sb.Capacity <= MAX_BUILDER_SIZE)
            {
                StringBuilderCache.CachedInstance = sb;
            }
        }

        public static string GetStringAndRelease(StringBuilder sb)
        {
            string result = sb.ToString();
            Release(sb);
            return result;
        }
    }
}

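The code shows that StringBuilderCache keeps one StringBuilder instance per thread (the field is [ThreadStatic]) and reuses it for requests with a capacity of 360 or less. Perhaps surprisingly, this means that framework methods such as String.Format do not allocate a fresh StringBuilder for every short string, so there is no need to add our own caching around those calls. Note that the class is internal, so application code cannot call it directly. How StringBuilder itself works is covered in the next section.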
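Because StringBuilderCache is internal, application code that wants the same behaviour has to supply its own version. The following is a minimal sketch modelled on the source above; SmallBuilderCache is a hypothetical name, and the default capacity of 16 is an assumption about StringBuilder's default.

```csharp
using System;
using System.Text;

// Hypothetical application-level helper modelled on the internal StringBuilderCache.
public static class SmallBuilderCache
{
    private const int MaxBuilderSize = 360; // same threshold as the framework source

    [ThreadStatic]
    private static StringBuilder cachedInstance; // one cached builder per thread

    public static StringBuilder Acquire(int capacity = 16)
    {
        StringBuilder sb = cachedInstance;
        if (capacity <= MaxBuilderSize && sb != null && capacity <= sb.Capacity)
        {
            cachedInstance = null; // hand the builder out exclusively
            sb.Clear();
            return sb;
        }
        return new StringBuilder(capacity);
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        if (sb.Capacity <= MaxBuilderSize)
        {
            cachedInstance = sb; // keep only small builders
        }
        return result;
    }

    public static void Main()
    {
        StringBuilder first = Acquire(64);
        first.Append("hello");
        string text = GetStringAndRelease(first);

        StringBuilder second = Acquire(32); // same thread, smaller request
        Console.WriteLine(text);
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```

On a single thread the second Acquire returns the builder released by the first, so the last line prints True.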

3. StringBuilder

As described above, String.Format calls StringBuilder's AppendFormat. The source is fairly long, so only a brief outline is given here:

internal char[] m_ChunkChars;                // The characters in this block
internal int m_ChunkLength;                  // The index in m_ChunkChars that represent the end of the block
// Only the declaration is shown here
public StringBuilder AppendFormat(IFormatProvider provider, String format, params Object[] args)

// Appends a character at the end of this string builder. The capacity is adjusted as needed.
public StringBuilder Append(char value, int repeatCount) {
    if (repeatCount<0) {
        throw new ArgumentOutOfRangeException("repeatCount", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
    }
    Contract.Ensures(Contract.Result<StringBuilder>() != null);
    Contract.EndContractBlock();

    if (repeatCount==0) {
        return this;
    }
    int idx = m_ChunkLength;
    while (repeatCount > 0)
    {
        if (idx < m_ChunkChars.Length)
        {
            m_ChunkChars[idx++] = value;
            --repeatCount;
        }
        else
        {
            m_ChunkLength = idx;
            ExpandByABlock(repeatCount);
            Contract.Assert(m_ChunkLength == 0, "Expand should create a new block");
            idx = 0;
        }
    }
    m_ChunkLength = idx;
    VerifyClassInvariant();
    return this;
}

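The two fields above hold the text stored in the StringBuilder: m_ChunkChars is the character buffer of the current chunk, and m_ChunkLength is how much of it is in use. AppendFormat recognizes the {n} placeholders in the format argument and writes the corresponding args into m_ChunkChars. When the current chunk is full, that is, when m_ChunkLength reaches m_ChunkChars.Length, ExpandByABlock is called: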

/// 
/// Assumes that 'this' is the last chunk in the list and that it is full.  Upon return the 'this'
/// block is updated so that it is a new block that has at least 'minBlockCharCount' characters.
/// that can be used to copy characters into it.   
/// 
private void ExpandByABlock(int minBlockCharCount)
{
    Contract.Requires(Capacity == Length, "Expand expect to be called only when there is no space left");        // We are currently full
    Contract.Requires(minBlockCharCount > 0, "Expansion request must be positive");

    VerifyClassInvariant();

    if ((minBlockCharCount + Length) > m_MaxCapacity)
        throw new ArgumentOutOfRangeException("requiredLength", Environment.GetResourceString("ArgumentOutOfRange_SmallCapacity"));

    // Compute the length of the new block we need 
    // We make the new chunk at least big enough for the current need (minBlockCharCount)
    // But also as big as the current length (thus doubling capacity), up to a maximum
    // (so we stay in the small object heap, and never allocate really big chunks even if
    // the string gets really big. 
    int newBlockLength = Math.Max(minBlockCharCount, Math.Min(Length, MaxChunkSize));

    // Copy the current block to the new block, and initialize this to point at the new buffer. 
    m_ChunkPrevious = new StringBuilder(this);
    m_ChunkOffset += m_ChunkLength;
    m_ChunkLength = 0;

    // Check for integer overflow (logical buffer size > int.MaxInt)
    if (m_ChunkOffset + newBlockLength < newBlockLength)
    {
        m_ChunkChars = null;
        throw new OutOfMemoryException();
    }
    m_ChunkChars = new char[newBlockLength];

    VerifyClassInvariant();
}

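The method computes newBlockLength, moves the current contents into a previous chunk (m_ChunkPrevious), and allocates a fresh buffer with m_ChunkChars = new char[newBlockLength];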
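The sizing rule is easier to see with numbers. This sketch evaluates the same expression outside the class; the MaxChunkSize value of 8000 is an assumption taken from the .NET Framework reference source and is not shown in the excerpt above.

```csharp
using System;

public static class ChunkGrowthDemo
{
    // Assumed value of StringBuilder.MaxChunkSize in the .NET Framework reference source.
    private const int MaxChunkSize = 8000;

    // Same expression as in ExpandByABlock: at least what is needed now,
    // otherwise as big as the current length (doubling), capped at MaxChunkSize.
    public static int NewBlockLength(int currentLength, int minBlockCharCount)
    {
        return Math.Max(minBlockCharCount, Math.Min(currentLength, MaxChunkSize));
    }

    public static void Main()
    {
        Console.WriteLine(NewBlockLength(16, 1));     // 16: capacity doubles
        Console.WriteLine(NewBlockLength(16, 100));   // 100: the request is larger than the current length
        Console.WriteLine(NewBlockLength(20000, 1));  // 8000: capped by the chunk limit
    }
}
```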
Finally, ToString() is usually called:

[System.Security.SecuritySafeCritical]  // auto-generated
  public override String ToString() {
      Contract.Ensures(Contract.Result<String>() != null);

      VerifyClassInvariant();
      
      if (Length == 0)
          return String.Empty;

      string ret = string.FastAllocateString(Length);
      StringBuilder chunk = this;
      unsafe {
          fixed (char* destinationPtr = ret)
          {
              do
              {
                  if (chunk.m_ChunkLength > 0)
                  {
                      // Copy these into local variables so that they are stable even in the presence of ----s (hackers might do this)
                      char[] sourceArray = chunk.m_ChunkChars;
                      int chunkOffset = chunk.m_ChunkOffset;
                      int chunkLength = chunk.m_ChunkLength;

                      // Check that we will not overrun our boundaries. 
                      if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
                      {
                          fixed (char* sourcePtr = sourceArray)
                              string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
                      }
                      else
                      {
                          throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
                      }
                  }
                  chunk = chunk.m_ChunkPrevious;
              } while (chunk != null);
          }
      }
      return ret;
  }

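ToString also creates a new string object. It then walks the chunk list through m_ChunkPrevious and uses string.wstrcpy inside unsafe code to copy each chunk's m_ChunkChars into the new string.

So when using StringBuilder, it is best to work out the required size up front. Otherwise frequent growth allocates extra chunks and adds GC pressure, and every ToString call allocates one more string on top of that.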
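As an illustration of sizing the builder up front, the sketch below computes the final length before appending anything. BuildCsv is a hypothetical helper and assumes none of the fields is null.

```csharp
using System;
using System.Text;

public static class PresizeDemo
{
    public static string BuildCsv(string[] fields)
    {
        // Exact final length: all fields plus one comma between neighbours.
        int total = Math.Max(0, fields.Length - 1);
        foreach (string field in fields)
        {
            total += field.Length;
        }

        // With the capacity known in advance, the appends should fit in the first chunk.
        var sb = new StringBuilder(total);
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(fields[i]);
        }

        // ToString still allocates the final string and copies into it.
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildCsv(new[] { "a", "bb", "ccc" })); // a,bb,ccc
    }
}
```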

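4. String.Join

First, a typical use:
[Image 5: sample code that calls String.Join on a collection, with its output]
As the example shows, String.Join conveniently combines the elements of a collection into one string with a separator between them.
Now let's look at how it is implemented: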

// Joins an array of strings together as one string with a separator between each original string.
public static String Join(String separator, params String[] value) {
    if (value==null)
        throw new ArgumentNullException("value");
    Contract.EndContractBlock();
    return Join(separator, value, 0, value.Length);
}

[System.Security.SecuritySafeCritical]  // auto-generated
public unsafe static String Join(String separator, String[] value, int startIndex, int count) {
    // Partial code; the jointLength computation is omitted
    string jointString = FastAllocateString( jointLength );
    fixed (char * pointerToJointString = &jointString.m_firstChar) {
        UnSafeCharBuffer charBuffer = new UnSafeCharBuffer( pointerToJointString, jointLength);                
        
        // Append the first string first and then append each following string prefixed by the separator.
        charBuffer.AppendString( value[startIndex] );
        for (int stringToJoinIndex = startIndex + 1; stringToJoinIndex <= endIndex; stringToJoinIndex++) {
            charBuffer.AppendString( separator );
            charBuffer.AppendString( value[stringToJoinIndex] );
        }
        Contract.Assert(*(pointerToJointString + charBuffer.Length) == '\0', "String must be null-terminated!");
    }
    return jointString;
}

For a String array, Join uses unsafe code: it allocates jointString once at its final length and then, through an UnSafeCharBuffer and pointer arithmetic, appends each string into it.
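The same two-pass idea (measure first, then copy into one buffer) can be sketched in safe code. This is an illustration only, not the framework's implementation: it goes through a char[] and therefore copies once more than the real method, and it ignores integer overflow on very large inputs.

```csharp
using System;

public static class TwoPassJoinDemo
{
    public static string JoinTwoPass(string separator, string[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Length == 0) return string.Empty;
        separator = separator ?? string.Empty;

        // Pass 1: compute the total length of the result.
        int total = separator.Length * (values.Length - 1);
        foreach (string value in values)
        {
            total += value == null ? 0 : value.Length;
        }

        // Pass 2: copy every piece to its final position.
        char[] buffer = new char[total];
        int position = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, position, separator.Length);
                position += separator.Length;
            }
            string value = values[i];
            if (value != null)
            {
                value.CopyTo(0, buffer, position, value.Length);
                position += value.Length;
            }
        }
        return new string(buffer);
    }

    public static void Main()
    {
        string[] parts = { "a", "b", "c" };
        Console.WriteLine(JoinTwoPass(", ", parts) == string.Join(", ", parts));
    }
}
```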

[ComVisible(false)]
public static String Join(String separator, params Object[] values) {
    if (values==null)
        throw new ArgumentNullException("values");
    Contract.EndContractBlock();

    if (values.Length == 0 || values[0] == null)
        return String.Empty;

    if (separator == null)
        separator = String.Empty;

    StringBuilder result = StringBuilderCache.Acquire();

    String value = values[0].ToString();           
    if (value != null)
        result.Append(value);

    for (int i = 1; i < values.Length; i++) {
        result.Append(separator);
        if (values[i] != null) {
            // handle the case where their ToString() override is broken
            value = values[i].ToString();
            if (value != null)
                result.Append(value);
        }
    }
    return StringBuilderCache.GetStringAndRelease(result);
}

[ComVisible(false)]
public static String Join<T>(String separator, IEnumerable<T> values) {
    if (values == null)
        throw new ArgumentNullException("values");
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.EndContractBlock();

    if (separator == null)
        separator = String.Empty;

    using(IEnumerator<T> en = values.GetEnumerator()) {
        if (!en.MoveNext())
            return String.Empty;

        StringBuilder result = StringBuilderCache.Acquire();
        if (en.Current != null) {
            // handle the case that the enumeration has null entries
            // and the case where their ToString() override is broken
            string value = en.Current.ToString();
            if (value != null)
                result.Append(value);
        }

        while (en.MoveNext()) {
            result.Append(separator);
            if (en.Current != null) {
                // handle the case that the enumeration has null entries
                // and the case where their ToString() override is broken
                string value = en.Current.ToString();
                if (value != null)
                    result.Append(value);
            }
        }            
        return StringBuilderCache.GetStringAndRelease(result);
    }
}

[ComVisible(false)]
public static String Join(String separator, IEnumerable<String> values) {
    if (values == null)
        throw new ArgumentNullException("values");
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.EndContractBlock();

    if (separator == null)
        separator = String.Empty;


    using(IEnumerator<String> en = values.GetEnumerator()) {
        if (!en.MoveNext())
            return String.Empty;

        StringBuilder result = StringBuilderCache.Acquire();
        if (en.Current != null) {
            result.Append(en.Current);
        }

        while (en.MoveNext()) {
            result.Append(separator);
            if (en.Current != null) {
                result.Append(en.Current);
            }
        }            
        return StringBuilderCache.GetStringAndRelease(result);
    }           
}

The three methods here:
public static String Join(String separator, params Object[] values)
public static String Join<T>(String separator, IEnumerable<T> values)
public static String Join(String separator, IEnumerable<String> values)

all build the result with a StringBuilder obtained from StringBuilderCache.
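A short usage sketch of the overloads discussed in this section may help; the wrapper method names are made up, and which overload the compiler binds to follows from the argument types.

```csharp
using System;
using System.Collections.Generic;

public static class JoinOverloadsDemo
{
    // String[] overload: the unsafe single-allocation path.
    public static string FromArray()
    {
        return string.Join("-", new[] { "a", "b", "c" });
    }

    // params Object[] overload: each element is converted with ToString.
    public static string FromObjects()
    {
        return string.Join("-", 1, "two", 'c');
    }

    // Generic IEnumerable<T> overload.
    public static string FromNumbers()
    {
        return string.Join(",", new List<int> { 1, 2, 3 });
    }

    // IEnumerable<String> overload.
    public static string FromStringList()
    {
        return string.Join(",", new List<string> { "x", "y" });
    }

    public static void Main()
    {
        Console.WriteLine(FromArray());      // a-b-c
        Console.WriteLine(FromObjects());    // 1-two-c
        Console.WriteLine(FromNumbers());    // 1,2,3
        Console.WriteLine(FromStringList()); // x,y
    }
}
```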

5. String interpolation

string userName = "";
string date = DateTime.Today.ToShortDateString();

// Use string interpolation to concatenate strings.
string str = $"Hello {userName}. Today is {date}.";
System.Console.WriteLine(str);

str = $"{str} How are you today?";
System.Console.WriteLine(str);

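Starting with C# 10, string interpolation can be used to initialize a constant string when every expression used in a placeholder is itself a constant string. In some expressions, interpolation is the simpler way to concatenate strings. So what does interpolation compile to in IL?
[Image 6: the IL generated for the interpolation sample]
In this example, where every placeholder is a string, interpolation compiles to a call to string.Concat. Interpolations that contain non-string values or format specifiers are generally compiled to String.Format instead, or to an interpolated string handler on newer compilers and runtimes.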
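The following sketch shows the source-level equivalents of both cases. It only demonstrates that the results match; the exact IL depends on the compiler and runtime version, and the method names are made up for illustration.

```csharp
using System;

public static class InterpolationDemo
{
    // All placeholders are strings: typically compiled to string.Concat.
    public static string Greeting(string userName, string date)
    {
        return $"Hello {userName}. Today is {date}.";
    }

    // The hand-written Concat form of the same expression.
    public static string GreetingWithConcat(string userName, string date)
    {
        return string.Concat("Hello ", userName, ". Today is ", date, ".");
    }

    // A format specifier forces real formatting, as with String.Format("{0:X}", value).
    public static string Hex(int value)
    {
        return $"{value:X}";
    }

    public static void Main()
    {
        Console.WriteLine(Greeting("Ada", "2024-01-01") == GreetingWithConcat("Ada", "2024-01-01"));
        Console.WriteLine(Hex(255) == string.Format("{0:X}", 255));
    }
}
```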

Summary

This article covered five ways to concatenate strings and how each one works, so that the right one can be chosen for each situation. Recommendations:

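  1. Concat, the + operator, and interpolation of string values all end up in string.Concat. Each call creates a new string and copies into it, so these are not suitable for loops or for large numbers of concatenations.
  2. string.Format is built on StringBuilder, as are the Object and IEnumerable overloads of String.Join. When using a StringBuilder directly, watch out for growth; inside the framework, builders with a capacity of 360 or less are served from a per-thread cache.
  3. For arrays, lists, and other collections that need a separator, use String.Join.
  4. The Console.WriteLine overloads that take a format string and arguments also end up calling String.Format.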
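As a rough illustration of recommendations 1 to 3, the sketch below builds the same kind of text in three ways. The method names are hypothetical, and no timings are given because they depend on the runtime and the input size.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ChoosingDemo
{
    // += in a loop: one Concat call per iteration, each producing a new, longer string.
    public static string WithPlus(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i.ToString();
        }
        return result;
    }

    // StringBuilder: appends go into chunk buffers; one final string is created by ToString.
    public static string WithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i);
        }
        return sb.ToString();
    }

    // String.Join: the natural choice when a separator is needed between items.
    public static string WithJoin(IEnumerable<int> numbers)
    {
        return string.Join(",", numbers);
    }

    public static void Main()
    {
        Console.WriteLine(WithPlus(5));                   // 01234
        Console.WriteLine(WithBuilder(5));                // 01234
        Console.WriteLine(WithJoin(new[] { 0, 1, 2 }));   // 0,1,2
    }
}
```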

References

Source download: Download .NET Framework 4.5.1
