Learn Roslyn Now [Not Original]

Excerpted from: Shotgun Debugging, Learn Roslyn Now

Learn Roslyn Now

Learn Roslyn Now is a blog series that explores Microsoft’s Roslyn compiler API. My aim with this series is to introduce people to the power of Roslyn through small self-contained examples. I’ve taken inspiration from LearnVSXNow, a series by Istvan Novak that walks people through Visual Studio Extensibility.

  • Part 1: Installing Roslyn
  • Part 2: Analyzing Syntax Trees With LINQ
  • Part 3: Syntax Nodes and Syntax Tokens
  • Part 4: CSharpSyntaxWalker
  • Part 5: CSharpSyntaxRewriter
  • Part 6: Working with Workspaces
  • Part 7: Introducing the Semantic Model
  • Part 8: Data Flow Analysis
  • Part 9: Control Flow Analysis
  • Part 10: Introduction to Analyzers
  • Part 11: Introduction to Code Fixes
  • Part 12: The DocumentEditor
  • Part 13: Syntax Annotations
  • Part 14: Introduction to the Scripting API
  • Part 15: The SymbolVisitor
  • Part 16: The Emit API

Learn Roslyn Now TV

  • Episode 01: Introduction to Roslyn Tooling
  • Episode 02: The Syntax Tree API
  • Episode 03: The SyntaxWalker API
  • Episode 04: The CSharpSyntaxRewriter API
  • Episode 05: The Semantic Model and Symbol API
  • Episode 06: The MSBuildWorkspace
  • Episode 07: The VisualStudioWorkspace
  • Episode 08: The AdhocWorkspace
  • Episode 09: Introduction To Analyzers

Learn Roslyn Now: Quick Tips

  • Working with Regions
  • Fields and Symbols
  • Working with nameof
  • Don’t trust SyntaxNode.ToFullString()
  • PCL References and MSBuildWorkspace
  • Bridging Visual Studio and Roslyn
  • Enabling C# 7 Features in Roslyn

If the original site cannot be reached, you can refer to the text I have copied over below:

Learn Roslyn Now: Part 1 Getting Roslyn

Roslyn is deployed as a NuGet package.
Navigate to: Tools > NuGet Package Manager > Package Manager Console
**Paste the following:** Install-Package Microsoft.CodeAnalysis

Learn Roslyn Now: Part 2 Analyzing Syntax Trees with LINQ

Note: I’ve also created a ten-minute video to explore the Syntax Tree API

I won’t spend much time explaining Syntax Trees. There are a number of posts that deal with that including the Roslyn Whitepaper. The main idea is that given a string containing C# code, the compiler creates a tree representation (called a Syntax Tree) of the string. Roslyn’s power is that it allows us to query this Syntax Tree with LINQ.

Here is a sample in which we use Roslyn to create a Syntax Tree from a string. We must add references to Microsoft.CodeAnalysis and Microsoft.CodeAnalysis.CSharp. You can do so using the NuGet package described in Part 1: Installing Roslyn.

using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var tree = CSharpSyntaxTree.ParseText(@"
    public class MyClass
    {
        public void MyMethod()
        {
        }
    }");

var syntaxRoot = tree.GetRoot();
var MyClass = syntaxRoot.DescendantNodes().OfType<ClassDeclarationSyntax>().First();
var MyMethod = syntaxRoot.DescendantNodes().OfType<MethodDeclarationSyntax>().First();

Console.WriteLine(MyClass.Identifier.ToString());
Console.WriteLine(MyMethod.Identifier.ToString());

We first start by parsing a string containing C# code and getting the root of this syntax tree. From this point it’s extremely easy to retrieve elements we’d like using LINQ. Given the root of the tree, we look at all the descendant objects and filter them by their type. While we’ve only used ClassDeclarationSyntax and MethodDeclarationSyntax there are corresponding pieces of syntax for any C# feature.

Visual Studio’s Intellisense is extremely valuable for exploring the various types of C# syntax we can use.

We can compose more advanced LINQ expressions as one might expect:

var tree = CSharpSyntaxTree.ParseText(@"
    public class MyClass
    {
        public void MyMethod()
        {
        }
        public void MyMethod(int n)
        {
        }
    }");

var syntaxRoot = tree.GetRoot();
var MyMethod = syntaxRoot.DescendantNodes().OfType<MethodDeclarationSyntax>()
    .Where(n => n.ParameterList.Parameters.Any()).First();

//Find the type that contains this method
var containingType = MyMethod.Ancestors().OfType<TypeDeclarationSyntax>().First();

Console.WriteLine(containingType.Identifier.ToString());
Console.WriteLine(MyMethod.ToString());

Above, we start by finding all methods, and then filtering by those that accept parameters. We then take this method and work our way upwards through the tree with the Ancestors() method, searching for the first type that contains this method.

Hopefully this acts as a base for you to play around and explore the Syntax Tree API. There are some limitations to the kind of information you can discover at a purely syntactical level and to overcome these we must make use of Roslyn’s Semantic Model, which will be the subject of future posts.
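As one more illustration of querying a syntax tree with LINQ, here is a minimal sketch that lists every public method together with its containing class. The names PublicMethodFinder and FindPublicMethods are made up for this example; only the Roslyn calls themselves come from the library. It assumes the Microsoft.CodeAnalysis package from Part 1 is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper class, not part of Roslyn.
public static class PublicMethodFinder
{
    // Returns "ClassName.MethodName" for every method declared with the public modifier.
    public static List<string> FindPublicMethods(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();

        return root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            // Modifiers is a list of tokens; keep methods that carry the 'public' keyword.
            .Where(m => m.Modifiers.Any(t => t.IsKind(SyntaxKind.PublicKeyword)))
            // Walk upwards to the nearest enclosing class, as with Ancestors() above.
            .Select(m => new
            {
                Method = m,
                Class = m.Ancestors().OfType<ClassDeclarationSyntax>().FirstOrDefault()
            })
            .Where(x => x.Class != null)
            .Select(x => x.Class.Identifier.Text + "." + x.Method.Identifier.Text)
            .ToList();
    }

    public static void Main()
    {
        var names = FindPublicMethods(@"
            public class MyClass
            {
                public void MyMethod() { }
                private void Hidden() { }
            }");

        foreach (var name in names)
            Console.WriteLine(name);   // MyClass.MyMethod
    }
}
```

Note that this is still purely syntactic: a method is "public" here only because the keyword appears in its declaration, which is exactly the kind of limitation the semantic model removes.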

Learn Roslyn Now: Part 3 Syntax Nodes and Syntax Tokens

Syntax trees are made up of three things: Syntax Nodes, Syntax Tokens and Trivia.

The Roslyn documentation describes Syntax Nodes and Syntax Tokens as follows:

Syntax nodes are one of the primary elements of syntax trees. These nodes represent syntactic constructs such as declarations, statements, clauses, and expressions. Each category of syntax nodes is represented by a separate class derived from SyntaxNode.

Syntax tokens are the terminals of the language grammar, representing the smallest syntactic fragments of the code. They are never parents of other nodes or tokens. Syntax tokens consist of keywords, identifiers, literals, and punctuation.

While both definitions are accurate, they don’t give newcomers much insight on the difference between the two.

Let’s take a look at the following class and its Syntax Tree.

class SimpleClass
{
    public void SimpleMethod()
    {

    }
}

Using Roslyn’s Syntax Visualizer, we can take a peek at the syntax tree:

[Image: the syntax tree for SimpleClass as shown in the Syntax Visualizer]

The Syntax Visualizer shows Syntax Nodes in blue and Syntax Tokens in green.

Syntax Nodes:
ClassDeclaration
MethodDeclaration
ParameterList
Block

Syntax Tokens:
class
SimpleClass
Punctuation
void
SimpleMethod

Syntax Tokens cannot be broken into simpler pieces. They are the atomic units that make up a C# program. They are the leaves of a syntax tree. They always have a parent Syntax Node (as their parent cannot be a Syntax Token).

Syntax Nodes, on the other hand, are combinations of other Syntax Nodes and Syntax Tokens. They can always be broken into smaller pieces. In my experience, you’re most interested in Syntax Nodes when trying to reason about a syntax tree.
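The distinction can also be seen in code. The sketch below (the class name NodesAndTokens and the method Describe are invented for this example) lists the descendant nodes and tokens of SimpleClass separately. It relies on DescendantNodes(), DescendantTokens() and the Parent property of a token, and assumes the Microsoft.CodeAnalysis package is referenced.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical demo class, not part of Roslyn.
public static class NodesAndTokens
{
    // Produces one line per node and one line per token found under the root.
    public static List<string> Describe(string source)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var lines = new List<string>();

        // Nodes are made of smaller pieces: each has child nodes and/or tokens.
        foreach (var node in root.DescendantNodes())
        {
            lines.Add("Node:  " + node.Kind() +
                      " (" + node.ChildNodesAndTokens().Count + " children)");
        }

        // Tokens are leaves: each reports the node it belongs to via Parent.
        foreach (var token in root.DescendantTokens())
        {
            lines.Add("Token: " + token.Kind() + " '" + token.Text +
                      "' (parent: " + token.Parent.Kind() + ")");
        }

        return lines;
    }

    public static void Main()
    {
        var lines = Describe(@"
class SimpleClass
{
    public void SimpleMethod()
    {
    }
}");

        foreach (var line in lines)
            Console.WriteLine(line);
    }
}
```

Every token line reports a parent node, while no token ever appears as a parent, which matches the description above.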

Learn Roslyn Now: Part 4 CSharpSyntaxWalker

In Part 2: Analyzing Syntax Trees With LINQ, we explored different approaches to picking apart pieces of the syntax tree. This approach works well when you’re only interested in specific pieces of syntax (methods, classes, throw statements, etc.). It’s great for singling out certain parts of the syntax tree for further investigation.

However, sometimes you’d like to operate on all nodes and tokens within a tree. Alternatively, the order in which you visit these nodes might be important. Perhaps you’re trying to convert C# into VB.Net. Or maybe you’d like to analyze a C# file and output a static HTML file with correct colorization. Both of these programs would require us to visit all nodes and tokens within a syntax tree in the correct order.

The abstract class CSharpSyntaxWalker allows us to construct our own syntax walker that can visit all nodes, tokens and trivia. We can simply inherit from CSharpSyntaxWalker and override the Visit() method to visit all nodes within the tree.

public class CustomWalker : CSharpSyntaxWalker
{
    static int Tabs = 0;
    public override void Visit(SyntaxNode node)
    {
        Tabs++;
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + node.Kind());
        base.Visit(node);
        Tabs--;
    }
}

static void Main(string[] args)
{
    var tree = CSharpSyntaxTree.ParseText(@"
        public class MyClass
        {
            public void MyMethod()
            {
            }
            public void MyMethod(int n)
            {
            }
        }");
    
    var walker = new CustomWalker();
    walker.Visit(tree.GetRoot());
}

This short sample contains an implementation of CSharpSyntaxWalker called CustomWalker. CustomWalker overrides the Visit() method and prints the type of the node being currently visited. It’s important to note that CustomWalker.Visit() also calls the base.Visit(SyntaxNode) method. This allows the CSharpSyntaxWalker to visit all the child nodes of the current node.

The output for this program:


[Image: console output listing each node’s kind, indented by its depth in the tree]

We can clearly see the various nodes of the syntax tree and their relationship with one another. There are two sibling MethodDeclarations who share the same parent ClassDeclaration.

The above example only visits the nodes of a syntax tree, but we can modify CustomWalker to visit tokens and trivia as well. The abstract class CSharpSyntaxWalker has a constructor that allows us to specify the depth to which we want to visit.

We can modify the above sample to print out the nodes and their corresponding tokens at each depth of the syntax tree.

public class DeeperWalker : CSharpSyntaxWalker
{
    static int Tabs = 0;
    //NOTE: Make sure you invoke the base constructor with 
    //the correct SyntaxWalkerDepth. Otherwise VisitToken()
    //will never get run.
    public DeeperWalker() : base(SyntaxWalkerDepth.Token)
    {
    }
    public override void Visit(SyntaxNode node)
    {
        Tabs++;
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + node.Kind());
        base.Visit(node);
        Tabs--;
    }

    public override void VisitToken(SyntaxToken token)
    {
        var indents = new String('\t', Tabs);
        Console.WriteLine(indents + token);
        base.VisitToken(token);
    }
}

Note: It’s important to pass the appropriate SyntaxWalkerDepth argument to CSharpSyntaxWalker. Otherwise, the overridden VisitToken() method is never called. Personally, I don’t think CSharpSyntaxWalker’s arguments should be optional. It was unclear to me that the most conservative depth would be walked when I was learning how to use this class.

The output when we use this CSharpSyntaxWalker:


[Image: console output listing each node’s kind together with its tokens]

The previous sample and this one share the same syntax tree. The output contains the same syntax nodes, but we’ve added the corresponding syntax tokens for each node.
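The same constructor argument controls whether trivia is visited. As a rough sketch (CommentWalker and CommentWalkerDemo are names invented for this example), a walker constructed with SyntaxWalkerDepth.Trivia can override VisitTrivia() to collect comments:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical walker that gathers the text of every comment it encounters.
public class CommentWalker : CSharpSyntaxWalker
{
    public List<string> Comments { get; } = new List<string>();

    // Without SyntaxWalkerDepth.Trivia, VisitTrivia() would never be called.
    public CommentWalker() : base(SyntaxWalkerDepth.Trivia)
    {
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        // Whitespace and end-of-line trivia are visited too, so filter by kind.
        if (trivia.IsKind(SyntaxKind.SingleLineCommentTrivia) ||
            trivia.IsKind(SyntaxKind.MultiLineCommentTrivia))
        {
            Comments.Add(trivia.ToString());
        }
        base.VisitTrivia(trivia);
    }
}

public static class CommentWalkerDemo
{
    public static void Main()
    {
        var tree = CSharpSyntaxTree.ParseText(@"
            public class MyClass
            {
                // A comment attached to MyMethod
                public void MyMethod() { }
            }");

        var walker = new CommentWalker();
        walker.Visit(tree.GetRoot());

        foreach (var comment in walker.Comments)
            Console.WriteLine(comment);   // // A comment attached to MyMethod
    }
}
```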

In the above examples, we’ve visited all nodes and all tokens within a syntax tree. However, sometimes we’d only like to visit certain nodes, but in the predefined order that the CSharpSyntaxWalker provides. Thankfully the API allows us to filter the nodes we’d like to visit based on their syntax.

Instead of visiting all nodes as we did in previous samples, the following only visits ClassDeclarationSyntax and MethodDeclarationSyntax nodes. It’s extremely simple, just printing out the concatenation of the class’ name with the method’s name.

public class ClassMethodWalker : CSharpSyntaxWalker
{
    string className = String.Empty;
    public override void VisitClassDeclaration(ClassDeclarationSyntax node)
    {
        className = node.Identifier.ToString();
        base.VisitClassDeclaration(node);
    }

    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        string methodName = node.Identifier.ToString();
        Console.WriteLine(className + '.' + methodName);
        base.VisitMethodDeclaration(node);
    }
}

static void Main(string[] args)
{
    var tree = CSharpSyntaxTree.ParseText(@"
    public class MyClass
    {
        public void MyMethod()
        {
        }
    }
    public class MyOtherClass
    {
        public void MyMethod(int n)
        {
        }
    }
   ");

    var walker = new ClassMethodWalker();
    walker.Visit(tree.GetRoot());
}

This sample simply outputs:
MyClass.MyMethod
MyOtherClass.MyMethod

The CSharpSyntaxWalker acts as a really great API for analyzing syntax trees. It allows one to accomplish a lot without resorting to using the semantic model and forcing a (possibly) expensive compilation. Whenever inspecting syntax trees and order is important, the CSharpSyntaxWalker is usually what you’re looking for.

Learn Roslyn Now: Part 5 CSharpSyntaxRewriter

In Part 4, we discussed the abstract CSharpSyntaxWalker and how we could navigate the syntax tree with the visitor pattern. Today, we go one step further with the CSharpSyntaxRewriter, and “modify” the syntax tree as we traverse it. It’s important to note that we’re not actually mutating the original syntax tree, as Roslyn’s syntax trees are immutable. Instead, the CSharpSyntaxRewriter creates a new syntax tree resulting from our changes.

The CSharpSyntaxRewriter can visit all nodes, tokens or trivia within a syntax tree. Like the CSharpSyntaxVisitor, we can selectively choose what pieces of syntax we’d like to visit. We do this by overriding various methods and returning one of the following:

  • The original, unchanged node, token or trivia.
  • Null, signalling the node, token or trivia is to be removed.
  • A new syntax node, token or trivia.
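To make the first and third options concrete, here is a small sketch of a rewriter that returns a new node for some methods and leaves the rest alone. MethodRenamer and MethodRenamerDemo are hypothetical names chosen for this illustration. Note that it only renames the declaration, not any call sites, since finding those reliably would require the semantic model.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical rewriter: renames method declarations called oldName to newName.
public class MethodRenamer : CSharpSyntaxRewriter
{
    private readonly string _oldName;
    private readonly string _newName;

    public MethodRenamer(string oldName, string newName)
    {
        _oldName = oldName;
        _newName = newName;
    }

    public override SyntaxNode VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        // Not the method we are after: let the base class carry on as usual.
        if (node.Identifier.Text != _oldName)
            return base.VisitMethodDeclaration(node);

        // Build a replacement identifier, copying trivia so spacing and comments survive.
        var newIdentifier = SyntaxFactory.Identifier(_newName)
            .WithLeadingTrivia(node.Identifier.LeadingTrivia)
            .WithTrailingTrivia(node.Identifier.TrailingTrivia);

        // WithIdentifier() returns a new node; the original tree is untouched.
        return node.WithIdentifier(newIdentifier);
    }
}

public static class MethodRenamerDemo
{
    public static string Rename(string source, string oldName, string newName)
    {
        var root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return new MethodRenamer(oldName, newName).Visit(root).ToFullString();
    }

    public static void Main()
    {
        Console.WriteLine(Rename(
            "public class Sample { public void Foo() { } public void Bar() { } }",
            "Foo", "Baz"));
        // public class Sample { public void Baz() { } public void Bar() { } }
    }
}
```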

As with most APIs, the CSharpSyntaxRewriter is best understood through examples. A recent question on Stack Overflow asked How can I remove redundant semicolons in code with SyntaxRewriter?

Roslyn treats all redundant semicolons as part of an EmptyStatementSyntax node. Below, we demonstrate how to solve the base case: an unnecessary semicolon on a line of its own.

using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class EmptyStatementRemoval : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitEmptyStatement(EmptyStatementSyntax node)
    {
        //Simply remove all Empty Statements
        return null;
    }
}

public static void Main(string[] args)
{
    //A syntax tree with an unnecessary semicolon on its own line
    var tree = CSharpSyntaxTree.ParseText(@"
    public class Sample
    {
       public void Foo()
       {
          Console.WriteLine();
          ;
        }
    }");

    var rewriter = new EmptyStatementRemoval();
    var result = rewriter.Visit(tree.GetRoot());
    Console.WriteLine(result.ToFullString());
}

This program outputs the same class without the redundant semicolon.

public class Sample
{
   public void Foo()
   {
      Console.WriteLine();
    }
}

However, odulkanberoglu points out some problems with this approach. When either leading or trailing trivia is present, this trivia is removed. This means comments above and below the semicolon will be stripped out.

svick has a pretty clever workaround. By constructing an EmptyStatementSyntax with a missing token instead of a semicolon, we can manage to remove the semicolon from the original tree. His approach is demonstrated below:

public class EmptyStatementRemoval : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitEmptyStatement(EmptyStatementSyntax node)
    {
        //Construct an EmptyStatementSyntax with a missing semicolon
        return node.WithSemicolonToken(
            SyntaxFactory.MissingToken(SyntaxKind.SemicolonToken)
                .WithLeadingTrivia(node.SemicolonToken.LeadingTrivia)
                .WithTrailingTrivia(node.SemicolonToken.TrailingTrivia));
    }
}

public static void Main(string[] args)
{
    var tree = CSharpSyntaxTree.ParseText(@"
    public class Sample
    {
       public void Foo()
       {
          Console.WriteLine();
          #region SomeRegion
          //Some other code
          #endregion
          ;
        }
    }");

    var rewriter = new EmptyStatementRemoval();
    var result = rewriter.Visit(tree.GetRoot());
    Console.WriteLine(result.ToFullString());
}

The output of this approach is:

public class Sample
{
   public void Foo()
   {
      Console.WriteLine();
      #region SomeRegion
      //Some other code
      #endregion

    }
}

This approach has the side effect of leaving a blank line wherever there was a redundant semicolon. That being said, I think it’s probably worth the trade-off as there doesn’t seem to be a way to retain trivia otherwise. Ultimately, the trivia can only be retained by attaching it to a node, and then returning that node.

An aside: I suspect this will be the de facto approach to removing any syntax nodes in the future. It’s highly likely that any syntax node one might wish to remove might have associated comment trivia. The only way to remove the node while retaining the trivia is to construct a replacement node. The best candidate for replacement will likely be an EmptyStatementSyntax with a missing semicolon.

This might also indicate a limitation with the CSharpSyntaxRewriter. It seems like it should be easier to remove nodes, while retaining their trivia.

Learn Roslyn Now: Part 6 Working with Workspaces

Special thanks to @JasonMalinowski for his help clarifying some of the subtleties of the workspace API.

Until this point, we’ve simply been constructing syntax trees from strings. This approach works well when creating short samples, but often we’d like to work with entire solutions. Enter: Workspaces.

Workspaces are the root node of a C# hierarchy that consists of a solution, child projects and child documents. A fundamental tenet within Roslyn is that most objects are immutable. This means we can’t hold on to a reference to a solution and expect it to be up-to-date forever. The moment a change is made, this solution will be out of date and a new, updated solution will have been created.

Workspaces are our root node. Unlike solutions, projects and documents, they won’t become invalid and always contain a reference to the current, most up-to-date solution.

There are four Workspace variants to consider:

Workspace

The abstract base class for all other workspaces. It’s a little disingenuous to claim that it’s a workspace variant, as you’ll never actually have an instance of it. Instead, this class serves as a sort of API around which actual workspace implementations can be created. It can be tempting to think of workspaces solely within the context of Visual Studio. After all, for most C# developers this is the only way we’ve dealt with solutions and projects. However, Workspace is meant to be agnostic as to the physical source of the files it represents. Individual implementations might store the files on the local filesystem, within a database, or even on a remote machine. One simply inherits from this class and overrides Workspace’s empty implementations as they see fit.

MSBuildWorkspace

A workspace that has been built to handle MSBuild solution (.sln) and project (.csproj, .vbproj) files. Unfortunately it cannot currently write to .sln files, which means we can’t use it to add projects or create new solutions.

The following example shows how we can iterate over all the documents in a solution:

string solutionPath = @"C:\Users\…\PathToSolution\MySolution.sln";
var msWorkspace = MSBuildWorkspace.Create();

var solution = msWorkspace.OpenSolutionAsync(solutionPath).Result;
foreach (var project in solution.Projects)
{
    foreach (var document in project.Documents)
    {
        Console.WriteLine(project.Name + "\t\t\t" + document.Name);
    }
}

For more information see Learn Roslyn Now – E06 – MSBuildWorkspace.

AdhocWorkspace

A workspace that allows one to add solution and project files manually. One should note that the API for adding and removing solution items is different within AdhocWorkspace when compared to the other workspaces. Instead of calling TryApplyChanges(), methods for adding projects and documents are provided at the workspace level. This workspace is meant to be consumed by those who just need a quick and easy way to create a workspace and add projects and documents to it.

var workspace = new AdhocWorkspace();

string projName = "NewProject";
var projectId = ProjectId.CreateNewId();
var versionStamp = VersionStamp.Create();
var projectInfo = ProjectInfo.Create(projectId, versionStamp, projName, projName, LanguageNames.CSharp);
var newProject = workspace.AddProject(projectInfo);
var sourceText = SourceText.From("class A {}");
var newDocument = workspace.AddDocument(newProject.Id, "NewFile.cs", sourceText);

foreach (var project in workspace.CurrentSolution.Projects)
{
    foreach (var document in project.Documents)
    {
        Console.WriteLine(project.Name + "\t\t\t" + document.Name);
    }
}

For more information see Learn Roslyn Now – E08 – AdhocWorkspace
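AdhocWorkspace also makes it easy to observe the immutability described at the start of this part. In the sketch below (ImmutableSolutionDemo and CountProjectsBeforeAndAfter are names invented for this example), a Solution captured before a project is added never sees that project, while workspace.CurrentSolution does. It assumes the Microsoft.CodeAnalysis package, which brings in the workspace assemblies.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;

// Hypothetical demo class, not part of Roslyn.
public static class ImmutableSolutionDemo
{
    // Returns the project counts of the solution snapshots taken before and after a change.
    public static (int Before, int After) CountProjectsBeforeAndAfter()
    {
        using (var workspace = new AdhocWorkspace())
        {
            // Snapshot of the (empty) solution.
            Solution before = workspace.CurrentSolution;

            // The workspace swaps in a new Solution; 'before' is left untouched.
            workspace.AddProject("DemoProject", LanguageNames.CSharp);
            Solution after = workspace.CurrentSolution;

            return (before.Projects.Count(), after.Projects.Count());
        }
    }

    public static void Main()
    {
        var counts = CountProjectsBeforeAndAfter();
        Console.WriteLine(counts.Before);  // 0
        Console.WriteLine(counts.After);   // 1
    }
}
```

This is why long-lived code should hold on to the workspace and ask it for CurrentSolution, rather than caching a Solution.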

VisualStudioWorkspace

The active workspace consumed within Visual Studio packages. As this workspace is tightly integrated with Visual Studio, it’s difficult to provide a small example on how to use this workspace. Steps:

  1. Create a new VSPackage.
  2. Add a reference to the Microsoft.VisualStudio.LanguageServices.dll. It’s now available on NuGet.
  3. Navigate to the <VSPackageName>Package.cs file (where <VSPackageName> is the name you chose for your solution).
  4. Find the Initialize() method.
  5. Place the following code within Initialize()

protected override void Initialize()
{ 
    //Other stuff…
    …
    
    var componentModel = (IComponentModel)this.GetService(typeof(SComponentModel));
    var workspace = componentModel.GetService<VisualStudioWorkspace>();
}
 
//Alternatively you can MEF import the workspace. MEF can be tricky if you're not familiar with it
//but here's how you'd import VisualStudioWorkspace as a property.
 
[Import(typeof(Microsoft.VisualStudio.LanguageServices.VisualStudioWorkspace))]
public VisualStudioWorkspace myWorkspace { get; set; }

When writing VSPackages, one of the most useful pieces of functionality exposed by the workspace is the WorkspaceChanged event. This event allows our VSPackage to respond to any changes made by the user or any other VSPackage. Naturally, the best way to familiarize oneself with workspaces is to use them. Roslyn’s immutability can impose a slight learning curve so we’ll be exploring how to modify documents and projects in future posts.

For more information see Learn Roslyn Now – E07 – VisualStudioWorkspace

Learn Roslyn Now: Part 7 Introducing the Semantic Model

Up until this point we’ve been working with C# code on a purely syntactical level. We can find property declarations, but we can’t track down references to this property within our source code. We can identify invocations, but we can’t tell what’s being invoked. And God help us if we want to try to solve the really hard problems like overload resolution.

In this developer’s opinion, the semantic layer is where the power of Roslyn really shines. Roslyn’s semantic model can answer all the hard compile-time questions we might have. However, this power comes at a cost. Querying the semantic model is typically more expensive than querying syntax trees. This is because requesting a semantic model often triggers a compilation.

There are 3 different ways to request the semantic model:

  1. Document.GetSemanticModelAsync()
  2. Compilation.GetSemanticModel(SyntaxTree)
  3. Various Diagnostic AnalysisContexts including CodeBlockStartAnalysisContext.SemanticModel and SemanticModelAnalysisContext.SemanticModel

To avoid the boilerplate involved in setting up our own Workspace, we’ll simply create compilations for individual syntax trees as follows:

var tree = CSharpSyntaxTree.ParseText(@"
    public class MyClass 
    {
        int MyMethod() { return 0; }
    }");

var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
//Note that we must specify the tree for which we want the model.
//Each tree has its own semantic model
var model = compilation.GetSemanticModel(tree);

Symbols

Before continuing, it’s worth taking a moment to discuss Symbols.

C# programs are comprised of unique elements, such as types, methods, properties and so on. Symbols represent most everything the compiler knows about each of these unique elements.

At a high level, every symbol contains information about:

  • Where this element is declared in source or metadata (it may have come from an external assembly)
  • What namespace and type this symbol exists within
  • Various truths about the symbol being abstract, static, sealed etc.
  • More information may be found in ISymbol.

Other, more context-dependent information may also be uncovered. When dealing with methods, IMethodSymbol allows us to determine:

  • Whether the method hides a base method.
  • The symbol representing the return type of the method.
  • The extension method from which this symbol was reduced.

Requesting Symbols

The semantic model is our bridge between the world of syntax and the world of symbols.

SemanticModel.GetDeclaredSymbol() accepts declaration syntax and provides the corresponding symbol.

SemanticModel.GetSymbolInfo() accepts expression syntax (e.g. InvocationExpressionSyntax) and returns a SymbolInfo whose Symbol property holds the resolved symbol. If the model could not successfully resolve a symbol, it provides candidate symbols which can serve as best guesses.

Below, we retrieve the symbol for a method via its declaration syntax. We then retrieve the same symbol, but via an invocation (InvocationExpressionSyntax) instead.

var tree = CSharpSyntaxTree.ParseText(@"
    public class MyClass {
             int Method1() { return 0; }
             void Method2()
             {
                int x = Method1();
             }
    }");

var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);

//Looking at the first method symbol
var methodSyntax = tree.GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>().First();
var methodSymbol = model.GetDeclaredSymbol(methodSyntax);

Console.WriteLine(methodSymbol.ToString());         //MyClass.Method1()
Console.WriteLine(methodSymbol.ContainingSymbol);   //MyClass
Console.WriteLine(methodSymbol.IsAbstract);         //false

//Looking at the first invocation
var invocationSyntax = tree.GetRoot().DescendantNodes().OfType<InvocationExpressionSyntax>().First();
var invokedSymbol = model.GetSymbolInfo(invocationSyntax).Symbol; //Same as MyClass.Method1

Console.WriteLine(invokedSymbol.ToString());         //MyClass.Method1()
Console.WriteLine(invokedSymbol.ContainingSymbol);   //MyClass
Console.WriteLine(invokedSymbol.IsAbstract);         //false

Console.WriteLine(invokedSymbol.Equals(methodSymbol)); //true
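
The candidate symbols mentioned earlier show up when binding fails. The following sketch (CandidateSymbolDemo and GetFirstInvocationInfo are names made up for this example) calls Method1 with an argument of the wrong type and then inspects the resulting SymbolInfo. I would expect Symbol to be null and the one Method1 overload to appear among CandidateSymbols, but treat the exact CandidateReason printed as something to verify on your own Roslyn version.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class, not part of Roslyn.
public static class CandidateSymbolDemo
{
    // Compiles the source and returns the SymbolInfo for its first invocation.
    public static SymbolInfo GetFirstInvocationInfo(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        var compilation = CSharpCompilation.Create("MyCompilation",
            syntaxTrees: new[] { tree }, references: new[] { mscorlib });
        var model = compilation.GetSemanticModel(tree);

        var invocation = tree.GetRoot().DescendantNodes()
            .OfType<InvocationExpressionSyntax>().First();
        return model.GetSymbolInfo(invocation);
    }

    public static void Main()
    {
        // Method1 expects an int, but the call site passes a string.
        var info = GetFirstInvocationInfo(@"
            public class MyClass
            {
                int Method1(int n) { return n; }
                void Method2() { Method1(""oops""); }
            }");

        Console.WriteLine(info.Symbol == null ? "no symbol resolved" : info.Symbol.ToString());
        Console.WriteLine(info.CandidateReason);

        // The model's best guesses at what the call was meant to bind to.
        foreach (var candidate in info.CandidateSymbols)
            Console.WriteLine(candidate);
    }
}
```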

Note on performance:
The documentation for SemanticModel notes the following:

An instance of SemanticModel caches local symbols and semantic information. Thus, it is much more efficient to use a single instance of SemanticModel when asking multiple questions about a syntax tree, because information from the first question may be reused. This also means that holding onto an instance of SemanticModel for a long time may keep a significant amount of memory from being garbage collected.

Essentially, Roslyn is allowing you to make the tradeoff between memory and computation. When querying the semantic model repetitively, it may be in your best interest to keep an instance of it around, instead of requesting a new model from a compilation or document.

Next Time
We’ve only scratched the surface of the Semantic Model. Next time we’ll take a look at the control and data flow analysis APIs.

Learn Roslyn Now: Part 8 Data Flow Analysis

Writing this blog post has been really painful. It’s been three months since I last published my introduction to the semantic model and I’ve been putting off this post for as long as I could. I started a new series called Learn Roslyn Now Quick Tips, I helped build Source Browser, and I even submitted a small pull request to clean up the analysis APIs. Basically, I’ve done everything but learn and write about these APIs.

The two reasons I’ve struggled to write about AnalyzeControlFlow and AnalyzeDataFlow are:

  1. I’ve struggled to imagine how one would use them in an analyzer or extension.
  2. They’re weird, unintuitive and they frighten me.

I put out a tweet asking how others were using them, and it appears they’re only really used within Microsoft to implement the “Extract Method” functionality. A handful of questions on Stack Overflow have mentioned these APIs, so I’m sure someone out there is putting them to good use.

Data Flow Analysis

This API can be used to inspect how variables are read and written within a given block of code. Perhaps you’d like to make a Visual Studio extension that captures and logs all assignments to a certain variable. You could use the data flow analysis API to find the statements, and a rewriter to log them.

To demonstrate the capabilities of this API, we’ll be looking at a modified piece of code posted on Stack Overflow. I’ve cleaned it up slightly, but it shows a number of interesting behaviors consumers of this API should be aware of.

We can analyze the for-loop in the following code:

var tree = CSharpSyntaxTree.ParseText(@"
public class Sample
{
   public void Foo()
   {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4};
        for (int index = 0; index < 10; index++)
        {
             int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
             index = index + 2;
             outerArray[index - 1] = 5;
        }
   }
}");
 
var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
 
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);
 
var forStatement = tree.GetRoot().DescendantNodes().OfType<ForStatementSyntax>().Single();
DataFlowAnalysis result = model.AnalyzeDataFlow(forStatement);

At this point we’ve got access to a DataFlowAnalysis object.

Perhaps the most important property on this object is Succeeded. This tells you if the data flow analysis completed successfully. In my experience the API has been pretty good at dealing with semantically invalid code. Neither invocations to missing methods nor use of undeclared variables seemed to trip it up. The documentation notes that if the analyzed region does not span a single expression or statement then analysis is likely to fail.

The DataFlowAnalysis object exposes a pretty rich API for users to consume. It exposes information about unsafe addresses, local variables captured by anonymous methods and much more.

In our case, we’re interested in the following properties:

  • DataFlowAnalysis.AlwaysAssigned – The set of local variables for which a value is always assigned inside a region.
  • DataFlowAnalysis.ReadInside – The set of local variables that are read inside a region.
  • DataFlowAnalysis.WrittenOutside – The set of local variables that are written outside a region.
  • DataFlowAnalysis.WrittenInside – The set of local variables that are written inside a region.
  • DataFlowAnalysis.VariablesDeclared – The set of local variables that are declared within a region. Note the region must be bounded by a method’s body or a field’s initializer, so parameter symbols are never included in the result.
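
For readers who want to reproduce the analysis, here is one way the properties listed above might be printed. DataFlowDemo, AnalyzeFirstForLoop and Names are hypothetical helpers written for this sketch; the sample source is the same for-loop used throughout this part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical demo class, not part of Roslyn.
public static class DataFlowDemo
{
    // Compiles the source and runs data flow analysis over its first for-loop.
    public static DataFlowAnalysis AnalyzeFirstForLoop(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        var compilation = CSharpCompilation.Create("MyCompilation",
            syntaxTrees: new[] { tree }, references: new[] { mscorlib });
        var model = compilation.GetSemanticModel(tree);

        var forStatement = tree.GetRoot().DescendantNodes()
            .OfType<ForStatementSyntax>().First();
        return model.AnalyzeDataFlow(forStatement);
    }

    // Joins the names of a set of symbols for display.
    private static string Names(IEnumerable<ISymbol> symbols)
    {
        return string.Join(", ", symbols.Select(s => s.Name));
    }

    public static void Main()
    {
        var result = AnalyzeFirstForLoop(@"
public class Sample
{
   public void Foo()
   {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
        for (int index = 0; index < 10; index++)
        {
             int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
             index = index + 2;
             outerArray[index - 1] = 5;
        }
   }
}");

        // Check Succeeded before trusting any of the other properties.
        Console.WriteLine("Succeeded:         " + result.Succeeded);
        Console.WriteLine("AlwaysAssigned:    " + Names(result.AlwaysAssigned));
        Console.WriteLine("ReadInside:        " + Names(result.ReadInside));
        Console.WriteLine("WrittenOutside:    " + Names(result.WrittenOutside));
        Console.WriteLine("WrittenInside:     " + Names(result.WrittenInside));
        Console.WriteLine("VariablesDeclared: " + Names(result.VariablesDeclared));
    }
}
```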

To refresh, the code we’ve analyzed is displayed below. The region we’ve declared interest in is the for-loop.

public class Sample
{
   public void Foo()
   {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4};
        for (int index = 0; index < 10; index++)
        {
             int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
             index = index + 2;
             outerArray[index - 1] = 5;
        }
   }
}

The results from analysis are as follows:

AlwaysAssigned: index index is always assigned to as it is contained within the initializer of the for-loop, which runs unconditionally.

WrittenInside: index, innerArray Both index and innerArray are clearly written within the loop.

One important point is that outerArray is not. While we’re mutating the array, we’re not mutating the reference contained within the outerArray variable. Therefore it does not show up in this list.

WrittenOutside: outerArray, this

outerArray is clearly written to outside of the for-loop.

However, it surprised me that this showed up as a parameter symbol within the WrittenOutside list. It appears as though this is passed as a parameter to the class and its member, which means that it shows up here as well. This appears to be by design, although I suspect most consumers of this API will be surprised, and likely ignore this value.

ReadInside: index, outerArray

It is clear that the value of index is read within the loop.

It was surprising to me that outerArray is considered to be “read” inside the loop as we’re not reading its value directly. I suppose that technically we must first read the value of outerArray in order to calculate the offset and retrieve the correct address for the given element of the array. So we’re performing a sort of “implicit read” inside the loop here.

VariablesDeclared: index, innerArray

This is fairly straightforward. index is declared within the loop initializer and innerArray within the body of the for-loop.
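
For anyone who wants to reproduce the lists above, here is a minimal sketch of how they could be printed. The class and helper names (DataFlowDemo, AnalyzeForLoop, Names) are my own for illustration, and I'm assuming a reference to the Microsoft.CodeAnalysis.CSharp package. I've left out expected output on purpose, since the exact symbols reported may vary between Roslyn versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class DataFlowDemo
{
    // The same sample as above, with the for-loop as the region of interest.
    private const string Source = @"
public class Sample
{
   public void Foo()
   {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
        for (int index = 0; index < 10; index++)
        {
             int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
             index = index + 2;
             outerArray[index - 1] = 5;
        }
   }
}";

    // Hypothetical helper: builds a compilation and analyzes the for-loop.
    public static DataFlowAnalysis AnalyzeForLoop()
    {
        var tree = CSharpSyntaxTree.ParseText(Source);
        var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        var compilation = CSharpCompilation.Create("MyCompilation",
            syntaxTrees: new[] { tree }, references: new[] { mscorlib });
        var model = compilation.GetSemanticModel(tree);

        var forStatement = tree.GetRoot().DescendantNodes()
            .OfType<ForStatementSyntax>().Single();
        return model.AnalyzeDataFlow(forStatement);
    }

    // Hypothetical helper: joins symbol names for display.
    private static string Names(IEnumerable<ISymbol> symbols) =>
        string.Join(", ", symbols.Select(s => s.Name));

    public static void Main()
    {
        DataFlowAnalysis result = AnalyzeForLoop();
        Console.WriteLine("Succeeded: " + result.Succeeded);
        Console.WriteLine("AlwaysAssigned: " + Names(result.AlwaysAssigned));
        Console.WriteLine("WrittenInside: " + Names(result.WrittenInside));
        Console.WriteLine("WrittenOutside: " + Names(result.WrittenOutside));
        Console.WriteLine("ReadInside: " + Names(result.ReadInside));
        Console.WriteLine("VariablesDeclared: " + Names(result.VariablesDeclared));
    }
}
```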

Final Thoughts

The general weirdness of the data flow analysis API has long kept me from writing about it. The issues around `this` and around what’s considered a read vs. a write are pretty off-putting to me. I suspect these kinds of issues will prevent a lot of people from taking advantage of this API, but I could be wrong. It’s difficult to say this early in the game and I have not seen very much discussion about this API and the above problems.

Learn Roslyn Now: Part 9 Control Flow Analysis

Control flow analysis is used to understand the various entry and exit points within a block of code and to answer questions about reachability. If we’re analyzing a method, we might be interested in all the points at which we can return out of the method. If we’re analyzing a for-loop, we might be interested in all the places we break or continue.

We trigger control flow analysis via an extension method on the SemanticModel. This returns an instance of [ControlFlowAnalysis](http://source.roslyn.io/#Microsoft.CodeAnalysis/Compilation/ControlFlowAnalysis.cs,76b153de98a08228) to us that exposes the following properties:

  • EntryPoints – The set of statements inside the region that are the destination of branches outside the region.
  • ExitPoints – The set of statements inside a region that jump to locations outside the region.
  • EndPointIsReachable – Indicates whether a region completes normally. Returns true if and only if the end of the last statement is reachable or the entire region contains no statements.
  • StartPointIsReachable – Indicates whether a region can begin normally.
  • ReturnStatements – The set of return statements within a region.
  • Succeeded – Returns true if and only if analysis was successful. Analysis can fail if the region does not properly span a single expression, a single statement, or a contiguous series of statements within the enclosing block.

Basic usage of the API:

var tree = CSharpSyntaxTree.ParseText(@"
    class C
    {
        void M()
        {
            for (int i = 0; i < 10; i++)
            {
                if (i == 3)
                    continue;
                if (i == 8)
                    break;
            }
        }
    }
");

var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);

var firstFor = tree.GetRoot().DescendantNodes().OfType<ForStatementSyntax>().Single();
ControlFlowAnalysis result = model.AnalyzeControlFlow(firstFor.Statement);

Console.WriteLine(result.Succeeded);            //True
Console.WriteLine(result.ExitPoints.Count());    //2 – continue, and break

Alternatively, we can specify two statements and analyze the statements between the two. The following example demonstrates this and the usage of EntryPoints:

var tree = CSharpSyntaxTree.ParseText(@"
class C
{
    void M(int x)
    {
        L1: ; // 1
        if (x == 0) goto L1;    //firstIf
        if (x == 1) goto L2;
        if (x == 3) goto L3;
        L3: ;                   //label3
        L2: ; // 2
        if(x == 4) goto L3;
    }
}
");

var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);

//Choose first and last statements
var firstIf = tree.GetRoot().DescendantNodes().OfType<IfStatementSyntax>().First();
var label3 = tree.GetRoot().DescendantNodes().OfType<LabeledStatementSyntax>().Skip(1).Take(1).Single();

ControlFlowAnalysis result = model.AnalyzeControlFlow(firstIf, label3);
Console.WriteLine(result.EntryPoints.Count());      //1 – Label 3 is a candidate entry point within these statements
Console.WriteLine(result.ExitPoints.Count());       //2 – goto L1 and goto L2 are candidate exit points

In the above example, we see a possible entry point: the label L3. To the best of my knowledge, labels are the only possible entry points.

Finally, we’ll take a look at answering questions about reachability. In the following, neither the start point nor the end point is reachable:

var tree = CSharpSyntaxTree.ParseText(@"
    class C
    {
        void M(int x)
        {
            return;
            if(x == 0)                                  //-+     Start is unreachable
                System.Console.WriteLine(""Hello"");    // |
            L1:                                            //-+    End is unreachable
        }
    }
");

var Mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { Mscorlib });
var model = compilation.GetSemanticModel(tree);

//Choose first and last statements
var firstIf = tree.GetRoot().DescendantNodes().OfType<IfStatementSyntax>().Single();
var label1 = tree.GetRoot().DescendantNodes().OfType<LabeledStatementSyntax>().Single();

ControlFlowAnalysis result = model.AnalyzeControlFlow(firstIf, label1);
Console.WriteLine(result.StartPointIsReachable);    //False
Console.WriteLine(result.EndPointIsReachable);      //False
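
The introduction mentioned wanting to find every point at which we can return out of a method, but none of the examples so far touch ReturnStatements. Here is a rough sketch of how that might look when the region is a whole method body. The sample method and the names ControlFlowDemo and AnalyzeMethodBody are made up for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ControlFlowDemo
{
    // Hypothetical helper: analyzes the body of a method with three returns.
    public static ControlFlowAnalysis AnalyzeMethodBody()
    {
        var tree = CSharpSyntaxTree.ParseText(@"
class C
{
    int M(int x)
    {
        if (x < 0)
            return -1;
        if (x == 0)
            return 0;
        return 1;
    }
}");
        var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        var compilation = CSharpCompilation.Create("MyCompilation",
            syntaxTrees: new[] { tree }, references: new[] { mscorlib });
        var model = compilation.GetSemanticModel(tree);

        // The method body is a block, which is itself a single statement.
        var method = tree.GetRoot().DescendantNodes()
            .OfType<MethodDeclarationSyntax>().Single();
        return model.AnalyzeControlFlow(method.Body);
    }

    public static void Main()
    {
        ControlFlowAnalysis result = AnalyzeMethodBody();
        Console.WriteLine(result.Succeeded);

        // Print each return statement found inside the region.
        foreach (SyntaxNode returnStatement in result.ReturnStatements)
        {
            Console.WriteLine(returnStatement.ToString());
        }
    }
}
```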

Overall, the Control Flow API seems a lot more intuitive than the Data Flow Analysis API. It requires less knowledge of the C# specification and is straightforward to work with. At Code Connect, we’ve been using it when rewriting and logging methods. Although it looks like no one has experimented much with this API, I’m really interested to see what uses others will come up with.

Learn Roslyn Now: Part 10 Introduction to Analyzers

Roslyn analyzers allow companies and individuals to enforce certain rules within a code base. My understanding is that there are two primary uses for analyzers:

  • Broadly enforce coding styles and best practices
  • Specifically guide individuals consuming a library

The first use is largely a replacement for tools like StyleCop and FxCop. We can use analyzers to enforce stylistic choices like “All private variables must start with a lowercase letter” and “Use spaces not tabs”. In fact, you can start using StyleCop.Analyzers today. From a NuGet command line simply use:

Install-Package StyleCop.Analyzers -Pre

The second use is to release library specific analyzers meant to guide consumers of your library. For example, we might want to ensure that no one does the following:

var dateTime = System.DateTime.UtcNow;
dateTime.AddDays(1);

System.DateTime is immutable, so the above code is misleading. Instead the user should have written the following:

var dateTime = System.DateTime.UtcNow;
dateTime = dateTime.AddDays(1);

Analyzers allow library authors to help guide their users. In that sense, I hope that it becomes standard to release a set of analyzers alongside new libraries. It’s difficult to say if this will actually happen, as it requires extra work from library authors.

Download the Roslyn SDK Templates

The templates do not ship with Visual Studio 2015. To install them go to:

Tools > Extensions and Updates > Online.

Search for “Roslyn SDK” and find the templates that correspond to your version. I’m using Visual Studio 2015 RC. I’ve chosen the package selected below:


Roslyn SDK

After installing the templates, you must restart Visual Studio.

Creating your first analyzer

Navigate to:

File > New Project > Extensibility > Analyzer with Code Fix


New Project

Give your analyzer a name and click “OK”. I’ve taken the creative liberty of naming mine "Analyzer1". From here we’re presented a README that explains that building our project creates both a .vsix for Visual Studio and a .nupkg for submission to NuGet. There are also instructions on how to properly distribute your analyzer as a NuGet package.

Let’s take a look at what we’re given right out of the box:


Project

We’re given three projects:

  • Analyzer1 – The brain of our analyzer. This is where all code analysis is done and code fixes are figured out.
  • Analyzer1.Test – A default test project with some helper classes to make testing easier.
  • Analyzer1.Vsix – The startup project that will be deployed to Visual Studio. The .vsixmanifest tells Visual Studio that you’d like to export an analyzer and a code fix.

To run the project, simply press F5. A new instance of Visual Studio will launch. This Visual Studio is called the Experimental Hive and has its own set of settings within the Windows Registry. Note: It’s a good practice to choose a different theme for your Experimental Hive so you don’t get them mixed up.

Once you open a solution, you’ll notice Visual Studio complaining about a lot of new warnings. The analyzer we’re running simply creates a warning when it sees any type with lowercase letters in its name. It’s obviously not very useful, but allows us to also demonstrate the code fix included in this sample:


Fix Code

Now that we’ve got a rough idea of what each project is for, we’ll explore Analyzer1 and what we’re given for free.

DiagnosticAnalyzer.cs

The first thing to notice is that our Analyzer inherits from the abstract class DiagnosticAnalyzer. This class expects us to do two things:

  • Expose a set of diagnostics our analyzer is responsible for via SupportedDiagnostics.
  • Initialize our analyzer via Initialize(AnalysisContext).

Let’s take a look at the properties and fields in the first half of the file:

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class Analyzer1Analyzer : DiagnosticAnalyzer
{
    public const string DiagnosticId = "Analyzer1";

    // You can change these strings in the Resources.resx file. If you do not want your analyzer to be localize-able, you can use regular strings for Title and MessageFormat.
    internal static readonly LocalizableString Title = new LocalizableResourceString(nameof(Resources.AnalyzerTitle), Resources.ResourceManager, typeof(Resources));
    internal static readonly LocalizableString MessageFormat = new LocalizableResourceString(nameof(Resources.AnalyzerMessageFormat), Resources.ResourceManager, typeof(Resources));
    internal static readonly LocalizableString Description = new LocalizableResourceString(nameof(Resources.AnalyzerDescription), Resources.ResourceManager, typeof(Resources));
    internal const string Category = "Naming";

    internal static DiagnosticDescriptor Rule = new DiagnosticDescriptor(DiagnosticId, Title, MessageFormat, Category, DiagnosticSeverity.Warning, isEnabledByDefault: true, description: Description);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics { get { return ImmutableArray.Create(Rule); } }
    
    …
}

It may seem overwhelming at first, but bear with me. First notice the DiagnosticAnalyzer attribute applied to the class. This specifies what language or languages our analyzer will be run on. Today, you can only specify C# and VB.NET.

Looking within the class, the first five fields are simply strings that describe our analyzer and provide messages to users. By default, the analyzer is set up to encourage localization and allows you to define your title, message format and description as localizable strings. However, if localization scares you like it does me, you can make them simple strings.
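
As a rough sketch of the "simple strings" route, the descriptor could be declared without Resources.resx at all. The class name PlainStringRule and the message texts below are illustrative assumptions on my part, not necessarily what the template's resource file contains.

```csharp
using Microsoft.CodeAnalysis;

public static class PlainStringRule
{
    public const string DiagnosticId = "Analyzer1";

    // DiagnosticDescriptor also has an overload that takes plain strings,
    // so no LocalizableResourceString is needed here.
    // The texts are placeholders chosen for illustration.
    public static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: DiagnosticId,
        title: "Type name contains lowercase letters",
        messageFormat: "Type name '{0}' contains lowercase letters",
        category: "Naming",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true,
        description: "Type names should be all uppercase.");
}
```

The trade-off is the obvious one: less ceremony now, at the cost of having to revisit the analyzer if you ever want to localize its messages.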

Take a moment to look at DiagnosticDescriptor Rule. It defines a DiagnosticSeverity of “Warning”. I suspect you’ll likely want to stick with Warning, but if you feel like imposing on consumers of your analyzer, you could upgrade the severity to Error and prevent compilation completely. Note: I don’t recommend this. If your analyzer misbehaves and reports errors where there are none, the user will remove it.

Finally, let’s take a look at the two generated methods:

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class Analyzer1Analyzer : DiagnosticAnalyzer
{
    …

    public override void Initialize(AnalysisContext context)
    {
        // TODO: Consider registering other actions that act on syntax instead of or in addition to symbols
        context.RegisterSymbolAction(AnalyzeSymbol, SymbolKind.NamedType);
    }

    private static void AnalyzeSymbol(SymbolAnalysisContext context)
    {
        // TODO: Replace the following code with your own analysis, generating Diagnostic objects for any issues you find
        var namedTypeSymbol = (INamedTypeSymbol)context.Symbol;

        // Find just those named type symbols with names containing lowercase letters.
        if (namedTypeSymbol.Name.ToCharArray().Any(char.IsLower))
        {
            // For all such symbols, produce a diagnostic.
            var diagnostic = Diagnostic.Create(Rule, namedTypeSymbol.Locations[0], namedTypeSymbol.Name);

            context.ReportDiagnostic(diagnostic);
        }
    }
}

The Initialize() method sets up the analyzer by registering the AnalyzeSymbol method to fire when semantic analysis has been run on a NamedType symbol. This is only one example out of a handful of ways to trigger an analyzer. We can register our analyzer to run on various triggers including compilation, analysis of code blocks and analysis of syntax trees. We’ll flesh out AnalysisContext in further posts.

The AnalyzeSymbol() method is where we actually do the analysis we’ve been talking about. This is where we would use the Syntax Tree and Symbol APIs to diagnose and report issues. In the case of this analyzer it simply takes the INamedTypeSymbol provided and checks whether any of the characters in its name are lowercase. If they are, we report this diagnostic using the Rule we defined earlier.

This may seem like an awful lot of boilerplate for such a simple analyzer. However, once you start building complicated analyzers, you’ll find that the analysis code quickly starts to dominate and that the boilerplate isn’t so bad.
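
To connect this back to the System.DateTime example from the start of the post, here is a minimal sketch of what a library-specific analyzer for that mistake might look like. It registers a syntax node action instead of a symbol action. The diagnostic ID "DT0001", the class name and the message texts are all hypothetical, and the check is deliberately crude: it flags any expression statement that calls a DateTime method returning a DateTime and throws the result away.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class DiscardedDateTimeResultAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical ID and texts, chosen for illustration only.
    public const string DiagnosticId = "DT0001";

    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        DiagnosticId,
        "Result of DateTime method is discarded",
        "The result of '{0}' is discarded; DateTime is immutable",
        "Usage",
        DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
    {
        get { return ImmutableArray.Create(Rule); }
    }

    public override void Initialize(AnalysisContext context)
    {
        // Fire on every expression statement, e.g. "dateTime.AddDays(1);"
        context.RegisterSyntaxNodeAction(AnalyzeStatement, SyntaxKind.ExpressionStatement);
    }

    private static void AnalyzeStatement(SyntaxNodeAnalysisContext context)
    {
        var statement = (ExpressionStatementSyntax)context.Node;

        // Only bare invocations are interesting; assignments are a different expression kind.
        var invocation = statement.Expression as InvocationExpressionSyntax;
        if (invocation == null)
            return;

        var method = context.SemanticModel
            .GetSymbolInfo(invocation, context.CancellationToken).Symbol as IMethodSymbol;
        if (method == null || method.ContainingType == null)
            return;

        // A DateTime method that returns a DateTime whose value nobody uses.
        if (method.ContainingType.SpecialType == SpecialType.System_DateTime &&
            method.ReturnType.SpecialType == SpecialType.System_DateTime)
        {
            context.ReportDiagnostic(
                Diagnostic.Create(Rule, invocation.GetLocation(), method.Name));
        }
    }
}
```

I chose a syntax node action here because the mistake is a property of one specific statement shape, which makes the syntax tree the cheapest place to start before asking the semantic model for symbols.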

Next time, we’ll explore the CodeFixProvider and how we can offer solutions to problems we find in a user’s code.

Learn Roslyn Now: Part 11 Introduction to Code Fixes

Last time (three months ago, jeez) we talked about building our first analyzer and what we get out of the box with the default analyzer template. Today we’ll talk about the second half of the analyzer project: The Code Fix Provider.

CodeFixProvider.cs

The first thing to notice is that our class inherits from CodeFixProvider. If you take a quick look at CodeFixProvider, you’ll see that it expects you to provide at least two things:

FixableDiagnosticIds – A list of diagnostic IDs that we would like our code fix to deal with. We would have defined these IDs in our original analyzer.

RegisterCodeFixesAsync – Registers our code fix within Visual Studio to handle our diagnostic(s).

GetFixAllProvider – An optional FixAllProvider that can apply your code fix to all the occurrences of a diagnostic.

Let’s take a look at the first half of this file:

[ExportCodeFixProvider(LanguageNames.CSharp, Name = nameof(Analyzer1CodeFixProvider)), Shared]
public class Analyzer1CodeFixProvider : CodeFixProvider
{
  private const string title = "Make uppercase";

  public sealed override ImmutableArray<string> FixableDiagnosticIds
  {
      get { return ImmutableArray.Create(Analyzer1Analyzer.DiagnosticId); }
  }

  public sealed override FixAllProvider GetFixAllProvider()
  {
      return WellKnownFixAllProviders.BatchFixer;
  }
  
  …
}

First, we can see that we’re exporting a code fix provider for C# with the name “Analyzer1CodeFixProvider”. We can also specify additional languages such as VB if you’re writing a multi-language code fix. Note that we have to specify the name explicitly here. (Name is a property in ExportCodeFixProvider. I’d actually never come across this attribute-specific syntax before.)

To start, we’ve got the title of the code fix, which is self-explanatory. We’ll expose this title to Visual Studio when we register our code fix action.

Next, we’ve got to expose a list of diagnostics for which we’d like to provide our code fix. In this case, we expose the analyzer we created in the introduction to analyzers.

Finally, the default code fix template overrides the optional GetFixAllProvider. In this case they provide a [BatchFixer](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/CodeFixes/FixAllOccurrences/WellKnownFixAllProviders.cs,f42c8a42757f6a56). The BatchFixer computes all the required changes in parallel and then applies them to the solution at one time.

Now we’ll take a look at the last two methods given to us in CodeFixProvider.cs

[ExportCodeFixProvider(LanguageNames.CSharp, Name = nameof(Analyzer1CodeFixProvider)), Shared]
public class Analyzer1CodeFixProvider : CodeFixProvider
{
  …
  
  public sealed override async Task RegisterCodeFixesAsync(CodeFixContext context)
  {
      var root = await context.Document.GetSyntaxRootAsync(context.CancellationToken).ConfigureAwait(false);

      // TODO: Replace the following code with your own analysis, generating a CodeAction for each fix to suggest
      var diagnostic = context.Diagnostics.First();
      var diagnosticSpan = diagnostic.Location.SourceSpan;

      // Find the type declaration identified by the diagnostic.
      var declaration = root.FindToken(diagnosticSpan.Start).Parent.AncestorsAndSelf().OfType<TypeDeclarationSyntax>().First();

      // Register a code action that will invoke the fix.
      context.RegisterCodeFix(
          CodeAction.Create(
              title: title,
              createChangedSolution: c => MakeUppercaseAsync(context.Document, declaration, c),
              equivalenceKey: title),
          diagnostic);
  }

  private async Task<Solution> MakeUppercaseAsync(Document document, TypeDeclarationSyntax typeDecl, CancellationToken cancellationToken)
  {
      // Compute new uppercase name.
      var identifierToken = typeDecl.Identifier;
      var newName = identifierToken.Text.ToUpperInvariant();

      // Get the symbol representing the type to be renamed.
      var semanticModel = await document.GetSemanticModelAsync(cancellationToken);
      var typeSymbol = semanticModel.GetDeclaredSymbol(typeDecl, cancellationToken);

      // Produce a new solution that has all references to that type renamed, including the declaration.
      var originalSolution = document.Project.Solution;
      var optionSet = originalSolution.Workspace.Options;
      var newSolution = await Renamer.RenameSymbolAsync(document.Project.Solution, typeSymbol, newName, optionSet, cancellationToken).ConfigureAwait(false);

      // Return the new solution with the now-uppercase type name.
      return newSolution;
  }
}

The first is RegisterCodeFixesAsync and it accepts a CodeFixContext. The CodeFixContext has information about where we can apply our code fix, and what diagnostics are available for us to register our code fix against. CodeFixContext provides a list of diagnostics for us to choose from based on what we exposed in FixableDiagnosticIds.

Based on my experiments, RegisterCodeFixesAsync is run every time the Visual Studio light bulb appears due to a diagnostic we’ve declared interest in. At this point we can register an action to run if the user selects our code fix. We do this with context.RegisterCodeFix(). We pass in a title, a function that returns a solution with our change and an optional equivalence key. The title is simply what will be displayed to the user when they see our fix as an option. In the default template it’s “Make uppercase” which you can see below:

fix code

Clicking on the code fix runs MakeUppercaseAsync. There’s admittedly a lot of overhead here for what seems like a trivial change. The real work occurs in [Renamer.RenameSymbolAsync()](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Rename/Renamer.cs,122757488c14f307), an API that quickly and easily renames symbols for us across an entire solution. Remember that Roslyn objects are immutable, so we are given an entirely new solution (newSolution) which we return from our method. Now Visual Studio will replace the previous solution with our updated copy.

One final note to make is regarding equivalenceKey. The equivalence key is used to match our code fix against other code fixes and see whether or not they’re the same. To my knowledge, there’s no commonly agreed upon format for these keys. However it looks like projects such as StyleCopAnalyzers are using a similar approach to Microsoft and name theirs with a two-letter code followed by a number (e.g. SA1510CodeFixProvider).

And there you have it. That’s the base case analyzer that ships with Visual Studio. Obviously we can build much more powerful analyzers and code fixes, but this project should serve as a nice starting point for most people. For more advanced analyzers check out StyleCopAnalyzers, Code Cracker or the Roslyn Analyzers.

Learn Roslyn Now: Part 12 Document Editing with the DocumentEditor

One drawback of Roslyn’s immutability is that it can sometimes make it tricky to apply multiple changes to a Document or SyntaxTree. Immutability means that every time we apply changes to a syntax tree, we’re given an entirely new syntax tree. By default we can’t compare nodes across trees, so what do we do when we want to make multiple changes to a syntax tree?

Roslyn gives us four options:

  • Use the [CSharpSyntaxRewriter](http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/Syntax/CSharpSyntaxRewriter.cs,72741c962906b744) and rewrite from the bottom up (See LRN: Part 5)
  • Use Annotations (See LRN: Part 13)
  • Use [TrackNodes()](http://source.roslyn.io/#Microsoft.CodeAnalysis/Syntax/SyntaxNodeExtensions_Tracking.cs,83e3274ab1824195)
  • Use the [DocumentEditor](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Editing/DocumentEditor.cs,324ac2311809b8f7)
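
The third option, TrackNodes(), is easy to overlook, so here is a minimal sketch of how it might be used. The idea is to ask the root to track some nodes up front, then use GetCurrentNode() on any later version of the tree to find their counterparts. The class and method names below (TrackNodesDemo, RenameClassThenFindMethod) are invented for this example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class TrackNodesDemo
{
    // Hypothetical helper: renames the class, then locates the method in the new tree.
    public static string RenameClassThenFindMethod()
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText("class C { void M() { } }").GetRoot();
        var classDecl = root.DescendantNodes().OfType<ClassDeclarationSyntax>().Single();
        var method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().Single();

        // Returns a new root in which both nodes carry tracking annotations.
        SyntaxNode trackedRoot = root.TrackNodes(classDecl, method);

        // First change: rename the class from C to D.
        var currentClass = trackedRoot.GetCurrentNode(classDecl);
        SyntaxNode renamedRoot = trackedRoot.ReplaceNode(
            currentClass,
            currentClass.WithIdentifier(SyntaxFactory.Identifier("D")));

        // 'method' belongs to the original tree, but its counterpart
        // in the renamed tree can still be looked up.
        var currentMethod = renamedRoot.GetCurrentNode(method);
        return currentMethod.Identifier.Text;
    }

    public static void Main()
    {
        Console.WriteLine(RenameClassThenFindMethod());
    }
}
```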

The DocumentEditor allows us to make multiple changes to a document and get the resulting document after the changes have been applied. Under the covers, the DocumentEditor is a thin layer over the [SyntaxEditor](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Editing/SyntaxEditor.cs,6b0ef9b1d0beaf05).

We’ll use the DocumentEditor to change:

char key = Console.ReadKey().KeyChar;
if(key == 'A')
{
    Console.WriteLine("You pressed A");
}
else
{
    Console.WriteLine("You didn't press A");
}

to:

char key = Console.ReadKey().KeyChar;
if(key == 'A')
{
    LogConditionWasTrue();
    Console.WriteLine("You pressed A");
}
else
{
    Console.WriteLine("You didn't press A");
    LogConditionWasFalse();
}

We’ll use the DocumentEditor to simultaneously insert an invocation before the first Console.WriteLine() and to insert another after the second.

Unfortunately there’s a ton of boilerplate when creating a Document from scratch. Typically you’ll get a Document from a Workspace so it shouldn’t be this bad:

var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var workspace = new AdhocWorkspace();
var projectId = ProjectId.CreateNewId();
var versionStamp = VersionStamp.Create();
var projectInfo = ProjectInfo.Create(projectId, versionStamp, "NewProject", "projName", LanguageNames.CSharp);
var newProject = workspace.AddProject(projectInfo);
var sourceText = SourceText.From(@"
class C
{
    void M()
    {
        char key = Console.ReadKey().KeyChar;
        if (key == 'A')
        {
            Console.WriteLine(""You pressed A"");
        }
        else
        {
            Console.WriteLine(""You didn't press A"");
        }
    }
}");
var document = workspace.AddDocument(newProject.Id, "NewFile.cs", sourceText);
var syntaxRoot = await document.GetSyntaxRootAsync();
var ifStatement = syntaxRoot.DescendantNodes().OfType<IfStatementSyntax>().Single();

var conditionWasTrueInvocation =
SyntaxFactory.ExpressionStatement(
    SyntaxFactory.InvocationExpression(SyntaxFactory.IdentifierName("LogConditionWasTrue"))
    .WithArgumentList(
                    SyntaxFactory.ArgumentList()
                    .WithOpenParenToken(
                        SyntaxFactory.Token(
                            SyntaxKind.OpenParenToken))
                    .WithCloseParenToken(
                        SyntaxFactory.Token(
                            SyntaxKind.CloseParenToken))))
            .WithSemicolonToken(
                SyntaxFactory.Token(
                    SyntaxKind.SemicolonToken));

var conditionWasFalseInvocation =
SyntaxFactory.ExpressionStatement(
    SyntaxFactory.InvocationExpression(SyntaxFactory.IdentifierName("LogConditionWasFalse"))
    .WithArgumentList(
                    SyntaxFactory.ArgumentList()
                    .WithOpenParenToken(
                        SyntaxFactory.Token(
                            SyntaxKind.OpenParenToken))
                    .WithCloseParenToken(
                        SyntaxFactory.Token(
                            SyntaxKind.CloseParenToken))))
            .WithSemicolonToken(
                SyntaxFactory.Token(
                    SyntaxKind.SemicolonToken));

//Finally… create the document editor
var documentEditor = await DocumentEditor.CreateAsync(document);
//Insert LogConditionWasTrue() before the Console.WriteLine()
documentEditor.InsertBefore(ifStatement.Statement.ChildNodes().Single(), conditionWasTrueInvocation);
//Insert LogConditionWasFalse() after the Console.WriteLine()
documentEditor.InsertAfter(ifStatement.Else.Statement.ChildNodes().Single(), conditionWasFalseInvocation);

var newDocument = documentEditor.GetChangedDocument();

All the familiar SyntaxNode methods are here. We can [Insert](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Editing/SyntaxEditor.cs,cf8f1630bbc805d7,references), [Replace](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Editing/SyntaxEditor.cs,4de5d817a570515f,references) and [Remove](http://source.roslyn.io/#Microsoft.CodeAnalysis.Workspaces/Editing/SyntaxEditor.cs,56f11260dd8f06b8,references) nodes as we see fit, all based off of nodes in our original syntax tree. Many people find this approach more intuitive than building an entire CSharpSyntaxRewriter.

It can be somewhat difficult to debug things when they go wrong. When writing this post I was mistakenly trying to insert nodes after ifStatement.Else instead of ifStatement.Else.Statement. I was receiving an InvalidOperationException but the message wasn’t very useful and it took me quite some time to figure out what I was doing wrong. The documentation on InsertNodeAfter says:

This node must be of a compatible type to be placed in the same list containing the existing node.

**How can we know which types of nodes are compatible with one another?** I don’t think there’s a good answer here. We essentially have to learn which nodes are compatible ourselves. As usual the Syntax Visualizer and Roslyn Quoter are the best tools for figuring out what kinds of nodes you should be creating.

It’s worth noting that the DocumentEditor exposes the SemanticModel of your original document. You may need this when editing the original document and making decisions about what you’d like to change.

It’s also worth noting that the underlying SyntaxEditor exposes a SyntaxGenerator that you can use to build syntax nodes without relying on the more verbose SyntaxFactory.
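
To give a feel for the difference, here is a rough sketch that builds the same kind of LogConditionWasTrue() statement with a SyntaxGenerator instead of SyntaxFactory. I'm obtaining the generator from a bare AdhocWorkspace so the snippet stands alone, which assumes the C# Workspaces package is referenced. The class and method names are made up for illustration.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Editing;

public static class GeneratorDemo
{
    // Hypothetical helper: builds "LogConditionWasTrue();" and returns its text.
    public static string BuildInvocationText()
    {
        var workspace = new AdhocWorkspace();
        SyntaxGenerator generator = SyntaxGenerator.GetGenerator(workspace, LanguageNames.CSharp);

        // An invocation with no arguments, wrapped in an expression statement.
        SyntaxNode statement = generator.ExpressionStatement(
            generator.InvocationExpression(
                generator.IdentifierName("LogConditionWasTrue")));

        return statement.NormalizeWhitespace().ToFullString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildInvocationText());
    }
}
```

When you already have a DocumentEditor, its Generator property should save you from creating a workspace just for this.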

Learn Roslyn Now: Part 13 Keeping track of syntax nodes with Syntax Annotations

It can be tricky to keep track of nodes when applying changes to syntax trees. Every time we “change” a tree, we’re really creating a copy of it with our changes applied to that new tree. The moment we do that, any pieces of syntax we had references to earlier become invalid in the context of the new tree.

What does this mean in practice? It’s tough to keep track of syntax nodes when we change syntax trees.

A recent Stack Overflow question touched on this. How can we get the symbol for a class that we’ve just added to a document? We can create a new class declaration, but the moment we add it to the document, we lose track of the node. So how can we keep track of the class so we can get the symbol for it once we’ve added it to the document?

The answer: Use a [SyntaxAnnotation](http://source.roslyn.io/#Microsoft.CodeAnalysis/Syntax/SyntaxAnnotation.cs,5df4388ff3239a2c)

A SyntaxAnnotation is basically a piece of metadata we can attach to a piece of syntax. As we manipulate the tree, the annotation sticks with that piece of syntax, making it easy to find.

AdhocWorkspace workspace = new AdhocWorkspace();
Project project = workspace.AddProject("SampleProject", LanguageNames.CSharp);

//Attach a syntax annotation to the class declaration
var syntaxAnnotation = new SyntaxAnnotation();
var classDeclaration = SyntaxFactory.ClassDeclaration("MyClass")
    .WithAdditionalAnnotations(syntaxAnnotation);

var compilationUnit = SyntaxFactory.CompilationUnit().AddMembers(classDeclaration);

Document document = project.AddDocument("SampleDocument.cs", compilationUnit);
SemanticModel semanticModel = document.GetSemanticModelAsync().Result;

//Use the annotation on our original node to find the new class declaration
var changedClass = document.GetSyntaxRootAsync().Result.DescendantNodes().OfType<ClassDeclarationSyntax>()
    .Where(n => n.HasAnnotation(syntaxAnnotation)).Single();
var symbol = semanticModel.GetDeclaredSymbol(changedClass);

There are a couple of overloads available when creating a SyntaxAnnotation. We can specify Kind and Data to be attached to pieces of syntax. Data is used to attach extra information to a piece of syntax that we’d like to retrieve later. Kind is a field we can use to search for Syntax Annotations.

So instead of looking for the exact instance of our annotation on each node, we could search for annotations based on their kind:

AdhocWorkspace workspace = new AdhocWorkspace();
Project project = workspace.AddProject("Test", LanguageNames.CSharp);

string annotationKind = "SampleKind";
var syntaxAnnotation = new SyntaxAnnotation(annotationKind);
var classDeclaration = SyntaxFactory.ClassDeclaration("MyClass")
    .WithAdditionalAnnotations(syntaxAnnotation);

var compilationUnit = SyntaxFactory.CompilationUnit().AddMembers(classDeclaration);

Document document = project.AddDocument("Test.cs", compilationUnit);
SemanticModel semanticModel = await document.GetSemanticModelAsync();
var newAnnotation = new SyntaxAnnotation("test");

//Just search for the Kind instead
var root = await document.GetSyntaxRootAsync();
var changedClass = root.GetAnnotatedNodes(annotationKind).Single();

var symbol = semanticModel.GetDeclaredSymbol(changedClass);
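
The example above only uses Kind. As a small sketch of the Data side, we can attach a string payload when creating the annotation and read it back once we've found the node. The kind and data values here ("SampleKind", "created-by-demo") and the class name are arbitrary placeholders.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class AnnotationDataDemo
{
    // Hypothetical helper: attaches Kind and Data, then reads the Data back.
    public static string ReadData()
    {
        const string annotationKind = "SampleKind";
        var annotation = new SyntaxAnnotation(annotationKind, "created-by-demo");

        var classDeclaration = SyntaxFactory.ClassDeclaration("MyClass")
            .WithAdditionalAnnotations(annotation);
        var compilationUnit = SyntaxFactory.CompilationUnit().AddMembers(classDeclaration);

        // Find the node by Kind, then pull the matching annotation off it.
        SyntaxNode annotatedNode = compilationUnit.GetAnnotatedNodes(annotationKind).Single();
        SyntaxAnnotation found = annotatedNode.GetAnnotations(annotationKind).Single();
        return found.Data;
    }

    public static void Main()
    {
        Console.WriteLine(ReadData()); // created-by-demo
    }
}
```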

This is just one of a few different ways for dealing with Roslyn’s immutable trees. It’s probably not the easiest to use if you’re making multiple changes and need to track multiple syntax nodes. (If that’s the case, I’d recommend the DocumentEditor). That said, it’s good to be aware of it so you can use it when it makes sense.

Learn Roslyn Now: Part 14 Intro to the Scripting API

The Scripting API is finally here! After being removed from Roslyn’s 1.0 release it’s now available (for C#) in pre-release format on NuGet. To install to your project just run:

Install-Package Microsoft.CodeAnalysis.Scripting -Pre

Note: You need to target .NET 4.6 or you’ll get the following exception when running your scripts:

Could not load file or assembly 'System.Runtime, Version=4.0.20.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

Note: Today (October 15, 2015) the Scripting APIs depend on the 1.1.0-beta1 release, so you’ll have to update your Microsoft.CodeAnalysis references to match if you want to use all of Roslyn with the scripting stuff.

There are a few different ways to use the Scripting API.

EvaluateAsync

[CSharpScript.EvaluateAsync](http://source.roslyn.io/#Microsoft.CodeAnalysis.Scripting.CSharp/CSharpScript.cs,08240e2bf8a17c6b) is probably the simplest way to get started evaluating expressions. Simply pass any expression that returns a single result to this method and it will be evaluated for you.

var result = await CSharpScript.EvaluateAsync("5 + 5");
Console.WriteLine(result); // 10

result = await CSharpScript.EvaluateAsync(@"""sample""");
Console.WriteLine(result); // sample

result = await CSharpScript.EvaluateAsync(@"""sample"" + "" string""");
Console.WriteLine(result); // sample string

result = await CSharpScript.EvaluateAsync("int x = 5; int y = 5; x"); //Note the last x is not contained in a proper statement
Console.WriteLine(result); // 5

RunAsync

Not every script returns a single value. For more complex scripts we may want to keep track of state or inspect different variables. [CSharpScript.RunAsync](http://source.roslyn.io/#Microsoft.CodeAnalysis.Scripting.CSharp/CSharpScript.cs,b45a0a5178f00d6b) creates and returns a [ScriptState](http://source.roslyn.io/#Microsoft.CodeAnalysis.Scripting/ScriptState.cs,8f3bcd39804916b4) object that allows us to do exactly this. Take a look:

var state = await CSharpScript.RunAsync("int x = 5; int y = 3; int z = x + y;");
//Released versions of the package look variables up by name with GetVariable
ScriptVariable x = state.GetVariable("x");
ScriptVariable y = state.GetVariable("y");

Console.Write($"{x.Name} : {x.Value} : {x.Type} "); // x : 5 : System.Int32
Console.Write($"{y.Name} : {y.Value} : {y.Type} "); // y : 3 : System.Int32

We can also maintain the state of our script and continue applying changes to it with [ScriptState.ContinueWithAsync()](http://source.roslyn.io/#Microsoft.CodeAnalysis.Scripting/Script.cs,28760f869988b06b):

var state = await CSharpScript.RunAsync("int x = 5; int y = 3; int z = x + y;");
state = await state.ContinueWithAsync("x++; y = 1;");
state = await state.ContinueWithAsync("x = x + y;");

//Released versions of the package look variables up by name with GetVariable
ScriptVariable x = state.GetVariable("x");
ScriptVariable y = state.GetVariable("y");

Console.Write($"{x.Name} : {x.Value} : {x.Type} "); // x : 7 : System.Int32
Console.Write($"{y.Name} : {y.Value} : {y.Type} "); // y : 1 : System.Int32

ScriptOptions

We can start to get into more interesting code by adding references to DLLs that we’d like to use. We use ScriptOptions to provide our script with the proper MetadataReferences.

ScriptOptions scriptOptions = ScriptOptions.Default;

//Add references to mscorlib and System.Core
var mscorlib = typeof(System.Object).Assembly;
var systemCore = typeof(System.Linq.Enumerable).Assembly;
scriptOptions = scriptOptions.AddReferences(mscorlib, systemCore);
//Add namespaces (AddImports is the released name for the beta's AddNamespaces)
scriptOptions = scriptOptions.AddImports("System");
scriptOptions = scriptOptions.AddImports("System.Linq");
scriptOptions = scriptOptions.AddImports("System.Collections.Generic");

var state = await CSharpScript.RunAsync(@"var x = new List<int>(){1,2,3,4,5};", scriptOptions);
state = await state.ContinueWithAsync("var y = x.Take(3).ToList();");

var y = state.GetVariable("y");
var yList = (List<int>)y.Value;
foreach(var val in yList)
{
  Console.Write(val + " "); // Prints 1 2 3
}

This stuff is surprisingly broad. The Microsoft.CodeAnalysis.Scripting namespace is full of public types that I’m not at all familiar with and there’s a lot left to learn. I’m excited to see what people will build with this and how they might be able to incorporate scripting into their applications.

Kasey Uhlenhuth from the Roslyn team has compiled a list of code snippets to help get you off the ground with the Scripting API. Check them out on GitHub!

If you’ve got some cool plans for the Scripting API, let me know in the comments below!

Learn Roslyn Now: Part 15 The SymbolVisitor

I had a question the other day that I ended up taking directly to the Roslyn issues: How do I get a list of all of the types available to a compilation? Schabse Laks (@Schabse) and David Glick (@daveaglick) introduced me to a cool class I hadn’t encountered before: The [SymbolVisitor](http://source.roslyn.io/#Microsoft.CodeAnalysis/Symbols/SymbolVisitor.cs,650e8dd480b0fd0f).

In previous posts we touched on the [CSharpSyntaxWalker](https://joshvarty.wordpress.com/2014/07/26/learn-roslyn-now-part-4-csharpsyntaxwalker/) and the [CSharpSyntaxRewriter](https://joshvarty.wordpress.com/2014/08/15/learn-roslyn-now-part-5-csharpsyntaxrewriter/). The SymbolVisitor is the analogue of SyntaxVisitor, but applies at the symbol level. Unfortunately, unlike the CSharpSyntaxWalker and CSharpSyntaxRewriter, the SymbolVisitor does not traverse children for us: we must write the scaffolding code that visits all the symbols ourselves.

To simply list all the types available to a compilation we can use the following.

public class NamedTypeVisitor : SymbolVisitor
{
    public override void VisitNamespace(INamespaceSymbol symbol)
    {
        Console.WriteLine(symbol);
        
        foreach(var childSymbol in symbol.GetMembers())
        {
            //We must implement the visitor pattern ourselves and 
            //accept the child symbols in order to visit their children
            childSymbol.Accept(this);
        }
    }

    public override void VisitNamedType(INamedTypeSymbol symbol)
    {
        Console.WriteLine(symbol);
        
        foreach (var childSymbol in symbol.GetTypeMembers())
        {
            //Once again we must accept the children to visit 
            //all of their children
            childSymbol.Accept(this);
        }
    }
}

//Now we need to use our visitor
var tree = CSharpSyntaxTree.ParseText(@"
class MyClass
{
    class Nested
    {
    }
    void M()
    {
    }
}");

var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { mscorlib });

var visitor = new NamedTypeVisitor();
visitor.Visit(compilation.GlobalNamespace);

In order to visit all the methods available to a given compilation we can use the following:

public class MethodSymbolVisitor : SymbolVisitor
{
    //NOTE: We have to visit the namespace's children even though
    //we don't care about them. 
    public override void VisitNamespace(INamespaceSymbol symbol)
    {
        foreach(var child in symbol.GetMembers())
        {
            child.Accept(this);
        }
    }
    
    //NOTE: We have to visit the named type's children even though
    //we don't care about them. 
    public override void VisitNamedType(INamedTypeSymbol symbol)
    {
        foreach(var child in symbol.GetMembers())
        {
            child.Accept(this);
        }
    }

    public override void VisitMethod(IMethodSymbol symbol)
    {
        Console.WriteLine(symbol);
    }
}

It’s important to be aware of how you must structure your code in order to visit all the symbols you’re interested in. By now you may have noticed that using this API directly makes me a little sad. If I’m interested in visiting method symbols, I don’t want to have to write code that visits namespaces and types.

Hopefully at some point we’ll get a SymbolWalker class that we can use to separate out our implementation from the traversal code. I’ve opened an issue on Roslyn requesting this feature. (It seems like it’s going to be challenging to implement and would require working with both syntax and symbols).

Finding All Named Type Symbols

Finally, you might be wondering how I answered my original question: How do we get a list of all of the types available to a compilation? My implementation is below:

public class CustomSymbolFinder
{
    public List<INamedTypeSymbol> GetAllSymbols(Compilation compilation)
    {
        var visitor = new FindAllSymbolsVisitor();
        visitor.Visit(compilation.GlobalNamespace);
        return visitor.AllTypeSymbols;
    }

    private class FindAllSymbolsVisitor : SymbolVisitor
    {
        public List<INamedTypeSymbol> AllTypeSymbols { get; } = new List<INamedTypeSymbol>();

        public override void VisitNamespace(INamespaceSymbol symbol)
        {
            Parallel.ForEach(symbol.GetMembers(), s => s.Accept(this));
        }

        public override void VisitNamedType(INamedTypeSymbol symbol)
        {
            //Namespaces are visited in parallel, so guard the shared list
            lock (AllTypeSymbols)
            {
                AllTypeSymbols.Add(symbol);
            }
            foreach (var childSymbol in symbol.GetTypeMembers())
            {
                base.Visit(childSymbol);
            }
        }
    }
}

I should note that after implementing this solution, I came to the conclusion that it was too slow for our purposes. We got a major performance boost by only visiting symbols within namespaces defined within source, but it was still about an order of magnitude slower than simply searching for types via the [SymbolFinder](http://source.roslyn.io/Microsoft.CodeAnalysis.Workspaces/P/fd299f73032d2f26.html#fd299f73032d2f26) class.

Still, the SymbolVisitor class is probably appropriate for one-off uses during compilation or for visiting a subset of available symbols. At the very least, it’s worth being aware of.

Learn Roslyn Now: Part 16 The Emit API

Up until now, we’ve mostly looked at how we can use Roslyn to analyze and manipulate source code. Now we’ll take a look at finishing the compilation process by emitting it to disk or to memory. To start, we’ll just try emitting a simple compilation to disk and checking whether or not it succeeded.

var tree = CSharpSyntaxTree.ParseText(@"
using System;
public class C
{
    public static void Main()
    {
        Console.WriteLine(""Hello World!"");
        Console.ReadLine();
    }   
}");

var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { mscorlib });

//Emitting to file is available through an extension method in the Microsoft.CodeAnalysis namespace
var emitResult = compilation.Emit("output.exe", "output.pdb");

//If our compilation failed, we can discover exactly why.
if(!emitResult.Success)
{
    foreach(var diagnostic in emitResult.Diagnostics)
    {
        Console.WriteLine(diagnostic.ToString());
    }
}

After running this code we can see that our executable and .pdb have been emitted to bin/Debug/. We can double click output.exe and see that our program runs as expected. Keep in mind that the .pdb file is optional. I’ve only chosen to emit it here to show off the API. Writing the .pdb file to disk can take a fairly long time and it often pays to omit this argument unless you really need it.

Sometimes we might not want to emit to disk. We might just want to compile the code, emit it to memory and then execute it from memory. Keep in mind that for most cases where we’d want to do this, the scripting API probably makes more sense to use. Still, it pays to know our options.

var tree = CSharpSyntaxTree.ParseText(@"
using System;
public class MyClass
{
    public static void Main()
    {
        Console.WriteLine(""Hello World!"");
        Console.ReadLine();
    }   
}");

var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree }, references: new[] { mscorlib });

//Emit to stream
var ms = new MemoryStream();
var emitResult = compilation.Emit(ms);

//Load into currently running assembly. Normally we'd probably
//want to do this in an AppDomain
var ourAssembly = Assembly.Load(ms.ToArray());
var type = ourAssembly.GetType("MyClass");

//Invokes our main method and writes "Hello World" 
type.InvokeMember("Main", BindingFlags.Default | BindingFlags.InvokeMethod, null, null, null);

Finally, what if we want to influence how our code is compiled? We might want to allow unsafe code, mark warnings as errors or delay sign the assembly. All of these options can be customized by passing a [CSharpCompilationOptions](http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/CSharpCompilationOptions.cs,ffc9d5ff7f13d4a1) object to [CSharpCompilation.Create()](http://source.roslyn.io/#Microsoft.CodeAnalysis.CSharp/Compilation/CSharpCompilation.cs,cb0be8b9d3027ce8). We’ll take a look at how we can interact with a few of these properties below.

var tree = CSharpSyntaxTree.ParseText(@"
using System;
public class MyClass
{
    public static void Main()
    {
        Console.WriteLine(""Hello World!"");
        Console.ReadLine();
    }   
}");

//We first have to choose what kind of output we're creating: DLL, .exe etc.
var options = new CSharpCompilationOptions(OutputKind.ConsoleApplication);
options = options.WithAllowUnsafe(true);                                //Allow unsafe code;
options = options.WithOptimizationLevel(OptimizationLevel.Release);     //Set optimization level
options = options.WithPlatform(Platform.X64);                           //Set platform

var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree },
    references: new[] { mscorlib },
    options: options);                                            //Pass options to compilation

In total there are about twenty-five different options available for customization. Basically any option you have within Visual Studio’s project property page should be available here.

Advanced options

There are a few optional parameters available in Compilation.Emit() that are worth discussing. Some of them I’m familiar with, but others I’ve never used.

  • xmlDocPath – Auto generates XML documentation based on the documentation comments present on your classes, methods, properties etc.
  • manifestResources – Allows you to manually embed resources such as strings and images within the emitted assembly. Batteries are not included with this API and it requires some heavy lifting if you want to embed .resx resources within your assembly. We’ll explore this overload in a future blog post.
  • win32ResourcesPath – Path of the file from which the compilation’s Win32 resources will be read (in RES format). Unfortunately I haven’t used this API yet and I’m not at all familiar with Win32 Resources.
  • There is also the option to EmitDifference between two compilations. I’m not familiar with this API, and I’m not familiar with how you can apply these deltas to existing assemblies on disk or in memory. I hope to learn more about this API in the coming months.
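
To make the documentation option a little more concrete, here is a minimal sketch that emits XML documentation alongside the assembly. It uses the stream-based overload of Emit (the xmlDocumentationStream parameter) rather than xmlDocPath, purely so the result can be inspected in memory. The Greeter class and the XmlDocEmitSketch wrapper are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper class, used only to keep the sketch self-contained.
public static class XmlDocEmitSketch
{
    public static string EmitWithXmlDocs()
    {
        var tree = CSharpSyntaxTree.ParseText(@"
/// <summary>Says hello.</summary>
public class Greeter
{
    /// <summary>Returns a greeting.</summary>
    public string Greet() { return ""Hello""; }
}");

        var mscorlib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
        var compilation = CSharpCompilation.Create("DocSample",
            syntaxTrees: new[] { tree },
            references: new[] { mscorlib },
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var peStream = new MemoryStream())
        using (var xmlStream = new MemoryStream())
        {
            //Ask Emit to write the documentation XML to a second stream
            var result = compilation.Emit(peStream, xmlDocumentationStream: xmlStream);
            if (!result.Success)
            {
                throw new InvalidOperationException("Emit failed; inspect result.Diagnostics.");
            }

            return Encoding.UTF8.GetString(xmlStream.ToArray());
        }
    }
}
```

The returned string should be the generated documentation file, with member entries built from the doc comments on Greeter and Greet. When emitting to disk, passing xmlDocPath to the file-based extension method plays the same role.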

That just about wraps up the Emit API. If you have any questions, feel free to ask them in the comments below.
