Anders Hejlsberg - Chief Architect at Microsoft
Anders Hejlsberg is the architect of the C# language and a Microsoft Distinguished Engineer. He joined Microsoft in 1996, following a thirteen-year career at Borland, where he was the chief architect of Delphi and Turbo Pascal.
I am a distinguished engineer at Microsoft and I am also the chief designer of the C# programming language, and then I sort of have a dotted line relationship with the base class library team in the .NET framework where I do some architect work as well. I came to Microsoft about eight years ago and before that I was at Borland for 13 years and wrote the core of Turbo Pascal originally and then later worked as the chief architect of the Delphi product line.
Thirteen years at one company in this industry is pretty much a lifetime and I was ready for some new challenges and to try something different. It has been a wonderful experience. It's a rare opportunity to be able to come to a big company like Microsoft and actually get to design a whole new programming experience, a whole new programming language. Microsoft put their weight behind it, and of course it has been wonderful.
I think you have to put in context in what timeframe this occurred. At the time when we started .NET, the way people programmed the Windows platform was COM and ActiveX, and DNA was the marketing thrust of all of that. It was not the most seamless programming experience at all times. There was a lot of debate in-house about how to make that experience better. Should we evolve COM or should we go to a new platform? There are always sort of the evolutionists and the revolutionists.
So I was definitely in the revolutionist camp, and after a while that sort of shook itself out and everybody agreed that no, we have to think bigger than just trying to jam classes into COM or whatever the idea "du jour" might have been. We also realized that to execute on this vision of increased productivity and web services and whatever, we needed a language that could appeal to the C, C++, Java developer crowd.
I think language is a lifestyle choice. I have actually been in this business long enough that I have become very agnostic when it comes to that. I don't try to persuade people that this language is better than the other. It is kind of like having a debate about whether French is better than English, it all depends on your point of view. So I certainly felt that something that could appeal to the C, C++, and Java crowd would be a very valuable addition to our toolset. So some of the core principles then, some of them we inherit basically from .NET. The notion of a type safe execution environment with garbage collection, and exceptions, and guaranteed index checks on arrays and what have you. All of those things that managed code brings you come from .NET, so to speak, but of course we wanted to be a language that fully envelops that and just sort of takes that for granted, if you will, as opposed to trying to pepper it on.
I think above that I always felt that it was important to have some conceptual simplicities in the language, and one of them that I think is fairly subtle, but I am really happy that we got in, is the unified type system. The notion that in C# you can just say everything is an object. You can start with that premise and then you can later delve into the sort of subtle differences between reference types and value types, but you can put anything in an object to begin with, and the mechanisms, like boxing for value types and so forth, are all sort of technically interesting. But it is the broad simplification or the deep simplification you get by starting there that matters, as opposed to, say, a language like Java, where you have to immediately up front say well, there are classes and everything is a class, except these eight guys over here that are specially blessed and only the language will do those.
TSS: Which in all fairness has caused Java programmers no end of angst.
Sure, and the sort of dichotomy between the built-in and the integer wrapping class, and so forth, and of course it looks like Java now is getting auto-boxing, but even so, after the fact it is hard to make it feel built in from the beginning, so that is one thing. I think the other thing is, again sort of a subtle concept, but the thing that we call components or component-oriented programming, the notion of the programming language giving first class treatment to all of the concepts that you use when you write components.
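A minimal sketch of the unified type system described above, assuming nothing beyond the standard `System` namespace. The class and method names (`UnifiedTypeDemo`, `Describe`) are hypothetical and chosen only for illustration. It shows a value type and a reference type both passing through a parameter of type `object`, with boxing and unboxing made explicit.

```csharp
using System;

static class UnifiedTypeDemo
{
    // Accepts any value at all, because every C# type converts to object.
    public static string Describe(object value)
    {
        return value.GetType().Name + ": " + value;
    }

    public static void Main()
    {
        int count = 42;
        object boxed = count;       // boxing: the int value is copied into an object
        int unboxed = (int)boxed;   // unboxing: an explicit cast copies the value back out

        Console.WriteLine(Describe(count));    // a value type, boxed at the call
        Console.WriteLine(Describe("hello"));  // a reference type, passed as is
        Console.WriteLine(unboxed + 1);
    }
}
```

The same `Describe` method serves both kinds of type, which is the simplification being claimed: there is no separate wrapper class to reach for.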
So anyone who has done any kind of development on a modern development platform like .NET or Java will be familiar with components that have properties, and methods, and events and attributes that describe how the components host themselves in the various environments and documentation that is built into the component and so forth. The interesting thing to note for example, if you compare us with Java or C++ is that, even though we all fully embrace the concepts of properties, methods and events, they don't actually all get first class treatment in the language. So Java has methods, but it does not have properties, you have a getter and a setter method and then you squint and it looks like a property.
But it is funny because in the language it is get blah and set blah, but when you are looking at property inspectors it is just blah. So there's that, and then events are emulated with interfaces and again you sort of got to squint, and metadata is done through beans and patterns, and if you don't know the patterns, you don't actually see that this is metadata. So we just sort of felt that, hey, it is time to give first class treatment to these concepts, build them into the programming language. Let's not pretend that they are not worthy, because we use them every day, there is value to be had. Yes, people can argue they are syntactic sugar, but everything in a programming language ultimately is syntactic sugar, so just as long as it is good sugar.
TSS: As long as it tastes good going down.
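A small sketch of what first class properties and events look like in C#. The `Thermostat` component, its `Target` property, and its `TargetChanged` event are hypothetical names invented for this example. The caller reads and writes `Target` as if it were a field, and subscribes to the event with `+=` rather than implementing a listener interface.

```csharp
using System;

class Thermostat
{
    private double target;

    // An event is a first class member; subscribers attach handlers with +=.
    public event EventHandler TargetChanged;

    // A property looks like a field to callers but runs accessor code.
    public double Target
    {
        get { return target; }
        set
        {
            if (value == target) return;   // unchanged value: nothing to announce
            target = value;
            if (TargetChanged != null) TargetChanged(this, EventArgs.Empty);
        }
    }
}

static class ComponentDemo
{
    public static void Main()
    {
        Thermostat t = new Thermostat();
        int notifications = 0;
        t.TargetChanged += delegate { notifications++; };

        t.Target = 21.5;   // property set, raises the event
        t.Target = 21.5;   // same value again, no event
        Console.WriteLine("notifications: " + notifications);
    }
}
```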
So those are some of the core things, I would say maybe if I had to add one more it would be the pretty heavy emphasis that we put on interoperability with the existing platform, giving programmers direct access to COM and DLL entry points and not just telling them that this is a complete rewrite strategy, you have got to start from scratch or whatever. We realize that Moore's law has already long since passed us by. We will never fill all this capacity. The only way we can do it is by leveraging existing systems.
When I started in this industry, 64K was all you had, right, and you've got to leave at least 32 of them over to the users, so really you only have 32K. Back then it was like, hey, how long does it take to write a program that takes up 32K? Well, it goes pretty quick, right, so even at 640K it coincides with the time that you would want to release the next product. So you could just start from scratch every time. You do it, do it a little bit better. Forget that. The demands on software, and what people demand from us, have just far passed us by and we have to leverage existing systems, and that means we have to take interoperability seriously.
Well, there is no doubt that there is heritage from Delphi. Delphi has properties, Delphi has events and so forth, and if I have to go do it again, I still think they are useful concepts. So I would put into the next one too, if there was a next one. I think you learn from what you've done. You learn from your past mistakes, you know what worked and what didn't, and you keep the stuff that worked, and then you get rid of the stuff that didn't. Those are definitely things that worked and I think they are some of the things that made Delphi a great product.
Well, I think generics are, in a nutshell, all about providing even stronger type checking. To the extent that we can type check stuff at compile time, we can get better software because the errors are found sooner and faster. We can get better performance, because things that we can guarantee are true about the code, do not need to be dynamically checked at runtime. So those are some of the good things. The other thing I would say is that you get more expressiveness in the type system.
In a sense, it is not a completely correct analogy, but imagine that you had methods that could not take parameters. They certainly would not be as useful as methods that you write; or methods that could only take parameters of type object if you will, as opposed to methods that can take true type parameters and so forth. It is really what parameters do for methods, is what type arguments and type parameters do for types themselves. The ability to not just create a list, but a list of something, and you can say what that something is later, for example list of order, list of customer, list of T and so forth.
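A minimal sketch of a type that takes a type parameter, in the sense of the analogy above. `SimpleStack<T>` is a hypothetical class written for this example, not a library type. The caller supplies `T` when using the type, and the compiler then checks every `Push` and `Pop` against it.

```csharp
using System;

// A hypothetical fixed-capacity stack; T is supplied by whoever uses the type.
class SimpleStack<T>
{
    private readonly T[] items;
    private int count;

    public SimpleStack(int capacity) { items = new T[capacity]; }

    public void Push(T item)
    {
        if (count == items.Length) throw new InvalidOperationException("stack is full");
        items[count++] = item;
    }

    public T Pop()
    {
        if (count == 0) throw new InvalidOperationException("stack is empty");
        return items[--count];
    }
}

static class GenericsDemo
{
    public static void Main()
    {
        SimpleStack<int> numbers = new SimpleStack<int>(4);
        numbers.Push(1);
        numbers.Push(2);
        // numbers.Push("three");   // would be rejected at compile time
        int top = numbers.Pop();    // no cast needed, the result is an int

        SimpleStack<string> words = new SimpleStack<string>(4);
        words.Push("hello");
        Console.WriteLine(top + " " + words.Pop());
    }
}
```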
So that is sort of the core value proposition of generics. It has been an interesting project by the way. I chalk it up as one of our really nice success stories in working with Microsoft Research. All of the hard work of generics was done by Don Syme and Andrew Kennedy from MSR Cambridge. And even some of the work in the actual shipping product, they not only did the research but actually did a lot of work too to move it into the runtime. So it has been phenomenal stuff and it is great to see that there is that sort of cross bleed from research into the product groups.
TSS: Research guys are actually doing something concrete.
Yes, definitely. We are seeing some returns on that.
TSS: Excellent. That's good to hear. Awesome.
I think there are, with anything, first tier effects, and those are the strongly typed collections that you talk about there. And definitely those are the ones that people in .NET will immediately see as being most beneficial: list of T, dictionary of K comma V, and stack and queue and so forth. I think going further out you will see some second order effects that are somewhat more subtle, but one of the things that is really interesting about generics is that it allows us to be much more faithful in our representation of complicated object graphs.
Let's take sort of a business example. Let's say you have customers and orders and details, right. If you model that with objects today, it is easy to have a customer, and it is easy for the customer to have strongly typed properties and so forth, but then the customer also wants to have a collection of orders. The question is how do you model that? Well, you could model it as an order array and now you preserve type. Now the compiler can look at it and follow the link and know that here is a one-to-many and so forth, right. But an array can't grow itself and does not have all the convenience methods, like Find(), and Sort(), and blah blah blah or Add(), that you would want on a collection like that.
So, the alternative you then have is you can put an array list in there, but now you lose the type. Or you can roll your own manually where you follow a pattern manually, but the compiler can't see through this pattern so to speak. Do you know what I mean? It is not like it is actually a part of the type system. So with generics we actually get the ability to faithfully model these complicated relationships, and once you do that then you can start talking about, gosh, now that the compiler can understand this whole web of objects, well, could we perhaps have set operations or queries or whatever that are strongly typed over all of the stuff, because now the compiler actually knows what is going on. Before it was kind of like you could not see the forest for the trees, but now, because the type system is richer, it opens up a whole sort of new avenue of interesting things that we can do. Some of it relates in some ways actually even to functional programming and whatever, and there is all sorts of more type inferencing that we can do, and these are the things that we are starting to think about and I think will be pretty fruitful going forward.
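The customer and orders example can be sketched as follows. The `Customer` and `Order` classes and their members are hypothetical, invented for illustration. The point is that `List<Order>` keeps the element type visible to the compiler while still offering `Add` and `Find`, which a plain array lacks and an `ArrayList` offers only by giving up the type.

```csharp
using System;
using System.Collections.Generic;

class Order
{
    public string Product;
    public decimal Total;
    public Order(string product, decimal total) { Product = product; Total = total; }
}

class Customer
{
    public string Name;
    // The one-to-many link is visible in the type: a growable list of Order.
    public List<Order> Orders = new List<Order>();
    public Customer(string name) { Name = name; }
}

static class ObjectGraphDemo
{
    public static void Main()
    {
        Customer c = new Customer("Contoso");
        c.Orders.Add(new Order("Widget", 20m));
        c.Orders.Add(new Order("Gadget", 150m));

        // Find takes a Predicate<Order>; no casts are needed inside it.
        Order big = c.Orders.Find(delegate(Order o) { return o.Total > 100m; });
        Console.WriteLine(big.Product);
    }
}
```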
It is interesting, because I see people talk about how types are overblown, and boy, you could do everything without types and just have untyped languages and unit tests and that is good enough. I don't really buy that argument. Let's take one example. One of the great things that types buy you is IntelliSense statement completion. Because we have a strongly typed language, because every variable and every expression has a type, we know what the members of a particular type are, and we can show them in a list and filter them for you and we can give you help and guidance because of type. If we had no type, I mean, we could sometimes guess, but sometimes we would get it wrong.
Sometimes you can. If you just assign to the variable on the line before, well, then you can sort of guess what it is. If you pass that as a parameter, you have no idea what this incoming var x is. So sorry, buddy, on your parameters you don't get any help. If I have to say, one of the great innovations certainly in programmer productivity tools over the last decade has been statement completion. It is just unthinkable that you would want to program without it today, and honestly that is the reality you face in a typeless language, but that is just one thing.
There is also the ability to find errors up front that otherwise you'd have to wait to catch at runtime. There is the ability to provide better documentation. There is the ability to do all sorts of reflection tools, and FxCop, this new thing that we have, this tool that checks whether everything is coded according to conventions; without types that would be hard to do. So I think strong typing is a good thing. Now that said, strong typing is like security, it is a dial. If you dial it up too hard, the system becomes unusable, so you've got to find just the right setting somewhere in the middle, where there is still flexibility, and when you want to do dynamic stuff like typecast and treat things as objects, you have to have that ability too. So it is a balancing act. But I think just taking an extreme is a trap, because extremes are always interesting but I do not think they are particularly truthful.
Absolutely. I mean, I think people do it every day. They may not know that they do it, but they rely on components that were written by someone else in some other language, and the notion that our platform supports multiple languages, I will staunchly defend this being a good and important thing for a variety of reasons. Like I said, first of all language is a lifestyle choice and some people just think better in one language than another, and this is not just about VB or C#. It could be, I like these loosely typed languages and they give me capabilities that otherwise I wouldn't have had. But I think also a multitude of languages is how progress gets made, that is how innovation happens. There is healthy competition between the various languages and something happens over there and you can pull it in over here. If there is only one language then, well, is that it, are we done, can we all go home? I do not think so.
Many. I come from, of course, a very strong Pascal background. So that essentially is something that is burned into my backbone at this point. So anything in that family tree, Algol, Pascal, Modula, Oberon and so forth. Then there is the C family of languages, so C, C++, Java. Then there is the object orientation that we've gotten from Smalltalk, and now lots of interesting things happening in functional languages like LISP and Haskell and ML, and there is stuff in JScript that is interesting. There are so many languages that it is almost impossible to just say that there were a few. It's the multitude of all of them.
Yes the project that Don Syme has been looking at.
F# at this point is a research project that Don is looking at. It's really just sort of trying to take things from functional programming and imperative programming and put them together in a new way is really what it boils down to. Ultimately, it's interesting but at this point I would not start writing products in F# or whatever, that is not what it's about. It's more about like experimenting with new things that you want to do in the type system and in programming languages. But certainly, I would urge anyone who is interested to go look.
I don't think anyone was dictating to anybody. We worked, of course, very closely together. I was involved in a lot of the design of the type system, and certainly, all of the unification of the type system. I was a strong champion for that on both sides of the table, be that in the runtime or in C# itself. Yeah, I mean C# is in the enviable position of not having, or at least was, not having any backwards compatibility to worry about and that makes a lot of things a lot easier. I will say in 2.0 we have bent over backwards to remain 100% backwards compatible. There are no more keywords, you can compile all your existing code without modifying anything. I continue to think that that is very important. I don't want to have to run an upgrade wizard or whatever in order to use the new features, it should just be seamless.
Because if you have an identifier by the same name as that keyword, that identifier is now an error. So let's say that we had made yield a keyword. Well, it is not actually a keyword. If you look at the way we designed it, yield is used by iterators in C# and there are two new statements, yield return and yield break. But the yield word in yield return is actually not a keyword, it is the identifier yield coming immediately before the keyword return that constitutes yield return, because that was never a valid thing to say before. But you can still have, in your financial application, a parameter called double yield, that is the yield of your investments or whatever, and you will not break that code. It is very subtle, but you know, it is the real world, right? If we tell people, hey, upgrade to C# 2.0 today, and bam, the first thing that happens is all their code breaks, it is just not a happy experience. You really have to make sure that the barrier to entry is nonexistent.
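A short sketch of the compatibility point just made. The method names (`Interest`, `CountUpTo`) are hypothetical. The first method uses `yield` as an ordinary parameter name, as older financial code might; the second is an iterator using `yield return` and `yield break`. Both compile side by side because `yield` only has special meaning directly before `return` or `break`.

```csharp
using System;
using System.Collections.Generic;

static class YieldDemo
{
    // "yield" as an ordinary parameter name, as in code written before iterators existed.
    public static double Interest(double principal, double yield)
    {
        return principal * yield;
    }

    // An iterator: here "yield return" and "yield break" are statements.
    public static IEnumerable<int> CountUpTo(int limit)
    {
        for (int i = 1; ; i++)
        {
            if (i > limit) yield break;   // ends the sequence
            yield return i;               // produces the next element
        }
    }

    public static void Main()
    {
        Console.WriteLine(Interest(1000.0, 0.25));
        foreach (int n in CountUpTo(3))
        {
            Console.WriteLine(n);
        }
    }
}
```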
Yes, certainly generics is the big thing. But there are lots of other new features. For example, anonymous methods, iterators, partial types, nullable types, which we talked about here at TechEd for the first time, and then a bunch of smaller features such as static classes, extern aliases, namespace alias qualifiers, pragma warning, etc. And just in the compiler implementation itself, better errors and warnings, and performance improvements and so forth. So anonymous methods I think is an interesting extension to the language that allows you to write delegates directly in place. So, instead of having to new a delegate to a method, you just get to write a block of code right there and pass it, almost like you pass code as parameters this way. In fact you can almost create your own statements this way because you can create closures. These are true closures just like in LISP and functional languages.
It's similar and different. Actually, I would say it is mostly different. So anonymous classes differ in that you are not creating a closure on a single method, but you are creating a class instance. Now, the methods that you implement in an anonymous class can refer to local variables of the enclosing scope, but those local variables must be declared final in Java. So you can have closures, but they only capture the R value of the local variables of the enclosing scope. We capture the L value. Meaning that you don't have to make these variables final and you can actually modify them. You are indeed capturing that location and referring to the same memory cell, if you will, both as the local variable and within the delegate, and so you can use these as a communication device, you know, between the outside code and the inside code or between two separate delegates, which opens up some very interesting avenues in programming.
We've felt that it was important to capture L value instead of R value because we are after all an imperative language and imperative languages are sort of primarily characterized by the fact that you can modify values, as opposed to functional languages, that tend to be purely functional, you can assign once to a variable, and that's it. Whenever we have rules like that in imperative languages they tend to sort of conflict with people's intuitive expectation of what a variable is so we felt that it was important to do it the other way. So there are some big differences between anonymous classes and delegates.
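A minimal sketch of capturing the variable itself rather than a copy of its value. The delegate type `Counter` and the class `ClosureDemo` are hypothetical names for this example. Two anonymous methods and the enclosing code all read and write the same local variable, so it acts as the communication device described above.

```csharp
using System;

delegate int Counter();

static class ClosureDemo
{
    public static int Run()
    {
        int total = 0;   // an ordinary local, not final or read-only

        // Both anonymous methods capture the variable, not its current value.
        Counter increment = delegate { total++; return total; };
        Counter read = delegate { return total; };

        increment();
        increment();
        total += 10;     // the enclosing code writes to the same variable

        return read();   // observes every one of the writes above
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints 12
    }
}
```

Had the delegates captured only the value at creation time, `read` would still see 0.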
I think one of the problems that is starting to interest me a lot is the impedance mismatch that we have today between databases and general purpose programming languages. Whenever I talk to a group of programmers, be they C# or even VB programmers, it doesn't matter, I had my cabana session yesterday, I ask, how many people here write code that accesses a database? A sea of hands goes up, a guy in the back yells, who doesn't? Everybody talks to a database, and that really means that in order to successfully write an application you have to learn two languages. Your C# or VB or whatever, your mid tier or your mainstream or general purpose programming language, and then you have to learn your database programming language, and that is typically SQL, some variant, be that T-SQL or PL/SQL or whatever.
The impedance mismatches between these two worlds are just phenomenal, astounding. I mean, it is relational versus objects. It's queries and set operations versus for loops and arrays and so forth. It's nullable types versus not, and the stuff that we go through today to successfully write these apps sometimes I feel makes us feel like plumbers, right. We're just moving stuff from A to B and getting rid of the nullability and converting it to that, and stuffing it in here so I can take it out of there, and again, you are doing all the housekeeping.
One of the things that made .NET incredibly successful was the fact that we took all of that housekeeping and put it in the platform, garbage collection, type safety, exception handling, whatever, all these things that programmers just get wrong if they have to do it manually. Put them in the platform, just allow you to think about the algorithms. I am like trying to shift my focus to that space and try to do some of those same things. Truly integrate the database with the programming language. Of course, that is a very broad vision and many people lie on the rocks of unsuccessful database integration, and I am hoping not to be one of them, but I think we have some interesting thoughts about it.
I think the generic stuff is very cool. If I have to talk about sort of my favorite feature in C# 1.0, I think it is actually the beauty and simplification we get from the unified type system. I always was really, really happy with the way that came out, and that just simple beauty that you get from being able to represent everything as an object, and if you want to have a method that takes any number of arguments of any type, params object array, and there it is. You can write your type safe printf() there and whatever; there is just something pretty about the way that worked out. I think experience bears it out too, because you always know when a feature is not successful: you always get questions about it, and people not understanding how this works and whatever. This is one of those things where the silence is astounding, which to me is a great sign of success, because it means people are using it every day and are not having an issue with it, so I like that feature.
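A small sketch of the "params object array" idea mentioned above. `ParamsDemo.Join` is a hypothetical helper written for this example. Because every type converts to `object`, a single `params object[]` parameter accepts any number of arguments of any type, which is also how the library's own `string.Format` takes its arguments.

```csharp
using System;
using System.Text;

static class ParamsDemo
{
    // Accepts any number of arguments of any type; value types are boxed on the way in.
    public static string Join(string separator, params object[] args)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.Length; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(args[i]);   // Append(object) uses the argument's ToString()
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Join(", ", 1, "two", true));
        Console.WriteLine(string.Format("{0} has {1} items", "cart", 3));
    }
}
```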
I've got at least another five.
TSS: Do you expect to be working on C# for the next decade?
It is so hard to say. All I can say is right now I am having lots of fun. I just talked about how there is this big challenge that we are like trying to look at with databases and general purpose programming languages. That should keep me busy for a good long while. I am having a good time, so I have no desire to do anything else whatsoever.
Sure, I think that when I talk about integrating databases, and whatever, you sort of have to see broadly there. XML in a sense is also data, right, so I am talking about integrating data, if you will, richer data support in the language. So XML would clearly of course be a part of that. Now, the thing that is interesting is there is a lot of research being done about XML integration and programming languages, and I think it is very important to understand that there are sort of different kinds of XML and there are different times when this pays off, and when it does not.
For example, if you look at the E4X stuff for JavaScript, the ability to write XML literals, I don't know if you looked at that specification. So there is some interesting stuff there, that you can write XML literals inside your program and sort of escape yourself out of XML and back in, write a little expression, and it's sort of XQuery a little bit, but not quite. It's very interesting, and it really only works in a typeless language, and we are a strongly typed language, and once you are a typed language, knowing things about the type of XML, first of all, is hard, because XSD is a very, very complicated type system that unfortunately exists only in XSD.
My big issue with XSD has always been that people write their applications today with objects and classes and whatever, be that in C# or Java or VB, or whatever, and then they want to transport them on the wire, and ideally, we would want to have some sort of type system that at least is fairly close to what an object is so we can meaningfully map it to XML, and move it over, and then map it back. Unfortunately, XSD has a zillion other features that no one ever asked for. It is incredibly complicated to implement, and just the notion of jamming XSD as a type system into C# would be the death of C#. So now you are faced with this unfortunate situation that you have a strongly typed language where really the core premise is that you've got to have types in order to deliver value, yet you have a type system that is just quite unlike anything ever seen in a programming language. That makes that particular integration very hard. So I think, you know, that is not to say that we are not looking at this problem, and I definitely think we are making progress, but it is a complicated feature area.
It is not something that we are actively looking at implementing at this point, so we are sort of, if you will, in wait and see mode. I have yet to see sort of large scale projects succeed with that development methodology. I see how aspects are very useful for logging, debugging, monitoring, whatever, the prototypical examples that are always mentioned with aspects. I see how it's useful for that, but as a general purpose programming discipline, I still have some grave concerns about sort of aspects taking away your ability to reason about your own code, because people can inject stuff practically anywhere in your code and it can even have side effects that might make it very, very hard for you to reason about what your program does. That does concern me and I am sort of waiting for that to play out a little bit.
TSS: So if tomorrow you see a successful aspect system going to production?
I would definitely be interested in taking a closer look at what it is that works and doesn't and so forth.
TSS: Can I give Gregor Kiczales your phone number?
Sure, I think he has my e-mail already.
TSS: Thank you very much Anders for your time and good luck for the future.
Absolutely, thanks.