Knowledge Conclusion - PART III: Entity framework & SQL Server

/*****by Jiangong SUN******/


Update: 16/05/2013, 17/05/2013, 28/06/2013, 15/07/2013


PART III: Entity framework


1) What is the difference between edmx and dbml?
Answer:

edmx (Entity Data Model XML): the model file used by LINQ to Entities (Entity Framework)
dbml (Database Markup Language): the model file used by LINQ to SQL

2) What is the composition of an edmx file?
Answer:
An edmx file is an XML file composed of a conceptual model, a storage model, and the mapping information between the two models.

CSDL: conceptual schema definition language
SSDL: storage schema definition language
MSL: mapping specification language

Structure of an edmx file:
1) EF runtime content
   1.1 SSDL content
   1.2 CSDL content
   1.3 C-S mapping content
2) EF designer content (DO NOT EDIT MANUALLY)
   2.1 Connection
   2.2 Options
   2.3 Diagrams (shape and connector positions)


3) What is the difference between an .SDF and an .MDF file?

Answer:

An .sdf file is a SQL Server Compact database. It is popular on Windows Phone for storing small amounts of data, and it can also be used in web sites, but its performance is poor; even SQLite performs better.

An .mdf file is the primary data file of a full SQL Server edition. It is used in large sites with a lot of data. It is more powerful, but it cannot be used on phones and requires SQL Server to be installed on a server.

SQL Server Compact 4.0:
- Individual database file size is limited to 4 GB.
- Runs in-process (embedded) with an application.
- No FILESTREAM support.
- Limited to 256 concurrent connections.

SQL Server Express 2008 R2:
- Individual database file size is limited to 10 GB.
- Can be used as a standalone database, except in the user instance scenario.
- FILESTREAM and CLR support.
- No limit on the number of simultaneous connections.


4) What is a stored procedure?

Answer:

A stored procedure is a named block of T-SQL code stored in SQL Server, which compiles it into an execution plan that can be reused on later calls.

Performance: reduce network traffic and enhance execution plan re-use

Stored procedures can be used to reduce network traffic. You only have to send the EXECUTE stored_proc_name statement over the wire instead of a whole T-SQL routine, which can be pretty extensive for complex operations.

Stored procedures allow you to enhance execution plan re-use, and thereby improve performance, by using remote procedure calls (RPCs) to process the stored procedure on the server. When you use a SqlCommand.CommandType of StoredProcedure, the stored procedure is executed via RPC. The way RPC marshals parameters and calls the procedure on the server side makes it easier for the engine to find the matching execution plan and simply plug in the updated parameter values.

Maintainability: provide a single point of maintenance

In a perfect world, your database schema would never change and your business rules would never get modified, but in the real world these things happen. That being the case, it may be easier for you if you can modify a stored procedure to include data from the new X, Y, and Z tables that have been added to support that new sales initiative, instead of changing that information somewhere in your application code.

Security: 

In terms of regulating user access to information, stored procedures can provide access to specific data by granting users permissions on the stored procedure, but not on the underlying tables. You can think of stored procedures as similar to SQL Server views (if you are familiar with those), except that the stored procedure accepts input from the user to dynamically change the data displayed.

The cached execution plan used to give stored procedures a performance advantage over queries. However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure. Therefore, performance based on this feature is no longer a selling point for stored procedures.
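
The points above can be sketched in T-SQL as follows. This is only an illustration under assumptions: dbo.Orders, dbo.Order_GetByCustomer, and ReportingRole are hypothetical names that do not come from the referenced article.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1,1) PRIMARY KEY,
    CustomerID int NOT NULL,
    OrderDate  datetime NOT NULL DEFAULT GETDATE()
);
GO

-- The query is parameterized, so one cached plan can serve every customer.
CREATE PROCEDURE dbo.Order_GetByCustomer
    @CustomerID int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;
GO

-- Hypothetical role: members may run the procedure
-- but receive no permission on dbo.Orders itself.
CREATE ROLE ReportingRole;
GRANT EXECUTE ON OBJECT::dbo.Order_GetByCustomer TO ReportingRole;
GO

-- Only this short call crosses the network.
EXECUTE dbo.Order_GetByCustomer @CustomerID = 42;
```

If the query later has to read from additional tables, only the procedure body changes; callers keep issuing the same EXECUTE statement.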

reference:

http://msdn.microsoft.com/en-us/library/ms973918.aspx


5) How to create a trigger?

Answer:

A trigger can be declared as FOR, AFTER, or INSTEAD OF an INSERT, UPDATE, or DELETE operation (FOR is a synonym for AFTER).

SQL Server does not support BEFORE triggers.

Syntax:

CREATE TRIGGER trigger_name
ON { table | view }
[ WITH ENCRYPTION ]
{
    { { FOR | AFTER | INSTEAD OF } { [ INSERT ] [ , ] [ UPDATE ] [ , ] [ DELETE ] }
        [ WITH APPEND ]
        [ NOT FOR REPLICATION ]
        AS
        [ { IF UPDATE ( column )
            [ { AND | OR } UPDATE ( column ) ]
                [ ...n ]
        | IF ( COLUMNS_UPDATED ( ) { bitwise_operator } updated_bitmask )
                { comparison_operator } column_bitmask [ ...n ]
        } ]
        sql_statement [ ...n ]
    }
}

Disadvantages (problems) of triggers:
- It is easy to view table relationships, constraints, indexes, and stored procedures in a database, but triggers are difficult to view.
- Triggers execute invisibly to the client application. They are not visible and cannot be traced when debugging application code.
- It is hard to follow their logic, as they can fire before or after the database insert/update happens (in SQL Server: instead of or after it).
- It is easy to forget about triggers, and without documentation it is difficult for new developers to find out that they exist.
- Triggers run every time the database fields are updated, which is an overhead on the system and makes it run slower.

Reference:

http://blog.sqlauthority.com/2007/05/24/sql-server-disadvantages-problems-of-triggers/


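Here is an example:

-- Use ALTER TRIGGER instead of CREATE TRIGGER to change an existing trigger.
CREATE TRIGGER dbo.TriggerName ON dbo.TableName
AFTER INSERT
AS BEGIN
   SET NOCOUNT ON;
   -- INSERTED can hold several rows, so join to it
   -- rather than reading a single ID into a variable.
   UPDATE t
   SET t.CreateDate = GETDATE()
   FROM dbo.TableName AS t
   INNER JOIN INSERTED AS i ON i.ColumnID = t.ColumnID;
END;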

reference: 

http://stackoverflow.com/questions/11131540/sql-server-create-triggers-on-insert-and-update


6) What is a view?

Answer:

A view is a "virtual table" defined by a stored query.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if it were coming from one single table.

Simple views are expanded in place when a query references them, so they do not by themselves improve performance.

However, indexed views can dramatically improve performance.

Example: 

CREATE VIEW viewName AS
SELECT columns
FROM tables
WHERE conditions
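
A concrete version of the template above might look like the following sketch. The tables dbo.Customers and dbo.CustomerOrders and the view dbo.v_CustomerTotals are hypothetical names introduced only for this example.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Customers (
    CustomerID int PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);
CREATE TABLE dbo.CustomerOrders (
    OrderID    int PRIMARY KEY,
    CustomerID int NOT NULL,
    Amount     money NOT NULL
);
GO

-- CREATE VIEW must be the first statement in its batch, hence the GO above.
CREATE VIEW dbo.v_CustomerTotals AS
SELECT c.CustomerID, c.Name, SUM(o.Amount) AS TotalAmount
FROM dbo.Customers AS c
INNER JOIN dbo.CustomerOrders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.Name;
GO

-- Callers query the view as if it were a single table.
SELECT Name, TotalAmount
FROM dbo.v_CustomerTotals
WHERE TotalAmount > 1000;
```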

reference:

http://stackoverflow.com/questions/439056/is-a-view-faster-than-a-simple-query



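7) What is an index?

Answer:

An index is a structure built on one or more columns of a table that lets SQL Server locate rows without reading the whole table. A unique index additionally guarantees that the indexed values are not duplicated.

Multicolumn unique indexes guarantee that each combination of values in the index key is unique. For example, if a unique index is created on a combination of LastName, FirstName, and MiddleName columns, no two rows in the table could have the same combination of values for these columns.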

Example:

CREATE UNIQUE INDEX indexName
ON table(columns)
WITH (IGNORE_DUP_KEY = OFF);


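8) What is an indexed view?

Answer:

An indexed view is a view that has a unique clustered index. The view must be created WITH SCHEMABINDING, and its result set is then stored in the database like a table.

CREATE TABLE wide_tbl(a int PRIMARY KEY, b int, ..., z int)
GO
-- CREATE VIEW has to start its own batch.
CREATE VIEW v_abc WITH SCHEMABINDING AS
SELECT a, b, c
FROM dbo.wide_tbl
WHERE a BETWEEN 0 AND 1000
GO
CREATE UNIQUE CLUSTERED INDEX i_abc ON v_abc(a)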

9) What is the difference between a clustered index and a non-clustered index?

Answer:

A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL Server sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book. Since it alters the physical storage of the table, only one clustered index can be created per table. In the example below, the rows of the nics table are sorted by computer_id, since a clustered index has been created on that column.

CREATE CLUSTERED INDEX [IX_CLUSTERED_COMPUTER_ID] 
ON [dbo].[nics] ([computer_id] ASC)

A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a separate structure that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like the index in the last pages of a book, where keywords are sorted and carry the page number of the material for faster reference. A non-clustered index on computer_id for the previous example is created like this:

CREATE NONCLUSTERED INDEX [IX_NONCLUSTERED_COMPUTER_ID] 
ON [dbo].[nics] ([computer_id] ASC)

A table without a clustered index is called a “heap table”. A heap has no sorted data, so unless a suitable non-clustered index exists, SQL Server has to read the entire table to locate the data, in a process called a “scan”.

Clustered index vs non-clustered index:

Clustered index
- Only one per table
- Faster to read than a non-clustered index, as the data is physically stored in index order

Non-clustered index
- Many can be created per table
- Quicker for insert and update operations than a clustered index

reference: 

http://www.itbully.com/articles/sql-indexing-and-performance-part-2-clustered-and-non-clustered

http://stackoverflow.com/questions/91688/what-are-the-differences-between-a-clustered-and-a-non-clustered-index


10) What are the query optimizer and the query processor?

Answer:

At the core of the SQL Server Database Engine are two major components: the Storage Engine and the Query Processor, also called the Relational Engine. The Storage Engine is responsible for reading data between the disk and memory in a manner that optimizes concurrency while maintaining data integrity. The Query Processor, as the name suggests, accepts all queries submitted to SQL Server, devises a plan for their optimal execution, and then executes the plan and delivers the required results.

Query processor's job:

For each query SQL Server receives, the first job of the query processor is to devise a plan, as quickly as possible, which describes the best possible way to execute said query (or, at the very least, an efficient way). Its second job is to execute the query according to that plan.

Each of these tasks is delegated to a separate component within the query processor; the Query Optimizer devises the plan and then passes it along to the Execution Engine, which will actually execute the plan and get the results from the database.

Query processor's working steps:

SQL Statement ----> Parsing ----> Binding ----> Query Optimization ----> Query Execution ----> Query Results


Parsing makes sure that the T-SQL query has a valid syntax, and translates the SQL query into an initial tree representation: specifically, a tree of logical operators representing the high-level steps required to execute the query in question.

Binding is mostly concerned with name resolution. During the binding operation, SQL Server makes sure that all the object names do exist, and associates every table and column name on the parse tree with their corresponding object in the system catalog. The output of this second process is called an algebrized tree, which is then sent to the Query Optimizer.

The next step is the optimization process, which is basically the generation of candidate execution plans and the selection of the best of these plans according to their cost. SQL Server uses a cost-based optimizer, with a cost estimation model to estimate the cost of each of the candidate plans.
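
One way to look at the plan the optimizer selects, without executing the query, is SET SHOWPLAN_XML. The sketch below assumes the AdventureWorks sample database used elsewhere in these notes is available; the query itself is arbitrary.

```sql
USE AdventureWorks;
GO
-- Ask for the estimated plan instead of executing the following statements.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
SELECT ProductID, Name
FROM Production.Product
WHERE ProductID = 980;
GO
-- Return to normal execution.
SET SHOWPLAN_XML OFF;
GO
```

The SELECT returns an XML document that describes the chosen operators and their estimated costs rather than the product row.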


reference: 

https://www.simple-talk.com/sql/sql-training/the-sql-server-query-optimizer/

http://www.itbully.com/articles/sql-indexing-and-performance-part-3-queries-indexes-and-query-optimizer


11) How to optimize SQL Server's performance?

Answer:

1> Apply proper indexing to the table columns in the database

2> Move T-SQL code from the application into the database server

2.1 Move T-SQL to stored procedures, views, functions and triggers

2.2 Use T-SQL best practices

Don't use "SELECT *" in a SQL query
Avoid unnecessary columns in the SELECT list and unnecessary tables in join conditions
Do not use the COUNT() aggregate in a subquery to do an existence check
Try to avoid joining columns of two different data types
Try to avoid deadlocks
Write T-SQL using a "set-based approach" rather than a "procedural approach" (avoid cursors over large result sets)
Try not to use COUNT(*) to obtain the record count in a table
Try to avoid dynamic SQL
Try to avoid the use of temporary tables
Instead of LIKE search, use full-text search for searching textual data
Try to use UNION to implement an "OR" operation
Implement a lazy loading strategy for large objects
Follow good practices in stored procedures (don't name them sp_XXX, which can conflict with system procedures and other applications; try [App]_[Object]_[Action][Process])
Follow good practices in triggers (avoid triggers where possible, they are costly)
Use views to re-use complex T-SQL blocks, and consider turning them into indexed views

3> Diagnose performance problems, and use SQL Profiler and the Performance Monitoring Tool effectively
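
Two of the practices listed above (existence checks and the set-based approach) can be sketched as follows. The tables dbo.Clients and dbo.Invoices are hypothetical and exist only to make the example self-contained.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Clients (
    ClientID int PRIMARY KEY,
    Name     nvarchar(100) NOT NULL
);
CREATE TABLE dbo.Invoices (
    InvoiceID int PRIMARY KEY,
    ClientID  int NOT NULL,
    Amount    money NOT NULL
);
GO

-- Existence check: EXISTS lets the engine stop at the first matching row,
-- whereas comparing a COUNT() subquery to 0 asks for all matches to be counted.
-- An explicit column list is used instead of SELECT *.
SELECT c.ClientID, c.Name
FROM dbo.Clients AS c
WHERE EXISTS (SELECT 1
              FROM dbo.Invoices AS i
              WHERE i.ClientID = c.ClientID);

-- Set-based approach: one UPDATE touches every qualifying row,
-- instead of a cursor that loops over the rows one by one.
UPDATE i
SET i.Amount = i.Amount * 0.9
FROM dbo.Invoices AS i
INNER JOIN dbo.Clients AS c ON c.ClientID = i.ClientID
WHERE c.Name LIKE N'A%';
```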



reference: 

http://www.codeproject.com/Articles/34372/Top-10-steps-to-optimize-data-access-in-SQL-Server


12) Transaction?

Answer:

USE AdventureWorks;
GO
BEGIN TRANSACTION;

BEGIN TRY
    -- Generate a constraint violation error.
    DELETE FROM Production.Product
    WHERE ProductID = 980;
END TRY
BEGIN CATCH
    SELECT 
        ERROR_NUMBER() AS ErrorNumber
        ,ERROR_SEVERITY() AS ErrorSeverity
        ,ERROR_STATE() AS ErrorState
        ,ERROR_PROCEDURE() AS ErrorProcedure
        ,ERROR_LINE() AS ErrorLine
        ,ERROR_MESSAGE() AS ErrorMessage;
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;


IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
GO

13) Full text search 

Answer:

Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Before you can run full-text queries on a table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and FILESTREAM. Each full-text index indexes one or more columns from the table, and each column can use a specific language.

Steps to create a full-text index:

- Create a Full-Text Catalog
- Create a Full-Text Index
- Populate the Index


Full-text search:

FREETEXT( ) is a predicate used to search columns containing character-based data types. It matches the meaning of the words in the search condition rather than only their exact wording. When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches:

- Separates the string into individual words based on word boundaries (word-breaking).
- Generates inflectional forms of the words (stemming).
- Identifies a list of expansions or replacements for the terms based on matches in the thesaurus.

CONTAINS( ) is similar to FREETEXT, with the difference that it takes one keyword to match against the records. To combine several words in the search, they must be joined with “AND” or “OR”; otherwise the query throws an error.
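
The three steps can be sketched in T-SQL as below. This assumes the AdventureWorks2008 sample database; ft_catalog is a made-up catalog name, and PK_Employee_BusinessEntityID is assumed to be the name of the primary key index on HumanResources.Employee in that sample, so check it in your copy before running.

```sql
USE AdventureWorks2008;
GO
-- Step 1: create a full-text catalog (ft_catalog is a made-up name).
CREATE FULLTEXT CATALOG ft_catalog;
GO
-- Step 2: create the full-text index on a character column.
-- KEY INDEX must name a unique, single-column, non-nullable index
-- (the index name used here is an assumption about the sample database).
CREATE FULLTEXT INDEX ON HumanResources.Employee (JobTitle)
    KEY INDEX PK_Employee_BusinessEntityID
    ON ft_catalog
    WITH (CHANGE_TRACKING = OFF, NO POPULATION);
GO
-- Step 3: populate the index explicitly.
ALTER FULLTEXT INDEX ON HumanResources.Employee START FULL POPULATION;
GO
```

Creating the index without the NO POPULATION option starts a full population automatically, so step 3 is only a separate statement when population is deferred as it is here.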

USE AdventureWorks2008
GO

SELECT BusinessEntityID, JobTitle
FROM HumanResources.Employee
WHERE FREETEXT(*, 'Marketing Assistant');

SELECT BusinessEntityID,JobTitle
FROM HumanResources.Employee
WHERE CONTAINS(JobTitle, 'Marketing OR Assistant');

SELECT BusinessEntityID,JobTitle
FROM HumanResources.Employee
WHERE CONTAINS(JobTitle, 'Marketing AND Assistant');
GO

Reference:

http://blog.sqlauthority.com/2008/09/05/sql-server-creating-full-text-catalog-and-index/


14) SQL practices

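1> There are 3 tables: Employee, WorkAt, Company. I need to get the employees who worked at at least 2 companies.

-- WorkAt.CompanyID is an assumed column name for the link to Company.
SELECT e.ID, e.FirstName, e.LastName, COUNT(DISTINCT wa.CompanyID) AS occurrence
FROM Employee e
INNER JOIN WorkAt wa ON e.ID = wa.EmployeeID
WHERE wa.StartDate BETWEEN '2000-01-01' AND '2013-05-17'
GROUP BY e.ID, e.FirstName, e.LastName
-- T-SQL does not allow the column alias in HAVING, so the aggregate is repeated.
HAVING COUNT(DISTINCT wa.CompanyID) >= 2;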

2> Get all differences between two tables (rows whose key C exists in only one of them)

SELECT A.*, B.*
FROM A
FULL JOIN B ON (A.C = B.C)
WHERE A.C IS NULL OR B.C IS NULL;

3> Get rows that exist in A but not in B

SELECT A.*
FROM A
LEFT JOIN B ON (A.C = B.C)
WHERE B.C IS NULL;

15) Database ACID

Atomicity, Consistency, Isolation, Durability


Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency states that only valid data will be written to the database. If, for some reason, a transaction is executed that violates the database’s consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. For example, if Joe issues a transaction against a database at the same time that Mary issues a different transaction, both transactions should operate on the database in an isolated manner. The database should either perform Joe’s entire transaction before executing Mary’s or vice-versa. This prevents Joe’s transaction from reading intermediate data produced as a side effect of part of Mary’s transaction that will not eventually be committed to the database. Note that the isolation property does not ensure which transaction will execute first, merely that they will not interfere with each other.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.
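
Atomicity and consistency can be illustrated with a small transfer between two accounts. This is a sketch under assumptions: dbo.Accounts is a hypothetical table, and the CHECK constraint stands in for the database's consistency rules.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Accounts (
    AccountID int PRIMARY KEY,
    Balance   money NOT NULL CHECK (Balance >= 0)   -- consistency rule
);
INSERT INTO dbo.Accounts (AccountID, Balance) VALUES (1, 100);
INSERT INTO dbo.Accounts (AccountID, Balance) VALUES (2, 50);
GO

BEGIN TRY
    BEGIN TRANSACTION;
    -- Credit account 2 first, then debit more than account 1 holds.
    UPDATE dbo.Accounts SET Balance = Balance + 500 WHERE AccountID = 2;
    UPDATE dbo.Accounts SET Balance = Balance - 500 WHERE AccountID = 1;  -- violates the CHECK constraint
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Atomicity: the credit that already succeeded is undone as well.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
GO

-- Both balances are unchanged (100 and 50).
SELECT AccountID, Balance FROM dbo.Accounts;
```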


16) MongoDB

MongoDB is a NoSQL document database that stores its data as JSON-like (BSON) documents made of key/value pairs.


17) TRUNCATE vs. DELETE FROM

TRUNCATE TABLE is a statement that quickly deletes all records in a table by deallocating the data pages used by the table. This reduces the resource overhead of logging the deletions, as well as the number of locks acquired: the individual rows are not written to the transaction log, and the only record of the truncation there is the page deallocation. Because that deallocation is logged, a TRUNCATE TABLE issued inside an explicit transaction can still be rolled back before the transaction commits. You cannot specify a WHERE clause in a TRUNCATE TABLE statement: it is all or nothing. The advantage of using TRUNCATE TABLE is that, in addition to removing all rows from the table, it resets the IDENTITY back to the SEED, and the deallocated pages are returned to the system for use in other areas.

In addition, TRUNCATE TABLE cannot be used on tables published for transactional or merge replication, since replication depends on the individual row changes recorded in the transaction log to keep remote databases consistent.
TRUNCATE TABLE cannot be used when a foreign key references the table to be truncated. TRUNCATE statements do not fire DELETE triggers, so truncating a referenced table could leave inconsistent data. If all table rows need to be deleted and there is a foreign key referencing the table, you must drop the foreign key constraint, truncate the table, and recreate the constraint. If a TRUNCATE TABLE statement is issued against a table that has foreign key references, SQL Server returns error 4712: Cannot truncate table 'TableName' because it is being referenced by a FOREIGN KEY constraint.


DELETE statements delete rows one at a time, logging each row in the transaction log, as well as maintaining log sequence number (LSN) information. Although this consumes more database resources and locks, these transactions can be rolled back if necessary. You can also specify a WHERE clause to narrow down the rows to be deleted. When you delete a large number of rows using a DELETE FROM statement, the table may hang on to the empty pages, requiring manual release using DBCC SHRINKDATABASE (db_name).
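
The difference in identity handling and in the WHERE clause can be seen in a short script. dbo.DemoRows is a hypothetical table created only for this sketch.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.DemoRows (
    ID    int IDENTITY(1,1) PRIMARY KEY,
    Label varchar(20) NOT NULL
);
INSERT INTO dbo.DemoRows (Label) VALUES ('a');
INSERT INTO dbo.DemoRows (Label) VALUES ('b');
INSERT INTO dbo.DemoRows (Label) VALUES ('c');
GO

-- DELETE accepts a WHERE clause and leaves the identity counter alone.
DELETE FROM dbo.DemoRows WHERE Label = 'a';
DELETE FROM dbo.DemoRows;
INSERT INTO dbo.DemoRows (Label) VALUES ('after delete');     -- gets ID 4

-- TRUNCATE removes every row and resets the identity to its seed.
TRUNCATE TABLE dbo.DemoRows;
INSERT INTO dbo.DemoRows (Label) VALUES ('after truncate');   -- gets ID 1

SELECT ID, Label FROM dbo.DemoRows;
```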

reference:

http://www.mssqltips.com/sqlservertip/1080/deleting-data-in-sql-server-with-truncate-vs-delete-commands

http://www.sqlservergeeks.com/blogs/RakeshMishra/sql-server-bi/76/sql-server-delete-vs-truncate


18) LINQ to Entities vs. LINQ to SQL

LINQ to SQL - use this framework if you plan on editing a one-to-one relationship of your data in your presentation layer, meaning you don't plan on combining data from more than one table in any one view or page.

Entity Framework - use this framework if you plan on combining data from more than one table in your view or page. To make this clearer, the above terms are specific to data that will be manipulated in your view or page, not just displayed. This is important to understand.


Performance comparison:

http://toomanylayers.blogspot.fr/2009/01/entity-framework-and-linq-to-sql.html

Read time (fastest to slowest): DataReader < DataSet < LINQ to SQL < LINQ to Entities


