Reference:
https://mariadb.com/kb/en/mariadb/documentation/storage-engines/connect/connect-table-types/connect-table-types-odbc-table-type-accessing-tables-from-other-dbms/
Note: On Linux, unixODBC must be installed.
These tables are given the type ODBC. For example, if a "Customers" table is contained in an Access™ database you can define it with a command such as:
create table Customer ( CustomerID varchar(5), CompanyName varchar(40), ContactName varchar(30), ContactTitle varchar(30), Address varchar(60), City varchar(15), Region varchar(15), PostalCode varchar(10), Country varchar(15), Phone varchar(24), Fax varchar(24)) engine=connect table_type=ODBC block_size=10 tabname='Customers' Connection='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;';
The TABNAME option defaults to the table name. It is required only if the source table name differs from the name of the CONNECT table.
Because CONNECT can often retrieve the table description using ODBC catalog functions, the column definitions can usually be left unspecified. For instance, this table can simply be created as:
create table Customer engine=connect table_type=ODBC block_size=10 tabname='Customers' Connection='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;';
The BLOCK_SIZE specification will be used later to set the RowsetSize when retrieving rows from the ODBC table. A reasonably large RowsetSize can greatly accelerate the fetching process.
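As a rough illustration of why a larger RowsetSize helps, the sketch below counts fetch round trips under the assumption that each fetch call returns at most one rowset. The function name `fetch_calls` and this cost model are illustrative assumptions, not a description of CONNECT internals.

```python
import math


def fetch_calls(total_rows: int, rowset_size: int) -> int:
    """Number of fetch calls needed to read total_rows rows.

    Assumes (hypothetically) that one fetch call returns at most
    rowset_size rows.
    """
    if rowset_size < 1:
        raise ValueError("rowset_size must be at least 1")
    return math.ceil(total_rows / rowset_size)


# Compare a few rowset sizes for a 1000-row table.
for size in (1, 10, 100):
    print(size, fetch_calls(1000, size))
```

Under this model, block_size=10 cuts the number of calls by a factor of ten compared with row-by-row fetching; the real gain depends on the driver and the network.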
If you specify the column description, the column names of your table must exist in the data source table. However, you are not obliged to define all the data source columns and you can change the order of the columns. Some type conversion can also be done if appropriate. For instance, to access the FireBird sample table EMPLOYEE, you could define your table as:
create table empodbc ( EMP_NO smallint(5) not null, FULL_NAME varchar(37) not null, PHONE_EXT varchar(4) not null, HIRE_DATE date, DEPT_NO smallint(3) not null, JOB_COUNTRY varchar(15), SALARY double(12,2) not null) engine=CONNECT table_type=ODBC tabname='EMPLOYEE' connection='DSN=firebird';
This definition ignores the FIRST_NAME, LAST_NAME, JOB_CODE, and JOB_GRADE columns. It places FULL_NAME, the last column of the original table, in second position. The type of the HIRE_DATE column was changed from timestamp to date, and the type of the DEPT_NO column was changed from char to an integer type.
Currently, some restrictions apply to ODBC tables:
Extracting data from an external database can of course be used to construct another table of any file format from a data source. For instance, to construct a fixed-format DOS table containing the CUSTOMER table data, create the table as:
create table Custfix engine=connect File_name='customer.txt' table_type=fix block_size=20 as select * from customer;
Now you can use custfix for fast database operations on the copied customer table data.
ODBC can also be used to create tables based on tabular data belonging to an Excel spreadsheet:
create table XLCONT engine=CONNECT table_type=ODBC tabname='CONTACT' Connection='DSN=Excel Files;DBQ=D:/Ber/Doc/Contact_BP.xls;';
This supposes that a tabular zone of the sheet including column headers is defined as a table named CONTACT. Refer to the Excel documentation for how to specify tables inside sheets. Once done, you can ask:
select * from xlcont;
This will extract the data from Excel and display:
Nom | Fonction | Societe |
---|---|---|
Boisseau Frederic | | 9 Telecom |
Martelliere Nicolas | | Vidal SA (Groupe UBM) |
Remy Agathe | | Price Minister |
Du Halgouet Tanguy | | Danone |
Vandamme Anna | | GDF |
Thomas Willy | | Europ Assistance France |
Thomas Dominique | | Acoss (DG des URSSAF) |
Thomas Berengere | Responsable SI Decisionnel | DEXIA Credit Local |
Husy Frederic | Responsable Decisionnel | Neuf Cegetel |
Lemonnier Nathalie | Directeur Marketing Client | Louis Vuitton |
Louis Loic | Reporting International Decisionnel | Accor |
Menseau Eric | | Orange France |
Here again, the column description was left to CONNECT when creating the table.
The concept of multiple tables can be extended to ODBC tables when they are physically represented by files, for instance Excel or Access tables. The condition is that the connect string for the table must contain a field DBQ=filename, in which the file name can include wildcard characters, as in the file name of multiple=1 tables. For instance, a table contained in several Excel files CA200401.xls, CA200402.xls, ...CA200412.xls can be created by a command such as:
create table ca04mul (Date char(19), Operation varchar(64), Debit double(15,2), Credit double(15,2)) engine=CONNECT table_type=ODBC multiple=1 qchar= '"' tabname='bank account' connection='DSN=Excel Files;DBQ=D:/Ber/CA/CA2004*.xls;';
This assumes that, in each file, the relevant data is defined within Excel as a table named "bank account". This extension to ODBC does not support multiple=2. The qchar option was specified so that identifiers are quoted in the select statement sent to ODBC, which avoids SQL syntax errors, in particular when table or column names contain blanks.
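To see which files a DBQ wildcard such as CA2004*.xls would cover, the following sketch applies shell-style matching to a made-up directory listing. It assumes ordinary shell-style wildcard semantics; the helper name `matching_files` and the listing are hypothetical, and CONNECT's own file search may differ in detail (for example in case handling).

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_files(pattern: str, filenames: Iterable[str]) -> List[str]:
    """Return the file names matched by a wildcard pattern, sorted by name."""
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


# Hypothetical directory content; only the 2004 workbooks should be selected.
listing = ["CA200401.xls", "CA200402.xls", "CA200412.xls", "CA200501.xls", "notes.txt"]
print(matching_files("CA2004*.xls", listing))
```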
Caution: Do not try to get tables belonging to the currently running MariaDB server via the MySQL ODBC connector. This does not work and causes the server to be restarted.
To avoid extracting entire tables from an ODBC source, which can be a lengthy process, CONNECT extracts the "compatible" part of query WHERE clauses and adds it to the ODBC query. Compatible means that it must be understood by the data source. In particular, clauses involving scalar functions are not kept because the data source may have different functions than MariaDB or use a different syntax. Of course, clauses involving sub-selects are also skipped. Passing the clause along lets the data source use any indexes it has.
Take care with clauses involving string items because you may not know whether they are treated by the data source as case sensitive or case insensitive. If in doubt, make your queries as if the data source was processing strings as case sensitive to avoid incomplete results.
Instead of specifying a source table name via the TABNAME option, it is possible to retrieve data from a “view” whose definition is given in a new option SRCDEF. For instance:
CREATE TABLE custnum ( country varchar(15) NOT NULL, customers int(6) NOT NULL) ENGINE=CONNECT TABLE_TYPE=ODBC BLOCK_SIZE=10 CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;' SRCDEF='select country, count(*) as customers from customers group by country';
Or simply, because CONNECT can retrieve the returned column definition:
CREATE TABLE custnum ENGINE=CONNECT TABLE_TYPE=ODBC BLOCK_SIZE=10 CONNECTION='DSN=MS Access Database;DBQ=C:/Program Files/Microsoft Office/Office/1033/FPNWIND.MDB;' SRCDEF='select country, count(*) as customers from customers group by country';
Then, when executing for instance:
select * from custnum where customers > 3;
The GROUP BY is processed by the data source, which returns only the grouped result set; only the WHERE clause is then applied locally. The result:
country | customers |
---|---|
Brazil | 9 |
France | 11 |
Germany | 11 |
Mexico | 5 |
Spain | 5 |
UK | 7 |
USA | 13 |
Venezuela | 4 |
This makes it possible to let the data source do complicated operations, such as joining several tables or executing procedures returning a result set. This minimizes the data transfer through ODBC.
The only data modifying operations are the INSERT, UPDATE and DELETE commands. They can be executed successfully only if the data source database or tables are not read-only.
When inserting values into an ODBC table, the values are evaluated locally and then sent to the ODBC table. This makes no difference when the values are constants, but it matters in a query such as:
insert into t1 select * from t2;
Here t1 is an ODBC table and t2 is a locally defined table that must exist on the local server. This is also a good way to populate a remote ODBC table from local data.
CONNECT does not directly support INSERT commands such as:
insert into t1 values(2,'Deux') on duplicate key update msg = 'Two';
The “on duplicate key update” part is ignored, and the statement results in an error if the key value is duplicated.
Unlike the INSERT command, UPDATE and DELETE are supported in a simplified way. They are just rephrased to correspond to the data source syntax and sent to the data source for execution. Let us suppose we created the table:
create table tolite ( id int(9) not null, nom varchar(12) not null, nais date default null, rem varchar(32) default null) ENGINE=CONNECT TABLE_TYPE=ODBC tabname='lite' CONNECTION='DSN=SQLite3 Datasource;Database=test.sqlite3' CHARSET=utf8 DATA_CHARSET=utf8;
We can populate it by:
insert into tolite values(1,'Toto',now(),'First'), (2,'Foo','2012-07-14','Second'),(4,'Machin','1968-05-30','Third');
The function now() will be executed by MariaDB and its returned value sent to the ODBC table.
Let us see what happens when updating the table. If we use the query:
update `tolite` set nom = 'Gillespie' where id = 10;
CONNECT will rephrase the command as:
update "lite" set nom = 'Gillespie' where id = 10;
What it did is just to replace the local table name by the remote table name and change all the back ticks to the data source identifier quoting characters. Then this command will be sent to the data source to be executed by it.
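The rephrasing just described can be sketched in a few lines. This is a simplified model, not CONNECT's actual code: the function name `rephrase` is hypothetical, and string literals are not parsed, so a literal that contained the table name or a back tick would be altered too.

```python
import re


def rephrase(statement: str, local_name: str, remote_name: str, quote: str = '"') -> str:
    """Rewrite a local UPDATE/DELETE into the text sent to the data source.

    Step 1: replace the local table name (whole word only) by the remote name.
    Step 2: replace back ticks by the data source quoting character.
    """
    pattern = r"\b" + re.escape(local_name) + r"\b"
    swapped = re.sub(pattern, lambda _match: remote_name, statement)
    return swapped.replace("`", quote)


print(rephrase("update `tolite` set nom = 'Gillespie' where id = 10;", "tolite", "lite"))
```

Note that nothing else in the statement is translated, which is why only core SQL features survive the trip to the data source.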
This is simpler and can be faster than doing a positional update using a cursor and commands such as “select ... for update of ...” that are not supported by all data sources. However, there are some restrictions that must be understood due to the way it is handled by MariaDB.
This is possible because both MariaDB and the data source are using the SQL language. But you must use only the basic features that are part of the core SQL language. For instance, keywords like IGNORE or LOW_PRIORITY will cause syntax errors with many data sources.
Scalar function names can also be different, which severely restricts their use. For instance:
update tolite set nais = now() where id = 2;
This will not work with SQLite3, the data source returning an “unknown scalar function” error message. Note that in this particular case, you can rephrase it to:
update tolite set nais = date('now') where id = 2;
This is understood by both parsers, and even though this function would return NULL if executed by MariaDB, it does return the current date when executed by SQLite3. But this quickly becomes too tricky, so to overcome all these restrictions and allow all types of commands to be executed by the data source, CONNECT provides a specific ODBC table subtype, described next.
Sending commands to a data source is done using a special subtype of ODBC tables. Let us look at an example:
create table crlite ( command varchar(128) not null, number int(5) not null flag=1, message varchar(255) flag=2) engine=connect table_type=odbc connection='Driver=SQLite3 ODBC Driver;Database=test.sqlite3;NoWCHAR=yes' option_list='Execsrc=1';
The key points in this create statement are the EXECSRC option and the column definition.
The EXECSRC option indicates that this table will be used to send commands to the data source. Most of the sent commands do not return a result set. Therefore, the table columns are used to specify the command to be executed and to get the result of the execution. The names of these columns can be chosen arbitrarily, their function coming from the FLAG value:
Flag=0: | The command to execute. |
Flag=1: | The number of affected rows, or -1 in case of error, or the number of result columns if the command returns a result set. |
Flag=2: | The returned message (possibly an error message). |
How to use this table and specify the command to send? By executing a command such as:
select * from crlite where command = 'a command';
This will send the command specified in the WHERE clause to the data source and return the result of its execution. The syntax of the WHERE clause must be exactly as shown above. For instance:
select * from crlite where command = 'CREATE TABLE lite ( ID integer primary key autoincrement, name char(12) not null, birth date, rem varchar(32))';
This command returns:
command | number | message |
---|---|---|
CREATE TABLE lite (ID integer primary key autoincrement, name... | 0 | Affected rows |
Now we can create a standard ODBC table on the newly created table:
CREATE TABLE tlite ENGINE=CONNECT TABLE_TYPE=ODBC tabname='lite' CONNECTION='Driver=SQLite3 ODBC Driver;Database=test.sqlite3;NoWCHAR=yes' CHARSET=utf8 DATA_CHARSET=utf8;
We can populate it directly using the supported INSERT statement:
insert into tlite(name,birth) values('Toto','2005-06-12');
insert into tlite(name,birth,rem) values('Foo',NULL,'No ID');
insert into tlite(name,birth) values('Truc','1998-10-27');
insert into tlite(name,birth,rem) values('John','1968-05-30','Last');
And see the result:
select * from tlite;
ID | name | birth | rem |
---|---|---|---|
1 | Toto | 2005-06-12 | NULL |
2 | Foo | NULL | No ID |
3 | Truc | 1998-10-27 | NULL |
4 | John | 1968-05-30 | Last |
Any command, for instance UPDATE, can be executed from the crlite table:
select * from crlite where command = 'update lite set birth = ''2012-07-15'' where ID = 2';
This command returns:
command | number | message |
---|---|---|
update lite set birth = '2012-07-15' where ID = 2 | 1 | Affected rows |
Let us verify it:
select * from tlite where ID = 2;
ID | name | birth | rem |
---|---|---|---|
2 | Foo | 2012-07-15 | No ID |
The syntax to send a command is rather strange and may seem unnatural. It is possible to use an easier syntax by defining a stored procedure such as:
create procedure send_cmd(cmd varchar(255)) MODIFIES SQL DATA select * from crlite where command = cmd;
Now you can send commands like this:
call send_cmd('drop tlite');
This is possible only when sending one single command.
Grouping commands uses an easier syntax and is faster because only one connection is made for all of them. To send several commands in one call, use the following syntax:
select * from crlite where command in ( 'update lite set birth = ''2012-07-14'' where ID = 2', 'update lite set birth = ''2009-08-10'' where ID = 3');
When several commands are sent, the execution stops at the end of them or after a command that is in error. To continue after n errors, set the option maxerr=n (0 by default) in the option list.
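The stopping rule can be modelled as follows. This is only one reading of the sentence above (execution ends once more than maxerr commands have failed); `run_commands` and `fake_execute` are hypothetical stand-ins, not CONNECT functions.

```python
from typing import Callable, List, Tuple


def run_commands(commands: List[str],
                 execute: Callable[[str], int],
                 maxerr: int = 0) -> List[Tuple[str, int]]:
    """Run commands in order and stop once more than maxerr of them failed.

    execute stands in for the data source: it returns the number of
    affected rows or raises an exception, reported here as -1.
    """
    results = []
    errors = 0
    for command in commands:
        try:
            number = execute(command)
        except Exception:
            number = -1
            errors += 1
        results.append((command, number))
        if errors > maxerr:
            break  # with maxerr=0 the first failing command ends the run
    return results


def fake_execute(command: str) -> int:
    """Pretend data source: any command starting with 'bad' fails."""
    if command.startswith("bad"):
        raise RuntimeError("syntax error")
    return 1


print(run_commands(["ok 1", "bad", "ok 2"], fake_execute))
print(run_commands(["ok 1", "bad", "ok 2"], fake_execute, maxerr=1))
```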
Note 1: It is possible to specify the SRCDEF option when creating an EXECSRC table. It will be the command sent by default when a WHERE clause is not specified.
Note 2: Most data sources do not allow sending several commands separated by semi-colons.
Note 3: Quotes inside commands must be escaped. This can be avoided by using a different quoting character than the one used in the command.
Note 4: The sent command must obey the data source syntax.
Note 5: Sent commands apply in the specified database. However, they can address any table within this database, or a table belonging to another database by using the name syntax schema.tabname.
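For Note 3, a client program can build the SELECT text itself and double the single quotes mechanically. The helper below is a minimal sketch with a hypothetical name (`exec_query`); it assumes the command contains no backslashes and that the table name is trusted, since neither is checked.

```python
def exec_query(table: str, command: str) -> str:
    """Build the SELECT that sends one command through an EXECSRC table.

    Single quotes inside the command are doubled so that the command can
    be placed inside a single-quoted SQL string literal.
    """
    escaped = command.replace("'", "''")
    return "select * from %s where command = '%s';" % (table, escaped)


print(exec_query("crlite", "update lite set birth = '2009-08-10' where ID = 3"))
```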
Writing the connection string is sometimes the most difficult task when creating ODBC tables because, depending on the operating system and the data source, this string can differ widely.
The format of the ODBC Connection String is:
connection-string ::= empty-string[;] | attribute[;] | attribute; connection-string
empty-string ::=
attribute ::= attribute-keyword=attribute-value | DRIVER=[{]attribute-value[}]
attribute-keyword ::= DSN | UID | PWD | driver-defined-attribute-keyword
attribute-value ::= character-string
driver-defined-attribute-keyword ::= identifier
Where character-string has zero or more characters; identifier has one or more characters; attribute-keyword is not case-sensitive; attribute-value may be case-sensitive; and the value of the DSN keyword does not consist solely of blanks. Due to the connection string grammar, keywords and attribute values that contain the characters []{}(),;?*=!@ should be avoided. The value of the DSN keyword cannot consist only of blanks, and should not contain leading blanks. Because of the grammar of the system information, keywords and data source names cannot contain the backslash (\) character. Applications do not have to add braces around the attribute value after the DRIVER keyword unless the attribute contains a semicolon (;), in which case the braces are required. If the attribute value that the driver receives includes the braces, the driver should not remove them, but they should be part of the returned connection string.
The ODBC-defined attributes that appear in this grammar are DSN, UID, PWD, and DRIVER.
Other attributes are DSN-dependent. The connection string can give the name of the driver in the DRIVER field or the data source in the DSN field (take care to match the spelling and case exactly) and has other fields that depend on the data source. When specifying a file, the DBQ field must give the full path and name of the file containing the table. Refer to the specific ODBC connector documentation for the exact syntax of the connection string.
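When a connection fails, it can help to split the string into its attributes and check each one. The sketch below follows the grammar in a simplified way: keywords are upper-cased because they are not case-sensitive, and a value that starts with a brace runs to the closing brace, so it may contain semicolons. The function name `parse_connection_string` is hypothetical, and no driver-specific validation is attempted.

```python
def parse_connection_string(text: str) -> dict:
    """Split 'keyword=value;keyword=value;' into a dict keyed by upper-case keyword."""
    attributes = {}
    pos = 0
    while pos < len(text):
        if text[pos] == ";":  # empty attribute or trailing separator
            pos += 1
            continue
        eq = text.find("=", pos)
        semi = text.find(";", pos)
        if eq == -1 or (semi != -1 and semi < eq):
            raise ValueError("attribute without '=' near position %d" % pos)
        keyword = text[pos:eq].strip().upper()
        start = eq + 1
        search_from = start
        if text.startswith("{", start):
            # Braced value: skip to the closing brace before looking for ';'.
            close = text.find("}", start)
            if close == -1:
                raise ValueError("unbalanced brace in value of %s" % keyword)
            search_from = close
        end = text.find(";", search_from)
        if end == -1:
            end = len(text)
        attributes[keyword] = text[start:end]  # braces, if any, are kept
        pos = end + 1
    return attributes


print(parse_connection_string("DSN=Excel Files;DBQ=D:/Ber/Doc/Contact_BP.xls;"))
```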
If you get an error on Linux/Unix when using TABLE_TYPE=ODBC:
Error Code: 1105 [unixODBC][Driver Manager]Can't open lib '/usr/cachesys/bin/libcacheodbc.so' : file not found
You must make sure that the user running mysqld (usually "mysql") has enough permission to load the ODBC driver library. It can happen that the driver file does not have enough read privileges (use chmod to fix this), or loading is prevented by SELinux configuration.
Try this command in a shell to check whether the mysql user has enough permission to load the driver:
sudo -u mysql ldd /usr/cachesys/bin/libcacheodbc.so
And/or check the SELinux logs for records like this:
type=AVC msg=audit(1384890085.406:76): avc: denied { execute } for pid=1433 comm="mysqld" path="/usr/cachesys/bin/libcacheodbc.so" dev=dm-0 ino=3279212 scontext=unconfined_u:system_r:mysqld_t:s0 tcontext=unconfined_u:object_r:usr_t:s0 tclass=file
Depending on the version of the ODBC driver in use, some additional information about the tables exists, such as the table QUALIFIER or OWNER in old versions, renamed CATALOG and SCHEMA since version 3.
CATALOG is apparently rarely used by most data sources, but SCHEMA (formerly OWNER) is and corresponds to the DATABASE information of MySQL.
The issue is that if no schema name is specified, some data sources return information for all schemas while some others only return the information of the “default” schema. In addition, the used “schema” or “database” is sometimes implied by the connection string and sometimes is not. Sometimes, it also can be included in a data source definition.
CONNECT offers two ways to specify this information: a qualified table name in the TABNAME option, or the DBNAME option.
When both are used, the qualified table name has precedence over DBNAME. For instance:
Tabname | DBname | Description |
---|---|---|
test.t1 | | The t1 table of the test schema |
test.t1 | mydb | The t1 table of the test schema (test has precedence) |
t1 | mydb | The t1 table of the mydb schema |
%.%.% | | All tables in all catalogs and all schemas |
t1 | | The t1 table in the default schema or in all schemas, depending on the DSN |
%.t1 | | The t1 table in all schemas, for all DSNs |
test.% | | All tables in the test schema |
When creating a standard ODBC table, you should make sure only one source table is specified. Specifying more than one source table must be done only for CONNECT catalog tables (with CATFUNC=tables or columns).
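The precedence rule can be written down as a small function. This is a sketch of the rule as stated here, with a hypothetical name (`resolve_source`); it ignores the catalog part of a three-part name and does not handle quoted identifiers that contain dots.

```python
from typing import Optional, Tuple


def resolve_source(tabname: str, dbname: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (schema, table) for a TABNAME/DBNAME pair.

    A schema of None means the choice is left to the data source
    (default schema or all schemas, depending on the DSN).
    """
    parts = tabname.split(".")
    if len(parts) >= 2:
        # A schema given in TABNAME has precedence over DBNAME.
        return parts[-2], parts[-1]
    return dbname, tabname


for tab, db in [("test.t1", None), ("test.t1", "mydb"), ("t1", "mydb"), ("t1", None)]:
    print(tab, db, resolve_source(tab, db))
```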
Another issue when dealing with ODBC tables is the way table and column names are handled with regard to case.
For instance, Oracle follows the SQL standard here. It converts non-quoted identifiers to upper case. This is correct and expected. PostgreSQL is not standard. It converts identifiers to lower case. MySQL/MariaDB is not standard. They preserve identifiers on Linux, and convert to lower case on Windows.
Keep this in mind if you fail to see a table or a column on an ODBC data source.
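The folding rules listed above can be summarised in code, which is handy when guessing the name under which an unquoted identifier was actually stored. The function name `stored_name` and the `platform` parameter are illustrative; the rules are only those stated in this section, and other data sources are deliberately rejected.

```python
def stored_name(identifier: str, dbms: str, platform: str = "linux") -> str:
    """Name under which a non-quoted identifier ends up on the data source."""
    dbms = dbms.lower()
    if dbms == "oracle":
        return identifier.upper()  # SQL standard behaviour
    if dbms == "postgresql":
        return identifier.lower()
    if dbms in ("mysql", "mariadb"):
        # Preserved on Linux, converted to lower case on Windows.
        return identifier.lower() if platform.lower() == "windows" else identifier
    raise ValueError("no folding rule known for %r" % dbms)


for source in ("oracle", "postgresql", "mariadb"):
    print(source, stored_name("Customers", source))
```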