Portable SQL

From Devpit
Jump to: navigation, search

There's no such thing as Portable SQL. However, since few of us are lucky enough to pick the right database before doing significant development, and often we need to support multiple for uncontrollable reasons, *sigh* here we are. That said, we can come fairly close by sticking to a subset of portable things. These get us 90% of what we need. Unfortunately, the rest just has to be implemented with kludges (although this can be easier if the kludges are abstracted). Most languages have libraries to help you out with this. Check the See Also section below for these.

Editor's note: Go ahead and add syntax for other databases. Note also, much of this is unverified. It comes from someone with a MySQL background that has messed with other databases a bit. If someone can verify this, please do.

Quoting

  • PostgreSQL:
    • Literals: Single-quotes
    • Column names: Unadorned or Double-quotes
  • MySQL:
    • Literals: Single-quotes or Double-quotes
    • Column names: Unadorned or Back-quotes or Double-quotes in ANSI mode

Caveats:

  • Single-quotes work with almost all databases (e.g. Firebird, MSSQL, MySQL, Oracle, PostgreSQL, SQLite, Sybase). Be careful that you only quote strings, not numbers, decimals, etc. as at least MSSQL and Sybase forbit single-quotes for them. Newer versions of MySQL and probably other database server also have problems with quotes around numbers which results in degraded performance. If your database server supports ?-parameters, it'll handle all the quoting for you, and you can simply pass literals the same way you pass variables.
  • Either quote column/table names never or always if you ensure that the connection is set to ANSI mode; and always use the same case for these. This is primarily because different databases quote them in mutually exclusive ways (for example, MySQL requires back-ticks by default and PostgreSQL requires double-quotes). Also, some databases are case-sensitive and some are not. In fact, some databases are case-sensitive for quoted names but not for unquoted, and some actually change the case of the names. For example, it may be case sensitive and capitalize unquoted names, normalizing foo to Foo, but leaving "foo" as foo, distinctly. If you decide to never quote names, use names that are very unlikely to become a reserved SQL word, e.g. bm_link.

SQL Commands

UPDATE

Easy ones first. Update isn't too bad. Be careful of the where-clause (see below).

UPDATE table SET column1 = value1, column2 = value2 WHERE where-clause

DELETE

Delete isn't bad either. Again, be careful of the where-clause.

DELETE FROM table WHERE where-clause

INSERT

Always use this syntax:

INSERT INTO table (column1, column2) VALUES (value1, value2)

Caveats

  • There's no portable auto-increment column, but with careful CREATE statements, this might not be an issue.
    • MySQL uses autoincrement in the column spec: id int NOT NULL auto_increment
    • PostgreSQL uses serial, which silently creates a sequence. As the sequence is not logically part part of the table and inherits no privileges from the table, you have to specifically grant UPDATE privileges on the sequence (along with INSERT privileges on the table) before a user can insert into tables that use the sequence.
    • DB2 uses generated always as identity in the column spec.
    • Sybase uses identity in the column spec.
  • Things like "INSERT INTO table SET column1 = value1" may be easier to read in MySQL but won't work everywhere.
  • It is not portable to combine multiple rows into one insert.
  • Did we mention, "Be careful of the where-clause"?

SELECT

Select is terrible. While the others generally have only a few kludges, select is mostly a kludge. The basic form is portable, but beyond that, all bets are off.

SELECT [DISTINCT] table1.column1, table2.column2 FROM table1, table2 where-clause

XXX: more work ...

Caveats:

  • Avoid using built-in functions in the column list at all costs. For example, MySQL's UNIX_TIMESTAMP() function will not work in most other databases. Usually there's no benefit to doing conversions inside the query anyway (and if you do it wrong it'll mess up possible indexing), so just do the conversion outside the SQL statement.
  • DISTINCT works on MySQL, PostgreSQL, and MSSQL (and probably more). Avoid the synonym DISTINCTROW.
  • SELECT * and SELECT table.* work on MySQL, PostgreSQL, and MSSQL (and probably more).

GROUP BY

  • If you use aggregate functions in your query, then a rule of SQL says you must include all non-aggregate selected columns in the GROUP BY clause. This means if you write
  SELECT foo, bar, max(baz) FROM some_table 

the database should complain. Both MySQL and Sybase let you get away with simply having a GROUP BY clause - they do not verify that the clause is complete. However, results can be seriously strange. On MySQL results will be more or less what you expect as long as the columns you omit from the GROUP BY depend fully on the ones contained in the GROUP BY. Thus if you write

  SELECT foo, bar, max(baz) FROM some_table GROUP BY foo

and foo is, say, a primary key of some_table, then in general the results will be the same as if you had grouped by foo and bar. However, this is only true in general - I have seen examples on Sybase where the results where definitely not the same.

  • You can give titles to what you select using AS:
  SELECT foo, bar, max(baz) AS 'Maximum' FROM some_table GROUP BY foo, bar
  • However, you cannot use these titles in the GROUP BY clause:
  SELECT foo AS 'xyzzy', bar, max(baz) AS 'Maximum' FROM some_table GROUP BY xyzzy, bar DOES NOT WORK.
  • On MySQL and most other databases, you can use integer column indexes instead of column names, so if you have a complicated column that you do not know how to refer to, this works:
  SELECT foo, (SELECT x FROM y WHERE a = foo), max(bar) FROM some_table GROUP BY 1, 2

(whether or not it makes sense is, of course, another matter ...)

  • Sybase, at least in ASE version 11.x, will not accept this syntax.

WHERE

  • Avoid using built-in functions in where clauses at all costs. For example, MySQL's FROM_UNIXTIME() function will not work in most other databases. Usually there's no benefit to doing conversions inside the query anyway (and if you do it wrong it'll mess up possible indexing), so just do the conversion outside the SQL statement.
  • Limiting queries: (from http://forums.mysql.com/read.php?60,4515,4678#msg-4678 )
    • DB2: SELECT * FROM table FETCH FIRST 10 ROWS ONLY
    • Informix: SELECT FIRST 10 * FROM table
    • Microsoft SQL Server and Access: SELECT TOP 10 * FROM table
    • MySQL and PostgreSQL: SELECT * FROM table LIMIT 10
    • Oracle 8i: SELECT * FROM (SELECT * FROM table) WHERE rownum <= 10
    • Sybase: set rowcount 10 before the query. Remember to reset this by executing set rowcount 0 afterwards.

CREATE

You could write a book on this one. It is usually impossible to write a portable table schema. This means it's also very unmaintainable to modify table schemas on the fly. XXX: more work...

Some tips on porting your schema to another database:

  • There's no portable date column. Different databases use different date formats. Some may try to apply time zone or daylight-saving time conversions. Use char(24) unless you are sure your code will handle the format and you're sure the database won't change the times. Under MySQL, this usually means using the --timezone=UTC0 option. Always store dates in the format "2005-12-09 00:05:35 UTC", including the zero-padding, so that sorting works properly. Never store local-times in a database. Always convert to UTC before storage, and back to local time after retrieval. Given this invariant, the trailing "UTC" is obviously not necessary, but it is a nice fail-safe in case someone forgets to specify UTC instead of local time, for example, when converting the string into epoch-seconds. If you really want it:
    • Storing dates with times: MySQL calls it datetime. PostgreSQL calls it timestamp.
    • Storing just dates: MySQL calls it date
    • If a database doesn't support an equivalent or uses an inconvenient format, you may be stuck with char(10) for dates or char(24) for times.
  • There's no portable enumeration type.
    • MySQL: enum('one', 'two', 'three') NOT NULL default 'one'
    • PostgreSQL: varchar(xxx) NOT NULL default 'one' CHECK (column = 'one' OR column = 'two' OR column = 'three')
    • If a database doesn't support an equivalent, you could create a table to list the possible values and use foreign keys to this table instead of enums. If this is too unwieldy, you may be stuck using an unchecked char field.
  • Foreign keys:
    • MySQL requires an index on both the referencing and referenced columns.
    • PostgreSQL requires on index on the referenced column.
  • Don't specify integer widths. For example, when converting from a MySQL schema, change "int(11)" to just "int".
  • Specify unique keys in the table as follows: UNIQUE KEY (column-name)
  • Create non-unique indexes with a separate statement as follows: CREATE INDEX tablename_columnname_key ON tablename (columnlist)
  • There's no portable auto-increment column.
    • MySQL uses autoincrement in the column spec: id int NOT NULL auto_increment
    • PostgreSQL uses serial, which silently creates a sequence. As the sequence is not logically part part of the table and inherits no privileges from the table, you have to specifically grant UPDATE privileges on the sequence (along with INSERT privileges on the table) before a user can insert into tables that use the sequence.

See Also