Gadfly SQL constructs
=====================

:Version: $Revision: 1.4 $

This  document  describes  SQL  constructs supported by Gadfly. The
presentation  does not define the complete syntax -- see sqlgram.py
for  the precise syntax as BNF -- nor the complete semantics -- see
a  good  book on SQL for more detailed coverage of semantic (or use
the  source,  Luke  ;c)  ). Also, please have a look at my evolving
database  course  notes  for  more coverage of SQL. Examples of all
supported  constructs  are also shown in the test suite source file
gftest.py.  This  document is only a very brief guide, primarily of
use  to  those  who already understand something about SQL -- it is
neither  a  tutorial  nor  a  detailed  discussion  of  syntax  and
semantics.

.. contents::


The Standard, with omissions
----------------------------

Gadfly supports a large subset of ODBC 2.0 SQL. The reason ODBC 2.0 is
chosen  is  because it provides a fairly strong set of constructs, but
does  not  include  some  of  the  more  obscure features of other SQL
standards  which would be extremely difficult and complex to implement
correctly (and perhaps, not used very frequently (?)).

Supported  features  include  views,  groupings,  aggregates, subquery
expressions,  quantified  subquery  comparisons,  EXISTS,  IN,  UNION,
EXCEPT,  INTERSECT,  searched mutations and Indices, among others (see
below).

Some important omissions from ODBC 2.0 at this point are

- Nulls.
- Outer joins.
- CHECK conditions.
- Enforced data type constraints.
- Alter table (can't implement until NULLs arrive).
- Date, Time, and Interval data types

It is hoped these will be implemented at some future time.

Less important omissions include

Cursor based updates and deletes
  (justification: if you really need them the db design
  is flawed, and it's possible to use python instead).
LIKE string predicate
  (justification: use Python regexes in python code).
Users and permissions
  (justification: Gadfly is not intended for full multiuser use at this time).

These may or may not be implemented at some future time.


Statements
----------

All  interaction  with SQL databases is mediated by SQL statements, or
statement  sequences.  Statement sequences are statements separated by
semicolons. SQL keywords and user defined names are not case sensitive
(but string values are, of course).

SQL statements include the following.

Select Statement
~~~~~~~~~~~~~~~~

The select statement derives a table from tables in the database. It's
general form is::

    sub_query
      optorder_by

Where sub_query is given by::

    SELECT alldistinct select_list
    FROM table_reference_list
    optwhere
    optgroup
    opthaving
    optunion

Read the statement::

    SELECT [DISTINCT|ALL] expressions or *
    FROM tables
    [WHERE condition]
    [GROUP BY group-expressions]
    [HAVING aggregate-condition]
    [union-clause]
    [ORDER BY columns]

as follows:

1) Make all combinations of rows from the tables (FROM line)
2) Eliminate those combinations not satisfying condition (WHERE line)
3) (if GROUP present) form aggregate groups that match on group-expressions
4) (if HAVING present) eliminate aggregate groups that don't satisfy
   the aggregate-condition.
5) compute the columns to keep (SELECT line).
6) (if union-clause present) combine (union, except, intersect)
   the result with the result of another select statement.
7) if DISTINCT, throw out redundant entries.
8) (if ORDER present) order the result by the columns (ascending
   or descending as specified, with precedence as listed). Note that columns
   in the ORDER BY clause must also be present in the SELECT clause.

This  reading has little to do with the actual implementation, but the
answer produced should match this intuitive reading.

Table Aliases
~~~~~~~~~~~~~

You can assign an alias to each table named in the FROM clause of each
SELECT  statement.  This provides a shortcut to the  tables  when  you 
refer to them in the other clauses in the statement. 

Table aliases are  especially useful  when your table names  are quite 
long,  and  consequently  meaningful.  They  help  the   conscientious 
developer avoid the temptation to use  short and potentially confusing
table names like 'a' or 'cust'.

For example, instead of a query like::

    SELECT customer_orders.order_number
          ,customer_orders.order_total
          ,customer_addresses.postal_code
    FROM   customer_orders
          ,customer_addresses
    WHERE  customer_orders.customer_id=customer_addresses.customer_id;

You can use aliases::

    SELECT co.order_number
          ,co.order_total
          ,ca.postal_code
    FROM   customer_orders co
          ,customer_addresses ca
    WHERE  co.customer_id=ca.customer_id;

Create and drop table
~~~~~~~~~~~~~~~~~~~~~

The  create  and  drop table constructs initialize and destroy a table
structure, respectively::

    CREATE TABLE user_defined_name ( colelts )

    DROP TABLE user_defined_name

The  colelts  declare the names of the columns for the table and their
data  types.  The data types are not checked or enforced in any way at
this time.

Table mutations (INSERT, UPDATE, DELETE)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Insert,  Update, and Delete statements insert rows into tables, modify
rows in tables in place, or remove rows from tables respectively::

   INSERT INTO table_name optcolids insert_spec

   DELETE FROM user_defined_name optwhere

   UPDATE user_defined_name
   SET assns
   optwhere

The  insert statement has two variants (in this implementation) INSERT
sub-select and INSERT VALUES::

   insert into r (a,b,c) select a,b,c from s

   insert into r (a,b,c) values (1,2,3)

The  first  inserts  the  result of a SELECT statement into the target
table  and  the  other  inserts  explicit values (which may be dynamic
parameters, see below).

Cursor based updates are not supported at the SQL level, eg::

   update r set a=1 where current of curs

is not supported.

Indices
~~~~~~~

The  create  and  drop  index  statements initialize and destroy index
structures respectively::

    CREATE INDEX user_defined_name
    ON user_defined_name
    ( namelist )

    DROP INDEX user_defined_name

Indices  allow fast access to a table, based on values for the indexed
columns in the namelist.

Indices can be UNIQUE, meaning that the attributes of the index cannot
take on the same values in the table twice::

    CREATE UNIQUE INDEX user_defined_name
    ON user_defined_name ( namelist )

Unique  indices  can  be  used  to  enforce  primary and secondary key
constraints.  After  a UNIQUE index on a table is created inserts that
attempt  to  insert  repeat  values  for  the  indexed columns will be
rejected.

Views
~~~~~

Create  view  and  drop  view  statements  initialize  and drop views,
respectively::

    CREATE VIEW user_defined_name optnamelist
    AS select_statement

    DROP VIEW user_defined_name

Views  are  "derived  tables"  which  are  defined  as  stored  SELECT
statements.  They  can  be  used as tables, except that they cannot be
directly mutated.

It  is  possible  to  "implement your own views in Python". Please see
remotetest.py, gfintrospect and the FAQ for discussion.

Conditions
----------

Conditions  are  truth  valued  boolean  expressions formed from basic
conditions possibly combined using NOT, AND, OR (where NOT has highest
precedence and OR has lowest precedence) and parentheses.

Basic conditions include simple comparisons::

    expression = expression
    expression < expression
    expression > expression
    expression <= expression
    expression >= expression
    expression <> expression

Variants  of  the  simple  comparisons  are  the  quantified  subquery
comparisons::

    expression = ANY ( subquery )
    expression = ALL ( subquery )

(and  similarly  for the other comparison operators). The IN predicate
tests membership (like =ANY)::

    expression IN ( subquery )
    expression NOT IN ( subquery )

For all the quantified comparisons and IN the subquery must generate a
single column table.

Also included are the the BETWEEN and NOT BETWEEN predicates::

    expression BETWEEN expression AND expression
    expression NOT BETWEEN expression AND expression

The  most  general  subquery  predicate is EXISTS and NOT EXISTS which
places no restriction on the subquery::

    EXISTS (subquery)
    NOT EXISTS (subquery)

Expressions
-----------

Expressions  occur  in  conditions  (WHERE,  HAVING,  etc.), in UPDATE
searched assignments, and in the select list of select statements.

Expressions  are  formed  from  primary expressions, possibly combined
using  the  standard  arithmetic  operators  and  parentheses with the
normal precedence.

Primary  expressions  include  numeric  and  string  literals. Numeric
literals  supported  are the Python numeric literals. String constants
are  set  off  by  apostrophies,  where  two  apostrophe's in sequence
represent an apostrophy in the string:
'SQL string literals ain''t pretty'

Column name expressions may be unqualified if they are unambiguous, or
may be qualified with a table name or table alias::

    bar
    frequents.bar
    f.bar

The  rules  for  scoping  of column names are not covered here. Column
names  in  subqueries  may refer to bindings in the query (or queries)
that contain the sub-query.

Subquery expressions of form::

    ( select_statement )

must produce a single column and single row table.

Aggregate  operations  are only permitted in the select list or in the
HAVING condition of SELECT statements (including subselects)::

    COUNT(*)
    COUNT(expression)
    AVG(expression)
    MAX(expression)
    SUM(expression)
    MIN(expression)

and also including the non-standard extension MEDIAN::

    MEDIAN(expression)

Aggregate operations can be applied to distinct values as in::

    COUNT(DISTINCT expression)

The  Dynamic  expression  "?"  is  a  placeholder for a value bound at
evaluation  time  (from  Python values). See the gadfly overview_ doc
"Dynamic Values" section for more details on the use of dynamic parameters.

.. _overview: overview.html

Object Names
------------

Table, column, index and view names must only be  comprised  of  valid
characters. These are currently any letter or the underscore character 
"_".  Note  that  names  are  case  insensitive  so  the  table  names 
'Customers', 'CUSTOMERS' and 'customers' are identical.

Reserved Words
~~~~~~~~~~~~~~

A number of  words and phrases in Gadfly are reserved.  They cannot be
used as the names of tables,  columns,  indexes or views.  These words 
are only illegal if they are used as the sole identifier. To make them
legal  simply  add  another  letter  or  other legal character and the 
parser will accept them.

The following words are currently reserved::

    INDEX ON ANY IN VIEW AS
    EXCEPT INTERSECT
    EXISTS AVG COUNT MAX MIN SUM MEDIAN
    UPDATE DROP DELETE FROM SET
    INSERT INTO VALUES CREATE TABLE INTEGER FLOAT VARCHAR
    AND OR NOT
    SELECT FROM WHERE HAVING GROUP BY UNION ALL DISTINCT AS ORDER
    ASC DESC BETWEEN UNIQUE

If you do use a reserved word as an identifier a SyntaxError exception
will be raised, with an error message of 'unexpected token sequence'.

Tutorials
---------

If you aren't familiar with SQL, or need to brush up on your knowledge
try a tutorial.  A  list  is available courtesy of Google and the Open
Directory Project here_

.. _here: http://directory.google.com/Top/Computers/Programming/Languages/SQL/FAQs,_Help,_and_Tutorials/
