Login

As with anything, the BEST way is to write lots of complex SQL statements yourself. Eventually the general way things are structured becomes apparent. Of course, if you're looking for something quick that probably isn't the way.

White space is very important. A query that looks incredibly complex can look almost simplistic when the proper white space is present.

As to the joins... Well, I'm sorry but I can't be very helpful here, because my answer is that the best way to understand a particular join is to understand how joins work in general. Each type of join serves a very specific purpose and if you know how they work, there shouldn't really be much of a difference from joining x to y, x to y to z, or x and y to a and b.

What may help more immediately, however, is knowing that you need to look at the innermost pieces first. As opposed to code where you're probably used to looking at things on the grand scale then digging into the granularity, with a query it's more helpful and easy to understand if you start with the granularity and work your way outward.

Start with any subqueries, figure out what they're doing in individual pieces treating it as a single query (if possible) then gradually move out step by step until you're at the top. Once again, on the joins... Really, just go find a web page that explains joins and do some tests until you fully understand them. There's not really a way to make that easier, as once you understand them you can pretty much figure out anything with joins that you want.

Here's a procedure to follow to unravel a query.

1. First I format the SQL.
2. Then I comment out all parts of the SQL other than the basic parts of the most primary or most important table to answer the question.
3. Then I will start uncommenting the joins, select columns, groupings, order fields, & filters to issolate different parts of the query to see what is happening. Or highlighted-execution works in some tools.
4. Subqueries can usually be executed independently.

Executing each of these usually allows me to get a better grip on what is happening in the query.

I break it down into smaller queries (that's why I like sub-queries more than JOINs)

Sometimes I even save the results of the sub-query as a table and use that in the main query. It's somewhat like simplifying a code expression by saving bits into local variables and then operating on the local variables in the next part of the expression.

I am fanatic about always using table aliases (e.g. CLIENT_HISTORY_ITEM T1) and parentheses around criteria expressions. I often change the table alias number by ten or so for each part of the query so I can see what is coming from where:

SELECT T1.ID
FROM TABLE1 T1
WHERE T1.DATE =
(SELECT MAX(T11.DATE)
FROM TABLE1 T11
WHERE (T1.AREA = T11.AREA))

Cheers

Query optimizers can handle a lot, including implementing your sub-query as a join. These days, they can even handle non-correlated sub-queries.

Clarity is more important than performance in most cases, and sub-queries are easier to debug.

BTW: why do you use confusing table aliases?

When I look at a complex bit of SQL Code, this is what I do.

First, if it is an update or delete, I add code (if it isn't there and commented out) to make it a select. Never try an update or delete for the first time without seeing the results in a select first. If it is an update, I make sure the select shows the current value and what I will be setting it to in order to make sure that I'm getting the desired result.

Understanding the joins is critical to understanding complex SQL. For every join I ask myself why is this here? There are four basic reasons. You need a column for the select, you need a field for the where clause, you need the join as a bridge to a third table, or you need to join to the table to filter records (such as retrieving details on customer who have orders but not needing the order details, this can often be done better with an IF EXISTS where clause). If it is a left or right join (I tend to rewrite so everything is a left join which makes life simpler.), I consider whether an inner join would work. Why do I need a left join? If I don't know the answer, I will run it both ways and see what the difference is within the data. If there are derived tables, I will look at those first (running just that part of the select to see what the result is) to understand why it is there. If there are sub-queries, I will try to understand them and if they are slow will try to convert to a derived table instead as those are often much faster.

Next, I look at the `where` clauses. This is one place where a solid foundation in your particular database will come in handy. For instance, I know in my databases what occasions I might need to see only the mailing address and what occasions I might need to see other addresses. This helps me to know if something is missing from the where clause. Otherwise I consider each item in the `where` clause and figure out why it would need to be there, then I consider whether there is anything missing that should be there. After looking it over, I consider if I can make adjustments to make the query sargable.

I also consider any complex bits of the select list next. What does that case statement do? Why is there a subquery? What do those functions do? (I always look up the function code for any function I'm not already familiar with.) Why is there a distinct? Can it be gotten rid of by using a derived table or aggregate function and group by statements?

**Finally and MOST important**, I run the select and determine if the results look correct based on my knowledge of the business. *If you don't understand your business, you won't know if the query is correct*. Syntactically correct doesn't mean the right results. Often there is a part of your existing user interface that you can use as a guide to whether your results are correct. If I have a screen that shows the orders for a customer and I'm doing a report that includes the customer orders, I might spot check a few individual customers to make sure it is showing the right result.

If the current query is filtering incorrectly, I will remove bits of it to find out what is getting rid of the records I don't want or adding ones I don't want. Often you will find that the join is one to many and you need one to one (use a derived table in this case!) or you will find that some piece of information that you think you need in the `where` clause isn't true for all the data you need or that some piece of the `where` clause is missing. It helps to have all the fields in the `where` clause (if they weren't in the select already) in the select at the time you do this. It may even help to show all the fields from all the joined tables and really look at the data. When I do this, I often add a small bit to the where clause to grab just some of the records that I have that shouldn't be there rather than all the records.

One sneaky thing that will break a lot of queries is the `where` clause referencing a field in a table on the right side of a left join. That turns it into an inner join. If you really need a left join, you should add those kinds of conditions to the join itself.

I find going back to the logical query processing phases and unpicking the query bit by bit with sample data is often helpful.

(The following is borrowed from Inside Microsoft SQL Server 2005: T-SQL Querying, by Itzik Ben-Gan.)

(8) SELECT (9) DISTINCT (11) <TOP_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>

1. **FROM**: A Cartesian product (cross join) is performed between the first two tables in the
FROM clause, and as a result, virtual table VT1 is generated.
2. **ON**: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE
are inserted to VT2.
3. **OUTER (join)**: If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an
INNER JOIN), rows from the preserved table or tables for which a match was not found
are added to the rows from VT2 as outer rows, generating VT3. If more than two tables
appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result
of the last join and the next table in the FROM clause until all tables are processed.
4. **WHERE**: The WHERE filter is applied to VT3. Only rows for which the <where_condition>
is TRUE are inserted to VT4.
5. **GROUP BY**: The rows from VT4 are arranged in groups based on the column list specified
in the GROUP BY clause. VT5 is generated.
6. **CUBE | ROLLUP**: Supergroups (groups of groups) are added to the rows from VT5,
generating VT6.
7. **HAVING**: The HAVING filter is applied to VT6. Only groups for which the
<having_condition> is TRUE are inserted to VT7.
8. **SELECT**: The SELECT list is processed, generating VT8.
9. **DISTINCT**: Duplicate rows are removed from VT8. VT9 is generated.
10. **ORDER BY**: The rows from VT9 are sorted according to the column list specified in the
ORDER BY clause. A cursor is generated (VC10).
11. **TOP**: The specified number or percentage of rows is selected from the beginning of
VC10. Table VT11 is generated and returned to the caller.

Indentation and comments help a lot.
The most valuable thing I have run into is the WITH statement. It is in Oracle, and deals with subquery refactoring. It allows you to break a large query, into a set of seemingly smaller ones. Each just a bit more manageable.

Here is an example

WITH
ssnInfo AS
(
SELECT SSN,
UPPER(LAST_NAME) LAST_NAME,
UPPER(FIRST_NAME) FIRST_NAME,
TAXABLE_INCOME,
CHARITABLE_DONATIONS
FROM IRS_MASTER_FILE
WHERE STATE = 'MN' AND -- limit to in-state
TAXABLE_INCOME > 250000 AND -- is rich
CHARITABLE_DONATIONS > 5000 -- might donate too

),
doltishApplicants AS
(
SELECT SSN,
SAT_SCORE,
SUBMISSION_DATE
FROM COLLEGE_ADMISSIONS
WHERE SAT_SCORE < 100 -- About as smart as a Moose.
),
todaysAdmissions AS
(
SELECT doltishApplicants.SSN,
TRUNC(SUBMISSION_DATE) SUBMIT_DATE,
FIRST_NAME
FROM ssnInfo,
doltishApplicants
WHERE ssnInfo.SSN = doltishApplicants.SSN

)
SELECT 'Dear ' || FIRST_NAME ||
' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE) -- For stuff received today only
;

The same thing can be done with inline views, but the with also has the ability to create temporary tables when needed. In some of the cases, you can copy out the subquery and execute it, outside the context of the large query.

This form also allows you to put the filter clauses with the individual subquery, and save the joining clauses for the final select.

At work, our development group generally finds them easier to maintain, and frequently faster.

Mrcomponent4

wernerites769673

bannerman610

rhodiehn

sick223

nickpoarcla

subsidizes34874