There are two main facets to SELECT efficiency: the size of the result set and the effort required by the DBMS engine. Remember that the result of a SELECT is sent as one huge string to the workstation. If your SELECT results in 30MB of data, that is what is sent over the network. Bury this kind of thing in your application and suddenly all your workstations are shifting this kind of data continually. There is a maxim: any SELECT which takes over 1 sec is a bad SELECT. Think on this…
1. What is wrong with SELECT * FROM ?
Essentially everything. Although it is harmless for tables with few fields, mostly this is not the case. Where you have memos or large strings, it is very expensive and thus a bad habit to get into. Get used to specifying only those fields necessary for the next operation on the recordset. It is especially important for browser controls because they tend to access every row. The code is simple:
|
TSQL Statement |
GCS SQLServer Class |
|
SELECT Entrydate, Surname, Balance
FROM Mytable |
oServer := MyTable{}
oServer:SetFields(“Entrydate, Surname, Balance”) |
2. Always limit the Recordset.
This is always important for SELECT efficiency. Use a WHERE clause every single time to reduce the number of rows to only those required. Most often the problem with ex DBF’ers is simply that they forget to limit the recordset or throw entire tables at browsers. It is a different mindset that you have to evolve and you need to bring your users along with you. Here are some examples:
|
TSQL Statement |
GCS SQLServer Class |
|
// get just the lowest date
SELECT TOP 1 entrydate FROM Mytable
ORDER BY entrydate
// get just the highest date
SELECT TOP 1 entrydate FROM Mytable
ORDER BY entrydate DESC
// get all invoices for January in date order
SELECT entrydate FROM Mytable
WHERE entrydate >= ‘20070101’ AND entrydate <= ‘20070131’
ORDER BY entrydate DESC
|
oServer:SetOrder(“entrydate”)
oServer:SelectTopRow()
oServer:SetOrder(“entrydate”)
oServer:SelectBottomRow()
oServer:SetOrder(“entrydate”)
oServer:SetScopes(‘20070101’,’20070131’) *
* In the GCS SQL Server classes it is just as efficient to use a filter as it is to set order scopes. This is because both mechanisms become WHERE clauses. Thus our code could have been oServer:SetFilter(xxxx). |
3. When do we use the ORDER BY clause?
Answer: only when you have to. Again, the ORDER BY clause is very expensive in terms of DBMS processing time and the more columns you add to the ORDER BY, the slower will be the result set. So, if row order is not important to the next operation, do not order the SELECT. With our GCS SQL Server classes we tend to do the DBF thing and always specify an opening order. Where you can, call SetOrder(0) to remove the ORDER BY out of ensuing selects. Remember also that indexes are implicit in SQL, meaning that you don’t need to know you have them. The DBMS will analyse every query and automatically choose an index that suits the operation. Your ORDER BY will influence this but the WHERE clause has a much strong impact. You have already seen an example above.
2. Two Part and Three Part names.
This applies from the moment you use a join or want to access an object in another schema or database. The fully qualified column name for a table looks like this:
Mydatabase.myschema.mytable.myfield
You are able to drop items from the left and use current defaults. dbo is the default schema if none is supplied. Most people set the current database and the current schema (if applicable) and just specify the table and column. Here is a simple example which also uses table aliases for readability:
Select IL.entrydate, IL.invcode, IH.invdesc, IL.balance from invoicelines IL
Left outer join invoiceheader IH on IH.invcode=IL.invcode
(* note… the AS keyword is redundant for creating aliases).
If all the fields except invcode were unique to each table, we could have written the filed list as follows: Select entrydate, IL.invcode, invdesc, balance. Intrinsically there is nothing wrong with this except that you are forcing the DBMS engine to resolve the correct table and this can be a performance hit. The reason is that the DBMS engine creates an execution plan from the query and will attempt to reuse this plan the next time it gets the same sort of query. When you do not fully qualify the column names, a new plan is built every time regardless.
When using the GCS SQL Server classes you do not have to worry about this because internally, the class code automatically uses fully qualified column and table names.
4. Using Functions in the SELECT statement.
This is another one of the convenience vs performance things. Functions in the statement slow down performance substantially so unless the function is implicit to the retrieval of the data, try not to use one. Most importantly, keep them out of WHERE clauses because the function will need to be performed on all rows in the table. When the function is in the field list, only retrieved rows are operated on. For example, here are some trivialised examples to demonstrate good and bad.
Expression functions such as IN(), CONTAINS and Between are most expensive. Two-way or scalar outcomes are not so bad.
|
Bad |
Good |
|
Select price*qty as value from mytable where… |
Select price, qty from mytable where…
(Do the multiply in the code) |
|
Select price from mytable
Where date between ‘20070101’ and ‘20070131’ |
Select price from mytable
Where date >= ‘20070101’ and date <= ‘20070131’ |
5. Using Wild Cards for comparisons.
This is similar to function use. Remember that the comparison is required for every selected row so use the least expensive option. For example:
|
Bad |
Good |
|
Where surname like ‘JOHN%’… |
Where surname > ‘JOHZ’ AND surname < ‘K’ |
6. Using DISTINCT.
This is one of serious gotcha’s. This keyword is very useful to find that row set of unique values but it is extremely expensive in terms of performance. Try not to use it with large tables. Further, the more columns you use with the keyword, the slower everything becomes. Remember too that like DBF, the actual row returned is indeterminate and can be different for every call to the same select.
Applies to: all SQL 92 compatible DBMS (not MySQL)