There are quite a lot of powerful SELECT statement mechanisms for describing data. As discussed in a previous item, you need to be aware of the performance impact of these mechanisms but they are also very convenient to use and can derive very complex expressions. Some you will know…
1. CONTAINS
This is a fuzzy word lookup to use with WHERE clauses. Like FREETEXT, it has some complex qualifiers but at its simplest level, it can be used as a word lookup. Warning: it is very expensive!
For example, you want all addresses that use the word ‘Street’ in the address column. Then let us extend it to get ‘Road’ or ‘Street’ (note the double quotes). Note here that you can also specify a phrase, in which case the start and closing double quotes. Single quotes are only used for a single word. Punctuation is ignored and when using multiple word options, you can use AND, OR, AND NOT and you can specify multiple columns.
SELECT Surname FROM Clients WHERE CONTAINS (address, ‘Street’)
SELECT Surname FROM Clients WHERE CONTAINS (address, ‘”Street” OR “Road”’)
The following will restrict the result set to addresses that contain words prefixed with a set of characters. For example, all addresses that contain North something:
SELECT Surname FROM Clients WHERE CONTAINS (address, “‘North*”’)
From here it gets much more complex. You can express proximity of two words:
SELECT Surname FROM Clients WHERE CONTAINS (address, ‘North NEAR Geelong’)
The FORMSOF keyword allows for variations of a word. For example, you want ride, riding, ridden then you can specify CONTAINS( Description, ‘FORMSOF(INFLECTIONAL, ‘ride)’). Replacing the keyword INFLECTIONAL with THESAURUS will give a full-text word search for ones of similar meaning.
The ISABOUT keyword allows for weightings on variations to a word. There are many sub variations and properties of this so for examples of these uses, look up Books on Line. One final point: you can use parameterised variables in the expression to allow for data driven TSQL. Here is one example:
DECLARE @Search varchar(30)
SET #Search = ‘Street’
SELECT Surname FROM Clients WHERE CONTAINS (address, @Search)
2. BETWEEN
This is most useful for dates but possibly for numeric results. It is quite difficult to do strings but might work. Remember that the two range items are included in the outcome:
WHERE entrydate BETWEEN ‘20070101’ AND ‘20070131’
WHERE invtotal BETWEEN 0 AND 1000
WHERE surname BETWEEN ‘A’ AND ‘Azzzzzz’ // to get all A’s
3. ISNULL() and IS NULL and its opposites
This is a very important function and is used to ensure that you do not return a NIL value type from your SELECT. When using DBFs, an empty column always returns the empty data type for that column. In SQL you will get NIL if the value in the table is a NULL. Dates are where this is most useful, or the result sets from joins. For example, say you are reading a list of graduates to a course. You can restrict the result set to only records where a date of graduation is included. ie:
SELECT GraduateName from dbo.Candidate WHERE GradDate IS NOT NULL
… or alternately, if you want to read the name of all students enrolled into the application but not crash on assignment when the record set returns nulls, you can force an empty value to return an empty string like this:
SELECT ISNULL(GraduateName, ‘’) from dbo.Candidate
4. IN()
IN() allows you to specify a specific list of values that must be matched. For example, if you only want invoices from VIC, NSW and QLD then the code looks like this:
SELECT invtotal, invdate, invtitle, invstate FROM invoices
WHERE inv state IN (‘VIC’, ‘NSW’, ‘QLD’)
5. Wild Cards.
Wild cards can be used to describe partial string matches.
|
% |
A string of zero or more characters |
|
_ |
Any single character |
|
[] |
A single character in this range eg. [aeiou] one of either aeiou |
|
[^aeiou] |
A single character not in this range eg [^aeiou] one not in aeiou |
…and don’t forget the ESCAPE keyword. This allows you to include the above wild card template characters. For example ESCAPE ‘%’ to get an escape character in the output string.
6. EXISTS(), IN() and = ANY()
Exists() produces a positive logical result from a SELECT statement that produces at least one row. This can be used to include or exclude rows from a table based on some lookup criteria. For example, say we want to find only invoices that have line items:
SELECT * FROM invoices a WHERE EXISTS(SELECT * FROM lines b WHERE a.invcode=b.invcode)
Of course for full efficiency, do not use * but specify a single column in the sub select. You can use NOT EXISTS() to return the opposite result. Note the difference to using joins. In this case the sub select table is not available to the broader query and you can use the same alias again inside the sub select. If you need values from the sub select table you are better off using a join.
IN() and ANY() have a related use but now it relates to values found for a specific field. This is quite useful for selecting row sets from one table that have a value based match in another table. For example, select all the expenses from a table where the account code is for rent but the rent property can only be established from the acctcodes table:
SELECT * FROM expenses e WHERE a_code
IN(SELECT a_code FROM acctcodes a WHERE a.groupname=’RENT’)
A similar construct can be used for =ANY():
SELECT * FROM expenses e WHERE e.a_code =
ANY(SELECT a_code FROM acctcodes a WHERE a.groupname=’RENT’)
7. FREETEXT()
This function is like CONTAINS on steroids. It can only be used on columns that have been registered for full-text searching and applies a thesaurus-like search on the meaning of the words requested. Note that the word ‘and’ is considered noise and will be ignored. There are two basic forms:
…WHERE FREETEXT(address, ‘house road little’) // search for any of these words
…WHERE FREETEXT(address, ”little annie road”) // search for this exact phrase
Like CONTAINS, you can search on multiple columns. Take care using this mechanism because it is very expensive in performance terms.
8. TOP
The TOP keyword is vastly underrated. Mostly it is used to limit row sets where you won’t be able to use all returned rows or don’t need all rows. Most importantly it can be used to get the first and last row of an ordered row set. The last row is obtained by retrieving TOP 1 with a descending order. But for surveys, lists and result sets, it is particularly useful for returning the TOP 10 or with the PERCENT keyword, TOP 10 PERCENT. Again, in code we can parameterise the value like this:
DELCARE @Percentage FLOAT
SET @Percentage = 10.0
SELECT TOP (@Percentage) FROM…
9. GROUP BY with HAVING
Ok, so you know how to use GROUP BY but what about extending query to indicate a further condition. For example, we want sales invoices summed but only for clients with more than 100 orders. In other words, we are looking for some kind of ranking outcome:
SELECT SUM(invtotal) FROM SALES
WHERE …some selection condition
GROUP BY clientname
HAVING COUNT(*) > 100
The HAVING clause might be things like invoices greater than a certain value, only in a certain city and so on. It is a very convenient way to restrict the grouping.
10. Using SELECT INTO to Create Tables
This is a quick and dirty way to create temporary tables that persist in the database. Hence they can only be used once, at least until you delete the table again.
SELECT column_list INTO new_table FROM table_list
Another good use of this is to create an empty table with the exact same structure by using WHERE 1=0 in the SELECT.
Applies to: All SQL 92 compatible DBMS