Blog

4Jan
2007
Performing SQL Server CONTAINS Searches

SQL Server 2005 supports two forms of full-text search, FREETEXT and CONTAINS. CONTAINS is used to search for rows that contain words, phrases, partial phrases, words with the same stem, proximity searches, synonyms (using a thesaurus lookup), and more.

Here is a simple example:

view plain print about
1SELECT note_id, note_text
2FROM productnotes
3WHERE CONTAINS(note_text, 'handsaw');

WHERE CONTAINS(note_text, 'handsaw') means find the word handsaw in column note_text.

CONTAINS also supports the use of wildcards:

view plain print about
1SELECT note_id, note_text
2FROM productnotes
3WHERE CONTAINS(note_text, '"anvil*"');

'"anvil*"' means match any word that starts with anvil. Note that unlike LIKE, full-text searching uses * as the wildcard character (instead of %). Wildcards may be used at the beginning or end of a string. Also, when passing simple text to CONTAINS then that text is enclosed within single quotes. When passing wildcards each search phrase must be enclosed within double quotes inside those outer single quotes. Failing to do this will likely cause your searches to return no matches.

CONTAINS also supports Boolean operators AND, OR, and NOT. Here are a couple of examples:

view plain print about
1SELECT note_id, note_text
2FROM productnotes
3WHERE CONTAINS(note_text, 'safe AND handsaw');
4
5SELECT note_id, note_text
6FROM productnotes
7WHERE CONTAINS(note_text, 'rabbit AND NOT food');

When searching through extremely long text there is a greater likelihood of matches being found if search terms are near each other in the saved data. A simple AND search matches terms anywhere in the text, but NEAR can be used to instruct the full-text search engine to only match terms when they are close together. Here is an example:

view plain print about
1SELECT note_id, note_text
2FROM productnotes
3WHERE CONTAINS(note_text, 'detonate NEAR quickly');

'detonate NEAR quickly' means match only rows that contain the words detonate and quickly near each other.

Sometimes you may want to match a word that is part of the same family (based on the same stem). For example, if you were searching for "life" you'd also want to match "lives". Obviously, a wildcard of life* could not help here, and using li* would likely match too many false positives. This is where inflectional matching helps. Here is an example:

view plain print about
1SELECT note_id, note_text
2FROM productnotes
3WHERE CONTAINS(note_text, 'FORMSOF(INFLECTIONAL, life)');

'FORMSOF(INFLECTIONAL, life)' instructs the full-text engine to look for any words that share the same stem as the specified word, in this case "life".

FORMSOF() also supports THESAURUS searches, where words can match synonyms. To use this functionality you must first populate an XML thesaurus file with words and their synonyms.

Related Blog Entries

Comments (25)



  • Mikkel Johansen

    Why should I use SQL Freetext instead of the Verity search function in ColdFusion?

    Verity can index database querys aswell. And what about documents (word, pdf, html etc.) in SQL Freetext ?

    -- Mikkel, from Denmark

  • Ben Forta

    Mikkel, maybe you shouldn't. ;-)

    The benefits are no limitation on document count, no external collections to have to manage, searching can be used by any database client (not just CF), it does indeed index Excel and Word and other formats, being able to combine WHERE clauses that perform full-text search and other filters is very useful.

    The downsides are that Verity is more capable, SQL Server full-text can't do the 'did you mean' type search, it has a much less powerful query language, as an example.

    --- Ben

    #2Posted by Ben Forta | Jan 4, 2007, 06:35 PM
  • Mikkel Johansen

    I'm very interessted in freetext search (SQL and Verity) so I have more questions.

    What about performance - Verity vs. SQL?

    I guess that Verity is also better at showing summary, highlighting, categories.

  • Ben Forta

    Mikkel, I have no performance data. Verity is far more feature rich, without question.

    --- Ben

    #4Posted by Ben Forta | Jan 4, 2007, 08:07 PM
  • Joe Mastroianni

    I've always been hesitant to enable full text search because I've always been under the impression the performance hit to the DB server is substantial. Any ideas on this? We tried it once at the office and when the DB was being indexed the server became slow and unresponsive to queries. Would that only happen on the initial full-text indexing or would it do so every time the index was refreshed?

    #5Posted by Joe Mastroianni | Jan 5, 2007, 08:21 AM
  • Ben Forta

    Joe, I've not used it with tables big enough to really get a feel for it. But, reportedly you have quite a bit of control over how indexing service resources arrre alllocated and used.

    --- Ben

    #6Posted by Ben Forta | Jan 5, 2007, 10:06 AM
  • Stephen Cassady

    "it does indeed index Excel and Word and other formats"
    This feels new! If it can easily index Excel and Word, AND pdf then I'm really excited about this. Verity has lovely feature sets, but (and this easily can be a product of my addled brain) I've always had issues with each use of the Verity engine over something: indexing, corruption, ease of use. Again, I accept that I've got the low-gear brain on Verity, but using SQL Server alone would make me more happy. Do you have any nice handy links on indexing external documents with 2005 SQL?

  • Tony Petruzzi

    I'll chime in here since I've used both Verity and MSSQL FTS. FTS is by FAR slower then using verity. If you don't believe me, try doing a FTS against a table with 10,000 rows vs. Verity with 10,000 documents. Verity will beat it hands down. FTS has always had this problem. The only way to improve FTS is to throw alot of high performance harddrives, RAM and servers at it. Also hiring a professional DBA to come and fine tune would be a requirement if you're planning to use it in an enterprise environment. In any case plan to shell out some big bucks to get it up and running properly.

  • John Kane

    "it does indeed index Excel and Word and other formats"
    Yes, this has always been the case, ever since SQL FTS shipped with SQL Server 7.0. This feature and many others related to SQL Server Full Text Search (FTS) have been greatly imporved in SQL Server 2005 relative to past version of SQL Server, including vastly improved FT Indexing performance. I've worked with SQL FTS ever since it was first incorporated in SQL 7.0 in 1998 and have spoke at several conferences on this and related topics. I cannot speak about SQL FTS performance relative to Verity, but 10,000 rows is a trival table size and can be FT Indexed with SQL 2005 in minutes and no DBA is necessary. I also maintain a blog on this topic at http://jtkane.spaces.live.com/ and porting all of the technical content over to http://www.SQLFTS.com this month with a more formal lauch in Feb. 2007.
    Thanks,
    John

    #9Posted by John Kane | Jan 6, 2007, 06:18 PM
  • NejcPass

    Thanks for this! But I still have a problem. I'm using dynamic variables and id somebody searches under 2 words. is it like this???

    WHERE CONTAINS(note_text, #form.search# AND #form.search');

    #10Posted by NejcPass | Jul 3, 2007, 05:13 AM
  • Mitesh Patel

    Thanks for this! But I still have a problem
    If i use "select TextSearch from CLGHistory where contains(TextSearch,'"1ALL*"') " then it will disply me the records.
    but if i use "select TextSearch from CLGHistory where contains(TextSearch,'"*1ALL*"') " it will not display me any results
    Pls help me

    #11Posted by Mitesh Patel | Aug 22, 2007, 09:29 AM
  • Ben

    Anyone used contains phrase searching successfully with Sql Server 2008 and Cold Fusion 8 yet?

    Take the following query:

    select g.*
    from games g
    where contains(g.Title,'"World of Warcraft"')

    This returns results when I run it in SQL Management Studio but no results in my Cold Fusion pages. One word searches work fine for example:

    select g.*
    from games g
    where contains(g.Title,'"WoW"')

    I have literally been running these queries against SQL Server 2000 for years through version 5,6,7 and 8 of Cold Fusion. It's just the recent upgrade to SQL Server 2008 that killed this for us. I've even tried making my search queries into stored procedures and calling them this way through my CF pages with the same results.

    Any help here would be appreciated, I'm stumped.

    #12Posted by Ben | Feb 11, 2009, 11:10 PM
  • Seamus

    I'm seeing the same behavior as Ben in the above post. Can't believe we are the only ones running into this.

    This behavior is totally odd. The interesting thing is (from what I've been able to gather) if you remove the "of" and any other stopword from the CONTAINS, then the query will actually return results.

    #13Posted by Seamus | Apr 26, 2009, 08:52 PM
  • Seamus

    To recap the problem: When using FULLTEXT SEARCH, specifically the CONTAINS function, with CF8 and MSSQL 2008 if there are any noise words in the criteria being searched for zero results are returned. Remove the noise words and the query will return the proper set of results. If you take the exact same SQL and run it in Studio (or query analyzer) it runs without a hitch. So my take; CF is translating something terribly wrong.

    Here is some more information about what I've discovered:

    1) I downloaded the latest JDBC driver from MS
    2) Copied the sqljdbc4.jar to the cfusion/lib directory
    3) Restarted CF instance
    4) Created a new DSN using "other" and used the following connection string: jdbc:sqlserver://xxx.xxx.xxx.xxx;databaseName=someDatabase;SelectMethod=direct;sendStringParametersAsUnicode=false;MaxPooledStatements=1000;
    5) Used the following driver class: com.microsoft.sqlserver.jdbc.SQLServerDriver

    This caused the problem that has been described thus far to disappear, however, multi-line SQL seemed to quit working as you'd expect. For instance:

    INSERT INTO someTable (someField) VALUES (someValue)
    SELECT SCOPE_IDENTITY() AS someID

    Does not work. Let me rephrase, it WILL work, but the SELECT statement isn't the result set. Instead you get a query result with Generated_Keys as the only column. In fact ANY select statement executed after the insert seems to be ignored BUT if you repeat the select statement it executes fine.

    However, if you change the connection string to (note at the beginning the addition of "macromedia:":
    jdbc:macromedia:sqlserver://xxx.xxx.xxx.xxx;databaseName=someDatabase;SelectMethod=direct;sendStringParametersAsUnicode=false;MaxPooledStatements=1000;

    Then the INSERT/SCOPE_IDENTITY works as expected, but you get the SAME behavior as before with the FT query.

    Any ideas? It would appear that this is a bug in how CF is gathering information.

    #14Posted by Seamus | Apr 27, 2009, 07:59 PM
  • Seamus

    So here is what is REALLY going on with the FULL TEXT SEARCH specifically in relation to stop words.

    Assume you have a table called quotes with three columns all part of a full text catalog: ID, name, description and several rows within the table. Two of the rows have the following data:

    id = 5, name = 'John Doe', description = 'some days are better than others'
    id = 10, name = 'Jane Doe', description = 'the days are getting longer and time is getting shorter'

    Take the following query:
    SELECT *
    FROM quotes
    WHERE CONTAINS(quotes.*, ' "days are" ')

    NOTE: To test this in SQL Server Management Studio or Query Analyzer, insure that you've set your results to be returned as text instead of the grid (which is the default).

    In SQL 2000 you get the following results:
    id name description
    -------- -------------- -------------------------------------------------------------
    5 John Doe some days are better than others
    10 Jane Doe the days are getting longer and time is getting shorter

    Nice, clean, and what you would expect to see.

    in SQL 2005 you get the following results:
    id name description
    -------- -------------- -------------------------------------------------------------
    5 John Doe some days are better than others
    10 Jane Doe the days are getting longer and time is getting shorter
    Informational: The full-text search condition contained noise word(s).

    Notice the informational message. In this case all is fine because the macromedia_drivers which do a bit of magic with the result is seeing the message and dismissing it (presumably as garbage)

    in SQL 2008 you get the following results:
    id name description
    -------- -------------- -------------------------------------------------------------
    Informational: The full-text search condition contained noise word(s).
    5 John Doe some days are better than others
    10 Jane Doe the days are getting longer and time is getting shorter

    Notice in this case that the informational message is actually intermingled within the result set. This essentially results with CF converting the result into TWO result sets. The first one is blank (always) the second contains your results.

    You can prove this by putting your SQL into a stored procedure, then execute the stored procedure. When noise words are NOT present you get only one result set. When noise words (the, and, are, is, to for, etc....) ARE present you can (using cfstoredproc/cfprocresult) specify <cfprocresult name="queryName" resultSet="2" /> which will return your results. resultSet="1" returns the empty result set.

    I believe this is occurring because the macromedia_drivers are interpreting the "informational" message as the end of the result set.

    There are many ways to deal with this issue, though none are very elegant. We are currently working with MS to establish that this is indeed a bug. We feel it is, the MS jury is still out.

    I will update when Microsoft responds.

    #15Posted by Seamus | Apr 30, 2009, 12:29 PM
  • Steven

    Hello Seamus, excellent detective work! I've just wasted the past day trying to figure out why some
    of my SQL 2008 FTS were returning zero rows when clearly in SQL Server Management Studio correct results were being returned! Arrggh!!

    You mention there are many non-elegant ways to solve this. Would you care to mention a few?

    Or is simply putting the query into a stored proc and then testing for the existence of a second resultset it?

    PS. Hopefully this can be resolved soon!

    Regards
    Steven

    #16Posted by Steven | Jun 23, 2009, 09:37 AM
  • Seamus

    I apologize, I absolutely hate it when people have "interesting" topics and then never report back their findings. I guess that's me this time.

    So here's what I've discovered:

    First, MS does not consider this a bug because it is consistent with their TDS (http://msdn.microsoft.com/en-us/library/dd304523(PROT.13).aspx). In a nutshell, providers should be able to deal with information tokens regardless of where they appear (though even .NET had a problem with their ADO implementation which HAS been identified to be a bug related to this very issue).

    For whatever reason the developers at Microsoft decided that it would be a good idea to change the precedence of the information token in this case from the end of the results (prior to and including 2005) to a "position" that is intermingled with the results.

    Next, this issue was pointed out to Adobe. The reply there has been that SQL 2008 is NOT officially supported by ColdFusion 8 and that this issue also exists in ColdFusion 9. The reason that it exists is because ColdFusion uses Data Direct drivers and the drivers from Data Direct do not effectively deal with the information tokens correctly. This is because the TDS from MS is a fairly recent publication and prior to the release of the TDS Data Direct essentially had to rely on reverse engineering to create their implementation.

    I've been informed that this has been escalated so that the version of drivers to be released with ColdFusion 9 will not have this problem, but it appears that there is no real direction regarding support of SQL 2008 in ColdFusion 8.

    Be that as it may, it's likely the "new" Data Direct drivers will work with ColdFusion 8, it may just not be "supported".

    I can also tell you that Microsoft has offered their assistance to help Adobe comply with the requirements because it is in their best interest to do so. However, to date I've not seen any effort from Adobe to take them up on that offer.

    So with all that in mind, here are some potential solutions:

    ALTER FULLTEXT INDEX ON <table> SET STOPLIST OFF

    That statement when run will essentially disable all stoplists including system stoplists for the given FTI on the table specified. You will of course need to rebuild your FTI when complete and it will require more space, though in our experience it wasn't overwhelming.

    This is not an ideal solution, but it does help to work around the issue.

    The other fix, which could be a better solution not only in the short but also longer term, is leveraging the MS JDBC driver (http://msdn.microsoft.com/en-us/data/aa937724.aspx...). I say that it might be a better long term solution because MS should know their DB fairly well. MS has also indicated that they will support the implementation even if Adobe doesn't (which I THINK Adobe will).

    Initial testing suggests that this driver works wonderfully to overcome the issue described previously, however, the Data Direct drivers do some nifty little convenience things like taking care of the summative information that SQL typically kicks back. In order to install the driver you will need to copy the .jar to the cfusion/lib directory (or add it to the class path in some fashion) and then choose "other" from the DSN drop down in the administrator. You will also need to build essentially the whole connection string.

    So the point here regarding usage of the MS JDBC driver is; if you just plug in the MS JDBC driver and start using it and you leverage something like:

    <cfquery ...>
       INSERT INTO myTable
          (fld1, fld2, fld3)
       VALUES
          (val1, val2, val3)

       SELECT SCOPE_IDENTITY AS myNewId
    </cfquery>

    Then you will likely get no result set returned due to the fact that totals are coming back with the results and everything past the first result set is "tossed". That is to say that you will get some information indicating x rows affected intermingled with the results.

    To overcome this issue, you either have to SET NOCOUNT ON prior to any of your batched statements or alternatively on your server execute:

    EXEC sp_configure 'user options', '512'
    RECONFIGURE

    That statement will essentially turn NOCOUNT ON for the entire SQL server for all NEW connections. This means that when you run queries in studio you won't get the helpful little messages about how many rows were affected unless you precede statements with SET NOCOUNT OFF, but it's a small price to pay. Plus, it should marginally increase performance.

    All that said, we've not exhaustively tested the new drivers for both functionality differences neither have we tested performance differences. Should anyone else do this, I'd love to see the results of your efforts.

    So the point here is that while I believe that the MS drivers are probably the way to go, you need to do some testing of your own to insure that if you decide to go that route that you identify any issues that may exist that directly affect you.

    Regardless, one of those two "solutions" should at least allow us to get around this issue until the Data Direct drivers and CF 9 comes out (and it's adopted). Who knows, maybe Adobe will choose to support SQL 2008 (a mainstream DB) with CF 8 as well since I doubt anyone is going to immediately jump to CF 9 when it first comes out.

    If anyone else has any other solutions, I'm all ears....

    Hope this helps!

    #17Posted by Seamus | Jun 23, 2009, 01:26 PM
  • ayme

    i needed information about this as i had some probs with sql, this article helped me to point me in the right direction
    thank you

    #18Posted by ayme | Jul 5, 2009, 09:45 AM
  • Mark

    I'm not sure that this is the cleanest solution either but while encountering the same issue I decided to create a separate DSN on the site that offers read-only permissions. Than I wrote a function to make use of the built-in routine [sp_executesql] to run my full text search code against because you can define which result you'd like to use. For this scenario, you can always depend on the last position of the array to contain your actual result-set. This particular solution also works well when making use of "COMPUTE" aggregate routines. The only hitch is you cannot pass dynamic parameters ( without modifying what I wrote )
    ----
    <cffunction access="public" name="multipleResultSet" output="false" returntype="any">
       <cfargument name="SQL" required="yes" type="string">
       <cfscript>
          // Create local variable scope
          var Local = StructNew();
          // Create place holder variables
          Local.sctResult = StructNew();
          Local.arrResult = ArrayNew(1);
       </cfscript>
       <!---Prepare our result for the caller--->
       <cfstoredproc datasource="#this.ReadOnly_DSN#" procedure="sp_executesql">
          <cfprocparam cfsqltype="cf_sql_varchar" dbvarname="@Statement" maxlength="4000" value="#Arguments.SQL#">
          <!---Include our upper bound of theoretical "max" for record-sets to be considered from this statement--->
          <cfloop from="1" to="100" index="Local.I">
             <cfprocresult name="Local.sctResult.Result_#Local.I#" resultset="#Local.I#">
          </cfloop>
       </cfstoredproc>
       <!---Restructure our result set into an array; rather than struct--->
       <cfloop index="Local.Key" list="#StructKeyList(Local.sctResult)#">
          <!---Add result to our return array--->
          <cfset ArrayAppend(Local.arrResult,Local.sctResult[Local.Key])>
          <!---Remove key from source-structure--->
          <cfset StructDelete(Local.sctResult,Local.Key)>
       </cfloop>
       <!---Return result to caller--->
       <cfreturn Local.arrResult>
    </cffunction>
    -----

    #19Posted by Mark | Sep 14, 2009, 02:03 PM
  • Phil

    Hey Seamus and Steve, I had the same issue and found in my where clause I was doing a date compare on a text field. Once I changed it to a datetime
    field it worked.

    (i.e. where dateReceived >= {d '2009-11-23'}) (dateReceived being a nvarchar field not datetime)
    when I changed it to where dateTimeReceived >= {d '2009-11-23'} it worked (where dateTimeReceived was a datetime field.

    #20Posted by Phil | Nov 30, 2009, 12:11 PM
  • Maroof Saeed

    Hi Sir,
    I have a problem regarding full text search as i have db in sql server 2005 and using contains function on a column having datatype text. now i am using
    contains function like this. select * from table where contains(column,'R&D'). Now the problem is that the contains function doesnt gives me any record as
    full text search assumes R&D as and logical operator. i have tried a lot like contains(column,'" R&D "') or like contains(column,'"Rchar(38)D"'), but all in vain
    Please help me what can i do. i dont think so that there is no solution for this but its wrong that we cant search such special characters on contains
    function.

    Regards,
    Maroof Saeed

    #21Posted by Maroof Saeed | Sep 3, 2010, 12:19 PM
  • Maroof Saeed

    Hi Sir,
    I have a problem regarding full text search as i have db in sql server 2005 and using contains function on a column having datatype text. now i am using
    contains function like this. select * from table where contains(column,'R&D'). Now the problem is that the contains function doesnt gives me any record as
    full text search assumes R&D as and logical operator. i have tried a lot like contains(column,'" R&D "') or like contains(column,'"Rchar(38)D"'), but all in vain
    Please help me what can i do. i dont think so that there is no solution for this but its wrong that we cant search such special characters on contains
    function.

    Regards,
    Maroof Saeed

    #22Posted by Maroof Saeed | Sep 3, 2010, 12:20 PM
  • Dee

    Hi Seamus ,
    Your information was VERY helpful. Does anyone know if Adobe fixed this issue in CF8? We recently upgraded to SQL 2008 and got in to this issue.
    We have few hundred servers and all these options are risky.

    #23Posted by Dee | Apr 13, 2011, 04:03 AM
  • Adam Cameron

    I would say Adobe have NOT addressed this in CF8, no. Because as far as I can tell, it's still bung in CF9 (CF9.0.1 in fact) too.

    :-(

    --
    Adam

    #24Posted by Adam Cameron | Aug 2, 2011, 04:55 AM
  • Piyush Bajaj

    Its nice one..
    Well i have also posted an article,
    What is CONTAINS() and CONTAINSTABLE() function in INTEGRATED FULL-TEXT SEARCH ? Check it now:
    http://www.sqlservergeeks.com/articles/sql-server-...

    #25Posted by Piyush Bajaj | Aug 27, 2011, 02:26 PM