IntraSITE Search™ Changelog
November 06 2008:
- Changed get_file_as_string so that it converts the fetched file into PHP's default character encoding
(ISO-8859-1) and removes the question marks "?" created by any unsupported characters in the process
- Changed extract_descriptions so that it always terminates the description on the last occurring space;
this prevents edge cases where it was cutting an HTML character entity in half and therefore producing an
invalid character.
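
Illustrative sketch of the last-space truncation described above. This is not the script's actual extract_descriptions code; the helper name truncate_at_last_space and its parameters are hypothetical. It relies on the fact that HTML character entities never contain a space, so cutting at a space cannot split one.

```php
<?php
/**
 * Hypothetical helper: shorten $text to at most $max_length bytes,
 * ending on the last space so that no word or entity is cut in half.
 */
function truncate_at_last_space($text, $max_length)
{
    if (strlen($text) <= $max_length) {
        return $text; // already short enough, nothing to cut
    }
    $cut = substr($text, 0, $max_length);
    $last_space = strrpos($cut, ' ');
    if ($last_space === false) {
        return $cut; // no space available, fall back to the hard cut
    }
    return substr($cut, 0, $last_space);
}
```

For example, truncate_at_last_space('Fish &amp; chips for all', 8) stops before the entity instead of returning the broken fragment "Fish &am".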
November 03 2008:
- Changed all functions that deal with the $pages array to use the standard variable name $pages
- Changed all functions that deal with the $pages array to use an associative array layout
October 17 2008:
- Re-formatted some comments to make the code more legible
- Changed debug mode to force the site to be re-crawled
- Added crawl-chart generation routines to crawl_site
- Added code for outputting debug information to generate_html
- Removed file_exists check from get_file_as_string as it was causing the script to refuse to fetch the contents of any absolute URL
- Added routine to output a list of all pages crawled as part of the debug info
- Added routine to output a list of all the pages unable to be retrieved as part of the debug info
- Added some variable existence checking to crawl_site to stop it from throwing warnings during automated testing
July 22 2008:
- Added support for negative keywords to compute_relevance ( use by preceding a keyword with the minus "-" sign )
- Fixed bug where compute_relevance was not seeing keywords in the title tag
- Fixed bug in explode_searchterm that was causing it to return arrays with blank elements in fringe cases
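
Illustrative sketch of splitting a search term into wanted and negative keywords without producing blank elements. The function name split_search_terms and the 'include'/'exclude' layout are assumptions made for this example; the real explode_searchterm and compute_relevance may work differently.

```php
<?php
/**
 * Hypothetical helper: split a search string on whitespace and sort the
 * words into wanted keywords and negative ("-word") keywords.
 */
function split_search_terms($searchterm)
{
    // PREG_SPLIT_NO_EMPTY drops the blank elements that leading, trailing
    // or repeated whitespace would otherwise produce.
    $words = preg_split('/\s+/', $searchterm, -1, PREG_SPLIT_NO_EMPTY);

    $terms = array('include' => array(), 'exclude' => array());
    foreach ($words as $word) {
        if ($word[0] === '-' && strlen($word) > 1) {
            $terms['exclude'][] = substr($word, 1); // strip the minus sign
        } elseif ($word !== '-') {
            $terms['include'][] = $word; // a lone "-" is ignored
        }
    }
    return $terms;
}
```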
July 21 2008:
- Added function highlight_title_keywords to add emphasis to keywords in the page title via HTML
- Added explode_searchterm function and modified rest of script to use it
July 18 2008:
- Changed method of highlighting keywords in descriptions so that the keywords retain their original formatting
July 15 2008:
- Created function extract_descriptions from routines originally contained in compute_relevance
- Created function create_text_pages from routines originally contained in compute_relevance
- Changed method of computing relevance so that if multiple words are entered, all of the words have to
be present in a page in order for it to be deemed relevant at all
- Added functionality to compute_relevance that allows only exact matches when the user enters
a keyword surrounded by double quotes
- Added functionality to extract_descriptions to highlight all the search keywords in the description
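
Illustrative sketch of the "all words must be present" rule and the double-quote exact match. The function name page_matches is hypothetical. It also assumes that "exact match" means a case-insensitive whole-word match, which the entries above do not spell out.

```php
<?php
/**
 * Hypothetical helper: return true only if every keyword occurs in $text.
 * A keyword wrapped in double quotes must match as a whole word;
 * an unquoted keyword may match as part of a longer word.
 */
function page_matches($text, array $keywords)
{
    foreach ($keywords as $keyword) {
        $quoted = strlen($keyword) >= 2
            && $keyword[0] === '"'
            && substr($keyword, -1) === '"';

        if ($quoted) {
            $phrase = substr($keyword, 1, -1); // drop the surrounding quotes
            $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
            $found = preg_match($pattern, $text) === 1;
        } else {
            $found = stripos($text, $keyword) !== false;
        }

        if (!$found) {
            return false; // one missing keyword makes the page irrelevant
        }
    }
    return true;
}
```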
July 9 2008:
- Changed default script timeout to 30 seconds
July 7 2008:
- Fixed bug where generate_absolute_urls was being assigned to $page instead of $pages
- Changed generate_absolute_urls to correctly deal with a URL that already has a leading forward slash
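
Illustrative sketch of joining a base URL and a link so that a leading forward slash does not produce a double slash. The function name make_absolute_url is hypothetical, and the sketch assumes every non-absolute link is relative to the site root, which may not match what generate_absolute_urls does.

```php
<?php
/**
 * Hypothetical helper: turn $link into an absolute URL under $base_url.
 * Links that already carry a scheme (http://, https://, ...) are returned as-is.
 */
function make_absolute_url($base_url, $link)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link; // already absolute
    }
    // Normalise to exactly one slash between the two parts.
    return rtrim($base_url, '/') . '/' . ltrim($link, '/');
}
```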
June 27 2008:
- Changed data-caching system to make it more re-usable in other projects
- Changed generate_xml & generate_html functions to allow them to be automatically tested
- Fixed bug in strip_code_block where it wasn't properly removing the whole end tag
- Fixed bug in strip_code_block where an empty start_tag or end_tag triggered a PHP warning
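
Illustrative sketch of the two strip_code_block fixes above: removing the whole end tag and guarding against empty tags. The function name strip_block is hypothetical and the real implementation may differ.

```php
<?php
/**
 * Hypothetical helper: remove every block that starts with $start_tag and
 * ends with $end_tag, including both tags, from $data.
 */
function strip_block($data, $start_tag, $end_tag)
{
    // An empty needle is invalid for strpos in older PHP versions,
    // so return the data untouched instead of searching.
    if ($start_tag === '' || $end_tag === '') {
        return $data;
    }

    $offset = 0;
    while (($start = strpos($data, $start_tag, $offset)) !== false) {
        $end = strpos($data, $end_tag, $start + strlen($start_tag));
        if ($end === false) {
            break; // unterminated block, leave the remainder alone
        }
        // Skip past the full length of the end tag, not just its first byte.
        $data = substr($data, 0, $start) . substr($data, $end + strlen($end_tag));
        $offset = $start;
    }
    return $data;
}
```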
June 26 2008:
- Changed code to be vastly more atomic to facilitate automated unit tests and easier maintenance
- Removed output buffering as it actually slowed down the script
- Removed all debug outputs from the script; similar functionality will be implemented in the automated testing suite
June 25 2008:
- Added strip_code_block function which allows unwanted blocks of server-side or client-side code
to be removed from the page data
- Did some code clean-up and added some new comments
June 18 2008:
- Fixed bug in crawl_site where array_unique was destroying a lot of the collected data
- Added output buffering when the user isn't requesting debug info
- Removed the backup search system as it was unlikely ever to be used and was cluttering up the code
- Added data caching system to the search engine; default cache time is 24 hours
June 17 2008:
- Fixed bug where backstep operators were not being resolved correctly
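
Illustrative sketch of resolving backstep operators (..) in a URL path. The function name resolve_backsteps is hypothetical and this is not the script's actual code. As a simplification, the sketch always returns a root-relative path and drops any trailing slash.

```php
<?php
/**
 * Hypothetical helper: collapse "." and ".." segments in a URL path.
 */
function resolve_backsteps($path)
{
    $resolved = array();
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($resolved); // step back one directory; no-op at the root
        } elseif ($segment !== '' && $segment !== '.') {
            $resolved[] = $segment;
        }
    }
    return '/' . implode('/', $resolved);
}
```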
May 26 2008:
- Fixed method of extracting description text, which was sometimes returning a blank string
- Fixed bug where PHP was reporting errors if $css_path or $exclusions variables were empty
Apr 02 2008:
- Added capability to block a given directory or multiple directories from being indexed
- Made preference variables easier to understand
- Added extra debug info outputs
- Moved change-log to end of file
Mar 20 2008:
- Fixed issue where XML was being output in wrong character encoding ( should have been UTF-8 )
- Improved handling of HTML entities that were already in the source documents
- Improved stripping of JavaScript code from descriptions; the script now also strips all HTML comments out of the descriptions
Mar 19 2008:
- Fixed issue where some characters should have been converted to HTML entities but weren't and were breaking the XML output
Jan 25 2008:
- Added backup search capability, working very nicely!
- Consolidated the generation of absolute URLs to one loop and added detection of when a URL is already absolute
Jan 17 2008:
- Altered output section of script so it returns absolute URLs
- Documented compute_relevance function
- Documented crawl_site function
- Changed output to re-encode HTML entities before returning to the client
- First full working alpha version complete
- Performance optimizations - average execution time down to 0.15 secs on shared server ( .08 sec average improvement )
Jan 11 2008:
- crawl_site function working
- compute_relevance function working nicely
Jan 10 2008:
- get_file_as_string & extract_urls now working
- Added support for backstep operator (..) to extract_urls
Jan 03 2008:
- Initial work on extract_urls function & get_file_as_string