James H. Zisch - Computer Services


Text Search

PRICE:  $149.99 (subject to change without notice)

Description

Text Search provides a generic text search of a website. It includes a robot spider mechanism that generates a search dataset for the website in advance, for rapid results. The spider can be run resident on the server or on a local, non-networked PC development platform.

Invocation

Text Search is invoked via an HTTP POST that passes the text to search for. Text Search Controls are supported by both static HTML pages and the dynamic page templates used by JHZ-CS Page Generators.

Results

Results are sorted by the number of matches per page and formatted using a customizable template.
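
As an illustration of this ordering, the following Python sketch counts occurrences of the search text in each page and lists the pages with the most matches first. It is not the product's PERL code: the `rank_pages` name, the sample pages, the case-insensitive matching, and the descending order are all assumptions made for the example.

```python
def rank_pages(pages, search_text):
    """Return (url, match_count) pairs, most matches first.

    `pages` maps a page URL to its searchable text. Matching is a
    case-insensitive substring count (an assumption for this sketch).
    """
    needle = search_text.lower()
    if not needle:
        return []
    counts = []
    for url, text in pages.items():
        hits = text.lower().count(needle)
        if hits > 0:
            counts.append((url, hits))
    # Sort by match count (descending), then by URL for a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts


# Hypothetical sample data, not taken from a real search.dat file.
sample_pages = {
    "/index.html": "Search the site. Site search is fast.",
    "/about.html": "About this site.",
    "/contact.html": "Contact us.",
}
print(rank_pages(sample_pages, "site"))
```

Pages with no matches are left out of the result, so the list can be handed directly to whatever formats the results page.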

Template processing supports automatic generation of:

Search Dataset Creation

The Search Dataset is a standard ASCII flat file. The Text Search robot spidering mechanism can be run either on the PC development platform, where a current representation of the website resides, or as a web-based back-end administrative CGI. The Search Dataset must be regenerated manually after website updates.
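
The spidering step can be pictured as a walk over the website's directory tree. The Python sketch below is only an illustration under stated assumptions, not the logic of "ts_data.cgi": the `find_pages` name is hypothetical, only ".html" files are collected, and the exclusion arguments loosely mirror the "@exclude_dirs" and "@exclude_files" settings described in the configuration section of this document. The layout of the flat file itself is not documented here, so the sketch stops at listing the pages.

```python
import os
import tempfile


def find_pages(root, exclude_dirs=(), exclude_files=()):
    """Return sorted paths (relative to root) of the HTML files to index."""
    found = []
    for current, dirs, files in os.walk(root):
        # Pruning `dirs` in place stops os.walk from descending into the
        # excluded directories, so their nested subdirectories are skipped too.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for name in files:
            if name in exclude_files:
                continue
            if name.lower().endswith(".html"):
                full = os.path.join(current, name)
                found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)


# Build a small throwaway site to demonstrate the walk.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cgi-bin"))
    os.makedirs(os.path.join(root, "docs"))
    for rel in ("index.html", "test.html", "docs/faq.html", "cgi-bin/x.html"):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("<html></html>")
    print(find_pages(root, exclude_dirs=("cgi-bin",),
                     exclude_files=("test.html",)))
```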

Web Server Requirements:

  • Operating Systems Supported: Unix, Linux, Macintosh OS X® and Windows®
  • HTTP 1.x with CGI (Common Gateway Interface) and PERL 5.x (check with your ISP or Server Administrator; these requirements are supported on most systems)
  • Optional: SENDMAIL (supported on almost all Unix/Linux platforms), or PERL Net::SMTP (available from CPAN; commonly used on Windows® server platforms) with an SMTP system pre-installed and configured. This optional requirement supports automatic email notification to the webmaster when execution errors are detected.

Text Search
Installation and Usage

FILENAME: ts.html

PURPOSE:

Describes installation and use of Text Search.

DEPENDENCIES:

se.pl - Search configuration and common logic modules (see: se.html)
jhzcs.pl - JHZ-CS configuration and common logic modules (see: jhzcs.html)
IID (Item Information Dataset)

REQUIREMENTS:

ts.cgi (web interface component)
ts.htm (template)

OPTIONAL:

Support for the Banner Rotator System requires its components to be installed and configured according to specifications. The template supported by this product must also include the Special Template markup for the Banner Rotator System. See the sample template provided with this product.

PACKING (PARTS) LIST

PRODUCT ID: TS

COMPONENT               TYPE      SIZE  AUTH*
cgi-bin/geog.pl         TEXT     11750  750
cgi-bin/jhzcs.pl        TEXT     47225  750
cgi-bin/license.incl    TEXT      7565  700
cgi-bin/se/se.incl      TEXT      4399  700
cgi-bin/se/se.pl        TEXT      3465  750
cgi-bin/se/ts.cgi       TEXT      7157  750
cgi-bin/se/ts.incl      TEXT     18314  700
cgi-bin/se/ts_data.cgi  TEXT      9337  750
css/se/se.css           TEXT      3443  750
data/ts/search.dat      TEXT    318718  700
images/JHZCS.gif        BINARY     745  750
logs/errorlog.txt       TEXT        91  750
tmpls/se/ts.htm         TEXT      2516  750

*AUTH - Authorization/permission octal equivalents, where:
7=RWX, 6=RW, 5=RX, 4=R (R=Read, W=Write, X=Execute)
In the four-character form (for example 0750; the table above omits the leading character):
1st character indicates an octal number
2nd character: Owner
3rd character: Group
4th character: All Others
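
To make the AUTH column easier to read, the following Python sketch translates an octal value into the familiar read/write/execute notation for owner, group, and all others. It is offered purely as an illustration; the `describe_auth` helper is a hypothetical name, not part of the product.

```python
def describe_auth(auth):
    """Translate an octal AUTH value such as "750" into rwx notation."""
    bits = {"r": 4, "w": 2, "x": 1}
    parts = []
    # The last three digits are owner, group, and all others, so a
    # leading "0" (as in "0750") is ignored.
    for digit in auth[-3:]:
        value = int(digit, 8)
        parts.append("".join(flag if value & bit else "-"
                             for flag, bit in bits.items()))
    return "".join(parts)


for auth in ("750", "700"):
    print(auth, describe_auth(auth))
```

For the two values used in the packing list this prints "rwxr-x---" for 750 and "rwx------" for 700.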

INSTALLATION:

It is strongly recommended that you read all of the installation steps before performing any installation. Each solution has specific instructions that must be followed precisely.

The Installation Process

The installation process follows this order:

  1. Download solution to PC development platform and expand installation package (use any standard archive application such as WinZip®, StuffIt Expander® or TAR command)
  2. Modify configuration settings
  3. Upload to server
  4. Set access authorization permissions
  5. Test
  6. Customize Templates
  7. Test

Authorize non-Logic Components

Authorize non-logic components as follows:

  • authorize all images for read and execute access; see NOTE FOR NON-LOGIC COMPONENTS below
  • authorize all static HTML ".html" documents for read access; see NOTE FOR NON-LOGIC COMPONENTS below
  • authorize all HTML template ".htm" documents for read access at the logic level; see NOTE FOR LOGIC COMPONENTS below

NOTE FOR NON-LOGIC COMPONENTS: Use the minimum permissions required to achieve the most secure configuration. Depending on the server configuration, non-logic components require authorization either for group only or for both group and other.

Modify Logic Components

Modify all "*.cgi" and "*.pl" logic modules as follows:

  • ensure the first line of code (#!) correctly points to the PERL executable on the server (use the command "which perl" or consult your server administrator)
  • ensure all "require" statements in all "*.cgi" and "*.pl" logic modules use absolute directory paths (beginning with a forward slash "/")
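
Both points can be checked before uploading. The Python sketch below is a hypothetical pre-upload check, not a tool shipped with the product: it reports a missing "#!" line and any "require" statement whose quoted path does not begin with "/". It only recognizes "require" statements written with a literal quoted path, and the "/usr/bin/perl" location in the sample is an assumption; use the path reported by your own server.

```python
import re

# Matches lines such as:  require "/path/to/se.pl";
REQUIRE_RE = re.compile(r'^\s*require\s+["\']([^"\']+)["\']\s*;', re.MULTILINE)


def check_script(source):
    """Return a list of problems found in PERL source text."""
    problems = []
    lines = source.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith("#!"):
        problems.append("first line is not a #! interpreter line")
    for path in REQUIRE_RE.findall(source):
        if not path.startswith("/"):
            problems.append("require path is not absolute: " + path)
    return problems


good = '#!/usr/bin/perl\nrequire "/home/cust/yourdomainname/www/cgi-bin/se/se.pl";\n'
bad = 'require "se.pl";\n'
print(check_script(good))
print(check_script(bad))
```

An empty list means neither problem was detected; it does not confirm that the interpreter or the required file actually exists on the server.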

Authorize Logic Components

Authorize logic components and dependent components (HTML and Email templates, and others) as follows:

  • authorize all "*.pl" logic modules for read access; see NOTE FOR LOGIC COMPONENTS below
  • authorize all "*.cgi" logic modules for read and execute access; see NOTE FOR LOGIC COMPONENTS below
  • authorize all HTML and Email template documents for read access at the logic level; see NOTE FOR LOGIC COMPONENTS below

NOTE FOR LOGIC COMPONENTS: Use the minimum permissions required to achieve the most secure configuration. Depending on the server configuration, logic modules require authorization either for owner only (e.g., Apache with suEXEC active) or for both owner and group.
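
The note above can be read as a small decision: grant read access, add execute access for "*.cgi" modules, and extend the same access to the group only when the server requires it. The Python sketch below builds the corresponding mode from the standard `stat` constants. The `minimum_mode` helper is a hypothetical name used for illustration; the values in the packing list (750 and 700) additionally grant the owner write access, which this sketch leaves out.

```python
import stat


def minimum_mode(execute, include_group):
    """Build the smallest mode granting read (and optionally execute)."""
    mode = stat.S_IRUSR
    if execute:
        mode |= stat.S_IXUSR
    if include_group:
        mode |= stat.S_IRGRP
        if execute:
            mode |= stat.S_IXGRP
    return mode


# "*.pl" modules need read access; "*.cgi" modules need read and execute.
print(oct(minimum_mode(execute=False, include_group=False)))  # *.pl, owner only
print(oct(minimum_mode(execute=True, include_group=False)))   # *.cgi, owner only
print(oct(minimum_mode(execute=True, include_group=True)))    # *.cgi, owner and group
```

On a Unix-like server the resulting value could be passed to `os.chmod(path, mode)`, or the equivalent octal number given to the "chmod" command.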

Locate the following in "ts.cgi" and modify as described below:

############################################################
#
#	CONFIGURATION SECTION
#

require "/home/cust/yourdomainname/www/cgi-bin/se/se.pl";

$template_file			= $site_root."tmpls/se/ts.htm";
$search_data			= $site_root."data/ts/search.dat";

#
#	END CONFIGURATION SECTION
#
############################################################
  1. Modify the absolute file path contained between quotes for the "require" statement to correctly point to "se.pl"
  2. If the default install locations are not used, modify the file path relative to "$site_root" (defined in "jhzcs.pl") contained between quotes for the "$template_file" configuration variable assignment to correctly point to "ts.htm"
  3. If the default install locations are not used, modify the file path relative to "$site_root" (defined in "jhzcs.pl") contained between quotes for the "$search_data" configuration variable assignment to correctly point to "search.dat"
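
Because the PERL "." operator joins strings without inserting a separator, these assignments only produce a valid path when "$site_root" ends with the path delimiter. The Python sketch below mimics the concatenation to show the difference. The `resolve` helper and the sample "$site_root" value are assumptions for illustration; the real value is defined in "jhzcs.pl".

```python
def resolve(site_root, relative_path):
    """Mimic PERL's $site_root."relative/path" string concatenation."""
    return site_root + relative_path


# Assumed value for illustration only; note the trailing slash.
site_root = "/home/cust/yourdomainname/www/"

print(resolve(site_root, "tmpls/se/ts.htm"))
# Without the trailing slash the two parts run together:
print(resolve(site_root.rstrip("/"), "tmpls/se/ts.htm"))
```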

Locate the following in "ts_data.cgi" and modify as described below:

############################################################
#
#	CONFIGURATION SECTION
#

#	Directory path of website root
$root = "absolute-path-to-root-directory";

#	Platform path delimiter
$delim = "/";

#	Directory path and filename of output file
$srch_data_fs = $root."data/ts/search.dat";

#	URL to prepend to page URLs
$base_URL = "http://www.jhz-cs.com";

#-----------------------------------------------------------
#	START and END and EXCLUDE content directives 
#	example could be for start "<BODY" and "</BODY>"
$srch_start	= "<!-- content start -->";
$srch_end	= "<!-- content end -->";

$exclude_content_start	= "<!-- EXCLUDE FROM SEARCH START -->";
$exclude_content_end	= "<!-- EXCLUDE FROM SEARCH END -->";
#-----------------------------------------------------------

#-----------------------------------------------------------
#	OUTPUT TO FILE SWITCH
#	Comment line to output to screen only
$output_to_file = 1;
#-----------------------------------------------------------

#-----------------------------------------------------------
#	Subdirectories not to search
#	nested subdirectories will not be searched either
@exclude_dirs = (
	"cgi-bin",
);
#
#-----------------------------------------------------------

#-----------------------------------------------------------
#	Files not to search
@exclude_files = (
	"test.html",
);
#
#-----------------------------------------------------------

#
#	END CONFIGURATION SECTION
#
############################################################
  1. Modify "$root" to correctly point to the absolute directory path of the website's home directory. "$root" is searched along with all nested subdirectories. "$root" is also prepended to the relative path of the output file defined in "$srch_data_fs"; see below.

    NOTE: If invoking locally on the PC development platform, be sure to correctly point to the absolute path, not the relative path. If invoking remotely on the web server, the path will typically be different than the local path.

  2. Modify "$delim" to the platform-specific path delimiter; i.e., UNIX/Linux/Mac OS X use "/", Mac OS 9 uses ":" and Windows uses "\"
  3. Modify "$srch_data_fs" portion between quotes to correctly point to "search.dat"

    NOTE: This determines the location where the "search.dat" file will be written to. It is important to specify the location where "ts.cgi" expects this file to reside. If not using the default install locations, all configuration variables in "ts.cgi", "se.pl" and "jhzcs.pl" must be consistent.

  4. Modify "$base_URL" to correctly specify the HTTP domain name address of the website. This is used within the "search.dat" file for dynamically generated links to the results output when "ts.cgi" is invoked at the website.
  5. Modify "$srch_start" and "$srch_end" to specify the markup indicating what portion of each HTML file is to be included in the search. To use the default specified HTML comments, insert them into each HTML document at the start and end position of the content to be searched. Or, change the value to the desired markers such as the suggested "<BODY" and "</BODY>" to search the entire content contained between the HTML BODY tags. NOTE: the starting "<BODY" marker does not include a terminating ">" as additional parameters within the BODY tag would prevent that marker from being recognized within the HTML file.
  6. Modify "$exclude_content_start" and "$exclude_content_end" to specify the "exclude" markup indicating what portion of each HTML file is to be excluded in the search. To use the default specified HTML comments, insert them into each HTML document at the start and end position of the content to be excluded from the search. NOTE: HTML tags and content contained within SCRIPT tags are automatically excluded.
  7. Set "$output_to_file" to 1 or "" (null) to specify whether or not the dataset is written to disk. This is useful for testing in the PC development environment with "ts_data.cgi" invoked from the command line. Turning this option off (optionally by commenting out the line with "#") sends the result to the terminal instead.
  8. Modify "@exclude_dirs" adding subdirectories to exclude from the search. Each entry must be wrapped in quotes and terminated with a comma.
  9. Modify "@exclude_files" adding filenames to exclude from the search. Each entry must be wrapped in quotes and terminated with a comma.
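
The effect of the start, end, and exclude markers in steps 5 and 6 can be sketched as follows. This Python example is an illustration under stated assumptions, not the logic of "ts_data.cgi": the `extract_searchable` name is hypothetical, and the order of operations (cut to the marked region, drop excluded regions, drop SCRIPT blocks, strip remaining tags) is one plausible reading of the description. The sample uses the suggested "<BODY" start marker, which matches even though the tag carries an extra attribute.

```python
import re


def extract_searchable(html, start, end, excl_start, excl_end):
    """Return the plain text between `start` and `end`, minus exclusions."""
    begin = html.find(start)
    if begin == -1:
        return ""
    finish = html.find(end, begin)
    if finish == -1:
        finish = len(html)
    # Keep the start marker itself so a partial marker such as "<BODY"
    # is removed later along with the rest of its tag.
    content = html[begin:finish]
    # Drop every region wrapped in the exclude markers.
    content = re.sub(re.escape(excl_start) + r".*?" + re.escape(excl_end),
                     " ", content, flags=re.DOTALL)
    # Drop SCRIPT blocks, then any remaining tags and comments.
    content = re.sub(r"<script\b.*?</script\s*>", " ", content,
                     flags=re.DOTALL | re.IGNORECASE)
    content = re.sub(r"<[^>]*>", " ", content)
    return " ".join(content.split())


page = """<html><BODY class="main">
<h1>Widgets</h1>
<!-- EXCLUDE FROM SEARCH START -->menu<!-- EXCLUDE FROM SEARCH END -->
<script>var x = "<b>";</script>
<p>Blue widgets</p>
</BODY></html>"""

print(extract_searchable(page, "<BODY", "</BODY>",
                         "<!-- EXCLUDE FROM SEARCH START -->",
                         "<!-- EXCLUDE FROM SEARCH END -->"))
```

For the sample page only the heading and paragraph text remain; the excluded menu and the script content are dropped.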

DATA GENERATION

For generation of the search data file "search.dat" refer to the "USAGE" section below.

TEMPLATES:

Customize templates for appearance as desired:

ts.htm

Special markup within the Text Search template includes the Search Banner, which indicates the string searched for and the number of pages returned, and the "Entry Start" and "Entry End" directives, which delimit the repeating page link entry.

<!-- Entry Start -->
<!-- Entry End -->
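
A repeating entry block of this kind is typically expanded once per result. The Python sketch below shows the idea and is not the product's template processor: the `fill` and `render` helpers are hypothetical, and the `<<count>>`, `<<url>>`, and `<<matches>>` placeholders are invented for the example, following the `<<search_text>>` notation used by the page templates. Consult the sample template provided with the product for the actual variable names.

```python
import html

ENTRY_START = "<!-- Entry Start -->"
ENTRY_END = "<!-- Entry End -->"


def fill(text, values):
    """Replace each <<name>> placeholder with its HTML-escaped value."""
    for name, value in values.items():
        text = text.replace("<<" + name + ">>", html.escape(str(value)))
    return text


def render(template, search_text, results):
    """Repeat the entry block once per result and fill the page banner."""
    head, rest = template.split(ENTRY_START, 1)
    entry, tail = rest.split(ENTRY_END, 1)
    rows = "".join(fill(entry, row) for row in results)
    page_values = {"search_text": search_text, "count": len(results)}
    return fill(head, page_values) + rows + fill(tail, page_values)


# Hypothetical template; placeholder names other than search_text are invented.
template = (
    '<p>Found <<count>> pages for <<search_text>></p>\n'
    '<!-- Entry Start --><a href="<<url>>"><<url>></a> (<<matches>>)\n'
    '<!-- Entry End --><hr>'
)
print(render(template, "site", [
    {"url": "/index.html", "matches": 2},
    {"url": "/about.html", "matches": 1},
]))
```

Escaping each value before substitution keeps user-entered search text from being interpreted as markup in the results page.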
	

Uploading to Web Server

All dependent and required components must be installed and successfully configured. All components, with the exception of images, must be FTP uploaded as TEXT. Images must be FTP uploaded as BINARY.

TEST:

Test installation and configuration by invoking "ts.cgi" using a test page containing the Text Search controls.

USAGE:

To generate the "search.dat" file, invoke "ts_data.cgi" either locally in the PC development environment or remotely on the web server, depending upon how "ts_data.cgi" is configured (paths are specific to the environment where it is intended to be invoked). When invoking locally from the PC development environment, upload the resulting "search.dat" file to the server location specified by "$search_data" in "ts.cgi".

Following successful installation and satisfaction of dependencies and requirements, add the Text Search Controls to static pages and JHZ-CS page templates as follows:

For static pages:

 <form action="/cgi-bin/se/ts.cgi" method="post">
 <input type="text" name="search_text" value="" size="20">
 <input type="submit" value="Search">
 </form>

For page templates, adding value="<<search_text>>" primes the input field with the text the user entered, ready for subsequent use:

 <form action="/cgi-bin/se/ts.cgi" method="post">
 <input type="text" name="search_text" value="<<search_text>>" size="20">
 <input type="submit" value="Search">
 </form>

You may optionally modify the form's submit button to use your own image instead of the web browser's default button, with the following:

 <form action="/cgi-bin/se/ts.cgi" method="post">
 <input type="text" name="search_text" value="<<search_text>>" size="20">
 <input type="image" name="search" value="search" src="/images/buttons/search.gif">
 </form>

Change "/images/buttons/search.gif" to correctly point to your button image.
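
For testing, it can help to know what these forms actually send. With the default form encoding, a browser submits the "search_text" field as a URL-encoded POST body. The Python sketch below reproduces that encoding with the standard library; the `build_post_body` helper is a hypothetical name, and the sketch only builds and decodes the body rather than contacting a server.

```python
from urllib.parse import parse_qs, urlencode


def build_post_body(search_text):
    """Encode the search_text form field as a browser would for a POST."""
    return urlencode({"search_text": search_text})


body = build_post_body("blue widgets & gadgets")
print(body)
# Decoding the body recovers the text the user typed.
print(parse_qs(body)["search_text"][0])
```

With the image button variant, browsers also submit the click coordinates under names derived from the button's name ("search.x" and "search.y" for the form shown above).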

FAQs

Can I modify the Text Search template?

Yes. The template is used to format the search results. You can modify it as desired respecting the special markup directives and variables.

Do I upload the template as Text or Binary?

Upload templates and data as Text.

JHZ-CS Solutions are distributed exclusively under terms and conditions of the JHZ-CS Software License Agreement.