If you're processing data from the user, typically entered into an HTML form, you'll be using a rich variety of built-in string-handling procedures. For example, the append command builds up a string, here an entire page, one piece at a time:

set whole_page "some stuff for the top of the page\n\n"
append whole_page "some stuff for the middle of the page\n\n"
append whole_page "some stuff for the bottom of the page\n\n"
# done composing the page, let's write it back to the user
ns_return 200 text/html $whole_page

Suppose that a user is registering at your site with the form variables first_names, last_name, email, password. Here's how we might build up a list of exceptions (using the Tcl lappend command, described in the chapter on lists):

# compare the first_names value to the empty string
if { [string compare $first_names ""] == 0 } {
    lappend exception_list "You forgot to type your first name"
}
# see if their email address has the form
# something at-sign something
if { ![regexp {.+@.+} $email] } {
    lappend exception_list "Your email address doesn't look valid."
}
if { [string length $password] > 20 } {
    lappend exception_list "The password you selected is too long."
}

If there aren't any exceptions, we have to get these data ready for insertion into the database:
# remove whitespace from ends of input (if any)
set last_name_trimmed [string trim $last_name]
# escape any single quotes with an extra one (since the SQL
# string literal quoting system uses single quotes)
regsub -all ' $last_name_trimmed '' last_name_final
set sql_insert "insert into users (..., last_name, ...)
values
(..., '$last_name_final', ...)"
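To see the two steps end to end, here is a minimal sketch that runs the same checks and the same cleanup on made-up values. The sample name, email address, and password are assumptions for illustration; on a real page they would arrive from the form.

```tcl
# Hypothetical sample input; a real page would get these from the form.
set first_names "Siobhan"
set last_name "  O'Grady "
set email "siobhan@example.com"
set password "secret"

# start with an empty list of exceptions
set exception_list [list]
if { [string compare $first_names ""] == 0 } {
    lappend exception_list "You forgot to type your first name"
}
if { ![regexp {.+@.+} $email] } {
    lappend exception_list "Your email address doesn't look valid."
}
if { [string length $password] > 20 } {
    lappend exception_list "The password you selected is too long."
}

if { [llength $exception_list] == 0 } {
    # trim whitespace, then double any single quotes for SQL
    set last_name_trimmed [string trim $last_name]
    regsub -all ' $last_name_trimmed '' last_name_final
    puts $last_name_final   ;# prints O''Grady
}
```

Note that exception_list is set to an empty list first, so that llength works even when no check fails.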
The simplest way to look for a substring within a string is with the string first command. Some users of photo.net complained that they didn't like seeing classified ads that were simply pointers to the eBay auction site. Here's a simplified snippet from http://software.arsdigita.com/www/gc/place-ad-3.tcl:
if { [string first "ebay" [string tolower $full_ad]] != -1 } {
    # return an exception
    ...
}

An alternative formulation would be:

if { [regexp -nocase {ebay} $full_ad] } {
    # return an exception
    ...
}

Both implementations will catch any capitalization variant of "eBAY". Both implementations will miss "e-bay", but it doesn't matter because if the poster of the ad includes a link with a URL, the hyperlink will contain "ebay". What about false positives? If you visit www.m-w.com and search for "*ebay*", you'll find that both implementations might bite someone selling rhododendrons or a water-powered mill. That's why the toolkit code checks a "DisalloweBay" parameter, set by the publisher, before declaring this an exception.
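The behavior of the two tests can be checked with a small sketch. The two procedure names and the sample ads are hypothetical, chosen only to show a hit, a miss, and a false positive ("rosebay" is a kind of rhododendron).

```tcl
# Hypothetical helpers wrapping the two tests discussed above.
proc mentions_ebay_via_first {full_ad} {
    return [expr {[string first "ebay" [string tolower $full_ad]] != -1}]
}
proc mentions_ebay_via_regexp {full_ad} {
    return [regexp -nocase {ebay} $full_ad]
}

set results [list]
foreach ad [list \
        "See my auction on eBay" \
        "See my auction on e-bay" \
        "Rosebay rhododendrons for sale" \
        "Nikon F5 for sale"] {
    set a [mentions_ebay_via_first $ad]
    set b [mentions_ebay_via_regexp $ad]
    lappend results [list $a $b]
    puts "$a $b $ad"
}
# The two tests agree on every ad: 1 1, 0 0, 1 1 (false positive), 0 0
```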
If you're just trying to find a substring, you can use either string first or regexp. If you're trying to do something more subtle, you'll need regexp (described more fully in the chapter "Pattern Matching"):
if { ![regexp {[a-z]} $full_ad] } {
    # no lowercase letters in the ad!
    append exception_text "
Your ad appears to be all uppercase.
ON THE INTERNET THIS IS CONSIDERED SHOUTING. IT IS ALSO MUCH
HARDER TO READ THAN MIXED CASE TEXT. So we don't allow it,
out of decorum and consideration for people who may
be visually impaired."
    incr exception_count
}
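A quick sketch of what this pattern does and does not catch. The procedure name and sample ads are hypothetical. Because the test only asks whether any lowercase letter is present, an ad with no letters at all is flagged too.

```tcl
# Hypothetical helper: 1 if the text contains no lowercase a-z at all.
proc has_no_lowercase {text} {
    return [expr {![regexp {[a-z]} $text]}]
}

puts [has_no_lowercase "NIKON F5 FOR SALE"]   ;# prints 1
puts [has_no_lowercase "Nikon F5 for sale"]   ;# prints 0
puts [has_no_lowercase "12345"]               ;# prints 1 (no letters at all)
```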
In the ArsDigita Community System, we have a page that shows a user's complete history with a Web service, e.g., http://photo.net/shared/community-member.tcl?user_id=23069 shows all of the postings by Philip Greenspun. If a comment on a static page is short, we want to show the entire message. If not, we want to show just the first 1000 characters.
In http://software.arsdigita.com/www/shared/community-member.tcl, we find the following use of the string range command:
if { [string length $message] > 1000 } {
    set complete_message "[string range $message 0 1000]... "
} else {
    set complete_message $message
}
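One detail worth knowing: string range includes both of its end indices, so the range 0 through 1000 is 1001 characters long. The following sketch shows the inclusive behavior on a short string, along with a hypothetical helper (truncate_message is not part of the toolkit) that keeps exactly limit characters.

```tcl
# string range includes both end indices
puts [string range "abcdefghij" 0 3]   ;# prints abcd (4 characters)

# Hypothetical helper: keep at most $limit characters, then add "... "
proc truncate_message {message limit} {
    if { [string length $message] > $limit } {
        return "[string range $message 0 [expr {$limit - 1}]]... "
    }
    return $message
}

puts [truncate_message "abcdefghij" 4]   ;# prints abcd followed by "... "
puts [truncate_message "short" 1000]     ;# prints short
```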
The Tcl commands format and scan resemble C's printf and scanf commands. That's pretty much all that any Tcl manual will tell you about these commands, which means that you're kind of S.O.L. if you don't know C. The basic idea of these commands comes from Fortran, a computer language developed by John Backus at IBM in 1954. The FORMAT command in Fortran would let you control the printed display of a number, including such aspects as spaces of padding to the left and digits of precision after the decimal point.
With Tcl format, the first argument is a pattern for how you'd like the final output to look. Inside the pattern are placeholders for values. The second through Nth arguments to format are the values themselves:
format pattern value1 value2 value3 .. valueN

We can never figure out how to use format without either copying an earlier fragment of pattern or referring to the man page (http://www.tcl.tk/man/tcl8.4/TclCmd/format.htm ). However, here are some examples for you to copy:

% # format prices with two digits after the point
% format "Price: %0.2f" 17
Price: 17.00
% # pad some stuff out to fill 20 spaces
% format "%20s" "a long thing"
        a long thing
% format "%20s" "23"
                  23
% # notice that the 20 spaces is a MINIMUM; use string range
% # if you might need to truncate
% format "%20s" "something way longer than 20 spaces"
something way longer than 20 spaces
% # turn a number into an ASCII character
% format "%c" 65
A

The Tcl command scan performs the reverse operation, i.e., parses an input string according to a pattern and stuffs values as it finds them into variables:

% # turn an ASCII character into a number
% scan "A" "%c" the_ascii_value
1
% set the_ascii_value
65

Notice that the number returned by scan is a count of how many conversions it was able to perform successfully. If you really want to use scan, you'll need to visit the man page: http://www.tcl.tk/man/tcl8.4/TclCmd/scan.htm . For an idea of how useful this is for Web development, consider that the entire 250,000-line ArsDigita Community System does not contain a single use of the scan command.
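A few more patterns that may be worth copying, as a sketch. The variable names and the date string are made up for illustration.

```tcl
# left-justify in a 10-character field (the | marks the end of the field)
set left [format "%-10s|" "abc"]
# pad an integer with leading zeros to 5 digits
set padded [format "%05d" 42]
# print an integer in hexadecimal
set hex [format "%x" 255]
puts $left     ;# prints abc followed by 7 spaces and |
puts $padded   ;# prints 00042
puts $hex      ;# prints ff

# pull three integers out of a hypothetical date string;
# the literal hyphens in the pattern must match the input
set count [scan "2000-01-15" "%d-%d-%d" year month day]
puts "$count conversions: year=$year month=$month day=$day"
# prints 3 conversions: year=2000 month=1 day=15
```

The %d conversion reads "01" as the decimal number 1, so the leading zero is lost. If you need it back, format the number again with a pattern such as %02d.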
Commands that don't start with string:
append variable_name value1 value2 value3 ... valueN
Sets the variable named by variable_name to the concatenation of its old value and all the remaining arguments.

regexp ?switches? expression string ?matchVar? ?subMatchVar subMatchVar ...?
Returns 1 if expression matches string; 0 otherwise. If successful, regexp sets the match variables to the parts of string that match the corresponding parts of expression. (more: the pattern matching chapter and http://www.tcl.tk/man/tcl8.4/TclCmd/regexp.htm )

% set fraction "5/6"
5/6
% regexp {(.*)/(.*)} $fraction match num denom
1
% set match
5/6
% set num
5
% set denom
6
regsub ?switches? expression string substitution_spec result_variable_name
Returns a count of the number of matches that were found and replaced; it is called primarily for its effect of setting result_variable_name. (more: the pattern matching chapter and http://www.tcl.tk/man/tcl8.4/TclCmd/regsub.htm )

Here's an example where we ask a user to type in keywords, separated by commas. We then expect to feed this list to a full-text search indexer that will throw an error if handed two commas in a row. We use regsub to clean up what the user typed:

# here we destructively modify the variable $query_string,
# replacing every occurrence of one or more commas with a single
# comma
% set query_string "samoyed,, sledding, harness"
samoyed,, sledding, harness
% regsub -all {,+} $query_string "," query_string
2
% set query_string
samoyed, sledding, harness
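The -all switch matters here. As a sketch on a made-up keyword string, compare the count and the result with and without it:

```tcl
# hypothetical input with two runs of doubled commas
set keywords "samoyed,,sledding,,harness"

# without -all, only the first run of commas is replaced
set n_first [regsub {,+} $keywords "," first_only]
# with -all, every run of commas is replaced
set n_all [regsub -all {,+} $keywords "," all_fixed]

puts "$n_first $first_only"   ;# prints 1 samoyed,sledding,,harness
puts "$n_all $all_fixed"      ;# prints 2 samoyed,sledding,harness
```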
regexp and regsub were dramatically improved with the Tcl 8.1 release. For a Web developer the most important feature is the inclusion of non-greedy regular expressions. This makes it easy to match the contents of HTML tags. See http://www.scriptics.com/services/support/howto/regexp81.html for a full discussion of the differences.
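To illustrate why non-greedy matching helps with HTML, here is a minimal sketch that assumes Tcl 8.1 or later. The HTML fragment and variable names are made up. The greedy .* runs to the last closing tag, while the non-greedy .*? stops at the first one.

```tcl
set html "<b>one</b> and <b>two</b>"

# greedy: (.*) grabs as much as it can
regexp {<b>(.*)</b>} $html match greedy_text
# non-greedy: (.*?) grabs as little as it can
regexp {<b>(.*?)</b>} $html match lazy_text

puts $greedy_text   ;# prints one</b> and <b>two
puts $lazy_text     ;# prints one
```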
Commands that start with string (all of which are documented at http://www.tcl.tk/man/tcl8.4/TclCmd/string.htm ):
string compare apple applesauce ==> -1
string compare apple Apple ==> 1
string first tcl catclaw ==> 2
string last abra abracadabra ==> 7
string range catclaw 2 4 ==> tcl
string compare weBmaster Webmaster ==> 1
string compare [string tolower weBmaster] \
    [string tolower Webmaster] ==> 0
set password "ferrari"
string compare "FERRARI" [string toupper $password] ==> 0
set password [string trim $form_password] ; # see above example
set password [string trimleft $form_password]
set password [string trimright $form_password]
string wordend "tcl is the greatest" 0 ==> 3
string wordstart "tcl is the greatest" 5 ==> 4
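The three trim commands above are shown without results, so here is a small sketch with a made-up form value. The angle brackets in the output are there only to make the remaining whitespace visible.

```tcl
# hypothetical form input with two spaces on each side
set form_password "  ferrari  "

set both  [string trim $form_password]        ;# both ends trimmed
set left  [string trimleft $form_password]    ;# leading spaces trimmed
set right [string trimright $form_password]   ;# trailing spaces trimmed

puts "<$both> <$left> <$right>"
# prints <ferrari> <ferrari  > <  ferrari>
```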
Exercises ( see section V.3 List Operations )
---
based on Tcl for Web Nerds