News Focus

Re: Koikaze post# 13432

Saturday, 05/04/2002 11:25:20 AM


Do you know Basic? I've always kept a copy of QBASIC.EXE hanging around on my hard drive for anytime I need to write a quick 'n dirty throw-away.

Or, if I'm not mistaken, W2K and later have VBS built in.

Personally, though I'm an old dog none too keen on having to learn yet another set of tricks (when 80% of what I know is obsolete knowledge), I found PERL to be worth the downloading and learning. I use it anytime I need to interact with a website (a few lines of code, and you've got the site's page sitting in memory, ready to run through) or if all I need to do is extract particular lines from a file, which it sounds like you're doing.

For example:

 
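# Read from one file and write the matching lines to another (">" opens for writing)
open (INFILE, "<", "filetoprocess.txt") or die "Can't read filetoprocess.txt: $!";
open (OUTFILE, ">", "resultsfile.txt") or die "Can't write resultsfile.txt: $!";
while (<INFILE>)
{
    chomp;    # strip the line's own newline so only one "\n" gets written
    if (/I like PERL/)
    {
        print OUTFILE "$_\n";
    }
}
close INFILE;
close OUTFILE;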


I think that's about right, but I haven't used it for a bit, so there might be a syntax error or two.

But that quickly reads from one file, and takes every line containing "I like PERL" and writes it (plus a linefeed -- the "\n" thing) to another file.

It's one of the quickest ways to do something like that.

It gets even sweeter, though, if you want to just grab a part of the line in which the text was found. It's got a function called split (lowercase, since PERL is case sensitive) that's terribly handy.
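
Here is a minimal sketch of grabbing part of a line that way. The sample line and the variable names are made up for illustration; in the loop above the text would come from $_ instead:

```perl
# Hypothetical sample line with comma-separated fields
$line = "I like PERL,second field,third field";

# split breaks the line apart wherever the pattern (here a comma) matches
($first, $second, $third) = split(/,/, $line);

# Keep only the piece of interest
print "$second\n";
```

Any pattern can go between the slashes, so the same call works for tabs, colons, or a longer marker string.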

But where I like it best is interacting with websites.

For example, SI removes "dead" threads (those with no activity for 90 days) from its thread search, so a common problem there is trying to find an old thread that hasn't been posted to for a long time. The threads themselves aren't removed. They're just not searchable.

The following bit of code produced the thread list I keep on my sibob site:

 
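use LWP::UserAgent;    # also loads HTTP::Request
$ua = LWP::UserAgent->new;

# No backslashes are needed before ? and = inside a quoted string
$baseurl="http://www.siliconinvestor.com/stocktalk/subject.gsp?subjectid=";
$beginthreadnum=1;
$endthreadnum=37035;
$threadnum=$beginthreadnum;

open (OUTFILE, ">", "thread.si") or die "Can't write thread.si: $!";

# <= so that the last thread number is checked too
while ($threadnum <= $endthreadnum)
{
    $cur_url=$baseurl.$threadnum;
    print "$threadnum ";
    $request=HTTP::Request->new(GET => $cur_url);
    # Passing a filename as the second argument saves the page to that file
    $response=$ua->request($request, "header.si");
    # Only parse the file when the request succeeded and the file opens
    if ($response->is_success && open (MESSAGE, "<", "header.si"))
    {
        while (<MESSAGE>)
        {
            chomp;
            if (/<title>SI: /)
            {
                # Keep whatever sits between "<title>SI: " and "</title>"
                ($junk, $save)=split(/<title>SI: /, $_);
                ($threadtitle, $junk)=split(/<\/title>/, $save);
                print OUTFILE "$cur_url / $threadtitle\n";
                print "$threadtitle\n";
                last;
            }
        }
        close (MESSAGE);
    }
    $threadnum++;
    # Close and reopen in append mode so the results so far are flushed to disk
    close (OUTFILE);
    open (OUTFILE, ">>", "thread.si") or die "Can't reopen thread.si: $!";
}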


It starts with thread 1 and goes up to 37035 (this is really old -- the ending thread number would be considerably higher now), and if a valid page is returned, it looks for whatever is between "<title>SI:" and "</title>", saves that text as the thread name, and also saves the URL, then moves on to the next thread.
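
The title extraction described above could also be sketched with a single regex capture instead of two splits. This is an alternative, not what the script does, and the sample HTML line is made up rather than copied from an actual SI page:

```perl
# Hypothetical line standing in for one read from header.si
$_ = "<html><head><title>SI: Some Old Thread</title></head>";

# The parentheses capture whatever sits between "<title>SI: " and "</title>"
if (/<title>SI: (.*?)<\/title>/)
{
    $threadtitle = $1;
    print "$threadtitle\n";
}
```

The "?" after ".*" makes the match stop at the first "</title>" rather than the last one on the line.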

I later used QBasic (because I was more comfortable with it for just crunching on strings and reshaping them) to make the HTML, complete with a href's, but I know it can be done in PERL, too. In fact, looking at the PERL code right now, I can see that just one more line is all I'd need to make a ready-to-use HTML.
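
As a rough sketch of what that extra line might look like, assuming the same $cur_url and $threadtitle variables as in the script above. The values below are placeholders, and the markup is a guess at a reasonable format, not the HTML actually used on the sibob site:

```perl
# Placeholder values; in the script these are set inside the loop
$cur_url = "http://www.siliconinvestor.com/stocktalk/subject.gsp?subjectid=1";
$threadtitle = "Some Old Thread";

# Build one HTML link per thread instead of the plain "URL / title" line
$link = "<a href=\"$cur_url\">$threadtitle</a><br>\n";
print $link;
```

In the script itself, the print would go to OUTFILE in place of the existing "$cur_url / $threadtitle" line.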

PERL's a really sweet language for doing a lot of things quickly, especially searching for and dealing with text in files or interacting with a website.


