Every so often, I decide I want to watch all the matches a wrestler
has had that meet some criteria (e.g. in a particular promotion or
against a particular opponent), and usually, I have a bit more fun if I
don't go into every match knowing who's going to win and how many
minutes it's going to take them to do it. Cagematch.net is great for listing
out all the matches I'm interested in, but it's pretty rough for
avoiding spoilers: the "Matches" view on a wrestler's page is comprehensive but
includes the results, and the "Matchguide" view is spoiler-free but
doesn't include every match.
Fortunately, we can make the computer give us the info we want and
nothing more. We're going to use fq for this, which is a very
general tool for parsing, transforming, and making queries over
structured data formats (if you're familiar with jq, it's
basically "jq but for everything instead of just for
JSON").
I recently subscribed to RevPro's streaming service because I wanted
to watch more Safire Reed matches, so to start this process off I did a
Cagematch query for a
list of all of Safire's matches in RevPro and downloaded the page
(as matches.html). A typical entry in the table looks
something like this (I've added some indentation and removed the
href attributes from the a tags to make it
easier to see the structure):
<span class="MatchCard">
<a>Safire Reed</a> defeats <a>Anita Vaughan</a> (10:33)
</span>
<div class="MatchEventLine">
<a>RevPro Live In London 91</a> - Online Stream @ 229 The Venue in London, England, UK
</div>
The easiest useful thing we can do is to just grab the name of the
event. If we run fq with just . as the
command, it'll show us how it parsed the entire input:
fq -d html '.' matches.html (maybe pipe it to
less, since this is quite a bit of output). If we look for
"MatchEventLine" in there, we'll find that the entries we're interested
in look something like this:
{
"#text": "- Online Stream @ 229 The Venue in London, England, UK",
"@class": "MatchEventLine",
"a": {
"#text": "RevPro Live In London 91",
"@href": "https://www.cagematch.net/?id=1&nr=381436"
}
}
fq provides a grep_by function that recursively
finds everything in the parsed input that matches a given condition. We'll use that to get all
the "MatchEventLine" objects:
fq -d html 'grep_by(."@class"=="MatchEventLine")' matches.html
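If "recursively find" feels a bit hand-wavy, here's a rough Python sketch of the same idea: walk a nested structure and keep every object that passes a test. This is not how fq implements grep_by (I haven't checked, and the real thing also tests non-object values); the function below and the wrapping "html"/"body"/"div" keys in the sample data are made up for illustration.

```python
# Illustrative only: a recursive search in the spirit of grep_by,
# restricted to dicts. Not fq's actual implementation.

def grep_by(node, predicate):
    """Yield every dict in a nested dict/list structure that satisfies predicate."""
    if isinstance(node, dict):
        if predicate(node):
            yield node
        for value in node.values():
            yield from grep_by(value, predicate)
    elif isinstance(node, list):
        for item in node:
            yield from grep_by(item, predicate)


# Hypothetical miniature of the parsed page. Only the MatchEventLine
# entry comes from the real output; the surrounding keys are assumed.
parsed = {
    "html": {
        "body": {
            "div": [
                {
                    "#text": "- Online Stream @ 229 The Venue in London, England, UK",
                    "@class": "MatchEventLine",
                    "a": {
                        "#text": "RevPro Live In London 91",
                        "@href": "https://www.cagematch.net/?id=1&nr=381436",
                    },
                },
                {"@class": "SomethingElse", "#text": "ignored"},
            ]
        }
    }
}

matches = list(grep_by(parsed, lambda obj: obj.get("@class") == "MatchEventLine"))
event_names = [m["a"]["#text"] for m in matches]
print(event_names)
```

The last two lines mirror the steps in this post: first collect the matching objects, then drill into each one's a element and take its "#text".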
From here, we want to narrow our focus down to the a
element, and get just its inner "#text" (at this point I'm
adding -r to tell fq to produce raw output,
since I don't need the name of each event to be wrapped in double
quotes):
fq -d html -r 'grep_by(."@class"=="MatchEventLine").a."#text"' matches.html
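If you're wondering why "@class" and
"#text" are in double quotes but a isn't, it's
because the bare .name shorthand only works for keys that look like
identifiers (letters, digits, and underscores), and @ and # aren't
allowed in those. You could write "a" instead of a in the
middle of the chain of selectors, but you don't have to.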
This will give us our list of all the event names and nothing else
(Cagematch's default sorting is newest-to-oldest, so if we want to watch
in order, we should start at the bottom):
RevPro Live In London 106
RevPro Live In Coventry
RevPro Live In London 105
RevPro Live In London 101
RevPro Raw Deal 2025
...
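If reading the list bottom-up gets annoying, flipping it is easy enough. A small Python sketch, assuming the event names are available one per line (the function name oldest_first is just something I made up for this example):

```python
def oldest_first(lines):
    """Return the non-empty lines in reverse order, stripped of whitespace."""
    names = [line.strip() for line in lines if line.strip()]
    return names[::-1]


# The first few names from the output above, newest first.
newest_first = [
    "RevPro Live In London 106",
    "RevPro Live In Coventry",
    "RevPro Live In London 105",
    "RevPro Live In London 101",
    "RevPro Raw Deal 2025",
]

for name in oldest_first(newest_first):
    print(name)
```

With the full list instead of this five-line sample, the first name printed would be the earliest match in the table.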
Tada! Now we can go spend the whole day watching indie wrestling
instead of whatever else we were supposed to be doing. If you're looking
for a rec, Safire
vs Kanji from Live In London 97 is available for free (I haven't
actually watched it yet, but their match at Live In London 78 has one of
my favorite final three-minute stretches of any match ever, so this one
is probably awesome too).
Or, we can keep tinkering....