When I started to think about a second edition of Data Munging With Perl, I thought it would be almost trivial. The plan was to go through the text and:
Update the Perl syntax to a more modern version of Perl
Update the CPAN modules used
And that was about it.
But when I started looking through the first edition, I realised there was a big chunk of work missing from this plan:
Ensure the book reflects current ideas and best practices in the industry
It’s that extra step that is taking the time. I was writing the first edition 25 years ago. That’s a long time. It’s a long time in any industry - it’s several generations in our industry. Think about the way you were working in 2000. Think about the day-to-day tasks you were taking on. Think about the way you organised your day (the first books on Extreme Programming were published in 2000; the Agile Manifesto was written in 2001; Scrum first appeared at about the same time).
The first edition contains the sentence “Databases are becoming almost as ubiquitous as data files”. Imagine saying that with a straight face today. There is one paragraph on Unicode. There’s nothing about YAML or JSON (because those formats both appeared in the years following publication).
When I was writing the slides for my talk, Still Munging Data With Perl, I planned to add a slide about “things we hadn’t heard of in 2000”. It ended up being four slides - and that was just scratching the surface. Not everything in those lists needs to be mentioned in the book - but a lot of it does.
When working on the book recently, I was reminded of how much one particular section of the industry has changed.
The problem of screen-scraping
The first edition has a chapter on parsing HTML. Of course it does - that was cutting edge at the time. At the end of the chapter, there’s an extended example on screen scraping. It grabs the Yahoo! weather forecast for London and extracts a few pieces of data.
I was thinking ahead when I wrote it. There’s a footnote that says:
You should, of course, bear in mind that web pages change very frequently. By the time you read this, Yahoo! may well have changed the design of this page which will render this program useless.
But I had no idea how true that would be. When I revisited it, the changes were far larger than the me of 2000 could have dreamed of.
The page had moved. So the program would have failed at the first hurdle.
The HTML had changed. So even when I updated the URL, the program still failed.
And, most annoyingly, the new HTML didn’t include the data that I wanted to extract. To be clear - the data I wanted was displayed on the page - it just wasn’t included in the HTML.
Oh, I know what you’re thinking. The page is using JavaScript to request the data and insert it into the page. That’s what I assumed too. And that’s almost certainly the case. But after an afternoon with the Chrome Developer Tools open, I could not find the request that pulled the required data into the page. It’s obviously there somewhere, but I was defeated.
I’ll rewrite that example to use an API to get weather data.
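As a rough illustration of why an API is easier to live with than scraped HTML, here is a minimal sketch of pulling a few fields out of a JSON response. The response shape, the field names and the extract_weather helper are all assumptions made up for this example; they don't correspond to any particular weather service, and a real program would fetch the body over HTTP rather than hard-coding it.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # in the Perl core since 5.14

# Hypothetical response body. A real program would fetch this over HTTP
# (for example with HTTP::Tiny) from whichever weather API is chosen.
my $json = <<'END_JSON';
{
  "location": "London",
  "current": { "temp_c": 14.5, "condition": "Cloudy" }
}
END_JSON

# Hypothetical helper: decode the JSON text and pick out the fields we want.
# The keys used here are assumptions about the response layout.
sub extract_weather {
    my ($body) = @_;
    my $data = decode_json($body);
    return {
        location  => $data->{location},
        temp_c    => $data->{current}{temp_c},
        condition => $data->{current}{condition},
    };
}

my $weather = extract_weather($json);
printf "%s: %s, %.1fC\n",
    $weather->{location}, $weather->{condition}, $weather->{temp_c};
```

The point of the sketch is that the data arrives already structured, so the program depends on documented field names rather than on the layout of a page that may be redesigned at any time.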
But it was interesting to see how much harder screen scraping has become. I don’t know whether this was an intentional move by Yahoo! or if it’s just a side effect of their move to different technologies for serving this page. Whichever it is, it’s certainly something worth pointing out in that chapter.
Other writing
I seem to have written quite a lot of things that aren’t at all related to the book since my last newsletter. I wonder if that’s some kind of displacement therapy :-)
I’m sure that part of it is down to how much more productive I am now I have AI to help with my projects.
Cleaner web feed aggregation with App::FeedDeduplicator explains a problem I have because I’m syndicating a lot of my blog posts to multiple sites (and talks about my solution).
Reformatting images with App::BlurFill introduces a new CPAN module I wrote to make my life as a publisher easier (but it has plenty of other applications too).
Turning AI into a Developer Superpower: The PERL5LIB Auto-Setter - another project that I’d been putting off because it just seemed a bit too complicated. But ChatGPT soon came up with a solution.
Deploying Dancer Apps – The Next Generation - last year, I wrote a couple of blog posts about how I deployed Dancer apps on my server. This takes it a step further and integrates them with systemd.
Generating Content with ChatGPT - I asked ChatGPT to generate a lot of content for one of my websites. The skeleton of the code I used might be useful for other people.
A Slice of Perl - explaining the idea of slices in Perl. Some people don’t seem to realise they exist, but they’re a really powerful piece of Perl syntax.
Stop using your system Perl - this was controversial. I should probably revisit this and talk about some of the counterarguments I’ve seen.
perlweekly2pod - someone wondered if we could turn the Perl Weekly newsletter into a podcast. This was my proof of concept.
Modern CSS Daily - using ChatGPT to fill in gaps in my CSS knowledge.
Other people’s writing
Pete thinks he can help people sort out their AI-generated start-up code. I think he might be onto something!
The Substack conversation
Like most people, when I started this Substack I had an idea that I might be able to monetise it at some point. I’m not talking about forcing people to pay for it, but maybe add a paid tier on top of this sporadic free one. Obviously, I’d need to get more organised and promise more regular updates - and that’s something for the future.
But over the last few months, I’ve been pleasantly surprised to receive email from Substack saying that two readers have “pre-pledged” for my newsletter. That is, they’ve told Substack that they would be happy to pay for my content. That’s a nice feeling.
To be clear, I’m not talking about adding a paid tier just yet. But I might do that in the future. So, just to gauge interest, I’m going to drop a “pledge your support” button in this email. Don’t feel you have to press it.
That’s all for today. I’m going to get back to working on the book. I’ll write again soon.
Dave…
Dave,
Where can I send my editorial comments, if you even want them? I also have something to say about the first example in chapter 2, the cd final example.
Thanks,
Matthew
Dave, an idea: It might be useful for you to include some text in the revised edition that shows (fairly specifically) how something WAS done in the first edition, where you've revised a section. Even some of the replaced sample code. It might be an excellent way for readers to realize "Hey, that's the way I've done it in several programs. I'll improve it the way Dave has."
Heck, if bytes aren't expensive, maybe you could do this in the form of having the first edition be an Appendix, and the notes I've described above could be to the effect of "The Circles.pm module used in the first edition (see section 7.13 of the first edition, in the Appendix) was replaced by the Trigonometry.pm module, which is actively maintained." Etc.