A metadata scraper for the https://myrunningman.com/ website.
File               Last commit                                                              Date
.gitignore         gitignore, update tagline                                                2022-08-08 19:44:03 +12:00
download-pages.sh  combine download-pages and download-thumbs; stash max_episode file       2023-06-15 19:49:04 +12:00
Makefile           makefile: initial commit                                                 2023-06-15 19:49:30 +12:00
pages-to-json.php  json: rename file, always work up to max_episode                         2023-06-15 19:49:23 +12:00
README.md          doc/readme: truncate sample                                              2022-08-08 19:44:48 +12:00

myrunningmancom-scraper

A metadata scraper for the https://myrunningman.com/ website.

Usage

  1. Run ./download-pages.sh to download the HTML pages and thumbnails. The HTML files only need to be downloaded once.
  2. Run ./pages-to-json.php to parse the HTML into the final output.json data file.
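
The steps above are meant to run in order, so a small wrapper can chain them. The following is only a sketch: `run_steps` is a hypothetical helper that is not part of this repository, and it assumes each step is an executable script that exits non-zero on failure. Steps that are missing are skipped with a message, and the chain stops at the first step that fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run the given scripts
# in order. A missing or non-executable script is skipped; the first script
# that exits non-zero stops the chain.
run_steps() {
    for step in "$@"; do
        if [ ! -x "$step" ]; then
            echo "skipping $step (not found or not executable)" >&2
            continue
        fi
        echo "running $step" >&2
        "$step" || { echo "$step failed" >&2; return 1; }
    done
}
```

Call it with the scripts from the list above, in the same order, for example `run_steps ./download-pages.sh` followed by the path of the parser script. Skipping missing steps is a design choice for this sketch; a stricter wrapper could fail instead.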

Example output

{
    "1": {
        "title": "Times Square",
        "broadcast_date": "2010-07-11",
        "filming_date": "2010-06-21",
        "location": "Times Square (Yeongdeungpo-gu, Seoul)",
        "description": "A never-before-seen action variety show with an amazing cast. To start off the first episode[...]