Scrape All Pages - Page scraper, then go state by state to send update emails - for any website
Version - 2025-05-16 ver X12

This will extract the names, addresses, website URLs, email addresses, phone numbers and content for each bulleted entry on the page.

For DB work (not just sending emails), this is step 2, run AFTER creating the tables (main and scraperonly; the latter is used in step 3, the importer).

WARNINGS:

  1. For some reason <strong> and </strong> get missed - change them to <span class="farm"> or <b>
  2. MAKE CERTAIN there are NO double quotes (") in any listing.
  3. On state pages, like GA.htm, anything after the substate links must be separated by an H2 tag, like Map and other information
  4. In functions that use preg_match and preg_replace, first clean the target by removing slashes (as in 1/2 mile) with $target = str_ireplace( "/", "", $target );
  5. preg_replace also has problems with text that contains (, ) or \
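
Warnings 4 and 5 can be handled in one place. The following is a minimal sketch, not the scraper's own code: the helper names clean_target and replace_literal are hypothetical. It assumes the text being searched for is meant to match literally, so preg_quote() is used to escape (, ), \ and the pattern delimiter before the text is placed in a pattern.

```php
<?php
// Hypothetical helper: strip slashes (as in "1/2 mile") before pattern matching.
function clean_target(string $target): string
{
    return str_ireplace("/", "", $target);
}

// Hypothetical helper: replace a literal string by way of preg_replace.
function replace_literal(string $needle, string $replacement, string $haystack): string
{
    // preg_quote escapes regex metacharacters such as ( ) \ and the "/" delimiter.
    $pattern = '/' . preg_quote($needle, '/') . '/';
    // Escape \ and $ in the replacement so they are not read as backreferences.
    $safe = addcslashes($replacement, '\\$');
    // Fall back to the untouched text if preg_replace reports an error (null).
    return preg_replace($pattern, $safe, $haystack) ?? $haystack;
}

$listing = 'Smith Farm (U-pick) - 1/2 mile past the bridge';
echo clean_target($listing), "\n";
echo replace_literal('(U-pick)', '[pick-your-own]', $listing), "\n";
```

Note that removing the slash turns "1/2 mile" into "12 mile", so the cleaned copy is best used only for matching, not for display.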

How to use the form:

  1. Run this file from pickyourown.org.
  2. It saves files to the _private dir on the server.
  3. The files must exist (blank) the first time. Each time you run the file, it appends more to the CSV.
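
The append behaviour can be sketched as follows. This is an illustration under assumptions, not the scraper's code: the helper name append_listing_row, the file path and the columns are all hypothetical. Opening with mode 'a' creates a missing file only when PHP is allowed to write to the directory; if it is not, the blank file still has to be created by hand first.

```php
<?php
// Hypothetical helper: append one listing row to a CSV file.
function append_listing_row(string $csvPath, array $row): bool
{
    // Mode 'a' always writes at the end of the file and creates it if permitted.
    $handle = fopen($csvPath, 'a');
    if ($handle === false) {
        return false;
    }
    // fputcsv encloses fields as needed and doubles any embedded double quotes.
    $written = fputcsv($handle, $row, ',', '"', '\\');
    fclose($handle);
    return $written !== false;
}

// Assumed location and columns, for illustration only.
$csvPath = sys_get_temp_dir() . '/scrape-example.csv';
append_listing_row($csvPath, ['Smith Farm', '123 Main St', 'owner@example.com']);
```

Because fputcsv handles the quoting, a stray double quote in a field does not break the row when the file is written this way.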

Which website?

PickYourOwn
Easter
Farm Markets
PumpkinPatchesAndMore
Christmas Trees
Other (enter details below)

Generate (T)est emails, (S)end the emails, or (N)othing

Test - scrapes and shows results, including pre-testing the email addresses and website URLs, but does not send emails or store anything
Send - sends emails to the email addresses on the page, and a copy to emailupdates@domain.org (use the test page to test the emails first)
Create a CSV file - saves it to the _private dir on either the server or localhost, depending on where you are running it from. No emails are sent, just a summary emailed to myself at emailupdates@domain.org. NOTE: this is a SLOW process, about 10 - 30 minutes per group of states. Cloudflare may time out and the page may freeze, BUT it is still working, so wait at least 30 minutes, then download the CSV file and check it.

If sending email, from which email server / account?          

Gmail - johnateasteregghunts@gmail.com
Gmail - blakepickyourown@gmail.com
Benivia server - John2017@EasterEggHuntsAndEasterEvents.org
Benivia server - Blake2024@pickyourown.org
Easter server - John2017@EasterEggHuntsAndEasterEvents.org
FunFactoryTours - feedback@funfactorytours.com
PickYourOwn server - Blake2024@pickyourown.org

Use ZeroBounce Validation?

Yes
No

EXCLUDE the following, such as recent listings (current year) (Either ADDED or UPDATED matches date range)

0. No filter
1. Jan - March 2025
2. Any month in 2025
3. Pink 2025 (Easter website only)
4. Added update link 2025
5. Added update link 2025 and ZZZ section
6. Added update link 2025 and ZZZ section, and page scraped display

The rest of these choices (below) are automatic, based on the two choices above, unless you select O (other).

Test page is TE.htm, TE.php, TEfarmmarket.php, etc. with subpages TEtest1.htm, TEtest2.htm, or TEtest1.php, etc. The email addresses all go back to me.

Note: State or country region pages with no subpages require an exception on line 85 of results, or else you will get a "Warning: file_get_contents" error.
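
As an alternative to a per-page exception, the fetch itself can be guarded. This is a minimal sketch under assumptions; the helper name fetch_page and the example path are hypothetical, and it is not the code on line 85 of the results file.

```php
<?php
// Hypothetical helper: fetch a page, returning null instead of raising a warning.
function fetch_page(string $url): ?string
{
    // @ suppresses the "Warning: file_get_contents" message; failure comes back as false.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}

// Assumed subpage path, for illustration only.
$html = fetch_page('/path/that/does/not/exist/GA1.htm');
if ($html === null) {
    echo "No subpage found, skipping\n";
}
```

The caller can then skip a region page that has no subpages rather than stopping on the warning.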

Select ONE Country: US, Canada, UK, AU, NZ

Notes: The U.S. is about 6.7 MB, Canada is about 0.7 MB
You must choose something below. The ALL selection may choke; it is better to do groups of states.

What state is this for: All (best NOT to do All). Sizes are approximate for the CSV if you scrape groups in order. Use groupings for the CSV option above.
    Alabama to California (582 KB)  Colorado to Georgia (1.35 MB)  Hawaii to Maine (~2.0 MB)
    Mass to NH (3.5 MB)  NJ to OK (4.3 MB)  OR to PA (5 MB)
    RI to SC (5 MB)  SD to VT (5.5 MB)  VA to WY (6.2 MB)
Test case (TE, TF, TG)

Note: the subpages (NJ1, NJ2, P1, P2, P3, etc.) are used for large states (>100 listings), when the whole state page times out or crashes

 
Test page TE
UK Test page
SINGLE PAGE - enter url below

Alabama
Alaska
Arizona
Arkansas
California
   CA1-South
   CA2-SouthCent
   CA3-SanFran
   CA4-North
   CA5-ElDorado
   CA6-xmas-only
Colorado
   CO-Denver
   CO-notDenv
Connecticut
   CT-hart
   CT-east
   CT-W,S
Delaware
DC
Florida
   F1 (EC)
   F2 (W)
   F3 (PH)
   F4 (CitPlm)
   F5(SE,SW)
   F6(N)
F6 Not XMT
Georgia
   G1-Atl
   G2-N
   G3-S,E
Hawaii
Idaho
Illinois
   I1, Chg
   I2, Cen,So
Indiana
   IN-north
   IN-centr
   IN-south
Iowa
Kansas
Kentucky
Louisiana
Maine
   ME-south
   ME-augusta
   ME-west
   ME-N,ne,nc
Massachusetts
   MA-Bos-east
   MA-worcester
   MA-S
   MA-Berks
   MA-Frk,Hamps
   MA-Hampden
Maryland
Michigan
   MI-se
   MI-sw
   MI-cen
   MI-North
Minnesota
   MN-N-mpls
   MN-S-mpls
   MN-se,sw
   MN-centr
   MN-north
Missouri
Mississippi
Montana
Nebraska
Nevada
New Hampshire
   NH1-Hillsbor
   NH2-central
   NH-N,SE
   NH-SW
New Jersey
   NJ1
   NJ2
New Mexico
New York
   1-LI
   2-SE
   3-S,W
   4-finger
   5-Central
North Carolina
   NCWest
   NCPied
   NCEast
North Dakota
Oklahoma
Ohio
   O1-Columb
   O2-Cleveld
   O3-Cinc-Sou
   O4-nw,Tol, E
Oregon
   ORport
   ORsalem,nw
   ORsalem
   OReast,sw
Pennsylvania
   PA1-east
   PA2-central
   PA3-west
Rhode Island
South Carolina
South Dakota
Tennessee
   TN1-east
   TN2-middle
   TN3-nc,colum
   TN4-Clark,swc
   TN5-w,nw
Texas
   T1
   T2
   T3
Utah
Vermont
Virginia
   V1
   V2
   V3
Washington
   W1
   W2
West Virginia
Wisconsin
   WI1-SE
   WI2-SW
   WI3-Cent
   WI4-NW
   WI5-NE
Wyoming
Canada
Alberta to Nova Scotia
Ontario to Yukon
OR
Alberta
BritishColumbia
Manitoba
NewBrunswick
Newfoundland
NovaScotia
Ontario
PrinceEdwardIsland
Quebec
Saskatchewan
Yukon
Britain
London
Southern East Anglia
Northern East Anglia
Berks, Bucks,Ox
Hampshire
Kent
Sussex
East Midlands
West Midlands
Northeast England
Northwest England
Cornwall and Devon
Southwest
Yorkshire
Scotland
Wales
Northern Ireland
Australia
All of Australia
OR
Sydney area of NSW
Bilpin area, NSW
Highlands, S NSW
NW NSW
ACT NSW
NE NSW
Northern territory
Queensland
South Australia
Tasmania
Victoria
Melbourne area
Western Australia
New Zealand
New Zealand
OR
Auckland
Northland
Bay of Plenty, Gisborne
Hawke's Bay
Manawatu-Wanganui
Waikato
Wellington
South Island

South Island Otago

 

Page name of a SINGLE PAGE to run (only use with the SINGLE PAGE checkbox above):

   


You can ignore settings below for now.  Settings are hard coded.

What string defines the beginning of the listings:

What database to use:

What string defines the end of the listings:

What is the tablename:

Suppress debugging displays?
Display or not, many echos used for debugging

Yes
No

Suppress showing each page retrieved?
Display or not, the page text which is a large text display with ads, you don't need to see, except to debug

Yes
No

Suppress showing closed farms in the ZZZ section?
Display or not, the page text which is a large text display with ads, you don't need to see, except to debug

Yes
No

Clear session variables?
- Do it once in a while, like every 10 times

Yes
No

What string separates each listing: (only Other category)

 

See PYO for master code. Note: "explode" requires the content to be bracketed by the delimiter, so if there is only ONE county
on a page, it won't work. The solution is to add an h3 tag with innocuous content at the end of the listings; see the notes below.
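
The single-county problem can be illustrated with a small sketch. It assumes the scraper keeps only the pieces that sit between two occurrences of the delimiter, which is a guess about the master code; the helper name blocks_between and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: return only the blocks that sit between two delimiters.
function blocks_between(string $delimiter, string $content): array
{
    $pieces = explode($delimiter, $content);
    // Drop the text before the first delimiter and the text after the last one.
    return array_slice($pieces, 1, -1);
}

$oneCounty    = 'intro<h3>Adams County</h3><ul><li>Smith Farm</li></ul>';
$withSentinel = $oneCounty . '<h3>End of listings</h3>';

// With a single h3 there is nothing between two delimiters, so no block is found.
echo count(blocks_between('<h3>', $oneCounty)), "\n";
// The extra h3 at the end brackets the county, so its block is returned.
echo count(blocks_between('<h3>', $withSentinel)), "\n";
```

Under that assumption the first count is 0 and the second is 1, which is why the trailing h3 with innocuous content makes a one-county page work.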

Other required files: (last modified 2025-03-04)

pickyourown.org/scrape-all-results-2025.php (or more recent equivalent)

Notes:

On region pages on PYO, if you have other bullet/<li> links below the links to the subpages, you must separate them with the endlistings separator.

For PYO, that is:

<h3><a name="farmmarkets">Other Local Farm Products (Honey, Horses, Milk, Meat, Eggs, Etc.)</a>
<br>(NOT pick-your-own, unless they are also listed above)</h3>

Example FLnorth.htm
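
Why the separator matters can be shown with a minimal sketch that keeps only the text between a begin string and an end string. The helper name listings_section and the begin marker <!--begin--> are hypothetical; only the farmmarkets h3 comes from the note above, and the real scraper may cut the page differently.

```php
<?php
// Hypothetical helper: keep only the text between the begin and end markers.
function listings_section(string $html, string $begin, string $end): ?string
{
    $start = strpos($html, $begin);
    if ($start === false) {
        return null; // begin marker missing
    }
    $start += strlen($begin);
    $stop = strpos($html, $end, $start);
    if ($stop === false) {
        return null; // end marker missing, so the listings cannot be bounded
    }
    return substr($html, $start, $stop - $start);
}

// Assumed page fragment: a subpage link, then the separator, then an unrelated bullet.
$page = '<!--begin--><li>Subpage A</li>'
      . '<h3><a name="farmmarkets">Other</a></h3><li>Honey stand</li>';
echo listings_section($page, '<!--begin-->', '<h3><a name="farmmarkets">'), "\n";
```

Everything after the separator, such as the honey stand bullet here, is left out of the section that gets scanned for subpage links.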

Version changes

2022-l - adds second street address line street2, vstreet2
2022-m - adds check for RD redirect page links

2024k - fix country abbr not passed to scanpage func

2025 - change to PHPMailer

2025-h - add email validation (add new valid email addresses that fail to $validor in results-2025-h)

2025-i - upgraded compatibility to PHP ver 8.1 (from 7.1)


Benivia, LLC
Copyright © 2019 Benivia, LLC. All rights reserved.
Revised: 03/23/26