Make crawler smarter #33

Merged
bilal.catic merged 12 commits from make-crawler-smarter into master 2019-09-25 19:15:15 +02:00
bilal.catic commented 2019-09-25 09:01:51 +02:00 (Migrated from gitlab.com)

The crawler now keeps crawling until at least one of these conditions is met:

  1. The last page is reached
  2. The crawler finds an ad that is already saved in the DB and has not been renewed (its published date is the same as its renewed date)

I also changed how processing is done: the loop no longer moves on to the next page until all ads from the current page have been fetched and saved. This slows the crawler down, but the slowdown is negligible compared to how much easier the code is to follow and understand.
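For reviewers, here is a minimal sketch of the loop described above, not the actual code in this branch. The helpers `fetchPage`, `findSavedAd` and `saveAd`, the `isLastPage` flag, and the `publishedAt` / `renewedAt` field names are all hypothetical and only illustrate the two stop conditions and the page-by-page processing. It also assumes the crawler stops as soon as it hits a known, non-renewed ad rather than finishing the page.

```javascript
// Sketch only: every helper and field name here is an assumption, not the project's API.
async function crawl({ fetchPage, findSavedAd, saveAd }) {
  let page = 1;
  while (true) {
    // Fetch one page and finish it completely before moving to the next one.
    const { ads, isLastPage } = await fetchPage(page);

    for (const ad of ads) {
      const saved = await findSavedAd(ad.id);
      // An ad counts as "not renewed" when its published and renewed dates match.
      const notRenewed = ad.publishedAt === ad.renewedAt;

      // Stop condition 2: the ad is already in the DB and has not been renewed.
      if (saved && notRenewed) {
        return { pagesCrawled: page, reason: 'known-ad' };
      }
      await saveAd(ad);
    }

    // Stop condition 1: the last page is reached.
    if (isLastPage) {
      return { pagesCrawled: page, reason: 'last-page' };
    }
    page += 1;
  }
}
```

Awaiting each page before requesting the next is what makes the crawl slower, but it keeps the control flow linear and easy to read.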

Execute migrations: `npm run migrate`

Test crawler: `npm run crawl`

bilal.catic commented 2019-09-25 09:02:10 +02:00 (Migrated from gitlab.com)

changed the description

bilal.catic commented 2019-09-25 09:38:25 +02:00 (Migrated from gitlab.com)

changed the description

bilal.catic commented 2019-09-25 09:38:35 +02:00 (Migrated from gitlab.com)

changed the description

bilal.catic commented 2019-09-25 12:00:44 +02:00 (Migrated from gitlab.com)

added 1 commit

  • 3d203df9 - remove comment from delay between indexing pages

Compare with previous version

bilal.catic commented 2019-09-25 19:15:15 +02:00 (Migrated from gitlab.com)

merged

bilal.catic commented 2019-09-25 19:15:15 +02:00 (Migrated from gitlab.com)

mentioned in commit 0b083a02e2d1f3c80765831d508046cd77617045
Reference: senaduka/old-web#33