Crawler service #17

Merged
nedimu merged 13 commits from crawler-service into master 2019-06-24 16:09:48 +02:00
nedimu commented 2019-06-21 17:01:36 +02:00 (Migrated from gitlab.com)

Smoke test:

Run npm install
Run migrations

Scheduler: The request for this ticket was that the scheduler runs as a job. There is an added npm script, npm run scheduler, which runs the crawler service once. On Heroku, scheduling will be handled with an add-on (http://www.modeo.co/blog/2015/1/8/heroku-scheduler-with-nodejs-tutorial) after deployment.

For simplicity, it is best to have only one realestate request on the first run, so it is easy to track everything. Add more requests after the first run succeeds.

  1. Run the scheduler script

  2. In the logs, check that the select query for realestate requests executed correctly

  3. Check the generation of URLs from realestate requests

  4. Check that all of the results from that particular URL are there (there is no paging yet, so only the first page is indexed)

  5. Check out the queries for the bounding box

  6. Crawler results should be filtered depending on the bounding box of the realestate request. (At first, zoom out as much as you can so you get some results. Then edit the realestate request and zoom in to an area where at least one known realestate is present. After running the scheduler again, check that the rest of the properties in that municipality are filtered out.)

  7. Check that marketalerts are saved in the DB correctly (olxId is currently missing)

  8. Check that duplicate marketalerts are filtered out (filtering is done by OLX URL)
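For anyone checking steps 6 and 8 by hand, the expected behaviour can be sketched in memory as below. This is only an illustration, not the code from this merge request (which appears to do the bounding box check with database queries). The function names, the result shape { url, lat, lng } and the box shape { north, south, east, west } are all assumptions made for the example.

```javascript
// Hypothetical shapes (not taken from the merge request):
//   result: { url, lat, lng }
//   box:    { north, south, east, west } in decimal degrees
// The check assumes the box does not cross the 180° meridian.
function isInsideBoundingBox(point, box) {
  return (
    point.lat >= box.south &&
    point.lat <= box.north &&
    point.lng >= box.west &&
    point.lng <= box.east
  );
}

// Keeps only results that fall inside the box and whose URL has not been
// seen before, either in knownUrls (e.g. already stored alerts) or earlier
// in the same batch.
function filterNewAlerts(results, box, knownUrls) {
  const seen = new Set(knownUrls);
  const alerts = [];
  for (const result of results) {
    if (!isInsideBoundingBox(result, box)) continue; // step 6: outside the area
    if (seen.has(result.url)) continue; // step 8: duplicate by URL
    seen.add(result.url);
    alerts.push(result);
  }
  return alerts;
}
```

With a wide box most results pass the first check. After zooming in, only listings inside the smaller area should remain, and running the same batch twice should add nothing new.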

bilal.catic commented 2019-06-21 22:55:38 +02:00 (Migrated from gitlab.com)

This should be converted to a switch statement
bilal.catic commented 2019-06-21 23:03:03 +02:00 (Migrated from gitlab.com)

Use JS template literals instead of string concatenation
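A minimal sketch of what this suggestion could look like for a URL builder. The function name, base URL and parameter names are placeholders for illustration and are not taken from the diff.

```javascript
// Hypothetical helper: builds a search URL from a base URL and a params object.
// Before (concatenation): baseUrl + '?' + key + '=' + value + '&' + ...
// After (template literals), as below.
function buildSearchUrl(baseUrl, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return `${baseUrl}?${query}`;
}
```

For example, buildSearchUrl('https://example.com/search', { category: 'flats' }) yields 'https://example.com/search?category=flats'.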
bilal.catic commented 2019-06-21 23:24:05 +02:00 (Migrated from gitlab.com)

What is actualNoOfResults? It seems to be an unused variable
bilal.catic commented 2019-06-22 11:41:14 +02:00 (Migrated from gitlab.com)

The allRERequests and findPointInsideBoundingBox methods are not related to URLs, so it is strange to have them in this file. Maybe move them to the dbHelper file?
bilal.catic commented 2019-06-22 11:56:57 +02:00 (Migrated from gitlab.com)

Choose a better name for re1
bilal.catic commented 2019-06-22 19:52:17 +02:00 (Migrated from gitlab.com)

This should not be hardcoded. Maybe put this info in the result object in the Olx crawler?
bilal.catic commented 2019-06-22 23:54:22 +02:00 (Migrated from gitlab.com)

If gardenSize is NaN, writing the market alert object to the DB will fail. We should check that garden size is a valid numeric value, or set it to NULL
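One possible way to guard against this, sketched with a hypothetical helper (the name toNumberOrNull and the decimal-comma handling are assumptions, not the fix that was committed):

```javascript
// Hypothetical helper: returns a finite number parsed from raw, or null when
// the value is missing or not numeric, so NaN never reaches the DB layer.
function toNumberOrNull(raw) {
  if (raw === null || raw === undefined) return null;
  // Assumption: scraped values may use a decimal comma, e.g. "12,5".
  const value = Number.parseFloat(String(raw).replace(',', '.'));
  return Number.isFinite(value) ? value : null;
}
```

The market alert object could then be built with gardenSize: toNumberOrNull(scrapedGardenSize), so a missing or malformed value is stored as NULL instead of failing the insert.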
nedimu commented 2019-06-24 11:49:21 +02:00 (Migrated from gitlab.com)

changed this line in version 2 of the diff
nedimu commented 2019-06-24 11:49:22 +02:00 (Migrated from gitlab.com)

added 1 commit

  • 6eba5c2a - gardenSize nan

Compare with previous version
nedimu commented 2019-06-24 13:32:40 +02:00 (Migrated from gitlab.com)

fixed
nedimu commented 2019-06-24 13:37:35 +02:00 (Migrated from gitlab.com)

Fixed
nedimu commented 2019-06-24 13:38:02 +02:00 (Migrated from gitlab.com)

Will fix in the next PR, when handling paging
nedimu commented 2019-06-24 13:48:51 +02:00 (Migrated from gitlab.com)

Fixed
nedimu commented 2019-06-24 13:49:27 +02:00 (Migrated from gitlab.com)

Fixed
nedimu commented 2019-06-24 13:50:19 +02:00 (Migrated from gitlab.com)

Will be fixed when we add other crawlers. For now I will put a TODO here
nedimu commented 2019-06-24 14:20:38 +02:00 (Migrated from gitlab.com)

changed this line in version 3 of the diff
nedimu commented 2019-06-24 14:20:39 +02:00 (Migrated from gitlab.com)

changed this line in version 3 of the diff
nedimu commented 2019-06-24 14:20:39 +02:00 (Migrated from gitlab.com)

changed this line in version 3 of the diff
nedimu commented 2019-06-24 14:20:40 +02:00 (Migrated from gitlab.com)

added 1 commit

  • 2cf6f6f1 - Code refactoring, fixed bug with price parsing:

Compare with previous version
nedimu commented 2019-06-24 15:35:11 +02:00 (Migrated from gitlab.com)

added 1 commit

  • 1aa91fb4 - Fixed gardenSize

Compare with previous version
bilal.catic commented 2019-06-24 16:09:48 +02:00 (Migrated from gitlab.com)

merged
bilal.catic commented 2019-06-24 16:09:50 +02:00 (Migrated from gitlab.com)

mentioned in commit 5ffdaef1bf4b01bf7b4489e37ca3953294e859e7

Reference: senaduka/old-web#17