Crawler service #17
Branch: crawler-service
Smoke test:
Run npm install
Run migrations
Scheduler: this ticket asked for the scheduler to be implemented as a job, so there is an added npm script, npm run scheduler, which runs the crawler service once. On Heroku, scheduling will be handled after deployment with the Heroku Scheduler add-on (http://www.modeo.co/blog/2015/1/8/heroku-scheduler-with-nodejs-tutorial).
For simplicity, it is best to have only one real estate request on the first run, so everything is easy to track. Add more requests after the first run succeeds.
Run scheduler script
In the logs, check that the SELECT query for real estate requests executed correctly
Check the URLs generated from the real estate requests
Check that all of the results from that particular URL are present (there is no paging yet, so only the first page is indexed)
Check the queries for the bounding box
Crawler results should be filtered by the bounding box of the real estate request. At first, zoom out as far as you can so you get some results. Then edit the real estate request, zoom in to an area where at least one known property is present, run the scheduler again, and check that the other properties in that municipality are filtered out
Check that market alerts are saved correctly in the DB (olxId is currently missing)
Check that duplicate market alerts are filtered out (filtering is done by OLX URL)
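The last checks (bounding-box filtering and duplicate filtering by OLX URL) can be illustrated with a small in-memory sketch. It is hypothetical: the field names (url, lat, lng), the box shape ({ south, west, north, east }) and the data are assumptions, and the real service may well do the bounding-box part in its database query instead.

```javascript
// Hypothetical box shape: south/north latitudes and west/east longitudes.
// Assumes the box does not cross the antimeridian.
function isInsideBoundingBox(point, box) {
  return (
    point.lat >= box.south &&
    point.lat <= box.north &&
    point.lng >= box.west &&
    point.lng <= box.east
  );
}

// Keep results that lie inside the box and whose URL has not been seen yet,
// either earlier in this batch or among already stored market alerts.
function filterNewResults(results, box, existingUrls) {
  const seen = new Set(existingUrls);
  const kept = [];
  for (const result of results) {
    if (!isInsideBoundingBox(result, box)) continue; // outside the request area
    if (seen.has(result.url)) continue; // duplicate by URL
    seen.add(result.url);
    kept.push(result);
  }
  return kept;
}

// Made-up data for illustration only.
const box = { south: 45.0, west: 19.0, north: 45.5, east: 19.5 };
const results = [
  { url: "https://olx.example/a", lat: 45.2, lng: 19.2 },
  { url: "https://olx.example/a", lat: 45.2, lng: 19.2 }, // same URL again
  { url: "https://olx.example/b", lat: 46.0, lng: 19.2 }, // outside the box
  { url: "https://olx.example/c", lat: 45.3, lng: 19.4 }, // already stored
];
const newResults = filterNewResults(results, box, ["https://olx.example/c"]);
console.log(newResults.map((r) => r.url)); // [ 'https://olx.example/a' ]
```

Using a Set keeps the duplicate check constant-time per result, and seeding it with the stored URLs covers both duplicates within one crawl and duplicates across runs.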
This should be converted to a switch statement.
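The code under review is not quoted here, so this is only a generic illustration with hypothetical category labels: an if/else chain that compares one value against several constants reads more clearly as a switch.

```javascript
// Hypothetical mapping from a listing category label to an internal type.
// Replaces a chain like: if (c === "apartments") {...} else if (c === "houses") {...}
function toPropertyType(category) {
  switch (category) {
    case "apartments":
      return "apartment";
    case "houses":
      return "house";
    case "land":
      return "land";
    default:
      return "other"; // any label we do not recognise
  }
}

console.log(toPropertyType("houses")); // house
```

Each case returns directly, so no break statements are needed.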
Use JS template literals instead of string concatenation
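A minimal sketch of the suggestion, assuming a URL is built from fields of a real estate request. The base URL and the query parameter names are placeholders, not OLX's real search parameters.

```javascript
// Hypothetical URL builder for a real estate request.
// Before: "https://search.example/?category=" + category + "&city=" + city
function buildSearchUrl(request) {
  // Encode each value so spaces and special characters stay URL-safe.
  const category = encodeURIComponent(request.category);
  const city = encodeURIComponent(request.city);
  return `https://search.example/?category=${category}&city=${city}`;
}

console.log(buildSearchUrl({ category: "houses", city: "Some City" }));
// https://search.example/?category=houses&city=Some%20City
```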
What is actualNoOfResults? It seems to be an unused variable.
The allRERequests and findPointInsideBoundingBox methods are not related to URLs, so it is strange to have them in this file. Maybe they belong in the dbHelper file?
Choose a better name for re1.
This should not be hardcoded. Maybe put this information in the result object in the Olx crawler?
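One possible reading of the suggestion, sketched under assumptions: each crawler attaches its own source-specific value to the result objects it returns, so downstream code reads it from the result instead of hardcoding it. The crawler object, the source field and its value are hypothetical.

```javascript
// Hypothetical crawler that tags every parsed result with its own source.
const olxCrawler = {
  source: "olx",
  parse(rawItems) {
    // Arrow function keeps `this` bound to the crawler object.
    return rawItems.map((item) => ({ ...item, source: this.source }));
  },
};

// Downstream code uses the value carried by the result.
function describe(result) {
  return `${result.source}: ${result.url}`;
}

const [first] = olxCrawler.parse([{ url: "https://olx.example/a" }]);
console.log(describe(first)); // olx: https://olx.example/a
```

Adding another crawler then only means giving it its own source value, with no changes where results are consumed.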
If gardenSize is NaN, writing the market alert object to the DB will fail. We should check that gardenSize is a valid numeric value and otherwise set it to NULL.
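A minimal sketch of that check, assuming the scraped garden size arrives as a string (or is missing). The function name is hypothetical; null is what Node.js SQL drivers such as pg write as NULL.

```javascript
// Convert a scraped garden size to a finite number, or null when invalid.
// parseFloat reads a leading numeric prefix, so "350 m2" becomes 350.
function parseGardenSize(raw) {
  if (raw === null || raw === undefined) return null;
  const value = Number.parseFloat(String(raw));
  // Rejects NaN (e.g. "n/a", "") as well as Infinity.
  return Number.isFinite(value) ? value : null;
}

console.log(parseGardenSize("350 m2")); // 350
console.log(parseGardenSize("n/a")); // null
```

Note that a comma decimal such as "1,5" would parse as 1, so the listing's number format may need normalising first.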
changed this line in version 2 of the diff
added 1 commit
6eba5c2a - gardenSize nan
Fixed
Fixed
Will fix in the next PR, when handling paging
Fixed
Fixed
Will be fixed when we add other crawlers; for now I will put a TODO here
changed this line in version 3 of the diff
changed this line in version 3 of the diff
changed this line in version 3 of the diff
added 1 commit
2cf6f6f1 - Code refactoring, fixed bug with price parsing
added 1 commit
1aa91fb4 - Fixed gardenSize
merged
mentioned in commit 5ffdaef1bf