Make crawler smarter #33

Merged
bilal.catic merged 12 commits from make-crawler-smarter into master 2019-09-25 19:15:15 +02:00

12 Commits

Author SHA1 Message Date
Bilal Catic
3d203df988 remove comment from delay between indexing pages 2019-09-25 10:00:42 +00:00
Bilal Catic
c9a959f8be stop crawling when existing, not renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
b3fcc6ba9a return new and existing real estates when saving results 2019-09-25 08:55:00 +02:00
Bilal Catic
f93d0e738f add delay between pages config variable 2019-09-25 08:55:00 +02:00
Bilal Catic
90bc57edb6 stop crawling when existing, non-renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
746732f30b remove deleted column from RealEstate model 2019-09-25 08:55:00 +02:00
Bilal Catic
06d35fcb4b move ignored usernames config to crawler specific config 2019-09-25 08:55:00 +02:00
Bilal Catic
63eb64b0f6 parse and save published and renewed dates 2019-09-25 08:55:00 +02:00
Bilal Catic
c7184be5fc install moment and moment-timezone packages 2019-09-25 08:55:00 +02:00
Bilal Catic
18db554ea8 add published and renewed date columns to the RealEstates table 2019-09-25 08:55:00 +02:00
Bilal Catic
3140fdf0c0 use function generator to index pages; crawl in parallel 2019-09-25 08:55:00 +02:00
Bilal Catic
c4f6c6e1c3 construct crawling url before indexing single page 2019-09-25 08:55:00 +02:00