36 Commits

Author SHA1 Message Date
Bilal Catic
e3e0ddd508 stop logging scrape action for Rental crawler 2019-11-01 01:02:45 +01:00
Bilal Catic
2e3ddbac95 fix request ad type bug 2019-11-01 00:01:02 +01:00
Bilal Catic
4318fa8a2d extend AD_TYPE enum in specific crawler files 2019-10-31 19:06:44 +01:00
Bilal Catic
ecc5b174a0 implement RENT option for Aktido; implement force crawl option 2019-10-30 17:23:43 +01:00
Bilal Catic
fa712ce97d implement RENT option for Rental; implement force crawl option 2019-10-30 15:53:11 +01:00
Bilal Catic
3abbed183e implement RENT and REQUEST option for OLX; implement force crawl option 2019-10-30 15:03:59 +01:00
Bilal Catic
97d93a3f37 add force crawl ENV option for OLX 2019-10-30 15:02:54 +01:00
Bilal Catic
1e36cb8423 add ALL category option for Rental agency 2019-10-28 09:24:08 +01:00
Bilal Catic
2c2fcd648f remove scrapeAd logging 2019-10-28 09:23:51 +01:00
Bilal Catic
5b6886f52b add ALL categories option for Aktido agency 2019-10-28 09:20:03 +01:00
Bilal Catic
f899c96dc6 add crawler and crawler config for Aktido agency 2019-10-28 09:14:45 +01:00
Bilal Catic
747ebb88e5 add debugging log switch for crawler process 2019-10-25 11:08:52 +02:00
Bilal Catic
7e3b0bfcd5 implement crawler for Prostor agency 2019-10-25 10:54:08 +02:00
Bilal Catic
6fc4218e39 add config files for Prostor agency 2019-10-24 17:11:12 +02:00
Bilal Catic
935ae60ae1 move specific crawler config to the separated files 2019-10-24 16:57:23 +02:00
Bilal Catic
2064d40985 stop "rental" crawler if there are no new real estates on the page 2019-10-24 11:26:11 +02:00
Bilal Catic
a6336b7d27 implement crawler for "rental" agency 2019-10-24 11:26:11 +02:00
Bilal Catic
ec798fe94c add crawler config and include specific crawler for "rental" agency 2019-10-24 11:26:11 +02:00
Bilal Catic
88e7cac420 allow olx crawler to recognize other OLX categories 2019-10-14 09:27:32 +02:00
Bilal Catic
9fc5072632 include other olx real estate categories in enums and configs 2019-10-14 09:27:32 +02:00
Bilal Catic
0818fcecd2 remove crawler and saver logging 2019-10-10 00:59:12 +02:00
Bilal Catic
5e8e13a984 fix enums 2019-09-30 14:27:01 +02:00
Bilal Catic
9c0104a57c refactor crawler - adapt to use new ENUM objects 2019-09-30 10:27:12 +02:00
Bilal Catic
e3e47345bc load AWS config through app config; fix ENV path 2019-09-30 09:44:19 +02:00
Bilal Catic
2e92f961ff start crawler loop when server is started 2019-09-26 17:30:06 +02:00
Bilal Catic
3d203df988 remove comment from delay between indexing pages 2019-09-25 10:00:42 +00:00
Bilal Catic
c9a959f8be stop crawling when existing, not renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
b3fcc6ba9a return new and existing real estates when saving results 2019-09-25 08:55:00 +02:00
Bilal Catic
f93d0e738f add delay between pages config variable 2019-09-25 08:55:00 +02:00
Bilal Catic
90bc57edb6 stop crawling when existing, non-renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
06d35fcb4b move ignored usernames config to crawler specific config 2019-09-25 08:55:00 +02:00
Bilal Catic
63eb64b0f6 parse and save published and renewed dates 2019-09-25 08:55:00 +02:00
Bilal Catic
3140fdf0c0 use function generator to index pages; crawl in parallel 2019-09-25 08:55:00 +02:00
Bilal Catic
c4f6c6e1c3 construct crawling url before indexing single page 2019-09-25 08:55:00 +02:00
Bilal Catic
3d46c82d3d create new crawler and Postgres saver 2019-09-18 15:32:48 +02:00
Bilal Catic
76a989fa37 replace old crawler, without specific crawler and saver implementation 2019-09-16 15:59:53 +02:00