Commit Graph

50 Commits

Author SHA1 Message Date
Naida Vatric
0c2d218d29 Changed floor numbers and basement-attic tag. 2020-01-02 00:10:31 +01:00
Naida Vatric
fed2dc00dc Changed number of rooms. 2019-12-29 23:42:39 +01:00
Bilal Catic
af42d2c448 improve OLX ad status detection 2019-11-14 08:47:48 +01:00
Bilal Catic
5148f88a62 improve Rental and Aktido ad status detection 2019-11-14 08:31:57 +01:00
Bilal Catic
a7cd75653d improve OLX ad status detection 2019-11-14 08:04:58 +01:00
Bilal Catic
168b2186e7 add more fields to the Prostor real estates crawler 2019-11-14 07:23:23 +01:00
Bilal Catic
c13857bc09 add additional fields to the Prostor crawler 2019-11-14 02:09:42 +01:00
Bilal Catic
3b3e2eda07 refactor Prostor crawler 2019-11-13 16:54:16 +01:00
Bilal Catic
a63671959b improve real estate properties detection for Rental 2019-11-12 22:53:16 +01:00
Bilal Catic
b6d68db3a3 improve real estate properties detection for aktido 2019-11-12 21:39:28 +01:00
Bilal Catic
c91e56c46e add additional real estate fields for Aktido crawler 2019-11-11 19:34:43 +01:00
Bilal Catic
e871550ba6 add two more heating types for Rental crawler 2019-11-11 18:46:01 +01:00
Bilal Catic
debdd01b28 add new fields to the Rental crawler 2019-11-11 17:15:46 +01:00
Bilal Catic
b6024af2cb add new fields for OLX crawler 2019-11-08 17:05:51 +01:00
Bilal Catic
e3e0ddd508 stop logging scrape action for Rental crawler 2019-11-01 01:02:45 +01:00
Bilal Catic
2e3ddbac95 fix request ad type bug 2019-11-01 00:01:02 +01:00
Bilal Catic
4318fa8a2d extend AD_TYPE enum in specific crawler files 2019-10-31 19:06:44 +01:00
Bilal Catic
ecc5b174a0 implement RENT option for Aktido; implement force crawl option 2019-10-30 17:23:43 +01:00
Bilal Catic
fa712ce97d implement RENT option for Rental; implement force crawl option 2019-10-30 15:53:11 +01:00
Bilal Catic
3abbed183e implement RENT and REQUEST option for OLX; implement force crawl option 2019-10-30 15:03:59 +01:00
Bilal Catic
97d93a3f37 add force crawl ENV option for OLX 2019-10-30 15:02:54 +01:00
Bilal Catic
1e36cb8423 add ALL category option for Rental agency 2019-10-28 09:24:08 +01:00
Bilal Catic
2c2fcd648f remove scrapeAd logging 2019-10-28 09:23:51 +01:00
Bilal Catic
5b6886f52b add ALL categories option for Aktido agency 2019-10-28 09:20:03 +01:00
Bilal Catic
f899c96dc6 add crawler and crawler config for Aktido agency 2019-10-28 09:14:45 +01:00
Bilal Catic
747ebb88e5 add debugging log switch for crawler process 2019-10-25 11:08:52 +02:00
Bilal Catic
7e3b0bfcd5 implement crawler for Prostor agency 2019-10-25 10:54:08 +02:00
Bilal Catic
6fc4218e39 add config files for Prostor agency 2019-10-24 17:11:12 +02:00
Bilal Catic
935ae60ae1 move specific crawler config to the separated files 2019-10-24 16:57:23 +02:00
Bilal Catic
2064d40985 stop "rental" crawler if there are no new real estates on the page 2019-10-24 11:26:11 +02:00
Bilal Catic
a6336b7d27 implement crawler for "rental" agency 2019-10-24 11:26:11 +02:00
Bilal Catic
ec798fe94c add crawler config and include specific crawler for "rental" agency 2019-10-24 11:26:11 +02:00
Bilal Catic
88e7cac420 allow olx crawler to recognize other OLX categories 2019-10-14 09:27:32 +02:00
Bilal Catic
9fc5072632 include other olx real estate categories in enums and configs 2019-10-14 09:27:32 +02:00
Bilal Catic
0818fcecd2 remove crawler and saver logging 2019-10-10 00:59:12 +02:00
Bilal Catic
5e8e13a984 fix enums 2019-09-30 14:27:01 +02:00
Bilal Catic
9c0104a57c refactor crawler - adapt to use new ENUM objects 2019-09-30 10:27:12 +02:00
Bilal Catic
e3e47345bc load AWS config through app config; fix ENV path 2019-09-30 09:44:19 +02:00
Bilal Catic
2e92f961ff start crawler loop when server is started 2019-09-26 17:30:06 +02:00
Bilal Catic
3d203df988 remove comment from delay between indexing pages 2019-09-25 10:00:42 +00:00
Bilal Catic
c9a959f8be stop crawling when existing, not renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
b3fcc6ba9a return new and existing real estates when saving results 2019-09-25 08:55:00 +02:00
Bilal Catic
f93d0e738f add delay between pages config variable 2019-09-25 08:55:00 +02:00
Bilal Catic
90bc57edb6 stop crawling when existing, non-renewed ad is found 2019-09-25 08:55:00 +02:00
Bilal Catic
06d35fcb4b move ignored usernames config to crawler specific config 2019-09-25 08:55:00 +02:00
Bilal Catic
63eb64b0f6 parse and save published and renewed dates 2019-09-25 08:55:00 +02:00
Bilal Catic
3140fdf0c0 use function generator to index pages; crawl in parallel 2019-09-25 08:55:00 +02:00
Bilal Catic
c4f6c6e1c3 construct crawling url before indexing single page 2019-09-25 08:55:00 +02:00
Bilal Catic
3d46c82d3d create new crawler and Postgres saver 2019-09-18 15:32:48 +02:00
Bilal Catic
76a989fa37 replace old crawler, without specific crawler and saver implementation 2019-09-16 15:59:53 +02:00