Switch to new crawler #30

Merged
bilal.catic merged 13 commits from switch-to-new-crawler into master 2019-09-18 16:06:20 +02:00
bilal.catic commented 2019-09-18 15:45:33 +02:00 (Migrated from gitlab.com)

This PR includes:

  • New crawler and Postgres saver, adapted to the new DB design
      - Results are saved after each indexed page
      - All selected ad categories are crawled in parallel

  • The crawler is configurable through ENV variables
      - Set the ad type to crawl: all ads, only selling ads, or only renting ads
      - Set one or more ad categories to crawl: flats, houses, apartments, garages, lands, offices

  • Sequelize updated
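
For readers who want a picture of the "save after each page, crawl categories in parallel" behaviour, here is a minimal sketch. It is not the code from this PR: `crawlCategory`, `crawlAll`, `fetchPage` and `savePage` are hypothetical names, and passing in a fixed `lastPage` is an assumption.

```javascript
// Hypothetical sketch, not the crawler from this PR.
// fetchPage(category, page) -> Promise<Array> and
// savePage(category, page, ads) -> Promise are injected stand-ins.
async function crawlCategory(category, lastPage, fetchPage, savePage) {
  let saved = 0;
  for (let page = 1; page <= lastPage; page += 1) {
    const ads = await fetchPage(category, page);
    // Persist right after each indexed page, so a failure on a later
    // page does not lose the pages that were already crawled.
    await savePage(category, page, ads);
    saved += ads.length;
  }
  return saved;
}

async function crawlAll(categories, lastPage, fetchPage, savePage) {
  // One promise per category; Promise.all lets them run concurrently
  // and rejects as soon as any category fails.
  const counts = await Promise.all(
    categories.map((c) => crawlCategory(c, lastPage, fetchPage, savePage))
  );
  return counts.reduce((sum, n) => sum + n, 0);
}
```

Pages within one category are still fetched sequentially here; only the categories overlap. That keeps the per-category save order predictable.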

PREPARATION:

  1. Start DB
  2. Execute migration (you can use npm run migrate)

SMOKE TEST:

  1. Add the OLX crawler configuration ENV variables to your local .ENV file (see development.env for the list)
  2. Run the crawler using the npm command: npm run crawl
  3. Verify that the results are saved to the DB
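
As an illustration of the ENV-driven configuration, the sketch below reads an ad type and a comma-separated category list from an environment object. The variable names `CRAWLER_AD_TYPE` and `CRAWLER_CATEGORIES`, the values `all`/`sell`/`rent`, and the comma-separated format are assumptions made for this example; the real names are the ones in development.env.

```javascript
// Hypothetical sketch: variable names and accepted values are assumptions,
// the actual ones are defined in development.env.
const AD_TYPES = ["all", "sell", "rent"];
const CATEGORIES = ["flats", "houses", "apartments", "garages", "lands", "offices"];

function readCrawlerConfig(env) {
  // Default to crawling all ads when no ad type is given.
  const adType = (env.CRAWLER_AD_TYPE || "all").trim().toLowerCase();
  if (!AD_TYPES.includes(adType)) {
    throw new Error("Unknown ad type: " + adType);
  }

  // Categories arrive as one comma-separated string, e.g. "flats, houses".
  const categories = (env.CRAWLER_CATEGORIES || "")
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s) => s.length > 0);

  const unknown = categories.filter((c) => !CATEGORIES.includes(c));
  if (unknown.length > 0) {
    throw new Error("Unknown ad categories: " + unknown.join(", "));
  }
  if (categories.length === 0) {
    throw new Error("Select at least one ad category");
  }
  return { adType, categories };
}
```

In a script this would typically be called as `readCrawlerConfig(process.env)`, so a misconfigured .ENV file fails fast before any page is requested.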
bilal.catic commented 2019-09-18 15:46:17 +02:00 (Migrated from gitlab.com)

changed the description
bilal.catic commented 2019-09-18 15:46:36 +02:00 (Migrated from gitlab.com)

changed the description
bilal.catic commented 2019-09-18 15:46:59 +02:00 (Migrated from gitlab.com)

changed the description
edazdarevic commented 2019-09-18 15:57:35 +02:00 (Migrated from gitlab.com)

@bilal.catic How could we make the crawler work out the last page by itself? No need to do it now, but it is something to think about.
bilal.catic commented 2019-09-18 16:02:05 +02:00 (Migrated from gitlab.com)

OLX shows how many ads there are in a given category. I need to check how many ads go on each page; from that we can work out the number of pages.
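
The arithmetic behind that idea can be sketched as a ceiling division. `lastPageFor` is a hypothetical helper, and both inputs are assumed to have been obtained already (the total from the category listing, the page size by inspection); neither value is known from this thread.

```javascript
// Hypothetical helper: derive the last page number from the total ad
// count shown for a category and the number of ads listed per page.
function lastPageFor(totalAds, adsPerPage) {
  if (!Number.isInteger(totalAds) || totalAds < 0) {
    throw new RangeError("totalAds must be a non-negative integer");
  }
  if (!Number.isInteger(adsPerPage) || adsPerPage <= 0) {
    throw new RangeError("adsPerPage must be a positive integer");
  }
  // Round up so a partially filled final page still counts as a page.
  return Math.ceil(totalAds / adsPerPage);
}
```

With made-up numbers, 95 ads at 30 per page gives `lastPageFor(95, 30) === 4`: three full pages plus one page holding the remaining five ads.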
bilal.catic commented 2019-09-18 16:06:18 +02:00 (Migrated from gitlab.com)

mentioned in commit 51411a4109c4ffb66266ad941427b3a6b22fa4fe
bilal.catic commented 2019-09-18 16:06:20 +02:00 (Migrated from gitlab.com)

merged

Reference: senaduka/old-web#30