
SLIDE 1

A Crawling Application with R

SLIDE 2

What about Real Estate?

SLIDE 3

Real Estate Development

  • Know the supply and demand
  • Increasing investment volumes
  • Single point vs. development over time
  • Required data:
    • Supply of real estate
    • Demand / absorption
    • Influencing factors (price, location, ...)

Source: BFS (Swiss Federal Statistical Office), Leerwohnungszählung (LWZ, vacant dwellings census)

SLIDE 4

New Real Estate Projects

  • Current listings (September 2019)
  • From single housing to big projects
  • Existing workflow at Wüest Partner

SLIDE 5

A Crawling App with R

SLIDE 6
SLIDE 7
SLIDE 8

Requirement

Measure the Absorption Rate of Real Estate

SLIDE 9

Example: Absorption Rate of Flats
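
The slides leave the absorption-rate formula open. A common definition is the share of units on offer at the start of a period that are sold or let by its end. The sketch below applies that definition per project to daily snapshots like those the crawler collects; the columns `project`, `flat_id`, `date` and `status` and the function `absorption_rate()` are hypothetical, not the app's actual schema or model.

```r
# Absorption rate per project: share of flats "available" on `from`
# that are no longer "available" on `to`.
# Assumed (hypothetical) columns: project, flat_id, date (Date), status.
absorption_rate <- function(snapshots, from, to) {
  by_project <- split(snapshots, snapshots$project)
  vapply(by_project, function(p) {
    on_offer <- p$flat_id[p$date == from & p$status == "available"]
    absorbed <- p$flat_id[p$date == to & p$status != "available"]
    if (length(on_offer) == 0) return(NA_real_)  # nothing to absorb
    length(intersect(on_offer, absorbed)) / length(on_offer)
  }, numeric(1))
}

# Made-up snapshots for two projects on two dates
snapshots <- data.frame(
  project = rep(c("A", "B"), times = c(8, 4)),
  flat_id = c(rep(c("a1", "a2", "a3", "a4"), 2), rep(c("b1", "b2"), 2)),
  date = as.Date(rep(c("2019-09-01", "2019-09-30", "2019-09-01", "2019-09-30"),
                     times = c(4, 4, 2, 2))),
  status = c("available", "available", "available", "available",
             "sold", "sold", "available", "available",
             "available", "sold", "sold", "sold"),
  stringsAsFactors = FALSE
)

rates <- absorption_rate(snapshots, as.Date("2019-09-01"), as.Date("2019-09-30"))
print(rates)
```

With these made-up numbers, project A absorbs two of its four flats (0.5) and project B its single available flat (1).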

SLIDE 10

App Demo

SLIDE 11

Architecture Overview

SLIDE 12

[Diagram: architecture overview; components connected via DBI]

SLIDE 13

Architecture Details

SLIDE 14

Frontend: Shiny Dashboard

  • ShinyProxy: deployment and user management
  • Shiny Dashboard with custom CSS
  • From a custom ShinyProxy Verse image
  • DataTable, Leaflet

    .box {
      border-radius: 0px;
      background: #EFECEA;
      box-shadow: none;
      border-top: none;
    }
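
The custom CSS on this slide targets the `.box` class that shinydashboard gives its boxes. A minimal sketch of embedding such rules through an inline `<style>` tag in the page head; the header title and box content are placeholders, and the project may load its CSS differently.

```r
library(shiny)
library(shinydashboard)

# Flat, borderless boxes on a light background (rules from the slide)
box_css <- "
.box {
  border-radius: 0px;
  background: #EFECEA;
  box-shadow: none;
  border-top: none;
}"

ui <- dashboardPage(
  dashboardHeader(title = "Projects"),  # placeholder title
  dashboardSidebar(),
  dashboardBody(
    tags$head(tags$style(HTML(box_css))),  # inject the custom CSS
    box(title = "Listings", "Placeholder content")
  )
)

server <- function(input, output, session) {}

# shinyApp(ui, server)  # uncomment to run the app locally
```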

SLIDE 15

Shiny Dashboard

  • Enter new projects
  • Register URLs with data
  • Test automatic extraction and parsing
  • Survey project statuses
  • Analyze data

SLIDE 16

Backend: Postgres Database

  • Tables: projects, urls, dwellings
  • Connection from R with DBI
  • Load/append data frames

    library(DBI)
    library(pool)
    library(odbc)
    library(dplyr)

    # Connection pool; credentials come from environment variables
    db <- dbPool(
      odbc(),
      Driver = "PostgreSQL Unicode",
      Database = Sys.getenv("POSTGRES_DB"),
      UserName = Sys.getenv("POSTGRES_USER"),
      Password = Sys.getenv("POSTGRES_PASSWORD"),
      Servername = Sys.getenv("POSTGRES_SERVER"),
      Port = Sys.getenv("POSTGRES_PORT")
    )

    # Only URLs with url_parse_status == 1
    urls_to_crawl <- dbReadTable(db, "urls") %>%
      filter(url_parse_status == 1)

    # Append parsed rows (parsed_data comes from the parsing step)
    dbWriteTable(db, "dwellings", parsed_data, append = TRUE)
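
The load/append pattern can be tried without a Postgres server by swapping in an in-memory SQLite database through RSQLite (an assumption for this sketch; the app itself uses odbc and pool). The table columns and values are hypothetical. Unlike the slide, the status filter is pushed into SQL with a bound parameter, so only matching rows leave the database.

```r
library(DBI)

# In-memory SQLite stands in for Postgres so the sketch runs anywhere;
# the DBI calls are the same on an odbc connection or a pool.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical minimal versions of two of the tables on the slide
dbWriteTable(con, "urls", data.frame(
  url_id = 1:3,
  url = c("https://example.com/a", "https://example.com/b", "https://example.com/c"),
  url_parse_status = c(1L, 0L, 1L)
))
dbWriteTable(con, "dwellings", data.frame(
  url_id = integer(), flat = character(), rooms = numeric()
))

# Filter in SQL with a bound parameter instead of reading the whole table
urls_to_crawl <- dbGetQuery(
  con, "SELECT url_id, url FROM urls WHERE url_parse_status = ?",
  params = list(1L)
)

# Append rows as the parsing step might produce them (made-up values)
parsed_data <- data.frame(url_id = c(1L, 1L), flat = c("A.1", "A.2"), rooms = c(3.5, 4.5))
dbWriteTable(con, "dwellings", parsed_data, append = TRUE)

n_dwellings <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM dwellings")$n
dbDisconnect(con)
```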

SLIDE 17

Splash

Backend

SLIDE 18

Splash

  • Headless browser
  • Renders URL (data often loaded via JS)
  • Out-of-the-box Splash image
  • Accessed from R with splashr
  • Returns full HTML / screenshot

    library(splashr)

    # Connect to the Splash service (host name "splash", port 8050)
    my_splash <- splash(host = "splash", port = 8050)

    # input$url: the URL entered in the Shiny app
    html <- render_html(splash_obj = my_splash, url = input$url)
    screenshot <- render_png(splash_obj = my_splash, url = input$url)

SLIDE 19

Auto Extraction / Parsing Logic

  • Parse HTML with rvest
  • Extract tables to data frames
    • html_table()
    • html_nodes() %>% html_text()
    • Difficult to automate!
  • Parse data frames with dplyr
    • Search columns (keywords, numbers)
    • Mutate
    • Validate
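
The dplyr steps listed on this slide (search columns, mutate, validate) can be sketched in base R, used here instead of dplyr only to keep the example dependency-free. The table contents, the keyword lists and the helpers `find_column()` and `parse_number_ch()` are hypothetical; real listings need per-site tuning, which is what makes this hard to automate.

```r
# Hypothetical html_table() output: German headers, numbers stored as text
raw <- data.frame(
  "Wohnung" = c("A.1", "A.2", "B.1"),
  "Anzahl Zimmer" = c("3.5", "4.5", "2.5"),
  "Mietzins (CHF)" = c("1'950.-", "2'480.-", "reserviert"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Search: index of the first column whose name matches any keyword
find_column <- function(df, keywords) {
  hits <- grep(paste(keywords, collapse = "|"), names(df), ignore.case = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Numbers: drop Swiss thousands separators ('), take the first number,
# and return NA for cells without digits (e.g. "reserviert")
parse_number_ch <- function(x) {
  x <- gsub("'", "", x, fixed = TRUE)
  has_num <- grepl("[0-9]", x)
  out <- rep(NA_real_, length(x))
  out[has_num] <- as.numeric(sub("^[^0-9]*([0-9]+(\\.[0-9]+)?).*$", "\\1", x[has_num]))
  out
}

rooms_col <- find_column(raw, c("zimmer", "rooms"))
rent_col <- find_column(raw, c("miet", "preis", "rent", "price"))

# Mutate: build a clean, typed data frame
parsed <- data.frame(
  flat = raw[[1]],
  rooms = parse_number_ch(raw[[rooms_col]]),
  rent_chf = parse_number_ch(raw[[rent_col]]),
  stringsAsFactors = FALSE
)

# Validate: keep rows with a plausible room count (rent may be missing)
parsed <- parsed[!is.na(parsed$rooms) & parsed$rooms >= 1 & parsed$rooms <= 10, ]
print(parsed)
```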

SLIDE 20

Crawling Daemon

  • Separate R container
  • Scheduled script with cron job
  • Get URLs from DB, get HTML, parse data
  • One request per day → small server load

    db <- dbPool(odbc(), Driver = …, Database = …, ...)
    my_splash <- splash(host = "splash", port = 8050)
    urls_to_crawl <- dbReadTable(db, "urls") %>% filter(url_parse_status == 1)
    # render_html() renders one URL per call, so loop over the rows
    # (assumes the urls table keeps the address in a column named `url`)
    for (i in seq_len(nrow(urls_to_crawl))) {
      html <- render_html(splash_obj = my_splash, url = urls_to_crawl$url[i])
      # return_table(): the app's parsing helper, not shown on the slide
      parsed_data <- return_table(html, urls_to_crawl[i, ])
      dbWriteTable(db, "dwellings", parsed_data, append = TRUE)
    }
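
A daily crawl over many URLs should survive a single failing page. A minimal sketch of one pass with per-URL error handling; `crawl_once()` and the fake fetch/parse/store functions are hypothetical stand-ins for `render_html()`, the parsing helper and `dbWriteTable()`, which lets the loop run without Splash or a database.

```r
# One crawl pass; a failing URL is recorded and does not stop the pass.
# fetch_html, parse_html and store are passed in (hypothetical stand-ins for
# render_html(), the parsing helper and dbWriteTable()).
crawl_once <- function(urls, fetch_html, parse_html, store) {
  vapply(urls, function(u) {
    tryCatch({
      html <- fetch_html(u)
      store(parse_html(html, u))
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
}

# Fakes so the sketch runs without Splash or a database
stored <- list()
fake_fetch <- function(u) {
  if (grepl("broken", u)) stop("timeout")
  paste0("<html>", u, "</html>")
}
fake_parse <- function(html, u) data.frame(url = u, n_chars = nchar(html))
fake_store <- function(df) stored[[length(stored) + 1]] <<- df

status <- crawl_once(c("https://example.com/a", "https://example.com/broken"),
                     fake_fetch, fake_parse, fake_store)
print(status)
```

In the daemon, a script calling such a function would be started once a day by cron, and the returned per-URL statuses could be written back to the urls table.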

SLIDE 21

[Diagram: architecture overview; components connected via DBI]

SLIDE 22

Current State

SLIDE 23

Current State

  • Included in Wüest Partner weekly workflow
  • Significant portion of large projects tracked
  • Daily data tracking

Next steps:
  • Model absorption rates
  • Productize the analysis dashboard
  • Automatic URL finder

SLIDE 24

Challenges

Help me :)

SLIDE 25

Auto-Extract Data Frames from HTML

<table/> vs. <div/>
Table geometry
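
The challenge named on this slide shows up with two hypothetical renderings of the same listing. For a real `<table>`, rvest's `html_table()` recovers rows and columns generically; for a `<div>` grid, the selectors (here the made-up classes `.unit`, `.name`, `.rooms`, `.status`) have to be written for each site, which is why automatic extraction is hard.

```r
library(rvest)

# Same listing twice: a real <table> and a <div> grid that only looks like one
table_html <- "<table>
  <tr><th>Wohnung</th><th>Zimmer</th><th>Status</th></tr>
  <tr><td>A.1</td><td>3.5</td><td>frei</td></tr>
  <tr><td>A.2</td><td>4.5</td><td>verkauft</td></tr>
</table>"

div_html <- "
<div class='unit'><span class='name'>A.1</span><span class='rooms'>3.5</span><span class='status'>frei</span></div>
<div class='unit'><span class='name'>A.2</span><span class='rooms'>4.5</span><span class='status'>verkauft</span></div>"

# <table>: generic extraction, header row and column types included
from_table <- read_html(table_html) %>% html_element("table") %>% html_table()

# <div>: site-specific selectors, one html_element() call per column
units <- read_html(div_html) %>% html_elements(".unit")
from_divs <- data.frame(
  Wohnung = units %>% html_element(".name") %>% html_text2(),
  Zimmer = as.numeric(units %>% html_element(".rooms") %>% html_text2()),
  Status = units %>% html_element(".status") %>% html_text2(),
  stringsAsFactors = FALSE
)
```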

SLIDE 26

Web Crawling in General

SLIDE 27

Large Crawling Projects

  • Performance, automation
  • Easy form submission
  • Extended session management capabilities
  • Quick deployment, testing

More software development
Less data analysis

SLIDE 28

Things to consider

  • Only collect as much as you really need
  • Do not overload sites
  • Respect data privacy
  • Consider social implications

https://www.shieldsquare.com/good-bots-and-bad-bots/
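
To avoid overloading sites, a crawler can enforce a minimum delay between requests to the same host. A minimal sketch; `make_polite_fetch()` and its 5-second default are illustrative assumptions, and `fetch` can be any function that takes a URL (for example a wrapper around `render_html()`).

```r
# Wraps a fetch function so that requests to the same host are at least
# `min_delay` seconds apart (hypothetical helper; 5 s is an arbitrary default).
make_polite_fetch <- function(fetch, min_delay = 5) {
  last_request <- list()  # host name -> time of the previous request
  function(url) {
    host <- sub("^[a-z]+://([^/]+).*$", "\\1", url)
    prev <- last_request[[host]]
    if (!is.null(prev)) {
      wait <- min_delay - as.numeric(difftime(Sys.time(), prev, units = "secs"))
      if (wait > 0) Sys.sleep(wait)
    }
    last_request[[host]] <<- Sys.time()
    fetch(url)
  }
}

# Usage with a dummy fetch and a short delay
polite <- make_polite_fetch(function(u) toupper(u), min_delay = 0.2)
t0 <- Sys.time()
res <- vapply(c("https://example.com/a", "https://example.com/b", "https://example.com/c"),
              polite, character(1))
elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```

Keying the delay by host rather than globally lets a crawler work through several sites in turn while still treating each one gently.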

SLIDE 29

How not to get crawled

  • Provide an API
  • Use a CAPTCHA (medium protection)
  • Limit requests via cookies (low protection)
  • Limit requests via IP (low protection)
  • Think like a crawler (medium to high protection)
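
On the server side, limiting requests via IP can be as simple as a fixed-window counter. A minimal sketch, not tied to any particular web server; `make_ip_limiter()` and its parameters are hypothetical, and the clock is injectable so the example runs instantly.

```r
# Fixed-window limiter: each IP may make at most `max_requests` requests per
# window of `window_secs` seconds (hypothetical helper).
make_ip_limiter <- function(max_requests = 60, window_secs = 60,
                            now = function() Sys.time()) {
  state <- new.env()  # ip -> list(window = window index, count = requests)
  function(ip) {
    w <- floor(as.numeric(now()) / window_secs)
    s <- state[[ip]]
    if (is.null(s) || s$window != w) s <- list(window = w, count = 0)
    s$count <- s$count + 1
    assign(ip, s, envir = state)
    s$count <= max_requests  # TRUE = serve, FALSE = reject (e.g. HTTP 429)
  }
}

# Usage with a fake clock
fake_time <- 0
limiter <- make_ip_limiter(max_requests = 2, window_secs = 60, now = function() fake_time)
first <- c(limiter("192.0.2.1"), limiter("192.0.2.1"), limiter("192.0.2.1"),
           limiter("198.51.100.7"))
fake_time <- 61  # next window
after <- limiter("192.0.2.1")
```

As the slide rates it, this is low protection: a crawler that spreads its requests across many IP addresses is not slowed down.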

SLIDE 30

Thank you for your attention!

Thomas Maier, +41 44 289 92 63, thomas.maier@datahouse.ch
Daniel Meister, daniel.meister@datahouse.ch
Datahouse AG, Bleicherweg, Zürich, www.datahouse.ch

September 16th, 2019