Toward software engineering in practice, Claire Le Goues, 15-214 (PowerPoint PPT Presentation)



slide-1
SLIDE 1

Toward software engineering in practice

Claire Le Goues 15-214 April 27, 2017

1

slide-2
SLIDE 2

Learning Goals

  • Broad scope of software engineering
  • Importance of nontechnical issues
  • Introduction to key challenges

2

slide-3
SLIDE 3

Software is Everywhere

Software is Important

(duh)

3

slide-4
SLIDE 4

4

slide-5
SLIDE 5

Gov’t example: Software is integral to DoD systems.

Quoting an Air Force lieutenant general, “The only thing you can do with an F-22 that does not require software is take a picture of it.”

5

Crouching Dragon, Hidden Software: Software in DoD Weapon Systems (Ferguson, IEEE Software, 2001)

slide-6
SLIDE 6

Failed Software Projects

  • SAGE (Semi-Automatic Ground Environment); started 1951, almost obsolete when finished in 1963; higher costs than the Manhattan Project
  • FBI Virtual Case File stopped in 2005 after 3 years and 170 M$
  • London Stock Exchange stopped the Taurus project in 1993 after 11 years when 13200% over budget

6

slide-7
SLIDE 7

7

slide-8
SLIDE 8

8

slide-9
SLIDE 9

9

slide-10
SLIDE 10

10

slide-11
SLIDE 11

“But we’re CMU students and we are really, really smart!”

11

slide-12
SLIDE 12

Software Engineering?

What is engineering? And how is it different from hacking/programming?

12

slide-13
SLIDE 13

1968 NATO Conference on Software Engineering

  • Provocative Title
  • Call for Action
  • “Software crisis”

13

slide-14
SLIDE 14

Envy of Engineers

  • Producing a car/bridge
    – Estimable costs and risks
    – Expected results
    – High quality
  • Separation between plan and production
  • Simulation before construction
  • Quality assurance through measurement
  • Potential for automation

14

slide-15
SLIDE 15

Software Engineering?

15

“The establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines.” [Bauer 1975, p. 524]

slide-16
SLIDE 16

16

slide-17
SLIDE 17

What happened with HealthCare.gov?

  • Poor team and process coordination.
  • Changing requirements.
  • Inadequate quality assurance infrastructure.
  • Architecture unsuited to the ultimate system load.

17

slide-18
SLIDE 18

Process

18

slide-19
SLIDE 19

How to develop software?

  1. Discuss the software that needs to be written
  2. Write some code
  3. Test the code to identify the defects
  4. Debug to find causes of defects
  5. Fix the defects
  6. If not done, return to step 1

19

slide-20
SLIDE 20

Software Process

“The set of activities and associated results that produce a software product”

What makes a good process?

20

Sommerville, Software Engineering, 8th ed.

slide-21
SLIDE 21

21

[Figure: percent of effort (0% to 100%) over time, from project beginning to project end]

slide-22
SLIDE 22

22

[Figure: percent of effort (0% to 100%) over time, from project beginning to project end. Areas: thrashing / rework; productive coding]

slide-23
SLIDE 23

23

[Figure: percent of effort (0% to 100%) over time, from project beginning to project end. Areas: thrashing / rework; productive coding; process]

Process: cost and time estimates, writing requirements, design, change management, quality assurance plan, development and integration plan

slide-24
SLIDE 24

24

[Figure: percent of effort (0% to 100%) over time, from project beginning to project end. Areas: productive coding; thrashing / rework; process]

slide-25
SLIDE 25

25

[Figure: percent of effort (0% to 100%) over time, from project beginning to project end. Areas: productive coding; process; thrashing / rework]

slide-26
SLIDE 26

Example process issues

  • Change Control: Mid-project informal agreement to changes suggested by customer or manager. Project scope expands 25-50%.
  • Quality Assurance: Late detection of requirements and design issues. Test-debug-reimplement cycle limits development of new features. Release with known defects.
  • Defect Tracking: Bug reports collected informally, forgotten.
  • System Integration: Integration of independently developed components at the very end of the project. Interfaces out of sync.
  • Source Code Control: Accidentally overwritten changes, lost work.
  • Scheduling: When project is behind, developers are asked weekly for new estimates.

26

slide-27
SLIDE 27

Process Costs

27

n(n − 1) / 2 communication links
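To get a feel for how fast the n(n − 1) / 2 formula grows, here is a minimal Python sketch. The function name `communication_links` and the sample team sizes are illustrative choices, not part of the slides.

```python
def communication_links(n: int) -> int:
    """Number of pairwise communication links in a team of n people."""
    if n < 0:
        raise ValueError("team size must be non-negative")
    # Every pair of people is one potential link: n choose 2.
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Sample team sizes chosen only for illustration.
    for size in (3, 7, 29):
        print(f"{size} people: {communication_links(size)} links")
```

Growing a team roughly tenfold multiplies the number of potential links by far more than ten, which is one way to read the coordination cost the slide points at.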

slide-28
SLIDE 28

Process Costs

28

slide-29
SLIDE 29

29

Large teams (29 people) create around six times as many defects as small teams (3 people) and obviously burn through a lot more money. Yet, the large team appears to produce about the same amount of output in only an average of 12 days’ less time. This is a truly astonishing finding, though it fits with my personal experience on projects over 35 years.

Phillip Armour, 2006, CACM 49:9

slide-30
SLIDE 30

Conway’s Law

30

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.” (Mel Conway, 1967)

“If you have four groups working on a compiler, you'll get a 4-pass compiler.”

slide-31
SLIDE 31

Congruence

[Figure: Module A, Module B, Module C]

31

slide-32
SLIDE 32

Microsoft's Small Team Practices

  • Vision statement and milestones (2-4 months), no formal spec
  • Feature selection, prioritized by market, assigned to milestones
  • Modular architecture
    – Allows small federated teams (Conway's law)
  • Small teams of overlapping functional specialists

32

Windows 95: 200 developers and testers, one of 250 products

slide-33
SLIDE 33

Microsoft's Small Team Practices

  • Feature Team
    – 3-8 developers (design, develop)
    – 3-8 testers (validation, verification, usability, market analysis)
    – 1 program manager (vision, schedule communication; leader, facilitator), working on several features
    – 1 product manager (marketing research, plan, betas)

33

slide-34
SLIDE 34

Microsoft's Small Team Practices

  • "Synchronize and stabilize"
  • For each milestone
    – 6-10 weeks feature development and continuous testing
      • frequent merges, daily builds
    – 2-5 weeks integration and testing (“zero-bug release”, external betas)
    – 2-5 weeks buffer

34

slide-35
SLIDE 35

Agile Practices (e.g., Scrum)

  • 7+/-2 team members, collocated
  • Self-managing
  • Scrum master (potentially shared among 2-3 teams)
  • Product owner / customer representative

35

slide-36
SLIDE 36

Planning

36

slide-37
SLIDE 37

Measuring Progress?

  • “I’m almost done with the X. Component A is almost fully implemented. Component B is finished except for the one stupid bug that sometimes crashes the server. I only need to find the one stupid bug, but that can probably be done in an afternoon?”

37

slide-38
SLIDE 38

Almost Done Problem

  • Last 10% of work -> 40% of time (or 20/80)
  • Make progress measurable
  • Avoid depending entirely on developer estimates

38

[Figure: % completed (90% and 100% marked) over time; curves for reported progress, planned, and actual]

slide-39
SLIDE 39

Measuring Progress?

  • Developer judgment: x% done
  • Lines of code?
  • Functionality?
  • Quality?

39

slide-40
SLIDE 40

Project Planning

40

[Flowchart: identify constraints (budget, personnel, deadlines) -> estimate project parameters -> define milestones -> create schedule -> begin activities -> check progress. Done? If yes, stop. If no: problem? If yes (for example, new feature requests), hold a technical review and decide whether to abort; otherwise re-estimate project parameters, refine the schedule, and renegotiate constraints.]

slide-41
SLIDE 41

Reasons for Missed Deadlines

  • Insufficient staff (illnesses, staff turnover, ...)
  • Insufficient qualification
  • Unanticipated difficulties
  • Unrealistic time estimates
  • Unanticipated dependencies
  • Changing requirements, additional requirements
  • Especially in student projects
    – Underestimated time for learning technologies
    – Uneven work distribution
    – Last-minute panic.

41

slide-42
SLIDE 42

Recognize Scheduling Issues Early

  • Monitoring and formal reporting necessary
    – Establish who, when, what
    – Compare planned/actual data
  • Measurable milestones
  • Outdated schedules are not a meaningful management mechanism

42

slide-43
SLIDE 43

Team productivity

43

  • Brooks's law: Adding people to a late software project makes it later.

slide-44
SLIDE 44

Estimating Effort

44

slide-45
SLIDE 45

45 π

slide-46
SLIDE 46

Task: Estimate Time

  • A: Java version of the Monopoly board game with Pittsburgh street names
    – (you)
  • B: Bank smartphone app
    – (you with team of 4 developers, one experienced with iPhone apps, one with background in security)
  • Estimate in 8h days (20 work days in a month, 220 per year)
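The units in the task (8-hour days, 20 work days per month, 220 per year) can be turned into a small conversion helper. This is only a sketch: the function name `describe_estimate` and the 110-day sample value are made up for illustration and are not an answer to the exercise.

```python
HOURS_PER_DAY = 8      # "8h days" from the slide
DAYS_PER_MONTH = 20    # work days per month, from the slide
DAYS_PER_YEAR = 220    # work days per year, from the slide


def describe_estimate(work_days: float) -> dict:
    """Express an estimate given in work days as hours, months, and years."""
    return {
        "hours": work_days * HOURS_PER_DAY,
        "months": work_days / DAYS_PER_MONTH,
        "years": work_days / DAYS_PER_YEAR,
    }


if __name__ == "__main__":
    # 110 days is an arbitrary sample value, not a recommended estimate.
    print(describe_estimate(110))
```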

46

slide-47
SLIDE 47

Anda, Bente C.D., Dag I.K. Sjøberg, and Audris Mockus. "Variability and reproducibility in software engineering: A study of four companies that developed the same system." IEEE Transactions on Software Engineering 35.3 (2009): 407-429.

47

slide-48
SLIDE 48

Development Process

48

slide-49
SLIDE 49

Risk and Uncertainty

49

slide-50
SLIDE 50

Innovative vs Routine Projects

  • Most software projects are innovative
    – Google, Amazon, eBay, Netflix
    – Vehicles and robotics
    – Language processing, graphics
  • Routine (now, not 10 years ago)
    – E-commerce websites?
    – Many control systems?
    – Routine gets automated -> innovation cycle

50

slide-51
SLIDE 51

Sources of Uncertainty

  • Unpredictable operating environment
    – Cybersecurity threats, device drivers
    – Unanticipated usage scenarios
  • Limited predictive power of models
    – Halting, abstract interpretation, testing
  • Bounded rationality of humans
    – Designers, developers
    – Customers, users

51

slide-52
SLIDE 52

Risk management

  • Key task of a project manager
  • Identify and evaluate risks early
  • If necessary, plan mitigation strategies
  • Document results of risk analysis in project plan
  • Project risks: scheduling and resources
    – e.g., staff illness/turnover
  • Product risks: quality and functionality of the product
    – e.g., used component too slow
  • Business risks:
    – e.g., competitor introduces similar product

52

slide-53
SLIDE 53

Software Architecture

53

slide-54
SLIDE 54

54

[Figure: Requirements; Miracle / genius developers; Implementation; Architecture]

slide-55
SLIDE 55

Software Architecture

"The software architecture of a computing system is the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both."

[Clements et al. 2010]

55

slide-56
SLIDE 56

Beyond functional correctness

  • Quality matters, e.g.,
    – Availability
    – Modifiability, portability
    – Performance, scalability
    – Security
    – Testability
    – Usability
    – Cost to build, cost to operate

56

slide-57
SLIDE 57

Design vs. Architecture

Design Questions

  • How do I add a menu item in Eclipse?
  • How can I make it easy to add menu items in Eclipse?
  • What lock protects this data?
  • How does Google rank pages?
  • What encoder should I use for secure communication?
  • What is the interface between objects?

Architectural Questions

  • How do I extend Eclipse with a plugin?
  • What threads exist and how do they coordinate?
  • How does Google scale to billions of hits per day?
  • Where should I put my firewalls?
  • What is the interface between subsystems?

57

slide-58
SLIDE 58

Case Study: Architecture Changes at Twitter

58

slide-59
SLIDE 59

59

slide-60
SLIDE 60

60

slide-61
SLIDE 61

Caching

61

slide-62
SLIDE 62

Decision to Rearchitect Twitter

"After that experience, we determined we needed to step back. We then determined we needed to re-architect the site to support the continued growth of Twitter and to keep it running smoothly."

62

slide-63
SLIDE 63

Redesign Goals

  • Improve median latency; lower outliers
  • Reduce number of machines 10x
  • Isolate failures
  • "We wanted cleaner boundaries with “related” logic being in one place"
    – encapsulation and modularity at the systems level (rather than at the class, module, or package level)
  • Quicker release of new features
    – "run small and empowered engineering teams that could make local decisions and ship user-facing changes, independent of other teams"

63

slide-64
SLIDE 64

JVM vs Ruby VM

  • Rails servers capable of 200-300 requests / sec / host
  • Experience with Scala on the JVM; level of trust
  • Rewrite for JVM allowed 10-20k requests / sec / host
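As a back-of-the-envelope check on what these per-host rates mean for machine counts, the sketch below divides a load by a per-host throughput and rounds up. The 100,000 requests/sec load is a hypothetical figure chosen for illustration (the slides do not give a total request rate), and `hosts_needed` is an illustrative helper name.

```python
import math


def hosts_needed(load_rps: float, per_host_rps: float) -> int:
    """Minimum number of hosts to serve load_rps at per_host_rps each."""
    if per_host_rps <= 0:
        raise ValueError("per-host throughput must be positive")
    return math.ceil(load_rps / per_host_rps)


if __name__ == "__main__":
    hypothetical_load = 100_000  # requests/sec; an assumption, not from the slides
    for label, rate in (("Rails, 300 req/s/host", 300),
                        ("JVM, 10,000 req/s/host", 10_000)):
        print(label, "->", hosts_needed(hypothetical_load, rate), "hosts")
```

The estimate ignores headroom, redundancy, and uneven load, so real deployments would need more hosts than this lower bound.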

64

slide-65
SLIDE 65

Programming Model

  • Ruby model: Concurrency at process level; request queued to be handled by one process
  • Twitter response aggregated from several services – additive response times
  • "As we started to decompose the system into services, each team took slightly different approaches. For example, the failure semantics from clients to services didn’t interact well: we had no consistent back-pressure mechanism for servers to signal back to clients and we experienced “thundering herds” from clients aggressively retrying latent services."
  • Goal: Single and uniform way of thinking about concurrency
    – Implemented in a library for RPC (Finagle), connection pooling, failover strategies and load balancing
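One common client-side way to soften the "thundering herd" effect described in the quote is exponential backoff with random jitter. The sketch below illustrates that general technique only; it is not Finagle's API or Twitter's actual mechanism, and the names `backoff_delays` and `call_with_retries` and the default parameters are assumptions made for this example.

```python
import random
import time


def backoff_delays(max_retries, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry.

    Delay i is drawn uniformly from [0, min(cap, base * 2**i)], so the
    retry window grows exponentially while jitter spreads clients apart.
    """
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(max_retries)]


def call_with_retries(operation, max_retries=5, sleep=time.sleep, rng=None):
    """Call operation(); on ConnectionError wait a jittered delay and retry."""
    for delay in backoff_delays(max_retries, rng=rng):
        try:
            return operation()
        except ConnectionError:
            sleep(delay)
    return operation()  # last attempt: any error propagates to the caller
```

Backoff only reduces pressure from the client side. A server-driven back-pressure signal, which the quote says was missing, would need to be part of the RPC protocol itself.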

65

slide-66
SLIDE 66

Independent Systems

  • "In our monolithic world, we either needed experts who understood the entire codebase or clear owners at the module or class level. Sadly, the codebase was getting too large to have global experts and, in practice, having clear owners at the module or class level wasn’t working. Our codebase was becoming harder to maintain, and teams constantly spent time going on “archeology digs” to understand certain functionality. Or we’d organize “whale hunting expeditions” to try to understand large scale failures that occurred."
  • From monolithic system to multiple services
    – Agree on RPC interfaces, develop system internals independently
    – Self-contained teams

66

slide-67
SLIDE 67

Storage

  • Single-master MySQL database bottleneck despite more modular code
  • Temporal clustering
    – Short-term solution
    – Skewed load balance
    – One machine + replications every 3 weeks
  • Move to distributed database (Gizzard on MySQL) with "roughly sortable" ids
  • Stability over features – using older MySQL version
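A "roughly sortable" id can be built by putting a timestamp in the high bits, so ids from different machines sort approximately by creation time without central coordination. The sketch below is one plausible layout, assumed for illustration (millisecond timestamp, then a 10-bit worker id, then a 12-bit sequence). The class name `RoughlySortableIds` and the bit widths are hypothetical and are not taken from the slides.

```python
import threading
import time


class RoughlySortableIds:
    """Hypothetical id generator: ms timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._clock = clock          # injectable so the sketch can be tested
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = max(self._clock(), self._last_ms)  # never go backwards
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    now += 1  # sequence exhausted: borrow the next millisecond
            else:
                self._sequence = 0
            self._last_ms = now
            return (now << 22) | (self._worker_id << 12) | self._sequence
```

Ids from one worker are strictly increasing. Ids from different workers are ordered only as well as their clocks agree, which is why the ordering is "rough".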

67

slide-68
SLIDE 68

Data-Driven Decisions

  • Many small independent services, number growing
  • Own dynamic analysis tool on top of RPC framework
  • Framework to configure large numbers of machines
    – Including facility to expose a feature to only a subset of users
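Exposing a feature to only part of the user base is often done with a deterministic percentage rollout. The sketch below shows that general idea and is not Twitter's framework: the function `is_enabled`, the hashing scheme, and the 100-bucket granularity are assumptions made for this example.

```python
import hashlib


def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Decide deterministically whether a user sees a feature.

    Each (feature, user) pair hashes to a stable bucket in 0..99; the
    feature is on for users whose bucket is below rollout_percent.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


if __name__ == "__main__":
    # Hypothetical feature name and user ids, for illustration only.
    users = [f"user{i}" for i in range(1000)]
    enabled = sum(is_enabled("new-timeline", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the feature")
```

Because the bucket depends only on the feature name and user id, a user who has the feature at 10% keeps it when the rollout is raised to 50%.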

68

slide-69
SLIDE 69

69

slide-70
SLIDE 70

On Saturday, August 3 in Japan, people watched an airing of Castle in the Sky, and at one moment they took to Twitter so much that we hit a one-second peak of 143,199 Tweets per second.

70

slide-71
SLIDE 71

Outcome: Rearchitecting Twitter

"This re-architecture has not only made the service more resilient when traffic spikes to record highs, but also provides a more flexible platform on which to build more features faster, including synchronizing direct messages across devices, Twitter cards that allow Tweets to become richer and contain more content, and a rich search experience that includes stories and users."

71

slide-72
SLIDE 72

Key Insights: Twitter Case Study

  • Architectural decisions affect entire systems, not only individual modules
  • Abstract, different abstractions for different scenarios
  • Reason about quality attributes early
  • Make architectural decisions explicit

72

slide-73
SLIDE 73

Was the original architect wrong?

73

slide-74
SLIDE 74

How can I test my system with respect to desired quality attributes?

74

slide-75
SLIDE 75

Example: Scalability

75

Which QA strategy is suitable?

slide-76
SLIDE 76

Example: SQL Injection Attacks

76

http://xkcd.com/327/

Which QA strategy is suitable?
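For readers who have not seen the attack, the sketch below contrasts a query built by string concatenation with a parameterized query, using Python's standard `sqlite3` module and an in-memory database. The `students` table, the function names, and the malicious input are made up for illustration.

```python
import sqlite3


def find_student_unsafe(conn, name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = "SELECT name FROM students WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_student_safe(conn, name):
    # Parameterized: the driver passes name as data, never as SQL.
    return conn.execute(
        "SELECT name FROM students WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])

# Input that turns the WHERE clause into a condition that is always true.
malicious = "x' OR '1'='1"
```

With the malicious input, the unsafe version matches every row, while the safe version looks for a student literally named `x' OR '1'='1` and finds none.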

slide-77
SLIDE 77

Example: Usability

77

Which QA strategy is suitable?

slide-78
SLIDE 78

78

slide-79
SLIDE 79

QA Tradeoffs

  • Understand limitations of QA approaches
    – e.g., testing vs static analysis, formal verification vs inspection, …
  • Mix and match techniques
  • Different techniques for different qualities

  • …When am I done?

79

slide-80
SLIDE 80

Quick aside on bug fixing and the tricky relationship between design, intent, implementation, and your cranky users…

80

slide-81
SLIDE 81

Race conditions

  • Races can occur when:
    – Multiple threads of control access shared data
    – Data gets corrupted when internal integrity assumptions are violated.
  • How we protect against races
    – Use “lock” objects that enable access by one thread at a time
      • E.g., event dispatch
      • A language feature in Java, Ada95, etc.
    – Follow a thread discipline in which only one thread can access critical data (common in GUI APIs, e.g., graphical toolkit redraw)
  • Issue: Basically the hardest bugs to find, fix, and protect against.
    – Why?
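The "lock object" idea above can be sketched in a few lines. The example is in Python rather than Java, and the `Counter` class and `run_workers` helper are illustrative names. The point is only that the read-modify-write on shared data happens while holding a lock.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may update _value
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, threads=4, increments=10_000):
    """Start several threads that all increment the same counter."""
    def work():
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value()
```

Without the lock, two threads could both read the same old value and one update would be lost, and whether that happens depends on thread scheduling. That is part of why such bugs are hard to reproduce.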

81

slide-82
SLIDE 82

java.io.BufferedInputStream

  • Buffering wrapper for unbuffered stream input: read, close, reset, skip, mark, etc.
  • JDK < 1.2: Race condition between methods read and close: interleaved execution could cause read to throw NullPointerException
    – But not always; concurrency -> non-deterministic!
  • JDK 1.2 fixes by synchronizing the methods, preventing close and read from interleaving.

82

(Aaron Greenhouse)
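The shape of the read/close race can be shown with a small Python analog. This is not the JDK source: `SketchBufferedStream` is a hypothetical class written for this illustration, and its lock plays the role that synchronized methods play in the JDK 1.2 fix.

```python
import threading


class SketchBufferedStream:
    """Hypothetical analog of a buffered stream; not the JDK class."""

    def __init__(self, data: bytes):
        self._buf = data
        self._pos = 0
        self._lock = threading.Lock()

    def read(self) -> int:
        """Return the next byte, or -1 at end of data."""
        # Without the lock, close() could clear _buf between the check
        # and the indexing below, the analog of the NullPointerException.
        with self._lock:
            if self._buf is None:
                raise ValueError("stream closed")
            if self._pos >= len(self._buf):
                return -1
            byte = self._buf[self._pos]
            self._pos += 1
            return byte

    def close(self) -> None:
        with self._lock:
            self._buf = None  # release the buffer
```

In this sketch `read` never blocks while holding the lock. A real stream whose `read` blocks on I/O under the same lock would make a concurrent `close` wait until the read returns.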

slide-83
SLIDE 83

Reaction to bug fix

“This really sucks. Now just to convert to [JDK 1.2] I’ve got to rewrite code that has worked since JDK 1.02… It’s pretty obvious that syncing close would break things.”

Comment in Bug ID #4225348: “Attempt to close while reading causes deadlock”

83

slide-84
SLIDE 84

Why was everyone so mad?

  • Java socket programming idiom that requires the ability to close mid-read:
    – “Hung” socket stream: Use separate thread to close and interrupt “hung” read or write
  • In other words: clients assumed read and close can interleave!
    – Bug fix prevents interleaving.
    – Intent inferred: is it correct?
  • Design choices: What is/was the design intent?
    – Interleaving intended: Fix race while allowing interleaving
    – Interleaving not intended: Provide alternative idiom to get the same effect.
  • What should the Java designers have done? What’s a good solution to this problem? Whose fault was it?

84

slide-85
SLIDE 85

Upshot

  • Fix was undone in JDK 1.3
    – Re-enabled socket idiom.
    – Compromises safety of the class by re-enabling the race condition
  • BufferedInputStream was fixed to both prevent the race and allow socket idiom for JDK 1.5
  • Issue #1 – Race condition in deployed production library code
  • Issue #2 – Lack of documentation of design intent with respect to concurrency.
  • Moral: bugs are hard, and correctness depends on context and user expectations.

85

slide-86
SLIDE 86

Summary: take 15-313!

  • Software Engineering in practice requires consideration of numerous issues, technical and social, above the level of individual class design/implementation.
  • Do you think this is interesting? 15-313, Foundations of Software Engineering, is offered in the Fall.
  • And consider the undergraduate SE minor!

86