SLIDE 1 FastFunction: Replacing a herd
- f lemmings with a cheetah
A Ruby framework for interaction with PostgreSQL databases
Henrietta Dombrovskaya, Srivathsava Rangarajan, Jonathan Marks Enova Chicago, USA
SLIDE 2
This presentation is not about databases! We are online lenders, and all we care about is
$$MONEY$$
Actually… It’s all about application performance! What is this presentation about
SLIDE 3
Why we care about performance?
Time = Money! The problem
SLIDE 4
Nobody likes to wait!
1 sec page load slowdown => $1.6 billion lost sales/year Slowing search results by 0.4 sec – loss of 8 million searches per day
Money by the numbers
SLIDE 5
- 50% visitors abandon the site, which is not loaded within 3 sec
- 79% visitors will never return again
Of course, we sell money, and that makes a difference,
but…
- Will you wait for 30 sec for the registration to be completed?
- Especially when there are competitors around?
How long they can wait?
SLIDE 6
That’s what we have:
Our US master PG database runs on 80 thread processors 2.4GHz 512 Gb RAM – almost completely used by disk cache 1066MHz (responses from RAM are 0.9 ns) I/O 4Gb/sec with avg response time 3ms I/O utilization: 40%
Even with the best hardware available:
we can make it only twice faster Current cost: 20K (commodity) Next – 100K – somewhat faster (non-commodity) Next - 1,000K - twice faster (mainframe)
Hardware is not a problem!
SLIDE 7
Too many database calls per one action! Why there are so many? ORM (Active Record) results in ORIM
What is the reason for slowness?
Action Avg # db calls Max # db calls See amortization schedule 350 2,500 See my balances 84 2,500 See my payments 170 1,800 Perform application search 70 1,200 See account summary 80 790
SLIDE 8
We introduced the Logic Split methodology about three years ago as a solution to our performance problems (EDBT 2014):
Logic Split methodology
SLIDE 9
Logic Split Steps
üDisassemble üIdentify data retrieval üConstruct a single query üExecute üUse retrieved data in other steps
SLIDE 10 We’ve achieved amazing performance
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Customer Summary Loan Summary Loan Payments Installments
Avg DB time(sec)
Old Avg Time (sec) New Avg Time (sec)
SLIDE 11
But… look at the code!
SLIDE 12
This is way too complicated! Besides, it’s against the OO concept!
So the app developers say…
SLIDE 13
FastFunction
A Better Fit for Logic Split and Performance
ActiveRecord FastFunction General purpose Optimized by task Expressive query Single-purpose Builds SQL queries Calls stored procedures Familiar to developers Looks like ActiveRecord
SLIDE 14 ActiveRecord Generates SQL
14
Lemming.all SELECT * FROM lemmings
SLIDE 15 Adding Conditionals
15
Lemming.find_by(name: “Bob”, tag: 5) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5
SLIDE 16 Adding Complexity
16
Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age
SLIDE 17 SELECT * FROM lemmings WHERE name = ‘Bob’ AND tag = 5 ORDER BY age Execute the Query
17
Lemming.find_by(name: “Bob”, tag: 5) .order_by(:age) Generated Query Application Database Query Result
SLIDE 18 SELECT result.* FROM cheetahs_by_name_and_tag(?, ?) AS result FastFunction Minimal SQL
18
CheetahsByNameAndTag
SLIDE 19 SELECT result.* FROM cheetahs_by_name_and_tag(‘Bob’, 5) AS result
Database Knows Better
19
CheetahsByNameAndTag.execute(‘Bob’, 5) Templated Query Application Database Query Result
SLIDE 20 Alike Where It Counts…
20
…To the Application Programmer
SLIDE 21 ActiveRecord
class Peanut < ActiveRecord::Base # Convention identifies base table as “peanuts” end
Create a Model Class
21
FastFunction
class PopularPeanutsByBrand < FastFunction function = ‘popular_peanuts_by_brand(?)’ end
# Convention can give function name too, but not arg list
SLIDE 22 ActiveRecord
peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
Invoke a Query Operation
22
FastFunction
peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
SLIDE 23 ActiveRecord
peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
Receive a List of Results…
23
FastFunction
peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
SLIDE 24 ActiveRecord
peanuts = Peanuts.find_by(brand: ‘Planters’).order_by(:rating) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
…As Native Objects
24
FastFunction
peanuts = PopularPeanutsByBrand.execute(‘Planters’) most_popular = peanuts.first puts “Best Planters peanuts: “ + most_popular.flavor
SLIDE 25 …With Discovered Attributes
25
ActiveRecord
CREATE TABLE peanuts ( brand varchar(50), flavor varchar(50), salty boolean );
FastFunction
CREATE TYPE peanut ( brand varchar(50), flavor varchar(50), salty boolean ); peanut.brand peanut.flavor peanut.salty
Ruby Object
SLIDE 26 In the “fast function”!
CREATE FUNCTION popular_peanuts_by_brand(brand VARCHAR) RETURNS SETOF peanut AS $$ SELECT (brand, flavor, salty)::peanut FROM black_magic INNER JOIN dark_wizardry WHERE db_knows_better_than_app $$ LANGUAGE ‘sql’;
So Where Is The SQL?
26
SLIDE 27 FastFunction …
27
- Data retrieval only
- Provide familiar object-oriented
data access pattern
- Facilitate custom access strategy
- Need SQL type and expertise
- Run queries AR can’t write
- Allow results not stored in any
- ne table
- Encapsulate queries better
maintained in database
Does
- Replace ActiveRecord
- Interact directly with
ActiveRecord
- Insert, Update, nor Delete
- Implement associations
- utside the stored procedure
- Need a base table
- Perform operations outside
the stored procedure
Doesn’t
SLIDE 28 Case Study: Account Presenter
28
Account Government Identifiers Phone Numbers Addresses Spouse details account.home_address account.work_address account.alt_address account.state_id account.driving_license account.passport
SLIDE 29
Grouping Access Simply put, the problem is one of grouping access to the database and tables.
class Account def home_address; end def work_address; end def all_addresses [home_address] + [work_address] end def first_name; end def language; end end
SLIDE 30
Design Constraints # tables: 70-80 # fields: 230+
1) optimized, robust join strategy geqo collapse_limits 2) readable and maintainable UDF modularized performant 3) dynamically mapped return type application seamlessly maps updated types
SLIDE 31
Case Study: Data Flow
Account Addresses Personal Spouse home_address first_name language work home language first_name addresses people
SLIDE 32
Common Table Expressions (CTEs) Similar to derived table.
WITH address AS ( SELECT $, * FROM addresses ... WHERE account_id = $ GROUP BY type ), spouse_details AS ( SELECT $, * FROM spouses ... WHERE account_id = $ ), account_details AS ( SELECT $, * FROM accounts ... WHERE account_id = $ ) ...
SLIDE 33 Testing Methodology Split into correctness and performance testing
- A/B test on 30,000 randomly chosen accounts
- On same production environment
- Spread across multiple parts of multiple days
- Warm and cold cache
- Split load into 4 groups:
20 40 60 80 100 120 1 2 3 4 query count Query Group
Query Groups
average_query_count average_normalized_query_count
SLIDE 34 Results-1 Current vs. FastFunction-optimized fetch times
50 100 150 200 250 300 1 2 3 4 Time(ms) Query Group
Current, Optimized data fetch vs. Time
50 100 150 200 250 300 1 2 3 4 Time(ms) Query Group
Current, Optimized cached data fetch vs. Time current_fetch_time(avg)
SLIDE 35 Results-2 FastFunction data fetch breakdown
10 20 30 40 50 60
- ptimized_uncached_time_split
- ptimized_cached_time_split
Time(ms)
Optimized (uncached, cached) data fetch breakdown
average_data_pruning_time average_misc_query_time average_hash_structuring_overhead average_mapper_marshalling_overhead average_mapper_call_setup_overhead average_function_call_time
SLIDE 36 The ORIM is not purely technical:
- Ireland et al: conceptual and psychological aspects
- Agile Data essay: cultural impedance mismatch
ORMs which allow to embed SQL:
- Hibernate
- LINQ
- SQLAlchemy
Identify delinquent code patterns:
Holistic optimization:
Related work
SLIDE 37
- Spreading awareness
- Adding new features, making the FastFunction
looking more like another ORM
- Creating a whole database API layer
Future work