NEBC Database Course 2008 Implementing a Relational Database Tim - - PowerPoint PPT Presentation

nebc database course 2008 implementing a relational
SMART_READER_LITE
LIVE PREVIEW

NEBC Database Course 2008 Implementing a Relational Database Tim - - PowerPoint PPT Presentation

NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema


slide-1
SLIDE 1

NEBC Database Course 2008 Implementing a Relational Database

Tim Booth : tbooth@ceh.ac.uk

slide-2
SLIDE 2

Implementing a relational database

establish requirements data analysis Conceptual Data Model database design Logical Schema Specification implement

schema and database

Data Requirements

slide-3
SLIDE 3

Our design so far:

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence

slide-4
SLIDE 4

Implementing our database

  • Most of hard work is already done
  • Create relations using SQL
  • Define the data types for our columns
  • Define primary and foreign keys
  • Add constraints
  • Add any appropriate default values
slide-5
SLIDE 5

Our design so far:

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName

slide-6
SLIDE 6

Create Table

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename , datatype , url );

slide-7
SLIDE 7

Add data types

Reminder:

  • Numerical
  • integer,float,numeric
  • String/Text
  • varchar,text
  • Date/Time
  • timestamp,date
  • Boolean
slide-8
SLIDE 8

Add data types

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) );

slide-9
SLIDE 9

Primary Keys

relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) );

slide-10
SLIDE 10

Foreign Keys

relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(50), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism );

slide-11
SLIDE 11

Foreign Keys

relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) );

slide-12
SLIDE 12

feature table

CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) );

slide-13
SLIDE 13
  • rganism table

CREATE TABLE organism (

  • rganismnumber integer

species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

slide-14
SLIDE 14

Constraints

  • Constraints restrict the values that can be inserted or

updated in columns

  • Types of constraints
  • NOT NULL
  • UNIQUE
  • Simply add to column definition

url varchar(100) NOT NULL

  • r

url varchar(100) UNIQUE

  • NOT NULL and UNIQUE implicit on primary key
slide-15
SLIDE 15

Constraints

  • CHECK constraint

numberoflegs integer CHECK (numberoflegs>2)

slide-16
SLIDE 16

publicdatabase table

CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) );

slide-17
SLIDE 17

Constraints

  • To keep links between tables working you need to

preserve the matching values – referential integrity

  • automatically set up when you declare the primary and

foreign keys

  • This will prevent you from deleting a record with a

primary key before you have deleted all the child foreign key records

slide-18
SLIDE 18

Constraints

  • Example from 'BigHit' database
slide-19
SLIDE 19

Constraints

slide-20
SLIDE 20
  • rganism table

CREATE TABLE organism (

  • rganismnumber integer,

species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

How shall we create the unique primary key values for

  • rganismnumber?
slide-21
SLIDE 21

Sequences

  • Sequence is a database object in PostgreSQL which

gives you an automatically incrementing numeric value (equivalent to 'autonumber' in Access)

CREATE SEQUENCE my_seq (can specify increment,min and max) SELECT NEXTVAL('my_seq')

slide-22
SLIDE 22

Default values

  • Still don't want to have to select the value each time
  • Can set a default value for column which is

automatically filled in every time a record is inserted

CREATE TABLE organism (

  • rganismnumber integer DEFAULT NEXTVAL('my_seq'),

species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );

slide-23
SLIDE 23

Create your database

  • To create your database run your SQL table and other
  • bject creation statements in a single script
  • Example - demodatabase.sql
  • Be sure create tables in the right order
  • Can't create table that refers to a primary key in a

table that doesn't exist yet

  • You also need data...
slide-24
SLIDE 24

Populate your database

  • Insert data using INSERT sql statements

INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit');

  • Default values will inserted automatically

CREATE TABLE organism (

  • rganismnumber integer DEFAULT NEXTVAL('my_seq'),

...

slide-25
SLIDE 25

Populate your database

  • Be sure to insert data in correct order
  • Don't try and insert a foreign key value when the

primary key value hasn't been inserted yet

  • Run the demodatabase.sql script
slide-26
SLIDE 26

Querying your database

  • Now that your database is set up and data has been

inserted we can query it

slide-27
SLIDE 27

Rabbit OXPKA

Rabbit phosphorylase

Nucleotide EMBL

TRIC_RABIT

Troponin I

protein Swissprot

rabbit PHS2_RABIT

Glycogen phosphorylase

Proten Swissprot

rabit 1ABB

Glycogen Phosphorylase

protein_structure

PDB

Rabbit Q8MJF7

pol protein

protein TrEMBL

Rabbit CK829726

Nucleotide dbEST

rabbit OCPHOS2

rabbit muscle phosphorylase mrna

Nucleotide EMBL

Trt3_rabit

Troponin T

Protein UniProt

Rabbit KPB1_Rabit

Phosphorylase B kinase alpha regulatory chain

Protein Swissprot

Organism Accession Number ID Sequence Type Database Name

slide-28
SLIDE 28

Querying your database

slide-29
SLIDE 29

Querying your database

slide-30
SLIDE 30

Querying your database

slide-31
SLIDE 31

Querying your database

  • What have we gained?
  • No data redundancy
  • Data is consistent
  • Enforced quality control – no missing data
  • Only have to change data once
  • Flexibility to run a variety of queries
slide-32
SLIDE 32

Views

  • Views are queries that are saved in the database as
  • bjects
  • Appear much like a table which can be queried in the

same way

  • Good if underlying query is very complex

CREATE VIEW viewname AS query

slide-33
SLIDE 33

Views

slide-34
SLIDE 34

Views

slide-35
SLIDE 35

Indexes

  • Searching data by scanning is slow
  • Indexes make this searching faster
  • Implicit indexes are set up for primary keys as these are

used a lot for searching data

slide-36
SLIDE 36

Indexes

  • An index can be created on any column

CREATE INDEX orgname_idx on organism (commonname)

  • An index is helpful on a column that is regularly

searched on (i.e. Used in the WHERE clause)

slide-37
SLIDE 37

Index worked example

  • There are more movies in the file

demodata/moremovies.csv

  • These are already loaded into the database table

'demodata.moremovies'

  • in PGAdmin3:
  • INSERT INTO bighit.movie

(SELECT * FROM demodata.moremovies);

  • SELECT * FROM movie where rating = 'U';
slide-38
SLIDE 38

Index worked example

  • Explain the query
  • Now make an index:
  • CREATE INDEX myindex ON movie (rating);
  • Now explain the original query again
  • This works for very complex queries!
slide-39
SLIDE 39

Complex Query Analysis

slide-40
SLIDE 40

The “MART” Strategy

  • Normalisation is the process of removing data

redundancy from your database design

  • But it adds complexity
  • Views can make querying simpler
  • Indexes can make querying faster
  • But... Sometimes this is not enough. Maintaining a

summary table for quick querying is known as 'denormalisation'

  • Many large databases (eg. EnsEMBL) resort to this
slide-41
SLIDE 41

More features...

  • stored procedures
  • triggers
  • cascading updates
  • custom types
  • custom functions
  • extension modules
  • load balancing
  • replication
  • ...