NEBC Database Course 2008 Implementing a Relational Database Tim - - PowerPoint PPT Presentation
NEBC Database Course 2008 Implementing a Relational Database Tim - - PowerPoint PPT Presentation
NEBC Database Course 2008 Implementing a Relational Database Tim Booth : tbooth@ceh.ac.uk Implementing a relational database establish requirements Data Requirements data analysis Conceptual Data Model database design Logical Schema
Implementing a relational database
establish requirements data analysis Conceptual Data Model database design Logical Schema Specification implement
schema and database
Data Requirements
Our design so far:
relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism relation Organism OrganismNo: Integer Species: String Strain: String GenomeSeq: Boolean CommonName: String primary key: OrganismNo relation Feature FeatID: String Name: String primary key: FeatID foreign key: SourceSequence references Sequence
Implementing our database
- Most of hard work is already done
- Create relations using SQL
- Define the data types for our columns
- Define primary and foreign keys
- Add constraints
- Add any appropriate default values
Our design so far:
relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName
Create Table
relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename , datatype , url );
Add data types
Reminder:
- Numerical
- integer,float,numeric
- String/Text
- varchar,text
- Date/Time
- timestamp,date
- Boolean
Add data types
relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) );
Primary Keys
relation PublicDatabase DatabaseName: String DataType: String URL: String primary key: DatabaseName CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200), primary key (databasename) );
Foreign Keys
relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(50), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase, foreign key (sourceorganism) references organism );
Foreign Keys
relation Sequence AccessionNumber: String ID: String primary key: AccessionNumber foreign key: SourceDatabase references PublicDatabase foreign key: SourceOrganism references Organism CREATE TABLE sequence ( accessionnumber varchar(50), id varchar(50), sourcedatabase varchar(30), sourceorganism integer, primary key (accessionnumber), foreign key (sourcedatabase) references publicdatabase(databasename), foreign key (sourceorganism) references organism(organismnumber) );
feature table
CREATE TABLE feature ( featid varchar(50), name varchar(100), sourcesequence varchar(50), primary key (featid), foreign key (sourcesequence) references sequence(accessionnumber) );
- rganism table
CREATE TABLE organism (
- rganismnumber integer
species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );
Constraints
- Constraints restrict the values that can be inserted or
updated in columns
- Types of constraints
- NOT NULL
- UNIQUE
- Simply add to column definition
url varchar(100) NOT NULL
- r
url varchar(100) UNIQUE
- NOT NULL and UNIQUE implicit on primary key
Constraints
- CHECK constraint
numberoflegs integer CHECK (numberoflegs>2)
publicdatabase table
CREATE TABLE publicdatabase ( databasename varchar(50), datatype varchar(20), url varchar(200) UNIQUE, primary key (databasename) );
Constraints
- To keep links between tables working you need to
preserve the matching values – referential integrity
- automatically set up when you declare the primary and
foreign keys
- This will prevent you from deleting a record with a
primary key before you have deleted all the child foreign key records
Constraints
- Example from 'BigHit' database
Constraints
- rganism table
CREATE TABLE organism (
- rganismnumber integer,
species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );
How shall we create the unique primary key values for
- rganismnumber?
Sequences
- Sequence is a database object in PostgreSQL which
gives you an automatically incrementing numeric value (equivalent to 'autonumber' in Access)
CREATE SEQUENCE my_seq (can specify increment,min and max) SELECT NEXTVAL('my_seq')
Default values
- Still don't want to have to select the value each time
- Can set a default value for column which is
automatically filled in every time a record is inserted
CREATE TABLE organism (
- rganismnumber integer DEFAULT NEXTVAL('my_seq'),
species varchar(100), strain varchar(100), genomeseq boolean, commonname varchar(100), primary key (organismnumber) );
Create your database
- To create your database run your SQL table and other
- bject creation statements in a single script
- Example - demodatabase.sql
- Be sure create tables in the right order
- Can't create table that refers to a primary key in a
table that doesn't exist yet
- You also need data...
Populate your database
- Insert data using INSERT sql statements
INSERT INTO organism (species,strain,genomeseq,commonname) VALUES ('Oryctolagus cuniculus',NULL,'false','rabbit');
- Default values will inserted automatically
CREATE TABLE organism (
- rganismnumber integer DEFAULT NEXTVAL('my_seq'),
...
Populate your database
- Be sure to insert data in correct order
- Don't try and insert a foreign key value when the
primary key value hasn't been inserted yet
- Run the demodatabase.sql script
Querying your database
- Now that your database is set up and data has been
inserted we can query it
Rabbit OXPKA
Rabbit phosphorylase
Nucleotide EMBL
TRIC_RABIT
Troponin I
protein Swissprot
rabbit PHS2_RABIT
Glycogen phosphorylase
Proten Swissprot
rabit 1ABB
Glycogen Phosphorylase
protein_structure
PDB
Rabbit Q8MJF7
pol protein
protein TrEMBL
Rabbit CK829726
Nucleotide dbEST
rabbit OCPHOS2
rabbit muscle phosphorylase mrna
Nucleotide EMBL
Trt3_rabit
Troponin T
Protein UniProt
Rabbit KPB1_Rabit
Phosphorylase B kinase alpha regulatory chain
Protein Swissprot
Organism Accession Number ID Sequence Type Database Name
Querying your database
Querying your database
Querying your database
Querying your database
- What have we gained?
- No data redundancy
- Data is consistent
- Enforced quality control – no missing data
- Only have to change data once
- Flexibility to run a variety of queries
Views
- Views are queries that are saved in the database as
- bjects
- Appear much like a table which can be queried in the
same way
- Good if underlying query is very complex
CREATE VIEW viewname AS query
Views
Views
Indexes
- Searching data by scanning is slow
- Indexes make this searching faster
- Implicit indexes are set up for primary keys as these are
used a lot for searching data
Indexes
- An index can be created on any column
CREATE INDEX orgname_idx on organism (commonname)
- An index is helpful on a column that is regularly
searched on (i.e. Used in the WHERE clause)
Index worked example
- There are more movies in the file
demodata/moremovies.csv
- These are already loaded into the database table
'demodata.moremovies'
- in PGAdmin3:
- INSERT INTO bighit.movie
(SELECT * FROM demodata.moremovies);
- SELECT * FROM movie where rating = 'U';
Index worked example
- Explain the query
- Now make an index:
- CREATE INDEX myindex ON movie (rating);
- Now explain the original query again
- This works for very complex queries!
Complex Query Analysis
The “MART” Strategy
- Normalisation is the process of removing data
redundancy from your database design
- But it adds complexity
- Views can make querying simpler
- Indexes can make querying faster
- But... Sometimes this is not enough. Maintaining a
summary table for quick querying is known as 'denormalisation'
- Many large databases (eg. EnsEMBL) resort to this
More features...
- stored procedures
- triggers
- cascading updates
- custom types
- custom functions
- extension modules
- load balancing
- replication
- ...