Database/Fundamentals

Databases and SQL for Data Science with Python (2/2) (from Coursera)

metamong 2022. 4. 19.

* 저번 Database & SQl posting에 이어서..! 내용 계속 진행🎉

 

Databases and SQL for Data Science with Python (1/2) (from Coursera)

1) Getting Started with SQL Q1) What is SQL? (Structured English Query Language) = a language used for relational databases to query or get data out of a database - 더 자세한 설명은 하단 포스팅 참..

sh-avid-learner.tistory.com

4) Accessing Databases using Python

 

 

SQL API consists of library function calls as an application programming interface

 

 

* DB-API = Python's standard API for accessing relational databases

 

- easy to implement & understand

- API has been 

 

 

 

 

 

 

 

* Using SQLite 3 with Jupyter Notebook (fundamentals)

 

- connecting the db and making a cursor

 

import sqlite3
conn = sqlite3.connect('filename.sqlite')
cur = conn.cursor()

 

- CREATE TABLE -

(before creating a table make sure use a DROP statement if it already exists)

 

cur.execute('DROP TABLE INSTRUCTOR')
cur.execute('CREATE TABLE INSTRUCTOR(id INTEGER PRIMARY KEY NOT NULL, FNAME VARCHAR(20), LNAME VARCHAR(20), CITY VARCHAR(20), CCODE CHAR(2))')
conn.commit()

 

 

- INSERT INTO ~ VALUES -

 

cur.execute('INSERT INTO INSTRUCTOR VALUES (1, "Rav", "Ahuja", "TORONTO", "CA")')
cur.execute('INSERT INTO INSTRUCTOR VALUES (2, "Raul", "Chong", "Markham", "CA"), (3, "Hima", "Vasudevan", "Chicago", "US")')
conn.commit()

 

 

- Query Data -

 

cur.execute('SELECT * FROM INSTRUCTOR')
for row in cur:
    print(row)

 

 

- Retrieve Data into Pandas -

 

import pandas
selectQuery = "select * from INSTRUCTOR"
pdf = pandas.read_sql(selectQuery, conn)
pdf

 

 

- Close the connection -

= We free all resources by closing the connection. Remember that it is always important to close connections so that we can avoid unused connections taking up resources

 

cur.close()

 

5) Advanced SQL for Data Engineer

 

* Views *

= an alternative way of representing data that exists in one or more tables or views

- can include specified columns from multiple base tables and existing views

- once queried, can be queried like a table

- only the definition of the view is stored, not the data

(the data that the view represents is stored only in the base tables, not by the view itself)

- show a selection of data for a give table (can omit some sensitive data)

- combine two or more tables in meaningful ways

- simplify access to data (without accesing main table)

 

- CREATE VIEW -

 

CREATE VIEW <view name> (<column_alias_1>, <column_alias_2>, ... <column_alias_n>)
AS SELECT <column_1>, <column_2>, ... <column_n>
FROM <table_name>
WHERE <predicate>; #optional

 

 

- REPLACE VIEW -

 

CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

 

 

- DROP VIEW -

 

DROP VIEW <view_name>;

 

* Stored Procedures *

= a set of SQL statements stored and executed on the database server

- write in many different languages

- instead of sending multiple SQL statements from the client to server, encapsulate them in a stored procedure on a server and send only one statement

- accept in the form of parameter

- return results to the client

- reduction in network traffic & improvement in performance

- reuse of code

- increase in security (not need to expose data, server-side logic to validate data)

- can call from external app / dynamic SQL statements

 

- example of Stored Procedures -

 

Q) (from previous Schools in Chicago data) Write the structure of a query to create or replace a stored procedure called UPDATE_LEADERS_SCORE that takes a in_School_ID parameter as an integer and a in_Leader_Score parameter as an integer.

Inside stored procedure, write a SQL statement to update the Leaders_Score field in the CHICAGO_PUBLIC_SCHOOLS table for the school identified by in_School_ID to the value in the in_Leader_Score parameter. Inside stored procedure, write a SQL IF statement to update the Leaders_Icon field in the CHICAGO_PUBLIC_SCHOOLS table for the school identified by in_School_ID using the following info.

 

 

ans)

--#SET TERMINATOR@
CREATE OR REPLACE PROCEDURE UPDATE_LEADERS_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
LANGUAGE SQL
BEGIN
	UPDATE SCHOOLS
	SET "Leaders_Score" = in_Leader_Score
	WHERE "School_ID" = in_School_ID;
	
	IF in_Leader_Score > 0 AND in_Leader_Score < 20 
		THEN
			UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Very Weak';
	    ELSEIF in_Leader_Score < 40 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Weak';	
	    ELSEIF in_Leader_Score < 60 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Average';
	    ELSEIF in_Leader_Score < 80 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Strong';
	    ELSEIF in_Leader_Score < 100 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Very Strong';
	   	END IF;
END@

 

* ACID Transactions *

# transaction = an indivisible unit of work

- consists of one or more SQL statements

- either all happens or none

 

#ACID 

- ATOMIC (all changes must be performed successfully or not at all)

- CONSISTENT (data must be in a consistent state before and after transaction)

- ISOLATED (no other process can change the data while the transaction is running)

DURABLE (the changes made by the transaction must persist)

 

** use the EXEC SQL command to execute SQL statements from code

(ex) EXEC SQL COMMIT WORK; EXEC SQL ROLLBACK WORK;)

 

→ commit) When a transaction completes normally, a transaction processing system commits the changes made to the data; that is, it makes them permanent and visible to other transactions

→ rollback) When a transaction does not complete normally, the system rolls back (or backs out) the changes; that is, it restores the data to its last consistent state (recoverable)

 

Q) Update your stored procedure definition. Add a generic ELSE clause to the IF statement that rolls back the current work if the score did not fit any of the preceding categories. Add a statement to commit the current unit of work at the end of the procedure

 

--#SET TERMINATOR@
CREATE OR PROCEDURE UPDATE_LEADERS_SCORE(IN in_School_ID INTEGER, IN in_Leader_Score INTEGER)
LANGUAGE SQL
BEGIN
	UPDATE SCHOOLS
	SET "Leaders_Score" = in_Leader_Score
	WHERE "School_ID" = in_School_ID;
	
	IF in_Leader_Score > 0 AND in_Leader_Score < 20 
		THEN
			UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Very Weak';
	    ELSEIF in_Leader_Score < 40 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Weak';	
	    ELSEIF in_Leader_Score < 60 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Average';
	    ELSEIF in_Leader_Score < 80 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Strong';
	    ELSEIF in_Leader_Score < 100 THEN
	       	UPDATE CHICAGO_PUBLIC_SCHOOLS
				SET "Leaders_Icon" = 'Very Strong';
				
	ELSE
		ROLLBACK WORK;
		
		END IF;
		COMMIT WORK;
END@

 

Q. stored procedures vs. transactions?

A.

transaction은 작업의 한 unit으로써, 한번에 처리되어야 할 연산의 단위이다.

이런 여러 개의 transaction을 모아서 일종의 module로 모아놓은 것이 stored procedure.

예를 들면 금융권에서 돈을 가져오고 체크하는 일종의 한 개의 '프로시저'를 만들 수 있고,

이 '프로시저' 내부에는 여러 개의 transaction이 저장되어 있음.

database에 저장되어 있는 stored proecdure를 사용해 효율적이게 한 번에 작업 실행 가능

 

* JOIN Statements *

 

# Join Operator = combine rows from two or more tables (based on a relationship)

- the tables that are joined are related to the primary & foreign keys

→ primary key) uniquely identifies each row in a table

→ foreign key) refers to a primary key of another table

 

# inner join
- displays matches only

 

 

- the rows are displayed if the borrower matches

- if the key columns in both tables have the same name, need to explicitly reference each using a table name or alias

 

# outer join

- unlike inner joins, outer joins also return the rows that do not have a match between the tables

- Left outer join) all rows from the left table & any matching rows from the right table

- RIght outer join) all rows from the right table & any matching rows from the left table

- Full outer join) all rows from both tables

 

- (하단) BORROWER is the left table, LOAN is the right table -

 

(same goes for the RIGHT JOIN & FULL JOIN (in the sql statement))

 

# CROSS JOIN
SELECT column_name(s)
FROM table1
CROSS JOIN table2;

# INNER JOIN
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
WHERE condition;

# LEFT OUTER JOIN
SELECT column_name(s)
FROM table1
LEFT OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

# RIGHT OUTER JOIN
SELECT column_name(s)
FROM table1
RIGHT OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

# FULL OUTER JOIN
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

# SELF JOIN (to compare rows within the same table)
SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;

ex)

 

Q) Select the names and job start dates of all employees who work for the department number 5

INNER JOIN by TWO tables

 

SELECT E.F_NAME, E.L_NAME, JH.START_DATE
FROM EMPLOYEES AS e
INNER JOIN JOB_HISTORY AS JH ON E.EMP_ID = JH.EMPL_ID
WHERE E.DEP_ID = '5';

 

Q) Select the names, job start dates, and job titles of all employees who work for the department number 5

INNER JOIN by THREE tables (use INNER JOIN for second times)

 

SELECT E.F_NAME, E.L_NAME, JH.START_DATE, J.JOB_TITLE
FROM EMPLOYEES as E
INNER JOIN JOB_HISTORY as JH on E.EMP_ID = JH.EMPL_ID
INNER JOIN JOBS as J on E.JOB_ID = J.JOB_IDENT
WHERE E.DEP_ID = '5';

 

Q) Perform a Left Outer Join on the EMPLOYEES and DEPARTMENT tables and select employee id, last name, department id and department name for all employees.

 

SELECT E.EMP_ID, E.L_NAME, E.DEP_ID, D.DEP_NAME
FROM EMPLOYEES as E
LEFT OUTER JOIN DEPARTMENTS as D on E.DEP_ID = D.DEP_ID_DEP;

 

 

Q) Re-write the previous query but limit the result set to include only the rows for employees born before 1980

 

SELECT E.EMP_ID, E.L_NAME, E.DEP_ID, D.DEP_NAME
FROM EMPLOYEES as E
LEFT OUTER JOIN DEPARTMENTS as D on E.DEP_ID = D.DEP_ID_DEP
WHERE YEAR(E.B_DATE) < 1980;

 

 

* Q) Re-write the previous query but have the result set include all the employees 

but department names for only the employees who were born before 1980

 

(JOINING table has a condition - use AND clause after ON)

 

SELECT E.EMP_ID, E.L_NAME, E.DEP_ID, D.DEP_NAME
FROM EMPLOYEES as E
LEFT OUTER JOIN DEPARTMENTS as D on E.DEP_ID = D.DEP_ID_DEP
AND YEAR(E.B_DATE) < 1980;

 

Q) Perform a Full Join on the EMPLOYEES and DEPARTMENT tables and select the First name, Last name and Department name of all employees

 

SELECT E.F_NAME, E.L_NAME, D.DEP_NAME
FROM EMPLOYEES AS E
FULL OUTER JOIN DEPARTMENTS AS D ON E.DEP_ID = D.DEPT_ID_DEP;

 

 

Q) Re-write the previous query but have the result set include all employee names

but department id and department names only for male employees

 

SELECT E.F_NAME, E.L_NAME, D.DEP_NAME
FROM EMPLOYEES AS E
FULL OUTER JOIN DEPARTMENTS AS D ON E.DEP_ID = D.DEPT_ID_DEP AND E.SEX = 'M';

 

Q) List the case number, type of crime and community area for all crimes in community area number 18

INNER JOIN

 

cur.execute('SELECT CR.CASE_NUMBER, CR.PRIMARY_TYPE, CE.COMMUNITY_AREA_NUMBER FROM Crime CR INNER JOIN Census CE ON CR.COMMUNITY_AREA_NUMBER = CE.COMMUNITY_AREA_NUMBER WHERE CR.COMMUNITY_AREA_NUMBER = 18')
for row in cur:
    print(row)
# what we want from ON = community_area_number is same as what we want from WHERE = community_area_number

 

(※ON Clause's column is the same from what we want from WHERE Clause - use INNER JOIN)

ans) ('JA560123', 'CRIMINAL DAMAGE', 18.0) ('JA107722', 'OTHER OFFENSE', 18.0)

 

Q) List all crimes that took place at a school. Include case number, crime type and community name

 

cur.execute('SELECT CR.CASE_NUMBER, CR.PRIMARY_TYPE, CE.COMMUNITY_AREA_NAME FROM Crime CR LEFT OUTER JOIN Census CE ON CR.COMMUNITY_AREA_NUMBER = CE.COMMUNITY_AREA_NUMBER WHERE CR.LOCATION_DESCRIPTION LIKE "%SCHOOL%"')
for row in cur:
    print(row)

 

(※ON Clause's column is different from what we want from WHERE Clause - have to use LEFT(RIGHT) OUTER JOIN)

(즉 ON과 WHERE절의 column 이름이 서로 같은 지, 다른 지에 따라 INNER, OUTER JOIN 종류가 달라짐)

 

( *SQLite doesn't support RIGHT / FULL OUTER JOIN - only use LEFT OUTER JOIN )

(RIGHT/FULL OUTER JOIN을 쓰려면 table 순서를 바꾸면 됨)

 

ans) ('HL353697', 'BATTERY', 'South Shore') ('HL725506', 'BATTERY', 'Lincoln Square') ('HP716225', 'BATTERY', 'Douglas') ('HH639427', 'BATTERY', 'Austin') ('JA460432', 'BATTERY', 'Ashburn') ('HS200939', 'CRIMINAL DAMAGE', 'Austin') ('HK577020', 'NARCOTICS', 'Rogers Park') ('HS305355', 'NARCOTICS', 'Brighton Park') ('HT315369', 'ASSAULT', 'East Garfield Park') ('HR585012', 'CRIMINAL TRESPASS', 'Ashburn') ('HH292682', 'PUBLIC PEACE VIOLATION', None) ('G635735', 'PUBLIC PEACE VIOLATION', None)

 

Q) For the communities of Oakland, Armour Square, Edgewater and CHICAGO

list the associated community_area_numbers and the case_numbers

 

cur.execute('SELECT CE.COMMUNITY_AREA_NUMBER,CR.CASE_NUMBER FROM Census CE LEFT OUTER JOIN Crime CR ON CE.COMMUNITY_AREA_NUMBER = CR.COMMUNITY_AREA_NUMBER WHERE CE.COMMUNITY_AREA_NAME IN ("Oakland", "Armour Square", "Edgewater", "CHICAGO")')
for row in cur:
    print(row)

 

ans) (34.0, None) (36.0, None) (77.0, 'HM392612') (77.0, 'HV537004') (None, None)

 

Q) Write and execute a SQL query to list the school names, community names and average attendance

for communities with a hardship index of 98

 

cur.execute('SELECT S.NAME_OF_SCHOOL, S.COMMUNITY_AREA_NAME, S.AVERAGE_STUDENT_ATTENDANCE \
FROM Schools S LEFT OUTER JOIN Census C ON S.COMMUNITY_AREA_NUMBER = C.COMMUNITY_AREA_NUMBER WHERE C.HARDSHIP_INDEX = 98.0')
for row in cur:
    print(row)

 

ans) ('George Washington Carver Military Academy High School', 'RIVERDALE', '91.60%') ('George Washington Carver Primary School', 'RIVERDALE', '90.90%') ('Ira F Aldridge Elementary School', 'RIVERDALE', '92.90%') ('William E B Dubois Elementary School', 'RIVERDALE', '93.30%')


* 내용 일부 출처) https://velog.io/@emawlrdl/%EC%98%A4%EB%9D%BC%ED%81%B4-%EB%B0%B0%EC%B9%98Batch

* 출처) Databases and SQL for Data Science with Python (Coursera)

 

댓글