Databases: SQL & MySQL for Beginners (step-by-step guide)

Faraz Hanif

17 min readFeb 24, 2019

Overview:

Part I | Background & Introduction

Part II | Set-up of MySQL and PopSQL

Part III | Core fundamentals of SQL

Part IV | Advance concepts of SQL

Part V | Conclusion

Part I | Background & Introduction

Background

Today, the topic of discussion is SQL and MySQL. Before we get too techy, the primary reason for creating this article is to provide a step-by-step guide and an overview for current Bootcamp students, recent Bootcamp graduates like myself and others who are curious about databases and are eager to learn SQL. Due to limited time and a heavy workload during Bootcamps, students do not have adequate time to learn SQL or learn about databases in depth. The secondary reason why this post is created is to see how databases fit in the web development ecosystem.

Introduction

The three tech words that stand out from the paragraph above are databases, SQL and MySQL. Let’s explain what each one is. A database is a collection of organized data that can be accessed, managed and updated. With a database, CRUD operations are available to us just like with a RESTful API. Creating, Reading, Updating or Deleting data is therefore possible in databases. Databases are essential to every app since that’s where all the data gets stored from users' information to information about any other entity that exists in the app. SQL stands for Structured Query Language. SQL is used to communicate with databases and is the standard language of relational database management systems (RDBMS). SQL is written so it can be understood by an RDBMS, a software that understands SQL and in return communicates with the server and returns what was asked for via SQL statements called queries. Some common RDBMS are Oracle, Sybase, and MySQL. These various RDBMS are just like different programming languages, where the logic is the same, but the syntax or the implementation of each differs slightly from one to the other. There are two types of databases: Relational and Non-relational. In layman terms, relational is in the form of tables where non-relational is any form that’s not a table like a document-based database, a graph and many more. Today, the focus will be on MySQL, a relational database.

Part II | Set-up of MySQL and PopSQL

The best way to learn in programming is by doing. So, without further ado, let’s install MySQL and start learning SQL.

Installation of MySQL

Go to: https://dev.mysql.com/downloads/mysql/
For MacOS, download the DMG Archive version and for Windows, download the ZIP Archive version (MacOS will be used in today’s example).
Follow the instructions in order to complete the installation. When prompted to create a password for the root user, just create one that you will remember since it will be needed later.
Next, head to the terminal, type in “mysql -u root -p” without the quotes. Then, you will enter the password you created in Step 3.
Some of the common errors faced are:

solution: This error means the password created in Step 3 is not being used. Repeat Step 4. If a similar error is seen but instead of (using password: NO), it’s (using password: YES), this means the password was entered incorrectly. Repeat Step 4.

solution: This error means that the MySQL server is not on. Let’s fix that now, go to system preferences, click on MySQL application and turn the server is ON.

If any other error is encountered, just google it.

6. Now, MySQL server is up and we can start by creating a database in the terminal.

*name represents the name of the database. Naming is subjective. Make sure to put that semicolon to denote the end of a query.

PopSQL

The reason for using PopSQL is because it’s a text editor for SQL and it connects to the MySQL server, so the same results are obtained as using MySQL in a terminal when querying. Being able to run multiple queries and seeing how the behavior of one differs from another is what was most helpful using this tool. The user’s visual experience as a learner of SQL enhanced greatly using PopSQL .

*Each query is separated using semicolons, the green line to the left of the query indicates which query is being run. Running another query is as easy as clicking on a separate query instead of typing every query over and over again in the terminal.

Installation of PopSQL

Go to: https://popsql.com/
Download.
Once the download has been completed, sign in with Google or create an account.
Once signed in, open the app.
Next, let’s connect the database that was created earlier. In this example, it’s name. When prompted, nickname can be anything you please, hostname is localhost, port is 3306, database is name, username is root and the password is the same password created during the installation of MySQL.
Press Connect and now PopSQL is connected to the database, in this example, name.

7. Protip: In the app, for better visualization of data, select the middle icon from the three icons on the top right corner of the app so the results will be shown below the queries once the RUN button is clicked.

Hopefully, it makes a bit more sense of what SQL and MySQL are at this point of our journey at a higher level. This concludes the setup portion of the guide. Let’s dive into SQL now.

Part III | Core Fundamentals of SQL

*SQL working with an RDBMS, Microsoft SQL.

Student

For this section of the guide, a single table, student table (left) will be used to show the behavior of different basic SQL commands. Okay, so far, we have a database, name, but there are no tables in it currently. Let’s change that now.

Creating a Table

*running this query creates a student table in the database.

Primary Keys

A primary key is a column on the table that uniquely identifies each row in the table. For example,

*one way to denote which column is the primary key, making sure reasonable length of VARCHAR is allocated and definitions of columns are separated with a comma.

*another way to denote which column is the primary key.

After creating the table, running the query below confirms that the table has been created correctly

*This query returns the schema of the given table.

The result is:

The field column represents different columns of the student table (see Student table above). The type column shows the expected data types for each column in the student table. Some common data types for columns are:

INT, VARCHAR(), DECIMAL(M,N), BLOB, DATE, and TIMESTAMP

INT simply means the values of the column will be integers. VARCHAR(20) means that the values for this column will be a string with a maximum string length of 20. Choosing this number wisely can save some space in the database. Let me explain. Imagine, we have a name column and instead of VARCHAR(20), what if we had VARCHAR(200). This means that for each row of the name column, the maximum length of 200 is being saved, even though not too many names have character length anywhere close to 200. Decide the length of VARCHAR depending on the values that will go under the given column. The maximum length for strings is 255. DECIMAL(M, N) is used for decimals, M is the total length of the decimal and N is the length of numbers from the decimal point to the end of the decimal. For example, 3.14, this is a decimal data type and M is 3 and N is 2 so it would be written as DECIMAL(3,2). BLOB is a data type that’s uncommon in use, but it’s good to know that this data type is for images, other multimedia files and for storage of other binary data. DATE values are in ‘YYYY-MM-DD’ format, meaning no time part. The supported range is ‘1000–01–01’ to ‘999–12–31’. TIMESTAMP includes both time and date. The values are in ‘YYYY-MM-DD HH:MM:SS’ UTC format. The supported range is ‘1970–01–01 00:00:01’ UTC to ‘2038–01–19 03:14:07’ UTC.

Deleting a Table

*If for some reason you want to delete a table, running this query will delete the table from the database. To confirm that the table has been deleted, run:

*running this query will result in:

*this message confirms that the student table no longer exists in the database.

Adding Columns to a Table

*adding gpa column with a decimal data type that has total digits of 3 and digits after the decimal point being 2.

To confirm run:

*running this query will show us the schema of student table.

As a result,

*this confirms that indeed the gpa column has been added to the student table.

Dropping Columns from a Table

*This query will delete the gpa column from student table. Again, to confirm, run the DESCRIBE query to see the schema of the student table and gpa should no longer be part of the table.

Inserting Data into a Table

*This query will add a row to the student table with the information provided in VALUES. The order of values has to match how they were defined when creating the table (student_id, name, major).

To check if data is inserted correctly, run:

*this query returns everything from the student table ( it reads select (* means ) everything from the student table), running it will result in:

*inserting the data above was a success!

For example, run:

Now, run the SELECT query to confirm the insertion was correct,

*The student table now has two rows. This process can be repeated to insert as many students as needed.

Inserting available data into a Table

Let’s say we want to add a third student, but he’s undecided and doesn’t have a major yet. We can insert data that we do have by running the following,

*This format allows us to add available information into the table. So, after the table name, in this case, student, we specify within the parentheses the columns we do have the data for and the corresponding values within the VALUES parentheses.

Again, run the SELECT query to get back all the data and the output should be:

*Now we have 3 rows with Matt’s major being NULL since we didn’t insert a value for it. Remember since student_id column is the primary key and knowing that primary keys are unique, inserting a student_id that previously exists will throw an error.

Common SQL constraints can be added when defining the schema of a table in CREATE TABLE query to make inserting of data easier.

For example,

If we drop the previous student table and create this version, the result will be the following after running the DESCRIBE query:

*This table shows that values for student_id column will auto-increment, meaning you no longer have to pass it in the VALUES section of the INSERT query, the database will do that for you. The name column cannot be NULL and the values for major column must be unique and if a major value is not passed in, the default value will be ‘undecided’. Data that satisfies the conditions defined in CREATE TABLE query can be inserted into the table. If not, data insertion will fail.

Run the following queries,

after running the SELECT query, the result will be:

*again, data that meets the criteria set in CREATE TABLE query will be inserted into the table and the student_id will auto-increment, so there’s no need to specify that while inserting as shown above and the default value for the major column is ‘undecided’ when a value is not passed in.

Updating Data in a Table

Let’s say we want to update the table for students who had major as ‘Biology’ to ‘Bio’ for all records. Running the following query,

*updates records in student table that previously had ‘Biology’ as a major to ‘Bio’. The WHERE keyword specifies the rows that need to be updated. Without it, every record gets updated to the new set value(s). Comparison operator used here is the equal sign (=).

After running the SELECT query, the result will be:

*updated Rob’s major from ‘Biology’ to ‘Bio’.

Deleting Data in a Table

Let’s say Rob from our student table has left the school and his record needs to be deleted in order to update records. This can be achieved by running,

*Rob’s student_id was 2 so the WHERE keyword in this query is specifying which row to delete from the student table. Just like with UPDATE, without the WHERE keyword, all of the data in student table will be deleted.

In result, after running the SELECT query, we get:

*record with student_id of 2 has been deleted.

Getting Information from Database

In this section, we will look at the SELECT keyword of SQL. Imagine we have tons of data, so far we have been looking at getting everything (*) from the table. What if we only wanted to get back certain rows or columns from the table?

This approach can be implemented by running:

*here instead of getting every column back like before, we are stating we only want the name column from student table. Getting another column back is as easy as adding commas between the column names in the query above.

The result is,

*voila! only names are returned.

Getting Back Ordered Data

If for some reason, we want to get back data in a certain order, this can be achieved by doing the following,

First, let’s add another student,

*Matt is added to the student table.

Second, let’s confirm by running,

*confirms Matt is added.

In result,

Imagine, we want to get back every column from the student table where the records returned are in an ascending order for column, name.

*running this query will return every column from student table where the records are ordered based on ascending names. By default, ORDER BY keyword is ascending, if descending order is desired, place DESC after the column name in ORDER BY section of the query.

If multiple records have the same value for the column we are ordering by, a second column can be added to order records even further by adding a comma and the second column in the ORDER BY section of the query.

Quick snapshot of how this may look:

*if further ordering is needed.

Get Limited Data Back

*LIMIT keyword controls how many records are returned.

Student table has 3 total students, but we only want two, the query above does exactly that:

*returns the number of records defined in LIMIT section of queries.

Filtering of Data with SELECT Queries

Imagine, we want to get back records of students who are Math majors only in today’s example. This can be done by running,

*WHERE keyword in SELECT queries allows for filtered data to be returned and remember if we wanted to, we can choose to get back certain columns back from the student table instead of getting back every column in this example.

*returns the records of students who are Math majors.

Other Comparison Operators

In WHERE section of SELECT queries, other common comparison operators can be used: <, >, < =, > =, =, <>, AND and OR. Most of these operators we already know, but <> is an unusual one and it denotes in-equation or not equals to.

Filtering Based on Group of Values

Let’s say we have values, ‘Math’ and ‘Biology’, and we want to get back any records from the student table with majors that match any one of those values. How can we do that?

By running,

*Matt is the only student with a major that matches any values in the IN section of the query.

This concludes the core fundamentals of SQL section of this guide. Yes, it’s lengthy, but the goal is to really grasp the basic commands of SQL and I hope that’s the case at this point. Take a quick break if needed and we will jump into multiple tables and complex table schemas next!

Part IV | Advance Concepts of SQL

In this section, we will be looking at a complex database schema in order to see behaviors of advanced SQL queries. We will be creating multiple tables for a Company database. Let’s do that now!

Creating Database Tables

This Company database will have 5 tables.

First, is the employee table. Before we do that, let’s drop the student table created in the section before. Then, run

*employee table is created to store details about each employee in the company. Even though super_id and branch_id are foreign keys, we can’t specify that until the branch table is created shortly.

Running,

*this query will return the schema of the employee table.

Will return,

*schema of the employee table. Super_id value for each row is the emp_id, who is the supervisor for this employee. Branch_id values indicate the relationship between employees and branches, which employee works where.

Next, we have a branch table.

*This query will create the branch table. This table shows various branches, managers and their start dates for each branch. Here, we can specify the details of the FOREIGN KEY since the employee table has already been created. Mgr_id values come from the employee table.

*running the DESCRIBE query will return this schema of branch table with mgr_id being a foreign key.

Now that the branch table is created, we can go ahead and specify the FOREIGN KEY details in the employee table,

*running the two above queries will make super_id and branch_id foreign keys in the employee table.

*in result, we get an updated schema of employee table with super_id and branch_id as foreign keys.

Now, let’s create the client table.

*schema of client table with branch_id as a foreign key and this table displays details about each client and their associated branches.

*after running the DESCRIBE query on the client table, the above table will be returned.

Then, it’s the works_with table. This table shows relationships between employees and clients. Let’s explore,

*creates works_with table with two columns as a primary key in order to uniquely identify each row because emp_id and client_id columns alone do not uniquely identify each row. Both of these columns are also foreign keys as well. This is an example of how complex a table schema can get. This table displays the relationship between an employee and a client with total sales included.

After running the DESCRIBE query, we get

Lastly, we have our final, the 5th, branch_supplier table.

*running this query creates the branch_supplier table. Again, to uniquely identify each row in this table, two columns are used as a primary key, branch_id which is a foreign key as well and supplier_name which is not. This table displays a relationship between branches, suppliers to those branches and what is being supplied.

After running the DESCRIBE query, we get

*as shown above, primary keys are combination of branch_id (foreign key) and supplier_name columns.

This concludes the creating tables part of this section. Let’s start inserting!

Inserting of Data into Tables

Due to having various foreign keys in tables, the order of insertion has to be specific,

*First, insert the first employee’s data with id = 100. Foreign keys, branch_id, and super_id are NULL since both, branch and employee tables are empty at this point. Then, insert a value to the branch table since a mgr_id from the employee table is now available. Now that there is a branch that this employee works at, we can update the record for this employee in the employee table with branch_id = 1. Another employee who works at the same branch, his information is inserted and notice since David Wallace is the supervisor of Jan, both branch and super foreign keys are non-NULL values.

The above example takes care of all employees who work at the Corporate branch. By following this pattern, we are able to add data into both employee and branch tables.

*after running the SELECT query on the employee table, this result is obtained.

By repeating the above process, employees who work at Scranton and Stamford branches can also be added,

*running these queries in this order from adding the branch’s manager first, then adding the actual branch, then updating the manager’s record to show the correct branch_id and lastly adding all other employees who work at the same branch will ensure data is inserted correctly despite having multiple foreign keys in both tables.

*in result, after running the SELECT query, the employee table with 9 total records.

*after running SELECT query, the branch table.

Inserting continues to branch_supplier table,

*inserting data into the branch_supplier table, straightforward inserting of values compared to inserting into employee or branch tables.

In result, after running the SELECT query,

*branch_supplier table with primary key coming from two columns, branch_id and supplier_name columns together uniquely identify each record in the table.

Next is the insertion of data into the client table,

*running the SELECT query returns this client table.

Lastly, insertion of works_with table,

*SELECT query returns this works_with table.

Whew! That was a lot of inserting! We now have all 5 tables created and populated with company’s data. Let’s start with a review of:

Querying of Data using SELECT keyword

Imagine we want to get back all employees from the employee table, this is an easy one,

*running this query will return all employees from the employee table , (*) means select everything from the given table.

If you want to find all clients, just replace the table name from employee to client in SELECT query to retrieve all clients.

Next, let’s say we want to get all employees ordered by salary,

*while getting all records from the employee, we want to order the records based on salary (by default, it’s in an ascending order).

*in result we get records of all employees in an ascending order based on salary, if we wanted to get the employee with the highest salary first, we would just add DESC to the end of ORDER BY section of the query.

Now, we would like to get all employees, but this time ordered by sex then name. How can we do this?

*if ordering by multiple columns is desired, just separate each column to order by using commas. Even though we wanted to order by name, there is no name column so ordering is done using first_name and last_name instead.

*employees are ordered first by sex, then first_name and then last_name. If there were any duplicates in sex, records would be ordered further using first_name and if there are duplicates still, further ordering is done using last_name.

The next prompt is to find the first 5 employees from the employee table,

*from the core fundamentals section, LIMIT keyword limits the number of records returned.

Imagine, instead of getting all columns returned from the employee, we only wanted first and last name columns,

*only need first and last name columns from employee table.

*returns just the first_name and last_name columns from the employee table for all employees.

Instead of getting the column names back as first_name and last_name, we want to get back as forename and surname respectively. Aliasing column names is possible,

*aliasing column names.

Next prompt is to find all different genders in the employee table, we can look at a new SQL keyword DISTINCT,

*this query will return all different or unique genders found in the employee table. This query can run with any column in the table, just have the correct column name after the DISTINCT keyword in the query.

*M/F are the two unique genders in employee table.

This wraps up this section, let’s move onto:

SQL Functions

A SQL function is a block of SQL statements, a query, that returns a value. Let’s see it in action!

This time around, we want to find the number of employees, a value,

*since emp_id is the primary key, by counting that column, we can get the total number of employees. Specify in parentheses of COUNT section of query which column you want to count.

*this function returns a single value, 9, number of total employees.

Next, we want to find out how many employees have supervisors, this can be done using the above query, except replace emp_id column name with super_id to count how many entries have values for this column.

Kicking up a notch, next prompt is to find female employees born after 1970.

*this query will only count female employees who are born after 1970.

In result,

*2 such records.

Let’s check out the AVG keyword in SQL functions,

*this query will return the average salary from the employee table.

*average salary of an employee.

The above query can be updated to

*this query will return the average salary for males. Change ‘M’ to ‘F’ to find out the average salary for females.

The next prompt is to instead find the sum of all employees' salaries (SUM keyword).

*returns the sum from the column specified.

*sum of all employees' salaries.

Aggregation

Aggregation enables the display of returned data from SQL functions in a more helpful way.

For example, the prompt is to find out how many males and females there are in the company.

*this aggregated query will first find unique sex values and then count how many times those values occur in the table.

Just to bring the logic home, another example,

*GROUP BY section of the query finds unique super_ids in the table. COUNT keyword counts how many times these unique super_ids occur in the table, including NULL.

*the final result, for each unique super_id, the count column displays how many times that super_id appeared in the table.

The employee table has been used a ton, let’s look at the works_with table now,

The next prompt is to find the total sales of each salesman.

By looking at the works_with table, we can see that each employee appears multiple times in the table because that employee has multiple clients and for each client, there’s a value for total_sales. Our goal is to sum various total_sales for each employee. Looking at the previous example, how can this be done?

*First, let’s find unique emp_ids from works_with table, then let’s sum up each total_sales value for each unique emp_id.

*Total sales for each employee in the table.

We can replace emp_id with client_id in the above-aggregated query to get the total amount each client spent instead. These aggregate queries are great, play around with the Company’s database schema to get better at using them.

Wildcards & LIKE keyword in SQL

Wildcards are used to grab data that match a certain pattern. Let’s explain!

The prompt is to find all clients that are LLC. Looking at the client table, we can’t use anything we know so far, this is when wildcards come in:

*in the WHERE clause, we are saying return those records where client names follow the pattern defined inside quotes after the LIKE keyword. ‘%LLC’ would only match to client names that end with LLC. % denotes any numbers of characters.

*1 record returned because this is the only one that matches the pattern defined in the WHERE clause of the query using the wildcard (%) and LIKE keyword.

If you are familiar with regular expressions, this syntax is a simplified version of that.

Let’s look at another prompt, find any branch suppliers who are in the label business.

*the pattern this time is any characters before and after label. Records matching this pattern will be returned.

*the only branch supplier doing label business.

The next prompt is to find any employee born in October.

*the pattern here is since we know how dates are formatted, the first 4 numbers represent the year. Each _ indicates a character, 4 of them means 4 characters, then the hyphen and then 10 for October and the rest can be anything, hence represented by %.

*1 record in the employee table that matches the pattern defined in the query.

Lastly, let’s find clients that have school in their names.

*pattern defined to match any client names that end with school. Again, % means any number of characters before school as long as the client's name ends with ‘school’.

* one record that matches the pattern.

Part V | Conclusion

You can generate powerful queries from SQL statements as seen in this blog post. Remember, the best way to cement the concepts and get better at SQL is by practicing and solving SQL problems, just like most things in software development.