SQL For Interview

SQL basic concepts for interview

Today let's learn about the rank and partition feature of SQL. This is a very useful feature of SQL and is used in many places. Let's see how it works.

DQL (Data Query Language)

DQL is used to query the database and retrieve data from it. The most commonly used DQL command is SELECT. The order of execution of DQL commands is as follows: 1. FROM 2. WHERE 3. GROUP BY 4. HAVING 5. SELECT 6. ORDER BY 7. LIMIT

Window Functions and Partition

The different functions help to perform calculations across a set of table rows that are somehow related to the current row. This is similar to the type of calculation that can be done with an aggregate function. However, unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row. The rows retain their separate identities.

The PARTITION BY clause is used to divide the result set into partitions to which the window function is applied. The PARTITION BY clause is optional. If it is not specified, the entire result set is treated as a single partition. - RANK(): This function returns the rank of each row in a group of rows. The rank is calculated based on the order of the rows in each group. - DENSE_RANK(): This function returns the rank of each row in a group of rows. The rank is calculated based on the order of the rows in each group. The difference between this function and the RANK function is that this function does not skip any ranks. For example, if there are two rows with the same rank, then the next row will have the rank 3. - ROW_NUMBER() :This function returns the row number of each row in a group of rows. The row number is calculated based on the order of the rows in each group. - NTILE() :This function returns the rank of each row in a group of rows. The rank is calculated based on the order of the rows in each group. The difference between this function and the RANK function is that this function does not skip any ranks. For example, if there are two rows with the same rank, then the next row will have the rank 3. - FIRST_VALUE() :This function returns the first value in a group of rows. The first value is calculated based on the order of the rows in each group. - LAST_VALUE() :This function returns the last value in a group of rows. The last value is calculated based on the order of the rows in each group. - NTH_VALUE() :This function returns the nth value in a group of rows. The nth value is calculated based on the order of the rows in each group. - LEAD() :This function returns the value of a column from the next row in a group of rows. The value is calculated based on the order of the rows in each group. - LAG() :This function returns the value of a column from the previous row in a group of rows. The value is calculated based on the order of the rows in each group. - CUME_DIST() :This function returns the cumulative distribution of a value in a group of rows. The cumulative distribution is calculated based on the order of the rows in each group. - PERCENT_RANK() :This function returns the percent rank of a value in a group of rows. The percent rank is calculated based on the order of the rows in each group.

What are the different operators available in SQL?

SQL provides a variety of operators to perform different operations on data. Some of the most commonly used

Examples of operators in SQL

SELECT * FROM employee
WHERE salary > 50000 AND department = 'IT';

SELECT name || ' ' || department AS full_info
FROM employee;

JOINS

Let's say we have a table called employee with the following data:

SELECT id, name, salary, department,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employee;

The above query will give us the following result:

+----+----------+--------+------------+------+
| id | name     | salary | department | rank |
+----+----------+--------+------------+------+
|  6 | Alice    |  60000 | HR         |    1 |
|  5 | Bob      |  50000 | HR         |    2 |
|  4 | Mary     |  40000 | HR         |    3 |
|  3 | Smith    |  30000 | IT         |    1 |
|  2 | John     |  20000 | IT         |    2 |
|  1 | Prasanna |  10000 | IT         |    3 |
+----+----------+--------+------------+------+

As you can see, the rank is calculated based on the salary of the employee in each department. The employee with the highest salary in each department gets the rank 1, the employee with the second highest salary gets the rank 2 and so on.

What we did here is we partitioned the rows by department and then ordered them by salary. This means that the rows are grouped by department and then ordered by salary. The rank is calculated based on the order of the rows in each group.

Pattern matching in SQL and REGEX

Pattern matching is a technique used to search for specific patterns in a string. In SQL, we can use the LIKE operator to perform pattern matching. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. The percent sign (%) is used as a wildcard character to represent zero or more characters, while the underscore (_) is used to represent a single character.

For example, the following query retrieves all employees whose names start with the letter 'A':

SELECT * FROM employee
WHERE name LIKE 'A%';

Similarly, we can use regular expressions (REGEX) for more complex pattern matching. REGEX allows us to define a search pattern using a combination of literals, operators, and special characters. In SQL, we can use the REGEXP operator to perform regular expression matching.

For example, the following query retrieves all employees whose names contain the letter 'a' followed by any character and then the letter 'e':

SELECT * FROM employee
WHERE name REGEXP 'a.e';

REGEX vs REGEXP - The main difference between REGEX and REGEXP is that REGEX is a general term for a search pattern, while REGEXP is a specific implementation of regular expressions in SQL. REGEXP is used in SQL to perform pattern matching using regular expressions.

The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern. Search a student in your database with first name beginning with the letter K:

SELECT * FROM students WHERE first_name LIKE 'K%'

Omitting the patterns using the NOT keyword

Use the NOT keyword to select records that don't match the pattern. This query returns all students whose first name does not begin with K.

Matching a pattern anywhere using the % wildcard twice

Search for a student in the database where he/she has a K in his/her first name.

SELECT * FROM students WHERE first_name LIKE '%Q%'

Using the _ wildcard to match pattern at a specific position

The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard. This query fetches all students with letter K at the third position in their first name.

SELECT * FROM students WHERE first_name LIKE '__K%'

Matching patterns for a specific length

The _ wildcard plays an important role as a limitation when it matches exactly one character. It limits the length and position of the matched results. For example -

WHERE first_name LIKE '___%'

Bounded preceding and following

In SQL, the terms "bounded preceding" and "bounded following" are used in the context of window functions to define a specific range of rows relative to the current row. These terms help to specify how many rows before or after the current row should be included in the calculation performed by the window function. - Bounded Preceding: This term refers to a specific number of rows that come before the current row in the result set. When using bounded preceding, you can specify a fixed number of rows to include in the calculation. For example, if you want to calculate the sum of values for the current row and the two preceding rows, you would use "2 PRECEDING" in your window function. - Bounded Following: This term refers to a specific number of rows that come after the current row in the result set. Similar to bounded preceding, you can specify a fixed number of rows to include in the calculation. For example, if you want to calculate the sum of values for the current row and the two following rows, you would use "2 FOLLOWING" in your window function. - Example: .. code-block:: sql

SELECT id, name, salary, SUM(salary) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_salary FROM employee;

In this example, the SUM function calculates the sum of salaries for the current row and the two preceding rows based on the order of the id column.

Bounded Following and Bounded Preceding are useful when you want to perform calculations that consider a specific range of rows around the current row, allowing for more precise analysis of data within a defined context.

Aggregate Functions

Aggregate functions are used to perform calculations on a set of values and return a single value. Some commonly used aggregate functions are: - COUNT(): This function returns the number of rows in a group. - SUM(): This function returns the sum of values in a group. - AVG(): This function returns the average of values in a group. - MIN(): This function returns the minimum value in a group. - MAX(): This function returns the maximum value in a group. - GROUP_CONCAT(): This function returns a concatenated string of values in a group.

Case Statements

The CASE statement is used to perform conditional logic in SQL. It allows you to evaluate a set of conditions and return a specific value based on the result of the evaluation. The syntax for the CASE statement is as follows: .. code-block:: sql

CASE
WHEN condition1 THEN result1 WHEN condition2 THEN result2 ... ELSE resultN END

... code-block:: sql

SELECT id, name, salary, CASE WHEN salary > 50000 THEN 'High' WHEN salary BETWEEN 30000 AND 50000 THEN 'Medium' ELSE 'Low' END AS salary_range FROM employee;

links

social