SQLPad Q&A

SQL Updated Apr 29, 2024 10 mins read Leon Leon
SQLPad Q&A cover image

Quick summary

Summarize this blog with AI

SQLPad founder Leon Wei, recently did an AMA on Reddit r/sql, he answered some great questions in database, data science, latest technology, industry trends, growing career as a data professional.


Question:

Could you explain what the goal of sqlpad.io is? 🤔

Answer:

Hi, thanks for your question.

The goal of sqlpad is to help people in data analytics/science quickly learn or refresh their SQL skills.

It could be for a job interview or be more productive at work.


Question:

I am a mid-level SQL developer (4+ years in MySQL, two MS certifications in Querying data and Designing databases). I know all essential functions and index use cases, etc. What should I be studying now to be ready for the next ten years of database engineering?

 

Answer:

Cool question.

The industry trend is that the data is getting bigger and bigger, more volume every day.

I would probably explore tools or technologies that allow you to handle a large volume of data in a distributed data warehouse.

Some of those tools might include Spark, Redshift, Presto, Druid, Snowflake.

If you want to own an end-to-end data product and have more visibility and significant impact on your org/company, I would also recommend developing a good set of soft skills.

Including presentation, learning how to talk to business people, and convincing bosses to get resources, as all senior roles eventually lead to good salesmanship.


Question: 

Hello, I am a complete beginner with no experience in SQL or python/R…however, I am very keen to learn tools and develop my skills to work as a data analyst/scientist..how would you suggest I go about it? What would you recommend to your younger version at the start of their data analyst/scientist career? Any tips and guides would be helpful. Thank you so much for doing this. ☺️

 

Answer:

Hi there, thanks for your great questions. 

I would recommend finding a mentor/coach for complete beginners, i.e., people who had the same goal but had already figured it out and achieved it.

Work with them, learn more about the tools/skills they use, and create a personalized plan for the job interview.

It's probably the most time-saving and efficient way to understand what skills/techniques/tools you should focus on. 

For example, if your dream job is a data scientist at Amazon, talk to an amazon data scientist and ask for help.

You can also use our mentoring platform: instamentor.com, to find a mentor if you don't have a connection.

But generally speaking, I would focus on the following skills

  1. SQL
  2. R or Python
  3. A/B testing
  4. Product sense
  5. Statistics
  6. Probability
  7. Behavioral questions/ leadership

I've written a long article, which you can read it more here

https://instamentor.com/articles/7-steps-to-prepare-a-data-scientist-job-interview-in-silicon-valley-in-2020-analytics-inference-track

We have helped complete beginners preparing their data science interviews and landed jobs at FAANG companies a few months later.

Lastly, if I can have a do-over, I would probably.

  1. Switching to an Apple computer asap, which made me 10x more productive today than windows as a developer (serious suggestion 😃);
  2. I would also learn as much of computer science /software as possible (data structures/algorithms, operating systems, web development)
  3. Identify a field/industry you will be passionate to work on for the next 5-10 years and go full-speed. I got into statistical learning (aka machine learning) long ago and never regretted it. It's my opinion that it's still not too late to get into AI/ machine learning field as we are still in the early stage of AI.

Question:

What are your thoughts on a non-relational database and converting it to relational analytics and general data querying?

The bigger picture is that we are currently using MongoDB to store our data, and there's been pushback in having that data translated to run SQL. So would you say it's worth the effort to put our data somewhere else to be able to use SQL? Or learn Nosql and don't migrate the data?

Answer:

Are you guys using MongoDB for production? (External customer-facing apps for live transactions ) ?

Generally speaking, I think it's a great idea to have a dedicated analytics DB/data warehouse/Hadoop cluster for data analysts/data scientists/product managers to use, separate from a production database.

As querying on the dedicated/separate DB will not impact production DB (supporting live online transactions) and data safety.

It also helps people (new team members) who don't know MongoDB (or are unwilling to learn) to be up and running quickly.

But that does mean you will need to add a separate ETL pipeline to pass data over to this relationship database. Again, it's extra work and adds delays.


Question

When entering a new environment, what is the best way to learn the data structure quickly?


Answer

Hi there, that is a great question.

I would schedule a 1:1 meeting with people working on the database for a long time and ask them to help you understand it instead of figuring it out or guessing what a column name means all by yourself.

It's an excellent opportunity to ask seemingly basic/or even silly questions, but since you are new to the team/project, no one will judge against you.

You can take advantage of 'being the new guy' and ask many questions, but you will only have 1 or 2 months for those questions, try to use this opportunity.

Three months down the road, and if you still ask what that column of data is, people will think otherwise.


Question

Hi, thanks for taking the time to do this. I'm in the UK as opposed to the US, and I am currently self-teaching myself in coding/database management in the hopes of getting a job in the industry.

I do not have a university degree. Are there any things I could do to make myself more attractive to potential employers? Thanks

Answer

  1. Probably look for jobs that don't have a college degree to start with. I know there are remote work companies that are open to it;
  2. If possible, get certificates from accredited institutes (Microsoft/Oracle/AWS), demonstrating to the employer that you are serious about this field and have spent time learning/working on it.
  3. Show related courses (could be an online course from Coursera/edx/udacity/udemy, etc.);
  4. Projects (for example: could be a website that an employer can visit, where you used Mongo DB, which is an exact database the company is hiring for)
  5. Github repo, if you contributed to open source projects or your project, they have many stars.
  6. Lastly, it might be tricky, but you can find someone with a similar background as yours and have found a great job, reach out to them, ask them for advice.

Question

Hi, I'm an entry-level Data Scientist (2018 intern, 2019-21 full time). I'm wondering what tools outside of SQL I should be mastering? At my job, my role somehow morphed into a data management role more so than a data scientist. Meaning, I haven't done a lot of machine learning, etc., outside of school.

I used to be proficient in Python & R (I haven't used them in a while), I use SQL daily.

For ETL tools, I'm proficient using SSIS, Azure Data Factory & Alteryx.

Data Viz wise, I'm proficient with Tableau, Power BI & Excel.

Answer

Wow, you've learned/mastered many tools as an entry-level data scientist, much better than me in my early career congrats.

First of all, most data science projects or machine learning projects will take 90% of the time to get the correct data, so you are usually spending a lot of time managing data. 

I wouldn't feel so upset about it. 

This also proves that's why SQL is so important.

From a technical perspective: do you work or collaborate with a software engineering or data engineering team? If so, a good understanding of their tech stack or toolchain could be beneficial. 

Understanding how the production environment works, how the software or app is built and served to your customers, etc., could be beneficial.

It also sounds like you are missing a chance to work on a machine learning project. So perhaps it's a good time to talk to your manager and ask for such a project.

Think about from their perspective why you should working on this ML project, how it can benefit them and your team before talking to your manager.

If that is not an option, I recommend picking a machine learning competition on Kaggle and brushing up your ML skills outside of work.


Question

Hey, Thanks for doing this. Although I was never a full data guy, I worked quite a lot with SQL (Oracle, MySQL, Postgres), dashboards, python & bash, etc. But for the last three years, I have been a Product Owner/Business Analyst, but I realize I want to go back to do more tech things.

How do I go back without taking a pay cut (I can't afford it due to family reasons)? What would be your advice? Which skills are missing?

Answer

Hi, thanks for your question.

First of all, in the united states, usually engineering jobs (data engineer, software engineer) get paid better than data scientist/analyst.

This does not include machine learning scientists or hardcore AI researchers. Those roles are paid much higher, seven figures if you are outstanding. 

If you are switching jobs from a data analyst role to a data engineer or a software engineer role, you will likely get a better salary.

Lastly, if you are going to stay at your current work but just wanted to get some hands-on experience with databases, why not ask your manager? Then, make a project proposal, etc. 

But don't say it loud that it is just because you want to do those projects. Instead, think about it from your manager or company's perspective, why you doing this project will help them.


Question

First of all, I love this site (sqlpad.io). It helped me a lot. Secondly, what capabilities does a proficient SQL user have in your experience? I am strictly talking about the SQL skillset. Thanks

Answer

Thanks for your question.

From a data analyst/scientist perspective, your SQL skills are probably quite proficient if you know all the following without much googling.

Basic stuff

  1. SELECT, WHERE, LIMIT for filtering results
  2. COUNT(DISTINCT), SUM, AVG, MAX, GROUP BY, HAVING for aggregation
  3. Lightweight regex using LIKE 
  4. Timestamp related functions such as datediff
  5. INNER JOIN, OUTER JOIN (LEFT JOIN), SELF JOIN
  6. Handling missing data such NULL using COALESCE
  7. Subqueries and CTEs, temporary tables

More advanced

  1. Window functions like ROW_NUMBER, RANK, LAG/LEAD
  2. Data transformation using CASE WHEN
  3. Query optimization
  4. Indexing (which columns to add and which not, and what type of index to add);

Also, a few data engineer related topics that I found useful (usually not required for pure data analytics/science roles) 

  1. Develop ETL scripts to automate the data pipeline with airflow or cron jobs;  
  2. UPSERT (update/insert);
  3. Store procedures/ user-defined functions;
  4. Database/data warehouse design (star schema, snowflake);
  5. Primary keys, foreign keys, constraints
  6. Sharding (how to distribute data evenly if in a cluster)

Question

Why is SQL so important for a data scientist career? 

Answer

If you ask any data scientist, they will probably tell you 90% of their time is spent on data processing/munging. 

The success of your analytics results, insights, and quality of your model depends on the quality of your data.

Take a machine learning modeling project, for example:

The overall data process from raw data to clean, ready-to-use data usually involves the following steps:

  1. Data acquisition.
    1. Talking to domain experts and identify the source of the data, understand how the data is generated, if it is of high quality (machine-generated vs. manually entered);
  2. Data Preprocessing
    1. Remove or impute missing data, extract features from textual or categorical data, normalize some data, split the data into training vs. testing, down/upsampling, etc.
  3. Data Postprocessing
    1. Sanity check to make sure there are no apparent mistakes were introduced in previous steps;
    2. Remove outliers or special cases;

And you will likely need to use SQL in every single step! 

Now you are convinced SQL is essential for your data science career, how about start learning it and sign up today?

 

Interview Prep

Begin Your SQL, Python, and R Journey

Master 230 interview-style coding questions and build the data skills needed for analyst, scientist, and engineering roles.

Related Articles

All Articles