Data Engineering#
2023 - A year in review
On the 30th of November 2022, I created my ChatGPT account and started trying out the capabilities of ChatGPT, which back then represented a major disruption of what we knew as chatbots. I remember trying out in the first days the capabilities of the chatbot, mainly focusing on what we would now call "reasoning" capabilites. I tested out its flexibility, it's understanding of instructions and its ability to accomplish tasks.
No errors raised for overflowing data when using CHAR and VARCHAR data types in SQL Server
It’s not always possible to predict when problems will occur with your database’s architecture and it’s always a shock when it does, especially because most of the unexpected problems occur after a deploy to production (in this scenario I’ve inherited the database, I wasn’t the one to design it, or at least not this failing part).
The latest problem I’ve faced was with data which was 9 characters long attempted to be stored into a CHAR(8) column. Have a look and test the below script to see what happens:
Finding locked tables in SQL Server
Sometime this week I executed a DELETE query with an explicit BEGIN TRANSACTION on our DEV environment, the moved to another session to check some other results and forgot the transaction hanging. A few minutes later I left for home and by the time I was close to home a developer called me on my personal phone.
He was very stressed saying that he can’t do a SELECT TOP 1 * FROM
OurBigTable and what can he do about it? It was at that point that I figured that I forgot the transaction open, without any COMMIT or ROLLBACK executed.
Split string in SQL Server with Recursive CTE
I decided to spend my time productively while I was monitoring a database restore, so I wanted to see if I could create a recursive CTE that could do a split string in SQL Server into words (or tokens), based on a certain delimiter.
And so I built the recursive CTE you can see next. Because I’m not 100% sure if the order of the words will be consistent throughout multiple executions I also added a “LVL” column to enforce the order of the rows, so that you can be 100% certain that the way the result is displayed is the actual order of the words in the phrase.
Simplifying LIKE conditions in WHERE clause
Let’s say you have a query with a long WHERE clause in which you’re checking if a set of characters exist in a column and the only way you’re thinking of doing this is with a set of LIKE conditions in your where clause, like below:
My year in Big Data, ML and AI
Although I’ve made this blog live a couple of months ago and it has a Big Data section, I didn’t have the time to work with technologies from this field very much, especially in the latter part of the year.
Even though I didn’t manage to test these technologies and implement them in a “homework” project, I’ve tried to keep myself up to date with things happening in this Big Data and Machine Learning ecosystem. And so, a small review of what small steps I managed to take this year:
How to find queries while tracing in SQL Server
Let’s say you’re working on a DEV database and you’re trying to get some information from SQL Server Profiler about the number of READS or WRITES that your queries are making and / or CPU time spent or other metrics. On a database with a lot of users running queries on the DB at the same time this can be difficult. But luckily there’s an easy fix to find those queries.