Next videos in this series:
Automate ETL Pipeline (Airflow): ruclips.net/video/eZfD6x9FJ4E/видео.html
ETL Load Reference Data: ruclips.net/video/W-8tEFAWD5A/видео.html
ETL Incremental Data Load (Source Change Detection): ruclips.net/video/32ErvH_m_no/видео.html&t
ETL Incremental Data Load (Destination Change Comparison): ruclips.net/video/a_T8xRaCO60/видео.html
How to connect to SQL Server via Python: ruclips.net/video/zdezE6TWSQQ/видео.html&t
How to connect to Postgres via Python: ruclips.net/video/h-u1ML-OWok/видео.html
But your table data types are text by default. How do you dynamically define them?
My kind of guy! No frills or thrills, just straight to the point.
I am fortunate to find this channel on the RUclips ocean.... Can't thank you enough mate🙏🙏
Thank you for the demonstration. I have been looking for this kind of content/demonstration for a long time. I had always heard about "ETL" but never had a chance to see how it works in reality.
So easy tutorial with clear insight. Please keep every tutorial so easy !
I'm not sure how I found your site, but holy cow, this is just fantastic. I appreciate all the work you have put into this, and I will enjoy learning from your examples!!!
Glad you found the channel and thanks for the kind words. Welcome to the community. Happy coding!
Fantastic videos, short and to the point! Great work, thank you for sharing!
Fantastic! Thanks a lot for all these series of videos
Such an easy to follow video, fantastic tutorial!
I am an aspiring data engineer; this video is really helpful.
Thanks very much for your invaluable contribution towards eternal learning process.
Straight to the point !!! Love it I’m subscribing
Thank you for this. Please continue to make these informative videos; you are a very good illustrator.
Lovely. Short, crisp, and to the point. Amazing video.
Glad you liked it!
Password as an environment variable is an absolute game changer.
Thanks
Exactly what I was looking for.Very well done! Thank you
great information, thankyou
Fantastic stuff, you really filled in some unknown gaps for me. Thanks.
Great tutorial, thanks for the information.
Thanks for such an informative video
Great job, keep going 👍
great great video
Excellent presentation. Keep it up.
Perfect explanation. 👍👍
I'm impressed!
Awesome! Thank you so much for sharing your knowledge. Please keep it up.
Very useful
Thanks for sharing 😀
Helpful. Thank you for this video
Is there a possibility to only fetch records which have been modified on the source, and update only those on the destination side?
Yes, you can achieve this using the Source Change Detection technique. I have covered the incremental data load in the following videos. Feel free to check them out.
ruclips.net/video/32ErvH_m_no/видео.html&t
ruclips.net/video/a_T8xRaCO60/видео.html
Great Video
Glad you enjoyed it
Nice vid 🙏🏾
thank you very much for this!!!
Best tutorial on YouTube. Do you have any courses?
Great video. Congrats
AMAZING
Top tier stuff, this!!!
Thank you so much for the insight process
In this same flow, is it possible to use an Excel file as a data source, load the data into the database, and then showcase it to a frontend as a report?
Of course you can. Here is a video on how to load data from excel file(s) to a database.
ruclips.net/video/W-8tEFAWD5A/видео.html
@@BiInsightsInc do you provide consultancy? I have a few doubts regarding Postgres and ETL.
Such a good video. Super good explanation. Do you have the same kind of video but for ETL to BigQuery?
Thanks. I have covered big query on the following video:
ruclips.net/video/Nq8Td8h_szk/видео.html
Hello! That's a great explanation, thanks! Please tell me how the data transfer is carried out? Are we using the RAM of the server where Python is installed or are we using the RAM of the server where PostgreSQL is installed? I want to understand if this scenario is suitable if there is a table with 30 million rows on the SQL Server side?
Hey there, in this case we utilized Pandas, and it loads the data into the memory of the server where Python is installed. So you would need to make sure the data fits in the server's memory, load it in batches, or use a chunking strategy (a sketch below). Hope this helps.
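For example, a minimal sketch of the chunking strategy, assuming the src_conn, tbl and engine variables from the video's script (the 100,000-row chunk size is illustrative):

import pandas as pd

# stream the table in 100k-row chunks instead of one big DataFrame
for i, chunk in enumerate(pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn, chunksize=100000)):
    # replace the staging table on the first chunk, append the rest
    chunk.to_sql(f'stg_{tbl[0]}', engine, if_exists='replace' if i == 0 else 'append', index=False)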
Thank you for the video!
I would appreciate it if you could answer my question. Why is this method better than the traditional approach with ETL tools like SSIS, IBM DataStage, or SAP Data Services? What can I do with Python ETL that I can't with other tools? Could you please give me some examples?
Glad you like the content. Oh boy, where to start… anyone who has dealt with traditional tools, mapping columns manually, casting data formats to the tool's own formats, and picking up new updates in the source, will tell you that this solution handles all of these challenges gracefully! Try developing a similar solution in one of those tools and you'll see!
great stuff thanks
Why did we use tbl[0]? In:
for tbl in src_tables:
    #query and load save data to dataframe
    df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
    load(df, tbl[0])
Shouldn't we use just tbl for the current reference?
Hi Tejas, you can use the above approach, but you'd need to perform a further action to get the actual value. "tbl" is a pyodbc.Row and not a straightforward list. The Row object in pyodbc behaves like a combination of tuple, list, and dict objects.
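For example, assuming the cursor query from the video (which aliases the column as table_name), both of these return the table name:

for tbl in src_tables:
    print(tbl[0])          # access by position, like a tuple
    print(tbl.table_name)  # pyodbc Rows also expose columns as attributes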
Nice video! I have a question, how can we optimize the ETL process while creating extract and load methods in notebook itself?
That’s too broad of a question. Is there a specific area you want to optimize?
Here are some broad tips to optimize an ETL pipeline (see the sketch after this list). Hope this helps.
* Eliminate unnecessary database reads/writes.
* Cache the data.
* Use parallel processing (also look into Papermill).
* Filter unnecessary datasets.
* Integrate only what you want.
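Here is a minimal sketch of the parallel processing tip, assuming the conn_str, src_tables and load() from the video's script; each worker opens its own connection because pyodbc connections should not be shared across threads.

from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import pyodbc

def etl_one_table(table_name):
    conn = pyodbc.connect(conn_str)  # per-thread connection
    try:
        df = pd.read_sql_query(f'select * FROM {table_name}', conn)
        load(df, table_name)
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=4) as pool:
    # extract and load the tables concurrently; list() surfaces any worker errors
    list(pool.map(etl_one_table, [t[0] for t in src_tables]))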
Fantastic video. Highly appreciated!
On request, please make a video on real-time Data Engineering projects! There is no good content on RUclips.
I'm glad you liked it. I will try and come up with a real-time scenario. If you're interested in data streaming (continuous data flow), then check out my video on Kafka: ruclips.net/video/gPvwvkCVSnY/видео.html
Fantastic! Thank you so much.
If my destination is dimension and fact tables, do I need to write a separate method per table, or is there another way?
Yes, you need to perform subsequent transformation steps after extracting the data to shape it to fit your target tables.
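As a rough illustration, assuming df_products and df_sales are raw extracts and reusing the load() from the video (the column names here are illustrative, not the video's actual transform):

# build the dimension: unique products only
dim_product = df_products[['ProductKey', 'EnglishProductName', 'Color']].drop_duplicates()

# build the fact: keep only sales rows that match a known product key
fact_sales = df_sales.merge(dim_product[['ProductKey']], on='ProductKey', how='inner')

load(dim_product, 'DimProduct')
load(fact_sales, 'FactInternetSales')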
for tbl in src_tables:
    #print(tbl)
    df = pd.read_sql_query(f''select * from {tbl[0]}', src_conn)

Hi, I'm getting invalid syntax on the df line.
I may not be using the f-string correctly, but I'm looking for ideas.
Thank you
Morgan
Hi Morgan, it seems you have two quotes after the f: f''select * from {tbl[0]}', src_conn
Remove one of the quotes and you should be good. Here is the original code:
for tbl in src_tables:
    #print(tbl)
    df = pd.read_sql_query(f'select * from {tbl[0]}', src_conn)
Hi,
Thank you for the note!
I'll give it a try tomorrow AM.
Great course
Morgan
Hi again,
I used your code from github and changed the name of the server
server = "LAPTOP-8SESKAVH\SQLEXPRESS" .
I'm still getting:
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections
I did substitute in AdventureWorksDW2019
Not sure what is going on
Morgan
Morgan's Attempt
# -*- coding: utf-8 -*-
"""
Created on Sat Nov 5 14:34:16 2022
@author: mtman
"""
#import needed libraries
from sqlalchemy import create_engine
import pyodbc
import pandas as pd
import os

#get password from environment var
pwd = os.environ['PGPASS']
uid = os.environ['PGUID']

#sql db details
driver = "{SQL Server Native Client 11.0}"
server = "LAPTOP-8SESKAVH\SQLEXPRESS" #"haq-PC"
database = "AdventureWorksDW2019;"

#extract data from sql server
def extract():
    try:
        src_conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + '\SQLEXPRESS' + ';DATABASE=' + database + ';UID=' + uid + ';PWD=' + pwd)
        src_cursor = src_conn.cursor()
        # execute query
        src_cursor.execute(""" select t.name as table_name
            from sys.tables t where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales') """)
        src_tables = src_cursor.fetchall()
        for tbl in src_tables:
            #query and load save data to dataframe
            df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
            load(df, tbl[0])
    except Exception as e:
        print("Data extract error: " + str(e))
    finally:
        src_conn.close()

#load data to postgres
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(f'postgresql://{uid}:{pwd}@{server}:5432/AdventureWorks')
        print(f'importing rows {rows_imported} to {rows_imported + len(df)}... for table {tbl}')
        # save df to postgres
        df.to_sql(f'stg_{tbl}', engine, if_exists='replace', index=False)
        rows_imported += len(df)
        # add elapsed time to final print out
        print("Data imported successful")
    except Exception as e:
        print("Data load error: " + str(e))

try:
    #call extract function
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))

@@BiInsightsInc
@@mtmanalyst make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Did you create this user in your environment? If not, you will need to create it. Here is the script for it. Hope this helps.
github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/SQL%20Scripts/SQL%20Server/create%20etl%20login%20and%20role%20-%20SQLServer.sql
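A minimal connection test could look like this (driver, server and credentials are illustrative; adjust them to your setup):

import pyodbc

conn = pyodbc.connect(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=localhost\\SQLEXPRESS;'
    'DATABASE=AdventureWorksDW2019;'
    'UID=etl;PWD=demopass'
)
# if this prints the server version, the connection works
print(conn.cursor().execute('SELECT @@VERSION').fetchone()[0])
conn.close()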
Thank you
I'll get on this.
Morgan
Very good explanation. Do you have any tutorial on extracting data from an ERP?
Thank you so much...!
Hi bro, very informative session. I am working on an ETL QA framework using PySpark, and for that I have to create a directory structure in PyCharm. Is there any reference video you have created? Please share.
I have covered Pytest recently as a testing framework for data engineering pipeline. You can check out videos on that topic. If you are using Pytest then you can use the following folder structure. Here is the docs for Pytest: docs.pytest.org/en/7.1.x/explanation/goodpractices.html
pyproject.toml
src/
    mypkg/
        __init__.py
        app.py
        view.py
tests/
    test_app.py
    test_view.py
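A minimal test file under that layout could look like this, assuming a hypothetical transform() function in mypkg/app.py:

# tests/test_app.py
import pandas as pd
from mypkg.app import transform  # hypothetical function under test

def test_transform_drops_duplicates():
    df = pd.DataFrame({'id': [1, 1, 2]})
    result = transform(df)
    assert result['id'].is_unique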
Why not use the SQL functions in pandas?
You can use SQL functions. However, here I am only extracting and loading data; therefore, there is no need for them in this context.
Thanks a lot !
Hello, Sir! Thank you for the great learning material! But I would like to ask for help in running the ETL Python script. I'm getting an error that says 'Login failed for user etl'. Could this be because of permissions?
Thanks. You will need to create the 'etl' user in both of the databases. Here are the SQL scripts to create them.
github.com/hnawaz007/pythondataanalysis/tree/main/ETL%20Pipeline/SQL%20Scripts
Also, here is the basic video on how to test your connection via Python using the user/password.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
Thank you for the reply, Sir! Apparently I had to change my authentication method and it solved the problem. Thank you for the additional video as well! @@BiInsightsInc
Data sources are SAP HANA and SQL Server. My target table is in the same SQL Server. How can I perform an upsert? I do have a primary key on the target table, but it is possible that other columns might update in the future. Please help.
Hi Shrutika, you can use the following video as a guide. In this video we perform an upsert based on a primary key column. Happy coding: ruclips.net/video/a_T8xRaCO60/видео.html
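Since your target is SQL Server, one common pattern is to stage the DataFrame and then run a T-SQL MERGE keyed on the primary key. A rough sketch, assuming an engine pointing at the target server (table and column names are illustrative):

from sqlalchemy import text

# stage the incoming rows next to the target table
df.to_sql('stg_customer', engine, if_exists='replace', index=False)

merge_sql = text("""
    MERGE dbo.customer AS tgt
    USING dbo.stg_customer AS src
       ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN
        UPDATE SET tgt.name = src.name, tgt.city = src.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, city)
        VALUES (src.customer_id, src.name, src.city);
""")
with engine.begin() as conn:  # begin() commits on success
    conn.execute(merge_sql)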
What is the ETL tool you used for data ingestion and extraction?
I am using Python to extract and load data. The IDE used is Pycharm.
@@BiInsightsInc can you make a video about the tools and programming required for a data engineer?
@@avinash7003 I have covered the Data Engineering role in the following video. Tools can vary depending on the tech stack the company is using, but I will do a broader video on the base requirements for a Data Engineering role.
ruclips.net/video/fwkLcp8dbic/видео.html
Really great content!!!! Can you help me with the last step? I am having trouble with the last step, Test ETL Pipeline. When using cmd on my C: drive it says: "'\postgres' is not recognized as an internal or external command, operable program or batch file." But when using D: it's "Access is denied." So what can I do to finish the last step? Thank you!!
This is a very common error: "is not recognized as an internal or external command". It comes up when the command prompt does not know the location of Python or the script you are executing.
First, make sure the executable is actually installed. If yes, continue with the rest; if not, install it first.
If you are attempting to run an executable from cmd.exe, then you need to tell cmd.exe where the file is located (either add its folder to your PATH or call it with the full path).
I also have the same problem. I have Python installed and gave the directory, but it doesn't work. What should I do?
Any suggestions or available code which can convert the compatible data types from MySQL to Postgres? Please respond.
Yes, there are various tools out there that will convert MySQL to Postgres syntax. There is an online tool you can use that allows you to select the source and target. Then there is the mysql2postgres tool. If you're dealing with table DDL and want to convert MySQL data types to Postgres, you can write a simple script to swap the data types.
www.sqlines.com/online
github.com/maxlapshin/mysql2postgres
Hi, it was a wonderful tutorial; it helped me a lot in understanding the ETL concept.
But I'm struggling to follow it, starting from the environment variables. I can't find variables like the ones in your video on my laptop. I hope you can explain how to create those variables, or maybe how to see the UID without opening the environment variables...
You can define the system variables under System > Advanced > Environment Variables. I show the variables and their content at 4:10. They contain your database username and password.
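For example, on Windows you can create them once from a command prompt and then read them in Python (the names follow the video; setx only affects new sessions, so open a fresh terminal afterwards):

# created once in cmd.exe, e.g.:
#   setx PGUID "etl"
#   setx PGPASS "your_db_password"
import os

uid = os.environ['PGUID']   # database username
pwd = os.environ['PGPASS']  # database password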
Amazing video. I followed everything, but I had this error and could not find a solution:
Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0)')
Error while extracting data: local variable 'src_conn' referenced before assignment
Hi Nabil, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Hope this helps.
Hi, please, is there any video on how to change my directory to C:\postgres>?
Thanks
You can type the following command in command prompt: cd c:\postgres
So cool!! Where does the transform happen?
Thanks. The transform step happens after the extract step. I have a few examples of transformations in the following video. ruclips.net/video/eZfD6x9FJ4E/видео.html
@@BiInsightsInc thank you!
Does this also work if the ODBC Driver is used?
Yes, you can use the ODBC driver to connect to SQL Server; therefore, you can use this approach to read and load data.
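Only the DRIVER value in the connection string changes. For example, assuming "ODBC Driver 17 for SQL Server" is installed and the etl user from the video:

import os
import pyodbc

uid = os.environ['PGUID']
pwd = os.environ['PGPASS']
src_conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'  # swapped-in ODBC driver name
    'SERVER=localhost\\SQLEXPRESS;'
    'DATABASE=AdventureWorksDW2019;'
    'UID=' + uid + ';PWD=' + pwd
)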
Hello, I am getting the following error.
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment
driver is installed and I have done the other troubleshooting steps found in the previous comments.
Please checkout the following video. It goes over the connection setup and common errors.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
In addition, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
@@BiInsightsInc I am getting another error. I even tried copying the script directly from your repo, changing only the server variable, and got:
Data extract error: ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ':'. (102) (SQLExecDirectW)")
thankyou bro.
Really nice explanation. However, I'm getting the following error:
27: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Any leads to tackle the issue?
Thanks. You need to install the 'psycopg2' module in your environment. Also, we have updated the code to use SQLAlchemy to get rid of the future warning you're seeing in the code. Here is the link: github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
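In short: install the Postgres driver and route the reads through a SQLAlchemy engine. A sketch along these lines (server, user and password are illustrative):

# pip install psycopg2-binary sqlalchemy
from sqlalchemy import create_engine
import pandas as pd

# reading through a SQLAlchemy engine also silences the pandas DBAPI2 warning
src_engine = create_engine(
    'mssql+pyodbc://etl:demopass@localhost/AdventureWorksDW2019'
    '?driver=ODBC+Driver+17+for+SQL+Server'
)
df = pd.read_sql_query('select * FROM DimProduct', src_engine)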
@@BiInsightsInc that works but now I'm getting a new error:
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.errors.InsufficientPrivilege) permission denied for schema public
LINE 2: CREATE TABLE "stg_DimProduct" (
^
[SQL:
CREATE TABLE "stg_DimProduct" (
"ProductKey" BIGINT,
"ProductAlternateKey" TEXT,
"ProductSubcategoryKey" FLOAT(53),
"WeightUnitMeasureCode" TEXT,
"SizeUnitMeasureCode" TEXT,
"EnglishProductName" TEXT,
"SpanishProductName" TEXT,
"FrenchProductName" TEXT,
"StandardCost" FLOAT(53),
"FinishedGoodsFlag" BOOLEAN,
"Color" TEXT,
"SafetyStockLevel" BIGINT,
"ReorderPoint" BIGINT,
"ListPrice" FLOAT(53),
"Size" TEXT,
"SizeRange" TEXT,
"Weight" FLOAT(53),
"DaysToManufacture" BIGINT,
"ProductLine" TEXT,
"DealerPrice" FLOAT(53),
"Class" TEXT,
"Style" TEXT,
"ModelName" TEXT,
"LargePhoto" TEXT,
"EnglishDescription" TEXT,
"FrenchDescription" TEXT,
"ChineseDescription" TEXT,
"ArabicDescription" TEXT,
"HebrewDescription" TEXT,
"ThaiDescription" TEXT,
"GermanDescription" TEXT,
"JapaneseDescription" TEXT,
"TurkishDescription" TEXT,
"StartDate" TIMESTAMP WITHOUT TIME ZONE,
"EndDate" TIMESTAMP WITHOUT TIME ZONE,
"Status" TEXT
)
Any workaround for this?
Nvm, I managed to solve the issue. It seems this video alone is not enough for the whole migration; redirection to the other tutorials is needed. Still, I think you have done an amazing job. Thanks!
Thank you!
I'm having a new problem: Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 18 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [67]. (67) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 18 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to CALVIN/SQLEXPRESS. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (67)')
What do I do?
I have covered the common SQL Server connection issues in the video here. With so many questions on this topic I had to cover it separately.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
THIS IS AMAZING, THANKS!!!
HELP!!!!
1) I did the same thing and it got imported into Postgres, but I need the column data types to be in a specific format, for example DATETIME, VARCHAR, DOUBLE, INT, not these text, bigint, etc. Please suggest what I should do!!
2) The data is the same but the data types are different. I tried changing the data type in the DataFrame and created custom columns with a CREATE SQL query, but it doesn't matter, as pd.to_sql always replaces that table and creates a new one.
So what should I do?
Thank you
Hi Arin, you can define data types for each of the columns in your dataframe in a dictionary and set the data type for each column while importing. Here is an example to get you started.
type_dict = {'Col_A': 'category', 'Col_B': 'int16', 'Col_C': 'float16', 'Col_D': 'float32'}
df = pd.read_csv(myfile, dtype=type_dict)
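To control the column types created on the Postgres side specifically, to_sql also accepts a dtype mapping of SQLAlchemy types. A sketch with illustrative columns:

from sqlalchemy.types import VARCHAR, INTEGER, FLOAT, DateTime

# the dtype mapping controls the DDL pandas generates, so the target table
# gets VARCHAR/INTEGER/etc. instead of the text/bigint defaults
df.to_sql('stg_DimProduct', engine, if_exists='replace', index=False,
          dtype={'EnglishProductName': VARCHAR(100),
                 'SafetyStockLevel': INTEGER,
                 'ListPrice': FLOAT,
                 'StartDate': DateTime})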
This is cool! But isn't this more of an Extract and Load process than an Extract, Transform and Load process? Most of the time while creating the data marts, the transform scripts are long and hectic. Also, in the same scenario the load procedure is mainly an insert query of records rather than tables. Nevertheless, good video!
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
Thanks! Subscribe and like buttons done.
Hi friend, I have a problem. The SQL Server name is ENGINEERDATA\ENGDATA, and with the variable declared = 'ENGINEERDATA\ENGDATA' in code, when I run the script I always get psycopg2.OperationalError: could not translate host name "engineerdata\ENGDATA" to address: unknown host.
Please help me.
You can concatenate the server and instance as text if the slash is causing an error. I have covered how to connect to SQL Server, and common issues while establishing a connection, in this video here:
ruclips.net/video/zdezE6TWSQQ/видео.html
@@BiInsightsInc thank you so much friend
org.postgresql.util.PSQLException: ERROR: permission denied for schema public. I run the exact function but I encounter this error, please help me :((
Grant your user permission on the schema with the following statement.
GRANT USAGE ON SCHEMA public TO your_user;
@@BiInsightsInc oh, I have fixed the error: in the target URL you have declared the user and password, but in the load() function you were still using the user id and password option.
@@vandc1684 That's great. There are two databases (source and target) and we connect to both; therefore, two connections.
@@BiInsightsInc I know, thanks for your pj, have a nice day
Could you please explain the part below, as I am getting an error like: mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'sakila.tables' doesn't exist.
What are t and sys here? I am using MySQL.
src_cursor.execute(""" select t.name as table_name
    from sys.tables t where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales') """)
Hey Siva, t is the alias for the tables and sys is the system schema of SQL Server. You can find the sys equivalent in MySQL and query the table info.
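For example, the MySQL equivalent queries information_schema instead of sys (the sakila schema and table names here are illustrative):

src_cursor.execute("""
    select table_name
    from information_schema.tables
    where table_schema = 'sakila'
      and table_name in ('actor', 'film', 'customer')
""")
src_tables = src_cursor.fetchall()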
@@BiInsightsInc Thanks for the reply sir let me check
Hi, this video was useful. Can you make a video on data migration from MongoDB to MySQL using Apache Airflow, with an example?
Wonderful, bro.. but how do we trigger this code automatically on some condition? Please make a video for scheduling as well, using a shell script, etc.
I have made a few videos on how to schedule or trigger your Python ETL scripts. You can schedule them via the Windows Task Scheduler, Airflow, or Dagster. Feel free to check them out.
ruclips.net/video/t8QADtYdWEI/видео.html&t
ruclips.net/video/eZfD6x9FJ4E/видео.html&t
ruclips.net/video/IsuAltPOiEw/видео.html&t
Please make a tutorial on SQL Server to BigQuery with Airflow.
Hi Irfan, I am planning to do more videos on Airflow. Will cover Airflow to BigQuery as well. Stay tuned.
@@BiInsightsInc thank you sir, I'll wait for that.
I encountered this error while running the file: raise KeyError(key) from None at code 'uid' and 'pwd'. Could you help me with this? Also a very nice video! Thank you
Hi Dyu, the script is not able to find the uid and pwd variables. If you don't have these as environment variables, then simply declare them in your script, i.e. uid = "username". Same for pwd, which is the database password.
@@BiInsightsInc Thanks for your help, I have done with that Error. And I have this new one: Data extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Error while extracting data: local variable 'src_conn' referenced before assignment.
I have tried changing between SQL Server Native Client 11, SQL Server, and ODBC Driver 17 for SQL Server, but none of them helped me with this bug. Thank you!
@@duynguyenduc1255 this is one of the common issues. I have covered the causes in this video below. You need to debug your SQL Server connection prior to attempting the etl pipeline. Happy coding.
ruclips.net/video/zdezE6TWSQQ/видео.html
Hi bro, the video is so cool.
I have a problem, can you help me? I installed msodbcsql. Thank you so much.
Extract error:('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Data Extract Error: 'src_connect' is not defined
I have covered how to connect to a SQL Server database using Python. This is a common question that comes up in the ETL series. So, I decided to cover it and direct viewers to it if they are facing this issue: ruclips.net/video/zdezE6TWSQQ/видео.html
@@BiInsightsInc Hi Friend, I followed the video and the data appeared on Jupyter, but it gave me the new error: Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Error while extracting data: name 'src_conn' is not defined.
Help me !!
Hello sir, I got an error and I don't have any clue about it after trying other alternatives.
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment
I really appreciate your help!!!!
Hi Noor, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in firewall to allow connections to SQL Server port 1433. Hope this helps.
@@BiInsightsInc thank you so much for your speedy reply...I will try it and get back to you later.😄
@@BiInsightsInc After executing all the methods, I got another error:
c:\Users\User\Desktop\ETL\etl.py:28: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.OperationalError) connection to server at "LAPTOP-3C9TIKCE" (fe80::d8c0:aa56:7702:d5b5), port 5432 failed: FATAL: no pg_hba.conf entry for host "fe80::d8c0:aa56:7702:d5b5%19", user "etl", database "AdventureWorks", no encryption
@@nooraliikhwan8139 you need to edit the pg_hba.conf file on your machine. On Windows it is located in the following directory: C:\Program Files\PostgreSQL\14\data. Add the entries below, or whichever ones are missing on your PC.
# TYPE  DATABASE  USER  ADDRESS       METHOD
# "local" is for Unix domain socket connections only
local   all       all                 md5
# IPv4 local connections:
host    all       all   127.0.0.1/32  trust
host    all       all   0.0.0.0/0     trust
# IPv6 local connections:
host    all       all   ::1/128       trust
host    all       all   ::/0          trust
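After saving pg_hba.conf, restart the PostgreSQL service and verify the connection. A minimal sketch (credentials are placeholders; the host is taken from the error above):
import psycopg2

# Quick connectivity check after editing pg_hba.conf and restarting Postgres.
conn = psycopg2.connect(
    host="localhost",   # or your machine name, e.g. LAPTOP-3C9TIKCE
    port=5432,
    dbname="AdventureWorks",
    user="etl",
    password="demopass",  # placeholder
)
print(conn.get_dsn_parameters())
conn.close()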
thanks 🙂
Hi sir,
Can you make a video on how to migrate data from SQL Server to Snowflake?
Hello Motamarri, thanks for stopping by. Will try and cover Snowflake ❄️ soon.
I followed the tutorial step by step and I am facing the following error. Even Googling couldn't help. Please help me.
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Hi Dan, it looks like you're not able to connect to SQL Server. Make sure SQL Server is running and that you're able to connect to it via SQL Server authentication. Also, you need a SQL Server driver installed:
www.microsoft.com/en-us/download/details.aspx?id=36434
I'd suggest testing your connection first to make sure you're able to connect.
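Along those lines, a minimal connection test (server, database, and credentials are placeholders) surfaces the real cause before the pipeline runs:
import pyodbc

# Fail fast with a readable error instead of failing inside the ETL loop.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost\\SQLEXPRESS;DATABASE=AdventureWorksDW2019;"
    "UID=etl;PWD=demopass;",
    timeout=5,  # seconds to wait for login before giving up
)
print(conn.cursor().execute("SELECT @@VERSION").fetchone()[0])
conn.close()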
@@BiInsightsInc I am very new to SQL Server; I am trying to learn ETL coming from a MySQL background. Please make a playlist on ETL and ELT.
@@DhanunjayaSrisailamTTT Will try and cover ETL with modern tools. If you want to set up SQL Server and Postgres, I have covered the installation here:
ruclips.net/video/e5mvoKuV3xs/видео.html&t
ruclips.net/video/fjYiWXHI7Mo/видео.html
importing rows 0 to 10... for table
data load error: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")... (I am using sql server)
You are selecting from "sqlite_master", which is not the system schema in SQL Server. SQL Server has a "sys" schema, and the table that stores table information is called "tables". Please refer to the repo and copy the script from there.
github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
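For example, the table list can come from SQL Server's catalog along these lines (a sketch; src_conn is the source connection from the video, and the table names are examples):
import pandas as pd

# SQL Server stores table metadata in the sys schema, not sqlite_master.
query = """
    SELECT t.name
    FROM sys.tables t
    WHERE t.name IN ('DimProduct', 'DimProductSubcategory', 'DimProductCategory')
"""
src_tables = pd.read_sql_query(query, src_conn)["name"].tolist()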
@@BiInsightsInc thank you
@@BiInsightsInc
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
        con = engine.connect()
        print(f'importing rows {rows_imported} to {rows_imported+len(df)}... for table {tbl}')
        df.to_sql(f'sys_{tbl}', con, if_exists='replace', index=False)
        rows_imported += len(df)
        print("Data imported successfully.")
    except Exception as e:
        print("data load error: " + str(e))

try:
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))
These are my configs for loading to SQL Server but I'm still getting the error.
@@skipa9906 your connection details do not seem to be complete. If you run only the following line: create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
it will throw an error, because create_engine expects a SQLAlchemy database URL rather than a raw ODBC connection string.
Here is what I used to successfully connect and persist data to SQL Server:
from sqlalchemy import create_engine

# user and password
pwd = 'demopass'
uid = 'etl'
# sql db details
dr = "SQL Server Native Client 11.0".replace(" ", "+")  # URL-encode the spaces in the driver name
srvr = r"localhost\SQLEXPRESS"  # raw string so the backslash is kept literally
db = "AdventureWorksDW2019"
engine = create_engine(f"mssql+pyodbc://{uid}:{pwd}@{srvr}:1433/{db}?driver={dr}")
df.to_sql("customer_retention", engine, if_exists='replace', index=False)
@@BiInsightsInc thank you. I found a solution similar to yours. Great videos, I'm enjoying them.
txs man!!!
Where is the transform part?
I have done a follow-up to this with transformations in the following video: ruclips.net/video/eZfD6x9FJ4E/видео.html
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Hi Anthonius, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. Hope this helps.
@@BiInsightsInc how to make sure SQL Server Native Client Installed in our machine ?
@@isbakhullail6693 here is a Stack Overflow link that shows how to determine if SQL Server's client is installed:
stackoverflow.com/questions/10499643/check-if-sql-server-client-is-installed
@@BiInsightsInc thanks sir
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1)
(SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Error while extracting data: local variable 'src_conn' referenced before assignment
Extracting and loading are the easiest parts. There's no data transformation in this video.
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
Hey, great video. I was wondering, would this be a good guide for a basic pipeline project for job interviews in Data Analysis/Engineering? Thank you.
Yes, this can be a good example of a basic pipeline. However, I'd advise going with the following video, as it presents the complete picture of ETL (Extract, Transform and Load).
ruclips.net/video/eZfD6x9FJ4E/видео.html
4:17 you have your gmail password exposed
Thanks for pointing that out. Much appreciated. It has been updated :)
Sometimes ints will auto-convert to floats in pandas and cause an error.
Please share an example of this scenario.
@@BiInsightsInc When reading data from DB2 and inserting into Postgres, if a pandas DataFrame is used in the middle to hold the result from DB2 before inserting into Postgres, the ints from DB2 auto-convert to floats.
@@SMCGPRA yes, pandas can convert numbers to float. You can convert them back in your transformation layer. So what error does it cause?
@@BiInsightsInc Casting back to int solves it, but it's an unnecessary step.
You can declare data types for each column and provide them to pandas, and it will adhere to your data types.
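For example, pandas' nullable integer dtype keeps NULL-containing columns as integers instead of upcasting them to float. A minimal sketch (the table and column names are made up, and read_sql_query's dtype argument needs pandas 1.3+):
import pandas as pd

# "Int64" (capital I) is pandas' nullable integer dtype; it can hold NULLs
# without forcing the whole column to float.
df = pd.read_sql_query(
    "SELECT order_id, quantity FROM orders",
    src_conn,
    dtype={"order_id": "Int64", "quantity": "Int64"},
)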
I still don't see the benefit of this. Why do we need to do it like this if we can extract from the source directly?
I think you need an introduction to ETL and/or Data Engineering and why they are needed in the world of data. I'd say pick up a book on either subject and you will see the benefits. Here is an intro to data engineering:
ruclips.net/video/fwkLcp8dbic/видео.html
Please do this with Airflow, sir.
I have done a video on this topic with Airflow. Feel free to check it out!
How to build and automate your Python ETL pipeline with Airflow | Data pipeline | Python
ruclips.net/video/eZfD6x9FJ4E/видео.html
This is just EL.
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
holy shit
Thank you so much