Next videos in this series:
Automate ETL Pipeline (Airflow): ruclips.net/video/eZfD6x9FJ4E/видео.html
ETL Load Reference Data: ruclips.net/video/W-8tEFAWD5A/видео.html
ETL Incremental Data Load (Source Change Detection): ruclips.net/video/32ErvH_m_no/видео.html&t
ETL Incremental Data Load (Destination Change Comparison): ruclips.net/video/a_T8xRaCO60/видео.html
How to connect to SQL Server via Python: ruclips.net/video/zdezE6TWSQQ/видео.html&t
How to connect to Postgres via Python: ruclips.net/video/h-u1ML-OWok/видео.html
But your table data types are text by default. How do you dynamically define them?
My kind of guy! No frills or thrills, just straight to the point.
I am fortunate to find this channel on the RUclips ocean.... Can't thank you enough mate🙏🙏
Thank you for the demonstration. I have been looking for this kind of content/demonstration for a long time. I had always heard about "ETL" but never had a chance to see how it works in reality.
So easy tutorial with clear insight. Please keep every tutorial so easy !
I'm not sure how I found your site, but holy cow, this is just fantastic. I appreciate all the work you have put into this, and I will enjoy learning from your examples!!!
Glad you found the channel and thanks for the kind words. Welcome to the community. Happy coding!
Fantastic videos, short and to the point! Great work, thank you for sharing!
Fantastic! Thanks a lot for all these series of videos
Such an easy to follow video, fantastic tutorial!
I am an aspiring data engineer; this video is really helpful.
Thanks very much for your invaluable contribution towards eternal learning process.
Straight to the point !!! Love it I’m subscribing
Thank you for this. Please continue to make these informative videos; you are a very good illustrator.
Lovely. Short, crisp, and to the point. Amazing video.
Glad you liked it!
Password as an environment variable is an absolute game changer.
Thanks
Exactly what I was looking for.Very well done! Thank you
great information, thankyou
Fantastic stuff, you really filled in some unknown gaps for me. Thanks.
Great tutorial, thanks for the information.
Thanks for such an informative video
Great job, keep going 👍
great great video
Excellent presentation. Keep it up.
Perfect explanation. 👍👍
I'm impressed!
Awesome! Thank you so much for sharing your knowledge. Please keep it up.
Very useful
Thanks for sharing 😀
Helpful. Thank you for this video
Is there a possibility to only fetch records which have been modified on the source, and update only those on the destination side?
Yes, you can achieve this using the Source Change Detection technique. I have covered the incremental data load in the following videos. Feel free to check them out.
ruclips.net/video/32ErvH_m_no/видео.html&t
ruclips.net/video/a_T8xRaCO60/видео.html
Great Video
Glad you enjoyed it
Nice vid 🙏🏾
thank you very much for this!!!
Best tutorial on YouTube. Do you have any courses?
Great video. Congrats
AMAZING
Top tier stuff, this!!!
Thank you so much for the insight process
In this same flow, is it possible to use an Excel file as a data source, load the data into the database, and then showcase it to a frontend as a report?
Of course you can. Here is a video on how to load data from excel file(s) to a database.
ruclips.net/video/W-8tEFAWD5A/видео.html
@@BiInsightsInc do you provide consultancy? I have a few doubts regarding Postgres and ETL.
Such a good video. Super good explanation. Do you have the same kind of video but for ETL to BigQuery?
Thanks. I have covered big query on the following video:
ruclips.net/video/Nq8Td8h_szk/видео.html
Hello! That's a great explanation, thanks! Please tell me how the data transfer is carried out? Are we using the RAM of the server where Python is installed or are we using the RAM of the server where PostgreSQL is installed? I want to understand if this scenario is suitable if there is a table with 30 million rows on the SQL Server side?
Hey there, in this case we utilized Pandas, and it loads the data into the memory of the server where Python is installed. So you would need to make sure the data fits in the server's memory, load it in batches, or use a chunking strategy (a sketch below). Hope this helps.
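For example, a minimal sketch of the chunking strategy, assuming the src_conn, tbl and engine variables from the video's script (the 100,000-row chunk size is illustrative):

import pandas as pd

# stream the table in 100k-row chunks instead of one big DataFrame
for i, chunk in enumerate(pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn, chunksize=100000)):
    # replace the staging table on the first chunk, append the rest
    chunk.to_sql(f'stg_{tbl[0]}', engine, if_exists='replace' if i == 0 else 'append', index=False)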
Thank you for the video!
I would appreciate it if you could answer my question. Why is this method better than the traditional approach with ETL tools like SSIS, IBM DataStage, or SAP Data Services? What can I do with Python ETL that I can't with other tools? Could you please give me some examples?
Glad you like the content. Oh boy, where to start… anyone who has dealt with traditional tools, mapping columns manually, casting data formats to the tool's own formats, and picking up new updates in the source, will tell you that this solution handles all of these challenges gracefully! Try developing a similar solution in one of those tools and you'll see!
great stuff thanks
Why did we use tbl[0]? In:
for tbl in src_tables:
    #query and load save data to dataframe
    df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
    load(df, tbl[0])
Shouldn't we use just tbl for the current reference?
Hi Tejas, you can use the above approach, but you'd need to perform a further action to get the actual value. "tbl" is a pyodbc.Row and not a straightforward list. The Row object in pyodbc behaves like a combination of tuple, list, and dict objects.
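For example, assuming the cursor query from the video (which aliases the column as table_name), both of these return the table name:

for tbl in src_tables:
    print(tbl[0])          # access by position, like a tuple
    print(tbl.table_name)  # pyodbc Rows also expose columns as attributes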
Nice video! I have a question, how can we optimize the ETL process while creating extract and load methods in notebook itself?
That’s too broad of a question. Is there a specific area you want to optimize?
Here are some broad tips to optimize an ETL pipeline (see the sketch after this list). Hope this helps.
* Eliminate unnecessary database reads/writes.
* Cache the data.
* Use parallel processing (also look into Papermill).
* Filter unnecessary datasets.
* Integrate only what you want.
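Here is a minimal sketch of the parallel processing tip, assuming the conn_str, src_tables and load() from the video's script; each worker opens its own connection because pyodbc connections should not be shared across threads.

from concurrent.futures import ThreadPoolExecutor
import pandas as pd
import pyodbc

def etl_one_table(table_name):
    conn = pyodbc.connect(conn_str)  # per-thread connection
    try:
        df = pd.read_sql_query(f'select * FROM {table_name}', conn)
        load(df, table_name)
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=4) as pool:
    # extract and load the tables concurrently; list() surfaces any worker errors
    list(pool.map(etl_one_table, [t[0] for t in src_tables]))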
Fantastic video. Highly appreciated!
On request, please make a video on real-time Data Engineering projects! There is no good content on RUclips.
I'm glad you liked it. I will try and come up with a real-time scenario. If you're interested in data streaming (continuous data flow), then check out my video on Kafka: ruclips.net/video/gPvwvkCVSnY/видео.html
Fantastic! Thank you so much.
If my destination is dimension and fact tables, do I need to write a separate method per table, or is there another way?
Yes, you need to perform subsequent transformation steps after extracting the data to shape it to fit your target tables.
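As a rough illustration, assuming df_products and df_sales are raw extracts and reusing the load() from the video (the column names here are illustrative, not the video's actual transform):

# build the dimension: unique products only
dim_product = df_products[['ProductKey', 'EnglishProductName', 'Color']].drop_duplicates()

# build the fact: keep only sales rows that match a known product key
fact_sales = df_sales.merge(dim_product[['ProductKey']], on='ProductKey', how='inner')

load(dim_product, 'DimProduct')
load(fact_sales, 'FactInternetSales')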
for tbl in src_tables:
    #print(tbl)
    df = pd.read_sql_query(f''select * from {tbl[0]}', src_conn)

Hi, I'm getting invalid syntax on the df line.
I may not be using the f-string correctly, but I'm looking for ideas.
Thank you
Morgan
Hi Morgan, it seems you have two quotes after the f: f''select * from {tbl[0]}', src_conn
Remove one of the quotes and you should be good. Here is the original code:
for tbl in src_tables:
    #print(tbl)
    df = pd.read_sql_query(f'select * from {tbl[0]}', src_conn)
Hi,
Thank you for the note!
I'll give it a try tomorrow AM.
Great course
Morgan
Hi again,
I used your code from github and changed the name of the server
server = "LAPTOP-8SESKAVH\SQLEXPRESS" .
I'm still getting:
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections
I did substitute in AdventureWorksDW2019
Not sure what is going on
Morgan
Morgan's Attempt
# -*- coding: utf-8 -*-
"""
Created on Sat Nov 5 14:34:16 2022
@author: mtman
"""
#import needed libraries
from sqlalchemy import create_engine
import pyodbc
import pandas as pd
import os

#get password from environment var
pwd = os.environ['PGPASS']
uid = os.environ['PGUID']

#sql db details
driver = "{SQL Server Native Client 11.0}"
server = "LAPTOP-8SESKAVH\SQLEXPRESS" #"haq-PC"
database = "AdventureWorksDW2019;"

#extract data from sql server
def extract():
    try:
        src_conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server + '\SQLEXPRESS' + ';DATABASE=' + database + ';UID=' + uid + ';PWD=' + pwd)
        src_cursor = src_conn.cursor()
        # execute query
        src_cursor.execute(""" select t.name as table_name
            from sys.tables t where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales') """)
        src_tables = src_cursor.fetchall()
        for tbl in src_tables:
            #query and load save data to dataframe
            df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
            load(df, tbl[0])
    except Exception as e:
        print("Data extract error: " + str(e))
    finally:
        src_conn.close()

#load data to postgres
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(f'postgresql://{uid}:{pwd}@{server}:5432/AdventureWorks')
        print(f'importing rows {rows_imported} to {rows_imported + len(df)}... for table {tbl}')
        # save df to postgres
        df.to_sql(f'stg_{tbl}', engine, if_exists='replace', index=False)
        rows_imported += len(df)
        # add elapsed time to final print out
        print("Data imported successful")
    except Exception as e:
        print("Data load error: " + str(e))

try:
    #call extract function
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))

@@BiInsightsInc
@@mtmanalyst make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Did you create this user in your environment? If not, you will need to create it. Here is the script for it. Hope this helps.
github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/SQL%20Scripts/SQL%20Server/create%20etl%20login%20and%20role%20-%20SQLServer.sql
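A minimal connection test could look like this (driver, server and credentials are illustrative; adjust them to your setup):

import pyodbc

conn = pyodbc.connect(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=localhost\\SQLEXPRESS;'
    'DATABASE=AdventureWorksDW2019;'
    'UID=etl;PWD=demopass'
)
# if this prints the server version, the connection works
print(conn.cursor().execute('SELECT @@VERSION').fetchone()[0])
conn.close()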
Thank you
I'll get on this.
Morgan
Very good explanation. Do you have any tutorial on extracting data from an ERP?
Thank you so much...!
Hi bro, very informative session. I am working on an ETL QA framework using PySpark, and for that I have to create a directory structure in PyCharm. Is there any reference video you have created? Please share.
I have covered Pytest recently as a testing framework for data engineering pipeline. You can check out videos on that topic. If you are using Pytest then you can use the following folder structure. Here is the docs for Pytest: docs.pytest.org/en/7.1.x/explanation/goodpractices.html
pyproject.toml
src/
    mypkg/
        __init__.py
        app.py
        view.py
tests/
    test_app.py
    test_view.py
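A minimal test file under that layout could look like this, assuming a hypothetical transform() function in mypkg/app.py:

# tests/test_app.py
import pandas as pd
from mypkg.app import transform  # hypothetical function under test

def test_transform_drops_duplicates():
    df = pd.DataFrame({'id': [1, 1, 2]})
    result = transform(df)
    assert result['id'].is_unique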
Why not use the SQL functions in pandas?
You can use SQL functions. However, here I am only extracting and loading data; therefore, there is no need for them in this context.
Thanks a lot !
Hello, Sir! Thank you for the great learning material! But I would like to ask for help in running the ETL Python script. I'm getting an error that says 'Login failed for user etl'. Could this be because of permissions?
Thanks. You will need to create the 'etl' user in both of the databases. Here are the SQL scripts to create them.
github.com/hnawaz007/pythondataanalysis/tree/main/ETL%20Pipeline/SQL%20Scripts
Also, here is the basic video on how to test your connection via Python using the user/password.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
Thank you for the reply, Sir! Apparently I had to change my authentication method and it solved the problem. Thank you for the additional video as well! @@BiInsightsInc
Data sources are SAP HANA and SQL Server. My target table is in the same SQL Server. How can I perform an upsert? I do have a primary key on the target table, but it is possible that other columns might update in the future. Please help.
Hi Shrutika, you can use the following video as a guide. In this video we perform an upsert based on a primary key column. Happy coding: ruclips.net/video/a_T8xRaCO60/видео.html
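Since your target is SQL Server, one common pattern is to stage the DataFrame and then run a T-SQL MERGE keyed on the primary key. A rough sketch, assuming an engine pointing at the target server (table and column names are illustrative):

from sqlalchemy import text

# stage the incoming rows next to the target table
df.to_sql('stg_customer', engine, if_exists='replace', index=False)

merge_sql = text("""
    MERGE dbo.customer AS tgt
    USING dbo.stg_customer AS src
       ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN
        UPDATE SET tgt.name = src.name, tgt.city = src.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, city)
        VALUES (src.customer_id, src.name, src.city);
""")
with engine.begin() as conn:  # begin() commits on success
    conn.execute(merge_sql)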
What is the ETL tool you used for data ingestion and extraction?
I am using Python to extract and load data. The IDE used is Pycharm.
@@BiInsightsInc can you make a video about the tools and programming required for a data engineer?
@@avinash7003 I have covered the Data Engineering role in the following video. Tools can vary depending on the tech stack the company is using, but I will do a broader video on the base requirements for a Data Engineering role.
ruclips.net/video/fwkLcp8dbic/видео.html
Really great content!!!! Can you help me with the last step? I am having trouble with the last step, Test ETL Pipeline. When using cmd on my C: drive it says: "'\postgres' is not recognized as an internal or external command, operable program or batch file." But when using D: it's "Access is denied." So what can I do to finish the last step? Thank you!!
This is a very common error: "is not recognized as an internal or external command". It comes up when the command prompt does not know the location of Python or the script you are executing.
First, make sure the executable is actually installed. If yes, continue with the rest; if not, install it first.
If you are attempting to run an executable from cmd.exe, then you need to tell cmd.exe where the file is located (either add its folder to your PATH or call it with the full path).
I also have the same problem. I have Python installed and gave the directory, but it doesn't work. What should I do?
Any suggestions or available code which can convert the compatible data types from MySQL to Postgres? Please respond.
Yes, there are various tools out there that will convert MySQL to Postgres syntax. There is an online tool you can use that allows you to select the source and target. Then there is the mysql2postgres tool. If you're dealing with table DDL and want to convert MySQL data types to Postgres, you can write a simple script to swap the data types.
www.sqlines.com/online
github.com/maxlapshin/mysql2postgres
Hi, it was a wonderful tutorial; it helped me a lot in understanding the ETL concept.
But I'm struggling to follow it, starting from the environment variables. I can't find variables like the ones in your video on my laptop. I hope you can explain how to create those variables, or maybe how to see the UID without opening the environment variables...
You can define the system variables under System > Advanced > Environment Variables. I show the variables and their content at 4:10. They contain your database username and password.
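For example, on Windows you can create them once from a command prompt and then read them in Python (the names follow the video; setx only affects new sessions, so open a fresh terminal afterwards):

# created once in cmd.exe, e.g.:
#   setx PGUID "etl"
#   setx PGPASS "your_db_password"
import os

uid = os.environ['PGUID']   # database username
pwd = os.environ['PGPASS']  # database password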
Amazing video. I followed everything, but I had this error and could not find a solution:
Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0)')
Error while extracting data: local variable 'src_conn' referenced before assignment
Hi Nabil, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
Once you make the above changes, simply test the connection via a Python script to make sure you are able to connect to the SQL Server. Hope this helps.
Hi, please, is there any video on how to change my directory to C:\postgres>?
Thanks
You can type the following command in command prompt: cd c:\postgres
So cool!! Where does the transform happen?
Thanks. The transform step happens after the extract step. I have a few examples of transformations in the following video. ruclips.net/video/eZfD6x9FJ4E/видео.html
@@BiInsightsInc thank you!
Does this also work if the ODBC Driver is used?
Yes, you can use the ODBC driver to connect to SQL Server; therefore, you can use this approach to read and load data.
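Only the DRIVER value in the connection string changes. For example, assuming "ODBC Driver 17 for SQL Server" is installed and the etl user from the video:

import os
import pyodbc

uid = os.environ['PGUID']
pwd = os.environ['PGPASS']
src_conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'  # swapped-in ODBC driver name
    'SERVER=localhost\\SQLEXPRESS;'
    'DATABASE=AdventureWorksDW2019;'
    'UID=' + uid + ';PWD=' + pwd
)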
Hello, I am getting the following error.
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment
driver is installed and I have done the other troubleshooting steps found in the previous comments.
Please checkout the following video. It goes over the connection setup and common errors.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
In addition, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433.
You need to check the SQL Server if it
a) Accepts remote connections.
b) Check if the TCP/IP protocol is enabled. If not enable it and restart the services.
Open "SQL Server Configuration Manager"
Now Click on "SQL Server Network Configuration" and Click on "Protocols for Name"
Right Click on "TCP/IP" (make sure it is Enabled) Click on Properties
@@BiInsightsInc I am getting another error. I even tried copying the script directly from your repo, changing only the server variable, and got:
Data extract error: ('42000', "[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ':'. (102) (SQLExecDirectW)")
thankyou bro.
Really nice explanation. However, I'm getting the following error:
27: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Data load error: No module named 'psycopg2'
Any leads to tackle the issue?
Thanks. You need to install the 'psycopg2' module in your environment. Also, we have updated the code to use SQLAlchemy to get rid of the future warning you're seeing in the code. Here is the link: github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
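In short: install the Postgres driver and route the reads through a SQLAlchemy engine. A sketch along these lines (server, user and password are illustrative):

# pip install psycopg2-binary sqlalchemy
from sqlalchemy import create_engine
import pandas as pd

# reading through a SQLAlchemy engine also silences the pandas DBAPI2 warning
src_engine = create_engine(
    'mssql+pyodbc://etl:demopass@localhost/AdventureWorksDW2019'
    '?driver=ODBC+Driver+17+for+SQL+Server'
)
df = pd.read_sql_query('select * FROM DimProduct', src_engine)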
@@BiInsightsInc that works but now I'm getting a new error:
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.errors.InsufficientPrivilege) permission denied for schema public
LINE 2: CREATE TABLE "stg_DimProduct" (
^
[SQL:
CREATE TABLE "stg_DimProduct" (
"ProductKey" BIGINT,
"ProductAlternateKey" TEXT,
"ProductSubcategoryKey" FLOAT(53),
"WeightUnitMeasureCode" TEXT,
"SizeUnitMeasureCode" TEXT,
"EnglishProductName" TEXT,
"SpanishProductName" TEXT,
"FrenchProductName" TEXT,
"StandardCost" FLOAT(53),
"FinishedGoodsFlag" BOOLEAN,
"Color" TEXT,
"SafetyStockLevel" BIGINT,
"ReorderPoint" BIGINT,
"ListPrice" FLOAT(53),
"Size" TEXT,
"SizeRange" TEXT,
"Weight" FLOAT(53),
"DaysToManufacture" BIGINT,
"ProductLine" TEXT,
"DealerPrice" FLOAT(53),
"Class" TEXT,
"Style" TEXT,
"ModelName" TEXT,
"LargePhoto" TEXT,
"EnglishDescription" TEXT,
"FrenchDescription" TEXT,
"ChineseDescription" TEXT,
"ArabicDescription" TEXT,
"HebrewDescription" TEXT,
"ThaiDescription" TEXT,
"GermanDescription" TEXT,
"JapaneseDescription" TEXT,
"TurkishDescription" TEXT,
"StartDate" TIMESTAMP WITHOUT TIME ZONE,
"EndDate" TIMESTAMP WITHOUT TIME ZONE,
"Status" TEXT
)
Any workaround for this?
Nvm, I managed to solve the issue. It seems this video alone is not enough for the whole migration; redirection to the other tutorials is needed. Still, I think you have done an amazing job. Thanks!
Thank you!
I'm having a new problem: Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 18 for SQL Server]Named Pipes Provider: Could not open a connection to SQL Server [67]. (67) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 18 for SQL Server]Login timeout expired (0); [08001] [Microsoft][ODBC Driver 18 for SQL Server]A network-related or instance-specific error has occurred while establishing a connection to CALVIN/SQLEXPRESS. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (67)')
What do I do?
I have covered the common SQL Server connection issues in the video here. With so many questions on this topic I had to cover it separately.
ruclips.net/video/zdezE6TWSQQ/видео.html&t
THIS IS AMAZING, THANKS!!!
HELP!!!!
1) I did the same thing and it got imported into Postgres, but I need the column data types to be in a specific format, for example DATETIME, VARCHAR, DOUBLE, INT, not these text, bigint, etc. Please suggest what I should do!!
2) The data is the same but the data types are different. I tried changing the data type in the DataFrame and created custom columns with a CREATE SQL query, but it doesn't matter, as pd.to_sql always replaces that table and creates a new one.
So what should I do?
Thank you
Hi Arin, you can define data types for each of the columns in your dataframe in a dictionary and set the data type for each column while importing. Here is an example to get you started.
type_dict = {'Col_A': 'category', 'Col_B': 'int16', 'Col_C': 'float16', 'Col_D': 'float32'}
df = pd.read_csv(myfile, dtype=type_dict)
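To control the column types created on the Postgres side specifically, to_sql also accepts a dtype mapping of SQLAlchemy types. A sketch with illustrative columns:

from sqlalchemy.types import VARCHAR, INTEGER, FLOAT, DateTime

# the dtype mapping controls the DDL pandas generates, so the target table
# gets VARCHAR/INTEGER/etc. instead of the text/bigint defaults
df.to_sql('stg_DimProduct', engine, if_exists='replace', index=False,
          dtype={'EnglishProductName': VARCHAR(100),
                 'SafetyStockLevel': INTEGER,
                 'ListPrice': FLOAT,
                 'StartDate': DateTime})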
This is cool! But isn't this more of an Extract and Load process than an Extract, Transform and Load process? Most of the time while creating the data marts, the transform scripts are long and hectic. Also, in the same scenario the load procedure is mainly an insert query of records rather than tables. Nevertheless, good video!
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
Thanks! Subscribe and like buttons done.
Hi friend, I have a problem. The SQL Server name is ENGINEERDATA\ENGDATA, and with the variable declared = 'ENGINEERDATA\ENGDATA' in code, when I run the script I always get psycopg2.OperationalError: could not translate host name "engineerdata\ENGDATA" to address: unknown host.
Please help me.
You can concatenate the server and instance as text if the slash is causing an error. I have covered how to connect to SQL Server, and common issues while establishing a connection, in this video here:
ruclips.net/video/zdezE6TWSQQ/видео.html
@@BiInsightsInc thank you so much friend
org.postgresql.util.PSQLException: ERROR: permission denied for schema public. I run the exact function but I encounter this error, please help me :((
Grant your user permission on the schema with the following statement.
GRANT USAGE ON SCHEMA public TO your_user;
@@BiInsightsInc oh, I have fixed the error: in the target URL you have declared the user and password, but in the load() function you were still using the user id and password option.
@@vandc1684 That's great. There are two databases (source and target) and we connect to both; therefore, two connections.
@@BiInsightsInc I know, thanks for your pj, have a nice day
Could you please explain the part below, as I am getting an error like: mysql.connector.errors.ProgrammingError: 1146 (42S02): Table 'sakila.tables' doesn't exist.
What are t and sys here? I am using MySQL.
src_cursor.execute(""" select t.name as table_name
    from sys.tables t where t.name in ('DimProduct','DimProductSubcategory','DimProductSubcategory','DimProductCategory','DimSalesTerritory','FactInternetSales') """)
Hey Siva, t is the alias for the tables and sys is the system schema of SQL Server. You can find the sys equivalent in MySQL and query the table info.
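For example, the MySQL equivalent queries information_schema instead of sys (the sakila schema and table names here are illustrative):

src_cursor.execute("""
    select table_name
    from information_schema.tables
    where table_schema = 'sakila'
      and table_name in ('actor', 'film', 'customer')
""")
src_tables = src_cursor.fetchall()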
@@BiInsightsInc Thanks for the reply sir let me check
Hi, this video was useful. Can you make a video on data migration from MongoDB to MySQL using Apache Airflow, with an example?
Wonderful, bro.. but how do we trigger this code automatically on some condition? Please make a video for scheduling as well, using a shell script, etc.
I have made a few videos on how to schedule or trigger your Python ETL scripts. You can schedule them via the Windows Task Scheduler, Airflow, or Dagster. Feel free to check them out.
ruclips.net/video/t8QADtYdWEI/видео.html&t
ruclips.net/video/eZfD6x9FJ4E/видео.html&t
ruclips.net/video/IsuAltPOiEw/видео.html&t
Please make a tutorial on SQL Server to BigQuery with Airflow.
Hi Irfan, I am planning to do more videos on Airflow. Will cover Airflow to BigQuery as well. Stay tuned.
@@BiInsightsInc thank you sir, I'll wait for that.
I encountered this error while running the file: raise KeyError(key) from None at code 'uid' and 'pwd'. Could you help me with this? Also a very nice video! Thank you
Hi Dyu, the script is not able to find the uid and pwd variables. If you don't have these as environment variables, then simply declare them in your script, i.e. uid = "username". Same for pwd, which is the database password.
@@BiInsightsInc Thanks for your help, I have done with that Error. And I have this new one: Data extract error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Error while extracting data: local variable 'src_conn' referenced before assignment.
I have tried changing between SQL Server Native Client 11, SQL Server, and ODBC Driver 17 for SQL Server, but none of them helped me with this bug. Thank you!
@@duynguyenduc1255 this is one of the common issues. I have covered the causes in this video below. You need to debug your SQL Server connection prior to attempting the etl pipeline. Happy coding.
ruclips.net/video/zdezE6TWSQQ/видео.html
Hi bro, the video is so cool.
I have a problem, can you help me? I installed msodbcsql. Thank you so much.
Extract error:('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Data Extract Error: 'src_connect' is not defined
I have covered how to connect to a SQL Server database using Python. This is a common question that comes up in the ETL series. So, I decided to cover it and direct viewers to it if they are facing this issue: ruclips.net/video/zdezE6TWSQQ/видео.html
@@BiInsightsInc Hi Friend, I followed the video and the data appeared on Jupyter, but it gave me the new error: Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Error while extracting data: name 'src_conn' is not defined.
Help me !!
Hello sir, I got an error and I don't have any clue about it after trying other alternatives.
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment
I really appreciate your help!!!!
Hi Noor, make sure you have the SQL Server driver installed on your machine, create the etl user with provided script. Also, add a rule in firewall to allow connections to SQL Server port 1433. Hope this helps.
@@BiInsightsInc thank you so much for your speedy reply...I will try it and get back to you later.😄
@@BiInsightsInc After executing all the methods, I got another error:
c:\Users\User\Desktop\ETL\etl.py:28: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql_query(f'select * FROM {tbl[0]}', src_conn)
importing rows 0 to 606... for table DimProduct
Data load error: (psycopg2.OperationalError) connection to server at "LAPTOP-3C9TIKCE" (fe80::d8c0:aa56:7702:d5b5), port 5432 failed: FATAL: no pg_hba.conf entry for host "fe80::d8c0:aa56:7702:d5b5%19", user "etl", database "AdventureWorks", no encryption
@@nooraliikhwan8139 you need to edit the pg_hba.conf file on your machine. On Windows it is located in the following directory: C:\Program Files\PostgreSQL\14\data. Add the entries below, or whichever ones are missing on your PC.
# TYPE  DATABASE  USER  ADDRESS       METHOD
# "local" is for Unix domain socket connections only
local   all       all                 md5
# IPv4 local connections:
host    all       all   127.0.0.1/32  trust
host    all       all   0.0.0.0/0     trust
# IPv6 local connections:
host    all       all   ::1/128       trust
host    all       all   ::/0          trust
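After saving pg_hba.conf, restart the PostgreSQL service and verify the connection. A minimal sketch (credentials are placeholders; the host is taken from the error above):
import psycopg2

# Quick connectivity check after editing pg_hba.conf and restarting Postgres.
conn = psycopg2.connect(
    host="localhost",   # or your machine name, e.g. LAPTOP-3C9TIKCE
    port=5432,
    dbname="AdventureWorks",
    user="etl",
    password="demopass",  # placeholder
)
print(conn.get_dsn_parameters())
conn.close()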
thanks 🙂
Hi sir,
Can you make a video on how to migrate data from SQL Server to Snowflake?
Hello Motamarri, thanks for stopping by. Will try and cover Snowflake ❄️ soon.
I followed the tutorial step by step and I am facing the following error. Even Googling couldn't help. Please help me.
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1) (SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Hi Dan, it looks like you're not able to connect to SQL Server. Make sure SQL Server is running and that you're able to connect to it via SQL Server authentication. Also, you need a SQL Server driver installed:
www.microsoft.com/en-us/download/details.aspx?id=36434
I'd suggest testing your connection first to make sure you're able to connect.
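Along those lines, a minimal connection test (server, database, and credentials are placeholders) surfaces the real cause before the pipeline runs:
import pyodbc

# Fail fast with a readable error instead of failing inside the ETL loop.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost\\SQLEXPRESS;DATABASE=AdventureWorksDW2019;"
    "UID=etl;PWD=demopass;",
    timeout=5,  # seconds to wait for login before giving up
)
print(conn.cursor().execute("SELECT @@VERSION").fetchone()[0])
conn.close()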
@@BiInsightsInc I am very new to SQL Server; I am trying to learn ETL coming from a MySQL background. Please make a playlist on ETL and ELT.
@@DhanunjayaSrisailamTTT Will try and cover ETL with modern tools. If you want to set up SQL Server and Postgres, I have covered the installation here:
ruclips.net/video/e5mvoKuV3xs/видео.html&t
ruclips.net/video/fjYiWXHI7Mo/видео.html
importing rows 0 to 10... for table
data load error: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)")... (I am using sql server)
You are selecting from "sqlite_master", which is not the system schema in SQL Server. SQL Server has a "sys" schema, and the table that stores table information is called "tables". Please refer to the repo and copy the script from there.
github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py
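For example, the table list can come from SQL Server's catalog along these lines (a sketch; src_conn is the source connection from the video, and the table names are examples):
import pandas as pd

# SQL Server stores table metadata in the sys schema, not sqlite_master.
query = """
    SELECT t.name
    FROM sys.tables t
    WHERE t.name IN ('DimProduct', 'DimProductSubcategory', 'DimProductCategory')
"""
src_tables = pd.read_sql_query(query, src_conn)["name"].tolist()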
@@BiInsightsInc thank you
@@BiInsightsInc
def load(df, tbl):
    try:
        rows_imported = 0
        engine = create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
        con = engine.connect()
        print(f'importing rows {rows_imported} to {rows_imported+len(df)}... for table {tbl}')
        df.to_sql(f'sys_{tbl}', con, if_exists='replace', index=False)
        rows_imported += len(df)
        print("Data imported successfully.")
    except Exception as e:
        print("data load error: " + str(e))

try:
    extract()
except Exception as e:
    print("Error while extracting data: " + str(e))
These are my configs for loading to SQL Server but I'm still getting the error.
@@skipa9906 your connection details do not seem to be complete. If you run only the following line: create_engine(r'Driver=SQL Server;Server=.\SQLEXPRESS;Database=Test2;Trusted_Connection=yes;')
it will throw an error, because create_engine expects a SQLAlchemy database URL rather than a raw ODBC connection string.
Here is what I used to successfully connect and persist data to SQL Server:
from sqlalchemy import create_engine

# user and password
pwd = 'demopass'
uid = 'etl'
# sql db details
dr = "SQL Server Native Client 11.0".replace(" ", "+")  # URL-encode the spaces in the driver name
srvr = r"localhost\SQLEXPRESS"  # raw string so the backslash is kept literally
db = "AdventureWorksDW2019"
engine = create_engine(f"mssql+pyodbc://{uid}:{pwd}@{srvr}:1433/{db}?driver={dr}")
df.to_sql("customer_retention", engine, if_exists='replace', index=False)
@@BiInsightsInc thank you. I found a solution similar to yours. Great videos, I'm enjoying them.
txs man!!!
Where is the transform part?
I have done a follow-up to this with transformations in the following video: ruclips.net/video/eZfD6x9FJ4E/видео.html
Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Hi Anthonius, make sure you have the SQL Server driver installed on your machine and create the etl user with the provided script. Also, add a rule in the firewall to allow connections to SQL Server port 1433. Hope this helps.
@@BiInsightsInc how to make sure SQL Server Native Client Installed in our machine ?
@@isbakhullail6693 here is a Stack Overflow link that shows how to determine if SQL Server's client is installed:
stackoverflow.com/questions/10499643/check-if-sql-server-client-is-installed
@@BiInsightsInc thanks sir
Data extract error: ('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (-1)
(SQLDriverConnect); [08001] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0); [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (-1)')
Error while extracting data: local variable 'src_conn' referenced before assignment
Extracting and loading are the easiest parts. There's no data transformation in this video.
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
Hey, great video. I was wondering, would this be a good guide for a basic pipeline project for job interviews in Data Analysis/Engineering? Thank you.
Yes, this can be a good example of a basic pipeline. However, I'd advise going with the following video, as it presents the complete picture of ETL (Extract, Transform and Load).
ruclips.net/video/eZfD6x9FJ4E/видео.html
4:17 you have your gmail password exposed
Thanks for pointing that out. Much appreciated. It has been updated :)
Sometimes ints will auto-convert to floats in pandas and cause an error.
Please share an example of this scenario.
@@BiInsightsInc When reading data from DB2 and inserting into Postgres, if a pandas DataFrame is used in the middle to hold the result from DB2 before inserting into Postgres, the ints from DB2 auto-convert to floats.
@@SMCGPRA yes, pandas can convert numbers to float. You can convert them back in your transformation layer. So what error does it cause?
@@BiInsightsInc Casting back to int solves it, but it's an unnecessary step.
You can declare data types for each column and provide them to pandas, and it will adhere to your data types.
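For example, pandas' nullable integer dtype keeps NULL-containing columns as integers instead of upcasting them to float. A minimal sketch (the table and column names are made up, and read_sql_query's dtype argument needs pandas 1.3+):
import pandas as pd

# "Int64" (capital I) is pandas' nullable integer dtype; it can hold NULLs
# without forcing the whole column to float.
df = pd.read_sql_query(
    "SELECT order_id, quantity FROM orders",
    src_conn,
    dtype={"order_id": "Int64", "quantity": "Int64"},
)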
I still don't see the benefit of this. Why do we need to do it like this if we can extract from the source directly?
I think you need an introduction to ETL and/or Data Engineering and why they are needed in the world of data. I'd say pick up a book on either subject and you will see the benefits. Here is an intro to data engineering:
ruclips.net/video/fwkLcp8dbic/видео.html
Please do this with Airflow, sir.
I have done a video on this topic with Airflow. Feel free to check it out!
How to build and automate your Python ETL pipeline with Airflow | Data pipeline | Python
ruclips.net/video/eZfD6x9FJ4E/видео.html
This is just EL.
The focus was Extract and load (ELT). I covered the whole process in the ETL automation video:
ruclips.net/video/eZfD6x9FJ4E/видео.html
holy shit
Thank you so much