66 videos
63,995 views

Generative AI - Introduction


Videos

20:44

AI Innovation Through Collaboration With Startups

58 views, 8 months ago

19:38

AI in IT Service

90 views, 9 months ago

14:41

The Tukey Cramer Procedure

365 views, 4 years ago

17:09

Levene Test

4.1K views, 4 years ago

35:04

Neural Networks Meeting

39 views, 4 years ago

26:36

Neural Networks 2: MNIST Classification

141 views, 4 years ago

41:48

Neural Networks - Introduction

112 views, 5 years ago

13:12

Chi square lesson overview

22 views, 5 years ago

38:47

Text Classification With Python

33K views, 5 years ago

13:03

Our Little Girl Grows Up

126 views, 5 years ago

1:03:14

Machine Learning

156 views, 5 years ago

0:12

test

27 views, 5 years ago

36:56

Expert Systems

13K views, 5 years ago

11:20

MGNT 671 - Python Getting Started

75 views, 5 years ago

26:59

MGNT 671 - Meeting 1 Extended

14 views, 5 years ago

26:30

introduction to MGNT 671: Artificial Intelligence and Machine Learning for Managers

95 views, 5 years ago

4:30

Multiple Linear Regression Problem

32 views, 5 years ago

10:35

Chi Square Test

28 views, 5 years ago

22:59

MGNT 333 Summer 1-- Course Introduction

81 views, 5 years ago

25:51

Syllabus Text Analytics: Inclusive Excellence

36 views, 5 years ago

24:37

Decision Analysis

127 views, 5 years ago

28:52

Welcome to Text Analytics!

119 views, 5 years ago

11:07

MSL4 Problem 5

48 views, 5 years ago

29:33

MSL3 5 And 7

114 views, 5 years ago

22:29

Visualization Lab

33 views, 5 years ago

19:13

Lab 1: Grade What if

55 views, 5 years ago

27:59

MGNT 333 - Course Introduction

78 views, 5 years ago

21:22

Jmp Lab

34 views, 6 years ago

15:16

MSL13 Problem 5

18 views, 6 years ago

@PANDURANG99 2 months ago
How to do Multi Level Hierarchical Classification
@chrisphayao 4 months ago
Thanks for the clear explanation. Only the wild clicking around is very confusing; maybe consider going step by step through the code without clicking around. Thank you.
@rajm5349 7 months ago
Can I get this code?
@hotdogsinmytummy 7 months ago
Interesting🙌🗣️
@slushys9919 9 months ago
Hi guys, for anyone that's looking for the code, you may use the following:

    import csv
    import os
    import pickle
    import random
    import string
    from collections import defaultdict

    from nltk import FreqDist, word_tokenize
    from nltk.corpus import stopwords
    from sklearn import metrics
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    stop_words = set(stopwords.words('english'))
    stop_words.add('said')
    stop_words.add('mr')

    BASE_DIR = ''  # left blank in the original comment: set to the folder holding the label subfolders
    LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']


    # step 1: flatten the labelled folders into one CSV file
    def create_data_set():
        with open('data.csv', 'w', encoding='utf8', newline='') as csvfile:
            csv_writer = csv.writer(csvfile, delimiter=',')
            csv_writer.writerow(['Label', 'Filename', 'Text'])  # write header
            for label in LABELS:
                label_dir = '%s/%s' % (BASE_DIR, label)
                for filename in os.listdir(label_dir):
                    fullfilename = '%s/%s' % (label_dir, filename)
                    print(fullfilename)
                    with open(fullfilename, 'rb') as file:
                        # turn line breaks into spaces so words stay separated
                        text = file.read().decode(errors='replace').replace('\n', ' ')
                    csv_writer.writerow([label, filename, text])


    # step 2: load the rows as [(label, text), (label, text), ...]
    def setup_docs():
        docs = []
        with open('data.csv', 'r', encoding='utf8', newline='') as datafile:
            reader = csv.reader(datafile)
            next(reader)  # skip the header row
            for row in reader:
                # label is in the first column, text in the third;
                # csv.reader keeps commas inside the text intact
                docs.append((row[0], row[2].strip()))
        return docs


    def get_tokens(text):
        tokens = word_tokenize(text)
        tokens = [t for t in tokens if t not in stop_words]
        return tokens


    def clean_text(text):
        text = text.translate(str.maketrans('', '', string.punctuation))
        text = text.lower()
        return text


    def print_frequency_dist(docs):
        tokens = defaultdict(list)
        for doc in docs:
            doc_label = doc[0]
            doc_text = clean_text(doc[1])
            doc_tokens = get_tokens(doc_text)
            tokens[doc_label].extend(doc_tokens)
        for category_label, category_tokens in tokens.items():
            print(category_label)
            fd = FreqDist(category_tokens)
            print(fd.most_common(20))


    def get_splits(docs):
        random.shuffle(docs)
        X_train = []  # training documents
        y_train = []  # corresponding training labels
        X_test = []   # test documents
        y_test = []   # corresponding test labels
        pivot = int(.80 * len(docs))
        for i in range(0, pivot):
            X_train.append(docs[i][1])
            y_train.append(docs[i][0])
        for i in range(pivot, len(docs)):
            X_test.append(docs[i][1])
            y_test.append(docs[i][0])
        return X_train, X_test, y_train, y_test


    def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
        X_test_tfidf = vectorizer.transform(X_test)
        y_pred = classifier.predict(X_test_tfidf)
        # use 'weighted' averaging for multiclass
        precision = metrics.precision_score(y_test, y_pred, average='weighted', zero_division=0)
        recall = metrics.recall_score(y_test, y_pred, average='weighted')
        f1 = metrics.f1_score(y_test, y_pred, average='weighted')
        print("%s\t%f\t%f\t%f\n" % (title, precision, recall, f1))


    def train_classifier(docs):
        X_train, X_test, y_train, y_test = get_splits(docs)
        # the object that turns text into vectors/numbers
        vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3),
                                     min_df=3, analyzer='word')
        # creates doc-term matrix
        dtm = vectorizer.fit_transform(X_train)
        # train Naive Bayes classifier
        naive_bayas_classifier = MultinomialNB().fit(dtm, y_train)
        evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayas_classifier, vectorizer, X_train, y_train)
        evaluate_classifier("Naive Bayes\tTEST\t", naive_bayas_classifier, vectorizer, X_test, y_test)
        # store the classifier
        clf_filename = 'naive_bayas_classifier.pkl'
        with open(clf_filename, 'wb') as f:
            pickle.dump(naive_bayas_classifier, f)
        # also store the vectorizer so we can transform new data
        vec_filename = 'count_vectorizer.pkl'
        with open(vec_filename, 'wb') as f:
            pickle.dump(vectorizer, f)


    def classify(text):
        # load classifier
        clf_filename = ''  # left blank in the original comment: path to the classifier saved by train_classifier
        with open(clf_filename, 'rb') as f:
            nb_clf = pickle.load(f)
        # load the vectorizer for the new text
        vec_filename = ''  # left blank in the original comment: path to the vectorizer saved by train_classifier
        with open(vec_filename, 'rb') as f:
            vectorizer = pickle.load(f)
        # preprocess the text
        processed_text = clean_text(text)
        # make prediction
        pred = nb_clf.predict(vectorizer.transform([processed_text]))
        print(pred[0])


    if __name__ == '__main__':
        # create_data_set()
        # docs = setup_docs()
        # print_frequency_dist(docs)
        # train_classifier(docs)
        # deployment in production
        new_doc = "Google showed off some new camera features on the Pixel 4 today"
        classify(new_doc)
        print("Done")
@slushys9919 9 months ago
Note that the file paths need to be changed, and there are steps in the video that need to be followed for the program to work. Remember to download the sample BBC dataset as well when you're testing the code.
@slushys9919 9 months ago
I've also adjusted parts of the code because of some errors popping up, but it should still work.
@slushys9919 9 months ago
What are the pkl and txt files for? And are they needed for the code to function?
@affectionlifeaffliction 10 months ago
Nothing works in Jupyter Notebook.
@andonij 11 months ago
Thank you so much, Richard. You made my Sunday with this explanation. Excellent video.
@chrissonntag9 1 year ago
I cannot find the source code or the data source :-( so not useful for me
@giantdutchviking 1 year ago
Thanks for taking the time to make this vid. I've been learning Python for a short while, and although I didn't understand it all, it gave good insight into what machine learning does. Doesn't sound so "scary" anymore.
@niteshsneha 1 year ago
Can you please share the GitHub link for the source code?
@InnocenceVVX 1 year ago
This is very cool, thank you
@tharindunilakshana1883 1 year ago

    import csv
    import os
    import pickle
    import random
    import string
    from collections import defaultdict

    from nltk import FreqDist, word_tokenize
    from nltk.corpus import stopwords
    from sklearn import metrics
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    stop_words = set(stopwords.words('english'))
    stop_words.add('said')
    stop_words.add('mr')

    BASE_DIR = 'C:/Users/user/Desktop/bbc/News Articles'
    LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']


    def create_data_set():
        with open('data.csv', 'w', encoding='utf8', newline='') as outfile:
            # create the csv writer object once for the whole file
            csvwriter = csv.writer(outfile)
            for label in LABELS:
                label_dir = '%s/%s' % (BASE_DIR, label)
                for filename in os.listdir(label_dir):
                    fullfilename = '%s/%s' % (label_dir, filename)
                    print(fullfilename)
                    with open(fullfilename, 'rb') as file:
                        # turn line breaks into spaces so words stay separated
                        text = file.read().decode(errors='replace').replace('\n', ' ')
                    fields = [label, filename, text]
                    # write the fields as one row
                    csvwriter.writerow(fields)
                    print(text)


    def setup_docs():
        docs = []  # (label, text)
        # read back the file written by create_data_set
        with open('data.csv', 'r', encoding='utf8', newline='') as datafile:
            for row in csv.reader(datafile):
                docs.append((row[0], row[2].strip()))
        return docs


    def clean_text(text):
        # remove punctuation
        text = text.translate(str.maketrans('', '', string.punctuation))
        # convert to lower case
        text = text.lower()
        return text


    def get_tokens(text):
        # get individual words
        tokens = word_tokenize(text)
        # remove common words that are useless
        tokens = [t for t in tokens if t not in stop_words]
        return tokens


    def print_frequency_dist(docs):
        tokens = defaultdict(list)
        # let's make a giant list of all the words for each category
        for doc in docs:
            doc_label = doc[0]
            doc_text = clean_text(doc[1])
            doc_tokens = get_tokens(doc_text)
            tokens[doc_label].extend(doc_tokens)
        for category_label, category_tokens in tokens.items():
            print(category_label)
            fd = FreqDist(category_tokens)
            print(fd.most_common(20))


    def get_splits(docs):
        # scramble docs
        random.shuffle(docs)
        X_train = []  # training documents
        y_train = []  # corresponding training labels
        X_test = []   # test documents
        y_test = []   # corresponding test labels
        pivot = int(.80 * len(docs))
        for i in range(0, pivot):
            X_train.append(docs[i][1])  # index with i, not a fixed 1
            y_train.append(docs[i][0])
        for i in range(pivot, len(docs)):
            X_test.append(docs[i][1])
            y_test.append(docs[i][0])
        return X_train, X_test, y_train, y_test


    def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
        X_test_tfidf = vectorizer.transform(X_test)
        y_pred = classifier.predict(X_test_tfidf)
        precision = metrics.precision_score(y_test, y_pred, average='micro')
        recall = metrics.recall_score(y_test, y_pred, average='micro')
        f1 = metrics.f1_score(y_test, y_pred, average='micro')
        print("%s\t%f\t%f\t%f\n" % (title, precision, recall, f1))


    def train_classifier(docs):
        X_train, X_test, y_train, y_test = get_splits(docs)
        # the object that turns text into vectors
        vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3),
                                     min_df=3, analyzer='word')
        # create doc-term matrix
        dtm = vectorizer.fit_transform(X_train)
        # train Naive Bayes classifier
        naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
        evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
        evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
        # store the classifier
        clf_filename = 'naive_bayes_classifier.pkl'
        with open(clf_filename, 'wb') as f:
            pickle.dump(naive_bayes_classifier, f)
        # also store the vectorizer so we can transform new data
        vec_filename = 'count_vectorizer.pkl'
        with open(vec_filename, 'wb') as f:
            pickle.dump(vectorizer, f)


    # classify new content
    def classify(text):
        # load classifier
        clf_filename = 'naive_bayes_classifier.pkl'
        with open(clf_filename, 'rb') as f:
            nb_clf = pickle.load(f)
        # load the vectorizer for the new text
        vec_filename = 'count_vectorizer.pkl'
        with open(vec_filename, 'rb') as f:
            vectorizer = pickle.load(f)
        pred = nb_clf.predict(vectorizer.transform([text]))
        print(pred[0])


    # create_data_set()
    # set up the documents
    # docs = setup_docs()
    # print the word frequency
    # print_frequency_dist(docs)
    # train the classifier
    # train_classifier(docs)

    # classify the new content using the pkl files
    new_doc = (
        "Transparency International Sri Lanka (TISL) filed a petition in the Supreme Court yesterday (June 12), "
        "seeking to intervene in the ongoing Fundamental Rights case (SC/FR/Application No.168/2021) filed by the "
        "Center for Environmental Justice (CEJ) and three more petitioners, highlighting the serious allegations of "
        "bribery and corruption surrounding the X-Press Pearl disaster. "
        "The intervention petition is filed in the public interest. "
        "It refers to serious allegations of irregularity, mishandling, sabotage, bribery and corruption surrounding "
        "the claim for compensation arising from the X-Press Pearl disaster. "
        "Several key points have been raised in the intervention petition: "
        "The grave allegations of interference and extraneous pressure surrounding the claim for compensation "
        "arising from the X-Press Pearl disaster. "
        "The statement by the Justice Minister in Parliament on April 25, 2023, that one Chamara Gunasekara alias "
        "Manjusiri Nissanka had received a payment of USD 250 million into a private bank account in connection "
        "with the X-Press Pearl disaster. "
        "The media statements of Chinthaka Waragoda, who reportedly invented a machine to remove debris which "
        "washed ashore after the shipwreck, alleging that he was offered payment to discontinue the use of his "
        "machine, to avoid exposing the full extent of the damage caused by the disaster. "
        "Questions surrounding the quantum of compensation due to Sri Lanka for the damages caused by MV X-Press "
        "Pearl. "
        "The freight ship ‘MV X-Press Pearl’ caught fire off the coast of Colombo on 20th May, 2021. "
        "It sank a few days later, releasing its cargo of plastic pellets and tons of toxic chemicals into the "
        "ocean, causing Sri Lanka’s worst maritime disaster to date. "
        "It is alleged that Sri Lankan authorities obtained the assistance of the International Tanker Owners "
        "Pollution Federation Limited (ITOPF), a representative of the insurer of the Shipowner, in the "
        "post-disaster activities, despite the grave conflict of interest arising from it. "
        "TISL has urged that the private parties involved in the X-Press Pearl incident be held accountable, and be "
        "made to pay optimal compensation for the damage and pollution caused to the marine and coastal ecology of "
        "Sri Lanka, and the payment of compensation for the loss caused to the fishing communities and those "
        "engaged in tourism, as well as obtaining compensation under the Marine Pollution Prevention Act. "
        "TISL has also highlighted the need to hold anyone guilty of wrongdoing fully accountable. "
        "The petition for intervention is to be mentioned for Support in the Supreme Court on Thursday (June 15)."
    )
    classify(new_doc)
    print("Done...!")
@anirbanghose8647 1 year ago
Loved it. It made complete sense. Thanks.
@viane123456 1 year ago
I have a text file with many lines, and I want to classify each line. How can I do it?
@anirbanghose4512 1 year ago
Rewatch the video.
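For the question about classifying a file line by line, one possible shape is sketched below. It is only an illustration: `classify_lines` and `demo_predict` are hypothetical names that do not appear in the video or the comments, and `demo_predict` is a keyword stand-in so the sketch runs without a trained model.

```python
from typing import Callable, Iterable, List, Tuple


def classify_lines(
    lines: Iterable[str],
    predict: Callable[[List[str]], Iterable[str]],
) -> List[Tuple[str, str]]:
    """Return (label, text) pairs for every non-empty line.

    `lines` can be any iterable of strings, including an open text file.
    `predict` maps a list of texts to one label per text.
    """
    texts = [line.strip() for line in lines]
    texts = [text for text in texts if text]  # drop blank lines
    if not texts:
        return []
    return list(zip(predict(texts), texts))


def demo_predict(texts: List[str]) -> List[str]:
    """Stand-in for a trained model (assumption, for illustration only)."""
    return ["sport" if "goal" in text.lower() else "other" for text in texts]


if __name__ == "__main__":
    sample = ["A late goal won it\n", "\n", "Shares fell today\n"]
    for label, text in classify_lines(sample, demo_predict):
        print(label, "|", text)
```

With the pickled objects from the code comments above, `predict` could be `lambda texts: nb_clf.predict(vectorizer.transform(texts))`, and `lines` an open file such as `open("my_lines.txt", encoding="utf8")` (the file name is made up here).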
@mohamedsidhoum6835 1 year ago
Thank you for this video. Where can I get the source code, please?
@yongxing1848 1 year ago
Where is the code for this?
@vedantvashishth989 1 year ago
Were you able to find it? If yes, please send me the link.
@yongxing1848 1 year ago
@@vedantvashishth989 I rewrote everything that he put up.
@amanichouk8967 1 year ago
Thank you so much, this is amazing and so structured.
@tonyhasago 2 years ago
Hi, great video and it worked!! How would I assess or score the accuracy of the final step, classifying some new text?
@agradel100 1 year ago
Have you found out how to do it?
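On scoring the final step: accuracy needs a known true label, so for a single new text the closest available number is the model's own class probability. The sketch below assumes scikit-learn is installed and uses a tiny made-up training set rather than the BBC data; `classify_with_confidence` is a hypothetical helper, not something from the video.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (invented for this sketch).
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
]
train_labels = ["sport", "sport", "business", "business"]

vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)


def classify_with_confidence(text):
    """Return (predicted label, probability the model assigns to it)."""
    probs = clf.predict_proba(vectorizer.transform([text]))[0]
    best = probs.argmax()
    return clf.classes_[best], float(probs[best])


if __name__ == "__main__":
    label, prob = classify_with_confidence("the team scored a goal")
    print(label, round(prob, 3))
```

Naive Bayes probabilities are often overconfident, so this number is better read as a ranking signal than as a calibrated accuracy. To measure accuracy on new material, label a batch of new texts by hand and compare the predictions against those labels.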
@bongimaposa 2 years ago
Wonderful! Thank you so much - detailed and I could follow.
@ikennanwankwo7448 2 years ago
Hello, thank you for your video, but the "create_data_set" function does not work if there are multiple file types (.txt, .doc, .bin, etc.) in the subfolder. The data.txt output file is empty (nothing is written to it).
@ikennanwankwo7448 2 years ago
I just want it to write only the .txt files to the data.txt file. (All the .txt files in the subfolders have the same file name, if that helps.)
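A minimal sketch of filtering by extension before reading, for the mixed-folder case described above. `list_txt_files` is a hypothetical helper; it assumes the same layout as the comments' code, with one subfolder per label under a base directory.

```python
import os


def list_txt_files(base_dir, labels):
    """Yield (label, full_path) for every .txt file under base_dir/<label>."""
    for label in labels:
        label_dir = os.path.join(base_dir, label)
        for filename in sorted(os.listdir(label_dir)):
            full_path = os.path.join(label_dir, filename)
            # skip subdirectories and anything that is not a .txt file
            if os.path.isfile(full_path) and filename.lower().endswith(".txt"):
                yield label, full_path
```

Because each result carries the full path, files that share a name across different label folders stay distinct. A `create_data_set` variant could loop over `list_txt_files(BASE_DIR, LABELS)` instead of calling `os.listdir` directly.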
@PD-qg2fo 2 years ago
Thank you so much, sir. I was looking for this kind of tutorial.
@bibichbicha687 2 years ago
Can I get the code, please, sir?
@vedantvashishth989 1 year ago
Were you able to find it? If yes, please send me the link.
@fathersonduo 2 years ago
Please make more videos!!
@ananthakrishnan4754 2 years ago
Some people talk about the p-value, some talk about the critical value. Now I am confused.
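The two approaches mentioned above lead to the same decision; they just compare different quantities. The sketch below shows this for a two-sided z-test, chosen only because it needs nothing beyond the standard library. It is not the test from the video, and `z_test_decision` is a hypothetical name.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05):
    """Compare the critical-value rule and the p-value rule for a two-sided z-test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return {
        "critical_value": critical,
        "p_value": p_value,
        "reject_by_critical_value": abs(z) > critical,
        "reject_by_p_value": p_value < alpha,
    }


if __name__ == "__main__":
    for z in (2.1, 1.0):
        print(z, z_test_decision(z))
```

The critical-value rule asks whether the test statistic is beyond the cutoff for the chosen alpha; the p-value rule asks whether the tail probability of the statistic is below alpha. Both flags agree for any given statistic.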
@Muuip 2 years ago
Great presentation, much appreciated!
@Arrato1977 2 years ago
Explained super easy! Thank you!
@walidbenaouda8935 2 years ago
Can I get the code?
@hggaming911 2 years ago
Awesome, simple and clean code. Please can we have a link to download the code?
@mikiyasassefakassa9136 2 years ago
Mr., I have a question: how do you prepare a dataset for document-level news text classification?
@lunabaalbaki3169 2 years ago
Hey, thank you for the tutorial, it's very helpful. Can you share your code?
@adylmanulat2465 2 years ago
Good day, sir. I just wanted to ask: if an independent variable is not significant, or has no explanatory power in the model, but removing it lowers the adjusted R-squared, what does this imply? So far, the only reason I know of is that its t-statistic is greater than one. With this information, what can we infer?
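The t-statistic observation in the question above matches a standard identity for ordinary least squares. The derivation below is a sketch under assumed notation that does not appear in the comment: $n$ observations, $k$ regressors in the full model, $\mathrm{SSE}_{\text{full}}$ and $\mathrm{SSE}_{\text{red}}$ the residual sums of squares with and without the variable, $\mathrm{SST}$ the total sum of squares, and $t$ the variable's t-statistic in the full model.

```latex
\bar{R}^2 = 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)}

\bar{R}^2_{\text{full}} > \bar{R}^2_{\text{red}}
\iff \frac{\mathrm{SSE}_{\text{full}}}{n-k-1} < \frac{\mathrm{SSE}_{\text{red}}}{n-k}
\iff \frac{\mathrm{SSE}_{\text{red}} - \mathrm{SSE}_{\text{full}}}{\mathrm{SSE}_{\text{full}}/(n-k-1)} > 1
\iff t^2 > 1
```

The last step uses the fact that the F-statistic for dropping a single regressor equals $t^2$. So a variable with $|t|$ between 1 and roughly 2 raises the adjusted R-squared without being significant at the 5% level: it reduces the residual variance estimate a little, but not by enough to rule out chance.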
@naughtychohan7956 2 years ago
Can I get this code?
@vedantvashishth989 1 year ago
Were you able to find it? If yes, please send me the link.
@JiminPark-ld2xx 2 years ago
Does anyone use Excel or CSV data for text classification? Or should I create a .txt file for each and every row of my data?
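For data that already lives in rows, separate .txt files should not be needed: the training code only wants a list of (label, text) pairs. A minimal sketch using the standard `csv` module follows; `docs_from_csv` and the column names `Label` and `Text` are assumptions for this example. An Excel sheet would first be saved as CSV.

```python
import csv
import io


def docs_from_csv(csv_file, label_column, text_column):
    """Build [(label, text), ...] from an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    docs = []
    for row in reader:
        text = row[text_column].strip()
        if text:  # skip rows with no text
            docs.append((row[label_column], text))
    return docs


if __name__ == "__main__":
    sample = io.StringIO(
        'Label,Text\n'
        'sport,"A late goal, then a win"\n'
        'tech,New phone camera\n'
    )
    print(docs_from_csv(sample, "Label", "Text"))
```

For a file on disk, open it with `open(path, encoding="utf8", newline="")` and pass the file object in. The returned list has the same shape that `get_splits` expects in the code posted in the comments.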
@learner3585 2 years ago
Very good tutorial with good explanation. I was able to follow along and also run the whole program while watching the video. Thanks.
@doloresdizon7685 2 years ago
I am having trouble. Can you please help me? :(
@StanleyDenman 3 years ago
Your video seems to be right on point for what I want to do, but I am confused about the learning model aspect. If I just want to create hard rules for text classification, I do not need to train on a data set, right?
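Hand-written rules indeed need no training step. The sketch below shows one possible rule-based classifier that counts keyword hits per label; `RULES` and `classify_by_rules` are invented for illustration, and the keyword sets are arbitrary examples.

```python
import string

# Hypothetical keyword rules, one set per label.
RULES = {
    "sport": {"match", "goal", "team", "league"},
    "tech": {"google", "camera", "pixel", "software"},
    "business": {"shares", "market", "profit", "bank"},
}


def classify_by_rules(text, rules=RULES, default="unknown"):
    """Return the label whose keyword set overlaps the text the most."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = set(cleaned.split())
    scores = {label: len(words & keywords) for label, keywords in rules.items()}
    best = max(scores, key=scores.get)  # ties go to the label listed first
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(classify_by_rules("Google showed off some new camera features on the Pixel 4 today"))
```

The trade-off is maintenance: every keyword has to be chosen and updated by hand, whereas the Naive Bayes approach in the video learns word weights from labelled examples.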
@LeomarOsorio 3 years ago
Thank you for this tutorial. This is a good walkthrough.
@mandysingh4044 3 years ago
Hello sir, I want to contact you.
@pujanrajrai4930 3 years ago
Thank you very much, sir, for this lecture. It really helped me a lot. Hoping to see more content on machine learning.
@timhn4010 3 years ago
The dataset: www.kaggle.com/pariza/bbc-news-summary
@rabbilbhuiyan5666 3 years ago
Excellent video and demonstration for text analysis. Thank you very much, sir!
@engbahja 3 years ago
Many thanks for this. Could you please share the expert systems list you mention at the end of the video?
@lukajozic9768 3 years ago
Nice! There is, however, a function in the sklearn library called train_test_split, I believe, that does what your get_splits function does. Also, it would be helpful if you put the code in the description. Good video and great explanations!
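The suggestion above can be sketched as follows, assuming scikit-learn is installed. The ten documents are invented for this example; `stratify` and `random_state` are optional extras that the hand-written `get_splits` does not have.

```python
from sklearn.model_selection import train_test_split

# Toy (label, text) pairs in the same shape as the docs list in the comments.
docs = [
    ("sport", "late goal wins the match"),
    ("sport", "team tops the league"),
    ("sport", "striker signs new deal"),
    ("sport", "coach praises defence"),
    ("sport", "cup final goes to penalties"),
    ("business", "shares fall on weak profit"),
    ("business", "bank raises interest rates"),
    ("business", "market closes higher"),
    ("business", "firm reports record sales"),
    ("business", "oil prices climb again"),
]
labels = [label for label, _ in docs]
texts = [text for _, text in docs]

# 80/20 split; stratify keeps the label proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, random_state=42, stratify=labels
)

if __name__ == "__main__":
    print(len(X_train), "training documents,", len(X_test), "test documents")
```

The return order matches the one used by `get_splits` in the comments (`X_train, X_test, y_train, y_test`), so it could be swapped in without changing the calling code.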
@angelpascual1516 3 years ago
Can you please help me solve this, with a conclusion?

Alternative capacity for new store | New bridge built | No new bridge
A | 1 | 14
B | 2 | 10
C | 4 | 6

Where A = Small, B = Medium, C = Large.
1. Assume the payoffs represent profits. Determine the alternative that would be chosen under the minimax approach.
2. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximin approach.
3. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximax approach.
4. Assume the payoffs represent profits. Determine the alternative that would be chosen under the Laplace approach.
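The four criteria in the question above can be worked through directly from the payoff table. This is a sketch, not the video's solution, and it reads "minimax" as minimax regret, which is an assumption about what the question intends. The function names are chosen for this example.

```python
# Payoffs (profits) per alternative: (new bridge built, no new bridge).
payoffs = {
    "A (small)": (1, 14),
    "B (medium)": (2, 10),
    "C (large)": (4, 6),
}


def maximin(table):
    """Pick the alternative with the best worst-case payoff."""
    return max(table, key=lambda alt: min(table[alt]))


def maximax(table):
    """Pick the alternative with the best best-case payoff."""
    return max(table, key=lambda alt: max(table[alt]))


def laplace(table):
    """Pick the alternative with the highest average payoff (equal weights)."""
    return max(table, key=lambda alt: sum(table[alt]) / len(table[alt]))


def minimax_regret(table):
    """Pick the alternative whose largest regret is smallest."""
    n_states = len(next(iter(table.values())))
    best_per_state = [max(row[s] for row in table.values()) for s in range(n_states)]
    worst_regret = {
        alt: max(best_per_state[s] - row[s] for s in range(n_states))
        for alt, row in table.items()
    }
    return min(worst_regret, key=worst_regret.get)


if __name__ == "__main__":
    for rule in (minimax_regret, maximin, maximax, laplace):
        print(rule.__name__, "->", rule(payoffs))
```

By hand: the row minimums are 1, 2 and 4, so maximin picks C; the row maximums are 14, 10 and 6, so maximax picks A; the row averages are 7.5, 6 and 5, so Laplace picks A; the largest regrets are 3, 4 and 8, so minimax regret picks A.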
@faisalagarbaa1 3 years ago
Where is the Python source code?
@PULAMOLUDEEPAKBCE 3 years ago
Can you share the code?
@peterkirka4862 4 years ago
Hello Mr. Gruss. Is it possible to find your code from this video somewhere?
@eduardolopez7323 5 years ago
Just what I was looking for, thanks man.
@WillemAartVanDorpen 3 years ago
You're welcome!

Richard Gruss
