Richard Gruss
  • Videos: 66
  • Views: 63,995

Generative AI - Introduction
Views: 671

Videos

AI Innovation Through Collaboration With Startups · 58 views · 8 months ago
AI in IT Service · 90 views · 9 months ago
The Tukey Cramer Procedure · 365 views · 4 years ago
Levene Test · 4.1K views · 4 years ago
Neural Networks Meeting · 39 views · 4 years ago
Neural Networks 2: MNIST Classification · 141 views · 4 years ago
Neural Networks - Introduction · 112 views · 5 years ago
Chi square lesson overview · 22 views · 5 years ago
Text Classification With Python · 33K views · 5 years ago
Our Little Girl Grows Up · 126 views · 5 years ago
Machine Learning · 156 views · 5 years ago
test · 27 views · 5 years ago
Expert Systems · 13K views · 5 years ago
MGNT 671 - Python Getting Started · 75 views · 5 years ago
MGNT 671 - Meeting 1 Extended · 14 views · 5 years ago
Introduction to MGNT 671: Artificial Intelligence and Machine Learning for Managers · 95 views · 5 years ago
Multiple Linear Regression Problem · 32 views · 5 years ago
Chi Square Test · 28 views · 5 years ago
MGNT 333 Summer 1 - Course Introduction · 81 views · 5 years ago
Syllabus Text Analytics: Inclusive Excellence · 36 views · 5 years ago
Decision Analysis · 127 views · 5 years ago
Welcome to Text Analytics! · 119 views · 5 years ago
MSL4 Problem 5 · 48 views · 5 years ago
MSL3 5 And 7 · 114 views · 5 years ago
Visualization Lab · 33 views · 5 years ago
Lab 1: Grade What if · 55 views · 5 years ago
MGNT 333 - Course Introduction · 78 views · 5 years ago
Jmp Lab · 34 views · 6 years ago
MSL13 Problem 5 · 18 views · 6 years ago

Comments

  • @PANDURANG99 · 2 months ago

    How do you do multi-level hierarchical classification?
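
One possible approach (a sketch, not from the video; the helper names and data shapes are illustrative): train a coarse classifier for the top level, then one classifier per top-level category for the level below, reusing the same vectorizer.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def train_hierarchical(docs_by_coarse):
        # docs_by_coarse: {coarse_label: [(fine_label, text), ...]}
        texts = [t for docs in docs_by_coarse.values() for _, t in docs]
        coarse_y = [c for c, docs in docs_by_coarse.items() for _ in docs]
        vec = CountVectorizer(stop_words='english')
        coarse_clf = MultinomialNB().fit(vec.fit_transform(texts), coarse_y)
        fine_clfs = {}
        for coarse, docs in docs_by_coarse.items():
            fine_texts = [t for _, t in docs]
            fine_y = [f for f, _ in docs]
            fine_clfs[coarse] = MultinomialNB().fit(vec.transform(fine_texts), fine_y)
        return vec, coarse_clf, fine_clfs

    def classify_hierarchical(text, vec, coarse_clf, fine_clfs):
        X = vec.transform([text])
        coarse = coarse_clf.predict(X)[0]        # level 1, e.g. 'sport'
        fine = fine_clfs[coarse].predict(X)[0]   # level 2, e.g. 'cricket'
        return coarse, fine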

  • @chrisphayao · 4 months ago

    Thanks for the clear explanation - only the wild clicking around is very confusing. Maybe consider going step by step through the code without clicking around. Thank you.

  • @rajm5349 · 7 months ago

    Can I get this code?

  • @hotdogsinmytummy · 7 months ago

    Interesting🙌🗣️

  • @slushys9919 · 9 months ago

    Hi guys, for anyone that's looking for the code, you may use the following:

        import os
        import random
        import string
        from nltk import word_tokenize
        from collections import defaultdict
        from nltk import FreqDist
        from nltk.corpus import stopwords
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn import metrics
        import pickle
        import csv

        stop_words = set(stopwords.words('english'))
        stop_words.add('said')
        stop_words.add('mr')

        BASE_DIR = ''  # path to the BBC dataset folder goes here (left blank in the original comment)
        LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

        # step 1: flatten the per-category article files into one CSV
        def create_data_set():
            with open('data.csv', 'w', encoding='utf8', newline='') as csvfile:
                csv_writer = csv.writer(csvfile, delimiter=',')
                csv_writer.writerow(['Label', 'Filename', 'Text'])  # write header
                for label in LABELS:
                    dir = '%s/%s' % (BASE_DIR, label)
                    for filename in os.listdir(dir):
                        fullfilename = '%s/%s' % (dir, filename)
                        print(fullfilename)
                        with open(fullfilename, 'rb') as file:
                            # flatten newlines so each article is one line of text
                            text = file.read().decode(errors='replace').replace('\n', ' ')
                            csv_writer.writerow([label, filename, text])

        # step 2: read the CSV back as [(label, text), (label, text), ...]
        def setup_docs():
            docs = []  # (label, text)
            with open('data.csv', 'r', encoding='utf8', newline='') as datafile:
                reader = csv.reader(datafile)
                next(reader)  # skip the header row
                for parts in reader:
                    doc = (parts[0], parts[2].strip())  # label is the first column, text the third
                    docs.append(doc)
            return docs

        def get_tokens(text):
            tokens = word_tokenize(text)
            tokens = [t for t in tokens if t not in stop_words]
            return tokens

        def clean_text(text):
            text = text.translate(str.maketrans('', '', string.punctuation))
            text = text.lower()
            return text

        def print_frequency_dist(docs):
            tokens = defaultdict(list)
            for doc in docs:
                doc_label = doc[0]
                doc_text = clean_text(doc[1])
                doc_tokens = get_tokens(doc_text)
                tokens[doc_label].extend(doc_tokens)
            for category_label, category_tokens in tokens.items():
                print(category_label)
                fd = FreqDist(category_tokens)
                print(fd.most_common(20))

        def get_splits(docs):
            random.shuffle(docs)
            X_train = []  # training documents
            y_train = []  # corresponding training labels
            X_test = []   # test documents
            y_test = []   # corresponding test labels
            pivot = int(.80 * len(docs))
            for i in range(0, pivot):
                X_train.append(docs[i][1])
                y_train.append(docs[i][0])
            for i in range(pivot, len(docs)):
                X_test.append(docs[i][1])
                y_test.append(docs[i][0])
            return X_train, X_test, y_train, y_test

        def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
            X_test_tfidf = vectorizer.transform(X_test)
            y_pred = classifier.predict(X_test_tfidf)
            # use 'weighted' averaging for the multiclass setting
            precision = metrics.precision_score(y_test, y_pred, average='weighted', zero_division=0)
            recall = metrics.recall_score(y_test, y_pred, average='weighted')
            f1 = metrics.f1_score(y_test, y_pred, average='weighted')
            print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

        def train_classifier(docs):
            X_train, X_test, y_train, y_test = get_splits(docs)
            # the object that turns text into vectors/numbers
            vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
            # creates the doc-term matrix
            dtm = vectorizer.fit_transform(X_train)
            # train the Naive Bayes classifier
            naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
            evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
            evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
            # store the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
            # also store the vectorizer so we can transform new data
            vec_filename = 'count_vectorizer.pkl'
            pickle.dump(vectorizer, open(vec_filename, 'wb'))

        def classify(text):
            # load the classifier
            clf_filename = 'naive_bayes_classifier.pkl'
            nb_clf = pickle.load(open(clf_filename, 'rb'))
            # load the vectorizer so we can transform the new text
            vec_filename = 'count_vectorizer.pkl'
            vectorizer = pickle.load(open(vec_filename, 'rb'))
            # preprocess the text and make a prediction
            processed_text = clean_text(text)
            pred = nb_clf.predict(vectorizer.transform([processed_text]))
            print(pred[0])

        if __name__ == '__main__':
            # create_data_set()
            # docs = setup_docs()
            # print_frequency_dist(docs)
            # train_classifier(docs)
            # deployment in production
            new_doc = "Google showed off some new camera features on the Pixel 4 today"
            classify(new_doc)
            print("Done")
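
For a first end-to-end run of the script above, the commented-out steps in its __main__ block have to be enabled in order (a usage sketch inferred from the code, not stated in the comment):

    if __name__ == '__main__':
        create_data_set()        # 1. build data.csv (set BASE_DIR first)
        docs = setup_docs()      # 2. load the (label, text) pairs
        train_classifier(docs)   # 3. train, evaluate, and pickle the model + vectorizer
        classify("Oil prices fell sharply on Friday")  # 4. reuse the pickles on new text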

    • @slushys9919 · 9 months ago

      Note that the file paths need to be changed, and there are steps in the video that need to be followed for the program to work. Remember to download the sample dataset from BBC as well when you're testing the code.

    • @slushys9919 · 9 months ago

      I've also adjusted parts of the code due to some errors popping up, but it should still work.

  • @slushys9919 · 9 months ago

    What are the .pkl and .txt files for? And are they needed for the code to function?
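
Answering from the code in this thread rather than the video: the two .pkl files are the pickled (serialized) trained model and fitted vectorizer, saved by train_classifier() so that classify() can run later without retraining; the .txt/.csv file is just the flattened dataset. A minimal sketch of reusing them:

    import pickle

    # load the model and vectorizer that train_classifier() pickled
    nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
    vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

    # classify new text without touching the training data again
    X = vectorizer.transform(["stock markets rallied today"])
    print(nb_clf.predict(X)[0])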

  • @affectionlifeaffliction · 10 months ago

    Nothing works in Jupyter Notebook.

  • @andonij · 11 months ago

    Thank you so much, Richard. You made my Sunday with this explanation. Excellent video.

  • @chrissonntag9 · 1 year ago

    I cannot find the source code or the data source :-( so it's not useful for me.

  • @giantdutchviking · 1 year ago

    Thanks for taking the time to make this vid. I've been learning Python for a short while, and although I didn't understand everything, it gave a good insight into what machine learning does. It doesn't sound so "scary" anymore.

  • @niteshsneha · 1 year ago

    Can you please share the GitHub link for the source code?

  • @InnocenceVVX · 1 year ago

    This is very cool, thank you.

  • @tharindunilakshana1883 · 1 year ago

    import pandas as pd
    import json
    import numpy as np
    import csv
    import os
    import random
    import string
    from nltk import word_tokenize
    from collections import defaultdict
    from nltk import FreqDist
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn import metrics
    import pickle
    import os.path

    stop_words = set(stopwords.words('english'))
    stop_words.add('said')
    stop_words.add('mr')

    BASE_DIR = 'C:/Users/user/Desktop/bbc/News Articles'
    LABELS = ['business', 'entertainment', 'politics', 'sport', 'tech']

    def create_data_set():
        with open('data.csv', 'w', encoding='utf8', newline='') as outfile:
            # creating a csv writer object
            csvwriter = csv.writer(outfile)
            for label in LABELS:
                dir = '%s/%s' % (BASE_DIR, label)
                for filename in os.listdir(dir):
                    fullfilename = '%s/%s' % (dir, filename)
                    print(fullfilename)
                    with open(fullfilename, 'rb') as file:
                        text = file.read().decode(errors='replace').replace('\n', ' ')
                    # writing the fields
                    fields = [label, filename, text]
                    csvwriter.writerow(fields)
                    print(text)
                    # outfile.write('%s\t%s\t%s ' % (label, filename, text))

    def setup_docs():
        docs = []  # (label, text)
        # read back the CSV that create_data_set() writes
        with open('data.csv', 'r', encoding='utf8', newline='') as datafile:
            for parts in csv.reader(datafile):
                doc = (parts[0], parts[2].strip())
                docs.append(doc)
        return docs

    def clean_text(text):
        # remove punctuation
        text = text.translate(str.maketrans('', '', string.punctuation))
        # convert to lower case
        text = text.lower()
        return text

    def get_tokens(text):
        # get individual words
        tokens = word_tokenize(text)
        # remove common words that are useless
        tokens = [t for t in tokens if t not in stop_words]
        return tokens

    def print_frequency_dist(docs):
        tokens = defaultdict(list)
        # let's make a giant list of all the words for each category
        for doc in docs:
            doc_label = doc[0]
            doc_text = clean_text(doc[1])
            doc_tokens = get_tokens(doc_text)
            tokens[doc_label].extend(doc_tokens)
        for category_label, category_tokens in tokens.items():
            print(category_label)
            fd = FreqDist(category_tokens)
            print(fd.most_common(20))

    def get_splits(docs):
        # scramble docs
        random.shuffle(docs)
        X_train = []  # training documents
        y_train = []  # corresponding training labels
        X_test = []   # test documents
        y_test = []   # corresponding test labels
        pivot = int(.80 * len(docs))
        for i in range(0, pivot):
            X_train.append(docs[i][1])
            y_train.append(docs[i][0])
        for i in range(pivot, len(docs)):
            X_test.append(docs[i][1])
            y_test.append(docs[i][0])
        return X_train, X_test, y_train, y_test

    def evaluate_classifier(title, classifier, vectorizer, X_test, y_test):
        X_test_tfidf = vectorizer.transform(X_test)
        y_pred = classifier.predict(X_test_tfidf)
        precision = metrics.precision_score(y_test, y_pred, average='micro')
        recall = metrics.recall_score(y_test, y_pred, average='micro')
        f1 = metrics.f1_score(y_test, y_pred, average='micro')
        print("%s\t%f\t%f\t%f" % (title, precision, recall, f1))

    def train_classifier(docs):
        X_train, X_test, y_train, y_test = get_splits(docs)
        # the object that turns text into vectors
        vectorizer = CountVectorizer(stop_words='english', ngram_range=(1, 3), min_df=3, analyzer='word')
        # create doc-term matrix
        dtm = vectorizer.fit_transform(X_train)
        # train Naive Bayes classifier
        naive_bayes_classifier = MultinomialNB().fit(dtm, y_train)
        evaluate_classifier("Naive Bayes\tTRAIN\t", naive_bayes_classifier, vectorizer, X_train, y_train)
        evaluate_classifier("Naive Bayes\tTEST\t", naive_bayes_classifier, vectorizer, X_test, y_test)
        # store the classifier
        clf_filename = 'naive_bayes_classifier.pkl'
        pickle.dump(naive_bayes_classifier, open(clf_filename, 'wb'))
        # also store the vectorizer so we can transform new data
        vec_filename = 'count_vectorizer.pkl'
        pickle.dump(vectorizer, open(vec_filename, 'wb'))

    # classify new content
    def classify(text):
        # load classifier
        clf_filename = 'naive_bayes_classifier.pkl'
        nb_clf = pickle.load(open(clf_filename, 'rb'))
        # vectorize the new text
        vec_filename = 'count_vectorizer.pkl'
        vectorizer = pickle.load(open(vec_filename, 'rb'))
        pred = nb_clf.predict(vectorizer.transform([text]))
        print(pred[0])

    # create_data_set()
    # set up the documents
    # docs = setup_docs()
    # print the word frequency
    # print_frequency_dist(docs)
    # train the classifier
    # train_classifier(docs)
    # classify the new content using the pkl files
    new_doc = ("Transparency International Sri Lanka (TISL) filed a petition in the Supreme Court "
        "yesterday (June 12), seeking to intervene in the ongoing Fundamental Rights case "
        "(SC/FR/Application No.168/2021) filed by the Center for Environmental Justice (CEJ) and "
        "three more petitioners, highlighting the serious allegations of bribery and corruption "
        "surrounding the X-Press Pearl disaster. The intervention petition is filed in the public "
        "interest. It refers to serious allegations of irregularity, mishandling, sabotage, bribery "
        "and corruption surrounding the claim for compensation arising from the X-Press Pearl "
        "disaster. Several key points have been raised in the intervention petition: The grave "
        "allegations of interference and extraneous pressure surrounding the claim for compensation "
        "arising from the X-Press Pearl disaster. The statement by the Justice Minister in Parliament "
        "on April 25, 2023, that one Chamara Gunasekara alias Manjusiri Nissanka had received a "
        "payment of USD 250 million into a private bank account in connection with the X-Press Pearl "
        "disaster. The media statements of Chinthaka Waragoda, who reportedly invented a machine to "
        "remove debris which washed ashore after the shipwreck, alleging that he was offered payment "
        "to discontinue the use of his machine, to avoid exposing the full extent of the damage "
        "caused by the disaster. Questions surrounding the quantum of compensation due to Sri Lanka "
        "for the damages caused by MV X-Press Pearl. The freight ship ‘MV X-Press Pearl’ caught fire "
        "off the coast of Colombo on 20th May, 2021. It sank a few days later, releasing its cargo of "
        "plastic pellets and tons of toxic chemicals into the ocean, causing Sri Lanka’s worst "
        "maritime disaster to date. It is alleged that Sri Lankan authorities obtained the assistance "
        "of the International Tanker Owners Pollution Federation Limited (ITOPF), a representative of "
        "the insurer of the Shipowner, in the post-disaster activities, despite the grave conflict of "
        "interest arising from it. TISL has urged that the private parties involved in the X-Press "
        "Pearl incident be held accountable, and be made to pay optimal compensation for the damage "
        "and pollution caused to the marine and coastal ecology of Sri Lanka, and the payment of "
        "compensation for the loss caused to the fishing communities and those engaged in tourism, as "
        "well as obtaining compensation under the Marine Pollution Prevention Act. TISL has also "
        "highlighted the need to hold anyone guilty of wrongdoing fully accountable. The petition for "
        "intervention is to be mentioned for support in the Supreme Court on Thursday (June 15).")
    classify(new_doc)
    print("Done...!")

  • @anirbanghose8647 · 1 year ago

    Loved it. It made complete sense. Thanks.

  • @viane123456 · 1 year ago

    I have a text file with many lines, and I want to classify each line. How can I do it?
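
A minimal sketch of one way to do this, reusing the classify() function and pickled model from the code shared above (the filename is a placeholder):

    # classify every non-empty line of a text file, one prediction per line
    with open('lines.txt', 'r', encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if line:
                classify(line)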

  • @mohamedsidhoum6835 · 1 year ago

    Thank you for this video. Where can I get the source code?

  • @yongxing1848 · 1 year ago

    Where is the code for this?

    • @vedantvashishth989 · 1 year ago

      Were you able to find it? If yes, please send me the link.

    • @yongxing1848 · 1 year ago

      @vedantvashishth989 I rewrote everything that he put up.

  • @amanichouk8967 · 1 year ago

    Thank you so much, this is amazing and so well structured.

  • @tonyhasago · 2 years ago

    Hi - great video and it worked!! How would I assess or score the accuracy of the final step, classifying some new text?

    • @agradel100 · 1 year ago

      Have you found out how to do it?
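
A note on this (not from the video): true accuracy needs labeled test data, which is what evaluate_classifier() in the shared code measures. For a single new document you can instead report the model's confidence, since MultinomialNB supports predict_proba. A sketch, assuming the pickle filenames from the code above:

    import pickle

    nb_clf = pickle.load(open('naive_bayes_classifier.pkl', 'rb'))
    vectorizer = pickle.load(open('count_vectorizer.pkl', 'rb'))

    X = vectorizer.transform(["Google showed off some new camera features today"])
    probs = nb_clf.predict_proba(X)[0]
    # print each class with its predicted probability, most likely first
    for label, p in sorted(zip(nb_clf.classes_, probs), key=lambda t: -t[1]):
        print('%s\t%.3f' % (label, p))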

  • @bongimaposa · 2 years ago

    Wonderful! Thank you so much - detailed and I could follow.

  • @ikennanwankwo7448 · 2 years ago

    Hello, thank you for your video, but the create_data_set function does not work if there are multiple file types (.txt, .doc, .bin, etc.) in the subfolder. The data.txt output file is empty (nothing gets written to it).

    • @ikennanwankwo7448 · 2 years ago

      I just want it to write only the .txt files to the data.txt file (all the .txt files in the subfolders have the same file name, if that helps).
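
A sketch of one fix (not from the video): filter the directory listing inside create_data_set() so only .txt files are read.

    import os

    def txt_files(dir_path):
        # keep only .txt files; ignore .doc, .bin, and anything else
        return [f for f in os.listdir(dir_path) if f.lower().endswith('.txt')]

    # inside create_data_set(), iterate with:  for filename in txt_files(dir):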

  • @PD-qg2fo · 2 years ago

    Thank you so much, sir. I was looking for this kind of tutorial.

  • @bibichbicha687 · 2 years ago

    Can I get the code, please, sir?

    • @vedantvashishth989 · 1 year ago

      Were you able to find it? If yes, please send me the link.

  • @fathersonduo · 2 years ago

    Please make more videos!!

  • @ananthakrishnan4754 · 2 years ago

    Some people talk about the p-value, some talk about the critical value - now I am confused.
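
For what it's worth, the two approaches always give the same decision: rejecting when the test statistic exceeds the critical value is equivalent to rejecting when the p-value falls below alpha. A small numeric sketch (illustrative numbers, not from any of the videos):

    # chi-square test, alpha = 0.05, 3 degrees of freedom
    from scipy import stats

    alpha, df, test_stat = 0.05, 3, 9.2
    critical = stats.chi2.ppf(1 - alpha, df)      # ~7.815
    p_value = stats.chi2.sf(test_stat, df)        # ~0.027
    print(test_stat > critical, p_value < alpha)  # True True - the same decision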

  • @Muuip · 2 years ago

    Great presentation, much appreciated!

  • @Arrato1977 · 2 years ago

    Explained super easily! Thank you!

  • @walidbenaouda8935 · 2 years ago

    Can I get the code?

  • @hggaming911 · 2 years ago

    Awesome, simple, and clean code. Please can we have a link to download the code?

  • @mikiyasassefakassa9136 · 2 years ago

    Mr. Gruss, I have a question: how do you prepare a dataset for news text classification at the document level?

  • @lunabaalbaki3169 · 2 years ago

    Hey, thank you for the tutorial, it's very helpful. Can you share your code?

  • @adylmanulat2465 · 2 years ago

    Good day, sir. I just wanted to ask: if an independent variable is not significant and has no explanatory power in the model, but removing it lowers the adjusted R-squared, what does this imply? So far, the only reason I know of is that its t-statistic is greater than one. With this information, what can we infer?
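
A note on why that happens (standard regression theory, not from the video): adjusted R-squared, R̄² = 1 - (1 - R²)(n - 1)/(n - k - 1), penalizes each added regressor, and dropping a variable raises adjusted R-squared exactly when that variable's |t| statistic is below 1. So a variable with, say, |t| ≈ 1.5 is individually insignificant at the usual 5% level (which needs roughly |t| > 1.96) yet still lowers adjusted R-squared when removed: it adds some explanatory power, just not enough to clear conventional significance.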

  • @naughtychohan7956 · 2 years ago

    Can I get this code?

    • @vedantvashishth989 · 1 year ago

      Were you able to find it? If yes, please send me the link.

  • @JiminPark-ld2xx · 2 years ago

    Does anyone use Excel or CSV data to work with text classification? Or should I create a .txt file for each and every row of my data?
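
You can work from a CSV directly. A sketch with pandas, assuming column names 'Label' and 'Text' like the header written by the code shared earlier in this thread:

    import pandas as pd

    df = pd.read_csv('data.csv')
    # same (label, text) shape that setup_docs() produces
    docs = list(zip(df['Label'], df['Text'].astype(str)))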

  • @learner3585 · 2 years ago

    Very good tutorial with good explanation. I was able to follow along and also able to run the whole program while watching the video. Thanks.

    • @doloresdizon7685 · 2 years ago

      I am having trouble. Can you please help me? :(

  • @StanleyDenman · 3 years ago

    Your video seems to be right on point for what I want to do, but I am very confused about the learning-model aspect. If I want to just create hard rules for text classification, I do not need the dataset training, right?
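
Right - hard rules need no training data at all, just keyword lists you maintain by hand. A sketch of a purely rule-based classifier (the rules here are illustrative, not from the video):

    # score each label by how many of its keywords appear in the text
    RULES = {
        'sport': ['match', 'cup', 'coach', 'season'],
        'business': ['market', 'shares', 'profit', 'bank'],
        'tech': ['software', 'users', 'digital', 'phone'],
    }

    def rule_classify(text):
        text = text.lower()
        scores = {label: sum(w in text for w in words) for label, words in RULES.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else 'unknown'

    print(rule_classify("The bank reported record profit this quarter"))  # business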

  • @LeomarOsorio · 3 years ago

    Thank you for this tutorial. This is a good walkthrough.

  • @mandysingh4044 · 3 years ago

    Hello sir, I want to contact you.

  • @pujanrajrai4930 · 3 years ago

    Thank you very much, sir, for this lecture. It really helped me a lot. Hoping to see more content on machine learning.

  • @timhn4010 · 3 years ago

    The dataset: www.kaggle.com/pariza/bbc-news-summary

  • @rabbilbhuiyan5666 · 3 years ago

    Excellent video and demonstration of text analysis. Thank you very much, sir!

  • @engbahja · 3 years ago

    Many thanks for this. Could you please share the expert-systems list you mention at the end of the video?

  • @lukajozic9768 · 3 years ago

    Nice! There is, however, a function in the sklearn library called train_test_split, I believe, that does exactly what your get_splits function does. Also, it would be helpful if you put the code in the description. Good video and great explanations!
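
For reference, a sketch of the equivalent split using scikit-learn's helper (assuming docs is the same list of (label, text) pairs used elsewhere in this thread):

    from sklearn.model_selection import train_test_split

    labels = [d[0] for d in docs]
    texts = [d[1] for d in docs]
    # same 80/20 shuffled split that get_splits() produces
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.20, shuffle=True)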

  • @angelpascual1516 · 3 years ago

    Can you please help me to answer/solve this, with a conclusion?

        Alternative (capacity for new store)   New Bridge Built   No New Bridge
        A (Small)                              1                  14
        B (Medium)                             2                  10
        C (Large)                              4                  6

    1. Assume the payoffs represent profits. Determine the alternative that would be chosen under the minimax approach.
    2. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximin approach.
    3. Assume the payoffs represent profits. Determine the alternative that would be chosen under the maximax approach.
    4. Assume the payoffs represent profits. Determine the alternative that would be chosen under the Laplace approach.
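
A worked sketch for the table above (reading "minimax" as minimax regret, the usual criterion when payoffs are profits):

    # payoffs as [New Bridge Built, No New Bridge]
    payoffs = {'A': [1, 14], 'B': [2, 10], 'C': [4, 6]}

    maximax = max(payoffs, key=lambda a: max(payoffs[a]))                 # best best case: A (14)
    maximin = max(payoffs, key=lambda a: min(payoffs[a]))                 # best worst case: C (4)
    laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / len(payoffs[a]))  # best average: A (7.5)

    # minimax regret: regret = column best minus payoff; choose the smallest max regret
    col_best = [max(p[i] for p in payoffs.values()) for i in range(2)]
    regrets = {a: max(col_best[i] - p[i] for i in range(2)) for a, p in payoffs.items()}
    minimax_regret = min(regrets, key=regrets.get)                        # A (max regret 3)

    print(maximax, maximin, laplace, minimax_regret)                      # A C A A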

  • @faisalagarbaa1 · 3 years ago

    Where is the Python source code?

  • @PULAMOLUDEEPAKBCE · 3 years ago

    Can you share the code?

  • @peterkirka4862 · 4 years ago

    Hello Mr. Gruss. Is it possible to find your code from this video somewhere?

  • @eduardolopez7323 · 5 years ago

    Just what I was looking for, thanks man.