GenAI Learning

GenAI Learning - Getting Started

GenAI Initial Demo

Generative AI - Tokens, Chunking, Embeddings, etc.

GenAI is transforming industries. It powers applications such as text generation and image creation.

It is built on pre-trained models, such as large language models (LLMs).

GenAI generates new content based on pre-trained models.
It can produce text, images, audio, video, and even code.

GenAI comprises:
  1. Models
  2. Transformers
  3. Prompt Engineering
  4. Inference
  5. Context Window
  6. Token
  7. Vector
  8. Embeddings
  9. Chunking
  10. Multimodal Models
  11. Diffusion Models


Transformer Network:
------------------------>
Context Window:
Foundational models include models like GPT and BERT.

Tokens and Tokenization:
Tokenization is the process of breaking input down into tokens.
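
As a rough illustration of the idea, the sketch below splits a sentence into word and punctuation tokens and maps each token to an integer ID. The function names `simple_tokenize` and `build_vocab` are hypothetical, and the regex rule is an assumption made for this example. Production models typically use learned subword tokenizers, which this sketch does not reproduce.

```python
import re


def simple_tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens (illustrative only)."""
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


tokens = simple_tokenize("GenAI generates text, images and code.")
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
print(tokens)
print(token_ids)
```

The model never sees the raw string, only the sequence of IDs, which is why token counts rather than character counts determine how much fits in a context window.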

Embeddings and Vectors:

Chunking:
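
Chunking splits long text into smaller pieces, often with some overlap, so that each piece fits in a context window or can be embedded on its own. A minimal sketch, assuming word-based chunks (the function name `chunk_text` and the sizes used here are illustrative choices, not values from any particular service):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_text(sample, chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks, at the cost of storing some words twice.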


Prompt Engineering:
Prompt engineering means designing prompts to get the desired output. Techniques include zero-shot, one-shot, and few-shot learning.

Zero-shot learning
One-shot learning
Few-shot learning
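
The three techniques differ only in how many worked examples the prompt contains. The sketch below shows that difference; the helper `build_prompt`, the "Input:/Output:" labels, and the sentiment task are all assumptions made for illustration.

```python
def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt with zero, one, or several worked examples before the query."""
    parts = [task]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves "Output:" open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


task = "Classify the sentiment of the input as positive or negative."
examples = [
    ("I loved this movie.", "positive"),
    ("The food was terrible.", "negative"),
]
query = "The service was quick."

zero_shot = build_prompt(task, [], query)            # no examples
one_shot = build_prompt(task, examples[:1], query)   # one example
few_shot = build_prompt(task, examples, query)       # several examples
print(few_shot)
```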


Multimodal Models:

Diffusion Models:




GitHub Copilot


Amazon Bedrock is the AWS service for hosting generative AI models.

Architectures behind GenAI
1. Generative Adversarial Network (GAN)
2. Variational Autoencoder (VAE)
3. Transformers

Foundation Model Lifecycle

Foundation Model Lifecycle Script

Data Selection
Model Selection
Pre-Training
Fine-Tuning
Evaluating
Deployment
Feedback & Monitoring
Iteration and Optimization

Prompt Engineering

User prompts
System prompts

AWS Infra for GenAI Apps:
--------------------------------->
SageMaker JumpStart

PartyRock - Playground for testing GenAI apps.

Optimizing AI with Vector Databases

Amazon Bedrock to host GenAI models

Amazon Bedrock for Generative AI development

Foundational Models & Applications

Foundational Models Applications

Convolutional Neural Network
Recurrent Neural Network

AI Model Performance Metrics:
------------------------------------>
Accuracy
Precision
Recall
F1 Score
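
For a binary classifier, these four metrics can be computed from the counts of true/false positives and negatives. A minimal sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name and the sample labels are made up for illustration.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the items predicted positive, how many were right?", recall answers "of the actual positives, how many were found?", and F1 is their harmonic mean.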

Choose Model Based on Metrics

Retrieval Augmented Generation (RAG)

_images/Retrieval_Augmented_Generation.png

How RAG Works

RAG Shell Script

Retrieval Augmented Generation Script

Amazon Bedrock uses RAG to enhance foundational model performance in customer applications.

Amazon Bedrock and Amazon Kendra use vector databases to enhance foundation model performance in semantic search and document retrieval.

Selecting the Pre-trained Models:

RAG (Retrieval Augmented Generation) is a method that combines two components:
large language model generation and information retrieval.

It pairs a large language model with a step that retrieves relevant information.

The goal of RAG is to retrieve that information and feed it into the model as context for its answer.
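
The retrieve-then-generate flow can be sketched in a few lines. This is a toy, assuming a word-count vector as a stand-in for a real embedding model and stopping at the assembled prompt rather than calling any model; `embed`, `retrieve`, and `build_rag_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Put the retrieved text into the prompt as context for the model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


documents = [
    "Amazon Bedrock hosts foundation models.",
    "Precision and recall are classification metrics.",
    "Chunking splits documents into smaller pieces.",
]
query = "What does chunking do to documents?"
prompt = build_rag_prompt(query, documents)
print(prompt)
```

In a real system the prompt would then be sent to a language model, and the document store would be a vector database rather than a Python list.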

Vector Databases: the Backbone of RAG
-------------------------------------------------->
metadata about images
metadata about audio
metadata about videos

Amazon Bedrock: RAG in action

Amazon Bedrock leverages RAG to enhance language models: it retrieves data from knowledge bases to improve responses.

Vector Databases

Vector Databases Overview

Vector Databases store data as embeddings, which are numerical representations of data like text and images.

These embeddings allow fast, efficient and semantically relevant searches for AI and machine learning tasks.

Several AWS services help store and manage embeddings in vector databases.


Amazon OpenSearch Service for generative AI.
k-Nearest Neighbors (k-NN) search for efficient similarity queries.
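
The idea behind a k-NN query is to return the k stored vectors closest to a query vector. The sketch below is a brute-force version over an in-memory dictionary, with made-up two-dimensional vectors. It only illustrates the concept and is not how any AWS service implements search; managed services typically rely on indexes to avoid scanning every vector.

```python
import heapq
import math


def knn(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k vectors nearest to `query` by Euclidean distance."""
    nearest = heapq.nsmallest(k, index.items(), key=lambda item: math.dist(query, item[1]))
    return [item_id for item_id, _ in nearest]


# Hypothetical embeddings: similar items are given nearby vectors.
index = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}
neighbors = knn([1.0, 0.0], index, k=2)
print(neighbors)
```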


Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL support pgvector:
1. The pgvector extension is available on Amazon Aurora and Amazon RDS for PostgreSQL.
2. It enables storage of, and similarity searches over, ML-generated embeddings.
3. Embeddings capture semantic meaning from text processed by large language models (LLMs).

Amazon Neptune ML:
-------------------------->
Uses Graph Neural Networks (GNNs) to enhance predictions using complex graph relationships.

Vector search for Amazon MemoryDB

Vector search for Amazon DocumentDB (with MongoDB compatibility)

RAG with Amazon Bedrock and custom knowledge bases.

RAG combines retrieved data with generative models.
Amazon Bedrock supports RAG by integrating with custom knowledge bases.

Foundation Model Customizations

_images/foundations_model_customizations.png

Amazon Bedrock Agents

Foundational Models

Foundational Models Script

Retrieval Augmented Generation -- combines retrieved data with generative models.

Agents for multi-step tasks:

Amazon Bedrock Agents can help carry out complex, multi-step workflows.

Multimodal agents

Prompt Engineering

Prompt Engineering Practices

Prompt engineering supplies specific inputs to the model and tells it what to do with the user's input.

Prompt Templates
Negative Prompts
Context in prompts
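
These three practices can be combined in one reusable template. The sketch below uses Python's standard `string.Template`; the placeholder names (`context`, `avoid`, `question`) and the wording of the template are assumptions chosen for illustration.

```python
from string import Template

# A reusable prompt template: fixed instructions plus named placeholders.
PROMPT_TEMPLATE = Template(
    "You are a helpful assistant.\n"
    "Context:\n$context\n\n"
    "Do not: $avoid\n\n"
    "Question: $question"
)

prompt = PROMPT_TEMPLATE.substitute(
    context="Amazon Bedrock is a service for hosting generative AI models.",  # context in the prompt
    avoid="mention pricing or make up features.",                            # negative prompt
    question="What is Amazon Bedrock used for?",
)
print(prompt)
```

`substitute` raises `KeyError` if a placeholder is left unfilled, which helps catch incomplete prompts before they reach the model.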


Pre-Training and Fine-tuning Foundational Models

PEFT - Parameter-Efficient Fine-Tuning
LoRA - Low-Rank Adaptation
ReFT - Representation Fine-Tuning
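
To see why LoRA counts as parameter-efficient: instead of updating a full d x k weight matrix, it trains two small matrices of shapes d x r and r x k whose product is added to the frozen weights. The sketch below only compares parameter counts; the layer size (4096 x 4096) and rank (8) are illustrative assumptions, not values from any specific model.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA: B is d x r and A is r x k, so r * (d + k)."""
    return r * (d + k)


d, k, r = 4096, 4096, 8  # hypothetical layer size and LoRA rank
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")
```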

Multitask Fine-Tuning

Model Performance Evaluation

Foundational Model Performance Script

Jupyter Notebook

BIG-bench

Holistic Evaluation of Language Models (HELM)

Amazon SageMaker Clarify

Amazon Bedrock and BERTScore

In-context learning

Fine-tuning

Responsible AI Practices

Responsible AI Considerations

Healthcare
Financial
Law Firms

Governance
Security
Robustness
Explainability
Fairness in AI

Tools for responsible AI:
----------------------------------------------->
1. Amazon SageMaker Clarify -- bias detection and explanation of model decisions
2. Amazon Bedrock Guardrails
3. Environmental impact assessment of AI


Data privacy and security risks:


Balanced datasets

Amazon SageMaker Clarify

SageMaker Data Wrangler

Data Preprocessing:
------------------------------>
1. Data Cleaning
2. Normalization
3. Feature Selection
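
The first two steps can be sketched on a single numeric column. This is a minimal example, assuming missing values are represented as `None` and that min-max scaling is the chosen normalization; both are illustrative choices, and the function names are hypothetical.

```python
from typing import Optional


def clean(values: list[Optional[float]]) -> list[float]:
    """Data cleaning: drop missing entries (represented here as None)."""
    return [v for v in values if v is not None]


def min_max_normalize(values: list[float]) -> list[float]:
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


raw = [10.0, None, 20.0, 30.0]
cleaned = clean(raw)
normalized = min_max_normalize(cleaned)
print(cleaned, normalized)
```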

Regular auditing and fairness

Transparency and explainable AI models
----------------------------------------->

Human-Centered AI

A2I - Amazon Augmented AI

Reinforcement Learning from Human Feedback (RLHF)

Human-Centered Design (HCD)

Security, Compliance & Governance

Security and Governance in GenAI

Security

SageMaker notebook instance in a private subnet

SageMaker Distributed Training - inter-node encryption

Amazon SageMaker Security


Data Source --> Data Processing --> Data Storage

SageMaker Model Registry for model versioning
SageMaker Model Cards - Documenting model details

SageMaker Feature Store

AWS Artifact: Simplifying Compliance Reporting
AWS Glue DataBrew: Data Preparation for Governance
AWS Lake Formation
Amazon S3
Amazon SageMaker Clarify
AWS Config - Continuous Monitoring for Compliance
Amazon Inspector - Security and Compliance Assessment
AWS Audit Manager - Streamlined Compliance Auditing
AWS CloudTrail - Logs all API calls
AWS Trusted Advisor - Best practices and compliance recommendations


Data Governance Strategies:

Data Lifecycle management
- S3 Lifecycle management

Data logging -- AWS CloudTrail, Amazon CloudWatch

Data Curation and understanding -- AWS Glue DataBrew

Master Data Management using Amazon Redshift and AWS Glue