Top 100 AI ML MCQs (2026) – With Answers & Explanations

This article covers the top 100 AI and ML MCQs with answers and explanations. It’s a fast, beginner-friendly way to prepare for interviews and exams and to stay up to date with essential AI and Machine Learning concepts in 2026.

AI & Machine Learning MCQs for Interviews

Before 2026, when we prepared for interviews, we usually searched for “Top 100 Python MCQs,” “50 JavaScript concepts,” and so on. But now, after the AI boom, things have changed.

Even if you’re going for a C interview, interviewers expect you to have at least basic knowledge of AI and Machine Learning. And if you’re not actively working in ML, it becomes even more important to understand the key concepts, terms, and fundamentals.

That’s exactly why I created this compact and essential AI & ML MCQ guide. You can go through all 100 questions in about an hour, but the value it can add to your interview preparation is massive, potentially worth thousands if you negotiate well!

Here are your top 100 AI ML MCQs, along with their answers and explanations:

Q1. Which of the following best describes the relationship between Artificial Intelligence, Machine Learning, and Deep Learning?

A. AI is a subset of Machine Learning.
B. Deep Learning is a subset of AI, and AI is a subset of Machine Learning.
C. Machine Learning is a subset of AI, and Deep Learning is a subset of Machine Learning.
D. All three are distinct fields with no overlap.

Show Answer

Answer: C
Machine Learning is a subset of the broader AI field, and Deep Learning is a specialized subset of Machine Learning.

Q2. In the context of Machine Learning, what is the primary difference between supervised and unsupervised learning?

A. Supervised learning uses labeled data, while unsupervised learning uses unlabeled data.
B. Supervised learning is used for clustering, while unsupervised learning is used for classification.
C. Supervised learning requires no human intervention, while unsupervised learning does.
D. Supervised learning works with images, while unsupervised learning works with text.

Show Answer

Answer: A
Supervised learning algorithms learn from input-output pairs (labeled data), whereas unsupervised learning finds patterns in data without labels.

Q3. Which algorithm is commonly used for both classification and regression tasks in supervised learning?

A. K-Means Clustering
B. Apriori Algorithm
C. K-Nearest Neighbors (KNN)
D. Principal Component Analysis (PCA)

Show Answer

Answer: C
KNN is a versatile algorithm that can be adapted to predict discrete labels (classification) or continuous values (regression).
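
To make the "both tasks" point concrete, here is a minimal pure-Python sketch of KNN, assuming Euclidean distance and a tiny made-up dataset (the function names and data are illustrative, not from any library). The same neighbor lookup feeds a majority vote for classification and an average for regression.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to query."""
    # train is a list of (features, target) pairs; distance is Euclidean.
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, query, k=3):
    # Classification: majority vote among the k nearest labels.
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    # Regression: average of the k nearest numeric targets.
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical toy data, invented for this example.
labeled = [((1, 1), "red"), ((1, 2), "red"), ((5, 5), "blue"), ((6, 5), "blue")]
priced = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify(labeled, (1.5, 1.5)))  # red
print(knn_regress(priced, (2,)))          # 20.0
```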

Q4. What is the primary goal of a regression algorithm in machine learning?

A. To group similar data points together.
B. To predict a continuous numerical value.
C. To classify data into distinct categories.
D. To reduce the dimensionality of the dataset.

Show Answer

Answer: B
Regression analysis is used to predict a dependent variable (output) which is continuous in nature, such as price or temperature.

Q5. Which Python library is most widely used for performing numerical computations in AI and ML projects?

A. Matplotlib
B. NumPy
C. Scikit-learn
D. TensorFlow

Show Answer

Answer: B
NumPy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

Q6. In a Confusion Matrix for a binary classification problem, what does the term “True Positive” represent?

A. The model predicted negative, and the actual value was negative.
B. The model predicted positive, and the actual value was negative.
C. The model predicted positive, and the actual value was positive.
D. The model predicted negative, and the actual value was positive.

Show Answer

Answer: C
A True Positive indicates a correct prediction where the model correctly identified the positive class.

Q7. Which evaluation metric is best suited when dealing with an imbalanced dataset in a classification problem?

A. Accuracy
B. F1-Score
C. Mean Squared Error
D. R-Squared

Show Answer

Answer: B
The F1-Score is the harmonic mean of precision and recall, making it more robust than accuracy when class distribution is uneven.
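
A quick way to see why accuracy can mislead on imbalanced data is to compute the metrics by hand. The sketch below assumes a hypothetical test set of 1,000 samples with only 10 positives; the counts are invented for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: the model finds only 2 of the 10 positives.
tp, fp, fn, tn = 2, 0, 8, 990

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"accuracy={accuracy:.3f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy comes out above 99% even though the model misses most positives, while the F1-Score of about 0.33 reflects the poor recall.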

Q8. What is the phenomenon called when a machine learning model learns the training data too well, including noise and outliers?

A. Underfitting
B. Overfitting
C. Regularization
D. Cross-validation

Show Answer

Answer: B
Overfitting occurs when a model captures noise in the training data, leading to poor generalization on unseen data.

Q9. Which technique is used to reduce the complexity of a model to prevent overfitting?

A. Feature Scaling
B. Regularization
C. Data Augmentation
D. Gradient Descent

Show Answer

Answer: B
Regularization (like L1 and L2) adds a penalty term to the loss function to discourage the model from fitting complex patterns in the noise.

Q10. In the context of neural networks, what is the function of an activation function?

A. To initialize the weights of the network.
B. To calculate the loss of the network.
C. To introduce non-linearity into the network.
D. To connect different layers of the network.

Show Answer

Answer: C
Activation functions allow neural networks to learn complex patterns by introducing non-linear properties to the output of neurons.

Q11. Which activation function outputs a value between 0 and 1 and is often used for binary classification output layers?

A. ReLU (Rectified Linear Unit)
B. Tanh
C. Sigmoid
D. Softmax

Show Answer

Answer: C
The Sigmoid function maps any input to a value between 0 and 1, making it suitable for modeling probabilities in binary classification.
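
As a small illustration, the sigmoid can be written in a few lines of plain Python. This is a sketch rather than a framework implementation; the branch on the sign of `x` is just one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(x):
    """Map any real number to the open interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form that keeps exp() small.
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-5.0, 0.0, 5.0):
    print(value, round(sigmoid(value), 4))
```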

Q12. What is the main purpose of using the “Softmax” activation function in a neural network?

A. To handle missing values.
B. To perform binary classification.
C. To convert output scores into probabilities for multi-class classification.
D. To speed up the training process.

Show Answer

Answer: C
Softmax converts a vector of numbers into a vector of probabilities, where the probabilities sum up to 1, used for multi-class problems.
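
Here is a minimal sketch of softmax in plain Python, assuming a list of raw scores (the example scores are made up). Subtracting the maximum score first is a common stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print([round(p, 3) for p in probs], round(sum(probs), 6))
```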

Q13. Which unsupervised learning algorithm is primarily used for dimensionality reduction?

A. K-Means
B. DBSCAN
C. Principal Component Analysis (PCA)
D. Linear Regression

Show Answer

Answer: C
PCA reduces the number of variables in a dataset while preserving as much variance (information) as possible.

Q14. In Reinforcement Learning, what is the entity that interacts with the environment and makes decisions called?

A. The Supervisor
B. The Agent
C. The Model
D. The Oracle

Show Answer

Answer: B
An agent in reinforcement learning takes actions within an environment to maximize a cumulative reward.

Q15. Which optimization algorithm is considered the standard for training deep neural networks due to its efficiency?

A. Stochastic Gradient Descent (SGD)
B. Adam (Adaptive Moment Estimation)
C. Batch Gradient Descent
D. Ridge Regression

Show Answer

Answer: B
Adam combines the benefits of two other extensions of SGD (AdaGrad and RMSProp) and generally converges faster.

Q16. What does the term “Epoch” refer to in the context of training a neural network?

A. One complete pass of the entire training dataset through the algorithm.
B. A single update of the model’s weights.
C. The time taken to train the model.
D. A subset of the training data.

Show Answer

Answer: A
An epoch means the entire dataset has been passed forward and backward through the neural network exactly once.

Q17. Which library in Python is specifically designed for data manipulation and analysis, often used in AI preprocessing?

A. NumPy
B. Pandas
C. PyTorch
D. Keras

Show Answer

Answer: B
Pandas offers data structures and operations for manipulating numerical tables and time series, crucial for data cleaning.

Q18. What is the primary disadvantage of using a very large batch size during training?

A. It makes the model overfit immediately.
B. It requires less memory.
C. It can hurt generalization, with the model settling in sharp minima.
D. It increases the number of epochs required.

Show Answer

Answer: C
Large batch sizes can reduce the stochastic nature of gradient descent, potentially causing the model to converge to sharp, less generalizable minima.

Q19. In the K-Means clustering algorithm, what does the letter “K” represent?

A. The number of iterations.
B. The number of clusters.
C. The distance metric.
D. The number of features.

Show Answer

Answer: B
K is a hyperparameter that specifies the number of centroids (clusters) the algorithm should find in the data.

Q20. Which type of machine learning problem involves predicting a category label?

A. Regression
B. Clustering
C. Classification
D. Dimensionality Reduction

Show Answer

Answer: C
Classification is the task of predicting a discrete class label, such as “Spam” or “Not Spam”.

Q21. What is the “Vanishing Gradient Problem” typically associated with in deep learning?

A. Gradients becoming too large, causing weights to explode.
B. Gradients becoming too small, preventing earlier layers from learning.
C. The loss function becoming zero.
D. The model training too quickly.

Show Answer

Answer: B
In deep networks, gradients can shrink exponentially as they backpropagate, leaving early layers with near-zero updates, halting learning.

Q22. Which deep learning architecture is best suited for processing sequential data like time series or text?

A. Convolutional Neural Network (CNN)
B. Recurrent Neural Network (RNN)
C. Random Forest
D. Logistic Regression

Show Answer

Answer: B
RNNs have loops allowing information to persist, making them effective for tasks where the order of data points matters.

Q23. What is the technique called where a pre-trained model is used as the starting point for a new task?

A. Regularization
B. Transfer Learning
C. Ensemble Learning
D. Feature Extraction

Show Answer

Answer: B
Transfer learning leverages knowledge learned from a large dataset to solve a similar problem with a smaller dataset.

Q24. Which metric represents the proportion of actual positives that were correctly identified by the model?

A. Precision
B. Recall (Sensitivity)
C. Accuracy
D. Specificity

Show Answer

Answer: B
Recall calculates how many of the actual positive cases the model was able to predict correctly.

Q25. In a decision tree, what is the name of the measure used to select the best split at a node?

A. Learning Rate
B. Information Gain / Gini Impurity
C. Root Mean Square Error
D. Correlation Coefficient

Show Answer

Answer: B
These metrics quantify the purity of the split, helping the algorithm decide which feature provides the most useful information.

Q26. What does the term “Bias” represent in the context of the Bias-Variance tradeoff?

A. Error introduced by approximating a real-world problem with a simplified model.
B. Error caused by sensitivity to small fluctuations in the training set.
C. The difference between predicted and actual values.
D. The noise in the data.

Show Answer

Answer: A
High bias usually leads to underfitting, where the model is too simple to capture the underlying pattern of the data.

Q27. Which layer in a Convolutional Neural Network (CNN) is responsible for extracting features like edges and textures?

A. Fully Connected Layer
B. Pooling Layer
C. Convolutional Layer
D. Dropout Layer

Show Answer

Answer: C
Convolutional layers apply filters (kernels) to the input image to create feature maps that highlight distinct features.

Q28. What is the primary function of the “Pooling Layer” in a CNN?

A. To increase the dimensionality of the image.
B. To reduce the spatial dimensions (downsampling).
C. To add non-linearity.
D. To connect all neurons.

Show Answer

Answer: B
Pooling layers reduce the number of parameters and computation in the network, helping to control overfitting.

Q29. Which method is commonly used to handle missing values in a dataset during preprocessing?

A. Backpropagation
B. Imputation
C. Normalization
D. One-Hot Encoding

Show Answer

Answer: B
Imputation involves replacing missing values with substituted values, such as the mean, median, or mode of the column.
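
A minimal sketch of mean and median imputation, assuming missing values are represented as `None` in a plain list (the `impute` helper and the sample values are hypothetical):

```python
from statistics import mean, median

def impute(values, strategy=mean):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]     # made-up column with gaps
print(impute(ages))                  # [20, 30, 30, 40, 30]
print(impute([1, None, 100, 3], median))  # [1, 3, 100, 3]
```

The median is often preferred when a column has outliers, as the second call suggests.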

Q30. What is the process of converting categorical variables into a form that could be provided to ML algorithms called?

A. Normalization
B. Standardization
C. One-Hot Encoding
D. Imputation

Show Answer

Answer: C
One-Hot Encoding creates binary columns for each category, preventing the algorithm from assuming an ordinal relationship.
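
The idea can be sketched without any library. The `one_hot` helper below is illustrative only; it assigns one column per category in alphabetical order.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows

categories, encoded = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```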

Q31. Which algorithm is based on Bayes’ Theorem and assumes independence between predictors?

A. Naive Bayes
B. Support Vector Machine
C. Decision Tree
D. Random Forest

Show Answer

Answer: A
Naive Bayes is a probabilistic classifier that assumes the presence of a particular feature in a class is unrelated to the presence of any other feature.

Q32. In Support Vector Machine (SVM), what is the name of the boundary that separates the data points of different classes?

A. Decision Boundary / Hyperplane
B. Centroid
C. Root Node
D. Weight Vector

Show Answer

Answer: A
The hyperplane is the line (in 2D) or plane (in 3D) that maximizes the margin between different classes.

Q33. What is “Feature Scaling” in the context of data preprocessing?

A. Selecting only the most important features.
B. Transforming features to a similar scale.
C. Removing duplicate features.
D. Adding new features.

Show Answer

Answer: B
Feature scaling (like Min-Max or Standardization) ensures that one feature does not dominate others due to larger magnitude.

Q34. Which popular deep learning framework was developed by the Facebook AI Research lab?

A. TensorFlow
B. Keras
C. PyTorch
D. Theano

Show Answer

Answer: C
PyTorch is an open-source machine learning library developed by Facebook’s AI Research lab (now part of Meta AI), known for its flexibility and dynamic computation graphs.

Q35. What is the term for the error between the predicted value and the actual value in regression analysis?

A. Residual
B. Variance
C. Gradient
D. Bias

Show Answer

Answer: A
A residual is the difference between the observed value and the predicted value provided by the model.

Q36. Which layer is typically placed at the end of a CNN for performing the final classification task?

A. Convolutional Layer
B. Pooling Layer
C. Fully Connected (Dense) Layer
D. Batch Normalization Layer

Show Answer

Answer: C
Fully connected layers take the flattened feature maps and perform the high-level reasoning to output class probabilities.

Q37. What technique is used to prevent a neural network from overfitting by randomly disabling neurons during training?

A. Batch Normalization
B. Dropout
C. Gradient Clipping
D. Early Stopping

Show Answer

Answer: B
Dropout randomly sets a fraction of input units to 0 during training, forcing the network to learn more robust features.

Q38. In the context of NLP, what is the simplest way to convert text into numerical vectors?

A. Word2Vec
B. Bag of Words (BoW)
C. LSTM
D. Transformer

Show Answer

Answer: B
The Bag of Words model represents text by the frequency of words, disregarding grammar and word order.
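
A minimal Bag of Words sketch, assuming lowercase whitespace tokenization (real pipelines usually tokenize more carefully) and two made-up sentences:

```python
from collections import Counter

def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Word order is discarded; only per-word counts remain.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```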

Q39. Which loss function is most appropriate for a multi-class classification problem?

A. Mean Squared Error
B. Binary Cross-Entropy
C. Categorical Cross-Entropy
D. Hinge Loss

Show Answer

Answer: C
Categorical Cross-Entropy compares the predicted probability distribution with the actual distribution for multiple classes.

Q40. What is the role of the “Learning Rate” in gradient descent?

A. To determine the size of the step taken towards the minimum.
B. To determine the direction of the gradient.
C. To initialize the weights.
D. To calculate the bias.

Show Answer

Answer: A
The learning rate controls how much the model weights are updated in response to the estimated error during training.
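
To see the effect of the step size, the sketch below runs plain gradient descent on a toy function, f(x) = (x - 3)^2, chosen only for illustration. The three learning rates are arbitrary picks that show slow, healthy, and divergent behavior.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)
        x -= learning_rate * grad  # step size is set by the learning rate
    return x

print(gradient_descent(0.001))  # too small: still far from the minimum at 3
print(gradient_descent(0.1))    # reasonable: very close to 3
print(gradient_descent(1.1))    # too large: overshoots and diverges
```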

Q41. Which ensemble method combines multiple models (typically decision trees) trained on random subsets of the data?

A. Boosting
B. Bagging (Random Forest)
C. Stacking
D. Blending

Show Answer

Answer: B
Bagging (Bootstrap Aggregating) trains models in parallel on different samples and aggregates their predictions, as seen in Random Forests.

Q42. What is the primary difference between Bagging and Boosting?

A. Bagging trains models sequentially, while Boosting trains them in parallel.
B. Bagging trains models in parallel, while Boosting trains them sequentially.
C. Bagging is for regression only, while Boosting is for classification only.
D. There is no difference.

Show Answer

Answer: B
Bagging builds independent models, whereas Boosting builds models sequentially, where each new model corrects the errors of the previous one.

Q43. What is the technique of providing input data to a model and checking its output against known results called?

A. Training
B. Testing / Evaluation
C. Validation
D. Inference

Show Answer

Answer: B
Testing/Evaluation uses a separate dataset to measure the performance of the trained model on unseen data.

Q44. Which algorithm is effective for anomaly detection and recommendation systems?

A. Linear Regression
B. K-Means Clustering
C. Logistic Regression
D. SVM

Show Answer

Answer: B
K-Means can identify anomalies as points that are far from any cluster centroid or form their own small clusters.

Q45. What does the “R-Squared” (R2) score indicate in regression?

A. The average error magnitude.
B. The proportion of the variance in the dependent variable that is predictable from the independent variables.
C. The square root of the mean error.
D. The correlation between predicted and actual values.

Show Answer

Answer: B
R-Squared represents the goodness of fit of the model. It typically ranges from 0 to 1, where 1 indicates a perfect fit, but it can be negative when the model fits worse than simply predicting the mean.
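
The usual formula is R2 = 1 - SS_res / SS_tot. Here is a small sketch on made-up values; the `r_squared` helper is illustrative and not a library function. In this toy run, predicting the mean scores 0 and predictions worse than the mean fall below 0.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical targets

print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0, perfect fit
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0, same as predicting the mean
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0, worse than the mean
```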

Q46. In NLP, which technique reduces words to their base or root form?

A. Tokenization
B. Lemmatization / Stemming
C. Parsing
D. One-Hot Encoding

Show Answer

Answer: B
Stemming and Lemmatization normalize text by reducing inflected words to their word stem or lemma.

Q47. What is a “Generative Adversarial Network” (GAN) composed of?

A. An Encoder and a Decoder.
B. A Generator and a Discriminator.
C. A Convolutional and a Pooling layer.
D. A Regressor and a Classifier.

Show Answer

Answer: B
GANs consist of two neural networks contesting with each other: a generator creates data, and a discriminator evaluates it.

Q48. What is the purpose of a validation set during model training?

A. To train the model weights.
B. To evaluate the model’s final performance.
C. To tune hyperparameters and prevent overfitting during training.
D. To store the data.

Show Answer

Answer: C
The validation set provides an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.

Q49. Which metric is defined as the ratio of True Positives to the total predicted Positives?

A. Recall
B. Precision
C. Accuracy
D. F1-Score

Show Answer

Answer: B
Precision answers the question: “Of all the instances predicted as positive, how many were actually positive?”

Q50. What is the name of the process where raw text is broken down into smaller units (words or sentences)?

A. Filtering
B. Tokenization
C. Normalization
D. Vectorization

Show Answer

Answer: B
Tokenization is the first step in NLP preprocessing, breaking a stream of text into words, phrases, symbols, or other meaningful elements.

Q51. Which algorithm works by calculating the probability of a data point belonging to a class based on the distance to its neighbors?

A. K-Nearest Neighbors (KNN)
B. Logistic Regression
C. Naive Bayes
D. SVM

Show Answer

Answer: A
KNN classifies a data point based on how its neighbors are classified, assuming similar things exist in close proximity.

Q52. What is the main assumption made by the Naive Bayes classifier?

A. Features are highly correlated.
B. Features are independent of each other.
C. Data is normally distributed.
D. Data is linearly separable.

Show Answer

Answer: B
The “naive” assumption is that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Q53. Which technique helps in handling the “exploding gradient” problem in RNNs?

A. Gradient Clipping
B. Dropout
C. Batch Normalization
D. ReLU Activation

Show Answer

Answer: A
Gradient clipping sets a threshold to cap the maximum value of gradients, preventing them from growing exponentially.
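
One common variant is clipping by norm, where the whole gradient vector is rescaled if its length exceeds a threshold. The sketch below assumes that variant and a made-up gradient; clipping each value individually is another option.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5
print(clipped)
```

The direction of the gradient is preserved; only its magnitude is capped.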

Q54. What is “Data Augmentation” primarily used for in computer vision?

A. To clean the data.
B. To increase the diversity of the training set by applying transformations.
C. To reduce the number of images.
D. To label the images.

Show Answer

Answer: B
Data augmentation increases dataset size by rotating, flipping, or cropping existing images, helping models generalize better.

Q55. Which algorithm is specifically designed to solve the vanishing gradient problem in traditional RNNs?

A. LSTM (Long Short-Term Memory)
B. Perceptron
C. CNN
D. K-Means

Show Answer

Answer: A
LSTMs introduce gating mechanisms that allow the network to decide what to keep and what to forget, preserving gradients over longer sequences.

Q56. What is the purpose of the “Kernel Trick” in SVM?

A. To speed up training.
B. To map non-linear data into a higher-dimensional space where it is linearly separable.
C. To reduce the number of support vectors.
D. To handle missing values.

Show Answer

Answer: B
The Kernel Trick allows SVMs to create non-linear decision boundaries by implicitly mapping inputs to high-dimensional feature spaces.

Q57. Which Python visualization library is commonly used to plot heatmaps and statistical data?

A. Matplotlib
B. Seaborn
C. Plotly
D. Bokeh

Show Answer

Answer: B
Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics.

Q58. What does the term “Backpropagation” refer to?

A. The process of feeding input forward through the network.
B. The algorithm used to calculate the gradient of the loss function with respect to the weights.
C. The activation function logic.
D. The method of initializing weights.

Show Answer

Answer: B
Backpropagation moves backward from the output layer to the input layer, calculating gradients to update the weights via gradient descent.

Q59. Which type of scaling transforms features to have a mean of 0 and a standard deviation of 1?

A. Min-Max Scaling
B. Standardization (Z-score normalization)
C. Normalization
D. Robust Scaling

Show Answer

Answer: B
Standardization rescales data so that the distribution has a mean of 0 and a standard deviation of 1.
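
In formula form, each value becomes z = (x - mean) / std. A minimal sketch using the standard library, assuming the population standard deviation and a made-up list of values:

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```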

Q60. In Reinforcement Learning, what is the “Reward” signal?

A. The input data given to the agent.
B. The feedback signal indicating how well the agent is performing.
C. The action taken by the agent.
D. The environment state.

Show Answer

Answer: B
The reward is a scalar feedback signal that the agent tries to maximize over time through its actions.

Q61. What is “Underfitting” in machine learning?

A. A model that performs too well on training data but poorly on test data.
B. A model that is too simple to capture the underlying structure of the data.
C. A model that has too many parameters.
D. A model that takes too long to train.

Show Answer

Answer: B
Underfitting occurs when a model is too simple (e.g., linear model for complex data) and fails to learn the pattern.

Q62. Which regularization technique adds the absolute value of weights to the loss function?

A. L1 Regularization (Lasso)
B. L2 Regularization (Ridge)
C. Dropout
D. Elastic Net

Show Answer

Answer: A
L1 regularization adds the magnitude of coefficients, often resulting in sparse models where some weights become zero.

Q63. Which regularization technique adds the squared value of weights to the loss function?

A. L1 Regularization
B. L2 Regularization (Ridge)
C. Lasso
D. Gradient Clipping

Show Answer

Answer: B
L2 regularization adds the squared magnitude of coefficients, preventing weights from becoming too large.

Q64. What is a “Tensor” in the context of deep learning frameworks?

A. A scalar value.
B. A mathematical operation.
C. A multi-dimensional array (generalization of matrices).
D. A type of activation function.

Show Answer

Answer: C
Tensors are the primary data structure in frameworks like TensorFlow and PyTorch, representing scalars, vectors, matrices, and n-dimensional arrays.

Q65. Which algorithm is widely used for market basket analysis?

A. K-Means
B. Apriori Algorithm
C. Linear Regression
D. SVM

Show Answer

Answer: B
The Apriori algorithm is used for association rule learning to find frequent itemsets in transactional databases.

Q66. What does the acronym “CNN” stand for?

A. Central Neural Network
B. Convolutional Neural Network
C. Connected Neural Node
D. Computational Neural Network

Show Answer

Answer: B
CNNs are a class of deep neural networks most commonly applied to analyzing visual imagery.

Q67. Which algorithm builds multiple decision trees and merges them to get a more accurate and stable prediction?

A. Decision Tree
B. Random Forest
C. Logistic Regression
D. KNN

Show Answer

Answer: B
Random Forest is an ensemble method that operates by constructing a multitude of decision trees at training time.

Q68. What is the purpose of the ROC curve?

A. To visualize the training loss.
B. To show the tradeoff between True Positive Rate and False Positive Rate.
C. To plot the accuracy vs epochs.
D. To visualize the data distribution.

Show Answer

Answer: B
The Receiver Operating Characteristic (ROC) curve illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied.

Q69. What does AUC stand for in classification metrics?

A. Area Under the Curve
B. Accuracy Under Classification
C. Average Unit Cost
D. Augmented Unit Count

Show Answer

Answer: A
AUC measures the entire two-dimensional area underneath the ROC curve, providing an aggregate measure of performance across all thresholds.

Q70. Which loss function is typically used for binary classification problems?

A. Mean Squared Error
B. Binary Cross-Entropy
C. Hinge Loss
D. Categorical Cross-Entropy

Show Answer

Answer: B
Binary Cross-Entropy measures the performance of a classification model whose output is a probability value between 0 and 1.
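
Here is a short sketch of the loss averaged over a batch, assuming labels of 0 or 1 and predicted probabilities. The `eps` clamp is an illustrative safeguard against log(0), and the sample predictions are invented.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))
```

Confident wrong predictions are penalized far more heavily than confident correct ones.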

Q71. What is “Early Stopping” used for?

A. To stop the computer from sleeping.
B. To halt training when validation loss stops improving to prevent overfitting.
C. To stop the data loading process.
D. To reduce learning rate.

Show Answer

Answer: B
Early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method.

Q72. Which technique is used to visualize high-dimensional data in 2D or 3D?

A. PCA
B. t-SNE
C. LDA
D. Both A and B

Show Answer

Answer: D
Both PCA and t-SNE are dimensionality reduction techniques used to visualize complex datasets in lower dimensions.

Q73. What is the “Curse of Dimensionality”?

A. The difficulty of processing images.
B. The phenomenon where data becomes sparse as the number of features increases.
C. The inability to train deep networks.
D. The slow speed of gradient descent.

Show Answer

Answer: B
As dimensions increase, the volume of the space increases so fast that the available data becomes sparse, making analysis difficult.

Q74. Which optimizer uses momentum to accelerate gradient descent?

A. SGD with Momentum
B. AdaGrad
C. RMSProp
D. Adam

Show Answer

Answer: A
Momentum helps accelerate gradients in the right direction, dampening oscillations.

Q75. In the context of NLP, what is TF-IDF?

A. A type of neural network.
B. A numerical statistic that reflects how important a word is to a document in a collection.
C. A tokenization method.
D. A stemming algorithm.

Show Answer

Answer: B
Term Frequency-Inverse Document Frequency weighs words by their frequency in a document versus their frequency across all documents.
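
A minimal sketch of the classic, unsmoothed form: tf = count / document length and idf = log(N / document frequency). Libraries often use smoothed variants, so treat the exact numbers as illustrative; the three short documents are made up.

```python
import math

def tf_idf(docs):
    """Return one {word: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = {}
    for tokens in tokenized:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    scores = []
    for tokens in tokenized:
        doc_scores = {}
        for word in set(tokens):
            tf = tokens.count(word) / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            doc_scores[word] = tf * idf
        scores.append(doc_scores)
    return scores

scores = tf_idf(["the cat sat", "the dog barked", "the cat purred"])
print(scores[0])
```

A word that appears in every document ("the") scores 0, while a word unique to one document ("sat") scores highest there.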

Q76. Which method is used to save a trained machine learning model to disk?

A. Serialization (e.g., Pickle or Joblib)
B. Fitting
C. Transforming
D. Parsing

Show Answer

Answer: A
Serialization converts the model object into a byte stream to be stored or transmitted.
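
As a small sketch, the standard `pickle` module can round-trip a Python object through bytes. The dictionary of "learned parameters" below is a hypothetical stand-in for a trained model; `pickle.dump` and `pickle.load` do the same thing with an open file.

```python
import pickle

# Hypothetical stand-in for a trained model's learned parameters.
model_params = {"weights": [0.4, -1.2, 3.0], "bias": 0.5}

blob = pickle.dumps(model_params)  # object -> byte stream
restored = pickle.loads(blob)      # byte stream -> object

print(type(blob).__name__, restored == model_params)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.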

Q77. What is the “Input Layer” in a neural network?

A. The layer that makes the final prediction.
B. The first layer that receives the raw input data.
C. The layer with the most weights.
D. The layer that applies activation functions.

Show Answer

Answer: B
The input layer is the entry point of the network, containing one neuron per feature in the dataset.

Q78. What is the name of the process of adjusting weights during training?

A. Forward Propagation
B. Optimization / Weight Update
C. Activation
D. Pooling

Show Answer

Answer: B
Optimization algorithms (like Gradient Descent) adjust the weights to minimize the loss function.

Q79. Which library provides a high-level API for building and training deep learning models?

A. NumPy
B. Keras
C. Pandas
D. OpenCV

Show Answer

Answer: B
Keras acts as an interface for libraries like TensorFlow, allowing for fast experimentation with deep neural networks.

Q80. What is “Batch Normalization”?

A. Normalizing the entire dataset before training.
B. Normalizing the inputs of each layer to stabilize and accelerate training.
C. A regularization technique that drops neurons.
D. A method to increase batch size.

Show Answer

Answer: B
Batch Norm normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

Q81. Which distance metric is most commonly used in KNN for continuous variables?

A. Manhattan Distance
B. Euclidean Distance
C. Cosine Similarity
D. Hamming Distance

Show Answer

Answer: B
Euclidean distance is the straight-line distance between two points in Euclidean space, standard for continuous data.

Q82. What is “Transfer Learning”?

A. Moving data from one database to another.
B. Reusing a pre-trained model on a new problem.
C. Transferring weights from one layer to another randomly.
D. Learning without data.

Show Answer

Answer: B
Transfer learning stores knowledge gained while solving one problem and applies it to a different but related problem.

Q83. What is the function of “Word Embeddings” like Word2Vec?

A. To convert words into sparse vectors.
B. To map words to dense vectors of real numbers where similar words have similar encodings.
C. To count word frequencies.
D. To correct spelling mistakes.

Show Answer

Answer: B
Word embeddings capture semantic meaning, placing similar words close together in the vector space.

Q84. Which technique allows a model to learn from a stream of data without storing the entire dataset?

A. Batch Learning
B. Online Learning
C. Transfer Learning
D. Ensemble Learning

Show Answer

Answer: B
Online learning incrementally updates the model as new data arrives, suitable for dynamic environments.

Q85. What is the role of a “Loss Function”?

A. To measure the accuracy of the model.
B. To quantify how far the predicted value is from the actual value.
C. To activate the neurons.
D. To initialize the network.

Show Answer

Answer: B
The loss function acts as a guide for the optimizer, telling it how wrong the model’s predictions are.

Q86. What does the “Flatten” layer do in a CNN?

A. It shrinks the image dimensions.
B. It converts the multi-dimensional feature maps into a single 1D vector.
C. It removes noise.
D. It adds depth to the image.

Show Answer

Answer: B
Flattening is necessary to transition from convolutional/pooling layers to the fully connected layers for classification.

Q87. Which activation function is known for being computationally efficient and mitigating the vanishing gradient problem?

A. Sigmoid
B. Tanh
C. ReLU
D. Softmax

Show Answer

Answer: C
ReLU (Rectified Linear Unit) outputs the input directly if positive, otherwise, it outputs zero, allowing faster training.
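
A tiny sketch comparing gradients helps explain the "mitigating vanishing gradients" part. The helper names are illustrative; the sigmoid derivative is included only for contrast.

```python
import math

def relu(x):
    """Pass positive inputs through unchanged; output zero otherwise."""
    return x if x > 0 else 0.0

def relu_grad(x):
    # The gradient is 1 for positive inputs, so it does not shrink.
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    # Sigmoid's gradient s * (1 - s) becomes tiny for large |x|.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

print(relu(3.0), relu(-2.0))
print(relu_grad(10.0), sigmoid_grad(10.0))
```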

Q88. What is “Cross-Validation”?

A. Training the model only once.
B. A technique to assess how the results of a statistical analysis will generalize to an independent dataset.
C. Validating the code syntax.
D. Comparing two different programming languages.

Answer: B
Common methods include K-Fold CV, where the dataset is split into K folds and the model is trained K times, each time holding out a different fold for validation and training on the remaining K-1 folds.
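
The splitting step of K-Fold can be sketched in a few lines. The helper name `k_fold_indices` is made up for this example, and it skips shuffling, which real implementations usually offer.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print("validate on", validation, "train on", train)
```

Every sample lands in the validation set exactly once, so each of the K scores is computed on data the model did not see in that round.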

Q89. What is the name of the standard dataset often used to test image classification algorithms?

A. MNIST
B. IMDB
C. Titanic
D. Boston Housing

Answer: A
MNIST contains handwritten digits (0-9) and is the “Hello World” of computer vision and deep learning.

Q90. What does a “Dense” layer imply in a neural network?

A. The layer has very few connections.
B. The layer is fully connected, meaning all neurons receive input from all neurons in the previous layer.
C. The layer has no weights.
D. The layer performs convolution.

Answer: B
Dense layers are standard fully connected layers where each input node is connected to each output node.
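
As a sketch, the forward pass of a dense layer is just a weighted sum over all inputs plus a bias for each output neuron. The weights and inputs below are arbitrary illustrative values, and no activation function is applied.

```python
def dense_forward(inputs, weights, biases):
    """weights[j][i] connects input i to output neuron j."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]

inputs = [1.0, 2.0, 3.0]                 # 3 input nodes
weights = [[0.5, 0.0, -1.0],             # output neuron 0 sees every input
           [1.0, 1.0, 1.0]]              # output neuron 1 sees every input
biases = [0.0, 0.5]

outputs = dense_forward(inputs, weights, biases)
print(outputs)  # [-2.5, 6.5]
```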

Q91. Which unsupervised algorithm detects anomalies by isolating them with random partitions instead of modeling the distribution of normal data?

A. Isolation Forest
B. K-Means
C. PCA
D. Random Forest

Answer: A
Isolation Forest explicitly isolates anomalies, which tend to end up closer to the root of its randomly built trees, rather than profiling normal data points.

Q92. What is the main disadvantage of the Sigmoid activation function in deep networks?

A. It is computationally expensive.
B. It outputs negative values.
C. It can cause vanishing gradients and outputs are not zero-centered.
D. It is linear.

Answer: C
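Sigmoid squashes inputs into the range (0, 1). For large positive or negative inputs it saturates, so gradients become very small during backpropagation, and its outputs are always positive rather than zero-centered.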
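
A quick numeric sketch of why this hurts deep networks: the Sigmoid gradient never exceeds 0.25, so multiplying it across many layers shrinks the signal fast. The 10-layer figure below is a best-case illustration that ignores the weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

peak = sigmoid_grad(0.0)        # largest possible gradient
saturated = sigmoid_grad(10.0)  # gradient for a large input
ten_layers = peak ** 10         # best-case product across 10 sigmoid layers

print(peak)  # 0.25
print(saturated < 1e-4, ten_layers < 1e-6)  # True True
```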

Q93. In Reinforcement Learning, what is the strategy called where the agent chooses the action believed to yield the highest reward?

A. Exploration
B. Exploitation
C. Pruning
D. Backtracking

Answer: B
Exploitation uses current knowledge to maximize immediate reward, whereas exploration seeks new information.
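
One simple way to balance the two is an epsilon-greedy rule, sketched below. The reward estimates and the `choose_action` helper are hypothetical and exist only for this example.

```python
import random

def choose_action(estimated_rewards, epsilon, rng):
    """With probability epsilon explore at random; otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_rewards))  # exploration
    return max(range(len(estimated_rewards)), key=lambda a: estimated_rewards[a])  # exploitation

estimates = [0.2, 0.8, 0.5]   # assumed reward estimates for three actions
rng = random.Random(0)

greedy_action = choose_action(estimates, epsilon=0.0, rng=rng)
print(greedy_action)  # 1, the action with the highest estimated reward
```

With `epsilon=0.0` the agent exploits only. Raising epsilon makes it try other actions more often.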

Q94. Which concept is used to handle categorical variables with no inherent order (like Red, Green, Blue)?

A. Label Encoding
B. Ordinal Encoding
C. One-Hot Encoding
D. Frequency Encoding

Answer: C
One-Hot Encoding prevents the model from assuming a numerical order exists between the categories.
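
A minimal sketch of One-Hot Encoding in plain Python. Ordering the columns alphabetically is an arbitrary choice made for this example.

```python
def one_hot_encode(values):
    """Return the category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1   # exactly one column is "hot" per row
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot_encode(["Red", "Green", "Blue", "Green"])
print(categories)  # ['Blue', 'Green', 'Red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```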

Q95. What is the function of “Padding” in a CNN?

A. To reduce the image size.
B. To add extra pixels around the border of the input image.
C. To increase the number of channels.
D. To remove noise.

Answer: B
Padding (e.g., ‘same’ padding) preserves the spatial dimension of the volume after convolution, ensuring edge pixels are not ignored.
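
The effect on size follows the standard formula output = floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding, and s the stride. Here is a quick sketch for a 28-pixel-wide input and a 3x3 kernel (example values):

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output width (or height) of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

no_padding = conv_output_size(28, 3)               # image shrinks
same_padding = conv_output_size(28, 3, padding=1)  # size is preserved

print(no_padding, same_padding)  # 26 28
```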

Q96. What is the primary difference between Linear Regression and Logistic Regression?

A. Linear Regression is for classification, Logistic is for regression.
B. Linear Regression predicts continuous values, Logistic Regression predicts probabilities for classification.
C. Linear Regression uses deep learning, Logistic uses shallow learning.
D. There is no difference.

Answer: B
Linear Regression fits a straight line to data, while Logistic Regression fits an “S” shaped curve (sigmoid) to predict the probability of a categorical outcome.
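
The difference shows up in the prediction step. In the sketch below, both models compute the same linear score, but the logistic one passes it through a sigmoid so the result lands between 0 and 1. The weight and bias are arbitrary values, not fitted parameters.

```python
import math

def linear_predict(x, weight, bias):
    """Unbounded continuous output."""
    return weight * x + bias

def logistic_predict(x, weight, bias):
    """Same linear score, squashed into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

print(linear_predict(10.0, weight=2.0, bias=1.0))   # 21.0
print(logistic_predict(0.0, weight=2.0, bias=0.0))  # 0.5
```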

Q97. What is “Gradient Descent”?

A. An optimization algorithm to minimize the cost function.
B. A way to visualize data.
C. A type of neural network layer.
D. A method to clean data.

Answer: A
It iteratively adjusts parameters in the opposite direction of the gradient to find the minimum of the cost function.
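
A minimal sketch on a one-parameter cost function, f(x) = (x - 3)^2, whose minimum is at x = 3. The function, starting point, and learning rate are assumptions chosen to keep the example small.

```python
def gradient(x):
    """Derivative of the cost f(x) = (x - 3) ** 2."""
    return 2.0 * (x - 3.0)

x = 0.0              # starting guess
learning_rate = 0.1

for _ in range(100):
    # Step in the opposite direction of the gradient.
    x -= learning_rate * gradient(x)

print(round(x, 4))  # 3.0
```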

Q98. Which deep learning model architecture utilizes “Self-Attention” mechanisms?

A. CNN
B. RNN
C. Transformer
D. Autoencoder

Answer: C
Transformers use self-attention to process sequential data in parallel, forming the basis of models like BERT and GPT.
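
Below is a stripped-down sketch of scaled dot-product self-attention in plain Python. It assumes the queries, keys, and values are the token vectors themselves. Real Transformers apply learned projection matrices first and use multiple heads, which this example leaves out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Each token's output is a weighted average of all tokens (Q = K = V = tokens)."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(d)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token vectors
attended = self_attention(tokens)
print(len(attended), len(attended[0]))  # 3 2
```

Each output row depends on every input token at once, which is why the computation can run in parallel across positions instead of step by step as in an RNN.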

Q99. What is the purpose of a “Bias” term in a neural network neuron?

A. To weigh the input.
B. To shift the activation function to the left or right.
C. To normalize the output.
D. To reduce the learning rate.

Answer: B
The bias shifts the activation function, letting the model fit data that doesn’t pass through the origin.
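
To see the shift, take a single sigmoid neuron. Its output crosses 0.5 where weight * x + bias equals zero, so changing the bias moves that crossing point along the x-axis. The values below are illustrative.

```python
import math

def neuron(x, weight, bias):
    """Single sigmoid neuron: activation of a weighted input plus bias."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

without_bias = neuron(0.0, weight=1.0, bias=0.0)   # midpoint sits at x = 0
with_bias = neuron(2.0, weight=1.0, bias=-2.0)     # midpoint moved to x = 2

print(without_bias, with_bias)  # 0.5 0.5
```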

Q100. Which popular AI platform provides pre-trained models and APIs for vision, language, and speech tasks?

A. Scikit-learn
B. Google Cloud AI / Azure AI / AWS AI
C. Matplotlib
D. Pandas

Answer: B
Major cloud providers offer AI-as-a-Service, allowing developers to integrate advanced AI capabilities without building models from scratch.

Conclusion

I hope this 1 hour of reading boosts your confidence and helps you stand out in today’s AI-driven hiring.

Best of luck!

Aditya Gupta