
Hypothesis in Machine Learning

The concept of a hypothesis is fundamental in Machine Learning and data science endeavours. In the realm of machine learning, a hypothesis serves as an initial assumption made by data scientists and ML professionals when attempting to address a problem. Machine learning involves conducting experiments based on past experiences, and these hypotheses are crucial in formulating potential solutions.

It’s important to note that in machine learning discussions, the terms “hypothesis” and “model” are sometimes used interchangeably. However, a hypothesis represents an assumption, while a model is a mathematical representation employed to test that hypothesis. This section on “Hypothesis in Machine Learning” explores key aspects related to hypotheses in machine learning and their significance.

Table of Content

How does a Hypothesis work?
Hypothesis Space and Representation in Machine Learning
Hypothesis in Statistics
FAQs on Hypothesis in Machine Learning

How does a Hypothesis work?

A hypothesis in machine learning is the model’s presumption regarding the connection between the input features and the result. It is a candidate for the mapping function that the algorithm is attempting to discover using the training set. To minimize the discrepancy between the predicted and actual outputs, the learning process modifies the weights that parameterize the hypothesis. A cost function is used to assess the hypothesis’s accuracy, and the objective is to optimize the model’s parameters to achieve the best predictive performance on new, unseen data.
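As a rough illustration of how training adjusts the weights of a hypothesis to reduce a cost function, here is a minimal gradient-descent sketch. The toy data, the linear form h(x) = theta0 + theta1 * x, the learning rate, and the number of steps are all assumptions chosen for the example, not values from the text.

```python
# Toy data generated from y = 1 + 2x (hypothetical, noise-free for clarity).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]

def cost(theta0, theta1):
    """Mean squared error of the hypothesis h(x) = theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

theta0, theta1 = 0.0, 0.0   # initial weights
learning_rate = 0.05        # assumed step size
for _ in range(2000):
    # Gradients of the mean squared error with respect to each weight.
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = 2 * sum(errors) / len(xs)
    grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(f"theta0={theta0:.3f}, theta1={theta1:.3f}, cost={cost(theta0, theta1):.6f}")
```

Each pass moves the weights a small step against the gradient of the cost, so the hypothesis gradually approaches the line that generated the data.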

In most supervised machine learning algorithms, our main goal is to find a possible hypothesis from the hypothesis space that could map out the inputs to the proper outputs. The following figure shows the common method to find out the possible hypothesis from the Hypothesis space:

Hypothesis Space (H)

The hypothesis space is the set of all possible legal hypotheses. This is the set from which the machine learning algorithm selects the single hypothesis that best describes the target function or the outputs.

Hypothesis (h)

A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis that an algorithm comes up with depends on the data and also on the restrictions and bias that we have imposed on the data.

For a simple linear model, the hypothesis can be written as:

[Tex]y = mx + b[/Tex]

m = slope of the line
b = intercept
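To make the relationship between a hypothesis and its hypothesis space concrete, the sketch below treats every (m, b) pair on a small grid as one hypothesis and picks the one with the lowest squared error. The data points and the grid values are made up for illustration; a real algorithm searches a continuous space rather than a handful of candidates.

```python
# Hypothetical toy data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def make_hypothesis(m, b):
    """Return the hypothesis h(x) = m*x + b for fixed m and b."""
    return lambda x: m * x + b

def squared_error(h, xs, ys):
    """Sum of squared differences between predictions and targets."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))

# A small, finite hypothesis space: slopes and intercepts on a coarse grid.
slopes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
intercepts = [0.0, 0.5, 1.0, 1.5, 2.0]
hypothesis_space = [(m, b) for m in slopes for b in intercepts]

# The learner's job: select the hypothesis that fits the data best.
best_m, best_b = min(
    hypothesis_space,
    key=lambda mb: squared_error(make_hypothesis(*mb), xs, ys),
)
print(f"best hypothesis in the grid: y = {best_m}x + {best_b}")
```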

To better understand the hypothesis space and the hypothesis, consider the following coordinate plane, which shows the distribution of some data:

Suppose we have test data for which we have to determine the outputs. The test data is as shown below:

We can predict the outcomes by dividing the coordinate plane as shown below:

So the test data would yield the following result:

But note that we could also have divided the coordinate plane as:

The way in which the coordinate plane is divided depends on the data, the algorithm, and the constraints.

All the legal ways in which we can divide the coordinate plane to predict the outcome of the test data together make up the hypothesis space.
Each individual way is known as a hypothesis.

Hence, in this example the hypothesis space would look like:

The hypothesis space comprises all possible legal hypotheses that a machine learning algorithm can consider. Hypotheses are formulated based on various algorithms and techniques, including linear regression, decision trees, and neural networks. These hypotheses capture the mapping function transforming input data into predictions.

Hypothesis Formulation and Representation in Machine Learning

Hypotheses in machine learning are formulated based on various algorithms and techniques, each with its representation. For example:

Linear Regression: [Tex]h(X) = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n[/Tex]
Decision Trees: [Tex]h(X) = \text{Tree}(X)[/Tex]
Neural Networks: [Tex]h(X) = \text{NN}(X)[/Tex]

In the case of complex models like neural networks, the hypothesis may involve multiple layers of interconnected nodes, each performing a specific computation.

Hypothesis Evaluation:

The process of machine learning involves not only formulating hypotheses but also evaluating their performance. This evaluation is typically done using a loss function or an evaluation metric that quantifies the disparity between predicted outputs and ground truth labels. Common evaluation metrics include mean squared error (MSE), accuracy, precision, recall, F1-score, and others. By comparing the predictions of the hypothesis with the actual outcomes on a validation or test dataset, one can assess the effectiveness of the model.
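The metrics named above can be computed directly from predictions and ground-truth labels. The following is a small sketch with made-up predictions; the function names are illustrative and not taken from any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical regression and classification outputs.
regression_mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(regression_mse, metrics)
```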

Hypothesis Testing and Generalization:

Once a hypothesis is formulated and evaluated, the next step is to test its generalization capabilities. Generalization refers to the ability of a model to make accurate predictions on unseen data. A hypothesis that performs well on the training dataset but fails to generalize to new instances is said to suffer from overfitting. Conversely, a hypothesis that generalizes well to unseen data is deemed robust and reliable.

The process of hypothesis formulation, evaluation, testing, and generalization is often iterative in nature. It involves refining the hypothesis based on insights gained from model performance, feature importance, and domain knowledge. Techniques such as hyperparameter tuning, feature engineering, and model selection play a crucial role in this iterative refinement process.

Hypothesis in Statistics

In statistics, a hypothesis refers to a statement or assumption about a population parameter. It is a proposition or educated guess that helps guide statistical analyses. There are two types of hypotheses: the null hypothesis (H0) and the alternative hypothesis (H1 or Ha).

Null Hypothesis (H0): This hypothesis suggests that there is no significant difference or effect, and any observed results are due to chance. It often represents the status quo or a baseline assumption.
Alternative Hypothesis (H1 or Ha): This hypothesis contradicts the null hypothesis, proposing that there is a significant difference or effect in the population. It is what researchers aim to support with evidence.

FAQs on Hypothesis in Machine Learning

Q. How does the training process use the hypothesis?

The learning algorithm adjusts the parameters of the hypothesis during training to minimise the discrepancy between predicted and actual outputs.

Q. How is the hypothesis’s accuracy assessed?

Usually, a cost function that calculates the difference between predicted and actual values is used to assess accuracy. The aim is to optimise the model so that this cost is reduced.

Q. What is Hypothesis testing?

Hypothesis testing is a statistical method for deciding whether the data provide enough evidence to reject a hypothesis. The hypothesis can be about two variables in a dataset, about an association between two groups, or about a situation.

Q. What distinguishes the null hypothesis from the alternative hypothesis in machine learning experiments?

The null hypothesis (H0) assumes no significant effect, while the alternative hypothesis (H1 or Ha) contradicts H0, suggesting a meaningful impact. Statistical testing is employed to decide between these hypotheses.


What’s a Hypothesis Space?

Last updated: March 18, 2024


1. Introduction

Machine-learning algorithms come with implicit or explicit assumptions about the actual patterns in the data. Mathematically, this means that each algorithm can learn a specific family of models, and that family goes by the name of the hypothesis space.

In this tutorial, we’ll talk about hypothesis spaces and how to choose the right one for the data at hand.

2. Hypothesis Spaces

Let’s say that we have a binary classification task and that the data are two-dimensional. Our goal is to find a model that classifies objects as positive or negative. Applying Logistic Regression, we can get the models of the form:

[Tex]h(x_1, x_2) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2)}} \qquad (1)[/Tex]

which estimate the probability that the object at hand is positive.
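A logistic-regression hypothesis for two-dimensional data can be written out in a few lines. The sketch below assumes the standard logistic form with three parameters, and the parameter values are hypothetical. It shows that points where the linear term is zero get probability 0.5, which is why the decision boundary is a straight line.

```python
import math

def logistic_hypothesis(theta0, theta1, theta2):
    """Return h(x1, x2), the estimated probability that (x1, x2) is positive."""
    def h(x1, x2):
        z = theta0 + theta1 * x1 + theta2 * x2
        return 1.0 / (1.0 + math.exp(-z))
    return h

# One hypothesis from the space; these parameter values are made up.
h = logistic_hypothesis(-1.0, 2.0, 1.0)

# The decision boundary is the line -1 + 2*x1 + x2 = 0.
on_boundary = h(0.5, 0.0)      # lies on the line
positive_side = h(1.0, 1.0)    # above the line
negative_side = h(0.0, 0.0)    # below the line
print(on_boundary, positive_side, negative_side)
```

Choosing different parameter values selects a different straight line, so the whole hypothesis space corresponds to the set of all lines in the plane.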

2.1. Hypotheses and Assumptions

The underlying assumption of hypotheses (1) is that the boundary separating the positive from negative objects is a straight line. So, every hypothesis from this space corresponds to a straight line in a 2D plane. For instance:

2.2. Regression

3. Expressivity of a Hypothesis Space

We could informally say that one hypothesis space is more expressive than another if its hypotheses are more diverse and complex.

We may underfit the data if our algorithm’s hypothesis space isn’t expressive enough. For instance, linear hypotheses aren’t particularly good options if the actual data are extremely non-linear:

So, training an algorithm that has a very expressive space increases the chance of completely capturing the patterns in the data. However, it also increases the risk of overfitting. For instance, a space containing the hypotheses of the form:

would start modelling the noise, which we see from its decision boundary:

Such models would generalize poorly to unseen data.

3.1. Expressivity vs. Interpretability

Additionally, even if a complex hypothesis has a good generalization capability, it may be unusable in practice because it’s too complicated to understand or compute. What’s more, intricate hypotheses offer limited insight into the real-world process that generated the data. For example, a quadratic model:

4. How to Choose the Hypothesis Space?

We need to find the right balance between expressivity and simplicity. Unfortunately, that’s easier said than done. Most of the time, we need to rely on our intuition about the data.

So, we should start by exploring the dataset, using visualizations as much as possible. For instance, we can conclude that a straight line isn’t likely to be an adequate boundary for the above classification data. However, a high-order curve would probably be too complex even though it might split the dataset into two classes without an error.

A second-degree curve might be the compromise we seek, but we aren’t sure. So, we start with the space of quadratic hypotheses:

We get a model whose decision boundary appears to be a good fit even though it misclassifies some objects:

Since we’re satisfied with the model, we can stop here. If that hadn’t been the case, we could have tried a space of cubic models. The idea would be to iteratively try incrementally complex families until finding a model that both performs well and is easy to understand.
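The idea of trying incrementally more complex families can be sketched in code. The example below uses polynomial regression rather than classification to stay short, fits each degree with a hand-written least-squares solver, and stops at the first degree whose validation error falls below a tolerance. The data-generating function, the noise level, and the tolerance are all assumptions made for the illustration.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (Gauss-Jordan)."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(n)]
           + [sum(y * x ** i for x, y in zip(xs, ys))]
           for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]  # c0, c1, ..., c_degree

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def sample(n):
    """Hypothetical data: a quadratic trend plus Gaussian noise."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1.0 + 0.5 * x - 2.0 * x * x + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys

train_x, train_y = sample(40)
val_x, val_y = sample(40)

TOLERANCE = 0.1  # assumed threshold for a "good enough" validation error
chosen_degree = None
for degree in (1, 2, 3):  # linear, quadratic, cubic hypothesis spaces
    coeffs = fit_polynomial(train_x, train_y, degree)
    val_error = mse(coeffs, val_x, val_y)
    print(f"degree {degree}: validation MSE = {val_error:.3f}")
    if val_error <= TOLERANCE:
        chosen_degree = degree
        break
```

Stopping at the first adequate degree favours the simplest family that performs well, which mirrors the balance between expressivity and simplicity discussed in this section.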

5. Conclusion

In this article, we talked about hypothesis spaces in machine learning. An algorithm’s hypothesis space contains all the models it can learn from any dataset.

The algorithms with too expressive spaces can generalize poorly to unseen data and be too complex to understand, whereas those with overly simple hypotheses may underfit the data. So, when applying machine-learning algorithms in practice, we need to find the right balance between expressivity and simplicity.


The hypothesis is a common term in Machine Learning and data science projects. As we know, machine learning is one of the most powerful technologies across the world, which helps us to predict results based on past experiences. Moreover, data scientists and ML professionals conduct experiments that aim to solve a problem. These ML professionals and data scientists make an initial assumption for the solution of the problem.

This assumption in Machine learning is known as Hypothesis. In Machine Learning, at various times, Hypothesis and Model are used interchangeably. However, a Hypothesis is an assumption made by scientists, whereas a model is a mathematical representation that is used to test the hypothesis. In this topic, "Hypothesis in Machine Learning," we will discuss a few important concepts related to a hypothesis in machine learning and their importance. So, let's start with a quick introduction to Hypothesis.

A hypothesis is a guess that is based on some known facts but has not yet been proven. A good hypothesis is testable: evidence can show it to be either true or false.

Example: Let's understand the hypothesis with a common example. Suppose a scientist claims that ultraviolet (UV) light can damage the eyes, and from this we assume that it may also cause blindness.

In this example, the scientist only claims that UV rays are harmful to the eyes, while we assume that they may cause blindness. This may or may not be true. Hence, assumptions of this type are called hypotheses.

The hypothesis is one of the commonly used concepts of statistics in Machine Learning. It is specifically used in Supervised Machine learning, where an ML model learns a function that best maps the input to corresponding outputs with the help of an available dataset.

There are some common methods for finding a possible hypothesis in the hypothesis space, where the hypothesis space is denoted by H and a hypothesis by h. These are defined as follows:

Hypothesis space (H): It is used by supervised machine learning algorithms to determine the best possible hypothesis, that is, the one that best describes the target function or best maps inputs to outputs.

It is often constrained by the framing of the problem, the choice of model, and the choice of model configuration.

Hypothesis (h): It is primarily based on the data as well as the bias and restrictions applied to the data.

Hence, a hypothesis (h) is a single candidate function that maps inputs to outputs and that can be evaluated as well as used to make predictions.

The hypothesis (h) can be formulated in machine learning as follows:

[Tex]y = mx + c[/Tex]

Where,

y: range (the output)

m: slope of the line that divides the test data, that is, the change in y divided by the change in x

x: domain (the input)

c: intercept (constant)

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data as follows:

The hypothesis space (H) is the collection of all legal ways to divide the coordinate plane so that inputs are mapped to outputs.

Each individual way is called a hypothesis (h). Hence, the hypothesis and hypothesis space would be like this:

Similar to a hypothesis in machine learning, a hypothesis in statistics is an assumption about the outcome. However, it is falsifiable, which means it can be rejected in the presence of sufficient evidence.

Unlike in machine learning, we never accept a hypothesis outright in statistics, because any conclusion is based on probability. We can only reject it or fail to reject it. Before starting to work on an experiment, we must be aware of two important types of hypotheses:

Null hypothesis: A null hypothesis is a statistical hypothesis stating that no statistically significant effect exists in the given set of observations. It is used in quantitative analysis, for example to test theories about markets, investment, and finance.

Alternative hypothesis: An alternative hypothesis directly contradicts the null hypothesis, which means that if one of the two hypotheses is true, the other must be false. In other words, an alternative hypothesis states that some significant effect exists in the given set of observations.

The significance level must be set before starting an experiment. It defines the tolerance for error, that is, the level at which an effect can be considered significant. In many experiments a 95% confidence level is used, which corresponds to a significance level of 5% (0.05). The significance level also serves as the critical or threshold value for the p-value. For example, if the confidence level of an experiment is set to 98%, the significance level is 0.02 (2%).

The p-value quantifies the evidence against a null hypothesis. More precisely, the p-value is the probability, assuming the null hypothesis is true, of observing data at least as extreme as the data actually observed.

The smaller the p-value, the stronger the evidence against the null hypothesis. It is expressed in decimal form, such as 0.035.

Whenever a statistical test is carried out, the resulting p-value is compared with the significance level. If the p-value is less than the significance level, the effect is considered significant and the null hypothesis can be rejected. If it is higher, there is no significant effect and we fail to reject the null hypothesis.
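The decision rule can be written as a short function. This sketch assumes the usual convention that the significance level (alpha) equals one minus the confidence level; the function name and the example p-value are illustrative.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

p = 0.035  # example p-value

# 95% confidence level -> alpha = 0.05
decision_95 = decide(p, alpha=0.05)
# 98% confidence level -> alpha = 0.02
decision_98 = decide(p, alpha=0.02)

print(decision_95)
print(decision_98)
```

The same p-value can lead to different decisions depending on the threshold, which is why the significance level has to be fixed before the experiment is run.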

In supervised machine learning, which maps input instances to outputs, the hypothesis is a very useful concept that helps to approximate the target function. Hypotheses are used across analytics domains and are an important factor in deciding whether a change should be introduced. They also bear on the efficiency and performance of models trained on the data.

Hence, in this topic, we have covered various important concepts related to the hypothesis in machine learning and statistics and some important parameters such as p-value, significance level, etc., to understand hypothesis concepts in a better way.


Introduction to the hypothesis space and the bias-variance tradeoff in machine learning.

In this post, we introduce the hypothesis space and discuss how machine learning models function as hypotheses. Furthermore, we discuss the challenges encountered when choosing an appropriate machine learning hypothesis and building a model, such as overfitting, underfitting, and the bias-variance tradeoff.

The hypothesis space in machine learning is the set of all possible models that can be used to explain a data distribution given the limitations of that space. A linear hypothesis space is limited to the set of all linear models. If the data follow a non-linear distribution, the linear hypothesis space might not contain a model that is appropriate for our needs.

To understand the concept of a hypothesis space, we need to learn to think of machine learning models as hypotheses.

The Machine Learning Model as Hypothesis

Generally speaking, a hypothesis is a potential explanation for an outcome or a phenomenon. In scientific inquiry, we test hypotheses to figure out how well and if at all they explain an outcome. In supervised machine learning, we are concerned with finding a function that maps from inputs to outputs.

But machine learning is inherently probabilistic. It is the art and science of deriving useful hypotheses from limited or incomplete data. Our functions are not axioms that explain the data perfectly, and for most real-life problems, we will never have all the data that exists. Accordingly, we will not find the one true function that perfectly describes the data. Instead, we find a function through training a model to map from known training input to known training output. This way, the model gradually approximates the assumed true function that describes the distribution of the data. So we treat our model as a hypothesis that needs to be tested as to how well it explains the output from a given input. We do this using a test or validation data set.

The Hypothesis Space

During the training process, we select a model from a hypothesis space that is subject to our constraints. For example, a linear hypothesis space only provides linear models. We can approximate data that follows a quadratic distribution using a model from the linear hypothesis space.

Of course, on such data a linear model will never have the same predictive performance as a quadratic model, so we can adjust our hypothesis space to also include non-linear models or at least quadratic models.

The Data Generating Process

The data generating process describes a hypothetical process subject to some assumptions that make training a machine learning model possible. We need to assume that the data points are from the same distribution but are independent of each other. When these requirements are met, we say that the data is independent and identically distributed (i.i.d.).

Independent and Identically Distributed Data

How can we assume that a model trained on a training set will perform better than random guessing on new and previously unseen data? First of all, the training data needs to come from the same or at least a similar problem domain. If you want your model to predict stock prices, you need to train the model on stock price data or data that is similarly distributed. It wouldn’t make much sense to train it on weather data. Statistically, this means the data is identically distributed. But if data comes from the same problem, training data and test data might not be completely independent. To account for this, we need to make sure that the test data is not in any way influenced by the training data or vice versa. If you use a subset of the training data as your test set, the test data evidently is not independent of the training data. Statistically, we say the data must be independently distributed.
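A common first step toward keeping test data separate from training data is a shuffled, disjoint split. The helper below is a minimal sketch; the function name, the 20% test fraction, and the seed are assumptions. A disjoint split avoids reusing training examples for testing, although it cannot by itself guarantee statistical independence (for example, with time series or duplicated records).

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into disjoint train/test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 labelled examples
train, test = train_test_split(data)
print(len(train), len(test))
```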

Overfitting and Underfitting

We want to select a model from the hypothesis space that explains the data sufficiently well. During training, we can make a model so complex that it perfectly fits every data point in the training dataset. But ultimately, the model should be able to predict outputs on previously unseen input data. The ability to do well when predicting outputs on previously unseen data is also known as generalization. There is an inherent conflict between those two requirements.

If we make the model so complex that it fits every point in the training data, it will pick up lots of noise and random variation specific to the training set, which might obscure the larger underlying patterns. As a result, it will be more sensitive to random fluctuations in new data and predict values that are far off. A model with this problem is said to overfit the training data and, as a result, to suffer from high variance.

To avoid the problem of overfitting, we can choose a simpler model or use regularization techniques to prevent the model from fitting the training data too closely. The model should then be less influenced by random fluctuations and instead, focus on the larger underlying patterns in the data. The patterns are expected to be found in any dataset that comes from the same distribution. As a consequence, the model should generalize better on previously unseen data.

But if we go too far, the model might become too simple or too constrained by regularization to accurately capture the patterns in the data. Then the model will neither generalize well nor fit the training data well. A model that exhibits this problem is said to underfit the data and to suffer from high bias. If the model is too simple to accurately capture the patterns in the data (for example, when using a linear model to fit non-linear data), its capacity is insufficient for the task at hand.
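The two failure modes can be seen side by side with two deliberately extreme models: a constant predictor, which is too simple, and a memorizer that returns the label of the nearest training point, which fits every training point exactly. This is a sketch under assumed conditions: the quadratic data-generating function, the noise level, and the sample sizes are made up.

```python
import random

random.seed(0)

def sample(n):
    """Draw n noisy points from y = x^2 on [-1, 1] (hypothetical process)."""
    points = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        points.append((x, x * x + random.gauss(0.0, 0.1)))
    return points

train, test = sample(30), sample(200)

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting candidate: always predict the mean training label.
mean_y = sum(y for _, y in train) / len(train)
def constant_model(x):
    return mean_y

# Overfitting candidate: return the label of the closest training point.
def memorizer(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

for name, model in [("constant", constant_model), ("memorizer", memorizer)]:
    print(f"{name}: train MSE = {mse(model, train):.4f}, "
          f"test MSE = {mse(model, test):.4f}")
```

The memorizer reaches zero training error because it reproduces the noise in the training labels, yet its error on fresh data is not zero. The constant model has similar error on both sets, but that error is high because it cannot represent the curve at all.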

When training neural networks, for example, we go through multiple iterations of training in which the model learns to fit an increasingly complex function to the data. Typically, your training error will decrease during learning the more complex your model becomes and the better it learns to fit the data. In the beginning, the training error decreases rapidly. In later training iterations, it typically flattens out as it approaches the minimum possible error. Your test or generalization error should initially decrease as well, albeit likely at a slower pace than the training error. As long as the generalization error is decreasing, your model is underfitting because it doesn’t live up to its full capacity. After a number of training iterations, the generalization error will likely reach a trough and start to increase again. Once it starts to increase, your model is overfitting, and it is time to stop training.

Ideally, you should stop training once your model reaches the lowest point of the generalization error. The gap between the minimum generalization error and no error at all includes an irreducible error term known as the Bayes error, which we won’t be able to get rid of in a probabilistic setting. But if the error term seems too large, you might be able to reduce it further by collecting more data, manipulating your model’s hyperparameters, or altogether picking a different model.
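Stopping at the lowest point of the generalization error is usually approximated by monitoring a validation error and stopping once it has not improved for a few iterations. The following sketch uses a made-up validation curve; the function name and the `patience` parameter are illustrative assumptions.

```python
def best_stopping_epoch(val_errors, patience=2):
    """Return (epoch, error) of the best validation error seen before
    the error failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, val_errors[0]
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving
    return best_epoch, best_error

# Hypothetical validation errors: falling at first, then rising again.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
stop_epoch, stop_error = best_stopping_epoch(val_errors)
print(stop_epoch, stop_error)
```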

Bias Variance Tradeoff

We’ve talked about bias and variance in the previous section. Now it is time to clarify what we actually mean by these terms.

Understanding Bias and Variance

In a nutshell, bias measures if there is any systematic deviation from the correct value in a specific direction. If we could repeat the same process of constructing a model several times over, and the results predicted by our model always deviate in a certain direction, we would call the result biased.

Variance measures how much the results vary between model predictions. If you repeat the modeling process several times over and the results are scattered all across the board, the model exhibits high variance.

In their book “Noise”, Daniel Kahneman and his co-authors provide an intuitive example that helps understand the concept of bias and variance. Imagine you have four teams at the shooting range.

Team B is biased because the shots of its team members all deviate in a certain direction from the center. Team B also exhibits low variance because the shots of all the team members are relatively concentrated in one location. Team C has the opposite problem. The shots are scattered across the target with no discernible bias in a certain direction. Team D is both biased and has high variance. Team A would be the equivalent of a good model. The shots are in the center with little bias in one direction and little variance between the team members.
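The shooting-range picture can be reduced to numbers by looking at one coordinate of each shot: the mean offset from the center measures bias, and the spread of the offsets measures variance. The shot positions below are invented to match the description of the four teams.

```python
import statistics

# Hypothetical horizontal offsets from the bullseye (0.0 is the center).
shots = {
    "A": [0.1, -0.1, 0.0, 0.05, -0.05],  # centered and tight
    "B": [2.0, 2.1, 1.9, 2.05, 1.95],    # off-center but tight
    "C": [-2.0, 1.5, 0.5, -1.0, 1.0],    # centered on average, scattered
    "D": [1.0, 4.0, 2.5, 0.5, 3.5],      # off-center and scattered
}

summary = {}
for team, offsets in shots.items():
    bias = statistics.fmean(offsets)          # systematic deviation
    variance = statistics.pvariance(offsets)  # scatter around the team's mean
    summary[team] = (bias, variance)
    print(f"Team {team}: bias = {bias:+.2f}, variance = {variance:.2f}")
```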

Generally speaking, linear models such as linear regression exhibit high bias and low variance. Nonlinear algorithms such as decision trees are more prone to overfitting the training data and thus exhibit high variance and low bias.

A linear model used with non-linear data would exhibit a bias to predict data points along a straight line instead of accommodating the curves. But such models are not as susceptible to random fluctuations in the data. A nonlinear algorithm that is trained on noisy data with lots of deviations would be more capable of avoiding bias but more prone to incorporate the noise into its predictions. As a result, a small deviation in the test data might lead to very different predictions.

To get our model to learn the patterns in data, we need to reduce the training error while at the same time reducing the gap between the training and the testing error. In other words, we want to reduce both bias and variance. To a certain extent, we can reduce both by picking an appropriate model, collecting enough training data, and selecting appropriate training features and hyperparameter values. At some point, we have to trade off between minimizing bias and minimizing variance. How you balance this trade-off is up to you.

The Bias Variance Decomposition

Mathematically, the total error can be decomposed into the bias and the variance according to the following formula.

[Tex]\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Bayes Error}[/Tex]

Remember that Bayes’ error is an error that cannot be eliminated.

Our machine learning model represents an estimating function \hat f(X) for the true data generating function f(X), where X represents the predictors and Y the output values.

Now the mean squared error of our model is the expected value of the squared difference of the output produced by the estimating function \hat f(X) and the true output Y.

[Tex]MSE = E\left[(\hat f(X) - Y)^2\right][/Tex]

The bias is a systematic deviation from the true value. We can measure it as the squared difference between the expected value produced by the estimating function (the model) and the values produced by the true data-generating function.

[Tex]\text{Bias}^2 = \left(E[\hat f(X)] - f(X)\right)^2[/Tex]

Of course, we don’t know the true data generating function, but we do know the observed outputs Y, which correspond to the values generated by f(X) plus an error term \epsilon.

[Tex]Y = f(X) + \epsilon[/Tex]

The variance of the model is the expected squared difference between the values produced by the model and their expected value.

[Tex]\text{Variance} = E\left[\left(\hat f(X) - E[\hat f(X)]\right)^2\right][/Tex]

Now that we have the bias and the variance, we can add them up along with the irreducible error to get the total error, where \sigma_\epsilon^2 denotes the variance of the error term \epsilon.

[Tex]MSE = \left(E[\hat f(X)] - f(X)\right)^2 + E\left[\left(\hat f(X) - E[\hat f(X)]\right)^2\right] + \sigma_\epsilon^2[/Tex]
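The decomposition can be checked numerically by training the same kind of model on many independently drawn training sets and looking at its predictions at one fixed input. This is a simulation sketch under assumed conditions: the true function, the noise level, the choice of a constant model, and the query point are all hypothetical.

```python
import random
import statistics

random.seed(1)
NOISE_SD = 0.3  # standard deviation of the irreducible noise (assumed)

def true_f(x):
    return x * x  # hypothetical true data-generating function

def train_constant_model():
    """Fit a deliberately simple model (a constant) on a fresh training set."""
    xs = [random.uniform(-1.0, 1.0) for _ in range(20)]
    ys = [true_f(x) + random.gauss(0.0, NOISE_SD) for x in xs]
    c = statistics.fmean(ys)
    return lambda x: c

x0 = 0.9  # fixed query point
preds = [train_constant_model()(x0) for _ in range(2000)]

bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
variance = statistics.pvariance(preds)
# Error against the noise-free target, averaged over training sets.
reducible = statistics.fmean([(p - true_f(x0)) ** 2 for p in preds])
total = bias_sq + variance + NOISE_SD ** 2  # add the irreducible error

print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
print(f"bias^2 + variance = {bias_sq + variance:.4f} (reducible = {reducible:.4f})")
print(f"estimated total error at x0 = {total:.4f}")
```

Because the constant model cannot follow the curve, most of its error at this point comes from bias rather than variance. A more flexible model would typically shift that balance.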

A machine learning model represents an approximation to the hypothesized function that generated the data. The chosen model is a hypothesis since we hypothesize that this model represents the true data generating function.

We choose the hypothesis from a hypothesis space that may be subject to certain constraints. For example, we can constrain the hypothesis space to the set of linear models.

When choosing a model, we aim to reduce the bias and the variance to prevent our model from either overfitting or underfitting the data. In the real world, we cannot completely eliminate bias and variance, and we have to trade off between them. The total error produced by a model can be decomposed into the bias, the variance, and irreducible (Bayes) error.


Best Guesses: Understanding The Hypothesis in Machine Learning

February 22, 2024
General , Supervised Learning , Unsupervised Learning

Machine learning is a vast and complex field that has inherited many terms from other places all over the mathematical domain.

It can sometimes be challenging to get your head around all the different terminologies, never mind trying to understand how everything comes together.

In this blog post, we will focus on one particular concept: the hypothesis.

While you may think this is simple, there is a little caveat regarding machine learning: the term has a statistics side and a learning side.

Don’t worry; we’ll do a full breakdown below.

You’ll learn the following:

What Is a Hypothesis in Machine Learning?

Is This any different than the hypothesis in statistics?
What is the difference between the alternative hypothesis and the null?
Why do we restrict hypothesis space in artificial intelligence?
Example code performing hypothesis testing in machine learning

In machine learning, the term ‘hypothesis’ can refer to two things.

First, it can refer to a member of the hypothesis space, the set of all candidate functions (models) that a learning algorithm could select to predict the answer for a new instance.

Second, it can refer to the traditional null and alternative hypotheses from statistics.

Since machine learning works so closely with statistics, when someone references the hypothesis, they are often referencing hypothesis tests from statistics.

Is This Any Different Than The Hypothesis In Statistics?

In statistics, the hypothesis is an assumption made about a population parameter.

The statistician’s goal is to determine whether the data provide enough evidence to reject it.

This will take the form of two different hypotheses, one called the null, and one called the alternative.

Usually, you’ll establish your null hypothesis as an assumption that it equals some value.

For example, in Welch’s T-Test Of Unequal Variance, our null hypothesis is that the two population means we are testing (the population parameters) are equal.

We run our statistical test, and if the p-value falls below our chosen significance level, we reject the null hypothesis.

This would be evidence that the population means behind the two samples are unequal.

Usually, statisticians use a significance level of .05 as the p-value cut-off, which means accepting a 5% risk of rejecting a null hypothesis that is actually true.

What Is The Difference Between The Alternative Hypothesis And The Null?

The null hypothesis is our default assumption; we keep it unless the data give us strong evidence against it.

The alternate hypothesis is usually the opposite of our null and is much broader in scope.

For most statistical tests, the null and alternative hypotheses are already defined.

You are then just trying to find “significant” evidence you can use to reject the null hypothesis.

These two hypotheses are easy to spot by their specific notation. The null hypothesis is usually denoted by H₀, while H₁ denotes the alternative hypothesis.

Example Code Performing Hypothesis Testing In Machine Learning

Since there are many different hypothesis tests in machine learning and data science, we will focus on one of my favorites.

This test is Welch’s T-Test Of Unequal Variance, where we are trying to determine if the population means of these two samples are different.

There are a couple of assumptions for this test, but we will ignore those for now and show the code.

You can read more about this here in our other post, Welch’s T-Test of Unequal Variance .
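The original code listing is not reproduced in this text, so here is a minimal stand-in sketch of Welch's test using only the Python standard library. The two samples are made-up illustration data, and `welch_t_test` and `t_two_sided_p` are hypothetical helper names. The p-value is obtained by numerically integrating the Student's t density, which is an approximation chosen to keep the example dependency-free.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central_mass)


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite df, two-sided p-value)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df, t_two_sided_p(t, df)


# Made-up samples for illustration only
sample_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
sample_b = [6.0, 6.4, 5.8, 6.3, 6.1, 6.2]

t_stat, dof, p_value = welch_t_test(sample_a, sample_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}, p = {p_value:.2e}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

If SciPy is available, `scipy.stats.ttest_ind(sample_a, sample_b, equal_var=False)` runs the same test and is the more usual choice in practice.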

We see that our p-value is very low, and we reject the null hypothesis.

What Is The Difference Between The Biased And Unbiased Hypothesis Spaces?

The difference between the biased and unbiased hypothesis space is how many candidate hypotheses your algorithm is allowed to consider.

The unbiased space contains every possible way of labeling the instances, while a biased space contains only a restricted subset of them.

Since neither extreme is optimal (a space that is too small cannot represent the target, and the unrestricted space is much too big to generalize from), your algorithm creates generalized rules (inductive learning) to be able to handle examples it hasn’t seen before.

Here’s an example of each:

Example of The Biased Hypothesis Space In Machine Learning

The biased hypothesis space in machine learning is a restricted subspace: your algorithm does not consider every possible hypothesis when making predictions.

This is easiest to see with an example.

Let’s say you have the following data:

Happy and Sunny and Stomach Full = True

Whenever your algorithm sees those three together in the biased hypothesis space, it’ll automatically default to true.

This means when your algorithm sees:

Sad and Sunny And Stomach Full = False

It’ll automatically default to False since it didn’t appear in our subspace.

This is a greedy approach, but it has some practical applications.

Example of the Unbiased Hypothesis Space In Machine Learning

The unbiased hypothesis space is a space where all combinations are stored.

We can re-use our example above:

This would start to break down as

Happy = True

Happy and Sunny = True

Happy and Stomach Full = True

Let’s say you have four options for each of the three choices.

That gives 4^3 = 64 distinct instances, and an unbiased space must contain every possible true/false labeling of them: 2^64 hypotheses (roughly 1.8 × 10^19) just for our little three-word problem.

This is practically impossible; the space would become huge.

So while it would be highly accurate, this has no scalability.
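A quick count makes the scalability problem concrete. This is a minimal sketch, assuming three attributes with four possible values each and a true/false label. The function names are hypothetical, and the "conjunctive" space (each attribute either fixed to one value or left unconstrained) is just one common example of a biased space, not necessarily the one the author had in mind.

```python
from math import prod


def count_instances(values_per_attribute):
    """Number of distinct inputs: the product of the attribute value counts."""
    return prod(values_per_attribute)


def count_unbiased_hypotheses(values_per_attribute, n_labels=2):
    """Every possible labeling of every instance is its own hypothesis."""
    return n_labels ** count_instances(values_per_attribute)


def count_conjunctive_hypotheses(values_per_attribute):
    """Biased space: each attribute is one specific value or 'any value'."""
    return prod(v + 1 for v in values_per_attribute)


attributes = [4, 4, 4]  # e.g. mood, weather, stomach, four options each
print("instances:             ", count_instances(attributes))
print("unbiased hypotheses:   ", count_unbiased_hypotheses(attributes))
print("conjunctive hypotheses:", count_conjunctive_hypotheses(attributes))
```

The restricted space stays small as attributes are added, while the unbiased space grows doubly exponentially.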

More reading on this idea can be found in our post, Inductive Bias In Machine Learning .

Why Do We Restrict Hypothesis Space In Artificial Intelligence?

We have to restrict the hypothesis space in machine learning. Without any restrictions, our domain becomes much too large, and we lose any form of scalability.

This is why our algorithm creates rules to handle examples that are seen in production.

This gives our algorithms a generalized approach that will be able to handle all new examples that are in the same format.

Hypothesis Space


Hendrik Blockeel


Model space

The hypothesis space used by a machine learning system is the set of all hypotheses that might possibly be returned by it. It is typically defined by a Hypothesis Language , possibly in conjunction with a Language Bias .

Motivation and Background

Many machine learning algorithms rely on some kind of search procedure: given a set of observations and a space of all possible hypotheses that might be considered (the “hypothesis space”), they look in this space for those hypotheses that best fit the data (or are optimal with respect to some other quality criterion).

To describe the context of a learning system in more detail, we introduce the following terminology. The key terms have separate entries in this encyclopedia, and we refer to those entries for more detailed definitions.

A learner takes observations as inputs. The Observation Language is the language used to describe these observations.

The hypotheses that a learner may produce, will be formulated in...


Editor information

Editors and affiliations.

School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2052

Claude Sammut

Faculty of Information Technology, Clayton School of Information Technology, Monash University, P.O. Box 63, Victoria, Australia, 3800

Geoffrey I. Webb


Blockeel, H. (2011). Hypothesis Space. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_373


DOI : https://doi.org/10.1007/978-0-387-30164-8_373

Publisher Name : Springer, Boston, MA

Print ISBN : 978-0-387-30768-8

Online ISBN : 978-0-387-30164-8

eBook Packages : Computer Science Reference Module Computer Science and Engineering


What is the difference between hypothesis space and representational capacity?

I am reading the Deep Learning book by Goodfellow et al. I find it difficult to understand the difference between the definitions of the hypothesis space and the representational capacity of a model.

In Chapter 5 , it is written about hypothesis space:

One way to control the capacity of a learning algorithm is by choosing its hypothesis space, the set of functions that the learning algorithm is allowed to select as being the solution.

And about representational capacity:

The model speciﬁes which family of functions the learning algorithm can choose from when varying the parameters in order to reduce a training objective. This is called the representational capacity of the model.

If we take the linear regression model as an example and allow our output $y$ to take polynomial inputs, I understand the hypothesis space as the set of quadratic functions of the input $x$ , i.e. $y = a_0 + a_1x + a_2x^2$ .

How is it different from the definition of the representational capacity, where parameters are $a_0$ , $a_1$ and $a_2$ ?


Consider a target function $f: x \mapsto f(x)$ .

A hypothesis refers to an approximation of $f$ . A hypothesis space refers to the set of possible approximations that an algorithm can create for $f$ . The hypothesis space consists of the set of functions the model is limited to learn. For instance, linear regression can be limited to linear functions as its hypothesis space, or it can be expanded to learn polynomials.

The representational capacity of a model determines its flexibility: its ability to fit a variety of functions (i.e. which functions the model is able to learn). It specifies the family of functions the learning algorithm can choose from.
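To make the distinction concrete, here is a small sketch in which each hypothesis is one fixed choice of the parameters $(a_0, a_1, a_2)$ and a hypothesis space is a set of such choices. The coarse parameter grid and the toy data are assumptions for illustration only; real linear regression searches a continuous, infinite space.

```python
from itertools import product


def make_hypothesis(a0, a1, a2=0.0):
    """One hypothesis: the fixed polynomial y = a0 + a1*x + a2*x^2."""
    return lambda x: a0 + a1 * x + a2 * x * x


def squared_loss(h, data):
    return sum((h(x) - y) ** 2 for x, y in data)


# Toy data generated by y = 1 + x^2 (assumed for illustration)
data = [(x, 1 + x * x) for x in (-2, -1, 0, 1, 2)]

grid = (-1.0, 0.0, 1.0, 2.0)  # finite stand-in for real-valued parameters
linear_space = [(a0, a1, 0.0) for a0, a1 in product(grid, repeat=2)]
quadratic_space = list(product(grid, repeat=3))


def best_in(space):
    """The 'learning algorithm': pick the lowest-loss hypothesis in the space."""
    return min(space, key=lambda p: squared_loss(make_hypothesis(*p), data))


best_linear = best_in(linear_space)
best_quadratic = best_in(quadratic_space)
print("best linear:   ", best_linear,
      squared_loss(make_hypothesis(*best_linear), data))
print("best quadratic:", best_quadratic,
      squared_loss(make_hypothesis(*best_quadratic), data))
```

The linear space is a subset of the quadratic one (it fixes $a_2 = 0$), so enlarging the hypothesis space can only lower the best achievable training loss.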

Does it mean that the set of functions described by the representational capacity is strictly included in the hypothesis space? By definition, is it possible to have functions in the hypothesis space NOT described in the representational capacity? – Qwarzix Commented Aug 23, 2018 at 8:43
It's still pretty confusing to me. Most sources say that a "model" is an instance (after execution/training on data) of a "learning algorithm". How, then, can a model specify the family of functions the learning algorithm can choose from? It doesn't make sense to me. The authors of the book should've explained these concepts in more depth. – Talendar Commented Oct 9, 2020 at 13:09

A hypothesis space is defined as the set of functions $\mathcal H$ that can be chosen by a learning algorithm to minimize loss (in general).

$$\mathcal H = \{h_1, h_2, \ldots, h_n\}$$

The hypothesis class can be finite or infinite. For example, a discrete set of shapes used to encircle a certain portion of the input space is a finite hypothesis space, whereas the hypothesis spaces of parametrized functions like neural nets and linear regressors are infinite.

Although the term representational capacity is not in vogue, a rough definition would be: the representational capacity of a model is the ability of its hypothesis space to approximate a complex function with 0 error; such a function can only be approximated by the (infinitely many) hypothesis spaces whose representational capacity equals or exceeds the capacity required to approximate it.

The most popular measure of representational capacity is the VC dimension of a model. For a finite hypothesis space, an upper bound on the VC dimension ( $d$ ) is: $$d \leq \log_2| \mathcal H|$$ where $|\mathcal H|$ is the cardinality of the hypothesis space.
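The bound $d \leq \log_2|\mathcal H|$ can be checked by brute force on tiny finite classes. The sketch below is an illustration under assumptions: the five-point domain and the threshold and interval classifiers are made up for the example, and `vc_dimension` is a hypothetical helper that looks for the largest subset of the domain that is shattered.

```python
from itertools import combinations
from math import log2


def labelings(hypotheses, points):
    """Distinct label vectors the hypothesis set induces on the points."""
    return {tuple(h(x) for x in points) for h in hypotheses}


def vc_dimension(hypotheses, domain):
    """Largest k such that some k-subset of the domain is shattered."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(len(labelings(hypotheses, s)) == 2 ** k
               for s in combinations(domain, k)):
            best = k
        else:
            break  # if no k-subset is shattered, no larger subset can be
    return best


domain = [1, 2, 3, 4, 5]
# Threshold classifiers: h_t(x) = 1 if x >= t
thresholds = [lambda x, t=t: int(x >= t) for t in range(1, 7)]
# Interval classifiers: h_{a,b}(x) = 1 if a <= x <= b
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in domain for b in domain if a <= b]

for name, H in (("thresholds", thresholds), ("intervals", intervals)):
    print(name, "VC dimension:", vc_dimension(H, domain),
          "log2|H|:", round(log2(len(H)), 2))
```

Shattering $k$ points requires at least $2^k$ distinct hypotheses, which is where the logarithmic bound comes from.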

A hypothesis space/class is the set of functions that the learning algorithm considers when picking one function to minimize some risk/loss functional.

The capacity of a hypothesis space is a number or bound that quantifies the size (or richness) of the hypothesis space, i.e. the number (and type) of functions that can be represented by the hypothesis space. So a hypothesis space has a capacity. The two most famous measures of capacity are VC dimension and Rademacher complexity.

In other words, the hypothesis class is the object and the capacity is a property (that can be measured or quantified) of this object, but there is not a big difference between hypothesis class and its capacity, in the sense that a hypothesis class naturally defines a capacity, but two (different) hypothesis classes could have the same capacity.

Note that representational capacity (not capacity , which is common!) is not a standard term in computational learning theory, while hypothesis space/class is commonly used. For example, this famous book on machine learning and learning theory uses the term hypothesis class in many places, but it never uses the term representational capacity .

Your book's definition of representational capacity is bad , in my opinion, if representational capacity is supposed to be a synonym for capacity , given that that definition also coincides with the definition of hypothesis class, so your confusion is understandable.

I agree with you. The authors of the book should've explained these concepts in more depth. Most sources say that a "model" is an instance (after execution/training on data) of a "learning algorithm". How, then, can a model specify the family of functions the learning algorithm can choose from? Also, as you pointed out, the definition of the terms "hypothesis space" and "representational capacity" given by the authors are practically the same, although they use the terms as if they represent different concepts. – Talendar Commented Oct 9, 2020 at 13:18


How to calculate hypothesis space

I'm trying to calculate the size of the hypothesis space of a function F. This function takes $N$ binary inputs and outputs a single binary classification.

With $N$ binary inputs, then the size of the domain must be $2^N$ . Then, I would think that for each of these possible $2^N$ instances there must be two hypotheses (one for each output). This would make the total number of hypotheses equal to $2 \times (2^N)$ .

I have read from other sources that the correct number of hypotheses is actually $2^{(2^N)}$ . What is the mistake in my thinking?


Could you please explain how you obtain the value of $2\times(2^N)$? That number does not appear to follow from the information you gave. Perhaps a complete enumeration of the cases when $N=2$ would clarify things. – whuber ♦ Commented Dec 25, 2015 at 4:08
My thinking was that each combination of the N binary inputs could yield a result of either true or false (a binary output). With two possible outputs for each of the 2^N possible function evaluations, I calculated there must be 2*(2^N) different hypotheses. I hope that explains my thinking better. – Isaac Getto Commented Dec 25, 2015 at 4:12
Please revisit your calculation, because it is incorrect. Explicit consideration of the case $N=2$ may help clear this up. – whuber ♦ Commented Dec 26, 2015 at 14:21


In general, whenever we have a function $f: \mathcal{D} \rightarrow \mathcal{C}$ , the function can be considered as an element of the set $\mathcal{C}^\mathcal{D}$ (called the function space ). The set of all possible functions with domain $\mathcal{D}$ and codomain $\mathcal{C}$ is the full function space $\mathcal{C}^\mathcal{D}$ . Each function in the space can be considered as a list of outputs for each of the inputs --- the list has $|\mathcal{D}|$ elements and each element takes on one of $|\mathcal{C}|$ possible outputs. Consequently, using a simple application of the multiplication principle of counting , we have:

$$\begin{align} \text{No. of possible functions with domain } \mathcal{D} \text{ and codomain } \mathcal{C} &= \underbrace{|\mathcal{C}| \times \cdots \times |\mathcal{C}|}_{|\mathcal{D}| \text{ times}} \\[12pt] &= |\mathcal{C}|^{|\mathcal{D}|}. \\[6pt] \end{align}$$

Now, you have already correctly determined that there are $2^n$ possible inputs in the domain of the function, so we have $|\mathcal{D}| = 2^n$ in the present case. For every possible input in the domain the function output takes on one of two binary values, so we have $|\mathcal{C}| = 2$ . Consequently, in this case we have:

$$\text{No. of possible functions with domain } \mathcal{D} \text{ and codomain } \mathcal{C} = |\mathcal{C}|^{|\mathcal{D}|} = 2^{2^n}. $$
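For small $n$ the count can be verified by listing every function explicitly. This is a minimal sketch: each Boolean function is represented as a truth table (a dict from input tuples to outputs), and `all_boolean_functions` is a hypothetical helper name.

```python
from itertools import product


def all_boolean_functions(n):
    """Yield every function {0,1}^n -> {0,1} as a truth-table dict."""
    inputs = list(product((0, 1), repeat=n))  # the 2^n points of the domain
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs))


for n in (1, 2, 3):
    count = sum(1 for _ in all_boolean_functions(n))
    print(f"n={n}: enumerated {count}, "
          f"2^(2^n) = {2 ** (2 ** n)}, 2*(2^n) = {2 * 2 ** n}")
```

The figure $2 \times 2^n$ counts (input, output) pairs, whereas a hypothesis must assign an output to every input at once, which is why the outputs multiply rather than add.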

Your answer requires a knowledge of set theory and would be confusing to someone who would not start "counting" from zero. I am not familiar with using domain and codomain in the context of set theory, so I do not fully understand your explanation. It is no doubt correct, but accessibility may be an issue. – Carl Commented Feb 20, 2021 at 1:51
That is true, but I think this question is inherently a question about function spaces, which are generally explained in terms of sets. In order for the OP to obtain a good knowledge of this issue, I think he will ultimately need to read some material on function spaces and the rules of counting sets. – Ben Commented Feb 20, 2021 at 3:24
I agree, but other people read this as well, and not everyone, e.g., me, wants to learn about set language. There is nothing wrong with your answer, nor with mine, the only difference is jargon. I tried for accessibility, you tried for precision of set language, question of taste really. – Carl Commented Feb 20, 2021 at 7:04
P.S. +1 for your answer. – Carl Commented Feb 20, 2021 at 11:14
I like the fact that you have given a non-set based answer (+1). One of the nice things about having multiple answers is that you get explanations pitched with different levels of assumed knowledge and rigour. – Ben Commented Jul 3, 2022 at 1:56

Think of the output as being a lock (0 closed, 1 opened) that is potentially opened by keys. That is, there might be no combination that can open the lock, or as many as $2^n$ keys that can open it. If the lock can be opened by only one key, then counting in binary it is some number between $0000\dots0000$ and $1111\dots1111$ for a binary number of length $n$ , and there are $2^n$ of those. Next we ask how many combinations of two keys can open the lock and there are $\left(\begin{array}{c}2^n\\2\end{array}\right)$ of those.

In general, we are adding up combinations

$$\left(\begin{array}{c}2^n\\0\end{array}\right)+\left(\begin{array}{c}2^n\\1\end{array}\right)+\left(\begin{array}{c}2^n\\2\end{array}\right)+\dots+\left(\begin{array}{c}2^n\\2^n-1\end{array}\right)+\left(\begin{array}{c}2^n\\2^n\end{array}\right).$$

Finally, as order does not matter, we can use the binomial theorem (see e.g., here ) to get $${m \choose 0} + {m \choose 1} + {m \choose 2} + \dots + {m \choose m} = 2^m,$$ which substituting $m=2^n$ leads us to $2^{2^n}$ , which is the answer you read.
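The key-counting argument can be checked numerically for small $n$. In this sketch, `locks_by_key_count` is a hypothetical helper that returns, for each $k$, the number of locks opened by exactly $k$ of the $2^n$ keys.

```python
from math import comb


def locks_by_key_count(n):
    """Number of locks opened by exactly k of the 2^n keys, for each k."""
    m = 2 ** n
    return [comb(m, k) for k in range(m + 1)]


for n in (1, 2, 3, 4):
    counts = locks_by_key_count(n)
    print(f"n={n}: sum of binomials = {sum(counts)}, 2^(2^n) = {2 ** (2 ** n)}")
```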

@Sycorax Like this answer better? – Carl Commented Jan 6, 2021 at 7:31
@Ben Thanks for the edit, but I'm curious, why improve an answer, which implies that you follow what is being said to the point of wanting to say it better, and then not upvote it? – Carl Commented Feb 19, 2021 at 9:13
Hi @Carl: Glad you liked the edit. I haven't upvoted because I'm still undecided on whether I like this answer. While I like the attempt to use an example, I'm not sure if the keys/locks analogy really makes function spaces easier or harder to understand. I've just upvoted a couple of your other answers in the meantime, while I think more about it. (Since my edit was purely on formatting and syntactical grounds, it does not really imply a like or dislike of the answer; I just wanted to make the formatting nicer.) – Ben Commented Feb 19, 2021 at 13:08

To calculate the size of the hypothesis space:

Given the example in the image above, we can figure it out the following way.

Count the number of attributes or features. In this case, we have four features.

Determine which values each feature can take (e.g. binary, or many different values). In this particular case, the values are binary (0/1).

So there are 2^4 = 16 possible inputs, and for each of them the output can be 0 or 1, which gives 2^16 = 65,536 possible hypotheses.


Machine Learning Theory - Part 2: Generalization Bounds

Last time we concluded by noticing that minimizing the empirical risk (or the training error) is not in itself a solution to the learning problem; it could only be considered a solution if we can guarantee that the difference between the training error and the generalization error (which is also called the generalization gap ) is small enough. We formalized this requirement using the probability:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}} \left|R(h) - R_\text{emp}(h)\right| > \epsilon\right]\]

That is if this probability is small, we can guarantee that the difference between the errors is not much, and hence the learning problem can be solved.

In this part we’ll start investigating that probability at depth and see if it indeed can be small, but before starting you should note that I skipped a lot of the mathematical proofs here. You’ll often see phrases like “It can be proved that …”, “One can prove …”, “It can be shown that …”, … etc without giving the actual proof. This is to make the post easier to read and to focus all the effort on the conceptual understanding of the subject. In case you wish to get your hands dirty with proofs, you can find all of them in the additional readings, or on the Internet of course!

Independently, and Identically Distributed

The world can be a very messy place! This is a problem that faces any theoretical analysis of a real world phenomenon; because usually we can’t really capture all the messiness in mathematical terms, and even if we’re able to; we usually don’t have the tools to get any results from such a messy mathematical model.

So in order for theoretical analysis to move forward, some assumptions must be made to simplify the situation at hand, we can then use the theoretical results from that simplification to infer about reality.

Assumptions are common practice in theoretical work. Assumptions are not bad in themselves, only bad assumptions are bad! As long as our assumptions are reasonable and not crazy, they’ll hold significant truth about reality.

A reasonable assumption we can make about the problem we have at hand is that our training dataset samples are independently, and identically distributed (or i.i.d. for short), that means that all the samples are drawn from the same probability distribution and that each sample is independent from the others.

This assumption is essential for us. We need it to start using the tools from probability theory to investigate our generalization probability, and it’s a very reasonable assumption because:

It’s more likely for a dataset used for inferring about an underlying probability distribution to be all sampled from that same distribution. If this is not the case, then the statistics we get from the dataset will be noisy and won’t correctly reflect the target underlying distribution.
It’s more likely that each sample in the dataset is chosen without considering any other sample that has been chosen before or will be chosen after. If that’s not the case and the samples are dependent, then the dataset will suffer from a bias towards a specific direction in the distribution, and hence will fail to reflect the underlying distribution correctly.

So we can build upon that assumption with no fear.

The Law of Large Numbers

Most of us, since we were kids, know that if we tossed a fair coin a large number of times, roughly half of the times we’re gonna get heads. This is an instance of a widely known fact about probability: if we repeat an experiment a sufficiently large number of times, the average outcome of these experiments (or, more formally, the sample mean ) will be very close to the true mean of the underlying distribution. This fact is formally captured in what we call The law of large numbers :

If $x_1, x_2, …, x_m$ are $m$ i.i.d. samples of a random variable $X$ distributed by $P$. then for a small positive non-zero value $\epsilon$: \[\lim_{m \rightarrow \infty} \mathbb{P}\left[\left|\mathop{\mathbb{E}}_{X \sim P}[X] - \frac{1}{m}\sum_{i=1}^{m}x_i \right| > \epsilon\right] = 0\]

This version of the law is called the weak law of large numbers . It’s weak because it only guarantees that, as the sample size grows, the probability that the sample mean and the true mean differ by more than any fixed non-zero distance $\epsilon$ goes to zero. The strong version, on the other hand, says that as the sample size grows to infinity, the sample mean converges to the true mean almost surely (with probability 1).
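The coin-toss intuition is easy to simulate. This is a minimal sketch, assuming a fair coin and a fixed random seed so the run is repeatable; `sample_mean_of_coin` is a hypothetical helper name.

```python
import random


def sample_mean_of_coin(m, seed=0):
    """Fraction of heads in m simulated tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(m))
    return heads / m


for m in (10, 100, 1_000, 10_000, 100_000):
    mean = sample_mean_of_coin(m)
    print(f"m={m:>6}: sample mean = {mean:.4f}, "
          f"|error| = {abs(mean - 0.5):.4f}")
```

The error tends to shrink as $m$ grows, but the law by itself does not say how fast.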

The formulation of the weak law lends itself naturally to use with our generalization probability. By recalling that the empirical risk is actually the sample mean of the errors and the risk is the true mean, for a single hypothesis $h$ we can say that:

\[\lim_{m \rightarrow \infty} \mathbb{P}\left[\left|R(h) - R_\text{emp}(h)\right| > \epsilon\right] = 0\]

Well, that’s progress. A pretty small step, but still progress! Can we do any better?

Hoeffding’s inequality

The law of large numbers is like someone pointing the directions to you when you’re lost, they tell you that by following that road you’ll eventually reach your destination, but they provide no information about how fast you’re gonna reach your destination, what is the most convenient vehicle, should you walk or take a cab, and so on.

To reach our destination of ensuring that the training and generalization errors do not differ much, we need to know more about what the road down the law of large numbers looks like. This information is provided by what we call the concentration inequalities . This is a set of inequalities that quantify how much random variables (or functions of them) deviate from their expected values (or, also, functions of them). One of those inequalities is Hoeffding’s inequality :

If $x_1, x_2, …, x_m$ are $m$ i.i.d. samples of a random variable $X$ distributed by $P$, and $a \leq x_i \leq b$ for every $i$, then for a small positive non-zero value $\epsilon$: \[\mathbb{P}\left[\left|\mathop{\mathbb{E}}_{X \sim P}[X] - \frac{1}{m}\sum_{i=1}^{m}x_i\right| > \epsilon\right] \leq 2\exp\left(\frac{-2m\epsilon^2}{(b -a)^2}\right)\]

You probably see why we specifically chose Hoeffding’s inequality from among the others. We can naturally apply this inequality to our generalization probability, assuming that our errors are bounded between 0 and 1 (which is a reasonable assumption, as we can get that using a 0/1 loss function or by squashing any other loss between 0 and 1) and get for a single hypothesis $h$:

\[\mathbb{P}\left[\left|R(h) - R_\text{emp}(h)\right| > \epsilon\right] \leq 2\exp\left(-2m\epsilon^2\right)\]

This means that the probability of the difference between the training and the generalization errors exceeding $\epsilon$ exponentially decays as the dataset size goes larger. This should align well with our practical experience that the bigger the dataset gets, the better the results become.
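To see the exponential decay in numbers, the sketch below compares the Hoeffding bound $2\exp(-2m\epsilon^2/(b-a)^2)$ with a simulated violation rate. It assumes a fair coin as the bounded random variable ($a = 0$, $b = 1$, true mean 0.5) and a fixed seed; both function names are hypothetical.

```python
import math
import random


def hoeffding_bound(m, eps, a=0.0, b=1.0):
    """Upper bound on P(|true mean - sample mean| > eps) for m samples in [a, b]."""
    return 2 * math.exp(-2 * m * eps ** 2 / (b - a) ** 2)


def empirical_violation_rate(m, eps, trials=2000, seed=0):
    """Fraction of simulated fair-coin samples whose mean is off by more than eps."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(m)) / m
        if abs(mean - 0.5) > eps:
            violations += 1
    return violations / trials


eps = 0.1
for m in (50, 100, 200, 400):
    print(f"m={m:>3}: bound = {hoeffding_bound(m, eps):.4f}, "
          f"simulated = {empirical_violation_rate(m, eps):.4f}")
```

The bound is distribution-free, so it is usually looser than what a specific distribution exhibits.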

If you noticed, all our analysis up till now was focusing on a single hypothesis $h$. But the learning problem doesn’t know that single hypothesis beforehand, it needs to pick one out of an entire hypothesis space $\mathcal{H}$, so we need a generalization bound that reflects the challenge of choosing the right hypothesis.

Generalization Bound: 1st Attempt

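In order for the entire hypothesis space to have a generalization gap bigger than $\epsilon$, at least one of its hypotheses, $h_1$ or $h_2$ or $h_3$ or … etc., should have one. This can be expressed formally by stating that:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}} \left|R(h) - R_\text{emp}(h)\right| > \epsilon\right] = \mathbb{P}\left[\bigcup_{h \in \mathcal{H}} \left|R(h) - R_\text{emp}(h)\right| > \epsilon\right]\]

Where $\bigcup$ denotes the union of the events, which also corresponds to the logical OR operator. Using the union bound inequality , we get:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}} \left|R(h) - R_\text{emp}(h)\right| > \epsilon\right] \leq \sum_{h \in \mathcal{H}} \mathbb{P}\left[\left|R(h) - R_\text{emp}(h)\right| > \epsilon\right]\]

We know exactly the bound on the probability under the summation from our analysis using Hoeffding’s inequality, so we end up with:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}} \left|R(h) - R_\text{emp}(h)\right| > \epsilon\right] \leq 2|\mathcal{H}|\exp\left(-2m\epsilon^2\right)\]

Where $|\mathcal{H}|$ is the size of the hypothesis space. By denoting the right hand side of the above inequality by $\delta$, we can say that with a confidence $1 - \delta$:

\[\left|R(h) - R_\text{emp}(h)\right| \leq \epsilon \Rightarrow R(h) \leq R_\text{emp}(h) + \epsilon\]

And with some basic algebra, we can express $\epsilon$ in terms of $\delta$ and get:

\[R(h) \leq R_\text{emp}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln\frac{2}{\delta}}{2m}}\]

This is our first generalization bound; it states that the generalization error is bounded by the training error plus a function of the hypothesis space size and the dataset size. We can also see that the bigger the hypothesis space gets, the bigger the bound on the generalization error becomes. This explains why the memorization hypothesis from last time, which theoretically has $|\mathcal{H}| = \infty$, fails miserably as a solution to the learning problem despite having $R_\text{emp} = 0$; because for the memorization hypothesis $h_\text{mem}$:

\[R(h_\text{mem}) \leq 0 + \infty = \infty\]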

But wait a second! For a linear hypothesis of the form $h(x) = wx + b$, we also have $|\mathcal{H}| = \infty$ as there are infinitely many lines that can be drawn. So the generalization error of the linear hypothesis space should be unbounded just as the memorization hypothesis! If that’s true, why do perceptrons, logistic regression, support vector machines and essentially any ML model that uses a linear hypothesis work?

Our theoretical result was able to account for some phenomena (the memorization hypothesis, and any finite hypothesis space) but not for others (the linear hypothesis, or other infinite hypothesis spaces that empirically work). This means that there’s still something missing from our theoretical model, and it’s time for us to revise our steps. A good starting point is from the source of the problem itself, which is the infinity in $|\mathcal{H}|$.
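The first-attempt bound from this section can be evaluated numerically to see how the gap term reacts to $|\mathcal{H}|$ and $m$. This is a sketch assuming the finite-space form $\epsilon = \sqrt{(\ln|\mathcal{H}| + \ln(2/\delta)) / (2m)}$, obtained by setting $\delta = 2|\mathcal{H}|\exp(-2m\epsilon^2)$; the function name and the example sizes are hypothetical.

```python
import math


def generalization_gap_bound(h_size, m, delta):
    """Gap eps such that R(h) <= R_emp(h) + eps with confidence 1 - delta."""
    return math.sqrt((math.log(h_size) + math.log(2 / delta)) / (2 * m))


delta = 0.05
for h_size in (10, 10 ** 3, 10 ** 6, 10 ** 100):
    for m in (1_000, 100_000):
        eps = generalization_gap_bound(h_size, m, delta)
        print(f"|H| = {float(h_size):.0e}, m = {m:>6}: gap <= {eps:.4f}")
```

The gap grows only with the logarithm of $|\mathcal{H}|$, yet it still diverges as $|\mathcal{H}| \to \infty$, which is exactly the problem the infinite linear hypothesis space exposes.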

Notice that the term $|\mathcal{H}|$ resulted from our use of the union bound. The basic idea of the union bound is that it bounds the probability by the worst case possible, which is when the events under the union never occur together, so that their probabilities simply add up. We’ll loosely refer to this as the events being independent. The bound gets tighter as the events under consideration get less dependent. In our case, for the bound to be tight and reasonable, we need the following to be true:

For every two hypotheses $h_1, h_2 \in \mathcal{H}$ the two events $|R(h_1) - R_\text{emp}(h_1)| > \epsilon$ and $|R(h_2) - R_\text{emp}(h_2)| > \epsilon$ are likely to be independent. This means that the event that $h_1$ has a generalization gap bigger than $\epsilon$ should be independent of the event that $h_2$ also has a generalization gap bigger than $\epsilon$, no matter how close or related $h_1$ and $h_2$ are; the events should be coincidental.

But is that true?

Examining the Independence Assumption

The first question we need to ask here is: why do we need to consider every possible hypothesis in $\mathcal{H}$? This may seem like a trivial question, as the answer is simply that the learning algorithm can search the entire hypothesis space looking for its optimal solution. While this answer is correct, we need a more formal answer in light of the generalization inequality we’re studying.

The formulation of the generalization inequality reveals a main reason why we need to consider all the hypotheses in $\mathcal{H}$. It has to do with the existence of $\sup_{h \in \mathcal{H}}$. The supremum in the inequality guarantees that there’s very little chance that the biggest generalization gap possible is greater than $\epsilon$. This is a strong claim, and if we omit a single hypothesis from $\mathcal{H}$, we might miss that “biggest generalization gap possible” and lose that strength, and that’s something we cannot afford to lose. We need to be able to make that claim to ensure that the learning algorithm would never land on a hypothesis with a bigger generalization gap than $\epsilon$.

Looking at the above plot of a binary classification problem, it’s clear that this rainbow of hypotheses produces the same classification on the data points, so all of them have the same empirical risk. So one might think: as they all have the same $R_\text{emp}$, why not choose one and omit the others?!

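This would be a very good solution if we were only interested in the empirical risk, but our inequality also takes into consideration the out-of-sample risk, which is expressed as:

\[R(h) = \mathbb{E}_{(x,y) \sim P(X,Y)}[L(y, h(x))] = \int_{\mathcal{Y}} \int_{\mathcal{X}} L(y, h(x))\,P(x, y)\,\mathrm{d}x\,\mathrm{d}y\]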

This is an integration over every possible combination of the whole input and output spaces $\mathcal{X}$ and $\mathcal{Y}$. So in order to ensure our supremum claim, we need the hypotheses to cover the whole of $\mathcal{X} \times \mathcal{Y}$, hence we need all the possible hypotheses in $\mathcal{H}$.

Now that we’ve established that we do need to consider every single hypothesis in $\mathcal{H}$, we can ask ourselves: are the events of each hypothesis having a big generalization gap likely to be independent?

Well, not even close! Take for example the rainbow of hypotheses in the above plot. It’s very clear that if the red hypothesis has a generalization gap greater than $\epsilon$, then, with 100% certainty, every hypothesis with the same slope in the region above it will also have that. The same argument can be made for many different regions in the $\mathcal{X} \times \mathcal{Y}$ space with different degrees of certainty, as in the following figure.

But this is not helpful for our mathematical analysis, as the regions seem to depend on the distribution of the sample points, and there is no way we can precisely capture these dependencies mathematically. We also cannot make assumptions about them without risking compromising the supremum claim.

So the union bound and the independence assumption seem like the best approximation we can make, but it highly overestimates the probability and makes the bound very loose and very pessimistic!
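How loose the union bound gets depends on how much the events overlap. As a small illustration (a sketch with made-up numbers, not part of the derivation), the following Python snippet compares the union bound $kp$ with the exact probability of the union of $k$ events of probability $p$ each, in three idealized cases: the events are identical, statistically independent, or mutually exclusive.

```python
def union_bound(p: float, k: int) -> float:
    """Union bound on the probability that at least one of k events occurs."""
    return min(1.0, k * p)

def prob_union_identical(p: float, k: int) -> float:
    """All k events are the same event (fully overlapping)."""
    return p

def prob_union_independent(p: float, k: int) -> float:
    """The k events are statistically independent."""
    return 1.0 - (1.0 - p) ** k

def prob_union_disjoint(p: float, k: int) -> float:
    """The k events never occur together (requires k * p <= 1)."""
    return min(1.0, k * p)

p, k = 0.01, 50
print("union bound :", union_bound(p, k))
print("identical   :", prob_union_identical(p, k))
print("independent :", round(prob_union_independent(p, k), 4))
print("disjoint    :", prob_union_disjoint(p, k))
```

With heavily overlapping events, like the rainbow of near-identical hypotheses, the true probability stays close to $p$ while the union bound charges for every hypothesis separately.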

However, what if somehow we can get a very good estimate of the risk $R(h)$ without needing to go over the whole of the $\mathcal{X \times Y}$ space, would there be any hope to get a better bound?

The Symmetrization Lemma

Let’s think for a moment about something we usually do in machine learning practice. In order to measure the accuracy of our model, we hold out a part of the training set to evaluate the model on after training, and we consider the model’s accuracy on this left-out portion as an estimate for the generalization error. This works because we assume that this test set is drawn i.i.d. from the same distribution as the training set (this is why we usually shuffle the whole dataset beforehand to break any correlation between the samples).

It turns out that we can do a similar thing mathematically, but instead of taking out a portion of our dataset $S$, we imagine that we have another dataset $S'$, also of size $m$. We call this the ghost dataset. Note that this has no practical implications; we don’t need to have another dataset at training. It’s just a mathematical trick we’re gonna use to get rid of the restrictions of $R(h)$ in the inequality.

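We’re not gonna go over the proof here, but using that ghost dataset one can actually prove that:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}}|R(h) - R_\text{emp}(h)| > \epsilon\right] \leq 2\mathbb{P}\left[\sup_{h \in \mathcal{H}}|R_\text{emp}(h) - R_\text{emp}'(h)| > \frac{\epsilon}{2}\right]\]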

where $R_\text{emp}'(h)$ is the empirical risk of hypothesis $h$ on the ghost dataset. This means that the probability of the largest generalization gap being bigger than $\epsilon$ is at most twice the probability that the largest empirical risk difference between $S$ and $S'$ is larger than $\frac{\epsilon}{2}$. Now that the right-hand side is expressed only in terms of empirical risks, we can bound it without needing to consider the whole of $\mathcal{X} \times \mathcal{Y}$, and hence we can bound the term with the risk $R(h)$ without considering the whole of the input and output spaces!

This, which is called the symmetrization lemma, was one of the two key parts in the work of Vapnik and Chervonenkis (1971).

The Growth Function

Now that we are bounding only the empirical risk, if we have many hypotheses that have the same empirical risk (a.k.a. producing the same labels/values on the data points), we can safely choose one of them as a representative of the whole group, we’ll call that an effective hypothesis, and discard all the others.

By only choosing the distinct effective hypotheses on the dataset $S$, we restrict the hypothesis space $\mathcal{H}$ to a smaller subspace that depends on the dataset $\mathcal{H}_{|S}$.

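We can assume the independence of the hypotheses in $\mathcal{H}_{|S}$ like we did before with $\mathcal{H}$ (but it’s more plausible now), and use the union bound to get that:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}_{|S \cup S'}}|R_\text{emp}(h) - R_\text{emp}'(h)| > \frac{\epsilon}{2}\right] \leq \left|\mathcal{H}_{|S \cup S'}\right| \mathbb{P}\left[|R_\text{emp}(h) - R_\text{emp}'(h)| > \frac{\epsilon}{2}\right]\]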

Notice that the hypothesis space is restricted by $S \cup S'$ because we’re using the empirical risk on both the original dataset $S$ and the ghost $S'$. The question now is: what is the maximum size of a restricted hypothesis space? The answer is very simple. We consider a hypothesis to be a new effective one if it produces new labels/values on the dataset samples, so the maximum number of distinct hypotheses (a.k.a. the maximum size of the restricted space) is the maximum number of distinct labels/values the dataset points can take. A cool feature of that maximum size is that it’s a combinatorial measure, so we don’t need to worry about how the samples are distributed!

For simplicity, we’ll focus now on the case of binary classification, in which $\mathcal{Y}=\{-1, +1\}$. Later we’ll show that the same concepts can be extended to both multiclass classification and regression. In that case, for a dataset with $m$ samples, each of which can take one of two labels: either -1 or +1, the maximum number of distinct labellings is $2^m$.

We’ll define the maximum number of distinct labellings/values on a dataset $S$ of size $m$ by a hypothesis space $\mathcal{H}$ as the growth function of $\mathcal{H}$ given $m$, and we’ll denote that by $\Delta_\mathcal{H}(m)$. It’s called the growth function because its value for a single hypothesis space $\mathcal{H}$ (a.k.a. the size of the restricted subspace $\mathcal{H}_{|S}$) grows as the size of the dataset grows. Now we can say that:

\[\left|\mathcal{H}_{|S \cup S'}\right| \leq \Delta_\mathcal{H}(2m)\]

Notice that we used $2m$ because we have two datasets $S, S'$, each with size $m$.

For the binary classification case, we can say that:

\[\Delta_\mathcal{H}(m) \leq 2^m\]

But $2^m$ is exponential in $m$ and would grow too fast for large datasets, which makes the bound in our inequality deteriorate too fast! Is that the best bound we can get on that growth function?

The VC-Dimension

The $2^m$ bound is based on the assumption that the hypothesis space $\mathcal{H}$ can produce all the possible labellings on the $m$ data points. If a hypothesis space can indeed produce all the possible labellings on a set of data points, we say that the hypothesis space shatters that set.

But can any hypothesis space shatter any dataset of any size? Let’s investigate that with the binary classification case and the $\mathcal{H}$ of linear classifiers $\mathrm{sign}(wx + b)$. The following animation shows how many ways a linear classifier in 2D can label 3 points (on the left) and 4 points (on the right).

In the animation, the whole space of possible effective hypotheses is swept. For the three points, the hypothesis space shattered the set of points and produced all the possible $2^3 = 8$ labellings. However, for the four points, the hypothesis space couldn’t get more than 14 and never reached $2^4 = 16$, so it failed to shatter this set of points. Actually, no linear classifier in 2D can shatter any set of 4 points, not just that set, because there will always be at least two labellings that cannot be produced by a linear classifier, as depicted in the following figure.

From the decision boundary plot (on the right), it’s clear why no linear classifier can produce such labellings, as no linear classifier can divide the space in this way. So it’s possible for a hypothesis space $\mathcal{H}$ to be unable to shatter all sizes. This fact can be used to get a better bound on the growth function, and this is done using Sauer’s lemma:

If a hypothesis space $\mathcal{H}$ cannot shatter any dataset with size more than $k$, then: \[\Delta_{\mathcal{H}}(m) \leq \sum_{i=0}^{k}\binom{m}{i}\]

This was the other key part of Vapnik-Chervonenkis work (1971), but it’s named after another mathematician, Norbert Sauer; because it was independently proved by him around the same time (1972). However, Vapnik and Chervonenkis weren’t completely left out from this contribution; as that $k$, which is the maximum number of points that can be shattered by $\mathcal{H}$, is now called the Vapnik-Chervonenkis-dimension or the VC-dimension $d_{\mathrm{vc}}$ of $\mathcal{H}$.

For the case of the linear classifier in 2D, $d_\mathrm{vc} = 3$. In general, it can be proved that hyperplane classifiers (the higher-dimensional generalization of line classifiers) in $\mathbb{R}^n$ space have $d_\mathrm{vc} = n + 1$.
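The counts 8 and 14 can be checked by brute force. The sketch below enumerates every labelling of a small 2D point set and tests whether a linear classifier $\mathrm{sign}(wx + b)$ can realize it. As a separability test it runs a perceptron with a capped number of epochs; treating a run that does not converge within the cap as “not separable” is an assumption of this sketch that is safe for tiny integer point sets like these, but it is not a general-purpose test. The point coordinates are our own choice.

```python
from itertools import product

def linearly_separable(points, labels, max_epochs=1000):
    """Perceptron-based separability check for 2D points with labels in {-1, +1}.

    Returns True as soon as one full pass makes no mistakes. If the cap is hit,
    the labelling is assumed to be non-separable (an assumption of this sketch).
    """
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x0, x1), y in zip(points, labels):
            if y * (w0 * x0 + w1 * x1 + b) <= 0:  # misclassified or on the boundary
                w0 += y * x0
                w1 += y * x1
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def count_labellings(points):
    """Number of distinct labellings a 2D linear classifier can produce on the points."""
    all_labellings = product((-1, 1), repeat=len(points))
    return sum(linearly_separable(points, labels) for labels in all_labellings)

three_points = [(0, 0), (1, 0), (0, 1)]
four_points = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(count_labellings(three_points))  # all 2^3 labellings: the set is shattered
print(count_labellings(four_points))   # fewer than 2^4: the XOR-like labellings are missing
```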

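The bound on the growth function provided by Sauer’s lemma is indeed much better than the exponential one we already have; it’s actually polynomial! Using algebraic manipulation, we can prove that:

\[\Delta_\mathcal{H}(m) \leq \sum_{i=0}^{k}\binom{m}{i} \leq \left(\frac{me}{k}\right)^k \leq O(m^k)\]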

where $O$ refers to the Big-O notation for a function’s asymptotic (near the limits) behavior, and $e$ is the mathematical constant (the base of the natural logarithm).
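To see how much is gained over the exponential bound, here is a small Python sketch comparing $2^m$, the Sauer sum $\sum_{i=0}^{k}\binom{m}{i}$, and the polynomial bound $(me/k)^k$ (the latter is valid for $m \geq k$). The function names are ours.

```python
import math

def sauer_bound(m: int, k: int) -> int:
    """Sauer's lemma: sum of C(m, i) for i = 0..k."""
    return sum(math.comb(m, i) for i in range(k + 1))

def polynomial_bound(m: int, k: int) -> float:
    """The looser closed form (m * e / k) ** k, valid for m >= k."""
    return (math.e * m / k) ** k

k = 3  # e.g. linear classifiers in 2D
for m in (3, 4, 10, 100):
    print(m, 2 ** m, sauer_bound(m, k), round(polynomial_bound(m, k), 1))
```

For $m \leq k$ the Sauer sum equals $2^m$, which matches the fact that sets of up to $k$ points can be shattered; beyond that it falls behind the exponential very quickly.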

Thus we can use the VC-dimension as a proxy for the growth function and, hence, for the size of the restricted space $\mathcal{H}_{|S}$. In that case, $d_\mathrm{vc}$ would be a measure of the complexity or richness of the hypothesis space.

The VC Generalization Bound

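With a little change in the constants, it can be shown that Hoeffding’s inequality is applicable to the probability $\mathbb{P}\left[|R_\mathrm{emp}(h) - R_\mathrm{emp}'(h)| > \frac{\epsilon}{2}\right]$. With that, and by combining the symmetrization lemma with the union bound over the restricted space and the bound on its size by the growth function, the Vapnik-Chervonenkis theory follows:

\[\mathbb{P}\left[\sup_{h \in \mathcal{H}}|R(h) - R_\mathrm{emp}(h)| > \epsilon\right] \leq 4\Delta_\mathcal{H}(2m)\exp\left(-\frac{m\epsilon^2}{8}\right)\]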

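This can be re-expressed as a bound on the generalization error, just as we did earlier with the previous bound, to get the VC generalization bound:

\[R(h) \leq R_\mathrm{emp}(h) + \sqrt{\frac{8\ln\Delta_\mathcal{H}(2m) + 8\ln\frac{4}{\delta}}{m}}\]

or, by using the bound on the growth function in terms of $d_\mathrm{vc}$, as:

\[R(h) \leq R_\mathrm{emp}(h) + \sqrt{\frac{8d_\mathrm{vc}\left(\ln\frac{2m}{d_\mathrm{vc}} + 1\right) + 8\ln\frac{4}{\delta}}{m}}\]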

Professor Vapnik standing in front of a white board that has a form of the VC-bound and the phrase “All your bayes are belong to us”, which is a play on the broken English phrase found in the classic video game Zero Wing, in a claim that the VC framework of inference is superior to that of Bayesian inference. [Courtesy of Yann LeCun].

This is a significant result! It’s a clear and concise mathematical statement that the learning problem is solvable, and that for infinite hypothesis spaces there is a finite bound on their generalization error! Furthermore, this bound can be described in terms of a quantity ($d_\mathrm{vc}$) that solely depends on the hypothesis space and not on the distribution of the data points!
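As a rough numerical illustration, the following Python sketch evaluates the gap term of the VC bound in its commonly stated form, $\sqrt{\left(8 d_\mathrm{vc}(\ln\frac{2m}{d_\mathrm{vc}} + 1) + 8\ln\frac{4}{\delta}\right)/m}$. The constants assume that form of the bound, and the function name and sample values are ours.

```python
import math

def vc_gap(d_vc: int, m: int, delta: float) -> float:
    """Gap term of the VC generalization bound (requires 2 * m >= d_vc).

    d_vc:  VC-dimension of the hypothesis space
    m:     number of training samples
    delta: allowed failure probability (confidence is 1 - delta)
    """
    growth_term = d_vc * (math.log(2.0 * m / d_vc) + 1.0)
    return math.sqrt((8.0 * growth_term + 8.0 * math.log(4.0 / delta)) / m)

# Linear classifiers in R^n have d_vc = n + 1.
for n, m in ((2, 10_000), (2, 1_000_000), (1000, 1_000_000)):
    print(n, m, round(vc_gap(n + 1, m, delta=0.05), 4))
```

The gap is finite for any finite $d_\mathrm{vc}$, even though the hypothesis space itself is infinite, and it shrinks as $m$ grows relative to $d_\mathrm{vc}$.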

Now, in light of these results, is there any hope for the memorization hypothesis?

It turns out that there’s still no hope! The memorization hypothesis can shatter any dataset no matter how big it is, which means that its $d_\mathrm{vc}$ is infinite, yielding an infinite bound on $R(h_\mathrm{mem})$ as before. However, the success of linear hypotheses can now be explained by the fact that they have a finite $d_\mathrm{vc} = n + 1$ in $\mathbb{R}^n$. The theory is now consistent with the empirical observations.

Distribution-Based Bounds

The fact that $d_\mathrm{vc}$ is distribution-free comes with a price: by not exploiting the structure and the distribution of the data samples, the bound tends to get loose. Consider for example the case of linear binary classifiers in a very high $n$-dimensional feature space: using the distribution-free $d_\mathrm{vc} = n + 1$ means that the bound on the generalization error would be poor unless the size of the dataset $m$ is also very large to balance the effect of the large $d_\mathrm{vc}$. This is the good old curse of dimensionality we all know and endure.

However, a careful investigation into the distribution of the data samples can bring more hope to the situation. For example, for data points that are linearly separable, contained in a ball of radius $R$, with a margin $\rho$ between the closest points in the two classes, one can prove that for a hyperplane classifier:

\[d_\mathrm{vc} \leq \left\lceil \frac{R^2}{\rho^2} \right\rceil\]

It follows that the larger the margin, the lower the $d_\mathrm{vc}$ of the hypothesis. This is the theoretical motivation behind Support Vector Machines (SVMs), which attempt to classify data using the maximum margin hyperplane. This was also proved by Vapnik and Chervonenkis.

One Inequality to Rule Them All

Up until this point, all our analysis was for the case of binary classification. And it’s indeed true that the form of the VC bound we arrived at here only works for the binary classification case. However, the conceptual framework of VC (that is: shattering, growth function and dimension) generalizes very well to both multi-class classification and regression.

Due to the work of Natarajan (1989), the Natarajan dimension is defined as a generalization of the VC-dimension for multi-class classification, and a bound similar to the VC-Bound is derived in terms of it. Also, through the work of Pollard (1984), the pseudo-dimension generalizes the VC-dimension for the regression case with a bound on the generalization error also similar to VC’s.

There is also Rademacher’s complexity, which is a relatively new tool (devised in the 2000s) that measures the richness of a hypothesis space by measuring how well it can fit random noise. The cool thing about Rademacher’s complexity is that it’s flexible enough to be adapted to any learning problem, and it yields very similar generalization bounds to the other methods mentioned.
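To make “how well it can fit random noise” concrete, here is a minimal Monte Carlo sketch of the empirical Rademacher complexity, $\mathbb{E}_\sigma\left[\sup_{h} \frac{1}{m}\sum_i \sigma_i h(x_i)\right]$, for a finite set of hypotheses represented only by their $\pm 1$ predictions on $m$ fixed points. The representation, the function name and the number of random draws are assumptions made for this illustration.

```python
import random
from itertools import product

def empirical_rademacher(predictions, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    predictions: list of prediction vectors, one per hypothesis, each a
                 sequence of +1/-1 values on the same m data points.
    """
    rng = random.Random(seed)
    m = len(predictions[0])
    total = 0.0
    for _ in range(n_draws):
        # Random noise labels (Rademacher variables), one per data point.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Best correlation with the noise achievable by any hypothesis.
        total += max(sum(s * p for s, p in zip(sigma, h)) for h in predictions) / m
    return total / n_draws

m = 8
single = [tuple([1] * m)]                # one fixed hypothesis: cannot follow noise
rich = list(product((-1, 1), repeat=m))  # every labelling: fits any noise perfectly
print(round(empirical_rademacher(single), 3))
print(empirical_rademacher(rich, n_draws=200))
```

A space that realizes every labelling (like the memorization hypothesis) scores the maximum value of 1, while a single fixed hypothesis scores close to 0.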

However, no matter what the exact form of the bound produced by any of these methods is, it always takes the form:

\[R(h) \leq R_\text{emp}(h) + C(|\mathcal{H}|, m, \delta)\]

where $C$ is a function of the hypothesis space complexity (or size, or richness), the size of the dataset $m$, and the confidence $1 - \delta$ about the bound. This inequality basically says the generalization error can be decomposed into two parts: the empirical training error, and the complexity of the learning model.

This form of the inequality holds for any learning problem no matter the exact form of the bound, and this is the one we’re gonna use throughout the rest of the series to guide us through the process of machine learning.

Mostafa Samir

Wandering in a lifelong journey seeking after truth.

Title: Hypothesis Spaces for Deep Learning

Abstract: This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs). By treating a DNN as a function of two variables, the physical variable and parameter variable, we consider the primitive set of the DNNs for the parameter variable located in a set of the weight matrices and biases determined by a prescribed depth and widths of the DNNs. We then complete the linear span of the primitive DNN set in a weak* topology to construct a Banach space of functions of the physical variable. We prove that the Banach space so constructed is a reproducing kernel Banach space (RKBS) and construct its reproducing kernel. We investigate two learning models, regularized learning and minimum interpolation problem in the resulting RKBS, by establishing representer theorems for solutions of the learning models. The representer theorems unfold that solutions of these learning models can be expressed as linear combination of a finite number of kernel sessions determined by given data and the reproducing kernel.

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Functional Analysis (math.FA)

Genetic algorithm: Hypothesis space search

As our illustrative example already suggests, genetic algorithms employ a randomized beam search method to seek maximally fit hypotheses. This search differs from that of other methods. The gradient descent search in backpropagation moves smoothly from one hypothesis to another, very similar one. The genetic algorithm search, on the other hand, can move much more abruptly: it replaces parent hypotheses with offspring that can be very different from the parents. For this reason, the genetic algorithm search is less likely to fall into the same kind of local minima that plague gradient descent methods.

One practical difficulty that is often encountered in genetic algorithms is crowding. Crowding is the phenomenon in which an individual that is more fit than the others reproduces quickly, so that copies of this individual and of very similar individuals take over a large fraction of the population. Several strategies can be used to reduce crowding, most of them inspired by biological evolution. One such strategy is fitness sharing, in which the measured fitness of an individual is reduced by the presence of other, similar individuals in the population. Another is to restrict which individuals are allowed to recombine to form offspring. For example, by allowing only individuals of the same kind to recombine, clusters of similar individuals are encouraged to form, producing multiple subspecies within the population.

Another method would be to spatially distribute individuals and allow only nearby individuals to combine.
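Fitness sharing can be sketched in a few lines of Python. The version below divides each individual’s raw fitness by a niche count that grows with the number of similar individuals, using Hamming distance between bit strings and a linear sharing function. The distance measure, the sharing function, the radius and the toy fitness are all assumptions chosen for illustration; the text above does not prescribe a specific formula.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, radius):
    """Reduce each individual's fitness according to how crowded its niche is.

    population:  list of equal-length bit strings
    raw_fitness: function mapping a bit string to a positive number
    radius:      individuals closer than this (in Hamming distance) share fitness
    """
    shared = []
    for ind in population:
        # Each neighbour contributes between 0 and 1; the individual itself contributes 1.
        niche_count = sum(
            max(0.0, 1.0 - hamming(ind, other) / radius) for other in population
        )
        shared.append(raw_fitness(ind) / niche_count)
    return shared

population = ["1111", "1111", "1110", "0000"]
fitness = lambda bits: bits.count("1") + 1  # toy fitness, for illustration only
print(shared_fitness(population, fitness, radius=2))
```

In this toy population the crowded cluster around 1111 loses most of its advantage, while the isolated individual 0000 keeps its full raw fitness.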

Population evolution and schema theorem

The schema theorem of Holland mathematically characterizes the evolution of the population over time. It is based on the concept of a schema. So, what is a schema? A schema is any string composed of 0s, 1s, and *s, where * is a “don’t care” symbol that matches either bit, so the schema 0*10 represents exactly the two strings 0010 and 0110. The schema theorem characterizes the evolution within a genetic algorithm in terms of the number of instances representing each schema. Let m(s, t) denote the number of instances of schema s in the population at time t. The schema theorem describes the expected value of m(s, t+1) in terms of m(s, t) and other properties of the schema, the population, and the GA.
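The notion of a schema and the count m(s, t) are easy to express in code. Here is a minimal Python sketch; the function names and the sample population are ours.

```python
def matches(schema: str, bits: str) -> bool:
    """True if the bit string is an instance of the schema ('*' matches either bit)."""
    return len(schema) == len(bits) and all(
        s in ("*", b) for s, b in zip(schema, bits)
    )

def count_instances(schema: str, population) -> int:
    """m(s, t): number of individuals in the current population that match the schema."""
    return sum(matches(schema, individual) for individual in population)

population = ["0010", "0110", "1010", "0010"]
print(matches("0*10", "0010"), matches("0*10", "0110"), matches("0*10", "1010"))
print(count_instances("0*10", population))
```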

In a genetic algorithm, the evolution of the population depends on the selection step, the recombination step, and the mutation step. The schema theorem is one of the most widely used characterizations of population evolution within a genetic algorithm. It is incomplete, however, in that it fails to consider the positive effects of crossover and mutation. Many more recent theoretical analyses have been proposed, including analyses based on Markov chain models and on statistical mechanics models.
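For reference, the schema theorem is usually stated in textbook treatments in the following form, for fitness-proportionate selection with single-point crossover and bit-flip mutation. The symbols other than m(s, t) are not defined in the text above and are introduced here: $\hat{u}(s,t)$ is the average fitness of the instances of schema $s$ at time $t$, $\bar{f}(t)$ is the average fitness of the whole population, $p_c$ and $p_m$ are the crossover and mutation probabilities, $l$ is the string length, $d(s)$ is the distance between the leftmost and rightmost defined bits of $s$, and $o(s)$ is the number of defined (non-*) bits in $s$.

\[E[m(s, t+1)] \geq \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t) \left(1 - p_c \frac{d(s)}{l - 1}\right)(1 - p_m)^{o(s)}\]

The last two factors only account for crossover and mutation destroying instances of the schema, which is the sense in which the theorem ignores their positive effects.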

Computational Learning Theory

Sample Complexity for Finite Hypothesis Spaces

The growth in the number of required training examples with problem size is called the sample complexity of the learning problem.
We will consider only consistent learners , which are those that maintain a training error of 0.
We can derive a bound on the number of training examples required by any consistent learner!
Fact: Every consistent learner outputs a hypothesis belonging to the version space.
Therefore, we need to bound the number of examples needed to assure that the version space contains no unacceptable hypothesis.
The version space $VS_{H,D}$ is said to be ε-exhausted with respect to $c$ and $\cal{D}$, if every hypothesis $h$ in $VS_{H,D}$ has error less than ε with respect to $c$ and $\cal{D}$. \[(\forall h \in VS_{H,D}) error_{\cal{D}}(h) < \epsilon \]
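The bound that this argument leads to is usually stated as $m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$: with that many training examples, the version space is ε-exhausted with probability at least $1 - \delta$. The Python sketch below simply evaluates it. The function name is ours, and the example hypothesis space (conjunctions over 10 boolean features, so $|H| = 3^{10}$) is an illustrative assumption.

```python
import math

def consistent_learner_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient for any consistent learner over a finite H.

    h_size:  number of hypotheses |H|
    epsilon: maximum acceptable true error
    delta:   allowed probability that the version space is not epsilon-exhausted
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Conjunctions over 10 boolean features: each feature is required true,
# required false, or ignored, giving 3 ** 10 hypotheses.
print(consistent_learner_sample_bound(3 ** 10, epsilon=0.1, delta=0.05))
```

The required number of examples grows linearly in $1/\epsilon$ but only logarithmically in $|H|$ and $1/\delta$.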

José M. Vidal

Open access
Published: 02 September 2024

Adipose stem cells are sexually dimorphic cells with dual roles as preadipocytes and resident fibroblasts

Martin Uhrbom (ORCID: orcid.org/0009-0006-5417-1210),
Lars Muhl (ORCID: orcid.org/0000-0003-0952-0507),
Guillem Genové,
Jianping Liu (ORCID: orcid.org/0000-0002-7336-3895),
Henrik Palmgren,
Ida Alexandersson (ORCID: orcid.org/0000-0003-3038-3893),
Fredrik Karlsson,
Alex-Xianghua Zhou (ORCID: orcid.org/0000-0002-3734-4638),
Sandra Lunnerdal,
Sonja Gustafsson,
Byambajav Buyandelger,
Kasparas Petkevicius,
Ingela Ahlstedt,
Daniel Karlsson,
Leif Aasehaug,
Liqun He (ORCID: orcid.org/0000-0003-2127-7597),
Marie Jeansson (ORCID: orcid.org/0000-0003-1075-8563),
Christer Betsholtz (ORCID: orcid.org/0000-0002-8494-971X) &
Xiao-Rong Peng (ORCID: orcid.org/0000-0002-8914-0194)

Nature Communications volume 15, Article number: 7643 (2024)

Cell biology
Fat metabolism
Mesenchymal stem cells

Cell identities are defined by intrinsic transcriptional networks and spatio-temporal environmental factors. Here, we explored multiple factors that contribute to the identity of adipose stem cells, including anatomic location, microvascular neighborhood, and sex. Our data suggest that adipose stem cells serve a dual role as adipocyte precursors and fibroblast-like cells that shape the adipose tissue’s extracellular matrix in an organotypic manner. We further find that adipose stem cells display sexual dimorphism regarding genes involved in estrogen signaling, homeobox transcription factor expression and the renin-angiotensin-aldosterone system. These differences could be attributed to sex hormone effects, developmental origin, or both. Finally, our data demonstrate that adipose stem cells are distinct from mural cells, and that the state of commitment to adipogenic differentiation is linked to their anatomic position in the microvascular niche. Our work supports the importance of sex and microvascular function in adipose tissue physiology.

Introduction

Adipose tissues (AT) comprise white (W) and brown (B) AT and putative intermediates that play critical roles in systemic metabolism through regulation of energy utilization, adaptive thermogenesis, and adipokine release [1, 2, 3]. Maladaptive expansion of WAT from over-nutrition poses a significant risk for type-2-diabetes (T2D), cardiovascular disease (CVD), and overall mortality [4, 5]. Efforts to deepen the understanding of mechanisms that regulate cellular identity, heterogeneity, and developmental fate of AT resident cells can have profound implications for the identification of future therapeutic interventions for the treatment of obesity and T2D [6, 7, 8].

To accommodate the need for variable nutrient storage and energy mobilization, WAT is one of the most dynamic tissues in the adult mammal. Expansion of WAT involves both cellular hypertrophy (increased adipocyte size) and hyperplasia (increased adipocyte number), the latter resulting from differentiation of resident adipose tissue progenitor cells [9, 10]. Region-specific expansion of WAT displays strong sexual dimorphism in most mammals and correlates with differences in energy metabolism and disease risks. Women in the premenopausal age tend to store fat predominantly in subcutaneous (sc)WAT, which confers protective effects against obesity-related metabolic dysfunction. Conversely, men are prone to expand visceral (v)WAT depots, which is associated with an increased risk of T2D and CVD [11, 12, 13, 14, 15, 16]. The underlying molecular mechanisms driving these sex differences remain largely unknown, although homeostatic control by sex hormones and developmental imprinting of cell-intrinsic properties have been implicated [10].

More than 50% of the cells in AT are stromal, including endothelial cells, vascular mural cells (a unifying term for pericytes and smooth muscle cells), fibroblasts, and resident immune cells [17]. Recent advances in technologies such as fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) have provided new insights into the different WAT cell types, suggesting that mechanisms governing WAT expansion are more complex than previously anticipated and involve different populations of adipose stem cells (ASC). Two or three subpopulations of ASC have been identified in vWAT and scWAT in mice and scWAT in humans, albeit with some differences in claims regarding functional properties and adipogenic potential [6, 18, 19, 20, 21].

Here, we used scRNA-seq to transcriptionally profile the stromal vascular fraction (SVF) of perigonadal (pg)WAT, a type of vWAT, from male and female Pdgfrb-GFP transgenic reporter mice. We find that pgWAT ASC resemble fibroblasts present in skeletal muscle and heart and can be separated into three ASC subtypes consistent with previously proposed ASC classification [6]. We also find that pgWAT ASC exhibits distinct sex-specific gene expression signatures relevant to Hox gene expression and vaso-regulatory functions. Finally, we distinguish blood vessel-associated ASC from mural cells and show different ASC subtype features and sex-specific adipogenic differentiation propensity ex vivo.

By integrating multiple intrinsic and micro-environmental variables defining ASC identities, our findings shed light on WAT sexual dimorphism and spatial relationships between ASC and vascular cells in the WAT niche.

Cell classes in the stromal vascular fraction of perigonadal white adipose tissue

Stromal vascular fraction cells were collected from pgWAT of 12 to 20-week-old female and male transgenic Pdgfrb GFP reporter mice using fluorescent-activated cell sorting (FACS) or CD31 and DPP4 antibody panning (Fig. 1a ). ScRNA-seq was performed on a total of 3,261 cells using the SmartSeq2 (SS2) protocol 22 . Clustering of single-cell transcriptomes using the Seurat package 23 resulted in 17 cell clusters (Fig. 1b ). Single-cell transcriptome clustering was visualized using UMAP (uniform manifold approximation and projection) plots (Fig. 1b and Supplementary Table 1 ) and hierarchical clustering based on the Pearson’s correlation coefficient calculated from the scaled average expression of the marker genes for each cluster (hereon referred to as Pearson’s r) (Fig. 1c ). The two methods indicated similar relatedness between the clusters.

a Overview of methodology, pgWAT from both female (n = 5) and male (n = 3) mice. Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license ( https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en ). b Seurat clustering of complete dataset (17 clusters) and UMAP dimensional reduction visualization. c Hierarchical ordering of clusters based on Pearson’s r value of the scaled average expression of marker gene expression in each cluster with cell-type annotation of clusters and dotplot of selected marker genes. The 45-gene signature for mural and fibroblasts, respectively, as defined by Muhl et al. 25 , are displayed in the dotplot. d Organotypic matrisome genes in fibroblasts from pgWAT, heart and skeletal muscle. e Enriched molecular functions in adipose ASC when compared to fibroblasts from heart and skeletal muscle, barplot shows gene expression of the enriched genes involved in each function. Red mark in the spreadsheet means that the gene is represented in the molecular function and white mark means that it is not represented. f Dotplot over adipogenic gene features of fibroblasts/ASC versus mural cells in pgWAT. g Barplots over common stem cells markers used to identify mesenchymal stem cells. Data are presented as mean values +/- SEM. n represent number of cells for scRNA-seq data. Abbreviations: ASC= Adipose stem cell, EpC = Epithelial cells, EC = Endothelial cells, LEC = Lymphatic EC, MAC = Macrophages, Mito=mitochondrial, pg= perigonadal white adipose tissue, SEM=Standard error of the mean,UMAP=Uniform Manifold Approximation and Projection and VSMC = Vascular smooth muscle cells.

To provide provisional annotations to the 17 clusters, we compared cluster-enriched transcripts with known cell type-specific markers 24 , 25 , 26 , 27 (Fig. 1c ). This suggested that clusters #0, 2, 3, 4, 5, 6, 7, 10, and 13 contained fibroblasts-like cells positive for e.g. Pdgfra, Col1a1, Dcn, and Lum . Cluster #12 contained pericytes positive for e.g. Kcnj8 , Abcc9, Rgs5, and Higd1b . Cluster #14 contained vascular smooth muscle cells (VSMC) positive for e.g. Acta2 , Tagln , Myh11, and Mylk . Clusters #1, 8 and 9 contained blood vascular endothelial cells (EC) positive for e.g. Pecam1 , Cdh5, Kdr, and Cldn5 . Cluster #15 contained lymphatic EC positive for e.g. Prox1, Flt4, Lyve1, and Ccl21d . Cluster #16 contained epithelial cells positive for e.g. Epcam, Krt18, and Fgfr4 . Cluster #11 contained macrophages positive for e.g. Cd68, Cd14, Ccr5, and Cd163 . Cluster 9 and 13 displayed high levels of mitochondrial gene enrichment which indicates damaged/stressed cells, the two clusters were therefore removed from further analysis (Supplementary Fig. 1a ). Because fibroblasts and pericytes are closely related and display few unique markers, we applied a previously assigned 90-gene signature containing 45 fibroblast-enriched and 45 mural cell-enriched genes 25 to support our provisional annotations (Fig. 1c ).

Fibroblast-like cells were the most abundant SVF cell type in our dataset. Fibroblasts in other organs, including heart, skeletal muscle, colon, bladder, and lung show extensive organotypic gene expression 25 . A comparison of the pgWAT fibroblast-like cells (clusters #0, 2, 3, 4, 5, 6, 7, 10, and 13) to skeletal muscle and heart fibroblasts 25 revealed separation according to organ-of-origin using both UMAP and Pearson’s r plots based on the 1000 most variable genes (Supplementary Fig. 1b and Supplementary table 2 ), although all cells shared the 45-gene fibroblast signature (Fig. 1c ). We next investigated if this organotypicity reflected differential expression of any particular class of genes. Genes for the matrisome , which includes extracellular matrix (ECM) and ECM-modulating proteins 28 , caused a similar dispersal of fibroblast clusters as the 1000 most variable transcripts. In contrast, genes encoding other functional categories of proteins caused markedly less dispersal (Supplementary Fig. 1b ). This suggests that the organotypicity of pgWAT fibroblast-like cells mainly reflects differential expression of matrisome genes in agreement with previous conclusions regarding the transcriptional basis for fibroblast differences between other organs (Fig. 1d ) 25 .

In addition to matrisome differences, pgWAT fibroblast-like cells were distinguished from heart and muscle fibroblasts by expressing key genes in adipogenesis and lipid metabolism such as Pparg, Fabp4, Plin2 and Adipoq (Supplementary Fig. 1c ). Ingenuity Pathway Analysis (IPA) indeed suggested that pgWAT fibroblast-like cells display enrichment for molecular functions associated with lipid metabolism (Fig. 1e ). This was confirmed in previously published WAT scRNA-seq datasets (Supplementary Fig. 1d and Supplementary Table 3 ) 18 , 19 , 20 , 29 , 30 , 31 , 32 . We next asked if the pgWAT fibroblast-like cells correspond to ASC (a.k.a. pre-adipocytes). Three subtypes of ASC have previously been described, called ASC1a, ASC1b and ASC2 6 . Using signature markers, we found that all fibroblast-like clusters in our data matched the gene expression signatures of either ASC1a, ASC1b, or ASC2 (Fig. 1f ).

While these data provide evidence that ASC correspond to fibroblast-like cells, mural cells have also been proposed to be pre-adipocytes 33 , 34 , 35 . When comparing pgWAT fibroblast-like and mural cells, we found that the stem cell marker Itgb1 (a.k.a. Cd29 ) was expressed by both cell types, whereas Cd34 / Ly6a were expressed by fibroblasts and Mcam by mural cells (Fig. 1g ). Most markers of ASC (Fig. 1f ), lipid metabolism and adipogenesis (Supplementary Fig. 2a,b ) were enriched in the fibroblast-like cells. Pparg , Plin2, and Fabp4 were equal or higher in pericytes (Supplementary Fig. 2b ), but because these genes had their highest expression in EC, we asked if contamination of pericytes by EC cell fragments, a commonly observed phenomenon 24 , 36 , could explain the presence of Pparg in our WAT pericytes. In support of this, we noted the presence of numerous canonical EC markers ( Pecam1, Ptprb, Cdh5, Tie1, and Cldn5 ) in pericytes at levels matching their level of Pparg , i.e. about 50% of that seen in EC (Supplementary Fig. 2b ). Not all pericytes were equally EC-contaminated, and after removal of pericytes positive for Pecam1, Ptprb, Cdh5, Tie1 and Cldn5 , the remaining pericytes showed low expression of Pparg, Plin2 and Fabp4 (Supplementary Fig. 2a, b ). Therefore, the abundance of Pparg, Plin2, and Fabp4 in pgWAT pericytes likely reflects contamination by EC.

Taken together, matrisome and adipogenic gene expression suggests that pgWAT fibroblast-like cells fulfill a dual role to shape the ECM that provides structural support to WAT (i.e. act as resident tissue fibroblasts) and to act as a reservoir of adipocyte precursors. Whether pgWAT mural cells contribute to adipogenesis as more distant progenitors of pre-adipocytes remains to be addressed.

Previous studies have suggested that ASC2 represents less committed and more multipotent adipocyte progenitors, whereas ASC1a represents a more committed stage of adipocyte differentiation 21 . The position of ASC1b cells in adipocyte differentiation will be discussed below. Because all clusters of pgWAT fibroblast-like cells in our dataset matched previously assigned ASC categories (Fig. 1f ), we will in the following refer to them as ASC.

Marker gene signature of sexually dimorphic ASC

For each ASC category (ASC1a, ASC1b, ASC2), we found at least two clusters located in different UMAP islands reflecting male or female origin (Figs. 1 b, f and 2a ). ASC1a cells (enriched with Col15a1, G0s2 and Cxcl14 ) were found in cluster 0 (male) and clusters 4 and 5 (female). ASC1b cells (enriched with Clec11a and Fmo2 ) were found in cluster 10 (male) and cluster 2 (female). ASC2 cells (enriched with Dpp4, Cd55 and Arl4d ) were found in cluster 7 (male) and clusters 3 and 6 (female). Some separation in UMAP between males and females was also observed for EC and macrophages but less conspicuously compared to ASC (Fig. 2a ). This suggests that the sexual dimorphism of ASC goes beyond the sex-specific expression of Y-chromosome genes and X-chromosome inactivation-associated genes present in all cells.

a UMAP-projection with male and female cells highlighted. b Volcano plot of differentially expressed genes between male and female ASC. Fold changes were calculated by EdgeR-LRT and p-values were adjusted for multiple testing using the Benjamini-Hochberg method. c Venn diagram indicating the 36 common differentially expressed genes in scRNA-seq (from pgWAT) and FACS-sorted ASC bulk RNA-seq samples from both pgWAT and iWAT. The 104 sexually dimorphic DEGs specific to pgWAT and the 29 DEGs specific for bulk RNA-seq samples are also highlighted. d Dot plot of the expression of the core set of 36 sexually dimorphic genes in ASC and endothelial cells from our scRNA-seq data with an outlook into the Tabula Muris consortium’s scRNA-seq mouse 20 organs database for mesenchymal stem cells in perigonadal, inguinal and mesenteric adipose tissue 38 . e Same as in d but for Hox gene expression. Abbreviations: ASC Adipose stem cells, DEGs Differentially expressed genes, i inguinal, LRT Likelihood Ratio Test, m mesenteric, pg perigonadal, sc single cell, RNA-seq RNA-sequencing, UMAP Uniform Manifold Approximation and Projection and WAT White adipose tissue.
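The Benjamini-Hochberg adjustment named in panel b can be illustrated with a minimal Python sketch. It is not the EdgeR code used for the analysis; the function name and the example p-values are hypothetical and only show how each p-value is scaled by the number of tests divided by its rank, after which monotonicity is enforced from the largest p-value downwards.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that an adjusted
    # value never exceeds the adjusted value of a larger raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not taken from the study).
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes would then be called differentially expressed when the adjusted value falls below the chosen false discovery rate threshold, which is defined in the Methods and not assumed here.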

To validate the sexually dimorphic ASC signatures in independent experiments, we performed bulk RNA-seq on ASC subpopulations isolated by FACS as CD45 − /CD31 − /CD34 + /DPP4 ± cells from adult male and female iWAT and pgWAT. CD45 − /CD31 − /CD34 + selection enriches for ASC 37 , while DPP4 ± distinguishes ASC2 (DPP4 + ) from ASC1a/b (DPP4 − ) (Supplementary Fig. 3a ). We also performed bulk RNA-seq on mature adipocytes from the same mice. Overall, the bulk RNA-seq signatures matched those established by scRNA-seq. High sequence counts for fibroblast markers (e.g. Dcn, Lum ) and low counts for markers of other cell types (e.g. Cd68, Pecam1, Kcnj8, Cspg4, Prox1, Lep ) supported purity of the isolated ASC, and the enrichment of marker genes for ASC1a, ASC1b and ASC2 matched between scRNA-seq and bulk RNA-seq data (Supplementary Figs. 4a, b ).

Using strict criteria for defining differentially expressed genes (DEGs) (see Methods), we assigned a core set of 36 sexually dimorphic DEGs in ASC identified in both scRNA-seq and bulk RNA-seq data (Fig. 2c and Supplementary Table 4 ), the latter obtained from both pgWAT and iWAT. When limiting the comparison to pgWAT, an additional 104 sexually dimorphic DEGs were identified (Fig. 2c ). Restricting the comparison to bulk RNA-seq samples from pgWAT and iWAT, 29 additional sexually dimorphic DEGs were suggested (Fig. 2c ). Five of 36 sexually dimorphic DEGs were sex-chromosome encoded (X chromosome: Xist, Heph , Prrg3 and Y chromosome: Ddx3y , Eif2s3y ). Only one of the 36 genes was common between ASC and EC ( Xist , Fig. 2d ). We conclude that the sexually dimorphic DEG pattern is largely cell-type specific and includes several genes associated with lipid handling that are enriched in male ASC (but not in EC) including Sult1e1, Agt, Avpr1a and S1pr3 (Figs. 1 e and 2c ).

To further validate the mouse ASC sexually dimorphic genes using independent data, we explored the publicly available Tabula Muris scRNA-seq dataset, comprising cells from 20 different organs 38 . We selected inguinal (i), perigonadal (pg), and mesenteric (m) cells annotated by the authors 38 as mesenchymal stem cells (MSC, a common term for fibroblast-like cells 36 ) and found sex differences in the Tabula Muris iWAT and pgWAT MSC that matched our pgWAT ASC data across the core set of 36 sexually dimorphic DEGs, including most of the additional 104 DEGs specific to pgWAT as well as the 29 DEGs restricted to bulk samples (Fig. 2d , Supplementary Figs. 5a–c and Supplementary Table 5 ). The Tabula Muris mesenteric WAT MSC also matched our pgWAT profile, with exceptions such as Ptpn5, Pgr, Slc25a30 , Sult1e1, Agt and Heph . The similarities and differences in sexually dimorphic gene expression between the different WAT depots may be biologically relevant. For example, male enrichment of Sult1e1 , encoding a sulfotransferase involved in the inactivation of estradiol 39 , may inhibit mammary gland formation in male iWAT 40 .

To investigate the impact of sex hormones on the expression of the 36-gene core set of sexually dimorphic DEGs, we performed bulk RNA-seq on sorted CD45 − /CD31 − /CD34 + /DPP4 ± cells from iWAT in castrated/ovariectomized and control mice. The male enriched transcripts C7, Sult1e1, Agt, Arl4a, Fkbp5, Angpt1, Arhgap24 and Ace were reduced in samples from castrated males (Supplementary Fig. 5d ), suggesting that their expression is controlled by androgens.

To make a provisional comparison with human, we explored publicly available single-nuclear (sn)RNA-seq data from human WAT 41 . FKBP5 , SVEP1 and EGFR , the human orthologs of mouse Fkbp5 , Svep1 and Egfr displayed higher expression in human male subcutaneous ASC and mature adipocytes compared to female cells, consistent with the mouse data (Supplementary Fig. 6a ). A consistent change in the direction of differential expression between mouse and human was also observed in ASC from human omental (visceral) depot for human orthologs of the mouse sexually dimorphic genes Fkbp5, Esr1 and C7 (Fig. 2c and Supplementary Figs. 6a, b ). Other orthologs of mouse sexually dimorphic ASC genes were not confirmed (Supplementary Figs. 6a, b ), but the significance of this is uncertain owing to the different technical platforms and depth of sequencing data. In conclusion, while confirming part of the mouse sexually dimorphic ASC gene expression pattern, additional and deeper human data will be required for a comprehensive comparison.

One of the most significant DEGs in female mouse pgWAT was Hoxa10 (Fig. 2b ), an observation that prompted a broader analysis of Hox transcripts. Male ASC showed enriched expression of Hox transcripts with lower numbers ( Hox(abc)1-8 ), whereas the opposite pattern was observed in females (enriched expression of Hox(acd)9-13 ) (Fig. 2e ). This Hox pattern was confirmed in bulk RNA-seq data from pgWAT ASC for 21 out of 25 genes (Supplementary Fig. 6c, d ). The Hox pattern was not observed in pgWAT EC, but was present in pgWAT MSC from the Tabula Muris dataset. Intriguingly, the Hox pattern was not observed in iWAT or mesenteric MSC (Fig. 2e and Supplementary Fig. 6c, d ). No clear trend towards similar patterns was observed in human subcutaneous and visceral (from the omental depot) ASC (Supplementary Fig. 6e ) 41 . The physiological relevance of sexually dimorphic Hox gene expression remains to be determined. It may reflect a different developmental history of pgWAT in males and females.

Sexually dimorphic pathways include RAAS and glucose metabolism disorder

We next used IPA to search for signaling pathways and cellular functions potentially affected by sexually dimorphic gene expression patterns. IPA analysis suggested Enhanced Renin-Angiotensin-Aldosterone-System (RAAS) pathway in male ASC (Fig. 3a ). Moreover, one of the top enriched terms for diseases and biological functions associated with the 36-gene core set of sexually dimorphic genes was Glucose Metabolism Disorder (Supplementary Figs. 7a, b ).

a Enriched canonical pathways in male and female ASC based on the core set of 36 sexually dimorphic genes. P-Values are derived from IPA-analysis and based on the right-tailed Fisher’s Test. b Schematic cartoon over the RAAS-system and the expression of its main components in adipose tissue. Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license ( https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en ). c Barplots of the expression of RAAS-associated genes in scRNA-seq dataset, FACS sorted ASC populations (bulk RNA-seq) and mature adipocytes (bulk RNA-seq). For all bulk samples n = 7 biological replicates except ASC1 male iWAT and mature adipocytes samples ( n = 8). d AngII effect on lipolysis from iWAT explants. n = 3 biological replicates, P-value = 0.0092 e AngII effect on in vitro differentiation of crude SVF cells from iWAT. n = 5 and represent five independent experiments. f Expression of detected RAAS-associated genes in cultivated SVF cells prior to initiation of differentiation. n = 9 technical well replicates from three independent experiments g Expression of Esr1 in ASC from pgWAT (scRNA-seq data). Statistics in Fig. 3d, e were calculated with two-way Anova using Sidak’s multiple comparisons test. Data are presented as mean values +/- SEM for c, f,g and mean values +/- SD for d and e . AngII Angiotensin II, ASC Adipose stem cells, ETC electron transport chain, i inguinal, pg perigonadal, RAAS Renin-angiotensin-aldosterone system, sc single cell, SD standard deviation, SEM Standard error of the mean, seq sequencing and WAT White adipose tissue.
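The right-tailed Fisher’s test mentioned in panel a can be sketched as a hypergeometric upper-tail probability. The Python example below is a minimal illustration and not the IPA implementation; the function name, the reference-set size and the pathway size are hypothetical, since the background gene counts used by IPA are not stated here.

```python
from math import comb


def right_tailed_fisher(overlap, query_size, pathway_size, background_size):
    """P(X >= overlap) for X ~ Hypergeometric(background, pathway, query).

    overlap:         query genes that belong to the pathway
    query_size:      genes in the query list
    pathway_size:    genes annotated to the pathway in the background
    background_size: all genes in the reference set
    """
    upper = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, x)
        * comb(background_size - pathway_size, query_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 4 of 36 query genes fall in a 50-gene pathway
# drawn from a 20,000-gene background (none of these are study values
# except the size of the 36-gene core set).
p_value = right_tailed_fisher(4, 36, 50, 20000)
```

A small p-value indicates that the overlap between the gene list and the pathway is larger than expected by chance under these assumed counts.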

Related to Renin-Angiotensin-Aldosterone-System , male ASC showed enriched expression of Ace , encoding angiotensin-converting enzyme, and Agt , encoding angiotensinogen (AGT). We found that other key genes of the RAAS-system (schematically illustrated in Fig. 3b ) were also expressed in WAT but without sexual dimorphism (Fig. 3c ). These genes included Atp6ap2 (encoding the renin receptor) expressed by ASC and adipocytes, Ctsd (encoding cathepsin D) expressed across multiple cell types, Cma1 and Mcpt4 (encoding chymases) expressed by macrophages, Enpep and Anpep (encoding aminopeptidases A and -N respectively) expressed by ASC, and Agtr1a (encoding angiotensin II receptor) expressed by ASC and mural cells. Ace2 , encoding angiotensin-converting enzyme 2, which is also the cellular receptor for SARS-CoV-2, was weakly expressed in our RNA-seq data. The role of a putatively increased RAAS signaling in male WAT remains unclear. Angiotensin II (AngII) has been reported to influence adipocyte differentiation and lipolysis 42 , 43 . However, we found no alteration in basal lipolysis in iWAT explants exposed to 100 ng/ml of AngII (Fig. 3d ) or in adipogenic differentiation of SVF cells from the same depot (Fig. 3e ) despite AngII receptor expression (Fig. 3f ). It is therefore possible that sexually dimorphic RAAS activity plays a role in hemodynamic regulation of WAT rather than having a direct effect on cell differentiation or lipolysis.

Related to enriched terms for diseases and biological functions, the term Glucose Metabolism Disorder had the highest number of associated genes from the 36-gene core set (18/36), and the low p-value for the term likely reflects that the directional change of 17/18 genes in our data was consistent with previously reported data in the literature (Supplementary Figs. 7a–c ). Although the IPA software did not assign Glucose Metabolism Disorder specifically to male or female, most of the IPA-indicated references report aggravated disorder in males (Supplementary Fig. 7c ). For example, one study concluded that knockout of Fkbp5 (male ASC-enriched) decreases insulin resistance in mice on a high fat diet 44 . Another study found that upregulation of Svep1 (male ASC-enriched) was associated with T2D in mice 45 .

Genes involved in sex-hormone signaling were among the sexually dimorphic DEGs. Female cells showed enriched expression of progesterone receptor ( Pgr ) and estrogen-receptor alpha ( Esr1 ) (Figs. 2b, c and 3g ). Conversely, male ASC showed enriched expression of the estrogen inactivator Sult1e1 (Fig. 2b, c ).

Male ASC1a/b cells have higher adipogenic potential in vitro than their female counterparts

Earlier studies have shown that DPP4 is highly expressed on the cell surface of human preadipocytes, and DPP4 has been suggested to affect both lipid metabolism and cell proliferation 46 . Previous publications have also suggested that DPP4 + adipogenic progenitors (ASC2) are less prone to differentiate into mature adipocytes 21 . Our scRNA-seq data included Dpp4 + (ASC2) and Dpp4 - (ASC1a/b) cells from both males and females (Fig. 4a ). To assess the potential of self-renewal and differentiation of DPP4 + and DPP4 − cells in SVF preparations from iWAT, we assessed proliferation rate and adipogenic differentiation by exposing confluent cell cultures to insulin alone or a cocktail of adipogenesis-inducing reagents including insulin, dexamethasone, IBMX and pioglitazone. In these experiments, DPP4 − (ASC1a/b) cells (Supplementary Fig. 3a ) showed low proliferation (Fig. 4b ) and high lipid droplet accumulation in the presence of insulin alone (Fig. 4c ). Conversely, DPP4 + (ASC2) had high proliferation rate (Fig. 4b ) and a low lipid droplet accumulation in the presence of insulin alone, which was marginally increased by the full cocktail (Fig. 4c ). Marker genes for mature adipocytes ( Lpl, Fabp4, Adipoq, Lep, Pparg ) were all significantly higher in DPP4 − (ASC1a/b) cells than in DPP4 + (ASC2) cells (Fig. 4d ) after adipogenic differentiation. The higher potential for self-renewal and lower ability to differentiate suggest that ASC2 cells are less committed adipose precursor cells than ASC1a/b.

a Dpp4 gene expression highlighted in UMAP projection of ASC. b Proliferation rate measured in vitro of isolated ASC1 and ASC2 cells. n = 10 for ASC1 and n = 20 for ASC2, n represents technical well replicates, similar results have been obtained in three independent experiments c Representative images of In vitro differentiated ASC1 and ASC2 cells using insulin or a full cocktail of adipogenic reagents d Expression of Dpp4 and marker genes for mature adipocytes in in vitro differentiated ASC. n = 8 for all groups except ASC1 insulin treated cells for which n = 11. n represent technical well replicates from three independent experiments e Barplot over the level of differentiation in ASC1 and ASC2 cells from iWAT and pgWAT from adult male and female mice. n = 5 biological replicates for all groups except female pgWAT ASC1 for which n = 4. f Representative images of differentiated ASC1 and ASC2 cells from iWAT and pgWAT from adult male and female mice. Statistics in b were calculated with a two-sided unpaired t-test for the data points collected at the final time point, t = 12.23, degrees of freedom =28, P-value < 0.001. Statistics in d and e were calculated with two-way ANOVA and Mixed-effects analysis, respectively, using Tukey’s multiple comparison test (Prism). Adjusted P-values for multiple testing were used (* P < 0.0332, ** P < 0.0021, *** P < 0.0002 and **** P < 0.0001) for d and (** P = 0.0055 and **** P < 0.0001) for e . The statistics in d were based on the delta Ct-values using TBP as a house keeping gene, see source data. Data are presented as mean values +/- SD. ASC Adipose stem cells, i inguinal, pg perigonadal, SD Standard deviation, UMAP Uniform Manifold Approximation and Projection, and WAT White adipose tissue.
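The 28 degrees of freedom reported for panel b equal 10 + 20 − 2, which is what a pooled-variance (Student’s) unpaired t-test gives for groups of n = 10 and n = 20. The Python sketch below shows that calculation under the assumption of equal variances; the function name and the input values are hypothetical placeholders, not the measured proliferation data.

```python
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for an unpaired Student's t-test with pooled variance."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance, weighted by group size.
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / df
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Placeholder values with the same group sizes as panel b (10 and 20 wells).
asc1_wells = [float(v) for v in range(10)]
asc2_wells = [float(v) for v in range(20)]
t_value, degrees_of_freedom = pooled_t_statistic(asc1_wells, asc2_wells)
```

The two-sided P-value would then be read from the t distribution with the returned degrees of freedom.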

We next compared the differentiation of ASC isolated from iWAT and pgWAT between sexes using the full cocktail of inducers. Male ASC1a/b from both pgWAT and iWAT showed increased adipogenic differentiation compared to the corresponding cells from females (Fig. 4e, f ). No sex difference was observed regarding the (low) propensity of ASC2 to differentiate into adipocytes.

We further asked whether the sex-specific difference in ASC1a/b differentiation was influenced by other SVF cells. To this end, we isolated crude SVF cells from iWAT/pgWAT and applied the same differentiation protocol as for the FACS-sorted ASC. Crude SVF cells contain most of the non-parenchymal cell-types of the depots (ASC, endothelial cells, hematopoietic cells) and are commonly used for studying adipogenesis in vitro. A trend toward higher differentiation was observed in males (Supplementary Figs. 8a, b ). Between the two AT depots, iWAT SVF showed higher adipogenic differentiation than pgWAT, which was statistically significant in females (Supplementary Figs. 8a, b ).

Because the influence of sex on adipogenic differentiation appeared weaker in crude SVF cultures compared to FACS-sorted cells, we asked whether in vitro SVF culturing affected the sex-specific ASC transcriptome. Bulk RNA-seq of confluent crude SVF cultures 4 days after in vitro plating (i.e. at the state of the cells just before differentiation was initiated) showed loss of the 36 sexually dimorphic ASC gene profile and the Hox gene expression pattern observed in pgWAT in vivo (Supplementary Figs. 8c, d ). Also lost was the sex-specific clustering of transcriptomes seen with the scRNA-seq data (Supplementary Figs. 8e, f ). Instead, clustering occurred by fat depot origin: iWAT or pgWAT (Supplementary Figs. 8e, f ). This was illustrated also at the level of individual genes: iWAT SVF maintained the specific (for iWAT) expression of Tbx15 whereas pgWAT SVF maintained the specific (for pgWAT) expression of Tcf21 (Supplementary Fig. 8g, h ). High expression of fibroblast markers ( Col1a1, Col3a1, Dcn, Lum, Pdgfra, Fn1 ) and low expression of markers for endothelial cells, macrophages, pericytes and VSMC (Supplementary Fig. 8i ) confirmed that ASC are the major cell type of crude SVF cultures. Moreover, markers of ASC2 ( Dpp4 and Cd55 ) were higher in pgWAT SVF than in iWAT SVF in agreement with the lower adipogenic differentiation of pgWAT SVF (Supplementary Fig. 8j ). Low expression of Pparg in female pgWAT SVF (Supplementary Fig. 8k ) may explain the distinct and consistent low differentiation grade of these samples. WNT hormone signaling through frizzled receptors is known to downregulate Pparg 47 . We noted that several transcripts of WNT pathway activators were enriched in female pgWAT SVF, some of which were also enriched in FACS-sorted female pgWAT ASC1a/b cells (Supplementary Fig. 9a,b ). Wnt4 was consistently enriched in female pgWAT cells, and Rspo1 , encoding R-spondin-1 which potentiates WNT signaling, was consistently enriched in pgWAT in both males and females (Supplementary Fig. 9b ). Fzd1 , encoding Frizzled-1 receptor, was expressed in pgWAT, particularly in female ASC1a/b cells, as shown by our scRNA-seq data on FACS sorted cells (Supplementary Fig. 9b ).

Morphological distinction of mural cells and ASC along the adipose microvascular tree

Because pericytes have previously been suggested to constitute ASC, and because we found distinct adipogenic behavior of ASC1a/b and ASC2, we investigated the spatial relationships between ASC and WAT microvessels. We visualized endothelial cells, pericytes, VSMC and ASC in pgWAT isolated from Pdgfrb GFP reporter mice. This mouse strain has previously been used for mural cell imaging 48 . Although Pdgfrb is also expressed by fibroblasts, including ASC (Fig. 5a ), mural cells typically have stronger Pdgfrb expression and display a stronger Pdgfrb GFP signal 25 . In contrast, Pdgfra is a broad marker of fibroblasts and typically not expressed by mural cells 25 . We confirmed these expression patterns in our WAT scRNA-seq data (Fig. 5a ). We used anti-PDGFRA antibodies to discriminate ASC from mural cells, anti-DPP4 antibodies to visualize the ASC2 population, and anti-CD31 (PECAM-1) to visualize endothelial cells in Pdgfrb GFP mouse WAT. Pdgfrb GFP + cells displayed the typical morphologies of pericytes and VSMC adjacent to CD31-labeled endothelium (Fig. 5b ). The strong Pdgfrb GFP signals and long processes adherent to the abluminal side of capillary endothelial cells were consistent with pericytes, as known from other organs. WAT pericytes resembled so-called thin-strand pericytes of the mouse brain 49 , 50 (Fig. 5b , inset #3). Other Pdgfrb GFP + cells extended processes enveloping the vessel circumference; a phenotype consistent with arterial VSMC (Fig. 5b , inset #1). Intermediate mural cell morphologies typical of arteriolar VSMC were also observed (Fig. 5b , inset #2). VSMC with multiple short processes without obvious longitudinal or transversal orientation were observed in venules and veins (Fig. 5b , insets #4-5). Taken together, our observations suggest that AT mural cells display a continuum of morphologies along the arterio-venous axis similar to what has previously been described in brain 24 , 48 .

a Barplots of marker gene expression used for cell visualization. Data are presented as mean values +/- SEM. b Immunofluorescence staining of pgWAT from the Pdgfrb GFP reporter line for CD31 (also known as PECAM1) displaying mural cells across the arteriovenous axis. c Same as in b but with staining for CD31 and PDGFRA. Arrows and arrowheads indicate the position of perivascular and interstitial ASC, respectively. d Same as in b but with staining for CD31 and DPP4. The arrow and arrowhead indicate the location of the interstitial ASC2 population and mesothelial cells, respectively. e Maximum intensity projection of slices with mesothelial DPP4 staining in Pdgfrb CRE-tdTOM / Pdgfra H2bGFP reporter mice. Predicted mesothelial nuclei (white circles) were marked based on the DPP4 staining pattern, and Pdgfra H2bGFP -positive nuclei (cyan circles) were marked dependent on the GFP signal. No overlap of mesothelial nuclei and Pdgfra H2bGFP -positive nuclei could be observed. GFP Green fluorescent protein, pgWAT perigonadal White adipose tissue and SEM Standard error of the mean.

Anti-PDGFRA antibodies labeled ASC at both perivascular and interstitial locations (Fig. 5c ). Of these, the ASC2 subpopulation was identified by anti-DPP4 staining as the cells with interstitial location (Fig. 5d , arrows). We also noticed a DPP4-positive layer of cells covering the surface of the pgWAT depots (Fig. 5d, e , arrowhead). These cells were negative for PDGFRB and PDGFRA and had the expected location of mesothelial cells, previously suggested to express DPP4 6 . To further investigate the spatial relationship between ASC subpopulations and vasculature, we performed whole-mount analysis of the pgWAT from Pdgfrb CreERT2: R26tdTomato /Pdgfra H2BGFP reporter mice. We localized all ASC using Pdgfra- driven nuclear GFP and, simultaneously, ASC2 by anti-DPP4. The results confirmed the observations on Pdgfrb -reporter mice, namely that DPP4 - ASC1a/b-cells are located in close vicinity to and partially in direct contact with blood vessels (Fig. 6a–c ), whereas most DPP4 + ASC2 cells were located at a discernible distance from the vessels (Fig. 6a, b ). These spatial relationships were confirmed by 3D image rotation (Fig. 6a, b ), or by 3D rendering videos of whole-mount preparations. The latter analysis showed that DPP4 + ASC2 cells that appeared blood vessel-associated in 2D were not, judging by 3D rendering (Supplementary Movies 1 and 2 ). Figure 6c shows a schematic cartoon of the vasculature and ASC subpopulations in pgWAT.

a pgWAT cryo-section staining and 3D rendering in Pdgfrb CRE-tdTOM / Pdgfra H2bGFP mice with anti-DPP4 antibody staining for ASC2. b pgWAT whole-mount staining and 3D rendering in Pdgfrb CRE-tdTOM / Pdgfra H2bGFP mice with Pdgfra -driven GFP expression marked in cyan color. c Schematic cartoon of our view of the perivascular and vascular cells in pgWAT. Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license ( https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en ). d Immunofluorescence staining of pgWAT from wild-type mice for NGFR (encoded by the female ASC-specific Ngfr transcript), PDGFRA and CD31. e Same as in d but staining for the ASC2 marker DPP4 instead of the ASC marker PDGFRA. ASC Adipose stem cells, GFP Green fluorescent protein, and pgWAT perigonadal White adipose tissue.

We finally asked if any of the sexually dimorphic ASC genes could be verified at the protein level in ASC at their respective perivascular or interstitial locations. Such analysis is strictly dependent on antibodies that are specific and functional together with other antibodies in immunofluorescence of pgWAT. Here, we found that antibodies against nerve growth factor receptor (NGFR, encoded by the female-specific mRNA Ngfr ) stained perivascularly located ASC (PDGFRA + ) (Fig. 6d ) and interstitially located ASC2 (DPP4 + ) (Fig. 6e ) in pgWAT, in agreement with Ngfr being one of the 104 DEGs for pgWAT (Fig. 2c ).

Establishment of specific ASC populations in different WAT depots likely depends on a combination of factors, including developmental signals, sex and anatomic location. Adipogenesis has been linked to the vascular niche in WAT, a location that harbors several different cell types including mural cells, fibroblasts, and endothelial cells, which have all been suggested as preadipocytes 51 , 52 , 53 , 54 , 55 . Lineage-tracing is complicated by the shortage of specific pan-fibroblast and pan-mural cell markers. A cross-organ comparison of scRNA-seq data combined with in vitro assays, as presented herein, converges on a fibroblast-like identity for ASC. Based on the expression of canonical fibroblast markers (e.g. Pdgfra, Col1a1, Dcn , and Lum ) and a 90-gene signature for discrimination of fibroblasts from mural cells 25 , we conclude that pgWAT ASC are equivalent to WAT fibroblasts. Like fibroblasts in other organs, pgWAT ASC showed organotypic features, as reflected by differential expression of matrisome genes (e.g. Rspo1, Col6a5 , Frzb , Col11a1 and Col12a1 ) alongside genes involved in the regulation of adipogenesis ( Pparg ) and lipid metabolism ( Fabp4, Plin2 ). These data suggest that ASC serve a dual role of being adipocyte precursors and tailors of the specific WAT ECM composition. ASC heterogeneity within individual WAT depots has been demonstrated under both basal conditions and after a challenge by obesogenic diet or β3-adrenergic receptor activation 18 , 56 . Our transcriptional profiles of pgWAT fibroblasts matched previously reported ASC1a, ASC1b, and ASC2 subpopulations 6 .

It is increasingly clear that adiposity at different anatomic locations, e.g. subcutaneous, gluteofemoral, and visceral, have distinct metabolic profiles that are strongly influenced by sex, and that these profiles may be more reliable proxies of T2D and cardiovascular disease risks than BMI 11 , 57 , 58 . Despite this, hitherto published scRNA-seq studies of mouse WAT either focused on males or lacked specific analysis of sex differences when both sexes were present 6 , 18 , 19 , 20 , 21 . Here, we uncover sexual dimorphism of ASC with putative importance for the metabolic profile of WAT. The sexual dimorphism is observed in ASC transcriptomic signature, as well as in some ex vivo adipogenic behaviors of isolated ASC.

WAT is an important source of AGT and expresses the machinery necessary to generate the vasoconstrictor AngII. Our scRNA-seq data revealed that Agt and Ace are highly expressed in male ASC. If and how adipose RAAS contributes to obesity-associated hypertension systematically is unclear 59 , 60 . Given the proximity of ASC to blood vessels, locally generated AngII may regulate microvascular tone. AGTR1, the receptor for AngII, is expressed by pericytes in WAT and other tissues 61 . Our finding that high levels of AngII lacked an effect on basal lipolysis and adipogenic differentiation contradicts previous work showing that AngII inhibits lipolysis and impacts differentiation 42 , 43 . Further work is needed to understand the relative contribution of locally produced AngII and its relationship to AT capillary function or capillary function in peripheral tissue in general, as well as the relevance to human AT biology. The expression of AGTR1, the AngII receptor, in pericytes is intriguing, since WAT blood flow is regulated between meals 62 , and pericytes have been suggested to regulate blood flow in the brain and heart. Hence, a similar role for pericytes may be speculated for adipose tissue 63 .

Sexually dimorphic expression of estrogen receptor alpha ( Esr1 ) and the estrogen-inactivating enzyme Sult1e1 indicates that estrogen-receptor signaling is a strong driver of the sex differences in ASC transcriptomes. Studies of male scWAT adipose progenitor cell transplantation into females indicated that the transplanted cells adopted the behavior of the host during high fat diet, suggesting environmental control in which sex hormones likely play a role 10 . In this context, it was interesting to note that most of the sex-specific transcriptomic differences disappeared when WAT SVF cells were grown in vitro. In marked contrast, some of the conspicuous depot-dependent differences in transcription factor gene expression ( Tbx15 and Tcf21 ) remained in vitro.

Our data suggest that Hox genes with numbers below 9 are male pgWAT ASC specific, whereas numbers 9 and above are female specific. Sex- and depot-dependent differences in Hox gene expression in AT have been reported previously 64 , 65 , 66 , 67 , 68 , 69 , 70 , 71 . One of these reports focused on the developmental signature of human abdominal and gluteal subcutaneous adipose tissues in men and women 69 . The authors found that Hox genes with lower numbers (as in our male-derived ASC) were enriched in abdominal depots of both sexes, whereas Hox genes with higher numbers (as in our female-derived ASC) were upregulated in gluteal depots of both sexes. This might reflect different embryonic origins of the ASC in males and females, with or without physiological impact in adults. Either way, the differential expression of Hox genes suggests that sexually dimorphic properties of ASC may to some extent be imprinted already during embryonic development and maintained through life.

We further observed that FACS-sorted ASC1 from male inguinal (subcutaneous, iWAT) and perigonadal (visceral, pgWAT) AT depots had higher adipogenic potential in vitro than female counterparts. A similar trend, albeit not statistically significant, was seen in the adipogenic potential of crude SVF from both iWAT and pgWAT. Several factors might contribute to this difference. ASC2 might become dominant over ASC1 in culture due to a higher proliferation rate. Alternatively, the presence of other cell types may regulate the adipogenic potential of ASC in culture. Transcriptomic analysis of cultured SVF just prior to initiation of differentiation indicated a higher proportion of ASC2-cells and lower expression of Pparg in the pgWAT samples. More active WNT-signaling in female pgWAT (higher levels of e.g. Fzd1, Rspo1 and Wnt4 ) may also underlie this difference. WNT ligands are known inhibitors of adipogenesis that downregulate Pparg through canonical WNT signaling 47 . Merrick et al. 21 found a higher proliferation rate and lower adipogenesis in ASC2 compared to ASC1a/b, and concluded that ASC2 represents a multi-potent and less committed precursor population that contributes to basal adipogenesis in both visceral and subcutaneous fat depots 56 . Our results concur with this conclusion. Furthermore, Merrick et al. used Lin − /CD142 + , Lin − /CD142 − /DPP4 + and Lin − /CD142 − /ICAM + gating to isolate ASC1b, ASC2 and ASC1a populations respectively, and found that both ASC1a and ASC1b could be readily differentiated into mature adipocytes in vitro, and that the differentiation of ASC1b and ASC2 was inhibited by TGFβ, whereas that of ASC1a was not. Schwalie et al. 20 used a similar FACS-strategy to isolate ASC1b cells by dividing a Lin − /CD34 + /SCA1 + fraction into either CD142 positive (ASC1b) or negative (ASC2 and ASC1a) cells, but in contrast to Merrick et al., their ASC1b (a.k.a. adipogenesis regulatory cells (Aregs)) demonstrated an inhibitory activity on adipogenesis and low adipogenic potential in vitro 20 . In our present analysis, DPP4 − (ASC1a/b) cells from male mice readily underwent adipogenesis in minimal adipogenic culture medium. A possible explanation for the different functional properties of ASC1b in the different studies (Supplementary Data 1 , 2 ) is that the proportion of Aregs in the DPP4 − fraction may vary (it remains unknown in our data). In summary, our data confirm the intra-depot transcriptional heterogeneity of ASC suggested by others, but also uncover differences in the functional properties of these cells in vitro that deserve further study.

It has long been observed that adipocytes arise in a perivascular niche of AT, which has inspired lineage tracing studies focused on blood vessels and associated cells. Our scRNA-seq data show that pgWAT ASC and mural cells are distinct. Surface marker profiles of cells in the WAT perivascular niche distinguish endothelial cells (CD31 + ), mural cells (PDGFRB + , PDGFRα − ), ASC1 (PDGFRα + , DPP4 − ), and ASC2 (PDGFRα + , DPP4 + ) and their spatial distribution. In addition to the classical distinction between VSMCs around arteries and veins 72 , 73 and pericytes in capillaries (Fig. 5a, b ), our data reveal transitional cell morphologies along the arterio-venous axis in WAT similar to those previously reported in the brain.

Previous work has shown that the DPP4 + ASC cells are excluded from the perivascular compartment and reside in the reticular WAT interstitium, a fluid-filled layer containing elastin and collagen fibers surrounding parenchymal cells in many organs 21 . Conversely, DPP4 − ASC1b cells have been suggested to reside in the perivascular space 20 . Accordingly, we find DPP4 + ASC2 in the pgWAT interstitial space without blood vessel association, whereas ASC1a/b are immediately outside of the mural cell coat. This location may allow regulation of vascular permeability and blood pressure in addition to serving as a pre-adipocyte niche 73 . The recent progress in AT scRNA-seq will likely bring further understanding of the signaling networks that regulate interactions between EC, mural cells, ASC, and mature adipocytes in health and obesity 74 .

For antibodies used for imaging see Supplementary Table 7 . For antibodies used for FACS-sorting see Supplementary Table 8 . For medium and enzymes used for the digestion of tissues see Supplementary Table 9 . For key reagents for ASC proliferation, differentiation, and imaging see Supplementary Table 10 . For primers used for qPCR (method: SYBR Green) see Supplementary Table 11 .

All mouse experiments were conducted according to local guidelines and regulations for animal welfare. Experiments on reporter mouse strains were covered by ethical permits approved by Linköping’s animal research ethics committee (approval IDs 729 and 3711-2020), whereas experiments with wild-type mice for in vitro studies and FACS bulk RNA-seq isolations were covered by ethical permits approved by Gothenburg’s animal research ethics committee (approval ID 000832-2017). All animals were maintained on a 12 h light – 12 h dark cycle in a temperature-controlled environment (22 °C), with free access to water and chow diet. For the scRNA-seq experiments and tissue imaging we used a Pdgfrb GFP (gensat.org, Tg( Pdgfrb -eGFP)) mouse strain that has been backcrossed to the C57BL6/J background (The Jackson Laboratory, C57BL6/J). Pdgfra H2b-GFP (B6.Cg- Pdgfra tm11(EGFP)Sor ) 75 mice were crossed to Pdgfrb -Cre ERT2 (Tg( Pdgfrb -CRE/ERT2)6096Rha) 76 and Ai14-TdTomato (B6.Cg-Gt(ROSA)26Sortm14(CAG-TdTomato)Hze) 77 mice to generate the animals used for the tissue imaging shown in Fig. 6 . Pdgfrb -Cre ERT2 was induced with 3 doses of tamoxifen (2 mg) in peanut oil by oral gavage at 4 weeks of age to activate TdTomato expression. For the isolation of FACS-sorted ASC and mature adipocytes used for generating bulk RNA-seq samples we used 16 C57BL6/J mice (eight males and eight females) at 18 weeks of age (supplied by Charles River). For the castration/ovariectomy study, eight ovariectomized (study code: OVARIEX) mice, eight castrated (study code: CASTRATE) mice and age-matched controls of the strain C57BL/6J were supplied by Charles River and terminated at 10 weeks of age. Castration/ovariectomy was conducted at four weeks of age. Animals were fasted for 4 h before termination for studies of bulk RNA-seq of FACS-sorted ASC. For in vitro experiments (Fig. 4e, f ) we used the wild-type C57BL/6N mouse strain (supplied by Charles River), and C57BL/6J for the rest of the experiments displayed in Figure 4 and Supplementary Figs. 8 , 9 . Adult mice of both sexes were used at an age range of 12–20 weeks for scRNA-seq and in vitro experiments.

Isolation of single cells from mouse adipose tissue for scRNA-seq and in vitro experiments

Mice were euthanized by cervical dislocation according to the ethical permission before inguinal/perigonadal white adipose tissue was removed and placed into cold PBS solution. The adipose tissue was then cut into smaller pieces before incubation in dissociation buffer (Skeletal Muscle dissociation kit, Miltenyi), supplemented with 1 mg/ml Collagenase type IV-S, at 37 °C with horizontal shaking at 500–800 rpm. For all in vitro experiments a different enzymatic mixture was used, with 2 mg/ml Dispase II, 1 mg/ml Collagenase I, 1 mg/ml Collagenase II and 25 units/ml of DNAse dissolved in DMEM. The tissue was further disintegrated by pipetting every 10 min during the 30-minute-long incubation. The cell suspension was then sequentially passed through 70 µm and 40 µm cell strainers, before 5 ml of DMEM was passed through both strainers as a final washing step. Cells were then spun at 250 × g for 5 min, the buffer was removed, and the pellet was re-suspended in FACS buffer (PBS, supplemented with 0.5% BSA, 2 mM EDTA, 25 mM HEPES). Cells were then labeled with fluorophore-conjugated antibodies (anti-CD31, anti-CD34, anti-DPP4, anti-CD45) for 20 min on ice and centrifuged at 250 × g for 5 min. After removal of the supernatant, the pellet was re-suspended in FACS buffer and kept on ice. For isolation of mature adipocytes, the crude cell suspension was passed through a 100 µm strainer instead, and the remaining cell suspension was left on ice for a few minutes, allowing floating mature adipocytes to be collected from the surface of the suspension. The mature adipocytes were transferred to a separate Eppendorf tube where redundant cell suspension medium was removed with a syringe.

Fluorescence-activated cell sorting (FACS) for scRNA-seq

Cell suspensions derived from Pdgfrb GFP reporter mice were stained with antibodies and subjected to flow cytometry sorting as described previously 25 . Briefly, Becton Dickinson FACS Aria III or FACS Melody cell sorters equipped with a 100 µm nozzle were used for sorting cells into individual wells of a 384-well plate containing 2.3 µl of lysis buffer (0.2% Triton X-100, 2 U/ml RNAse inhibitor, 2 mM dNTPs, 1 µM Smart-dT30VN primers). Correct aiming was assured by test-spotting beads onto the plastic seal of each plate. Sample plates were kept at 4 °C during sorting and placed directly on dry ice afterwards; plates were stored at −80 °C until further processing. The gating strategy was applied to enrich cells expressing protein signatures of interest but was not used for cell identification. For FACS-sorting of single cells, a first gate of forward and side scatter area (FSC-A/SSC-A) on the linear scale was set generously in order to eliminate only cells with low values (red blood cells and cell debris). A second gate for doublet discrimination was based on distance from the diagonal line in the FSC-A/FSC-height plot. The third selection criterion was based on fluorescence signal, with “fluorescence minus one” controls or mice negative for the GFP reporter used as gating controls. Cells negative for CD45 staining were first selected; further gating was then based on either Pdgfrb GFP − /CD31 + , CD31 − / Pdgfrb GFP + or CD31 − / Pdgfrb GFP + /DPP4± selections.

RNA isolation and quantification of differentiated adipose stem cells

Qiagen’s RNeasy Plus Micro kit (Cat. No. 74034) was used for RNA isolation from in vitro differentiated ASCs. The High-Capacity cDNA Reverse Transcription kit (Cat. No. 4368814) was used to generate cDNA from RNA, and SYBR Green PCR Master Mix (Cat. No. 4309155) with custom primers from Thermo Fisher Scientific was used for relative quantification of mRNA levels (see separate Supplementary table for primer sequences). The experiments were repeated three times using in total 4 mice (two females and two males).

Smartseq2 library preparation and sequencing

Isolation of mRNA molecules from single cells with subsequent cDNA synthesis and sequencing was carried out as described previously 22 , 25 . Briefly, cDNA was synthesized from mRNA using oligo-dT primers and SuperScript II reverse transcriptase (Thermo Fisher Scientific). A template-switching oligo (TSO) was used for synthesizing the second strand of cDNA before amplification by 23–26 polymerase chain reaction (PCR) cycles. Purified amplicons were then quality controlled using an Agilent 2100 Bioanalyzer with a DNA High Sensitivity chip (Agilent Technologies). QC-passed cDNA libraries were then fragmented and tagged (tagmentation) using Tn5 transposase, and samples from each well were then uniquely indexed using Illumina Nextera XT index kits (set A-D). The uniquely labeled cDNA libraries from one 384-well plate were then pooled into one sample before being loaded onto one lane of a HiSeq3000 sequencer (Illumina). Dual indexing and single 50 base-pair reads were used during sequencing.

Qiagen’s RNeasy Micro kit (Cat. No. 74004) was used for the isolation of RNA from FACS-sorted ASC cells (5000 cells per sample) for bulk smartseq2 library preparation. For isolation of RNA from mature adipocytes, QIAzol lysis reagent (Cat. No. 79306) was first used, followed by the addition of chloroform and subsequent integration of the RNA-containing aqueous phase with the workflow of the RNeasy clean-up. RNA samples were diluted to 3 ng/µl and 5 ng of RNA was used for cDNA synthesis according to the smartseq2 protocol. For FACS-sorted ASC bulk samples, RNA from approximately 300 cells was used for cDNA synthesis, and 16 PCR cycles were used for cDNA amplification for both FACS-sorted ASC and bulk mature adipocyte samples. Samples derived from mature adipocytes were sequenced in technical duplicates, whereas FACS-sorted ASC bulk samples were sequenced in triplicates. The average quantified raw read counts were then calculated for the technical replicates prior to DE-analysis.
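The averaging of technical replicates prior to DE-analysis can be illustrated with a short sketch. This is not the authors' code (their downstream analysis was carried out in R); the function name and the example counts are hypothetical, and the sketch assumes that every replicate reports the same set of genes.

```python
def average_replicates(replicates):
    """Average raw read counts gene-by-gene across technical replicates.

    `replicates` is a list of dicts mapping gene name -> raw read count.
    Assumption (not from the paper): all replicates contain the same genes.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    genes = replicates[0].keys()
    n = len(replicates)
    return {gene: sum(rep[gene] for rep in replicates) / n for gene in genes}


# Hypothetical counts for one mature-adipocyte sample sequenced in duplicate.
duplicate_runs = [
    {"Pparg": 10, "Fabp4": 20},
    {"Pparg": 14, "Fabp4": 22},
]
averaged = average_replicates(duplicate_runs)
```

The same function applies unchanged to the triplicate ASC samples, since it divides by the number of replicates supplied.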

Smartseq3 library preparation and sequencing

RNA samples were extracted using Qiagen’s RNeasy Plus Micro kit (Cat. No. 74034) from FACS-sorted adipose stem cells from 32 mice (castrated male, n = 8; ovariectomized female, n = 8; female control, n = 8; male control, n = 8) and from 29 samples of in vitro cultivated crude SVF cells that had been proliferated for 4 days in PM1-medium supplemented with 1 nM basic FGF (same procedure as for in vitro differentiation protocols). The RNA concentration was normalized to 3 ng/µl in 20 µL nuclease-free water. For library construction and sequencing strategy, we adopted the sensitive smart-seq3 protocol to perform our bulk RNA-seq in single-cell format 78 , with some modifications as follows. Two µL of each RNA sample was transferred into one well of a 384-well plate, which contained 0.3 µL 1% Triton X-100, 0.5 µL PEG 8000 40%, 0.04 µL RNase inhibitor (40 U/µL), 0.08 µL dNTPs mix (25 mM) and 0.02 µL Smart-dT30VN/dT (100 µM, 5’-ACGAGCATCAGCAGCATACGATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN-3’). Reverse transcription was performed after mixing with 1 µL/well buffer (0.10 µL Tris-HCl, pH 8.3, 1 M, 0.12 µL 1 M NaCl, 0.10 µL MgCl 2 , 100 mM, 0.04 µL GTP, 100 mM, 0.32 µL DTT, 100 mM, 0.05 µL RNase inhibitor, 40 U/µL, 0.04 µL Maxima H Minus RT, 200 U/µL, 0.08 µL SmartSeq3 TSO, 100 µM, 5’-AGAGACAGATTGCGCAATGNNNNNNNNrGrGrG-3’ and 0.15 µL H 2 O, where N indicates a random base and r denotes a ribonucleotide). The thermocycler program was 3 min at 85 °C, 90 min at 42 °C, 10 cycles of 2 min at 50 °C and 2 min at 42 °C, followed by 85 °C for 5 min and incubation at 4 °C.
Immediately after the reverse transcription, PCR was performed after mixing with 6 µL/well of PCR buffer (2 µL 5X KAPA HiFi HotStart buffer, 0.12 µL 25 mM dNTPs mix, 0.05 µL 100 mM MgCl 2 , 0.2 µL KAPA HiFi HotStart DNA Polymerase (1 U/µL), 0.05 µL forward PCR primer (100 µM, 5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTGCGCAA*T*G-3’, * = phosphorothioate bond), 0.01 µL reverse PCR primer (100 µM, 5’-ACGAGCATCAGCAGCATAC*G*A-3’, * = phosphorothioate bond) and 3.57 µL H 2 O). The amplification program was: initial denaturation at 98 °C for 3 min, N cycles of denaturation at 98 °C for 20 sec, annealing at 65 °C for 30 sec and elongation at 72 °C for 4 min, followed by a final elongation at 72 °C for 5 min and incubation at 4 °C.

The cDNA was purified using 6 µl/well SeraMag beads (containing 17% PEG) and eluted into 10 µl/well elution buffer according to the user manual. After quality control using Bioanalyzer 2100 (Agilent), the cDNA was diluted to 200 pg/µl and combined into one 384-well plate for tagmentation. For each sample, 1 µL diluted cDNA was mixed with 925 nL tagmentation buffer (containing 20 nL Tris-HCl pH 7.5 1 M, 100.5 nL MgCl 2 100 mM, 100.5 nL dimethylformamide (DMF) and 704 nL H 2 O) and 75 nL Tn5 enzyme, and incubated at 55 °C for 10 min. The reaction was then immediately terminated by mixing with 500 nL 0.2% SDS and incubating at room temperature for 5 min. Subsequently, 1 µL index combination (500 nL for each) and 4.4 µL PCR mix (1.40 µL 5x Phusion HF buffer, 0.06 µL dNTPs mix (25 mM), 0.04 µL Phusion HF (2 U/µl) and 2.90 µL water) were applied to each well for enrichment PCR using the following program: gap-filling at 72 °C for 3 min, initial denaturation at 98 °C for 3 min, 13 cycles of denaturation at 98 °C for 10 sec, annealing at 55 °C for 30 sec and elongation at 72 °C for 30 sec, followed by a final elongation at 72 °C for 5 min and incubation at 4 °C.
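The per-well reagent volumes listed above for the reverse transcription buffer, the cDNA PCR buffer, the tagmentation buffer and the enrichment PCR mix can be checked against their stated totals (1 µL, 6 µL, 925 nL and 4.4 µL). The sketch below is only a bookkeeping aid written for this purpose, not part of the published protocol; the dictionary names and the helper `total_volume` are hypothetical, and exact decimal arithmetic is used to avoid floating-point rounding.

```python
from decimal import Decimal

# Component volumes copied from the protocol text (microlitres per well).
RT_MIX_UL = {
    "Tris-HCl pH 8.3, 1 M": "0.10",
    "NaCl, 1 M": "0.12",
    "MgCl2, 100 mM": "0.10",
    "GTP, 100 mM": "0.04",
    "DTT, 100 mM": "0.32",
    "RNase inhibitor, 40 U/uL": "0.05",
    "Maxima H Minus RT, 200 U/uL": "0.04",
    "SmartSeq3 TSO, 100 uM": "0.08",
    "H2O": "0.15",
}
CDNA_PCR_MIX_UL = {
    "5X KAPA HiFi HotStart buffer": "2",
    "dNTPs mix, 25 mM": "0.12",
    "MgCl2, 100 mM": "0.05",
    "KAPA HiFi HotStart DNA Polymerase, 1 U/uL": "0.2",
    "forward PCR primer, 100 uM": "0.05",
    "reverse PCR primer, 100 uM": "0.01",
    "H2O": "3.57",
}
# Tagmentation buffer volumes are given in nanolitres in the text.
TAGMENTATION_BUFFER_NL = {
    "Tris-HCl pH 7.5, 1 M": "20",
    "MgCl2, 100 mM": "100.5",
    "DMF": "100.5",
    "H2O": "704",
}
ENRICHMENT_PCR_MIX_UL = {
    "5x Phusion HF buffer": "1.40",
    "dNTPs mix, 25 mM": "0.06",
    "Phusion HF, 2 U/uL": "0.04",
    "water": "2.90",
}


def total_volume(mix):
    """Sum the component volumes of a mix exactly (values are decimal strings)."""
    return sum(Decimal(volume) for volume in mix.values())


for name, mix in [
    ("RT buffer (uL)", RT_MIX_UL),
    ("cDNA PCR buffer (uL)", CDNA_PCR_MIX_UL),
    ("tagmentation buffer (nL)", TAGMENTATION_BUFFER_NL),
    ("enrichment PCR mix (uL)", ENRICHMENT_PCR_MIX_UL),
]:
    print(f"{name}: {total_volume(mix)}")
```

Scaling a mix to a full plate is then a matter of multiplying each component by the number of wells plus whatever pipetting excess the laboratory normally allows, which the text does not specify.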

Finally, the libraries were pooled for purification using a two-step purification protocol. First, 24% SeraMag beads were mixed with libraries at a volume ratio of 0.6:1, followed by successive 8 min incubations without and with a magnet stand. After removal of the supernatant, the beads were washed with 80% ethanol and eluted into 50 µl elution buffer. Then, the 50 µl eluate was thoroughly mixed with 50 µL H 2 O and 70 µl SPRI beads, followed by successive 2 min incubations without and with a magnet stand. After removal of the supernatant, the beads were washed with 85% ethanol and eluted into 25 µl elution buffer. The purified library pool was then quality-controlled and diluted for sequencing on a NovaSeq 6000 (Illumina).

In vitro differentiation of adipose stem cells

Stromal vascular cells from both iWAT and pgWAT were isolated according to the procedure described above. Thereafter cells were labeled with four fluorophore-conjugated antibodies (anti-CD45, anti-CD31, anti-CD34, and anti-DPP4) for 30 min on ice. The cell suspension was then centrifuged at 300 × g for 5 min, the supernatant removed, and the pellet re-suspended in FACS buffer. Cells were then loaded into a SH800 Sony cell sorter, and two adipose stem cell populations, CD45 − /CD31 − /CD34 + /DPP4 + and CD45 − /CD31 − /CD34 + /DPP4 − , were gated and selected for sorting. Fluorescence minus one controls were used for ensuring correct gating. Cells were collected in PM-1 medium (Zenbio), supplemented with 1 nM of basic FGF, and seeded in 96-well plates. DPP4-positive and DPP4-negative cells were seeded at 10,000 cells and 12,500 cells per well, respectively. DPP4-positive cells were seeded at a lower density since they proliferate faster than the DPP4-negative population. For the experiment with crude SVF cells, 15,000–20,000 cells were seeded. Once cells had become confluent after 3–4 days of proliferation, differentiation was initiated by changing medium to basal medium (Zenbio) supplemented with 3% FBS, 1% pen/strep, 0.5 mM IBMX, 1 µM dexamethasone, and 100 nM of insulin. The medium was then changed after 48 h to a maintenance medium including BM-1, 3% FBS, 1% pen/strep, 1 µM of pioglitazone, and 100 nM insulin, with medium changed every other day (pioglitazone was not included in the differentiation experiments with AngII). A second treatment group with cells subjected to only 100 nM of insulin in basal medium (Zenbio) supplemented with 3% FBS and 1% pen/strep was also included in the studies. After 8 days of differentiation, cells were stained with propidium iodide, Bodipy, and Hoechst 33342.

Cells were then imaged with an ImageXpress widefield fluorescence microscope using the 4x objective (10x objective for crude SVF cells). Images were analyzed with the MetaXpress software applying the multi-wavelength cell scoring application module, counting cells (Hoechst positive), lipids (Bodipy positive) and the number of dead cells (propidium iodide positive) for viability measurements; for more detailed settings for the quantification of the differentiation of FACS-sorted ASCs see Supplementary table 12 .

Cells that were positive for both Bodipy and Hoechst were determined as differentiated (the number of W2 positive cells). The percentage of differentiated cells was then calculated by dividing the number of lipid-filled cells by the total number of cells in the well. Wells with a viability below 85% were not included in the analysis. The experiment was repeated four and five times, for crude SVF and FACS-sorted ASC, respectively, with the average results from the technical well-replicates presented in figures. Experiments with AngII were carried out five times and each plate had three and six technical replicates for treatment and control wells, respectively.
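The scoring rule described above (percentage of lipid-filled cells per well, exclusion of wells below 85% viability, averaging over technical well-replicates) can be sketched as follows. This is an illustration only; the actual quantification was done in MetaXpress. The function names and well counts are hypothetical, and the definition of viability as the fraction of Hoechst-positive cells that are not propidium iodide positive is an assumption, since the text does not spell it out.

```python
from statistics import mean

VIABILITY_CUTOFF = 85.0  # percent; wells below this are excluded (from the text)


def well_metrics(total_cells, lipid_positive, dead_cells):
    """Return (viability %, differentiated %) for one well.

    Assumption: viability = share of counted cells that are not
    propidium iodide positive.
    """
    viability = 100.0 * (total_cells - dead_cells) / total_cells
    differentiated = 100.0 * lipid_positive / total_cells
    return viability, differentiated


def mean_differentiation(wells):
    """Average the differentiated percentage over wells that pass viability."""
    kept = []
    for well in wells:
        viability, differentiated = well_metrics(**well)
        if viability >= VIABILITY_CUTOFF:
            kept.append(differentiated)
    return mean(kept) if kept else None


# Hypothetical technical replicates; the third well fails the viability filter.
example_wells = [
    {"total_cells": 1000, "lipid_positive": 400, "dead_cells": 50},
    {"total_cells": 1000, "lipid_positive": 600, "dead_cells": 100},
    {"total_cells": 1000, "lipid_positive": 900, "dead_cells": 300},
]
result = mean_differentiation(example_wells)
```

Returning `None` when no well passes keeps an all-failed condition distinguishable from a genuine 0% differentiation.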

In vitro proliferation rate assay of adipose stem cells

Cell proliferation rates were measured on two stem cell populations, CD45-/CD31-/CD34 + /DPP4- (ASC1) and CD45-/CD31-/CD34 + /DPP4+ (ASC2). Briefly, cells were isolated from inguinal WAT from adult mice and subjected to FACS-sorting as previously described. Cells were sorted at a density of 7500 cells per well in a 96-well plate in PM-1 (ZenBio) medium supplemented with 1 nM of basic FGF. Cell proliferation was then measured by analyzing the level of confluence using the IncuCyte S3 live-cell analysis system. Images were taken every fourth hour for 120 h, with the 10x objective. This experiment was repeated three times with similar results.

Explant lipolysis assay

Age-matched mice (age: 12–14 weeks, strain: C57BL6/J) were fasted for 3 h before inguinal white adipose tissue was removed and put into ice-cold PBS without Ca 2+ /Mg 2+ . The fat pads were then cut into smaller pieces before 25–30 mg of tissue was put into 100 µl of Krebs-Ringer buffer (25 mM HEPES, 120 mM NaCl, 10 mM NaHCO 3 , 4 mM KH 2 PO 4 , 1 mM MgSO 4 , 0.75 mM CaCl 2 ) with 2% fatty acid free BSA and 5 mM glucose. The levels of released non-esterified fatty acids (NEFA) in the medium were then measured after 4 h in a 37 °C incubator (5% CO 2 ). Each condition had 8 technical replicates and the average value from the replicates is presented in Fig. 3d . The experiment was repeated three times using two male and two female mice each time; fat pads from mice of the same sex were pooled. NEFAs were analyzed using an ABX Pentra 400 instrument (Horiba Medical, Irvine, California, USA) and concentrations were determined by colorimetry with Fujifilm NEFA-HR(2) (ref 43491795 (R1) and 436-91995 (R2)). Fujifilm NEFA standard (Ref 27077000) was used as calibrator and Seronorm™ Lipid (Ref# 100205, Sero AS, Billingstad, Norway) was used as control.

Immunofluorescence

Cryo-sections: Standard protocols for immunostaining were applied. In brief, adipose tissues were harvested from euthanized mice as described above and immersed in 4% formaldehyde solution (Histolab) at 4 °C for 4–12 h. Thereafter, the tissues were transferred to 20–30% sucrose/PBS solution at 4 °C for at least 24 h. For cryo-sectioning, the tissues were embedded into cryo-medium (NEG50) and sectioned on a CryoStat NX70 (Thermo Fisher Scientific) into 14–50 µm thick sections, collected on SuperFrost Plus glass slides (Metzler Gläser) and stored at −80 °C until further processing. Of note, for sectioning of adipose tissues the biopsy and knife of the cryostat were cooled down to at least −30 °C. For staining, the tissue sections were allowed to dry at RT for about 15 min and were briefly washed with PBS. Thereafter, the sections were treated with blocking buffer (Serum-free protein blocking solution, DAKO) supplemented with 0.2% Triton X-100 (Sigma Aldrich). Then, the tissue sections were incubated with primary antibodies, diluted in blocking buffer supplemented with 0.2% Triton X-100, overnight at 4 °C. This was followed by a brief wash with PBS-T (PBS supplemented with 0.1% Tween-20) and incubation with fluorescently conjugated secondary antibodies diluted in blocking buffer at RT for 1 h. Primary and secondary antibodies were used according to the manufacturers’ recommendations (see Supplementary table 7 ). For nuclear (DNA) stain, Hoechst 33342 was used at 10 µg/ml together with the secondary antibodies. Sections were mounted with ProLong Gold mounting medium (Thermo Fisher Scientific). Micrographs were acquired using a Leica TCS SP8 confocal microscope with LAS X software (version: 3.5.7.23225, Leica Microsystems) and graphically processed and adjusted individually for brightness and contrast using ImageJ/FIJI software 79 for optimal visualization. All images are presented as maximum-intensity projections of acquired z-stacks covering the thickness of the section.

Whole mount: Adipose tissues were harvested and processed as described above. After fixation and sucrose treatment (see above), small pieces of less than 1 mm thickness were cut and washed in PBS-T buffer at RT for 6-8 h with end-over-end rotation. Thereafter, the tissues were transferred into blocking buffer supplemented with 0.5% Triton X-100 for over-night incubation at 4 °C with end-over-end rotation. Primary antibodies were diluted in blocking buffer supplemented with 0.5% Triton X-100 and incubated with the tissues for 72–96 h at 4 °C with end-over-end rotation. Thereafter, the tissues were washed with PBS-T for 6–8 h at 4 °C with end-over-end rotation. Secondary antibodies were diluted in blocking buffer, supplemented with 0.5% Triton X-100 and 10 µg/ml Hoechst 33342, and incubated with tissues at 4 °C overnight with end-over-end rotation. Before mounting, tissues were washed with PBS-T for 6–8 h at 4 °C and then mounted on Leica frame slides (1.4 µm PET, Leica Microsystems) using ProLong Gold mounting medium. Micrographs were acquired using a Leica TCS SP8 confocal microscope and graphically handled as described above.

Raw sequence data processing Smartseq2 protocol

Single-cell cDNA library samples from each 384-well plate were pooled and sequenced on a HiSeq 3000 sequencer (Illumina), with one flow-cell lane per plate. In total 11 plates of cells from eight (five female and three male) mice were used for this study. The samples were then analyzed using standard parameters of the Illumina pipeline (bcl2fastq) using Nextera index parameters. Individual fastq-files were mapped to the mouse reference genome (mm10-build94) with the STAR aligner, and raw reads for each gene were quantified using Salmon. As technical controls, 92 ERCC RNAs were spiked into the lysis buffer and included in the mapping. Raw read counts were then imported into R with the tximport-package and combined into one expression matrix showing raw counts per gene for each single cell. The R package biomaRt was used to convert Ensembl IDs to gene names and to retrieve genomic location and gene biotype.

The SingleCellExperiment package in R was then applied for downstream processing of the expression matrix. First, cells with fewer than 150,000 total reads or more than 5,000,000 reads were filtered out. Cells that had fewer than 1000 genes expressed, or a high percentage of reads mapped to the mitochondrial genome (>17.5%) or to the ERCCs (>20%), were removed from the dataset. Additionally, genes that had fewer than 10 reads in no more than three cells were removed. A final filter step was added by applying the gene.vs.molecule.cell.filtering function in the pathway and gene set overdispersion analysis (Pagoda2) package, removing cells that were determined to be outliers in their gene vs total counts ratio.

For bulk RNA-seq, samples with fewer than 250,000 total reads were filtered out. Samples that had fewer than 8000 genes expressed, or a high percentage of reads mapped to the mitochondrial genome (>15%) or to the ERCCs (>1%), were removed from the dataset.
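The cut-offs given above for single cells and for bulk samples can be collected into one small filter. The sketch below is an illustration in Python rather than the R code used in the study; the class and function names are hypothetical, and it assumes that failing any single criterion is enough to remove a cell or sample. The gene-level filter and the Pagoda2 outlier step are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QCThresholds:
    min_reads: int
    max_reads: Optional[int]  # None means no upper limit is stated
    min_genes: int
    max_pct_mito: float
    max_pct_ercc: float


# Values taken from the text.
SINGLE_CELL = QCThresholds(150_000, 5_000_000, 1000, 17.5, 20.0)
BULK = QCThresholds(250_000, None, 8000, 15.0, 1.0)


def passes_qc(total_reads, genes_detected, pct_mito, pct_ercc, thresholds):
    """Return True if a cell or sample survives all read-level filters.

    Assumption: any single failed criterion removes the cell or sample.
    """
    if total_reads < thresholds.min_reads:
        return False
    if thresholds.max_reads is not None and total_reads > thresholds.max_reads:
        return False
    if genes_detected < thresholds.min_genes:
        return False
    if pct_mito > thresholds.max_pct_mito or pct_ercc > thresholds.max_pct_ercc:
        return False
    return True


# Hypothetical cell: 600,000 reads, 3200 genes, 6% mitochondrial, 2% ERCC.
keep_cell = passes_qc(600_000, 3200, 6.0, 2.0, SINGLE_CELL)
```

Keeping the two threshold sets as data makes it easy to see that the bulk criteria are stricter on gene count and ERCC fraction but carry no stated upper read limit.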

The Seurat package was then applied to perform principal component analysis with the RunPCA function, using variance-stabilizing transformation as the selection method for finding the 3000 most variable genes. The clustering of cells was performed by first applying the FindNeighbors function using 14 principal components, followed by the FindClusters function (resolution = 1.1). For dimensional reduction visualization, UMAP projection was applied using the Seurat package, with the top 2000 overdispersed genes as input. The few cells that were projected next to other clusters in the UMAP, away from most cells in their own cluster, were removed, since this indicated contamination by other cell classes in those samples. Before removal, the contamination in these specific samples was verified by calculating a ratio between the percentage of read counts (of total read counts in the sample) belonging to marker genes of the cells in its close surroundings and the percentage belonging to marker genes for its own cluster. For example, if an endothelial cell was projected into the pericyte population in the UMAP, the percentage of pericyte marker reads was divided by the percentage of endothelial marker reads. This ratio was significantly higher in all the misplaced endothelial cells than in endothelial cells located within their own cluster. A second round of clustering after this cleaning step was performed as described above.
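The contamination ratio described above can be written out explicitly. The sketch below is a minimal illustration, not the authors' R code; the function names are hypothetical, and the marker gene sets and counts in the example are placeholders chosen for illustration, since the marker lists actually used are not given in this section.

```python
def marker_percentage(counts, markers):
    """Percentage of a cell's total reads that fall on the given marker genes."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no reads")
    return 100.0 * sum(counts.get(gene, 0) for gene in markers) / total


def contamination_ratio(counts, neighbour_markers, own_markers):
    """Neighbouring-cluster marker percentage divided by own-cluster marker percentage.

    A higher value suggests more signal from the neighbouring cell class.
    """
    own = marker_percentage(counts, own_markers)
    if own == 0:
        return float("inf")
    return marker_percentage(counts, neighbour_markers) / own


# Illustrative marker sets (assumptions, not the lists used in the study).
ENDOTHELIAL_MARKERS = {"Pecam1", "Cdh5"}
PERICYTE_MARKERS = {"Pdgfrb", "Kcnj8"}

# Hypothetical endothelial cell that was projected near the pericyte cluster.
cell_counts = {"Pecam1": 80, "Cdh5": 20, "Pdgfrb": 40, "Kcnj8": 10, "Actb": 850}
ratio = contamination_ratio(cell_counts, PERICYTE_MARKERS, ENDOTHELIAL_MARKERS)
```

In this example 5% of reads fall on the pericyte markers and 10% on the endothelial markers, so the ratio is 0.5; the comparison of such ratios between misplaced and correctly placed cells is what the text uses to confirm contamination.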

Raw sequence data processing Smartseq3 protocol

Raw fastq-files were collected, sequencing adapters were then trimmed from the remaining libraries using NGmerge (v0.3) 80 and read quality for all libraries was assessed using FastQC (v0.11.9) 81 , Qualimap (v2.2.2d) 82 and samtools stats (v1.15) 83 . Quality control (QC) metrics for Qualimap and samtools were based on a STAR (v2.7.10a) 84 alignment against the mouse genome (GRCm39, Gencode vM32). UMI information was evaluated with UMI tools 85 . Next, QC metrics were summarized using MultiQC (v1.12) 86 . A mouse transcriptome index consisting of cDNA and ncRNA entries from Gencode (vM32) was generated and reads were mapped to the index and quantified using Salmon (v1.9.0) 87 . The bioinformatics workflow was organized using Nextflow workflow management system (v22.04.5) 88 and Bioconda software management tool 89 .

The raw count matrix including the total reads from both UMI and non-UMI containing sequences was imported into R for further downstream processing. This included removing genes that had fewer than 5 reads in no more than three cells. Samples that had fewer than 8500 genes detected in FACS-sorted ASC samples (castration/ovariectomy study) were removed, whereas samples in the in vitro cultivated crude SVF group with fewer than 10,500 genes detected were removed. Overall, the number of samples from the castration/ovariectomy study that passed this criterion was 58, including eight samples for all groups except ASC1 cells from male control (n = 7) and female ovariectomized mice (n = 3). The average total read count was 690,000 reads for these samples. The number of samples from the in vitro cultivated crude SVF study that passed this criterion was 29, including nine samples in both male and female iWAT groups, six samples for male pgWAT and five samples for female pgWAT. The average total read count was 620,000 reads for these samples. Of note, low read counts from the Y-chromosome genes Ddx3y (<30 counts) and Eif2s3y (<20 counts) were detected in female control samples, which indicates weak contamination by male mRNA. However, the relatively low counts of the male-specific genes Sult1e1 and C7 in comparison to the male control samples suggest that the samples are “clean” and that the results can be trusted. Low levels of Ddx3y and Eif2s3y (<10 counts) were also detected in the in vitro cultivated crude SVF group from female pgWAT samples; this low grade of contamination most likely did not impact the conclusions made in this paper.

Differential expression and pathway analysis

For pathway analysis of transcriptomic data, QIAGEN’s Ingenuity Pathway Analysis (IPA) application was used. All canonical pathways, diseases, and molecular functions displayed in this paper were significantly enriched; the threshold for the adjusted p-value was set to below 0.05. The list of genes used as input for the IPA application was derived from the differential expression analysis described below.

For differential expression analysis, the pseudo-bulk edgeR-LRT method was used (R package edgeR, v3.22.5) with raw counts as input. A gene was classified as significantly differentially expressed if it generated a Benjamini-Hochberg adjusted p-value for multiple testing below 0.05 and if it had a raw read count above 600 reads in at least 2 pseudo samples with a fold change of more than 2.6. These settings were used for generating differentially expressed genes between male and female ASC. The pseudo-bulk method grouped cells from the same mouse within each sex, resulting in n = 3 for male cells and n = 4 for female cells. The Seurat FindMarkers function was applied using the Wilcoxon rank sum test with raw read counts as input for generating marker genes for clusters and between male and female EC cells. A gene was classified as significantly differentially expressed if it generated a Bonferroni adjusted p-value for multiple testing below 0.05 and if it was expressed by at least half of the cells in the group with a fold change of more than 2.6 (min.pct = 0.5, logfc.threshold = 1.4). Slightly different settings were used for generating DE-genes specific for adipose ASC in comparison to heart and skeletal muscle fibroblasts (min.pct = 0.25, logfc.threshold = 0.7). For FACS-sorted bulk RNA-seq samples, DE-genes between male and female ASC cells were calculated using DESeq2. A gene was classified as significantly differentially expressed if it generated an adjusted p-value for multiple testing below 0.05 and if it was expressed by at least half of the samples. Both DPP4+ and DPP4- cell populations were used for this analysis. For pgWAT, 14 samples in both males and females passed the filtration criteria mentioned previously (seven DPP4+ and seven DPP4-), whereas for iWAT, the female and male groups consisted of 14 samples (seven DPP4+ and seven DPP4-) and 15 samples (seven DPP4+ and eight DPP4-), respectively. The gene Gm20400 was also sexually dimorphic under these settings; however, it was not included in Fig. 2c because it is a long non-coding RNA and there is limited knowledge of its function.
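To make the significance criteria above concrete, the following Python sketch implements the Benjamini-Hochberg adjustment and a combined p-value and fold-change filter. It is an illustration rather than the edgeR, Seurat or DESeq2 code used in the study. The function names are hypothetical, the fold-change test is assumed to be symmetric (up or down), the raw read count filter is omitted, and the comparison with `logfc.threshold = 1.4` assumes that the threshold is expressed on a log2 scale.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(padj, log2_fold_change, alpha=0.05, min_fold_change=2.6):
    """Adjusted p-value below alpha AND fold change above the cutoff (either direction)."""
    return padj < alpha and abs(log2_fold_change) > math.log2(min_fold_change)


# A fold change of 2.6 corresponds to roughly 1.38 on a log2 scale,
# which rounds to the logfc.threshold of 1.4 quoted in the text.
print(round(math.log2(2.6), 2))

padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 3) for p in padj])
```

In practice the adjustment would be applied to the p-values of all tested genes at once, since the correction depends on the total number of tests.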

For FACS-sorted bulk RNA-seq samples from the castration/ovariectomy study, DE-genes between male and female ASC cells from iWAT were calculated with DESeq2. For this analysis, only the sexually dimorphic genes that were DE in the initial FACS-sorted bulk RNA-seq from iWAT/scRNA-seq comparison (36 + 4 genes, Fig. 2c ) were analyzed. A gene was classified as significantly differentially expressed if it generated an adjusted p-value for multiple testing below 0.05 and if it had a fold change difference of at least 2 in the comparison of the control groups. To validate whether a gene was impacted by sex hormones, it also needed to show no statistically significant difference in the DE-analysis of the castration/female control and ovariectomy/male control comparisons.

For bulk RNA-seq samples of cultivated crude SVF cells (Supplementary Fig. 8 ), a gene was defined as enriched in either pgWAT or iWAT samples, or as enriched in female pgWAT samples compared to a group of iWAT samples and male pgWAT samples, if the DE-analysis using DESeq2 generated a p-value below 0.05. See source data for Supplementary Fig. 8g and k .

Other bioinformatic analyses

Pearson’s r values were calculated using the cor function in the R stats package with the scaled average expression values as input variables (the AverageExpression function in Seurat was applied) for marker genes or genes of a specific gene type if indicated. The R package corrplot was used to visualize the results, and the groups were ordered according to the hierarchical clustering method “complete”, with blue lines displaying the results of that clustering. Dotplots were generated using Seurat’s DotPlot function with normalized values (normalization method: “LogNormalize”, scale.factor = 500,000) as input. For data downloaded from external sources, the log normalized values in the provided R-object were used. The scaling option in the DotPlot function was turned on, resulting in scaling of the average log normalized values. This means that the scaled values will always be plus or minus 0.7 (√2/2 ≈ 0.707) for all comparisons between two groups: the group with the highest expression will have a value of 0.7 and the group with the lowest expression will have a value of −0.7. This also means that the color intensity indicating the size of the fold difference is misleading; it is therefore better to view the result as an indicator of which of the two groups has the highest average expression. The statistically significant enriched Hox genes in our scRNA-seq data with a fold change above 2.6 in females are Hoxa9, Hoxa10, Hoxc10, Hoxa11os and Hoxa11, and in males Hoxb5, Hoxc5, Hoxc6 and Hoxc8.
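The ±0.7 behaviour of two-group scaling follows directly from z-scoring with the sample standard deviation (n − 1 in the denominator, as in R's `scale`). For two values a and b with mean m = (a + b)/2, the sample standard deviation is |a − b|/√2, so each scaled value is ±(|a − b|/2)/(|a − b|/√2) = ±√2/2. The short Python check below illustrates this; the helper name is hypothetical and it is not a call into Seurat.

```python
import math
import statistics


def zscore(values):
    """Center and scale using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# A large and a tiny difference between two groups scale to the same values.
large_gap = zscore([1.0, 5.0])
small_gap = zscore([2.0, 2.1])

print([round(z, 4) for z in large_gap])
print([round(z, 4) for z in small_gap])
print(round(math.sqrt(2) / 2, 4))
```

Both pairs scale to the same two values regardless of how far apart the group averages are, which is why the color scale of a two-group scaled dotplot carries no information about the size of the difference.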

For validation of mouse data, we used human single-nucleus RNA sequencing data published by Emont et al. 41 . The human adipose single-nucleus raw count data (10x Chromium v3) and metadata were downloaded from the Broad Institute’s Single Cell Portal (link: https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue ). Both “human_ASPCs.rds” and “human_adipocytes.rds” were used for our analysis. The data were generated from subcutaneous (subc) adipose tissue from ten female and three male donors; for visceral (visc) adipose tissue, the data were derived from seven female and three male donors. The number of cells per cell type and fat depot from Emont et al. 41 is presented in Supplementary Table 6 .

For comparison of ASC to fibroblasts identified in heart and skeletal muscle, raw FASTQ files were provided by Muhl et al. 25 , and raw sequence processing was performed as described above. The number of cells per cluster from Muhl et al. 25 is presented in Supplementary Table 2 .

For comparison to the Tabula Muris data 38 , the “facs_Fat_seurat_tiss.Robj” file was downloaded from the Human Cell Atlas Data Portal (link: https://data.humancellatlas.org/explore/projects/e0009214-c0a0-4a7b-96e2-d6a83e966ce0/project-matrices ). Before generating scaled dotplots in Fig. 2 , genes that had fewer than 10 reads in no more than three cells were removed from the count matrix of each fat depot’s MSC, and raw counts were log normalized to the total counts in each cell using the NormalizeData function in Seurat. The number of cells per adipose depot and sex from Tabula Muris 38 is presented in Supplementary Table 5 .

For comparison to fibroblasts from Buechler et al. 29 , the mouse steady-state atlas was used; the data were downloaded from https://www.fibroxplorer.com/download . The original source from which the data were derived can be seen in the Supplementary Table. The number of cells per tissue from Buechler et al. 29 is presented in Supplementary Table 3 .

For our scRNA-seq data from pgWAT the sex and animal metadata are displayed in Supplementary Table 1 .

Statistics and reproducibility

Barplots for scRNA-seq and bulk RNA-seq data display the mean ± SEM of the raw read counts. Figure 4d displays the mean ± SEM and the barplots in Fig. 4e display the mean ± standard deviation. In Fig. 4e , the number of dots represents the number of biological replicates, and for the gene expression data in Fig. 4d , each dot represents a technical replicate derived from three experiments. Statistics in Fig. 3d, e were calculated with two-way ANOVA using Sidak’s multiple comparisons test. Statistics in Fig. 4b were calculated with a two-sided unpaired t-test (Prism) for the data points collected at the final time point. In Fig. 4d, e , statistics were calculated with a two-way ANOVA and a mixed-effects analysis, respectively, using Tukey’s multiple comparison test (Prism). For Supplementary Fig. 8a , statistics were calculated with a two-way ANOVA, using Tukey’s multiple comparison test (Prism). P-values adjusted for multiple testing were used (* P < 0.0332, ** P < 0.0021, *** P < 0.0002, and **** P < 0.0001). The statistics in Fig. 4d were based on the delta Ct-values using TBP as a housekeeping gene. Regarding imaging, if not further specified, all antibody immunofluorescence experiments were performed at least twice using identical or varying combinations of antibodies, obtaining similar results from tissue samples of at least two individual mice. The whole mount staining experiments were performed twice, analyzing tissue samples from two individual mice. For validation of sex-specific expression of NGFR, two additional female and two additional male littermates were analyzed. In Fig. 4c , representative images of the level of differentiation are displayed, and similar results were obtained in at least three independent experiments.
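As a small illustration of the summary statistics and significance annotation described above, the Python sketch below computes a mean ± SEM and maps an adjusted P-value to the star notation using the thresholds quoted in the text. The function names and the "ns" label for non-significant values are assumptions made for this example; the study itself used Prism.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def significance_stars(p_adjusted):
    """Map an adjusted P-value to stars using the thresholds quoted in the text."""
    if p_adjusted < 0.0001:
        return "****"
    if p_adjusted < 0.0002:
        return "***"
    if p_adjusted < 0.0021:
        return "**"
    if p_adjusted < 0.0332:
        return "*"
    return "ns"  # assumed label for non-significant comparisons


mean, sem = mean_sem([2.0, 4.0, 6.0])
print(round(mean, 3), round(sem, 3))
print(significance_stars(0.01), significance_stars(0.00005))
```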

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The RNA-seq raw data generated in this study have been deposited in NCBI’s Gene Expression Omnibus database under accession codes GSE273393 (scRNA-seq), GSE273413 (FACS_pgWAT_iWAT_ASC), GSE273407 (FACS_castovary_ASC), GSE272408 (bulk_adipocytes) and GSE273416 (in vitro_SVF). The scRNA-seq and bulk RNA-seq data of FACS-sorted ASC are available as a searchable database at https://betsholtzlab.org/Publications/WATstromalVascular/database.html . Source data are provided with this paper.

References

Oikonomou, E. K. & Antoniades, C. The role of adipose tissue in cardiovascular health and disease. Nat. Rev. Cardiol. 16 , 83–99 (2019).

Santoro, A., McGraw, T. E. & Kahn, B. B. Insulin action in adipocytes, adipose remodeling, and systemic effects. Cell Metab. 33 , 748–757 (2021).

Saxton, S. N., Clark, B. J., Withers, S. B., Eringa, E. C. & Heagerty, A. M. Mechanistic links between obesity, diabetes, and blood pressure: role of perivascular adipose tissue. Physiol. Rev. 99 , 1701–1763 (2019).

Wang, W. et al. Global Burden of Disease Study 2019 suggests that metabolic risk factors are the leading drivers of the burden of ischemic heart disease. Cell Metab. 33 , 1943–56 e2 (2021).

Sun, H. et al. IDF diabetes atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin. Pr. 183 , 109119 (2022).

Rondini, E. A. & Granneman, J. G. Single cell approaches to address adipose tissue stromal cell heterogeneity. Biochem J. 477 , 583–600 (2020).

Corvera, S. Cellular heterogeneity in adipose tissues. Annu Rev. Physiol. 83 , 257–278 (2021).

Duerre, D. J. & Galmozzi, A. Deconstructing adipose tissue heterogeneity one cell at a time. Front Endocrinol. (Lausanne) 13 , 847291 (2022).

Bilal, M. et al. Fate of adipocyte progenitors during adipogenesis in mice fed a high-fat diet. Mol. Metab. 54 , 101328 (2021).

Jeffery, E. et al. The adipose tissue microenvironment regulates depot-specific adipogenesis in obesity. Cell Metab. 24 , 142–150 (2016).

Agrawal et al. BMI-adjusted adipose tissue volumes exhibit depot-specific and divergent associations with cardiometabolic diseases. Nat. Commun. 14 , 266 (2023).

Karastergiou, K., Smith, S. R., Greenberg, A. S. & Fried, S. K. Sex differences in human adipose tissues - the biology of pear shape. Biol. Sex. Differ. 3 , 13 (2012).

Chang, E., Varghese, M. & Singer, K. Gender and sex differences in adipose tissue. Curr. Diab Rep. 18 , 69 (2018).

Tramunt, B. et al. Sex differences in metabolic regulation and diabetes susceptibility. Diabetologia 63 , 453–461 (2020).

Maric, I. et al. Sex and species differences in the development of diet-induced obesity and metabolic disturbances in rodents. Front Nutr. 9 , 828522 (2022).

Casimiro, I., Stull, N. D., Tersey, S. A. & Mirmira, R. G. Phenotypic sexual dimorphism in response to dietary fat manipulation in C57BL/6J mice. J. Diabetes Complications 35 , 107795 (2021).

Fernández-Real, J. M. & Moreno-Navarrete, J. M. Adipocyte Differentiation (ed. Symonds, M. E.) (Springer Science, New York, 2012).

Burl, R. B. et al. Deconstructing adipogenesis induced by beta3-adrenergic receptor activation with single-cell expression profiling. Cell Metab. 28 , 300–9 e4 (2018).

Hepler, C. et al. Identification of functionally distinct fibro-inflammatory and adipogenic stromal subpopulations in visceral adipose tissue of adult mice. Elife 7 , e39636 (2018).

Schwalie, P. C. et al. A stromal cell population that inhibits adipogenesis in mammalian fat depots. Nature 559 , 103–108 (2018).

Merrick, D. et al. Identification of a mesenchymal progenitor cell hierarchy in adipose tissue. Science 364 , eaav2501 (2019).

Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9 , 171–181 (2014).

Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184 , 3573–87 e29 (2021).

Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the brain vasculature. Nature 554 , 475–480 (2018).

Muhl, L. et al. Single-cell analysis uncovers fibroblast heterogeneity and criteria for fibroblast and mural cell identification and discrimination. Nat. Commun. 11 , 3953 (2020).

Muhl, L. et al. The SARS-CoV-2 receptor ACE2 is expressed in mouse pericytes but not endothelial cells: Implications for COVID-19 vascular research. Stem Cell Rep. 17 , 1089–1104 (2022).

Muhl, L. et al. A single-cell transcriptomic inventory of murine smooth muscle cells. Dev. Cell 57 , 2426–43 e6 (2022).

The Matrisome Project [Internet]. Available from: http://matrisomeproject.mit.edu/ .

Buechler, M. B. et al. Cross-tissue organization of the fibroblast lineage. Nature 593 , 575–579 (2021).

Dell’Orso, S. et al. Single cell analysis of adult mouse skeletal muscle stem cells in homeostatic and regenerative conditions. Development 146 , dev174177 (2019).

Scott, R. W., Arostegui, M., Schweitzer, R., Rossi, F. M. V. & Underhill, T. M. Hic1 defines quiescent mesenchymal progenitor subpopulations with distinct functions and fates in skeletal muscle regeneration. Cell Stem Cell 25 , 797–813 e9 (2019).

Soliman, H. et al. Pathogenic potential of Hic1-expressing cardiac stromal progenitors. Cell Stem Cell 26 , 459–461 (2020).

Shao, M. et al. De novo adipocyte differentiation from Pdgfrbeta(+) preadipocytes protects against pathologic visceral adipose expansion in obesity. Nat. Commun. 9 , 890 (2018).

Schoettl, T., Fischer, I. P. & Ussar, S. Heterogeneity of adipose tissue in development and metabolic function. J. Exp. Biol. 221 , jeb162958 (2018).

Wang, W. & Seale, P. Control of brown and beige fat development. Nat. Rev. Mol. Cell Biol. 17 , 691–702 (2016).

Lendahl, U., Muhl, L. & Betsholtz, C. Identification, discrimination and heterogeneity of fibroblasts. Nat. Commun. 13 , 3409 (2022).

Ehrlund, A. et al. The cell-type specific transcriptome in human adipose tissue and influence of obesity on adipocyte progenitors. Sci. Data 4 , 170164 (2017).

Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562 , 367–372 (2018).

Mueller, J. W., Gilligan, L. C., Idkowiak, J., Arlt, W. & Foster, P. A. The regulation of steroid action by sulfation and desulfation. Endocr. Rev. 36 , 526–563 (2015).

Vandenberg, L. N., Schaeberle, C. M., Rubin, B. S., Sonnenschein, C. & Soto, A. M. The male mammary gland: a target for the xenoestrogen bisphenol A. Reprod. Toxicol. 37 , 15–23 (2013).

Emont, M. P. et al. A single-cell atlas of human and mouse white adipose tissue. Nature 603 , 926–933 (2022).

Tyurin-Kuzmin, P. A. et al. Angiotensin receptor subtypes regulate adipose tissue renewal and remodelling. FEBS J. 287 , 1076–1087 (2020).

Goossens, G. H., Blaak, E. E., Arner, P., Saris, W. H. & van Baak, M. A. Angiotensin II: a hormone that affects lipid metabolism in adipose tissue. Int J. Obes. (Lond.) 31 , 382–384 (2007).

Stechschulte, L. A. et al. FKBP51 null mice are resistant to diet-induced obesity and the ppargamma agonist rosiglitazone. Endocrinology 157 , 3888–3900 (2016).

Cividini, F. et al. Ncor2/PPARalpha-dependent upregulation of mcub in the type 2 diabetic heart impacts cardiac metabolic flexibility and function. Diabetes 70 , 665–679 (2021).

Zillessen, P. et al. Metabolic role of dipeptidyl peptidase 4 (DPP4) in primary human (pre)adipocytes. Sci. Rep. 6 , 23074 (2016).

Takada, I., Kouzmenko, A. P. & Kato, S. Wnt and PPARgamma signaling in osteoblastogenesis and adipogenesis. Nat. Rev. Rheumatol. 5 , 442–447 (2009).

Jung, B., Arnold, T. D., Raschperger, E., Gaengel, K. & Betsholtz, C. Visualization of vascular mural cells in developing brain using genetically labeled transgenic reporter mice. J. Cereb. Blood Flow. Metab. 38 , 456–468 (2018).

Grant, R. I. et al. Organizational hierarchy and structural diversity of microvascular pericytes in adult mouse cortex. J. Cereb. Blood Flow. Metab. 39 , 411–425 (2019).

Hartmann, D. A. et al. Brain capillary pericytes exert a substantial but slow influence on blood flow. Nat. Neurosci. 24 , 633–645 (2021).

Cattaneo, P. et al. Parallel lineage-tracing studies establish fibroblasts as the prevailing in vivo adipocyte progenitor. Cell Rep. 30 , 571–82 e2 (2020).

Lee, Y. H., Petkova, A. P., Mottillo, E. P. & Granneman, J. G. In vivo identification of bipotential adipocyte progenitors recruited by beta3-adrenoceptor activation and high-fat feeding. Cell Metab. 15 , 480–491 (2012).

Tran, K. V. et al. The vascular endothelium of the adipose tissue gives rise to both white and brown fat cells. Cell Metab. 15 , 222–229 (2012).

Tang, W. et al. White fat progenitor cells reside in the adipose vasculature. Science 322 , 583–586 (2008).

Vishvanath, L. et al. Pdgfrbeta+ mural preadipocytes contribute to adipocyte hyperplasia induced by high-fat-diet feeding and prolonged cold exposure in adult mice. Cell Metab. 23 , 350–359 (2016).

Stefkovich, M., Traynor, S., Cheng, L., Merrick, D. & Seale, P. Dpp4+ interstitial progenitor cells contribute to basal and high fat diet-induced adipogenesis. Mol. Metab. 54 , 101357 (2021).

Winham, S. J. & Mielke, M. M. What about sex? Nat. Metab. 3 , 1586–1588 (2021).

InterAct Consortium et al. Long-term risk of incident type 2 diabetes and measures of overall and regional obesity: the EPIC-InterAct case-cohort study. PLoS Med 9 , e1001230 (2012).

Schutten, M. T., Houben, A. J., de Leeuw, P. W. & Stehouwer, C. D. The link between adipose tissue renin-angiotensin-aldosterone system signaling and obesity-associated hypertension. Physiol. (Bethesda) 32 , 197–209 (2017).

Frigolet, M. E., Torres, N. & Tovar, A. R. The renin-angiotensin system in adipose tissue and its metabolic consequences during obesity. J. Nutr. Biochem 24 , 2003–2015 (2013).

Single-cell RNAseq databases from Betsholtz lab [Internet]. Available from: https://betsholtzlab.org/Publications/WATstromalVascular/database.html .

Frayn, K. N. & Karpe, F. Regulation of human subcutaneous adipose tissue blood flow. Int J. Obes. (Lond.) 38 , 1019–1026 (2014).

Longden, T. A., Zhao, G., Hariharan, A. & Lederer, W. J. Pericytes and the control of blood flow in brain and heart. Annu Rev. Physiol. 85 , 137–164 (2023).

Chusyd, D. E., Wang, D., Huffman, D. M. & Nagy, T. R. Relationships between rodent white adipose fat pads and human white adipose fat depots. Front Nutr. 3 , 10 (2016).

Gesta, S. et al. Evidence for a role of developmental genes in the origin of obesity and body fat distribution. Proc. Natl Acad. Sci. USA 103 , 6676–6681 (2006).

Gesta, S., Tseng, Y. H. & Kahn, C. R. Developmental origin of fat: tracking obesity to its source. Cell 131 , 242–256 (2007).

Vohl, M. C. et al. A survey of genes differentially expressed in subcutaneous and visceral adipose tissue in men. Obes. Res 12 , 1217–1222 (2004).

Tchkonia, T. et al. Identification of depot-specific human fat cell progenitors through distinct expression profiles and developmental gene patterns. Am. J. Physiol. Endocrinol. Metab. 292 , E298–E307 (2007).

Karastergiou, K. et al. Distinct developmental signatures of human abdominal and gluteal subcutaneous adipose tissue depots. J. Clin. Endocrinol. Metab. 98 , 362–371 (2013).

Cantile, M., Procino, A., D’Armiento, M., Cindolo, L. & Cillo, C. HOX gene network is involved in the transcriptional regulation of in vivo human adipogenesis. J. Cell Physiol. 194 , 225–236 (2003).

Brune, J. E. et al. Fat depot-specific expression of HOXC9 and HOXC10 may contribute to adverse fat distribution and related metabolic traits. Obes. (Silver Spring) 24 , 51–59 (2016).

Holm, A., Heumann, T. & Augustin, H. G. Microvascular mural cell organotypic heterogeneity and functional plasticity. Trends Cell Biol. 28 , 302–316 (2018).

Armulik, A., Genove, G. & Betsholtz, C. Pericytes: developmental, physiological, and pathological perspectives, problems, and promises. Dev. Cell 21 , 193–215 (2011).

Corvera, S., Solivan-Rivera, J. & Yang Loureiro, Z. Angiogenesis in adipose tissue and obesity. Angiogenesis 25 , 439–453 (2022).

Hamilton, T. G., Klinghoffer, R. A., Corrin, P. D. & Soriano, P. Evolutionary divergence of platelet-derived growth factor alpha receptor signaling mechanisms. Mol. Cell Biol. 23 , 4013–4025 (2003).

Gerl, K. et al. Inducible glomerular erythropoietin production in the adult kidney. Kidney Int 88 , 1345–1355 (2015).

Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13 , 133–140 (2010).

Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38 , 708–714 (2020).

Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9 , 676–682 (2012).

Gaspar, J. M. NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors. BMC Bioinforma. 19 , 536 (2018).

FastQC: a quality control tool for high throughput sequence data. Available from: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2015).

Okonechnikov, K., Conesa, A. & Garcia-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32 , 292–294 (2016).

Li, H. et al. The sequence alignment/Map format and SAMtools. Bioinformatics 25 , 2078–2079 (2009).

Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 , 15–21 (2013).

Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 27 , 491–499 (2017).

Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32 , 3047–3048 (2016).

Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14 , 417–419 (2017).

Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35 , 316–319 (2017).

Gruning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15 , 475–476 (2018).

Acknowledgements

We would like to acknowledge the staff at the Single Cell Core Facility (SICOF) at Karolinska Institute and at the animal facilities at both Karolinska Institute and AstraZeneca for their work. We would also like to acknowledge the funding support from AstraZeneca.

Open access funding provided by Karolinska Institute.

Author information

These authors contributed equally: Christer Betsholtz, Xiao-Rong Peng.

Authors and Affiliations

Department of Medicine, Huddinge, Karolinska Institutet Campus Flemingsberg, Neo building, 141 52, Huddinge, Sweden

Martin Uhrbom, Lars Muhl, Guillem Genové, Jianping Liu, Sonja Gustafsson, Byambajav Buyandelger, Marie Jeansson & Christer Betsholtz

Bioscience Metabolism, Research and Early Development Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

Martin Uhrbom, Ida Alexandersson, Sandra Lunnerdal, Kasparas Petkevicius, Ingela Ahlstedt, Daniel Karlsson & Xiao-Rong Peng

Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, University of Bergen, 5020, Bergen, Norway

Bioscience Renal, Research and Early Development Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

Henrik Palmgren & Alex-Xianghua Zhou

Data Sciences & Quantitative Biology, Discovery Sciences, R&D AstraZeneca, Gothenburg, Sweden

Fredrik Karlsson

Bioscience Cardiovascular, Research and Early Development Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden

Leif Aasehaug

Department of Immunology, Genetics and Pathology, Uppsala University, 751 23, Uppsala, Sweden

Liqun He & Christer Betsholtz

Contributions

M.U. was responsible for hypothesis generation, conceptual design, experiment design and performance, data analysis, and manuscript preparation. L.M. was responsible for experiment design and performance, data analysis, and manuscript preparation. G.G. was responsible for experiment design and performance. J.L. was responsible for data generation, experiment design, and performance. H.P. was responsible for data analysis, experiment design, and performance. I.A. was responsible for experiment design and performance. F.K. was responsible for data analysis. A.X.Z. was responsible for experiment design and performance. S.L. was responsible for experiment design and performance. S.G. was responsible for data generation, experiment design, and performance. B.B. was responsible for data generation, experiment design, and performance. K.P. was responsible for experiment design and performance. I.A. was responsible for experiment design and performance. D.K. was responsible for experiment design and performance. L.A. was responsible for experiment design and performance. L.H. was responsible for data curation and data analysis. M.J. was responsible for the experiment design. C.B. carried out supervision of work, hypothesis generation, conceptual design, data analysis, and manuscript preparation. X.R.P. carried out supervision of work, hypothesis generation, conceptual design, data analysis, and manuscript preparation.

Corresponding authors

Correspondence to Martin Uhrbom , Christer Betsholtz or Xiao-Rong Peng .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer review file, description of additional supplementary files, supplementary data 1, supplementary data 2, supplementary movie 1, supplementary movie 2, reporting summary and source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Uhrbom, M., Muhl, L., Genové, G. et al. Adipose stem cells are sexually dimorphic cells with dual roles as preadipocytes and resident fibroblasts. Nat Commun 15 , 7643 (2024). https://doi.org/10.1038/s41467-024-51867-9

Received : 31 July 2023

Accepted : 20 August 2024

Published : 02 September 2024

DOI : https://doi.org/10.1038/s41467-024-51867-9

COMMENTS

Hypothesis in Machine Learning
Hypothesis Space (H): The hypothesis space is the set of all possible legal hypotheses. This is the set from which the machine learning algorithm determines the single best hypothesis to describe the target function or the outputs. ... (ML) is a subfield of artificial intelligence that specializes in developing algorithms that ...
What's a Hypothesis Space?
Our goal is to find a model that classifies objects as positive or negative. Applying Logistic Regression, we can get the models of the form: (1) which estimate the probability that the object at hand is positive. Each such model is called a hypothesis, while the set of all the hypotheses an algorithm can learn is known as its hypothesis space ...
What exactly is a hypothesis space in machine learning?
The hypothesis space has size $2^{2^4}=65536$: four binary features give $2^4=16$ distinct inputs, and each input can be mapped to one of two outcomes (0 or 1). The ML algorithm helps us to find one function, sometimes also referred to as a hypothesis, from this relatively large hypothesis space. References. A Few Useful Things to Know About ML
Hypothesis in Machine Learning
Where, Y: Range. m: Slope of the line which divided test data or changes in y divided by change in x. x: domain. c: intercept (constant) Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing the distribution of data as follows:. Now, assume we have some test data by which ML algorithms predict the outputs for input as follows:
What is a Hypothesis in Machine Learning?
A hypothesis is an explanation for something. It is a provisional idea, an educated guess that requires some evaluation. A good hypothesis is testable; it can be either true or false. In science, a hypothesis must be falsifiable, meaning that there exists a test whose outcome could mean that the hypothesis is not true.
Introduction to the Hypothesis Space and the Bias-Variance Tradeoff in
The hypothesis space in machine learning is a set of all possible models that can be used to explain a data distribution given the limitations of that space. A linear hypothesis space is limited to the set of all linear models. If the data distribution follows a non-linear distribution, the linear hypothesis space might not contain a model that ...
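A small sketch of this limitation, using made-up data generated from y = x^2: the best hypothesis in the linear space still has a large error, while a quadratic hypothesis fits exactly. The closed-form least-squares fit and the data points are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def mse(h, xs, ys):
    """Mean squared error of hypothesis h on the data."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data from a non-linear process: y = x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

m, c = fit_line(xs, ys)
linear_error = mse(lambda x: m * x + c, xs, ys)   # best the linear space can do
quadratic_error = mse(lambda x: x ** 2, xs, ys)   # a hypothesis outside that space
```

On this data the fitted line is flat (slope 0, intercept 2), so no amount of training within the linear space removes the remaining error.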
Machine Learning: The Basics
A hypothesis map that reads in features $x$ of a data point and delivers a prediction $\hat{y} = h(x)$ for its label $y$. $H$: A hypothesis space or model used by a ML method. The hypothesis space consists of different hypothesis maps $h: X \to Y$ between which the ML method has to choose.
Best Guesses: Understanding The Hypothesis in Machine Learning
In machine learning, the term 'hypothesis' can refer to two things. First, it can refer to a member of the hypothesis space, the set of all candidate functions that could be used to predict or answer a new instance. Second, it can refer to the traditional null and alternative hypotheses from statistics. Since machine learning works so closely ...
A Gentle Introduction to Computational Learning Theory
Whether a group of points can be shattered by an algorithm depends on the hypothesis space and the number of points. For example, a line (hypothesis space) can be used to shatter three points, but not four points. Any placement of three non-collinear points on a 2D plane with class labels 0 or 1 can be "correctly" split by label with a line, e.g ...
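The three-versus-four claim can be explored numerically. The sketch below tries every labeling of a point set and uses a plain perceptron to look for a separating line. The point sets and the epoch budget are assumptions. Running out of budget is not by itself a proof of non-separability, although for the XOR labelings of the square no separating line exists. The code also checks only one particular set of four points.

```python
from itertools import product


def perceptron_separates(points, labels, max_epochs=1000):
    """Try to find w, b with sign(w . x + b) matching every label in {-1, +1}."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified or on the line
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True          # a separating line was found
    return False                 # none found within the budget


def count_separable_labelings(points):
    """Count the labelings of the points that a line can realise."""
    return sum(
        perceptron_separates(points, labels)
        for labels in product([-1, 1], repeat=len(points))
    )


three_points = [(0, 0), (1, 0), (0, 1)]           # non-collinear
four_points = [(0, 0), (1, 0), (0, 1), (1, 1)]    # corners of a square

three_ok = count_separable_labelings(three_points)
four_ok = count_separable_labelings(four_points)
```

All 2^3 labelings of the three points are realised, so they are shattered. For the square, the two XOR-style labelings (diagonal corners sharing a label) are the ones that fail.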
Hypothesis Space
The hypothesis space is the set of hypotheses that can be described using this hypothesis language. Often, a learner has an implicit, built-in, hypothesis language, but in addition the set of hypotheses that can be produced can be restricted further by the user by specifying a language bias. This language bias defines a subset of the hypothesis ...
Machine Learning 1.1: Hypothesis Spaces
This video introduces the concept of a hypothesis space which is a restricted set of predictor functions that can be computed and manipulated efficiently giv...
PDF Machine Learning
Theorem: Consider some set of $m$ points in $\mathbb{R}^n$. Choose any one of the points as origin. Then the $m$ points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining points are linearly independent. Corollary: The VC dimension of the set of oriented hyperplanes in $\mathbb{R}^n$ is $n+1$.
machine learning
A hypothesis space/class is the set of functions that the learning algorithm considers when picking one function to minimize some risk/loss functional.. The capacity of a hypothesis space is a number or bound that quantifies the size (or richness) of the hypothesis space, i.e. the number (and type) of functions that can be represented by the hypothesis space.
ID3 Algorithm and Hypothesis space in Decision Tree Learning
Hypothesis Space Search by ID3: ID3 performs a hill-climbing search through the space of possible decision trees. This hypothesis space is the complete space of finite discrete-valued functions, and every such function is represented by at least one tree. ID3 maintains only a single current hypothesis (unlike Candidate-Elimination).
PDF CS 446 Machine Learning Fall 2016 Aug 25, 2016 Introduction to Machine
Our hypothesis space could be the set of simple conjunctions ($x_1 \wedge x_2$; $x_1 \wedge x_2 \wedge x_3$), or the set of m-of-n rules (m out of the n features are 1, etc.). Many other restrictions are also possible. Views of Learning: Learning is the removal of the remaining uncertainty
PDF CS 391L: Machine Learning: Inductive Classification
Hypothesis Space: Restrict learned functions a priori to a given hypothesis space, H, of functions h(x) that can be considered as definitions of c(x). For learning concepts on instances described by n discrete-valued features, consider the space of conjunctive hypotheses represented by a vector of n constraints
PDF CS534: Machine Learning
Hypothesis space. The space of all hypotheses that can, in principle, be output by a particular learning algorithm. Version Space. The space of all hypotheses in the hypothesis space that have not yet been ruled out by a training example. Training Sample (or Training Set or Training Data): a set of N training examples drawn according to P(x,y).
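The relationship between the two definitions can be sketched by brute force. The example assumes a small hypothesis space of conjunctive constraints over three binary features, where each constraint is 0, 1, or "?" (don't care); the training examples are invented for illustration.

```python
from itertools import product


def predicts_positive(hypothesis, x):
    """A conjunctive hypothesis accepts x when every constraint is satisfied."""
    return all(c == "?" or c == xi for c, xi in zip(hypothesis, x))


def version_space(n_features, examples):
    """Hypotheses in the space that no training example has ruled out."""
    hypothesis_space = product([0, 1, "?"], repeat=n_features)
    return [
        h for h in hypothesis_space
        if all(predicts_positive(h, x) == label for x, label in examples)
    ]


# Hypothetical training sample: (features, is_positive).
examples = [
    ((1, 0, 1), True),
    ((1, 1, 1), True),
    ((0, 0, 1), False),
]

vs = version_space(3, examples)
```

Starting from 27 candidate hypotheses, these three examples leave only `(1, "?", 1)` and `(1, "?", "?")` in the version space. Enumeration like this only works for tiny spaces; it is meant to show the definition, not a practical algorithm.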
PDF CS 391L: Machine Learning: Computational Learning Theory
Allows unlimited data and computational resources. PAC Model. Only requires learning a Probably Approximately Correct Concept: Learn a decent approximation most of the time. Requires polynomial sample complexity and computational complexity. Learning in the limit model is too strong.
How to calculate hypothesis space
This function takes $N$ binary inputs and outputs a single binary classification. With $N$ binary inputs, the size of the domain must be $2^N$. Then, I would think that for each of these possible $2^N$ instances there must be two hypotheses (one for each output). This would make the total number of hypotheses equal to $2 \times 2^N$.
Machine Learning Theory
The answer is very simple: we consider a hypothesis to be a new, effective one if it produces new labels/values on the dataset samples. The maximum number of distinct hypotheses (a.k.a. the maximum size of the restricted space) is then the maximum number of distinct labelings/values the dataset points can take.
[2403.03353] Hypothesis Spaces for Deep Learning
This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs). By treating a DNN as a function of two variables, the physical variable and parameter variable, we consider the primitive set of the DNNs for the parameter variable located in a set of the weight matrices and ...
Genetic algorithm: Hypothesis space search
As already understood from our illustrative example, it is clear that genetic algorithms employ a randomized beam search method to seek maximally fit hypotheses. In the hypothesis space search method, we can see that the gradient descent search in backpropagation moves smoothly from one hypothesis to ...
Sample Complexity for Finite Hypothesis Spaces
Fact: Every consistent learner outputs a hypothesis belonging to the version space. Therefore, we need to bound the number of examples needed to assure that the version space contains no unacceptable hypothesis.
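For a finite hypothesis space, the standard bound for consistent learners is m >= (1/epsilon) * (ln|H| + ln(1/delta)). It is assumed here to be the bound this passage leads up to, since the snippet does not state it. With at least m examples, the probability that the version space still contains a hypothesis with true error above epsilon is at most delta. A minimal sketch, with illustrative parameter values:

```python
import math


def sample_complexity(hypothesis_space_size, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite space.

    Uses m >= (ln|H| + ln(1/delta)) / epsilon, rounded up.
    """
    bound = (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


# Conjunctions over 10 binary features with 0 / 1 / don't-care constraints: |H| = 3^10.
m = sample_complexity(3 ** 10, epsilon=0.1, delta=0.05)
m_tighter = sample_complexity(3 ** 10, epsilon=0.05, delta=0.05)
```

The bound grows only logarithmically in |H| and in 1/delta but linearly in 1/epsilon, so halving the error tolerance roughly doubles the number of examples required.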

Hypothesis in Machine Learning

How does a Hypothesis work?

Hypothesis Space (H)

Hypothesis (h)

Hypothesis Formulation and Representation in Machine Learning

Hypothesis Evaluation:

Hypothesis Testing and Generalization:

Q. How does the training process use the hypothesis?

Q. How is the hypothesis’s accuracy assessed?

Q. What is Hypothesis testing?

Q. What distinguishes the null hypothesis from the alternative hypothesis in machine learning experiments?

What’s a Hypothesis Space?

1. Introduction

2. Hypothesis Spaces

2.1. Hypotheses and Assumptions

2.2. Regression

3.1. Expressivity vs. Interpretability

4. How to Choose the Hypothesis Space?

5. Conclusion

Supervised Learning

Programmathically

The Machine Learning Model as Hypothesis

The Hypothesis Space

The Data Generating Process

Independent and Identically Distributed Data

Overfitting and Underfitting

Bias Variance Tradeoff

Understanding Bias and Variance

The Bias Variance Decomposition

Best Guesses: Understanding The Hypothesis in Machine Learning

What Is a Hypothesis in Machine Learning?

Is This Any Different Than The Hypothesis In Statistics?

What Is The Difference Between The Alternative Hypothesis And The Null?

Example Code Performing Hypothesis Testing In Machine Learning

What Is The Difference Between The Biased And Unbiased Hypothesis Spaces?

Example of The Biased Hypothesis Space In Machine Learning

Example of the Unbiased Hypothesis Space In Machine Learning

Why Do We Restrict Hypothesis Space In Artificial Intelligence?

Hypothesis Space

Motivation and Background

Stack Exchange Network

What is the difference between hypothesis space and representational capacity?

How to calculate hypothesis space

Machine Learning Theory - Part 2: Generalization Bounds

Independently, and Identically Distributed

The Law of Large Numbers

Hoeffding’s inequality

Generalization Bound: 1st Attempt

Examining the Independence Assumption

The Symmetrization Lemma

The Growth Function

The VC-Dimension

The VC Generalization Bound