
In Bayesian inference, why are some terms dropped from the posterior predictive?





In Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution, he writes that the posterior predictive distribution is



$$
p(x \mid D) = \int p(x \mid \theta) \, p(\theta \mid D) \, d\theta
$$



where $D$ is the data on which the model is fit and $x$ is unseen data. What I don't understand is why the dependence on $D$ disappears in the first term in the integral. Using basic rules of probability, I would have expected:



$$
\begin{align}
p(a) &= \int p(a \mid c) \, p(c) \, dc
\\
p(a \mid b) &= \int p(a \mid c, b) \, p(c \mid b) \, dc
\\
&\downarrow
\\
p(x \mid D) &= \int \overbrace{p(x \mid \theta, D)}^{\star} \, p(\theta \mid D) \, d\theta
\end{align}
$$



Question: Why does the dependence on $D$ in term $\star$ disappear?




For what it's worth, I've seen this kind of formulation (dropping variables from conditionals) in other places. For example, in Ryan Adams's Bayesian Online Changepoint Detection, he writes the posterior predictive as



$$
p(x_{t+1} \mid r_t) = \int p(x_{t+1} \mid \theta) \, p(\theta \mid r_t, x_t) \, d\theta
$$



where again, since $D = \{x_t, r_t\}$, I would have expected



$$
p(x_{t+1} \mid x_t, r_t) = \int p(x_{t+1} \mid \theta, x_t, r_t) \, p(\theta \mid r_t, x_t) \, d\theta
$$
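For concreteness, the factored form is what gets evaluated in practice: the likelihood of the new point is averaged over posterior draws of $\theta$. Below is a minimal Monte Carlo sketch of that, assuming draws from $p(\theta \mid D)$ are already available and a Gaussian likelihood with known scale; all names and numbers are illustrative.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

sigma = 1.0                                 # known likelihood scale (assumed)
theta_samples = rng.normal(2.0, 0.3, 5000)  # stand-in for draws from p(theta | D)

# p(x | D) = E_{theta ~ p(theta | D)}[ p(x | theta) ], estimated by a sample average.
x_grid = np.linspace(-1.0, 5.0, 7)
pred_density = np.array([norm.pdf(x, loc=theta_samples, scale=sigma).mean()
                         for x in x_grid])
print(np.round(pred_density, 4))
```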










bayesian predictive-models inference posterior

asked Apr 2 at 16:04 by gwg




















2 Answers






This is based on the assumption that $x$ is conditionally independent of $D$, given $\theta$. This is a reasonable assumption in many cases, because all it says is that the training and testing data ($D$ and $x$, respectively) are independently generated from the same set of unknown parameters $\theta$. Given this independence assumption, $p(x \mid \theta, D) = p(x \mid \theta)$, and so the $D$ drops out of the more general form that you expected.



          In your second example, it seems that a similar independence assumption is being applied, but now (explicitly) across time. These assumptions may be explicitly stated elsewhere in the text, or they may be implicitly clear to anyone who is sufficiently familiar with the context of the problem (although that doesn't necessarily mean that in your particular examples - which I'm not familiar with - the authors were right to assume this familiarity).
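To spell out why the i.i.d. assumption removes $D$ from the conditional: if the observations $d_1, \dots, d_n$ in $D$ and the new point $x$ are i.i.d. given $\theta$ (the assumption above), the joint likelihood factorizes, so conditioning on $D$ changes nothing once $\theta$ is known:

$$
p(x, D \mid \theta) = p(x \mid \theta) \prod_{i=1}^{n} p(d_i \mid \theta)
\quad \Longrightarrow \quad
p(x \mid \theta, D) = \frac{p(x, D \mid \theta)}{p(D \mid \theta)} = p(x \mid \theta).
$$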






answered Apr 2 at 16:26 (edited Apr 2 at 17:27) by Ruben van Bergen
It's because $x$ is assumed to be independent of $D$ given $\theta$. In other words, all data are assumed to be i.i.d. from a normal distribution with parameters $\theta$. Once $\theta$ is taken into account using information from $D$, there is no more information that $D$ gives us about a new data point $x$. Therefore $p(x \mid \theta, D) = p(x \mid \theta)$.
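Here is a small simulation sketch of this point, assuming a Normal likelihood with known variance and a conjugate Normal prior on the mean $\theta$; the posterior and predictive used for comparison are the standard conjugate closed forms, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. Normal data D with known noise scale sigma; Normal(mu0, tau0^2) prior on the mean theta.
sigma, mu0, tau0 = 1.0, 0.0, 10.0
D = rng.normal(1.5, sigma, size=50)

# Conjugate posterior p(theta | D) = Normal(mu_n, tau_n^2).
tau_n2 = 1.0 / (1.0 / tau0**2 + len(D) / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + D.sum() / sigma**2)

# Two-stage sampling of the posterior predictive: theta ~ p(theta | D), then x ~ p(x | theta).
theta = rng.normal(mu_n, np.sqrt(tau_n2), size=100_000)
x_new = rng.normal(theta, sigma)

# Matches the closed-form predictive N(mu_n, tau_n^2 + sigma^2).
print(x_new.mean(), x_new.std())
print(mu_n, np.sqrt(tau_n2 + sigma**2))
```

Note that the draw of `x_new` never looks at `D` directly: everything `D` says about the next observation is carried by the posterior over `theta`, which is exactly the conditional independence being assumed.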






answered Apr 2 at 16:26 (edited Apr 2 at 16:55) by JP Trawinski
