Cornell Keynotes

AI Today: Current Trends in Generative AI Tech / Late-Summer Update

Episode Summary

In this third episode of our “Generative AI” series, Cornell Tech and SC Johnson College of Business professor Karan Girotra joins us once again to assess the current capabilities and business uses of generative AI tech and examine what's coming next — as well as what's not.

Episode Notes

With Cornell Tech and SC Johnson College of Business professor Karan Girotra, we will look closely at late-breaking technical advances in generative AI, including new video capabilities, autonomous agents and AI-enabled robotics, as well as the impending arrival of the next generation of models.

Plus, we’ll highlight how organizations in finance, health, education, media and manufacturing are using these technologies in clever ways. We’ll also chart a path for the next generation of use cases — ones that go beyond using assistants to enhance individual productivity.

What You'll Learn

The Cornell Keynotes podcast is brought to you by eCornell, which offers more than 200 online certificate programs to help professionals advance their careers and organizations. Karan Girotra is an author of three online programs:

Learn more about all of our generative AI certificate programs.

Follow Girotra on LinkedIn and X.

Did you enjoy this episode of the Cornell Keynotes podcast? Watch the Keynote.

Episode Transcription


 

Chris Wofford: In this third episode of our Generative AI series, Cornell Tech and SC Johnson College of Business professor Karan Girotra is joined by Professor Alexander "Sasha" Rush, an authority on natural language processing, machine learning, and open source development, which is what we're going to be covering today.

 

Chris Wofford: Our two professors get quite in depth on open source and discuss where the technology may be headed for business. It's fair to characterize this episode as quite in depth and next level to some degree. While it's suitable for people already immersed in the AI space, there's actually lots of intriguing and thought-provoking discussion for anyone interested in open source and AI.

 

Chris Wofford: So check out the episode notes for links to several of eCornell's online AI certificate programs from Cornell University, including those authored by our faculty guest, Karan Girotra. And now, here's the conversation with Karan and Sasha.

 

Karan Girotra: So today our topic, or what we want to talk about, is going to be open source in the context of large language models or, more broadly, generative AI. And the interesting thing about open source is that it's one aspect of AI that has everyone excited, from the developers, to the folks trying to build companies or find applications for these models, to the regulators.

 

Karan Girotra: Everybody thinks of open source as a solution to a lot of the challenges, and so everybody talks about it. That said, I think it's quite nuanced. Today Sasha will help us dive through that nuance, understand what is really new here, and see how it matters from a business and a technology point of view.

 

Karan Girotra: Sasha, let me start the conversation by simply asking you: what is open source? What does open source typically mean in this context or other contexts?

 

Sasha Rush: Yeah, thanks for having me, Karan. So I would say open source has kind of technical, formal definitions. But intuitively, people think of it as mostly free software that you have access to the internals of. And some of the famous examples like Linux are these kinds of collaboratively developed open projects that originally were done in small teams, but are now the thing powering a lot of computers around the world. The reason some of these questions are subtle is because a lot of it tends to revolve around the licensing behind the software itself and what you're allowed to do with it. And so open source can mean a lot of different things within that context, but also it depends on whether companies are able to use it, whether people are able to scale with it and how it's actually developed in practice. There's a wide range of different software projects that fall under different open source guidelines.

 

Karan Girotra: So when we think about it, there are potentially three axes here: how much you pay for it, what's going on internally, and what you can do with it. Those are kind of the three axes one could think about with open source. Now let's come to the context of large language models. What does open source really mean for large language models?

 

Sasha Rush: So one thing that's crazy is that no one really knows right now. We're in a really wild frontier, and people are kind of in the process of coming up with these definitions and formalizing it. But for most people, that matters less than what it actually means in terms of a community and what's being built. So I think when people talk about it informally, they're really talking about the fact that large companies and small distributed organizations are training large language models, or models for generative image processing, and releasing them to the public. Now, that allows people to basically have access to things that are getting close to the scale of OpenAI-style generative models. And that's been really exciting, to see what people are doing and building with them.

Now, the reason I say the definition is a little bit complicated is because the main models that we think of as the kind of leaders in this open source generative AI are models like the Llama model from Facebook, which under traditional definitions would really not be considered open source. They really just consist of a single file that has a bunch of mysterious values that we call weights, which allow you to generate text. In that sense, the source is kind of not available. But because you're able to do interesting things with this, and because you're able to build new and variant models, people are thinking of it within this realm as an open source kind of thing. And in fact, Facebook is explicitly using the term open source AI.

 

Karan Girotra: Let me try to understand that a little better from our business point of view. There are three aspects to that: How much are you paying for using it? How much do you know about the internals? And what can you do with it? So I guess OpenAI also has a free version, and Llama also has a free version. And for those who might not know, by OpenAI we're talking about the company that is behind the GPT class of models, powering your ChatGPT, and also powering some image generation models like DALL·E. I think many versions are available free, and I guess what I'm hearing from you is that how much you know about the internals, or how much of the internals are open, is where the distinction really is. What are the typical internals in a large language model? What are the different components in a large language model?

 

Sasha Rush: That's a good way of thinking about it. So OpenAI now has a free tier where you're able to send a request to their headquarters. They'll process that request, run their secret large language model, and send you back the answer. So I think no one really considers that within the realm of being open source. And despite the name of the company, people are thinking about that just as kind of a free-to-use access to the model. Where Llama gets a little more interesting is that they're releasing what are called the "weights" of the model. It's hard to describe what this is. It's a big, opaque file that they post on the internet. And if you download this file, you can get it to basically speak to you. You can send it some text and it will return some new text. Now, in that sense, it's kind of similar to being able to basically run on your computer the thing that you would run on OpenAI's server.
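To make that concrete, here is a minimal sketch, in Python, of what running an open-weights model on your own machine can look like. It assumes the Hugging Face transformers library, and the model name is illustrative only; access and licensing vary by model.

    # Minimal sketch: generating text locally from downloaded open weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative; subject to the model's license
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    prompt = "Explain in one sentence what open-weights models are."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=80)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Nothing in this sketch calls out to an external service; the weights file sits on your own disk.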

 

Where things start to get a little bit more interesting is you can do this secondary process, which people call fine tuning, where you start with the weights of the model that you were given, and then you feed in some more data. Maybe this data is proprietary or even private, and you basically update the weights of the model to now take in this new data. It's hard to do that with these fully closed systems without sending them the private data. You're kind of paying them to update the weights for you. There's also another thing you can do with these models that people are excited about, which is that you can use them to generate more data at a very large scale, which allows you to do other things like train your own new models on that data.

 

Again, this is something you may be able to do with some of the closed systems, but oftentimes the licensing of whether that's allowed or not is complex. It's also complex for some of the open models, but it's a little bit more lenient.
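As a rough illustration of the fine-tuning idea just described, here is a simplified training loop that updates the weights on data that never leaves your machines. The model name, data, and hyperparameters are placeholders; real fine-tuning runs typically add more machinery, such as parameter-efficient methods.

    # Simplified sketch of fine-tuning open weights on your own (possibly private) text.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    private_texts = ["... your proprietary documents go here ..."]  # placeholder data

    model.train()
    for text in private_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        # Standard next-token-prediction loss; the labels are just the input tokens.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    model.save_pretrained("my-fine-tuned-model")  # the updated weights stay with you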

 

Karan Girotra: I think what you can do with these models is great, and we'll come back to that, but to really understand what the internals are that we can see: if I go back 25 years, I was a Linux hacker. The good news about Linux was you could download the source, you could recompile it, and you could create it from scratch. I don't think we can do that with the weights, so to really create this whole thing from scratch, what all would I need, and what all are different companies giving me? From what I understand, OpenAI is giving me nothing. They're like: "You give us the data and we'll process it for you." Llama is giving us something, but it doesn't sound like what Linux gives me, where I can replicate it from the start. So what's the difference here?

 

Sasha Rush: Yeah. So the secret sauce of something like Linux is the source code. It's the literal programming people did all over the world to make Linux work. And that source code required lots of talent, lots of patches, and gets updated day to day. The source code for language models is actually not that complex. And in fact, roughly the same source code can run all the different models you might know.

 

Sasha Rush: The secret sauce of language models is the data they're trained on. And that data comes in two forms. One form is what's known as the pre-training data. That's the data that these companies scrape from the entire web. This is kind of every article, every book that's available on the web, all sorts of different texts from newspapers, or from creative writing, or from math, or science papers. That data is not released with the models. And for most of these models, we don't really know exactly what they were trained on, or how you might go about replicating that data. The second form of data is what's called instruction tuning data. And this is the data that gets the language model to be polite and do what you ask it to do. That's the data that gets ChatGPT to respond to your question, to produce good answers, to basically figure out what you're asking, and come up with a good answer to it. That data is often produced by very large teams of people who are annotating and responding to questions. They basically have a massive factory of people who give human answers to questions, so that the model can learn. That data is also not released with these systems. And it can be quite expensive to produce, because you have to pay folks to produce this data in this form. Now, because of that, it's not possible to compile a language model by yourself, because you don't have access to the data that you need to replicate this process. People think that the reason the first kind of data is not released is because of all the legal issues surrounding language models. Even if that data is accessible, it might not be legal to train on. And it's unclear whether it's allowed to be released with these models.

 

Karan Girotra: And so OpenAI can pay its lawyers to fight against The New York Times, but I might not be able to, so it's probably better for me not to even try messing with that data, and not to put that pre-training data out.

 

Sasha Rush: Well, that's one part of it. But the second part of it is that, at the moment, we actually don't know what, for example, Facebook is training on. They might not want to reveal that they trained on certain types of data. Again, it's just speculation. We don't have a good sense.

 

Karan Girotra: And instruction data is costly to get, so companies spend a lot of money paying contractors, I imagine, around the world to annotate or to really say: "This is a good response to a question." "This is a polite way of talking." "This is not a polite way of talking." The source is simple. Somebody told me the source for a large language model, roughly speaking, is under a thousand lines of code. Is that a myth or is that true?

 

Sasha Rush: It's a little bit subtle. The code to run the language model once it's been trained is under a thousand lines, and it's surprisingly readable. In fact, about five or six years ago, I wrote a blog post that goes through all of the different details of how these things work, and honestly the code hasn't changed so much in that time. Now, the code to train the model is a lot more complex. It's not Linux-style complex, but it can be a bit more challenging. But you don't actually need that to run the model in practice.
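For a sense of why the inference code can be so short, here is a toy version of the core loop in Python. It is not any particular model's actual code, just the shape of it: greedy decoding with a transformers-style model object.

    # Toy illustration of the inference loop: predict the next token, append it, repeat.
    import torch

    def generate(model, tokenizer, prompt, max_new_tokens=50):
        tokens = tokenizer(prompt, return_tensors="pt")["input_ids"]
        for _ in range(max_new_tokens):
            logits = model(input_ids=tokens).logits                      # forward pass over the sequence so far
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy: take the most likely token
            tokens = torch.cat([tokens, next_token], dim=-1)
        return tokenizer.decode(tokens[0], skip_special_tokens=True)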

 

Karan Girotra: Right, right. The training code is more complex than the inference code. So we have the source code, we have the data. But to train them on these very large corpora of data, one probably has to come up with some tricks to make all the H100s work together in unison. What about that piece? What do we call that, and how important is it?

 

Sasha Rush: Yeah, it's a good question. We call this distributed training. This refers to the fact that in order to train a model, you often need to use hundreds of computers, and for Llama 3 I believe they used 16,000 H100s to train it. These things are extraordinarily expensive. The estimate for training Llama 3 was around 100 million dollars for these computers, and so every millisecond counts when training them. There are world experts on getting the most out of NVIDIA GPUs, world experts on networking and data center construction, and even on energy, in order to make these things as efficient as possible. In some sense, even if they open sourced that code, very few organizations in the world would be able to take advantage of it or run it in their way. In fact, Google is an interesting case study too. They basically use their own chips internally, which you can use through the cloud, but they aren't actually released. Thus, even if they put out all the details, it would basically only apply to them.
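As a toy illustration of the distributed-training idea, and only that, here is what basic data-parallel training looks like with PyTorch's DistributedDataParallel. The model and data are stand-ins, and real LLM runs layer tensor and pipeline parallelism, custom networking, and data-center engineering on top of this.

    # Toy data-parallel sketch. Launch with, e.g.: torchrun --nproc_per_node=8 ddp_toy.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")                        # one process per GPU, synchronized over NCCL
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)   # stand-in for a transformer
    model = DDP(model, device_ids=[local_rank])            # gradients are averaged across all GPUs
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device=local_rank)       # each GPU sees its own shard of the data
        loss = model(x).pow(2).mean()                      # placeholder loss
        loss.backward()                                    # DDP all-reduces gradients here
        optimizer.step()
        optimizer.zero_grad()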

 

Karan Girotra: And you think this stuff matters, right? So, there is the engineering pipeline, there is the source code, there is the data. And I imagine there are all the things you mentioned, which are probably not getting the most attention but are probably equally important: networking, efficiency, even the design of the computers and the data centers to run this. And none of that information is out there. I see there are two problems. Even if the information were out there, I'd probably not have the money or the access to the chips to do it. And in some cases, those chips are really private. But the engineering of that is also not public or out there. Is it correct to say that most of that is not really out in the open?

 

Sasha Rush: Yeah, that's correct to say. There are a lot of secrets about the details of these models and how they're being trained. That being said, I think that people in the open source community would be happy just to know, say, the details of the data. There's kind of an assumption that perhaps the actual training of these systems may be beyond the reach of open source consortiums or things of that form, but that knowing what the models were trained on would be very helpful for knowing how things work. I'll give you one example. One thing that's been very challenging with open source language models is that we don't fully know how to evaluate them or determine which one is better than another. And people are thinking very hard about new benchmarks or new datasets to make this happen, but one question that comes up a lot is data contamination. Because we do not know what these models are trained on, it makes it a little hard to know when they're doing something completely novel, or whether they've kind of learned something from the data they were trained on. So even just knowing what was or was not fed into the model would be beneficial for understanding the abilities they have, or the kind of science behind how they work.

 

Karan Girotra: Very nice. I guess the next question would be on what you can do with these models. We know about the internals. It's not like Linux, where you really know all the internals, but we know some stuff: the source code, which is not that hard to know or that innovative. The data is where the secret sauce might be, and we have limited information on that even in the most open models. And then there's the engineering skill, which would be another one to have. But I think data is the place where we want to start first. At least then we could evaluate and not get shocked the next time ChatGPT says something that makes us think it has emergent properties, when it might really just be in the data. At least that couldn't shock people and get them worried about their Terminator fantasies; we could check against what the data might be. So that's cool.

But what can we do with these models? When I read what Mark Zuckerberg says, he is using the words "open source," but it's really not the source data that's open, or the engineering that's open; rather, he says you can do a lot of new stuff with it that you couldn't do with other things.

So what are the new things we can do with these models? You mentioned one, which is that we can run them locally, but what else can we do, and why do we even care about running them locally?

 

Sasha Rush: There's a bunch of things you can do with these models, and they all kind of depend on how much compute or how much data you want to inject into the systems themselves. The most common one that people talk about is this idea of fine tuning. What's exciting about fine tuning is that, let's say I'm Bloomberg and I have a ton of interesting financial data, and I don't think that data was originally used by the models to learn. I can take that data and continually train one of these open source models on it, and even just giving it more data within a given domain will make it much better or much smarter in that setting.

You can kind of do this with closed models but you end up having to pay them a lot of money. You don't have full control of the process and maybe you have to literally send them the data that you want to train on. So there's a lot of excitement. 

 

Karan Girotra: I have a question on fine tuning. What's the verdict on fine tuning? Because I hear some conflicting research, particularly around financial data and Bloomberg's model, which was fine tuned with as-good-as-it-gets proprietary financial data. The two streams I hear are: one, fine tuning will make these models better because they'll know a little bit more about the domain; and two, if we just increase the pre-training data, it doesn't really matter, and even if you fine tune models, it's not like they necessarily adhere to the fine tuning data more than the pre-training data. So what is the latest research on that? What is the practical, best knowledge on fine tuning? Is it good and necessary, or are the outcomes mixed?

 

Sasha Rush: There are a lot of small details about how this can work. A lot of it has to do with what's the most financially viable way to do it. For instance, it seems like a lot of times it's better to use a bigger model that was trained on more data than it is to, say, fine tune a model. But if you keep other things equal, fine tuning is certainly going to improve the model within its space. And there's a general sense that we're maybe getting to the limit of size of models or amount of training data. And so, particularly having specialized data will be a kind of very useful thing going forward to get marginal gains.

 

Karan Girotra: And one more understanding question. Fine tuning versus RAG (retrieval-augmented generation), or the impromptu addition of relevant information to the prompt. What's the verdict on these, and how does it relate to open source? I think most systems will allow you to do some sort of retrieval-augmented generation. Fine tuning is certainly much more complicated, so if one of the advantages of open source is more control over fine tuning, how relevant is that? Could I just get away with doing retrieval-augmented generation instead?

 

Sasha Rush: Yeah, so let's define these terms, just because people may not fully have a good sense of them. So generally in the research community, we think of two ways to get your model to learn about a new domain. One is fine tuning, where you actually change the weights of the system itself. The other is what's known as in-context learning, where you give it the examples you want it to learn from in its context. You can basically think of the first as kind of changing its brain, and the second as first telling it what you want it to know, and then asking it to produce responses. The difference between these two is a major question of research, and one that people are still thinking a lot about.

Practically, at the moment, we know the following. We know that in-context learning is inherently limited in the sense that we can only make the context a certain length. So models have a fixed context length, and it's actually not extremely large. Think about it as maybe ten pages that you can give to the model before running. The other downside is that when you are doing in-context learning, you're making the approach slower, so you're going to have to pay for the extra cost of running this. At the moment, we don't totally know which one works better, but we do know that fine tuning can scale to much larger data, and therefore it's a useful tool in our current toolbox.

Now there's a third idea here, which is called RAG, retrieval-augmented generation. The idea of that is that maybe you can't fit all your data in context, but you can fit a subset of your data that does fit. And the way you do that is you use a much weaker model to first determine what subset you think will be important for every given query. This idea has been very practical for the last couple of years of using large language models. And certainly if I were, say, building a company today, that would be the first attempt that I would make. This works both for open source models and also for closed source models as a way of getting these models to work. That being said, in the long term, I think people think of this as maybe a band-aid, and maybe not the final approach. We either expect to have extremely long in-context learning, or to figure out really efficient ways to do fine tuning. Which one we end up with will matter for whether open source or closed source wins, and basically what the compute profiles look like.
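To make the retrieval-augmented generation idea concrete, here is a bare-bones sketch. The embed function is a placeholder for a real (usually small) embedding model, and the final prompt can be sent to either an open or a closed model.

    # Bare-bones RAG sketch: score documents against the query, keep the best few, prompt with them.
    import numpy as np

    def embed(text):
        # Placeholder: in practice this would call an embedding model, not a hash-seeded random vector.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(384)

    documents = ["policy manual ...", "quarterly report ...", "product FAQ ..."]
    doc_vectors = np.stack([embed(d) for d in documents])

    query = "What does our return policy say about refunds?"
    scores = doc_vectors @ embed(query)                          # similarity scores via dot products
    top_docs = [documents[i] for i in np.argsort(scores)[-2:]]   # keep only what fits in the context window

    prompt = "Answer using only this context:\n" + "\n".join(top_docs) + "\n\nQuestion: " + query
    # 'prompt' is then passed to whichever language model you are using.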

 

Karan Girotra: Yeah, very nice. So we can bring our own data into models in any of these formats: fine tuning, in-context learning, or retrieval-augmented generation. And open source has some ease or advantage of control in doing that. What else can we do with open source models? What else can I do with Llama that I can't do with OpenAI's models?

 

Sasha Rush: Yeah, so one idea that has emerged as a kind of practical way of using open source models is this idea of distillation. The term distillation in the machine learning literature basically means using a very smart model as the teacher and a very fast model as the student. And we actually do use those terminologies. The hope is that if you can get a very, very expensive model to produce lots of good examples, you can teach a much smaller, faster student model to do those tasks. So the idea here is that you could take a really, really good model, one that might be too expensive to run in production, and use it to basically generate a new dataset for training a much smaller, faster model.

 

In theory, technically you can do this with closed source approaches. You could just make lots of queries to OpenAI and train on those. The legality of this is a little bit in question right now. There are certain aspects of their terms of service that maybe prevent this, but people have been doing it, and I think there are kind of cases ongoing.

 

For the open models, people are explicitly writing licenses about this question of data distillation. So NVIDIA put out a very large model a couple of weeks ago that allows you the freedom to basically distill it to any model you'd want. Llama also allows you to do this; they have some terminology about the naming of the models you produce. But people are kind of understanding that this is maybe a practical use case that open models allow you to work with.
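Here is a rough sketch of the distillation recipe as described: a large teacher model writes training examples, and a small student model is fine-tuned on them. The model names and prompts are illustrative only, and whether this is permitted depends on the teacher's license or terms of service.

    # Sketch of distillation: a teacher generates synthetic data, a student is trained on it.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    teacher = pipeline("text-generation", model="big-open-teacher-model")   # illustrative name
    prompts = ["Summarize this support ticket politely.", "Draft a short refund reply."]
    synthetic_data = [teacher(p, max_new_tokens=200)[0]["generated_text"] for p in prompts]

    student_name = "small-student-model"                                    # illustrative, e.g. ~2B parameters
    tok = AutoTokenizer.from_pretrained(student_name)
    student = AutoModelForCausalLM.from_pretrained(student_name)
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

    for text in synthetic_data:                  # an ordinary fine-tuning loop, but on the teacher's outputs
        batch = tok(text, return_tensors="pt", truncation=True)
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()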

 

Karan Girotra: This is fascinating, because if I understand correctly, essentially ChatGPT is overkill for the vast majority of tasks we need to do; even Apple, in its strategy, kind of acknowledges that. Right now we have this super smart model for everything, which has latency, cost, all of these challenges in production or in the inference stage when people are using it. But we could use this to create specialized ChatGPTs that don't know everything about the world but, for example, know everything about our use case. I can think of one in our teaching: we're experimenting with using language models for, let's say, tutoring or other kinds of Q&A-type tasks that teaching assistants do. Probably a lot of that can be done with a much smaller model. And so the idea here would be to use a large model to create something very specific for us. What are the kind of cost benefits we can see by doing that? If any of these things have been fully put into practice, are we talking one tenth the inference cost, or a smaller gain than that?

 

Sasha Rush: Yeah, I think it's not unreasonable that you could see something like one tenth the cost, again, particularly if you're doing a specialized model. One exciting one that Google actually recently released was a model known as Gemma 2B. And 2B refers to the fact that the size of the weights of the model is two billion parameters. Up until this point, people really hadn't been able to train very good models at that scale, but they were able to do it by training a much larger model and then distilling it into a smaller model that you could use in practice.

 

Karan Girotra: Just to benchmark that: a normal model would be 400 billion parameters, and thanks to distillation, we could use that to perhaps create a 2 billion parameter model. So that looks, in layman's terms, like a 200-times reduction in the number of weights or parameters in there, which could be a 200-times reduction in cost. Maybe not as good, but it sounds like almost as good.

 

Sasha Rush: Yeah, I think you should think of it more like 10 times. I don't think it's going to be as good at that scale. As a rule of thumb, you can roughly think of these numbers as being the speed of these models. Again, it depends on whether you're trying to do a chatbot that's real time versus a model that's maybe offline. But I think roughly thinking about the size as the speed is not a bad way of doing it. One other thing to note, though, is that OpenAI has also been doing this. Well, I guess we don't know explicitly, but people assume they've been doing this as well.

 

Sasha Rush: So ChatGPT hasn't been standing still. If you use the newest version, which I think they call GPT-4o mini, that's likely a distilled version of the much larger model that they're serving.

 

Karan Girotra: From a business point of view, I think with distillation, beyond the cost and latency advantages, it's almost better if a model is not too general purpose. I almost feel like it's better if a model knows only about my teaching and doesn't know about politics, for example, and therefore has almost no risk of bringing it in. I don't know if that kind of control can happen, but I can see that a, let's say, lobotomized or scaled-down version of the brain has advantages not just from a cost and latency point of view but maybe also from a keeping-it-on-topic point of view. Or is that just speculation that may or may not happen?

 

Sasha Rush: I think it's an interesting idea. I should say that we are really at the frontier of understanding how we can control what language models say or do. It has remained an extraordinarily hard challenge to do what's called unlearning, which is to get a model to forget something that it has learned. And I think that distillation by itself is not really a silver bullet for that. There are approaches that might help or that might be related to distillation, but we don't really have the technology to closely and very carefully put a guardrail around what a language model will talk about or do. And yeah, I can see why that would be a frustrating thing from a business perspective.

 

Karan Girotra: There's still plenty of good news here. From a business point of view, what I'm seeing is that I have perhaps not everything needed to recreate these models, which would be great for research and replicability, but maybe that's not that useful from a practical point of view. From a practical point of view, I can get these open source models, run them any which way I like on my hardware on my premises, and add my own data with no risk of that data being touched by another company. And potentially I could use them to train a very custom model for my particular business line, which will be faster, cheaper, and offer all the advantages of responsiveness, et cetera, that come from that. Perhaps even completely new kinds of use cases, because you can get these small models to work better. But there's one unanswered question: we often say you get what you pay for. So is the free stuff as good as the gold standard, the OpenAI stuff? How good is it? Are we compromising on something? I know it's hard to measure the performance of these models, but given the metrics people have been using, how close are we on the open side?

 

Sasha Rush: Yeah. One thing that's crazy about all this is how fast it's been moving. We're roughly, I don't know, a year and a half into some of these approaches, and as of the last couple of weeks we can say that we're pretty close. It seems like the latest versions of Llama 3 that were released are nearing about where GPT-4 was when it was released on many of these benchmarks. I think a lot of people thought that was maybe not possible a couple of years ago, and it took about a year of time, but it seems like the open models have caught up.

 

Sasha Rush: Now, we don't know what OpenAI is working on. They may now be prepping GPT-5, which may have all sorts of new things that will take open source a while to catch up on. But given how short a time frame it is, it's pretty impressive that they've reached that ability.

 

Karan Girotra: Yeah. So if I were a business executive, in your considered opinion, is it a dangerous thing to bet on open source, or is it a fair bet? What would you do? Would you hedge your bets? Let's say you're thinking of a relatively high-business-value application where costs probably don't matter as much, or matter less. What would you do in those cases? Hedge your bets, or would you feel comfortable going with one approach? I know it's a tough question to answer, and that's why neither of us are CIOs, but what would you recommend a CIO do in this kind of situation?

 

Sasha Rush: Yeah, I'm very underqualified for this question. That's one part about Cornell Tech that's interesting: you get behind your students and their understanding of business structure. But there are many advantages to open source: not being locked into a vendor, being able to, say, choose other companies to actually run the inference of your model, being able to customize all these properties, and being able to do it in a legal way.

 

Sasha Rush: Depending on what your structure is, you may or may not be able to use certain open source models, and you have to take that into account. But it definitely seems like a viable path forward in a way that maybe it wasn't a year ago.

 

Karan Girotra: Very nice. Yeah, you mentioned lock-in. Lock-in is something that, if I put my business hat on, my economist hat on, goes back to when I first learned about pre-training and the structure of these models. I remember in the room the computer scientists were, of course, excited about the performance, about how well these models can do, but I was sitting there thinking about the cost structure, and I'm outing myself as a real dork, or a person who thinks about money a little more than perhaps I should. And to me, it seemed like the cost structure of pre-trained models was such that you have a lot of fixed cost up front to do this pre-training stage, and the fine tuning would be a minor cost. And whenever we see that pattern of costs, where you create something generic at great expense and then a lot of people can use it at relatively small customization costs, so to say, you tend to get winner-take-all markets, because somebody makes that big thing.

 

Karan Girotra: So my original prediction here was that this was going to be like everything else big tech does, where bigger gets better. Somebody would invest a lot of money and then control, so to say, the highway, which everybody would have to use. I always think of these as infrastructure, which has a similar cost structure: you've got to pay a lot to lay down a road, and then everybody can use it at relatively modest cost. So my prediction was that it would have a winner-take-all nature and somebody would become a monopoly. And as we've seen in other tech monopolies, everybody who builds any application that uses, for example, a language generation task, for conversation, for code, would have to use the same highway because it's the biggest and the only highway available, and we'd end up paying 30, 40 percent of our returns, or maybe an even larger fraction of whatever we make, to that monopolist who controls the backend. I'm happy to say it hasn't worked out that way so far. So does that thinking make sense, that these things necessarily have a tendency to go winner-take-all, and yet they're not becoming winner-take-all? What do you think about that?

 

Sasha Rush: It's an interesting question. There are a couple of aspects that have been quite interesting. We've seen a lot of smaller companies that were pre-training models kind of stop as the cost continued to get higher. In that sense, it seemed like, as you were saying, things were congealing into a winner-take-all scenario.

 

Sasha Rush: The people who are still doing it are places like Meta that seem to have alternative reasons for trying to avoid vendor lock-in. And in his letter about Llama 3, Mark Zuckerberg explicitly talks about the fact that he felt so burned by the kind of tax that Apple took on its apps that he felt it was necessary, at any cost, to avoid that.

 

Sasha Rush: Now, we'll see if that continues. I mean, as each generation of these models seems to be exponentially more expensive, he'll have to keep on kind of paying to keep on doing this. 

 

Karan Girotra: So it's more like a battle of the monopolists, or a battle of the aspiring monopolists, where one might just want to play spoiler to not let anyone else be a monopolist, rather than it necessarily becoming a place where you and I can compete. But what about other companies? We haven't really mentioned the big, big players here, but I imagine the big open source model would be Llama. Mistral is also a company with what I believe are mostly open models. Any insights on how they are able to pull it off? Or is it just that people do a lot of things in the early ages of technology advancement?

 

Sasha Rush: Mistral is an extremely impressive company. They've been able to get a lot of really strong engineering talent working there, and they've been able to produce some really strong open models.

 

Sasha Rush: The history of it was that the founders of Mistral, or some of them, were from the original Llama team, so it kind of forked off of that project. But they've been able to hire a lot of great folks interested in both open model building and other aspects. Their models are only partially open source, so they have lower-tier models that are extremely good, that are open.

 

Sasha Rush: And then they have some larger models that are either not open, or open with licenses that restrict usage over a certain size. That being said, they're an extremely impressive company, and they've been able to keep up with a lot of what's going on. And I really hope to see them continue growing over the next couple of years.

 

Karan Girotra: I think we have time to perhaps take one question, from Elizabeth. We do these live, unlike a typical podcast, so we can truly interact with folks. So for Elizabeth's question, perhaps, Chris, you can tell us more about what Elizabeth wanted to ask.

 

Chris Wofford: Yeah, let me chime in.

 

Chris Wofford: So Elizabeth asks, Do you have to worry about divulging trade secrets if you use Llama 3 or others? What are some of the IP issues for companies when they use open source models? She goes further: in other words, what do companies need to worry about in protecting their proprietary data if they use these systems?

 

Sasha Rush: Yeah. So, if you are running Llama 3 on your own hardware or in your own cloud, it's not going to be shipping any data off premises. The weights Facebook provides are fully just a file that you can run. It's not a system or code, and it doesn't have any kind of security risk of that form. A lot of people are running these open source models with third-party inference providers.

 

Sasha Rush: So these are companies that have done the work to set up the infrastructure to run these systems extremely fast. If you are sending your data to these parties, you have to kind of be careful about their terms or how they're managing the data themselves. I know there are some that can set it up to run it within your cloud or within your infrastructure. But others simply just kind of literally take your data and send you the response of the models. And in that case, you should treat it like any other data being shipped off.

 

Karan Girotra: So overall, it looks like a promising development. I always like saying that I was somewhat pessimistic about how this large language model technology, or broadly the new generative AI technology, was evolving. It seemed like something which would be another tech monopoly where the big players have the advantage. And while that does remain true, the big players are working very hard, it seems, to make this accessible, to make it as accessible and as easy to build with as they can. I've not seen that in any previous generation of technology. I always like saying these models are easy to build with, but it is much harder to know what to build and what not to build with them. So in a way the technologists have done their job: they've made something quite accessible. And in the specific case of open source, what I learned today was that these models are essentially do-what-you-want-with-them, by and large free. Now, they're not telling us everything about the secret sauce used to train them, but at some level, unless you're trying to compete with them in the model layer, that doesn't matter. At the user and application layer, for most of the things you'd want to do with these models, for example, as Elizabeth was asking, you can inject your data much more safely, without really worrying about any leakage at any place in the pipeline. If you're a regulated business, everything is within your control, within your standard IT practices, so it gives you a lot more ability to deal with that. And then there's the brilliant idea of distillation, which is that you can use these models as teachers to create smaller student models, which is essentially, if we think about it, how we train employees and how we train agents in every situation. That's possible with these open source models, not so much with other models. So they're free, there are fewer risks, and there are more things you can do with them. And at least for now, they're as good in performance. So this seemed like a big win here.

 

Karan Girotra: Thank you so much, Sasha, for highlighting these advantages of open source models. I know our students are always ahead, and people, CIOs, and other folks struggling with these choices are always ahead. But I think we can say there is a new option that folks should very seriously consider, because of the advantages we mentioned, as they think about building their internal AI stacks.

 

Chris Wofford: Thanks for listening to Cornell Keynotes, and check out the episode notes for info on eCornell's online AI certificate programs from Cornell University. Thanks again, friends, and subscribe to stay in touch.