I Used ChatGPT and Claude to Help Pick a New City. Here’s Where They Helped—and Where They Didn’t.

Feb 23, 2026

This is the third article in a series on using AI tools for real-world decisions. Article 1 covered using Claude Code for financial planning. Article 2 covered the framework I used to evaluate cities based on my actual day-to-day life.

I asked ChatGPT for the best cities to live in matching my criteria.

It gave me Portugal.

I asked for mild weather.

It gave me South Carolina.

That’s when I realized: LLMs are powerful research tools, but they’re solving for the average person—not for you.

Over three months of planning a cross-country move, I learned exactly where AI tools help and where they fail. The key wasn’t finding the “right prompt.” It was understanding which parts of the problem are objective—and which are subjective—and treating them completely differently.

Here’s what I found.

The Real Problem: Your Criteria Aren’t as Clear as You Think

When I started this process, I had what I thought were well-defined criteria: mild weather, walkability, access to nature, reasonable cost of living.

But here’s the thing—most of those words don’t actually mean anything specific.

“Mild weather” means something different if you walk everywhere carrying groceries versus if you drive. I’ve done both. I’ve walked through Atlanta summers carrying grocery bags, and I can tell you that 85°F with humidity when you’re on foot for 20 minutes is a fundamentally different experience than 85°F when you’re stepping from an air-conditioned car into an air-conditioned store.

“Walkability” means something different if you work remotely and mainly care about parks and errands versus if you commute downtown daily.

“Access to nature” means something different if you want weekend alpine hikes versus a daily walk through a 50+ acre park within 15 minutes of your front door.

The hard part isn’t that these criteria are wrong. It’s that they contain hidden assumptions—and you often don’t realize it until you get results that feel off.

Part of the reason is that many preferences are internalized. You don’t actively think about them. They’re visceral. You know the Atlanta summer grocery run feels bad, but you might not think to tell an LLM, “I walk to the grocery store three times a week and I carry bags, so your temperature recommendation needs to account for extended pedestrian exposure in humidity.” That level of context doesn’t come naturally.

It’s similar to how a UX researcher has to coax out a user’s real workflow—people don’t volunteer the details that matter most because those details feel obvious to them.

The Insight That Changed Everything: Objective vs. Subjective

After weeks of getting well-intentioned but unhelpful results, I noticed a pattern.

LLMs performed very differently depending on whether my question was objective or subjective—and much of the frustration came from treating subjective questions as if they were objective.

Objective questions have verifiable, well-structured answers: historical temperature data, cost-of-living indices, transit system maps, average precipitation by month. The answers are already published, frequently requested, and quantifiable. LLMs handle these well, and are often as good as or better than manual research because they can synthesize across sources quickly.

Subjective questions involve personal interpretation, context-dependent meaning, or ambiguous language. “Is it mild?” “Is this neighborhood walkable?” “Is there good nature access?” These feel like they should have clear answers, but they don’t—because the answer depends on who’s asking and why.

The mistake I kept making was asking subjective questions and expecting objective answers.

When I asked for cities with “mild weather,” the LLM had to interpret “mild.” Its interpretation was reasonable—but it wasn’t mine. South Carolina has mild winters. It also has summers that would make my grocery walks miserable. The LLM didn’t know that because I hadn’t connected the dots between “mild” and “I walk carrying bags in the heat.”

This distinction—objective vs. subjective—became the foundation for everything else I did.

Working With Objective Criteria: Let the LLM Do the Heavy Lifting

For objective criteria, LLMs were genuinely excellent.

Once I learned to ask for data instead of adjectives, the quality of answers jumped dramatically.

Instead of: “Which of these cities has mild weather?”

I asked: “For each of these 15 cities, what is the average, median, and range of monthly high temperatures over the last 5 years? Include humidity and precipitation.”

Instead of: “Is Raleigh walkable?”

I asked: “What is Raleigh’s Walk Score? What percentage of residents commute without a car?”

The principle is simple: if a question can be answered with publicly available, well-structured data, ask for the data directly. Don’t ask the LLM to interpret it for you.

This worked well for criteria like temperature ranges, cost-of-living comparisons, transit infrastructure, general park and green space data, and airport access and logistics.

The results weren’t always perfect—LLMs can be confidently wrong on specifics, and data can be outdated—but they were directionally reliable. Good enough to build a shortlist. Good enough to eliminate obvious misfits.
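Because the summary statistics in the temperature prompt are simple, they are also easy to recompute yourself as a sanity check on whatever the LLM returns. Here is a minimal sketch, assuming you have pasted the monthly highs into a list. The numbers below are placeholders for illustration, not measurements for any real city, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical monthly average highs in °F, January through December.
# Placeholder values for illustration only; not data for a real city.
monthly_highs_f = [50, 54, 62, 71, 79, 86, 89, 88, 82, 72, 62, 53]


def summarize(highs):
    """Return the mean, median, and (low, high) range of a list of temperatures."""
    return {
        "mean": round(mean(highs), 1),
        "median": median(highs),
        "range": (min(highs), max(highs)),
    }


summary = summarize(monthly_highs_f)
print(summary)  # {'mean': 70.7, 'median': 71.5, 'range': (50, 89)}
```

With these placeholder values the mean looks comfortable, while the range exposes the summer peak. That is the reason to ask for the range and not only the average.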

Working With Subjective Criteria: That’s Your Job (But LLMs Can Help You Think)

The subjective criteria were where things got interesting—and where I had to change my approach entirely.

For these, the LLM’s role shifted from “researcher” to “thought partner.” Instead of asking it to give me answers, I used it to help me figure out what my questions actually meant.

Technique 1: Benchmark against cities you know.

Rather than trying to define “mild” in the abstract, I used cities I’d actually lived in as reference points.

“I’ve lived in New York and Atlanta. Compare the summer pedestrian experience in Raleigh, Seattle, and Pittsburgh to those two cities—specifically for someone who walks daily and carries groceries.”

This forced specificity. The LLM couldn’t hide behind “mild climate.” It had to compare against my actual lived experience.

Technique 2: Ask yourself why—and tell the LLM.

When I caught myself using a vague word, I’d ask: why do I actually care about this?

“Mild weather”—why? Because I walk for exercise and errands daily. Because I don’t want to dread going outside for four months of the year. Because I don’t want to rely on AC or heating as a coping mechanism for a climate I fundamentally don’t enjoy.

Each of those “becauses” translates into a more specific, testable question. And once I fed that context to the LLM, the recommendations improved noticeably.
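One way to make a "because" testable is to write it down as an explicit rule. The sketch below is only an illustration: the thresholds, the `Month` record, and the sample city are all assumptions made up for this example, and your own cutoffs will differ.

```python
from dataclasses import dataclass


@dataclass
class Month:
    name: str
    avg_high_f: float        # average daily high, °F
    avg_humidity_pct: float  # average relative humidity, %


# Hypothetical personal thresholds: a grocery walk feels bad above both of these.
MAX_COMFORTABLE_HIGH_F = 85
MAX_COMFORTABLE_HUMIDITY_PCT = 70
# Hypothetical tolerance: how many uncomfortable months per year are acceptable.
MAX_UNCOMFORTABLE_MONTHS = 2


def uncomfortable_months(months):
    """Return names of months that are both too hot and too humid for walking errands."""
    return [
        m.name
        for m in months
        if m.avg_high_f > MAX_COMFORTABLE_HIGH_F
        and m.avg_humidity_pct > MAX_COMFORTABLE_HUMIDITY_PCT
    ]


def is_mild_for_me(months):
    """'Mild' here means few enough months of hot, humid walking weather."""
    return len(uncomfortable_months(months)) <= MAX_UNCOMFORTABLE_MONTHS


# Placeholder values for a made-up city, not real climate data.
example_city = [
    Month("May", 80, 68),
    Month("Jun", 87, 72),
    Month("Jul", 90, 75),
    Month("Aug", 89, 76),
    Month("Sep", 83, 73),
]

print(uncomfortable_months(example_city))  # ['Jun', 'Jul', 'Aug']
print(is_mild_for_me(example_city))        # False
```

The value is less in the code than in being forced to pick the numbers. Once "mild" is a threshold and a month count, it becomes something you can state in a prompt and check against data.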

Technique 3: Pressure-test with specific examples.

When an LLM told me a city had “good nature access,” I’d follow up: “What specific parks are within a 20-minute walk of downtown? How large are they? Do they have trails, or are they mostly playgrounds?”

The specifics often revealed that “good nature access” meant something very different from what I needed. A city park with a playground and a walking loop is not the same as a 300-acre urban forest with trail networks. Both are technically “nature access.”

This is actually a pattern I recognized from a previous career chapter. Years ago, I managed machine learning annotation projects—labeling training data for AI models. Even with seemingly well-defined categories, annotators would apply labels subjectively. The fix was always the same: go through specific examples, compare them, explain your reasoning, surface the disagreements, and refine the criteria. The same principle applies when you’re “annotating” your own preferences for an LLM.

The Division of Labor

After a few weeks, a clear division of labor emerged:

I used LLMs as a thought partner to pressure-test my criteria, to research and summarize candidates against those criteria, to narrow down on specific dimensions, to draft checklists (apartment tours, moving logistics, donation steps), and to estimate logistics and surface things I might be missing.

I kept four things for myself: defining what I actually want and how to prioritize it, calibrating subjective labels (“mild,” “walkable,” “quiet”), verifying anything that could cost money or time, and running real-world tests (walking neighborhoods, simulating daily life, trying errands).

The frame that made it click: LLMs are great at research, organization, and estimation. They’re not built for judgment—and personal decisions are mostly judgment.

Once I internalized this, the tool stopped being a frustrating “decision maker” and became a genuine force multiplier.

Top-Down First, Then Bottom-Up

A mistake I made early was trying to get the perfect answer in one shot.

What worked was a two-phase approach:

Top-down (cast a wide net): I asked for 20–40 candidate cities given my criteria. I deliberately wanted a long list with false positives, because false negatives are more costly—if you never consider a city, you never discover it’s a fit.

At this stage, I accepted messy, imperfect results. The goal wasn’t precision. It was coverage—and using the results to refine my own thinking about what mattered.

Bottom-up (go deep on the shortlist): Once I’d narrowed to a handful of cities, I shifted to targeted, specific queries. Neighborhood-level research. Specific parks, transit routes, grocery stores. Temperature data by month. This is where the objective/subjective framework paid off most—I knew which questions to ask the LLM (data) and which to answer myself (judgment).

The key insight: city-level averages hide neighborhood-level reality.

A city can be “unwalkable” overall and still have pockets that are perfect for a car-free lifestyle. Or the reverse. So I treated each promising city’s neighborhoods as a separate research problem—and that’s actually where the decision happened.
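A small numeric illustration of how an average can hide the one pocket that matters. The scores, neighborhood names, and cutoff below are all hypothetical placeholders, not figures from any real city or scoring service.

```python
from statistics import mean

# Hypothetical walkability scores (0-100) per neighborhood in one made-up city.
neighborhood_scores = {
    "Neighborhood A": 88,
    "Neighborhood B": 35,
    "Neighborhood C": 28,
    "Neighborhood D": 41,
}

# Assumed personal cutoff for a car-free lifestyle; not an official definition.
CAR_FREE_THRESHOLD = 80

city_average = mean(list(neighborhood_scores.values()))
car_free_candidates = [
    name for name, score in neighborhood_scores.items()
    if score >= CAR_FREE_THRESHOLD
]

print(city_average)         # 48
print(car_free_candidates)  # ['Neighborhood A']
```

Filtering on the city-level number alone would have eliminated this city, even though one neighborhood clears the bar comfortably.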

Where LLMs Consistently Fell Short

A few patterns emerged:

Ambiguous terms without operationalization. Every time I used a word like “mild,” “safe,” “quiet,” or “diverse” without defining it, I got generic results. The fix was always the same: replace adjectives with numbers, ranges, or specific examples.

Hidden assumptions. The LLM would assume I had a car, or that “affordable” meant a certain range, or that “city” meant metro area rather than a specific neighborhood. The fix: ask the model to list its assumptions, or preempt by stating constraints (“assume no car,” “budget is Y”).

Trade-off weighting. If you ask an LLM to optimize across 10 criteria simultaneously, you’re asking it to make value judgments about your life. It can’t do that. The fix: ask for a longer list with trade-offs noted, and do the weighting yourself.

Freshness and specificity. LLMs can be confidently wrong on details that change—pricing, availability, neighborhood dynamics. The fix: treat every output as a lead to verify, not a fact to trust.
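The trade-off weighting point can be made concrete with a few lines of arithmetic. In this sketch the cities, ratings, criteria names, and weights are all invented for illustration. The only claim is that the same ratings produce different rankings under different weights, which is why the weights have to come from you.

```python
# Hypothetical 0-10 ratings per city and criterion, filled in after research.
ratings = {
    "City A": {"summer_comfort": 8, "walkability": 6, "park_access": 9, "cost": 4},
    "City B": {"summer_comfort": 5, "walkability": 9, "park_access": 6, "cost": 7},
}


def weighted_score(city_ratings, weights):
    """Sum each criterion's rating multiplied by its weight."""
    return sum(city_ratings[criterion] * weight for criterion, weight in weights.items())


def rank(ratings, weights):
    """Return city names sorted from highest to lowest weighted score."""
    return sorted(
        ratings,
        key=lambda city: weighted_score(ratings[city], weights),
        reverse=True,
    )


# Two hypothetical sets of personal weights; each sums to 1.
comfort_first = {"summer_comfort": 0.4, "walkability": 0.3, "park_access": 0.2, "cost": 0.1}
walkability_and_cost_first = {"summer_comfort": 0.1, "walkability": 0.5, "park_access": 0.1, "cost": 0.3}

print(rank(ratings, comfort_first))               # ['City A', 'City B']
print(rank(ratings, walkability_and_cost_first))  # ['City B', 'City A']
```

Nothing about the cities changed between the two calls. Only the priorities did, and those are exactly what an LLM has to guess at if you ask it to optimize everything at once.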

The Limits of Research (And Why Real-World Testing Matters)

At some point, no amount of prompting eliminates uncertainty. You have to test.

I found that the best way to reduce uncertainty was simply visiting—or simulating a visit as closely as possible. Walking the actual routes I’d walk. Checking the actual grocery stores I’d use. Riding the actual transit I’d depend on.

This is also where I learned that not everything should be an LLM conversation. Some questions are better answered by a map. Some by a transit app. Some by walking around for an hour.

I actually started building a tool for one of these gaps—a neighborhood park access map that shows walking distance to parks at the block level—because it was a question I kept asking that no existing tool (including LLMs) answered well. I’ll share more about that in the next article.

The principle: LLMs generate hypotheses. Targeted tools and real-world experience verify them.

What I’d Do Differently

If I were starting this process from scratch, I’d do three things earlier:

First, separate objective from subjective criteria on day one. This alone would have saved me weeks of frustration. Objective criteria get delegated to the LLM immediately. Subjective criteria get a “define what I actually mean” session before any research happens.

Second, use benchmark cities from the start. Instead of trying to define preferences in the abstract, ground everything in places you’ve actually experienced. It’s faster and more reliable.

Third, go neighborhood-level sooner. City-level research is useful for elimination, but the decision lives at the neighborhood level. I spent too long comparing cities when I should have been comparing neighborhoods within the top 3–5.

Conclusion

Using LLMs for this move didn’t eliminate uncertainty. Moving is still a leap.

What it did was compress the process—from fuzzy to structured, from overwhelming to manageable, from “I don’t even know where to start” to “I have a clear shortlist and I know what to test.”

The trick was learning that the most important skill isn’t prompting. It’s knowing which parts of the problem are yours to solve—and using the tool to handle everything else.

LLMs are great at helping you think. They’re great at helping you research and organize.

They’re not great at being you.

Next article: I built a neighborhood park access map because no existing tool answered my question. Here’s what I learned about going from an internal tool to a public product.
